Text-to-Speech (TTS) & Speech-to-Text (STT) Examples with Pawa AI
Audio & Transcription with Pawa AI

Text-to-Speech (TTS)

Text-to-Speech (TTS) allows you to generate high-quality, natural-sounding speech directly from text.
This capability is essential for building voice-enabled applications, making content accessible to wider audiences, and creating immersive user experiences.
By default, the TTS API endpoint streams its output, so your app can use Server-Sent Events (SSE) to receive the audio as it is generated. If you do not use SSE, the API falls back to non-streaming mode: it waits until the full audio has been generated and then returns it in a single response.
You can learn about streaming here.
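
As a rough illustration of the SSE framing mentioned above, the sketch below parses a stream of lines into event payloads. It covers only the generic Server-Sent Events wire format; this page does not describe what Pawa AI puts inside each event (for example, how audio chunks are encoded), so the helper name `iter_sse_data` and the sample payloads are hypothetical.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in a line stream."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An event not followed by a blank line is dropped, as the format specifies.


# Hypothetical payloads; the real event contents are not documented here.
sample = ["data: chunk-1\n", "\n", ": keep-alive\n", "data: chunk-2\n", "\n"]
for payload in iter_sse_data(sample):
    print(payload)
```

In a real client the `lines` argument would be the decoded lines of the HTTP response body, read incrementally rather than from a list.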

Use Cases

  • Voice assistants: Let your chatbot respond with speech instead of text only.
  • Learning platforms: Automatically generate audio versions of documents, lessons, or Q&A sessions.
  • Accessibility tools: Help users with visual impairments interact with your app through audio.
  • Media & podcasts: Generate narrations from written articles or blogs.

Models with Audio Capabilities

  • Pawa Text To Speech (pawa-tts-v1-20250704): converts text to speech.
  • Pawa Speech To Text (pawa-stt-v1-20240701): converts audio input to text.

Example Playback Speech

Original Text: “Jina la jamhuri ya muungano wa Tanzania, ni nchi iliyopo Afrika ya Mashariki ndani ya ukanda wa maziwa makuu ya Afrika, imepakana na Uganda na Kenya upande wa kaskazini, Bahari ya Hindi upande wa mashariki, Msumbiji malawi na Zambia upande wa kusini, Congo, Burundi na Rwanda upande wa magharibi, eneo la Tanzania ni takribani kilometa za mraba 940 mb/h. Saa arobaini na dakika elfu 300, eneo linalokaliwa na maji ne asalimia 6.2 - Mlima Kilimanjaro - Mlima mrefu zaidi barani Afrika upo kaskazini mashariki wa Tanzania.”

Text to Speech Request Example

curl --request POST \
     --url https://api.pawa-ai.com/v1/voice/text-to-speech \
     --header "Authorization: Bearer $PAWA_AI_API_KEY" \
     --header 'Content-Type: application/json' \
     --data '{
       "model": "pawa-tts-v1-20250704",
       "voice": "ame",
       "max_tokens": 65536,
       "temperature": 0.5,
       "top_p": 0.95,
       "text": "Hello, welcome to Pawa AI. Upgrade now to enjoy Unlimited access to advanced AI",
       "repetition_penalty": 1.1
     }' \
     --output speech.mp3

The audio is saved as speech.mp3 in the current directory. Open the file to play the generated speech.
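
The same request can be sketched in Python with only the standard library. The endpoint, model name, and parameters are copied from the curl example. The helper names `build_tts_request` and `save_speech` are hypothetical, and the sketch assumes the non-streaming response body is the raw audio, as the `--output speech.mp3` flag suggests.

```python
import json
import os
import urllib.request

TTS_URL = "https://api.pawa-ai.com/v1/voice/text-to-speech"


def build_tts_request(text: str, api_key: str, voice: str = "ame") -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "model": "pawa-tts-v1-20250704",
        "voice": voice,
        "max_tokens": 65536,
        "temperature": 0.5,
        "top_p": 0.95,
        "repetition_penalty": 1.1,
        "text": text,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(text: str, path: str = "speech.mp3") -> None:
    """Send the request and write the response body to disk.

    Assumes the response body is the audio file itself.
    """
    request = build_tts_request(text, os.environ["PAWA_AI_API_KEY"])
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        # Copy in chunks so a long clip never sits fully in memory.
        while chunk := response.read(8192):
            out.write(chunk)


# Usage (performs a real network call, so it is left commented out):
# save_speech("Hello, welcome to Pawa AI.")
```

Splitting request construction from sending makes the payload easy to inspect or log before any network traffic happens.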

Speech-to-Text (STT)

Speech-to-Text (STT) converts audio into text with high accuracy.
This is powerful for transcription, audio search, summarization, and voice-enabled interfaces.

Use Cases

  • Meeting and call center transcription: Turn long discussions into structured notes.
  • Customer service: Convert call center conversations into searchable text.
  • Education: Transcribe lectures, podcasts, and webinars.
  • Productivity: Voice notes and dictation apps.

Original Playback Speech

Example Text: “Jina la jamhuri ya muungano wa Tanzania, ni nchi iliyopo Afrika ya Mashariki ndani ya ukanda wa maziwa makuu ya Afrika, imepakana na Uganda na Kenya upande wa kaskazini, Bahari ya Hindi upande wa mashariki, Msumbiji malawi na Zambia upande wa kusini, Congo, Burundi na Rwanda upande wa magharibi, eneo la Tanzania ni takribani kilometa za mraba 940 mb/h. Saa arobaini na dakika elfu 300, eneo linalokaliwa na maji ne asalimia 6.2 - Mlima Kilimanjaro - Mlima mrefu zaidi barani Afrika upo kaskazini mashariki wa Tanzania.”

Speech to Text Request Example

curl --request POST \
      --url https://api.pawa-ai.com/v1/voice/speech-to-text \
      --header "Authorization: Bearer $PAWA_AI_API_KEY" \
      --header 'Content-Type: multipart/form-data' \
      --form model=pawa-stt-v1-20240701 \
      --form file=@tanzania_info2.wav
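
For applications that cannot shell out to curl, the upload can be approximated in Python. The standard library has no multipart helper, so this sketch builds the multipart/form-data body by hand. The function names `encode_multipart` and `transcribe` are hypothetical, and sending the file part as `application/octet-stream` is an assumption; the API may expect a more specific audio type.

```python
import os
import urllib.request
import uuid

STT_URL = "https://api.pawa-ai.com/v1/voice/speech-to-text"


def encode_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    # Plain text fields, e.g. the model name.
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The file part: headers, raw bytes, then the closing boundary.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8"))
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "pawa-stt-v1-20240701") -> str:
    """Upload an audio file and return the raw JSON response text."""
    with open(path, "rb") as audio:
        body, content_type = encode_multipart(
            {"model": model}, "file", os.path.basename(path), audio.read()
        )
    request = urllib.request.Request(
        STT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['PAWA_AI_API_KEY']}",
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Usage (performs a real network call, so it is left commented out):
# print(transcribe("tanzania_info2.wav"))
```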

Speech to Text Response Example

{
  "success": true,
  "message": "Audio transcribed succesfully",
  "data": {
    "text": "Hello, my name is Innocent Charles, welcome to Pawa AI"
  }
}
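
A small sketch of how a client might read this response. The `success`, `message`, and `data.text` fields come from the example above. The shape of a failed response is not shown on this page, so treating `success: false` plus `message` as the error case is an assumption, and `extract_transcript` is a hypothetical helper name.

```python
import json


def extract_transcript(raw: str) -> str:
    """Return the transcribed text from a speech-to-text JSON response."""
    payload = json.loads(raw)
    if not payload.get("success"):
        # Assumed error shape: success is false and message explains why.
        raise RuntimeError(payload.get("message", "transcription failed"))
    return payload["data"]["text"]


response_body = """
{
  "success": true,
  "message": "Audio transcribed succesfully",
  "data": {
    "text": "Hello, my name is Innocent Charles, welcome to Pawa AI"
  }
}
"""
print(extract_transcript(response_body))
```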