Text-to-Speech (TTS) & Speech-to-Text (STT) Examples with Pawa AI

Text-to-Speech (TTS)
Text-to-Speech (TTS) lets you generate high-quality, natural-sounding speech directly from text. This capability is essential for building voice-enabled applications, making content accessible to wider audiences, and creating immersive user experiences.
The TTS API endpoint implements streaming, so your app can use Server-Sent Events (SSE) to receive the audio as it is generated. If you do not use SSE, the API falls back to non-streaming mode: it waits until the full audio has been generated and then returns it in a single response.
Use Cases
- Voice assistants: Let your chatbot respond with speech instead of text only.
- Learning platforms: Automatically generate audio versions of documents, lessons, or Q&A sessions.
- Accessibility tools: Help users with visual impairments interact with your app through audio.
- Media & podcasts: Generate narrations from written articles or blogs.
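The streaming mode of the TTS endpoint described above delivers its output as Server-Sent Events. The sketch below shows one way a client might split such a stream into event payloads, following the generic SSE wire format (`data:` fields, blank line as event terminator, `:` comment lines). The function name `iter_sse_data` is hypothetical, and the source does not say how Pawa AI encodes audio inside each event, so decoding the payload is left to the caller.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event.

    `lines` is any iterable of text lines, for example a decoded HTTP
    response body read line by line. Only the generic SSE framing is
    handled here; what each payload contains is up to the API.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            # Comment line, often used as a keep-alive; ignore it.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            # The SSE format strips one optional leading space.
            value = value[1:]
        if field == "data":
            buffer.append(value)
    # An event that is not terminated by a blank line is discarded,
    # which matches how the SSE format treats a truncated stream.
```

Each yielded string is one event's payload. If the API sends encoded audio chunks, the caller would decode and append them in order; that encoding is an assumption to verify against the official API reference.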
Models with Audio Capabilities
- Pawa Text To Speech (pawa-tts-v1-20250704): text-to-speech conversion.
- Pawa Speech To Text (pawa-stt-v1-20240701): audio-input-to-text conversion.
Original Text: “Jina la jamhuri ya muungano wa Tanzania, ni nchi iliyopo Afrika ya Mashariki ndani ya ukanda wa maziwa makuu ya Afrika, imepakana na Uganda na Kenya upande wa kaskazini, Bahari ya Hindi upande wa mashariki, Msumbiji malawi na Zambia upande wa kusini, Congo, Burundi na Rwanda upande wa magharibi, eneo la Tanzania ni takribani kilometa za mraba 940 mb/h. Saa arobaini na dakika elfu 300, eneo linalokaliwa na maji ne asalimia 6.2 - Mlima Kilimanjaro - Mlima mrefu zaidi barani Afrika upo kaskazini mashariki wa Tanzania.”
Text to Speech Request Example
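A minimal sketch of how such a request might be assembled in Python, using only the standard library. Only the model identifier pawa-tts-v1-20250704 comes from this page. The URL `TTS_URL`, the bearer-token header, and the JSON field names `model`, `text`, and `stream` are assumptions made for illustration; check them against the official API reference before use.

```python
import json
import urllib.request

# Hypothetical placeholder: this page does not state the endpoint URL.
TTS_URL = "https://api.example.com/v1/tts"


def build_tts_request(text, api_key, stream=False, model="pawa-tts-v1-20250704"):
    """Return (url, headers, body) for a TTS call.

    The field names and the auth scheme are assumptions, not the documented API.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    if stream:
        # Ask for Server-Sent Events; without this header the page says
        # the API falls back to returning the full audio at once.
        headers["Accept"] = "text/event-stream"
    payload = {"model": model, "text": text, "stream": stream}
    # ensure_ascii=False keeps non-ASCII characters readable in the body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return TTS_URL, headers, body


def send(url, headers, body, timeout=60):
    """POST the request and return the raw response bytes (non-streaming)."""
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `build_tts_request("Habari ya leo", api_key)` produces a non-streaming request; passing `stream=True` adds the SSE `Accept` header. Keeping request construction separate from `send` makes the payload easy to inspect without touching the network.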
Speech-to-Text (STT)
Speech-to-Text (STT) converts audio into text with high accuracy. This is powerful for transcription, audio search, summarization, and voice-enabled interfaces.
Use Cases
- Meeting and call center transcription: Turn long discussions into structured notes.
- Customer service: Convert call center conversations into searchable text.
- Education: Transcribe lectures, podcasts, and webinars.
- Productivity: Voice notes and dictation apps.
Example Text: “Jina la jamhuri ya muungano wa Tanzania, ni nchi iliyopo Afrika ya Mashariki ndani ya ukanda wa maziwa makuu ya Afrika, imepakana na Uganda na Kenya upande wa kaskazini, Bahari ya Hindi upande wa mashariki, Msumbiji malawi na Zambia upande wa kusini, Congo, Burundi na Rwanda upande wa magharibi, eneo la Tanzania ni takribani kilometa za mraba 940 mb/h. Saa arobaini na dakika elfu 300, eneo linalokaliwa na maji ne asalimia 6.2 - Mlima Kilimanjaro - Mlima mrefu zaidi barani Afrika upo kaskazini mashariki wa Tanzania.”
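A transcription request for the STT model could be assembled along these lines. This is a sketch under stated assumptions: only the model identifier pawa-stt-v1-20240701 comes from this page, while the URL `STT_URL`, the bearer-token header, the multipart upload format, and the form field names `model` and `file` are hypothetical and should be checked against the official API reference.

```python
import uuid

# Hypothetical placeholder: this page does not state the endpoint URL.
STT_URL = "https://api.example.com/v1/stt"


def build_stt_request(audio_bytes, filename, api_key, model="pawa-stt-v1-20240701"):
    """Return (url, headers, body) for an STT upload as multipart/form-data.

    The form field names and the auth scheme are assumptions.
    """
    boundary = uuid.uuid4().hex  # random separator between form parts
    parts = [
        # Plain text field carrying the model identifier.
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="model"\r\n\r\n'
            f"{model}\r\n"
        ).encode("utf-8"),
        # File field header, followed by the raw audio bytes.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8"),
        audio_bytes,
        # Closing boundary marks the end of the form.
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ]
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return STT_URL, headers, b"".join(parts)
```

The returned tuple can be sent with any HTTP client, for example `urllib.request.Request(url, data=body, headers=headers, method="POST")`. The response format is not described on this page, so parsing the transcript is left out.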