Introduction

Text-to-Speech (TTS)

Convert text responses into natural, expressive speech across multiple languages and voices.
Ideal for voice assistants, narration, accessibility, and interactive experiences.

Features:

Expressive, natural voices
Low-latency streaming
Multilingual support (English, Swahili, and other African languages)
Flexible output formats (MP3, WAV, streaming response)
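
As a rough illustration of how the output-format and streaming options above might map onto a request, here is a minimal sketch that only builds and validates a payload; it makes no network call. The field names (`text`, `language`, `voice`, `output_format`, `stream`), the format identifiers, and the helper `build_tts_request` are all assumptions made for illustration, not the documented Pawa API.

```python
import json

# Formats named in the feature list (MP3, WAV, streaming response).
# The lowercase identifiers used here are assumed, not documented.
SUPPORTED_FORMATS = {"mp3", "wav", "stream"}


def build_tts_request(text, language="sw", voice="default", output_format="mp3"):
    """Build a JSON-serializable payload for a hypothetical TTS endpoint."""
    if not text.strip():
        raise ValueError("text must not be empty")
    fmt = output_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    return {
        "text": text,
        "language": language,        # hypothetical field name
        "voice": voice,              # hypothetical field name
        "output_format": fmt,        # hypothetical field name
        "stream": fmt == "stream",   # True only when a streaming response is requested
    }


if __name__ == "__main__":
    # Print the payload that would be sent for a short Swahili greeting.
    print(json.dumps(build_tts_request("Habari ya asubuhi"), indent=2))
```

Validating the format on the client side keeps an unsupported value from reaching the service; the real parameter names should be taken from the API reference.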

Speech-to-Text (STT)

Accurately transcribe audio into text in real time or in batch mode. Well suited to meeting notes, captions, accessibility, and voice commands.

Features:

High transcription accuracy
Real-time streaming or async batch processing
Multilingual speech recognition (optimized for African accents and Swahili)
Robust in noisy environments
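
The difference between real-time streaming and async batch processing can be sketched on the client side as follows: streaming submits audio in small chunks as it arrives, while batch submits the whole recording as one job. The helpers `chunk_audio` and `plan_transcription`, the `mode` labels, and the 3200-byte chunk size (100 ms of 16 kHz, 16-bit mono PCM) are assumptions for illustration only; the sketch does not call any Pawa endpoint.

```python
def chunk_audio(data: bytes, chunk_size: int = 3200):
    """Yield fixed-size chunks of raw audio; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def plan_transcription(audio: bytes, realtime: bool, chunk_size: int = 3200):
    """Describe how audio would be submitted: many streamed chunks or one batch job."""
    if realtime:
        # Streaming: send small chunks so partial transcripts can come back early.
        return {"mode": "stream", "chunks": list(chunk_audio(audio, chunk_size))}
    # Batch: submit the whole recording at once and collect the result later.
    return {"mode": "batch", "chunks": [audio]}


if __name__ == "__main__":
    audio = bytes(8000)  # placeholder: 8000 bytes of silence
    plan = plan_transcription(audio, realtime=True)
    print(plan["mode"], [len(c) for c in plan["chunks"]])
```

Smaller chunks reduce the delay before the first partial transcript at the cost of more requests; batch mode trades latency for simplicity.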

Voice models include:

Pawa Text-to-Speech (TTS) → Natural, expressive multilingual voices (supports Swahili, English, and African accents). Optimized for real-time streaming and low latency.
Pawa Speech-to-Text (STT) → Accurate, multilingual speech recognition tuned for African languages and noisy environments. Supports streaming transcription and batch mode.
