POST /voice/speech-to-text
Speech to Text
curl --request POST \
  --url https://api.pawa-ai.com/v1/voice/speech-to-text \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form files='@example-file' \
  --form model=pawa-stt-v1-20240701 \
  --form language=English \
  --form is_speaker_diarization=false \
  --form 'prompt=Nipe maneno yaliyokwenye hii audio' \
  --form temperature=0.1
{
  "success": true,
  "message": "Audio transcribed succesfully",
  "data": {
    "transcriptions": [
      {
        "filename": "innocent.wav",
        "transcript": "Hello, my name is Innocent Charles, welcome to Pawa AI"
      }
    ]
  }
}
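
The curl request above can also be sketched in Python using only the standard library. This is an illustrative sketch, not an official client: the function names `encode_multipart` and `transcribe` are hypothetical, and the `audio/wav` part type is an assumption picked from the accepted types listed under Body. The URL and form field values are copied from the curl example.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# URL taken from the curl example above.
API_URL = "https://api.pawa-ai.com/v1/voice/speech-to-text"


def encode_multipart(fields, filename, file_bytes, content_type, boundary=None):
    """Encode text fields plus one file part named 'files' as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The file part carries the raw audio bytes.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, token, content_type="audio/wav"):
    """Send one audio file with the same form values as the curl example."""
    fields = {
        "model": "pawa-stt-v1-20240701",
        "language": "English",
        "is_speaker_diarization": "false",
        "prompt": "Nipe maneno yaliyokwenye hii audio",
        "temperature": "0.1",
    }
    audio = Path(path)
    body, content_type_header = encode_multipart(
        fields, audio.name, audio.read_bytes(), content_type
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type_header,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `transcribe("innocent.wav", token)` with a valid token should return a dictionary shaped like the JSON response shown above.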

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
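
As a small illustration of the header format described above, the following sketch builds the `Authorization` header in Python. The helper name `auth_headers` and the environment variable `PAWA_API_KEY` are hypothetical; use whatever mechanism you already have for storing the token.

```python
import os


def auth_headers(token=None):
    """Return the Authorization header in the form 'Bearer <token>'.

    PAWA_API_KEY is a hypothetical environment variable name used as a fallback.
    """
    token = token or os.environ.get("PAWA_API_KEY", "")
    if not token:
        raise ValueError("no auth token provided")
    return {"Authorization": f"Bearer {token}"}
```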

Body

multipart/form-data
files
file
required

Audio files to be transcribed (max 10). Note: the API accepts multiple files in a single request, but the interactive form on this page allows only one file for testing. Accepted types: audio/mp3, audio/mpeg, audio/x-mp3, audio/wav, audio/wave, audio/x-wav, audio/x-pn-wav, audio/aac, audio/m4a, audio/x-m4a, audio/x-mp4, audio/ogg, audio/opus, audio/x-ms-wma, audio/wma
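
The limits above can be checked on the client before uploading. The sketch below is one possible pre-flight check, assuming the server compares each file part's declared content type against this list; the helper name `validate_uploads` is hypothetical.

```python
# Accepted types and the file limit are copied from the description above.
ACCEPTED_AUDIO_TYPES = frozenset({
    "audio/mp3", "audio/mpeg", "audio/x-mp3",
    "audio/wav", "audio/wave", "audio/x-wav", "audio/x-pn-wav",
    "audio/aac", "audio/m4a", "audio/x-m4a", "audio/x-mp4",
    "audio/ogg", "audio/opus", "audio/x-ms-wma", "audio/wma",
})
MAX_FILES = 10


def validate_uploads(uploads):
    """Check a list of (filename, content_type) pairs before sending them.

    Raises ValueError on an empty list, more than MAX_FILES entries,
    or a content type that is not in ACCEPTED_AUDIO_TYPES.
    """
    if not uploads:
        raise ValueError("at least one audio file is required")
    if len(uploads) > MAX_FILES:
        raise ValueError(f"too many files: {len(uploads)} (max {MAX_FILES})")
    for filename, content_type in uploads:
        if content_type.lower() not in ACCEPTED_AUDIO_TYPES:
            raise ValueError(f"{filename}: unsupported type {content_type}")
```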

model
enum<string>
required
Available options:
pawa-stt-v1-20240701
Example:

"pawa-stt-v1-20240701"

language
enum<string>
required
Available options:
English,
Swahili,
Luo,
Meru,
Kamba,
Kikuyu,
Hausa,
Igbo,
Yoruba,
Pidgin,
Zulu,
Tswana,
Afrikaans,
Xhosa,
Nyankole,
Ganda,
Lugbara
Example:

"English"

is_speaker_diarization
enum<boolean>
Available options:
true,
false
Example:

false

prompt
string
Example:

"Nipe maneno yaliyokwenye hii audio"

temperature
number
default:0.1
Example:

0.1
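
The non-file body parameters above can be assembled and validated before the request is sent. The following is a minimal sketch with a hypothetical helper name, `build_form_fields`; the option lists are copied from this page, and leaving out optional fields that were not set is an assumption about how the server applies its defaults.

```python
MODELS = ("pawa-stt-v1-20240701",)
LANGUAGES = (
    "English", "Swahili", "Luo", "Meru", "Kamba", "Kikuyu", "Hausa", "Igbo",
    "Yoruba", "Pidgin", "Zulu", "Tswana", "Afrikaans", "Xhosa", "Nyankole",
    "Ganda", "Lugbara",
)


def build_form_fields(language, model="pawa-stt-v1-20240701",
                      is_speaker_diarization=None, prompt=None,
                      temperature=None):
    """Return the text form fields as strings, ready for multipart encoding."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    fields = {"model": model, "language": language}
    # Optional fields are only included when the caller sets them.
    if is_speaker_diarization is not None:
        fields["is_speaker_diarization"] = "true" if is_speaker_diarization else "false"
    if prompt is not None:
        fields["prompt"] = prompt
    if temperature is not None:
        fields["temperature"] = str(temperature)
    return fields
```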

Response

Audio files transcribed successfully

success
boolean
required
Example:

true

message
string
required
Example:

"Audio transcribed succesfully"

data
object
required
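
To illustrate how the response fields fit together, the sketch below pulls the transcripts out of a decoded response body. The helper name `extract_transcripts` is hypothetical, and the handling of `success: false` is an assumption, since this page documents only the successful response.

```python
def extract_transcripts(payload):
    """Map each filename to its transcript from a decoded JSON response.

    Assumes a failed request still returns 'success' and 'message' fields.
    """
    if not payload.get("success"):
        raise RuntimeError(payload.get("message", "transcription failed"))
    items = payload.get("data", {}).get("transcriptions", [])
    return {item["filename"]: item["transcript"] for item in items}


# Sample body taken from the example response on this page.
sample = {
    "success": True,
    "message": "Audio transcribed succesfully",
    "data": {
        "transcriptions": [
            {
                "filename": "innocent.wav",
                "transcript": "Hello, my name is Innocent Charles, welcome to Pawa AI",
            }
        ]
    },
}
transcripts = extract_transcripts(sample)
```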