Streaming delivers a response as it is generated, so users see text appear incrementally, much like typing, instead of waiting for a single block at the end. This lowers perceived latency and makes the interaction feel more responsive.
Streaming Example
When you use Pawa Chat or the Sandbox, you’re already benefiting from streaming: the text flows in gradually, enhancing interactivity and responsiveness.
Streaming responses use Server-Sent Events (SSE) to deliver partial outputs (deltas) in real time, instead of waiting for the entire generation to complete. To enable streaming, set "stream": true in your chat request or agent chat request. For text-to-speech, streaming is already handled on our side, so you only need to set up your request to read the response from our servers as a stream.

Basic chat request with Streaming

curl --request POST \
  --url https://api.pawa-ai.com/v1/chat/request \
  --header "Authorization: Bearer $PAWA_AI_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "pawa-v1-ember-20240924",
  "temperature": 0.1,
  "top_p": 0.95,
  "messages": [
      {"role": "user", "content": "Explain RAG in simple terms"}
    ],
  "max_tokens": 4096,
  "frequency_penalty": 0.3,
  "presence_penalty": 0.3,
  "seed": 2024,
  "stream": true
}'

Streaming Response Example

You will receive the response in chunks, one at a time, so you should display each chunk to the user as it arrives.
{"success":true,"message":"Chunk completions generated successfully","data":{"id":null,"conversationId":null,"feedback":null,"message":{"role":"assistant","content":"N"},"status":null,"createdAt":null,"updatedAt":null}}

{"success":true,"message":"Chunk completions generated successfully","data":{"id":null,"conversationId":null,"feedback":null,"message":{"role":"assistant","content":"ait"},"status":null,"createdAt":null,"updatedAt":null}}

{"success":true,"message":"Chunk completions generated successfully","data":{"id":null,"conversationId":null,"feedback":null,"message":{"role":"assistant","content":"wa"},"status":null,"createdAt":null,"updatedAt":null}}

{"success":true,"message":"Chunk completions generated successfully","data":{"id":null,"conversationId":null,"feedback":null,"message":{"role":"assistant","content":" P"},"status":null,"createdAt":null,"updatedAt":null}}

{"success":true,"message":"Chunk completions generated successfully","data":{"id":null,"conversationId":null,"feedback":null,"message":{"role":"assistant","content":"awa"},"status":null,"createdAt":null,"updatedAt":null}}
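The chunks above can be stitched back into text on the client. The following is a minimal Python sketch, not an official client: it assumes each chunk arrives as one JSON object per line, that the text fragment lives at data.message.content as in the sample, and that a line may or may not carry an SSE-style "data:" prefix. The function names iter_deltas and collect are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragment carried by each streamed chunk line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines separate chunks
        # Assumption: tolerate an SSE-style "data:" prefix if one is present.
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        try:
            chunk = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON chunk
        if not isinstance(chunk, dict):
            continue
        # Path taken from the sample chunks: data -> message -> content.
        message = (chunk.get("data") or {}).get("message") or {}
        content = message.get("content")
        if content:
            yield content


def collect(lines: Iterable[str]) -> str:
    """Print each fragment as it arrives and return the full text."""
    parts = []
    for delta in iter_deltas(lines):
        print(delta, end="", flush=True)  # show the text in real time
        parts.append(delta)
    print()
    return "".join(parts)
```

In practice, lines would come from the HTTP response body read incrementally (for example, a line iterator provided by your HTTP client). Keeping the accumulated text around is also useful if the stream is interrupted partway through.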

Tool calling request with Streaming

In a tool-calling request, when the model decides to call one of your custom tools, the response is not streamed, even if "stream": true is set. Instead, it is returned as a single JSON object.
This makes it easier to parse the tool response and then send a follow-up request.
curl --request POST \
  --url https://api.pawa-ai.com/v1/chat/request \
  --header "Authorization: Bearer $PAWA_AI_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "pawa-v1-ember-20240924",
  "temperature": 0.1,
  "top_p": 0.95,
  "messages": [
      {"role": "user", "content": "convert 45 USD to tsh"}
    ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "convert_usd_to_tsh",
        "description": "Converts an amount in USD to Tanzanian Shillings.",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {
            "amount_usd": {
              "description": "Amount in USD",
              "type": "number"
            }
          },
          "required": [
            "amount_usd"
          ],
          "additionalProperties": false
        }
      }
    }
  ],
  "max_tokens": 4096,
  "frequency_penalty": 0.3,
  "presence_penalty": 0.3,
  "seed": 2024,
  "stream": true
}'

Response format example for a tool-calling request with streaming enabled

{
  "success": true,
  "message": "Chat request processed successfully",
  "data": {
    "request": [
      {
        "finish_reason": "tool_calls",
        "message": {
          "role": "assistant",
          "content": "In this request, the model has made a tool call, extract the tool call from the response, process it accordingly and return the result",
          "tool_calls": [
            {
              "id": "call_01",
              "type": "function",
              "function": {
                "name": "convert_usd_to_tsh",
                "arguments": { "amount_usd": "45" }
              }
            }
          ]
        }
      }
    ],
    "created": "2025-09-25",
    "model": "pawa-v1-blaze-20250318",
    "object": "chat.request"
  }
}
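Because a tool-calling response arrives as a single JSON object, it can be parsed in one step. Here is a minimal Python sketch that pulls the tool calls out of a response shaped like the example above. The function name extract_tool_calls is hypothetical, and the handling of string-encoded arguments is an assumption added for robustness, not documented behaviour.

```python
import json
from typing import Any


def extract_tool_calls(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the tool calls found in a non-streamed tool-calling response."""
    calls = []
    for choice in (response.get("data") or {}).get("request") or []:
        if choice.get("finish_reason") != "tool_calls":
            continue  # the model answered directly, nothing to execute
        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            function = call.get("function") or {}
            arguments = function.get("arguments") or {}
            # Assumption: arguments might arrive as a JSON-encoded string.
            if isinstance(arguments, str):
                arguments = json.loads(arguments)
            calls.append(
                {
                    "id": call.get("id"),
                    "name": function.get("name"),
                    "arguments": arguments,
                }
            )
    return calls
```

Note that the sample returns amount_usd as the string "45" even though the tool schema declares a number, so it is safer to convert it explicitly, for example with float(call["arguments"]["amount_usd"]), before running your own function.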

Error recovery

When a streaming request is interrupted due to network issues, timeouts, or other errors, you can recover by resuming from where the stream was interrupted. This approach saves you from re-processing the entire response.

The basic recovery strategy involves:

  • Capture the partial response: Save all content that was successfully received before the error occurred
  • Construct a continuation request: Create a new API request that includes the partial assistant response as the beginning of a new assistant message
  • Resume streaming: Continue receiving the rest of the response from where it was interrupted
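The recovery steps above can be sketched in Python as a helper that builds the continuation request body. This is an illustration under assumptions: the function name build_continuation_payload is hypothetical, and the payload is assumed to have the same shape as the chat request shown earlier, with a messages list and a stream flag.

```python
import copy
from typing import Any


def build_continuation_payload(
    original: dict[str, Any], partial_text: str
) -> dict[str, Any]:
    """Build a follow-up request that resumes an interrupted stream."""
    # Copy so the original request body stays untouched for retries.
    payload = copy.deepcopy(original)
    if partial_text:
        # The partial output becomes the start of a new assistant message.
        payload["messages"].append(
            {"role": "assistant", "content": partial_text}
        )
    payload["stream"] = True  # keep streaming on for the resumed request
    return payload
```

After sending the continuation request, the text shown to the user would be the saved partial text followed by the newly streamed fragments.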