Streaming Responses

Streaming lets you display text as it is generated instead of as a single block once generation has finished. This lowers perceived latency and makes the interaction feel more responsive: text appears word by word, much like typing.

Streaming Example
When you use Pawa Chat or the Sandbox, you’re already benefiting from streaming: the text flows in gradually, enhancing interactivity and responsiveness.

Streaming responses use Server-Sent Events (SSE) to deliver partial outputs (deltas) in real time, instead of waiting for the entire generation to complete. To enable streaming, set "stream": true in your chat request or agent chat request. For text to speech, streaming is already handled on our side, so you only need to set up your request to read the response from our servers as a stream.

Basic chat request with Streaming

curl --request POST \
  --url https://api.pawa-ai.com/v1/chat/request \
  --header "Authorization: Bearer $PAWA_AI_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "pawa-v1-ember-20240924",
  "temperature": 0.1,
  "top_p": 0.95,
  "messages": [
      {"role": "user", "content": "Explain RAG in simple terms"}
    ],
  "max_tokens": 4096,
  "frequency_penalty": 0.3,
  "presence_penalty": 0.3,
  "seed": 2024,
  "stream": true
}'
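
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_stream_request` and `stream_lines` are hypothetical, and the assumption that the response body can be read line by line is ours rather than something the API reference states.

```python
import json
import urllib.request

API_URL = "https://api.pawa-ai.com/v1/chat/request"


def build_stream_request(prompt: str, api_key: str,
                         model: str = "pawa-v1-ember-20240924") -> urllib.request.Request:
    """Build (but do not send) a chat request with streaming enabled."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask the server to send partial outputs as they are generated
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_lines(request: urllib.request.Request):
    """Send the request and yield the response body one decoded line at a time.

    Assumption: each streamed chunk ends with a newline.
    """
    with urllib.request.urlopen(request) as response:
        for raw in response:
            yield raw.decode("utf-8")
```

A caller would typically read the key from the PAWA_AI_API_KEY environment variable and iterate over `stream_lines(build_stream_request(...))`.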

Streaming Response Example

You will receive the response in chunks, one at a time, so you should display each chunk to the user as it arrives.

{"success":true,"message":"Chunk completions generated successfully","data":{"id":null,"conversationId":null,"feedback":null,"message":{"role":"assistant","content":"N"},"status":null,"createdAt":null,"updatedAt":null}}

{"success":true,"message":"Chunk completions generated successfully","data":{"id":null,"conversationId":null,"feedback":null,"message":{"role":"assistant","content":"ait"},"status":null,"createdAt":null,"updatedAt":null}}

{"success":true,"message":"Chunk completions generated successfully","data":{"id":null,"conversationId":null,"feedback":null,"message":{"role":"assistant","content":"wa"},"status":null,"createdAt":null,"updatedAt":null}}

{"success":true,"message":"Chunk completions generated successfully","data":{"id":null,"conversationId":null,"feedback":null,"message":{"role":"assistant","content":" P"},"status":null,"createdAt":null,"updatedAt":null}}

{"success":true,"message":"Chunk completions generated successfully","data":{"id":null,"conversationId":null,"feedback":null,"message":{"role":"assistant","content":"awa"},"status":null,"createdAt":null,"updatedAt":null}}
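
A client can rebuild the full text by concatenating the content field of each chunk. The sketch below assumes that every chunk arrives as one JSON object per line and that SSE framing may prefix the line with "data:". Both are assumptions about the wire format, and `iter_deltas` and `collect` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text delta carried by each streamed chunk."""
    for raw in lines:
        line = raw.strip()
        # Assumption: SSE framing may prefix payload lines with "data:".
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        # Skip blank separator lines and anything that is not a JSON object.
        if not line.startswith("{"):
            continue
        chunk = json.loads(line)
        message = (chunk.get("data") or {}).get("message") or {}
        content = message.get("content")
        if content:
            yield content


def collect(lines: Iterable[str]) -> str:
    """Print each delta as it arrives and return the assembled text."""
    parts = []
    for delta in iter_deltas(lines):
        print(delta, end="", flush=True)  # show the text to the user immediately
        parts.append(delta)
    return "".join(parts)
```

Applied to the five chunks shown above, `collect` returns "Naitwa Pawa".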

Tool calling request with Streaming

In a tool-calling request, when the model decides to call one of your custom tools, it will not stream the response, even if stream=true is set.
This makes it easier to parse the tool response and then send a follow-up request.

curl --request POST \
  --url https://api.pawa-ai.com/v1/chat/request \
  --header "Authorization: Bearer $PAWA_AI_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "pawa-v1-ember-20240924",
  "temperature": 0.1,
  "top_p": 0.95,
  "messages": [
      {"role": "user", "content": "convert 45 USD to tsh"}
    ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "convert_usd_to_tsh",
        "description": "Converts an amount in USD to Tanzanian Shillings.",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {
            "amount_usd": {
              "description": "Amount in USD",
              "type": "number"
            }
          },
          "required": [
            "amount_usd"
          ],
          "additionalProperties": false
        }
      }
    }
  ],
  "max_tokens": 4096,
  "frequency_penalty": 0.3,
  "presence_penalty": 0.3,
  "seed": 2024,
  "stream": true
}'

Response format example for tools calling stream

{
  "success": true,
  "message": "Chat request processed successfully",
  "data": {
    "request": [
      {
        "finish_reason": "tool_calls",
        "message": {
          "role": "assistant",
          "content": "In this request, the model has made a tool call, extract the tool call from the response, process it accordingly and return the result",
          "tool_calls": [
            { "id": "call_01", "type": "function", "function": { "name": "convert_usd_to_tsh", "arguments": { "amount_usd": "45"} } }
          ]
        }
      }
    ],
    "created": "2025-09-25",
    "model": "pawa-v1-blaze-20250318",
    "object": "chat.request"
  }
}
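
Because this response arrives as a single JSON document, the tool calls can be read directly from it. The sketch below extracts them and dispatches each to a local function. The helper names, the TOOLS registry, and the exchange rate are hypothetical, and the shape of the follow-up request is not covered here.

```python
import json
from typing import Any, Callable, Dict, List


def convert_usd_to_tsh(amount_usd: Any, rate: float = 2500.0) -> float:
    """Local implementation of the tool; the rate is a placeholder, not a real quote."""
    # The sample response carries the amount as the string "45", so coerce it.
    return float(amount_usd) * rate


# Hypothetical registry mapping tool names to local callables.
TOOLS: Dict[str, Callable[..., Any]] = {"convert_usd_to_tsh": convert_usd_to_tsh}


def extract_tool_calls(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect tool calls from entries whose finish_reason is "tool_calls"."""
    calls: List[Dict[str, Any]] = []
    for entry in (response.get("data") or {}).get("request") or []:
        if entry.get("finish_reason") != "tool_calls":
            continue
        calls.extend((entry.get("message") or {}).get("tool_calls") or [])
    return calls


def run_tool_calls(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Execute every requested tool and return one result record per call."""
    results = []
    for call in extract_tool_calls(response):
        function = call["function"]
        arguments = function["arguments"]
        # Assumption: arguments may also arrive as a JSON-encoded string.
        if isinstance(arguments, str):
            arguments = json.loads(arguments)
        output = TOOLS[function["name"]](**arguments)
        results.append({"id": call["id"], "name": function["name"], "result": output})
    return results
```

The result records can then be used to build the follow-up request in whatever format the tool-calling reference specifies.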

Error recovery

When a streaming request is interrupted due to network issues, timeouts, or other errors, you can recover by resuming from where the stream was interrupted. This approach saves you from re-processing the entire response.

The basic recovery strategy involves:

1. Capture the partial response: Save all content that was successfully received before the error occurred.
2. Construct a continuation request: Create a new API request that includes the partial assistant response as the beginning of a new assistant message.
3. Resume streaming: Continue receiving the rest of the response from where it was interrupted.
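
The steps above can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: `open_stream` is a hypothetical callable that sends a request and yields text deltas, the function names are ours, and the sketch assumes the API continues from a trailing assistant message as the strategy describes.

```python
import copy
from typing import Any, Callable, Dict, Iterable


def build_continuation_request(original: Dict[str, Any], partial_text: str) -> Dict[str, Any]:
    """Copy the original request and append the partial answer as an assistant message."""
    request = copy.deepcopy(original)  # never mutate the caller's request
    if partial_text:
        request["messages"].append({"role": "assistant", "content": partial_text})
    request["stream"] = True
    return request


def stream_with_recovery(original: Dict[str, Any],
                         open_stream: Callable[[Dict[str, Any]], Iterable[str]],
                         max_retries: int = 3) -> str:
    """Stream a response, resuming from the received text after a connection error."""
    received = []  # step 1: everything captured so far
    retries = 0
    while True:
        # Step 2: on the first pass this is the original request unchanged.
        request = build_continuation_request(original, "".join(received))
        try:
            # Step 3: keep receiving from where the stream stopped.
            for delta in open_stream(request):
                received.append(delta)
            return "".join(received)
        except (ConnectionError, TimeoutError):
            retries += 1
            if retries > max_retries:
                raise
```

Capping the number of retries keeps a persistent outage from looping forever. A production client would usually also wait between attempts.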
