Limits

This page covers rate limits, storage limits, and other tier-based limits.

1. Rate Limits

Pawa AI APIs implement rate limits to ensure fair usage and maintain system stability. These limits restrict the number of requests you can make within a specific timeframe. Exceeding these limits may result in temporary blocking of your API key.

Why Rate Limits?

Rate limits are a common practice for APIs, and they’re put in place for several important reasons:

Prevent Abuse & Misuse: They protect against malicious actors who flood the API with requests to overload it or disrupt service.
Fair Access: They ensure everyone has fair access to the API, preventing one user from consuming excessive resources and impacting others.
Infrastructure Management: They help us manage the overall load on our infrastructure, preventing performance issues and maintaining a consistent experience for all users.

This guide covers Pawa AI’s rate limits: how they work, how they vary by subscription tier, and how to handle rate limit errors (with code examples).

How Do Rate Limits Work?

Pawa AI measures rate limits in RPM (requests per minute): you can make a fixed number of requests each minute, depending on your current tier. For example, on the Free tier you can send up to 5 requests per minute; any additional request sent within the same minute is rejected with a rate limit error.
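If you would rather avoid rate limit errors than recover from them, you can throttle requests on the client side. Below is a minimal sketch of a rolling 60-second window limiter. `SlidingWindowLimiter` and its `clock`/`sleep` parameters are hypothetical names for illustration, not part of any Pawa AI SDK. The sketch assumes nothing about exactly how Pawa AI counts its minute: a rolling window never allows more than `rpm` requests in any 60-second span, so it also stays within a fixed per-minute window.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most `rpm` calls in any rolling 60 s window."""

    def __init__(
        self,
        rpm: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.rpm = rpm
        self.window = 60.0  # seconds
        self.clock = clock
        self.sleep = sleep
        self.sent: Deque[float] = deque()  # timestamps of recent requests

    def acquire(self) -> None:
        """Block until another request may be sent, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the 60-second window.
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.rpm:
                self.sent.append(now)
                return
            # Wait until the oldest recorded request falls out of the window.
            self.sleep(self.window - (now - self.sent[0]))
```

Create one limiter per API key (for example `SlidingWindowLimiter(5)` on the Free tier) and call `acquire()` before each request. The injectable clock and sleep functions only exist to make the limiter easy to test; this version is not thread-safe, so guard it with a lock if several threads share it.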

Tiered Rate Limits

Rate limits vary depending on your Pawa AI subscription tier. You can view the rate and usage limits for your tier under the limits section of your billing settings. As your API usage grows, we recommend upgrading to the next tier, which usually raises limits across most models, storage, and APIs.

Tier	Requests per Minute (RPM)
Free	5
Starter	50
Pro	500
Enterprise	Unlimited
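As a rule of thumb, a client that spaces its requests evenly stays under its tier's limit if it waits at least 60 / RPM seconds between requests, which is 12 seconds on the Free tier. The sketch below encodes the table above as a dictionary; `TIER_RPM` and `min_interval_seconds` are illustrative names, not part of any Pawa AI SDK, and the values should be re-checked against your billing settings because limits may change.

```python
# Hypothetical helper: RPM limits copied from the tier table above.
# None stands for "Unlimited" (Enterprise).
TIER_RPM = {"Free": 5, "Starter": 50, "Pro": 500, "Enterprise": None}


def min_interval_seconds(tier: str) -> float:
    """Smallest even spacing between requests that stays within the tier's RPM."""
    rpm = TIER_RPM[tier]
    if rpm is None:
        return 0.0  # no per-minute cap listed for this tier
    return 60.0 / rpm
```

For example, a batch job on the Starter tier could call `time.sleep(min_interval_seconds("Starter"))` (1.2 seconds) after each request.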

Please note: These limits are subject to change. You can monitor your current usage and remaining quota through the Usage Dashboard. This dashboard provides real-time insights into your API consumption.

Handling Rate Limit Errors

If you exceed the rate limit, you’ll receive an HTTP 429 “Too Many Requests” error. When this happens, please do the following:

Wait: The most straightforward solution is to wait for the rate limit window to reset, which usually takes no more than one minute.
Implement Exponential Backoff: A more robust approach is to implement exponential backoff in your application. This means increasing the waiting time between retries after each failed attempt. This helps avoid overwhelming the API and improves reliability.

Unsuccessful requests also count toward your per-minute limit, so repeatedly resending a request right away won’t help.
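The exponential backoff strategy described above can also be written without a third-party library. The following is a minimal sketch of capped exponential backoff with "full jitter": before retry n (counting from 0) it waits a random time between 0 and min(cap, base * 2**n) seconds. `RateLimitError` and `call_with_backoff` are hypothetical names. The sketch assumes your request function raises `RateLimitError` on an HTTP 429 and may pass along a `Retry-After` value; this page does not say whether Pawa AI sends that header.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error that your request function raises on HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds, if the server sent a hint


def call_with_backoff(
    send: Callable[[], T],
    max_attempts: int = 6,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying on RateLimitError with capped, jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last 429
            # Full jitter: random wait in [0, min(cap, base * 2**attempt)].
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
            # Respect a server-provided hint if it asks for a longer wait.
            if exc.retry_after is not None:
                delay = max(delay, exc.retry_after)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

In your application, `send` would wrap the HTTP call, for example a small function that posts to the chat endpoint with httpx and raises `RateLimitError` when `response.status_code == 429`. Because failed requests still count toward your per-minute limit, keep `max_attempts` small; the random jitter spreads retries from many clients so they do not all hit the API at the same moment.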

Using the Tenacity Python library

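import os

import httpx
from tenacity import (
    retry,
    retry_if_exception,
    stop_after_attempt,
    wait_random_exponential,
)

API_URL = "https://api.pawa-ai.com/v1/chat/request"
API_KEY = os.environ["PAWA_AI_API_KEY"]  # read the key from the environment

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}


def is_retryable(exc: BaseException) -> bool:
    """Retry on 429 Too Many Requests, 5xx server errors, and network errors."""
    if isinstance(exc, httpx.HTTPStatusError):
        status = exc.response.status_code
        return status == 429 or status >= 500
    return isinstance(exc, httpx.TransportError)


# Wait a random, exponentially growing time (1 to 60 seconds) between attempts,
# stop after 6 attempts, and re-raise the last error if every attempt fails.
@retry(
    retry=retry_if_exception(is_retryable),
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(6),
    reraise=True,
)
def chat_with_tenacity(**kwargs):
    # httpx's default timeout is 5 seconds; model responses can take longer.
    with httpx.Client(timeout=60.0) as client:
        response = client.post(API_URL, headers=headers, json=kwargs)
        response.raise_for_status()  # raises httpx.HTTPStatusError on 4xx/5xx
        return response.json()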

if __name__ == "__main__":
    result = chat_with_tenacity(
        model="pawa-v1-ember-20240924",
        messages=[
            {
                "role": "system",
                "content": [
                    {"type": "text", "text": "You are TanzaBot, a helpful assistant who answers questions about Tanzania."}
                ]
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Who is the current president of Tanzania?"}
                ]
            }
        ],
        temperature=0.1,
        max_tokens=150,
        stream=False
    )
    print(result)

Using the backoff Python library

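import os

import backoff
import httpx

API_URL = "https://api.pawa-ai.com/v1/chat/request"
API_KEY = os.environ["PAWA_AI_API_KEY"]  # read the key from the environment

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}


def should_give_up(exc: Exception) -> bool:
    """Stop retrying on client errors other than 429 (e.g. 400 or 401)."""
    if isinstance(exc, httpx.HTTPStatusError):
        status = exc.response.status_code
        return status != 429 and status < 500
    return False  # network errors are always retried


# raise_for_status() raises httpx.HTTPStatusError, which is not a subclass of
# httpx.RequestError, so it is listed explicitly to make 429 responses retry.
@backoff.on_exception(
    backoff.expo,
    (httpx.HTTPStatusError, httpx.TransportError),
    max_tries=5,
    giveup=should_give_up,
    max_value=60,  # cap each wait at 60 seconds (passed through to backoff.expo)
)
def chat_with_backoff(**kwargs):
    # httpx's default timeout is 5 seconds; model responses can take longer.
    with httpx.Client(timeout=60.0) as client:
        response = client.post(API_URL, headers=headers, json=kwargs)
        response.raise_for_status()
        return response.json()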

if __name__ == "__main__":
    result = chat_with_backoff(
        model="pawa-v1-ember-20240924",
        messages=[
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": "You are TanzaBot, a helpful assistant who answers questions about Tanzania."
                    }
                ]
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Who is the current president of Tanzania?"
                    }
                ]
            }
        ],
        temperature=0.1,
        max_tokens=150,
        stream=False
    )
    print(result)

⚠️ Disclaimer: Pawa AI makes no guarantees about the effectiveness of these solutions for rate limit control. They are intended as a starting point to help you build your own implementation.

2. Storage Limits (Knowledge Base)

Pawa AI lets you build and manage your own knowledge base by uploading files and documents, which are embedded and stored in our vector store. To ensure fair use and system performance, we apply storage limits based on your subscription tier. These limits define how much data you can store in our vector store database. Once you exceed the quota, you won’t be able to upload more files or build new embeddings until you free up space or upgrade to a higher tier.

How Knowledge Base Storage Limits Work

When you build a knowledge base through our storage platform, you upload content (e.g., PDFs, text files, CSVs, images, audio), and Pawa AI processes it and stores it in the vector database.

If your usage is within your tier’s limit, uploads succeed as normal.
If you exceed the limit, new uploads will be rejected with a "Storage quota exceeded" error.

Example: On the Free tier, you can store up to 1 GB of knowledge base content. Attempting to upload a new file beyond this limit will fail until you delete existing files or upgrade.

Tiered Storage Limits

Tier	Knowledge Base Storage
Free	1 GB
Starter	5 GB
Pro	50 GB
Enterprise	Unlimited
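Because uploads beyond your quota are rejected, you may want to check a file's size before uploading it. The sketch below is hypothetical: `STORAGE_QUOTA_BYTES`, `fits_in_quota`, and `check_upload` are illustrative names, it assumes you already know how many bytes you have used (for example, from the Usage Dashboard), it treats 1 GB as 1024**3 bytes, and it uses the raw file size as a proxy for stored size. Pawa AI may count storage differently, for instance by the size of the generated embeddings.

```python
import os

# Hypothetical sketch: tier quotas from the table above, in bytes.
# Assumes 1 GB = 1024**3 bytes; Pawa AI may count differently.
GB = 1024 ** 3
STORAGE_QUOTA_BYTES = {
    "Free": 1 * GB,
    "Starter": 5 * GB,
    "Pro": 50 * GB,
    "Enterprise": None,  # listed as unlimited
}


def fits_in_quota(tier: str, used_bytes: int, new_file_bytes: int) -> bool:
    """Return True if adding new_file_bytes keeps usage within the tier quota."""
    quota = STORAGE_QUOTA_BYTES[tier]
    if quota is None:
        return True
    return used_bytes + new_file_bytes <= quota


def check_upload(path: str, tier: str, used_bytes: int) -> None:
    """Raise before uploading if the file would push usage over the quota."""
    size = os.path.getsize(path)  # raw file size, a rough proxy only
    if not fits_in_quota(tier, used_bytes, size):
        raise ValueError(f"{path} ({size} bytes) would exceed the {tier} tier quota")
```

A failed check tells you to free up space or upgrade before attempting the upload, instead of waiting for the server to reject it.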

Enterprise plans can request custom allocations to handle large-scale datasets. You can track how much storage you’ve used in the Usage Dashboard, which shows real-time usage and remaining space.

3. Other Tier-Based Limits

Your tier also determines limits and features in these areas:

Support (custom vs. community)
Security checks (basic vs. advanced)
Integrations (free assistance vs. no assistance)
Fine-tuning (no vs. yes)
Observability (no vs. yes), among others.
