Vision

With Pawa AI’s vision capabilities, your applications can go beyond text-only understanding and incorporate visual information into reasoning, analysis, and decision-making.

Pawa AI Vision Example

For document parsing and data extraction use cases, see the Document Parsing documentation.

Vision-Based Models in Pawa AI

Pawa AI currently offers one vision model:

pawa-v1-blaze-20250318: A powerful small language model (SLM) optimized for reasoning, complex generation, multimodal input, tool use, agentic workflows, and advanced knowledge tasks.

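How to send a multimodal chat request with Pawa AI

Send images as part of the user message: make its content an array of parts, with a text part for your instruction and one image_url part per image. For example, to ask about a single image: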

curl https://api.pawa-ai.com/v1/chat/request \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAWA_AI_API_KEY" \
  -d '{
    "model": "pawa-v1-blaze-20250318",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe the main objects in this image and their colors." },
          { "type": "image_url", "image_url": { "url": "https://example.com/path/to/image.jpg" } }
        ]
      }
    ]
  }'

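To compare several images, add one image_url part per image to the same content array, separating the parts with commas:

curl https://api.pawa-ai.com/v1/chat/request \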
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAWA_AI_API_KEY" \
  -d '{
    "model": "pawa-v1-blaze-20250318",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What are the differences between these images?" },
          { "type": "image_url", "image_url": { "url": "https://example.com/path/to/image1.jpg" } },
          { "type": "image_url", "image_url": { "url": "https://example.com/path/to/image2.jpg" } },
          { "type": "image_url", "image_url": { "url": "https://example.com/path/to/image3.jpg" } }
        ]
      }
    ]
  }'
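When the number of images varies at run time, writing the JSON body by hand is easy to break, for example by dropping the comma between two parts. The sketch below uses a hypothetical build_vision_body helper in POSIX shell to assemble the same body shape as the examples above: one text part followed by one image_url part per URL, kept in the order given. It assumes the prompt and URLs contain no newlines or control characters, because it escapes only backslashes and double quotes.

```shell
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Only backslashes and double quotes are handled (assumes no control characters).
json_escape() {
  # Escape backslashes first so the quote escapes are not doubled.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build_vision_body MODEL PROMPT [IMAGE_URL ...]
# Prints a request body with one text part, then one image_url part per URL,
# in argument order. Variables are global because POSIX sh has no "local".
build_vision_body() {
  model=$1
  prompt=$2
  shift 2
  parts="{ \"type\": \"text\", \"text\": \"$(json_escape "$prompt")\" }"
  for url in "$@"; do
    parts="$parts, { \"type\": \"image_url\", \"image_url\": { \"url\": \"$(json_escape "$url")\" } }"
  done
  printf '{ "model": "%s", "messages": [ { "role": "user", "content": [ %s ] } ] }\n' \
    "$(json_escape "$model")" "$parts"
}

# Example (not run here): pipe the body into curl; "-d @-" reads it from stdin.
# build_vision_body pawa-v1-blaze-20250318 "What are the differences between these images?" \
#   https://example.com/path/to/image1.jpg https://example.com/path/to/image2.jpg |
#   curl https://api.pawa-ai.com/v1/chat/request \
#     -H "Content-Type: application/json" \
#     -H "Authorization: Bearer $PAWA_AI_API_KEY" \
#     -d @-
```

Because the image parts follow the argument order, the order you pass URLs in is the order the model sees them, which matters when the prompt refers to "the second photo".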

Use Cases

Image classification

Classify the overall content of an image into one or more categories.

curl https://api.pawa-ai.com/v1/chat/request \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAWA_AI_API_KEY" \
  -d '{
    "model": "pawa-v1-blaze-20250318",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Classify this image by scene and mood." },
          { "type": "image_url", "image_url": { "url": "https://example.com/path/to/image.jpg" } }
        ]
      }
    ]
  }'

Example output:

{
  "labels": ["outdoor", "campfire", "night"],
  "confidence": {"outdoor": 0.97, "campfire": 0.94, "night": 0.90},
  "summary": "A night outdoor scene with a campfire and people nearby."
}

Object detection

Detect multiple objects in an image and describe their approximate positions.

curl https://api.pawa-ai.com/v1/chat/request \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAWA_AI_API_KEY" \
  -d '{
    "model": "pawa-v1-blaze-20250318",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "List the objects you see and their approximate positions." },
          { "type": "image_url", "image_url": { "url": "https://example.com/path/to/desk.jpg" } }
        ]
      }
    ]
  }'

Example output:

{
  "objects": [
    {"label": "laptop", "bbox": [120, 80, 460, 320]},
    {"label": "coffee-cup", "bbox": [480, 260, 560, 340]}
  ],
  "summary": "A laptop on a desk with a coffee cup on the right."
}

OCR and document parsing

Extract text or structured fields from receipts, invoices, or forms. For structured output, combine this with the response_format parameter.

curl https://api.pawa-ai.com/v1/chat/request \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAWA_AI_API_KEY" \
  -d '{
    "model": "pawa-v1-blaze-20250318",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Extract the total, date and vendor name from this receipt." },
          { "type": "image_url", "image_url": { "url": "https://example.com/path/to/receipt.jpg" } }
        ]
      }
    ]
  }'

Example output:

{
  "vendor": "Moshi Supermarket",
  "date": "2025-03-18",
  "total": 16450,
  "currency": "TZS"
}

Best practices

Resize large images to a reasonable resolution (e.g., 1024px max on the longest side) to reduce latency without losing salient details.
Prefer image_url over base64 payloads for repeatable production traffic. If assets are private, use signed URLs that expire quickly.
Provide clear textual instructions that set expectations (“list objects with colors and approximate positions”).
For OCR, favor sharp, high‑contrast images. If possible, crop to the relevant region before sending.
When sending multiple images, order them logically and reference them in the text (“in the second photo, compare…”).
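The 1024px guideline above amounts to a shrink-only fit that preserves aspect ratio: scale the longest side down to the limit and the other side by the same factor. The hypothetical fit_within function below computes the target size with shell integer arithmetic (rounding the shorter side down), so you can pass the result to whatever image tool you already use.

```shell
# Hypothetical helper: fit_within WIDTH HEIGHT [MAX]
# Prints "W H" scaled so the longest side is at most MAX (default 1024).
# Images already within the limit are returned unchanged (never upscaled).
fit_within() {
  w=$1
  h=$2
  max=${3:-1024}
  if [ "$w" -le "$max" ] && [ "$h" -le "$max" ]; then
    echo "$w $h"
  elif [ "$w" -ge "$h" ]; then
    # Landscape or square: width is the longest side.
    echo "$max $(( h * max / w ))"
  else
    # Portrait: height is the longest side.
    echo "$(( w * max / h )) $max"
  fi
}
```

For a 4000x3000 photo, fit_within 4000 3000 prints 1024 768; an 800x600 image comes back as 800 600. If ImageMagick is installed, its -resize '1024x1024>' geometry performs the same shrink-only fit in one step.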

Supported formats and limits

Formats: jpg/jpeg, png
Max image size: 25 MiB per image
Number of images per request: no hard limit; practical limits depend on latency and payload size
Base64 payloads should include the image MIME type (for example, image/jpeg or image/png) in the content object
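A pre-flight check can reject files before they are uploaded. The sketch below encodes the limits listed above (jpg/jpeg or png, at most 25 MiB, that is 26214400 bytes) and builds a base64 data URL for a local file. The data:<mime>;base64,<data> form, and placing it where the examples put an https URL, are assumptions borrowed from other chat APIs rather than confirmed Pawa AI behavior; check_image and to_data_url are hypothetical names.

```shell
# Limits from the list above: jpg/jpeg or png, at most 25 MiB per image.
MAX_BYTES=26214400  # 25 * 1024 * 1024

# Hypothetical check: infers the MIME type from the file extension only
# (file contents are not inspected) and sets the global variable "mime".
check_image() {
  [ -f "$1" ] || { echo "not found: $1" >&2; return 1; }
  case $1 in
    *.jpg|*.jpeg|*.JPG|*.JPEG) mime=image/jpeg ;;
    *.png|*.PNG) mime=image/png ;;
    *) echo "unsupported format: $1" >&2; return 1 ;;
  esac
  # tr strips the leading spaces that some wc implementations print.
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "too large: $1 ($size bytes)" >&2
    return 1
  fi
}

# Hypothetical encoder: prints data:<mime>;base64,<data> on a single line.
# Assumption: the API accepts this data URL form; confirm before relying on it.
to_data_url() {
  check_image "$1" || return 1
  # tr removes the line wrapping that GNU base64 adds by default.
  printf 'data:%s;base64,%s\n' "$mime" "$(base64 < "$1" | tr -d '\n')"
}
```

Checking by extension is a cheap first filter; a file renamed to .png still passes, so the server remains the final authority on format.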

Privacy & security

Assets sent via image_url are fetched server‑side only for the duration of the request. Use time‑limited signed URLs for private content.
Inputs may be logged for abuse monitoring and quality unless you enable data controls in your dashboard.

Troubleshooting

Image not found: Ensure the image_url is publicly reachable or signed correctly; test with curl -I <url>.
Payload too large: Downscale images or switch to image_url.
Poor OCR quality: Increase resolution, improve lighting/contrast, or crop tighter around text.
Slow responses: Host images in a nearby region or on a CDN, and reduce image dimensions.
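Base64 encoding is one reason inline payloads grow: every 3 bytes of input become 4 characters of output, with the final group padded, so the encoded length is 4 * ceil(n / 3). The hypothetical base64_len helper below computes that length, which lets you estimate a request's size before sending it.

```shell
# Hypothetical helper: base64_len BYTES
# Prints the length of the base64 encoding (with padding, no line breaks).
# (n + 2) / 3 is integer ceil(n / 3); each 3-byte group becomes 4 characters.
base64_len() {
  n=$1
  echo $(( (n + 2) / 3 * 4 ))
}
```

For an image at the 25 MiB limit, base64_len 26214400 prints 34952536, about 33.3 MiB of text before the surrounding JSON is counted.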
