Pawa AI Vision Example

Vision Based Models in Pawa AI.
Currently we have one vision model:- pawa-v1-blaze-20250318: A powerful small language model (SLM) optimized for reasoning, complex generation, multimodal, tools understanding, agentic workflow, and advanced knowledge tasks.
How to send multimodal chat request with Pawa AI ?
Use Cases
Image classification
Image classification
Classify the overall content of an image into one or more categories.Example output:
Object detection
Object detection
Detect and locate multiple objects in the image with rough descriptions.Example output:
OCR and document parsing
OCR and document parsing
Extract text or structured fields from receipts, invoices, or forms. For advanced usage you can combine with Example output:
response_format
Best practices
- Resize large images to a reasonable resolution (e.g., 1024px max on the longest side) to reduce latency without losing salient details.
- Prefer
image_url
for repeatable, production traffic. Use signed URLs that expire quickly if assets are private. - Provide clear textual instructions that set expectations (“list objects with colors and approximate positions”).
- For OCR, favor sharp, high‑contrast images. If possible, crop to the relevant region before sending.
- When sending multiple images, order them logically and reference them in the text (“in the second photo, compare…”).
Supported formats and limits
- Formats:
jpg/jpeg
,png
- Max image size: 25 MiB per image
- Number of images per request: no hard limit; practical limits depend on latency and payload size
- Base64 payloads should include the MIME type in the content object
Privacy & security
- Assets sent via
image_url
are fetched server‑side only for the duration of the request. Use time‑limited signed URLs for private content. - Inputs may be logged for abuse monitoring and quality unless you enable data controls in your dashboard.
Troubleshooting
- Image not found: Ensure the
image_url
is publicly reachable or signed correctly; test withcurl -I <url>
. - Payload too large: Downscale images or switch to
image_url
. - Poor OCR quality: Increase resolution, improve lighting/contrast, or crop tighter around text.
- Slow responses: Host images on a nearby region/CDN and reduce image dimensions.