Before diving into Agentic RAG, it helps to understand the fundamentals of RAG. Traditional RAG typically generates embeddings and runs retrieval for every request you send to a language model. While effective, this approach can be slow and inflexible, and it often adds cost, especially when you rely on other LLM providers.
Traditional RAG Example
Agentic RAG Flow
Recent advancements in language models have introduced the concept of Agents.
An Agent is an AI-driven system that leverages reasoning, planning, and tool use to achieve user-defined objectives. By combining decision-making with execution, an agent can dynamically determine the best path to solve a task.
In the context of RAG, this means that instead of always performing embedding and semantic retrieval for every query, even those that do not require it, Pawa AI implements Agentic RAG. Our models can now reason about the request and decide whether:
  • The query can be answered directly by the model without retrieval, optionally using supplied tools such as web_search_tool, or
  • Retrieval is required, in which case the model automatically calls the RAG tool, fetches the relevant knowledge, and generates an accurate, context-grounded response, doing so in multiple steps if required.
In an enterprise setting, where data sources are diverse and come in non-homogeneous formats, this approach becomes even more important. For example, the data sources could be a mix of structured, semi-structured, and unstructured data. The agentic approach reduces unnecessary computation, lowers cost, and improves efficiency, all while delivering reliable answers powered by your data.
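
The decision flow described above can be sketched in a few lines of Python. This is only an illustration, not Pawa AI's implementation: the real decision is made by the model's reasoning, whereas `toy_decide` below is a hypothetical keyword rule, and `run_agent`, `KB_WORDS`, and `WEB_WORDS` are names invented for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical keyword lists standing in for real model reasoning.
KB_WORDS = {"our", "company", "internal", "policy"}
WEB_WORDS = {"today", "latest", "news"}


@dataclass
class Decision:
    action: str          # "answer", "retrieve", or "tool"
    tool_name: str = ""  # only used when action == "tool"


def toy_decide(query: str, context: List[str]) -> Decision:
    """Pick the next action; once any context is gathered, answer."""
    if context:
        return Decision("answer")
    words = set(re.findall(r"[a-z]+", query.lower()))
    if words & KB_WORDS:
        return Decision("retrieve")
    if words & WEB_WORDS:
        return Decision("tool", "web_search_tool")
    return Decision("answer")


def run_agent(
    query: str,
    retrieve: Callable[[str], List[str]],
    tools: Dict[str, Callable[[str], str]],
    generate: Callable[[str, List[str]], str],
    max_steps: int = 3,
) -> str:
    """Gather context only when the decision step asks for it, then answer."""
    context: List[str] = []
    for _ in range(max_steps):
        decision = toy_decide(query, context)
        if decision.action == "answer":
            break
        if decision.action == "retrieve":
            context.extend(retrieve(query))
        else:
            context.append(tools[decision.tool_name](query))
    return generate(query, context)
```

A query such as "What is 2 + 2?" never touches the retriever in this sketch, which is where the savings in computation and cost come from.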

Agentic Models in Pawa AI

Currently we have one agentic model:
  • pawa-v1-blaze-20250318: Our smaller language model. It is fast, supports visual input and tools, and is our most intelligent model.

Implementing Agentic RAG With Pawa AI in Your System

To implement Agentic RAG, you first need a Knowledge Base (KB). Think of it as your private reference library for the Pawa model. Traditionally, creating a KB involves setting up complex pipelines for extraction, chunking, embedding, and retrieval. Each step needs to work well; otherwise, you risk retrieving irrelevant sources and producing incorrect answers.

With Pawa AI, we’ve simplified this process. Our platform automatically handles extraction, chunking, and embedding using our models. Once processing is complete, you’ll receive a Knowledge Base Reference ID, which you can pass as a parameter in chat requests. The Pawa API then incorporates your private reference source using semantic retrieval, so that the model answers accurately based on your custom data.
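
For a sense of what the traditional pipeline involves, here is a minimal, self-contained sketch of chunking, embedding, and retrieval. It is not how Pawa AI processes your files: the "embedding" is a toy bag-of-words count chosen so the example runs without any model, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, max_words: int = 40) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: term counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents: List[str], max_words: int = 40) -> List[Tuple[str, Counter]]:
    """Chunk every document and store each chunk next to its embedding."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc, max_words)]


def retrieve(index: List[Tuple[str, Counter]], query: str, top_k: int = 2) -> List[str]:
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```

Each of these stages has to be tuned and maintained in a self-built pipeline, which is the work the managed Knowledge Base is meant to take over.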

Step 1: Create a Pawa AI Account

  • Sign up for a Pawa AI account.
  • Verify your email and log in as a Builder.

Step 2: Create a Knowledge Base

  • Navigate to the Storage page in your Builder Dashboard.
  • Click Create Vector Store under the Vectors section.
  • Upload files in supported formats: .pdf, .docx, .txt, .mp3, .wav, .pptx, .xlsx, .png, .jpg, or provide URLs to your websites or online data.
  • Copy the generated Knowledge Base Reference ID, e.g. kb-930d251e-8a8b-4ba9-bae1-5fceb47bd654
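
Before uploading, it can help to check your files against the supported formats listed above. The helper below is a small local convenience sketch; `split_by_support` is a hypothetical name and is not part of any Pawa AI SDK.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Extensions taken from the supported formats listed in Step 2.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".txt", ".mp3", ".wav", ".pptx", ".xlsx", ".png", ".jpg",
}


def split_by_support(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate paths into (supported, unsupported) by file extension."""
    supported: List[str] = []
    unsupported: List[str] = []
    for path in paths:
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```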

Step 3: Make Your RAG-Based API Request

  • Use the Knowledge Base Reference ID in your chat API request.
  • Make sure the model is always pawa-v1-blaze-20250318.
  • Send the knowledge base with isMust set to False, so the model is not forced to use it.
  • You can optionally send tools or combine the request with reasoning.
  • The model will decide whether to answer normally or to retrieve relevant information from your KB to generate grounded, context-aware responses.
curl --request POST \
     --url https://api.pawa-ai.com/v1/chat/request \
     --header "Authorization: Bearer $PAWA_AI_API_KEY" \
     --header 'Content-Type: application/json' \
     --data '{
       "model": "pawa-v1-blaze-20250318",
       "messages": [
         {
           "role": "user",
           "content": [
             {
               "type": "text",
               "text": "What were the company sales last year?"
             }
           ]
         }
       ],
       "knowledgeBase": {
         "kbReferenceId": "kb-930d251e-8a8b-4ba9-bae1-5fceb47bd654",
         "isMust": "False"
       }
     }'
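
The same request can be sketched in Python using only the standard library. The URL, header names, and the knowledgeBase fields mirror the curl example; the top-level "model" field name is an assumption based on Step 3, isMust is sent as the string "False" exactly as in the curl example, and `build_payload` and `send_chat_request` are hypothetical helper names.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://api.pawa-ai.com/v1/chat/request"
MODEL = "pawa-v1-blaze-20250318"


def build_payload(question: str, kb_reference_id: str, is_must: str = "False") -> dict:
    """Build the chat request body with an optional knowledge base attached."""
    return {
        "model": MODEL,  # assumed field name; Step 3 only says which model to use
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]}
        ],
        "knowledgeBase": {"kbReferenceId": kb_reference_id, "isMust": is_must},
    }


def send_chat_request(payload: dict, api_key: Optional[str] = None, timeout: int = 30) -> dict:
    """POST the payload and return the decoded JSON response."""
    api_key = api_key or os.environ["PAWA_AI_API_KEY"]
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the payload separately from sending it makes the body easy to inspect or log before any network call is made.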

Example Response demonstrating Agentic RAG based chat

{
  "success": true,
  "message": "Chat request processed successfully",
  "data": {
    "request": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "At Pawa AI, the last year sales was about 400K$ ARR, do you have any question i can help with?"
        },
        "matched_stop": 128001
      }
    ],
    "created": "2025-09-09",
    "model": "pawa-v1-ember-20240924",
    "object": "chat.request"
  }
}
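
To use the answer in your application, you need to pull the assistant text out of this structure. A minimal sketch, assuming every successful response has the shape of the example above (`extract_answer` is a hypothetical helper, and the error handling is a guess at reasonable behaviour rather than documented API semantics):

```python
def extract_answer(response: dict) -> str:
    """Return the assistant's text from a response shaped like the example."""
    if not response.get("success"):
        raise ValueError(response.get("message", "request failed"))
    choices = response.get("data", {}).get("request", [])
    if not choices:
        raise ValueError("response contained no choices")
    return choices[0]["message"]["content"]
```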
This saves both cost and effort. With this capability, you don’t have to build complex retrieval pipelines or manage embeddings yourself, you won’t be charged for embeddings, and you will reduce latency. Pawa AI handles the heavy lifting, letting you focus on integrating your data and delivering accurate, context-aware responses to your users.