This guide demonstrates how to build an agentic RAG (Retrieval-Augmented Generation) application using vectorstores with Vercel’s AI SDK. The example shows how to create a vector store index from documents and use it as a tool within Vercel’s streaming text generation.

The agentic RAG example combines:

  • vectorstores for document indexing and retrieval
  • Vercel AI SDK for streaming text generation with tool calling
  • OpenAI as the LLM provider

The application allows the LLM to autonomously query your knowledge base by providing it with a retrieval tool, enabling multi-step reasoning and information gathering.

Here’s the full example code:

import { openai } from "@ai-sdk/openai";
import { Document, formatLLM, Settings, VectorStoreIndex } from "@vectorstores/core";
import { stepCountIs, streamText, tool } from "ai";
import fs from "node:fs/promises";
import { fileURLToPath } from "node:url";
import { OpenAI } from "openai";
import { z } from "zod";

async function main() {
  // Ensure the OpenAI API key is available
  if (!process.env.OPENAI_API_KEY) {
    console.error("Error: OpenAI API key not found in environment variables.");
    return;
  }

  // Configure OpenAI embeddings with vectorstores
  const openaiClient = new OpenAI();
  Settings.embedFunc = async (input) => {
    const { data } = await openaiClient.embeddings.create({
      model: "text-embedding-3-small",
      input,
    });
    return data.map((d) => d.embedding);
  };

  // Load the example essay and build a searchable index from it
  const filePath = fileURLToPath(
    new URL("../shared/data/abramov.txt", import.meta.url),
  );
  const essay = await fs.readFile(filePath, "utf-8");
  const document = new Document({ text: essay, id_: filePath });
  const index = await VectorStoreIndex.fromDocuments([document]);
  console.log("Successfully created index");

  const retriever = index.asRetriever();

  const result = streamText({
    model: openai("gpt-4o"),
    prompt: "Cost of moving cat from Russia to UK?",
    tools: {
      queryTool: tool({
        description:
          "get information from your knowledge base to answer questions.",
        inputSchema: z.object({
          query: z
            .string()
            .describe("The query to get information about your documents."),
        }),
        execute: async ({ query }) => {
          return (
            formatLLM(await retriever.retrieve({ query })) ||
            "No result found in documents"
          );
        },
      }),
    },
    stopWhen: stepCountIs(5),
  });

  // Stream the answer to stdout as it is generated
  for await (const textPart of result.textStream) {
    process.stdout.write(textPart);
  }
}

main().catch(console.error);

The example starts by ensuring the OpenAI API key is available and configuring the embedding model:

// Ensure the OpenAI API key is available
if (!process.env.OPENAI_API_KEY) {
  console.error("Error: OpenAI API key not found in environment variables.");
  return;
}

// Configure OpenAI embeddings
const openaiClient = new OpenAI();
Settings.embedFunc = async (input) => {
  const { data } = await openaiClient.embeddings.create({
    model: "text-embedding-3-small",
    input,
  });
  return data.map((d) => d.embedding);
};
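
If you want to sanity-check the embedding setup in isolation, you can call the configured function directly. A minimal sketch, assuming embedFunc is invoked with an array of strings (which is what the data.map return shape above implies; the 1536-dimension output is the documented size for text-embedding-3-small):

// Quick standalone check of the embedding function configured above.
const vectors = await Settings.embedFunc(["hello world", "moving a cat abroad"]);
console.log(vectors.length); // 2 (one embedding per input string)
console.log(vectors[0].length); // 1536 for text-embedding-3-small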

A document is loaded from a file and indexed:

const filePath = fileURLToPath(
  new URL("../shared/data/abramov.txt", import.meta.url),
);
const essay = await fs.readFile(filePath, "utf-8");
const document = new Document({ text: essay, id_: filePath });
const index = await VectorStoreIndex.fromDocuments([document]);

  • The document is read from the filesystem
  • A Document object is created with the text content and a unique ID
  • VectorStoreIndex.fromDocuments() creates a searchable vector index from the document
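
The same pattern extends to multiple documents. A minimal sketch, assuming a directory of .txt files (the ./data path and the file filter here are hypothetical, not part of the example):

import fs from "node:fs/promises";
import path from "node:path";
import { Document, VectorStoreIndex } from "@vectorstores/core";

// Hypothetical data directory; adjust to your project layout.
const dataDir = "./data";
const fileNames = (await fs.readdir(dataDir)).filter((f) => f.endsWith(".txt"));

// One Document per file, using the file path as a stable unique ID.
const documents = await Promise.all(
  fileNames.map(async (name) => {
    const fullPath = path.join(dataDir, name);
    const text = await fs.readFile(fullPath, "utf-8");
    return new Document({ text, id_: fullPath });
  }),
);

const index = await VectorStoreIndex.fromDocuments(documents);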

A retriever is created from the index to enable querying:

const retriever = index.asRetriever();

The retriever can search the indexed documents and return relevant chunks based on semantic similarity.
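
You can also call the retriever directly, outside of any tool, which is handy for checking what the index returns for a given question. A minimal sketch, reusing formatLLM from the example's imports:

// Inspect what the index returns for a sample query.
const results = await retriever.retrieve({
  query: "Cost of moving cat from Russia to UK?",
});
console.log(formatLLM(results));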

A tool is defined that allows the LLM to query the knowledge base:

queryTool: tool({
  description:
    "get information from your knowledge base to answer questions.",
  inputSchema: z.object({
    query: z
      .string()
      .describe("The query to get information about your documents."),
  }),
  execute: async ({ query }) => {
    return (
      formatLLM(await retriever.retrieve({ query })) ||
      "No result found in documents"
    );
  },
}),

Key components:

  • description: Tells the LLM when and how to use this tool
  • inputSchema: Defines the tool’s input parameters using Zod
  • execute: The function that runs when the tool is called
    • Retrieves relevant document chunks using the retriever
    • Formats the results using formatLLM() for LLM consumption
    • Returns a fallback message if no results are found
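
The input schema can carry more than a single query string. As a sketch, the variant below lets the model also choose how many chunks it gets back; the topK parameter is hypothetical, and it assumes retriever.retrieve resolves to an array of scored chunks (the example only guarantees the result can be passed to formatLLM):

// Sketch: a richer tool schema with a hypothetical topK parameter.
const queryTool = tool({
  description: "get information from your knowledge base to answer questions.",
  inputSchema: z.object({
    query: z
      .string()
      .describe("The query to get information about your documents."),
    topK: z
      .number()
      .int()
      .min(1)
      .max(10)
      .default(3)
      .describe("How many document chunks to return."),
  }),
  execute: async ({ query, topK }) => {
    // Assumes retrieve returns an array, so we can trim it to topK chunks.
    const chunks = await retriever.retrieve({ query });
    return formatLLM(chunks.slice(0, topK)) || "No result found in documents";
  },
});

Defined this way, the tool can be passed to streamText as tools: { queryTool }, matching the shorthand used below.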

The streamText function generates responses with tool calling capabilities:

const result = streamText({
  model: openai("gpt-4o"),
  prompt: "Cost of moving cat from Russia to UK?",
  tools: { queryTool },
  stopWhen: stepCountIs(5),
});

  • model: Uses OpenAI’s GPT-4o model via Vercel’s AI SDK
  • prompt: The user’s question
  • tools: Makes the query tool available to the LLM
  • stopWhen: stepCountIs(5): Limits the agent to 5 reasoning steps to prevent infinite loops
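
To observe what the agent is doing at each step, you can add the AI SDK's onStepFinish callback alongside the options above. A minimal sketch (the toolName and input field names follow AI SDK v5, which this example's inputSchema/stopWhen options indicate):

const result = streamText({
  model: openai("gpt-4o"),
  prompt: "Cost of moving cat from Russia to UK?",
  tools: { queryTool },
  stopWhen: stepCountIs(5),
  // Log each completed step so you can watch the agent's tool usage.
  onStepFinish: (step) => {
    for (const call of step.toolCalls) {
      console.log(`tool call: ${call.toolName}`, call.input);
    }
  },
});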

The response is streamed to the console:

for await (const textPart of result.textStream) {
  process.stdout.write(textPart);
}

This allows the user to see the response as it’s generated, providing a better user experience.
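
In a server context, such as a Next.js route handler deployed on Vercel, the same result can instead be returned as a streaming HTTP response using the AI SDK's toTextStreamResponse(). A minimal sketch, assuming queryTool and the imports above are in scope (the route path is hypothetical):

// app/api/chat/route.ts (hypothetical route)
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const result = streamText({
    model: openai("gpt-4o"),
    prompt,
    tools: { queryTool },
    stopWhen: stepCountIs(5),
  });
  // Streams the generated text back to the client as it is produced.
  return result.toTextStreamResponse();
}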

Putting it all together, the flow is:

  1. The LLM receives the user's question
  2. It decides whether to use the queryTool to search the knowledge base
  3. If it calls the tool, the retriever searches the indexed documents
  4. The retrieved information is formatted and returned to the LLM
  5. The LLM uses this information to generate a response
  6. The process can repeat for multi-step reasoning (up to 5 steps)
  7. The final response is streamed to the user

This pattern gives you:

  • Autonomous Information Retrieval: The LLM decides when to query the knowledge base
  • Multi-step Reasoning: Can perform multiple queries to gather comprehensive information
  • Streaming Responses: Provides real-time feedback to users
  • Flexible Tool Usage: The LLM uses tools only when needed

From here, you can:

  1. Experiment with different retrieval strategies and tool configurations to improve the agent's performance
  2. Try different model providers supported by the Vercel AI SDK (see the sketch below)
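
Swapping the provider only changes the model line. A sketch using the AI SDK's Anthropic provider (assumes @ai-sdk/anthropic is installed; the model ID is one current example):

import { anthropic } from "@ai-sdk/anthropic";

// Same tools and stop condition; only the model changes.
const result = streamText({
  model: anthropic("claude-3-5-sonnet-latest"),
  prompt: "Cost of moving cat from Russia to UK?",
  tools: { queryTool },
  stopWhen: stepCountIs(5),
});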