This guide demonstrates how to build an agentic RAG (Retrieval-Augmented Generation) application using vectorstores with Vercel’s AI SDK. The example shows how to create a vector store index from documents and use it as a tool within Vercel’s streaming text generation.

The agentic RAG example combines:

  • vectorstores for document indexing and retrieval
  • Vercel AI SDK for streaming text generation with tool calling
  • OpenAI as the LLM provider

The application allows the LLM to autonomously query your knowledge base by providing it with a retrieval tool, enabling multi-step reasoning and information gathering.

Here’s the full example code:

import { openai } from "@ai-sdk/openai";
import { Document, formatLLM, Settings, VectorStoreIndex } from "@vectorstores/core";
import { stepCountIs, streamText, tool } from "ai";
import fs from "node:fs/promises";
import { fileURLToPath } from "node:url";
import { OpenAI } from "openai";
import { z } from "zod";

async function main() {
  // Ensure the OpenAI API key is available
  if (!process.env.OPENAI_API_KEY) {
    console.error("Error: OpenAI API key not found in environment variables.");
    return;
  }

  // Configure OpenAI embeddings with vectorstores
  const openaiClient = new OpenAI();
  Settings.embedFunc = async (input) => {
    const { data } = await openaiClient.embeddings.create({
      model: "text-embedding-3-small",
      input,
    });
    return data.map((d) => d.embedding);
  };

  // Load the example essay and build a searchable index from it
  const filePath = fileURLToPath(
    new URL("../shared/data/abramov.txt", import.meta.url),
  );
  const essay = await fs.readFile(filePath, "utf-8");
  const document = new Document({ text: essay, id_: filePath });
  const index = await VectorStoreIndex.fromDocuments([document]);
  console.log("Successfully created index");

  const retriever = index.asRetriever();

  const result = streamText({
    model: openai("gpt-4o"),
    prompt: "Cost of moving cat from Russia to UK?",
    tools: {
      queryTool: tool({
        description:
          "get information from your knowledge base to answer questions.",
        inputSchema: z.object({
          query: z
            .string()
            .describe("The query to get information about your documents."),
        }),
        execute: async ({ query }) => {
          return (
            formatLLM(await retriever.retrieve({ query })) ||
            "No result found in documents"
          );
        },
      }),
    },
    stopWhen: stepCountIs(5),
  });

  // Stream the answer to stdout as it is generated
  for await (const textPart of result.textStream) {
    process.stdout.write(textPart);
  }
}

main().catch(console.error);

The example starts by ensuring the OpenAI API key is available and configuring the embedding model:

// Ensure the OpenAI API key is available
if (!process.env.OPENAI_API_KEY) {
  console.error("Error: OpenAI API key not found in environment variables.");
  return;
}

// Configure OpenAI embeddings
const openaiClient = new OpenAI();
Settings.embedFunc = async (input) => {
  const { data } = await openaiClient.embeddings.create({
    model: "text-embedding-3-small",
    input,
  });
  return data.map((d) => d.embedding);
};
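
If you want to sanity-check the embedding setup in isolation, you can call the configured function directly. A minimal sketch, assuming embedFunc is invoked with an array of strings (which is what the data.map return shape above implies; the 1536-dimension output is the documented size for text-embedding-3-small):

// Quick standalone check of the embedding function configured above.
const vectors = await Settings.embedFunc(["hello world", "moving a cat abroad"]);
console.log(vectors.length); // 2 (one embedding per input string)
console.log(vectors[0].length); // 1536 for text-embedding-3-small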

A document is loaded from a file and indexed:

const filePath = fileURLToPath(
  new URL("../shared/data/abramov.txt", import.meta.url),
);
const essay = await fs.readFile(filePath, "utf-8");
const document = new Document({ text: essay, id_: filePath });
const index = await VectorStoreIndex.fromDocuments([document]);

  • The document is read from the filesystem
  • A Document object is created with the text content and a unique ID
  • VectorStoreIndex.fromDocuments() creates a searchable vector index from the document
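
The same pattern extends to multiple documents. A minimal sketch, assuming a directory of .txt files (the ./data path and the file filter here are hypothetical, not part of the example):

import fs from "node:fs/promises";
import path from "node:path";
import { Document, VectorStoreIndex } from "@vectorstores/core";

// Hypothetical data directory; adjust to your project layout.
const dataDir = "./data";
const fileNames = (await fs.readdir(dataDir)).filter((f) => f.endsWith(".txt"));

// One Document per file, using the file path as a stable unique ID.
const documents = await Promise.all(
  fileNames.map(async (name) => {
    const fullPath = path.join(dataDir, name);
    const text = await fs.readFile(fullPath, "utf-8");
    return new Document({ text, id_: fullPath });
  }),
);

const index = await VectorStoreIndex.fromDocuments(documents);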

A retriever is created from the index to enable querying:

const retriever = index.asRetriever();

The retriever can search the indexed documents and return relevant chunks based on semantic similarity.
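
You can also call the retriever directly, outside of any tool, which is handy for checking what the index returns for a given question. A minimal sketch, reusing formatLLM from the example's imports:

// Inspect what the index returns for a sample query.
const results = await retriever.retrieve({
  query: "Cost of moving cat from Russia to UK?",
});
console.log(formatLLM(results));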

A tool is defined that allows the LLM to query the knowledge base:

queryTool: tool({
  description:
    "get information from your knowledge base to answer questions.",
  inputSchema: z.object({
    query: z
      .string()
      .describe("The query to get information about your documents."),
  }),
  execute: async ({ query }) => {
    return (
      formatLLM(await retriever.retrieve({ query })) ||
      "No result found in documents"
    );
  },
}),

Key components:

  • description: Tells the LLM when and how to use this tool
  • inputSchema: Defines the tool’s input parameters using Zod
  • execute: The function that runs when the tool is called
    • Retrieves relevant document chunks using the retriever
    • Formats the results using formatLLM() for LLM consumption
    • Returns a fallback message if no results are found
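
The input schema can carry more than a single query string. As a sketch, the variant below lets the model also choose how many chunks it gets back; the topK parameter is hypothetical, and it assumes retriever.retrieve resolves to an array of scored chunks (the example only guarantees the result can be passed to formatLLM):

// Sketch: a richer tool schema with a hypothetical topK parameter.
const queryTool = tool({
  description: "get information from your knowledge base to answer questions.",
  inputSchema: z.object({
    query: z
      .string()
      .describe("The query to get information about your documents."),
    topK: z
      .number()
      .int()
      .min(1)
      .max(10)
      .default(3)
      .describe("How many document chunks to return."),
  }),
  execute: async ({ query, topK }) => {
    // Assumes retrieve returns an array, so we can trim it to topK chunks.
    const chunks = await retriever.retrieve({ query });
    return formatLLM(chunks.slice(0, topK)) || "No result found in documents";
  },
});

Defined this way, the tool can be passed to streamText as tools: { queryTool }, matching the shorthand used below.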

The streamText function generates responses with tool calling capabilities:

const result = streamText({
  model: openai("gpt-4o"),
  prompt: "Cost of moving cat from Russia to UK?",
  tools: { queryTool },
  stopWhen: stepCountIs(5),
});

  • model: Uses OpenAI’s GPT-4o model via Vercel’s AI SDK
  • prompt: The user’s question
  • tools: Makes the query tool available to the LLM
  • stopWhen: stepCountIs(5): Limits the agent to 5 reasoning steps to prevent infinite loops
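
To observe what the agent is doing at each step, you can add the AI SDK's onStepFinish callback alongside the options above. A minimal sketch (the toolName and input field names follow AI SDK v5, which this example's inputSchema/stopWhen options indicate):

const result = streamText({
  model: openai("gpt-4o"),
  prompt: "Cost of moving cat from Russia to UK?",
  tools: { queryTool },
  stopWhen: stepCountIs(5),
  // Log each completed step so you can watch the agent's tool usage.
  onStepFinish: (step) => {
    for (const call of step.toolCalls) {
      console.log(`tool call: ${call.toolName}`, call.input);
    }
  },
});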

The response is streamed to the console:

for await (const textPart of result.textStream) {
  process.stdout.write(textPart);
}

This allows the user to see the response as it’s generated, providing a better user experience.
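
In a server context, such as a Next.js route handler deployed on Vercel, the same result can instead be returned as a streaming HTTP response using the AI SDK's toTextStreamResponse(). A minimal sketch, assuming queryTool and the imports above are in scope (the route path is hypothetical):

// app/api/chat/route.ts (hypothetical route)
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const result = streamText({
    model: openai("gpt-4o"),
    prompt,
    tools: { queryTool },
    stopWhen: stepCountIs(5),
  });
  // Streams the generated text back to the client as it is produced.
  return result.toTextStreamResponse();
}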

Putting it all together, the flow is:

  1. The LLM receives the user's question
  2. It decides whether to use the queryTool to search the knowledge base
  3. If it calls the tool, the retriever searches the indexed documents
  4. The retrieved information is formatted and returned to the LLM
  5. The LLM uses this information to generate a response
  6. The process can repeat for multi-step reasoning (up to 5 steps)
  7. The final response is streamed to the user

This pattern gives you:

  • Autonomous Information Retrieval: The LLM decides when to query the knowledge base
  • Multi-step Reasoning: Can perform multiple queries to gather comprehensive information
  • Streaming Responses: Provides real-time feedback to users
  • Flexible Tool Usage: The LLM uses tools only when needed

From here, you can:

  1. Experiment with different retrieval strategies and tool configurations to improve the agent's performance
  2. Try different model providers supported by the Vercel AI SDK (see the sketch below)
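
Swapping the provider only changes the model line. A sketch using the AI SDK's Anthropic provider (assumes @ai-sdk/anthropic is installed; the model ID is one current example):

import { anthropic } from "@ai-sdk/anthropic";

// Same tools and stop condition; only the model changes.
const result = streamText({
  model: anthropic("claude-3-5-sonnet-latest"),
  prompt: "Cost of moving cat from Russia to UK?",
  tools: { queryTool },
  stopWhen: stepCountIs(5),
});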