@vguhesan
Last active February 17, 2026 00:50
RAG Implementation using LangChain, OpenAI, with ChromaDB or SingleStore


This solution requires internet access and API credentials for external tools and services.

A simple RAG system over a web document, using LangChain for orchestration, OpenAI for embeddings and generation, and Chroma as the vector database.

Prerequisites

Install packages

pip install langchain langchain-openai langchain-chroma langchain-community langchain-text-splitters openai chromadb beautifulsoup4

Set your OpenAI API key:

export OPENAI_API_KEY="your-api-key"

It is recommended that you set up a virtual environment (venv) so this code runs in an isolated Python environment.
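A minimal sketch of that setup for a Unix-like shell (on Windows, activate with `.venv\Scripts\activate` instead):

```shell
# Create an isolated environment in ./.venv and activate it; the
# pip install above then affects only this project.
python3 -m venv .venv
. .venv/bin/activate
```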

RAG implementation as an Agent

In rag-as-an-agent.py (covered in another Gist), we enhance this setup by implementing the RAG pipeline as an agent.

import os
# Set your OpenAI API key (see Prerequisites). Prefer exporting it in your
# shell rather than hard-coding it in source:
os.environ["OPENAI_API_KEY"] = "your-api-key"
# =========================================
# Step 1: Load and Split Documents
# Load data and chunk it to ~500-1000 tokens per piece with overlap for context.
from langchain_community.document_loaders import WebBaseLoader # Or TextLoader for local files
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load example data (e.g., a blog post)
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()
# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
# For PDFs or custom files, use PyPDFLoader or similar.
# =========================================
# Step 2: Embed and Index in Vector DB
# Use embeddings to vectorize chunks and store them.
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = Chroma.from_documents(documents=splits, embedding=embeddings)
# As an alternative you can store them in SingleStoreDB
# from langchain_community.vectorstores.singlestoredb import SingleStoreDB
# SingleStore integrates well for production-scale apps with SQL querying alongside vectors.
# vector_database = SingleStoreDB.from_documents(splits, embeddings, table_name="my_rag_table")
# =========================================
# Step 3: Retrieval
# Embed the query and fetch relevant chunks.
query = "What is task decomposition?"
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3})
retrieved_docs = retriever.invoke(query)
# Combine retrieved content
context = "\n\n".join(doc.page_content for doc in retrieved_docs)
# =========================================
# Step 4: Augment and Generate
# Craft a prompt with context and query, then generate.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOpenAI(model="gpt-3.5-turbo")
prompt_template = ChatPromptTemplate.from_template(
    "Answer the question based on this context: {context}\n\nQuestion: {query}"
)
chain = prompt_template | llm
response = chain.invoke({"context": context, "query": query})
print(response.content)
# =========================================
# This creates a basic RAG chain: retrieve → augment → generate.
# For local LLMs (e.g., Llama 3 via Ollama):
# Install Ollama and pull a model:
# ollama pull llama3
# Replace ChatOpenAI with ChatOllama(model="llama3").
# Implementing RAG as an Agent
# For more dynamic setups (e.g., deciding when to retrieve), use LangChain agents.
# Agents allow conditional retrieval, making them suitable for complex queries.
# Create a Retrieval Tool
from langchain_core.tools import tool

@tool
def retrieve_context(query: str) -> str:
    """Retrieve relevant context for a query."""
    retrieved = vector_store.similarity_search(query, k=2)
    return "\n\n".join(doc.page_content for doc in retrieved)
# Build The Agent
from langchain.agents import create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the retrieve_context tool when needed."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, [retrieve_context], prompt)
# Run the Agent
from langchain.agents import AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=[retrieve_context])
result = agent_executor.invoke({"input": query})
print(result["output"])
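To illustrate what Step 1's splitter is doing, here is a toy, pure-Python version of fixed-size chunking with overlap (the real RecursiveCharacterTextSplitter additionally prefers to split on paragraph and sentence boundaries, so this is a conceptual sketch only):

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters, where each
    chunk repeats the last `overlap` characters of the previous one so that
    content cut at a boundary survives intact in at least one chunk."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(chunk_text("abcdefghijklmnop", chunk_size=8, overlap=3))
# → ['abcdefgh', 'fghijklm', 'klmnop', 'p']
```

Note how `fghijklm` repeats the tail of `abcdefgh`: that overlap is what the `chunk_overlap=200` argument provides at document scale.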
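Conceptually, the vector store in Steps 2–3 ranks stored chunk embeddings by similarity to the query embedding. A toy, dependency-free sketch of that ranking, using cosine similarity over hand-made 3-dimensional vectors (real embeddings from text-embedding-3-small are 1536-dimensional):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product normalized by vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy "embeddings": the query vector points nearly the same way as docs 0 and 1.
doc_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
query_vec = [1.0, 0.05, 0.0]
print(top_k(query_vec, doc_vecs, k=2))  # → [0, 1]
```

This is what `vector_store.as_retriever(search_kwargs={"k": 3})` does for you, with the embedding model supplying the vectors.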