@anpigon
Last active January 16, 2025 13:25
Local RAG agent with LLaMA3
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-stderr\n",
"%pip install --quiet -U langchain langchain_community tiktoken langchain-nomic \"nomic[local]\" langchain-ollama scikit-learn langgraph tavily-python bs4 python-dotenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Local RAG agent with LLaMA3\n",
"\n",
"> Original: https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_adaptive_rag_local/\n",
"\n",
"We'll combine ideas from several RAG papers into a single RAG agent:\n",
"\n",
"- **Routing**: Adaptive RAG ([paper](https://arxiv.org/abs/2403.14403)). Route each question to the most suitable retrieval approach\n",
"- **Fallback**: Corrective RAG ([paper](https://arxiv.org/pdf/2401.15884.pdf)). Fall back to web search when the retrieved documents are not relevant to the query\n",
"- **Self-correction**: Self-RAG ([paper](https://arxiv.org/abs/2310.11511)). Fix answers that contain hallucinations or don't address the question\n",
"\n",
"<img src=\"https://i.imgur.com/l47fEVV.png\" width=800>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Local models\n",
"\n",
"### Embedding\n",
"\n",
"[GPT4All Embeddings](https://blog.nomic.ai/posts/nomic-embed-text-v1):\n",
"\n",
"`pip install langchain-nomic`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### LLM\n",
"\n",
"We use [Ollama](https://x.com/ollama/status/1839007158865899651) with [llama3.2](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/):\n",
"\n",
"`ollama pull llama3.2:3b-instruct-fp16`"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"### LLM\n",
"from langchain_ollama import ChatOllama\n",
"\n",
"# Ollama model name\n",
"local_llm = 'llama3.2:3b-instruct-fp16'\n",
"\n",
"# Model for free-form text responses\n",
"llm = ChatOllama(model=local_llm, temperature=0)\n",
"\n",
"# Same model constrained to JSON output\n",
"llm_json_mode = ChatOllama(model=local_llm, temperature=0, format='json')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tavily API key setup\n",
"\n",
"For web search, we use [Tavily](https://tavily.com/), a search engine optimized for LLMs and RAG."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()\n",
"\n",
"def _set_env(var: str):\n",
"    if not os.environ.get(var):\n",
"        os.environ[var] = getpass.getpass(f\"Enter {var}: \")\n",
"\n",
"_set_env(\"TAVILY_API_KEY\")\n",
"os.environ['TOKENIZERS_PARALLELISM'] = 'true'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tracing\n",
"\n",
"Optionally, use [LangSmith](https://www.langchain.com/langsmith) for tracing."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"_set_env(\"LANGCHAIN_API_KEY\")\n",
"os.environ[\"LANGCHAIN_TRACING_V2\"] = 'true'\n",
"os.environ[\"LANGCHAIN_PROJECT\"] = 'local-llama32-rag'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Web Search Tool"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"### Search\n",
"from langchain_community.tools.tavily_search import TavilySearchResults\n",
"\n",
"web_search_tool = TavilySearchResults(max_results=3)  # max_results caps the number of hits returned per query"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Vectorstore"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Downloading: 100%|██████████| 274M/274M [00:05<00:00, 51.3MiB/s] \n",
"Verifying: 100%|██████████| 274M/274M [00:00<00:00, 625MiB/s] \n"
]
}
],
"source": [
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain_community.document_loaders import WebBaseLoader\n",
"from langchain_community.vectorstores import SKLearnVectorStore\n",
"from langchain_nomic.embeddings import NomicEmbeddings\n",
"\n",
"urls = [\n",
"    \"https://lilianweng.github.io/posts/2023-06-23-agent/\",\n",
"    \"https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/\",\n",
"    \"https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/\",\n",
"]\n",
"\n",
"# Load the documents\n",
"docs = WebBaseLoader(urls).load()\n",
"\n",
"# Split the documents into overlapping chunks (sizes are measured in tokens)\n",
"text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
"    chunk_size=1000,\n",
"    chunk_overlap=200,\n",
")\n",
"docs_splits = text_splitter.split_documents(docs)\n",
"\n",
"# Create the vectorstore\n",
"vectorstore = SKLearnVectorStore.from_documents(\n",
"    embedding=NomicEmbeddings(model=\"nomic-embed-text-v1.5\", inference_mode=\"local\"),\n",
"    documents=docs_splits,\n",
")\n",
"\n",
"# Create the retriever; k must be passed through search_kwargs,\n",
"# because a bare k=3 keyword is ignored and the default number of results is returned\n",
"retriever = vectorstore.as_retriever(search_kwargs={\"k\": 3})"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(metadata={'id': 'fa10df4b-5401-4ec0-a1c4-eb57454eeee0', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Reflection mechanism: synthesizes memories into higher level inferences over time and guides the agent’s future behavior. They are higher-level summaries of past events (<- note that this is a bit different from self-reflection above)\\n\\nPrompt LM with 100 most recent observations and to generate 3 most salient high-level questions given a set of observations/statements. Then ask LM to answer those questions.\\n\\n\\nPlanning & Reacting: translate the reflections and the environment information into actions\\n\\nPlanning is essentially in order to optimize believability at the moment vs in time.\\nPrompt template: {Intro of an agent X}. Here is X\\'s plan today in broad strokes: 1)\\nRelationships between agents and observations of one agent by another are all taken into consideration for planning and reacting.\\nEnvironment information is present in a tree structure.\\n\\n\\n\\n\\nFig. 13. The generative agent architecture. (Image source: Park et al. 2023)\\nThis fun simulation results in emergent social behavior, such as information diffusion, relationship memory (e.g. two agents continuing the conversation topic) and coordination of social events (e.g. 
host a party and invite many others).\\nProof-of-Concept Examples#\\nAutoGPT has drawn a lot of attention into the possibility of setting up autonomous agents with LLM as the main controller. It has quite a lot of reliability issues given the natural language interface, but nevertheless a cool proof-of-concept demo. A lot of code in AutoGPT is about format parsing.\\nHere is the system message used by AutoGPT, where {{...}} are user inputs:\\nYou are {{ai-name}}, {{user-provided AI bot description}}.\\nYour decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.\\n\\nGOALS:\\n\\n1. {{user-provided goal 1}}\\n2. {{user-provided goal 2}}\\n3. ...\\n4. ...\\n5. ...\\n\\nConstraints:\\n1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.\\n2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.\\n3. No user assistance\\n4. Exclusively use the commands listed in double quotes e.g. \"command name\"\\n5. Use subprocesses for commands that will not terminate within a few minutes'),\n",
" Document(metadata={'id': '7222226d-772f-4696-b694-25fed7f3df27', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content=\"LLM Powered Autonomous Agents | Lil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPosts\\n\\n\\n\\n\\nArchive\\n\\n\\n\\n\\nSearch\\n\\n\\n\\n\\nTags\\n\\n\\n\\n\\nFAQ\\n\\n\\n\\n\\nemojisearch.app\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n LLM Powered Autonomous Agents\\n \\nDate: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng\\n\\n\\n \\n\\n\\nTable of Contents\\n\\n\\n\\nAgent System Overview\\n\\nComponent One: Planning\\n\\nTask Decomposition\\n\\nSelf-Reflection\\n\\n\\nComponent Two: Memory\\n\\nTypes of Memory\\n\\nMaximum Inner Product Search (MIPS)\\n\\n\\nComponent Three: Tool Use\\n\\nCase Studies\\n\\nScientific Discovery Agent\\n\\nGenerative Agents Simulation\\n\\nProof-of-Concept Examples\\n\\n\\nChallenges\\n\\nCitation\\n\\nReferences\\n\\n\\n\\n\\n\\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview#\\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\\n\\nPlanning\\n\\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n\\n\\nMemory\\n\\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n\\n\\nTool use\\n\\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\"),\n",
" Document(metadata={'id': '58a746d5-0cd7-4b13-8b6e-278534c50163', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Planning\\n\\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n\\n\\nMemory\\n\\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n\\n\\nTool use\\n\\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\\n\\n\\n\\n\\nFig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 
2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\\nSelf-Reflection#\\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. 
It plays a crucial role in real-world tasks where trial and error are inevitable.\\nReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.\\nThe ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\\nThought: ...\\nAction: ...\\nObservation: ...\\n... (Repeated many times)'),\n",
" Document(metadata={'id': '080b35eb-5a3c-40a0-bc5f-a444086474f5', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Long-Term Memory (LTM): Long-term memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. There are two subtypes of LTM:\\n\\nExplicit / declarative memory: This is memory of facts and events, and refers to those memories that can be consciously recalled, including episodic memory (events and experiences) and semantic memory (facts and concepts).\\nImplicit / procedural memory: This type of memory is unconscious and involves skills and routines that are performed automatically, like riding a bike or typing on a keyboard.\\n\\n\\n\\n\\nFig. 8. Categorization of human memory.\\nWe can roughly consider the following mappings:\\n\\nSensory memory as learning embedding representations for raw inputs, including text, image or other modalities;\\nShort-term memory as in-context learning. It is short and finite, as it is restricted by the finite context window length of Transformer.\\nLong-term memory as the external vector store that the agent can attend to at query time, accessible via fast retrieval.\\n\\nMaximum Inner Product Search (MIPS)#\\nThe external memory can alleviate the restriction of finite attention span. 
A standard practice is to save the embedding representation of information into a vector store database that can support fast maximum inner-product search (MIPS). To optimize the retrieval speed, the common choice is the approximate nearest neighbors (ANN)\\u200b algorithm to return approximately top k nearest neighbors to trade off a little accuracy lost for a huge speedup.\\nA couple common choices of ANN algorithms for fast MIPS:\\n\\nLSH (Locality-Sensitive Hashing): It introduces a hashing function such that similar input items are mapped to the same buckets with high probability, where the number of buckets is much smaller than the number of inputs.\\nANNOY (Approximate Nearest Neighbors Oh Yeah): The core data structure are random projection trees, a set of binary trees where each non-leaf node represents a hyperplane splitting the input space into half and each leaf stores one data point. Trees are built independently and at random, so to some extent, it mimics a hashing function. ANNOY search happens in all the trees to iteratively search through the half that is closest to the query and then aggregates the results. The idea is quite related to KD tree but a lot more scalable.\\nHNSW (Hierarchical Navigable Small World): It is inspired by the idea of small world networks where most nodes can be reached by any other nodes within a small number of steps; e.g. “six degrees of separation” feature of social networks. HNSW builds hierarchical layers of these small-world graphs, where the bottom layers contain the actual data points. The layers in the middle create shortcuts to speed up search. When performing a search, HNSW starts from a random node in the top layer and navigates towards the target. When it can’t get any closer, it moves down to the next layer, until it reaches the bottom layer. 
Each move in the upper layers can potentially cover a large distance in the data space, and each move in the lower layers refines the search quality.\\nFAISS (Facebook AI Similarity Search): It operates on the assumption that in high dimensional space, distances between nodes follow a Gaussian distribution and thus there should exist clustering of data points. FAISS applies vector quantization by partitioning the vector space into clusters and then refining the quantization within clusters. Search first looks for cluster candidates with coarse quantization and then further looks into each cluster with finer quantization.\\nScaNN (Scalable Nearest Neighbors): The main innovation in ScaNN is anisotropic vector quantization. It quantizes a data point $x_i$ to $\\\\tilde{x}_i$ such that the inner product $\\\\langle q, x_i \\\\rangle$ is as similar to the original distance of $\\\\angle q, \\\\tilde{x}_i$ as possible, instead of picking the closet quantization centroid points.\\n\\n\\nFig. 9. Comparison of MIPS algorithms, measured in recall@10. (Image source: Google Blog, 2020)\\nCheck more MIPS algorithms and performance comparison in ann-benchmarks.com.\\nComponent Three: Tool Use#\\nTool use is a remarkable and distinguishing characteristic of human beings. We create, modify and utilize external objects to do things that go beyond our physical and cognitive limits. Equipping LLMs with external tools can significantly extend the model capabilities.')]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.invoke(\"agent memory\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Components\n",
"\n",
"We start with a prompt that routes each question to either web search or vectorstore retrieval.\n",
"\n",
"The prompt tells the model that the vectorstore holds documents on agents, prompt engineering, and adversarial attacks, so questions on those topics go to the vectorstore. Everything else, especially current events, goes to web search.\n",
"\n",
"The model must return JSON with a single key, `datasource`, whose value is either `websearch` or `vectorstore`."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"### Router\n",
"import json\n",
"from langchain_core.messages import HumanMessage, SystemMessage\n",
"\n",
"# Prompt \n",
"router_instructions = \"\"\"You are an expert at routing a user question to a vectorstore or web search.\n",
"\n",
"The vectorstore contains documents related to agents, prompt engineering, and adversarial attacks.\n",
" \n",
"Use the vectorstore for questions on these topics. For all else, and especially for current events, use web-search.\n",
"\n",
"Return JSON with single key, datasource, that is 'websearch' or 'vectorstore' depending on the question.\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'datasource': 'vectorstore'}"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Test router\n",
"question = HumanMessage(content=\"What are the types of agent memory?\")\n",
"test_vector_store = llm_json_mode.invoke([SystemMessage(router_instructions), question])\n",
"json.loads(test_vector_store.content)"
]
},
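{
"cell_type": "markdown",
"metadata": {},
"source": [
"Even with `format='json'`, a small local model may return a `datasource` value outside the two expected options. The following is a minimal sketch of defensive parsing, assuming we would rather fall back to web search than fail. `parse_route` and `VALID_SOURCES` are hypothetical names introduced here for illustration; they are not part of the original tutorial or of LangChain:\n",
"\n",
"```python\n",
"import json\n",
"\n",
"# Hypothetical helper for illustration; not part of the original notebook.\n",
"VALID_SOURCES = ('websearch', 'vectorstore')\n",
"\n",
"def parse_route(raw, default='websearch'):\n",
"    # Return a valid datasource, or the default when the output is unusable\n",
"    try:\n",
"        data = json.loads(raw)\n",
"    except (TypeError, ValueError):\n",
"        return default  # not parseable as JSON\n",
"    if not isinstance(data, dict):\n",
"        return default  # valid JSON, but not an object\n",
"    source = data.get('datasource')\n",
"    return source if source in VALID_SOURCES else default\n",
"\n",
"print(parse_route(json.dumps({'datasource': 'vectorstore'})))\n",
"print(parse_route(json.dumps({'datasource': 'wikipedia'})))\n",
"print(parse_route('not json'))\n",
"```\n",
"\n",
"The router test above could then call `parse_route(test_vector_store.content)` in place of a bare `json.loads`."
]
},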
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we write the prompts for the retrieval grader.\n",
"\n",
"**Document grader instructions (system message)**\n",
"\n",
"The grader assesses how relevant a retrieved document is to the user question. A document counts as relevant if it contains keywords or semantic meaning related to the question.\n",
"\n",
"**Grader prompt (human message)**\n",
"\n",
"The prompt supplies the retrieved `{document}` and the user `{question}`, asks for a careful and objective judgment of whether the document contains at least some information relevant to the question, and requires JSON with a single key, `binary_score`, whose value is `yes` or `no`."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"### Retrieval Grader \n",
"\n",
"# Doc grader instructions \n",
"doc_grader_instructions = \"\"\"You are a grader assessing relevance of a retrieved document to a user question.\n",
"\n",
"If the document contains keyword(s) or semantic meaning related to the question, grade it as relevant.\"\"\"\n",
"\n",
"# Grader prompt\n",
"doc_grader_prompt = \"\"\"Here is the retrieved document: \\n\\n {document} \\n\\n Here is the user question: \\n\\n {question}. \n",
"\n",
"Carefully and objectively assess whether the document contains at least some information that is relevant to the question.\n",
"\n",
"Return JSON with single key, binary_score, that is 'yes' or 'no' score to indicate whether the document contains at least some information that is relevant to the question.\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'binary_score': 'yes'}"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Test\n",
"question = \"What is Chain of thought prompting?\"\n",
"\n",
"docs = retriever.invoke(question)  # Retrieve relevant documents from the vectorstore\n",
"doc_txt = docs[1].page_content\n",
"doc_grader_prompt_formatted = doc_grader_prompt.format(document=doc_txt, question=question)\n",
"\n",
"result = llm_json_mode.invoke([\n",
"    SystemMessage(content=doc_grader_instructions),\n",
"    HumanMessage(content=doc_grader_prompt_formatted),\n",
"])\n",
"json.loads(result.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Chain of Thought (CoT) prompting is a technique used in natural language processing to generate human-like responses by iteratively asking questions and refining the search space through external search queries, such as Wikipedia APIs. CoT prompting involves decomposing problems into multiple thought steps, generating multiple thoughts per step, and evaluating each state using a classifier or majority vote. The goal is to find an optimal instruction that leads to the desired output, which can be achieved by optimizing prompt parameters directly on the embedding space via gradient descent or searching over a pool of model-generated instruction candidates.\n"
]
}
],
"source": [
"### Generate\n",
"\n",
"# Prompt\n",
"rag_prompt = \"\"\"You are an assistant for question-answering tasks. \n",
"\n",
"Here is the context to use to answer the question:\n",
"\n",
"{context} \n",
"\n",
"Think carefully about the above context. \n",
"\n",
"Now, review the user question:\n",
"\n",
"{question}\n",
"\n",
"Provide an answer to this question using only the above context. \n",
"\n",
"Use three sentences maximum and keep the answer concise.\n",
"\n",
"Answer:\"\"\"\n",
"\n",
"# Post-processing\n",
"def format_docs(docs):\n",
"    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"# Test\n",
"docs = retriever.invoke(question)\n",
"docs_txt = format_docs(docs)\n",
"rag_prompt_formatted = rag_prompt.format(context=docs_txt, question=question)\n",
"generation = llm.invoke([HumanMessage(content=rag_prompt_formatted)])\n",
"print(generation.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hallucination Grader "
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'binary_score': 'yes',\n",
" 'explanation': 'The student answer provides a clear and accurate description of Chain of Thought (CoT) prompting, its components, and its goals. It also mentions various techniques used in CoT prompting, such as external search queries, prompt tuning, and automatic prompt engineering. The answer demonstrates an understanding of the concept and its applications in natural language processing.'}"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### Hallucination Grader \n",
"\n",
"# Hallucination grader instructions \n",
"hallucination_grader_instructions = \"\"\"\n",
"\n",
"You are a teacher grading a quiz. \n",
"\n",
"You will be given FACTS and a STUDENT ANSWER. \n",
"\n",
"Here is the grade criteria to follow:\n",
"\n",
"(1) Ensure the STUDENT ANSWER is grounded in the FACTS. \n",
"\n",
"(2) Ensure the STUDENT ANSWER does not contain \"hallucinated\" information outside the scope of the FACTS.\n",
"\n",
"Score:\n",
"\n",
"A score of yes means that the student's answer meets all of the criteria. This is the highest (best) score. \n",
"\n",
"A score of no means that the student's answer does not meet all of the criteria. This is the lowest possible score you can give.\n",
"\n",
"Explain your reasoning in a step-by-step manner to ensure your reasoning and conclusion are correct. \n",
"\n",
"Avoid simply stating the correct answer at the outset.\"\"\"\n",
"\n",
"# Grader prompt\n",
"hallucination_grader_prompt = \"\"\"FACTS: \\n\\n {documents} \\n\\n STUDENT ANSWER: {generation}. \n",
"\n",
"Return JSON with two keys: binary_score, a 'yes' or 'no' score indicating whether the STUDENT ANSWER is grounded in the FACTS, and explanation, which contains an explanation of the score.\"\"\"\n",
"\n",
"# Test using documents and generation from above \n",
"hallucination_grader_prompt_formatted = hallucination_grader_prompt.format(documents=docs_txt, generation=generation.content)\n",
"result = llm_json_mode.invoke([SystemMessage(content=hallucination_grader_instructions)] + [HumanMessage(content=hallucination_grader_prompt_formatted)])\n",
"json.loads(result.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Answer Grader "
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'binary_score': 'yes',\n",
" 'explanation': \"The student's answer helps to answer the question by providing specific details about the vision models released as part of Llama 3.2. The answer mentions two vision models (Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct) and their availability on Azure AI Model Catalog via managed compute. Additionally, the student provides context about Meta's first foray into multimodal AI and compares these models to other visual reasoning models like Claude 3 Haiku and GPT-4o mini. This extra information is not explicitly asked for in the question, but it demonstrates a thorough understanding of the topic. The answer also correctly states that these models replace the older text-only Llama 3.1 models, which meets all the criteria specified in the question.\"}"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### Answer Grader \n",
"\n",
"# Answer grader instructions \n",
"answer_grader_instructions = \"\"\"You are a teacher grading a quiz. \n",
"\n",
"You will be given a QUESTION and a STUDENT ANSWER. \n",
"\n",
"Here is the grade criteria to follow:\n",
"\n",
"(1) The STUDENT ANSWER helps to answer the QUESTION\n",
"\n",
"Score:\n",
"\n",
"A score of yes means that the student's answer meets all of the criteria. This is the highest (best) score. \n",
"\n",
"The student can receive a score of yes if the answer contains extra information that is not explicitly asked for in the question.\n",
"\n",
"A score of no means that the student's answer does not meet all of the criteria. This is the lowest possible score you can give.\n",
"\n",
"Explain your reasoning in a step-by-step manner to ensure your reasoning and conclusion are correct. \n",
"\n",
"Avoid simply stating the correct answer at the outset.\"\"\"\n",
"\n",
"# Grader prompt\n",
"answer_grader_prompt = \"\"\"QUESTION: \\n\\n {question} \\n\\n STUDENT ANSWER: {generation}. \n",
"\n",
"Return JSON with two keys: binary_score, a 'yes' or 'no' score indicating whether the STUDENT ANSWER meets the criteria, and explanation, which contains an explanation of the score.\"\"\"\n",
"\n",
"# Test \n",
"question = \"What are the vision models released today as part of Llama 3.2?\"\n",
"answer = \"The Llama 3.2 models released today include two vision models: Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct, which are available on Azure AI Model Catalog via managed compute. These models are part of Meta's first foray into multimodal AI and rival closed models like Anthropic's Claude 3 Haiku and OpenAI's GPT-4o mini in visual reasoning. They replace the older text-only Llama 3.1 models.\"\n",
"\n",
"# Test using question and generation from above \n",
"answer_grader_prompt_formatted = answer_grader_prompt.format(question=question, generation=answer)\n",
"result = llm_json_mode.invoke([SystemMessage(content=answer_grader_instructions)] + [HumanMessage(content=answer_grader_prompt_formatted)])\n",
"json.loads(result.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Graph\n",
"\n",
"We build the above workflow as a graph using [LangGraph](https://langchain-ai.github.io/langgraph/).\n",
"\n",
"### Graph state\n",
"\n",
"The graph `state` schema contains keys that we want to:\n",
"\n",
"- Pass to each node in our graph\n",
"- Optionally, modify in each node of our graph\n",
"\n",
"See conceptual docs [here](https://langchain-ai.github.io/langgraph/concepts/low_level/#state)."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"import operator\n",
"from typing_extensions import TypedDict\n",
"from typing import List, Annotated\n",
"\n",
"class GraphState(TypedDict):\n",
"    \"\"\"\n",
"    Graph state is a dictionary that contains information we want to propagate to, and modify in, each graph node.\n",
"    \"\"\"\n",
"    question : str  # User question\n",
"    generation : str  # LLM generation\n",
"    web_search : str  # Binary decision (\"Yes\" or \"No\") to run web search\n",
"    max_retries : int  # Max number of retries for answer generation\n",
"    answers : int  # Number of answers generated\n",
"    loop_step: Annotated[int, operator.add]  # Generation attempts so far; node updates are summed by the reducer\n",
"    documents : List[str]  # List of retrieved documents"
]
},
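  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `loop_step` key above is annotated with a reducer (`operator.add`), which means a value returned by a node is combined with the current value rather than overwriting it. The sketch below imitates that merge rule in plain Python so the behaviour is easy to see. It is not LangGraph's implementation: `DemoState` and `apply_update` are hypothetical names made up for this illustration.\n",
    "\n",
    "```python\n",
    "import operator\n",
    "from typing import Annotated, TypedDict, get_type_hints\n",
    "\n",
    "\n",
    "class DemoState(TypedDict):\n",
    "    question: str                            # plain key: the latest write wins\n",
    "    loop_step: Annotated[int, operator.add]  # reducer key: writes are combined\n",
    "\n",
    "\n",
    "def apply_update(state: dict, update: dict, schema=DemoState) -> dict:\n",
    "    # Hypothetical helper: merge a node's partial update into the state,\n",
    "    # using the reducer from the Annotated metadata when one is declared.\n",
    "    hints = get_type_hints(schema, include_extras=True)\n",
    "    merged = dict(state)\n",
    "    for key, value in update.items():\n",
    "        reducers = getattr(hints[key], '__metadata__', ())\n",
    "        if reducers and key in merged:\n",
    "            merged[key] = reducers[0](merged[key], value)\n",
    "        else:\n",
    "            merged[key] = value\n",
    "    return merged\n",
    "\n",
    "\n",
    "state = {'question': 'What are the types of agent memory?', 'loop_step': 0}\n",
    "state = apply_update(state, {'loop_step': 1})  # first generation attempt\n",
    "state = apply_update(state, {'loop_step': 1})  # second generation attempt\n",
    "print(state['loop_step'])  # prints 2\n",
    "```\n",
    "\n",
    "Under this rule, a node that wants to count one more attempt returns the increment (`1`), not the new total."
   ]
  },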
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each node in our graph is simply a function that:\n",
"\n",
"(1) Takes `state` as an input\n",
"\n",
"(2) Modifies `state`\n",
"\n",
"(3) Writes its updates back to the graph state (a dict that follows the state schema)\n",
"\n",
"See conceptual docs [here](https://langchain-ai.github.io/langgraph/concepts/low_level/#nodes).\n",
"\n",
"Each edge routes between nodes in the graph.\n",
"\n",
"See conceptual docs [here](https://langchain-ai.github.io/langgraph/concepts/low_level/#edges)."
]
},
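  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the node and edge contract described above, here is a toy runner in plain Python. It does not use LangGraph, and it merges updates with simple last-write-wins semantics. `retrieve_stub`, `generate_stub`, `after_retrieve` and `run` are hypothetical names invented for this sketch.\n",
    "\n",
    "```python\n",
    "def retrieve_stub(state: dict) -> dict:\n",
    "    # Node: read what it needs from state and return only the keys that change.\n",
    "    return {'documents': ['doc about ' + state['question']]}\n",
    "\n",
    "\n",
    "def generate_stub(state: dict) -> dict:\n",
    "    # Node: build a placeholder answer from the documents already in state.\n",
    "    return {'generation': 'answer built from %d document(s)' % len(state['documents'])}\n",
    "\n",
    "\n",
    "def after_retrieve(state: dict) -> str:\n",
    "    # Edge: inspect state and return the name of the next node.\n",
    "    return 'generate' if state['documents'] else 'end'\n",
    "\n",
    "\n",
    "def run(state: dict, nodes: dict, edges: dict, start: str) -> dict:\n",
    "    # Call a node, merge its partial update, then ask the edge where to go next.\n",
    "    current = start\n",
    "    while current != 'end':\n",
    "        state = {**state, **nodes[current](state)}\n",
    "        current = edges[current](state)\n",
    "    return state\n",
    "\n",
    "\n",
    "nodes = {'retrieve': retrieve_stub, 'generate': generate_stub}\n",
    "edges = {'retrieve': after_retrieve, 'generate': lambda state: 'end'}\n",
    "final_state = run({'question': 'agent memory'}, nodes, edges, start='retrieve')\n",
    "print(final_state['generation'])\n",
    "```\n",
    "\n",
    "The point of the sketch is the division of labour: nodes return state updates, while edges return only the name of the next node."
   ]
  },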
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"from langgraph.graph import END\n",
"\n",
"### Nodes\n",
"def retrieve(state):\n",
" \"\"\"\n",
" Retrieve documents from vectorstore\n",
"\n",
" Args:\n",
" state (dict): The current graph state\n",
"\n",
" Returns:\n",
" state (dict): New key added to state, documents, that contains retrieved documents\n",
" \"\"\"\n",
" print(\"---RETRIEVE---\")\n",
" question = state[\"question\"]\n",
"\n",
" # Write retrieved documents to documents key in state\n",
" documents = retriever.invoke(question)\n",
" return {\"documents\": documents}\n",
"\n",
"def generate(state):\n",
"    \"\"\"\n",
"    Generate answer using RAG on retrieved documents\n",
"\n",
"    Args:\n",
"        state (dict): The current graph state\n",
"\n",
"    Returns:\n",
"        state (dict): New key added to state, generation, that contains LLM generation\n",
"    \"\"\"\n",
"    print(\"---GENERATE---\")\n",
"    question = state[\"question\"]\n",
"    documents = state[\"documents\"]\n",
"\n",
"    # RAG generation\n",
"    docs_txt = format_docs(documents)\n",
"    rag_prompt_formatted = rag_prompt.format(context=docs_txt, question=question)\n",
"    generation = llm.invoke([HumanMessage(content=rag_prompt_formatted)])\n",
"    # loop_step uses the operator.add reducer, so return the increment (1) rather than the new total\n",
"    return {\"generation\": generation, \"loop_step\": 1}\n",
"\n",
"def grade_documents(state):\n",
" \"\"\"\n",
" Determines whether the retrieved documents are relevant to the question\n",
" If any document is not relevant, we will set a flag to run web search\n",
"\n",
" Args:\n",
" state (dict): The current graph state\n",
"\n",
" Returns:\n",
" state (dict): Filtered out irrelevant documents and updated web_search state\n",
" \"\"\"\n",
"\n",
" print(\"---CHECK DOCUMENT RELEVANCE TO QUESTION---\")\n",
" question = state[\"question\"]\n",
" documents = state[\"documents\"]\n",
" \n",
" # Score each doc\n",
" filtered_docs = []\n",
" web_search = \"No\" \n",
" for d in documents:\n",
" doc_grader_prompt_formatted = doc_grader_prompt.format(document=d.page_content, question=question)\n",
" result = llm_json_mode.invoke([SystemMessage(content=doc_grader_instructions)] + [HumanMessage(content=doc_grader_prompt_formatted)])\n",
" grade = json.loads(result.content)['binary_score']\n",
" # Document relevant\n",
" if grade.lower() == \"yes\":\n",
" print(\"---GRADE: DOCUMENT RELEVANT---\")\n",
" filtered_docs.append(d)\n",
" # Document not relevant\n",
" else:\n",
" print(\"---GRADE: DOCUMENT NOT RELEVANT---\")\n",
" # We do not include the document in filtered_docs\n",
" # We set a flag to indicate that we want to run web search\n",
" web_search = \"Yes\"\n",
" continue\n",
" return {\"documents\": filtered_docs, \"web_search\": web_search}\n",
" \n",
"def web_search(state):\n",
" \"\"\"\n",
"    Web search based on the question\n",
"\n",
" Args:\n",
" state (dict): The current graph state\n",
"\n",
" Returns:\n",
" state (dict): Appended web results to documents\n",
" \"\"\"\n",
"\n",
" print(\"---WEB SEARCH---\")\n",
" question = state[\"question\"]\n",
" documents = state.get(\"documents\", [])\n",
"\n",
" # Web search\n",
" docs = web_search_tool.invoke({\"query\": question})\n",
" web_results = \"\\n\".join([d[\"content\"] for d in docs])\n",
" web_results = Document(page_content=web_results)\n",
" documents.append(web_results)\n",
" return {\"documents\": documents}\n",
"\n",
"### Edges\n",
"\n",
"def route_question(state):\n",
"    \"\"\"\n",
"    Route question to web search or RAG\n",
"\n",
"    Args:\n",
"        state (dict): The current graph state\n",
"\n",
"    Returns:\n",
"        str: Next node to call\n",
"    \"\"\"\n",
"\n",
"    print(\"---ROUTE QUESTION---\")\n",
"    # Use a distinct local name so the function name is not shadowed\n",
"    result = llm_json_mode.invoke([SystemMessage(content=router_instructions)] + [HumanMessage(content=state[\"question\"])])\n",
"    source = json.loads(result.content)['datasource']\n",
"    if source == 'websearch':\n",
"        print(\"---ROUTE QUESTION TO WEB SEARCH---\")\n",
"        return \"websearch\"\n",
"    elif source == 'vectorstore':\n",
"        print(\"---ROUTE QUESTION TO RAG---\")\n",
"        return \"vectorstore\"\n",
"    # Fail loudly instead of silently returning None for an unexpected datasource\n",
"    raise ValueError(f\"Unexpected datasource from router: {source!r}\")\n",
"\n",
"def decide_to_generate(state):\n",
"    \"\"\"\n",
"    Determines whether to generate an answer, or add web search\n",
"\n",
"    Args:\n",
"        state (dict): The current graph state\n",
"\n",
"    Returns:\n",
"        str: Binary decision for next node to call\n",
"    \"\"\"\n",
"\n",
"    print(\"---ASSESS GRADED DOCUMENTS---\")\n",
"    web_search = state[\"web_search\"]\n",
"\n",
"    if web_search == \"Yes\":\n",
"        # At least one retrieved document was graded as not relevant,\n",
"        # so supplement the remaining documents with web search results\n",
"        print(\"---DECISION: NOT ALL DOCUMENTS ARE RELEVANT TO QUESTION, INCLUDE WEB SEARCH---\")\n",
"        return \"websearch\"\n",
"    else:\n",
"        # Every retrieved document is relevant, so generate an answer\n",
"        print(\"---DECISION: GENERATE---\")\n",
"        return \"generate\"\n",
"\n",
"def grade_generation_v_documents_and_question(state):\n",
"    \"\"\"\n",
"    Determines whether the generation is grounded in the documents and answers the question\n",
"\n",
"    Args:\n",
"        state (dict): The current graph state\n",
"\n",
"    Returns:\n",
"        str: Decision for next node to call\n",
"    \"\"\"\n",
"\n",
"    print(\"---CHECK HALLUCINATIONS---\")\n",
"    question = state[\"question\"]\n",
"    documents = state[\"documents\"]\n",
"    generation = state[\"generation\"]\n",
"    max_retries = state.get(\"max_retries\", 3) # Default to 3 if not provided\n",
"\n",
"    hallucination_grader_prompt_formatted = hallucination_grader_prompt.format(documents=format_docs(documents), generation=generation.content)\n",
"    result = llm_json_mode.invoke([SystemMessage(content=hallucination_grader_instructions)] + [HumanMessage(content=hallucination_grader_prompt_formatted)])\n",
"    # Normalize case, as grade_documents does, since the model may return \"Yes\" or \"yes\"\n",
"    grade = json.loads(result.content)['binary_score'].lower()\n",
"\n",
"    # Check hallucination\n",
"    if grade == \"yes\":\n",
"        print(\"---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---\")\n",
"        # Check question-answering\n",
"        print(\"---GRADE GENERATION vs QUESTION---\")\n",
"        # Grade the generation against the question\n",
"        answer_grader_prompt_formatted = answer_grader_prompt.format(question=question, generation=generation.content)\n",
"        result = llm_json_mode.invoke([SystemMessage(content=answer_grader_instructions)] + [HumanMessage(content=answer_grader_prompt_formatted)])\n",
"        grade = json.loads(result.content)['binary_score'].lower()\n",
"        if grade == \"yes\":\n",
"            print(\"---DECISION: GENERATION ADDRESSES QUESTION---\")\n",
"            return \"useful\"\n",
"        elif state[\"loop_step\"] <= max_retries:\n",
"            print(\"---DECISION: GENERATION DOES NOT ADDRESS QUESTION---\")\n",
"            return \"not useful\"\n",
"        else:\n",
"            print(\"---DECISION: MAX RETRIES REACHED---\")\n",
"            return \"max retries\"\n",
"    elif state[\"loop_step\"] <= max_retries:\n",
"        print(\"---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---\")\n",
"        return \"not supported\"\n",
"    else:\n",
"        print(\"---DECISION: MAX RETRIES REACHED---\")\n",
"        return \"max retries\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Control Flow"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"image/jpeg": "",
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langgraph.graph import StateGraph\n",
"from IPython.display import Image, display\n",
"\n",
"workflow = StateGraph(GraphState)\n",
"\n",
"# Define the nodes\n",
"workflow.add_node(\"websearch\", web_search) # web search\n",
"workflow.add_node(\"retrieve\", retrieve) # retrieve\n",
"workflow.add_node(\"grade_documents\", grade_documents) # grade documents\n",
"workflow.add_node(\"generate\", generate) # generate\n",
"\n",
"# Build graph\n",
"workflow.set_conditional_entry_point(\n",
" route_question,\n",
" {\n",
" \"websearch\": \"websearch\",\n",
" \"vectorstore\": \"retrieve\",\n",
" },\n",
")\n",
"workflow.add_edge(\"websearch\", \"generate\")\n",
"workflow.add_edge(\"retrieve\", \"grade_documents\")\n",
"workflow.add_conditional_edges(\n",
" \"grade_documents\",\n",
" decide_to_generate,\n",
" {\n",
" \"websearch\": \"websearch\",\n",
" \"generate\": \"generate\",\n",
" },\n",
")\n",
"workflow.add_conditional_edges(\n",
" \"generate\",\n",
" grade_generation_v_documents_and_question,\n",
" {\n",
" \"not supported\": \"generate\",\n",
" \"useful\": END,\n",
" \"not useful\": \"websearch\",\n",
" \"max retries\": END,\n",
" },\n",
")\n",
"\n",
"# Compile\n",
"graph = workflow.compile()\n",
"display(Image(graph.get_graph().draw_mermaid_png()))"
]
},
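  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The branching after `generate` can be hard to read inside the edge function, so here is a small sketch that restates it as a pure function. It assumes the two grader results are already available as booleans. `next_step` and `ROUTES` are hypothetical names for this illustration; the labels and targets are copied from the conditional edge mapping above.\n",
    "\n",
    "```python\n",
    "# Label -> next node, mirroring the mapping passed to add_conditional_edges above.\n",
    "ROUTES = {\n",
    "    'not supported': 'generate',\n",
    "    'useful': 'END',\n",
    "    'not useful': 'websearch',\n",
    "    'max retries': 'END',\n",
    "}\n",
    "\n",
    "\n",
    "def next_step(grounded: bool, useful: bool, loop_step: int, max_retries: int = 3) -> str:\n",
    "    # A grounded answer that addresses the question ends the run.\n",
    "    if grounded and useful:\n",
    "        return 'useful'\n",
    "    # Otherwise, stop once the retry budget is exhausted.\n",
    "    if loop_step > max_retries:\n",
    "        return 'max retries'\n",
    "    # Grounded but off-target: fetch more context. Not grounded: regenerate.\n",
    "    return 'not useful' if grounded else 'not supported'\n",
    "\n",
    "\n",
    "for grounded, useful, loop_step in [(True, True, 1), (True, False, 2), (False, False, 2), (False, False, 4)]:\n",
    "    label = next_step(grounded, useful, loop_step)\n",
    "    print(grounded, useful, loop_step, '->', label, '->', ROUTES[label])\n",
    "```\n",
    "\n",
    "Writing the decision as a pure function like this also makes it straightforward to unit test without calling the LLM."
   ]
  },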
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"---ROUTE QUESTION---\n",
"---ROUTE QUESTION TO RAG---\n",
"{'question': 'What are the types of agent memory?', 'max_retries': 3, 'loop_step': 0}\n",
"---RETRIEVE---\n",
"{'question': 'What are the types of agent memory?', 'max_retries': 3, 'loop_step': 0, 'documents': [Document(metadata={'id': '7222226d-772f-4696-b694-25fed7f3df27', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content=\"LLM Powered Autonomous Agents | Lil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPosts\\n\\n\\n\\n\\nArchive\\n\\n\\n\\n\\nSearch\\n\\n\\n\\n\\nTags\\n\\n\\n\\n\\nFAQ\\n\\n\\n\\n\\nemojisearch.app\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n LLM Powered Autonomous Agents\\n \\nDate: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng\\n\\n\\n \\n\\n\\nTable of Contents\\n\\n\\n\\nAgent System Overview\\n\\nComponent One: Planning\\n\\nTask Decomposition\\n\\nSelf-Reflection\\n\\n\\nComponent Two: Memory\\n\\nTypes of Memory\\n\\nMaximum Inner Product Search (MIPS)\\n\\n\\nComponent Three: Tool Use\\n\\nCase Studies\\n\\nScientific Discovery Agent\\n\\nGenerative Agents Simulation\\n\\nProof-of-Concept Examples\\n\\n\\nChallenges\\n\\nCitation\\n\\nReferences\\n\\n\\n\\n\\n\\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview#\\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\\n\\nPlanning\\n\\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n\\n\\nMemory\\n\\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n\\n\\nTool use\\n\\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\"), Document(metadata={'id': '080b35eb-5a3c-40a0-bc5f-a444086474f5', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Long-Term Memory (LTM): Long-term memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. There are two subtypes of LTM:\\n\\nExplicit / declarative memory: This is memory of facts and events, and refers to those memories that can be consciously recalled, including episodic memory (events and experiences) and semantic memory (facts and concepts).\\nImplicit / procedural memory: This type of memory is unconscious and involves skills and routines that are performed automatically, like riding a bike or typing on a keyboard.\\n\\n\\n\\n\\nFig. 8. Categorization of human memory.\\nWe can roughly consider the following mappings:\\n\\nSensory memory as learning embedding representations for raw inputs, including text, image or other modalities;\\nShort-term memory as in-context learning. It is short and finite, as it is restricted by the finite context window length of Transformer.\\nLong-term memory as the external vector store that the agent can attend to at query time, accessible via fast retrieval.\\n\\nMaximum Inner Product Search (MIPS)#\\nThe external memory can alleviate the restriction of finite attention span. A standard practice is to save the embedding representation of information into a vector store database that can support fast maximum inner-product search (MIPS). 
To optimize the retrieval speed, the common choice is the approximate nearest neighbors (ANN)\\u200b algorithm to return approximately top k nearest neighbors to trade off a little accuracy lost for a huge speedup.\\nA couple common choices of ANN algorithms for fast MIPS:\\n\\nLSH (Locality-Sensitive Hashing): It introduces a hashing function such that similar input items are mapped to the same buckets with high probability, where the number of buckets is much smaller than the number of inputs.\\nANNOY (Approximate Nearest Neighbors Oh Yeah): The core data structure are random projection trees, a set of binary trees where each non-leaf node represents a hyperplane splitting the input space into half and each leaf stores one data point. Trees are built independently and at random, so to some extent, it mimics a hashing function. ANNOY search happens in all the trees to iteratively search through the half that is closest to the query and then aggregates the results. The idea is quite related to KD tree but a lot more scalable.\\nHNSW (Hierarchical Navigable Small World): It is inspired by the idea of small world networks where most nodes can be reached by any other nodes within a small number of steps; e.g. “six degrees of separation” feature of social networks. HNSW builds hierarchical layers of these small-world graphs, where the bottom layers contain the actual data points. The layers in the middle create shortcuts to speed up search. When performing a search, HNSW starts from a random node in the top layer and navigates towards the target. When it can’t get any closer, it moves down to the next layer, until it reaches the bottom layer. 
Each move in the upper layers can potentially cover a large distance in the data space, and each move in the lower layers refines the search quality.\\nFAISS (Facebook AI Similarity Search): It operates on the assumption that in high dimensional space, distances between nodes follow a Gaussian distribution and thus there should exist clustering of data points. FAISS applies vector quantization by partitioning the vector space into clusters and then refining the quantization within clusters. Search first looks for cluster candidates with coarse quantization and then further looks into each cluster with finer quantization.\\nScaNN (Scalable Nearest Neighbors): The main innovation in ScaNN is anisotropic vector quantization. It quantizes a data point $x_i$ to $\\\\tilde{x}_i$ such that the inner product $\\\\langle q, x_i \\\\rangle$ is as similar to the original distance of $\\\\angle q, \\\\tilde{x}_i$ as possible, instead of picking the closet quantization centroid points.\\n\\n\\nFig. 9. Comparison of MIPS algorithms, measured in recall@10. (Image source: Google Blog, 2020)\\nCheck more MIPS algorithms and performance comparison in ann-benchmarks.com.\\nComponent Three: Tool Use#\\nTool use is a remarkable and distinguishing characteristic of human beings. We create, modify and utilize external objects to do things that go beyond our physical and cognitive limits. Equipping LLMs with external tools can significantly extend the model capabilities.'), Document(metadata={'id': 'fa10df4b-5401-4ec0-a1c4-eb57454eeee0', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Reflection mechanism: synthesizes memories into higher level inferences over time and guides the agent’s future behavior. They are higher-level summaries of past events (<- note that this is a bit different from self-reflection above)\\n\\nPrompt LM with 100 most recent observations and to generate 3 most salient high-level questions given a set of observations/statements. Then ask LM to answer those questions.\\n\\n\\nPlanning & Reacting: translate the reflections and the environment information into actions\\n\\nPlanning is essentially in order to optimize believability at the moment vs in time.\\nPrompt template: {Intro of an agent X}. Here is X\\'s plan today in broad strokes: 1)\\nRelationships between agents and observations of one agent by another are all taken into consideration for planning and reacting.\\nEnvironment information is present in a tree structure.\\n\\n\\n\\n\\nFig. 13. The generative agent architecture. (Image source: Park et al. 2023)\\nThis fun simulation results in emergent social behavior, such as information diffusion, relationship memory (e.g. two agents continuing the conversation topic) and coordination of social events (e.g. host a party and invite many others).\\nProof-of-Concept Examples#\\nAutoGPT has drawn a lot of attention into the possibility of setting up autonomous agents with LLM as the main controller. It has quite a lot of reliability issues given the natural language interface, but nevertheless a cool proof-of-concept demo. 
A lot of code in AutoGPT is about format parsing.\\nHere is the system message used by AutoGPT, where {{...}} are user inputs:\\nYou are {{ai-name}}, {{user-provided AI bot description}}.\\nYour decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.\\n\\nGOALS:\\n\\n1. {{user-provided goal 1}}\\n2. {{user-provided goal 2}}\\n3. ...\\n4. ...\\n5. ...\\n\\nConstraints:\\n1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.\\n2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.\\n3. No user assistance\\n4. Exclusively use the commands listed in double quotes e.g. \"command name\"\\n5. Use subprocesses for commands that will not terminate within a few minutes'), Document(metadata={'id': '58a746d5-0cd7-4b13-8b6e-278534c50163', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Planning\\n\\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n\\n\\nMemory\\n\\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n\\n\\nTool use\\n\\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\\n\\n\\n\\n\\nFig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\\nTree of Thoughts (Yao et al. 
2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\\nSelf-Reflection#\\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.\\nReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. 
use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.\\nThe ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\\nThought: ...\\nAction: ...\\nObservation: ...\\n... (Repeated many times)')]}\n",
"---CHECK DOCUMENT RELEVANCE TO QUESTION---\n",
"---GRADE: DOCUMENT RELEVANT---\n",
"---GRADE: DOCUMENT NOT RELEVANT---\n",
"---GRADE: DOCUMENT RELEVANT---\n",
"---GRADE: DOCUMENT RELEVANT---\n",
"---ASSESS GRADED DOCUMENTS---\n",
"---DECISION: NOT ALL DOCUMENTS ARE RELEVANT TO QUESTION, INCLUDE WEB SEARCH---\n",
"{'question': 'What are the types of agent memory?', 'web_search': 'Yes', 'max_retries': 3, 'loop_step': 0, 'documents': [Document(metadata={'id': '7222226d-772f-4696-b694-25fed7f3df27', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content=\"LLM Powered Autonomous Agents | Lil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPosts\\n\\n\\n\\n\\nArchive\\n\\n\\n\\n\\nSearch\\n\\n\\n\\n\\nTags\\n\\n\\n\\n\\nFAQ\\n\\n\\n\\n\\nemojisearch.app\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n LLM Powered Autonomous Agents\\n \\nDate: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng\\n\\n\\n \\n\\n\\nTable of Contents\\n\\n\\n\\nAgent System Overview\\n\\nComponent One: Planning\\n\\nTask Decomposition\\n\\nSelf-Reflection\\n\\n\\nComponent Two: Memory\\n\\nTypes of Memory\\n\\nMaximum Inner Product Search (MIPS)\\n\\n\\nComponent Three: Tool Use\\n\\nCase Studies\\n\\nScientific Discovery Agent\\n\\nGenerative Agents Simulation\\n\\nProof-of-Concept Examples\\n\\n\\nChallenges\\n\\nCitation\\n\\nReferences\\n\\n\\n\\n\\n\\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview#\\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\\n\\nPlanning\\n\\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n\\n\\nMemory\\n\\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n\\n\\nTool use\\n\\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\"), Document(metadata={'id': 'fa10df4b-5401-4ec0-a1c4-eb57454eeee0', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Reflection mechanism: synthesizes memories into higher level inferences over time and guides the agent’s future behavior. They are higher-level summaries of past events (<- note that this is a bit different from self-reflection above)\\n\\nPrompt LM with 100 most recent observations and to generate 3 most salient high-level questions given a set of observations/statements. Then ask LM to answer those questions.\\n\\n\\nPlanning & Reacting: translate the reflections and the environment information into actions\\n\\nPlanning is essentially in order to optimize believability at the moment vs in time.\\nPrompt template: {Intro of an agent X}. Here is X\\'s plan today in broad strokes: 1)\\nRelationships between agents and observations of one agent by another are all taken into consideration for planning and reacting.\\nEnvironment information is present in a tree structure.\\n\\n\\n\\n\\nFig. 13. The generative agent architecture. (Image source: Park et al. 2023)\\nThis fun simulation results in emergent social behavior, such as information diffusion, relationship memory (e.g. two agents continuing the conversation topic) and coordination of social events (e.g. host a party and invite many others).\\nProof-of-Concept Examples#\\nAutoGPT has drawn a lot of attention into the possibility of setting up autonomous agents with LLM as the main controller. It has quite a lot of reliability issues given the natural language interface, but nevertheless a cool proof-of-concept demo. 
A lot of code in AutoGPT is about format parsing.\\nHere is the system message used by AutoGPT, where {{...}} are user inputs:\\nYou are {{ai-name}}, {{user-provided AI bot description}}.\\nYour decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.\\n\\nGOALS:\\n\\n1. {{user-provided goal 1}}\\n2. {{user-provided goal 2}}\\n3. ...\\n4. ...\\n5. ...\\n\\nConstraints:\\n1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.\\n2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.\\n3. No user assistance\\n4. Exclusively use the commands listed in double quotes e.g. \"command name\"\\n5. Use subprocesses for commands that will not terminate within a few minutes'), Document(metadata={'id': '58a746d5-0cd7-4b13-8b6e-278534c50163', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Planning\\n\\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n\\n\\nMemory\\n\\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n\\n\\nTool use\\n\\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\\n\\n\\n\\n\\nFig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\\nTree of Thoughts (Yao et al. 
2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\\nSelf-Reflection#\\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.\\nReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. 
use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.\\nThe ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\\nThought: ...\\nAction: ...\\nObservation: ...\\n... (Repeated many times)')]}\n",
"---WEB SEARCH---\n",
"{'question': 'What are the types of agent memory?', 'web_search': 'Yes', 'max_retries': 3, 'loop_step': 0, 'documents': [Document(metadata={'id': '7222226d-772f-4696-b694-25fed7f3df27', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content=\"LLM Powered Autonomous Agents | Lil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPosts\\n\\n\\n\\n\\nArchive\\n\\n\\n\\n\\nSearch\\n\\n\\n\\n\\nTags\\n\\n\\n\\n\\nFAQ\\n\\n\\n\\n\\nemojisearch.app\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n LLM Powered Autonomous Agents\\n \\nDate: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng\\n\\n\\n \\n\\n\\nTable of Contents\\n\\n\\n\\nAgent System Overview\\n\\nComponent One: Planning\\n\\nTask Decomposition\\n\\nSelf-Reflection\\n\\n\\nComponent Two: Memory\\n\\nTypes of Memory\\n\\nMaximum Inner Product Search (MIPS)\\n\\n\\nComponent Three: Tool Use\\n\\nCase Studies\\n\\nScientific Discovery Agent\\n\\nGenerative Agents Simulation\\n\\nProof-of-Concept Examples\\n\\n\\nChallenges\\n\\nCitation\\n\\nReferences\\n\\n\\n\\n\\n\\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview#\\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\\n\\nPlanning\\n\\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n\\n\\nMemory\\n\\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n\\n\\nTool use\\n\\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\"), Document(metadata={'id': 'fa10df4b-5401-4ec0-a1c4-eb57454eeee0', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Reflection mechanism: synthesizes memories into higher level inferences over time and guides the agent’s future behavior. They are higher-level summaries of past events (<- note that this is a bit different from self-reflection above)\\n\\nPrompt LM with 100 most recent observations and to generate 3 most salient high-level questions given a set of observations/statements. Then ask LM to answer those questions.\\n\\n\\nPlanning & Reacting: translate the reflections and the environment information into actions\\n\\nPlanning is essentially in order to optimize believability at the moment vs in time.\\nPrompt template: {Intro of an agent X}. Here is X\\'s plan today in broad strokes: 1)\\nRelationships between agents and observations of one agent by another are all taken into consideration for planning and reacting.\\nEnvironment information is present in a tree structure.\\n\\n\\n\\n\\nFig. 13. The generative agent architecture. (Image source: Park et al. 2023)\\nThis fun simulation results in emergent social behavior, such as information diffusion, relationship memory (e.g. two agents continuing the conversation topic) and coordination of social events (e.g. host a party and invite many others).\\nProof-of-Concept Examples#\\nAutoGPT has drawn a lot of attention into the possibility of setting up autonomous agents with LLM as the main controller. It has quite a lot of reliability issues given the natural language interface, but nevertheless a cool proof-of-concept demo. 
A lot of code in AutoGPT is about format parsing.\\nHere is the system message used by AutoGPT, where {{...}} are user inputs:\\nYou are {{ai-name}}, {{user-provided AI bot description}}.\\nYour decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.\\n\\nGOALS:\\n\\n1. {{user-provided goal 1}}\\n2. {{user-provided goal 2}}\\n3. ...\\n4. ...\\n5. ...\\n\\nConstraints:\\n1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.\\n2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.\\n3. No user assistance\\n4. Exclusively use the commands listed in double quotes e.g. \"command name\"\\n5. Use subprocesses for commands that will not terminate within a few minutes'), Document(metadata={'id': '58a746d5-0cd7-4b13-8b6e-278534c50163', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Planning\\n\\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n\\n\\nMemory\\n\\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n\\n\\nTool use\\n\\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\\n\\n\\n\\n\\nFig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\\nTree of Thoughts (Yao et al. 
2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\\nSelf-Reflection#\\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.\\nReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. 
use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.\\nThe ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\\nThought: ...\\nAction: ...\\nObservation: ...\\n... (Repeated many times)'), Document(metadata={}, page_content='Sign up for Latest SuperAGI Updates\\n\"*\" indicates required fields\\nSuperAGI builds infrastructure components, tools, frameworks and models to enable opensource AGI\\ncommunity@superagi.com\\nFor Developers\\nDocs\\nGitHub\\nReleases\\nRoadmap\\nAPIs\\nCommunity\\nSupport Forum\\nMarketplace\\nSocial Mentions\\nReddit\\nCollectibles\\nResources\\nBlog\\nUse Cases\\nAGI Research Lab\\nTutorials\\nImportant Links\\nSuperAGI Cloud\\nApp Spotlight\\nSuperCoder\\nArchitecture\\n Check it out✨\\nFeatures\\nAction Console\\nResource Manager\\nTrajectory Fine-Tuning\\n\\u200c\\nMultiple Vector DBs\\nMulti-LLM Support\\nAgent Workflows\\nMarketplace\\nAgent Templates\\nDiscord\\nGitHub\\nTwitter\\nReddit\\nYoutube\\nTowards AGI (part 1): Agents with Memory\\nFebruary 6, 2024\\n7 mins read\\nAgents are an emerging class of artificial intelligence (AI) systems that use large language models (LLMs) to interact with the world. In a professional setup like this, the agent is responsible for extracting tasks from conversations and passing them to an employee and once the employee completes the task the agent will convey the output to the user. Then use the solutions of those basic tasks as in-context examples to solve the current task.\\nConclusion & Next Steps\\nIn this blog, we saw that design choices for Memory depend on the end use case. 
This analogy is better captured in the following table:\\nDeep dive into various types of Agent Memory\\nChoosing the right Memory design in Production\\nSince agents are powered by LLMs, they are inherently probabilistic.\\nA Survey on the Memory Mechanism of Large Language Model based Agents Zeyu Zhang1, Xiaohe Bo1, Chen Ma1, Rui Li1, Xu Chen1, Quanyu Dai2, Jieming Zhu2, Zhenhua Dong2, Ji-Rong Wen1 1Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China 2Huawei Noah’s Ark Lab, China zeyuzhang@ruc.edu.cn, xu.chen@ruc.edu.cn Abstract Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. Some previous works also apply LLM-based agents in the finance domain, whose memory can store financial knowledge [113], market information [154, 156], and successful experi-ences [157, 155]. Ghost in the minecraft: Generally capable agents for open-world enviroments via large language models with text-based knowledge and memory.\\nThe key component to support agent-environment interactions is the memory of the agents. While previous studies have proposed many promising memory mechanisms, they are scattered in different papers, and there lacks a systematical review to summarize and compare these works from a holistic perspective, failing to abstract common and effective ...\\nThe memory module can help the agent to gather experiences, self-learn, and act in a more reasonable and effective manner. Short-term memory keeps and preserves relevant information in the symbolic forms, guaranteeing its accessibility in the decision process.\\nMoreover we highlight open problems, such as the separation of different types of memories and the management of memory over the agent\\'s lifetime. Lastly, we propose several topics for future research to address these challenges and further enhance the capabilities of LLM agents, including the use of metadata in procedural and semantic memory ...')]}\n",
"---GENERATE---\n",
"---CHECK HALLUCINATIONS---\n",
"---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---\n",
"---GRADE GENERATION vs QUESTION---\n",
"---DECISION: GENERATION ADDRESSES QUESTION---\n",
"{'question': 'What are the types of agent memory?', 'generation': AIMessage(content='There are two main types of agent memory mentioned in the context: short-term memory, which keeps and preserves relevant information in symbolic forms for accessibility in decision-making, and long-term memory, which is not explicitly defined but implied to be a more general form of memory that can store experiences, knowledge, and successful examples. Additionally, procedural memory is also hinted at as a potential type of agent memory. These types of memories are essential for agents to gather experiences, self-learn, and act in a more reasonable and effective manner.', additional_kwargs={}, response_metadata={'model': 'llama3.2:3b-instruct-fp16', 'created_at': '2024-09-29T03:38:16.742724Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 5998996834, 'load_duration': 30563000, 'prompt_eval_count': 1026, 'prompt_eval_duration': 1370970000, 'eval_count': 106, 'eval_duration': 4595573000}, id='run-2472e908-197f-4ff3-bfc8-4d495d6e5c4e-0', usage_metadata={'input_tokens': 1026, 'output_tokens': 106, 'total_tokens': 1132}), 'web_search': 'Yes', 'max_retries': 3, 'loop_step': 1, 'documents': [Document(metadata={'id': '7222226d-772f-4696-b694-25fed7f3df27', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content=\"LLM Powered Autonomous Agents | Lil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPosts\\n\\n\\n\\n\\nArchive\\n\\n\\n\\n\\nSearch\\n\\n\\n\\n\\nTags\\n\\n\\n\\n\\nFAQ\\n\\n\\n\\n\\nemojisearch.app\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n LLM Powered Autonomous Agents\\n \\nDate: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng\\n\\n\\n \\n\\n\\nTable of Contents\\n\\n\\n\\nAgent System Overview\\n\\nComponent One: Planning\\n\\nTask Decomposition\\n\\nSelf-Reflection\\n\\n\\nComponent Two: Memory\\n\\nTypes of Memory\\n\\nMaximum Inner Product Search (MIPS)\\n\\n\\nComponent Three: Tool Use\\n\\nCase Studies\\n\\nScientific Discovery Agent\\n\\nGenerative Agents Simulation\\n\\nProof-of-Concept Examples\\n\\n\\nChallenges\\n\\nCitation\\n\\nReferences\\n\\n\\n\\n\\n\\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview#\\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\\n\\nPlanning\\n\\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n\\n\\nMemory\\n\\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n\\n\\nTool use\\n\\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\"), Document(metadata={'id': 'fa10df4b-5401-4ec0-a1c4-eb57454eeee0', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Reflection mechanism: synthesizes memories into higher level inferences over time and guides the agent’s future behavior. They are higher-level summaries of past events (<- note that this is a bit different from self-reflection above)\\n\\nPrompt LM with 100 most recent observations and to generate 3 most salient high-level questions given a set of observations/statements. Then ask LM to answer those questions.\\n\\n\\nPlanning & Reacting: translate the reflections and the environment information into actions\\n\\nPlanning is essentially in order to optimize believability at the moment vs in time.\\nPrompt template: {Intro of an agent X}. Here is X\\'s plan today in broad strokes: 1)\\nRelationships between agents and observations of one agent by another are all taken into consideration for planning and reacting.\\nEnvironment information is present in a tree structure.\\n\\n\\n\\n\\nFig. 13. The generative agent architecture. (Image source: Park et al. 2023)\\nThis fun simulation results in emergent social behavior, such as information diffusion, relationship memory (e.g. two agents continuing the conversation topic) and coordination of social events (e.g. host a party and invite many others).\\nProof-of-Concept Examples#\\nAutoGPT has drawn a lot of attention into the possibility of setting up autonomous agents with LLM as the main controller. It has quite a lot of reliability issues given the natural language interface, but nevertheless a cool proof-of-concept demo. 
A lot of code in AutoGPT is about format parsing.\\nHere is the system message used by AutoGPT, where {{...}} are user inputs:\\nYou are {{ai-name}}, {{user-provided AI bot description}}.\\nYour decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.\\n\\nGOALS:\\n\\n1. {{user-provided goal 1}}\\n2. {{user-provided goal 2}}\\n3. ...\\n4. ...\\n5. ...\\n\\nConstraints:\\n1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.\\n2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.\\n3. No user assistance\\n4. Exclusively use the commands listed in double quotes e.g. \"command name\"\\n5. Use subprocesses for commands that will not terminate within a few minutes'), Document(metadata={'id': '58a746d5-0cd7-4b13-8b6e-278534c50163', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Planning\\n\\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n\\n\\nMemory\\n\\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n\\n\\nTool use\\n\\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\\n\\n\\n\\n\\nFig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\\nTree of Thoughts (Yao et al. 
2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\\nSelf-Reflection#\\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.\\nReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. 
use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.\\nThe ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\\nThought: ...\\nAction: ...\\nObservation: ...\\n... (Repeated many times)'), Document(metadata={}, page_content='Sign up for Latest SuperAGI Updates\\n\"*\" indicates required fields\\nSuperAGI builds infrastructure components, tools, frameworks and models to enable opensource AGI\\ncommunity@superagi.com\\nFor Developers\\nDocs\\nGitHub\\nReleases\\nRoadmap\\nAPIs\\nCommunity\\nSupport Forum\\nMarketplace\\nSocial Mentions\\nReddit\\nCollectibles\\nResources\\nBlog\\nUse Cases\\nAGI Research Lab\\nTutorials\\nImportant Links\\nSuperAGI Cloud\\nApp Spotlight\\nSuperCoder\\nArchitecture\\n Check it out✨\\nFeatures\\nAction Console\\nResource Manager\\nTrajectory Fine-Tuning\\n\\u200c\\nMultiple Vector DBs\\nMulti-LLM Support\\nAgent Workflows\\nMarketplace\\nAgent Templates\\nDiscord\\nGitHub\\nTwitter\\nReddit\\nYoutube\\nTowards AGI (part 1): Agents with Memory\\nFebruary 6, 2024\\n7 mins read\\nAgents are an emerging class of artificial intelligence (AI) systems that use large language models (LLMs) to interact with the world. In a professional setup like this, the agent is responsible for extracting tasks from conversations and passing them to an employee and once the employee completes the task the agent will convey the output to the user. Then use the solutions of those basic tasks as in-context examples to solve the current task.\\nConclusion & Next Steps\\nIn this blog, we saw that design choices for Memory depend on the end use case. 
This analogy is better captured in the following table:\\nDeep dive into various types of Agent Memory\\nChoosing the right Memory design in Production\\nSince agents are powered by LLMs, they are inherently probabilistic.\\nA Survey on the Memory Mechanism of Large Language Model based Agents Zeyu Zhang1, Xiaohe Bo1, Chen Ma1, Rui Li1, Xu Chen1, Quanyu Dai2, Jieming Zhu2, Zhenhua Dong2, Ji-Rong Wen1 1Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China 2Huawei Noah’s Ark Lab, China zeyuzhang@ruc.edu.cn, xu.chen@ruc.edu.cn Abstract Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. Some previous works also apply LLM-based agents in the finance domain, whose memory can store financial knowledge [113], market information [154, 156], and successful experi-ences [157, 155]. Ghost in the minecraft: Generally capable agents for open-world enviroments via large language models with text-based knowledge and memory.\\nThe key component to support agent-environment interactions is the memory of the agents. While previous studies have proposed many promising memory mechanisms, they are scattered in different papers, and there lacks a systematical review to summarize and compare these works from a holistic perspective, failing to abstract common and effective ...\\nThe memory module can help the agent to gather experiences, self-learn, and act in a more reasonable and effective manner. Short-term memory keeps and preserves relevant information in the symbolic forms, guaranteeing its accessibility in the decision process.\\nMoreover we highlight open problems, such as the separation of different types of memories and the management of memory over the agent\\'s lifetime. Lastly, we propose several topics for future research to address these challenges and further enhance the capabilities of LLM agents, including the use of metadata in procedural and semantic memory ...')]}\n"
]
}
],
"source": [
"# Test on a question about agent memory\n",
"inputs = {\"question\": \"What are the types of agent memory?\", \"max_retries\": 3}\n",
"for event in graph.stream(inputs, stream_mode=\"values\"):\n",
"    print(event)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"---ROUTE QUESTION---\n",
"---ROUTE QUESTION TO WEB SEARCH---\n",
"{'question': 'What are the models released today for llama3.2?', 'max_retries': 3, 'loop_step': 0}\n",
"---WEB SEARCH---\n",
"{'question': 'What are the models released today for llama3.2?', 'max_retries': 3, 'loop_step': 0, 'documents': [Document(metadata={}, page_content='Meta’s Llama 3.2 models now available on watsonx, including multimodal 11B and 90B models | IBM Meta Llama 3.2 models now available on watsonx, including multimodal 11B and 90B models IBM is announcing the availability of multiple Llama 3.2 models on watsonx.ai, IBM’s enterprise studio for AI developers, following the launch of the Llama 3.2 collection of pretrained and instruction tuned multilingual large language models (LLMs) at MetaConnect earlier today. Most notably, Llama 3.2 marks Meta’s first foray into multimodal AI: the release includes two models, in sizes of 11B and 90B, that can take images as input. The instruction-tuned Llama 3.2 90B Vision and 11B Vision models are immediately available in watsonx.ai through SaaS.\\nMeta Releases Llama 3.2—and Gives Its AI a Voice | WIRED Meta Releases Llama 3.2—and Gives Its AI a Voice Meta Releases Llama 3.2—and Gives Its AI a Voice Meta today also announced Llama 3.2, the first version of its free AI models to have visual abilities, broadening their usefulness and relevance for robotics, virtual reality, and so-called AI agents. Powering Meta AI’s new capabilities is an upgraded version of Llama, Meta’s premier large language model. “With Llama 3.1, Meta showed that open models could finally close the gap with their proprietary counterparts,” says Nathan Benaich, founder and general partner of Air Street Capital, and the author of an influential yearly report on AI.\\nLlama 3.2 Meta\\'s New generation Models Vertex AI | Google Cloud Blog Today, we’re announcing that Llama 3.2, Meta’s new generation of multimodal models, is available on Vertex AI Model Garden. All four Llama 3.2 models are ready for self-service deployment through Vertex AI Model Garden, starting today. 
\"Using Llama 3.1 on Google Cloud Vertex AI has made high-quality data generation easier and more efficient for Shopify. \"We\\'re thrilled to partner with Google Cloud, bringing the power of Vertex AI and Llama 3.1 to our BMC Helix platform,\" said Margaret Lee, GM and SVP of the Digital Service and Operations Management Business Unit at BMC. Operate within your enterprise guardrails: Deploy with confidence with not only support for Meta’s Llama Guard for the models, but also Google Cloud\\'s built-in security, privacy, and compliance measures.\\nMeta Releases Llama 3.2 Models with Vision Capability For the First Time | Beebom Home > News > Meta Releases Llama 3.2 Models with Vision Capability For the First Time Meta Releases Llama 3.2 Models with Vision Capability For the First Time You can start using Llama 3.2 11B and 90B vision models through the Meta AI chatbot on the web, WhatsApp, Facebook, Instagram, and Messenger. Llama 3.2 Models Give Vision to Meta AI These new Llama 3.2 11B and 90B vision models will be available through the Meta AI chatbot on the web, WhatsApp, Instagram, Facebook, and Messenger. Meta Releases Llama 3.2 Models with Vision Capability For the First Time\\nMeta’s new Llama 3.2 models available on Azure AI Meta’s new Llama 3.2 SLMs and image reasoning models now available on Azure AI Model Catalog Meta’s new Llama 3.2 SLMs and image reasoning models now available on Azure AI Model Catalog Meta’s new Llama 3.2 SLMs and image reasoning models now available on Azure AI Model Catalog In collaboration with Meta, Microsoft is excited to announce that Meta’s new Llama 3.2 models are now available on the Azure AI Model Catalog. Developers using Meta Llama 3 models can work seamlessly with tools in Azure AI Studio, such as Azure AI Content Safety, Azure AI Search, and prompt flow to enhance ethical and effective AI practices.')]}\n",
"---GENERATE---\n",
"---CHECK HALLUCINATIONS---\n",
"---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---\n",
"---GRADE GENERATION vs QUESTION---\n",
"---DECISION: GENERATION ADDRESSES QUESTION---\n",
"{'question': 'What are the models released today for llama3.2?', 'generation': AIMessage(content='The Llama 3.2 models released today include multimodal models in sizes of 11B and 90B, which can take images as input. These instruction-tuned models are available on watsonx.ai through SaaS, including the Llama 3.2 90B Vision and 11B Vision models. Additionally, these models are also available on Vertex AI Model Garden and Azure AI Model Catalog for self-service deployment.', additional_kwargs={}, response_metadata={'model': 'llama3.2:3b-instruct-fp16', 'created_at': '2024-09-29T03:38:35.771051Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 5167497834, 'load_duration': 26749084, 'prompt_eval_count': 955, 'prompt_eval_duration': 1294945000, 'eval_count': 90, 'eval_duration': 3842945000}, id='run-5fb2be9c-8050-4f08-8510-182abb9feb3f-0', usage_metadata={'input_tokens': 955, 'output_tokens': 90, 'total_tokens': 1045}), 'max_retries': 3, 'loop_step': 1, 'documents': [Document(metadata={}, page_content='Meta’s Llama 3.2 models now available on watsonx, including multimodal 11B and 90B models | IBM Meta Llama 3.2 models now available on watsonx, including multimodal 11B and 90B models IBM is announcing the availability of multiple Llama 3.2 models on watsonx.ai, IBM’s enterprise studio for AI developers, following the launch of the Llama 3.2 collection of pretrained and instruction tuned multilingual large language models (LLMs) at MetaConnect earlier today. Most notably, Llama 3.2 marks Meta’s first foray into multimodal AI: the release includes two models, in sizes of 11B and 90B, that can take images as input. 
The instruction-tuned Llama 3.2 90B Vision and 11B Vision models are immediately available in watsonx.ai through SaaS.\\nMeta Releases Llama 3.2—and Gives Its AI a Voice | WIRED Meta Releases Llama 3.2—and Gives Its AI a Voice Meta Releases Llama 3.2—and Gives Its AI a Voice Meta today also announced Llama 3.2, the first version of its free AI models to have visual abilities, broadening their usefulness and relevance for robotics, virtual reality, and so-called AI agents. Powering Meta AI’s new capabilities is an upgraded version of Llama, Meta’s premier large language model. “With Llama 3.1, Meta showed that open models could finally close the gap with their proprietary counterparts,” says Nathan Benaich, founder and general partner of Air Street Capital, and the author of an influential yearly report on AI.\\nLlama 3.2 Meta\\'s New generation Models Vertex AI | Google Cloud Blog Today, we’re announcing that Llama 3.2, Meta’s new generation of multimodal models, is available on Vertex AI Model Garden. All four Llama 3.2 models are ready for self-service deployment through Vertex AI Model Garden, starting today. \"Using Llama 3.1 on Google Cloud Vertex AI has made high-quality data generation easier and more efficient for Shopify. \"We\\'re thrilled to partner with Google Cloud, bringing the power of Vertex AI and Llama 3.1 to our BMC Helix platform,\" said Margaret Lee, GM and SVP of the Digital Service and Operations Management Business Unit at BMC. 
Operate within your enterprise guardrails: Deploy with confidence with not only support for Meta’s Llama Guard for the models, but also Google Cloud\\'s built-in security, privacy, and compliance measures.\\nMeta Releases Llama 3.2 Models with Vision Capability For the First Time | Beebom Home > News > Meta Releases Llama 3.2 Models with Vision Capability For the First Time Meta Releases Llama 3.2 Models with Vision Capability For the First Time You can start using Llama 3.2 11B and 90B vision models through the Meta AI chatbot on the web, WhatsApp, Facebook, Instagram, and Messenger. Llama 3.2 Models Give Vision to Meta AI These new Llama 3.2 11B and 90B vision models will be available through the Meta AI chatbot on the web, WhatsApp, Instagram, Facebook, and Messenger. Meta Releases Llama 3.2 Models with Vision Capability For the First Time\\nMeta’s new Llama 3.2 models available on Azure AI Meta’s new Llama 3.2 SLMs and image reasoning models now available on Azure AI Model Catalog Meta’s new Llama 3.2 SLMs and image reasoning models now available on Azure AI Model Catalog Meta’s new Llama 3.2 SLMs and image reasoning models now available on Azure AI Model Catalog In collaboration with Meta, Microsoft is excited to announce that Meta’s new Llama 3.2 models are now available on the Azure AI Model Catalog. Developers using Meta Llama 3 models can work seamlessly with tools in Azure AI Studio, such as Azure AI Content Safety, Azure AI Search, and prompt flow to enhance ethical and effective AI practices.')]}\n"
]
}
],
"source": [
"# Test on current events\n",
"inputs = {\"question\": \"What are the models released today for llama3.2?\", \"max_retries\": 3}\n",
"for event in graph.stream(inputs, stream_mode=\"values\"):\n",
"    print(event)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}