Context Engineering: The Architecture of AI Agents

"Context engineering is the delicate art and science of filling the context window with just the right information for the next step." — Andrej Karpathy

Introduction: Beyond Prompt Engineering

--{{0}}--

Welcome to the chapter on Context Engineering. As software engineers, we often start working with LLMs by tweaking prompts—changing a word here, adding "please" there. This is known as Prompt Engineering. However, for industrial-strength applications, this is not enough.

--{{1}}--

Think of the Large Language Model (LLM) as a CPU. It has processing power and logic, but it needs data to work on. The Context Window is the RAM.

Context Engineering is the process of programming this RAM. It is about the systematic orchestration of information to ensure the model has exactly what it needs to solve a task.

--{{2}}--

If your RAG (Retrieval Augmented Generation) system is failing, it is rarely because the model is "dumb." It is likely a context retrieval issue. A good retrieval system can make a weak model perform well, but bad retrieval will fail even with state-of-the-art models.


The Evolution of Context

| Feature | Prompt Engineering | Context Engineering |
| ------- | ------------------ | ------------------- |
| Focus   | "Magic words" and phrasing | Architecture of information flow |
| Scope   | Single interaction | System-wide state, memory, and tools |
| Analogy | Writing a better search query | Managing the Heap and Stack in RAM |
| Goal    | Better text generation | Reliable, structured task completion |


The 6 Types of Context

--{{0}}--

To build robust agents, we need to treat context as a multi-dimensional design layer. There are six distinct types of context that an agent requires to function effectively.

--{{1}}--

Let's visualize this architecture.

```mermaid
mindmap
  root((6 Types of Context))
    Instructions
      Role (Who)
      Objective (Why)
      Requirements (How - JSON/XML)
    Examples
      Behavior (Chain of Thought)
      Responses (Format)
    Knowledge
      External (Domain Specifics)
      Task (API Specs, Docs)
    Memory
      Short-term (Session history)
      Long-term (User prefs, Facts)
    Tools
      Description
      Parameters
      Results (Feedback Loop)
    Guardrails
      Inputs (Cleaning)
      Actions (Allowed list)
      Outputs (Schema validation)
```

@mermaid_eval

Deep Dive

  1. Instructions: The "System Prompt." Defines the Persona (e.g., Coding Agent), the Objective, and strict constraints (e.g., "Always output JSON").

  2. Examples: Few-shot prompting. Models learn patterns better than rules. Show them how to think and what a good response looks like.

  3. Knowledge: The RAG layer. Bridging the gap between the model's training data and your proprietary data (Docs, API specs).

  4. Memory:

  • Short-term: The chat history of the current session.

  • Long-term: Stored facts about the user or project that persist across sessions.

  5. Tools: The API layer. The agent needs to know what tools are available, how to call them (parameters), and crucially, it needs to see the Results of those calls.

  6. Guardrails: Safety and schema enforcement. Ensuring inputs are clean and outputs match the expected format before they reach the user or downstream systems.
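The six types above can be sketched as slots of a single context payload that gets serialized into the prompt each turn. This is a minimal illustration; the class and field names are our own, not from any specific framework:

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    """One slot per context type; flattened into the prompt each turn."""
    instructions: str                                    # 1. role, objective, constraints
    examples: list = field(default_factory=list)         # 2. few-shot demos
    knowledge: list = field(default_factory=list)        # 3. RAG chunks
    memory: dict = field(default_factory=dict)           # 4. short/long-term facts
    tools: list = field(default_factory=list)            # 5. tool schemas
    guardrails: dict = field(default_factory=dict)       # 6. allowed actions, schemas

    def render(self) -> str:
        """Flatten all six slots into one system-prompt string."""
        parts = [self.instructions]
        if self.examples:
            parts.append("Examples:\n" + "\n".join(self.examples))
        if self.knowledge:
            parts.append("Knowledge:\n" + "\n".join(self.knowledge))
        if self.memory:
            parts.append(f"Memory: {self.memory}")
        if self.tools:
            parts.append(f"Tools: {self.tools}")
        if self.guardrails:
            parts.append(f"Guardrails: {self.guardrails}")
        return "\n\n".join(parts)

ctx = AgentContext(
    instructions="You are a coding agent. Always output JSON.",
    guardrails={"allowed_actions": ["comment"]},
)
prompt = ctx.render()
```

Treating the context as a typed object (rather than string concatenation scattered through the codebase) is what makes the later lifecycle stages — write, read, compress, isolate — possible to implement per slot.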

The Context Lifecycle

--{{0}}--

Context Engineering is not static; it is a pipeline. We can break down the lifecycle of context into four stages: Write, Read, Compress, and Isolate.

--{{1}}--

1. Write Context: Saving information outside the context window.

  • Long-term: Vector databases, Knowledge Graphs.
  • Short-term: Session arrays, fast-access caches.
  • State: Runtime objects tracking the current task progress.

--{{2}}--

2. Read Context: Pulling relevant info into the "RAM" (Context Window) at runtime.

  • Retrieved from Vector DBs.
  • Fetched via Tool execution (e.g., SQL query results).
  • Loaded from conversation history.

--{{3}}--

3. Compress Context: Optimization. The retrieved context might be verbose. We use summarization steps or smaller LLMs to distill the information into token-efficient formats before passing it to the main reasoning model.

--{{4}}--

4. Isolate Context: Security and Focus. Splitting context so specific agents only see what they need.

  • Sandboxing: Code execution environments.
  • Sub-agents: A "Writer" agent doesn't need the raw SQL logs that the "Analyst" agent used.

```mermaid
graph LR
    A[Raw Data] --> B(Write: Store in DB/State)
    B --> C{Trigger}
    C --> D(Read: Retrieve & Fetch)
    D --> E(Compress: Summarize)
    E --> F(Isolate: Route to Agent)
    F --> G[LLM Processing]
```
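The four stages map naturally onto a small pipeline. Here is a toy sketch in plain Python; the in-memory `store` and the truncating "summarizer" are stand-ins for a real vector database and a compressor LLM:

```python
# Toy context lifecycle: Write -> Read -> Compress -> Isolate.
store = {}  # stand-in for a vector DB / state store

def write_context(topic, facts):
    """Write: persist information outside the context window."""
    store.setdefault(topic, []).extend(facts)

def read_context(topic):
    """Read: pull relevant info back into 'RAM' at runtime."""
    return store.get(topic, [])

def compress_context(chunks, max_chars=80):
    """Compress: distill verbose chunks into a token-efficient digest
    (a real system would call a summarizer LLM here)."""
    return " | ".join(chunks)[:max_chars]

def isolate_context(digest, agent_role):
    """Isolate: each agent receives only the slice it needs."""
    return {"role": agent_role, "context": digest}

write_context("q3_report", ["Revenue up 12%", "Churn flat", "Raw SQL logs ..."])
digest = compress_context(read_context("q3_report"))
packet = isolate_context(digest, agent_role="Writer")
```

In a production system each function would be backed by real infrastructure (Milvus for write/read, a small LLM for compress, per-agent routing for isolate), but the shape of the pipeline stays the same.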

Architecture: Building a Research Agent

--{{0}}--

Let's implement a "Context Engineering Workflow" for a Research Assistant.

Tech Stack:

  • Orchestration: CrewAI
  • Ingestion: Tensorlake (for parsing complex docs)
  • Memory: Zep (Knowledge Graph memory)
  • Search: Firecrawl (Semantic web scraping)
  • Vector DB: Milvus

--{{1}}--

The Workflow:

  1. Ingest: Parse docs into RAG-ready chunks.
  2. Retrieve: Hybrid search (Web + Docs + ArXiv).
  3. Filter: An evaluator agent removes irrelevant context.
  4. Synthesize: A final agent generates the answer.

Python Implementation

Below is a conceptual implementation of the Retrieval & Filter logic.

```python
# Pseudo-code based on the architecture
from crewai import Agent, Task, Crew
from firecrawl import FirecrawlApp
from pymilvus import Collection


def encode(text):
    """Placeholder: embed the text with your embedding model of choice."""
    raise NotImplementedError


class ContextManager:
    def __init__(self):
        self.milvus = Collection("research_docs")
        self.firecrawl = FirecrawlApp(api_key="...")  # requires a Firecrawl API key

    def gather_context(self, query):
        # 1. Internal Knowledge (Vector Search)
        # Retrieve the top-k chunks most similar to the query
        internal_results = self.milvus.search(
            data=[encode(query)],
            anns_field="embedding",
            param={"metric_type": "L2", "params": {"nprobe": 10}},
            limit=5,
            output_fields=["text"],
        )

        # 2. External Knowledge (Web Search)
        # Using Firecrawl for LLM-ready markdown
        web_results = self.firecrawl.search(query, params={"formats": ["markdown"]})

        # 3. Aggregate both sources into one context payload
        return {
            "internal": [hit.entity.get("text") for hit in internal_results[0]],
            "external": web_results["data"],
        }


# Agent Definitions
researcher = Agent(
    role="Context Evaluator",
    goal="Filter irrelevant information from gathered context",
    backstory="You are a strict editor. You only allow facts relevant to the query.",
    verbose=True,
)

synthesizer = Agent(
    role="Research Writer",
    goal="Synthesize a final answer from filtered context",
    backstory="You write clear, citation-backed summaries.",
)


# Workflow Execution
def run_workflow(user_query):
    ctx_manager = ContextManager()
    raw_context = ctx_manager.gather_context(user_query)

    # Task 1: Filter Context (Compress/Isolate)
    task_filter = Task(
        description=f"Analyze this context: {raw_context}. "
                    f"Remove noise relative to query: {user_query}",
        expected_output="A bullet list containing only query-relevant facts.",
        agent=researcher,
    )

    # Task 2: Generate Answer
    task_synthesize = Task(
        description="Write a final answer using the filtered context.",
        expected_output="A concise, citation-backed answer.",
        context=[task_filter],  # Passing the output of Task 1
        agent=synthesizer,
    )

    crew = Crew(agents=[researcher, synthesizer], tasks=[task_filter, task_synthesize])
    return crew.kickoff()
```

Advanced Pattern: Progressive Context Loading

--{{0}}--

As applications grow, you cannot dump everything into the context window. Anthropic's "Claude Skills" pattern offers a solution for Progressive Loading.

--{{1}}--

It uses a 3-layer system to keep the "RAM" clean:

  1. Main Context: Always loaded (Project config).
  2. Skill Metadata: Lightweight YAML definitions (Name, Description). Loaded to help the model decide which tool to use.
  3. Active Skill Context: Loaded only when the skill is triggered. Contains the heavy instructions and examples.

```mermaid
sequenceDiagram
    participant User
    participant System
    participant LLM

    Note over System: Layer 1: Main Context Loaded
    User->>System: "Analyze the Q3 financial report"
    System->>LLM: Sends Query + Skill Metadata (Layer 2)
    LLM->>System: "I need the 'Financial_Analysis' skill"
    System->>System: Loads 'Financial_Analysis.md' (Layer 3)
    System->>LLM: Sends Query + Full Skill Instructions
    LLM->>User: Generates Report
```
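A minimal sketch of the three layers in Python (the skill registry, file contents, and prompt layout here are illustrative, not Anthropic's actual format):

```python
import textwrap

# Layer 2: lightweight metadata — always cheap enough to include in the prompt.
SKILL_METADATA = {
    "Financial_Analysis": "Analyze financial reports and compute key ratios.",
    "Legal_Review": "Check contracts for risky clauses.",
}

# Layer 3: heavy instructions — loaded only when the skill is triggered.
SKILL_BODIES = {
    "Financial_Analysis": textwrap.dedent("""\
        Steps: extract revenue, cost, margin; compare quarter over quarter;
        flag any line item that moved more than 10%."""),
}

def build_prompt(query, active_skill=None):
    prompt = "Main context: project config\n"          # Layer 1: always loaded
    prompt += "Available skills:\n"
    for name, desc in SKILL_METADATA.items():          # Layer 2: names + descriptions
        prompt += f"- {name}: {desc}\n"
    if active_skill:                                   # Layer 3: only on demand
        prompt += f"\nActive skill instructions:\n{SKILL_BODIES[active_skill]}\n"
    prompt += f"\nUser query: {query}"
    return prompt

# First pass: the model sees only metadata and picks a skill.
first = build_prompt("Analyze the Q3 financial report")
# Second pass: the chosen skill's full instructions are loaded.
second = build_prompt("Analyze the Q3 financial report",
                      active_skill="Financial_Analysis")
```

The first prompt stays small no matter how many skills exist; only the description list grows, while the heavy instruction bodies stay on disk until one is actually needed.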

Challenge: The "Chicago Office" Problem

--{{0}}--

Scenario: A user asks, "What’s blocking the Chicago office project, and when’s our next meeting about it?"

--{{1}}--

The Problem:

  • Blockers are in the issue tracker (e.g., Linear/Jira).
  • Meetings are in the Calendar.
  • Discussions are in Slack/Email.
  • Permissions: Does the user have access to all this?

--{{2}}--

The Solution: Agentic Context Engineering. A manual RAG pipeline fails here. You need an Ingestion Layer that handles auth and updates (syncing), and a Retrieval Layer that can expand the query into multiple sub-queries (one for Linear, one for Calendar).

Key Takeaway: Context retrieval for Agents is an infrastructure problem, not just an embedding problem. You must build for continuous sync and hybrid search.
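The query-expansion step can be sketched as routing one user question into per-source sub-queries, with a permission check before each source is touched. The source names, query strings, and permission table below are illustrative:

```python
# Illustrative router: one question fans out into per-source sub-queries.
SOURCES = {
    "issue_tracker": "blockers status:open project:chicago-office",
    "calendar": "next meeting subject:chicago-office",
    "slack": "recent discussion channel:#chicago-office",
}

# Hypothetical permission table: alice has no Slack access.
USER_PERMISSIONS = {"alice": {"issue_tracker", "calendar"}}

def expand_query(user, question):
    """Return one sub-query per source the user is allowed to read."""
    allowed = USER_PERMISSIONS.get(user, set())
    return {src: q for src, q in SOURCES.items() if src in allowed}

subqueries = expand_query("alice", "What's blocking the Chicago office project?")
# 'slack' is dropped: permission-aware retrieval happens before any fetch.
```

Note that the permission filter runs at retrieval time, not after generation — context the user may not see should never enter the context window in the first place.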

Exercise: Design a Context Pipeline

Objective: Design the Context Engineering flow for a Code Review Bot.

Requirements:

  1. The bot must know the coding standards (Knowledge).
  2. It must access the specific PR diff (Input).
  3. It must remember previous comments on this PR (Short-term Memory).
  4. It must verify if the user is a senior dev before allowing a "merge" action (Guardrails).

Task: Using the editor below, sketch the JSON structure you would inject into the context window for the "Instructions" and "Guardrails" layers.

```json
{
  "role": "Code Reviewer",
  "guardrails": {
    "allowed_actions": ["comment", "request_changes"],
    "restricted_actions": ["merge"],
    "validation": "Check if user_role == 'senior_dev' for merge"
  },
  "context_sources": [
    "docs/coding_standards.md",
    "api/github/pr_diff"
  ]
}
```
<script> @input </script>
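Note that the guardrails layer is enforced by ordinary application code that runs before any action reaches GitHub — the LLM proposes, the guardrail disposes. A sketch of that check (role names are hypothetical):

```python
# Guardrail enforcement: runs outside the model, before any action executes.
GUARDRAILS = {
    "allowed_actions": ["comment", "request_changes"],
    "restricted_actions": ["merge"],
}

def is_action_allowed(action, user_role):
    """Restricted actions require the 'senior_dev' role;
    anything else must be on the allowed list."""
    if action in GUARDRAILS["restricted_actions"]:
        return user_role == "senior_dev"
    return action in GUARDRAILS["allowed_actions"]
```

Because the check lives in code rather than in the prompt, a prompt-injected "please merge this" can never bypass it.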

Chapter Quiz

--{{0}}--

Test your understanding of Context Engineering.

  1. According to the "CPU vs RAM" analogy, what represents the RAM?
  • [( )] The Vector Database
  • [(X)] The Context Window
  • [( )] The Prompt
  2. Which stage of the Context Lifecycle involves splitting information so specific agents only see what they need?
  • [( )] Write Context
  • [( )] Compress Context
  • [(X)] Isolate Context
  3. Why is "Skill Metadata" loaded separately from "Active Skill Context" in the Progressive Loading pattern?
  • [( )] To make the model faster
  • [(X)] To save tokens and keep the context lightweight until needed
  • [( )] Because YAML is easier to read than Markdown
  4. In the "Chicago Office" problem, why does simple RAG fail?
  • [(X)] Data is spread across multiple dynamic sources (Calendar, Slack, Jira) with different permissions
  • [( )] The model cannot understand the word "Chicago"
  • [( )] Vector databases cannot store calendar events
  5. What are the 6 types of context?
  • [[X]] Instructions
  • [[X]] Examples
  • [[ ]] Emotions
  • [[X]] Knowledge
  • [[X]] Memory
  • [[X]] Tools
  • [[X]] Guardrails
  • [[ ]] Hardware Specs
