Traditional RAG (Vending Machine)
─────────────────────────────────
Query → Retrieve → Generate → Answer
(fixed pipeline, no decisions)
Naive Agentic RAG (Junior Librarian)
────────────────────────────────────
Question → [DECISION: search?] → Query → Retrieve → [DECISION: use docs?] → Answer
(decision points introduced)
| Aspect | Traditional RAG | Naive Agent |
|---|---|---|
| Control flow | Fixed pipeline | Agent decides when/whether to search |
| Failure mode | Silent (bad docs → bad answer) | Same, but can be taught to notice |
| Observability | Query → Docs → Answer | Query → Tool Decision → Docs → Answer |
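A minimal sketch of the two control flows side by side. Every helper here (`search_solr`, `generate`, `should_search`, `rewrite_as_query`, `docs_are_usable`) is a hypothetical stand-in, not an actual API:

```python
# Hypothetical stand-ins, not a real API: each would wrap Solr or the LLM.
def search_solr(query): return []                  # Solr /select call
def generate(question, docs): return "..."         # LLM generation call
def should_search(question): return True           # "is this about our products?"
def rewrite_as_query(question): return question    # question -> search query
def docs_are_usable(question, docs): return bool(docs)

def traditional_rag(query: str) -> str:
    # Fixed pipeline: always retrieve, always generate. No decisions.
    return generate(query, search_solr(query))

def naive_agentic_rag(question: str) -> str:
    if not should_search(question):                # decision point 1
        return generate(question, [])
    docs = search_solr(rewrite_as_query(question))
    if not docs_are_usable(question, docs):        # decision point 2
        return "I don't know."                     # refuse rather than guess
    return generate(question, docs)
```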
User Question
│
▼
┌─────────────────────────────────────────────────────┐
│ DECISION QUALITY │
│ • Did I call the tool only when needed? │
│ • Did I translate the question into a good query? │
└─────────────────────────────────────────────────────┘
│
▼ query
┌─────────────────────────────────────────────────────┐
│ SEARCH RELEVANCE │
│ • Did Solr return good docs? │
│ • Did the right pipeline get selected? │
└─────────────────────────────────────────────────────┘
│
▼ docs
┌─────────────────────────────────────────────────────┐
│ REASONING QUALITY │
│ • Did I synthesize correctly from docs? │
│ • Did I refuse when I should have? │
└─────────────────────────────────────────────────────┘
│
▼
Answer
Key insight: Failures can hide. A bad answer might be caused by any of the three surfaces, and they can mask each other.
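To keep the three surfaces from masking each other, log them separately per request. One possible trace shape, sketched below; the field names are assumptions, not from the source:

```python
from dataclasses import dataclass, field

@dataclass
class RequestTrace:
    """Per-request record with one field group per evaluation surface,
    so a bad answer can be attributed instead of hidden."""
    question: str
    # Surface 1: decision quality
    searched: bool = False
    rewritten_query: str | None = None
    # Surface 2: search relevance
    pipeline: str | None = None           # error / cve / errata / default
    doc_ids: list[str] = field(default_factory=list)
    # Surface 3: reasoning quality
    refused: bool = False
    answer: str | None = None
```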
┌──────────────────────────────────────────────────────────────┐
│ AGENT LAYER │
│ Decision: "Is this about our products?" → search / don't │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ SEARCH LAYER │
│ Query Classification → {error, cve, errata, default} │
│ Pipeline-specific: boosting, field selection, filters │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ CONTENT LAYER │
│ Technical docs, KCS articles, CVEs, Errata │
└──────────────────────────────────────────────────────────────┘
Grounding Rule: when retrieved docs don't support an answer, the agent must refuse or explicitly say so.
Operational Risk > User Frustration from "I don't know"
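A sketch of the grounding rule as a hard gate before generation. `support_score` and the 0.7 cutoff are assumptions; the scorer could be a token-overlap heuristic, an NLI model, or an LLM judge:

```python
SUPPORT_THRESHOLD = 0.7  # assumed cutoff; tune against labeled refusal cases

def support_score(question: str, docs: list[str]) -> float:
    return 0.0  # hypothetical scorer: overlap heuristic, NLI, or LLM judge

def generate(question: str, docs: list[str]) -> str:
    return "..."  # LLM call placeholder

def grounded_answer(question: str, docs: list[str]) -> str:
    if not docs:
        return "I couldn't find documentation that answers this."
    if support_score(question, docs) < SUPPORT_THRESHOLD:
        # Grounding rule: state non-support explicitly instead of guessing.
        return "The retrieved documents don't support a confident answer."
    return generate(question, docs)
```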
| Surface | Failure Type | Severity | Example |
|---|---|---|---|
| Decision Quality | False negative | High | Product question routed to "don't search" |
| | False positive | Low | Non-product question triggers unnecessary search |
| Search Relevance | Misclassification | High | CVE question routed to default pipeline |
| | Ranking failure | Medium | Right pipeline, wrong docs in top_k |
| Reasoning Quality | Grounding violation | Critical | Agent answers despite weak doc support |
| | Over-refusal (false refusal) | Medium | Agent refuses when docs did support answer |
| | Correct refusal | N/A (signal) | Docs genuinely missing; corpus gap indicator |
High-severity failures (given your policy):
- Grounding violation (agent hallucinates)
- Misclassification (CVE routed wrong → wrong docs → plausible but dangerous answer)
| Input | Output |
|---|---|
| Query logs | Intent taxonomy: {error, cve, errata, default} |
| Clickthrough logs | Definition of "answerability" |
| Corpus structure | Policy: prefer refusal over hallucination |
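The intent taxonomy maps naturally onto a router plus a per-pipeline config registry. A toy sketch: the ID patterns and Solr params (`qf`, `fq`) are illustrative guesses, and a production classifier would be trained on the query logs rather than hand-written rules:

```python
import re

# Illustrative per-pipeline Solr params; real values live in the
# pipeline config registry (boosts, field selection, filters).
PIPELINES = {
    "cve":     {"qf": "cve_id^10 title^3 body", "fq": "doc_type:cve"},
    "errata":  {"qf": "advisory_id^10 title^3 body", "fq": "doc_type:errata"},
    "error":   {"qf": "error_code^8 title^2 body", "fq": "doc_type:kcs"},
    "default": {"qf": "title^2 body", "fq": None},
}

def classify(query: str) -> str:
    """Toy rule-based router over the {error, cve, errata, default} taxonomy."""
    if re.search(r"\bCVE-\d{4}-\d+\b", query, re.IGNORECASE):
        return "cve"
    if re.search(r"\bRH[BES]A-\d{4}:\d+", query):  # assumed errata ID shape
        return "errata"
    if re.search(r"\berror\b|\btraceback\b|exit code", query, re.IGNORECASE):
        return "error"
    return "default"

params = PIPELINES[classify("How do I fix CVE-2025-1234?")]  # -> cve pipeline
```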
| Phase | Key Activities | Failure if Neglected |
|---|---|---|
| Onboarding | Ingest new docs, assign metadata, validate structure | New product features undiscoverable |
| Maintenance | Update for product changes, deprecations | Agent cites outdated procedures |
| Retirement | Remove/archive unsupported product docs | Agent retrieves irrelevant content |
| Quality Audit | Check formatting, metadata completeness | Parsing failures, misclassification |
Your mitigation: Content from last 3 years only + freshness boosting
Signals to track:
- Staleness score (last-modified vs product-EOL)
- Metadata completeness
- Formatting violations
- Orphan docs
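A sketch of a per-doc audit emitting the signals above. The metadata field names and the encoding of the three-year window are assumptions:

```python
from datetime import datetime, timedelta

FRESHNESS_WINDOW = timedelta(days=3 * 365)  # the "last 3 years" mitigation
REQUIRED_METADATA = {"product", "version", "doc_type", "last_modified"}  # assumed

def audit_doc(doc: dict, now: datetime) -> dict:
    last_modified = doc.get("last_modified")
    return {
        "id": doc.get("id"),
        # staleness: no timestamp, or older than the freshness window
        "stale": last_modified is None or (now - last_modified) > FRESHNESS_WINDOW,
        # metadata completeness
        "missing_metadata": sorted(REQUIRED_METADATA - doc.keys()),
        # formatting violation: empty or whitespace-only body
        "format_violation": not doc.get("body", "").strip(),
        # orphan: nothing links to this doc
        "orphan": not doc.get("linked_from"),
    }
```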
Traditional vs Agent-Aware Relevance:
| Concern | Human Consumer | Agent Consumer |
|---|---|---|
| Top result quality | Critical | Less critical (agent sees top_k) |
| Recall@K | Nice to have | Critical |
| Snippet/formatting | For readability | For parseability |
| Redundancy in top_k | Minor annoyance | Wastes context window |
| Missing docs | User might rephrase | Agent may confidently hallucinate |
Solr Readiness Checklist:
| Area | Concerns | Lifecycle Artifact |
|---|---|---|
| Schema | Right fields indexed? Copyfields? | Field inventory document |
| Analyzers | Stemming, synonyms, domain jargon | Analyzer config + known gaps |
| Embeddings (if hybrid) | Model choice, chunk boundaries | Embedding eval report |
| Query-time | Boosts, filters, per-pipeline tuning | Pipeline config registry |
| Content quality | Formatting, stale docs, missing metadata | Content audit log |
Key Signals:
- Recall@K
- Missing doc rate (corpus gap indicator)
- Blind spots (acronyms, synonyms, jargon failures)
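Both signals are cheap to compute given labeled judgments, i.e. a set of known-relevant doc IDs per query (that labeled data is an assumption here):

```python
def recall_at_k(relevant: set[str], retrieved: list[str], k: int) -> float:
    """Fraction of known-relevant docs appearing in the top k results."""
    return len(relevant & set(retrieved[:k])) / len(relevant) if relevant else 0.0

def missing_doc_rate(queries: list[dict], k: int) -> float:
    """Share of queries with *no* relevant doc in the top k.
    A high rate points at corpus gaps, not just ranking problems."""
    if not queries:
        return 0.0
    misses = sum(1 for q in queries
                 if not (q["relevant"] & set(q["retrieved"][:k])))
    return misses / len(queries)
```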
RFDR: a versioned traceability artifact linking relevance changes to agent behavior.
| Field | Purpose |
|---|---|
| ID / Version | e.g., RFDR-2025-007 |
| Date | When change was deployed |
| Change Type | Analyzer, boost, synonym, new field, embedding model |
| Rationale | Why this change was made |
| Baseline Metrics Before | Recall@K, precision, known blind spots |
| Baseline Metrics After | Same metrics post-change |
| Downstream Agent Impact | Observed/expected effect on decision/reasoning quality |
| Rollback Plan | How to revert if agent behavior degrades |
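A sketch of one RFDR entry as a typed record; the shape is assumed, with the table above as the source of truth:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class RFDR:
    """One versioned relevance change; fields mirror the table above."""
    rfdr_id: str                       # e.g. "RFDR-2025-007"
    deployed: date
    change_type: str                   # analyzer / boost / synonym / field / embedding
    rationale: str
    metrics_before: dict[str, float]   # e.g. {"recall@5": 0.62}
    metrics_after: dict[str, float]
    agent_impact: str                  # observed/expected downstream effect
    rollback_plan: str
```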
Correlation Pattern:
Agent behavior regression detected
↓
Check: recent RFDRs?
↓
RFDR-2025-007: synonym expansion on 2025-06-01
↓
Compare agent metrics before/after that date
↓
Root cause identified
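The "check recent RFDRs" step is mechanical once the registry is queryable by date. A self-contained sketch using plain dicts; the 30-day lookback window is an assumption:

```python
from datetime import date, timedelta

def candidate_rfdrs(registry: list[dict], regression_date: date,
                    window_days: int = 30) -> list[dict]:
    """Relevance changes deployed in the window before a regression,
    newest first: the first suspects for before/after metric comparison."""
    cutoff = regression_date - timedelta(days=window_days)
    hits = [r for r in registry if cutoff <= r["deployed"] <= regression_date]
    return sorted(hits, key=lambda r: r["deployed"], reverse=True)

registry = [{"id": "RFDR-2025-007", "deployed": date(2025, 6, 1)}]
print(candidate_rfdrs(registry, date(2025, 6, 10)))  # surfaces RFDR-2025-007
```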
┌─────────────────────────────────────────────────────────────────────────────┐
│ AGENTIC PRODUCT LIFECYCLE │
└─────────────────────────────────────────────────────────────────────────────┘
CONTENT CORPUS SEARCH RELEVANCE AGENT BEHAVIOR
LIFECYCLE LIFECYCLE LIFECYCLE
───────────── ──────────────── ──────────────
┌───────────┐ ┌───────────────┐ ┌──────────────┐
│ Onboard │ │ Schema/ │ │ Decision │
│ New Docs │──────────▶│ Analyzers │ │ Policy │
└───────────┘ └───────────────┘ │ (search Y/N) │
│ │ └──────────────┘
▼ ▼ │
┌───────────┐ ┌───────────────┐ │
│ Maintain │ │ Query │ ▼
│ & Update │──────────▶│ Pipelines │──────────▶┌──────────────┐
└───────────┘ │ (cve,errata, │ │ Tool Call │
│ │ error,default)│ │ (Solr query) │
▼ └───────────────┘ └──────────────┘
┌───────────┐ │ │
│ Retire │ ▼ ▼
│ Stale │ ┌───────────────┐ ┌──────────────┐
│ Content │ │ Relevance │ │ Reasoning │
└───────────┘ │ Eval/Tune │ │ & Grounding │
│ └───────────────┘ └──────────────┘
▼ │ │
┌───────────┐ ▼ ▼
│ Quality │ ┌───────────────┐ ┌──────────────┐
│ Audit │◀──────────│ RFDR │◀─────────│ Agent Eval │
└───────────┘ │ (versioned) │ │ (3 surfaces) │
└───────────────┘ └──────────────┘
│ │ │
└────────────────────────┴──────────────────────────┘
│
FEEDBACK LOOPS
(attribute failures, iterate)
| What | How | Artifact |
|---|---|---|
| User Intents | Query classification → pipeline routing | Intent taxonomy |
| Content Health | 3-year window + freshness boost + audits | Content audit log |
| Search Relevance | Recall@K, blind spots, pipeline tuning | Baseline relevance report |
| Change Traceability | RFDR linking relevance → agent behavior | Decision record registry |
| Agent Policy | Grounding rule: refuse > hallucinate | Policy constraint doc |
| Failure Attribution | 3-surface observability | Failure taxonomy |