Title: Diagnosis with Agent Evals: Building an MCP Server for Network Troubleshooting
Session Type: 25-minute Session
Track: Security & Operations / MCP Best Practices
Troubleshooting Kubernetes network ingress and DNS issues is notoriously complex, requiring deep domain knowledge and context hopping between multiple layers (HAProxy, CoreDNS, generic K8s resources). While LLMs promise to democratize this knowledge, giving them raw kubectl access is risky and often ineffective.
In this session, we present our journey building and rigorously validating a production-grade NetEdge MCP server. We detail how we evolved from a "Phase 0" prototype using gen-mcp to a robust Go implementation, but more importantly, we ask the hard question: Do these specialized MCP tools actually help agents solve these networking problems better?
To answer this, we used gevals, our agentic evaluation framework, to codify 6 real-world infrastructure failure scenarios (e.g., misconfigured backendRefs, stale DNS caches, selector mismatches). We ran extensive comparative trials to measure success rates and "Time to Diagnosis" across three distinct agent configurations:
- Baseline: Agent with raw kubectl/CLI access (zero tooling).
- Generalist: Agent with generic Kubernetes MCP tools (Create/Delete Pods, etc.).
- Specialized: Agent with our domain-specific NetEdge MCP toolset.

Our results show that specialized tools don't just improve safety; they fundamentally change the agent's problem-solving trajectory. We will share these findings along with the dual-mode architecture that makes such rigorous evaluation possible.
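For a concrete picture of the trial setup, here is a minimal Go sketch of how one scenario and the three configurations might be encoded; every type and field name below is an illustrative assumption, not the actual gevals schema.

```go
// Hypothetical sketch of a "broken cluster" scenario definition and the three
// agent configurations under test. Names and fields are illustrative
// assumptions, not the actual gevals schema.
package evals

import "time"

// AgentConfig identifies one of the three toolsets being compared.
type AgentConfig string

const (
	Baseline    AgentConfig = "baseline"    // raw kubectl/CLI access
	Generalist  AgentConfig = "generalist"  // generic Kubernetes MCP tools
	Specialized AgentConfig = "specialized" // domain-specific NetEdge MCP tools
)

// Scenario describes one reproducible infrastructure failure case, e.g. a
// route with a misconfigured backendRef.
type Scenario struct {
	Name     string        // e.g. "misconfigured-backendrefs"
	Snapshot string        // static diagnostic snapshot the agent runs against
	Goal     string        // the root cause the agent must identify
	MaxSteps int           // step budget before the trial counts as a failure
	Timeout  time.Duration // wall-clock cap on "Time to Diagnosis"
}
```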
Key Takeaways:
- Evaluation First: How to use gevals to engineer valid benchmarks for agent performance against real-world broken scenarios.
- The Case for Specialization: Comparative data showing why domain-specific MCP tools outperform generalist K8s tools for complex troubleshooting.
- Architecting for Testability: Designing MCP servers that work offline to enable reproducible agent benchmarks.
Outline:
- The Problem Space (3 mins)
  - Why K8s networking is hard (layers: Ingress -> Service -> Pod -> DNS).
  - The hypothesis: Specialized tools > Generic access.
- Architecting for Diagnosis (7 mins)
  - Evolution from CLI wrappers to Native Go tools (inspect_route, get_coredns_config).
  - The "NidsStore" Interface: Decoupling logic from data to enable offline replay (a minimal sketch follows below).
  - The "Offline" Innovation: Why running agents against static diagnostic snapshots is crucial for reproducible benchmarks.
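To ground the dual-mode point, here is a minimal Go sketch of what a store interface of this shape could look like, with a snapshot-backed implementation for offline replay; the method set and names are illustrative assumptions, not the server's actual interface.

```go
// Hypothetical sketch of the "NidsStore" idea: diagnostic tools depend on
// this interface rather than on a live cluster, so the same logic can run
// against either a real API server or a recorded snapshot. The method set
// shown here is an illustrative assumption, not the real interface.
package netedge

import (
	"context"
	"os"
	"path/filepath"
)

// NidsStore abstracts access to the network diagnostic state the tools read.
type NidsStore interface {
	// Route returns the raw ingress/route object for a namespaced name.
	Route(ctx context.Context, namespace, name string) ([]byte, error)
	// CoreDNSConfig returns the CoreDNS Corefile contents.
	CoreDNSConfig(ctx context.Context) (string, error)
}

// SnapshotStore replays a static diagnostic snapshot from disk. Pointing the
// server at a snapshot instead of a live cluster is what makes agent
// benchmark runs reproducible. (A live implementation would satisfy the same
// interface by querying the cluster API at call time.)
type SnapshotStore struct {
	Dir string // directory holding the captured diagnostic files
}

func (s *SnapshotStore) Route(ctx context.Context, namespace, name string) ([]byte, error) {
	return os.ReadFile(filepath.Join(s.Dir, "routes", namespace, name+".yaml"))
}

func (s *SnapshotStore) CoreDNSConfig(ctx context.Context) (string, error) {
	b, err := os.ReadFile(filepath.Join(s.Dir, "coredns", "Corefile"))
	return string(b), err
}
```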
- Proving It: Rigorous Evaluation with gevals (10 mins)
  - The Methodology: defining 6 "Broken Cluster" scenarios based on real support tickets.
  - The Showdown:
    - Scenario A: Agent vs. kubectl (the "Hallucination Hazard").
    - Scenario B: Agent vs. Generic K8s MCP (the "Context Overload").
    - Scenario C: Agent vs. NetEdge MCP (the "Guided Path").
  - Results: Metrics on success rate, step count, and safety violations (a small aggregation sketch follows below).
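To make these metrics concrete, here is a small illustrative Go aggregator over hypothetical trial records; the Trial fields are assumptions, not the actual gevals output format.

```go
// Illustrative aggregation of trial results per agent configuration. The
// Trial record is a hypothetical stand-in for whatever the harness emits.
package evals

import "fmt"

// Trial is a minimal record of one agent run against one broken scenario.
type Trial struct {
	Config     string // "baseline", "generalist", or "specialized"
	Succeeded  bool   // did the agent reach the correct diagnosis?
	Steps      int    // tool calls taken (a proxy for "Time to Diagnosis")
	Violations int    // unsafe actions attempted during the run
}

// Summarize prints success rate, mean step count, and total safety
// violations for each agent configuration.
func Summarize(trials []Trial) {
	type agg struct{ runs, wins, steps, violations int }
	byConfig := map[string]*agg{}
	for _, t := range trials {
		a := byConfig[t.Config]
		if a == nil {
			a = &agg{}
			byConfig[t.Config] = a
		}
		a.runs++
		if t.Succeeded {
			a.wins++
		}
		a.steps += t.Steps
		a.violations += t.Violations
	}
	for cfg, a := range byConfig {
		fmt.Printf("%-11s success=%3.0f%%  meanSteps=%.1f  violations=%d\n",
			cfg,
			100*float64(a.wins)/float64(a.runs),
			float64(a.steps)/float64(a.runs),
			a.violations)
	}
}
```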
- Future of Agentic Ops (3 mins)
  - Using these benchmarks to drive future tool design.
  - Call to action: Stop building tools; start building evaluations.
- Q&A (2 mins)
Audience: DevOps Engineers, Platform Builders, and MCP Tool Developers looking to build robust, secure, and testable server implementations for complex infrastructure.
Reviewer feedback: Overall, a nice proposal, but it covers a lot, maybe too much; it's almost enough for two talks. Also, concluding with a call to action for evals but not mentioning evals in the title is dissonant. It might be nice to shorten the title and add "evals" and "troubleshooting" to it, e.g. "Diagnosis with agent evals: Building an MCP Server for Network Troubleshooting". You may want to drop the mention of the OpenShift tool (must-gather), which afaik isn't used in Kubernetes.