MCP Dev 2026 NYC Presentation Proposal

Presentation Proposal: MCP Dev Summit North America 2026

Title: Diagnosis with Agent Evals: Building an MCP Server for Network Troubleshooting

Session Type: 25-minute Session

Track: Security & Operations / MCP Best Practices

Abstract

Troubleshooting Kubernetes network ingress and DNS issues is notoriously complex, requiring deep domain knowledge and context-hopping between multiple layers (HAProxy, CoreDNS, generic K8s resources). While LLMs promise to democratize this knowledge, giving them raw kubectl access is risky and often ineffective. In this session, we present our journey building and rigorously validating a production-grade NetEdge MCP server. We detail how we evolved from a "Phase 0" prototype using gen-mcp to a robust Go implementation, but more importantly, we ask the hard question: do these specialized MCP tools actually help agents solve networking problems better?

To answer this, we used gevals, our agentic evaluation framework, to codify six real-world infrastructure failure scenarios (e.g., misconfigured backendRefs, stale DNS caches, selector mismatches). We ran extensive comparative trials to measure success rates and "Time to Diagnosis" across three distinct agent configurations:

  1. Baseline: Agent with raw kubectl / CLI access (no MCP tooling).
  2. Generalist: Agent with generic Kubernetes MCP tools (Create/Delete Pods, etc.).
  3. Specialized: Agent with our domain-specific NetEdge MCP toolset.

Our results show that specialized tools don't just improve safety: they fundamentally change the agent's problem-solving trajectory. We will share these findings along with the dual-mode architecture (sketched below) that makes such rigorous evaluation possible.
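
To make the dual-mode idea concrete, here is a minimal Go sketch of the kind of interface involved. Only the NidsStore name comes from this proposal; the method names, types, and snapshot layout are illustrative assumptions, not the actual NetEdge API.

```go
package netedge

import (
	"context"
	"os"
	"path/filepath"
)

// NidsStore abstracts where diagnostic data comes from, so the same
// tool logic can run against a live cluster or a static snapshot.
// Method names here are hypothetical, not the real NetEdge API.
type NidsStore interface {
	// GetRoute returns the raw manifest of a route by namespace/name.
	GetRoute(ctx context.Context, namespace, name string) ([]byte, error)
	// GetCoreDNSConfig returns the active CoreDNS Corefile.
	GetCoreDNSConfig(ctx context.Context) (string, error)
}

// SnapshotStore replays a directory of captured diagnostics, which is
// what lets agent benchmarks run offline and reproducibly. A live store
// wrapping a real Kubernetes client would satisfy the same interface.
type SnapshotStore struct {
	Dir string // root of a static diagnostic snapshot (layout assumed)
}

func (s *SnapshotStore) GetRoute(ctx context.Context, namespace, name string) ([]byte, error) {
	return os.ReadFile(filepath.Join(s.Dir, "routes", namespace, name+".yaml"))
}

func (s *SnapshotStore) GetCoreDNSConfig(ctx context.Context) (string, error) {
	b, err := os.ReadFile(filepath.Join(s.Dir, "coredns", "Corefile"))
	return string(b), err
}
```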

Key Takeaways

  • Evaluation First: How to use gevals to engineer valid benchmarks for agent performance against real-world broken scenarios.
  • The Case for Specialization: Comparative data showing why domain-specific MCP tools outperform generalist K8s tools for complex troubleshooting.
  • Architecting for Testability: Designing MCP servers that work offline to enable reproducible agent benchmarks.

Session Outline

  1. The Problem Space (3 mins)
    • Why K8s networking is hard (layers: Ingress -> Service -> Pod -> DNS).
    • The hypothesis: Specialized tools > Generic access.
  2. Architecting for Diagnosis (7 mins)
    • Evolution from CLI wrappers to native Go tools (inspect_route, get_coredns_config); see the tool-handler sketch after this outline.
    • The "NidsStore" Interface: Decoupling logic from data to enable offline replay.
    • The "Offline" Innovation: Why running agents against static diagnostic snapshots is crucial for reproducible benchmarks.
  3. Proving It: Rigorous Evaluation with gevals (10 mins)
    • The Methodology: Defining six "Broken Cluster" scenarios based on real support tickets (see the scenario sketch after this outline).
    • The Showdown:
      • Scenario A: Agent vs. kubectl (The "Hallucination Hazard").
      • Scenario B: Agent vs. Generic K8s MCP (The "Context Overload").
      • Scenario C: Agent vs. NetEdge MCP (The "Guided Path").
    • Results: Metrics on success rate, step count, and safety violations.
  4. Future of Agentic Ops (3 mins)
    • Using these benchmarks to drive future tool design.
    • Call to action: Stop building tools; start building evaluations.
  5. Q&A (2 mins)
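
As a rough illustration of what a native Go tool can do beyond wrapping kubectl, here is a hedged sketch of an inspect_route handler. The tool name comes from the outline above; the ToolResult type, handler signature, and placeholder diagnostic logic are hypothetical stand-ins, not any specific MCP SDK's API.

```go
package netedge

import (
	"context"
	"fmt"
	"strings"
)

// ToolResult is a hypothetical stand-in for an MCP tool's response.
type ToolResult struct {
	Text    string
	IsError bool
}

// InspectRoute returns a focused diagnosis instead of raw YAML,
// keeping the agent's context window small and on-topic.
func InspectRoute(ctx context.Context, store NidsStore, namespace, name string) (*ToolResult, error) {
	raw, err := store.GetRoute(ctx, namespace, name)
	if err != nil {
		return &ToolResult{
			Text:    fmt.Sprintf("route %s/%s not found: %v", namespace, name, err),
			IsError: true,
		}, nil
	}
	// The real tool would parse backendRefs and resolve the referenced
	// Services and Endpoints; this placeholder only flags a missing
	// backendRefs stanza as an example of a guided finding.
	if !strings.Contains(string(raw), "backendRefs") {
		return &ToolResult{Text: "no backendRefs found: route has no backends configured"}, nil
	}
	return &ToolResult{
		Text: fmt.Sprintf("route %s/%s: backendRefs present (%d bytes inspected)", namespace, name, len(raw)),
	}, nil
}
```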
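
And here is one way the six broken-cluster scenarios could be expressed as data for comparative trials. This is purely illustrative: it is not gevals's actual scenario schema, and the field names, step budget, and the sample fault shown are assumptions.

```go
package netedge

// Scenario pairs an injected fault with the diagnosis an agent should
// reach and the budget it gets to reach it. This shape is illustrative;
// gevals's real scenario format may differ.
type Scenario struct {
	Name          string   // short identifier, e.g. "selector-mismatch"
	SnapshotDir   string   // static diagnostics captured from the broken cluster
	InjectedFault string   // what was broken, for the grader's reference
	ExpectedDiag  []string // substrings a correct diagnosis should contain
	MaxSteps      int      // step budget before the trial counts as a failure
}

// One sample entry; the values are hypothetical, not measured data.
var scenarios = []Scenario{
	{
		Name:          "misconfigured-backendref",
		SnapshotDir:   "snapshots/backendref",
		InjectedFault: "HTTPRoute backendRef points at a nonexistent Service",
		ExpectedDiag:  []string{"backendRef", "Service not found"},
		MaxSteps:      15,
	},
}
```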

Intended Audience

DevOps Engineers, Platform Builders, and MCP Tool Developers looking to build robust, secure, and testable server implementations for complex infrastructure.

@candita commented Jan 21, 2026

Overall, a nice proposal, but it covers a lot, maybe too much. It's almost enough for two talks. Also, concluding with a call to action for evals but not mentioning evals in the title is dissonant. It might be nice to shorten the title and add "evals" and "troubleshooting" to it, e.g. "Diagnosis with agent evals: Building an MCP Server for Network Troubleshooting". You may want to drop the mention of the OpenShift tool (must-gather), which afaik isn't used in Kubernetes.

@bentito (Author) commented Jan 21, 2026

Thanks @candita, I've updated based on your feedback. Give it another read if you have a chance.
