Two research reports trace how a tool call flows through the Model Context Protocol. This critique compares their depth, quality, and gaps.
| Dimension | Report A (mcp-tool-call-flow.md) | Report B (how-does-a-tool-call-flow...) |
|---|---|---|
| Length | ~450 lines | ~700 lines (~32KB) |
| Structure | Phase-based (4 phases) | Layer-based (4 layers, 9 numbered steps) |
| Session | 27ed82ad | f349ea56 |
Report A covers the protocol lifecycle in four clean phases (Negotiation → Discovery → Invocation → Updates) and then branches into topics Report B omits entirely:
- Sampling/Agentic context — a full mermaid sequence diagram showing how tools participate in `sampling/createMessage` multi-turn loops, with the server executing tools locally. This is a significant protocol feature that Report B ignores completely.
- Security considerations — a structured breakdown of server responsibilities (input validation, rate limiting, output sanitization), client responsibilities (user confirmation, timeouts, audit logging), and the trust model around annotations.
- Python SDK patterns — traces the decorator-based registration pattern in the fetch reference server using Pydantic, showing that the tool call pattern is cross-language.
- Client-side middleware — documents the composable fetch middleware pipeline (`withOAuth`, `withLogging`, custom middleware), which is the mechanism real-world HTTP clients use.
- Content types — enumerates all five content block types (`TextContent`, `ImageContent`, `AudioContent`, `ResourceLink`, `EmbeddedResource`), while Report B only shows text.
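The five block types can be sketched as a discriminated union. The field shapes below are simplified from the MCP spec; the real SDK types carry additional optional fields:

```typescript
// Simplified sketch of the five MCP content block types (fields
// reduced to the essentials; real SDK types have more options).
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image"; data: string; mimeType: string }; // base64 payload
type AudioContent = { type: "audio"; data: string; mimeType: string }; // base64 payload
type ResourceLink = { type: "resource_link"; uri: string; name: string };
type EmbeddedResource = {
  type: "resource";
  resource: { uri: string; mimeType?: string; text?: string };
};
type ContentBlock =
  | TextContent
  | ImageContent
  | AudioContent
  | ResourceLink
  | EmbeddedResource;

// A single tool result may mix block types in one response.
const result: { content: ContentBlock[]; isError?: boolean } = {
  content: [
    { type: "text", text: "Fetched 1 resource" },
    { type: "resource_link", uri: "file:///logs/app.log", name: "app.log" },
  ],
};
```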
Report B traces the exact same core flow but at a significantly deeper level of implementation detail:
- 9 numbered steps with actual source code at each layer — not just the API surface, but the internal dispatch mechanisms (`_requestWithSchema`, `_onrequest`, the `_responseHandlers` map keyed by message ID).
- Precise citations — 23 footnotes with `file:line` references (e.g., `packages/core/src/shared/protocol.ts:761-870`). Report A cites files but rarely lines.
- Complete end-to-end ASCII trace — a ~60-line diagram tracing a single `echo` call from `client.callTool()` through every internal method, across the wire, through server dispatch, validation, execution, and back. This is the most valuable artifact in either report.
- 5 validation checkpoints — explicitly enumerates every point where validation occurs (capability check, request schema, input args, result schema, output schema). Report A mentions validation exists but doesn't map the full pipeline.
- Confidence assessment — explicitly states what's high-confidence vs. medium-confidence and documents assumptions. Report A presents everything as equally certain.
- Task-based execution guard — documents the `isToolTaskRequired()` check that rejects tools requiring the experimental tasks API. Report A mentions `execution` as a Tool field but doesn't trace the runtime behavior.
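The response-handler mechanism Report B documents is easy to illustrate in miniature. The sketch below is a toy, synchronous version: `TinyProtocol`, `sendRequest`, and `handleMessage` are illustrative names, not SDK API, and synchronous callbacks stand in for the Promises the real SDK resolves.

```typescript
// Toy sketch of request/response correlation: outgoing requests are
// keyed by JSON-RPC id in a handler map, and an incoming response with
// a matching id is routed back to the caller that issued the request.
type JsonRpcRequest = { jsonrpc: "2.0"; id: number; method: string; params: unknown };

class TinyProtocol {
  private nextId = 0;
  private responseHandlers = new Map<number, (result: unknown) => void>();

  // Registers a handler under a fresh id and returns the frame a real
  // client would serialize onto its transport.
  sendRequest(
    method: string,
    params: unknown,
    onResult: (result: unknown) => void,
  ): JsonRpcRequest {
    const id = this.nextId++;
    this.responseHandlers.set(id, onResult);
    return { jsonrpc: "2.0", id, method, params };
  }

  // Called when the transport delivers a response frame.
  handleMessage(msg: { id: number; result: unknown }): void {
    const handler = this.responseHandlers.get(msg.id);
    if (handler) {
      this.responseHandlers.delete(msg.id); // each id resolves at most once
      handler(msg.result);
    }
  }
}
```

The design point the report surfaces is that nothing else correlates a response to its request: if the map entry is gone (cancelled, timed out), the response is dropped.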
Where Report A is stronger:
- Better architectural overview — the layered ASCII diagram showing McpServer → Server → Protocol → Transport is cleaner and easier to scan than Report B's more detailed but denser version.
- Better breadth — covers cross-cutting concerns (security, sampling, middleware, Python) that a reader needs for full protocol understanding.
- Cleaner mermaid diagrams — the invocation sequence diagram is immediately legible.
- More complete Tool schema table — includes `title`, `execution`, and all annotation fields with their purposes.
- Annotation defaults — Report B lists defaults (`readOnlyHint=false`, `destructiveHint=true`, etc.) but Report A does not. This is actually a Report B strength.
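For reference, the defaults in question can be captured in a few lines. The values follow the MCP spec's tool annotations; `applyDefaults` is an illustrative helper, not an SDK function:

```typescript
// Tool annotation hints and their spec defaults. All of these are
// advisory: clients must not rely on them for security decisions.
interface ToolAnnotations {
  readOnlyHint?: boolean;    // default: false
  destructiveHint?: boolean; // default: true
  idempotentHint?: boolean;  // default: false
  openWorldHint?: boolean;   // default: true
}

// Illustrative helper showing how absent hints resolve.
function applyDefaults(a: ToolAnnotations): Required<ToolAnnotations> {
  return {
    readOnlyHint: a.readOnlyHint ?? false,
    destructiveHint: a.destructiveHint ?? true,
    idempotentHint: a.idempotentHint ?? false,
    openWorldHint: a.openWorldHint ?? true,
  };
}
```

Note that the defaults are deliberately pessimistic: an unannotated tool is presumed destructive and non-idempotent.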
Where Report B is stronger:
- Significantly more rigorous — every claim traces to source code with line numbers. If the SDK changes, you know exactly which claims to re-verify.
- Better for implementers — someone building an MCP client or server could follow Report B step-by-step and understand the actual code they'll interact with.
- The end-to-end trace is exceptional — no other artifact in either report provides as much clarity on the actual runtime flow.
- Honest about uncertainty — the confidence assessment is a mark of research quality that Report A lacks.
- Schema interface definitions — includes the actual TypeScript interface definitions from `schema.ts`, not just field tables.
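As a rough picture of what those definitions cover, here is a simplified `Tool` shape. The field set follows the MCP spec, but the SDK's `schema.ts` versions carry more optional fields and tighter JSON Schema typing:

```typescript
// Simplified Tool shape: name plus a JSON Schema describing its input,
// with optional output schema and advisory annotations.
interface Tool {
  name: string;
  title?: string;
  description?: string;
  inputSchema: {
    type: "object";
    properties?: Record<string, unknown>;
    required?: string[];
  };
  outputSchema?: { type: "object"; properties?: Record<string, unknown> };
  annotations?: { readOnlyHint?: boolean; destructiveHint?: boolean };
}

// The echo tool used in Report B's end-to-end trace might look like:
const echoTool: Tool = {
  name: "echo",
  title: "Echo",
  description: "Echoes back the input message",
  inputSchema: {
    type: "object",
    properties: { message: { type: "string" } },
    required: ["message"],
  },
};
```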
Report A's weaknesses:
- No source citations below file level — you can't verify claims without re-reading entire files.
- No confidence assessment — presents all information with equal certainty, which is misleading for draft-version features.
- Sampling diagram is uncited — the agentic flow diagram is valuable but has no specification reference beyond a file name.
Report B's weaknesses:
- Missing sampling/agentic context — this is the biggest gap. The multi-turn tool use loop via `sampling/createMessage` is a core protocol capability.
- Missing security analysis — no discussion of the trust model, rate limiting, or client-side safeguards.
- TypeScript-only — doesn't acknowledge the Python SDK or show that the pattern is cross-language.
- No middleware coverage — omits the HTTP middleware pipeline that real-world clients use.
- Verbose — at 32KB, it's harder to use as a quick reference. The depth is earned but could benefit from a summary section.
Gaps in both reports:
- Cancellation flow — both reports mention `AbortController`/`AbortSignal` in passing, but neither traces what happens when a client sends `notifications/cancelled` for an in-flight `tools/call`. The spec supports this and the SDK implements it.
- Progress notifications — `tools/call` supports progress tokens via `_meta.progressToken`. Neither report traces how a long-running tool sends `notifications/progress` back to the client during execution.
- Pagination mechanics — Report A mentions cursor-based pagination for `tools/list`; neither report shows the actual cursor flow or how a client iterates through a large tool set.
- Transport-specific behaviors — neither report addresses how the tool call flow differs across transports. For example, Streamable HTTP requires session management headers, and SSE has different request/response semantics than stdio.
- Error recovery patterns — the two-level error model is documented, but neither report discusses what happens after an error: retry strategies, how LLMs use `isError` responses to self-correct, or client-side error handling patterns.
- Tool output schema validation asymmetry — Report B documents that the client validates `structuredContent` against `outputSchema`, but neither report discusses the design decision: why is output validation done on both server AND client? What happens when they disagree?
- Dynamic tool registration at runtime — both mention `notifications/tools/list_changed`, but neither traces the full flow of a server adding a tool after initialization and the client re-discovering it.
- Authorization/authentication — Report A mentions security broadly, but neither traces how auth flows through tool calls, particularly in HTTP transports where `authInfo` is extracted from the request and passed through the context object.
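To make the first two gaps concrete, here are the wire shapes involved, with payload fields simplified from the MCP spec:

```typescript
// A tools/call request carrying a progress token in _meta, so the
// server can stream progress back while the tool runs.
const callWithProgress = {
  jsonrpc: "2.0" as const,
  id: 42,
  method: "tools/call",
  params: {
    name: "long_task",
    arguments: {},
    _meta: { progressToken: "tok-1" },
  },
};

// The server reports progress against that token:
const progressNotification = {
  jsonrpc: "2.0" as const,
  method: "notifications/progress",
  params: { progressToken: "tok-1", progress: 3, total: 10 },
};

// The client cancels the in-flight call by its request id:
const cancelNotification = {
  jsonrpc: "2.0" as const,
  method: "notifications/cancelled",
  params: { requestId: 42, reason: "user aborted" },
};
```

Neither notification carries a response of its own; correlation is entirely via the progress token and the original request id, which is exactly the flow the reports leave untraced.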
Report B is the stronger research artifact for its intended purpose (understanding the implementation). The footnoted citations, validation checkpoint map, and end-to-end trace demonstrate genuine source-level investigation rather than summarization.
Report A is the better reference document for someone who needs to understand the protocol holistically — sampling, security, cross-language patterns, and middleware are all things a practitioner needs.
The ideal document would combine both: Report B's depth and rigor for the core flow, with Report A's breadth sections on sampling, security, Python patterns, and middleware appended as additional chapters. The confidence assessment from Report B should be standard practice.
If merging into a single document:
- Use Report B's layer-based structure and end-to-end trace as the spine
- Add Report A's sampling/agentic context section (with citations added)
- Add Report A's security considerations section
- Add Report A's Python SDK patterns section
- Add Report A's middleware section
- Fill the 8 gaps identified above
- Add a quick-reference summary at the top for scanning