
@cdeil
Created March 1, 2026 11:14

VSCode Copilot Parser - Technical Summary

Implementation notes for the VSCode Copilot chat session parser in agentsview.

What's Implemented

The parser ingests VSCode Copilot chat sessions from local disk, covering:

  • VSCode Stable (~/Library/Application Support/Code/User/)
  • VSCode Insiders (~/Library/Application Support/Code - Insiders/User/)
  • Both .json (pre-v1.109) and .jsonl (v1.109+ default) formats
  • Workspace sessions (workspaceStorage/<hash>/chatSessions/)
  • Global sessions (globalStorage/emptyWindowChatSessions/)

How VSCode Stores Sessions

VSCode uses a two-layer storage model:

  1. SQLite index (state.vscdb) -- metadata only (title, timestamps, isEmpty). Key: chat.ChatSessionStore.index in the ItemTable.
  2. Session files (chatSessions/<uuid>.{json,jsonl}) -- full conversation content.

Each opened workspace gets its own directory under workspaceStorage/, named by an MD5 hash. A workspace.json manifest inside that directory maps the hash back to the human-readable project path.

JSON vs JSONL

| Format | Era | Strategy |
|---|---|---|
| .json | Pre-v1.109 | Full JSON rewrite on every save |
| .jsonl | v1.109+ | Append-only operation log (kind=0 initial, kind=1 set, kind=2 push, kind=3 delete) |
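The replay step can be sketched as a fold over the log. This sketch assumes each line is an object with three fields:

- kind
- a path array k, where object keys are strings and array indices are numbers
- a value v

It also assumes that a push carries an array of items to append. These field names and semantics are assumptions, not a documented VSCode format. The sketch replays into generic Go values and then marshals the result to json.RawMessage, ready to unmarshal into the same structs used for flat JSON.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// mutation is one log line; "k" (path) and "v" (value) are assumed field names.
type mutation struct {
	Kind int   `json:"kind"`
	K    []any `json:"k"`
	V    any   `json:"v"`
}

// apply walks path inside node, replaces the value there with update(old),
// and returns the possibly new node so slice growth propagates upward.
func apply(node any, path []any, update func(old any) any) (any, error) {
	if len(path) == 0 {
		return update(node), nil
	}
	if key, ok := path[0].(string); ok {
		obj, ok := node.(map[string]any)
		if !ok {
			return nil, fmt.Errorf("key %q: parent is not an object", key)
		}
		child, err := apply(obj[key], path[1:], update)
		obj[key] = child
		return obj, err
	}
	idx, ok := path[0].(float64) // encoding/json decodes numbers as float64
	arr, isArr := node.([]any)
	if !ok || !isArr || int(idx) < 0 || int(idx) >= len(arr) {
		return nil, fmt.Errorf("bad array index %v", path[0])
	}
	child, err := apply(arr[int(idx)], path[1:], update)
	arr[int(idx)] = child
	return arr, err
}

// replayJSONL folds the operation log into the final session document.
func replayJSONL(data []byte) (json.RawMessage, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	var root any
	for {
		var m mutation
		err := dec.Decode(&m)
		if errors.Is(err, io.EOF) {
			break
		} else if err != nil {
			return nil, err
		}
		switch m.Kind {
		case 0: // initial snapshot
			root = m.V
		case 1: // set value at path
			root, err = apply(root, m.K, func(any) any { return m.V })
		case 2: // push item(s) onto the array at path
			items, ok := m.V.([]any)
			if !ok {
				items = []any{m.V}
			}
			root, err = apply(root, m.K, func(old any) any {
				arr, _ := old.([]any)
				return append(arr, items...)
			})
		case 3: // delete the object key or array element at path
			if len(m.K) == 0 {
				return nil, errors.New("delete needs a non-empty path")
			}
			last := m.K[len(m.K)-1]
			root, err = apply(root, m.K[:len(m.K)-1], func(parent any) any {
				if obj, ok := parent.(map[string]any); ok {
					if k, ok := last.(string); ok {
						delete(obj, k)
					}
				} else if arr, ok := parent.([]any); ok {
					if f, ok := last.(float64); ok && int(f) >= 0 && int(f) < len(arr) {
						return append(arr[:int(f)], arr[int(f)+1:]...)
					}
				}
				return parent
			})
		default:
			err = fmt.Errorf("unknown mutation kind %d", m.Kind)
		}
		if err != nil {
			return nil, err
		}
	}
	out, err := json.Marshal(root)
	return json.RawMessage(out), err
}
```

json.Decoder reads consecutive JSON values regardless of line breaks, so long lines need no special scanner buffer.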

Both formats produce the same session structure after parsing. When both exist for the same UUID, the .jsonl version takes priority. After reconstruction, the top-level schema includes sessionId, creationDate, customTitle, and requests[]; each request nests message, response[], agent, modelId, and result.
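The following sketch shows the .jsonl-over-.json priority rule, keyed by the UUID file name. pickSessionFiles is a hypothetical helper, not a function from discovery.go.

```go
package main

import (
	"path/filepath"
	"strings"
)

// pickSessionFiles maps session UUID (file name without extension) to the
// file to parse, keeping <uuid>.jsonl whenever <uuid>.json also exists.
// Files with other extensions are ignored.
func pickSessionFiles(paths []string) map[string]string {
	chosen := make(map[string]string)
	for _, p := range paths {
		ext := filepath.Ext(p)
		if ext != ".json" && ext != ".jsonl" {
			continue
		}
		uuid := strings.TrimSuffix(filepath.Base(p), ext)
		if prev, seen := chosen[uuid]; seen && filepath.Ext(prev) == ".jsonl" {
			continue // the .jsonl version already won
		}
		chosen[uuid] = p
	}
	return chosen
}
```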

Agent Sessions (Not Implemented)

VSCode's "coding agent" mode stores sessions differently -- only in state.vscdb under agentSessions.model.cache, not as separate files. Parsing these would require SQLite extraction and is outside the current scope.

Tool Call Extraction

VSCode Copilot response items include toolInvocationSerialized entries with:

  • toolId -- the raw tool identifier (e.g., copilot_readFile, copilot_runInTerminal)
  • invocationMessage -- human-readable description (string or {value: "..."} object)
  • pastTenseMessage -- past-tense version preferred for display
  • toolSpecificData -- structured data (e.g., {kind: "terminal", command: "npm test"})

The parser extracts these into InputJSON for frontend display, and normalizes the raw toolId to a standard category via two mapping steps:

| Raw toolId | Generic name | Category |
|---|---|---|
| copilot_readFile | read_file | Read |
| copilot_runInTerminal | shell | Bash |
| copilot_replaceString | edit_file | Edit (previously Write) |
| copilot_findTextInFiles | grep | Grep |
| copilot_listDirectory | glob | Glob |
| copilot_createFile | create_file | Write |
| copilot_fetchWebPage | read_web_page | Read |
| runSubagent | Task | Task |
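The two-step mapping can be sketched as two lookup tables plus an Other fallback. The tables below contain only the examples listed above. normalizeTool is a hypothetical stand-in for the shared NormalizeToolCategory() taxonomy, whose actual contents may differ.

```go
package main

// toolGeneric maps raw VSCode Copilot tool IDs to generic names (step 1).
var toolGeneric = map[string]string{
	"copilot_readFile":        "read_file",
	"copilot_runInTerminal":   "shell",
	"copilot_replaceString":   "edit_file",
	"copilot_findTextInFiles": "grep",
	"copilot_listDirectory":   "glob",
	"copilot_createFile":      "create_file",
	"copilot_fetchWebPage":    "read_web_page",
	"runSubagent":             "Task",
}

// genericCategory maps generic names to display categories (step 2).
var genericCategory = map[string]string{
	"read_file":     "Read",
	"read_web_page": "Read",
	"shell":         "Bash",
	"edit_file":     "Edit",
	"grep":          "Grep",
	"glob":          "Glob",
	"create_file":   "Write",
	"Task":          "Task",
}

// normalizeTool applies both steps; anything unmapped becomes "Other".
func normalizeTool(toolID string) string {
	generic, ok := toolGeneric[toolID]
	if !ok {
		return "Other"
	}
	if category, ok := genericCategory[generic]; ok {
		return category
	}
	return "Other"
}
```

Keeping the generic layer separate lets the step-2 table be shared across agents, while only the step-1 table is VSCode-specific.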

More than 60 unique tool IDs have been observed in practice. These include MCP tools (mcp_dart_sdk_*, mcp_microsoft_pla_*, pgsql_*), GitHub PR tools, Python environment tools, and tools contributed by other extensions. Unmapped tools fall through to the Other category.

Mixed Content Handling

When a response contains both tool calls and markdown text, the parser always includes tool markers in the content (e.g., [Read: copilot_readFile]) followed by the text. This ensures the frontend can detect and render tool blocks regardless of whether text accompanies them.
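A minimal sketch of assembling mixed content follows. It assumes one marker per line and a newline before the markdown text; the exact separator the parser uses is not specified here. toolCall and buildContent are hypothetical names.

```go
package main

import "strings"

// toolCall is a hypothetical, already-normalized tool invocation.
type toolCall struct {
	Category string // e.g. "Read"
	ToolID   string // e.g. "copilot_readFile"
}

// buildContent emits one "[Category: toolId]" marker per tool call, then the
// markdown text if any, joined by newlines (an assumed separator).
func buildContent(calls []toolCall, text string) string {
	parts := make([]string, 0, len(calls)+1)
	for _, c := range calls {
		parts = append(parts, "["+c.Category+": "+c.ToolID+"]")
	}
	if text != "" {
		parts = append(parts, text)
	}
	return strings.Join(parts, "\n")
}
```

Markers are emitted even when text is empty, so a tool-only response still produces detectable content.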

How This Differs from Cursor

agentsview already has a separate Cursor parser (internal/parser/cursor.go). The two are architecturally different:

| Aspect | VSCode Copilot | Cursor |
|---|---|---|
| Storage location | ~/Library/Application Support/Code/User/workspaceStorage/<hash>/chatSessions/ | ~/.cursor/projects/<project>/agent-transcripts/ |
| File format | JSON or JSONL (VSCode mutation log) | Plain text (role markers) or JSONL (Anthropic API format) |
| Project mapping | Via workspace.json manifest in hash directory | Via parent directory name |
| Tool calls | toolInvocationSerialized items in response array | [Tool call] name markers in text |
| Session index | state.vscdb SQLite | None (files are self-contained) |
| Config env var | COPILOT_DIR | CURSOR_PROJECTS_DIR |
| Default path | ~/Library/Application Support/Code/User/ | ~/.cursor/projects |

Cursor stores transcripts as plain text files or Anthropic API JSONL, making them simpler to parse but containing less structured metadata. VSCode Copilot's structured JSON/JSONL format preserves richer data (tool invocation details, timing info, model IDs) but requires more complex reconstruction logic.

Session Inventory (User's Machine)

| Edition | JSON | JSONL | Total |
|---|---|---|---|
| Code (stable) | 959 | 71 | ~1030 |
| Code - Insiders | 24 | 140 | ~164 |
| Total | 983 | 211 | ~1194 |

Before JSONL support, only 920 sessions were discoverable (JSON only).

Implementation Files

| File | Purpose |
|---|---|
| internal/parser/vscode_copilot.go | JSON + JSONL parser, tool normalization |
| internal/parser/vscode_copilot_test.go | Tests (tool extraction, mixed content, JSONL) |
| internal/parser/discovery.go | DiscoverVSCodeCopilotSessions() |
| internal/parser/taxonomy.go | NormalizeToolCategory() shared mapping |
| internal/parser/types.go | AgentVSCodeCopilot constant |
| internal/sync/engine.go | processVSCodeCopilot(), file watcher integration |
| frontend/src/lib/utils/agents.ts | vscode-copilot in KNOWN_AGENTS |
| frontend/src/lib/utils/content-parser.ts | Tool regex includes Other category |

Key Design Decisions

  1. Parse files, not state.vscdb -- the SQLite database is volatile and undocumented. The JSON/JSONL files are the canonical source.
  2. JSONL reconstruction -- replay mutations into json.RawMessage, then unmarshal into the same structs used for flat JSON.
  3. Deduplication -- .jsonl wins over .json for the same UUID.
  4. Tool normalization -- two-step mapping (raw ID -> generic name -> category) using the same taxonomy as all other agents.
  5. Content format -- tool markers always included in message content so the frontend's regex-based parser can detect them.
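For illustration, the detection side can be approximated in Go with a regular expression over the marker format. The pattern below is an assumption, not the regex in content-parser.ts. It includes Other in the category list so that unmapped tools are still picked up.

```go
package main

import "regexp"

// markerRE is an assumed approximation of the frontend's marker pattern:
// a known category, ": ", then a tool ID up to the closing bracket.
var markerRE = regexp.MustCompile(`\[(Read|Write|Edit|Bash|Grep|Glob|Task|Other): ([^\]\n]+)\]`)

// findMarkers returns (category, toolId) pairs in order of appearance.
func findMarkers(content string) [][2]string {
	var out [][2]string
	for _, m := range markerRE.FindAllStringSubmatch(content, -1) {
		out = append(out, [2]string{m[1], m[2]})
	}
	return out
}
```

Bracketed text that does not start with a known category (for example "[see above]") is not treated as a marker.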