
@andyed
Last active March 3, 2026 03:14
Claude Code hooks: auto-log research URLs and session milestones for history viewers

Claude Code Session Hooks

Hooks that log session milestones and research URLs from Claude Code conversations. Designed as a data source for tools like claude-code-history-viewer — adding structured event data alongside the raw JSONL transcripts.

What gets logged

Two log files, both JSONL (one JSON object per line):

| Log file | What it captures | Events |
|---|---|---|
| research-log.jsonl | Every URL fetched or searched during sessions | PostToolUse on WebFetch, WebSearch |
| session-milestones.jsonl | Key moments within sessions | PreCompact, SubagentStop |
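Because every line is an independent JSON object, the logs can be checked or processed line by line. A quick validity check (shown on a throwaway sample file; point it at the real logs):

```shell
# Each line of a JSONL file must parse on its own; `jq empty` exits
# non-zero at the first malformed line.
printf '%s\n' '{"ok":1}' '{"ok":2}' > /tmp/sample.jsonl
jq empty /tmp/sample.jsonl && echo "all lines parse"
```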

Why these events?

Research log — Claude Code sessions often involve deep literature review via WebFetch and WebSearch. This content lives only in conversation context and vanishes when the session ends. The hook captures every URL with metadata for later retrieval.

Session milestones — The session transcript JSONL already records every message, but some moments are structurally significant:

  • PreCompact — The context window is full and about to be compressed. This is the point of peak information density in a session. A viewer can use this as a "bookmark here" marker.
  • SubagentStop (filtered to Explore, Plan, general-purpose) — A deep research or planning agent just completed, often involving 20-100+ tool calls. Marks when substantial autonomous work finished.

Setup

Prerequisites

  • Claude Code CLI installed
  • jq available on PATH (brew install jq on macOS)
  • Python 3 available (used for URL encoding in the milestone hook)

1. Create the hook scripts

Place these two files in ~/.claude/hooks/ and make them executable:

mkdir -p ~/.claude/hooks
# Copy the scripts (see Hook Scripts section below)
chmod +x ~/.claude/hooks/log-research.sh
chmod +x ~/.claude/hooks/log-session-milestones.sh
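Both scripts follow the same pattern: Claude Code delivers one JSON object on stdin, and the script pulls fields out with jq. A minimal stand-in payload illustrates the idea (field names mirror what the scripts below read; the live schema carries more keys):

```shell
# Simulate the JSON a hook receives on stdin and parse it the way the
# scripts do. This payload is a hand-written stand-in, not a real event.
INPUT='{"tool_name":"WebFetch","session_id":"abc123","cwd":"/tmp/myproject","tool_input":{"url":"https://example.com"}}'
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name // empty')
CWD=$(echo "$INPUT" | jq -r '.cwd // empty')
PROJECT=$(basename "$CWD")
echo "$TOOL_NAME in $PROJECT"   # WebFetch in myproject
```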

2. Add hooks to settings

Edit ~/.claude/settings.json and add a hooks key (or merge into existing hooks):

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "WebFetch|WebSearch",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/log-research.sh",
            "async": true
          }
        ]
      }
    ],
    "PreCompact": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/log-session-milestones.sh",
            "async": true
          }
        ]
      }
    ],
    "SubagentStop": [
      {
        "matcher": "Explore|Plan|general-purpose",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/log-session-milestones.sh",
            "async": true
          }
        ]
      }
    ]
  }
}
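If settings.json already has content, one way to merge the fragment is with jq (sketch only; the "model" key stands in for whatever your existing settings contain). Note that jq's `*` operator merges objects recursively but *replaces* arrays, so if you already have hooks registered for the same event, combine those arrays by hand instead:

```shell
# Sketch: merge a hooks fragment into an existing settings.json with jq.
# Inspect the output before overwriting the real file.
cat > /tmp/settings.json <<'EOF'
{"model": "opus"}
EOF
cat > /tmp/hooks-fragment.json <<'EOF'
{"hooks": {"PreCompact": [{"hooks": [{"type": "command", "command": "~/.claude/hooks/log-session-milestones.sh", "async": true}]}]}}
EOF
jq -s '.[0] * .[1]' /tmp/settings.json /tmp/hooks-fragment.json
```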

3. Choose log file locations

By default, both scripts write to ~/Documents/dev/. Edit the LOG_FILE variable at the top of each script to change the destination:

# In log-research.sh
LOG_FILE="$HOME/Documents/dev/research-log.jsonl"

# In log-session-milestones.sh
LOG_FILE="$HOME/Documents/dev/session-milestones.jsonl"

4. Restart Claude Code

Hooks are snapshotted at session startup. Start a new session for changes to take effect.

5. Verify

After your first WebFetch or WebSearch, check:

tail -3 ~/Documents/dev/research-log.jsonl | jq .

After a context compaction or subagent completion:

tail -3 ~/Documents/dev/session-milestones.jsonl | jq .

Log schemas

research-log.jsonl

Three entry types: fetch, search, and search_result.

fetch — a URL was loaded via WebFetch

{
  "timestamp": "2026-03-02T18:42:15Z",
  "type": "fetch",
  "url": "https://arxiv.org/abs/2602.03766",
  "prompt": "Extract the key findings about cortical magnification",
  "category": "research",
  "project": "scrutinizer-www",
  "session": "86fc3c77-c2b2-4f17-ac74-2b39751238ef",
  "transcript_path": "/Users/you/.claude/projects/-Users-you-myproject/86fc3c77-....jsonl"
}

search — a WebSearch query was issued

{
  "timestamp": "2026-03-02T18:40:02Z",
  "type": "search",
  "query": "CLAUDE.md best practices tips effective 2026",
  "project": "psychodeli-webgl-port",
  "session": "86fc3c77-c2b2-4f17-ac74-2b39751238ef",
  "transcript_path": "/Users/you/.claude/projects/.../.jsonl"
}

search_result — an individual URL from WebSearch results

{
  "timestamp": "2026-03-02T18:40:02Z",
  "type": "search_result",
  "url": "https://www.humanlayer.dev/blog/writing-a-good-claude-md",
  "title": "Writing a good CLAUDE.md | HumanLayer Blog",
  "query": "CLAUDE.md best practices tips effective 2026",
  "category": "blog",
  "project": "psychodeli-webgl-port",
  "session": "86fc3c77-c2b2-4f17-ac74-2b39751238ef",
  "transcript_path": "/Users/you/.claude/projects/.../.jsonl"
}

Field reference

| Field | Type | Present in | Description |
|---|---|---|---|
| timestamp | ISO 8601 | all | UTC time of the event |
| type | string | all | "fetch", "search", or "search_result" |
| url | string | fetch, search_result | The URL fetched or found in results |
| prompt | string | fetch | The prompt sent to WebFetch (what to extract from the page) |
| query | string | search, search_result | The search query string |
| title | string | search_result | Page title from search results (best-effort) |
| category | string | fetch, search_result | Auto-detected from URL domain (see below) |
| project | string | all | Basename of the working directory |
| session | string | all | Claude Code session ID |
| transcript_path | string | all | Absolute path to the session's JSONL transcript file |

Categories

Assigned automatically by URL domain pattern matching:

| Category | Matching domains |
|---|---|
| research | arxiv.org, pubmed, pmc.ncbi, jov.arvojournals, biorxiv.org, semanticscholar, springer.com/article, elifesciences.org, researchgate.net/publication, dspace.mit.edu, sciencedirect.com |
| docs | github.com, docs., developer., mdn.*, readthedocs, deepwiki.com |
| blog | medium.com, dev.to, substack.com, *.blog, wordpress |
| news | sciencedaily, arstechnica, theverge, wired.com, techcrunch, news.ycombinator |
| reference | wikipedia.org, stackoverflow.com, stackexchange.com |
| other | Everything else |

The categorization lives in the categorize_url() function in log-research.sh. Edit it to match your domains.
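An easy way to iterate on your own patterns is to test the function standalone before touching the hook. This abridged copy keeps only a few of the patterns listed above:

```shell
# Trimmed copy of categorize_url() from log-research.sh for quick testing.
# Add or change patterns, then run it against representative URLs.
categorize_url() {
  local lower
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    *arxiv.org*|*pubmed*|*biorxiv.org*|*sciencedirect.com*) echo "research" ;;
    *github.com*|*docs.*|*readthedocs*)                     echo "docs" ;;
    *medium.com*|*dev.to*|*substack.com*)                   echo "blog" ;;
    *wikipedia.org*|*stackoverflow.com*)                    echo "reference" ;;
    *)                                                      echo "other" ;;
  esac
}

categorize_url "https://arxiv.org/abs/2602.03766"   # research
categorize_url "https://en.wikipedia.org/wiki/JSON" # reference
```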

session-milestones.jsonl

compaction_auto / compaction_manual — context window full

{
  "timestamp": "2026-03-02T20:15:33Z",
  "milestone": "compaction_auto",
  "description": "Context compaction (auto) — session at peak density",
  "session_id": "86fc3c77-c2b2-4f17-ac74-2b39751238ef",
  "transcript_path": "/Users/you/.claude/projects/-Users-you-myproject/86fc3c77-....jsonl",
  "deeplink": "claude-history://session/%2FUsers%2Fyou%2F.claude%2Fprojects%2F...%2F86fc3c77-....jsonl",
  "project": "psychodeli-webgl-port",
  "event": "PreCompact"
}

agent_Explore / agent_Plan / agent_general-purpose — subagent completed

{
  "timestamp": "2026-03-02T19:48:12Z",
  "milestone": "agent_Explore",
  "description": "Explore agent completed",
  "session_id": "86fc3c77-c2b2-4f17-ac74-2b39751238ef",
  "transcript_path": "/Users/you/.claude/projects/-Users-you-myproject/86fc3c77-....jsonl",
  "deeplink": "claude-history://session/%2FUsers%2Fyou%2F.claude%2Fprojects%2F...%2F86fc3c77-....jsonl",
  "project": "scrutinizer-www",
  "event": "SubagentStop"
}

Field reference

| Field | Type | Description |
|---|---|---|
| timestamp | ISO 8601 | UTC time of the milestone |
| milestone | string | Machine-readable milestone type (e.g. compaction_auto, agent_Explore) |
| description | string | Human-readable description |
| session_id | string | Claude Code session ID |
| transcript_path | string | Absolute path to the session's JSONL transcript |
| deeplink | string | Proposed claude-history:// protocol URL (not yet implemented) |
| project | string | Basename of the working directory |
| event | string | The Claude Code hook event that fired (PreCompact, SubagentStop) |

Querying the logs

# Research: all unique fetched URLs, tagged with category
jq -r 'select(.type=="fetch") | "\(.category)\t\(.url)"' research-log.jsonl | sort -u

# Research: group by category
jq -r 'select(.url) | .category' research-log.jsonl | sort | uniq -c | sort -rn

# Research: everything from a specific session
jq 'select(.session=="86fc3c77-c2b2-4f17-ac74-2b39751238ef")' research-log.jsonl

# Milestones: all compactions (sessions that hit context limits)
jq 'select(.milestone | startswith("compaction"))' session-milestones.jsonl

# Milestones: sessions with the most subagent completions
jq -r 'select(.milestone | startswith("agent")) | .session_id' session-milestones.jsonl | sort | uniq -c | sort -rn

# Cross-reference: what was being researched when context filled up?
SESS=$(jq -r 'select(.milestone=="compaction_auto") | .session_id' session-milestones.jsonl | tail -1)
jq --arg s "$SESS" 'select(.session==$s and .type=="fetch")' research-log.jsonl
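Because ISO 8601 timestamps sort lexically, the two logs can also be merged into one chronological stream. A sketch, demonstrated on two inline sample entries (point `cat` at the real log files):

```shell
# Sketch: view research activity and milestones as one timeline.
cat > /tmp/research-sample.jsonl <<'EOF'
{"timestamp":"2026-03-02T18:42:15Z","type":"fetch","url":"https://arxiv.org/abs/2602.03766"}
EOF
cat > /tmp/milestones-sample.jsonl <<'EOF'
{"timestamp":"2026-03-02T20:15:33Z","milestone":"compaction_auto","description":"Context compaction (auto)"}
EOF
cat /tmp/research-sample.jsonl /tmp/milestones-sample.jsonl \
  | jq -s -c 'sort_by(.timestamp)[] | {timestamp, kind: (.milestone // .type), detail: (.url // .description)}'
```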

Deep linking (future)

The transcript_path field in both logs points to the session's JSONL file:

~/.claude/projects/{encoded-project-path}/{session-uuid}.jsonl

The deeplink field in milestones proposes a claude-history:// URL scheme. Once a viewer implements protocol handling, these become clickable links from any tool that reads the logs. The session UUID in the filename is the primary key.
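A consumer of the logs would reverse the encoding to recover the transcript path and session UUID. A minimal sketch, assuming the scheme simply wraps the percent-encoded path (the example path below is hypothetical):

```shell
# Sketch: decode a deeplink back into a transcript path and session UUID.
deeplink="claude-history://session/%2FUsers%2Fyou%2F.claude%2Fprojects%2F-Users-you-myproject%2F86fc3c77-c2b2-4f17-ac74-2b39751238ef.jsonl"

encoded="${deeplink#claude-history://session/}"
path=$(printf '%s' "$encoded" | python3 -c "import sys, urllib.parse; print(urllib.parse.unquote(sys.stdin.read().strip()))")
uuid=$(basename "$path" .jsonl)

echo "$path"
echo "$uuid"   # 86fc3c77-c2b2-4f17-ac74-2b39751238ef
```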

Hook scripts

log-research.sh

Source: ~/.claude/hooks/log-research.sh


#!/bin/bash
# PostToolUse hook: logs WebFetch and WebSearch to research-log.jsonl.
# Receives JSON on stdin with tool_input + tool_response (PostToolUse schema).
#
# Logs:
#   WebFetch  → one "fetch" entry with URL, prompt, auto-categorized
#   WebSearch → one "search" entry (query) + one "search_result" per result URL
#
# Categories (auto-detected from URL domain):
#   research  — arxiv, pubmed, pmc, jov, biorxiv, semanticscholar, springer, elifesciences
#   docs      — github.com (repos/docs), MDN, official docs sites
#   blog      — medium, dev.to, substack, *.blog, wordpress
#   news      — news sites, sciencedaily, arstechnica, hackernews
#   reference — wikipedia, stackexchange, stackoverflow
#   other     — everything else
#
# Ingest: cd ~/Documents/dev/interests/interests2025 && node src/ingest_research_log.js --index

LOG_FILE="$HOME/Documents/dev/research-log.jsonl"

INPUT=$(cat)
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name // empty')
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // empty')
TRANSCRIPT=$(echo "$INPUT" | jq -r '.transcript_path // empty')
CWD=$(echo "$INPUT" | jq -r '.cwd // empty')
PROJECT=$(basename "$CWD")
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

# Auto-categorize URL by domain
categorize_url() {
  local url="$1"
  local lower
  lower=$(echo "$url" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    *arxiv.org*|*pubmed*|*pmc.ncbi*|*jov.arvojournals*|*biorxiv.org*|*semanticscholar*|*springer.com/article*|*elifesciences.org*|*researchgate.net/publication*|*dspace.mit.edu*|*cvrl.org*|*ncbi.nlm.nih.gov/books*|*sciencedirect.com*)
      echo "research" ;;
    *github.com*|*docs.*|*developer.*|*mdn.*|*readthedocs*|*deepwiki.com*)
      echo "docs" ;;
    *medium.com*|*dev.to*|*substack.com*|*.blog*|*wordpress*|*humanlayer.dev/blog*|*gend.co/blog*|*promptlayer.com*|*eesel.ai/blog*|*smartscope.blog*|*boxesandarrows.com*)
      echo "blog" ;;
    *news.*|*sciencedaily*|*arstechnica*|*theverge*|*wired.com*|*techcrunch*|*hackernews*|*news.ycombinator*)
      echo "news" ;;
    *wikipedia.org*|*stackoverflow.com*|*stackexchange.com*)
      echo "reference" ;;
    *)
      echo "other" ;;
  esac
}

if [ "$TOOL_NAME" = "WebFetch" ]; then
  URL=$(echo "$INPUT" | jq -r '.tool_input.url // empty')
  PROMPT=$(echo "$INPUT" | jq -r '.tool_input.prompt // empty')
  CATEGORY=$(categorize_url "$URL")
  jq -n -c \
    --arg ts "$TIMESTAMP" \
    --arg type "fetch" \
    --arg url "$URL" \
    --arg prompt "$PROMPT" \
    --arg category "$CATEGORY" \
    --arg project "$PROJECT" \
    --arg session "$SESSION_ID" \
    --arg transcript "$TRANSCRIPT" \
    '{timestamp: $ts, type: $type, url: $url, prompt: $prompt, category: $category, project: $project, session: $session, transcript_path: $transcript}' \
    >> "$LOG_FILE"
elif [ "$TOOL_NAME" = "WebSearch" ]; then
  QUERY=$(echo "$INPUT" | jq -r '.tool_input.query // empty')
  # Log the search query itself
  jq -n -c \
    --arg ts "$TIMESTAMP" \
    --arg type "search" \
    --arg query "$QUERY" \
    --arg project "$PROJECT" \
    --arg session "$SESSION_ID" \
    --arg transcript "$TRANSCRIPT" \
    '{timestamp: $ts, type: $type, query: $query, project: $project, session: $session, transcript_path: $transcript}' \
    >> "$LOG_FILE"
  # Extract result URLs from tool_response and log each as search_result.
  # tool_response format varies; try to extract URLs from the links array.
  echo "$INPUT" | jq -r '
    .tool_response // empty |
    if type == "string" then
      # Try to parse stringified JSON
      (try fromjson catch null)
    else
      .
    end |
    if type == "array" then .[]
    elif type == "object" then .links // .results // [] | .[]
    else empty
    end |
    .url // empty
  ' 2>/dev/null | while IFS= read -r RESULT_URL; do
    [ -z "$RESULT_URL" ] && continue
    CATEGORY=$(categorize_url "$RESULT_URL")
    # Extract title if available (best effort)
    TITLE=$(echo "$INPUT" | jq -r --arg url "$RESULT_URL" '
      .tool_response // "" |
      if type == "string" then (try fromjson catch null) else . end |
      if type == "array" then .[] elif type == "object" then .links // .results // [] | .[] else empty end |
      select(.url == $url) | .title // ""
    ' 2>/dev/null | head -1)
    jq -n -c \
      --arg ts "$TIMESTAMP" \
      --arg type "search_result" \
      --arg url "$RESULT_URL" \
      --arg title "$TITLE" \
      --arg query "$QUERY" \
      --arg category "$CATEGORY" \
      --arg project "$PROJECT" \
      --arg session "$SESSION_ID" \
      --arg transcript "$TRANSCRIPT" \
      '{timestamp: $ts, type: $type, url: $url, title: $title, query: $query, category: $category, project: $project, session: $session, transcript_path: $transcript}' \
      >> "$LOG_FILE"
  done
fi

# Always exit 0 — this is a passive logger, never blocks
exit 0

log-session-milestones.sh

Source: ~/.claude/hooks/log-session-milestones.sh

#!/bin/bash
# Lifecycle hook: logs session milestones with deep link info.
# Creates a timeline of "session bookmarks" for claude-code-history-viewer.
#
# Milestones logged:
#   - PreCompact (auto/manual) — context is full, about to lose detail
#   - SessionEnd — natural session close
#   - SubagentStop — research/explore agents completing work
#
# Output: ~/Documents/dev/session-milestones.jsonl
# Future: claude-history://session/{encoded-path}?t={timestamp}

LOG_FILE="$HOME/Documents/dev/session-milestones.jsonl"

INPUT=$(cat)
EVENT=$(echo "$INPUT" | jq -r '.hook_event_name // empty')
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // empty')
TRANSCRIPT=$(echo "$INPUT" | jq -r '.transcript_path // empty')
CWD=$(echo "$INPUT" | jq -r '.cwd // empty')
PROJECT=$(basename "$CWD")
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

# Derive a deep link URL (protocol TBD — log both raw path and proposed URL).
# Encode the transcript path for URL safety.
ENCODED_PATH=$(echo "$TRANSCRIPT" | python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.stdin.read().strip(), safe=''))" 2>/dev/null || echo "$TRANSCRIPT")

case "$EVENT" in
  PreCompact)
    TRIGGER=$(echo "$INPUT" | jq -r '.trigger // "unknown"')
    MILESTONE="compaction_${TRIGGER}"
    DESCRIPTION="Context compaction (${TRIGGER}) — session at peak density"
    ;;
  SessionEnd)
    REASON=$(echo "$INPUT" | jq -r '.reason // "unknown"')
    MILESTONE="session_end_${REASON}"
    DESCRIPTION="Session ended (${REASON})"
    ;;
  SubagentStop)
    AGENT_TYPE=$(echo "$INPUT" | jq -r '.agent_type // "unknown"')
    # Only log interesting agent types (Explore, Plan, general-purpose)
    case "$AGENT_TYPE" in
      Explore|Plan|general-purpose)
        MILESTONE="agent_${AGENT_TYPE}"
        DESCRIPTION="${AGENT_TYPE} agent completed"
        ;;
      *)
        exit 0  # Skip noisy agent types (haiku, etc.)
        ;;
    esac
    ;;
  *)
    exit 0
    ;;
esac

jq -n -c \
  --arg ts "$TIMESTAMP" \
  --arg milestone "$MILESTONE" \
  --arg description "$DESCRIPTION" \
  --arg session "$SESSION_ID" \
  --arg transcript "$TRANSCRIPT" \
  --arg deeplink "claude-history://session/${ENCODED_PATH}" \
  --arg project "$PROJECT" \
  --arg event "$EVENT" \
  '{timestamp: $ts, milestone: $milestone, description: $description, session_id: $session, transcript_path: $transcript, deeplink: $deeplink, project: $project, event: $event}' \
  >> "$LOG_FILE"

exit 0