@bacalj
Created February 26, 2026 19:44
ACCESS-CI Q&A extraction pipeline — full-run cost estimates (two-shot + judge, Feb 2026)

# ACCESS-CI Q&A Extraction — Full-Run Cost Estimates

**Date:** 2026-02-26
**Pipeline:** Two-shot (battery + discovery + judge) — 3 LLM calls per entity
**Branch:** `feat/two-shot`

## How costs are calculated

Each entity gets 3 LLM calls:

  1. Battery — system prompt (~800 tokens) + entity JSON (~500-2000 tokens) → response (~500-1500 tokens)
  2. Discovery — system prompt (~1000 tokens, includes existing pairs) + entity JSON → response (~500-1500 tokens)
  3. Judge — system prompt (~400 tokens) + all pairs + source data → response (~200-500 tokens)

Token estimates are based on our 2-entity-per-domain validation runs (83 total pairs across 10 entities). Actual costs will vary with entity data richness — allocations with long abstracts cost more per entity than affinity groups with thin data.
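The arithmetic behind these estimates can be sketched as a small cost model. This is a back-of-envelope sketch, not pipeline code; it uses the aggregate ~10K input / ~4K output tokens per entity from the notes below and the gpt-4o-mini pricing ($0.15/1M input, $0.60/1M output):

```python
# Back-of-envelope cost model: ~10K input and ~4K output tokens per
# entity across the battery, discovery, and judge calls, priced at
# gpt-4o-mini rates. All numbers are estimates, not measurements.

INPUT_PER_M = 0.15   # $ per 1M input tokens
OUTPUT_PER_M = 0.60  # $ per 1M output tokens

def entity_cost(input_tokens=10_000, output_tokens=4_000):
    """Estimated dollar cost of one entity (all 3 LLM calls)."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

def full_run_cost(n_entities):
    """Estimated dollar cost of a full run over n_entities."""
    return n_entities * entity_cost()

print(f"per entity: ${entity_cost():.4f}")             # ~$0.0039
print(f"~17K entities: ${full_run_cost(17_000):.0f}")  # ~$66
```

At ~$0.004 per entity, the ~17K-entity total lines up with the ~$66 full-run figure below.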

## Full-run estimate (gpt-4o-mini)

gpt-4o-mini pricing (as of Feb 2026): $0.15/1M input tokens, $0.60/1M output tokens

| Domain | Entities | Est. Pairs | Input Tokens | Output Tokens | Est. Cost |
|---|---|---|---|---|---|
| Compute Resources | 23 | ~345 | ~230K | ~92K | ~$0.09 |
| Software Discovery | 1,404 | ~19,000 | ~14M | ~5.6M | ~$5.46 |
| Affinity Groups | 55 | ~670 | ~440K | ~165K | ~$0.17 |
| Allocations | 5,440 | ~79,000 | ~54M | ~22M | ~$21.30 |
| NSF Awards | 10,000+ | ~145,000 | ~100M | ~40M | ~$39.00 |
| **Total** | **~17K** | **~244K** | **~169M** | **~68M** | **~$66** |

## Notes on this estimate

- **Pairs-per-entity estimate:** Based on validation runs averaging ~8 pairs/entity for data-rich domains (compute, software, allocations, nsf-awards) and ~5 for thin domains (affinity-groups). Battery produces 4-7; discovery adds 2-5.
- **Token-per-entity estimate:** ~10K input tokens (across 3 calls) and ~4K output tokens. Allocations and NSF entities with long abstracts skew higher.
- This is cheaper than the earlier estimate (~$203), which incorrectly doubled the single-pass cost. The two-shot calls share the same entity JSON payload, and the judge call is small.

## Alternative model pricing

| Model | Input $/1M | Output $/1M | Est. Full-Run Cost | Notes |
|---|---|---|---|---|
| gpt-4o-mini | $0.15 | $0.60 | ~$66 | Default. Good quality for extraction + judge. |
| claude-haiku | $0.25 | $1.25 | ~$127 | Slightly more expensive. Good alternative. |
| gpt-4o | $2.50 | $10.00 | ~$1,100 | Overkill for most entities. |
| claude-sonnet | $3.00 | $15.00 | ~$1,500 | Overkill for most entities. |
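These per-model figures all fall out of one formula applied to the estimated run totals (~169M input, ~68M output tokens). A minimal sketch, assuming those token totals hold regardless of model:

```python
# Full-run cost at each model's pricing, using the estimated run
# totals of ~169M input and ~68M output tokens. The token totals are
# estimates carried over from the table above, not measurements.

TOKENS_M = (169, 68)  # (input, output), in millions of tokens

MODELS = {            # $ per 1M (input, output) tokens, Feb 2026
    "gpt-4o-mini":   (0.15, 0.60),
    "claude-haiku":  (0.25, 1.25),
    "gpt-4o":        (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
}

def run_cost(in_price, out_price, tokens=TOKENS_M):
    """Dollar cost of the run: millions of tokens times $/1M rates."""
    in_m, out_m = tokens
    return in_m * in_price + out_m * out_price

for model, (inp, out) in MODELS.items():
    print(f"{model:14s} ~${run_cost(inp, out):,.0f}")
```

Output tokens dominate for the cheap models (68M × $0.60 = $40.80 of the ~$66 gpt-4o-mini total), which is why output-heavy prompt changes move the estimate more than input-side ones.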

## Mixed-model strategy (recommended)

Use a cheap model for the judge (it's just scoring, not generating). The default split with gpt-4o-mini throughout:

| Role | Model | Cost Share |
|---|---|---|
| Battery + Discovery | gpt-4o-mini | ~$55 |
| Judge | gpt-4o-mini | ~$11 |
| **Total** | | **~$66** |

Or use a stronger model for extraction with a cheap judge:

| Role | Model | Cost Share |
|---|---|---|
| Battery + Discovery | gpt-4o | ~$920 |
| Judge | gpt-4o-mini | ~$11 |
| **Total** | | **~$931** |

## Incremental re-runs

With `--incremental`, unchanged entities are skipped entirely (hash-based change detection); only entities whose upstream data changed get re-extracted. In practice:

- **First run:** full cost (~$66 with gpt-4o-mini).
- **Subsequent runs:** cost proportional to the % of entities that changed. If 5% of entities change, cost is ~$3.30.
- **Cache stores:** entity hash + all pairs + judge scores. No LLM calls needed for cache hits.
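The change-detection scheme above can be sketched as follows. This is an illustrative sketch, not the pipeline's actual cache code; the cache layout and the `pairs` field name are assumptions:

```python
# Minimal sketch of hash-based incremental re-extraction: hash the
# canonical entity JSON, skip LLM calls when the hash matches the
# cached one. Cache layout here is illustrative, not the real format.

import hashlib
import json

def entity_hash(entity: dict) -> str:
    """Stable hash of the upstream entity JSON."""
    canonical = json.dumps(entity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def extract_incremental(entities, cache, extract_fn):
    """Re-extract only changed entities; reuse cache hits as-is."""
    results = {}
    for entity_id, entity in entities.items():
        h = entity_hash(entity)
        hit = cache.get(entity_id)
        if hit and hit["hash"] == h:
            results[entity_id] = hit          # cache hit: no LLM calls
        else:
            pairs = extract_fn(entity)        # battery + discovery + judge
            results[entity_id] = cache[entity_id] = {"hash": h, "pairs": pairs}
    return results
```

Hashing the serialized JSON (with sorted keys, so key order can't cause spurious misses) is what makes a 5%-changed re-run cost ~5% of the full run.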

## Validation run data (2026-02-24)

Actual numbers from a 2-entity-per-domain run with gpt-4o-mini:

| Domain | Entities | Pairs | Avg Pairs/Entity | Avg Judge Confidence |
|---|---|---|---|---|
| compute-resources | 2 | 17 | 8.5 | 0.93 |
| software-discovery | 2 | 18 | 9.0 | 0.90 |
| allocations | 2 | 19 | 9.5 | 0.90 |
| nsf-awards | 2 | 18 | 9.0 | 0.88 |
| affinity-groups | 2 | 8 | 4.0 | 0.95 |
| comparisons | | 3 | | |
| **Total** | **10** | **83** | **8.3** | **0.91** |

100% citation validity. All pairs scored `suggested_decision: "approved"`.
