Date: 2026-02-26
Pipeline: Two-shot (battery + discovery + judge) — 3 LLM calls per entity
Branch: feat/two-shot
Each entity gets 3 LLM calls:
- Battery — system prompt (~800 tokens) + entity JSON (~500-2000 tokens) → response (~500-1500 tokens)
- Discovery — system prompt (~1000 tokens, includes existing pairs) + entity JSON → response (~500-1500 tokens)
- Judge — system prompt (~400 tokens) + all pairs + source data → response (~200-500 tokens)
Token estimates are based on our 2-entity-per-domain validation runs (83 total pairs across 10 entities). Actual costs will vary with entity data richness — allocations with long abstracts cost more per entity than affinity groups with thin data.
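The per-entity flow described above can be sketched as follows. This is an illustrative skeleton, not the actual pipeline code: the function names, prompt constants, and the stubbed `call_llm` responses are all assumptions.

```python
import json

# Illustrative prompt constants; the real system prompts are roughly
# ~800 (battery), ~1000 (discovery), and ~400 (judge) tokens.
BATTERY_PROMPT = "Answer the fixed question battery for this entity."
DISCOVERY_PROMPT = "Propose additional pairs not already covered."
JUDGE_PROMPT = "Score each pair for citation validity and confidence."

def call_llm(system_prompt: str, user_payload: str) -> str:
    """Stub standing in for the real LLM client; returns canned JSON."""
    return json.dumps({"pairs": [], "scores": []})

def process_entity(entity: dict, existing_pairs: list) -> dict:
    payload = json.dumps(entity)  # entity JSON, ~500-2000 tokens in practice

    # Call 1: battery extraction over a fixed question set.
    battery = json.loads(call_llm(BATTERY_PROMPT, payload))

    # Call 2: open-ended discovery; the prompt includes existing pairs
    # so the model does not duplicate them.
    discovery_prompt = DISCOVERY_PROMPT + "\n" + json.dumps(existing_pairs)
    discovery = json.loads(call_llm(discovery_prompt, payload))

    pairs = battery["pairs"] + discovery["pairs"]

    # Call 3: judge scores all pairs against the source data.
    judge_payload = json.dumps({"pairs": pairs, "source": entity})
    judge = json.loads(call_llm(JUDGE_PROMPT, judge_payload))

    return {"pairs": pairs, "judge": judge}
```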
gpt-4o-mini pricing (as of Feb 2026): $0.15/1M input tokens, $0.60/1M output tokens
| Domain | Entities | Est. Pairs | Input Tokens | Output Tokens | Est. Cost |
|---|---|---|---|---|---|
| Compute Resources | 23 | ~345 | ~230K | ~92K | ~$0.09 |
| Software Discovery | 1,404 | ~19,000 | ~14M | ~5.6M | ~$5.46 |
| Affinity Groups | 55 | ~670 | ~440K | ~165K | ~$0.17 |
| Allocations | 5,440 | ~79,000 | ~54M | ~22M | ~$21.30 |
| NSF Awards | 10,000+ | ~145,000 | ~100M | ~40M | ~$39.00 |
| Total | ~17K | ~244K | ~169M | ~68M | ~$66 |
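The cost column follows directly from the token columns and the gpt-4o-mini pricing above. A minimal check of the arithmetic:

```python
# Recompute the cost column from the token estimates at gpt-4o-mini pricing.
PRICE_IN, PRICE_OUT = 0.15, 0.60  # $ per 1M tokens

domains = {
    # domain: (input_tokens, output_tokens)
    "compute-resources":  (0.23e6, 0.092e6),
    "software-discovery": (14e6,   5.6e6),
    "affinity-groups":    (0.44e6, 0.165e6),
    "allocations":        (54e6,   22e6),
    "nsf-awards":         (100e6,  40e6),
}

def cost(tokens_in: float, tokens_out: float) -> float:
    return tokens_in / 1e6 * PRICE_IN + tokens_out / 1e6 * PRICE_OUT

total = sum(cost(i, o) for i, o in domains.values())
print(f"${total:.2f}")  # → $66.01
```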
- Pairs-per-entity estimate: Based on validation runs averaging ~8 pairs/entity for data-rich domains (compute, software, allocations, nsf-awards) and ~5 for thin domains (affinity-groups). Battery produces 4-7, discovery adds 2-5.
- Token-per-entity estimate: ~10K input tokens (across 3 calls) and ~4K output tokens. Allocations and NSF entities with long abstracts skew higher.
- This is cheaper than the earlier estimate (~$203), which incorrectly doubled the single-pass cost. The two extraction calls share the same entity JSON payload, and the judge call is small.
| Model | Input $/1M | Output $/1M | Est. Full-Run Cost | Notes |
|---|---|---|---|---|
| gpt-4o-mini | $0.15 | $0.60 | ~$66 | Default. Good quality for extraction + judge. |
| claude-haiku | $0.25 | $1.25 | ~$127 | Slightly more expensive. Good alternative. |
| gpt-4o | $2.50 | $10.00 | ~$1,100 | Overkill for most entities. |
| claude-sonnet | $3.00 | $15.00 | ~$1,500 | Overkill for most entities. |
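The same arithmetic applied at each model's listed pricing reproduces the full-run column, using the ~169M input / ~68M output totals from the domain table:

```python
TOKENS_IN, TOKENS_OUT = 169e6, 68e6  # full-run totals from the domain table

models = {
    # model: ($/1M input, $/1M output)
    "gpt-4o-mini":   (0.15, 0.60),
    "claude-haiku":  (0.25, 1.25),
    "gpt-4o":        (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
}

costs = {name: TOKENS_IN / 1e6 * p_in + TOKENS_OUT / 1e6 * p_out
         for name, (p_in, p_out) in models.items()}

for name, c in costs.items():
    print(f"{name}: ~${c:,.0f}")
```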
The judge call can run on a cheaper model than extraction, since it only scores pairs rather than generating them. With gpt-4o-mini in every role, the cost splits as:
| Role | Model | Cost Share |
|---|---|---|
| Battery + Discovery | gpt-4o-mini | ~$55 |
| Judge | gpt-4o-mini | ~$11 |
| Total | — | ~$66 |
Or pair a stronger extraction model with a cheap judge:
| Role | Model | Cost Share |
|---|---|---|
| Battery + Discovery | gpt-4o | ~$920 |
| Judge | gpt-4o-mini | ~$11 |
| Total | — | ~$931 |
With --incremental, unchanged entities are skipped entirely (hash-based change detection). Only entities whose upstream data changed get re-extracted. In practice:
- First run: Full cost (~$66 with gpt-4o-mini)
- Subsequent runs: Cost proportional to % of entities that changed. If 5% of entities change, cost is ~$3.30.
- Cache stores: entity hash + all pairs + judge scores. No LLM calls needed for cache hits.
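A minimal sketch of the hash-based change detection, assuming a canonical-JSON hash of each entity's upstream data; the cache layout and field names are assumptions, not the actual --incremental implementation:

```python
import hashlib
import json

def entity_hash(entity: dict) -> str:
    """Stable hash of an entity's upstream data (sorted keys for determinism)."""
    canonical = json.dumps(entity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def process_incremental(entities, cache, extract):
    """Re-extract only entities whose hash changed; serve the rest from cache."""
    results = {}
    for ent in entities:
        h = entity_hash(ent)
        hit = cache.get(ent["id"])
        if hit is not None and hit["hash"] == h:
            results[ent["id"]] = hit  # cache hit: zero LLM calls
        else:
            pairs, scores = extract(ent)  # 3 LLM calls (battery, discovery, judge)
            cache[ent["id"]] = {"hash": h, "pairs": pairs, "judge_scores": scores}
            results[ent["id"]] = cache[ent["id"]]
    return results
```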
Actual numbers from a 2-entity-per-domain run with gpt-4o-mini:
| Domain | Entities | Pairs | Avg Pairs/Entity | Avg Judge Confidence |
|---|---|---|---|---|
| compute-resources | 2 | 17 | 8.5 | 0.93 |
| software-discovery | 2 | 18 | 9.0 | 0.90 |
| allocations | 2 | 19 | 9.5 | 0.90 |
| nsf-awards | 2 | 18 | 9.0 | 0.88 |
| affinity-groups | 2 | 8 | 4.0 | 0.95 |
| comparisons | — | 3 | — | — |
| Total | 10 | 83 | 8.3 | 0.91 |
Citation validity was 100%, and every pair was scored suggested_decision: "approved".