Dialogue · Cost analysis

What each scheme actually cost to run.

Four runs of the same scene with the same characters. Two used the lean Python coordinator path against the API directly (006a on Anthropic Sonnet 4.6, 007 on OpenRouter / DeepSeek V4 Pro). Two were orchestrated inside a Claude Code (CC) session (006b = subagent scheme, 006c = monolithic); both ran on Opus 4.7, not the intended Sonnet 4.6, because the CC session's default model was Opus at the time. That leaves two confounders to disentangle: the model billed and the architecture doing the orchestration.

01 What we actually paid

Sources of truth: Anthropic console for 006a; OpenRouter console for 007. The two CC runs do not appear in the Anthropic console, so their token counts are reconstructed from the local CC session JSONL telemetry (parent session + all spawned Task sub-agents) and priced at Anthropic Opus list rates.

| Exp | Scheme | Model billed | Turns / char | Tokens (total) | Cost |
|---|---|---|---|---|---|
| 006a | python | claude-sonnet-4-6 | 15 / 14 | ~2.50 M | $0.80 |
| 006b | subagent | claude-opus-4-7 * | 7 / 7 | 26.45 M | $134.81 |
| 006c | monolithic | claude-opus-4-7 * | 23 / 23 | 4.43 M | $22.97 |
| 007 | python | deepseek/deepseek-v4-pro | 9 / 8 | ~0.97 M | $0.79 |

* 006b and 006c were intended to run on claude-sonnet-4-6 but the CC session was configured for Opus when the runs executed. Each notes.md records the discrepancy.

02 Normalised to Opus 4.7: strips the model confounder, keeps architecture

Assume all four runs had been billed at Anthropic Opus 4.7 list rates. 006a scales up by 5.00× (Opus is uniformly 5.00× Sonnet across input, cache, and output buckets). 007 is re-costed against its OpenRouter token mix, assuming the same cache_control discipline lib/run_dialogue.py uses.

| Exp | Scheme | Actual bill | Cost at Opus 4.7 | × vs 006a |
|---|---|---|---|---|
| 006a | python | $0.80 | $4.00 | 1.00× |
| 006b | subagent | $134.81 | $134.81 | 33.70× |
| 006c | monolithic | $22.97 | $22.97 | 5.74× |
| 007 | python | $0.79 | ~$5.65 | ~1.41× |

At a constant model, the two python-coordinator runs sit at a $4.00–$5.65 floor; the two CC-orchestrated runs sit 5.74× to 33.70× higher. The model isn't the source of the gap. The architecture is.
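The normalisation arithmetic behind the table can be sketched in a few lines. This is an illustrative recomputation, not the author's script: the bills and the 5.00× multiplier come from the text, the 007 figure is the re-costed estimate quoted above, and the variable names are made up for the example.

```python
# Actual bills (USD) and the model each run was billed on, from the tables above.
OPUS_OVER_SONNET = 5.00  # uniform multiplier stated in the text

actual = {
    "006a": ("sonnet", 0.80),
    "006b": ("opus", 134.81),
    "006c": ("opus", 22.97),
}

# 007 ran on DeepSeek, so it cannot simply be rescaled; the text's
# re-costed mid-point estimate is used as-is.
at_opus = {"007": 5.65}

for exp, (model, bill) in actual.items():
    # Sonnet bills scale up by 5.00x; Opus bills are already at Opus rates.
    at_opus[exp] = bill * OPUS_OVER_SONNET if model == "sonnet" else bill

baseline = at_opus["006a"]
ratios = {exp: cost / baseline for exp, cost in at_opus.items()}

for exp in sorted(at_opus):
    print(f"{exp}: ${at_opus[exp]:.2f} at Opus, {ratios[exp]:.2f}x vs 006a")
```

Dividing every normalised cost by the 006a figure is what isolates architecture: once all runs are priced on one rate card, any remaining spread comes from token volume and token mix alone.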

03 Normalised to Sonnet 4.6: what the study was supposed to cost

Same exercise at Sonnet rates instead of Opus. This is the figure that would have appeared on the bill if the CC sessions hadn't been mis-configured to Opus. Sonnet list: $3.00 input / $3.75 cache-write-5m / $6.00 cache-write-1h / $0.30 cache-read / $15.00 output, per M tokens.

| Exp | Scheme | Actual bill | Cost at Sonnet 4.6 | × vs 006a |
|---|---|---|---|---|
| 006a | python | $0.80 | $0.80 | 1.00× |
| 006b | subagent | $134.81 | $26.96 | 33.70× |
| 006c | monolithic | $22.97 | $4.59 | 5.74× |
| 007 | python | $0.79 | ~$1.13 | ~1.41× |

The architecture ratios are unchanged from the Opus-normalised view, as they must be: Opus is a uniform 5.00× multiplier across every bucket. What's new is the absolute scale: at Sonnet rates, even the worst-case architecture (subagent via CC) costs $26.96 per scene-run. That is expensive relative to the python path but not catastrophic if used selectively.
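A quick way to check the "uniform multiplier" claim is to compare the two rate cards bucket by bucket. The sketch below uses the Sonnet list rates quoted in this section and the Opus per-bucket rates listed in section 05; the dictionary keys and variable names are illustrative only.

```python
import math

# $/M tokens. Sonnet rates from this section; Opus rates from the section 05 table.
SONNET = {"input": 3.00, "cache_write_5m": 3.75, "cache_write_1h": 6.00,
          "cache_read": 0.30, "output": 15.00}
OPUS = {"input": 15.00, "cache_write_5m": 18.75, "cache_write_1h": 30.00,
        "cache_read": 1.50, "output": 75.00}

# If every bucket carries the same multiplier, a whole bill can be rescaled
# by that one number regardless of its token mix.
multipliers = {bucket: OPUS[bucket] / SONNET[bucket] for bucket in SONNET}
uniform = all(math.isclose(m, 5.0) for m in multipliers.values())

opus_bills = {"006b": 134.81, "006c": 22.97}
sonnet_equiv = {exp: bill / 5.0 for exp, bill in opus_bills.items()}

baseline_sonnet = 0.80  # 006a, already billed at Sonnet rates
for exp, cost in sonnet_equiv.items():
    print(f"{exp}: ${cost:.2f} at Sonnet, {cost / baseline_sonnet:.2f}x vs 006a")
```

Because the multiplier is the same for every bucket, it cancels when each run is divided by the 006a baseline, which is why the ratio column is identical in both normalised tables.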

04 The picture, at Sonnet rates

Log scale, because the four runs span from $0.80 to $26.96 — a 33.70× range.

[Figure: cost at Sonnet 4.6 (USD, log scale, axis from $0.10 to $100.00). Python coordinator: 006a python, Sonnet baseline, $0.80; 007 python, DeepSeek normalised to Sonnet, ~$1.13. CC orchestration: 006c monolithic, $4.59 (5.74×); 006b subagent, $26.96 (33.70×).]

The two architectural families are visually distinct on the log axis. The python coordinator runs cluster around $1.00; the CC-orchestrated runs cluster an order of magnitude above. Model choice (Sonnet vs DeepSeek vs Opus) moves cost by 1.41× to 5.00×. Architecture moves cost by 5.74× (monolithic) to 33.70× (subagent).

05 Where the money went (CC runs)

Per-bucket decomposition at the actually-billed Opus rates. Both CC runs show the same shape: a small fraction is the actor work; most of the cost is the parent CC session doing tool-loop bookkeeping (reading docs, composing per-turn briefs, narrating routing decisions).

| Bucket | 006b tokens | 006b cost | 006c tokens | 006c cost | Rate ($/M) |
|---|---|---|---|---|---|
| Input (fresh) | <0.01 M | $0.00 | <0.01 M | $0.00 | $15.00 |
| Cache write, 5-min TTL | 0.53 M | $9.86 | 0.29 M | $5.35 | $18.75 |
| Cache write, 1-hour TTL | 1.27 M | $38.17 | 0.23 M | $6.78 | $30.00 |
| Cache read | 23.97 M | $35.96 | 3.85 M | $5.77 | $1.50 |
| Output | 0.68 M | $50.82 | 0.07 M | $5.07 | $75.00 |
| Total | 26.45 M | $134.81 | 4.43 M | $22.97 | |

006b output spend: $50.82
Opus output narrating routing decisions in the parent CC session. On its own that is 63.53× the entire 006a bill.

006b cache reads: 23.97 M tokens
The growing conversation context, re-read every turn. The per-turn loop compounds.

006c monolithic premium: 28.71× vs 006a (as billed, Opus against Sonnet; 5.74× at a constant model)
Even without the routing loop, the CC session's playbook reads + parse step cost an order of magnitude over a lean coordinator on the actual bills.
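To see the shape of the spend rather than the raw dollars, each bucket can be expressed as a share of its run's bill. The sketch below uses only the cost columns from the table above; the bucket keys are illustrative names, and fresh input is omitted because it rounds to $0.00 in both runs.

```python
# Per-bucket cost (USD) at the actually-billed Opus rates, from the table above.
costs = {
    "006b": {"cache_write_5m": 9.86, "cache_write_1h": 38.17,
             "cache_read": 35.96, "output": 50.82},
    "006c": {"cache_write_5m": 5.35, "cache_write_1h": 6.78,
             "cache_read": 5.77, "output": 5.07},
}
billed = {"006b": 134.81, "006c": 22.97}

shares = {}
for exp, buckets in costs.items():
    total = sum(buckets.values())
    # The four listed buckets should add up to the run's total bill.
    assert abs(total - billed[exp]) < 0.01
    shares[exp] = {name: cost / total for name, cost in buckets.items()}
    for name, share in shares[exp].items():
        print(f"{exp} {name:<15} {share:6.1%}")
```

On these figures, output is the largest single bucket for 006b at roughly 38% of spend, while 006c is spread fairly evenly, with each bucket between about 22% and 30%.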

06 Practical takeaways

Methodology — how these figures were obtained

006a and 007: token counts and costs are pulled from the API providers' own console logs (Anthropic console for 006a, OpenRouter console for 007). Bills reported by the user.

006b and 006c: the Anthropic console has no record of these runs — they were executed by a Claude Code session, which records every API call's usage object (input_tokens, cache_creation, cache_read, output_tokens) in the session's local JSONL transcript at ~/.claude/projects/<project>/<session>.jsonl for parent calls and ~/.claude/projects/<project>/<session>/subagents/agent-*.jsonl for spawned Task sub-agents. Totals are summed across the parent + all sub-agents for each experiment session. Costs apply Anthropic Opus 4.7 list rates.
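The summation step could look roughly like the sketch below. It is not the script used for this study. The record layout (a `message` object holding a `usage` object) and the exact counter names are assumptions modelled on the Anthropic API usage object, and `sum_usage` and `session_files` are hypothetical helper names; check all of them against a real transcript before trusting the totals.

```python
import json
from collections import Counter
from pathlib import Path

# ASSUMPTION: counter names follow the Anthropic API usage object.
USAGE_FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)

def sum_usage(paths):
    """Sum usage counters over every JSONL record that carries a usage object."""
    totals = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                # ASSUMPTION: usage lives at record["message"]["usage"].
                message = record.get("message")
                usage = message.get("usage") if isinstance(message, dict) else None
                if not isinstance(usage, dict):
                    continue  # user turns, tool results, etc. carry no usage
                for field in USAGE_FIELDS:
                    totals[field] += usage.get(field) or 0
    return totals

def session_files(project_dir, session_id):
    """Parent transcript plus sub-agent transcripts, per the layout described above."""
    project_dir = Path(project_dir)
    parent = project_dir / f"{session_id}.jsonl"
    subagents = sorted((project_dir / session_id / "subagents").glob("agent-*.jsonl"))
    return [parent, *subagents]
```

A call such as `sum_usage(session_files(project_dir, session_id))` would then give one set of totals per experiment session. The sketch does not de-duplicate records, so if a transcript logs the same API response more than once, that would need handling before the totals are priced.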

Normalisation: Opus normalisation multiplies a Sonnet bill by 5.00 (Anthropic prices Opus at uniformly 5.00× Sonnet across input, cache, and output buckets). Sonnet normalisation divides an Opus bill by 5.00. 007's normalisation re-costs its OpenRouter token mix (~0.93 M input, ~0.04 M output) against Anthropic list rates, assuming lib/run_dialogue.py-style cache_control on ~88% of input. The mid-point estimates ($5.65 at Opus, $1.13 at Sonnet) sit inside a range that reflects how much of the input would actually have been cache-eligible.
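The sensitivity of the 007 estimate to the cache-eligible fraction can be illustrated with a simplified model. This is a sketch under stated assumptions, not the calculation behind the published mid-points: it treats the cached share of input as cache reads only, ignores cache-write charges, and uses the rounded token counts quoted above. The 80% and 95% fractions are arbitrary illustrative bounds, and `recost` is a hypothetical helper name.

```python
# $/M tokens, Sonnet list rates quoted in section 03.
SONNET = {"input": 3.00, "cache_read": 0.30, "output": 15.00}
OPUS_OVER_SONNET = 5.00  # uniform multiplier stated in the text

def recost(input_m, output_m, cached_fraction, rates):
    """Re-cost a token mix (in millions), pricing the cached share as cache reads.

    SIMPLIFICATION: cache-write charges are ignored.
    """
    cached = input_m * cached_fraction
    fresh = input_m - cached
    return (fresh * rates["input"]
            + cached * rates["cache_read"]
            + output_m * rates["output"])

# 0.88 is the fraction from the text; 0.80 and 0.95 are illustrative bounds.
for frac in (0.80, 0.88, 0.95):
    sonnet = recost(0.93, 0.04, frac, SONNET)
    print(f"cached={frac:.0%}: ${sonnet:.2f} at Sonnet, "
          f"${sonnet * OPUS_OVER_SONNET:.2f} at Opus")
```

Under these simplifications the figures bracket the published mid-points rather than reproduce them exactly. The point of the sketch is the direction of the effect: the less of the input that is cache-eligible, the higher the re-costed bill.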