B14 · PASS

Context Economy

Iranti uses 37% fewer input tokens by turn 15 of a coding session compared to an agent that re-reads files on recall turns. Token counts are exact — measured via the Anthropic countTokens API, not char/4 estimates.

v0.3.11 · 2026-04-06 · claude-sonnet-4-6 · 15 turns

  • 37% token saving at turn 15
  • 3,272 tokens saved (absolute)
  • Turn 8: first divergence
  • 15 session turns measured

Token divergence over time

Cumulative input tokens at each turn. Both arms are identical through turn 7. The gap widens monotonically from turn 8 as recall turns accumulate.

[Chart: cumulative input tokens per turn, NO_MEMORY vs WITH_IRANTI. Recall starts at turn 8; turn-15 endpoints are 8,949 (NO_MEMORY) and 5,677 (WITH_IRANTI).]

Per-turn results

Full 15-turn breakdown. Tokens are cumulative input tokens at the start of each turn.

| Turn | Phase         | No memory | With Iranti | Saved |
|------|---------------|-----------|-------------|-------|
| 1    | establishment | 1,081     | 1,081       |       |
| 2    | establishment | 1,556     | 1,556       |       |
| 3    | establishment | 1,969     | 1,969       |       |
| 4    | establishment | 2,379     | 2,379       |       |
| 5    | establishment | 2,779     | 2,779       |       |
| 6    | establishment | 3,252     | 3,252       |       |
| 7    | establishment | 3,781     | 3,781       |       |
| 8    | recall        | 4,220     | 3,980       | 6%    |
| 9    | recall        | 4,730     | 4,163       | 12%   |
| 10   | recall        | 5,236     | 4,355       | 17%   |
| 11   | recall        | 5,802     | 4,542       | 22%   |
| 12   | recall        | 6,256     | 4,769       | 24%   |
| 13   | recall        | 6,711     | 4,981       | 26%   |
| 14   | recall        | 8,043     | 5,362       | 33%   |
| 15   | recall        | 8,949     | 5,677       | 37%   |

What this measures

Every multi-turn AI coding session accumulates tokens. When an agent needs a specific value from an earlier file — a config key, a function signature, a database schema — it either keeps the file in context (inflating token count every turn) or re-reads it (adding a full tool result to the window). Either way, context grows faster than it needs to.

Iranti's inject blocks are the alternative: instead of the full file (~300–600 tok), the agent receives a compact structured fact (~50–150 tok) with exactly the value it needed. The difference compounds across every recall turn.
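The trade-off can be sketched in code. Everything below is hypothetical — Iranti's actual inject-block wire format is not shown in this report — but it illustrates why a structured fact is an order of magnitude smaller than the file it was extracted from:

```typescript
// Hypothetical shapes. Iranti's real inject-block format may differ;
// this only contrasts "whole file in context" with "one structured fact".
const fullFileToolResult = {
  type: "tool_result" as const,
  // A config file repeated verbatim in context on every re-read.
  content: [
    "export const authConfig = {",
    "  sessionTtlSeconds: 3600,",
    "  // ...dozens more lines the agent does not need right now...",
    "};",
  ].join("\n"),
};

const injectBlock = {
  type: "iranti_inject" as const, // hypothetical tag
  identity: "authConfig.sessionTtlSeconds",
  value: 3600,
  source: "auth/config.ts",
};

// Rough size comparison for illustration only — the benchmark itself
// uses exact API token counts, not character-based estimates.
const approxChars = (x: unknown): number => JSON.stringify(x).length;
console.log(approxChars(injectBlock) < approxChars(fullFileToolResult));
```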

NO_MEMORY arm

Agent re-reads the relevant source file on recall turns. A file read adds the full file content as a tool result to the messages array. All prior tool results also accumulate — the window grows with every turn.

WITH_IRANTI arm

Agent receives a compact inject block on recall turns. Iranti's identity-first retrieval returns only the needed fact. The inject block is ~50–150 tokens vs. ~300–600 for a file re-read.
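A toy simulation of the two arms makes the divergence pattern concrete. The per-turn costs below are assumptions chosen from the ranges quoted above, not measured values:

```typescript
// Illustrative per-turn input-token costs — assumptions, not measurements.
const BASE_TURN = 450; // prompt + response carried forward each turn
const FILE_READ = 450; // full file as a tool result (~300–600 tok in the report)
const INJECT = 100;    // compact inject block (~50–150 tok in the report)

// Cumulative input tokens per turn for one arm: recall turns add
// `recallCost` on top of the shared baseline.
function cumulative(recallCost: number, turns = 15, recallFrom = 8): number[] {
  const out: number[] = [];
  let total = 0;
  for (let t = 1; t <= turns; t++) {
    total += BASE_TURN + (t >= recallFrom ? recallCost : 0);
    out.push(total);
  }
  return out;
}

const noMemory = cumulative(FILE_READ);
const withIranti = cumulative(INJECT);

// The arms are identical through turn 7; from turn 8 the gap grows by
// (FILE_READ - INJECT) on every recall turn — monotonic widening.
const gaps = noMemory.map((v, i) => v - withIranti[i]);
console.log(gaps.slice(0, 8)); // first nonzero entry is at turn 8 (index 7)
```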

How we measured it

The benchmark uses the Anthropic client.beta.messages.countTokens() API to get exact token counts for the full messages array at each turn — no generation, no sampling, no char/4 approximation. Both arms run concurrently via Promise.all() per turn for a fair comparison.
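A minimal sketch of one measurement step, with the counting function injected so the loop is testable offline. The commented lines mark where the real `client.beta.messages.countTokens()` request would go; the surrounding harness is an assumption, not the benchmark's actual code:

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// One measurement step: count the full messages array for both arms
// concurrently (Promise.all), mirroring the benchmark's per-turn design.
// `countFn` is injected so the logic runs without network access.
async function measureTurn(
  noMemoryMsgs: Msg[],
  withIrantiMsgs: Msg[],
  countFn: (msgs: Msg[]) => Promise<number>,
): Promise<{ noMemory: number; withIranti: number }> {
  const [noMemory, withIranti] = await Promise.all([
    countFn(noMemoryMsgs),
    countFn(withIrantiMsgs),
  ]);
  return { noMemory, withIranti };
}

// In the real benchmark, countFn would wrap the SDK call:
//   import Anthropic from "@anthropic-ai/sdk";
//   const client = new Anthropic();
//   const res = await client.beta.messages.countTokens({
//     model: "claude-sonnet-4-6", // token counts are model-specific
//     messages,
//   });
//   return res.input_tokens; // exact count — no generation, no sampling
```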

  • 7 synthetic TypeScript/SQL files covering a fictional auth system (~300–600 tok each)
  • 15-turn DebugAuth session: 7 establishment turns then 8 recall turns
  • Establishment turns identical across both arms — no divergence until recall
  • Recall turns: NO_MEMORY re-reads the relevant file; WITH_IRANTI uses a pre-computed v0.3.11 inject block
  • Model: claude-sonnet-4-6 (token counts are model-specific)
  • Context window: 200k tokens — session reaches 4.5% (NO_MEMORY) vs 2.8% (WITH_IRANTI) by turn 15
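The headline figures follow directly from the turn-15 counts; a quick arithmetic check:

```typescript
// Turn-15 cumulative input tokens, taken from the results table above.
const CONTEXT_WINDOW = 200_000;
const NO_MEMORY_T15 = 8_949;
const WITH_IRANTI_T15 = 5_677;

const savedTokens = NO_MEMORY_T15 - WITH_IRANTI_T15;              // absolute saving
const savedPct = Math.round((savedTokens / NO_MEMORY_T15) * 100); // relative saving

// Share of the 200k context window used by each arm, to one decimal place.
const windowPct = (n: number): number =>
  Math.round((n / CONTEXT_WINDOW) * 1000) / 10;

console.log(savedTokens, savedPct, windowPct(NO_MEMORY_T15), windowPct(WITH_IRANTI_T15));
// 3272 37 4.5 2.8
```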
Limitation: Agent re-read behavior is scripted (deterministic). In real sessions, recall frequency varies. This benchmark represents a moderate-recall pattern — sessions with more recalls will see larger savings; sessions with fewer will see smaller ones.

Finding: 37% fewer tokens at turn 15 — exact, not estimated. Context window usage: 2.8% (WITH_IRANTI) vs 4.5% (NO_MEMORY) of a 200k window. The saving is not a one-time gain; it compounds across every recall turn as the gap widens monotonically from turn 8.