Why Iranti uses 37% fewer tokens in long coding sessions
We measured cumulative input token usage over a 15-turn coding session with and without Iranti. By turn 15, the Iranti arm uses 37% fewer tokens. Here's exactly how we measured it and why the gap grows over time.
The problem
AI coding sessions accumulate tokens. Every file read, every tool call, every assistant response — it all stays in the context window. When a coding agent needs a value from earlier in the session (a config key, a function signature, an environment variable), it either reads the file again or loses it to context compression.
Neither is great. Re-reading adds the full file to the window — 300 to 600 tokens for a typical TypeScript module. Losing it means the agent has to re-establish context from scratch. In a 15-turn session with 8 recall turns, the compounding effect is significant.
The measurement
We designed benchmark B14 (Context Economy) to measure this directly. The setup is a scripted 15-turn “DebugAuth” session — a fictional engineer debugging a JWT authentication system across 7 TypeScript and SQL files.
Two arms run the same session simultaneously:
- NO_MEMORY: The agent re-reads the relevant source file when it needs a specific value. This is realistic — an agent without structured memory either keeps files in context or must re-read them.
- WITH_IRANTI: The agent receives a compact Iranti inject block on recall turns — a structured fact containing exactly the value it needs, formatted in the v0.3.11 compact format.
Token counts are exact. We used the Anthropic client.beta.messages.countTokens() API — this returns the actual token count the model would see, with no approximation. Both arms run concurrently via Promise.all() per turn for a fair baseline.
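A sketch of the per-turn measurement step, with the token counter abstracted behind a function type so the loop is self-contained (the names here are illustrative, not the benchmark's actual code):

```typescript
// Any function that maps a message transcript to an exact token count.
// In the benchmark this wraps client.beta.messages.countTokens() from
// @anthropic-ai/sdk; abstracting it keeps the sketch runnable offline.
type TokenCounter = (messages: string[]) => Promise<number>;

// Count both arms concurrently (Promise.all) so each turn is measured
// against the same baseline at the same time.
async function countTurn(
  count: TokenCounter,
  noMemoryTranscript: string[],
  withIrantiTranscript: string[],
): Promise<{ noMemory: number; withIranti: number }> {
  const [noMemory, withIranti] = await Promise.all([
    count(noMemoryTranscript),
    count(withIrantiTranscript),
  ]);
  return { noMemory, withIranti };
}
```

With the real SDK, `count` would resolve the `input_tokens` field of the `countTokens()` response.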
The results
Turns 1–7 are establishment turns — both arms read the same files, build the same context. Token counts are identical. The divergence begins at turn 8, the first recall turn, and compounds from there:
| Turn | Type | NO_MEMORY | WITH_IRANTI | Saved |
|---|---|---|---|---|
| 7 | establishment | 3,781 | 3,781 | — |
| 8 | recall | 4,220 | 3,980 | 6% |
| 10 | recall | 5,236 | 4,355 | 17% |
| 12 | recall | 6,256 | 4,769 | 24% |
| 14 | recall | 8,043 | 5,362 | 33% |
| 15 | recall | 8,949 | 5,677 | 37% |
By turn 15, the session uses 8,949 tokens without Iranti and 5,677 with it — a saving of 3,272 tokens, or 37%. Context window usage at that point: 4.5% (NO_MEMORY) vs 2.8% (WITH_IRANTI) of a 200k window.
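The headline figures follow directly from the turn-15 totals:

```typescript
// Turn-15 totals from the table above.
const noMemory = 8_949;
const withIranti = 5_677;

const savedTokens = noMemory - withIranti;                   // 3,272
const savedPct = Math.round((savedTokens / noMemory) * 100); // 37

// Share of a 200k context window, rounded to one decimal place.
const windowPct = (tokens: number) =>
  Math.round((tokens / 200_000) * 1000) / 10;                // 4.5 and 2.8
```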
The curve is monotonically increasing. Each recall turn adds more separation because the NO_MEMORY arm accumulates one additional full file read while the WITH_IRANTI arm adds only the compact inject block.
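The compounding can be sketched with a toy model. The per-event costs below are assumed midpoints of the ranges discussed in the next section, not measured values:

```typescript
// Toy model: on each recall turn, NO_MEMORY accumulates a full file
// re-read while WITH_IRANTI accumulates only a compact inject block.
// 450 and 100 tokens are assumed costs, not benchmark measurements.
function cumulativeGap(
  recallTurns: number,
  rereadTokens = 450,
  injectTokens = 100,
): number[] {
  const gaps: number[] = [];
  let gap = 0;
  for (let turn = 0; turn < recallTurns; turn++) {
    gap += rereadTokens - injectTokens; // divergence grows every recall turn
    gaps.push(gap);
  }
  return gaps;
}
```

Eight recall turns at these assumed costs give a final gap of 2,800 tokens, the same order as the measured 3,272.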
Why the inject block is smaller
The token difference comes from the format of what gets added to the context window on recall turns.
A file re-read via the Read tool adds the full file content as a tool result. For a typical TypeScript auth module, that's 300–600 tokens of function signatures, imports, and comments — most of which is irrelevant to the specific value the agent needed.
An Iranti inject block contains only the fact. In v0.3.11's compact format, a single fact (entity key, summary, one structured value) takes 50–150 tokens. The identity-first retrieval means the agent receives exactly what it asked for, not the file it was in.
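A hypothetical rendering of such a fact (the interface and format below are illustrative, not Iranti's actual v0.3.11 wire format):

```typescript
// Illustrative shape of a single structured fact (not the real schema).
interface Fact {
  entity: string;  // identity-first key, e.g. "auth.jwt"
  summary: string; // one-line human-readable description
  value: string;   // the specific value the agent asked for
}

// Render the fact as one compact line instead of a full file re-read.
function renderInject(fact: Fact): string {
  return `[iranti] ${fact.entity} :: ${fact.summary} = ${fact.value}`;
}

const line = renderInject({
  entity: "auth.jwt",
  summary: "token expiry",
  value: "JWT_TTL=3600",
});
// A line like this is tens of tokens, versus hundreds for the file it came from.
```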
Caveats
The benchmark uses scripted, deterministic sessions. Real sessions vary in recall frequency. A session with heavy file I/O and low recall will show less divergence; a session with frequent lookups of previously-read values will show more.
The 37% figure represents a moderate-recall pattern (8 recall turns in 15). We treat this as a representative baseline, not a best-case scenario.