Cross-Session Memory Persistence
Can facts written to Iranti in one process session be retrieved accurately in a later, entirely separate session — with no in-context knowledge of what was written? This is the core promise of durable memory infrastructure.
Results at a glance
The mechanism
Session 1 writes 20 facts via iranti_write. The KB is the only bridge between sessions. Session 2 opens with no knowledge of Session 1 — only iranti_query calls recover the facts.
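The write/read bridge can be sketched in a few lines. This is a minimal stand-in, not Iranti's actual implementation: the `iranti_write` / `iranti_query` names come from the text above, but their signatures are assumptions, and a JSON file plays the role of the durable KB.

```python
import json
import os
import tempfile

# Hypothetical stand-ins for the real iranti_write / iranti_query tools.
# A JSON file acts as the durable KB that bridges the two "sessions".
KB_PATH = os.path.join(tempfile.gettempdir(), "iranti_kb_sketch.json")

def iranti_write(entity: str, fact: str, value: str) -> None:
    """Persist one (entity, fact, value) triple to the KB file."""
    kb = {}
    if os.path.exists(KB_PATH):
        with open(KB_PATH) as f:
            kb = json.load(f)
    kb.setdefault(entity, {})[fact] = value
    with open(KB_PATH, "w") as f:
        json.dump(kb, f)

def iranti_query(entity: str, fact: str):
    """Recover a fact using no in-memory state: only the KB file bridges sessions."""
    if not os.path.exists(KB_PATH):
        return None
    with open(KB_PATH) as f:
        kb = json.load(f)
    return kb.get(entity, {}).get(fact)

# "Session 1": write a fact. Entity ID and value are synthetic, as in the benchmark.
iranti_write("priya_nair", "affiliation", "Example University")

# "Session 2": recover it from the KB alone, with no in-context knowledge.
print(iranti_query("priya_nair", "affiliation"))  # -> Example University
```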
Recall rate
How many of the 20 written facts were correctly recalled in Session 2? The baseline is 0% by definition: a stateless LLM has no access to Session 1 data in Session 2, so the baseline is definitional rather than empirical.
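The recall metric is a single ratio over the written facts; a sketch of the arithmetic:

```python
# Recall rate: correctly recalled facts over facts written in Session 1.
written = 20
recalled = 20  # every cell in the 5 x 4 grid was retrieved correctly
baseline = 0   # a stateless LLM recalls nothing across sessions, by definition

recall_rate = recalled / written
print(f"recall: {recall_rate:.0%} vs. baseline {baseline:.0%}")
# -> recall: 100% vs. baseline 0%
```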
The evidence — 5 × 4 persistence grid
Five fictional researcher entities, each with four facts, written in Session 1 and retrieved in Session 2. Every cell below confirms a correct retrieval. The KB was the only persistence mechanism between sessions.
| Entity | S1 write | affiliation | pub_count | prev_employer | research_focus | S2 read |
|---|---|---|---|---|---|---|
| priya_nair | W | ✓ | ✓ | ✓ | ✓ | R |
| james_osei | W | ✓ | ✓ | ✓ | ✓ | R |
| yuki_tanaka | W | ✓ | ✓ | ✓ | ✓ | R |
| fatima_al_rashid | W | ✓ | ✓ | ✓ | ✓ | R |
| marco_deluca | W | ✓ | ✓ | ✓ | ✓ | R |
| Total | 5 | 5/5 | 5/5 | 5/5 | 5/5 | 20/20 |
Each checkmark represents a fact correctly retrieved in Session 2 with no in-context knowledge of what was written. W = written (Session 1). R = retrieved (Session 2).
Ground truth entities
Five fictional researchers. All entity IDs and fact values are synthetic — fabricated specifically for this benchmark, not drawn from real individuals.
What this measures
Large language models are stateless. Every new API call or process invocation starts with a blank context window. Any facts a model encountered in a previous session are gone — unless they were written to durable external storage and re-injected into the next session.
This statefulness gap matters acutely for agents. An agent that remembers a user's preferences, builds a research profile over multiple calls, or tracks evolving project state — all of that continuity depends entirely on what is written to persistent storage between sessions.
B2 tests whether Iranti's KB is genuinely durable across distinct process invocations, not merely intra-session consistent. The test is intentionally simple: if anything breaks at this level (write, persist, retrieve across boundary), everything built on top of it fails.
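One way to reproduce the session-boundary condition is to run the write and the read in two separate OS processes, with a file as the only shared state. A hedged Python sketch follows; the file-backed store is a stand-in for the KB, not Iranti's storage layer.

```python
import os
import subprocess
import sys
import tempfile

# The only bridge between the two "sessions" is this file path.
KB = os.path.join(tempfile.mkdtemp(), "kb.json")

# "Session 1": a separate process writes the facts, then exits.
write_snippet = f"""
import json
facts = {{"priya_nair": {{"affiliation": "Example University"}}}}
json.dump(facts, open({KB!r}, "w"))
"""
subprocess.run([sys.executable, "-c", write_snippet], check=True)

# "Session 2": an entirely distinct process, sharing no memory with the
# first, recovers the fact from the KB file alone.
read_snippet = f"""
import json
kb = json.load(open({KB!r}))
print(kb["priya_nair"]["affiliation"])
"""
out = subprocess.run([sys.executable, "-c", read_snippet],
                     capture_output=True, text=True, check=True)
print(out.stdout.strip())  # -> Example University
```

If the write, the persist, or the read across the process boundary fails, the assertion-level check fails, which mirrors the benchmark's all-or-nothing framing.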
This benchmark is not about recall precision under noise (that is B1). It is about the existence of the persistence guarantee — does the KB survive the session boundary?
Additional cross-session evidence
Beyond the synthetic benchmark, production writes from unrelated work confirm the same property.
- Entity and fact data written during B1 benchmark execution, retrieved correctly in the B2 planning session the following day.
- A cross-reference written during benchmark logging, confirmed retrievable in a separate process session.
These are incidental observations from live work, not controlled trials. They corroborate the benchmark result but do not replace it.
Threats to validity
Key properties confirmed
iranti_query responses include confidence, source, validFrom, and contested metadata. An LLM without memory returns none of this: only the value, with no audit trail.

Full trial execution records, session logs, entity definitions, and methodology notes are in the benchmarking repository.
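For illustration, a hypothetical shape for such a response, using the four field names listed above; everything else here (types, values, the `QueryResult` name) is invented for the sketch.

```python
from dataclasses import dataclass

@dataclass
class QueryResult:
    value: str        # the fact itself -- all a bare LLM could return
    confidence: float # how strongly the KB asserts the fact
    source: str       # provenance of the write
    validFrom: str    # ISO date from which the fact is considered valid
    contested: bool   # whether a conflicting write exists for this fact

# Synthetic example, matching the benchmark's fabricated entities.
result = QueryResult(
    value="Example University",
    confidence=0.98,
    source="session-1 iranti_write",
    validFrom="2025-01-01",
    contested=False,
)

# The KB-backed answer carries an audit trail alongside the value.
print(result.value, result.confidence, result.contested)
```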