Interrupted Session Recovery
All 8 facts survived the break. Recovery depends on retrieval.
B12 tests whether an AI agent can recover its working state after a session interruption. An agent wrote 8 facts to Iranti during a session, which was then interrupted. A fresh session opened and recovery was tested four ways. The finding: write durability is solid — all 8 facts survive — but how many you get back depends entirely on how the new session retrieves them.
Results at a glance
What this measures
Long-running AI agent tasks are rarely uninterrupted. Sessions time out, contexts are reset, processes crash, or work is handed off to a different agent instance. When that happens, any working state that was not persisted is gone. Session recovery is the ability to reconstruct that working state — goals, progress, intermediate findings, open questions — in a fresh session, fast enough to continue without starting over.
B12 sets up a realistic scenario: an agent analyzing LLM multi-hop reasoning performance writes 8 facts to Iranti across the session — 4 describing the evaluation setup (high confidence) and 4 describing in-progress work (slightly lower confidence). The session is then interrupted. A new session opens, and recovery is tested four ways: no retrieval, handshake only, observe with a semantic hint, and explicit key-based query.
The benchmark isolates two distinct questions: first, did the data survive the session break at all (write durability)? Second, how much can a recovery session actually retrieve, and does the retrieval strategy matter? These turn out to have very different answers.
The four recovery modalities
Each modality represents a distinct retrieval strategy. The underlying data is identical in all four cases; the only variable is how the recovery session asks for it.
| Recovery method | Total | Setup (4) | Progress (4) |
|---|---|---|---|
| No Iranti | 0/8 | 0/4 | 0/4 |
| Handshake only | 0/8 | 0/4 | 0/4 |
| Observe + hint | 5/8 | 4/4 | 1/4 |
| Explicit query | 8/8 | 4/4 | 4/4 |
No Iranti and Handshake-only both score 0/8, but for different reasons. The No Iranti baseline has no persistent storage at all. Handshake-only has access to the stored data but never retrieves it: the handshake returns session metadata, not stored facts.
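The table's columns are plain per-category counts of which stored keys came back. A minimal sketch of that tally, assuming a recovery session's output can be reduced to a set of key names; `SETUP_KEYS`, `PROGRESS_KEYS`, `score`, and the `session_id` metadata key are illustrative names, not part of Iranti.

```python
# Hypothetical tally of recovered facts, mirroring the Total / Setup / Progress columns.
SETUP_KEYS = {"evaluation_goal", "dataset", "models_under_test", "primary_metric"}
PROGRESS_KEYS = {"preliminary_finding", "next_step", "open_question", "partial_result"}


def score(returned_keys: set[str]) -> dict[str, str]:
    """Count recovered facts per category; unrelated keys (e.g. metadata) are ignored."""
    setup = len(returned_keys & SETUP_KEYS)
    progress = len(returned_keys & PROGRESS_KEYS)
    return {
        "total": f"{setup + progress}/8",
        "setup": f"{setup}/4",
        "progress": f"{progress}/4",
    }


# Handshake-only: session metadata comes back, but none of the stored keys.
print(score({"session_id"}))
# {'total': '0/8', 'setup': '0/4', 'progress': '0/4'}

# Observe + hint: all setup keys plus one progress key (which one is not recorded).
print(score(SETUP_KEYS | {"next_step"}))
# {'total': '5/8', 'setup': '4/4', 'progress': '1/4'}
```

Which progress fact observe actually returned is not stated, so `next_step` is an arbitrary stand-in; any single progress key reproduces the Observe + hint row.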
Session break and recovery flow
Session 1 writes 8 facts to Iranti, then is interrupted. The Iranti KB holds all 8 facts intact across the break. Session 2 opens fresh and attempts recovery via one of four modalities. The KB is the bridge — the question is only which retrieval path Session 2 uses.
The 8 facts: setup vs. progress
Setup facts describe the evaluation configuration — stable, high-confidence (95). Progress facts describe in-flight work — findings, next steps, open questions — written at confidence 90. The observe+hint column shows which facts were returned when the recovery session used iranti_observe with a semantic hint.
| Category | Key | Value | Conf. | Observe+hint |
|---|---|---|---|---|
| setup | evaluation_goal | compare GPT-4o, Claude 3.5, Gemini 1.5 Pro on multi-hop reasoning | 95 | |
| setup | dataset | HotpotQA bridge subset (questions 1–100) | 95 | |
| setup | models_under_test | GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro | 95 | |
| setup | primary_metric | exact_match on final answer | 95 | |
| progress | preliminary_finding | GPT-4o outperforming others on bridge questions by ~8% EM | 90 | |
| progress | next_step | run questions 81–100 to complete bridge subset | 90 | |
| progress | open_question | Whether bridge advantage holds on comparison questions (questions 101–200) | 90 | |
| progress | partial_result | questions 1–80 processed; GPT-4o EM=0.74 on bridge subset so far | 90 | |
All 8 facts were confirmed present in the KB after the session break. The observe+hint recovery session, however, received all 4 setup facts but only 1 of the 4 progress facts: 5 of 8, not 8 of 8.
Why progress facts are harder to recover
When iranti_observe is called with a semantic hint, it returns results ranked by confidence. Setup facts were written at confidence 95; progress facts at confidence 90. With a bounded result set, the higher-confidence setup facts rank first and fill the available slots. Progress facts are present in the KB but are ranked lower — they get crowded out before the result window closes.
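The crowding-out effect can be reproduced with a toy model of a bounded, confidence-ranked result window. This is a sketch under stated assumptions, not Iranti's implementation: `Fact`, `observe`, and `query` are hypothetical names, and the window size of 5 is assumed only because it reproduces the 4-setup + 1-progress outcome; the real window size is not given here. Tie-breaking among equal-confidence facts (here, write order) is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    category: str
    key: str
    confidence: int


# The 8 benchmark facts (values omitted), in table order, treated as write order.
FACTS = [
    Fact("setup", "evaluation_goal", 95),
    Fact("setup", "dataset", 95),
    Fact("setup", "models_under_test", 95),
    Fact("setup", "primary_metric", 95),
    Fact("progress", "preliminary_finding", 90),
    Fact("progress", "next_step", 90),
    Fact("progress", "open_question", 90),
    Fact("progress", "partial_result", 90),
]


def observe(facts: list[Fact], limit: int) -> list[Fact]:
    """Hypothetical observe: rank by confidence (highest first), keep the top `limit`.

    sorted() is stable even with reverse=True, so equal-confidence facts
    keep their original (write) order.
    """
    return sorted(facts, key=lambda f: f.confidence, reverse=True)[:limit]


def query(facts: list[Fact], keys: set[str]) -> list[Fact]:
    """Hypothetical explicit query: exact key match, no ranking, no window."""
    return [f for f in facts if f.key in keys]


window = observe(FACTS, limit=5)
print([f.category for f in window])
# ['setup', 'setup', 'setup', 'setup', 'progress']
print(len(query(FACTS, {f.key for f in FACTS})))
# 8
```

Every progress fact is still in `FACTS`; the window simply fills with the 95-confidence entries first. Explicit lookup has no window, which is why it returns all 8.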
This is not a data loss problem. Every fact is retrievable with an explicit query. It is a retrieval design problem: if you rely on observe to surface everything, confidence ranking will consistently deprioritize lower-confidence entries even when those entries represent the most critical in-progress work.
Practical recommendations
Write facts to Iranti as soon as they are established, not at the end of a session. If the session is interrupted before a final write, facts written mid-session survive. Facts written only at cleanup do not.
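The difference between writing as you go and writing at cleanup can be shown with a simulated interruption. Everything here is hypothetical: a plain dict stands in for the Iranti KB, and `run_session` and `SessionInterrupted` are illustrative names, not Iranti APIs.

```python
class SessionInterrupted(Exception):
    """Simulates a timeout, crash, or context reset partway through a session."""


def run_session(
    findings: list[tuple[str, str]],
    kb: dict[str, str],
    interrupt_after: int,
    write_immediately: bool,
) -> None:
    """Establish findings one at a time; the session dies after `interrupt_after` steps.

    `kb` stands in for the persistent store and outlives the session.
    `working_state` is the session's in-memory state and does not.
    """
    working_state: dict[str, str] = {}
    try:
        for step, (key, value) in enumerate(findings):
            if step == interrupt_after:
                raise SessionInterrupted()
            working_state[key] = value
            if write_immediately:
                kb[key] = value  # persist as soon as the fact is established
        if not write_immediately:
            kb.update(working_state)  # cleanup-time write, reached only on normal exit
    except SessionInterrupted:
        pass  # working_state is lost along with the session


findings = [
    ("partial_result", "questions 1–80 processed"),
    ("preliminary_finding", "GPT-4o ahead by ~8% EM"),
    ("next_step", "run questions 81–100"),
]
eager: dict[str, str] = {}
deferred: dict[str, str] = {}
run_session(findings, eager, interrupt_after=2, write_immediately=True)
run_session(findings, deferred, interrupt_after=2, write_immediately=False)
print(sorted(eager))  # ['partial_result', 'preliminary_finding']
print(deferred)       # {}
```

Both sessions established the same two findings before the interruption; only the one that wrote them immediately leaves anything for a recovery session to find.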
If you want iranti_observe to surface progress facts alongside setup facts, write them at the same confidence level. The 5-point gap (90 vs 95) was enough to produce a lopsided recovery result. When in-progress work is equally critical, mark it equally confident.
Before any long-running task, write a manifest fact that lists the entity IDs and key names the recovery session will need. Store the manifest at high confidence so observe surfaces it reliably. A recovery session that finds the manifest can then run explicit queries for everything else.
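The manifest pattern can be sketched in a few lines. This is a toy model, not Iranti's API: `ToyKB` stands in for the KB, its `observe` models a confidence-ranked bounded window with an assumed limit of 5, and `MANIFEST_KEY`, the entity ID, and the manifest confidence of 99 are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Optional

MANIFEST_KEY = "recovery_manifest"  # hypothetical key name
MANIFEST_CONFIDENCE = 99            # assumption: above every other fact in the session


@dataclass(frozen=True)
class Fact:
    entity: str
    key: str
    value: object
    confidence: int


class ToyKB:
    """In-memory stand-in for the Iranti KB (hypothetical, not Iranti's API)."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def write(self, entity: str, key: str, value: object, confidence: int) -> None:
        self.facts.append(Fact(entity, key, value, confidence))

    def observe(self, limit: int) -> list[Fact]:
        """Bounded window ranked by confidence, highest first."""
        return sorted(self.facts, key=lambda f: f.confidence, reverse=True)[:limit]

    def query(self, entity: str, key: str) -> Optional[Fact]:
        """Exact lookup by entity ID and key name."""
        for fact in self.facts:
            if fact.entity == entity and fact.key == key:
                return fact
        return None


def write_manifest(kb: ToyKB, entity: str, keys: list[str]) -> None:
    """Record where recovery-critical state lives, before the task starts."""
    kb.write(entity, MANIFEST_KEY, {"entity": entity, "keys": list(keys)}, MANIFEST_CONFIDENCE)


def recover(kb: ToyKB, observe_limit: int = 5) -> dict[str, object]:
    """Find the manifest via observe, then fetch every listed key explicitly."""
    manifest = next((f for f in kb.observe(observe_limit) if f.key == MANIFEST_KEY), None)
    if manifest is None:
        return {}
    entity = manifest.value["entity"]
    recovered: dict[str, object] = {}
    for key in manifest.value["keys"]:
        fact = kb.query(entity, key)
        if fact is not None:
            recovered[key] = fact.value
    return recovered


# Same 8 keys and confidences as the benchmark; values are placeholders.
ENTITY = "project/multihop_eval"  # hypothetical entity ID
SETUP = ["evaluation_goal", "dataset", "models_under_test", "primary_metric"]
PROGRESS = ["preliminary_finding", "next_step", "open_question", "partial_result"]

kb = ToyKB()
write_manifest(kb, ENTITY, SETUP + PROGRESS)
for key in SETUP:
    kb.write(ENTITY, key, f"<{key}>", confidence=95)
for key in PROGRESS:
    kb.write(ENTITY, key, f"<{key}>", confidence=90)

print(len(recover(kb)))  # 8
```

In the toy, observe alone returns the manifest plus the four setup facts and no progress facts; the explicit queries driven by the manifest fetch all 8.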
iranti_query with known entity IDs and key names recovered all 8 facts (8/8) in this benchmark. If the recovery session has or can discover the entity IDs, prefer explicit query over observe for mission-critical state. Observe is useful for exploration; explicit query is the right tool for known recovery targets.
Honest limitations
Key findings
Full trial execution records, fact tables, per-modality recovery traces, and methodology notes are available in the benchmarking repository.