Benchmark B13

Runtime Upgrade Safety
Data written in v0.2.12 reads correctly in v0.2.16.

Memory systems are only useful if they can be trusted over time. An agent running v0.2.12 writes a fact on Monday. The system is upgraded on Wednesday. On Friday the agent, now running v0.2.16, reads that fact. Is it still there? B13 tests exactly this with three versions joined by two incremental upgrades, five cross-version probes, and a running research program as the most honest form of evidence.

Executed 2026-03-21 | Versions 0.2.12 → 0.2.14 → 0.2.16 | Cross-version durability

Results at a glance

4/5: cross-version reads confirmed (v0.2.12 → v0.2.16)
3/3: post-upgrade writes succeed immediately
1/1: conflict state preserved across all 3 versions
Finding: Iranti treats version upgrades as invisible to stored knowledge. Data written in v0.2.12 is readable in v0.2.16 without any migration step or warmup period.

What this measures

Every persistent system faces the upgrade problem. When storage formats, schema layouts, or serialization approaches change between versions, previously written data can become unreadable. For a memory infrastructure layer, this is a critical failure mode: agents accumulate knowledge over time and that knowledge must survive whatever version the operator chooses to run.

B13 tests upgrade safety across three incremental versions of Iranti: 0.2.12, 0.2.14, and 0.2.16. Data written in the earliest version is probed in the latest. Post-upgrade write capability is confirmed immediately after each upgrade. A conflict state created during the benchmark program is checked for integrity.

Beyond the targeted probe, the benchmark program itself is evidence: 11 benchmark tracks ran across all three versions, and agents naturally read data from earlier versions throughout. No upgrade ever caused a read failure during that program.

How we tested it

Two complementary approaches were used, each with a different signal quality.

Targeted probe

Five specific entity-key pairs were written in v0.2.12 and then explicitly queried in v0.2.16 after both intermediate upgrades had completed. Four of the five returned the expected value. The one miss was a probe design error: the query used a slightly different key name from the one that was written. The entity was present and readable with the correct key.
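The targeted probe is essentially a write-then-read-back comparison. What follows is a minimal sketch that assumes a plain dictionary keyed by (entity, key) as a stand-in for the store. It does not show any Iranti API calls, and the entity names, key names, and values are hypothetical placeholders rather than the actual B13 probe data. It shows how a mismatched key name registers as a miss even though the entity is still present, and how listing the entity's actual keys separates a probe error from data loss.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the memory store: (entity, key) -> value.
# The real Iranti write/query calls are not described in this report.
Store = Dict[Tuple[str, str], str]


@dataclass
class ProbeResult:
    entity: str
    key: str
    expected: str
    found: Optional[str]

    @property
    def ok(self) -> bool:
        return self.found == self.expected


def run_probe(store: Store, probes: List[Tuple[str, str, str]]) -> List[ProbeResult]:
    """Read back each (entity, key) pair and compare it with the value written earlier."""
    return [ProbeResult(e, k, v, store.get((e, k))) for e, k, v in probes]


def keys_for_entity(store: Store, entity: str) -> List[str]:
    """List the keys an entity actually holds, to tell a probe error from data loss."""
    return sorted(k for (e, k) in store if e == entity)


# Illustrative data only: five hypothetical pairs "written in v0.2.12".
written = [(f"entity_{i}", "status", f"value_{i}") for i in range(5)]
store: Store = {(e, k): v for e, k, v in written}

# The fifth probe uses a slightly different key name, mirroring the miss described above.
probes = written[:4] + [("entity_4", "state", "value_4")]
results = run_probe(store, probes)
misses = [r for r in results if not r.ok]
print(sum(r.ok for r in results), "/", len(results))  # 4 / 5
for r in misses:
    print(r.entity, "holds keys:", keys_for_entity(store, r.entity))  # entity_4 holds keys: ['status']
```

Classifying a miss as a probe error or as lost data comes down to the second check. If the entity still holds the expected value under some key, the data survived and the probe was wrong.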

The program as evidence

More compelling than any synthetic probe: 11 benchmark tracks ran across all three versions. B6 was re-run in v0.2.14 and again in v0.2.16, reading entities first written in v0.2.12. B9 was re-run in v0.2.16 against the same v0.2.12 entities. No upgrade caused a read failure. The absence of failures across a real research program running over weeks is the strongest evidence this benchmark offers.

Version timeline

Data written at v0.2.12 flows intact through two incremental upgrades and remains readable at v0.2.16. Each upgrade step is a full version promotion, not a patch.

v0.2.12 (data written here) → v0.2.14 (incremental upgrade) → v0.2.16 (still readable; reads confirmed here). Data integrity preserved across all 3 versions.

Test results

Four upgrade safety properties measured across the v0.2.12 → v0.2.16 upgrade path.

Test | Result | Note
v0.2.12 facts read in v0.2.16 | 4/5 reads correct | 1 miss: probe used the wrong key name
Post-upgrade writes | 3/3 | Immediate, no warmup period
Conflict state | Preserved | Not resolved or altered by upgrade
API surface | Stable | iranti_handshake returns the same structure

The 4/5 cross-version read score reflects a probe design error, not a storage failure. The entity with the missed key was confirmed readable once the correct key was used.

The program as the strongest evidence

We built a research program on top of Iranti across 3 versions. Nothing broke. That is more meaningful than any synthetic test designed to pass.

Cross-version reads observed in the program
  • B6 re-run in v0.2.14: read entities originally written in v0.2.12. All reads succeeded.
  • B6 re-run in v0.2.16: same v0.2.12 entities, two upgrades later. All reads succeeded.
  • B9 re-run in v0.2.16: queried v0.2.12 entities for temporal consistency validation. No data loss detected.
  • 11 tracks across 3 versions: zero upgrade-related read failures recorded in the entire benchmark program log.

A synthetic probe can be designed to pass. A real research program running continuously across upgrades cannot. The absence of failures here is the kind of evidence operators should care about.

What this means in practice

Your data survives the upgrade

Agents that wrote facts in an earlier version will find those facts intact after upgrading. No migration step is required.

New writes work immediately

Post-upgrade write capability is confirmed within the same session. There is no warmup period before the system accepts new facts.

Conflict states are preserved exactly

The contested entity created during the benchmark program remained in its exact conflict state through all three versions. Upgrades do not resolve or alter contested knowledge.
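One way to check that a conflict state is preserved exactly is to snapshot the contested entity at each version and compare canonical fingerprints. This minimal sketch assumes each snapshot can be exported as a JSON-like dictionary. The field names (status, candidates, value, source) are hypothetical, and the report does not say how B13 itself performed this comparison.

```python
import hashlib
import json
from typing import Any, Dict


def fingerprint(state: Dict[str, Any]) -> str:
    """SHA-256 of a canonical JSON encoding, so dict key order does not matter."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def unchanged_across(snapshots: Dict[str, Dict[str, Any]]) -> bool:
    """True when every version's snapshot of the entity has the same fingerprint."""
    return len({fingerprint(s) for s in snapshots.values()}) == 1


# Hypothetical snapshot of one contested entity with two competing values.
conflict = {
    "status": "conflict",
    "candidates": [
        {"value": "A", "source": "agent_1"},
        {"value": "B", "source": "agent_2"},
    ],
}
snapshots = {
    "0.2.12": conflict,
    "0.2.14": dict(reversed(list(conflict.items()))),  # same content, different key order
    "0.2.16": json.loads(json.dumps(conflict)),        # same content, fresh copy
}
print(unchanged_across(snapshots))  # True

# An upgrade that silently resolved the conflict would change the fingerprint.
snapshots["0.2.16"] = {**conflict, "status": "resolved"}
print(unchanged_across(snapshots))  # False
```

Canonical JSON ignores dictionary key order but treats list order as significant. If a store does not guarantee the order of competing values, sort them before fingerprinting to avoid false alarms.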

Tools work the same way

The iranti_handshake tool returns the same structure in v0.2.16 as in v0.2.12. Agents do not need to adapt their tool calls across versions.
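A structural comparison of tool responses is one way to support the claim that iranti_handshake returns the same structure across versions. This sketch assumes the responses are available as parsed JSON. It reduces each response to its keys and value types and compares those, ignoring the values themselves. The payload fields and values are invented placeholders, not the real iranti_handshake schema.

```python
from typing import Any, Set


def shape(value: Any) -> Any:
    """Reduce a JSON-like value to its structure: keys and value types, not values."""
    if isinstance(value, dict):
        return {k: shape(v) for k, v in value.items()}
    if isinstance(value, list):
        # Treat the first element as representative of the list's item shape.
        return [shape(value[0])] if value else []
    return type(value).__name__


def key_paths(value: Any, prefix: str = "") -> Set[str]:
    """Collect dotted paths for nested dict keys (lists are not descended into)."""
    paths: Set[str] = set()
    if isinstance(value, dict):
        for k, v in value.items():
            path = f"{prefix}.{k}" if prefix else k
            paths.add(path)
            paths |= key_paths(v, path)
    return paths


# Hypothetical handshake responses; only the values differ between versions.
old = {"version": "0.2.12", "agent": "bench", "memory": {"entities": 40, "conflicts": 1}}
new = {"version": "0.2.16", "agent": "bench", "memory": {"entities": 52, "conflicts": 1}}
print(shape(old) == shape(new))  # True

# A renamed field is a structural change an agent would have to adapt to.
renamed = {"version": "0.2.16", "agent": "bench", "mem": {"entities": 52, "conflicts": 1}}
print(sorted(key_paths(old) - key_paths(renamed)))  # ['memory', 'memory.conflicts', 'memory.entities']
print(sorted(key_paths(renamed) - key_paths(old)))  # ['mem', 'mem.conflicts', 'mem.entities']
```

Comparing shapes rather than values is deliberate. Version strings and counts are expected to change between releases, but added, removed, or retyped fields are what would force agents to change their tool calls.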

Honest limitations

Limitation: No cold restart test. The Iranti instance was live throughout the benchmark. A full cold restart after an upgrade (shutting the process down and starting a fresh one) was not tested. Data durability across a cold restart combined with an upgrade is a separate and important question.
Limitation: Incremental upgrades only. Only sequential incremental upgrades were tested: 0.2.12 → 0.2.14 → 0.2.16, in order. Jumping versions (e.g., 0.2.12 → 0.2.16 directly) was not tested, and the result is unknown.
Limitation: No adversarial scenarios. Upgrade safety was tested under normal operating conditions. Partial writes at the exact moment of upgrade, upgrade failures midway, and instance crashes during version transitions were not tested.
Limitation: 0.2.x series only. All tested versions are in the 0.2.x minor version series. This data cannot support claims about safety across major version boundaries or about the behavior of future 0.3.x or 1.0 transitions.

Key findings

Finding: Cross-version data integrity is solid. Four of five entity-key pairs from v0.2.12 were readable in v0.2.16 without any migration. The one miss was a probe design error, not data loss. The entity was confirmed present with the correct key.
Finding: Writes work immediately after upgrades. All three new facts written immediately after confirming v0.2.16 succeeded, with no warmup period or state reconciliation step.
Finding: API stability confirmed across 3 versions. The iranti_handshake tool returns the same structure in v0.2.16 as in v0.2.12. Agents do not need to change their tool usage patterns across minor version upgrades.
Finding: The benchmark program is the best evidence. Eleven benchmark tracks, three versions, and weeks of research work accumulated on top of Iranti, and no upgrade ever caused data loss. This is a stronger signal than any targeted synthetic probe.

Raw data

Full trial execution records, upgrade logs, probe results, and methodology notes are available in the benchmarking repository.