C3 — Conflict Resolution

Facts change. Memory systems must keep up.

Ten fact pairs, each written twice: v1 (original value), then v2 (updated value). The correct answer is always v2. Four verdicts are possible: v2-only (correct), both v1 and v2 (mixed), v1-only (stale), or no match (miss). Scoring: any response containing v2 passes, so both v2-only and mixed responses count as a pass.

Iranti: 100% (9 v2, 1 both)
Shodh: 100%* (10 both)
Mem0: 80% (7 v2, 1 both, 2 miss)
Graphiti: 40% (2 v2, 2 both, 2 stale, 4 miss)

Test design

Each conflict pair covers a real-world configuration update: budget approvals, rate limit changes, timeout extensions, capacity scaling, and compliance-driven policy changes. The values are structurally simple (one numeric field changes) so there is no ambiguity about what the correct answer is.

Write sequence: v1 is written first. v2 is written second. For Graphiti, v1 is timestamped one hour before v2 to provide temporal ordering context. For all other systems, writes happen sequentially in the same session.

Each namespace is isolated to one conflict pair — the same isolation strategy as C1. This ensures the conflict verdict reflects only the pair in question, not cross-contamination from other facts.
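The write sequence and namespace isolation described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual harness: the `Write` record, the `plan_writes` helper, the `c3-<id>` namespace format, and the example date are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Write:
    namespace: str       # one namespace per conflict pair (hypothetical naming)
    text: str
    timestamp: datetime


def plan_writes(pair_id: str, v1_text: str, v2_text: str,
                now: datetime, backdate_v1: bool = False) -> list[Write]:
    """Return the two writes for one conflict pair: v1 first, then v2."""
    namespace = f"c3-{pair_id.lower()}"
    # Only the Graphiti run backdates v1 by one hour; the other systems
    # rely on sequential write order within the same session.
    v1_time = now - timedelta(hours=1) if backdate_v1 else now
    return [Write(namespace, v1_text, v1_time), Write(namespace, v2_text, now)]


# Arbitrary example date, chosen only to make the sketch reproducible.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
graphiti_writes = plan_writes(
    "C02",
    "API write rate limit is 60 rpm",
    "API write rate limit is 100 rpm",
    now,
    backdate_v1=True,
)
```

Because both writes share one namespace and nothing else is stored there, any value a query returns can only come from v1 or v2 of that pair.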

Verdict definitions

v2 ✓

Response contains v2 value and does not contain v1 value. Clean replacement.

both

Response contains both v1 and v2. Caller receives contradictory context and must disambiguate.

stale

Response contains v1 value only. System returned outdated information.

miss

Response contains neither v1 nor v2. System failed to retrieve any relevant context.

Scoring:
v2 → PASS · both → PASS (v2 present)
stale (v1 only) → FAIL · miss (neither) → FAIL
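The verdict definitions and scoring rule can be expressed as a small classifier. This is a sketch under one loud assumption: values are detected by plain substring matching, which the source does not specify. The function names `classify` and `passes` are introduced here for illustration.

```python
def classify(response: str, v1: str, v2: str) -> str:
    """Map a response to one of the four verdicts: v2, both, stale, miss."""
    # Assumption: a value counts as "contained" if it appears as a substring.
    has_v1 = v1 in response
    has_v2 = v2 in response
    if has_v2 and not has_v1:
        return "v2"      # clean replacement
    if has_v2 and has_v1:
        return "both"    # contradictory context
    if has_v1:
        return "stale"   # outdated value only
    return "miss"        # nothing relevant retrieved


def passes(verdict: str) -> bool:
    """Scoring rule: any response containing v2 passes."""
    return verdict in {"v2", "both"}
```

Substring matching is fragile for short values (for example, "3" and "5" in C09 could match unrelated digits), so a real harness would likely match the value together with its unit or field name.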

Per-conflict results

v1 → v2 transition for each conflict pair. Correct answer is always the v2 value.

ID  | Change                     | v1 → v2           | Iranti | Shodh | Mem0 | Graphiti
C01 | Project budget             | $50,000 → $75,000 | v2 ✓   | both  | v2 ✓ | both
C02 | API write rate limit       | 60 rpm → 100 rpm  | v2 ✓   | both  | v2 ✓ | v2 ✓
C03 | Max file upload size       | 10 MB → 25 MB     | v2 ✓   | both  | v2 ✓ | miss
C04 | Redis cache TTL            | 900s → 1800s      | v2 ✓   | both  | v2 ✓ | stale
C05 | JWT token expiry           | 3600s → 7200s     | v2 ✓   | both  | v2 ✓ | miss
C06 | Background workers         | 4 procs → 8 procs | v2 ✓   | both  | miss | miss
C07 | Log rotation               | 7 days → 14 days  | v2 ✓   | both  | miss | stale
C08 | PostgreSQL max connections | 20 → 50           | v2 ✓   | both  | v2 ✓ | miss
C09 | Webhook max retries        | 3 → 5             | v2 ✓   | both  | v2 ✓ | miss
C10 | Webhook timeout            | 15000ms → 30000ms | both   | both  | v2 ✓ | v2 ✓

System behavior analysis

Iranti: 100% (9 v2, 1 both)

Iranti uses entity+key addressing. When v2 is written to the same entity and key as v1, the write deterministically replaces the stored value. There is no accumulation — the old value is overwritten at the storage level.

9 of 10 pairs return v2-only. C10 (webhook timeout) returns both — this is the one case where the Iranti LLM arbitration layer was invoked on a same-entity, same-key update with a small confidence delta, which triggered accumulation rather than replacement. This maps to the B5 regression: conservative arbitration on close-gap updates.

Shodh: 100%* (0 v2, 10 both)

Shodh scores 100% because v2 is present in every response. But it never actually replaces v1 — it accumulates. Every query for a conflict pair returns both the original and the updated value, regardless of write order.

From the caller's perspective, the response contains contradictory information, and the caller must apply its own disambiguation logic. For configuration-critical facts (e.g., "what is the current rate limit?") this means the LLM consuming the context has to choose between two values with no signal about which is authoritative.
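The accumulation pattern can be sketched with an append-only store whose reads return every value ever written for a topic. This is an illustrative model only, not Shodh's implementation: the `AccumulatingStore` class and the topic string are hypothetical, and the read deliberately returns an unordered set to mirror the absence of an authority signal.

```python
from collections import defaultdict


class AccumulatingStore:
    """Toy store that keeps every value written for a topic."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = defaultdict(list)

    def write(self, topic: str, value: str) -> None:
        # Nothing is replaced; each write is appended.
        self._facts[topic].append(value)

    def read(self, topic: str) -> set[str]:
        # Returned as a set: the caller gets no ordering or recency hint.
        return set(self._facts[topic])


store = AccumulatingStore()
store.write("api write rate limit", "60 rpm")    # v1
store.write("api write rate limit", "100 rpm")   # v2
answer = store.read("api write rate limit")
```

Both values come back, so the response passes the "contains v2" check while leaving the choice between 60 rpm and 100 rpm to the consumer.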

Mem0: 80% (7 v2, 1 both, 2 miss)

Mem0 handles 7 conflicts cleanly with v2-only returns and 1 with both values. The 2 misses (C06: workers, C07: log rotation) returned neither v1 nor v2 — vector similarity did not surface either version of the fact.

Mem0 uses semantic deduplication on write: when v2 is semantically similar to v1, it may update or replace the stored representation. When the similarity is above threshold, v1 is replaced. When it falls below, both are stored. The 2 misses may be due to collection indexing latency between writes.
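The threshold rule described above can be sketched as follows. This is not Mem0's code or API: the `write_with_dedup` helper and the 0.8 threshold are hypothetical, and `difflib.SequenceMatcher` stands in for embedding similarity purely to keep the example self-contained.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # hypothetical value, not taken from Mem0


def similarity(a: str, b: str) -> float:
    # Character-level stand-in for semantic (embedding) similarity.
    return SequenceMatcher(None, a, b).ratio()


def write_with_dedup(memories: list[str], new_fact: str,
                     threshold: float = SIMILARITY_THRESHOLD) -> list[str]:
    """Replace the first sufficiently similar memory, else store alongside."""
    for i, existing in enumerate(memories):
        if similarity(existing, new_fact) >= threshold:
            # At or above threshold: the new fact replaces the old one.
            return memories[:i] + [new_fact] + memories[i + 1:]
    # Below threshold for every stored memory: both versions are kept.
    return memories + [new_fact]


memories = write_with_dedup([], "API write rate limit is 60 rpm")
memories = write_with_dedup(memories, "API write rate limit is 100 rpm")
unrelated = write_with_dedup(memories, "Log rotation is 14 days")
```

Under this rule the outcome for a given pair depends on how close the two phrasings are, which is one way a system can return v2-only for some pairs and both values for others.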

Graphiti: 40% (2 v2, 2 both, 2 stale, 4 miss)

Graphiti uses temporal ordering for conflict resolution — v1 is timestamped at t-1h, v2 at t-0. Despite this, the results show 2 stale returns (v1 value wins), 2 both (both values surfaced), and 4 complete misses.

The core issue is the same as C1: entity extraction rephrases fact content into edge facts during ingestion. When v2 is extracted, if the numeric value is lost during extraction, the edge fact for v2 no longer contains the answer. The temporal ordering of the episode is correct — the extracted edge fact content is the problem.
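The failure mode described above, correct temporal ordering but lossy extracted content, can be simulated in a few lines. This is a contrived model and not Graphiti's pipeline: the `Edge` record, the `faithful` and `lossy` extractors, and `latest_edge` are hypothetical, and the lossy extractor simply replaces the numeric token to mimic a rephrasing that drops the value.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass(frozen=True)
class Edge:
    fact: str            # extracted edge fact text
    valid_at: datetime   # timestamp inherited from the source episode


def faithful(text: str) -> str:
    return text  # extraction keeps the numeric value


def lossy(text: str) -> str:
    # Stand-in for a rephrase that loses the number (e.g. "1800s" -> "updated").
    return re.sub(r"\d+\w*", "updated", text)


def latest_edge(episodes: list[tuple[str, datetime]],
                extract: Callable[[str], str]) -> Edge:
    edges = [Edge(extract(text), ts) for text, ts in episodes]
    return max(edges, key=lambda e: e.valid_at)  # temporal ordering picks v2


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary example date
episodes = [
    ("Redis cache TTL is 900s", now - timedelta(hours=1)),  # v1
    ("Redis cache TTL is 1800s", now),                      # v2
]
good = latest_edge(episodes, faithful)
bad = latest_edge(episodes, lossy)
```

In both cases the newest edge is selected, but with the lossy extractor its text no longer contains "1800s", so the response would be scored as a miss despite correct ordering.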

Verdict distribution

Iranti
v2 only: 9/10
both: 1/10

Shodh
both: 10/10

Mem0
v2 only: 7/10
both: 1/10
miss: 2/10

Graphiti
v2 only: 2/10
both: 2/10
stale: 2/10
miss: 4/10
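The headline percentages follow directly from this distribution. The sketch below recomputes them and adds a stricter "clean" rate (v2-only responses) for comparison. The `pass_rate` and `clean_rate` helper names are introduced here, and the clean rate is an illustrative metric, not one the benchmark reports as a score.

```python
DISTRIBUTION = {
    "Iranti":   {"v2": 9, "both": 1},
    "Shodh":    {"both": 10},
    "Mem0":     {"v2": 7, "both": 1, "miss": 2},
    "Graphiti": {"v2": 2, "both": 2, "stale": 2, "miss": 4},
}


def pass_rate(counts: dict[str, int]) -> float:
    """Benchmark scoring: v2-only and both count as a pass."""
    total = sum(counts.values())
    return (counts.get("v2", 0) + counts.get("both", 0)) / total


def clean_rate(counts: dict[str, int]) -> float:
    """Stricter view: only v2-only responses count."""
    return counts.get("v2", 0) / sum(counts.values())


for system, counts in DISTRIBUTION.items():
    print(f"{system}: pass {pass_rate(counts):.0%}, clean {clean_rate(counts):.0%}")
```

The gap between the two rates is largest for Shodh, which is what the asterisk on its 100% score signals.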

Key findings

01

Iranti uses entity+key addressing — v2 write deterministically replaces v1 at the same key. 9/10 clean v2-only returns.

02

Shodh scores 100% technically but returns BOTH old and new values on every query — the caller must disambiguate.

03

Mem0 misses 2 conflicts entirely (miss verdict): semantic similarity surfaces neither v1 nor v2 on those queries.

04

Graphiti shows 2 stale returns (the old v1 value) and 4 misses; temporal ordering only partially helps.