Conflict Resolution
4/5 conditions resolved correctly.
Write ground truth facts at varying confidence levels, then inject adversarial (wrong) values at varying confidence levels. Query final KB state and compare retained value to ground truth. The score formula weights confidence against source reliability. Gap ≥ 10 points triggers deterministic rejection. Gap < 10 routes to LLM arbitration.
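The trial loop described above can be sketched in a few lines. This is a hypothetical harness, not Iranti's real API: the `Fact` shape, `ToyKB` store, and `run_trial` helper are assumptions for illustration, and the naive keep-higher-confidence resolver stands in for the full weighted-score pipeline.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    value: str
    confidence: int  # 0-100 scale, as used in the condition cards
    source: str


class ToyKB:
    """Minimal stand-in for the real store: one fact per key."""
    def __init__(self):
        self.facts = {}

    def write_fact(self, key, fact, resolve):
        current = self.facts.get(key)
        self.facts[key] = fact if current is None else resolve(current, fact)

    def query(self, key):
        return self.facts[key].value


def run_trial(kb, key, truth, adversarial, resolve):
    """Write ground truth, inject the wrong value, check what survived."""
    kb.write_fact(key, truth, resolve)
    kb.write_fact(key, adversarial, resolve)
    return kb.query(key) == truth.value


# Naive resolver: highest raw confidence wins (no reliability weighting).
keep_higher = lambda cur, new: new if new.confidence > cur.confidence else cur

run_trial(ToyKB(), "university",
          Fact("Columbia", 70, "a"), Fact("NYU", 80, "b"),
          keep_higher)  # → False: the higher-confidence adversarial write won
```

A resolver this naive loses whenever the attacker simply claims higher confidence, which is exactly the failure mode the weighted score below is meant to prevent.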
Results at a glance
4/5 conditions resolved correctly
The weighted score formula
Iranti does not use raw confidence values directly. It computes a weighted score that combines confidence with source reliability, so a high-confidence write from a low-reliability source does not automatically win a conflict.
Default source reliability = 0.5, which simplifies to weighted = conf × 0.85. When reliability is explicitly set (as in C5), the full formula applies.
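The exact coefficients of the weighted score are not given in this report. One form consistent with the stated default (reliability 0.5 simplifies to conf × 0.85) is `weighted = conf × (0.7 + 0.3 × reliability)`; treat that as an assumption for illustration, not Iranti's published formula.

```python
def weighted_score(confidence: float, reliability: float = 0.5) -> float:
    """Assumed form of the weighted score. Consistent with the stated
    default (reliability 0.5 gives conf x 0.85), but the real
    coefficients are not published, so this is illustrative only."""
    return confidence * (0.7 + 0.3 * reliability)


weighted_score(80)       # → 68.0 (80 x 0.85, the default-reliability case)
weighted_score(80, 0.9)  # → 77.6: explicit high reliability lifts the same confidence
```

Under any formula of this shape, reliability acts as a multiplier on confidence, so a source cannot buy its way past the gap check on confidence alone.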
Resolution path
Every conflict is routed through a deterministic decision tree before any LLM is invoked. LLM arbitration is a last resort for genuinely ambiguous cases — not the default.
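The routing can be sketched as follows. The 10-point gap rule comes from the methodology above; the `route` function and the `llm_arbitrate` callback are stand-ins, not Iranti's actual interface.

```python
def route(current_score: float, incoming_score: float, llm_arbitrate):
    """Deterministic decision tree first; LLM arbitration only when the
    weighted scores are within 10 points of each other."""
    gap = current_score - incoming_score
    if gap >= 10:
        return "keep_current"     # deterministic rejection of the incoming write
    if gap <= -10:
        return "accept_incoming"  # incoming wins deterministically
    return llm_arbitrate()        # genuinely ambiguous: last resort


route(85, 70, lambda: "arbitrated")  # → "keep_current", no LLM call made
route(72, 68, lambda: "arbitrated")  # → "arbitrated": gap of 4 is too close to call
```

Keeping the deterministic branches first means the LLM is never consulted on conflicts the score gap already decides, which bounds both cost and the attack surface of the arbitration step.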
All 5 conditions
Each card shows the ground truth value vs the adversarial injection, their confidence scores, the resolution path taken, and whether the correct value was retained. C2 is visually distinguished: not hidden, but presented in context.
A high-confidence first write blocks a lower-confidence correction. This is by design: the system has no way to distinguish a correction from an adversarial injection.
LLM cited "more established source, minimal confidence difference".
Raw adversarial confidence is higher (80 vs 70). The LLM read the source names ("b3_trusted_reviewer" vs "b3_low_reliability") as semantic signals.
The most important finding from this benchmark.
C2 is not a bug. It is a direct consequence of how Iranti's conflict resolution is designed — and it has real operational implications that every developer deploying Iranti in write-heavy scenarios needs to understand before going to production.
The system has no way to know whether the incoming write is an adversarial injection or a legitimate correction. Both look identical at the protocol level. It can only compare scores. If the first write was wrong, it locks out the truth.
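The lockout is easy to demonstrate with the 10-point deterministic gap rule described earlier. This is a minimal illustration over bare weighted scores, not Iranti's code; the `retained` helper is hypothetical.

```python
def retained(first: float, second: float) -> str:
    """Which write survives: the existing fact (first) or the
    incoming write (second), under the 10-point gap rule?"""
    gap = first - second
    if gap >= 10:
        return "first"
    if gap <= -10:
        return "second"
    return "arbitration"


# Wrong value written first at weighted score 85; an honest
# correction later arrives at 70:
retained(85, 70)  # → "first": the correction is deterministically rejected

# An adversarial injection at 70 against a *correct* first write at 85
# takes exactly the same path; the two cases cannot be told apart.
```

The only inputs to the decision are the two scores, so any policy that protects a correct first write from injection necessarily also protects a wrong first write from correction.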
The LLM read source names as semantic signals.
C5 is the most interesting condition in this benchmark — and it exposes a behavior that is simultaneously a capability and a risk.
The LLM received both candidate facts with their source identifiers. It read "b3_trusted_reviewer" as a semantic signal of trustworthiness, and "b3_low_reliability" as a signal of low trust — and correctly preserved Columbia, despite NYU having the higher raw confidence score.
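The mechanism is easiest to see in how an arbitration request might be rendered. The exact prompt template Iranti uses is not shown in this report; this hypothetical version makes the point that source identifiers travel into the prompt verbatim, where the model can read them as natural-language trust cues.

```python
def arbitration_prompt(key, a_value, a_conf, a_source, b_value, b_conf, b_source):
    """Hypothetical prompt rendering: source identifiers are passed
    through as plain text, not stripped or anonymized."""
    return (
        f"Two conflicting values for '{key}':\n"
        f"  A: {a_value} (confidence {a_conf}, source {a_source})\n"
        f"  B: {b_value} (confidence {b_conf}, source {b_source})\n"
        "Which value should be retained?"
    )


prompt = arbitration_prompt(
    "university", "Columbia", 70, "b3_trusted_reviewer",
    "NYU", 80, "b3_low_reliability",
)
# "b3_trusted_reviewer" and "b3_low_reliability" appear as plain text,
# giving the model a trust cue that no numeric score encodes.
```

This is the dual-edged behavior C5 exposes: a well-named source helps the model break ties correctly, but an attacker who controls source naming gets a free channel into the arbitration step.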
Threats to validity
Full trial execution records, confidence scoring logs, LLM arbitration transcripts, and condition definitions are available in the benchmarking repository.