DEALta

Stateful Review Orchestration for Multi-Team Workflows

Detects what changed in a document, routes review tasks to the right teams, and tracks which prior approvals are no longer safe to trust after later revisions.

Stateful Across Rounds · Multi-Agent Orchestration · Eval-First · Langfuse Traced

Built from experience coordinating a supplier agreement across Legal, Finance, Commercial, Product/Tech, and Customer Support.

In a previous role, I coordinated a supplier agreement renegotiation across five business functions. By the third round, Finance had approved commission terms that v3 silently changed, and Legal had signed off on liability terms that v3 made asymmetric. No single reviewer had visibility across all of the broken approvals at once, and the contract was heading toward execution. The failure mode wasn't "nobody reviewed it." It was "everyone reviewed their part, and nobody saw the whole." DEALta is designed to catch that.

Recommendation

A supplier agreement under renegotiation

Three rounds of review. Five functions involved. Here's what changed in the final round — and why it matters.

The deal

Nexus, a travel platform, is finalising a supplier agreement with StayLink, an accommodation provider. The agreement governs commission structure, payment terms, liability, and content obligations.

In v2, five business functions reviewed the agreement: Legal (liability and governing law), Finance (commission and payment terms), Commercial (revenue structure), Product/Tech (integration obligations), and Customer Support (content accuracy SLAs). Legal, Finance, and Commercial each granted conditional approval — with specific terms that had to hold for their sign-off to remain valid. Product/Tech and Customer Support flagged concerns but did not formally sign off.

v3 arrived. Four clauses changed, and those changes broke the stated conditions of all three approvals.

Why it's hard

Finance approved commission terms assuming mutual control over review timing. v3 shifted that control unilaterally to StayLink.

Legal approved liability terms assuming a symmetric cap. v3 introduced an asymmetry tied to booking volumes.

No single reviewer had visibility across all three broken approvals at once. DEALta detects the breaks, routes them to the right functions, and escalates before anyone proceeds.

This is the coordination problem DEALta is designed to solve.

Orchestration trace

Six specialised agents, orchestrated via LangGraph. Each writes to shared typed state — decisions accumulate, nothing passes outside the graph.
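LangGraph nodes return partial state updates that are merged into one shared state object. The plain-Python sketch below imitates that pattern without depending on LangGraph, so it is a stand-in rather than LangGraph's API: `ReviewState`, its field names, `run_pipeline`, and the clause IDs are all hypothetical. It shows the two properties claimed above: every agent's output accumulates in one typed state, and a write outside the declared schema is rejected.

```python
from typing import Callable, TypedDict, get_type_hints


class ReviewState(TypedDict, total=False):
    # Hypothetical field names; DEALta's actual state schema is not shown on this page.
    changes: list[str]
    invalidated: dict[str, list[str]]
    routes: dict[str, str]
    violations: list[str]
    compound_risks: list[str]
    recommendation: str


# A node reads the current state and returns only the keys it wants to write.
Node = Callable[[ReviewState], ReviewState]

DECLARED_KEYS = set(get_type_hints(ReviewState))


def run_pipeline(nodes: list[tuple[str, Node]], state: ReviewState) -> ReviewState:
    """Run nodes in order, merging each partial update into the shared state."""
    for name, node in nodes:
        update = node(state)
        # Reject writes outside the declared schema so nothing leaves the typed state.
        unknown = set(update) - DECLARED_KEYS
        if unknown:
            raise KeyError(f"{name} wrote undeclared keys: {sorted(unknown)}")
        # Later nodes see everything earlier nodes wrote: decisions accumulate.
        state = {**state, **update}
    return state


# Usage with two toy nodes (clause IDs and routing table are made up).
ROUTING_TABLE = {"commission_review": "Finance", "liability_cap": "Legal"}
nodes: list[tuple[str, Node]] = [
    ("change_detection", lambda s: {"changes": ["commission_review", "liability_cap"]}),
    ("routing", lambda s: {"routes": {c: ROUTING_TABLE[c] for c in s["changes"]}}),
]
final = run_pipeline(nodes, {})
print(final["routes"])  # {'commission_review': 'Finance', 'liability_cap': 'Legal'}
```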

Detected changes

Invalidated approvals

Sign-offs granted in v2. Re-evaluated against v3 changes — all three breached their stated conditions.
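A minimal sketch of that re-evaluation, assuming each v2 sign-off records the clause IDs it relied on alongside its natural-language condition. The clause-ID filter, the `Approval` and `Change` types, and `keyword_judge` are hypothetical; `keyword_judge` only stands in for the LLM call that judges whether a condition is breached.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Approval:
    function: str            # business function that signed off, e.g. "Finance"
    clauses: frozenset[str]  # clause IDs the sign-off relied on (hypothetical IDs)
    condition: str           # natural-language condition attached to the sign-off


@dataclass(frozen=True)
class Change:
    clause: str
    summary: str


# Stand-in for the LLM call: returns True if the change breaches the condition.
BreachJudge = Callable[[Approval, Change], bool]


def invalidated_approvals(
    approvals: list[Approval], changes: list[Change], judge: BreachJudge
) -> dict[str, list[str]]:
    """Map each invalidated function to the changed clauses that broke its condition."""
    stale: dict[str, list[str]] = {}
    for approval in approvals:
        for change in changes:
            # Cheap deterministic filter first; only relevant pairs reach the judge.
            if change.clause in approval.clauses and judge(approval, change):
                stale.setdefault(approval.function, []).append(change.clause)
    return stale


def keyword_judge(approval: Approval, change: Change) -> bool:
    # Toy judge for this example only; DEALta uses an LLM for this step.
    return any(word in change.summary.lower() for word in ("unilateral", "asymmetric"))


approvals = [
    Approval("Finance", frozenset({"4.2"}), "Commission review timing stays mutual"),
    Approval("Legal", frozenset({"9.1"}), "Liability cap stays symmetric"),
]
changes = [
    Change("4.2", "Commission review timing now set unilaterally by StayLink"),
    Change("9.1", "Liability cap made asymmetric, tied to booking volumes"),
    Change("12.3", "Notice address updated"),
]
print(invalidated_approvals(approvals, changes, keyword_judge))
# {'Finance': ['4.2'], 'Legal': ['9.1']}
```

Filtering by clause ID keeps the number of judge calls small, but a condition can depend on a clause it never names, so a real implementation might send every change to the judge instead.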

Policy violations

Compound risks

Risks that no single change creates alone — only visible when changes are analysed in combination.
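One way to make "only visible in combination" concrete is to tag each change, then test every pair of distinct changes against rules that name two tags. The tags, `COMPOUND_RULES`, and `compound_risks` are hypothetical, and this rule table is a deterministic stand-in; the page does not say how DEALta's Dependency Check represents compound risks.

```python
from itertools import combinations

# Hypothetical compound rules: two tags that are only risky when they co-occur.
COMPOUND_RULES: dict[frozenset[str], str] = {
    frozenset({"commission_control", "payment_terms"}):
        "Supplier controls both commission review and payment timing",
    frozenset({"liability_asymmetry", "volume_commitment"}):
        "Liability exposure grows with booking volume",
}


def compound_risks(changes: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return (clause_a, clause_b, risk) for each pair of changes that matches a rule."""
    risks = []
    # Examine every unordered pair of distinct changes.
    for (a, tags_a), (b, tags_b) in combinations(sorted(changes.items()), 2):
        for rule, description in COMPOUND_RULES.items():
            x, y = sorted(rule)
            # A rule fires only when its two tags come from different changes.
            if (x in tags_a and y in tags_b) or (y in tags_a and x in tags_b):
                risks.append((a, b, description))
    return risks


changes = {
    "4.2": {"commission_control"},
    "6.1": {"payment_terms"},
    "9.1": {"liability_asymmetry"},
}
print(compound_risks(changes))
# [('4.2', '6.1', 'Supplier controls both commission review and payment timing')]
```

Because each rule needs its two tags from different changes, no single change can trigger it on its own, which mirrors the definition above.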

Escalation queue

Items that require a human decision before the contract can proceed.

Required sign-off status

Sign-off status for functions with formal approvals recorded in v2. Product/Tech and Customer Support flagged concerns but did not formally sign off — see scenario for context.

How the system works

Contract v(n-1) + v(n) + Prior Approvals → 6 agents → escalation-ready decision pack

Change Detection: detects clause deltas
Invalidation: checks stale approvals
Routing: routes changes for team review
Policy Check: checks policy rules
Dependency: finds cross-clause risks
Decision Pack: assembles the recommendation

Deterministic recommendation logic. LLM writes one paragraph. Everything structural is Python.
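A minimal sketch of that split, with made-up thresholds: the recommendation comes from counting, so the same inputs always give the same outcome, and a model call would only ever explain a decision that is already fixed. The severity labels, the REJECT-on-critical rule, and `recommend` are assumptions, not DEALta's documented rules.

```python
from collections import Counter
from enum import Enum


class Recommendation(str, Enum):
    APPROVE = "APPROVE"
    ESCALATE = "ESCALATE"
    REJECT = "REJECT"


def recommend(
    violation_severities: list[str], invalidated: int, compound_risks: int
) -> Recommendation:
    """Pick the outcome with plain counting; no model call is involved."""
    counts = Counter(violation_severities)
    # Hypothetical rule: any critical policy violation rejects outright.
    if counts["critical"] > 0:
        return Recommendation.REJECT
    # Hypothetical rule: anything else that needs a human decision escalates.
    if violation_severities or invalidated or compound_risks:
        return Recommendation.ESCALATE
    return Recommendation.APPROVE


print(recommend(["major", "minor"], invalidated=3, compound_risks=2).value)  # ESCALATE
```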

Change Detection
Reads both contract versions in full. Everything downstream depends on this output — no agent can run without it.
Invalidation
Matches changes against prior sign-off conditions. LLM judges whether natural-language conditions are breached.
Routing
Assigns each change to a business function. Needs only the change list; runs before Policy Check to keep routing logic clean.
Policy Check
Checks each change against rules.json. Needs only the change list; its output feeds Dependency Check.
Dependency Check
Identifies compound risks across change combinations. Requires policy violations as input — can't run earlier.
Decision Pack
Python counts violations and sets ESCALATE/APPROVE/REJECT. LLM writes one narrative paragraph. Needs all prior outputs.
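The ordering constraints above can be written as a dependency map and checked with Python's standard `graphlib` module. The agent identifiers are hypothetical names for the six agents; the edges are read off the descriptions above. `check_order` confirms that the sequential order in the diagram never runs an agent before its inputs, and `stages` groups agents by when their inputs become available.

```python
from graphlib import TopologicalSorter

# Agent -> agents whose output it reads (identifiers are hypothetical).
DEPENDS_ON: dict[str, set[str]] = {
    "change_detection": set(),
    "invalidation": {"change_detection"},
    "routing": {"change_detection"},
    "policy_check": {"change_detection"},
    "dependency_check": {"change_detection", "policy_check"},
    "decision_pack": {
        "change_detection", "invalidation", "routing", "policy_check", "dependency_check",
    },
}

# Sequential order shown in the pipeline diagram.
PIPELINE_ORDER = [
    "change_detection", "invalidation", "routing",
    "policy_check", "dependency_check", "decision_pack",
]


def check_order(order: list[str], deps: dict[str, set[str]]) -> None:
    """Raise if the graph has a cycle or if any agent runs before one of its inputs."""
    TopologicalSorter(deps).prepare()  # raises graphlib.CycleError on a cycle
    position = {name: i for i, name in enumerate(order)}
    for agent, inputs in deps.items():
        for upstream in inputs:
            if position[upstream] >= position[agent]:
                raise ValueError(f"{agent} runs before its input {upstream}")


def stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group agents into waves: each wave needs only outputs of earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


check_order(PIPELINE_ORDER, DEPENDS_ON)
print(stages(DEPENDS_ON))
# [['change_detection'], ['invalidation', 'policy_check', 'routing'], ['dependency_check'], ['decision_pack']]
```

The second wave shows that Invalidation, Routing, and Policy Check all need only the change list, so their relative order is a design choice rather than a data constraint; the page places Routing ahead of Policy Check deliberately.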

Evaluation + Observability

Ground truth written before agents, not after.

Evaluation

Stateful invalidation: 13/13
Change detection: 100%
Routing accuracy: 89%
Policy compliance: 100%
Compound risk detection: 2/2
Decision pack structure: 6/6
Narrative faithfulness: PASS (LLM-as-judge)
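Because the ground truth is fixed before the agents exist, each score reduces to comparing pipeline output against stored labels. A sketch of the two score shapes in this list (n/m counts and percentages), using hypothetical function names and toy data rather than DEALta's evaluation harness:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    correct: int
    total: int

    def fraction(self) -> str:
        return f"{self.correct}/{self.total}"

    def percent(self) -> str:
        if self.total == 0:
            raise ValueError("ground truth is empty")
        return f"{100 * self.correct / self.total:.0f}%"


def score_labels(ground_truth: dict[str, str], predicted: dict[str, str]) -> Score:
    """Exact-match accuracy against labels written before the agents existed.

    A missing prediction counts as wrong, so skipping a hard case cannot raise the score.
    """
    correct = sum(predicted.get(key) == label for key, label in ground_truth.items())
    return Score(correct, len(ground_truth))


def score_recall(expected: set[str], found: set[str]) -> Score:
    """Share of expected items (e.g. compound risks) that the pipeline found."""
    return Score(len(expected & found), len(expected))


# Toy routing labels: change ID -> business function (made-up data).
truth = {"4.2": "Finance", "9.1": "Legal", "11.4": "Product/Tech"}
pred = {"4.2": "Finance", "9.1": "Legal", "11.4": "Commercial"}
s = score_labels(truth, pred)
print(s.fraction(), s.percent())  # 2/3 67%
```

`score_recall` ignores extra findings, so false positives would need a separate check.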


Observability

Langfuse pipeline waterfall

6 agent spans with per-step cost, latency, and I/O visibility

Pipeline audit record

Pipeline audit: inputs, outputs, metadata

Agent-level detail

Agent detail: prompt, response, tokens

$0.0017/run · 58s total · 6 traced spans · gpt-4o-mini as eval judge

Version comparison

What changed between the v2 review round and this v3 escalation.

Performance metrics

Token usage, latency, and estimated cost per agent. Full run on gpt-4o-mini.

Agent | Time (s) | % time | Input tokens | Output tokens | Tokens/s | Est. cost (USD) | % cost
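These columns can be derived from per-agent timings and token counts such as those recorded in the traced spans. A sketch under two stated assumptions: Tokens/s is taken to mean output tokens per second of wall time (the page does not define it), and cost uses per-million-token prices passed in as parameters, since the rates behind the estimate are not stated. `AgentRun`, `performance_rows`, and the sample numbers are hypothetical, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRun:
    agent: str
    seconds: float
    input_tokens: int
    output_tokens: int


def performance_rows(
    runs: list[AgentRun], usd_per_m_input: float, usd_per_m_output: float
) -> list[dict[str, object]]:
    """Derive the table's columns from per-agent timings and token counts."""
    total_time = sum(r.seconds for r in runs)
    # Prices are parameters: the page does not state which rates it used.
    costs = [
        (r.input_tokens * usd_per_m_input + r.output_tokens * usd_per_m_output) / 1_000_000
        for r in runs
    ]
    total_cost = sum(costs)
    rows = []
    for run, cost in zip(runs, costs):
        rows.append({
            "agent": run.agent,
            "time_s": run.seconds,
            "pct_time": 100 * run.seconds / total_time,
            "input_tok": run.input_tokens,
            "output_tok": run.output_tokens,
            # Assumption: Tokens/s means output tokens generated per second.
            "tok_per_s": run.output_tokens / run.seconds,
            "est_cost_usd": cost,
            "pct_cost": 100 * cost / total_cost if total_cost else 0.0,
        })
    return rows


# Toy inputs and toy prices, for illustration only.
runs = [
    AgentRun("change_detection", 2.0, 4000, 500),
    AgentRun("decision_pack", 6.0, 2000, 1500),
]
for row in performance_rows(runs, usd_per_m_input=1.0, usd_per_m_output=2.0):
    print(row["agent"], f"{row['pct_time']:.0f}%", f"${row['est_cost_usd']:.4f}")
```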