1. System topology
Three independent services in one tailnet, all on the same Hetzner host. Each has its own Postgres database. They communicate over HTTP.
| Component | Tech | Endpoint | Public |
|---|---|---|---|
| udcalc calculator | Vite/React + PostgREST | form.unfair-dismissal.uk | yes |
| et-agent frontend | Next.js + copilotkit | etagent.andyslab.uk | yes |
| et-agent API | FastAPI/uvicorn | :8000 (tailnet) → caddy | via caddy |
| et-pipeline API | FastAPI | :8055 (loopback) + et-pipeline.andyslab.uk | API-key gated |
| et-pipeline workers | asyncio | — | internal |
2. Models in use
| Role | Model | Provider | Why |
|---|---|---|---|
| Drafter (et1, demand, lbc, sar, grievance, settlement) | claude-sonnet-4-5 | Anthropic direct | High-stakes legal prose. Long-context, citation discipline. |
| Critic (compliance, opposition, tribunal, tone, costs-trail, ACAS) | claude-haiku-4-5 | Anthropic direct | Critics run 3–4× per draft; need to be cheap. |
| Scorer (rubric-based 0–100) | claude-haiku-4-5 | Anthropic direct | Rubric eval is pattern-match; Haiku is enough. |
| Extractor (document_classifier, claim_extractor, inbound_*) | claude-haiku-4-5 | Anthropic direct | Structured JSON from semi-structured text. |
| Verifier (fact_checker, consistency, legal_authority, red_team_accuracy_judge) | claude-haiku-4-5 | Anthropic direct | Cross-references citations, dates, numbers; not generative. |
| Red-team simulator (strike_out, worst_case_award, opposing_counsel) | claude-sonnet-4-5 | Anthropic direct | Adversarial reasoning needs depth. |
| Case agent (router + chat) | claude-haiku-4-5 | Anthropic direct | Tool-calling + claimant-facing chat; speed matters. |
| Corpus embeddings (et-pipeline) | voyage-4-large (1024-dim) | Voyage AI | Best legal-text recall in current benchmarks. |
| Corpus reranker | voyage-rerank-2.5 | Voyage AI | Boosts precision on top-k from HNSW. |
| Corpus LLM extraction (legal field JSON) | gemini-2.0-flash | concentrate.ai | $0.0006/row for 164k judgments; bake-off winner vs Haiku at 12× cheaper. |
| Test-case generator (our new tier) | claude-haiku-4-5, gemini-2.5-flash, gpt-5-mini | concentrate.ai | Three families → narrative variance per seed lead. |
Model configuration lives in `et_agent/config.py`; production uses Anthropic direct, but each prompt call goes through a thin wrapper that records token counts + cost into `case_llm_spend`.
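A minimal sketch of that wrapper, assuming hypothetical helper and column names (the real schemas for `model_pricing` / `case_llm_spend` may differ):

```python
# Hypothetical sketch; real column names in model_pricing / case_llm_spend may differ.
from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # ANTHROPIC_API_KEY from the environment

async def tracked_call(conn, case_id: str, prompt_name: str, model: str,
                       messages: list[dict], **kwargs):
    """Call the model, then record token counts + cost into case_llm_spend."""
    resp = await client.messages.create(model=model, max_tokens=4096,
                                        messages=messages, **kwargs)
    in_tok, out_tok = resp.usage.input_tokens, resp.usage.output_tokens
    price = await conn.fetchrow(
        "SELECT input_usd_per_m, output_usd_per_m FROM model_pricing WHERE model = $1",
        model)
    cost = (in_tok * price["input_usd_per_m"] + out_tok * price["output_usd_per_m"]) / 1e6
    await conn.execute(
        "INSERT INTO case_llm_spend (case_id, prompt_name, model, input_tokens,"
        " output_tokens, cost_usd) VALUES ($1, $2, $3, $4, $5, $6)",
        case_id, prompt_name, model, in_tok, out_tok, cost)
    return resp
```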
3. Ingestion: udcalc lead → et-agent case
A claimant fills the udcalc unfair-dismissal calculator. On submit, the
lead lands in udcalc.leads (a separate database). When the
claimant elects to engage et-agent for representation, a handoff is created.
Step-by-step
- Calculator submission — frontend POSTs to PostgREST `/rest/v1/leads`. Row created in `udcalc.leads` with structured fields (dates, pay, claim_types JSONB, claim_details JSONB, evidence_scores JSONB).
- Email handoff — claimant clicks "speak to a lawyer" → frontend POSTs to `/functions/v1/sa-submit-lead` (Deno). The Deno function calls Attio (CRM) and then et-agent's `/intake/handoff` endpoint.
- et-agent: `POST /intake/handoff` creates rows in (sketched after this list):
  - `udcalc_handoffs` — raw payload + claim_types snapshot
  - `cases` — new case row, state=`pre_engagement`
  - `claims` — one row per claim_type, `is_lead_claim` set on the most-valuable one (heuristic: discrimination > whistleblowing > unfair dismissal)
- Engagement letter — operator (or claimant via portal) signs an engagement letter. State → `engaged`. A row in `engagement_letters` stores the signed PDF link + version.
- State → `evidence_gathering` — triggers the intake_completion ensemble.
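A minimal sketch of the handoff handler. The payload fields, `PRIORITY` list, and `create_case_rows` helper are illustrative, not the real et-agent code:

```python
# Illustrative sketch; the real handler and payload schema may differ.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Lead-claim heuristic from the list above.
PRIORITY = ["discrimination", "whistleblowing", "unfair_dismissal"]

class HandoffPayload(BaseModel):
    lead_id: str
    claim_types: list[str]
    snapshot: dict  # raw udcalc fields

@app.post("/intake/handoff")
async def intake_handoff(payload: HandoffPayload) -> dict:
    # 1. udcalc_handoffs: persist the raw payload + claim_types snapshot
    # 2. cases:           new row, state='pre_engagement'
    # 3. claims:          one row per claim_type; is_lead_claim on the best-ranked
    lead = min(
        payload.claim_types,
        key=lambda ct: PRIORITY.index(ct) if ct in PRIORITY else len(PRIORITY),
    )
    case_id = await create_case_rows(payload, lead_claim=lead)  # hypothetical helper
    return {"case_id": case_id, "lead_claim": lead}
```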
Where ingestion talks to which DB
```
// HTTP
udcalc.uk frontend → PostgREST :5435 → udcalc DB (leads, sa_leads)
udcalc.uk frontend → Deno :8787     → udcalc DB (sa_submit-lead)
                                    → Attio CRM (deals/contacts)
                                    → et-agent :8000 (handoff)
et-agent /intake/handoff → asyncpg  → et_agent DB (cases, claims, udcalc_handoffs)
```
4. Intake completion ensemble
Once a case is in `evidence_gathering`, three triggers fire
`intake_orchestrator.run_intake_completion(case_id)`:
- Engagement signed (one-shot).
- Claimant sends a chat message (copilotkit `on_complete`).
- Claimant uploads/links a document (`/evidence/{id}/link`).
The orchestrator throttles to one full run per minute per case, then
loads the snapshot + history and fires the intake_completion
ensemble.
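The throttle can be as small as a per-case timestamp guard; a sketch, assuming an in-process dict is acceptable:

```python
# Sketch of a one-full-run-per-minute-per-case throttle (assumed implementation detail).
import time

_last_run: dict[str, float] = {}  # case_id -> monotonic seconds of last full run

def should_run(case_id: str, min_interval: float = 60.0) -> bool:
    now = time.monotonic()
    if now - _last_run.get(case_id, 0.0) < min_interval:
        return False  # a full run happened less than a minute ago; skip
    _last_run[case_id] = now
    return True
```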
Ensemble: 4 prompts in sequence
Inputs / outputs / DB touches
| Prompt | Input | Output | Writes to DB |
|---|---|---|---|
| `intake_gap_analyzer` | snapshot, claim_types, fact-contracts table | `{gaps: [{field, status, claim_type}]}` | — |
| `intake_readiness_gate` | gaps + rules | `{readiness: READY\|NOT_READY, blockers: [...]}` | cases.udcalc_snapshot.intake_status |
| `intake_question_planner` | gaps, asked_already_json, last 3 messages | `{questions: [{text, target_field, why}]}` | cases.udcalc_snapshot.questions_asked append |
| `intake_answer_integrator` | last claimant message, snapshot, target_field | `{updates: {field: value, ...}, confidence}` | cases.udcalc_snapshot.gathered_facts merge |
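Wired together, the sequence looks roughly like this (hypothetical helper names; `run_prompt` stands in for the tracked wrapper sketched in section 2):

```python
# Hypothetical orchestrator code; helper names are illustrative.
async def run_intake_completion(case_id: str):
    snapshot = await load_snapshot(case_id)               # cases.udcalc_snapshot + history
    gaps = await run_prompt("intake_gap_analyzer", snapshot=snapshot)
    gate = await run_prompt("intake_readiness_gate", gaps=gaps)
    await save_intake_status(case_id, gate["readiness"])  # writes intake_status
    if gate["readiness"] == "READY":
        return
    plan = await run_prompt("intake_question_planner", gaps=gaps,
                            asked_already=snapshot["questions_asked"])
    await append_questions_asked(case_id, plan["questions"])
    # intake_answer_integrator runs when the next claimant message arrives,
    # merging its {field: value} updates into gathered_facts.
```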
5. Document handling
Claimants upload PDFs (dismissal letters, grievance correspondence, contracts, payslips, medical letters). Each upload runs:
- `POST /documents` stores the file in object storage (R2), creates a `documents` row, status=`uploaded`.
- Text extraction — pdfplumber over the bytes. The extracted text is stored as `documents.raw_text`.
- Classification — `document_classifier` prompt (haiku-4-5) returns `{kind: dismissal_letter|grievance|appeal_outcome|contract|payslip|..., confidence}`. Writes `documents.kind`.
- Kind-specific extraction:
  - `document_extractor` for generic — pulls `{parties, dates, key_facts, references}`
  - `grievance_outcome_extractor` — structured outcome JSON (upheld, partially_upheld, rejected, with reasons + actions_offered)
  - `lbc_response_extractor` — employer's reply to the LBC, extracts `{stance, amount_gbp, conditions, time_to_respond}`
  - `sar_response_extractor` — confirms compliance with the SAR + any redactions challenged
  - `inbound_classifier` for emails → then one of `inbound_acknowledgement_extractor`, `inbound_counter_offer_extractor`, or `inbound_info_request_extractor`
- Evidence linking — operator (or autonomous rule) links a document to one or more `evidence_items` against specific pleaded claims. The link triggers a re-run of intake_completion.
All extractor prompts return strict JSON. Anthropic's `tool_use` mode with a Pydantic-typed schema is the wire-level format.
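A minimal example of that pattern, assuming the Anthropic Python SDK (the prompt text and schema here are illustrative):

```python
# Illustrative classifier call; the real prompt text lives in the prompt files.
from anthropic import Anthropic
from pydantic import BaseModel

class Classification(BaseModel):
    kind: str          # dismissal_letter | grievance | appeal_outcome | ...
    confidence: float

client = Anthropic()
raw_text = "..."  # documents.raw_text from pdfplumber

resp = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,
    tools=[{
        "name": "classify_document",
        "description": "Classify an uploaded claimant document.",
        "input_schema": Classification.model_json_schema(),
    }],
    tool_choice={"type": "tool", "name": "classify_document"},  # force tool_use
    messages=[{"role": "user", "content": raw_text[:8000]}],
)
tool_block = next(b for b in resp.content if b.type == "tool_use")
result = Classification.model_validate(tool_block.input)  # strict JSON → typed object
```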
6. Artefact production pipeline
This is the core loop. It runs for six artefact types — defined in
src/et_agent/artefact_configs/*.yaml:
- `grievance` — claimant's pre-action grievance letter
- `sar` — Subject Access Request
- `demand_letter` — pre-LBC demand
- `lbc` — Letter Before Claim
- `settlement_letter` — Calderbank / settlement proposal
- `et1` — Employment Tribunal claim form
The full pipeline (et1 example)
The whole pipeline is orchestrated by
services/produce_artefact.py::produce_artefact().
Sampling strategies (per artefact)
| Artefact | Sampling | Verifier gate | Red-team gate |
|---|---|---|---|
| et1 | parallel_n_pick_best (n=3) | standard | standard |
| lbc | single_shot | standard | standard |
| demand_letter | single_shot | none | none |
| settlement_letter | single_shot | standard | none |
| grievance | single_shot | standard | none |
| sar | single_shot | none | none |
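A sketch of `parallel_n_pick_best` under the obvious reading: draft n candidates concurrently, score each, keep the best. The real logic lives in `produce_artefact.py`; the function shapes here are assumed:

```python
import asyncio

async def parallel_n_pick_best(draft_fn, score_fn, n: int = 3):
    """Sample n drafts concurrently; return the highest-scoring one (sketch)."""
    drafts = await asyncio.gather(*(draft_fn() for _ in range(n)))
    scores = await asyncio.gather(*(score_fn(d) for d in drafts))
    best, _ = max(zip(drafts, scores), key=lambda pair: pair[1])
    return best
```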
7. Critics and scorer
Critics are not gatekeepers; they are peer reviewers. Each runs in parallel against the same draft + same snapshot, returning structured findings. The orchestrator then decides (sketched below):
- Any `critical` issue → reject, re-draft with critique_block injected.
- ≥2 `major` issues → reject.
- 1 `major` → soft pass, operator review.
- Only `minor` issues → pass.
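As a decision function (sketch; the `findings` shape is assumed from the critic output format):

```python
def verdict(findings: list[dict]) -> str:
    """Map critic findings to a pass/reject decision per the rules above."""
    sev = [f["severity"] for f in findings]
    if "critical" in sev:
        return "reject"        # re-draft with critique_block injected
    majors = sev.count("major")
    if majors >= 2:
        return "reject"
    if majors == 1:
        return "soft_pass"     # operator review
    return "pass"              # minor-only
```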
Critic role catalogue
| Role | What it checks | Used by |
|---|---|---|
| compliance | ACAS Code paragraphs cited correctly, statutory cap, time limits, format requirements. | et1, demand, settlement |
| opposition | Plays employer's solicitor: where would they attack? Hopeless heads, time-limit gaps, missing particulars. | et1, demand, lbc, settlement |
| tribunal | Plays the EJ: clarity of pleading, prospect of strike-out applications surviving. | et1 |
| negotiation | Demand anchored well? Concessions visible? BATNA implied? | demand, settlement |
| costs-trail | Does the document set up a costs argument under r.76 / Calderbank? Records breaches? | lbc |
| ACAS-compliance | Specifically: ACAS Code paragraph mapping for each procedural breach. | grievance |
| tone | Calm, factual, non-catastrophising. No invective. | grievance |
| Calderbank | Without-prejudice flagging, costs-shifting language correct. | settlement |
| strategic | Reads the offered amount; recommends accept / counter / reject. | settlement_response |
Scorer
Each scorer prompt has a paired _user template that bundles
the draft + a per-artefact rubric (loaded from JSON). The rubric defines
dimensions (e.g. for et1: pleading_clarity, statutory_citation,
factual_completeness, prospect_of_strike_out, prospects_of_success)
with weights. Output is JSON: {overall_score: int, dimensions:
{name: {score, evidence_quote, fix}}, overall_summary}.
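An illustrative rubric shape and weighted roll-up. The weights are invented for the example; the shipped rubrics define their own:

```python
# Invented weights for illustration only.
ET1_RUBRIC = {
    "pleading_clarity":       0.25,
    "statutory_citation":     0.20,
    "factual_completeness":   0.25,
    "prospect_of_strike_out": 0.15,
    "prospects_of_success":   0.15,
}

def weighted_overall(dimensions: dict[str, dict]) -> int:
    """Roll per-dimension 0-100 scores up with the rubric weights (sketch)."""
    return round(sum(dimensions[name]["score"] * w for name, w in ET1_RUBRIC.items()))
```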
8. Verification gate
Three Haiku prompts run after critics pass but before red-team. All read the draft + the case_snapshot + corpus blocks. They are not generative; they cross-reference.
| Prompt | Checks | Pass criteria |
|---|---|---|
| `fact_checker` | Every factual statement in the draft (date, name, pay figure, event) must trace to case_snapshot OR a referenced document in evidence_items. | Zero unverified factual claims. |
| `consistency_checker` | Internal contradictions: dates that disagree, parties named differently, money figures that don't reconcile across sections. | Zero internal contradictions. |
| `legal_authority_checker` | Every case citation must appear in comparable_cases_block; every statute citation must appear in relevant_statutes_block; no fabricated paragraph numbers in ACAS quotations. | Zero fabricated cites. |
Verifier failures block the artefact and trigger one re-draft with the verifier's findings injected as a critique. Two consecutive verifier failures → operator alert.
All verifier runs persist to the `verification` table for audit.
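The gate's control flow, sketched with hypothetical callables:

```python
# Sketch only; helper names are hypothetical.
async def verification_gate(draft, run_verifiers, redraft_fn, alert_fn):
    failures = await run_verifiers(draft)              # the three checkers above
    if not failures:
        return draft                                   # gate passed → red-team
    draft = await redraft_fn(draft, critique=failures) # findings injected as critique
    failures = await run_verifiers(draft)
    if failures:                                       # two consecutive failures
        await alert_fn("verifier failed twice")        # → operator alert
        return None                                    # artefact blocked
    return draft
```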
9. Red-team gate
Only ET1 currently runs the full red-team. Four prompts:
- `strike_out_simulator` (sonnet-4-5) — plays a respondent's solicitor. Reads the draft + comparable_cases_block + foundational_law_block. Output: `{vectors: [{ground, severity, predicted_outcome, supporting_authority}]}`. Severities: high (likely to succeed), medium, low.
- `worst_case_award_calculator` (sonnet-4-5) — calculates the defensible floor award given the pleaded facts. Considers Polkey reductions, contributory fault, mitigation. Output: `{floor_gbp, ceiling_gbp, rationale, deductions: [...]}`.
- `opposing_counsel_responder` (sonnet-4-5) — drafts the response we expect from a competent employer solicitor. Used to find weaknesses we haven't pleaded around.
- `red_team_accuracy_judge` (haiku-4-5) — evaluates whether the red-team predictions were credible (e.g. "you predicted vector X but cited authority Y which actually doesn't support it"). Filters noise.
If strike_out_simulator returns ≥1 high-severity
vector, the artefact is blocked and the drafter re-runs with the vectors
in the critique_block. Stored in red_team_predictions.
10. et-pipeline corpus / RAG
et-pipeline is a separate project at /srv/active/et-pipeline
that owns the UK employment-law corpus. It is the only system
allowed to write to the et_pipeline database.
Corpus baseline (2026-05-13)
| Source | Embedded rows | Notes |
|---|---|---|
| bailii | 9,144 | Higher courts; full LLM extraction |
| findcaselaw | 24,848 | SC/CA/EAT/UT; full LLM extraction |
| govuk (ET first instance) | 130,358 | 37,558 with LLM extraction, 92,800 raw-only |
| statute_sections | 1,273 | ERA 96, EqA 10, TULRCA 92, ER99, WPA23, WTR98 |
| citations | 333,000+ | Citation graph between judgments |
State machine (ingestion)
Govuk ET first-instance bypasses the LLM stage (raw_text → embed) — first-instance is persuasive only, semantic search on facts is sufficient. Higher courts get the full extraction because their reasoning + cited cases feed the citation graph.
Retrieval API (consumed by et-agent)
| Endpoint | Method | Input | Output |
|---|---|---|---|
/search/cases | POST | {query, limit, courts?, binding_share?, jurisdiction_codes?} | {results: [{id, case_name, court, decision_date, neutral_citation, source_url, extracted: {facts_summary, outcome, parties, ...}, cosine_score}]} |
/search/cases/reranked | POST | same + reranker run | same + rerank_score, pre_rerank_position |
/search/statutes | POST | {query, limit} | {results: [{id, act_id, act_title, section_number, section_title, body, score}]} |
/judgments/{id} | GET | — | full row inc. raw_text + extracted JSONB |
/judgments/{id}/cites | GET | — | citation graph outbound |
/judgments/{id}/cited-by | GET | — | citation graph inbound |
/statutes/by-act/{act}/section/{n} | GET | — | exact statute section (used by foundational_law) |
/health | GET | — | row counts |
Tiered search (binding precedent ↑)
/search/cases splits its top-k candidates 60/40 between
binding precedent (SC/HL/CA/EAT/UT/HC) and first-instance (ET). The binding
pool feeds from a partial HNSW index
idx_judgments_embedding_facts_binding over
WHERE state='embedded' AND court IN ('SC','HL','CA','EAT','UT','HC').
Without this index, the 130k ET rows dominate post-filter and only ~9
binding cases survive per query.
The API's IN clause MUST literally match the partial index predicate
(no parameterised arrays — planner can't prove subset). Constants live in
retrieval/api.py::_BINDING_IN_CLAUSE.
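A sketch of the 60/40 split with the predicate inlined as a literal. The non-binding branch is written as `NOT (...)` to avoid guessing court codes; the real query builder may differ:

```python
# Sketch only; the real query builder lives in retrieval/api.py.
_BINDING_IN_CLAUSE = "court IN ('SC','HL','CA','EAT','UT','HC')"  # must match the index predicate literally

async def tiered_search(conn, query_vec, limit: int = 30):
    n_binding = round(limit * 0.6)  # 60/40 binding vs first-instance
    binding = await conn.fetch(
        f"""SELECT id, embedding_facts <=> $1 AS dist FROM judgments
            WHERE state = 'embedded' AND {_BINDING_IN_CLAUSE}
            ORDER BY embedding_facts <=> $1 LIMIT $2""",
        query_vec, n_binding)
    rest = await conn.fetch(
        f"""SELECT id, embedding_facts <=> $1 AS dist FROM judgments
            WHERE state = 'embedded' AND NOT ({_BINDING_IN_CLAUSE})
            ORDER BY embedding_facts <=> $1 LIMIT $2""",
        query_vec, limit - n_binding)
    return sorted([*binding, *rest], key=lambda r: r["dist"])
```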
11. Voyage embeddings — chunking, shape, storage
Two embeddings per judgment
| Column | Source text | Dim | Purpose |
|---|---|---|---|
| `embedding_full` | First 14,000 chars of raw_text (≈3,500 tokens; voyage-4-large cap is 16k/doc). | 1024 | Full-text semantic recall. |
| `embedding_facts` | Compact text: case_name + extracted.facts_summary + outcome + legal_issues. For govuk raw-only rows, falls back to case_name + first 1000 chars of raw_text. | 1024 | Fact-pattern matching; used by the rerank pipeline. This is the index queried by /search/cases. |
Chunking
No chunking. One row → two embeddings → one document in the result set. Voyage-4-large's 16k token window accommodates ~30-page judgments; longer ones are truncated to the first 14k chars (which captures the head facts, issues, and most of the reasoning).
For statute sections: each section_number is its own row in
statute_sections. No further chunking; sections are already
the natural retrieval unit.
Indexing
```sql
CREATE INDEX idx_judgments_embedding_facts
    ON judgments USING hnsw (embedding_facts vector_cosine_ops)
    WHERE state = 'embedded';

-- Partial: binding-precedent only (130x faster recall on binding subset)
CREATE INDEX idx_judgments_embedding_facts_binding
    ON judgments USING hnsw (embedding_facts vector_cosine_ops)
    WHERE state = 'embedded' AND court IN ('SC','HL','CA','EAT','UT','HC');

CREATE INDEX idx_statute_sections_embedding
    ON statute_sections USING hnsw (embedding vector_cosine_ops);
```
Retrieval flow
`_HNSW_EF_SEARCH = 1000` is applied via `SET LOCAL` per query for cases where the partial index can't be used (e.g. a `jurisdiction_codes` filter).
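In asyncpg terms, roughly (note `SET LOCAL` cannot take bind parameters, hence the f-string over a module constant):

```python
_HNSW_EF_SEARCH = 1000  # widen the HNSW candidate list when the partial index can't help

async def fetch_with_wide_ef(conn, sql: str, *args):
    async with conn.transaction():
        # SET LOCAL scopes the setting to this transaction only
        await conn.execute(f"SET LOCAL hnsw.ef_search = {_HNSW_EF_SEARCH}")
        return await conn.fetch(sql, *args)
```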
12. Testing scaffold (what we just built)
The system above had no objective accuracy measurement. This new scaffold gives us one.
Generation cost ledger
| Model | Successful cases | Failures |
|---|---|---|
| claude-haiku-4-5 | 137 | 1 |
| gemini-2.5-flash | 88 | 50 (JSON escaping in noisy variant) |
| gpt-5-mini | 117 | 21 |
| Total | 342 | 72 (17%) |
Total cost: $7.47 across 414 attempted, ~$0.018/case.
Next step (not built yet)
Eval / grader: read tests/runs/<prompt>/*.json + the
corresponding tests/fixtures/cases/*.json::ground_truth →
score each output against the labels → produce a per-prompt accuracy report
(precision/recall on claim_types extracted, exact-match on dates,
conflict-detection rate for fact_checker, citation grounding rate for
legal_authority_checker, etc.).
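A hedged sketch of that grader, assuming each run file records its source `case_id` and the prompt's parsed `output` (both field names are assumptions):

```python
# Sketch of the not-yet-built grader; fixture shape per section 14.
import json
from pathlib import Path

def grade_prompt(prompt: str) -> dict:
    hits = total = 0
    for run_path in Path("tests/runs", prompt).glob("*.json"):
        run = json.loads(run_path.read_text())
        fixture = Path("tests/fixtures/cases", f"{run['case_id']}.json")  # assumed link field
        truth = json.loads(fixture.read_text())["ground_truth"]
        expected = set(truth["expected_claim_types"])
        got = set(run["output"].get("claim_types", []))  # e.g. claim_extractor output
        hits += len(expected & got)
        total += len(expected)
    return {"prompt": prompt, "claim_type_recall": hits / total if total else None}
```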
13. Database tables
et_agent (25 tables)
| Table | Purpose |
|---|---|
| `cases` | One row per claimant case. udcalc_snapshot JSONB carries intake state. |
| `claims` | One row per pleaded claim_type, FK to case. is_lead_claim flag. |
| `clients` | Claimant contact + KYC metadata. |
| `engagement_letters` | Signed engagement PDFs + version. |
| `udcalc_handoffs` | Raw payload from udcalc form on case creation. |
| `messages` | Chat messages claimant ↔ agent. |
| `communications` | Outbound + inbound emails / posts. Links to documents. |
| `documents` | Uploaded files. kind populated by document_classifier; raw_text by pdfplumber. |
| `evidence_items` | Logical evidence (an event, a fact). Linked to documents + claims. |
| `drafts` | Every artefact draft + version. JSONB body, score, model used, cost. |
| `ensemble_runs` | One row per produce_artefact() invocation. Status, total cost, duration. |
| `ensemble_run_steps` | Per-prompt step inside a run: drafter, each critic, scorer, verifier, red-team. Input/output JSON, model, tokens, latency. |
| `verification` | Verifier prompt outputs (fact-check, consistency, legal-authority). |
| `red_team_predictions` | Strike-out vectors + worst-case awards + opposing counsel drafts. |
| `correspondence_chain` | Threaded inbound/outbound correspondence per case. |
| `case_events` | State transitions + operator actions audit log. |
| `case_state_log` | Append-only state machine ledger. |
| `case_llm_spend` | Per-case cost ledger (rolled up from ensemble_run_steps). |
| `corpus_queries` | Every retrieval call to et-pipeline + how the results were used downstream. |
| `deadlines` | Time limits + key dates (ACAS Early Conciliation, ET1 deadline, hearing). |
| `operator_actions` | Manual operator overrides (forced state transitions, soft-pass approvals). |
| `admin_impersonations` | Audit log when operator impersonates claimant in chat. |
| `model_pricing` | Per-model input/output $/M token table (synced from provider docs). |
| `account` / `user` / `session` | BetterAuth standard tables. |
et_pipeline (6 tables)
| Table | Purpose |
|---|---|
| `judgments` | 164k rows. raw_text, extracted JSONB, embedding_full + embedding_facts vectors, state machine column. |
| `statute_sections` | 1.3k rows. Each ERA 96 / EqA 10 / etc. section. embedding vector. |
| `citations` | Citation graph. (citing_id, cited_id, context). |
| `employment_rates` | Statutory caps + Vento bands by effective date. Hand-curated. |
| `pipeline_runs` | Ingestion runs telemetry. |
14. Data formats
CaseSnapshot (canonical case shape)
Defined in et_agent/domain/case_state.py. This is what
every drafter / critic / scorer sees as case_file_json.
```python
class CaseSnapshot(BaseModel):
    case_id: str
    case_reference: str
    state: str
    claimant_name: str | None
    employer_name: str | None
    employment_status: str | None      # dismissed | constructively_dismissed | ...
    employment_start_date: date | None
    employment_end_date: date | None
    weekly_pay_gross: float | None
    annual_pay_gross: float | None
    age_at_termination: int | None
    country: str | None                # england-wales | scotland | northern-ireland
    udcalc_narrative: str | None       # claimant's own story
    formula_compensation_low: float | None
    formula_compensation_high: float | None
    claims: list[ClaimSummary]         # claim_type, is_lead, value range

class ClaimSummary(BaseModel):
    claim_type: str
    is_lead_claim: bool
    estimated_value_low: float | None
    estimated_value_high: float | None
```
Prompt frontmatter
```
---
name: et1_drafter_user
description: User message template for the ET1 drafter
role: user
expected_variables:
  - case_file_json
  - comparable_cases_block
  - relevant_statutes_block
  - current_rates_block
  - critique_block
tags:
  - et1
  - drafter
version_notes: |
  Pure data envelope — tune wording sparingly; substantive guidance
  lives in et1_drafter.md.
---

CASE FILE:
$case_file_json

COMPARABLE CASES (cite only from this list):
$comparable_cases_block

RELEVANT STATUTE SECTIONS:
$relevant_statutes_block

$current_rates_block

$critique_block
```
Comparable cases block (rendered)
```xml
<comparable_cases>
  <case id="findcaselaw-eat-1234" court="EAT" date="2023-04-12">
    <name>Smith v Acme Ltd UKEAT/0123/23</name>
    <facts>Claimant dismissed after raising health-and-safety
    concerns. EAT held that the protected disclosure
    principle applied even though the disclosure was made
    informally...</facts>
  </case>
  ...
</comparable_cases>
```
Test case fixture (what we just generated)
```json
{
  "meta": {
    "id": "claudehaiku45_lead_0543_v1",
    "source_lead_id": "<udcalc.leads.id uuid>",
    "generator_model": "claude-haiku-4-5",
    "generator_variant": 1,
    "cost_usd": 0.0298
  },
  "case_snapshot": { ...full CaseSnapshot dict... },
  "narrative": "I worked for Roslyn Care Foundation for nearly 28 years...",
  "documents": {
    "dismissal_letter": "Roslyn Care Foundation\n14 Ashford Lane...",
    "grievance_letter": "...",
    "grievance_outcome_letter": "...",
    "appeal_letter": "...",
    "appeal_outcome_letter": "...",
    "inbound_messages": [
      {"kind": "lbc_response", "from": "employer_solicitor",
       "received_at": "2026-03-14",
       "body": "...",
       "amount_gbp": 4500}
    ]
  },
  "ground_truth": {
    "expected_claim_types": ["unfair_dismissal"],
    "expected_lead_claim": "unfair_dismissal",
    "expected_acas_compliant": false,
    "acas_failures": ["no_appeal_offered", "predetermined_outcome"],
    "expected_outcome_category": "upheld",
    "expected_value_range_gbp": {"low": 14000, "high": 32000},
    "automatic_unfair_grounds": [],
    "time_limit_status": "in_time",
    "red_team_flags": ["procedural_unfairness"],
    "conflicts": [
      {"description": "narrative says March 2024, dismissal_letter dated April",
       "where": ["narrative", "documents.dismissal_letter"],
       "truth": "dismissal_letter date is correct (April)",
       "trips_prompts": ["fact_checker", "consistency_checker"]}
    ]
  }
}
```
Voyage embedding wire shape
```
POST https://api.voyageai.com/v1/embeddings
Authorization: Bearer pa-...
Content-Type: application/json

{
  "input": ["text 1", "text 2", ...],   # batch up to 8 docs
  "model": "voyage-4-large",
  "input_type": "document"              # or "query" for retrieval
}

→ {
  "data": [
    {"embedding": [0.012, -0.034, ..., 0.022], "index": 0},   # 1024 floats
    ...
  ],
  "usage": {"total_tokens": 12384}
}
```
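The same call via `requests`, straight off the wire shape above (`VOYAGE_API_KEY` env var assumed):

```python
import os
import requests

def embed(texts: list[str], input_type: str = "document") -> list[list[float]]:
    """One 1024-dim vector per input text, in input order."""
    resp = requests.post(
        "https://api.voyageai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {os.environ['VOYAGE_API_KEY']}"},
        json={"input": texts[:8],           # batch up to 8 docs, per the shape above
              "model": "voyage-4-large",
              "input_type": input_type},    # "query" on the retrieval side
        timeout=30,
    )
    resp.raise_for_status()
    rows = sorted(resp.json()["data"], key=lambda r: r["index"])
    return [r["embedding"] for r in rows]
```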
Pgvector storage
```
judgments.embedding_full   vector(1024)   # nullable until embed worker fills
judgments.embedding_facts  vector(1024)
statute_sections.embedding vector(1024)
```
Cosine similarity query (pgvector idiom used by et-pipeline)
```sql
SELECT id, case_name, court, decision_date,
       embedding_facts <=> :query_vec AS cosine_distance
FROM judgments
WHERE state = 'embedded'
  AND court IN ('SC','HL','CA','EAT','UT','HC')
ORDER BY embedding_facts <=> :query_vec
LIMIT 30;
```
Appendix — the 65 prompts
| Family | Count | Names |
|---|---|---|
| Intake | 4 | intake_gap_analyzer, intake_question_planner, intake_answer_integrator, intake_readiness_gate |
| Document handling | 9 | document_classifier, document_extractor, claim_extractor, inbound_classifier, inbound_acknowledgement_extractor, inbound_counter_offer_extractor, inbound_info_request_extractor, grievance_outcome_extractor, lbc_response_extractor, sar_response_extractor |
| Drafters (system + user) | 14 | et1_drafter(+_user), demand_letter_drafter(+_user), lbc_drafter(+_user), grievance_drafter, sar_drafter, settlement_drafter, settlement_response_drafter(+_user), case_strength_drafter(+_user) |
| Critics | 13 | et1_critic_compliance, et1_critic_opposition, et1_critic_tribunal, demand_letter_critic_compliance, demand_letter_critic_opposition, demand_letter_critic_negotiation, lbc_costs_trail_critic, lbc_opposition_critic, grievance_acas_compliance_critic, grievance_tone_critic, settlement_calderbank_critic, settlement_negotiation_critic, settlement_response_critic_compliance/opposition/strategic |
| Scorers | 10 | et1_scorer(+_user), demand_letter_scorer(+_user), grievance_scorer, lbc_scorer, case_strength_scorer(+_user), settlement_letter_scorer, settlement_response_scorer(+_user) |
| Verifier | 3 | fact_checker, consistency_checker, legal_authority_checker |
| Red-team | 4 | strike_out_simulator, worst_case_award_calculator, opposing_counsel_responder, red_team_accuracy_judge |
| Specialist | 8 | case_agent (chat router), costs_warning_simulator, per_claim_particulars, strategic_seam_finder, sar_compliance_checker, intro_message, voice, foundational_law (resolver, not a prompt) |