Architecture | Intactus Docs

Design Principles

Privilege by design. The architecture ensures attorney-client privilege and work product protection are structural properties of the system, not afterthoughts. The attorney directs all AI analysis. Enterprise API terms maintain confidentiality. Client data never touches consumer-grade AI services.
Tamper-evident from capture. Every artifact is SHA-256 hashed and recorded in a JSONL manifest at the moment of ingestion. The chain of custody is unbroken from capture through court presentation.
AI costs are tiered. Most operations require no LLM. Search and extraction are pre-computed at ingestion. Only attorney-directed synthesis and analysis invoke frontier models, and even then agentic retrieval ensures only relevant evidence is sent.
The vault is the platform. Evidence capture, preservation, and organization provide standalone value without any AI features. The AI analysis layer is a premium capability on top of a fundamentally useful evidence management system.
Agentic over semantic. Retrieval is reasoning-based, not similarity-based. The system builds structured evidence indexes at ingestion and uses LLM reasoning to navigate them. No vector database, no embedding vendor, no chunking. Similarity is not relevance; relevance requires reasoning.
Multi-tenant from day one. Every table, every S3 path, every query is scoped by firm and case. Multi-tenancy is a data model property, not a bolt-on.
Agent-native from day one. Every API endpoint is simultaneously a UI operation and an agent tool. The API is the tool layer; there is no separate "agent API." Agents are first-class practitioners operating under attorney direction, not feature add-ons. See agent-native.md for the full agent architecture.

Every FastAPI endpoint is an atomic tool. The API is the tool layer. The UI and agents call the same endpoints. This is the foundational architectural decision that makes agent parity automatic rather than aspirational.

Endpoint Design Principles

One operation per endpoint. Each route performs a single, well-defined operation with clear input/output contracts.
Consistent error format. Every endpoint returns errors in the same structure: machine-readable code, human-readable message, structured details, retry guidance, and suggested next action. See agent-native.md: Error Contract.
Async by default for long operations. Any operation that takes more than 5 seconds returns a job_id instead of blocking. See Async Operations below.

OpenAPI Extensions as Tool Metadata

Every FastAPI route carries tool metadata via OpenAPI extensions:

Extension	Required	Purpose
`x-tool-name`	Yes	Tool identifier (e.g., `evidence.search`)
`x-tool-permission`	Yes	Required permission (e.g., `read:evidence`)
`x-tool-audit-category`	Yes	Audit category (e.g., `search`, `create`, `delete`)
`x-tool-entity-type`	No	Entity type acted upon (e.g., `evidence`, `fact`)

A tool_meta() helper function attaches these extensions to routes at definition time. A unit test validates the helper works correctly, but there is no CI step that validates all routes carry the required extensions.

Frontend as Thin Client

The Next.js frontend calls the same API endpoints agents call. No BFF (backend-for-frontend). No screen-specific endpoints. The frontend currently uses raw fetch() calls to the API. OpenAPI-based TypeScript client generation (e.g., @hey-api/openapi-ts) is planned but not yet implemented.

Shell Contextual Action Slot

The top bar contains a right-side slot for page-specific action buttons (e.g., "Upload Evidence" on the evidence browser, "Create Case" on the case list). Pages inject content via TopBarActionsContext: a React context that holds the current action node. The shell layout reads this context and renders it in the top bar. Pages use the useTopBarActions(node) hook to register their actions on mount and clear them on unmount. This pattern is defined in frontend/components/shell/topbar-actions-context.tsx and is the standard for all M3+ pages.

API Conventions

All endpoints follow consistent conventions for pagination, idempotency, and versioning:

Cursor-based pagination: Every .list endpoint returns { items, next_cursor, has_more }. Cursor-based rather than offset-based because evidence, facts, and events are frequently inserted.
Idempotency keys: All POST (create) endpoints accept an Idempotency-Key header. This is implemented centrally in backend/app/middleware/idempotency.py (Redis-backed, 24-hour TTL, body-hash matching, lock-based deduplication). Endpoint implementations do not need per-service idempotency logic — it is handled by the middleware layer automatically for all POST requests that include the header.
URL-prefix versioning: All endpoints use /v1/ prefix. Breaking changes require a new version; non-breaking additions don't.
Application-level enum validation: Status fields and type enums (source_type, capture_method, processing_status, case status, etc.) are validated at the application layer via Pydantic, not with DB-level CHECK constraints or PostgreSQL ENUM types. Allowed values are defined in a central module (e.g., app/constants.py or per-domain enums.py) and referenced by both schemas and services. This enables rapid iteration — adding a new source type is a code change, not an Alembic migration. DB columns store plain VARCHAR. Existing CHECK constraints on case status should be migrated to this pattern for consistency.

See agent-native.md: API Conventions for the full specification.

Real-Time Updates

Intactus uses Server-Sent Events (SSE) for one-way real-time updates from the backend to the browser. SSE is the default transport for progress updates, enrichment step completion, and firm-scoped event delivery where the client needs a live feed but does not need bidirectional messaging.

Why SSE

One-way fits the product. Most live updates in Intactus are server-to-client notifications: job progress, enrichment progress, event feed updates, and UI refresh triggers.
Operationally simpler than WebSockets. SSE runs over normal HTTP, works well with the existing FastAPI/Starlette stack, and avoids connection state and protocol complexity that the product does not currently need.
Native browser support. The frontend can consume streams with EventSource without a custom client protocol.

Transport Architecture

The real-time path is:

Celery / backend service
  -> publish event to Redis pub/sub
  -> SSE endpoint subscribes to Redis channel
  -> Next.js same-origin proxy streams the response
  -> browser EventSource updates the UI

There are two SSE scopes in the backend:

Case-level streams: used for evidence/enrichment progress within a single case
Firm-level streams: used for broader firm-scoped domain events

The backend publishes events into Redis pub/sub channels keyed by firm and, when needed, by case. This keeps event fan-out cheap and avoids holding PostgreSQL connections open for long-poll delivery.

Backend Design

The SSE endpoints live in a dedicated Starlette sub-application mounted under /api/v1/sse. This is intentional.

The main FastAPI app uses BaseHTTPMiddleware layers for concerns like error handling, idempotency, tenant handling, and rate limiting.
Those middleware layers can buffer responses, which breaks streaming semantics.
Mounting a separate ASGI sub-app allows SSE responses to bypass the buffering middleware stack and stream incrementally.

The backend SSE handlers do four things:

Authenticate the request directly rather than relying on the normal middleware path.
Authorize scope and case access before opening the stream.
Subscribe to the appropriate Redis pub/sub channel.
Emit a snapshot first, then stream live events and periodic heartbeat comments.

The snapshot-then-stream pattern is important. On connect, the server first sends the current known state so the client can hydrate immediately, then it continues with incremental updates. This avoids races where the UI connects midway through an active enrichment run and would otherwise wait for the next event before showing current state.

Heartbeats are sent periodically to keep the connection alive through proxies and load balancers and to refresh server-side connection bookkeeping.

Frontend Design

The browser does not connect directly to the backend SSE endpoint. Instead, it connects to a same-origin Next.js route that proxies the stream through to the backend.

This proxy exists for practical reasons:

EventSource cannot set arbitrary authorization headers
the session cookie is httpOnly and must be read server-side
same-origin routing is easier to operate under the app's CSP and browser cookie rules

The Next.js route reads the session cookie, adds the backend Authorization header, requests the backend SSE stream, and returns the stream body without buffering or transformation.

On the client side:

EventSource is used for the live connection
the stream is debounced before triggering expensive refetches
the client uses the initial snapshot to hydrate current state
list/detail views selectively refetch only the affected records rather than doing a full page reload

Reliability Model

The frontend treats SSE as the preferred real-time channel, not the only channel.

Connections automatically reconnect with exponential backoff
the client performs freshness checks during long-running enrichment
if SSE fails persistently, the UI falls back to polling

This hybrid approach keeps the UI responsive without making correctness depend on a permanently healthy stream. SSE improves latency and user experience; polling remains the safety net.

Security and Multi-Tenancy

SSE follows the same tenant boundaries as the rest of the platform.

channels are scoped by firm_id, and case-level channels also include case_id
connection authorization is checked before subscription
case streams verify that the principal can access the requested case
event payloads are intentionally small and avoid sending unnecessary document content or PII

This preserves the same firm/case isolation model used throughout the API and database layers.

Architectural Tradeoff

SSE is the right fit for Intactus today because the product mostly needs server-to-browser event delivery, not collaborative peer messaging or duplex sessions. If future features require true bidirectional low-latency messaging, WebSockets or a dedicated realtime service may be warranted. For the current architecture, SSE provides the lowest-complexity path to live updates while staying aligned with the existing HTTP-first API design.

System Architecture Overview

               CLIENT PORTAL                    ATTORNEY DASHBOARD
            (Responsive Web → Native)           (Web Application)
                     |                                |
                     v                                v
             ┌───────────────────────────────────────────┐
             │              API GATEWAY                  │
             │    (Caddy → FastAPI, Auth, Rate Limits)   │
             └─────────────────┬─────────────────────────┘
                               │
             ┌─────────────────┼──────────────────────┐
             │                 │                       │
             v                 v                       v
      ┌─────────────┐  ┌──────────────┐  ┌──────────────────┐
      │  INGESTION   │  │   SEARCH     │  │    ANALYSIS      │
      │  PIPELINE    │  │ (PostgreSQL) │  │    (Future)      │
      │              │  │              │  │                   │
      │ - File upload│  │ - Full-text  │  │ - Agentic        │
      │ - Text       │  │   search     │  │   retrieval      │
      │   extraction │  │ - Entity     │  │ - Sonnet/Opus    │
      │ - OCR (Haiku)│  │   lookup     │  │   via Anthropic  │
      │ - Hash       │  │ - Filters    │  │   API            │
      │ - Manifest   │  │              │  │ - Report gen     │
      │ - Enrichment │  │ Cost: FREE   │  │ - Chat with      │
      │   (entities, │  │ at query     │  │   My Case        │
      │    facts,    │  │              │  │                   │
      │    relations,│  │              │  │ Cost: $$          │
      │    summaries)│  │              │  │ per query         │
      │              │  │              │  │                   │
      │ Cost: $0.005 │  │              │  │                   │
      │ per item     │  │              │  │                   │
      └──────┬───────┘  └──────┬───────┘  └────────┬──────────┘
             │                 │                    │
             v                 v                    v
      ┌──────────────────────────────────────────────────┐
      │                  DATA LAYER                       │
      │                                                   │
      │  PostgreSQL 16        S3 (artifacts)              │
      │  - metadata           - files, images             │
      │  - manifest           - screenshots               │
      │  - entities           - audio, video              │
      │  - relationships      - .eml files                │
      │  - hashes             - text exports              │
      │  - summaries                                      │
      │  - classifications                                │
      │                                                   │
      │  ┌─────────────────────────────────────────┐      │
      │  │  TASK QUEUE (Celery + Redis)             │      │
      │  │  - Ingestion jobs                        │      │
      │  │  - Enrichment jobs                       │      │
      │  └─────────────────────────────────────────┘      │
      └──────────────────────────────────────────────────┘

Data Model

Multi-Tenancy

Every record in the system is scoped by firm and case. This is not a future concern; it is a structural property of the data model from the first migration.

                    ┌──────────┐
                    │  Firm     │
                    │ (tenant)  │
                    └────┬─────┘
                         │ has many
                    ┌────┴─────┐
                    │  Case     │
                    └────┬─────┘
                         │ has many
     ┌───────────────┼──────────────┬──────────────┬──────────────┐
     │               │              │              │              │
┌────┴────┐   ┌──────┴────┐  ┌─────┴──────┐ ┌────┴────┐  ┌──────┴─────┐
│Evidence │   │  Entity   │  │Relationship│ │  Fact   │  │   Claim    │
│  Item   │   │           │  │            │ │         │  │  └─ Issue  │
└─────────┘   └───────────┘  └────────────┘ └─────────┘  │  (future)  │
                                                          └────────────┘

Every table includes firm_id and case_id columns
PostgreSQL Row-Level Security enforces tenant isolation on business data tables — tenant context is set per-session via a FastAPI dependency, and RLS policies enforce it for all queries. Auth tables (sessions, agent_sessions) are exempt from RLS; token lookups are scoped by cryptographic hash and must resolve the tenant before RLS context can be set
Two database roles: superuser for migrations, application role for runtime (subject to RLS on business data)
S3 key paths: /{firm_id}/{case_id}/{artifact_type}/{artifact_id}/{filename}
No cross-firm queries are possible at the application or database layer
Firm admin can see all cases within their firm; attorneys see assigned cases; clients see their own case only

Implementation: See backend/app/tenant.py for session-scoped tenant context, migrations 0003/0004 for RLS policies, and migration 0011 for the auth table exemption.

Actor attribution: All tables with audit fields (created_by, updated_by, or equivalent) include actor_type (human | agent) and actor_id columns. This distinguishes human actions from agent actions across the data model without requiring separate audit infrastructure. When actor_type is agent, the actor_id references the agent record and the agent's agent_owner_id links to the directing attorney.

Auth & Permissions

Role	Sees	Can Do
Firm admin	All cases in firm	Manage attorneys, billing, firm settings
Attorney	Assigned cases	Full evidence management, AI analysis, case administration
Staff	Assigned cases	Evidence intake, organization, search (no AI analysis)
Agent	Cases assigned to owner attorney	Tool-layer operations scoped by permission grants (see Agent Role Model)
Client	Own case only	Upload evidence, portal features, Chat with My Case, secure messaging
Guest	Specific case(s) only	Read-only access to attorney dashboard view for that case. For co-counsel, expert witnesses, mediators, guardians ad litem.

Authentication: FastAPI owns all authentication. Human users authenticate via magic links (email-based, passwordless) sent through Resend. Prefixed bearer tokens with SHA-256 hashing only — raw tokens are never persisted. Sessions are stored in PostgreSQL. See backend/app/auth/ for the full implementation.

Agent authentication: Agents use API keys exchanged for scoped session tokens. A single auth dependency validates both human and agent tokens, resolving to a unified principal model. Agent tables (API keys, sessions, audit log) ship in M1; management endpoints ship in M2+. See backend/app/models/agent.py for the data model and agent-native.md: Agent Authentication & Sessions for the full specification.

Agent permissions: Every agent has an agent_owner_id foreign key referencing the directing attorney. An agent inherits its case access from the owner attorney but is further scoped by operation type (read, write, delete, analyze) and entity type. Permission checks are enforced at the API layer via require_scope(): same middleware, same firm/case scoping, with additional agent scope validation. See agent-native.md: Agent Role Model for the full permission model.

Fact Schema

A Fact is a discrete factual assertion, linked to one or more source passages across one or more evidence items that support it. Facts are the atomic unit of case reasoning; they connect raw evidence to legal arguments with passage-level provenance.

Facts are generated at two points:

At ingestion (auto-computed): The ingestion pipeline extracts factual assertions from evidence text, each linked to specific character offsets in the source. These are stored automatically and clearly labeled as AI-generated.
At analysis (attorney-directed): When the attorney requests AI analysis, the resulting findings are modeled as facts requiring explicit attorney approval before entering the case record.

The facts table stores the assertion text, a confidence score (0.00-1.00), a status field (auto, approved, dismissed), approval tracking (approved_by, approved_at), temporal fields (occurred_at, occurred_end_at, occurred_precision) for when the fact took place, and a JSONB metadata column. See backend/app/models/knowledge_graph.py (Fact class) for the full column set.

The fact_evidence junction table implements the many-to-many relationship between facts and evidence items. Each row captures where in a specific evidence item the fact is grounded: source_snippet, start_offset, end_offset, and an is_primary flag for the canonical source. The table uses a composite primary key on (fact_id, evidence_item_id) and includes firm_id denormalized for RLS.

Two additional junction tables connect facts to other knowledge graph elements:

fact_entities: Links facts to the entities they mention, with an optional role and mention_text. Composite PK on (fact_id, entity_id).
relationship_facts: Links relationships to the facts that support them. Composite PK on (relationship_id, fact_id).

See backend/app/models/knowledge_graph.py for FactEvidence, FactEntity, and RelationshipFact.

Two-tier approval model:

Tier	Generated By	Approval	Examples
Auto-computed metadata	Ingestion pipeline (Haiku)	Stored automatically, attorney can edit/override anytime	Entity extraction, 2-sentence summaries
Facts & assertions	Ingestion pipeline or analysis engine	Require explicit attorney review: approve, edit, or dismiss	"Respondent was 40 minutes late for pickup on Oct 14", "Financial disclosure contradicts social media post from Sep 3"

The approval workflow supports batch operations: select multiple facts, approve all, or filter by confidence and approve in bulk. This keeps the workflow fast for the 80% of facts that are obviously correct while preserving rigor for borderline findings.

Fact state machine: Facts have three statuses (auto, approved, dismissed) but no terminal states. The batch_update_status service function transitions any set of facts to any valid status. There are no separate /approve or /dismiss endpoints — all status changes go through POST /facts/bulk-update with a status field. Facts can be soft-deleted via DELETE /facts/{id} (sets deleted_at/deleted_by_id). The system uses hard-delete for orphan cleanup during re-analysis.

Batch operations: The bulk-update endpoint follows a validate-all-before-execute pattern: all fact IDs must be validated as belonging to the requesting firm_id + case_id before any transition is executed. Max 100 facts per batch; exceeding returns 422. Response is per-item: [{ fact_id, status: "approved" | "failed", error?: str }].

Claims and Issues: The claims, issues, issue_facts, and issue_evidence tables are defined in the Issue & Claim Schema section below but are not yet implemented.

Relationship Schema

At ingestion, the enrichment pipeline extracts relationships between entities using a category + label system. Rather than a flat enum of relationship types, each relationship has a structured category (from a fixed set of 7), a free-text label describing the specific relationship, and an optional inverse_label for the reverse direction.

Evidence Type	Extracted Relationships (category / example labels)
Email	communication / "sent email to", "cc'd on email"; professional / "works with"
Text message	communication / "messaged", "texted"; social / "is friend of"
Court document	legal / "filed by", "regarding"; professional / "represented by"
Financial record	financial / "paid", "owes"; professional / "employed by"
General evidence	familial / "parent of", "sibling of"; spatial / "lives at", "works at"

The 7 relationship categories are defined in KG_RELATIONSHIP_CATEGORIES in backend/app/enums.py: familial, professional, legal, social, financial, spatial, communication. The API validates category against this set; unknown values return 422. The label and inverse_label fields are free-text, allowing the LLM to describe the specific nature of each relationship.

Each relationship links a source_entity_id to a target_entity_id, with an optional confidence score. Direction-agnostic deduplication is enforced via a unique index on LEAST(source_entity_id, target_entity_id), GREATEST(source_entity_id, target_entity_id), case_id, and category — so "A is parent of B" and "B is parent of A" within the same category cannot coexist as separate rows.

Relationships support basic CRUD operations and can be linked to supporting facts via the relationship_facts junction table. See backend/app/models/knowledge_graph.py (Relationship and RelationshipFact classes) for the full schema.

Full-text search: facts.assertion uses PostgreSQL tsvector/tsquery with a GIN index. Migration must include: CREATE INDEX facts_assertion_fts ON facts USING GIN (to_tsvector('english', assertion));

Evidence Item Schema

An Evidence Item is the core unit of the vault: a captured artifact with full provenance, hash-chain integrity, and pre-computed metadata. Evidence items are created by the ingestion pipeline and are immutable once hashed.

The evidence_items table organizes its columns into several groups:

Source provenance: source_type and capture_method classify how the evidence entered the system.
Content: title, extracted_text (full text — HTML stripped, OCR'd via PyMuPDF/fitz for PDFs and Anthropic Vision for images), content_date (original content date), captured_at.
Integrity: sha256_hash (computed by pipeline after upload, nullable at creation), manifest_entry_id (reference to JSONL manifest entry).
Processing state: processing_status, processing_error, processing_steps (per-step JSONB tracking).
Enrichment outputs: summary, summary_one_liner, classifications (JSONB), confidence. Also extraction_instructions and enrichment_instructions (attorney-provided custom instructions for re-extraction and enrichment), and text_modified_since_enrichment (flag indicating extracted text changed after last enrichment).
Review tracking: reviewed_at, reviewed_by_id — marks when an attorney reviewed the evidence.
S3 artifact reference: s3_key, content_type, size_bytes, original_filename.
Soft delete: deleted_at, deleted_by_id.
Extensible metadata: metadata JSONB for source-specific fields.

See backend/app/models/evidence.py for the full column set.

Notes:

source_type values: web, email, sms, file, image, video, audio, other. Defined in SOURCE_TYPES in backend/app/enums.py.
capture_method values: upload, browser, api, email_forward. Defined in CAPTURE_METHODS in backend/app/enums.py. Determines which pipeline variant processes the item.
processing_status has 6 states: pending, processing, completed, failed, enriching, partially_enriched. Defined in PROCESSING_STATUSES in backend/app/enums.py. Items with pending or processing status show a progress indicator in the UI. Items with failed status surface the error to the attorney.
processing_steps tracks per-step completion within the pipeline (status, timestamp, errors). Each step is recorded independently so partial failures are visible. Structure: {"step_name": {"status": "completed"|"failed"|"skipped", "started_at": "...", "completed_at": "...", "error": "..."}}.
Text extraction uses PyMuPDF/fitz for PDF parsing and Anthropic Vision API (Claude Haiku) for image OCR.

Entity Schema

An Entity is a named item extracted from evidence during ingestion. Entities are deduplicated across evidence items within a case and can be merged manually by the attorney.

Entities have exactly 3 types: person, organization, location — defined in KG_ENTITY_TYPES in backend/app/enums.py. Each entity stores a display_name (canonical name) and a dedup_key (normalized for deduplication). Uniqueness is enforced via a partial unique index on (firm_id, case_id, entity_type, dedup_key) WHERE merged_into_id IS NULL AND deleted_at IS NULL, allowing soft-deleted and merged entities to coexist with active ones sharing the same key.

Additional entity columns include:

summary and summary_one_liner: AI-generated entity summaries, debounced per-entity via the entity summarization pipeline. summary_edited tracks whether the attorney has overridden the AI summary.
parsed_date and date_precision: Temporal metadata for date-like entities.
deleted_at, deleted_by_id: Soft delete timestamps. merged_into_id: Self-referencing FK for merge tracking — when set, the entity has been merged into the referenced entity. Merge and delete are distinct operations (see Record Lifecycle Model below).
metadata (JSONB): Entity-type-specific fields.

See backend/app/models/knowledge_graph.py (Entity class) for the full column set.

Entity-evidence junction: The entity_evidence table links entities to evidence items. It uses a surrogate UUID primary key with a unique constraint on (entity_id, evidence_item_id) — guaranteeing one link per pair. Each row also stores start_offset, end_offset, mention_text, and confidence. The firm_id is denormalized for RLS. See backend/app/models/knowledge_graph.py (EntityEvidence class).

Related tables: Several additional tables support the entity subsystem:

dedup_suggestions: Stores fuzzy-match dedup candidates with a score, match_reasons (JSONB), and a cluster_id for grouping. Suggestions have a status (pending/accepted/rejected) with decided_by/decided_at tracking. Unique constraint on (source_entity_id, target_entity_id).
merge_history: Audit trail for entity merges. Each row records a single column change made during a merge operation: table_name, row_id, column_name, old_value, new_value. Grouped by merge_event_id so an entire merge can be replayed or reversed.
temporal_mentions: Stores date/time references extracted from evidence, linked to a fact_id and/or evidence_item_id. Includes occurred_at, end_at, precision, mention_text, display_text, temporal_role (when, deadline, filed, created, received), and character offsets.
fact_entities: Junction table linking facts to the entities they mention (see Fact Schema).
relationship_facts: Junction table linking relationships to supporting facts (see Relationship Schema).

See backend/app/models/knowledge_graph.py for DedupSuggestion, MergeHistory, TemporalMention, FactEntity, and RelationshipFact.

Record Lifecycle Model

All three core data types (evidence, entities, facts) follow a consistent lifecycle model. Records can be in one of four mutually exclusive states:

State	`merged_into_id`	`deleted_at`	`deleted_by_id`	Meaning
Active	NULL	NULL	NULL	Normal record
Merged	target UUID	NULL	NULL	Absorbed into target; data lives on
Deleted	NULL	timestamp	user UUID	Explicitly removed by user
Merged + Erased	target UUID	timestamp	user UUID	Future GDPR path

Active predicate: WHERE merged_into_id IS NULL AND deleted_at IS NULL (used in all list queries).

Delete behavior by actor:

User (UI delete): Soft-delete -- sets deleted_at/deleted_by_id. Leaves junction rows in place. Invalidates dedup suggestions. Reversible with audit trail.
System (orphan cleanup, re-analysis): Hard-delete -- SQL DELETE. FK CASCADE handles junction rows. Used for junk removal with no user intent to preserve.
Merge (dedup approval): Sets merged_into_id only. NOT deletion. Preserves lineage per FHIR/MDM standards.

Soft-delete leaves junction rows intact: When a record is soft-deleted, its junction/child rows (fact_entities, entity_evidence, relationship_facts, etc.) remain in the database. They are filtered at query time via the active predicate on the parent table. This preserves graph connections for future undo/restore capabilities.

Merge behavior: When an attorney merges two entities, the secondary entity's merged_into_id is set to the primary entity's id. All entity_evidence rows, relationships, and aliases transfer to the primary. The secondary entity is retained (merged, not deleted) so that existing references resolve correctly. Every column change during the merge is recorded in merge_history, enabling full unmerge capability. After merge completes, entity summarization is automatically triggered for the primary entity to incorporate the newly absorbed data.

Merge security: The merge endpoint validates that both entities share the same firm_id and case_id. No cross-case merges are permitted.

Merge idempotency: If the secondary entity already has merged_into_id = primary_id, the merge is a no-op. Return 200 with the current state of the primary entity.

Auto-deduplication: The dedup_key is computed at write time by generate_dedup_key() in backend/app/services/entity_service.py. The key is prefixed with the entity type and applies type-specific normalization: person names strip titles/suffixes and canonicalize nicknames; organization names strip legal suffixes and expand abbreviations; location names expand address abbreviations. After type-specific processing, the name is lowercased, accents are stripped (NFKD), punctuation is removed, tokens are sorted alphabetically, and joined with underscore. Example: "Dr. Sarah Johnson" (person) → person:johnson_sarah. Exact-match duplicates are caught at insertion time. Fuzzy matches (e.g., "Sarah Johnson" vs "S. Johnson") are surfaced to the attorney as dedup_suggestions rather than auto-merged, preserving accuracy over convenience.

Fact count: Entity-level counts (fact count, evidence count, relationship count) are not stored as denormalized columns. They are computed at query time via JOIN.

Full-text search: Entity display_name is indexed for full-text search using PostgreSQL tsvector/tsquery with a GIN index. Do not use ILIKE for search. Migration must include: CREATE INDEX entities_display_name_fts ON entities USING GIN (to_tsvector('english', display_name));

Issue & Claim Schema

Note

Implementation Status: NOT YET IMPLEMENTED. The tables described below do not exist in the database. This section is retained as the design specification for a future milestone.

A Claim is a top-level legal argument or legal theory. An Issue is a sub-element within a claim — typically a legal element that must be proven, a statutory factor the court must weigh, or a discrete contested question. Together they provide the hierarchical structure attorneys use to organize facts and evidence around legal theories.

In tort/contract litigation: Claims map to legal theories (Negligence, Breach of Contract, RICO). Issues map to elements of each theory ("Duty," "Breach," "Causation," "Damages"). authority_citations may reference a statute or pattern jury instruction. Common law claims have no statute.

In family law: Claims map to petition or motion types ("Petition for Primary Physical Custody," "Motion for Contempt — Violation of Placement Order"). Issues map to statutory factors the court must weigh — e.g., the 16 enumerated factors in Wis. Stat. § 767.41(5)(am). The attorney creates one Issue per factor they intend to argue. The claim's authority_citations references the factor-list statute; individual Issues are labeled by factor name/number.

In criminal law (defense side): claim_type='defense' captures the overall defense theory ("Self-Defense," "Mistaken Identity," "Lack of Intent"). claim_type='affirmative_defense' captures formal affirmative defenses with their own required elements (self-defense requires: belief of imminent threat, reasonableness, not the initial aggressor). claim_type='motion' captures pretrial motions — a "Motion to Suppress — Unlawful Stop" becomes a Claim with Issues mapping to the 4th Amendment prongs the attorney must argue. Evidence suppression motions are high-value: winning one can collapse the prosecution's case.

In criminal law (prosecution side): claim_type='charge' represents each count in the charging document. authority_citations is effectively required — every criminal charge is statutory (e.g., Wis. Stat. § 940.01(1)(a) for 1st degree intentional homicide, Wis. Stat. § 346.63 for OWI, 18 U.S.C. § 1962 for RICO). Issues map to the elements of the offense that must be proven beyond a reasonable doubt. Each element becomes one Issue; Facts and evidence are linked to prove or disprove that element.

jurisdiction distinguishes state from federal charges — 'WI' for Wisconsin, 'federal' for federal crimes. A single case may have both (federal RICO charge + parallel state fraud charge for the same conduct). A claim can reference multiple statutes via the authority_citations array.

Neither jurisdiction nor authority_citations is required at the DB level. Common law civil claims have neither. But criminal charges should always include authority_citations.

For user-facing features, see Case Outlines and Issue Linking.

CREATE TABLE claims (
    id UUID PRIMARY KEY,
    firm_id UUID NOT NULL,
    case_id UUID NOT NULL,
    claim_type VARCHAR NOT NULL DEFAULT 'claim',
    -- Civil:    'claim', 'counterclaim', 'affirmative_defense'
    -- Criminal: 'charge' (prosecution-side, always statutory),
    --           'defense' (defense narrative/theory),
    --           'affirmative_defense' (self-defense, insanity — shared with civil),
    --           'motion' (pretrial/trial motions: suppress, dismiss, for judgment)
    -- Family:   'claim' covers petitions/motions (differentiated by title)
    title VARCHAR NOT NULL,          -- e.g., "Count 1: 1st Degree Intentional Homicide",
                                     --        "Self-Defense", "Motion to Suppress — Unlawful Stop",
                                     --        "Petition for Primary Physical Custody", "Negligence"
    description TEXT,                -- optional narrative context
    jurisdiction VARCHAR,            -- optional: 'federal', 'WI', 'CA', etc.
    authority_citations JSONB,       -- optional for civil; effectively required for criminal charges
    sort_order INT NOT NULL,
    created_by UUID NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL
);
-- authority_citations format:
-- [{"citation": "Wis. Stat. § 940.01(1)(a)", "jurisdiction": "WI", "label": "1st Degree Intentional Homicide"},
--  {"citation": "18 U.S.C. § 1962", "jurisdiction": "federal", "label": "RICO"},
--  {"citation": "Wis. Stat. § 767.41", "jurisdiction": "WI", "label": "Wisconsin Custody Factors"}]
 
CREATE TABLE issues (
    id UUID PRIMARY KEY,
    firm_id UUID NOT NULL,
    case_id UUID NOT NULL,
    claim_id UUID NOT NULL REFERENCES claims(id),
    title VARCHAR NOT NULL,          -- e.g., "Late pickups"
    description TEXT,
    sort_order INT NOT NULL,         -- ordering within the parent claim
    created_by UUID NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL
);
 
-- Many-to-many: facts can support multiple issues
CREATE TABLE issue_facts (
    issue_id UUID NOT NULL REFERENCES issues(id),
    fact_id UUID NOT NULL REFERENCES facts(id),
    PRIMARY KEY (issue_id, fact_id)
);
 
-- Many-to-many: evidence items can be linked to issues directly
CREATE TABLE issue_evidence (
    issue_id UUID NOT NULL REFERENCES issues(id),
    evidence_item_id UUID NOT NULL,
    PRIMARY KEY (issue_id, evidence_item_id)
);

Claims and issues are scoped by firm_id and case_id like all other tables. The sort_order fields drive the attorney's custom ordering in the case outline view and in exported reports (Facts by Issues, Case Outline, Statement of Material Facts).

Report Data Path — All Practice Areas

The same claims → issues → issue_facts → facts → fact_evidence → evidence_items query path drives different output documents depending on practice area and claim_type:

Civil litigation — Statement of Material Facts (SUMF) Required with summary judgment motions. Numbered paragraphs, each a Fact with all supporting evidence citations, organized by legal element (Issue).

Family law — Hearing Memorandum / Trial Brief Each custody factor (Issue) becomes a section. Facts under it become numbered paragraphs with exhibit citations. sort_order on Issues mirrors the statutory factor numbering (e.g., Wis. Stat. § 767.41(5)(am) factors 1–16).

Criminal — Case Theory Brief / Trial Brief claim_type='charge': each element of the offense (Issue) with corroborating facts. claim_type='defense': defense narrative organized by theory and supporting facts. claim_type='affirmative_defense': each required element of the defense with facts. claim_type='motion': suppression motion structured by constitutional prong (Issue), with facts and evidence citations supporting the challenge.

In all cases:

sort_order on Claims and Issues controls document ordering
affirmative_defense and defense claim_types use the same hierarchy as offensive claims — no separate table required
The query path is identical; only the document template and section headers differ

Agent Architecture

For the full agent-native design philosophy, role model, tool inventory, and multi-agent coordination model, see agent-native.md. This section covers the architectural integration points within the platform.

API = Tool Layer

The API is the tool layer. Every FastAPI endpoint is simultaneously a UI operation and an agent tool. There is no separate "agent API" or "tool registry" apart from the API itself. The OpenAPI spec is the tool registry; agents discover available operations by reading it. See API Design: Tool-Layer-First for the design principles.

The Complete Tool Inventory maps every user action to a specific API endpoint across 22 domains: cases, evidence, facts, claims & issues, entities, relationships, reports, exports, messages, chat, notifications, monitoring, designations, transcripts, saved filters, timeline, ingestion, jobs, audit, agent management, usage & billing, and tool discovery.

Agent Authentication

Agents authenticate via scoped API keys (not human auth flows). Attorneys generate keys through the dashboard, each scoped to specific cases, operation permissions, and rate limits. Agent sessions are created via POST /agent/sessions with a configurable TTL. See agent-native.md: Agent Authentication & Sessions for the full specification.

Agent Audit Log

All agent actions are logged in a dedicated audit table with attorney attribution, tool invocation details, and tamper-evident hashing of inputs and outputs. See agent-native.md: Agent Audit Trail for the full schema.

The agent_audit_log table is scoped by firm_id and case_id like all other tables. It records the agent identity, the directing attorney (agent_owner_id), the tool invoked, the entity acted upon, and hashes of both input and output. Entries appear alongside human actions in the case activity feed.

Agent Infrastructure Schema: The data model for API keys (agent_api_keys), sessions (agent_sessions), events (events), webhook delivery tracking (webhook_deliveries), and usage operations (usage_operations) is defined in agent-native.md: Agent Infrastructure Schema. The usage_operations table tracks per-call LLM costs (operation_type, operation_price, llm_cost, case_id, actor) for the M4 enrichment pipeline and all subsequent LLM operations.

Agent Session Model

Creation: Agent authenticates with API key, creates session via POST /agent/sessions with target case(s) and requested permissions
Briefing: Agent receives a case briefing with key entities, open issues, recent activity, and privilege boundaries, refreshable at any time via GET /agent/sessions/{id}/briefing
Work: Agent calls API endpoints (tools) within its permission scope. Long-running operations return job_ids for async tracking.
Observability: Agent subscribes to events (webhooks or polling) and can check for changes since last update via GET /agent/sessions/{id}/changes?since={timestamp}
Termination: Session terminates on TTL expiry, explicit DELETE /agent/sessions/{id}, or API key revocation

See agent-native.md: Observability for context refresh capabilities.

Rate Limiting & Quotas

Per-API-key rate limits (configurable by attorney, default 100 req/min), per-case session quotas (default 5 concurrent), and attorney-configurable AI cost controls: per-case caps, per-case alerts, firm-wide caps. A circuit breaker auto-suspends agent sessions after 50 consecutive errors and notifies the attorney. See agent-native.md: Rate Limiting, Quotas & Cost Management and business-model.md: AI Cost Management for the full specification.

Ingestion Pipeline

For capture methods that feed this pipeline, see Evidence Capture.

Pipeline Steps

Step	Method	LLM?	Cost	Milestone
1. Extract text from HTML/email/PDF/image	Parsing (PyMuPDF/fitz for PDFs), Anthropic Vision API for images	No*	Free*	M2
2. Store extracted text with positional metadata	Application logic (character offsets, page/line for PDFs)	No	Free	M2
3. Extract entities (person, organization, location)	Haiku LLM (configurable)	Yes (cheap)	~$0.001	M4
4. Extract facts with passage-level citations	Haiku LLM (configurable)	Yes (cheap)	~$0.001	M4
5. Derive relationships from extracted facts	Haiku LLM (configurable)	Yes (cheap)	~$0.001	M4
6. Generate summary + one-liner	Haiku LLM (configurable)	Yes (cheap)	~$0.001	M4
7. SHA-256 hash all artifacts	Compute	No	Free	M2
8. Record in tamper-evident manifest	Application logic	No	Free	M2
9. Store artifacts in S3	S3	No	~$0.023/GB/mo	M2

* Image OCR uses Claude Haiku via the Anthropic Vision API, which is an LLM call (~$0.001/image). PDF text extraction via PyMuPDF is free.

Note: Classification is not currently a separate pipeline step. The classifications JSONB column exists on the evidence model but is not populated by the enrichment pipeline.

M2 implements steps 1-2 and 7-9 — text extraction, hashing, manifest recording, and S3 storage. No LLM calls for PDFs, so per-artifact ingestion cost is effectively free (S3 storage only). M4 adds steps 3-6 — LLM-based entity extraction, fact extraction, relationship derivation, and summarization. All enrichment steps use Haiku by default (~$0.001/item each). Model is configurable per step via environment variables.

Pipeline Orchestration

The ingestion pipeline is implemented as a single Celery task (run_pipeline) that executes steps sequentially within one task invocation. Each step is tracked in the processing_steps JSONB column and is independently retriable.

# M2: Single-task pipeline (steps 1-2, 7-9)
@celery_app.task(name="app.tasks.ingestion", bind=True, max_retries=3)
def run_pipeline(self, evidence_id, firm_id, force=False):
    # 1. Download artifact from S3 to temp file
    # 2. Extract text (dispatch to extractor by content_type)
    # 3. Compute SHA-256 hash (chunked reads)
    # 4. Create manifest entry (hash chain, per-case lock)
    # 5. Set processing_status = 'completed'

M4 enrichment: After the ingestion pipeline completes (text extraction, hashing, manifest), enrichment is spawned as a separate async task via run_enrichment.apply_async(). Within the enrichment task, steps run sequentially (not in parallel): extract_entities → extract_facts → derive_relationships → summarize. This sequential ordering is intentional — fact extraction can reference extracted entities, and relationship derivation uses extracted facts. Each step's status is tracked independently in processing_steps JSONB. The evidence item's processing_status transitions through enriching during enrichment and lands at completed or partially_enriched depending on step outcomes.

Error handling per step: Each step catches its own exceptions and records them in processing_steps JSONB (status, timestamp, error message). If a step fails, the evidence item is marked processing_status = 'failed' with the error in processing_error. The Celery task retries up to 3 times with 60-second delay for transient errors.

Re-entrant behavior: Every step checks whether its output already exists before running. Re-running the pipeline on an already-completed item is a no-op unless the force=True parameter is passed. This makes retries safe.

File Handling Protocol

All file transfers (upload and download) use presigned S3 URLs. Files never pass through the API; agents and the UI upload directly to S3 and download directly from S3.

Upload: Caller requests a presigned URL via POST /cases/{id}/evidence/upload, PUTs the file to S3, then confirms the upload via POST /evidence/uploads/{id}/confirm to trigger the ingestion pipeline. Batch uploads follow the same pattern with parallel presigned URLs.

Download: Caller requests a presigned URL via GET /evidence/{id}/download or GET /exports/{id}/download, then GETs the file directly from S3.

See agent-native.md: File Handling Protocol for the full specification.

Tamper-Evident Manifest

Every evidence item is recorded in a JSONL manifest file at ingestion time. The manifest provides an independent, append-only chain of custody record that can be verified without access to the database.

Manifest format (one JSON object per line):

{"seq":1,"evidence_id":"uuid","sha256":"abc123...","filename":"email_oct14.eml","s3_key":"/firm/case/evidence/uuid/email_oct14.eml","captured_at":"2026-01-15T10:30:00Z","capture_method":"email_forward","size_bytes":14208,"prev_hash":"0000...","entry_hash":"def456..."}
{"seq":2,"evidence_id":"uuid","sha256":"789abc...","filename":"screenshot_fb.png","s3_key":"/firm/case/evidence/uuid/screenshot_fb.png","captured_at":"2026-01-15T10:31:00Z","capture_method":"browser","size_bytes":284912,"prev_hash":"def456...","entry_hash":"ghi789..."}

Hash chain: Each entry's entry_hash is SHA-256(prev_hash + evidence_id + sha256 + captured_at). The first entry uses prev_hash: "0" * 64. Any modification to a prior entry breaks the chain from that point forward; tampering is immediately detectable.

Storage: One manifest file per case, stored in S3 at /{firm_id}/{case_id}/manifest.jsonl. The manifest is also mirrored in the manifest_entries database table for queryability, but the S3 file is the authoritative record for legal/audit purposes.

Write ordering and partial failure: The DB manifest_entries row is written first (within the ingestion transaction), then the JSONL line is appended to S3. If the S3 append fails after the DB commit, a reconciliation task is enqueued to retry the S3 write. The DB is the operational authority (queries, UI, API responses) while the S3 JSONL file is the legal/audit authority (court-presentable, independently verifiable). A reconciliation job can rebuild the S3 manifest from the DB mirror at any time, so transient S3 failures do not block ingestion.

Independent verification: Given a manifest file and access to the S3 artifacts, any party can verify the chain:

For each entry, re-hash the artifact bytes and compare against sha256
Recompute entry_hash from prev_hash + evidence_id + sha256 + captured_at
Verify each entry's prev_hash matches the prior entry's entry_hash
Any break in the chain indicates tampering or corruption

This verification can be performed by opposing counsel, a court-appointed expert, or an independent auditor without requiring access to the Intactus platform.

Search & Retrieval

For user-facing search capabilities, see Search & Discovery.

Layer 1: Structured Search (Free, No LLM)

PostgreSQL handles 80% of queries instantly:

Attorney Action	Implementation	Cost
Text search ("custody")	PostgreSQL full-text search (GIN index)	Free
Filter by classification ("threatening")	Pre-computed tag filter (not yet populated — see note)	Free
Find mentions of a person ("Sarah")	Entity index lookup	Free
Browse timeline	Sort by pre-extracted dates, filter by type	Free
Read summary of each result	Pre-computed at ingestion	Free
Filter by source type, date range	Indexed metadata filters	Free

Note: Classification filtering depends on the classifications JSONB column being populated during enrichment. The column exists but no classification step is currently in the enrichment pipeline, so this filter will not return results until classification is implemented.

Search Request Schema

The planned search schema below describes the target design. Currently, the evidence list endpoint uses query parameters (not this POST body format) for filtering and pagination.

{
  "query": "custody pickup late",
  "filters": {
    "source_types": ["email", "sms"],
    "date_range": { "start": "2025-06-01", "end": "2025-12-31" },
    "classifications": ["custody_relevant"],
    "entity_ids": ["uuid-of-sarah"],
    "capture_methods": ["upload", "email_forward"],
    "statuses": ["completed"],
    "client_visible": true
  },
  "sort": { "field": "content_date", "order": "desc" },
  "cursor": null,
  "limit": 25
}

Notes:

query drives PostgreSQL full-text search (GIN index). Empty or null query returns all items matching the filters.
All filter arrays use AND between different filter types, OR within a filter array. Example: source_types: ["email", "sms"] AND classifications: ["custody_relevant"] returns emails OR texts that are custody-relevant.
entity_ids filters to evidence items that mention any of the specified entities (via entity_evidence junction table).
Valid capture_methods values: upload, browser, api, email_forward.

Layer 2: Agentic Retrieval (Cheap, Haiku/Sonnet)

For complex queries that require reasoning, such as "find all instances where he contradicted his financial disclosure" or "identify the escalation pattern across these communications," the system uses agentic reasoning over a structured evidence index.

Attorney asks complex question
        ↓
Build evidence manifest: structured index of all case evidence
  (item ID, type, date, source, sender, recipient, entities,
   classifications, summary, relationship edges)
        ↓
LLM (Haiku or Sonnet) reasons over the manifest
  - Identifies relevant evidence items
  - Explains reasoning: "Items 14, 27, 31, and 45 are relevant because..."
  - Requests full text of specific items if needed
        ↓
If full text needed: retrieve from S3, send to LLM with question
        ↓
Return results with explicit provenance (item IDs, dates, sources)

This approach is inspired by PageIndex (vectorless, reasoning-based retrieval) and DeepRead (structure-aware document reasoning). The key insight: the evidence manifest (summaries, metadata, entities, relationships, classifications) is a structured index that the LLM navigates through reasoning, not similarity matching. For a typical case (50-500 items), the manifest fits comfortably in context.

Agent-native note: Agents use the same tool layer as the UI to compose searches. A Research Agent performing agentic retrieval calls evidence.search, entities.search, relationships.traverse, and facts.search, the same atomic tools available to any consumer. The manifest-reasoning pattern is one composition strategy; agents can compose tools in novel ways (e.g., traversing relationships first, then filtering evidence) without being constrained to a single retrieval pipeline.

Why not vector search?

Evidence queries are relational, not just semantic. "All emails from Sarah in October" is a structured query, not a similarity search.
Provenance is critical. The attorney needs exact document IDs and dates, not similarity scores.
The evidence corpus per case is small enough (50-500 items) that agentic reasoning over a structured manifest outperforms chunk-based vector retrieval on both accuracy and explainability.
No embedding vendor means no second AI provider, no second ZDR agreement, no second privilege argument.
Vector search can be added later (pgvector is in the PostgreSQL stack) as an additional retrieval signal if case sizes grow beyond what fits in context.

Layer 3: Deep Analysis (Expensive, Sonnet/Opus)

Only invoked when the attorney explicitly requests synthesis or analysis. The agentic retrieval layer identifies relevant evidence; the analysis layer reasons deeply over the full text.

Operation	Model	Estimated Cost
Summarize selected evidence items	Sonnet	~$0.30-0.50
Build timeline narrative	Sonnet	~$0.30-0.50
Compare/contrast documents	Sonnet	~$0.40-0.60
Detect patterns across corpus	Sonnet	~$0.50-1.00
Deep legal analysis	Opus	~$2.00-3.00
Full case report generation	Opus	~$3.00-5.00

Model Routing Strategy

Query
  │
  ├── Structured search (SQL-expressible)         → PostgreSQL (free)
  ├── Classification / quick summary              → Haiku ($0.01-0.03)
  ├── Agentic retrieval / manifest reasoning      → Haiku or Sonnet ($0.05-0.30)
  ├── Standard analysis / synthesis               → Sonnet ($0.30-0.50)
  └── Complex legal reasoning / full report       → Opus ($2.00-3.00)

Background Job Architecture

Job Queue

Component	Technology	Rationale
Queue broker	Redis	Simple, proven, low-latency
Task framework	Celery (Python)	Mature, integrates with FastAPI, supports retries/scheduling
Scheduling	Celery Beat (deferred)	Periodic tasks — not yet configured, will be added when monitoring/polling features ship

Job Types

Job	Trigger	Priority	Timeout	Milestone
Evidence upload processing (ingestion pipeline)	Upload confirmed	High	5 min	M2
Stale upload cleanup	Periodic (manual for now)	Low	1 min	M2
Web capture	Attorney/client submits URL	High	2 min	Future
Email ingestion (OAuth)	Attorney initiates sync	Medium	10 min	Future
Inbound email processing	SES receives forwarded email	High	5 min	Future
Enrichment (entities, facts, relationships, summary)	Upload completed or re-enrichment requested	Medium	2 min	M4
Automated monitoring poll	Celery Beat schedule	Low	5 min	Future
Analysis request	Attorney initiates	Medium	5 min	Future
Report generation	Attorney initiates	Low	10 min	Future
Chat with My Case query	Client asks question	High	30 sec	Future

Status Tracking

Every job has a status visible to the appropriate user:

QUEUED → PROCESSING → COMPLETED
                   → FAILED (with error detail)
                   → CANCELLING → CANCELLED

Attorney sees: job status per evidence item, any failures requiring attention
Failed jobs are isolated; one bad extraction doesn't block the rest of the batch

Agent-initiated jobs: Agents can enqueue background jobs through the API (e.g., an Intake Agent triggering ingestion for a batch of uploads, or a Drafting Agent requesting report generation). Agents poll for job completion or receive webhook notifications. Every agent-initiated job is attributed to the agent and its owner attorney in the audit trail.

Async Operations

Any API operation that takes >5 seconds returns a job_id instead of blocking. This applies to both human and agent callers.

Caller invokes long-running operation (e.g., POST /cases/{id}/reports)
        ↓
API returns immediately: { "job_id": "uuid", "status": "queued", "poll_url": "/jobs/{id}" }
        ↓
Caller polls GET /jobs/{id} for status OR receives webhook on completion
        ↓
On completion: GET /jobs/{id}/result returns output or presigned download URL

Job management endpoints:

Endpoint	Description
`GET /jobs/{id}`	Get job status, progress, and metadata
`GET /jobs`	List jobs (filterable by case, type, status)
`POST /jobs/{id}/cancel`	Cancel an in-flight job
`POST /jobs/{id}/retry`	Retry a failed job
`GET /jobs/{id}/result`	Get job output or presigned download URL

Failed jobs include: error type, error message, retry guidance, and partial results (if any). See agent-native.md: Async Operations Pattern for the full specification.

Event System

The platform emits events on state changes. Both the UI and agents can subscribe to events for real-time reactivity.

Event Types

Events are emitted for significant state changes. The canonical event type list is maintained in agent-native.md: Event Types.

Implemented event types: 21 event types across 6 domains — evidence.*, entity.*, relationship.*, fact.*, job.*, case.*. These cover CRUD operations on all core knowledge graph entities, job lifecycle events, and case-level events. See backend/app/enums.py for the canonical list.

Future event types: message.received, monitoring.delta_detected, report.ready, agent.session_completed, agent.cap_warning, agent.cap_reached.

Delivery Mechanisms

Polling (M2): GET /events?since={timestamp}&types={event_types} returns events since the given timestamp. Long-polling option (?wait=30) holds the connection up to 30 seconds to reduce chattiness.

Webhooks (not yet implemented): Agents will register a webhook URL during session creation. The platform will POST event payloads on state changes with exponential backoff and retries.

Event Schema

Every event includes: event_id, event_type, case_id, entity_type, entity_id, actor_type (human/agent/system), actor_id, timestamp, and an event-specific data payload.

See agent-native.md: Event System for the full specification.

Notification System

Note

Implementation Status: Not yet implemented. No notifications table, WebSocket push, or push notification infrastructure exists. This section describes the planned notification architecture.

Events drive user-facing notifications. Not every event produces a notification; only events that require user attention do.

Planned delivery channels:

Channel	Technology	Use Case
In-app	WebSocket push to dashboard (planned)	Real-time: new evidence processed, messages received
Email	Resend transactional email (planned)	Digests, job failures, approval requests, monitoring alerts
Push	Web Push API (planned)	Mobile-priority: client messages, urgent monitoring alerts

Planned preference configuration: Each user will configure notification preferences per event type and channel, with defaults set per role.

Privacy & Privilege Architecture

Heppner-Safe Design

Every design decision is informed by United States v. Heppner (SDNY, Feb. 10, 2026):

Heppner Failure	Intactus Design
Client used consumer AI independently	Attorney directs all AI analysis (work product)
Consumer privacy policy allows data disclosure	Anthropic commercial API terms (no training on customer data)
No expectation of confidentiality	Contractual confidentiality with Anthropic
AI is not an attorney	AI operates as a tool under attorney direction (Kovel doctrine)
Documents didn't reflect counsel's strategy	Analysis initiated by attorney reflects their litigation strategy
No attorney oversight of AI actions	Every agent action logged with `agent_owner_id`, attorney directs and reviews all agent work product

Data Flow

Client uploads evidence (or system captures it)
        ↓
Stored in encrypted S3 (scoped to firm/case, in platform's VPC)
        ↓
Ingestion pipeline processes locally (text extraction) then LLM enrichment
        ↓
Entity/relationship extraction, classification, summary via Haiku; fact extraction via Sonnet
        ↓
Attorney reviews evidence in dashboard
        ↓
Attorney initiates analysis → agentic retrieval identifies relevant items
        ↓
Relevant evidence sent to Sonnet/Opus via Anthropic API (direct). Bedrock migration planned before production.
        ↓
Response returned to platform
        ↓
Analysis stored as attorney work product

Infrastructure Requirements

Requirement	Solution	Status
Data encryption at rest	S3 SSE-KMS, RDS encryption	Planned
Data encryption in transit	TLS 1.3 everywhere	Implemented (Caddy)
No data on public internet	Bedrock via PrivateLink (planned)	Planned — currently using Anthropic API (direct)
Zero data retention by LLM provider	Anthropic commercial API terms; Bedrock ZDR planned	Partial — commercial terms apply; ZDR planned
No model training on customer data	Anthropic commercial API terms	Implemented (contractual)
SOC 2 Type II compliance	Vanta/Drata for continuous monitoring	Planned
Audit logging	Application-level audit log; CloudTrail planned	Partial — app audit log implemented
Multi-tenant data isolation	Firm/case scoping on every query + row-level security	Implemented

PII Redaction Layer

Note

Implementation Status: Not yet implemented. Presidio is not installed and no PII redaction infrastructure exists. This section describes the planned redaction architecture.

Even within the private VPC, an optional redaction layer using Microsoft Presidio:

Evidence text passes through Presidio NER pipeline
PII entities replaced with consistent placeholders ([PERSON_1], [DATE_1], etc.)
Mapping table stored locally, encrypted
Redacted text sent to LLM
Response rehydrated with original values

This provides a second line of defense beyond the contractual ZDR protections.

Agent-Specific Privilege Rules

Agents operate under the same privilege framework as human users, with additional structural guarantees:

Attorney direction: Every agent acts on behalf of an attorney (agent_owner_id). The attorney initiates, configures, and reviews agent work. This satisfies the Kovel doctrine requirement that the agent operates under attorney direction.
Case scoping: An agent cannot access cases not assigned to its owner attorney. The same firm/case middleware that scopes human access scopes agent access.
Work product inheritance: All agent work product (extracted facts, generated reports, search results) inherits attorney work-product privilege because the attorney directed the agent's operation. The audit trail proves direction.
No independent action: Agents cannot operate without attorney authorization. There is no "background agent" that acts on its own initiative; every agent session traces back to an attorney's explicit or standing instruction.
Audit fidelity: The agent audit trail logs every action with the same granularity as human actions, plus agent-specific metadata (model used, reasoning trace, input/output hashes). This provides the evidentiary basis for privilege claims.

Technology Stack

Layer	Technology	Rationale
Frontend	Next.js, React 19, TypeScript	Path to React Native for mobile; component reuse across client portal and attorney dashboard
Styling	Tailwind CSS v4, shadcn/ui, Radix UI	Consistent design system, accessible components
Icons	Lucide React	Clean, consistent iconography
Package manager	pnpm	Fast, disk-efficient
API codegen	@hey-api/openapi-ts (planned)	Auto-generate TypeScript client from FastAPI OpenAPI schema. Currently using raw fetch()
Backend	Python 3.12+, FastAPI, Uvicorn	Async-first, existing expertise from evidence project
ORM	SQLAlchemy 2.x (async) + asyncpg	Async database access, mature migration tooling
Migrations	Alembic	Schema versioning and migration management
Data validation	Pydantic + pydantic-settings	Settings management, request/response schemas
Dependency management	uv	Fast, reproducible Python environments
Database	PostgreSQL 16	Relational data, full-text search, JSONB, future pgvector/AGE option
Object storage	AWS S3	Artifact storage, encryption, durability
LLM	Claude (Haiku/Sonnet) via Anthropic API	Single vendor for all AI. Bedrock migration planned before production
Web capture	Playwright (Python) (not yet implemented)	Full-page screenshots, video interception
OCR	Anthropic Vision API (Claude Haiku)	High-quality image OCR via LLM vision
NER/Entity extraction	Haiku/Sonnet LLM	Single-pass extraction during ingestion, configurable per step
Transcription	Whisper (not yet implemented)	Cost-effective audio/video transcription
Task queue	Celery + Redis	Async job processing, scheduling, retries
Auth	FastAPI-owned (magic links via Resend)	Unified human + agent auth, PostgreSQL-backed sessions
Email (transactional)	Resend	Magic links, notifications
Email (inbound)	AWS SES (receiving) (not yet implemented)	Case intake email addresses
Reverse proxy	Caddy	Auto HTTPS (Let's Encrypt), routing, security headers
Deployment	Docker Compose	Single VPS initially, containers ready for scaling
Linting	Ruff (Python), ESLint (TypeScript)	Consistent code quality
Testing	pytest, pytest-asyncio, httpx	Backend test suite
Dev workflow	Makefile	Unified commands for dev, test, lint, migrate, deploy
Payments	Stripe (not yet implemented)	Subscription billing, usage metering, invoicing
Compliance	Vanta or Drata (not yet implemented)	SOC 2 continuous monitoring

Billing Infrastructure

Note

Implementation Status: Not yet implemented. This section describes the planned billing architecture. No Stripe integration exists.

Billing is handled entirely through Stripe. The platform tracks usage internally and reports it to Stripe for metering and invoicing.

Component	Purpose
Stripe Products & Prices	Define subscription tiers (Practitioner, Firm, Enterprise)
Stripe Subscriptions	Manage recurring billing per firm
Stripe Usage Records	Report AI analysis token consumption for metered billing
Stripe Customer Portal	Self-service plan changes, payment method updates, invoice history
Stripe Webhooks	Sync subscription state changes back to the platform

Billing period: Monthly, aligned to firm signup date. AI usage (metered component) is reported to Stripe daily and invoiced at period end. Base subscription fees are charged at period start.

Payment failure handling: Stripe Smart Retries handle failed payments automatically. After 3 failed attempts over 14 days, the subscription moves to past_due. The firm admin receives email notifications at each retry. After 28 days past due, the account is downgraded to read-only mode (no new evidence ingestion, no AI analysis) until payment is resolved. No data is deleted; the vault remains accessible in read-only mode indefinitely.

See business-model.md for pricing tiers, AI cost management, and the usage tracking model.

Background Job Architecture

Job Queue

Component	Technology	Rationale
Queue broker	Redis	Simple, proven, low-latency
Task framework	Celery (Python)	Mature, integrates with FastAPI, supports retries/scheduling
Scheduling	Celery Beat (deferred)	Periodic tasks — not yet configured, will be added when monitoring/polling features ship

Job Types

Job	Trigger	Priority	Timeout	Milestone
Evidence upload processing (ingestion pipeline)	Upload confirmed	High	5 min	M2
Stale upload cleanup	Periodic (manual for now)	Low	1 min	M2
Web capture	Attorney/client submits URL	High	2 min	Future
Email ingestion (OAuth)	Attorney initiates sync	Medium	10 min	Future
Inbound email processing	SES receives forwarded email	High	5 min	Future
Enrichment (entities, facts, relationships, summary)	Upload completed or re-enrichment requested	Medium	2 min	M4
Automated monitoring poll	Celery Beat schedule	Low	5 min	Future
Analysis request	Attorney initiates	Medium	5 min	Future
Report generation	Attorney initiates	Low	10 min	Future
Chat with My Case query	Client asks question	High	30 sec	Future

Status Tracking

Every job has a status visible to the appropriate user:

QUEUED → PROCESSING → COMPLETED
                   → FAILED (with error detail)
                   → CANCELLING → CANCELLED

Attorney sees: job status per evidence item, any failures requiring attention
Failed jobs are isolated; one bad extraction doesn't block the rest of the batch

Async Operations

Any API operation that takes >5 seconds returns a job_id instead of blocking. This applies to both human and agent callers.

Caller invokes long-running operation (e.g., POST /cases/{id}/reports)
        ↓
API returns immediately: { "job_id": "uuid", "status": "queued", "poll_url": "/jobs/{id}" }
        ↓
Caller polls GET /jobs/{id} for status OR receives webhook on completion
        ↓
On completion: GET /jobs/{id}/result returns output or presigned download URL

Job management endpoints:

Endpoint	Description
`GET /jobs/{id}`	Get job status, progress, and metadata
`GET /jobs`	List jobs (filterable by case, type, status)
`POST /jobs/{id}/cancel`	Cancel an in-flight job
`POST /jobs/{id}/retry`	Retry a failed job
`GET /jobs/{id}/result`	Get job output or presigned download URL

Failed jobs include: error type, error message, retry guidance, and partial results (if any). See agent-native.md: Async Operations Pattern for the full specification.

Event System

The platform emits events on state changes. Both the UI and agents can subscribe to events for real-time reactivity.

Event Types

Events are emitted for significant state changes. The canonical event type list is maintained in agent-native.md: Event Types.

Future event types: message.received, monitoring.delta_detected, report.ready, agent.session_completed, agent.cap_warning, agent.cap_reached.

Delivery Mechanisms

Webhooks (not yet implemented): Agents will register a webhook URL during session creation. The platform will POST event payloads on state changes with exponential backoff and retries.

Event Schema

Every event includes: event_id, event_type, case_id, entity_type, entity_id, actor_type (human/agent/system), actor_id, timestamp, and an event-specific data payload.

See agent-native.md: Event System for the full specification.

Notification System

Note

Implementation Status: Not yet implemented. No notifications table, WebSocket push, or push notification infrastructure exists. This section describes the planned notification architecture.

Events drive user-facing notifications. Not every event produces a notification; only events that require user attention do.

Planned delivery channels:

Channel	Technology	Use Case
In-app	WebSocket push to dashboard (planned)	Real-time: new evidence processed, messages received
Email	Resend transactional email (planned)	Digests, job failures, approval requests, monitoring alerts
Push	Web Push API (planned)	Mobile-priority: client messages, urgent monitoring alerts

Planned preference configuration: Each user will configure notification preferences per event type and channel, with defaults set per role.

Privacy & Privilege Architecture

Heppner-Safe Design

Every design decision is informed by United States v. Heppner (SDNY, Feb. 10, 2026):

Heppner Failure	Intactus Design
Client used consumer AI independently	Attorney directs all AI analysis (work product)
Consumer privacy policy allows data disclosure	Anthropic commercial API terms (no training on customer data)
No expectation of confidentiality	Contractual confidentiality with Anthropic
AI is not an attorney	AI operates as a tool under attorney direction (Kovel doctrine)
Documents didn't reflect counsel's strategy	Analysis initiated by attorney reflects their litigation strategy
No attorney oversight of AI actions	Every agent action logged with `agent_owner_id`, attorney directs and reviews all agent work product

Data Flow

Client uploads evidence (or system captures it)
        ↓
Stored in encrypted S3 (scoped to firm/case, in platform's VPC)
        ↓
Ingestion pipeline processes locally (text extraction) then LLM enrichment
        ↓
Entity/relationship extraction, classification, summary via Haiku; fact extraction via Sonnet
        ↓
Attorney reviews evidence in dashboard
        ↓
Attorney initiates analysis → agentic retrieval identifies relevant items
        ↓
Relevant evidence sent to Sonnet/Opus via Anthropic API (direct). Bedrock migration planned before production.
        ↓
Response returned to platform
        ↓
Analysis stored as attorney work product

Infrastructure Requirements

Requirement	Solution	Status
Data encryption at rest	S3 SSE-KMS, RDS encryption	Planned
Data encryption in transit	TLS 1.3 everywhere	Implemented (Caddy)
No data on public internet	Bedrock via PrivateLink (planned)	Planned — currently using Anthropic API (direct)
Zero data retention by LLM provider	Anthropic commercial API terms; Bedrock ZDR planned	Partial — commercial terms apply; ZDR planned
No model training on customer data	Anthropic commercial API terms	Implemented (contractual)
SOC 2 Type II compliance	Vanta/Drata for continuous monitoring	Planned
Audit logging	Application-level audit log; CloudTrail planned	Partial — app audit log implemented
Multi-tenant data isolation	Firm/case scoping on every query + row-level security	Implemented

PII Redaction Layer

Note

Implementation Status: Not yet implemented. Presidio is not installed and no PII redaction infrastructure exists. This section describes the planned redaction architecture.

Even within the private VPC, an optional redaction layer using Microsoft Presidio:

Evidence text passes through Presidio NER pipeline
PII entities replaced with consistent placeholders ([PERSON_1], [DATE_1], etc.)
Mapping table stored locally, encrypted
Redacted text sent to LLM
Response rehydrated with original values

This provides a second line of defense beyond the contractual ZDR protections.

Agent-Specific Privilege Rules

Agents operate under the same privilege framework as human users, with additional structural guarantees:

Attorney direction: Every agent acts on behalf of an attorney (agent_owner_id). The attorney initiates, configures, and reviews agent work. This satisfies the Kovel doctrine requirement that the agent operates under attorney direction.
Case scoping: An agent cannot access cases not assigned to its owner attorney. The same firm/case middleware that scopes human access scopes agent access.
Work product inheritance: All agent work product (extracted facts, generated reports, search results) inherits attorney work-product privilege because the attorney directed the agent's operation. The audit trail proves direction.
No independent action: Agents cannot operate without attorney authorization. There is no "background agent" that acts on its own initiative; every agent session traces back to an attorney's explicit or standing instruction.
Audit fidelity: The agent audit trail logs every action with the same granularity as human actions, plus agent-specific metadata (model used, reasoning trace, input/output hashes). This provides the evidentiary basis for privilege claims.

Technology Stack

Layer	Technology	Rationale
Frontend	Next.js, React 19, TypeScript	Path to React Native for mobile; component reuse across client portal and attorney dashboard
Styling	Tailwind CSS v4, shadcn/ui, Radix UI	Consistent design system, accessible components
Icons	Lucide React	Clean, consistent iconography
Package manager	pnpm	Fast, disk-efficient
API codegen	@hey-api/openapi-ts (planned)	Auto-generate TypeScript client from FastAPI OpenAPI schema. Currently using raw fetch()
Backend	Python 3.12+, FastAPI, Uvicorn	Async-first, existing expertise from evidence project
ORM	SQLAlchemy 2.x (async) + asyncpg	Async database access, mature migration tooling
Migrations	Alembic	Schema versioning and migration management
Data validation	Pydantic + pydantic-settings	Settings management, request/response schemas
Dependency management	uv	Fast, reproducible Python environments
Database	PostgreSQL 16	Relational data, full-text search, JSONB, future pgvector/AGE option
Object storage	AWS S3	Artifact storage, encryption, durability
LLM	Claude (Haiku/Sonnet) via Anthropic API	Single vendor for all AI. Bedrock migration planned before production
Web capture	Playwright (Python) (not yet implemented)	Full-page screenshots, video interception
OCR	Anthropic Vision API (Claude Haiku)	High-quality image OCR via LLM vision
NER/Entity extraction	Haiku/Sonnet LLM	Single-pass extraction during ingestion, configurable per step
Transcription	Whisper (not yet implemented)	Cost-effective audio/video transcription
Task queue	Celery + Redis	Async job processing, scheduling, retries
Auth	FastAPI-owned (magic links via Resend)	Unified human + agent auth, PostgreSQL-backed sessions
Email (transactional)	Resend	Magic links, notifications
Email (inbound)	AWS SES (receiving) (not yet implemented)	Case intake email addresses
Reverse proxy	Caddy	Auto HTTPS (Let's Encrypt), routing, security headers
Deployment	Docker Compose	Single VPS initially, containers ready for scaling
Linting	Ruff (Python), ESLint (TypeScript)	Consistent code quality
Testing	pytest, pytest-asyncio, httpx	Backend test suite
Dev workflow	Makefile	Unified commands for dev, test, lint, migrate, deploy
Payments	Stripe (not yet implemented)	Subscription billing, usage metering, invoicing
Compliance	Vanta or Drata (not yet implemented)	SOC 2 continuous monitoring

Billing Infrastructure

Note

Implementation Status: Not yet implemented. This section describes the planned billing architecture. No Stripe integration exists.

Billing is handled entirely through Stripe. The platform tracks usage internally and reports it to Stripe for metering and invoicing.

Component	Purpose
Stripe Products & Prices	Define subscription tiers (Practitioner, Firm, Enterprise)
Stripe Subscriptions	Manage recurring billing per firm
Stripe Usage Records	Report AI analysis token consumption for metered billing
Stripe Customer Portal	Self-service plan changes, payment method updates, invoice history
Stripe Webhooks	Sync subscription state changes back to the platform

Billing period: Monthly, aligned to firm signup date. AI usage (metered component) is reported to Stripe daily and invoiced at period end. Base subscription fees are charged at period start.

See business-model.md for pricing tiers, AI cost management, and the usage tracking model.

Technology Stack

Layer	Technology	Rationale
Frontend	Next.js, React 19, TypeScript	Path to React Native for mobile; component reuse across client portal and attorney dashboard
Styling	Tailwind CSS v4, shadcn/ui, Radix UI	Consistent design system, accessible components
Icons	Lucide React	Clean, consistent iconography
Package manager	pnpm	Fast, disk-efficient
API codegen	@hey-api/openapi-ts (planned)	Auto-generate TypeScript client from FastAPI OpenAPI schema. Currently using raw fetch()
Backend	Python 3.12+, FastAPI, Uvicorn	Async-first, existing expertise from evidence project
ORM	SQLAlchemy 2.x (async) + asyncpg	Async database access, mature migration tooling
Migrations	Alembic	Schema versioning and migration management
Data validation	Pydantic + pydantic-settings	Settings management, request/response schemas
Dependency management	uv	Fast, reproducible Python environments
Database	PostgreSQL 16	Relational data, full-text search, JSONB, future pgvector/AGE option
Object storage	AWS S3	Artifact storage, encryption, durability
LLM	Claude (Haiku/Sonnet) via Anthropic API	Single vendor for all AI. Bedrock migration planned before production
Web capture	Playwright (Python) (not yet implemented)	Full-page screenshots, video interception
OCR	Anthropic Vision API (Claude Haiku)	High-quality image OCR via LLM vision
NER/Entity extraction	Haiku/Sonnet LLM	Single-pass extraction during ingestion, configurable per step
Transcription	Whisper (not yet implemented)	Cost-effective audio/video transcription
Task queue	Celery + Redis	Async job processing, scheduling, retries
Auth	FastAPI-owned (magic links via Resend)	Unified human + agent auth, PostgreSQL-backed sessions
Email (transactional)	Resend	Magic links, notifications
Email (inbound)	AWS SES (receiving) (not yet implemented)	Case intake email addresses
Reverse proxy	Caddy	Auto HTTPS (Let's Encrypt), routing, security headers
Deployment	Docker Compose	Single VPS initially, containers ready for scaling
Linting	Ruff (Python), ESLint (TypeScript)	Consistent code quality
Testing	pytest, pytest-asyncio, httpx	Backend test suite
Dev workflow	Makefile	Unified commands for dev, test, lint, migrate, deploy
Payments	Stripe (not yet implemented)	Subscription billing, usage metering, invoicing
Compliance	Vanta or Drata (not yet implemented)	SOC 2 continuous monitoring

Billing Infrastructure

Note

Implementation Status: Not yet implemented. This section describes the planned billing architecture. No Stripe integration exists.

Billing is handled entirely through Stripe. The platform tracks usage internally and reports it to Stripe for metering and invoicing.

Component	Purpose
Stripe Products & Prices	Define subscription tiers (Practitioner, Firm, Enterprise)
Stripe Subscriptions	Manage recurring billing per firm
Stripe Usage Records	Report AI analysis token consumption for metered billing
Stripe Customer Portal	Self-service plan changes, payment method updates, invoice history
Stripe Webhooks	Sync subscription state changes back to the platform

Billing period: Monthly, aligned to firm signup date. AI usage (metered component) is reported to Stripe daily and invoiced at period end. Base subscription fees are charged at period start.

See business-model.md for pricing tiers, AI cost management, and the usage tracking model.