# Enterprise Trust Layer — Design Document
> **Status**: design (sprint #29 candidate). Research-backed plan for transforming
> the wiki into a company-wide Knowledge Operating System exposed via MCP to
> multiple AI consumers (Claude Desktop, Claude Code, Codex CLI, internal chat,
> custom agents). Builds on existing v1.0 stack (Postgres + pgvector +
> pg_search + reranker + MCP) and Sprint #28 follow-ups (saved views, agentic
> chat).
---
## 1. Executive summary
Today's wiki is **read-correct** (CRUD + RBAC + audit) and **AI-aware**
(extract → dedupe → proposals + agentic chat). Missing is the
**trust layer** — the property that any AI-generated output is
**verifiably grounded**, **freshness-aware**, **conflict-aware**, and
**access-controlled per-consumer**.
The plan covers four interlocking pillars:
| Pillar | What it guarantees | Industry term |
|---|---|---|
| **A. Grounded generation** | answer cannot reference content not in retrieved sources | "grounding" (MS), "trust layer" (Salesforce), "verifiable fact layer" (Hebbia) |
| **B. Governance metadata** | every page carries authority, freshness, access dimensions readable by retrieval + validator | "knowledge graph signals" (Glean), "metadata-driven context" (M-Files) |
| **C. Validator pipeline** | server-side post-generation checks beyond what the model can do alone | RAGAS faithfulness, Patronus Lynx, Galileo Luna pattern |
| **D. Multi-consumer access control** | per-agent identity + per-domain MCP server + filter-at-retrieval | OpenAI Connectors / Glean permissions / Pinecone+SpiceDB pattern |
Plus two cross-cutting concerns:
- **E. Cache hierarchy** — exact / canonical / semantic / prompt — drives cost down 60–85% in proven deployments while preserving freshness via CDC invalidation.
- **F. Audit trail** — every AI access logged with `human_actor` + `agent_actor` + `policy_id` + `data_returned_hash` for SOC2/ISO 27001/ISO 42001 compliance.
**Vendor neutrality.** The trust layer is implemented on our side — citation
grounding, validator pipeline, cache, RBAC — and is therefore **provider-agnostic
by design**. The wiki routes every LLM call through the shared LiteLLM proxy
at `hub.s2.emersion.eu`, which unifies Anthropic, OpenAI, Gemini, and Ollama
behind a single `/v1/messages` endpoint. Switching the chat model is a
one-line change in `.env` (`WIKI_CHAT_MODEL=claude-sonnet-pro` →
`openai/gpt-5` → `gemini/gemini-2.5-pro` → `ollama/qwen2.5:7b`). The only
backend-specific code path is the optional Anthropic Citations adapter
(§6.4 path A); for every other provider the DIY substring grounder
(§6.4 path B) is a drop-in replacement with the same guarantees.
---
## 2. Research-validated principles
Findings from production deployments (Glean, M365 Copilot, Salesforce Einstein,
Hebbia, OpenAI Connectors, Notion AI, Confluence AI) and 2024–2026 academic /
industry literature:
### 2.1 Citations are necessary but not sufficient
A **citation that is verifiably verbatim in the source document at a known char
range** eliminates URL hallucination. Two implementation paths reach the same
guarantee:
- **Vendor-native** — Anthropic Citations API (Jan 2025) emits structured
citation blocks with `start_char_index` / `end_char_index` already bound to
the retrieved document. Works only on Claude.
- **Vendor-neutral (DIY)** — model emits a citation marker (e.g. `[[wiki/path.md]]`
or a structured tool call); server post-processes the answer, performs
**exact substring match** of the surrounding sentence against the cited page,
records `start_offset`/`end_offset`, and degrades any sentence that fails the
match to "uncovered". This works against **any** LLM (OpenAI GPT-5, Gemini
2.5, Ollama Qwen 2.5, ...) routed through LiteLLM and is what Emersion runs
in production.
Both approaches give the same end-state — a citation chip in the UI that the
user can click to verify the exact passage — and feed the same validator
(§7).
What citations do **NOT** guarantee, regardless of approach:
- the cited passage actually entails the claim (it might just mention the topic)
- the source document is current
- the source is the authoritative one (a person-page mentioning a policy is
not the policy)
- no other source in the corpus contradicts the claim
**→ Trust layer = Citations + governance + validator. Citations alone are layer 0.**
### 2.2 Filter at retrieval, never at generation
The Pinecone/SpiceDB/Authzed/Glean consensus: documents the requesting
principal cannot access **must never enter the LLM context window**. Asking
the model "nicely" to filter is a leak vector (the model leaks via summary
and tool calls even when politely told not to). Implication: every MCP tool
call resolves identity FIRST, applies permission filter, THEN retrieves.
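A minimal sketch of that ordering (`Principal` and `retrieveForPrincipal` are illustrative names, and a hard-coded domain list stands in for the real Cedar policy evaluation):

```typescript
type Principal = { sub: string; act?: string; domains: string[] };
type Page = { id: string; domain: string; text: string };

function retrieveForPrincipal(
  principal: Principal,
  corpus: Page[],
  query: string,
): Page[] {
  // 1. Permission filter BEFORE any ranking — pages the principal cannot
  //    read never reach the scoring step, let alone the LLM context window.
  const visible = corpus.filter((p) => principal.domains.includes(p.domain));
  // 2. Only then retrieve (a naive substring match stands in for
  //    BM25 + dense retrieval here).
  return visible.filter((p) =>
    p.text.toLowerCase().includes(query.toLowerCase()),
  );
}
```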
### 2.3 Confidence scores mislead users
ACM UMAP 2025 research: prompt-derived "I am 87% sure" numbers are not
calibrated and **erode user trust faster than they help detect hallucinations**.
Use binary states (grounded / not-grounded / conflict) and hard rules
(deprecated source = block).
### 2.4 Semantic cache threshold = 0.92, not 0.85
The 0.88–0.94 cosine similarity band is the "topically related but
semantically different" danger zone. Genuine duplicates cluster ≥0.95.
Documented enterprise misfires (banking case, InfoQ 2025) traced to
threshold 0.85.
### 2.5 Cache key must include RBAC scope hash
The #1 enterprise breach pattern in semantic caching is cross-tenant leakage
via metadata-filter-only namespaces. Cache key must be:
```
sha256(
question_normalized || model_version || embedding_model_version ||
system_prompt_hash || kb_version || rbac_scope_hash
)
```
Drop any one and you get silent cross-contamination.
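A sketch of the full key using Node's `crypto` (the `||` separator and the exact normalization are assumptions; the field set mirrors the pseudocode above):

```typescript
import { createHash } from "node:crypto";

// Every component participates in the hash; dropping rbacScopeHash is the
// cross-tenant leak described above.
function cacheKey(parts: {
  questionNormalized: string;
  modelVersion: string;
  embeddingModelVersion: string;
  systemPromptHash: string;
  kbVersion: string;
  rbacScopeHash: string;
}): string {
  return createHash("sha256")
    .update(
      [
        parts.questionNormalized,
        parts.modelVersion,
        parts.embeddingModelVersion,
        parts.systemPromptHash,
        parts.kbVersion,
        parts.rbacScopeHash,
      ].join("||"),
    )
    .digest("hex");
}
```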
### 2.6 Backend boundary vs content domain — two different things
Two patterns get conflated in industry literature:
1. **Per-backend MCP** (OpenAI Connectors pattern): one MCP server per
*separate system* — Salesforce CRM, Workday HR, Jira, internal wiki.
Each is its own data store, codebase, lifecycle. Cross-backend isolation
is structural via RFC 8707 Resource Indicators. Used when company runs
many SaaS systems.
2. **Per-content-domain access control inside a single MCP** (this wiki):
one MCP server, **one** unified knowledge base. Pages are tagged with a
`domain` frontmatter field (`finance`, `marketing`, `hr`, …) and access
is field-filtered per request. Used when company runs one knowledge OS
that covers all departments.
For Emersion's wiki the right answer is **option 2 today**: single `wiki-mcp`,
field-level domain enforcement via Cedar policy applied during retrieval.
**Option 1 becomes relevant later** when the company plugs in a second data
*system* (e.g. a future `salesforce-mcp` or `workday-mcp`) — at that point an
MCP gateway aggregates both behind one client-facing endpoint.
See §9 for the full single-MCP-multi-domain design and §11.5 for the future
gateway pattern.
### 2.7 `ai_access` is a separate axis from human classification
Knostic, Lasso, and Glean expose `ai_access ∈ {none, retrieval_only, full}`
as **orthogonal** to `classification`. Rationale: data a human is authorized
to read may still be a leak vector when embedded in an LLM context window
(via summary, tool call, or training-data inversion). Example: a sales rep
can read customer NDA text in a UI, but pushing that into an agent's tool
result risks it being summarized into a published artifact.
### 2.8 Contradiction detection scales O(n²) — cap K at 5–8
Pairwise NLI across all retrieved chunks is the proven approach (ContraGen,
DRAGged-Into-a-Conflict 2024–25). At K=20 that's 190 pair calls. Use a
small NLI judge (Galileo Luna-2 3B/8B or DeBERTa NLI) not the main model,
and cap retrieval to 5–8 candidates before contradiction check.
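The pair count and the cap reduce to a few lines (illustrative names; the NLI judge call itself is elided):

```typescript
// Pairwise NLI calls for K retrieved chunks: K·(K−1)/2.
function nliPairCalls(k: number): number {
  return (k * (k - 1)) / 2;
}

// Yield the pairs to judge, after capping retrieval at capK candidates.
function* chunkPairs<T>(chunks: T[], capK = 5): Generator<[T, T]> {
  const capped = chunks.slice(0, capK);
  for (let i = 0; i < capped.length; i++)
    for (let j = i + 1; j < capped.length; j++)
      yield [capped[i], capped[j]];
}
```

At K=20 that is 190 judge calls per turn; capping at K=5 keeps it at 10.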
### 2.9 Tamper-evident audit log for SOC2/ISO 27001/ISO 42001
Compliance auditors treat "agent acted with no human request" as an
**attributability gap**. Required log fields: `timestamp`, `human_actor (sub)`,
`agent_actor (act)`, `agent_client_id`, `tool_name`, `resource_id`,
`resource_classification`, `decision (allow|deny)`, `policy_id`,
`request_id`, `session_id`, `data_returned_hash`. Append-only sink
(object-locked S3 or outbox+digest chain). Retain ≥1 year for SOC2, 3 years
for a typical ISO 27001 scope. **Log deny events too** — auditors require
evidence the control fired.
---
## 3. Data model
### 3.1 `pages.governance` JSONB column
```sql
ALTER TABLE pages ADD COLUMN governance JSONB NOT NULL DEFAULT '{}'::jsonb;
CREATE INDEX pages_governance_authority
ON pages USING GIN ((governance->'authority_level'));
CREATE INDEX pages_governance_owner
ON pages ((governance->>'owner_user'));
CREATE INDEX pages_governance_valid_until
ON pages ((governance->>'valid_until'));
CREATE INDEX pages_governance_ai_access
ON pages ((governance->>'ai_access'));
CREATE INDEX pages_governance_review_due
ON pages ((governance->>'next_review_due'));
```
A separate `content_hash text` column (already in place since sprint #2)
serves as the tamper-detection signal for cache invalidation.
### 3.2 `canonical_answers` — resolved known answers
```sql
CREATE TABLE canonical_answers (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
question_hash text NOT NULL, -- L0 exact-match key
question_text text NOT NULL,
question_language text DEFAULT 'cs',
question_embedding vector(1024), -- L2 semantic-match key
rbac_scope_hash text NOT NULL, -- *required* in key
answer_text text NOT NULL,
answer_citations jsonb NOT NULL, -- [{page_id, cited_text, char_range, confidence}]
model_used text NOT NULL,
query_type text NOT NULL, -- synthesis | lookup | agentic
validated_at timestamptz NOT NULL,
validator_verdict text NOT NULL, -- ok | warning (only ok cached)
approved_by uuid REFERENCES users(id), -- null = auto-promoted, set = human
approved_at timestamptz,
source_page_ids uuid[] NOT NULL DEFAULT '{}',
source_freshness_min int, -- youngest source last_reviewed age
hit_count bigint NOT NULL DEFAULT 0,
expires_at timestamptz NOT NULL,
invalidated_at timestamptz,
invalidated_reason text,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE UNIQUE INDEX canonical_active_per_scope
ON canonical_answers (question_hash, rbac_scope_hash)
WHERE invalidated_at IS NULL;
CREATE INDEX canonical_source_pages
ON canonical_answers USING GIN (source_page_ids);
CREATE INDEX canonical_embedding_hnsw
ON canonical_answers USING hnsw (question_embedding vector_cosine_ops)
WHERE invalidated_at IS NULL;
```
### 3.3 `trust_validations` — validator results
```sql
CREATE TABLE trust_validations (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
response_id uuid NOT NULL REFERENCES chat_messages(id) ON DELETE CASCADE,
citation_idx int,
check_type text NOT NULL, -- citation_supports | source_stale | better_source | contradiction | permission_violation
verdict text NOT NULL, -- ok | warning | error
reason text NOT NULL,
suggested_page_id uuid REFERENCES pages(id),
contradiction_id uuid REFERENCES contradiction_warnings(id),
judge_model text, -- haiku-4.5 | galileo-luna-2 | n/a
validated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX trust_validations_response ON trust_validations (response_id);
CREATE INDEX trust_validations_problem
ON trust_validations (verdict) WHERE verdict != 'ok';
```
### 3.4 `contradiction_warnings` — conflicting sources
```sql
CREATE TABLE contradiction_warnings (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
topic_hash text NOT NULL,
subject_label text NOT NULL,
predicate text,
claim_a_text text NOT NULL,
source_a_page_id uuid NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
source_a_cited_text text,
claim_b_text text NOT NULL,
source_b_page_id uuid NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
source_b_cited_text text,
detected_by text NOT NULL, -- ingest:dedupe | validator | user:flag
detected_at timestamptz NOT NULL DEFAULT now(),
severity text NOT NULL DEFAULT 'medium',
status text NOT NULL DEFAULT 'open', -- open | reviewing | resolved | wont_fix
resolved_at timestamptz,
resolved_by uuid REFERENCES users(id),
resolution_note text,
resolution_canonical_id uuid REFERENCES pages(id)
);
CREATE INDEX contradiction_warnings_topic ON contradiction_warnings (topic_hash);
CREATE INDEX contradiction_warnings_open
ON contradiction_warnings (status) WHERE status = 'open';
```
### 3.5 `audit_log` — no new columns, new `action` codes
The existing `audit_log(actor_type, actor_id, action, resource_id, resource_path, metadata jsonb)`
suffices. We add an action vocabulary:
| Action | Required metadata |
|---|---|
| `chat.response.created` | `{response_id, model, query_type, citation_count, validator_verdict, usage}` |
| `chat.citation.clicked` | `{response_id, citation_idx, page_id}` |
| `chat.response.flagged_incorrect` | `{response_id, reason, citation_idx?}` |
| `chat.response.confirmed_correct` | `{response_id}` |
| `trust.validation.warning` | `{response_id, citation_idx, check_type, suggested_page_id}` |
| `trust.validation.contradiction_detected` | `{contradiction_id, source_a, source_b}` |
| `canonical.created` | `{canonical_id, response_id, source_page_ids, expires_at}` |
| `canonical.cache_hit` | `{canonical_id, question_hash, layer}` |
| `canonical.invalidated` | `{canonical_id, reason, triggered_by_page_id?}` |
| `canonical.approved` | `{canonical_id, approver}` |
| `page.governance.changed` | `{page_id, before, after}` |
| `contradiction.resolved` | `{contradiction_id, resolution, canonical_page_id?}` |
| `mcp.access.allowed` | `{agent_client_id, human_sub, tool, resource_id, policy_id, classification}` |
| `mcp.access.denied` | `{agent_client_id, human_sub, tool, resource_id, policy_id, reason}` |
| `mcp.tool.called` | already exists from sprint #23 — extend with `{policy_id, data_returned_hash}` |
Plus **tamper-evidence**: optional follow-up adds outbox-style digest chain
(hash(N) = sha256(hash(N-1) || row N)) so audit log integrity can be verified
by external party. Out of scope for sprint #29.
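For reference, the digest chain itself is only a few lines; a sketch (not sprint #29 scope, illustrative names):

```typescript
import { createHash } from "node:crypto";

// hash(N) = sha256(hash(N-1) || row N), anchored at a fixed genesis value.
function chainDigests(rows: string[], genesis = "0".repeat(64)): string[] {
  const digests: string[] = [];
  let prev = genesis;
  for (const row of rows) {
    prev = createHash("sha256").update(prev + row).digest("hex");
    digests.push(prev);
  }
  return digests;
}

// Verification recomputes the chain; editing any row changes every
// subsequent digest, so tampering is detectable by an external party.
function verifyChain(rows: string[], digests: string[], genesis = "0".repeat(64)): boolean {
  const recomputed = chainDigests(rows, genesis);
  return recomputed.length === digests.length &&
    recomputed.every((d, i) => d === digests[i]);
}
```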
---
## 4. Frontmatter governance schema
Validated against industry consensus (Glean signals + M-Files metadata +
Salesforce Trust Layer + Knostic ai_access dimension):
```yaml
---
# Existing identity & lifecycle
title: Security Policy v1.0
type: policy
slug: security-policy-v1-0
status: active # active | paused | archived | draft
area: security
tags: []
# === Governance (sprint #29) ===
# Authority — drives validator's "prefer canonical" pass
authority_level: canonical # canonical | reference | draft | deprecated
# canonical = source of truth for topic
# reference = informational, may be cited
# draft = pre-review, soft-warn on citation
# deprecated = hard-block, suggest superseded_by
# Ownership & approval
owner_user: petr@emersion.dev
owner_team: ENG
approvers: # multi-approver workflow
- email: hana@emersion.dev
role: compliance
- email: petr@emersion.dev
role: cto
approved_at: 2026-05-11T08:00:00Z
# Freshness
last_verified_at: 2026-05-11 # SEPARATE from updated_at — "someone confirmed
# this is still true" vs "someone edited it"
review_cadence_days: 365 # next_review_due = last_verified_at + this
next_review_due: 2027-05-11 # validator stale if past
valid_from: 2026-01-01
valid_until: 2026-12-31 # null = no expiry
# Supersession chain
supersedes: policy/security-policy-2025-v0-9.md
superseded_by: null # set when deprecated, points to new version
# Classification & access (orthogonal axes)
classification: confidential # public | internal | confidential | restricted
domain: security # legacy from sprint #20 (work | personal | hr | finance | …)
ai_access: full # none | retrieval_only | full
# none = LLM context forbidden (even for authorized humans)
# retrieval_only = LLM may use as evidence but not echo verbatim
# full = LLM may include in answer + quote
pii_flags: [] # ['email','phone','ssn',…] — drives redaction
data_subjects: [] # user identifiers for GDPR right-to-erasure
embargo_until: null # pre-announcement docs
legal_hold: false # bypasses normal deletion
# Lifecycle compliance
retention_until: 2032-12-31 # SOC2 CC6.5 disposal evidence
source_type: policy # policy | standard | reference | note | external | derived
confidence: 1.0 # 0..1 — for agent-extracted facts
# Provenance
provenance:
ingest_source: web:note:petr # web:note:* | tus:* | api:* | mcp:*
agent_pipeline: extract-claims-v0.7
raw_archive_id: <uuid>
extracted_at: 2026-05-11T08:00:00Z
---
```
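The freshness derivation noted in the comments (`next_review_due = last_verified_at + review_cadence_days`) is directly computable; a minimal sketch (function names are illustrative):

```typescript
// Derive next_review_due from last_verified_at plus the cadence, in UTC.
function nextReviewDue(lastVerifiedAt: string, cadenceDays: number): string {
  const d = new Date(lastVerifiedAt);
  d.setUTCDate(d.getUTCDate() + cadenceDays);
  return d.toISOString().slice(0, 10); // YYYY-MM-DD, matching frontmatter
}

// The validator's staleness test: past-due review date means OVERDUE.
function isReviewOverdue(due: string, now: Date): boolean {
  return new Date(due) < now;
}
```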
### State transitions
```
draft ──(owner sign-off)──→ approved ──(governance sign-off)──→ canonical
│
↓ (replacement published)
deprecated
```
Each transition writes `page.governance.changed` audit row. Downgrades
(canonical → reference, canonical → draft) require admin role AND audit
note explaining why.
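A sketch of that guard (the transition map is read off the diagram plus §5.1's admin-only `canonicalize`; `canTransition` is a hypothetical name):

```typescript
type Authority = "draft" | "approved" | "reference" | "canonical" | "deprecated";

// Forward transitions per the diagram; deprecation happens when a
// replacement is published.
const forward: Record<Authority, Authority[]> = {
  draft: ["approved"],
  approved: ["canonical", "deprecated"],
  reference: ["deprecated"],
  canonical: ["deprecated"],
  deprecated: [],
};

function canTransition(
  from: Authority,
  to: Authority,
  opts: { isAdmin: boolean; auditNote?: string },
): boolean {
  // §5.1: promoting to canonical is admin-only.
  if (to === "canonical") return opts.isAdmin && forward[from].includes(to);
  if (forward[from].includes(to)) return true;
  // Downgrades from canonical need admin role AND an audit note.
  const isDowngrade = from === "canonical" && (to === "reference" || to === "draft");
  return isDowngrade && opts.isAdmin && Boolean(opts.auditNote);
}
```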
---
## 5. API changes
### 5.1 `/api/pages` — governance management
| Endpoint | Change |
|---|---|
| `POST /api/pages` | accept `governance` block; agent-created → `authority_level=draft` default |
| `PATCH /api/pages/:id` | governance edits gated: `canonical` set requires admin; `approvers` requires admin; `last_verified_at` editable by owner |
| `POST /api/pages/:id/review` | **new** — `last_verified_at = now()`, triggers re-validation of canonical_answers citing this page |
| `POST /api/pages/:id/deprecate` | **new** — sets `superseded_by`, `authority_level=deprecated`, cascades to invalidate canonical_answers |
| `POST /api/pages/:id/canonicalize` | **new** — admin-only; transitions `approved → canonical`, emits audit |
### 5.2 `/api/chat/...` — trust-aware SSE events
Extended `ChatSseEvent`:
```ts
type ChatSseEvent =
| { type: 'token'; content: string }
| { type: 'classified'; queryType: 'synthesis' | 'lookup' | 'agentic' }
| { type: 'tool_call_start'; toolUseId: string; tool: string; input: object }
| { type: 'tool_call_result'; toolUseId: string; tool: string; summary: string }
// Structured + char-range-bound (vendor-native on Anthropic, server-side
// substring grounder elsewhere — see §6.4 paths A and B)
| { type: 'citation_v2'; idx: number; pageId: string; pagePath: string;
citedText: string; charRange: [number, number];
governance: { authority_level: string; valid_until: string|null;
last_verified_at: string|null; ai_access: string } }
| { type: 'trust_warning'; citationIdx: number; checkType: TrustCheckType;
reason: string; suggestedPageId?: string; contradictionId?: string }
// Final verdict, sent async after stream completes (validator runs out-of-band)
| { type: 'trust_verdict'; verdict: 'ok' | 'warning' | 'error';
canBeCanonical: boolean; checks: TrustCheckSummary[] }
| { type: 'canonical_hit'; canonicalId: string; hitCount: number; layer: 'L0'|'L1'|'L2' }
| { type: 'done'; messageId: string }
| { type: 'error'; message: string };
type TrustCheckType =
| 'citation_unsupported'
| 'source_stale'
| 'source_deprecated'
| 'better_source_exists'
| 'contradicting_source'
| 'permission_violation'
| 'ai_access_blocked';
```
### 5.3 `/api/canonical-answers` — cache management
| Endpoint | What |
|---|---|
| `GET /api/canonical-answers/lookup` | server-side hash; checks L0 exact + L2 semantic ≥0.92 with rbac_scope match |
| `GET /api/canonical-answers/:id` | detail |
| `POST /api/canonical-answers/:id/approve` | admin promotes auto → human-approved (extends expiry) |
| `POST /api/canonical-answers/:id/invalidate` | manual |
| `DELETE /api/canonical-answers/:id` | admin |
| `GET /api/canonical-answers` | list (admin dashboard) |
### 5.4 `/api/trust-validator` — re-run + read
| Endpoint | What |
|---|---|
| `POST /api/trust-validator/:response-id/rerun` | re-execute pipeline |
| `GET /api/trust-validator/:response-id` | latest results |
### 5.5 `/api/contradictions` — conflict management
| Endpoint | What |
|---|---|
| `GET /api/contradictions?status=open` | list |
| `GET /api/contradictions/:id` | detail with excerpts from both sources |
| `POST /api/contradictions/:id/resolve` | mark resolved + optional canonical page |
| `POST /api/contradictions` | manual report (user-initiated) |
### 5.6 MCP authorization — RFC 8707 + per-agent identity
The current MCP server uses Bearer tokens against a Keycloak realm. For
enterprise-grade use we extend it:
1. **Resource Indicators (RFC 8707)** in the token request:
```
POST /realms/emersion/protocol/openid-connect/token
grant_type=client_credentials
client_id=agent.tech-robot
audience=https://api.wiki.s2.emersion.eu/mcp
resource=https://api.wiki.s2.emersion.eu/mcp
```
Token issued for wiki-mcp **cannot** be replayed against future
finance-mcp / hr-mcp on same realm.
2. **On-Behalf-Of (RFC 8693) token exchange** for user-in-the-loop flows:
```
POST /token
grant_type=urn:ietf:params:oauth:grant-type:token-exchange
subject_token=<human-user-token>
subject_token_type=...:access_token
actor_token=<agent-service-account-token>
audience=https://api.wiki.s2.emersion.eu/mcp
```
Resulting token has both `sub` (human) + `act` (agent) claims. MCP server
logs both in audit.
3. **Per-domain MCP servers** — `wiki-mcp` (current), future `finance-mcp`,
`hr-mcp`. Same Keycloak realm, distinct `audience` per RFC 8707. Agent
`tech-robot` can be granted `wiki-mcp + hr-mcp` but not `finance-mcp` —
permissions live in Keycloak service-account roles, audited via
`mcp.access.{allowed,denied}` rows.
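An illustrative decoded payload of the exchanged token from step 2, and the mapping into audit actors (all values are made up; the claim names follow RFC 8693):

```typescript
// Hypothetical decoded claims — `sub` is the human principal,
// `act.sub` the acting agent.
const exchangedClaims = {
  iss: "https://<keycloak-host>/realms/emersion",
  sub: "petr@emersion.dev",                         // human principal
  act: { sub: "service-account-agent.tech-robot" }, // acting agent
  aud: "https://api.wiki.s2.emersion.eu/mcp",       // RFC 8707 resource
};

// Both identities land in every audit row, per §2.9.
function auditActors(claims: { sub: string; act: { sub: string } }) {
  return { human_actor: claims.sub, agent_actor: claims.act.sub };
}
```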
---
## 6. Changes in `agentic.ts`
> **Vendor-neutral approach.** All snippets below are framed in Anthropic
> message shape (because that is what `@anthropic-ai/sdk` returns through
> LiteLLM today), but the trust-relevant transformations — `pageToDocument`,
> governance context, citation extraction — are model-agnostic. Switching the
> wiki to GPT-5 or Gemini means changing the model alias in `.env`, not
> touching this file. Anthropic-native `citations: { enabled: true }` is
> downgraded to a marker-based DIY path when the active backend doesn't
> implement it (see §6.4).
### 6.1 Tool result → grounded document blocks
```ts
type GroundedDocument = {
type: 'document';
source: { type: 'text'; media_type: 'text/markdown'; data: string };
title: string;
context: string; // governance summary as plain text
// When provider supports it (Claude via Anthropic SDK), pass through
// native citation-binding hint. Other providers ignore the field; the
// server-side substring grounder (§7 phase 1) recovers the same property.
citations?: { enabled: true };
};
function pageToDocument(p: PageWithGovernance): GroundedDocument {
return {
type: 'document',
source: { type: 'text', media_type: 'text/markdown', data: p.content },
title: p.title,
context: buildGovernanceContext(p),
citations: { enabled: true },
};
}
function buildGovernanceContext(p: PageWithGovernance): string {
const g = p.governance;
const stale = g.valid_until && new Date(g.valid_until) < new Date();
const overdue = g.next_review_due && new Date(g.next_review_due) < new Date();
return [
`path: ${p.path}`,
`type: ${p.type}`,
`authority: ${g.authority_level}`,
`owner: ${g.owner_user ?? g.owner_team ?? 'unknown'}`,
g.approved_at ? `approved: ${g.approved_at}` : null,
g.valid_until ? `valid_until: ${g.valid_until}${stale ? ' (STALE)' : ''}` : null,
overdue ? `review_overdue: yes` : null,
g.supersedes ? `supersedes: ${g.supersedes}` : null,
g.superseded_by ? `DEPRECATED — superseded_by: ${g.superseded_by}` : null,
].filter(Boolean).join('\n');
}
```
### 6.2 Structured runbooks/SOPs — per-step content blocks
Industry pitfall: vendor-native citation binders (Anthropic Citations,
similar OpenAI features) force **sentence-level chunking** when documents are
passed as one big text blob. For structured content (runbooks with numbered
steps, SOPs with sections), sentence chunking butchers meaning.
Solution: emit each step / section as its own content block, so the citation
unit aligns with the structural unit. Anthropic supports this via
`type: 'custom_content'`; the DIY grounder achieves the same by computing
substring offsets per step rather than over the joined string.
```ts
function runbookToDocument(p: RunbookPage): AnthropicCustomContent {
return {
type: 'document',
title: p.title,
context: buildGovernanceContext(p),
citations: { enabled: true },
source: {
type: 'content',
content: p.steps.map((step, i) => ({
type: 'text',
text: `Step ${i + 1}: ${step.text}`,
})),
},
};
}
```
Triggered by `pages.type ∈ {runbook, sop, decision}` (structured types).
### 6.3 System prompt — governance-aware
```
You are the agentic chat agent for a company knowledge wiki with governance.
Source preference order (always cite the most authoritative):
1. authority_level=canonical → these are the truth of record
2. authority_level=reference → reviewed but informational
3. authority_level=draft → DO NOT cite as fact; if you must mention, prefix
   "according to a draft note..." or "preliminary, not yet approved"
4. authority_level=deprecated → NEVER cite as current truth; if mentioning
   for context, use "it was historically stated that... (deprecated, see new version X)"
Freshness:
* Page with `valid_until` in the past → mention inline "(information as of <date>)"
* Page with `review_overdue: yes` → prefer a fresher canonical source
Contradictions:
* If two sources disagree on a fact, DO NOT pick silently. Either:
(a) prefer the canonical source explicitly, OR
    (b) state "the wiki contains conflicting information: A vs B" and cite both.
Citations:
* If the active backend supports a native citation mechanism (Anthropic Citations),
use it — each claim must be bound to the exact document range that supports it.
* If not, append `[[wiki/<path>.md]]` markers immediately after each claim that
rests on a specific source. The server-side substring grounder (§7 phase 1)
will resolve them to char ranges and reject any sentence whose marker
doesn't match the cited page's content.
AI access:
* Documents with `ai_access: retrieval_only` may be used as evidence for your
reasoning but DO NOT quote verbatim — paraphrase to summary.
* Documents with `ai_access: none` will never be in your context.
```
### 6.4 Stream handler — two paths, one wire format
The internal SSE event `citation_v2` is **the same regardless of backend**.
The stream handler picks one of two extraction paths based on what the active
provider emits:
**Path A — native binding (Claude via Anthropic Citations):**
```ts
} else if (event.delta.type === 'citations_delta') {
const c = event.delta.citation;
const doc = documents[c.document_index];
yield {
type: 'citation_v2',
idx: citationCounter++,
pageId: doc.metadata.pageId,
pagePath: doc.metadata.path,
citedText: c.cited_text,
charRange: [c.start_char_index, c.end_char_index],
governance: doc.metadata.governance,
};
}
```
**Path B — marker extraction (OpenAI / Gemini / Ollama / any other):**
```ts
// stream-side: collect plain text, defer citation extraction until block end
} else if (event.delta.type === 'text_delta') {
textBuffer += event.delta.text;
yield { type: 'token', content: event.delta.text };
}
// on content_block_stop: regex out [[wiki/...]] markers, substring-match against
// the nearest preceding sentence, emit citation_v2 with synthesized char range.
for (const m of textBuffer.matchAll(/\[\[wiki\/([^\]]+)\.md\]\]/g)) {
const sentence = sentenceBefore(textBuffer, m.index);
const doc = documentsByPath.get(m[1] + '.md');
if (!doc) continue;
const { start, end } = findSubstring(doc.source.data, sentence);
if (start < 0) {
// unsupported — sentence marker references a doc but no substring overlap
// → degrade verdict to "uncovered" in §7 phase 1
continue;
}
yield {
type: 'citation_v2',
idx: citationCounter++,
pageId: doc.metadata.pageId,
pagePath: doc.metadata.path,
citedText: doc.source.data.slice(start, end),
charRange: [start, end],
governance: doc.metadata.governance,
};
}
```
The chooser is one line at session setup:
`const useNativeCitations = isAnthropicBackend(model);`
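Path B assumes two helpers, `sentenceBefore` and `findSubstring`. A minimal version (exact matching only; normalized or fuzzy matching would be a refinement, and the sentence splitter here is deliberately naive):

```typescript
// Last complete sentence preceding the citation marker.
function sentenceBefore(text: string, markerIndex: number): string {
  const upTo = text.slice(0, markerIndex);
  // Split on whitespace that follows sentence-ending punctuation;
  // keep the last non-empty piece.
  const parts = upTo.split(/(?<=[.!?])\s+/).filter((s) => s.trim().length > 0);
  return (parts[parts.length - 1] ?? "").trim();
}

// Exact substring match against the cited page; {-1, -1} means the
// sentence is uncovered and gets degraded by §7 phase 1.
function findSubstring(
  haystack: string,
  needle: string,
): { start: number; end: number } {
  const start = haystack.indexOf(needle);
  return start < 0 ? { start: -1, end: -1 } : { start, end: start + needle.length };
}
```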
### 6.5 Post-stream → async validator
```ts
const responseId = await chatService.appendMessage({...});
yield { type: 'done', messageId: responseId };
// Fire-and-forget; validator emits trust_verdict event when done
queueTrustValidation({
responseId, question, answer: fullText.join(''),
citations: collectedCitations, documents: collectedDocuments,
rbacScope: requestor.scopeHash,
});
```
---
## 7. Validator pipeline (server-side)
Located in `apps/api/src/lib/trust-validator-service.ts`. Runs as **BullMQ
worker** (queue `trust-validation`, separate from ingest). Each chat response
is queued; results flow back to UI via SSE if connection still open, otherwise
persisted to `trust_validations` and surfaced on next poll.
### 7.1 6-phase pipeline (industry consensus order)
```
┌──────────────────────────────────────────────────────────────┐
│ Phase 0: PERMISSION RE-CHECK (defense in depth) │
│ For each cited page: │
│ - Verify requestor's RBAC scope still allows reading │
│ - This duplicates ingest-time check; protects against │
│ race when permissions change mid-conversation │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Phase 1: CITATION SUPPORTS (NLI claim entailment) │
│ For each citation: │
│ a) Verify cited_text ∈ page.content via exact substring │
│ match (load-bearing for non-Anthropic providers; on │
│ Anthropic Citations the guarantee is provider-side, │
│ but we re-check defensively against provider bugs) │
│ b) Extract surrounding ±400 char window from page │
│ c) Decompose the answer sentence containing the │
│ citation into atomic claims (small structured-tool LLM)│
│ d) For each atomic claim: NLI check vs cited window │
│ Judge: Galileo Luna-2 (3B), DeBERTa-NLI, or any small │
│ tool-calling LLM — explicitly NOT the main chat model │
│ Output: entails | partial | contradicts | unrelated │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Phase 2: FRESHNESS │
│ For each citation's source page: │
│ - valid_until < now() → STALE │
│ - last_verified_at + freshness_days < now() → OVERDUE │
│ - authority_level=deprecated → DEPRECATED │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Phase 3: AUTHORITATIVE SOURCE CHECK │
│ For each cited fact (subject + predicate via Haiku): │
│ - SELECT * FROM pages WHERE │
│ governance.authority_level = 'canonical' AND │
│ topic_match (BM25 + dense ≥ 0.7) │
│ - If canonical source exists AND citation isn't it: │
│ SUGGEST_BETTER_SOURCE with link │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Phase 4: CONTRADICTION DETECTION │
│ For each fact-claim in answer: │
│ a) Hash subject+predicate (canonical form) │
│ b) Lookup contradiction_warnings WHERE topic_hash match │
│ → CONTRADICTING + link existing warning │
│ c) PLUS: pairwise NLI across top-5 retrieved chunks │
│ (cap K=5 per research; O(n²)=10 calls max per turn) │
│ Judge: Galileo Luna NLI / DeBERTa-NLI / Haiku │
│ d) Newly detected contradiction → INSERT contradiction_warning │
│ row + audit emit │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Phase 5: AI_ACCESS GATE │
│ If any cited page has ai_access='retrieval_only' AND │
│ the answer quotes verbatim (≥10 word overlap from page) → │
│ VERBATIM_QUOTE_BLOCKED + reason │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ AGGREGATE VERDICT │
│ error if ANY: │
│ citation_unsupported | source_deprecated | │
│ contradicting_source | permission_violation | │
│ ai_access_blocked │
│ warning if ANY: │
│ source_stale | source_overdue | better_source_exists | │
│ citation_partial │
│ ok otherwise │
│ │
│ canBeCanonical = (verdict == 'ok') AND │
│ (all sources authority='canonical') AND │
│ (no source stale) │
└──────────────────────────────────────────────────────────────┘
│
▼
Insert trust_validations rows, audit events,
optionally promote to canonical_answers,
emit trust_verdict SSE event (if connection still open).
```
### 7.2 Latency budget
| Phase | Budget |
|---|---|
| 0 permission | <50 ms (DB only) |
| 1 NLI claim entailment | <1500 ms (~5 citations × Haiku call, parallel) |
| 2 freshness | <50 ms |
| 3 authoritative source search | <500 ms |
| 4 contradiction (10 NLI calls cap) | <2000 ms |
| 5 ai_access gate | <100 ms |
| **Total p95** | **<4500 ms** |
Validator runs **out-of-band** — the chat stream completes immediately and the trust_verdict
arrives ~3–5 s later. Acceptable because the UI has already streamed the answer.
---
## 8. Cache hierarchy (4 layers)
Research consensus: layered cache, cheapest check first. Cost benchmarks from
production deployments suggest 60–85% cost reduction when properly layered.
```
┌─────────────────────────────────────────────────────────────┐
│ L0: Exact match │
│ Key: sha256(question_normalized || model || kb_version │
│ || rbac_scope_hash || system_prompt_hash) │
│ Store: Redis Hash, key prefix `wiki:cache:exact:` │
│ TTL: 24 h, jittered │
│ Hit rate: 5–15% (only literal repeats) │
│ Invalidation: kb_version bump on any page edit │
└─────────────────────────────────────────────────────────────┘
│ miss
▼
┌─────────────────────────────────────────────────────────────┐
│ L1: Canonical answers │
│ Curated/auto-promoted Q→A in `canonical_answers` table │
│ Match: exact question_hash within same rbac_scope_hash │
│ Hit rate: 10–25% on stable queries (glossary, policy) │
│ Invalidation: pg trigger on page edit → mark referenced │
│ canonical_answers as invalidated │
└─────────────────────────────────────────────────────────────┘
│ miss
▼
┌─────────────────────────────────────────────────────────────┐
│ L2: Semantic match │
│ Embed user question (bge-m3) → ANN search │
│ `canonical_answers.question_embedding` cosine ≥ 0.92 │
│ AND rbac_scope_hash match (NOT just metadata filter — │
│ threshold check + identical scope hash both required) │
│ Hit rate: 30–50% on paraphrased duplicates │
│ Risk: false hit at 0.85–0.91 — must stay ≥0.92 │
└─────────────────────────────────────────────────────────────┘
│ miss
▼
┌─────────────────────────────────────────────────────────────┐
│ L3: Provider-side prompt cache │
│ Anthropic: `cache_control` blocks (write 1.25× / read 0.10×)│
│ OpenAI: automatic prompt caching on prefix matches │
│ When backend doesn't support either: skip L3 entirely │
│ Cached: 1. tools, 2. system prompt, 3. retrieved docs │
│ Auto-managed: 5-min default TTL (provider-specific) │
│ $0 hits when using subscription bridges (claude-code-bridge)│
└─────────────────────────────────────────────────────────────┘
│ miss
▼
┌─────────────────────────────────────────────────────────────┐
│ L4: Fresh LLM call (active chat model — Claude/GPT-5/Gemini) │
└─────────────────────────────────────────────────────────────┘
```
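The waterfall above can be sketched as follows; every `lookup*` dependency is a hypothetical stub standing in for the real Redis / Postgres / ANN calls:

```typescript
// Layered lookup, cheapest check first. A miss at each layer falls through
// to the next; only a full miss pays for a fresh LLM call (L3 prompt
// caching applies inside callLlm).
type CacheResult = { layer: "L0" | "L1" | "L2" | "fresh"; answer: string };

async function answerWithCache(
  key: string,        // composed per §8.1
  question: string,
  scopeHash: string,  // rbac_scope_hash — every layer is scope-partitioned
  deps: {
    lookupExact: (key: string) => Promise<string | null>;              // L0: Redis
    lookupCanonical: (q: string, scope: string) => Promise<string | null>; // L1: exact question_hash
    lookupSemantic: (q: string, scope: string) => Promise<string | null>; // L2: ANN, cosine >= 0.92
    callLlm: (q: string) => Promise<string>;                           // L4 fresh call
  },
): Promise<CacheResult> {
  const exact = await deps.lookupExact(key);
  if (exact !== null) return { layer: "L0", answer: exact };
  const canonical = await deps.lookupCanonical(question, scopeHash);
  if (canonical !== null) return { layer: "L1", answer: canonical };
  const semantic = await deps.lookupSemantic(question, scopeHash);
  if (semantic !== null) return { layer: "L2", answer: semantic };
  return { layer: "fresh", answer: await deps.callLlm(question) };
}
```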
### 8.1 Cache key composition (research-mandated)
```ts
type CacheKey = {
question_normalized: string; // lowercase, whitespace-collapsed
model_version: string; // LiteLLM alias: 'claude-sonnet-pro', 'openai/gpt-5', 'gemini/gemini-2.5-pro', ...
embedding_model_version: string; // 'bge-m3-v1'
system_prompt_hash: string; // sha256 of system prompt
kb_version: string; // monotonic counter, bumped on page edit
rbac_scope_hash: string; // *required* — sha256 of sorted(domains + roles + sensitivity_ceiling)
};
```
Drop **any one** and you get silent cross-contamination.
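A minimal sketch of composing the L0 key from these fields, using Node's `crypto`. The Redis prefix matches the scheme above; the NUL separator is an assumption to avoid ambiguous concatenation:

```typescript
import { createHash } from "node:crypto";

type CacheKey = {
  question_normalized: string;
  model_version: string;
  embedding_model_version: string;
  system_prompt_hash: string;
  kb_version: string;
  rbac_scope_hash: string;
};

function exactCacheKey(k: CacheKey): string {
  // Join with NUL so "ab"+"c" and "a"+"bc" cannot collide.
  const payload = [
    k.question_normalized, k.model_version, k.embedding_model_version,
    k.system_prompt_hash, k.kb_version, k.rbac_scope_hash,
  ].join("\u0000");
  return "wiki:cache:exact:" + createHash("sha256").update(payload).digest("hex");
}
```

Because `kb_version` and `rbac_scope_hash` are part of the payload, any page edit or scope difference changes the hash and the old entry simply stops matching.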
### 8.2 Invalidation strategy
Per research, TTL alone is insufficient. Use a **CDC pattern**:
1. Postgres trigger on `pages UPDATE` emits `page.changed` row in `outbox` table
2. BullMQ worker `cache-invalidator` consumes outbox, finds `canonical_answers`
where `page_id = ANY(source_page_ids)`, sets `invalidated_at = now()`
3. Also bumps global `kb_version` counter (Redis INCR), which forces L0 keys
to mismatch (kb_version is part of hash)
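The worker's decision logic can be sketched as a pure function, leaving the actual Postgres/Redis I/O to the caller (names are illustrative):

```typescript
// Pure core of the hypothetical `cache-invalidator` worker: given the
// page_id from an outbox row, decide which canonical_answers rows to
// invalidate and the next kb_version.
interface CanonicalAnswer {
  id: string;
  source_page_ids: string[];
}

function planInvalidation(
  changedPageId: string,
  answers: CanonicalAnswer[],
  currentKbVersion: number,
): { invalidateIds: string[]; nextKbVersion: number } {
  return {
    // mirrors: WHERE page_id = ANY(source_page_ids)
    invalidateIds: answers
      .filter((a) => a.source_page_ids.includes(changedPageId))
      .map((a) => a.id),
    // mirrors: Redis INCR on kb_version, which breaks every L0 key
    nextKbVersion: currentKbVersion + 1,
  };
}
```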
### 8.3 Skip caching when
- **Time/user-dependent** questions ("what did I edit yesterday", "my tasks")
- **Long multi-turn** (history dominates hash, false-similar)
- **High-stakes** financial/legal/medical claims (force fresh, audit always)
- **Low traffic** (<1000 q/day) — only L3 prompt cache pays off; skip L0–L2
---
## 9. Multi-consumer access control — single wiki, many content domains
### 9.0 The right model for Emersion
The wiki is **one knowledge base** that covers all departments. Content domains
(finance, marketing, IT, HR, …) are an **attribute of each page**, not
a separate backend. Access control is **field-level** inside one MCP server,
not split across multiple MCPs.
Big picture:
```
┌──────────────────────────────┐
Claude Desktop ──► │ api.wiki.s2.emersion.eu/mcp │
Claude Code ──► │ (single MCP server) │
Codex CLI ──► │ │
Internal chat ──► │ filters per agent claim: │
│ - tools shown │
│ - search results │
│ - read page allowed │
└──────────────────────────────┘
│
▼
┌──────────────────────────────┐
│ pages with frontmatter: │
│ domain: finance / hr / │
│ marketing / it / │
│ sales / legal / … │
│ classification: public / │
│ internal / confidential│
│ ai_access: full / │
│ retrieval_only / none │
└──────────────────────────────┘
```
When a future second *system* lands (Salesforce CRM via salesforce-mcp,
Workday HR via workday-mcp), an **MCP gateway** plugs in front (§11.5).
Until then, one wiki-mcp is the right call.
### 9.1 Content domains (controlled enum)
Default vocabulary for Emersion (mirror in Keycloak `domain:*` realm roles
and in page frontmatter `domain` field):
| Domain | Coverage |
|---|---|
| `engineering` | code, architecture, infra, tooling |
| `product` | roadmap, specs, customer feedback |
| `design` | brand, UX research, prototypes |
| `sales` | accounts, pipeline, playbooks |
| `marketing` | campaigns, content, brand voice |
| `it-support` | helpdesk, runbooks, internal tools, security |
| `finance` | budget, expenses, invoicing, taxes |
| `hr` | employees, onboarding, policies, payroll |
| `legal` | contracts, compliance, IP, GDPR |
| `operations` | facilities, procurement, vendor management |
| `management` | board minutes, OKRs, strategy |
| `public` | freely accessible to anyone in the company |
Each `area` (from sprint #27 extract) maps to a default `domain`:
| `area` (page topic) | default `domain` (access scope) |
|---|---|
| engineering, security | engineering |
| product | product |
| design | design |
| sales | sales |
| marketing | marketing |
| support | it-support |
| operations | operations |
| finance | finance |
| legal | legal |
| hr | hr |
| general | public |
Override: an editor / approver can re-tag any page's domain manually. Multi-
domain pages get `domain: shared`, and the validator widens its overlap rule accordingly.
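The mapping above can be captured as a lookup; the `"public"` fallback for unmapped areas is an assumption, not something the table specifies:

```typescript
// area (page topic, from sprint #27 extract) → default domain (access scope)
const AREA_TO_DOMAIN: Record<string, string> = {
  engineering: "engineering",
  security: "engineering",
  product: "product",
  design: "design",
  sales: "sales",
  marketing: "marketing",
  support: "it-support",
  operations: "operations",
  finance: "finance",
  legal: "legal",
  hr: "hr",
  general: "public",
};

function defaultDomain(area: string): string {
  // Assumed fallback: unmapped areas land in 'public'; editors re-tag manually.
  return AREA_TO_DOMAIN[area] ?? "public";
}
```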
### 9.2 Identity model — agents as Keycloak clients
Each AI consumer is a **first-class Keycloak client** with its own `client_id`
and an explicit `domains` claim subset. Choosing which agent to
authenticate as is equivalent to choosing what scope of data the agent can
touch.
| Client | Allowed `domains` | Typical user / use |
|---|---|---|
| `emersion-web` | match user's own | NextAuth, the web chat — inherits the human's scope |
| `agent.universal` | match user's own | Claude Desktop / Code where the human is fully scoped |
| `agent.it-support` | `it-support, engineering, public` | incident triage bot |
| `agent.hr-onboarding` | `hr, public` | onboarding automation |
| `agent.finance-analyst` | `finance, sales, management, public` | quarterly review |
| `agent.marketing-content` | `marketing, product, design, public` | copywriting |
| `agent.legal-review` | `legal, hr, finance, public` | contract scanning |
| `agent.engineering-rag` | `engineering, product, it-support, public` | dev assistant |
Note: the effective scope is the intersection of the agent's and the human's
domains, so it is always ⊆ the human's scope. When `petr@` (with `domains: [*]`)
uses `agent.marketing-content`, the effective scope is
`marketing, product, design, public` — strictly narrower than petr's own.
This prevents accidental over-disclosure even by privileged humans
("principle of least surprise" for tooling).
### 9.3 Token issuance — RFC 8707 + RFC 8693
When a client requests a token, it specifies **audience** = the MCP endpoint
URI (RFC 8707 Resource Indicators). Future per-system MCPs (`finance-mcp`,
`hr-mcp` as separate backends) reject tokens whose `aud` ≠ their own URI,
making confused-deputy attacks structurally impossible.
For user-in-the-loop flows, **RFC 8693 token exchange** binds both the
human (`sub`) and the agent (`act`) into one token:
```
POST /realms/emersion/protocol/openid-connect/token
grant_type=urn:ietf:params:oauth:grant-type:token-exchange
subject_token=<petr's user token>
actor_token=<agent.marketing-content service-account token>
audience=https://api.wiki.s2.emersion.eu/mcp
```
The resulting token claims:
- `sub: petr@` (the human accountable)
- `act: { sub: agent.marketing-content }` (the agent acting)
- `aud: wiki-mcp`
- `domains: [marketing, product, design, public]` (intersection of human ∩ agent)
Audit log records both — SOC2/ISO 27001 attributability requirement.
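A hedged sketch of assembling that request body; the two `*_token_type` parameters are required by RFC 8693 though omitted from the abbreviated example above:

```typescript
// Build the application/x-www-form-urlencoded body for an RFC 8693
// token-exchange request. Only the parameter names come from the RFC;
// the helper itself is illustrative.
function tokenExchangeBody(
  subjectToken: string, // the human's user token (sub)
  actorToken: string,   // the agent's service-account token (act)
  audience: string,     // RFC 8707 resource indicator, e.g. the MCP URI
): URLSearchParams {
  return new URLSearchParams({
    grant_type: "urn:ietf:params:oauth:grant-type:token-exchange",
    subject_token: subjectToken,
    subject_token_type: "urn:ietf:params:oauth:token-type:access_token",
    actor_token: actorToken,
    actor_token_type: "urn:ietf:params:oauth:token-type:access_token",
    audience,
  });
}
```

POSTing this to `/realms/emersion/protocol/openid-connect/token` yields the combined `sub` + `act` token described above.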
### 9.4 Hybrid authorization: ReBAC + ABAC
| Layer | What it decides | Implementation |
|---|---|---|
| **ReBAC** | "can agent X read documents owned by team Y?" | Cedar / OpenFGA / SpiceDB (Zanzibar) |
| **ABAC** | "agent X is forbidden from any classification=financial, regardless of relationships" | Cedar / OPA policy rule overlay |
Current sprint #20 uses simple RBAC (role + domain + sensitivity). Sprint #29
keeps that as **fast path** and adds ReBAC for fine-grained ("Anna can read
projects owned by her team but not other teams") + ABAC for AI-specific gates
(`ai_access` field, agent-blocklist classifications).
Recommended implementation: **Cedar** (AWS open-source policy language,
Rust-based, statically analyzable). One Cedar policy file per domain
boundary, loaded at MCP server boot.
Example Cedar policy snippet — note that **domain enforcement is generic**
(any agent can only read pages whose `domain` ∈ its `domains` claim), not a
hard-coded per-agent rule:
```cedar
// Domain match: agent may only read pages whose domain is in its claim set
// (or the page is in the universal 'public' domain).
forbid (
principal is Agent,
action == Action::"ReadPage",
resource
) when {
resource.domain != "public" &&
!(resource.domain in principal.allowed_domains)
};
// AI access: never expose ai_access=none pages to any agent, ever
forbid (
principal is Agent,
action,
resource
) when { resource.ai_access == "none" };
// AI access: agents may not quote verbatim from retrieval_only pages
forbid (
principal is Agent,
action == Action::"QuoteVerbatim",
resource
) when { resource.ai_access == "retrieval_only" };
// Classification: 'restricted' pages require explicit per-agent grant
// (board minutes, M&A, etc. — agents whitelist, not deny-list)
forbid (
principal is Agent,
action,
resource
) when {
resource.classification == "restricted" &&
!(resource.id in principal.explicit_restricted_grants)
};
```
The policy file lives in version control (`infra/cedar/wiki-mcp.cedar`),
loaded at boot. Editing it requires a PR + admin review. Cedar's static
analyzer catches obvious mistakes (e.g. `permit` without conditions on
restricted resources) at CI time.
### 9.5 Filter at retrieval
Sprint #29 changes the MCP `search_wiki` tool to:
```ts
async function searchWiki(input, principal) {
// 1. Run BM25 + dense retrieval → candidates
const candidates = await runSearchPipeline(input.query);
// 2. Resolve principal's full attribute set (roles, domains, ai_access tier)
const subject = await resolveSubject(principal);
// 3. For each candidate, check Cedar policy
const allowed = candidates.filter((p) => {
const decision = cedar.isAuthorized({
principal: subject,
action: 'ReadPage',
resource: p.governance,
});
if (!decision.allowed) {
auditLog.write('mcp.access.denied', {
agent_client_id: principal.client_id,
human_sub: principal.act?.sub,
tool: 'search_wiki',
resource_id: p.id,
policy_id: decision.policy_id,
reason: decision.reason,
});
}
return decision.allowed;
});
return allowed;
}
```
Denied results **never reach the LLM context window**.
### 9.6 Token chain
```
Human (petr@) ──login──► Keycloak ──issues──► user_token (sub=petr)
│
│ OBO exchange
▼
User_token + agent_creds ──► Keycloak token-exchange ──► action_token
(sub=petr, act=tech-robot,
aud=wiki-mcp)
│
▼
MCP server validates,
logs both sub+act in audit.
```
---
## 10. UI warning states
Inspired by Salesforce Einstein Trust Layer + Glean. **Binary states + hard
rules**, no confidence percentages (research-mandated).
### 10.1 Trust banner per response
```
┌────────────────────────────────────────────────────┐
│ Petr Hlobil žije v Praze ... [[person]] │
│ Dne 11. května schválil Security Policy ... │
│                  [[person]]  ← problem                      │
│ │
│ ╭─ ⚠ MOŽNÁ NEPŘESNOST ──────────────────────────╮ │
│ │ Citace #2 odkazuje na person/petr-hlobil.md │ │
│ │ ale fakt "schválil Security Policy" patří │ │
│ │ na policy/security-policy-v1-0.md (canonical) │ │
│ │ [Opravit citaci] [Označit OK] [Audit] │ │
│ ╰────────────────────────────────────────────────╯ │
│ │
│ 👍 správně 👎 chybné 🔗 sdílet │
└────────────────────────────────────────────────────┘
```
### 10.2 Banner states
| Verdict | Banner | Action |
|---|---|---|
| `ok + allCanonical` | ✅ "Ověřeno z canonical zdrojů" + button "Uložit jako oficiální odpověď" | promote to canonical_answers |
| `ok + mixed` | ✓ "Ověřeno" — quiet green ribbon | none |
| `warning, source_stale` | ⚠ "Zdroj `X` je nečerstvý (review overdue)" | "Otevřít k review" |
| `warning, better_source` | ⚠ "Existuje autoritativnější zdroj `policy/...md`" | "Přepsat citaci" |
| `warning, citation_partial` | ⚠ "Citace #N citovaný text plně nepodporuje tvrzení" | "Otevřít citaci" / "Flag" |
| `error, source_deprecated` | ❌ "Citovaný zdroj je DEPRECATED — viz nová verze `X`" | hard ribbon "Nepoužívat odpověď" |
| `error, contradicting_source` | ❌ "Wiki obsahuje konfliktní informace o tomto tématu" | link `/contradictions/:id` |
| `error, ai_access_blocked` | ❌ "Tento zdroj nesmí být citován v AI výstupech (ai_access)" | block render |
| `error, permission_violation` | ❌ "Citovaný zdroj nemáš oprávnění zobrazit" | hide citation |
### 10.3 Inline citation chip
```
[[Petr Hlobil]] (📄 person/petr-hlobil.md authority:canonical ↗)
│
├─ 👁 zobrazit citovanou pasáž (highlight)
├─ ⚐ označit jako chybnou citaci
└─ 🔄 navrhnout lepší zdroj
```
### 10.4 Per-page trust indicator (page detail)
```
person/petr-hlobil.md authority: canonical ✓
├─ status: active
├─ approved by: Hana Procházková (compliance), 11. května 2026
├─ next review due: 2027-05-11 (✓ 364 days remaining)
├─ ai_access: full
├─ retention until: 2032-12-31
└─ ⚐ 2 chybné citace nahlášené uživateli — [zobrazit]
```
---
## 11. Migration plan — Sprint #29 (4 weeks)
| Week | Phase | Deliverables |
|---|---|---|
| **1** | Citation grounding + governance schema | Migration `0008_governance_metadata.sql`; refactor `agentic.ts` to emit grounded document blocks (path A native on Anthropic, path B `[[wiki/...]]` markers elsewhere); server-side substring grounder; new SSE events `citation_v2`, `trust_warning`, `trust_verdict`; structured citation chips in UI; templates updated with governance block (default `authority_level=draft`); backfill existing pages with `approved` |
| **2** | Governance editing + roll-out | Page editor governance panel; `POST /api/pages/:id/{review,deprecate,canonicalize}` routes; supersession cascade; admin marks top-20 pages canonical; per-page trust indicator |
| **3** | Validator + canonical cache | Migration `0009_trust_layer.sql` (3 tables); `TrustValidatorService` 6-phase pipeline; BullMQ worker; cache invalidation pg trigger; canonical_answers L0+L1+L2 with rbac_scope_hash; warning ribbons + flag actions |
| **4** | MCP field-level RBAC + telemetry | Cedar policy engine integration (single wiki-mcp); 12 `domain:*` Keycloak realm roles; per-agent service-account clients (`agent.it-support`, `agent.hr-onboarding`, …); RFC 8707/8693 token flows; filter-at-retrieval in MCP tools; Grafana dashboard `wiki-trust.json`; Slack alerts; `/admin/trust` page; SOC2-grade audit row format |
### Rollback
Each migration is backward-compatible:
- `0008` adds nullable JSONB column with default `{}`
- `0009` only adds new tables — no `pages`/`audit_log` schema change
Feature flag `TRUST_LAYER_ENABLED=true|false` gates the new code paths in
`runAgenticChat` and the validator queue. Ship the code disabled to prod first,
enable per environment after smoke tests.
### 11.5 Future: MCP Gateway (sprint #30+, when second backend system arrives)
Today's wiki-mcp is one server with field-level access control inside.
When the company later plugs in a *second system* (Salesforce CRM,
Workday HR, internal Jira, GitHub, …) we add a **gateway** in front:
```
┌─────────────────────────────┐
Claude Desktop ──► │ mcp.s2.emersion.eu │
Claude Code ──► │ (MCP Gateway) │
Codex CLI ──► │ │
Internal chat ──► └────────┬────────────────────┘
│
├───► wiki-mcp (today)
├───► salesforce-mcp (future)
├───► workday-mcp (future)
├───► jira-mcp (future)
└───► github-mcp (future)
```
Gateway responsibilities (each is a well-understood production pattern):
| Layer | Purpose |
|---|---|
| **Auth** | Verify single Bearer token (RFC 8707 audience = gateway). |
| **Policy** | Cedar — which backends is this agent allowed to call? |
| **Tools discovery aggregation** | `tools/list` fan-out, namespace-prefix tools (`wiki:search_wiki`, `salesforce:list_accounts`), filter by policy. |
| **Routing** | `tools/call wiki:search_wiki` → OBO token-exchange for `aud=wiki-mcp` → forward → relay response. |
| **Caching** | Cross-domain canonical_answers cache lives at gateway. |
| **Audit** | Centralized log: human + agent + tool + resource + decision. |
| **Rate limit** | Per-agent, per-tool, per-backend quotas. |
**Important**: adding the gateway later requires no per-backend reconfiguration
for end users — they keep a single MCP entry point, with a one-time endpoint
migration (`api.wiki.s2.emersion.eu/mcp` → `mcp.s2.emersion.eu/mcp`). New
backends require only a new docker container + one routing entry.
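The namespace-prefix routing step can be sketched as follows (illustrative only; the real gateway would follow this with the OBO token exchange and the forwarded `tools/call`):

```typescript
// Split a namespaced tool name from tools/call into backend + tool:
// 'wiki:search_wiki' → { backend: 'wiki', tool: 'search_wiki' }.
function routeToolCall(namespacedTool: string): { backend: string; tool: string } {
  const idx = namespacedTool.indexOf(":");
  if (idx <= 0 || idx === namespacedTool.length - 1) {
    throw new Error(`tool '${namespacedTool}' lacks a backend prefix`);
  }
  return {
    backend: namespacedTool.slice(0, idx), // e.g. 'wiki' → wiki-mcp
    tool: namespacedTool.slice(idx + 1),
  };
}
```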
Existing OSS / commercial gateways to reference but **not adopt immediately**
(none are production-grade enough for self-hosted enterprise as of Q2 2026):
- **Composio** — commercial aggregator, 100+ MCP servers
- **Cloudflare AI Gateway** — generic LLM+MCP, rate-limit + observability
- **mcp.run** — function registry pattern
- **Pulse MCP** — observability-focused
Best self-hosted approach when the time comes: lightweight Hono service
(reuse wiki-api stack), Cedar policy + Redis cache, ~2 weeks dev.
---
## 12. Test scenarios
### 12.1 Unit / integration (`apps/api/src/__tests__/trust-layer.test.ts`)
1. **Citation supports happy path** — page A says "Petr lives in Prague", page B doesn't mention Petr. Query "where does Petr live?". Assert: model cites A, validator verdict=`ok`.
2. **Citation supports hallucination** — mock model cites page B. Assert: verdict=`error`, check_type=`citation_unsupported`.
3. **Source stale** — policy page with `valid_until = now - 1y`. Assert: warning `source_stale`.
4. **Better source exists** — `person/petr.md` mentions "approved the policy", `policy/security.md` is canonical. Assert: warning `better_source_exists` + `suggested_page_id`.
5. **Contradiction detected** — 2 pages claim different role of subject. Assert: `contradiction_warnings` row + verdict=`error`.
6. **Deprecated source hard block** — page X has `authority_level=deprecated`. Assert: validator verdict=`error`, `source_deprecated`.
7. **ai_access retrieval_only** — page has `ai_access=retrieval_only`. Mock model quotes verbatim. Assert: `ai_access_blocked` error.
8. **ai_access=none never in context** — search_wiki returns NULL for that page even when query matches it.
9. **Permission violation race** — page changed to confidential mid-conversation. Assert: validator catches in Phase 0.
### 12.2 Cache scenarios
10. **L0 exact hit** — 2× identical query. Assert: 2nd call <100 ms, audit `canonical.cache_hit layer=L0`.
11. **L0 miss due to rbac_scope** — same query, different user (different domains claim). Assert: cache miss, separate canonical_answer row per scope.
12. **L2 semantic hit at 0.93** — paraphrased query. Assert: cache hit.
13. **L2 miss at 0.89** — topically related but distinct. Assert: cache miss (don't false-fire).
14. **Cache invalidation on page edit** — canonical_answer cites page P. Edit P → trigger fires → row marked `invalidated_at`. Next query: full pipeline.
15. **Cache invalidation on page deprecate** — page X deprecated → all canonical_answers citing X invalidated immediately.
### 12.3 MCP authorization
16. **Per-agent scope** — `agent.tech-robot` token can call `wiki-mcp.search_wiki` but rejected on hypothetical `finance-mcp` (RFC 8707 audience mismatch → 401).
17. **OBO sub+act** — token-exchange produces token with both. MCP audit logs both.
18. **Cedar policy filter** — `agent.tech-robot` does `search_wiki("budget")` — finance pages excluded from results, `mcp.access.denied` rows written per excluded doc.
19. **ai_access=none invisible** — `search_wiki("ceo strategic plan")` even from admin — page with `ai_access=none` invisible to agent context regardless of human role.
### 12.4 End-to-end UX
20. **Trust banner green** — answer with all canonical sources. UI shows ✅ + "Uložit jako oficiální" button.
21. **User flags incorrect** — click 👎 → `chat.response.flagged_incorrect` audit row → admin sees in `/admin/trust` queue.
22. **Promote to canonical** — admin clicks "Schválit". `canonical_answers.approved_by` set, expiry extended to 90d.
23. **Validator latency budget** — synthetic 5-citation response. Assert validator completes <4.5s p95 over 100 runs.
---
## 13. Success metrics (28-day target)
| KPI | Target | Source |
|---|---|---|
| Citation accuracy (manual eval 50 q) | ≥ 95% | weekly audit |
| User flag rate | < 2% | `audit_log` aggregate |
| Canonical cache hit rate (top 10 q) | > 30% | `audit_log` aggregate |
| Open contradictions (>7 days) | 0 | `contradiction_warnings` |
| Mean verdict distribution | ok > 70% / warning < 25% / error < 5% | Grafana panel |
| Validator p95 latency | < 5 s | Prometheus histogram |
| Token cost reduction (after L1+L2) | 40–60% | LiteLLM dashboard |
---
## 14. Risks & mitigations
| Risk | Probability | Mitigation |
|---|---|---|
| Validator LLM (Haiku) itself hallucinates at the entailment check | medium | dual-judge: Haiku + Galileo Luna NLI; agree-to-pass; disagreement → flag for human review |
| Semantic cache cross-contamination | high if shipped wrong | `rbac_scope_hash` in cache key is REQUIRED; unit test 11 covers |
| Cache invalidation flapping | medium | debounce on `pages UPDATE`: aggregate within 60s window before invalidating |
| Native-citation backends charge ~2× for grounded mode | medium | cap retrieved documents to 6 per tool call; use prompt caching on system prompt + tools; DIY substring path has no such surcharge (works on any model) |
| Governance "canonical bloat" | medium | admin sign-off required for promote; quarterly review |
| User-flag spam | low | rate-limit 10 flags/user/24h; admin trend review |
| MCP "rug pull" attack (server swaps tools post-install) | low | sign resource metadata; pin tool schemas client-side; MDPI 2025 reco |
| Cedar policy mis-config blocks legitimate access | medium | dry-run mode logs `would_deny`; canary deploy; revert one-command |
| Contradiction NLI O(n²) cost | medium | cap K=5 retrieved before pairwise; lazy mode (only if any `authority_level=canonical` differs) |
| Active LLM provider outage (Anthropic / OpenAI / Google) | medium | LiteLLM routes through any of 4+ backends — single-line `WIKI_CHAT_MODEL` change in `.env` failovers (claude-sonnet-pro → openai/gpt-5 → gemini/gemini-2.5-pro); audit row carries provider id; degraded operation continues |
| LiteLLM proxy itself outage | medium | proxy is shared with atlas + n8n; ops runbook covers restart; for prolonged outage chat falls back to wiki-search-only (no LLM); validator queue pauses + drains when proxy returns |
| Vendor lock-in to a single backend | low — by design we don't have it | the trust layer (citation grounding, validator, cache) is implemented on our side; the only vendor-specific code path is the optional native-citations adapter in §6.4 |
---
## 15. Open questions for product / governance
1. **Who gets to promote to `canonical`?** Recommend: admin role OR compliance/legal role for policy-type pages. Not arbitrary editor.
2. **`ai_access=retrieval_only` default for what page types?** Recommend: `person`, `customer`, `decision` (board minutes) default `retrieval_only`; `policy`, `sop`, `runbook`, `glossary-term`, `product` default `full`.
3. **Right-to-erasure** — when `data_subjects` includes user X who requests deletion, what cascades? Recommend: redact PII inline, retain document, audit log keeps `actor_id_redacted` hash.
4. **Embargo window** — default? Recommend: 0 (must be set explicitly per page).
5. **Retention default** — by `source_type`? Policy = 7y, decision = 7y, runbook = 3y, note = 2y, doc = 2y.
6. **Validator runs on every response or sampled?** Recommend: every response in chat (cost: ~$0.01 per Haiku validator call × responses). For MCP tool calls, sampled (e.g. every 10th) to keep latency budget.
---
## 16. Citations to research
### Trust & accuracy
- Anthropic Citations API — https://platform.claude.com/docs/en/build-with-claude/citations
- Anthropic Citations review (Simon Willison) — https://simonwillison.net/2025/Jan/24/anthropics-new-citations-api/
- Glean trust signals — https://www.glean.com/perspectives/how-data-governance-frameworks-support-ai-search-optimization
- Salesforce Einstein Trust Layer — https://developer.salesforce.com/blogs/2023/10/inside-the-einstein-trust-layer
- M365 Copilot governance — https://learn.microsoft.com/en-us/microsoft-365/copilot/secure-govern-copilot-foundational-deployment-guidance
- Hebbia Verifiable Fact Layer — https://medium.com/@takafumi.endo/hebbias-edge-building-a-system-of-record-for-enterprise-reasoning-1264ab76ec6b
- Contradiction Detection (arXiv 2504.00180) — https://arxiv.org/abs/2504.00180
- DRAGged Into a Conflict — https://research.google/pubs/dragged-into-a-conflict-detecting-and-addressing-conflicting-sources-in-retrieval-augmented-llms/
- RAGAS Faithfulness — https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/faithfulness/
- Confidence Ratings erode trust (ACM UMAP 2025) — https://dl.acm.org/doi/10.1145/3708319.3734178
### Caching
- Anthropic Prompt Caching — https://platform.claude.com/docs/en/build-with-claude/prompt-caching
- GPTCache — https://github.com/zilliztech/GPTCache
- Semantic Caching ($720→$72 case) — https://medium.com/@labeveryday/prompt-caching-is-a-must-how-i-went-from-spending-720-to-72-monthly-on-api-costs-3086f3635d63
- Cache Invalidation for AI (TianPan) — https://tianpan.co/blog/2026-04-20-cache-invalidation-ai-semantic-rag
- VentureBeat 73% cut — https://venturebeat.com/orchestration/why-your-llm-bill-is-exploding-and-how-semantic-caching-can-cut-it-by-73
- TrueFoundry: text-based cache keys wrong — https://www.truefoundry.com/blog/semantic-caching-llm-gateway
- waLLMartCache (multi-tenant) — https://link.springer.com/chapter/10.1007/978-3-031-78183-4_15
### MCP & access control
- MCP Authorization spec — https://modelcontextprotocol.io/specification/draft/basic/authorization
- RFC 9728 OAuth Protected Resource — https://datatracker.ietf.org/doc/html/rfc9728
- Cerbos MCP fine-grained authz — https://www.cerbos.dev/blog/mcp-authorization
- Authzed SpiceDB for RAG — https://authzed.com/blog/fine-grained-authorization-using-spicedb-for-retrieval-augmented-generation-rag
- Pinecone RAG access control — https://www.pinecone.io/learn/rag-access-control/
- Glean permissions-aware AI — https://www.glean.com/perspectives/security-permissions-aware-ai
- Knostic AI oversharing analysis — https://www.knostic.ai/blog/glean-data-security
- Authenticated Delegation (arXiv 2501.09674) — https://arxiv.org/html/2501.09674v1
- AI Agents SOC2 (Teleport) — https://goteleport.com/blog/ai-agents-soc-2/
- MCP Security pitfalls — https://towardsdatascience.com/the-mcp-security-survival-guide-best-practices-pitfalls-and-real-world-lessons/
- WorkOS best authz platforms 2026 — https://workos.com/blog/best-authorization-platforms-ai-agent-permissions-2026
---
## 17. Status
- **Document version**: 1.2 (sprint #29 draft, vendor-neutral rewrite)
- **Author**: Claude Opus 4.7 with web research (see §16)
- **Reviewed**: pending Petr Hlobil
- **Implementation start**: TBD (after design approval)
- **Estimated effort**: 4 weeks (1 sprint @ ~120 h dev)
### Changelog
- **1.2** (2026-05-11) — Vendor-neutral redesign per stakeholder feedback:
trust layer must work regardless of which LLM backend the wiki is wired
to. §1 adds explicit "vendor neutrality" statement; §2.1 reframes
citations as two interchangeable implementation paths (Anthropic native
vs DIY substring grounding); §6.1 and §6.4 split the citation extractor
into path A (provider-native) and path B (marker-and-substring); §7
phase 1 explicitly mentions the DIY grounder; §8 L3 generalizes to
"provider-side prompt cache"; §11 week 1 deliverable rewritten; §14
risk row replaced ("Anthropic API regional outage" → "LiteLLM proxy
outage" + "provider failover via env flip" + "no vendor lock-in by
design"). Driving change: production reality is LiteLLM at
`hub.s2.emersion.eu` routing across Anthropic / OpenAI / Gemini /
Ollama; the trust layer is implemented on our side, not delegated to
Claude features. Stack-side adapter changes (`packages/llm/src/anthropic/client.ts`:
optional `usage`, reasoning-model `temperature` skip, reasoning
`max_tokens` floor of 4096) verified end-to-end against
`openai/gpt-5-nano` via `/v1/messages` — correct Anthropic-shape
response, tool_use returned, citations stream functional.
- **1.1** (2026-05-11) — Resolved per-domain MCP vs field-level RBAC
confusion. Wiki is **one** knowledge base with field-level `domain`
enforcement inside one MCP, not per-department MCP servers. Per-system
MCP gateway pattern (multi-product) moved to §11.5 as future state.
Added §9.0 Big Picture, §9.1 controlled domain enum (12 values:
engineering / product / design / sales / marketing / it-support /
finance / hr / legal / operations / management / public), §9.2
agent identity table, §9.3 RFC 8707 + RFC 8693 token-exchange.
Cedar policy snippet generalized to attribute-driven instead of
per-agent hard-codes. Migration plan Week 4 updated.
- **1.0** (2026-05-11) — Initial design, research-backed enterprise trust
layer plan (Citations API + governance metadata + validator pipeline
+ canonical answers cache + audit telemetry).