Version: 1.0
Authority: Subordinate to Codex Disciplinae Epistemicae Rev. V
Nature: Operational Directive
Language: English (working document)


1. PURPOSE AND SCOPE

This manual provides operational procedures for implementing the Codex Disciplinae Epistemicae within the AOS framework. It transforms Codex principles into executable workflows.

Core Principle:

"We do not decide what is true. We document how well claims survive contact with evidence over time."

This manual governs how analysis is performed, not what conclusions are reached.


2. ARCHITECTURAL PRINCIPLES

2.1 Separation of Concerns (Mandatory)

The system implements strict role separation across three functions:

Role        Function                                  Prohibitions
Collector   Deterministic evidence acquisition        No interpretation, no plausibility filtering
Researcher  Structural extraction and normalization   No truth assessment, no synthesis across sources
Analyst     Codex-bound evaluation and scoring        No web browsing, no independent fact-checking

Invariant: No single role may both acquire and judge evidence.

2.2 Two-Tier Pipeline

Tier A — Collector (Broad, Cheap, Deterministic)

  • Ingests articles via deterministic sources (RSS, Atom, direct URLs)
  • Stores raw content, metadata, and normalized text
  • Never reasons, scores, summarizes, or infers
  • Never rejects content for "implausibility"

Tier B — Analyzer (Selective, Governed, Budgeted)

  • Operates only on articles admitted through gating rules
  • Applies Codex scoring, claim extraction, and temporal comparison
  • May conclude INDETERMINATA at any time
  • Must cite stored evidence for every inference

2.3 Deterministic Foundation

  • Same inputs → same analysis outputs
  • All random elements explicitly seeded
  • Evidence packs are sealed and immutable after collection
  • Version control on all transformations

2.4 Temporal Grounding

  • All analyses timestamped with absolute times
  • Relative references ("today", "yesterday") converted to absolutes at ingest
  • Timezone explicitly recorded
  • NOW_UTC and USER_TZ provided by host environment
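
A minimal sketch of converting a relative day reference to an absolute date. The function name, the lookup table, and the choice of anchor (NOW_UTC here; an article's publication time may be the more appropriate anchor) are assumptions for illustration, not part of the Codex.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Hypothetical lookup table; real ingest would need a much richer parser.
RELATIVE_DAY_OFFSETS = {"today": 0, "yesterday": -1, "tomorrow": 1}


def resolve_relative_day(phrase: str, anchor_utc: datetime, user_tz: tzinfo) -> str:
    """Return the ISO 8601 calendar date that a relative phrase refers to."""
    if anchor_utc.tzinfo is None:
        raise ValueError("anchor must be timezone-aware")
    # Relative words are read against the local calendar day, not the UTC day.
    local_anchor = anchor_utc.astimezone(user_tz)
    offset = RELATIVE_DAY_OFFSETS[phrase.strip().lower()]
    return (local_anchor.date() + timedelta(days=offset)).isoformat()


# A fixed UTC-5 offset keeps the example free of a timezone database;
# a deployment would more likely pass a zoneinfo.ZoneInfo instance.
USER_TZ = timezone(timedelta(hours=-5))
NOW_UTC = datetime(2026, 1, 9, 2, 30, tzinfo=timezone.utc)

print(resolve_relative_day("yesterday", NOW_UTC, USER_TZ))  # 2026-01-07
```

The UTC instant 2026-01-09 02:30 is still 2026-01-08 in a UTC-5 zone, which is why "yesterday" resolves to 2026-01-07 and why the timezone has to be recorded explicitly.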

3. EVIDENCE HANDLING PROTOCOLS

3.1 Collector Operations

INPUT:  Topic bundle, source allowlist, time window
OUTPUT: Evidence Pack

PROCESS:
  1. Fetch canonical documents via RSS/Atom
  2. Store raw content + metadata
  3. Generate content hash (SHA256)
  4. Create immutable artifact record
  5. Append to ledger with retrieval timestamp

RULES:
  - No interpretation
  - No plausibility filtering
  - Same query → same results
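
Steps 3 to 5 of the process above can be sketched as follows. This is an illustration under stated assumptions: the function names, the JSON-lines ledger format, and the in-memory ledger stand-in are hypothetical.

```python
import hashlib
import io
import json
from datetime import datetime, timezone


def make_artifact_record(source_url: str, raw_bytes: bytes, retrieved_at: datetime) -> dict:
    """Hash the raw content and build the record that is appended to the ledger."""
    digest = hashlib.sha256(raw_bytes).hexdigest()
    return {
        "source_url": source_url,
        "content_hash": f"sha256:{digest}",
        "retrieved_at": retrieved_at.astimezone(timezone.utc).isoformat(),
    }


def append_to_ledger(ledger, record: dict) -> None:
    """Write one JSON line; sort_keys keeps the serialized form stable."""
    ledger.write(json.dumps(record, sort_keys=True) + "\n")


ledger = io.StringIO()  # stand-in for an append-only file opened in mode "a"
record = make_artifact_record(
    "https://example.org/feed/item-1",
    b"<p>Raw article body</p>",
    datetime(2026, 1, 9, 12, 0, tzinfo=timezone.utc),
)
append_to_ledger(ledger, record)
print(ledger.getvalue(), end="")
```

Hashing the raw bytes rather than the extracted text means the hash does not change when the extraction logic is revised.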

3.2 Evidence Pack Structure

{
  "evidence_id": "sha256:...",
  "source_url": "canonical",
  "published_at": "ISO8601",
  "retrieved_at": "ISO8601",
  "content_hash": "sha256:...",
  "raw_text": "extracted",
  "metadata": {
    "publisher": "",
    "author": "",
    "section": "",
    "word_count": 0
  }
}

3.3 Evidence Pack Immutability (Invariant)

An Evidence Pack, once created for an analysis run, is an immutable input artifact.

New information requires:

  • A new Evidence Pack
  • A new run_id
  • Explicit linking to prior packs if relevant
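
One way to enforce this invariant in code is a frozen record type. The sketch below is an assumption about implementation, not a Codex requirement; the class, the supersede helper, and the prior_pack_id field are hypothetical names.

```python
from dataclasses import FrozenInstanceError, dataclass
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class EvidencePack:
    evidence_id: str
    run_id: str
    raw_text: str
    metadata: Mapping[str, object]
    prior_pack_id: Optional[str] = None  # explicit link to an earlier pack

    def __post_init__(self):
        # Copy and wrap metadata so its keys cannot be reassigned either.
        object.__setattr__(self, "metadata", MappingProxyType(dict(self.metadata)))


def supersede(old: EvidencePack, evidence_id: str, run_id: str, raw_text: str) -> EvidencePack:
    """New information yields a new pack and run_id, linked to the old pack."""
    return EvidencePack(evidence_id, run_id, raw_text, old.metadata,
                        prior_pack_id=old.evidence_id)


pack = EvidencePack("sha256:aaa", "run-001", "original text", {"publisher": "Example"})
try:
    pack.raw_text = "edited"  # any in-place change is rejected
except FrozenInstanceError:
    print("sealed")

updated = supersede(pack, "sha256:bbb", "run-002", "corrected text")
print(updated.prior_pack_id)  # sha256:aaa
```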

4. RESEARCHER OPERATIONS

4.1 Extraction Protocol

INPUT:  Evidence Pack
OUTPUT: Structured Claims

TASKS:
  1. Named entity identification
  2. Claim boundary detection
  3. Claim extraction (verbatim quotes with offsets)
  4. Attribution mapping (who said what)
  5. Temporal reference normalization
  6. Source lineage tracking (original vs. derivative reporting)

PROHIBITIONS:
  - No truth assessment
  - No plausibility judgment
  - No synthesis across sources
  - No scoring or weighting

4.2 Claim Normalization Format

{
  "claim_id": "uuid",
  "source_evidence_id": "sha256:...",
  "verbatim_text": "",
  "normalized_text": "",
  "speaker_entity_id": "uuid|unknown",
  "claim_type": "factual|interpretive|predictive|normative",
  "time_reference": {
    "absolute": "ISO8601|null",
    "relative": "quoted_string",
    "normalized": "ISO8601"
  },
  "entities_mentioned": ["uuid1", "uuid2"],
  "span_offsets": [start, end]
}
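
A short check that a claim's offsets really point at its verbatim quote. It assumes span_offsets are half-open character offsets into the Evidence Pack's raw_text, which the format above does not state; verify_span is a hypothetical helper.

```python
def verify_span(raw_text: str, claim: dict) -> bool:
    """True if span_offsets select exactly verbatim_text inside raw_text."""
    start, end = claim["span_offsets"]
    if not (0 <= start < end <= len(raw_text)):
        return False
    return raw_text[start:end] == claim["verbatim_text"]


raw_text = "The minister said the bridge will reopen in March."
quote = "the bridge will reopen in March"
start = raw_text.index(quote)
claim = {"verbatim_text": quote, "span_offsets": [start, start + len(quote)]}

print(verify_span(raw_text, claim))  # True
```

A check like this keeps extraction auditable without requiring the Researcher to assess truth or plausibility.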

5. GATING RULES (Analyzer Admission)

No article enters analysis unless it triggers at least one gate.

5.1 Valid Gates

Gate                 Trigger Condition
Actor Gate           Mentions tracked high-salience actors
Event Gate           References tracked or emerging events
Novelty Gate         Introduces materially new facts vs. last 72h cluster
Contradiction Gate   Conflicts with prior reporting
Claim Risk Gate      Makes strong claims with weak sourcing
Policy Impact Gate   Claims retaliation, coercion, or systemic abuse
Scope Gate           Explicitly requested by user or reviewer

5.2 Gate Priority (When Multiple Trigger)

Contradiction > Novelty > Actor/Event > Claim Risk > Policy Impact > Scope

All triggered gates are logged, even if not the deciding factor.
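
The priority order and the logging rule can be sketched as below. The gate identifiers are hypothetical; Actor and Event share a rank as in the ordering above, and breaking that tie alphabetically is an assumption made only to keep the result deterministic.

```python
# Lower rank wins. "actor" and "event" are tied, per "Actor/Event" in 5.2.
GATE_RANK = {
    "contradiction": 0,
    "novelty": 1,
    "actor": 2,
    "event": 2,
    "claim_risk": 3,
    "policy_impact": 4,
    "scope": 5,
}


def decide_gate(triggered):
    """Return (deciding_gate, all_triggered_gates_in_priority_order)."""
    unknown = set(triggered) - GATE_RANK.keys()
    if unknown:
        raise ValueError(f"unknown gates: {sorted(unknown)}")
    logged = sorted(set(triggered), key=lambda g: (GATE_RANK[g], g))
    if not logged:
        return None, []  # no gate triggered: article is stored and indexed only
    return logged[0], logged


deciding, logged = decide_gate(["scope", "novelty", "actor"])
print(deciding)  # novelty
print(logged)    # ['novelty', 'actor', 'scope']
```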

5.3 Non-Gated Articles

Articles that do not pass a gate remain:

  • Stored (raw content preserved)
  • Indexed (metadata searchable)
  • NOT analyzed (no Codex pipeline execution)

6. CLUSTERING BEFORE ANALYSIS (Required)

6.1 Purpose

Avoid spending analysis budget more than once on rewrites of the same story.

6.2 Process

Before analysis:

  1. Articles MUST be clustered by content similarity
  2. Each cluster selects:
    • Primary Representative: Highest detail, best sourcing
    • Adversarial Sample (optional): Divergent framing from different publisher domain
  3. All other articles inherit cluster context and are not analyzed independently

6.3 Deterministic Adversarial Retrieval (v0)

For each cluster, Collector MUST include:

  1. Primary: Highest similarity representative
  2. Diversity: One from different publisher domain
  3. Adversarial: The most dissimilar article in the cluster that still exceeds the similarity threshold

Selection based on content hashes and similarity metrics (deterministic).
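
A sketch of one deterministic selection rule, under several assumptions the manual leaves open: similarity is exact Jaccard over word 3-shingles, the primary is the article with the highest mean similarity to the rest of the cluster, the diversity pick is the closest article from another domain, and ties are broken by content hash. All names are hypothetical.

```python
from fractions import Fraction
from typing import NamedTuple


class Article(NamedTuple):
    content_hash: str
    domain: str
    text: str


def shingles(text: str, k: int = 3) -> set:
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a: set, b: set) -> Fraction:
    union = a | b
    return Fraction(len(a & b), len(union)) if union else Fraction(1)


def select_samples(cluster, threshold=0.2):
    """Return (primary, diversity, adversarial); the last two may be None."""
    sh = {a.content_hash: shingles(a.text) for a in cluster}

    def sim(x, y):
        return jaccard(sh[x.content_hash], sh[y.content_hash])

    def mean_sim(a):
        others = [b for b in cluster if b.content_hash != a.content_hash]
        return sum(sim(a, b) for b in others) / len(others) if others else Fraction(1)

    # Exact fractions plus a hash tie-break make the result order-independent.
    primary = min(cluster, key=lambda a: (-mean_sim(a), a.content_hash))
    rest = [a for a in cluster if a.content_hash != primary.content_hash]
    diversity = min((a for a in rest if a.domain != primary.domain),
                    key=lambda a: (-sim(primary, a), a.content_hash), default=None)
    adversarial = min((a for a in rest if sim(primary, a) >= threshold),
                      key=lambda a: (sim(primary, a), a.content_hash), default=None)
    return primary, diversity, adversarial


cluster = [
    Article("h1", "a.example", "the minister resigned on monday after the budget vote failed"),
    Article("h2", "b.example", "the minister resigned on monday after the budget vote failed in parliament"),
    Article("h3", "c.example", "the minister resigned on monday amid protests over the budget vote"),
]
print([a.content_hash for a in select_samples(cluster)])  # ['h1', 'h2', 'h3']
```

Using Fraction instead of floating point avoids summation-order effects, so shuffling the cluster cannot change which articles are selected.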

6.4 Clustering Algorithm

v0 specification: MinHash + LSH on normalized clean text
Parameters: Configurable, versioned, logged with each run
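
For orientation, a compact MinHash signature with LSH banding using only the standard library. This is a sketch, not the v0 implementation: the shingle size, 64 hash functions, 16 bands, and the use of salted BLAKE2b as the hash family are all illustrative assumptions.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set:
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items: set, num_perm: int = 64) -> list:
    """One minimum per salted hash function; the salts act as explicit seeds."""
    if not items:
        raise ValueError("cannot sign an empty shingle set")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(), "big")
            for s in items
        ))
    return signature


def lsh_band_keys(signature: list, bands: int = 16) -> list:
    """Split the signature into bands; documents sharing any key are candidates."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]


def estimated_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of matching positions approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


a = minhash_signature(shingles("the minister resigned on monday after the budget vote failed"))
b = minhash_signature(shingles("the minister resigned on monday after the budget vote failed in parliament"))
print(estimated_similarity(a, b))
print(bool(set(lsh_band_keys(a)) & set(lsh_band_keys(b))))  # candidate pair?
```

Python's built-in hash() is randomized per process for strings, so a keyed hashlib function is used here to keep signatures reproducible across runs.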


7. BUDGET GOVERNOR (Hard Constraint)

7.1 Analysis Limits

Analysis operates under explicit limits:

  • Maximum articles per time window
  • Maximum tokens per run
  • Maximum compute time per run

When budget is exhausted, analysis stops.

7.2 Priority Queue

P0: Active events with new evidence (highest priority)
P1: Tracked actors with novel claims
P2: Contextual background
P3: Indexed only (no analysis)

7.3 Budget Parameters

  • Daily analysis budget: [configurable]
  • Per-run budget: [configurable]
  • Adjustment: Configuration-only, not ad hoc
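
A sketch of how the priority queue and the hard stop might interact. The tuple layout, the token estimates, and the decision to stop at the first item that does not fit (rather than skipping it) are assumptions; compute-time limits are omitted for brevity.

```python
import heapq


def run_budgeted(queue_items, max_articles: int, max_tokens: int):
    """queue_items: (priority 0-3, article_id, estimated_tokens) tuples."""
    # P3 items are indexed only and never enter the analysis heap.
    heap = [item for item in queue_items if item[0] < 3]
    heapq.heapify(heap)  # ordered by priority, then article_id
    analyzed, tokens_used = [], 0
    while heap and len(analyzed) < max_articles:
        priority, article_id, est_tokens = heapq.heappop(heap)
        if tokens_used + est_tokens > max_tokens:
            break  # budget exhausted: analysis stops
        analyzed.append(article_id)
        tokens_used += est_tokens
    return analyzed, tokens_used


items = [(1, "b", 50), (0, "a", 60), (3, "d", 1), (2, "c", 40)]
print(run_budgeted(items, max_articles=10, max_tokens=120))  # (['a', 'b'], 110)
```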

8. ANALYST OPERATIONS (CODEX PIPELINE)

8.1 Pipeline Execution Order

 1. TEMPORAL ANCHOR SET
    - Record NOW_LOCAL, NOW_UTC, TIMEZONE
    - Convert all relative time references

 2. SEMANTIC PARSING
    - Split into logical units
    - Identify proposition boundaries

 3. CLAIM CLASSIFICATION (Codex II)
    - Apply genus classification
    - Flag mixed-genus statements

 4. BURDEN OF PROOF CHECK (Codex III)
    - Identify missing evidence
    - Assess sufficiency
    - Apply power asymmetry adjustment

 5. TEMPORAL CONTEXT (Codex IV)
    - Event maturity assessment
    - Flag premature certainty
    - Apply temporal humility

 6. JURISDICTION CHECK (Codex V)
    - Identify category errors
    - Flag extra-jurisdictional claims

 7. COERCION DETECTION (Codex VI)
    - Violence laundering check
    - Structural invalidity test
    - Normalization detection

 8. ESSENTIALISM DETECTION (Codex VII)
    - Group attribution errors
    - Composition/division fallacies

 9. POWER SPHERE ANALYSIS (Codex VIII)
    - Descriptive vs. justificatory use
    - Sovereignty violations

10. INTENT INFERENCE LIMITS (Codex IX)
    - Pattern-based inference rules
    - Prohibition of premature attribution

11. TECHNOLOGY LAUNDERING (Codex X)
    - AI output classification
    - False authority detection

12. LIFECYCLE ASSIGNMENT (Codex XII)
    - Active/Dormant/Reopened/Deprecated
    - Dormancy timer check

13. LEDGER INTEGRATION (Codex XIII)
    - Version history check
    - Mutation tracking
    - Provenance chain validation

14. CALCULUS EPISTEMICUS
    - Vector assignment
    - Component scoring

15. EXIT ASSIGNMENT (Codex XV)
    - Final classification

9. CALCULUS EPISTEMICUS IMPLEMENTATION

9.1 Vector Structure

{
  "assertion_id": "uuid",
  "epistemic_vector": {
    "factum_demonstratum": 1.0,
    "factum_assertum": 0.2,
    "indeterminata": 0.0,
    "refutata": -1.0,
    "error_categoriae": -0.8,
    "overclaim_certainty": -0.6,
    "launderatio_technologica": -1.0,
    "correction_event": 0.4
  },
  "weights_version": "calculus_weights_v1",
  "derived_composites": {
    "hygiene_score": 0.45,
    "evidence_strength": 0.7,
    "uncertainty_band": "wide"
  }
}

9.2 Weight Table Rule

  • Weights live in versioned config file: calculus_weights_vN.json
  • Each analysis snapshot records weights_version
  • Same vector + same weights → same composites (deterministic)

9.3 Derived Composites

Composite              Derivation
Actor hygiene vector   Mean tag weights per 30/90 days
Outlet amplification   Rate of factum_assertum repetition without factum_demonstratum
Correction latency     Time from contradiction emergence to correction_event
Narrative drift        Edit-distance or claim-mutation rate within cluster over time
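
As an illustration of "mean tag weights per 30/90 days", the sketch below averages tag weights over a trailing window and carries weights_version with the result. It reads the numbers in 9.1 as per-tag weights, which is an interpretation; the file layout, function name, and output keys are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from statistics import fmean

# Hypothetical layout for calculus_weights_v1.json; values copied from 9.1.
WEIGHTS_JSON = """
{"version": "calculus_weights_v1",
 "weights": {"factum_demonstratum": 1.0, "factum_assertum": 0.2,
             "indeterminata": 0.0, "refutata": -1.0}}
"""


def mean_tag_weight(tagged_assertions, table, now, window_days):
    """Average the weights of tags observed in the trailing window."""
    cutoff = now - timedelta(days=window_days)
    values = [table["weights"][tag] for seen_at, tag in tagged_assertions
              if cutoff <= seen_at <= now]
    return {
        "label": "composite",
        "value": fmean(values) if values else None,
        "components": values,  # kept so the scalar stays decomposable
        "weights_version": table["version"],
    }


table = json.loads(WEIGHTS_JSON)
now = datetime(2026, 1, 9, tzinfo=timezone.utc)
history = [
    (now - timedelta(days=2), "factum_demonstratum"),
    (now - timedelta(days=10), "factum_assertum"),
    (now - timedelta(days=20), "refutata"),
    (now - timedelta(days=60), "refutata"),
]
print(mean_tag_weight(history, table, now, 30))
print(mean_tag_weight(history, table, now, 90))
```

Returning the component list alongside the mean is one way to satisfy the "Show components" requirement for every scalar.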

9.4 Uncertainty Bands (Not Confidence Intervals)

  • Narrow: Single primary source, minimal contradictions, settled timeline
  • Medium: Multiple sources with some contradictions, or event in cursu
  • Wide: Conflicting primary sources, high contradiction density, or emerging event

9.5 Output Requirements

  • Every scalar shown must have "Show components" option
  • Derived composites labeled as "composite"
  • All outputs reference weights_version

9.6 Prohibitions

  • NO monolithic "truth score" as primary output
  • NO ordinal scale (+2 to -2) as epistemic measure
  • NO "confidence intervals" implying statistical sampling

10. TEMPORAL HANDLING

10.1 Dormancy Management

NIGHTLY JOB:
  FOR claim IN claims WHERE state = 'active':
    IF now() - last_evidence_at > dormancy_interval:
      SET state = 'dormant'
      LOG state_change

10.2 Dormancy Intervals (Configurable by Domain)

Domain                        Interval
Fast news                     180 days
Long investigations / legal   365 days
Historical research           5 years

10.3 Reactivation

ON new EvidenceEvent FOR claim:
  IF state = 'dormant':
    SET state = 'active'
    INCREMENT reopened_count
    SET reopened_at = now()
    LOG reactivation
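
The dormancy job and the reactivation handler can be sketched in Python as below. The domain keys, the approximation of 5 years as 5 x 365 days, the list-based log, and updating last_evidence_at when new evidence arrives are assumptions that go beyond the pseudocode.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Intervals from 10.2; keys are hypothetical, "5 years" approximated in days.
DORMANCY_INTERVALS = {
    "fast_news": timedelta(days=180),
    "legal": timedelta(days=365),
    "historical": timedelta(days=5 * 365),
}


@dataclass
class ClaimState:
    claim_id: str
    domain: str
    state: str
    last_evidence_at: datetime
    reopened_count: int = 0
    reopened_at: Optional[datetime] = None


def nightly_dormancy_pass(claims, now: datetime, log: list) -> None:
    for claim in claims:
        overdue = now - claim.last_evidence_at > DORMANCY_INTERVALS[claim.domain]
        if claim.state == "active" and overdue:
            claim.state = "dormant"
            log.append((now.isoformat(), claim.claim_id, "active->dormant"))


def on_evidence_event(claim: ClaimState, now: datetime, log: list) -> None:
    claim.last_evidence_at = now  # assumed: otherwise the claim re-sleeps next night
    if claim.state == "dormant":
        claim.state = "active"
        claim.reopened_count += 1
        claim.reopened_at = now
        log.append((now.isoformat(), claim.claim_id, "dormant->active"))


log = []
claim = ClaimState("c-1", "fast_news", "active", datetime(2025, 1, 1, tzinfo=timezone.utc))
now = datetime(2026, 1, 9, tzinfo=timezone.utc)
nightly_dormancy_pass([claim], now, log)
print(claim.state)  # dormant
on_evidence_event(claim, now, log)
print(claim.state, claim.reopened_count)  # active 1
```

There is deliberately no "resolved" state in the sketch; the only transitions are active to dormant and back.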

10.4 Core Principle: Claims Never Resolve

Claims are NEVER marked "resolved" or "finished." They can only:

  • Go dormant (sleep)
  • Be reopened (wake)

History is always reactivatable.


11. QUALITY CHECKS

11.1 Pre-Analysis Checklist

  • [ ] Temporal anchor set
  • [ ] Evidence pack sealed
  • [ ] Source lineage documented
  • [ ] Entity resolution complete
  • [ ] Claim extraction verified

11.2 Post-Analysis Audit

  • [ ] All claims classified
  • [ ] Burden of proof assessed
  • [ ] No category errors
  • [ ] No premature attribution
  • [ ] Uncertainty explicitly acknowledged
  • [ ] Ledger integration complete
  • [ ] NO monolithic "truth score" emitted
  • [ ] Epistemic vector stored with weights_version

11.3 Self-Audit Questions

Before finalizing any analysis:

  1. Did we infer intent without evidence?
  2. Did we collapse uncertainty for narrative flow?
  3. Did we apply rules symmetrically across all actors?
  4. Did we document all assumptions?
  5. Can this analysis be reproduced exactly with the same inputs?
  6. Did we emit any scalar without decomposable components?

If yes to 1-2 or 6, or no to 3-5 → rollback and correct.
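
The rollback rule can be written as a small check. The function name and the dictionary representation of answers are hypothetical; the expected answers follow directly from the rule above.

```python
# Expected answer per question number: "no" (False) for 1, 2, 6; "yes" (True) for 3-5.
EXPECTED = {1: False, 2: False, 3: True, 4: True, 5: True, 6: False}


def audit_failures(answers: dict) -> list:
    """Return the question numbers that force a rollback (empty list = pass)."""
    missing = EXPECTED.keys() - answers.keys()
    if missing:
        raise ValueError(f"unanswered questions: {sorted(missing)}")
    return sorted(q for q, expected in EXPECTED.items() if answers[q] != expected)


answers = {1: False, 2: True, 3: True, 4: True, 5: False, 6: False}
print(audit_failures(answers))  # [2, 5]
```

Treating an unanswered question as an error rather than a pass keeps the audit from succeeding silently.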


12. OUTPUT STANDARDS

12.1 Required Elements

  • Temporal context statement
  • Claim-by-claim assessment
  • Evidence citations with hashes
  • Epistemic vector (component scores)
  • Alternative explanations considered
  • Ledger reference
  • weights_version

12.2 Prohibited Elements

  • Monolithic "truth" or "confidence" scores without decomposition
  • Moral prescriptions
  • Calls to action
  • Unqualified certainty
  • Narrative smoothing
  • Uncited assumptions

13. FAILURE MODES AND MITIGATIONS

13.1 Common Failure Modes

Mode                 Description                              Mitigation
Cutoff Shock         Model disbelieves post-training events   Treat model priors as irrelevant
Self-Reinforcement   Echo chamber from similar sources        Require source diversity
Narrative Drift      Claims evolve silently                   Track claim evolution over time
Authority Worship    Trust based on speaker identity          Separate actor from evidence
Temporal Confusion   Relative time references                 Absolute timestamps required
Scalar Collapse      Reducing vector to single number         Vector preservation mandatory

13.2 Mitigation Protocols

  • Evidence freeze after collection
  • Primary source preference
  • Adversarial retrieval requirement
  • Lineage tracking (original vs. rewrite)
  • Deterministic clustering
  • Vector preservation: Store all epistemic components, never collapse

13.3 Anti-Reinforcement Safeguards

  • Source lineage tracking (rewrites not counted as independent)
  • Publisher class diversity requirement
  • Primary artifact preference (original > commentary)
  • Adversarial inclusion in every cluster

14. DATA MODEL REQUIREMENTS

14.1 Core Tables

entities
  id, name, type, aliases, created_at, updated_at

evidence
  id, content_hash, source_url, published_at, retrieved_at
  raw_text, metadata_json, parse_version

claims
  id, normalized_text, claim_type, time_reference
  subject_entity_id, predicate, object

assertions
  id, claim_id, evidence_id, speaker_entity_id
  stance, codex_tags, epistemic_vector, weights_version

claim_states
  claim_id, state, last_evidence_at, reopened_count
  created_at, updated_at

assessment_snapshots
  claim_id, snapshot_at, epistemic_vector
  best_supported_hypothesis, supporting_evidence_ids
  contradicting_evidence_ids, notes

14.2 Graph Relationships

claim_supports_claim (claim_id, supports_id, strength)
claim_contradicts_claim (claim_id, contradicts_id, strength)
actor_asserted_claim (actor_id, claim_id, context)
outlet_published_evidence (outlet_id, evidence_id, role)

15. INTEGRATION WITH AOS

15.1 Operational Modes

Mode       Scope                          Gate
TUTUS      Analysis only, no execution    Default
RETE       Sandboxed scoring validation   System
PERSISTE   Persistent action              Human required

15.2 Ledger Integration

  • All analyses append-only to AOS ledger
  • Full provenance chain required
  • Version diff on all modifications
  • Cryptographic signatures for critical operations

15.3 Constitutional Alignment

This manual operates within the AOS Constitution:

  • Principium 1 (Human Authority): Humans retain final judgment
  • Principium 2 (Propose vs. Decide): LLMs propose, humans decide
  • Principium 3 (Honest Uncertainty): INDETERMINATA is praised
  • Principium 5 (Accountability): All decisions traceable
  • Principium 8 (Explicit Failure): Better to fail explicitly than succeed silently

APPENDIX A: QUERY EXAMPLES

Actor Analysis

"Show epistemic behavior of Actor X over time"
→ claims_by_actor + epistemic_vector_trends + correction_latency

"Compare outlets on claim introduction vs amplification"
→ outlet_claim_origination_rate vs outlet_claim_repetition_rate

Event Analysis

"Timeline of Event Y reporting"
→ documents_by_time + claim_evolution + vector_changes

"Load-bearing claims in narrative Z"
→ claim_graph_centrality + supporting_evidence_quality

System Health

"Uncertainty acknowledgment trends"
→ indeterminata_rate_over_time_by_outlet

"Category error frequency by domain"
→ jurisdiction_errors + sphere_of_influence_errors

APPENDIX B: HANDLING POST-CUTOFF EVENTS

B.1 Temporal Dislocation Rule

When a claim concerns an event that post-dates the LLM's training data:

  • The claim is NOT penalized for implausibility
  • Model disbelief is irrelevant
  • Evaluation depends solely on:
    • Number and independence of corroborating sources
    • Provenance quality
    • Consistency across evidence
    • Presence or absence of credible dispute

Training-cutoff ignorance is treated as a null prior, not a negative signal.

B.2 Resolution via Corroboration

If corroboration is unavailable:

  • Mark INDETERMINATA
  • Keep the claim indexed and dormant
  • Allow future reactivation when evidence arrives

END OF MANUAL

Version: 1.0
Date: 2026-01-09
Authority: Codex Disciplinae Epistemicae Rev. V
Status: Ready for Spec Gate