Version: 1.0
Authority: Subordinate to Codex Disciplinae Epistemicae Rev. V
Nature: Operational Directive
Language: English (working document)
1. PURPOSE AND SCOPE
This manual provides operational procedures for implementing the Codex Disciplinae Epistemicae within the AOS framework. It transforms Codex principles into executable workflows.
Core Principle:
"We do not decide what is true. We document how well claims survive contact with evidence over time."
This manual governs how analysis is performed, not what conclusions are reached.
2. ARCHITECTURAL PRINCIPLES
2.1 Separation of Concerns (Mandatory)
The system implements strict role separation across three functions:
| Role | Function | Prohibitions |
|---|---|---|
| Collector | Deterministic evidence acquisition | No interpretation, no plausibility filtering |
| Researcher | Structural extraction and normalization | No truth assessment, no synthesis across sources |
| Analyst | Codex-bound evaluation and scoring | No web browsing, no independent fact-checking |
Invariant: No single role may both acquire and judge evidence.
2.2 Two-Tier Pipeline
Tier A — Collector (Broad, Cheap, Deterministic)
- Ingests articles via deterministic sources (RSS, Atom, direct URLs)
- Stores raw content, metadata, and normalized text
- Never reasons, scores, summarizes, or infers
- Never rejects content for "implausibility"
Tier B — Analyzer (Selective, Governed, Budgeted)
- Operates only on articles admitted through gating rules
- Applies Codex scoring, claim extraction, and temporal comparison
- May conclude INDETERMINATA at any time
- Must cite stored evidence for every inference
2.3 Deterministic Foundation
- Same inputs → same analysis outputs
- All random elements explicitly seeded
- Evidence packs are sealed and immutable after collection
- Version control on all transformations
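The seeding requirement above can be illustrated with a minimal Python sketch. The seed-derivation scheme (hashing run id and stage) is an assumption for illustration, not part of this manual; any derivation works as long as it is deterministic, versioned, and logged with the run.

```python
import hashlib
import random

def run_seed(run_id: str, stage: str) -> int:
    """Derive a reproducible seed from the run identifier and pipeline stage.

    Hypothetical scheme: SHA-256 over "run_id:stage", truncated to 64 bits.
    """
    digest = hashlib.sha256(f"{run_id}:{stage}".encode("utf-8")).hexdigest()
    return int(digest[:16], 16)

# Same inputs -> same seed -> same "random" choices, on every run.
rng_a = random.Random(run_seed("run-001", "sampling"))
rng_b = random.Random(run_seed("run-001", "sampling"))
assert rng_a.random() == rng_b.random()
```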
2.4 Temporal Grounding
- All analyses timestamped with absolute times
- Relative references ("today", "yesterday") converted to absolutes at ingest
- Timezone explicitly recorded
- NOW_UTC and USER_TZ provided by host environment
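Conversion of relative references to absolutes at ingest can be sketched as follows. The fixed NOW_UTC value and the small lookup table are illustrative assumptions; in operation, NOW_UTC comes from the host environment and the reference vocabulary is larger.

```python
from datetime import datetime, timedelta, timezone

# Illustrative only: in operation NOW_UTC is provided by the host environment.
NOW_UTC = datetime(2026, 1, 9, 12, 0, tzinfo=timezone.utc)

# Hypothetical lookup table for a few common relative references.
RELATIVE = {
    "today": timedelta(days=0),
    "yesterday": timedelta(days=-1),
    "last week": timedelta(days=-7),
}

def normalize_relative(ref: str, now_utc: datetime) -> str:
    """Convert a relative time reference to an absolute ISO 8601 date."""
    delta = RELATIVE[ref.lower()]
    return (now_utc + delta).date().isoformat()

print(normalize_relative("yesterday", NOW_UTC))  # 2026-01-08
```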
3. EVIDENCE HANDLING PROTOCOLS
3.1 Collector Operations
INPUT: Topic bundle, source allowlist, time window
OUTPUT: Evidence Pack
PROCESS:
1. Fetch canonical documents via RSS/Atom
2. Store raw content + metadata
3. Generate content hash (SHA256)
4. Create immutable artifact record
5. Append to ledger with retrieval timestamp
RULES:
- No interpretation
- No plausibility filtering
- Same query → same results
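Steps 3-4 of the Collector process can be sketched in Python. Fetching and ledger append are out of scope here; field names follow the Evidence Pack structure in section 3.2, and the content-addressed `evidence_id` is an assumption of this sketch.

```python
import hashlib
from datetime import datetime, timezone

def make_artifact_record(source_url: str, raw_text: str) -> dict:
    """Build an immutable Collector artifact record (sketch).

    No interpretation, no filtering: the record is a pure function of the
    fetched bytes plus retrieval metadata.
    """
    content_hash = "sha256:" + hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
    return {
        "evidence_id": content_hash,   # content-addressed id (assumption)
        "source_url": source_url,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "content_hash": content_hash,
        "raw_text": raw_text,
    }

record = make_artifact_record("https://example.org/item", "Example body text.")
assert record["content_hash"].startswith("sha256:")
```

Because the hash is computed over raw content only, re-collecting identical content always yields the same `content_hash`, which is what makes "same query → same results" auditable.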
3.2 Evidence Pack Structure
```json
{
  "evidence_id": "sha256:...",
  "source_url": "canonical",
  "published_at": "ISO8601",
  "retrieved_at": "ISO8601",
  "content_hash": "sha256:...",
  "raw_text": "extracted",
  "metadata": {
    "publisher": "",
    "author": "",
    "section": "",
    "word_count": 0
  }
}
```
3.3 Evidence Pack Immutability (Invariant)
An Evidence Pack, once created for an analysis run, is an immutable input artifact.
New information requires:
- A new Evidence Pack
- A new run_id
- Explicit linking to prior packs if relevant
4. RESEARCHER OPERATIONS
4.1 Extraction Protocol
INPUT: Evidence Pack
OUTPUT: Structured Claims
TASKS:
1. Named entity identification
2. Claim boundary detection
3. Claim extraction (verbatim quotes with offsets)
4. Attribution mapping (who said what)
5. Temporal reference normalization
6. Source lineage tracking (original vs. derivative reporting)
PROHIBITIONS:
- No truth assessment
- No plausibility judgment
- No synthesis across sources
- No scoring or weighting
4.2 Claim Normalization Format
```json
{
  "claim_id": "uuid",
  "source_evidence_id": "sha256:...",
  "verbatim_text": "",
  "normalized_text": "",
  "speaker_entity_id": "uuid|unknown",
  "claim_type": "factual|interpretive|predictive|normative",
  "time_reference": {
    "absolute": "ISO8601|null",
    "relative": "quoted_string",
    "normalized": "ISO8601"
  },
  "entities_mentioned": ["uuid1", "uuid2"],
  "span_offsets": [start, end]
}
```
5. GATING RULES (Analyzer Admission)
No article enters analysis unless it triggers at least one gate.
5.1 Valid Gates
| Gate | Trigger Condition |
|---|---|
| Actor Gate | Mentions tracked high-salience actors |
| Event Gate | References tracked or emerging events |
| Novelty Gate | Introduces materially new facts vs. last 72h cluster |
| Contradiction Gate | Conflicts with prior reporting |
| Claim Risk Gate | Makes strong claims with weak sourcing |
| Policy Impact Gate | Claims retaliation, coercion, or systemic abuse |
| Scope Gate | Explicitly requested by user or reviewer |
5.2 Gate Priority (When Multiple Trigger)
Contradiction > Novelty > Actor/Event > Claim Risk > Policy Impact > Scope
All triggered gates are logged, even if not the deciding factor.
5.3 Non-Gated Articles
Articles that do not pass a gate remain:
- Stored (raw content preserved)
- Indexed (metadata searchable)
- NOT analyzed (no Codex pipeline execution)
6. CLUSTERING BEFORE ANALYSIS (Required)
6.1 Purpose
Prevent duplicate spending on rewrites of the same story.
6.2 Process
Before analysis:
- Articles MUST be clustered by content similarity
- Each cluster selects:
- Primary Representative: Highest detail, best sourcing
- Adversarial Sample (optional): Divergent framing from different publisher domain
- All other articles inherit cluster context and are not analyzed independently
6.3 Deterministic Adversarial Retrieval (v0)
For each cluster, Collector MUST include:
- Primary: Highest similarity representative
- Diversity: One from different publisher domain
- Adversarial: Maximally dissimilar article within cluster above similarity threshold
Selection based on content hashes and similarity metrics (deterministic).
6.4 Clustering Algorithm
v0 specification: MinHash + LSH on normalized clean text
Parameters: Configurable, versioned, logged with each run
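A minimal MinHash sketch illustrates the similarity estimate; it omits the LSH banding step and uses salted SHA-256 in place of a proper hash family, so it is a pedagogical sketch, not the v0 implementation. Shingle size and signature length are illustrative parameters.

```python
import hashlib

def shingles(text: str, k: int = 4) -> set[str]:
    """Character k-shingles of whitespace-normalized, lowercased text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

def minhash_signature(text: str, num_hashes: int = 64) -> list[int]:
    """Deterministic MinHash signature: for each salted hash function,
    keep the minimum hash value over the shingle set."""
    sig = []
    for salt in range(num_hashes):
        sig.append(min(
            int(hashlib.sha256(f"{salt}:{s}".encode()).hexdigest()[:16], 16)
            for s in shingles(text)
        ))
    return sig

def estimated_similarity(a: list[int], b: list[int]) -> float:
    """Fraction of matching signature positions approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

s1 = minhash_signature("The minister announced new sanctions on Tuesday.")
s2 = minhash_signature("The minister announced new sanctions on Tuesday evening.")
s3 = minhash_signature("Completely unrelated sports coverage.")
assert estimated_similarity(s1, s2) > estimated_similarity(s1, s3)
```

Because every hash is salted SHA-256 with no process-level randomness, the same text always yields the same signature, satisfying the deterministic-selection rule in 6.3.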
7. BUDGET GOVERNOR (Hard Constraint)
7.1 Analysis Limits
Analysis operates under explicit limits:
- Maximum articles per time window
- Maximum tokens per run
- Maximum compute time per run
When budget is exhausted, analysis stops.
7.2 Priority Queue
P0: Active events with new evidence (highest priority)
P1: Tracked actors with novel claims
P2: Contextual background
P3: Indexed only (no analysis)
7.3 Budget Parameters
- Daily analysis budget: [configurable]
- Per-run budget: [configurable]
- Adjustment: Configuration-only, not ad hoc
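The priority queue and hard budget cutoff can be sketched together. The tuple encoding and tie-breaking on article id are assumptions of this sketch; actual budgets cover tokens and compute time as well as article counts.

```python
import heapq

def run_analysis_queue(articles, max_articles: int):
    """Drain admitted articles in P0..P2 order until the budget is spent.

    articles: iterable of (priority, article_id); lower number = higher
    priority. P3 items are indexed only and never enter the queue (7.2).
    """
    heap = [(p, a) for p, a in articles if p < 3]
    heapq.heapify(heap)   # ties break on article id: deterministic ordering
    analyzed = []
    while heap and len(analyzed) < max_articles:
        _, article_id = heapq.heappop(heap)
        analyzed.append(article_id)
    return analyzed       # everything left over stays indexed, not analyzed

queue = [(1, "a2"), (0, "a1"), (3, "a4"), (2, "a3")]
assert run_analysis_queue(queue, max_articles=2) == ["a1", "a2"]
```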
8. ANALYST OPERATIONS (CODEX PIPELINE)
8.1 Pipeline Execution Order
1. TEMPORAL ANCHOR SET
- Record NOW_LOCAL, NOW_UTC, TIMEZONE
- Convert all relative time references
2. SEMANTIC PARSING
- Split into logical units
- Identify proposition boundaries
3. CLAIM CLASSIFICATION (Codex II)
- Apply genus classification
- Flag mixed-genus statements
4. BURDEN OF PROOF CHECK (Codex III)
- Identify missing evidence
- Assess sufficiency
- Apply power asymmetry adjustment
5. TEMPORAL CONTEXT (Codex IV)
- Event maturity assessment
- Flag premature certainty
- Apply temporal humility
6. JURISDICTION CHECK (Codex V)
- Identify category errors
- Flag extra-jurisdictional claims
7. COERCION DETECTION (Codex VI)
- Violence laundering check
- Structural invalidity test
- Normalization detection
8. ESSENTIALISM DETECTION (Codex VII)
- Group attribution errors
- Composition/division fallacies
9. POWER SPHERE ANALYSIS (Codex VIII)
- Descriptive vs. justificatory use
- Sovereignty violations
10. INTENT INFERENCE LIMITS (Codex IX)
- Pattern-based inference rules
- Prohibition of premature attribution
11. TECHNOLOGY LAUNDERING (Codex X)
- AI output classification
- False authority detection
12. LIFECYCLE ASSIGNMENT (Codex XII)
- Active/Dormant/Reopened/Deprecated
- Dormancy timer check
13. LEDGER INTEGRATION (Codex XIII)
- Version history check
- Mutation tracking
- Provenance chain validation
14. CALCULUS EPISTEMICUS
- Vector assignment
- Component scoring
15. EXIT ASSIGNMENT (Codex XV)
- Final classification
9. CALCULUS EPISTEMICUS IMPLEMENTATION
9.1 Vector Structure
```json
{
  "assertion_id": "uuid",
  "epistemic_vector": {
    "factum_demonstratum": 1.0,
    "factum_assertum": 0.2,
    "indeterminata": 0.0,
    "refutata": -1.0,
    "error_categoriae": -0.8,
    "overclaim_certainty": -0.6,
    "launderatio_technologica": -1.0,
    "correction_event": 0.4
  },
  "weights_version": "calculus_weights_v1",
  "derived_composites": {
    "hygiene_score": 0.45,
    "evidence_strength": 0.7,
    "uncertainty_band": "wide"
  }
}
```
9.2 Weight Table Rule
- Weights live in versioned config file:
calculus_weights_vN.json - Each analysis snapshot records
weights_version - Same vector + same weights → same composites (deterministic)
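The determinism rule can be sketched as a pure function of vector and weights. Both the weight values and the weighted-mean derivation below are illustrative assumptions; the manual defines neither, only that composites must be reproducible and decomposable.

```python
# Hypothetical weight table standing in for calculus_weights_vN.json.
WEIGHTS_V1 = {
    "factum_demonstratum": 1.0,
    "factum_assertum": 0.3,
    "refutata": -1.0,
    "overclaim_certainty": -0.6,
}

def hygiene_score(vector: dict, weights: dict) -> float:
    """Weighted mean over components present in both vector and weights
    (illustrative derivation). Same vector + same weights -> same composite."""
    keys = [k for k in vector if k in weights]
    return round(sum(vector[k] * weights[k] for k in keys) / len(keys), 4)

vec = {"factum_demonstratum": 1.0, "factum_assertum": 0.2, "refutata": -1.0}
assert hygiene_score(vec, WEIGHTS_V1) == hygiene_score(vec, WEIGHTS_V1)
```

Because the composite is a pure function with no hidden state, recording `weights_version` in each snapshot is sufficient to reproduce it exactly.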
9.3 Derived Composites
| Composite | Derivation |
|---|---|
| Actor hygiene vector | Mean tag weights per 30/90 days |
| Outlet amplification | Rate of factum_assertum repetition without factum_demonstratum |
| Correction latency | Time from contradiction emergence to correction_event |
| Narrative drift | Edit-distance or claim-mutation rate within cluster over time |
9.4 Uncertainty Bands (Not Confidence Intervals)
- Narrow: Single primary source, minimal contradictions, settled timeline
- Medium: Multiple sources with some contradictions, or event in cursu
- Wide: Conflicting primary sources, high contradiction density, or emerging event
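The band definitions above can be sketched as a classification rule. The numeric thresholds are illustrative assumptions, not normative values from this manual.

```python
def uncertainty_band(primary_sources: int, contradictions: int,
                     event_emerging: bool) -> str:
    """Map evidence conditions to the bands of section 9.4 (sketch).

    Thresholds are assumptions; only the ordering narrow < medium < wide
    comes from the manual.
    """
    if event_emerging or contradictions >= 5 or primary_sources == 0:
        return "wide"     # emerging event or high contradiction density
    if primary_sources > 1 or contradictions > 0:
        return "medium"   # multiple sources or some contradictions
    return "narrow"       # single primary source, settled timeline

assert uncertainty_band(1, 0, False) == "narrow"
```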
9.5 Output Requirements
- Every scalar shown must have "Show components" option
- Derived composites labeled as "composite"
- All outputs reference weights_version
9.6 Prohibitions
- NO monolithic "truth score" as primary output
- NO ordinal scale (+2 to -2) as epistemic measure
- NO "confidence intervals" implying statistical sampling
10. TEMPORAL HANDLING
10.1 Dormancy Management
NIGHTLY JOB:
  FOR claim IN claims WHERE state = 'active':
    IF now() - last_evidence_at > dormancy_interval:
      SET state = 'dormant'
      LOG state_change
10.2 Dormancy Intervals (Configurable by Domain)
| Domain | Interval |
|---|---|
| Fast news | 180 days |
| Long investigations / legal | 365 days |
| Historical research | 5 years |
10.3 Reactivation
ON new EvidenceEvent FOR claim:
  IF state = 'dormant':
    SET state = 'active'
    INCREMENT reopened_count
    SET reopened_at = now()
    LOG reactivation
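The dormancy and reactivation pseudocode above can be sketched as a small state machine in Python; the dict-based claim representation is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

# Intervals from section 10.2.
DORMANCY = {"fast_news": timedelta(days=180), "legal": timedelta(days=365)}

def nightly_dormancy_pass(claims, now, domain="fast_news"):
    """Move stale active claims to dormant (the nightly job of 10.1)."""
    for claim in claims:
        if (claim["state"] == "active"
                and now - claim["last_evidence_at"] > DORMANCY[domain]):
            claim["state"] = "dormant"

def on_new_evidence(claim, now):
    """Reactivate a dormant claim; claims never resolve, only sleep and wake."""
    if claim["state"] == "dormant":
        claim["state"] = "active"
        claim["reopened_count"] = claim.get("reopened_count", 0) + 1
        claim["reopened_at"] = now

now = datetime(2026, 1, 9, tzinfo=timezone.utc)
claim = {"state": "active", "last_evidence_at": now - timedelta(days=200)}
nightly_dormancy_pass([claim], now)
assert claim["state"] == "dormant"
on_new_evidence(claim, now)
assert claim["state"] == "active" and claim["reopened_count"] == 1
```

Note that no transition leads to a terminal state: the only edges are active → dormant and dormant → active, which is exactly the "claims never resolve" invariant of 10.4.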
10.4 Core Principle: Claims Never Resolve
Claims are NEVER marked "resolved" or "finished." They can only:
- Go dormant (sleep)
- Be reopened (wake)
History is always reactivatable.
11. QUALITY CHECKS
11.1 Pre-Analysis Checklist
- [ ] Temporal anchor set
- [ ] Evidence pack sealed
- [ ] Source lineage documented
- [ ] Entity resolution complete
- [ ] Claim extraction verified
11.2 Post-Analysis Audit
- [ ] All claims classified
- [ ] Burden of proof assessed
- [ ] No category errors
- [ ] No premature attribution
- [ ] Uncertainty explicitly acknowledged
- [ ] Ledger integration complete
- [ ] NO monolithic "truth score" emitted
- [ ] Epistemic vector stored with weights_version
11.3 Self-Audit Questions
Before finalizing any analysis:
1. Did we infer intent without evidence?
2. Did we collapse uncertainty for narrative flow?
3. Did we apply rules symmetrically across all actors?
4. Did we document all assumptions?
5. Can this analysis be reproduced exactly with the same inputs?
6. Did we emit any scalar without decomposable components?
If yes to 1-2 or 6, or no to 3-5 → rollback and correct.
12. OUTPUT STANDARDS
12.1 Required Elements
- Temporal context statement
- Claim-by-claim assessment
- Evidence citations with hashes
- Epistemic vector (component scores)
- Alternative explanations considered
- Ledger reference
- weights_version
12.2 Prohibited Elements
- Monolithic "truth" or "confidence" scores without decomposition
- Moral prescriptions
- Calls to action
- Unqualified certainty
- Narrative smoothing
- Uncited assumptions
13. FAILURE MODES AND MITIGATIONS
13.1 Common Failure Modes
| Mode | Description | Mitigation |
|---|---|---|
| Cutoff Shock | Model disbelieves post-training events | Treat model priors as irrelevant |
| Self-Reinforcement | Echo chamber from similar sources | Require source diversity |
| Narrative Drift | Claims evolve silently | Track claim evolution over time |
| Authority Worship | Trust based on speaker identity | Separate actor from evidence |
| Temporal Confusion | Relative time references | Absolute timestamps required |
| Scalar Collapse | Reducing vector to single number | Vector preservation mandatory |
13.2 Mitigation Protocols
- Evidence freeze after collection
- Primary source preference
- Adversarial retrieval requirement
- Lineage tracking (original vs. rewrite)
- Deterministic clustering
- Vector preservation: Store all epistemic components, never collapse
13.3 Anti-Reinforcement Safeguards
- Source lineage tracking (rewrites not counted as independent)
- Publisher class diversity requirement
- Primary artifact preference (original > commentary)
- Adversarial inclusion in every cluster
14. DATA MODEL REQUIREMENTS
14.1 Core Tables
entities
  id, name, type, aliases, created_at, updated_at
evidence
  id, content_hash, source_url, published_at, retrieved_at,
  raw_text, metadata_json, parse_version
claims
  id, normalized_text, claim_type, time_reference,
  subject_entity_id, predicate, object
assertions
  id, claim_id, evidence_id, speaker_entity_id,
  stance, codex_tags, epistemic_vector, weights_version
claim_states
  claim_id, state, last_evidence_at, reopened_count,
  created_at, updated_at
assessment_snapshots
  claim_id, snapshot_at, epistemic_vector,
  best_supported_hypothesis, supporting_evidence_ids,
  contradicting_evidence_ids, notes
14.2 Graph Relationships
claim_supports_claim (claim_id, supports_id, strength)
claim_contradicts_claim (claim_id, contradicts_id, strength)
actor_asserted_claim (actor_id, claim_id, context)
outlet_published_evidence (outlet_id, evidence_id, role)
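A minimal schema sketch for two core tables and one graph edge, using Python's built-in sqlite3. Column types and constraints are assumptions; the manual specifies names only, and the target database engine is not prescribed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (
  id TEXT PRIMARY KEY,
  normalized_text TEXT,
  claim_type TEXT  -- factual|interpretive|predictive|normative
);
CREATE TABLE claim_states (
  claim_id TEXT REFERENCES claims(id),
  -- lifecycle states from Codex XII; CHECK constraint is an assumption
  state TEXT CHECK (state IN ('active','dormant','reopened','deprecated')),
  last_evidence_at TEXT,
  reopened_count INTEGER DEFAULT 0
);
CREATE TABLE claim_contradicts_claim (
  claim_id TEXT REFERENCES claims(id),
  contradicts_id TEXT REFERENCES claims(id),
  strength REAL
);
""")
conn.execute("INSERT INTO claims VALUES ('c1', 'Example claim.', 'factual')")
conn.execute("INSERT INTO claim_states VALUES ('c1', 'active', '2026-01-09', 0)")
row = conn.execute(
    "SELECT state FROM claim_states WHERE claim_id = 'c1'").fetchone()
assert row[0] == "active"
```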
15. INTEGRATION WITH AOS
15.1 Operational Modes
| Mode | Scope | Gate |
|---|---|---|
| TUTUS | Analysis only, no execution | Default |
| RETE | Sandboxed scoring validation | System |
| PERSISTE | Persistent action | Human required |
15.2 Ledger Integration
- All analyses append-only to AOS ledger
- Full provenance chain required
- Version diff on all modifications
- Cryptographic signatures for critical operations
15.3 Constitutional Alignment
This manual operates within the AOS Constitution:
- Principium 1 (Human Authority): Humans retain final judgment
- Principium 2 (Propose vs. Decide): LLMs propose, humans decide
- Principium 3 (Honest Uncertainty): Declaring INDETERMINATA is commended, never penalized
- Principium 5 (Accountability): All decisions traceable
- Principium 8 (Explicit Failure): Better to fail explicitly than succeed silently
APPENDIX A: QUERY EXAMPLES
Actor Analysis
"Show epistemic behavior of Actor X over time"
→ claims_by_actor + epistemic_vector_trends + correction_latency
"Compare outlets on claim introduction vs amplification"
→ outlet_claim_origination_rate vs outlet_claim_repetition_rate
Event Analysis
"Timeline of Event Y reporting"
→ documents_by_time + claim_evolution + vector_changes
"Load-bearing claims in narrative Z"
→ claim_graph_centrality + supporting_evidence_quality
System Health
"Uncertainty acknowledgment trends"
→ indeterminata_rate_over_time_by_outlet
"Category error frequency by domain"
→ jurisdiction_errors + sphere_of_influence_errors
APPENDIX B: HANDLING POST-CUTOFF EVENTS
B.1 Temporal Dislocation Rule
When a claim concerns an event that post-dates the LLM's training data:
- The claim is NOT penalized for implausibility
- Model disbelief is irrelevant
- Evaluation depends solely on:
- Number and independence of corroborating sources
- Provenance quality
- Consistency across evidence
- Presence or absence of credible dispute
Training-cutoff ignorance is treated as a null prior, not a negative signal.
B.2 Resolution via Corroboration
If corroboration is unavailable:
- Mark INDETERMINATA
- Keep the claim indexed and dormant
- Allow future reactivation when evidence arrives
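The null-prior rule of Appendix B can be sketched as a decision function. The two-source threshold and the mapping to Codex labels are illustrative assumptions; the essential point is that model disbelief never appears as an input.

```python
def post_cutoff_assessment(independent_sources: int, disputed: bool) -> str:
    """Assess a post-cutoff claim from corroboration alone (sketch).

    Training-cutoff ignorance is a null prior: it contributes nothing,
    positive or negative. Thresholds here are assumptions.
    """
    if independent_sources == 0:
        return "INDETERMINATA"        # index, keep dormant, await evidence
    if disputed or independent_sources < 2:
        return "FACTUM_ASSERTUM"      # asserted but not yet demonstrated
    return "FACTUM_DEMONSTRATUM"

assert post_cutoff_assessment(0, False) == "INDETERMINATA"
```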
END OF MANUAL
Version: 1.0
Date: 2026-01-09
Authority: Codex Disciplinae Epistemicae Rev. V
Status: Ready for Spec Gate