Version: 1.0
Authority: Subordinate to Codex Disciplinae Epistemicae Rev. V
Nature: Operational Directive
Language: English (working document)
1. PURPOSE AND SCOPE
This manual provides operational procedures for implementing the Codex Disciplinae Epistemicae within the AOS framework. It transforms Codex principles into executable workflows.
Core Principle:
"We do not decide what is true. We document how well claims survive contact with evidence over time."
This manual governs how analysis is performed, not what conclusions are reached.
2. ARCHITECTURAL PRINCIPLES
2.1 Separation of Concerns (Mandatory)
The system implements strict role separation across three functions:
| Role | Function | Prohibitions |
|---|---|---|
| Collector | Deterministic evidence acquisition | No interpretation, no plausibility filtering |
| Researcher | Structural extraction and normalization | No truth assessment, no synthesis across sources |
| Analyst | Codex-bound evaluation and scoring | No web browsing, no independent fact-checking |
Invariant: No single role may both acquire and judge evidence.
2.2 Two-Tier Pipeline
Tier A — Collector (Broad, Cheap, Deterministic)
- Ingests articles via deterministic sources (RSS, Atom, direct URLs)
- Stores raw content, metadata, and normalized text
- Never reasons, scores, summarizes, or infers
- Never rejects content for "implausibility"
Tier B — Analyzer (Selective, Governed, Budgeted)
- Operates only on articles admitted through gating rules
- Applies Codex scoring, claim extraction, and temporal comparison
- May conclude INDETERMINATA at any time
- Must cite stored evidence for every inference
2.3 Deterministic Foundation
- Same inputs → same analysis outputs
- All random elements explicitly seeded
- Evidence packs are sealed and immutable after collection
- Version control on all transformations
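The seeding requirement above can be illustrated with a minimal Python sketch. The seed-derivation scheme (hashing run id and stage) is an assumption for illustration, not part of this manual; any derivation works as long as it is deterministic, versioned, and logged with the run.

```python
import hashlib
import random

def run_seed(run_id: str, stage: str) -> int:
    """Derive a reproducible seed from the run identifier and pipeline stage.

    Hypothetical scheme: SHA-256 over "run_id:stage", truncated to 64 bits.
    """
    digest = hashlib.sha256(f"{run_id}:{stage}".encode("utf-8")).hexdigest()
    return int(digest[:16], 16)

# Same inputs -> same seed -> same "random" choices, on every run.
rng_a = random.Random(run_seed("run-001", "sampling"))
rng_b = random.Random(run_seed("run-001", "sampling"))
assert rng_a.random() == rng_b.random()
```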
2.4 Temporal Grounding
- All analyses timestamped with absolute times
- Relative references ("today", "yesterday") converted to absolutes at ingest
- Timezone explicitly recorded
- NOW_UTC and USER_TZ provided by host environment
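Conversion of relative references to absolutes at ingest can be sketched as follows. The fixed NOW_UTC value and the small lookup table are illustrative assumptions; in operation, NOW_UTC comes from the host environment and the reference vocabulary is larger.

```python
from datetime import datetime, timedelta, timezone

# Illustrative only: in operation NOW_UTC is provided by the host environment.
NOW_UTC = datetime(2026, 1, 9, 12, 0, tzinfo=timezone.utc)

# Hypothetical lookup table for a few common relative references.
RELATIVE = {
    "today": timedelta(days=0),
    "yesterday": timedelta(days=-1),
    "last week": timedelta(days=-7),
}

def normalize_relative(ref: str, now_utc: datetime) -> str:
    """Convert a relative time reference to an absolute ISO 8601 date."""
    delta = RELATIVE[ref.lower()]
    return (now_utc + delta).date().isoformat()

print(normalize_relative("yesterday", NOW_UTC))  # 2026-01-08
```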
3. EVIDENCE HANDLING PROTOCOLS
3.1 Collector Operations
INPUT: Topic bundle, source allowlist, time window
OUTPUT: Evidence Pack
PROCESS:
1. Fetch canonical documents via RSS/Atom
2. Store raw content + metadata
3. Generate content hash (SHA256)
4. Create immutable artifact record
5. Append to ledger with retrieval timestamp
RULES:
- No interpretation
- No plausibility filtering
- Same query → same results
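Steps 3-4 of the Collector process can be sketched in Python. Fetching and ledger append are out of scope here; field names follow the Evidence Pack structure in section 3.2, and the content-addressed `evidence_id` is an assumption of this sketch.

```python
import hashlib
from datetime import datetime, timezone

def make_artifact_record(source_url: str, raw_text: str) -> dict:
    """Build an immutable Collector artifact record (sketch).

    No interpretation, no filtering: the record is a pure function of the
    fetched bytes plus retrieval metadata.
    """
    content_hash = "sha256:" + hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
    return {
        "evidence_id": content_hash,   # content-addressed id (assumption)
        "source_url": source_url,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "content_hash": content_hash,
        "raw_text": raw_text,
    }

record = make_artifact_record("https://example.org/item", "Example body text.")
assert record["content_hash"].startswith("sha256:")
```

Because the hash is computed over raw content only, re-collecting identical content always yields the same `content_hash`, which is what makes "same query → same results" auditable.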
3.2 Evidence Pack Structure
```json
{
  "evidence_id": "sha256:...",
  "source_url": "canonical",
  "published_at": "ISO8601",
  "retrieved_at": "ISO8601",
  "content_hash": "sha256:...",
  "raw_text": "extracted",
  "metadata": {
    "publisher": "",
    "author": "",
    "section": "",
    "word_count": 0
  }
}
```
3.3 Evidence Pack Immutability (Invariant)
An Evidence Pack, once created for an analysis run, is an immutable input artifact.
New information requires:
- A new Evidence Pack
- A new run_id
- Explicit linking to prior packs if relevant
4. RESEARCHER OPERATIONS
4.1 Extraction Protocol
INPUT: Evidence Pack
OUTPUT: Structured Claims
TASKS:
1. Named entity identification
2. Claim boundary detection
3. Claim extraction (verbatim quotes with offsets)
4. Attribution mapping (who said what)
5. Temporal reference normalization
6. Source lineage tracking (original vs. derivative reporting)
PROHIBITIONS:
- No truth assessment
- No plausibility judgment
- No synthesis across sources
- No scoring or weighting
4.2 Claim Normalization Format
```json
{
  "claim_id": "uuid",
  "source_evidence_id": "sha256:...",
  "verbatim_text": "",
  "normalized_text": "",
  "speaker_entity_id": "uuid|unknown",
  "claim_type": "factual|interpretive|predictive|normative",
  "time_reference": {
    "absolute": "ISO8601|null",
    "relative": "quoted_string",
    "normalized": "ISO8601"
  },
  "entities_mentioned": ["uuid1", "uuid2"],
  "span_offsets": [start, end]
}
```
5. GATING RULES (Analyzer Admission)
No article enters analysis unless it triggers at least one gate.
5.1 Valid Gates
| Gate | Trigger Condition |
|---|---|
| Actor Gate | Mentions tracked high-salience actors |
| Event Gate | References tracked or emerging events |
| Novelty Gate | Introduces materially new facts vs. last 72h cluster |
| Contradiction Gate | Conflicts with prior reporting |
| Claim Risk Gate | Makes strong claims with weak sourcing |
| Policy Impact Gate | Claims retaliation, coercion, or systemic abuse |
| Scope Gate | Explicitly requested by user or reviewer |
5.2 Gate Priority (When Multiple Trigger)
Contradiction > Novelty > Actor/Event > Claim Risk > Policy Impact > Scope
All triggered gates are logged, even if not the deciding factor.
5.3 Non-Gated Articles
Articles that do not pass a gate remain:
- Stored (raw content preserved)
- Indexed (metadata searchable)
- NOT analyzed (no Codex pipeline execution)
6. CLUSTERING BEFORE ANALYSIS (Required)
6.1 Purpose
Prevent duplicate spending on rewrites of the same story.
6.2 Process
Before analysis:
- Articles MUST be clustered by content similarity
- Each cluster selects:
- Primary Representative: Highest detail, best sourcing
- Adversarial Sample (optional): Divergent framing from different publisher domain
- All other articles inherit cluster context and are not analyzed independently
6.3 Deterministic Adversarial Retrieval (v0)
For each cluster, Collector MUST include:
- Primary: Highest similarity representative
- Diversity: One from different publisher domain
- Adversarial: Maximally dissimilar article within cluster above similarity threshold
Selection based on content hashes and similarity metrics (deterministic).
6.4 Clustering Algorithm
v0 specification: MinHash + LSH on normalized clean text
Parameters: Configurable, versioned, logged with each run
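A minimal MinHash sketch illustrates the similarity estimate; it omits the LSH banding step and uses salted SHA-256 in place of a proper hash family, so it is a pedagogical sketch, not the v0 implementation. Shingle size and signature length are illustrative parameters.

```python
import hashlib

def shingles(text: str, k: int = 4) -> set[str]:
    """Character k-shingles of whitespace-normalized, lowercased text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

def minhash_signature(text: str, num_hashes: int = 64) -> list[int]:
    """Deterministic MinHash signature: for each salted hash function,
    keep the minimum hash value over the shingle set."""
    sig = []
    for salt in range(num_hashes):
        sig.append(min(
            int(hashlib.sha256(f"{salt}:{s}".encode()).hexdigest()[:16], 16)
            for s in shingles(text)
        ))
    return sig

def estimated_similarity(a: list[int], b: list[int]) -> float:
    """Fraction of matching signature positions approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

s1 = minhash_signature("The minister announced new sanctions on Tuesday.")
s2 = minhash_signature("The minister announced new sanctions on Tuesday evening.")
s3 = minhash_signature("Completely unrelated sports coverage.")
assert estimated_similarity(s1, s2) > estimated_similarity(s1, s3)
```

Because every hash is salted SHA-256 with no process-level randomness, the same text always yields the same signature, satisfying the deterministic-selection rule in 6.3.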
7. BUDGET GOVERNOR (Hard Constraint)
7.1 Analysis Limits
Analysis operates under explicit limits:
- Maximum articles per time window
- Maximum tokens per run
- Maximum compute time per run
When budget is exhausted, analysis stops.
7.2 Priority Queue
P0: Active events with new evidence (highest priority)
P1: Tracked actors with novel claims
P2: Contextual background
P3: Indexed only (no analysis)
7.3 Budget Parameters
- Daily analysis budget: [configurable]
- Per-run budget: [configurable]
- Adjustment: Configuration-only, not ad hoc
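The priority queue and hard budget cutoff can be sketched together. The tuple encoding and tie-breaking on article id are assumptions of this sketch; actual budgets cover tokens and compute time as well as article counts.

```python
import heapq

def run_analysis_queue(articles, max_articles: int):
    """Drain admitted articles in P0..P2 order until the budget is spent.

    articles: iterable of (priority, article_id); lower number = higher
    priority. P3 items are indexed only and never enter the queue (7.2).
    """
    heap = [(p, a) for p, a in articles if p < 3]
    heapq.heapify(heap)   # ties break on article id: deterministic ordering
    analyzed = []
    while heap and len(analyzed) < max_articles:
        _, article_id = heapq.heappop(heap)
        analyzed.append(article_id)
    return analyzed       # everything left over stays indexed, not analyzed

queue = [(1, "a2"), (0, "a1"), (3, "a4"), (2, "a3")]
assert run_analysis_queue(queue, max_articles=2) == ["a1", "a2"]
```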
8. ANALYST OPERATIONS (CODEX PIPELINE)
8.1 Pipeline Execution Order
1. TEMPORAL ANCHOR SET
- Record NOW_LOCAL, NOW_UTC, TIMEZONE
- Convert all relative time references
2. SEMANTIC PARSING
- Split into logical units
- Identify proposition boundaries
3. CLAIM CLASSIFICATION (Codex II)
- Apply genus classification
- Flag mixed-genus statements
4. BURDEN OF PROOF CHECK (Codex III)
- Identify missing evidence
- Assess sufficiency
- Apply power asymmetry adjustment
5. TEMPORAL CONTEXT (Codex IV)
- Event maturity assessment
- Flag premature certainty
- Apply temporal humility
6. JURISDICTION CHECK (Codex V)
- Identify category errors
- Flag extra-jurisdictional claims
7. COERCION DETECTION (Codex VI)
- Violence laundering check
- Structural invalidity test
- Normalization detection
8. ESSENTIALISM DETECTION (Codex VII)
- Group attribution errors
- Composition/division fallacies
9. POWER SPHERE ANALYSIS (Codex VIII)
- Descriptive vs. justificatory use
- Sovereignty violations
10. INTENT INFERENCE LIMITS (Codex IX)
- Pattern-based inference rules
- Prohibition of premature attribution
11. TECHNOLOGY LAUNDERING (Codex X)
- AI output classification
- False authority detection
12. LIFECYCLE ASSIGNMENT (Codex XII)
- Active/Dormant/Reopened/Deprecated
- Dormancy timer check
13. LEDGER INTEGRATION (Codex XIII)
- Version history check
- Mutation tracking
- Provenance chain validation
14. CALCULUS EPISTEMICUS
- Vector assignment
- Component scoring
15. EXIT ASSIGNMENT (Codex XV)
- Final classification
9. CALCULUS EPISTEMICUS IMPLEMENTATION
9.1 Vector Structure
```json
{
  "assertion_id": "uuid",
  "epistemic_vector": {
    "factum_demonstratum": 1.0,
    "factum_assertum": 0.2,
    "indeterminata": 0.0,
    "refutata": -1.0,
    "error_categoriae": -0.8,
    "overclaim_certainty": -0.6,
    "launderatio_technologica": -1.0,
    "correction_event": 0.4
  },
  "weights_version": "calculus_weights_v1",
  "derived_composites": {
    "hygiene_score": 0.45,
    "evidence_strength": 0.7,
    "uncertainty_band": "wide"
  }
}
```
9.2 Weight Table Rule
- Weights live in versioned config file:
calculus_weights_vN.json - Each analysis snapshot records
weights_version - Same vector + same weights → same composites (deterministic)
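The determinism rule can be sketched as a pure function of vector and weights. Both the weight values and the weighted-mean derivation below are illustrative assumptions; the manual defines neither, only that composites must be reproducible and decomposable.

```python
# Hypothetical weight table standing in for calculus_weights_vN.json.
WEIGHTS_V1 = {
    "factum_demonstratum": 1.0,
    "factum_assertum": 0.3,
    "refutata": -1.0,
    "overclaim_certainty": -0.6,
}

def hygiene_score(vector: dict, weights: dict) -> float:
    """Weighted mean over components present in both vector and weights
    (illustrative derivation). Same vector + same weights -> same composite."""
    keys = [k for k in vector if k in weights]
    return round(sum(vector[k] * weights[k] for k in keys) / len(keys), 4)

vec = {"factum_demonstratum": 1.0, "factum_assertum": 0.2, "refutata": -1.0}
assert hygiene_score(vec, WEIGHTS_V1) == hygiene_score(vec, WEIGHTS_V1)
```

Because the composite is a pure function with no hidden state, recording `weights_version` in each snapshot is sufficient to reproduce it exactly.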
9.3 Derived Composites
| Composite | Derivation |
|---|---|
| Actor hygiene vector | Mean tag weights per 30/90 days |
| Outlet amplification | Rate of factum_assertum repetition without factum_demonstratum |
| Correction latency | Time from contradiction emergence to correction_event |
| Narrative drift | Edit-distance or claim-mutation rate within cluster over time |
9.4 Uncertainty Bands (Not Confidence Intervals)
- Narrow: Single primary source, minimal contradictions, settled timeline
- Medium: Multiple sources with some contradictions, or event in cursu
- Wide: Conflicting primary sources, high contradiction density, or emerging event
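The band definitions above can be sketched as a classification rule. The numeric thresholds are illustrative assumptions, not normative values from this manual.

```python
def uncertainty_band(primary_sources: int, contradictions: int,
                     event_emerging: bool) -> str:
    """Map evidence conditions to the bands of section 9.4 (sketch).

    Thresholds are assumptions; only the ordering narrow < medium < wide
    comes from the manual.
    """
    if event_emerging or contradictions >= 5 or primary_sources == 0:
        return "wide"     # emerging event or high contradiction density
    if primary_sources > 1 or contradictions > 0:
        return "medium"   # multiple sources or some contradictions
    return "narrow"       # single primary source, settled timeline

assert uncertainty_band(1, 0, False) == "narrow"
```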
9.5 Output Requirements
- Every scalar shown must have "Show components" option
- Derived composites labeled as "composite"
- All outputs reference weights_version
9.6 Prohibitions
- NO monolithic "truth score" as primary output
- NO ordinal scale (+2 to -2) as epistemic measure
- NO "confidence intervals" implying statistical sampling
10. TEMPORAL HANDLING
10.1 Dormancy Management
NIGHTLY JOB:
  FOR claim IN claims WHERE state = 'active':
    IF now() - last_evidence_at > dormancy_interval:
      SET state = 'dormant'
      LOG state_change
10.2 Dormancy Intervals (Configurable by Domain)
| Domain | Interval |
|---|---|
| Fast news | 180 days |
| Long investigations / legal | 365 days |
| Historical research | 5 years |
10.3 Reactivation
ON new EvidenceEvent FOR claim:
  IF state = 'dormant':
    SET state = 'active'
    INCREMENT reopened_count
    SET reopened_at = now()
    LOG reactivation
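The dormancy and reactivation pseudocode above can be sketched as a small state machine in Python; the dict-based claim representation is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

# Intervals from section 10.2.
DORMANCY = {"fast_news": timedelta(days=180), "legal": timedelta(days=365)}

def nightly_dormancy_pass(claims, now, domain="fast_news"):
    """Move stale active claims to dormant (the nightly job of 10.1)."""
    for claim in claims:
        if (claim["state"] == "active"
                and now - claim["last_evidence_at"] > DORMANCY[domain]):
            claim["state"] = "dormant"

def on_new_evidence(claim, now):
    """Reactivate a dormant claim; claims never resolve, only sleep and wake."""
    if claim["state"] == "dormant":
        claim["state"] = "active"
        claim["reopened_count"] = claim.get("reopened_count", 0) + 1
        claim["reopened_at"] = now

now = datetime(2026, 1, 9, tzinfo=timezone.utc)
claim = {"state": "active", "last_evidence_at": now - timedelta(days=200)}
nightly_dormancy_pass([claim], now)
assert claim["state"] == "dormant"
on_new_evidence(claim, now)
assert claim["state"] == "active" and claim["reopened_count"] == 1
```

Note that no transition leads to a terminal state: the only edges are active → dormant and dormant → active, which is exactly the "claims never resolve" invariant of 10.4.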
10.4 Core Principle: Claims Never Resolve
Claims are NEVER marked "resolved" or "finished." They can only:
- Go dormant (sleep)
- Be reopened (wake)
History is always reactivatable.
11. QUALITY CHECKS
11.1 Pre-Analysis Checklist
- [ ] Temporal anchor set
- [ ] Evidence pack sealed
- [ ] Source lineage documented
- [ ] Entity resolution complete
- [ ] Claim extraction verified
11.2 Post-Analysis Audit
- [ ] All claims classified
- [ ] Burden of proof assessed
- [ ] No category errors
- [ ] No premature attribution
- [ ] Uncertainty explicitly acknowledged
- [ ] Ledger integration complete
- [ ] NO monolithic "truth score" emitted
- [ ] Epistemic vector stored with weights_version
11.3 Self-Audit Questions
Before finalizing any analysis:
1. Did we infer intent without evidence?
2. Did we collapse uncertainty for narrative flow?
3. Did we apply rules symmetrically across all actors?
4. Did we document all assumptions?
5. Can this analysis be reproduced exactly with the same inputs?
6. Did we emit any scalar without decomposable components?
If yes to 1-2 or 6, or no to 3-5 → rollback and correct.
12. OUTPUT STANDARDS
12.1 Required Elements
- Temporal context statement
- Claim-by-claim assessment
- Evidence citations with hashes
- Epistemic vector (component scores)
- Alternative explanations considered
- Ledger reference
- weights_version
12.2 Prohibited Elements
- Monolithic "truth" or "confidence" scores without decomposition
- Moral prescriptions
- Calls to action
- Unqualified certainty
- Narrative smoothing
- Uncited assumptions
13. FAILURE MODES AND MITIGATIONS
13.1 Common Failure Modes
| Mode | Description | Mitigation |
|---|---|---|
| Cutoff Shock | Model disbelieves post-training events | Treat model priors as irrelevant |
| Self-Reinforcement | Echo chamber from similar sources | Require source diversity |
| Narrative Drift | Claims evolve silently | Track claim evolution over time |
| Authority Worship | Trust based on speaker identity | Separate actor from evidence |
| Temporal Confusion | Relative time references | Absolute timestamps required |
| Scalar Collapse | Reducing vector to single number | Vector preservation mandatory |
13.2 Mitigation Protocols
- Evidence freeze after collection
- Primary source preference
- Adversarial retrieval requirement
- Lineage tracking (original vs. rewrite)
- Deterministic clustering
- Vector preservation: Store all epistemic components, never collapse
13.3 Anti-Reinforcement Safeguards
- Source lineage tracking (rewrites not counted as independent)
- Publisher class diversity requirement
- Primary artifact preference (original > commentary)
- Adversarial inclusion in every cluster
14. DATA MODEL REQUIREMENTS
14.1 Core Tables
entities
  id, name, type, aliases, created_at, updated_at
evidence
  id, content_hash, source_url, published_at, retrieved_at,
  raw_text, metadata_json, parse_version
claims
  id, normalized_text, claim_type, time_reference,
  subject_entity_id, predicate, object
assertions
  id, claim_id, evidence_id, speaker_entity_id,
  stance, codex_tags, epistemic_vector, weights_version
claim_states
  claim_id, state, last_evidence_at, reopened_count,
  created_at, updated_at
assessment_snapshots
  claim_id, snapshot_at, epistemic_vector,
  best_supported_hypothesis, supporting_evidence_ids,
  contradicting_evidence_ids, notes
14.2 Graph Relationships
claim_supports_claim (claim_id, supports_id, strength)
claim_contradicts_claim (claim_id, contradicts_id, strength)
actor_asserted_claim (actor_id, claim_id, context)
outlet_published_evidence (outlet_id, evidence_id, role)
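A minimal schema sketch for two core tables and one graph edge, using Python's built-in sqlite3. Column types and constraints are assumptions; the manual specifies names only, and the target database engine is not prescribed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (
  id TEXT PRIMARY KEY,
  normalized_text TEXT,
  claim_type TEXT  -- factual|interpretive|predictive|normative
);
CREATE TABLE claim_states (
  claim_id TEXT REFERENCES claims(id),
  -- lifecycle states from Codex XII; CHECK constraint is an assumption
  state TEXT CHECK (state IN ('active','dormant','reopened','deprecated')),
  last_evidence_at TEXT,
  reopened_count INTEGER DEFAULT 0
);
CREATE TABLE claim_contradicts_claim (
  claim_id TEXT REFERENCES claims(id),
  contradicts_id TEXT REFERENCES claims(id),
  strength REAL
);
""")
conn.execute("INSERT INTO claims VALUES ('c1', 'Example claim.', 'factual')")
conn.execute("INSERT INTO claim_states VALUES ('c1', 'active', '2026-01-09', 0)")
row = conn.execute(
    "SELECT state FROM claim_states WHERE claim_id = 'c1'").fetchone()
assert row[0] == "active"
```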
15. INTEGRATION WITH AOS
15.1 Operational Modes
| Mode | Scope | Gate |
|---|---|---|
| TUTUS | Analysis only, no execution | Default |
| RETE | Sandboxed scoring validation | System |
| PERSISTE | Persistent action | Human required |
15.2 Ledger Integration
- All analyses append-only to AOS ledger
- Full provenance chain required
- Version diff on all modifications
- Cryptographic signatures for critical operations
15.3 Constitutional Alignment
This manual operates within the AOS Constitution:
- Principium 1 (Human Authority): Humans retain final judgment
- Principium 2 (Propose vs. Decide): LLMs propose, humans decide
- Principium 3 (Honest Uncertainty): Declaring INDETERMINATA is commended, never penalized
- Principium 5 (Accountability): All decisions traceable
- Principium 8 (Explicit Failure): Better to fail explicitly than succeed silently
APPENDIX A: QUERY EXAMPLES
Actor Analysis
"Show epistemic behavior of Actor X over time"
→ claims_by_actor + epistemic_vector_trends + correction_latency
"Compare outlets on claim introduction vs amplification"
→ outlet_claim_origination_rate vs outlet_claim_repetition_rate
Event Analysis
"Timeline of Event Y reporting"
→ documents_by_time + claim_evolution + vector_changes
"Load-bearing claims in narrative Z"
→ claim_graph_centrality + supporting_evidence_quality
System Health
"Uncertainty acknowledgment trends"
→ indeterminata_rate_over_time_by_outlet
"Category error frequency by domain"
→ jurisdiction_errors + sphere_of_influence_errors
APPENDIX B: HANDLING POST-CUTOFF EVENTS
B.1 Temporal Dislocation Rule
When a claim concerns an event that post-dates the LLM's training data:
- The claim is NOT penalized for implausibility
- Model disbelief is irrelevant
- Evaluation depends solely on:
- Number and independence of corroborating sources
- Provenance quality
- Consistency across evidence
- Presence or absence of credible dispute
Training-cutoff ignorance is treated as a null prior, not a negative signal.
B.2 Resolution via Corroboration
If corroboration is unavailable:
- Mark INDETERMINATA
- Keep the claim indexed and dormant
- Allow future reactivation when evidence arrives
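The null-prior rule of Appendix B can be sketched as a decision function. The two-source threshold and the mapping to Codex labels are illustrative assumptions; the essential point is that model disbelief never appears as an input.

```python
def post_cutoff_assessment(independent_sources: int, disputed: bool) -> str:
    """Assess a post-cutoff claim from corroboration alone (sketch).

    Training-cutoff ignorance is a null prior: it contributes nothing,
    positive or negative. Thresholds here are assumptions.
    """
    if independent_sources == 0:
        return "INDETERMINATA"        # index, keep dormant, await evidence
    if disputed or independent_sources < 2:
        return "FACTUM_ASSERTUM"      # asserted but not yet demonstrated
    return "FACTUM_DEMONSTRATUM"

assert post_cutoff_assessment(0, False) == "INDETERMINATA"
```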
END OF MANUAL
Version: 1.0
Date: 2026-01-09
Authority: Codex Disciplinae Epistemicae Rev. V
Status: Ready for Spec Gate