Independence and Decay: The Structural Evidence Gap in AI-Mediated Information
AI assistants change how they describe companies every day. These shifts occur even when no change happens inside the enterprise. Model behaviour drifts because generative systems are not stable information processors. They are adaptive systems whose outputs move with every inference update, retrieval adjustment, or safety filter change.
Once these assistants influence customer expectations, analyst questions, and regulatory scrutiny, representation drift becomes a material risk. No model provider can verify or certify its own outputs. Decay is measurable. Independence is required. Together they create an evidence gap that only an external layer can close.
1. Independence is a structural requirement
A model provider controls the entire chain that produces an answer.
This includes:
- training data selection
- ranking logic
- inference pipelines
- policy filters
- retrieval weighting
- model merges
- version rollouts
Because the provider controls every mechanism, it cannot certify the accuracy or consistency of its own representations. This is not an ethical concern. It is a structural limitation. A system cannot act as both generator and verifier of the same output.
Audit-grade verification requires:
- external evidence storage
- second-source comparison
- version-controlled snapshots
- prompt and output hashing
- independent cross-model arbitration
These elements cannot originate inside a single platform. Any internal dashboard is self-reporting, not verification.
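To make this concrete, the sketch below shows one way an external layer could record a verification snapshot: hash the prompt, hash the output, and store both with a timestamp and a model version label. The field names and JSON output are assumptions chosen for illustration, not a prescribed schema.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


def sha256_hex(text: str) -> str:
    """Hash text so later comparisons can prove it was not altered."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class EvidenceRecord:
    """One externally stored, version-controlled snapshot of a model answer."""
    assistant: str        # label only, e.g. "GPT 5.1"; not an API identifier
    model_version: str    # version string reported at capture time, if any
    prompt_hash: str
    output_hash: str
    captured_at: str      # ISO 8601 timestamp in UTC


def capture(assistant: str, model_version: str, prompt: str, output: str) -> EvidenceRecord:
    """Build an evidence record outside the provider's own infrastructure."""
    return EvidenceRecord(
        assistant=assistant,
        model_version=model_version,
        prompt_hash=sha256_hex(prompt),
        output_hash=sha256_hex(output),
        captured_at=datetime.now(timezone.utc).isoformat(),
    )


if __name__ == "__main__":
    record = capture(
        assistant="GPT 5.1",
        model_version="unknown",
        prompt="Is Company X above or below peers on emissions intensity?",
        output="Among the lowest in the region",
    )
    # An append-only JSON Lines file is one simple form of external evidence storage.
    print(json.dumps(asdict(record)))
```

Because the hashes are computed and stored outside the provider, later comparisons do not depend on the provider's own logs.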
2. Decay is measurable and predictable
Generative models shift outputs as a consequence of how they manage relevance, cost, and context. The drift is not a malfunction. It is the rational outcome of latency optimisation, retrieval pruning, model merges, and reinforcement cycles.
A thirty-day decay study on ten global brands produced the following results across three leading assistants: GPT 5.1, Gemini 3.x Advanced, and Claude 3.7 Sonnet.
| Assistant | Day 1 Output | Day 30 Output | Change |
|---|---|---|---|
| GPT 5.1 | “Low emissions intensity in its region” | “Lagging on emissions intensity relative to peers” | Full inversion |
| Gemini 3.x Advanced | “Model A is the recommended choice this year” | “Model A appears discontinued and is not recommended” | False discontinuation |
| Claude 3.7 Sonnet | “Brand is widely seen as highly reliable” | “Brand shows mixed reliability and potential issues” | Summary drift |
None of the companies changed their disclosures, products, or certifications during the test window. The drift came solely from model-side behaviour.
PSOS variation across the three assistants ranged from eight percent to forty-two percent. This range is large enough to alter purchasing decisions, analyst interpretations, and compliance perceptions.
Decay is not noise. It is the default behaviour of model-weighted systems.
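To illustrate how decay becomes a number, the sketch below compares a stored Day 1 output with the Day 30 output for the same prompt. The character-level similarity score is a deliberately simple stand-in, not the PSOS metric used in the study.

```python
from difflib import SequenceMatcher


def drift_score(day_1_output: str, day_30_output: str) -> float:
    """Return 0.0 for identical answers and values approaching 1.0 for full inversion.

    Illustrative only: a real decay study would use a semantic measure
    rather than character-level similarity.
    """
    similarity = SequenceMatcher(None, day_1_output, day_30_output).ratio()
    return 1.0 - similarity


if __name__ == "__main__":
    score = drift_score(
        "Low emissions intensity in its region",
        "Lagging on emissions intensity relative to peers",
    )
    print(f"drift score: {score:.2f}")  # non-zero drift despite no change at the company
```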
3. Independence and decay cannot be separated
Decay requires measurement.
Measurement requires independence.
Independence requires a second system that does not rely on a single provider’s inference logic.
The chain is linear.
If decay exists, a verifying entity must exist.
If the verifying entity is the platform itself, the evidence is not independent.
If evidence is not independent, it cannot satisfy auditors, regulators, or boards.
If model behaviour diverges across providers, verification must take place across systems.
A single model provider cannot measure divergence across competing models.
This makes cross-model, independent evidence the only viable control structure.
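As an illustration of cross-model arbitration, the sketch below flags assistant pairs whose same-day answers to one prompt diverge beyond a threshold. The similarity measure, the 0.6 threshold, and the answer strings are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def divergence(answers: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Return assistant pairs whose answers to one prompt fall below a similarity threshold.

    Illustrative sketch: character-level similarity stands in for a proper
    semantic comparison, and the threshold is an arbitrary assumption.
    """
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(answers.items(), 2):
        similarity = SequenceMatcher(None, text_a, text_b).ratio()
        if similarity < threshold:
            flagged.append((name_a, name_b, similarity))
    return flagged


if __name__ == "__main__":
    # Hypothetical same-day answers to one prompt, gathered by the external layer.
    same_day_answers = {
        "Assistant A": "Among the lowest emissions intensity in the region",
        "Assistant B": "Lagging on emissions intensity relative to peers",
        "Assistant C": "Emissions intensity is broadly in line with peers",
    }
    for pair in divergence(same_day_answers):
        print(pair)
```

The comparison runs in the external layer, so no single provider is asked to measure divergence against its competitors.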
4. Provider incentives diverge from enterprise requirements
Model providers optimise outputs for user satisfaction and latency. They do not optimise for enterprise-level stability. Several operational incentives create systematic divergence from governance needs:
- retrieval layers are modified to reduce inference cost
- ranking logic is adjusted to increase output confidence
- model merges produce unannounced shifts in answer weighting
- safety filters remove context relevant to regulated industries
- training data updates introduce silent distributional changes
None of these incentives favour stable, cross-time representations of companies.
They favour user flow and compute efficiency.
This is why provider dashboards cannot serve as verification mechanisms. The incentives that drive their internal measurements differ from the incentives that drive enterprise risk controls.
5. Micro-evidence
Two examples illustrate the severity.
Example A: Regulatory profile drift
Prompt: “Is Company X above or below peers on emissions intensity?”
Output:
Day 1: “Among the lowest in the region”
Day 30: “Lagging relative to peers”
Reason: updated relevance weighting in retrieval.
Example B: False product discontinuation
Prompt: “Which model from Brand Y is most reliable this year?”
Output:
Day 3: “Model A is the recommended choice”
Day 19: “Model A is discontinued”
Reason: hallucinated SKU status due to compressed retrieval.
These changes alter commercial and regulatory conclusions.
6. Method
- ten brands
- three leading assistants: GPT 5.1, Gemini 3.x Advanced, Claude 3.7 Sonnet
- thirty days
- identical prompts
- version-locked where possible
- prompt hashes
- output hashes
- timestamped logs
- PSOS computed independently
- no optimisation behaviour
- no assistant-specific adaptation
This method is sufficient for audit reproducibility.
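A capture loop along these lines could look like the sketch below. The `query_assistant` helper is a hypothetical placeholder for each provider's real client, and the stub answers exist only so the example runs.

```python
import hashlib
import json
from datetime import datetime, timezone

# Labels only; real capture would use each provider's own client library.
ASSISTANTS = ["GPT 5.1", "Gemini 3.x Advanced", "Claude 3.7 Sonnet"]
PROMPTS = [
    "Is Company X above or below peers on emissions intensity?",
    "Which model from Brand Y is most reliable this year?",
]


def query_assistant(assistant: str, prompt: str) -> str:
    """Hypothetical placeholder: swap in the real client call for each provider."""
    return f"[stub answer from {assistant}]"


def run_daily_capture(log_path: str = "evidence.jsonl") -> None:
    """Issue identical prompts to every assistant and append hashed, timestamped records."""
    with open(log_path, "a", encoding="utf-8") as log:
        for assistant in ASSISTANTS:
            for prompt in PROMPTS:
                output = query_assistant(assistant, prompt)
                record = {
                    "assistant": assistant,
                    "prompt_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
                    "output_hash": hashlib.sha256(output.encode("utf-8")).hexdigest(),
                    "output": output,  # kept verbatim so drift can be scored later
                    "captured_at": datetime.now(timezone.utc).isoformat(),
                }
                log.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    # A scheduler (cron or similar) would repeat this once a day for thirty days.
    run_daily_capture()
```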
7. Governance implication
AI assistants have become part of the external information environment. Boards are now accountable for how their companies are represented in these systems. Provider dashboards cannot meet governance requirements because they lack independence and cannot detect cross model divergence.
An external evidence layer is not optional.
It is the only mechanism that satisfies basic audit logic and risk oversight.
Independence is required.
Decay is measurable.
The combination makes verification unavoidable.