AI Model Provenance and Supply Chain

**AI · Provenance · Audit trails · Copyright attestation**

Overview

The problem

The modern AI pipeline touches dozens of parties and artifacts:

Training datasets from multiple sources (Common Crawl, licensed content, user-contributed, synthetic).
Base models developed and licensed (Llama, Mistral, GPT, Claude).
Fine-tuned variants building on the base.
Distilled / quantized / modified derivatives.
Inference endpoints serving the models.
Applications consuming the inferences.

Claims across this chain are constantly contested:

Copyright. “This model was trained on copyrighted content we didn’t license.” Class actions and regulatory investigations are piling up. The model developer’s internal logs aren’t a sufficient answer.
Attribution. “This fine-tune is ours, don’t claim it’s yours.” Derivative work disputes.
Safety. “This model was fine-tuned for a purpose different from what’s disclosed.” (A model fine-tuned on malicious data, distributed under a benign name.)
Licensing. “This model’s license forbids commercial use, but the downstream app is commercial.” The downstream application may not even know where the model came from.
Benchmarks / capabilities. “Model X achieves Y on benchmark Z.” Self-reported; hard to independently verify.

Today: internal CSVs, training-run tags, and README files. Not cryptographic. Not verifiable by anyone other than the model’s producer.

Why Quidnug fits

AI artifacts have natural identities and relationships:

Datasets are identifiable things.
Models are derived from datasets + parent models.
Fine-tunes are derived from a specific base model and training data.
Inferences come from specific model versions.

This is a directed graph of signed claims. Quidnug’s trust + title + event model fits directly.

Problem	Quidnug primitive
“What’s this model trained on?”	Title of model + events linking to dataset quids
“Was this training authorized by data owner?”	Signed event by data owner on the model title
“What benchmarks has this model been tested on?”	Events from independent benchmarkers
“Who claims this model is safe?”	Trust edges from safety attesters
“Is this inference from the claimed model?”	Inference output bound to model quid via signature
“Has this model been fine-tuned since release?”	Model’s event stream lists all descendants
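
The directed-graph reading above can be made concrete with a small traversal. This is a minimal sketch under stated assumptions: titles have already been fetched into a plain dictionary keyed by assetId, and the attribute names `baseModelRef`, `trainingDataRefs`, and `fineTuneDataRef` follow the examples elsewhere in this document. `collect_datasets` is a hypothetical helper, not a Quidnug SDK call.

```python
def collect_datasets(titles, model_id):
    """Follow baseModelRef links and gather every dataset reference on the way."""
    datasets, seen = [], set()
    current = model_id
    while current in titles and current not in seen:
        seen.add(current)  # guards against a malformed, cyclic chain
        attrs = titles[current]["attributes"]
        refs = list(attrs.get("trainingDataRefs", []))
        if attrs.get("fineTuneDataRef"):
            refs.append(attrs["fineTuneDataRef"])
        for ref in refs:
            if ref not in datasets:
                datasets.append(ref)
        current = attrs.get("baseModelRef")  # None ends the walk at a base model
    return datasets


# Illustrative in-memory titles (hypothetical data, not fetched from a node).
titles = {
    "model-acme-foundation-7b-v2": {"attributes": {
        "trainingDataRefs": ["dataset-common-crawl-2024-cc-main-en",
                             "dataset-acme-proprietary-licensed-books"]}},
    "model-widgetco-finetune-v1": {"attributes": {
        "baseModelRef": "model-acme-foundation-7b-v2",
        "fineTuneDataRef": "dataset-widget-support-tickets"}},
}
print(collect_datasets(titles, "model-widgetco-finetune-v1"))
```

Starting from the fine-tune, the walk returns the fine-tune's own data first and then the base model's datasets, which is the answer to “What’s this model trained on?” for a derivative.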

High-level architecture

     ┌─────────────────────────────────────────────────┐
     │        ai.provenance.models (domain)             │
     └─────────────────────────────────────────────────┘
                         │
      ┌──────────────────┼──────────────────┐
      │                  │                  │
      ▼                  ▼                  ▼
 Dataset quids     Base-model quids    Fine-tune quids
      │                  │                  │
      │                  │                  │
      │ TITLE:           │ TITLE:            │ TITLE:
      │ "is-training-    │ "is-derived-     │ "is-derived-
      │  data-for"       │  from-dataset"    │  from-model"
      │                  │                  │
      ▼                  ▼                  ▼
 Event streams:    Event streams:     Event streams:
 - licensed        - training-run     - fine-tune-start
 - access-granted  - benchmarks       - benchmark
 - scraped         - safety-review    - deployed
                                     - inference-count

Data model

Quids

Dataset: each training dataset has a quid. The dataset’s owner (data curator) signs its metadata.
Model: each model version is a quid. Base models, fine-tunes, and quantized versions are distinct quids.
Model producer: the organization that trains the model; it has its own guardian set for key recovery.
Benchmark org: MLCommons, HELM, and similar bodies; publishes signed benchmark results.
Safety attester: an independent safety auditor.
Rights holder: a publisher, artist, or code author with licensing claims.

Domain

ai.provenance                       (top)
├── ai.provenance.datasets
├── ai.provenance.models
│   ├── ai.provenance.models.foundation
│   └── ai.provenance.models.fine-tunes
├── ai.provenance.licensing
└── ai.provenance.benchmarks

Dataset title

{
  "type": "TITLE",
  "assetId": "dataset-common-crawl-2024-cc-main-en",
  "domain": "ai.provenance.datasets",
  "titleType": "training-dataset",
  "owners": [{"ownerId": "common-crawl-foundation", "percentage": 100.0}],
  "attributes": {
    "datasetHash": "<sha256 of canonicalized dataset>",
    "sizeBytes": "3.1T",
    "language": "en",
    "contentType": "web-text",
    "collectionDate": "2024-12",
    "license": "CC0",
    "licenseURL": "https://commoncrawl.org/terms-of-use/",
    "exclusions": ["copyrighted-ebook-shadow-library",
                   "known-malicious-sites"]
  },
  "signatures": {"common-crawl-foundation": "<sig>"}
}
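
The `datasetHash` attribute is only useful if two parties hashing the same files get the same value. The document does not define the canonicalization, so the following is one possible scheme offered as an assumption: hash a manifest of relative paths and per-file SHA-256 digests, sorted by path. `file_sha256` and `dataset_hash` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large shards never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_hash(root):
    """Hash a manifest of (relative path, file digest) lines sorted by path."""
    root = Path(root)
    entries = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*") if p.is_file()
    )
    manifest = hashlib.sha256()
    for rel, path in entries:
        # NUL separates path from digest; newline terminates each entry.
        manifest.update(f"{rel}\0{file_sha256(path)}\n".encode("utf-8"))
    return manifest.hexdigest()
```

Sorting by the POSIX-style relative path keeps the result independent of directory listing order, so an auditor with a copy of the files can recompute the value.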

Model title

{
  "type": "TITLE",
  "assetId": "model-acme-foundation-7b-v2",
  "domain": "ai.provenance.models.foundation",
  "titleType": "ai-model",
  "owners": [{"ownerId": "acme-ai", "percentage": 100.0}],
  "attributes": {
    "modelArchitecture": "decoder-transformer",
    "parameters": 7000000000,
    "modelHash": "<sha256 of model weights>",
    "framework": "PyTorch",
    "trainingDataRefs": [
      "dataset-common-crawl-2024-cc-main-en",
      "dataset-acme-proprietary-licensed-books"
    ],
    "license": "Apache-2.0",
    "releaseDate": "2026-04-01",
    "trainingCompute": "1.2e23 FLOPs"
  },
  "signatures": {"acme-ai": "<sig>"}
}
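
A signature over a title only verifies if signer and verifier serialize the title to identical bytes. The exact canonical form Quidnug signs is not specified in this document; the sketch below assumes compact JSON with sorted keys and the `signatures` field left out, purely to illustrate the idea. `canonical_bytes` and `title_digest` are hypothetical names.

```python
import hashlib
import json


def canonical_bytes(title):
    """Serialize a title deterministically, omitting the signatures field."""
    unsigned = {key: value for key, value in title.items() if key != "signatures"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def title_digest(title):
    """SHA-256 over the canonical bytes; this is what a signer would sign."""
    return hashlib.sha256(canonical_bytes(title)).hexdigest()


title = {
    "type": "TITLE",
    "assetId": "model-acme-foundation-7b-v2",
    "attributes": {"modelHash": "<sha256 of model weights>", "license": "Apache-2.0"},
    "signatures": {"acme-ai": "<sig>"},
}
print(title_digest(title))
```

Because keys are sorted and the signature map is excluded, reordering fields or attaching a signature leaves the digest unchanged, while any change to an attribute such as `modelHash` produces a different digest.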

Training run events

On the model’s stream:

1. training.started
   payload: { trainingDataRefs: [...], config: <hash>,
              startedAt: ... }
   signer: acme-ai

2. training.completed
   payload: { finalLossHash: ..., checkpointsHash: ...,
              totalFLOPs: ..., endedAt: ... }
   signer: acme-ai

3. safety.evaluated
   signer: safety-org-anthropic-evals
   payload: { evaluatorOrg: ..., evaluationHash: ...,
              redTeamReportHash: ..., overallRating: "acceptable" }

4. benchmark.submitted
   signer: mlcommons
   payload: { benchmark: "MMLU", score: 0.78, runDate: ... }

5. license.claimed
   signer: rights-holder-publisher-X
   payload: { claim: "model trained on copyrighted books ...",
              counterclaimID: null, evidenceHash: ... }

Derivative model (fine-tune)

{
  "type": "TITLE",
  "assetId": "model-widgetco-finetune-for-support",
  "domain": "ai.provenance.models.fine-tunes",
  "owners": [{"ownerId": "widget-corp", "percentage": 100.0}],
  "attributes": {
    "baseModelRef": "model-acme-foundation-7b-v2",
    "fineTuneData": "dataset-widget-support-tickets-private",
    "modelHash": "<sha256>",
    "license": "inherited + proprietary additions",
    "intendedUse": "customer support chatbot"
  },
  "signatures": {"widget-corp": "<sig>"}
}

The base model’s owner records derivation.authorized events on this title’s stream:

derivation.authorized
  signer: acme-ai   (the base model owner)
  payload: { derivativeModelID: "model-widgetco-finetune-for-support",
             authorizedUses: ["commercial", "non-commercial"],
             termsHash: "<sha256 of license>" }

Inference attestation

When a model serves an inference, the producer can emit a signed inference.ran event:

eventType: "inference.ran"
subjectId: <model quid>
payload: {
  inferenceID: "inf-abc-123",
  requestHash: "<sha256 of prompt>",
  responseHash: "<sha256 of response>",
  timestamp: ...,
  computeEnv: "acme-gpu-cluster-us-east"
}
signer: model producer

A downstream consumer can verify: “The inference I received was produced by model X, running at time T.” No one can forge an inference claim from a model without that model’s producer’s key.
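
On the consumer side, the check reduces to recomputing both hashes and comparing them with the attested values. The sketch below assumes the event has been fetched as a dictionary shaped like the example above and that its signature has already been verified by other means; `matches_attestation` is a hypothetical helper.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def matches_attestation(event, prompt: bytes, response: bytes, model_id: str) -> bool:
    """Check that an inference.ran event covers exactly this prompt and response.

    Signature verification on the event is assumed to have happened already.
    """
    payload = event["payload"]
    return (event["eventType"] == "inference.ran"
            and event["subjectId"] == model_id
            and payload["requestHash"] == sha256_hex(prompt)
            and payload["responseHash"] == sha256_hex(response))


event = {
    "eventType": "inference.ran",
    "subjectId": "model-acme-foundation-7b-v2",
    "payload": {
        "inferenceID": "inf-abc-123",
        "requestHash": sha256_hex(b"What is your refund policy?"),
        "responseHash": sha256_hex(b"Refunds are available within 30 days."),
    },
}
print(matches_attestation(event, b"What is your refund policy?",
                          b"Refunds are available within 30 days.",
                          "model-acme-foundation-7b-v2"))
```

Any edit to the response, or a claim that the response came from a different model quid, makes the check fail.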

Consumer trust

A downstream application (e.g., an LLM-powered customer support product) can evaluate a model before adopting it:

func (app *App) EvaluateModel(modelID string) ModelAssessment {
    events := app.quidnug.GetSubjectEvents(modelID, "TITLE")

    // A safety attestation counts only when a trusted evaluator signed it and
    // rated the model acceptable. Trust is looked up for the event's signer
    // (ev.Creator), not for a name that the payload merely claims.
    safetyOK := false
    for _, ev := range events {
        if ev.EventType != "safety.evaluated" || ev.Payload["overallRating"] != "acceptable" {
            continue
        }
        trust, err := app.quidnug.GetTrust(app.quid, ev.Creator,
            "ai.provenance.safety", nil)
        if err == nil && trust.TrustLevel >= 0.8 {
            safetyOK = true
            break
        }
    }

    // An uncontested license claim from a sufficiently trusted claimant is a risk.
    hasUnresolvedLicenseClaims := false
    for _, ev := range events {
        if ev.EventType != "license.claimed" || ev.Payload["counterclaimID"] != nil {
            continue
        }
        claimantTrust, err := app.quidnug.GetTrust(app.quid, ev.Creator,
            "ai.provenance.licensing", nil)
        if err == nil && claimantTrust.TrustLevel >= 0.5 {
            hasUnresolvedLicenseClaims = true
            break
        }
    }

    return ModelAssessment{
        SafetyVerified:          safetyOK,
        UnresolvedLicenseIssues: hasUnresolvedLicenseClaims,
        ReadyForProduction:      safetyOK && !hasUnresolvedLicenseClaims,
    }
}

Counter-attestations

Disputes happen. A rights holder files a license.claimed event. The model producer can respond with a license.contested event:

eventType: "license.contested"
payload: {
  contestsClaimID: <earlier event ID>,
  evidence: <hash>,
  arguments: "Model was trained on publicly available
              summaries, not full text. Summaries are
              transformative under fair use..."
}

Both claim and contest live in the record. Consumers weigh their trust in both parties. Courts (if it gets there) have a full signed evidence chain.
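
Pairing claims with contests is a simple join on the claim's event ID. The sketch below assumes events arrive as dictionaries carrying an `id` field (the field name is an assumption) and uses `contestsClaimID` from the example above; `open_license_claims` is a hypothetical helper.

```python
def open_license_claims(events):
    """Return license.claimed events that no license.contested event answers."""
    contested_ids = {
        ev["payload"]["contestsClaimID"]
        for ev in events
        if ev["eventType"] == "license.contested"
    }
    return [
        ev for ev in events
        if ev["eventType"] == "license.claimed" and ev["id"] not in contested_ids
    ]


events = [
    {"id": "ev-1", "eventType": "license.claimed", "creator": "publisher-x",
     "payload": {"claimType": "copyright-violation"}},
    {"id": "ev-2", "eventType": "license.claimed", "creator": "publisher-y",
     "payload": {"claimType": "copyright-violation"}},
    {"id": "ev-3", "eventType": "license.contested", "creator": "acme-ai-labs",
     "payload": {"contestsClaimID": "ev-1"}},
]
print([ev["id"] for ev in open_license_claims(events)])
```

A contested claim is not a resolved one. A cautious consumer might still surface contested claims alongside its trust in each party rather than dropping them.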

Key Quidnug features

Title-of-title hierarchy: dataset, base model, and fine-tune all have titles; events link them into a DAG.
Event streams per artifact: training runs, safety evals, benchmarks, license claims.
Domain hierarchy: scope trust separately for dataset provenance, model safety, and licensing.
Relational trust: different consumers trust different safety orgs and benchmarkers.
Guardian sets: the model producer’s signing keys are recoverable (a lab’s HSM loss shouldn’t orphan all of its published models).
Push gossip: new claims (especially safety and license claims) propagate immediately.

Value delivered

Dimension	Before	With Quidnug
Dataset provenance	README files	Signed title + hash; verifiable
Model-to-dataset linkage	Blog posts	Signed derivation relationship
Safety attestation	Internal labs / private audits	On-chain claims from independent attesters
License dispute evidence	Emails, depositions	Signed claim/counterclaim chain
Benchmark result verification	Self-reported	Benchmark org’s signed event
Fine-tune authorization	Contract + trust	`derivation.authorized` event
Inference authenticity	Rely on endpoint	Signed inference event
Consumer evaluation	Vendor’s marketing	Algorithmic: trust × attestations

What’s in this folder

README.md: this document
implementation.md: Quidnug API calls
threat-model.md: security analysis

Runnable POC

Full end-to-end demo at examples/ai-model-provenance/:

model_provenance.py: pure verifier with a producer-trust gate, derivative base-model gate, dataset-license filter, safety strictness, and benchmark requirement.
model_provenance_test.py: 14 pytest cases.
demo.py: eight-step end-to-end flow covering accept foundation, accept derivative, reject prohibited dataset, and warn on missing safety.

cd examples/ai-model-provenance
python demo.py

Related

../ai-agent-authorization/: authorizing the agent built on top of a model
../ai-content-authenticity/: provenance for AI-generated content
QDP-0005 Push Gossip

Implementation

Concrete API calls, pseudocode, signing shape.

Implementation: AI Model Provenance

1. Register a dataset

curl -X POST $NODE/api/identities -d '{
  "quidId":"common-crawl-foundation",
  "name":"Common Crawl Foundation",
  "homeDomain":"ai.provenance.datasets",
  "creator":"common-crawl-foundation","updateNonce":1
}'

# The dataset itself is a TITLE owned by the curator
curl -X POST $NODE/api/v1/titles -d '{
  "assetId":"dataset-common-crawl-2024-cc-main-en",
  "domain":"ai.provenance.datasets",
  "titleType":"training-dataset",
  "owners":[{"ownerId":"common-crawl-foundation","percentage":100.0}],
  "attributes":{
    "datasetHash":"<sha256>",
    "sizeBytes":"3.1T",
    "language":"en",
    "license":"CC0",
    "exclusions":["copyrighted-shadow-libraries"]
  },
  "signatures":{"common-crawl-foundation":"<sig>"}
}'

2. Register a base model

# Identity for the lab
curl -X POST $NODE/api/identities -d '{
  "quidId":"acme-ai-labs",
  "name":"Acme AI Labs",
  "homeDomain":"ai.provenance.models.foundation",
  "creator":"acme-ai-labs","updateNonce":1
}'

# Install a guardian set for the lab (HSM failures happen)
curl -X POST $NODE/api/v2/guardian/set-update -d '{ /* ... */ }'

# The model itself
curl -X POST $NODE/api/v1/titles -d '{
  "assetId":"model-acme-foundation-7b-v2",
  "domain":"ai.provenance.models.foundation",
  "titleType":"ai-model",
  "owners":[{"ownerId":"acme-ai-labs","percentage":100.0}],
  "attributes":{
    "modelArchitecture":"decoder-transformer",
    "parameters":7000000000,
    "modelHash":"<sha256 of weights>",
    "license":"Apache-2.0",
    "trainingDataRefs":[
      "dataset-common-crawl-2024-cc-main-en",
      "dataset-acme-proprietary-licensed-books"
    ],
    "releaseDate":"2026-04-01"
  },
  "signatures":{"acme-ai-labs":"<sig>"}
}'

3. Training run events

# When training begins
curl -X POST $NODE/api/v1/events -d '{
  "subjectId":"model-acme-foundation-7b-v2",
  "subjectType":"TITLE",
  "eventType":"training.started",
  "payload":{
    "trainingDataRefs":["dataset-common-crawl-2024-cc-main-en"],
    "configHash":"<sha256 of training config>",
    "startedAt":1713400000,
    "expectedCompletion":1716000000,
    "computeEnv":"acme-gpu-cluster-us-east"
  },
  "creator":"acme-ai-labs","signature":"<sig>"
}'

# When training completes
curl -X POST $NODE/api/v1/events -d '{
  "subjectId":"model-acme-foundation-7b-v2",
  "subjectType":"TITLE",
  "eventType":"training.completed",
  "payload":{
    "finalLossHash":"<sha256>",
    "checkpointsHash":"<sha256>",
    "totalFLOPs":"1.2e23",
    "endedAt":1716000000
  },
  "creator":"acme-ai-labs","signature":"<sig>"
}'

4. Safety evaluation

An independent safety org (e.g., anthropic-evals-team) runs tests and publishes:

curl -X POST $NODE/api/v1/events -d '{
  "subjectId":"model-acme-foundation-7b-v2",
  "subjectType":"TITLE",
  "eventType":"safety.evaluated",
  "payload":{
    "evaluatorOrg":"anthropic-evals-team",
    "evaluationHash":"<sha256 of full report>",
    "redTeamReportHash":"<sha256>",
    "overallRating":"acceptable",
    "knownIssues":["occasional-hallucination-on-math-problems"],
    "evaluationDate":1716100000
  },
  "creator":"anthropic-evals-team","signature":"<sig>"
}'

Anthropic Evals draws its authority from whoever views it as authoritative, expressed as trust edges pointing at its quid. Consumers running their own trust evaluation weight its signature by their own trust in it.
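
One way a consumer might turn that weighting into a decision is sketched below. It assumes events are dictionaries shaped like the request above, that `trust_of` is a precomputed map from signer quid to this consumer's trust level in [0, 1], and that 0.8 is the cut-off; all three are illustrative assumptions, and `safety_verdict` is a hypothetical helper.

```python
def safety_verdict(events, trust_of, threshold=0.8):
    """Classify a model from its safety.evaluated events.

    Only attestations signed by evaluators this consumer trusts at or above
    the threshold are considered; everything else is ignored.
    """
    trusted = [
        ev for ev in events
        if ev["eventType"] == "safety.evaluated"
        and trust_of.get(ev["creator"], 0.0) >= threshold
    ]
    if not trusted:
        return "unverified"  # nobody we trust has evaluated the model
    if any(ev["payload"]["overallRating"] != "acceptable" for ev in trusted):
        return "rejected"    # a trusted evaluator raised a concern
    return "accepted"


events = [
    {"eventType": "safety.evaluated", "creator": "anthropic-evals-team",
     "payload": {"overallRating": "acceptable"}},
    {"eventType": "safety.evaluated", "creator": "unknown-lab",
     "payload": {"overallRating": "unacceptable"}},
]
print(safety_verdict(events, {"anthropic-evals-team": 0.9, "unknown-lab": 0.1}))
```

Two consumers with different trust maps can reach different verdicts from the same event stream, which is the intended behaviour of relational trust.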

5. Benchmark submissions

MLCommons, HELM, or other benchmark orgs run tests:

curl -X POST $NODE/api/v1/events -d '{
  "subjectId":"model-acme-foundation-7b-v2",
  "subjectType":"TITLE",
  "eventType":"benchmark.submitted",
  "payload":{
    "benchmark":"MMLU",
    "score":0.78,
    "benchmarkVersion":"2024.04",
    "runDate":1716200000,
    "fullResultsHash":"<sha256>"
  },
  "creator":"mlcommons","signature":"<sig>"
}'

6. Derivative (fine-tune) authorization

Widget Corp fine-tunes Acme’s model:

# First register the fine-tune as a title
curl -X POST $NODE/api/v1/titles -d '{
  "assetId":"model-widgetco-finetune-v1",
  "domain":"ai.provenance.models.fine-tunes",
  "titleType":"ai-model",
  "owners":[{"ownerId":"widget-corp","percentage":100.0}],
  "attributes":{
    "baseModelRef":"model-acme-foundation-7b-v2",
    "fineTuneDataRef":"dataset-widget-support-tickets",
    "intendedUse":"customer support",
    "license":"proprietary",
    "modelHash":"<sha256>"
  },
  "signatures":{"widget-corp":"<sig>"}
}'

# Acme signs an authorization event on the fine-tune title
curl -X POST $NODE/api/v1/events -d '{
  "subjectId":"model-widgetco-finetune-v1",
  "subjectType":"TITLE",
  "eventType":"derivation.authorized",
  "payload":{
    "baseModelRef":"model-acme-foundation-7b-v2",
    "derivativeModelRef":"model-widgetco-finetune-v1",
    "authorizedUses":["commercial-internal","non-commercial-research"],
    "forbiddenUses":["generative-content-for-resale"],
    "licenseTermsHash":"<sha256 of full license terms doc>"
  },
  "creator":"acme-ai-labs","signature":"<sig>"
}'

Without Acme’s signed authorization, the fine-tune’s event stream lacks a derivation.authorized event. Downstream consumers that rely on that authorization can detect its absence.
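
A consumer-side check for this could look like the following sketch. It assumes both titles and the fine-tune's events are available as dictionaries shaped like the requests above, and it treats an authorization as valid only when an owner of the base model signed it. `is_derivation_authorized` is a hypothetical helper.

```python
def is_derivation_authorized(fine_tune, base, events):
    """True if an owner of the base model signed a matching authorization."""
    if fine_tune["attributes"].get("baseModelRef") != base["assetId"]:
        return False
    base_owners = {owner["ownerId"] for owner in base["owners"]}
    for ev in events:
        if ev["eventType"] != "derivation.authorized":
            continue
        payload = ev["payload"]
        if (ev["creator"] in base_owners
                and payload.get("baseModelRef") == base["assetId"]
                and payload.get("derivativeModelRef") == fine_tune["assetId"]):
            return True
    return False


base = {"assetId": "model-acme-foundation-7b-v2",
        "owners": [{"ownerId": "acme-ai-labs", "percentage": 100.0}]}
fine_tune = {"assetId": "model-widgetco-finetune-v1",
             "attributes": {"baseModelRef": "model-acme-foundation-7b-v2"}}
authorization = {
    "eventType": "derivation.authorized",
    "creator": "acme-ai-labs",
    "payload": {"baseModelRef": "model-acme-foundation-7b-v2",
                "derivativeModelRef": "model-widgetco-finetune-v1"},
}
print(is_derivation_authorized(fine_tune, base, [authorization]))
```

Checking the signer against the base model's owners matters: an authorization event that Widget Corp signed for itself would otherwise pass.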

7. Inference attestation

When the production service runs an inference:

type InferenceAttestation struct {
    InferenceID   string
    ModelRef      string
    RequestHash   string
    ResponseHash  string
    Timestamp     int64
    ComputeEnv    string
}

func (s *InferenceServer) AttestInference(req InferenceRequest, resp InferenceResponse) error {
    event := map[string]interface{}{
        "subjectId":   s.modelQuid,
        "subjectType": "TITLE",
        "eventType":   "inference.ran",
        "payload": map[string]interface{}{
            "inferenceID":   req.ID,
            "requestHash":   sha256sum(req),
            "responseHash":  sha256sum(resp),
            "timestamp":     time.Now().Unix(),
            "computeEnv":    s.computeEnv,
        },
        "creator":   s.operatorQuid,
        "signature": s.sign(/* canonical bytes */),
    }
    return s.submitEvent(event)
}

Inference consumers can later verify that a response claimed to come from model X at time T really did. Useful for:

AI-generated content attribution
Regulatory compliance (“which model produced this recommendation?”)
Debugging (“did the right model handle this request?”)

8. License claim and contest

A publisher detects content from their books in the model’s outputs:

curl -X POST $NODE/api/v1/events -d '{
  "subjectId":"model-acme-foundation-7b-v2",
  "subjectType":"TITLE",
  "eventType":"license.claimed",
  "payload":{
    "claimType":"copyright-violation",
    "claimantJurisdiction":"US",
    "evidenceHash":"<sha256>",
    "affectedWorks":["isbn-1234567890","isbn-1234567891"],
    "demandedRemedy":"cease + statutory damages"
  },
  "creator":"publisher-x","signature":"<sig>"
}'

Acme contests:

curl -X POST $NODE/api/v1/events -d '{
  "subjectId":"model-acme-foundation-7b-v2",
  "subjectType":"TITLE",
  "eventType":"license.contested",
  "payload":{
    "contestsClaimID":"<event ID of claim>",
    "argumentsHash":"<sha256 of response brief>",
    "evidenceHash":"<sha256 of training-data audit>"
  },
  "creator":"acme-ai-labs","signature":"<sig>"
}'

9. Consumer-side evaluation

func (c *Consumer) PreflightModel(modelID string) PreflightReport {
    events := c.GetEvents(modelID, "TITLE")

    report := PreflightReport{ModelID: modelID}

    for _, ev := range events {
        switch ev.EventType {
        case "safety.evaluated":
            // Weight the attestation by trust in its signer (ev.Creator); the
            // evaluatorOrg payload field is set by whoever submitted the event.
            evaluator := ev.Creator
            trust := c.GetTrust(c.selfQuid, evaluator, "ai.provenance.safety")
            report.SafetyAttestations = append(report.SafetyAttestations,
                SafetyRecord{Evaluator: evaluator, Rating: ev.Payload["overallRating"].(string), Trust: trust.TrustLevel})

        case "benchmark.submitted":
            bench := ev.Payload["benchmark"].(string)
            score := ev.Payload["score"].(float64)
            signerTrust := c.GetTrust(c.selfQuid, ev.Creator, "ai.provenance.benchmarks")
            report.Benchmarks = append(report.Benchmarks,
                BenchmarkResult{Benchmark: bench, Score: score, ReporterTrust: signerTrust.TrustLevel})

        case "license.claimed":
            // Check if contested
            contested := c.hasContestEvent(events, ev.ID)
            if !contested {
                report.OpenLicenseClaims = append(report.OpenLicenseClaims, ev)
            }
        }
    }

    return report
}

10. Model key rotation (producer lost HSM)

Acme’s signing HSM fails. Initiate guardian recovery:

curl -X POST $NODE/api/v2/guardian/recovery/init -d '{
  "subjectQuid":"acme-ai-labs",
  "fromEpoch":0,
  "toEpoch":1,
  "newPublicKey":"<hex>",
  "minNextNonce":1,
  "maxAcceptedOldNonce":0,
  "anchorNonce":<next>,
  "validFrom":<now>,
  "guardianSigs":[ /* Acme's CEO, CTO, CISO */ ]
}'

Post-rotation, downstream consumers can still verify Acme’s historical events (those were signed with the old-epoch key, which remains recorded in the ledger). New events use the new-epoch key.

11. Testing

func TestModelProvenance_DerivationChainVerification(t *testing.T) {
    // Register dataset, base model, fine-tune, auth event
    // Verify: consumer traversing from fine-tune can reach
    //   original dataset + all safety attestations
}

func TestModelProvenance_UnauthorizedFineTuneDetectable(t *testing.T) {
    // Fine-tune title created without derivation.authorized event
    // Consumer's preflight: flags missing authorization
}

func TestModelProvenance_LicenseClaimContest(t *testing.T) {
    // Publisher files claim; Acme contests
    // Consumer sees both; can decide
}

Where to go next

Threat model

Adversaries, assumed capabilities, mitigations.

Threat Model: AI Model Provenance

Assets

Provenance integrity: the cryptographic record of dataset, training, and fine-tune relationships.
Safety attestations: signed claims from evaluators.
License-dispute evidence: the full chain of claims and counter-claims.
Model producer reputations: a producer who has shipped many well-attested, safe models builds trust.

Attackers

Attacker	Capability	Goal
Rogue model producer	Their own signing key	False safety claims, hide IP issues
Competitor	No access to producer’s keys	Smear via false license claims
Fake “evaluator”	Spins up a new quid claiming to be a safety org	Bogus safety endorsements
Data subject	Has their own content in training data	Force takedown via false claims
End user	Consumes inferences	Verify authenticity

Threats and mitigations

T1. Producer falsely claims safety

Attack. Acme self-publishes a safety.evaluated event claiming an independent evaluator endorsed safety, but actually signed it with their own key.

Mitigation.

Signer verification. The event is signed by whoever submitted it. If Acme submits, only Acme’s key matches. Consumers looking for “evaluator’s own attestation” check creator on the event, not just the content.
Relational trust in evaluator. A consumer’s own trust in the evaluator determines weight. Acme self-vouching counts as… Acme self-vouching, weighted only by consumer’s trust in Acme.

Residual risk. None structural. Consumer needs to understand who signs what.
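
The signer check can be expressed as a filter over the model's events. This sketch assumes events are dictionaries with `creator` and `payload` fields as in the implementation examples, and that the model's owner quids are known from its title. `independent_safety_events` is a hypothetical helper.

```python
def independent_safety_events(events, owner_ids):
    """Keep safety.evaluated events signed by the evaluator they name,
    and not by any owner of the model being evaluated."""
    kept = []
    for ev in events:
        if ev["eventType"] != "safety.evaluated":
            continue
        signer = ev["creator"]
        if signer in owner_ids:
            continue  # the producer vouching for itself
        if ev["payload"].get("evaluatorOrg") != signer:
            continue  # payload names an evaluator who did not sign
        kept.append(ev)
    return kept


events = [
    {"eventType": "safety.evaluated", "creator": "acme-ai-labs",
     "payload": {"evaluatorOrg": "anthropic-evals-team", "overallRating": "acceptable"}},
    {"eventType": "safety.evaluated", "creator": "anthropic-evals-team",
     "payload": {"evaluatorOrg": "anthropic-evals-team", "overallRating": "acceptable"}},
]
print([ev["creator"] for ev in independent_safety_events(events, {"acme-ai-labs"})])
```

The first event is exactly the T1 attack: the payload names an independent evaluator, but the signer is the producer, so it is dropped.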

T2. Fake evaluator quid

Attack. Attacker creates a quid named “Anthropic-Evals-Official” and publishes endorsements of a malicious model.

Mitigation.

Trust edges must be issued to the quid by trusted parties. Consumers don’t trust a quid because of its name; they trust it because entities they already trust have declared trust in it.
Domain ownership (if configured): the ai.provenance.safety domain’s validators can prevent arbitrary quids from claiming to be safety evaluators.

Residual risk. Social engineering (naming tricks) can confuse uninformed consumers. Mitigated by tooling that shows “trust path” prominently in UI.
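
A tool that displays the trust path needs some way to compute it. The sketch below is not Quidnug's algorithm, which this document does not describe; it assumes trust edges are available as a nested dictionary and that trust along a path is the product of its edge levels, then searches for the strongest path within a depth limit. `best_trust_path` is a hypothetical helper.

```python
def best_trust_path(edges, source, target, max_depth=4):
    """Return (trust, path) for the strongest chain of trust edges.

    edges maps truster -> {trustee: level in [0, 1]}. Multiplying levels along
    a path is an assumed decay rule. Returns (0.0, []) when no path exists.
    """
    best = (0.0, [])
    stack = [(source, 1.0, [source])]
    while stack:
        node, trust, path = stack.pop()
        if node == target:
            if trust > best[0]:
                best = (trust, path)
            continue
        if len(path) > max_depth:  # path already uses max_depth edges
            continue
        for trustee, level in edges.get(node, {}).items():
            if trustee not in path:  # never revisit a quid on the same path
                stack.append((trustee, trust * level, path + [trustee]))
    return best


edges = {
    "consumer": {"mlcommons": 0.9, "random-blog": 0.2},
    "mlcommons": {"anthropic-evals-team": 0.9},
    "random-blog": {"anthropic-evals-official": 1.0},
}
print(best_trust_path(edges, "consumer", "anthropic-evals-team"))
print(best_trust_path(edges, "consumer", "anthropic-evals-official"))
```

Under these assumptions the look-alike quid inherits only the low trust of whoever vouched for it, regardless of how convincing its name is, and the returned path is what a UI could show.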

T3. Competitor smear via false license claim

Attack. Competitor publishes a license.claimed event claiming Acme violated their copyright (fabricated).

Mitigation.

Acme can contest with license.contested event.
Both claim and contest are visible; consumers weigh both by their trust in each party.
Frivolous claims from low-trust entities are de-prioritized.

Residual risk. Reputational. A false claim visible on-chain may chill adoption even if contested. Market dynamics.

T4. Model producer’s key compromise

Attack. Acme’s signing key is stolen. Attacker publishes fake derivation events or fake benchmark results.

Mitigation.

Guardian recovery rotates Acme’s key. After rotation, the attacker’s old-epoch signatures are no longer accepted.
Anchor nonces: even with the old key, the attacker can’t replay or reuse an earlier signature.
Quick invalidation path: an invalidation anchor freezes the epoch immediately.

Residual risk. Window between compromise and rotation. During window, attacker can publish events with old-epoch sig. Mitigated by monitoring (event-rate anomalies).

T5. Dataset hash forgery

Attack. Acme registers a dataset title with a fake datasetHash. Claims to have trained on a clean dataset when in fact they trained on something else.

Mitigation.

Dataset hash is deterministic. Independent auditors can replicate the hash given the dataset. A fake hash can be caught in audit.
Safety evaluator’s report (if thorough) audits the claimed training data. A safety.evaluated event from a high-trust evaluator includes this.

Residual risk. If the dataset is proprietary and no one can independently re-hash, the claim is on Acme’s word alone. Mitigated by audit norms.

T6. Fine-tune without authorization

Attack. Someone fine-tunes Acme’s model without authorization, then registers the fine-tune as their own.

Mitigation.

Missing derivation.authorized event is detectable. Consumers’ pre-flight checks flag it.
Licensing enforcement is ultimately legal; Quidnug provides the evidence trail.

T7. Inference forgery

Attack. Someone serves inferences from model X and claims they’re from model Y.

Mitigation.

Inference events signed by the model producer. Fake inferences would need the producer’s signing key.
Response hash binds the inference content to the attestation; altering the response breaks the hash match.

Residual risk. If the producer legitimately serves inferences that middlemen then repackage, the middlemen can misattribute them. Consumers need to verify inference event signatures rather than trust middlemen.

T8. Privacy: what can be inferred from on-chain data?

Concern. The event stream reveals:

Which models exist
When they were trained
Who evaluated them
Which datasets they used
Inference counts

Mitigation / reality.

Metadata is public by design; that’s the point of provenance.
Sensitive training data stays OFF-chain; only hashes are published.
For inference privacy, emit batch events (“10,000 inferences this hour”) rather than per-inference if volume is sensitive.
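
Batching can be as simple as bucketing timestamps by hour before anything is published. In this sketch the `inference.count` event type and its payload fields are assumptions modeled on the `inference-count` entry in the architecture diagram; `hourly_batches` is a hypothetical helper.

```python
from collections import Counter


def hourly_batches(timestamps, model_id):
    """Collapse per-inference Unix timestamps into one count event per hour."""
    counts = Counter(ts - ts % 3600 for ts in timestamps)
    return [
        {
            "subjectId": model_id,
            "subjectType": "TITLE",
            "eventType": "inference.count",  # assumed event type
            "payload": {"hourStart": hour, "inferenceCount": count},
        }
        for hour, count in sorted(counts.items())
    ]


batches = hourly_batches([0, 10, 3599, 3600, 7300], "model-acme-foundation-7b-v2")
for batch in batches:
    print(batch["payload"])
```

Only the hourly totals leave the serving environment, so individual request times and hashes are never published.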

T9. Replay

Attack. Attacker replays old events.

Mitigation. Anchor nonce + dedup. Same as every other use case.
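
The two checks can be illustrated with a small in-memory guard. The document does not spell out anchor-nonce semantics, so this sketch assumes each signer's nonce must strictly increase and that event IDs are unique. `ReplayGuard` is a hypothetical class, not part of Quidnug.

```python
class ReplayGuard:
    """Reject events whose ID was already seen or whose nonce does not advance."""

    def __init__(self):
        self._seen_ids = set()
        self._last_nonce = {}  # signer quid -> highest accepted nonce

    def accept(self, event_id, signer, nonce):
        if event_id in self._seen_ids:
            return False  # exact duplicate (dedup)
        if nonce <= self._last_nonce.get(signer, 0):
            return False  # stale or reused nonce (replay)
        self._seen_ids.add(event_id)
        self._last_nonce[signer] = nonce
        return True


guard = ReplayGuard()
print(guard.accept("ev-1", "acme-ai-labs", 1))  # first sighting
print(guard.accept("ev-1", "acme-ai-labs", 2))  # same event ID again
print(guard.accept("ev-2", "acme-ai-labs", 1))  # new ID, old nonce
```

Nonces are tracked per signer, so one producer's event rate does not affect whether another signer's events are accepted.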

T10. Fork-block abuse

Attack. Consortium fork-block changes provenance rules in a way that hides producer accountability.

Mitigation. 2/3 quorum + notice period. See ../institutional-custody/threat-model.md.

Not defended against

Physical-world copyright. Whether a model’s output “substantially resembles” copyrighted training data is a legal question, not a cryptographic one.
Model weight theft. If an attacker steals Acme’s model weights, Quidnug can’t prevent it. But if they publish the weights under a new quid, the title carries no derivation.authorized event from Acme, so the missing authorization is easy to detect.
Fine-grained safety. “Safe for adults but not kids” is attribute-level granularity beyond a single event. Multiple signed evaluations with distinct contexts handle this; the protocol supports it, but consumers need their own aggregation logic.
Regulatory-mandated provenance. If the EU AI Act requires specific fields we don’t yet have, add them. The schema is extensible by design.
