EU AI Act Article 12 explained — 2026 guide

Published 7 June 2026 · Updated 23 June 2026 · 16 min read · By Hak Turkel

The text of Article 12 is short. Three paragraphs. The operational implications are enormous, and most firms reading the text for the first time underestimate the engineering work required to comply. Below is what Article 12 actually says, what regulators have signalled they expect to see, and what production AI systems need to do between now and 2 August 2026.

A high-risk AI system, under Article 6 and Annex III of the EU AI Act, is any AI system used as a safety component of a regulated product or deployed in one of eight areas: biometrics, critical infrastructure, education, employment, access to essential services, law enforcement, migration, and the administration of justice and democratic processes. For UK and EU fintechs, the most common in-scope use cases are creditworthiness scoring, claims triage, and certain insurance pricing models.

Below is the operational summary: each Article 12 requirement, what it means in practice for a production AI agent, and what Agent Audit's EU AI Act solution does to satisfy it.

Article 12 requirement	What it means in practice	What Agent Audit does
Automatic recording over the system lifecycle	Every action the agent takes is captured at the time it happens — no manual instrumentation, no after-the-fact reconstruction.	SDK adapters wrap the agent runtime (OpenAI Agents SDK, Claude Agent SDK, MCP, LangChain, CrewAI) and emit a receipt for every tool call, model call, sub-agent spawn, and decision.
Identification of risk situations (Article 79)	The log must let you spot — and produce evidence of — risk events: discrimination, data-protection breaches, unsafe outputs.	Anomaly alerts on unusual data access, off-policy actions and confidence drift. Signed Slack/Teams notifications with receipt event IDs for follow-up.
Post-market monitoring (Article 72)	Continuous, structured evidence of how the system behaves in production — not just incident logs.	Operator dashboard surfaces volume, latency, decision distribution and resource-access patterns per agent, per session, per data class.
Tamper-evidence (regulator must be able to trust the record)	Logs must be provably unchanged since capture. A regulator should not have to take the firm's word for it.	SHA-256 hash chain across receipts within each session + RFC 3161 timestamping of chain heads. Independent offline verify via `agentaudit-verify`.
Six-month minimum retention (Article 19)	Logs must be retrievable for at least six months. Sector law (FCA, MiFID II) often extends to 5–7 years.	Tiered retention: 90-day hot for query, 7-year cold archive (Starter); indefinite cold + customer-held keys (Enterprise).
Availability to authorities on request	On a regulator request, you must be able to produce a focused evidence pack for a named system over a named period, in a format the auditor accepts.	One-click EU AI Act Article 12 evidence pack with hash-chain proof, signed manifest, notarisation tokens, and auditor-friendly PDF + machine-readable JSON.

What the text says.

Article 12 of Regulation (EU) 2024/1689 covers "Record-keeping". It applies to high-risk AI systems as classified under Article 6, including the Annex III use cases. It requires three things:

The system shall technically allow for the automatic recording of events over the lifetime of the system.
The logging capabilities shall ensure a level of traceability of the system's functioning that is appropriate to the intended purpose.
Logs shall in particular enable the monitoring of operation of the high-risk AI system with regard to situations that may result in the AI system presenting a risk within the meaning of Article 79 or undergoing substantial modification.

In plain English: the system must log its own behaviour, automatically, in a way that supports both routine monitoring and incident investigation.

What "automatic" means in practice.

The AI Office has emphasised that records cannot be reconstructed after the fact from disparate sources. The logging must be a property of the system itself, captured at the time of action, not assembled later from multiple inputs.

This rules out most current approaches in production. Datadog and Splunk capture HTTP-level events but do not capture the agent-internal state — which tool was selected, what decision was made, what data class was touched. Custom logging written by engineers tends to be incomplete and inconsistent across services. Compliance automation platforms like Drata and Vanta address static controls and don't capture runtime activity at all. For a longer treatment of why these tools fall short, see why existing logs don't work for AI agents.
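
To make "captured at the time of action" concrete, here is a minimal sketch, assuming a Python agent whose tools are plain functions. The `receipted` decorator, the `RECEIPTS` list and every field name are hypothetical illustrations, not Agent Audit's SDK or any vendor's API.

```python
import functools
import hashlib
import json
import time

RECEIPTS = []  # hypothetical in-memory sink; a real system would use durable storage


def fingerprint(value):
    """Return a SHA-256 hex digest of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def receipted(tool_name, data_class):
    """Wrap a tool so that every call emits a receipt at the moment it runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            receipt = {
                "tool": tool_name,
                "data_class": data_class,
                "started_at": time.time(),
                "input_fingerprint": fingerprint({"args": args, "kwargs": kwargs}),
            }
            try:
                result = func(*args, **kwargs)
                receipt["status"] = "ok"
                receipt["output_fingerprint"] = fingerprint(result)
                return result
            except Exception as exc:
                # Failures are recorded too, then re-raised unchanged.
                receipt["status"] = "error"
                receipt["error_type"] = type(exc).__name__
                raise
            finally:
                RECEIPTS.append(receipt)
        return wrapper
    return decorator


@receipted(tool_name="bureau_lookup", data_class="pii")
def bureau_lookup(applicant_id):
    # Stand-in for a real bureau API call.
    return {"applicant_id": applicant_id, "score": 640}


bureau_lookup("A-1001")
print(RECEIPTS[0]["tool"], RECEIPTS[0]["status"])  # bureau_lookup ok
```

The point of the design is that the record is produced by the same code path that performs the action, so a failed call still leaves a receipt and nothing has to be stitched together afterwards.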

What "appropriate to the intended purpose" means.

The regulation gives operators flexibility on the format and depth of logging, scaled to the intended use of the system. A high-risk system affecting consumer creditworthiness will face higher scrutiny than one automating internal process tasks. But the threshold is not zero — any Annex III system must log enough to allow an authority to reconstruct operational behaviour during a defined period.

The specific things logs must enable.

Article 12(2) lists what the logged events must support, and Article 12(3) adds minimum fields for one class of system:

Identifying situations where the system may present a risk within the meaning of Article 79(1), or where it has undergone substantial modification (which may require re-conformity assessment)
Facilitating post-market monitoring under Article 72
Monitoring the operation of the system by deployers under Article 26(5)
For remote biometric identification systems (Annex III, point 1(a)), at minimum: the period of each use, the reference database checked, the input data that produced a match, and the natural persons involved in verifying the results

Retention.

Logs must be kept for a period appropriate to the intended purpose of the AI system, and at minimum six months unless Union or Member State law requires otherwise. For most regulated UK firms, the practical minimum will be seven years — driven by FCA SYSC retention requirements that apply in parallel to the AI Act.
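
The retention rule reduces to taking the longest period that applies. A small sketch, assuming retention is tracked in months per regime; the function name and the `fca_sysc` label are hypothetical, and the seven-year figure is simply the one quoted above.

```python
AI_ACT_MINIMUM_MONTHS = 6  # Article 19 floor, as described above


def required_retention_months(other_regimes):
    """Return the longest retention period across the AI Act floor and other regimes.

    other_regimes maps a regime label to its retention period in months.
    """
    return max([AI_ACT_MINIMUM_MONTHS, *other_regimes.values()])


# Illustrative figure only: seven years for FCA SYSC, per the paragraph above.
print(required_retention_months({"fca_sysc": 7 * 12}))  # 84
print(required_retention_months({}))                    # 6
```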

Availability to authorities.

Logs must be made available to national competent authorities on request. In the UK, this means the supervisory authority designated to oversee AI under the forthcoming UK AI regulatory framework (currently the ICO, FCA and relevant sector regulators). In the EU, it is the national market surveillance authority. The form of the disclosure is not prescribed, but the AI Office has indicated that it expects to receive evidence in a format that supports independent analysis, not raw log files.
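
A request for "a named system over a named period" is, mechanically, a filter plus a summary that a third party can recompute. The sketch below assumes receipts are dictionaries with `system_id` and timezone-aware `timestamp` fields; `build_pack` and the manifest keys are hypothetical, and signing and notarisation are deliberately left out.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_pack(receipts, system_id, start, end):
    """Select receipts for one system within [start, end) and summarise them."""
    selected = [
        r for r in receipts
        if r["system_id"] == system_id and start <= r["timestamp"] < end
    ]
    selected.sort(key=lambda r: r["timestamp"])
    # Serialise deterministically so anyone holding the same receipts
    # can recompute the same digest.
    body = json.dumps(
        [{**r, "timestamp": r["timestamp"].isoformat()} for r in selected],
        sort_keys=True,
    ).encode("utf-8")
    return {
        "system_id": system_id,
        "period_start": start.isoformat(),
        "period_end": end.isoformat(),
        "receipt_count": len(selected),
        "content_sha256": hashlib.sha256(body).hexdigest(),
    }


utc = timezone.utc
receipts = [
    {"system_id": "credit-agent", "timestamp": datetime(2026, 3, 1, tzinfo=utc), "event": "tool_call"},
    {"system_id": "credit-agent", "timestamp": datetime(2026, 3, 15, tzinfo=utc), "event": "decision"},
    {"system_id": "claims-agent", "timestamp": datetime(2026, 3, 10, tzinfo=utc), "event": "decision"},
    {"system_id": "credit-agent", "timestamp": datetime(2026, 4, 2, tzinfo=utc), "event": "decision"},
]
pack = build_pack(
    receipts,
    "credit-agent",
    datetime(2026, 3, 1, tzinfo=utc),
    datetime(2026, 4, 1, tzinfo=utc),
)
print(pack["receipt_count"])  # 2
```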

The penalty regime.

Article 99 sets the administrative fine framework:

Up to €35 million or 7% of global annual turnover for non-compliance with the Article 5 prohibitions
Up to €15 million or 3% for non-compliance with provider obligations including Article 12
Up to €7.5 million or 1% for supplying incorrect or misleading information to authorities

Member States can apply higher amounts in specific circumstances. For UK firms in scope of the AI Act extraterritorially, the enforcement mechanism will be cooperation between EU market surveillance authorities and UK supervisors.
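
As a worked example of the ceilings above, assuming the higher of the fixed amount and the turnover percentage applies: the function name is hypothetical, and the turnover figures are invented purely for the arithmetic.

```python
def fine_ceiling_eur(fixed_cap_eur, turnover_percent, global_turnover_eur):
    """Return the higher of the fixed cap and the percentage of turnover.

    Integer arithmetic is used to avoid floating-point rounding.
    """
    return max(fixed_cap_eur, global_turnover_eur * turnover_percent // 100)


# Provider-obligation tier: EUR 15 million or 3%.
print(fine_ceiling_eur(15_000_000, 3, 2_000_000_000))  # 60000000
print(fine_ceiling_eur(15_000_000, 3, 100_000_000))    # 15000000
```

For a firm with €2 billion in turnover the percentage dominates; for one with €100 million the fixed amount does.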

What an operational answer looks like.

A defensible Article 12 implementation has five properties:

Automatic capture at the agent runtime layer. Not assembled from HTTP logs after the fact. Captured at the point of decision, tool invocation, or data access.
Cryptographic integrity. The log must be tamper-evident. Hash-chain or equivalent that makes retroactive editing mathematically detectable.
Sufficient granularity. Per-action, with input/output fingerprints, data classification, decision metadata, and tool identification.
Retention with cryptographic continuity. Cold storage that does not break the chain. Verification reproducible from any read-only copy.
Regulator-acceptable output format. A pack the authority can read, indexed and signed, with a verification methodology independent of the system operator.
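
The second property, cryptographic integrity, can be illustrated with a SHA-256 hash chain in a few lines. This is a minimal sketch using only the Python standard library; the receipt layout, `append_receipt` and `verify_chain` are hypothetical and are not the format used by any particular product.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor hash for the first receipt


def canonical(obj):
    """Serialise deterministically so the same content always hashes the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_receipt(chain, event):
    """Append an event, binding it to the hash of the previous receipt."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"seq": len(chain), "prev_hash": prev_hash, "event": event}
    receipt = {**body, "hash": hashlib.sha256(canonical(body)).hexdigest()}
    chain.append(receipt)
    return receipt


def verify_chain(chain):
    """Recompute every hash and link; return False on the first mismatch."""
    prev_hash = GENESIS
    for index, receipt in enumerate(chain):
        if receipt["seq"] != index or receipt["prev_hash"] != prev_hash:
            return False
        body = {"seq": receipt["seq"], "prev_hash": receipt["prev_hash"], "event": receipt["event"]}
        if hashlib.sha256(canonical(body)).hexdigest() != receipt["hash"]:
            return False
        prev_hash = receipt["hash"]
    return True


chain = []
append_receipt(chain, {"type": "tool_call", "tool": "bureau_api"})
append_receipt(chain, {"type": "model_call", "model": "scoring-v3"})
append_receipt(chain, {"type": "decision", "outcome": "recommend"})
print(verify_chain(chain))  # True

chain[1]["event"]["model"] = "scoring-v2"  # retroactive edit
print(verify_chain(chain))  # False
```

A chain on its own only detects edits that leave later hashes untouched; someone who controls the store could rebuild the whole chain. That is why the head of the chain is usually anchored externally, for example with the RFC 3161 timestamping mentioned in the summary table.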

Two worked examples.

Abstract requirements only become tractable once you map them to a concrete agent. Below are two examples drawn from in-scope use cases we see most often in UK fintech and insurance.

Example 1 — A credit decisioning agent.

A consumer lender deploys an LLM-orchestrated agent that ingests applicant data, calls an internal scoring model, retrieves bureau data, flags potential affordability concerns, and returns a recommend/decline decision to a human underwriter. The agent is Annex III high-risk (point 5(b) — creditworthiness assessment).

A defensible Article 12 record must include, at minimum, for each decision: a receipt for the inbound application, hash-linked to all tool calls (bureau API, scoring model, fraud check), the model version and prompt hash, the data class touched at each step (PII grade, special category data flag), the decision output with confidence, and the identity of the human underwriter who approved or overrode. If the agent's reasoning changed because the bureau response was delayed and a fallback ran, the receipt must capture that branch too. A regulator investigating an alleged discriminatory pattern needs to reproduce the decision path from the receipts alone.

What this rules out: a "log line per HTTP request" approach. Two identical applications can produce different outcomes for legitimate reasons — bureau timeout, tool-routing variance, prompt re-evaluation — and the log must let an auditor see which branch ran and why.
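
One way to picture the record described above is as a small schema in which the branch taken is a first-class field. This is a sketch under stated assumptions: the class names, field names and example values are all hypothetical, chosen only to mirror the items listed in the example.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ToolCall:
    tool: str
    data_class: str                 # e.g. "pii" or "special_category"
    branch: str = "primary"         # "fallback" when an alternative path ran
    reason: Optional[str] = None    # why the branch was taken


@dataclass
class DecisionReceipt:
    application_id: str
    model_version: str
    prompt_hash: str
    tool_calls: List[ToolCall] = field(default_factory=list)
    decision: Optional[str] = None
    confidence: Optional[float] = None
    reviewer_id: Optional[str] = None
    reviewer_action: Optional[str] = None  # "approved" or "overrode"

    def branches_taken(self):
        """Return (tool, branch, reason) for each step, in order."""
        return [(c.tool, c.branch, c.reason) for c in self.tool_calls]


receipt = DecisionReceipt(
    application_id="APP-42",
    model_version="scoring-v3",
    prompt_hash="placeholder-prompt-hash",
)
receipt.tool_calls.append(ToolCall("bureau_api", "pii", "fallback", "bureau timeout"))
receipt.tool_calls.append(ToolCall("scoring_model", "pii"))
receipt.decision = "decline"
receipt.confidence = 0.81
receipt.reviewer_id = "underwriter-7"
receipt.reviewer_action = "overrode"

print(receipt.branches_taken())
print(sorted(asdict(receipt).keys()))
```

With the branch and its reason stored alongside each tool call, two applications with identical inputs but different outcomes can be told apart from the receipts alone.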

Example 2 — An insurance claims triage agent.

A motor insurer deploys an agent that reads a first-notification-of-loss, classifies the claim by severity and fraud risk, pulls prior policy history, decides routing (fast-track payout, human adjuster, fraud investigation), and drafts the customer reply. This sits at the intersection of Annex III high-risk and FCA conduct expectations.

The Article 12 record must capture the classification rationale, the routing decision and any policy data accessed. The FCA SYSC layer adds a further requirement: traceability of the customer outcome against the firm's Consumer Duty obligations. In practice this means the same receipt must serve two regulators with different questions: the AI Act asks "did the system behave appropriately?", while the FCA asks "did the customer get a fair outcome?". A pack format that answers only one question is worth less than half as much as one that answers both. This is why we publish a separate FCA solution view alongside the EU AI Act solution.

How to scope your AI agent inventory.

Firms typically undercount their in-scope agents by a factor of two or three. The audit pattern that surfaces missing agents:

Start from regulated decisions, not from systems. List every decision the firm makes about a natural person that is regulated under Annex III, FCA conduct rules, or sector law. For each, ask: is any part of this decision touched by an AI model or agent?
Walk the data flow. An agent that does not make the final decision but ranks, filters, or enriches the input is still in scope. Pre-decision filters are a common omission.
Include retired models still serving traffic. Models marked "deprecated" in an MLOps dashboard but still routed to in shadow mode or in rollback paths are in scope until the route is gone.
Include vendor and SaaS-embedded models. Article 25 places provider obligations on you if you put the system into service under your own name, even if a vendor trained it. A SaaS feature called "AI assistant" inside a procured platform is yours to log.
Audit the inventory quarterly. The provider/deployer/importer distinction shifts as you swap vendors and ship new features. Treat the inventory as a live register, not a one-off.
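
The scoping rules above can be written down as a simple register check. This is a sketch only: the `AgentEntry` fields and the `in_scope` rule are a hypothetical encoding of the list, not a legal classification test, and the example agents are invented.

```python
from dataclasses import dataclass


@dataclass
class AgentEntry:
    name: str
    touches_regulated_decision: bool
    role: str             # "decides", "ranks", "filters" or "enriches"
    status: str           # "live" or "deprecated"
    still_routed: bool    # receives traffic, including shadow or rollback paths
    vendor_supplied: bool


def in_scope(entry):
    """Apply the scoping rules: role and vendor origin do not exempt an agent."""
    if not entry.touches_regulated_decision:
        return False
    # Deprecated models stay in scope until the route is gone.
    return entry.status == "live" or entry.still_routed


inventory = [
    AgentEntry("credit_scorer", True, "decides", "live", True, False),
    AgentEntry("pre_decision_filter", True, "filters", "live", True, False),
    AgentEntry("scorer_v1_shadow", True, "decides", "deprecated", True, False),
    AgentEntry("scorer_v0", True, "decides", "deprecated", False, False),
    AgentEntry("vendor_ai_assistant", True, "enriches", "live", True, True),
    AgentEntry("marketing_copy_bot", False, "decides", "live", True, False),
]

print([e.name for e in inventory if in_scope(e)])
```

Here the pre-decision filter, the shadow-mode model and the vendor-supplied assistant all land in scope, which is exactly where undercounting tends to happen.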

We publish a 23-point version of this scoping exercise as the Article 12 operational checklist — free, no email gate.

Article 12 alongside SOC 2, ISO 42001 and NIST AI RMF.

Article 12 is not the only AI logging regime your firm will be measured against. Most teams will face overlapping demands from several of the regimes in the table below. The good news is that a single receipt format, well designed, can serve all of them, because they ask variations of the same question, framed for different audiences.

Regime	Audience	Core ask	Overlap with Article 12
EU AI Act Art. 12	EU market surveillance authorities	Automatic, tamper-evident, per-action records over the system lifetime.	—
SOC 2 (Trust Services Criteria)	Enterprise procurement, B2B auditors	Operating effectiveness of controls over a period — including change management and processing integrity.	The Article 12 receipt is the strongest "processing integrity" evidence available for AI agents. See Article 12 vs SOC 2.
ISO/IEC 42001:2023	AI management system auditors	A documented AI management system across the lifecycle: risk, data, deployment, monitoring.	The receipt feeds clauses 8.1 (operational planning and control) and 9.1 (monitoring, measurement, analysis and evaluation).
NIST AI RMF 1.0	US federal and federal-aligned buyers	Govern / Map / Measure / Manage — evidence-led continuous risk management.	The receipt is the "Measure" telemetry NIST asks for and that most teams cannot produce.
FCA SYSC / Consumer Duty	UK supervisor (financial services)	Adequate systems & controls, retention of records, fair customer outcomes.	Drives the longer retention horizon (5–7 years) — see retention vs notarisation.

Build the receipt format to satisfy the strictest of these, not the lowest common denominator. Retrofitting later is significantly more expensive than designing once for the union.

The 2 August 2026 timeline.

If you are starting now, a realistic countdown looks like:

Months 1–2 — inventory; classify against Annex III; engage auditor on the format.
Months 3–5 — instrument the agent runtime with capture, hash-chaining and notarisation. Decide build vs buy. The RFC 3161 timestamping decision is upstream of everything else.
Months 6–8 — produce a dry-run evidence pack against synthetic production data. Walk the pack through with your auditor and at least one external counsel.
Months 9–10 — operationalise post-market monitoring under Article 72 (the dashboard piece — alerts, anomaly review, signed notifications).
Months 11–12 — internal go-live; production cut-over with the receipt format frozen at least 30 days before 2 August 2026.

The single biggest scheduling error we see is treating the pack format as a "we'll do it last" deliverable. The pack format is what your auditor signs off, and they have a queue. Get the format frozen first; the SDK instrumentation is more parallelisable.
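
The "frozen at least 30 days before" constraint is easy to turn into a date. A small sketch using the standard library; the function names are hypothetical and the "today" value is only an example.

```python
from datetime import date, timedelta

DEADLINE = date(2026, 8, 2)   # high-risk provisions, per the timeline above
FREEZE_LEAD_DAYS = 30         # receipt format frozen at least this far ahead


def latest_freeze_date(deadline=DEADLINE, lead_days=FREEZE_LEAD_DAYS):
    """Return the last day on which the receipt format can still be frozen."""
    return deadline - timedelta(days=lead_days)


def days_until(target, today):
    """Return the number of days from today to the target date."""
    return (target - today).days


freeze = latest_freeze_date()
print(freeze.isoformat())                      # 2026-07-03
print(days_until(freeze, date(2026, 6, 23)))   # 10
```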

What firms should do now.

If you're a UK firm preparing for the deadline:

Inventory every AI agent in production. Most firms underestimate their count by 2–3x.
Classify each against Annex III. Be honest — claims triage at an insurer or credit decisioning at a lender are in scope.
Assess your current logging against the five operational properties above. Most firms will fail at properties 1, 2, and 5.
Decide build vs buy. The build cost for the SDK alone is two engineering quarters; the pack format catalogue is a permanent commitment because regulations evolve.
Engage your auditor early. The format they accept is the format you need to produce. Get them in the room before you commit to a platform.

Frequently asked questions.

Does EU AI Act Article 12 apply to my AI agent?

Article 12 applies to high-risk AI systems as classified under Article 6 of Regulation (EU) 2024/1689, including the Annex III use cases. For UK and EU fintechs, the most common in-scope uses are creditworthiness scoring, claims triage, certain insurance pricing models, biometric identification, and AI systems used in employment, education or essential-services access. If your agent influences a regulated decision about a natural person, assume it is in scope until classified otherwise.

When is the Article 12 deadline?

High-risk AI provisions of the EU AI Act, including Article 12 record-keeping obligations, become enforceable on 2 August 2026. General-purpose model obligations applied from 2 August 2025. Prohibited-practice provisions (Article 5) applied from 2 February 2025.

What records must Article 12 logs contain?

Article 12 requires automatic recording of events over the lifetime of the system, sufficient to support traceability appropriate to the intended purpose. In practice this means per-action capture of tool invocations, model calls, data classes touched, decision metadata, sub-agent activity, and modification events. Logs must enable identification of risk situations under Article 79 and substantial modifications that may trigger re-conformity assessment.

How long must Article 12 logs be retained?

At minimum six months under Article 19, unless Union or Member State law requires longer. For UK firms regulated by the FCA, parallel SYSC requirements typically extend retention to five to seven years. Sector law in healthcare, payments and insurance frequently extends further. Build for the longest applicable retention, not the AI Act minimum. See retention vs notarisation for the engineering trade-offs.

What is the fine for breaching Article 12?

Up to €15 million or 3% of global annual turnover, whichever is higher, for non-compliance with provider obligations including Article 12. Supplying incorrect or misleading information to authorities can attract up to €7.5 million or 1%. Article 5 prohibitions can attract up to €35 million or 7%. Member States can apply higher amounts in specific circumstances.

Will Datadog, Splunk or our existing logging tools satisfy Article 12?

No. The AI Office has emphasised that records cannot be reconstructed after the fact from disparate sources. Datadog and Splunk capture HTTP-level events but do not capture agent-internal state: which tool was selected, what decision was made, what data class was touched. Compliance automation platforms such as Drata and Vanta address static controls and do not capture runtime activity. See what RFC 3161 timestamping is and why it matters for the cryptographic side of the requirement.
