NIST AI RMF Evidence Requirements: What Govern and Measure Actually Demand

Published 30 June 2026 · KairoNull · 7 min read

The NIST AI Risk Management Framework (AI RMF 1.0) was released in January 2023. It has become the primary reference framework for AI governance in US federal contexts, and it is increasingly cited by international organisations as a cross-jurisdictional benchmark alongside the EU AI Act and ISO/IEC 42001.

The AI RMF is structured around four core functions: Govern, Map, Measure, and Manage. The framework is voluntary, but its adoption is increasingly expected as a baseline for organisations that hold federal contracts, operate under financial regulation, or run critical infrastructure. For organisations operating in both the US and the EU, there is a further consideration: the AI RMF's evidence requirements are substantively aligned with EU AI Act Article 12.

The Four Functions and Their Evidence Demands

Each function in the AI RMF has distinct evidence requirements. The most technically demanding are Govern and Measure:

GOVERN

Policies, accountability structures, and organisational practices for AI risk management. Requires documented evidence that governance policies are actually followed, not just stated. Auditors look for evidence of policy application at the system level, not just policy documents.

MAP

Contextual understanding of AI risk: who is affected, what harms could arise, what regulatory context applies. Requires documentation of the risk identification process and evidence that context-specific risks were actually assessed for each system.

MEASURE

Quantitative and qualitative assessment of AI risks and impacts. Requires documented measurement results that can be traced to specific system states at specific times. Aggregate performance metrics are insufficient on their own; each reported measurement must be traceable to the records it was computed from.

MANAGE

Prioritisation and response to identified risks. Requires documented evidence that risk responses were implemented and their effectiveness assessed. Corrective actions must be traceable to the original risk evidence that triggered them.

What GOVERN 1.7 Specifically Requires

GOVERN 1.7 is the subcategory that most directly addresses AI audit trails. It states that processes should be in place to log and maintain records of AI system design, development, deployment, and decommissioning decisions and activities. The word "log" here is being used in its governance sense, not its technical sense.

In the governance context, logging means creating records that can serve as evidence in an investigation, dispute, or regulatory review. A technical log that can be modified by a database administrator is a record. An immutable, timestamped, hash-chained record is evidence. The AI RMF expects evidence, not records.
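The difference between a record and evidence can be illustrated with a minimal hash-chain sketch. Everything here is an assumption for illustration: the entry layout, the all-zero genesis value, and the function names are hypothetical, and the in-memory list stands in for whatever store is actually used. The idea is only that each entry's SHA-256 hash covers the previous entry's hash, so a silent edit to an earlier entry no longer verifies.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous entry's hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a payload, linking it to the hash of the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def is_intact(chain: list) -> bool:
    """Recompute every link; an edited payload or reordered entry fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


chain = []
append(chain, {"event": "model_deployed", "version": "1.4.0"})
append(chain, {"event": "threshold_changed", "value": 0.7})
print(is_intact(chain))   # True

# The kind of quiet edit an administrator could make to an ordinary log table.
chain[0]["payload"]["version"] = "1.3.9"
print(is_intact(chain))   # False
```

A chain like this is only tamper-evident if the latest hash is also held somewhere the record keeper cannot rewrite, since anyone able to replace every entry could recompute all the hashes.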

Traceability follows from this. Organisations should be able to trace an AI decision back to the specific system version, data inputs, and organisational decisions that produced it. Standard application logs rarely support this, because they typically lack model versioning, input capture, and tamper-evidence.
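As a rough sketch of what a traceable decision record might hold, the dataclass below carries the three things the paragraph above names: system version, data inputs (as a digest), and the organisational decision in force. The field names, the identifier formats, and the policy reference are all hypothetical; the AI RMF does not prescribe a schema.

```python
import hashlib
import json
from dataclasses import dataclass


def digest(obj) -> str:
    """SHA-256 hex digest of a canonical (key-sorted) JSON encoding."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class DecisionRecord:
    decision_id: str
    model_version: str   # exact model build that produced the output
    input_digest: str    # hash of the inputs, so preserved inputs can be matched later
    output: str
    policy_ref: str      # organisational decision or policy in force (hypothetical id)
    captured_at: str     # ISO 8601 UTC capture time


inputs = {"income": 52000, "tenure_months": 18}
record = DecisionRecord(
    decision_id="d-0001",
    model_version="credit-scorer:2.3.1",
    input_digest=digest(inputs),
    output="refer_to_human",
    policy_ref="AI-POL-7 rev 3",
    captured_at="2026-06-30T09:15:00Z",
)

# An auditor holding the preserved inputs can confirm they are the ones recorded,
# regardless of key order.
print(record.input_digest == digest({"tenure_months": 18, "income": 52000}))  # True
```

Storing a digest rather than the raw inputs is a design choice for this sketch; where the inputs themselves must be produced on request, they need to be retained alongside the record.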

MEASURE 2.5: Continuous Monitoring Requirements

MEASURE 2.5 requires that AI systems be monitored for performance and alignment with intended behaviour on a continuous basis, not just at deployment or during scheduled audits. Critically, the monitoring must produce documented, retained evidence of what was measured, when, and under what conditions.

The implication is that evidence accumulation must be always-on from the day an AI system is deployed. An organisation that begins accumulating evidence in response to a regulatory enquiry has already failed the continuous monitoring requirement. Evidence must exist for the entire operational period of the system, not just the period following a triggering event.

What MEASURE 2.5 rejects

Periodic manual audits of system performance
After-the-fact reconstruction of system state
Aggregate performance metrics without decision-level tracing
Dashboard screenshots as evidence of compliance
Vendor attestations that cannot be independently verified
Evidence accumulated only after a problem is identified

What MEASURE 2.5 requires

Continuous automated evidence capture from day one
Decision-level records, not aggregate metrics
Tamper-evident records that cannot be backdated
Independently verifiable evidence with no vendor dependency
Traceable chain from AI decision to monitoring record
Retention across the full operational lifetime of the system
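One way to picture continuous, decision-level capture is a thin wrapper around the prediction call, sketched below. The decorator name, the in-memory list, and the toy scoring rule are assumptions for illustration only. The timestamp comes from the local clock, which is not a trusted RFC 3161 timestamp, and a plain list is not tamper-evident by itself.

```python
import functools
import json
from datetime import datetime, timezone

EVIDENCE_LOG = []  # stand-in for an append-only evidence store


def captured(model_version: str):
    """Wrap a prediction function so every call leaves a decision-level record."""
    def decorator(predict):
        @functools.wraps(predict)
        def wrapper(**features):
            output = predict(**features)
            EVIDENCE_LOG.append({
                "model_version": model_version,
                "inputs": features,
                "output": output,
                # Local UTC clock only; a trusted timestamp would be added separately.
                "captured_at": datetime.now(timezone.utc).isoformat(),
            })
            return output
        return wrapper
    return decorator


@captured(model_version="fraud-model:0.9.2")
def score(amount: float, country: str) -> str:
    # Toy rule standing in for a real model call.
    return "review" if amount > 1000 and country != "GB" else "approve"


print(score(amount=2500.0, country="FR"))   # review
print(score(amount=40.0, country="GB"))     # approve
print(len(EVIDENCE_LOG))                    # 2
print(json.dumps(EVIDENCE_LOG[0]["inputs"], sort_keys=True))
```

Because the wrapper records on every call, there is no separate step that someone has to remember to run, which is the property the list above is asking for. The wrapper in this sketch accepts keyword arguments only, to keep the captured inputs named.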

MEASURE 2.6: Bias and Fairness Evidence

MEASURE 2.6 addresses bias testing and fairness evaluation. It requires that bias assessments be documented with evidence, and that the evidence support the conclusions drawn. An assertion that a model is fair, without documented evidence of what was measured and how, does not satisfy MEASURE 2.6.

The evidence requirement here is particularly demanding because bias assessments are contested. If an organisation claims a model is fair and that claim is challenged, the organisation must be able to produce the specific evidence on which the claim rested. Evidence that cannot be shown to have been captured at the time of the assessment, and that could have been produced retrospectively to support a predetermined conclusion, is unlikely to withstand that challenge.

Tamper-evident, timestamped records of model outputs, captured while the system was in production, are the form of evidence best placed to survive this scrutiny. Post-hoc testing on a preserved model version is a useful supplement but cannot substitute for contemporaneous capture.
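To make "evidence that supports the conclusion" concrete, the sketch below computes one simple fairness measure from decision-level records and stores a SHA-256 digest of exactly the records it was computed from. The metric (difference in selection rates between groups), the `group` field, and the `approve` label are illustrative assumptions; the framework does not prescribe a particular metric, and real assessments usually need several.

```python
import hashlib
import json


def selection_rates(records: list, group_key: str, positive: str) -> dict:
    """Share of positive outcomes per group, from decision-level records."""
    totals, positives = {}, {}
    for r in records:
        g = r[group_key]
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + (1 if r["output"] == positive else 0)
    return {g: positives[g] / totals[g] for g in totals}


def evidence_digest(records: list) -> str:
    """SHA-256 over the exact records the assessment was computed from."""
    data = json.dumps(records, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


records = [
    {"group": "A", "output": "approve"},
    {"group": "A", "output": "approve"},
    {"group": "A", "output": "decline"},
    {"group": "A", "output": "approve"},
    {"group": "B", "output": "approve"},
    {"group": "B", "output": "decline"},
    {"group": "B", "output": "decline"},
    {"group": "B", "output": "approve"},
]

rates = selection_rates(records, group_key="group", positive="approve")
assessment = {
    "metric": "selection_rate_difference",
    "rates": rates,
    "value": max(rates.values()) - min(rates.values()),
    "evidence_sha256": evidence_digest(records),  # binds the result to its inputs
}
print(assessment["rates"])   # {'A': 0.75, 'B': 0.5}
print(assessment["value"])   # 0.25
```

If the claim is later challenged, recomputing the digest over the retained records shows whether the figure was derived from those records or from a different set.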

The FedRAMP and Financial Regulation Intersection

For organisations seeking FedRAMP authorisation or operating under US financial regulation, AI RMF alignment is increasingly a baseline expectation rather than a best-practice aspiration. The OFR, OCC, and FDIC have all issued AI risk management guidance that references or is substantively aligned with the AI RMF.

The SEC's 2023 proposed rules on AI and predictive data analytics for investment advisers, while primarily focused on conflicts of interest, also proposed documentation of how AI systems influence investment recommendations. The CFPB has been explicit that automated decision-making in consumer credit must be supported by documentation sufficient to meet adverse action notice requirements.

In each case, the pattern is the same: regulators are requiring evidence of AI system behaviour at the time of the decision, captured in a form that cannot be altered retroactively. The AI RMF provides the framework. Tamper-evident evidence infrastructure is the implementation.

Aligning EU AI Act Article 12 with NIST AI RMF

Organisations operating across jurisdictions are increasingly looking for evidence infrastructure that satisfies both EU AI Act Article 12 and NIST AI RMF requirements from a single implementation. The technical requirements of the two frameworks are substantively compatible:

Automatic capture (Article 12) maps directly to continuous monitoring evidence (MEASURE 2.5)
Tamper-evident records (Article 12) satisfy immutability requirements (GOVERN 1.7)
RFC 3161 timestamping (Article 12) satisfies timestamp integrity requirements in both frameworks
Independent verifiability (Article 12) addresses the vendor-independent verification expectation in AI RMF
Six-month minimum retention of the Article 12 logs (set in Articles 19 and 26) is a floor; AI RMF expects full operational lifetime coverage

Building to Article 12 technical standards satisfies AI RMF evidence requirements with margin. The cryptographic properties required for EU regulatory admissibility exceed what US frameworks currently mandate, making an Article 12-compliant evidence layer a future-proof investment for cross-jurisdictional organisations.

Practical Implementation: What Must Change

Most organisations looking at AI RMF alignment discover that their gap is not in policy or governance structure but in evidence generation. They have the governance documents. They lack the evidence layer that demonstrates the governance is actually applied to AI system behaviour in production.

The practical changes required are:

Deploy an evidence capture layer that operates at the AI system level, not the application level
Ensure every AI decision is captured with input values, model version, output, and an RFC 3161 timestamp
Chain records with SHA-256 so that any retrospective modification is detectable on verification
Configure retention to cover the full operational lifetime of each AI system, not a fixed rolling window
Implement a verification mechanism that allows auditors to confirm record integrity without vendor access
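The last item in the list above, verification without vendor access, might look like the following sketch. It assumes the ledger is exported as JSON Lines and that the auditor independently holds the latest chain hash (the "anchored head"), for example because that hash was timestamped by a third party. The export format, function names, and messages are hypothetical, and the trusted timestamping step itself is not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "previous hash" for the first record


def link_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def export(records: list) -> str:
    """Producer side: one chained entry per line, as JSON Lines."""
    lines, prev = [], GENESIS
    for record in records:
        h = link_hash(prev, record)
        lines.append(json.dumps({"prev": prev, "record": record, "hash": h}, sort_keys=True))
        prev = h
    return "\n".join(lines)


def verify(exported: str, anchored_head: str) -> str:
    """Auditor side: needs only the export and an independently held head hash."""
    prev = GENESIS
    for number, line in enumerate(exported.splitlines(), start=1):
        entry = json.loads(line)
        if entry["prev"] != prev or entry["hash"] != link_hash(prev, entry["record"]):
            return f"broken link at line {number}"
        prev = entry["hash"]
    if prev != anchored_head:
        return "head does not match anchored value"
    return "ok"


decisions = [
    {"id": "d-1", "model_version": "2.3.1", "inputs": {"x": 1}, "output": "approve"},
    {"id": "d-2", "model_version": "2.3.1", "inputs": {"x": 9}, "output": "decline"},
    {"id": "d-3", "model_version": "2.3.2", "inputs": {"x": 4}, "output": "approve"},
]
ledger = export(decisions)
anchored_head = json.loads(ledger.splitlines()[-1])["hash"]
print(verify(ledger, anchored_head))                  # ok

# Editing one past outcome in place breaks the chain at that line.
edited = ledger.replace('"decline"', '"approve"')
print(verify(edited, anchored_head))                  # broken link at line 2

# Rebuilding the whole chain is internally consistent but changes the head.
rewritten = [dict(d) for d in decisions]
rewritten[1]["output"] = "approve"
print(verify(export(rewritten), anchored_head))       # head does not match anchored value
```

The check uses nothing but SHA-256 and the exported file, so an auditor can run it without access to the system that produced the ledger. The anchored head is what catches a full rewrite or a truncated export, which link-by-link checking alone would miss.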

None of these requirements necessitate changes to the AI systems themselves. They require an evidence infrastructure layer that observes, timestamps, and chains AI decisions. The AI system continues to operate unchanged. The evidence layer operates in parallel.

NIST AI RMF-aligned evidence infrastructure, deployed in 90 days

KairoNull's Umbra Trust Protocol deploys as an evidence layer above your existing AI systems. No model changes. No retraining. Every AI decision captured, timestamped with RFC 3161, and chained with SHA-256 into a tamper-evident ledger that satisfies both NIST AI RMF and EU AI Act Article 12 from a single implementation.
