The call arrives on a Thursday afternoon. A federal investigator — from the Office of Inspector General, the SEC, or a state health department — has questions about an AI system your organisation uses to make decisions. They want documentation. They want it by Monday morning. You have forty-eight hours.
What happens next depends entirely on work you either did or did not do before that call arrived. In our experience, most mid-market operators have not done it. Not because they are careless — most are thoughtful, well-run organisations — but because no one told them the documentation they needed was different from the documentation they already had.
This essay is about that gap. It is about what regulators actually ask for, why the standard enterprise document stack does not answer those questions, and what the forty-eight hours look like when you are prepared versus when you are not.
What triggers the call.
Federal AI inquiries do not arrive from nowhere. They are triggered by something: a complaint from a patient or customer, a whistleblower report, a pattern flagged in routine data analysis, or — increasingly — a proactive sweep by a regulator that has decided to examine how a class of organisations is using a specific type of AI system.
The Office of Inspector General has been explicit about its interest in AI. Its 2024 work plan included examination of AI tools used in Medicare and Medicaid billing, prior authorisation, and clinical decision support. The SEC, following its March 2024 enforcement actions against Delphia (USA) Inc. and Global Predictions Inc. — two investment advisers charged with making false and misleading statements about their AI capabilities — has signalled that it considers AI disclosure a material obligation, not a marketing question. State health departments are increasingly auditing AI-assisted prior authorisation tools following a wave of denial-rate complaints.
The common thread is this: the regulator is not asking whether you used AI. It is asking whether you knew what your AI was doing, whether you documented it, and whether you were honest about it.
The document list that surprises most operators.
When a federal investigator asks for AI documentation, they are not asking for your IT security policy or your data retention schedule. They are asking for a specific set of artefacts that most mid-market organisations do not maintain as a matter of course. Based on our work with clients who have navigated these inquiries, the request typically includes some version of the following.
A system inventory. A complete list of every AI system in production — not just the ones you know about, but every automated decision-making tool, every vendor-provided model, every internally built algorithm that influences a regulated outcome. Most organisations cannot produce this list in forty-eight hours because they have never assembled it.
Model documentation. For each system on the inventory, investigators want to understand what the model does, what data it was trained on, what its known limitations are, and what human oversight exists. This is the substance of what the AI governance community calls a "model card" — a structured summary of a model's purpose, performance, and risks.
Audit trails. A record of what the system decided, when, and on what basis. For AI systems that influence billing, clinical decisions, or financial advice, the absence of decision-level logging is itself a finding. Investigators cannot reconstruct what happened if the system did not record it. A minimal sketch of what a model card and a decision-level record can look like follows this list.
Policy and training records. Evidence that the organisation had a policy governing AI use, that relevant staff were trained on it, and that the policy was reviewed at a cadence that kept pace with the technology.
Incident records. Documentation of any known failures, errors, or unexpected outputs — and evidence of how those incidents were handled.
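None of this demands specialised tooling. As a purely illustrative sketch, assuming a Python-based service and field names of our own choosing rather than any mandated schema, a model card and a decision-level audit record can be as small as two structured records and an append-only log:

```python
# Illustrative only: field names and file paths are assumptions, not a regulatory schema.
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
import json

@dataclass
class ModelCard:
    """Structured summary of one system: purpose, data, limits, oversight."""
    system_name: str
    purpose: str                  # the decision the model informs
    training_data: str            # provenance of the data it was trained on
    known_limitations: list[str]  # e.g. inputs or populations it handles poorly
    human_oversight: str          # who reviews outputs, and when
    owner: str                    # the person accountable for the system
    last_reviewed: str            # ISO date of the last governance review

@dataclass
class DecisionRecord:
    """One row of the decision-level audit trail."""
    system_name: str
    subject_id: str               # the claim, account, or case the decision concerns
    model_version: str
    output: str                   # what the system decided or recommended
    basis: str                    # the inputs or score the output rested on
    human_reviewer: str | None    # None if no person was in the loop
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def log_decision(record: DecisionRecord, path: str = "decision_log.jsonl") -> None:
    """Append one decision as a JSON line, so the trail can be exported on request."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")
```

The format matters far less than the habit: every decision writes a row, and the Monday-morning export becomes a file copy rather than a forensic exercise.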
The most common finding in our pre-audit reviews: the organisation cannot produce a complete list of AI systems in production. Vendors have been onboarded by individual departments. Integrations have accumulated. The left hand does not know what the right hand deployed. The same reviews usually surface three related gaps.
The organisation has a data sheet for the AI tool it purchased from a major vendor. It has nothing for the three internally built automation tools, the two vendor integrations added by the operations team, or the predictive model the finance department built in Python eighteen months ago.
The system makes decisions. The decisions are not logged at the decision level — only at the transaction level. Reconstructing what the model decided, and why, requires a forensic exercise that takes weeks, not hours.
The AI use policy was written, signed, and filed. No one has run a tabletop exercise. No one has tested whether the policy's requirements are actually being met in production. The policy describes an operating model that does not exist.
Two versions of the same morning.
The difference between a prepared organisation and an unprepared one is not visible until the call arrives. Then it becomes very visible, very quickly.
In the unprepared version, the forty-eight hours look like this. The general counsel calls the CTO. The CTO calls the head of data. The head of data calls the vendor. The vendor's support team is in a different time zone. The inventory does not exist, so someone starts building one from memory. The model documentation does not exist, so someone starts writing it retrospectively — which is not documentation but reconstruction, and investigators know the difference. By Monday morning, the organisation has produced a partial response that raises more questions than it answers.
In the prepared version, the forty-eight hours look like this. The general counsel opens a shared drive. The AI registry is there — current as of last Friday, when the weekly governance meeting updated it. The model cards are there, one per system. The audit trail export takes four hours, not four days, because the logging infrastructure was built to produce it. The incident log is there. The policy and training records are there. The response is complete, accurate, and produced without panic.
The difference between these two versions is not a matter of scale or budget. We have seen the prepared version at organisations with forty employees. We have seen the unprepared version at organisations with four thousand. The variable is discipline, not size.
The forty-eight-hour window is not a test of your technology. It is a test of your governance. The technology is the easy part. The governance is the work. — Field note, Q1 audit review
Three frameworks that are already in force.
The instinct to defer AI governance investment until the regulatory picture "clarifies" is understandable. The regulatory picture is genuinely complex. But three frameworks are already in force and already being used as the basis for enforcement.
The NIST AI Risk Management Framework, published in January 2023, is voluntary but is increasingly referenced by federal agencies as the standard of care for responsible AI use. Its four functions — Govern, Map, Measure, and Manage — map directly onto the documentation that investigators request. An organisation that has implemented the NIST AI RMF has, by definition, most of what it needs to respond to a federal inquiry.
The EU AI Act, which entered into force in August 2024, is not directly applicable to US domestic operations — but it is applicable to any organisation operating in or selling into the EU, and its documentation requirements for high-risk AI systems are becoming the de facto global standard. Organisations that build their governance programmes to EU AI Act standards will find themselves over-prepared for US regulatory inquiries, not under-prepared.
ISO/IEC 42001:2023, the international standard for AI management systems, provides a structured framework for the kind of continuous governance that regulators expect. It is not a compliance checkbox — it is an operating model. Organisations that have implemented it have a defensible position in any regulatory inquiry.
The underwriter is asking the same questions.
For many mid-market operators, a parallel pressure is arriving faster than federal regulation: cyber insurance underwriting. Insurers have begun asking, in renewal questionnaires, whether organisations maintain an AI system inventory, whether they have model documentation, and whether they have an AI incident response plan.
The organisations that cannot answer these questions are finding their coverage limited, their premiums elevated, or their applications declined. The logic is identical to the regulatory logic: an organisation that cannot tell you what AI systems it operates cannot credibly represent the risk profile of those systems.
This is not a future concern. It is a current one. We have worked with clients who discovered, at renewal, that their existing cyber policy excluded AI-related incidents because they had not disclosed AI system use at the time of underwriting. The disclosure gap is a coverage gap.
Four artefacts, one cadence.
The good news is that the documentation regulators and underwriters request is not exotic. It is the documentation that good governance produces as a byproduct. The question is not whether to produce it — it is whether to produce it before or after the call arrives.
The four artefacts that close the gap are: an AI system registry, model cards for each system, decision-level audit logging, and an incident log. None of these requires a large team or a sophisticated platform. We have seen them maintained in a Google Sheet, a Notion workspace, and a shared drive. The form does not matter. The discipline does.
The cadence that keeps them current is a weekly governance meeting — forty-five minutes, four people, one agenda item: what changed in the registry this week, and does anything require action? The output is a one-page minute. The minute is the audit trail for the governance process itself.
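To make the cadence concrete, here is a sketch of a check that could run before each weekly meeting. It assumes the registry is exported as a CSV; the file name, the column names, and the fourteen-day staleness threshold are illustrative assumptions, not a standard.

```python
# Illustrative sketch: registry.csv, its columns, and the staleness window are assumptions.
import csv
from datetime import date, timedelta

REQUIRED_COLUMNS = {"system_name", "owner", "vendor_or_internal",
                    "decision_influenced", "risk_tier", "last_reviewed"}
STALE_AFTER = timedelta(days=14)  # roughly two missed weekly reviews

def check_registry(path: str = "registry.csv") -> list[str]:
    """Return the findings to raise at the weekly governance meeting."""
    findings: list[str] = []
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            findings.append(f"registry is missing columns: {sorted(missing)}")
        for row in reader:
            name = row.get("system_name") or "?"
            blanks = [c for c in REQUIRED_COLUMNS - missing
                      if not (row.get(c) or "").strip()]
            if blanks:
                findings.append(f"{name}: blank fields {sorted(blanks)}")
            reviewed = (row.get("last_reviewed") or "").strip()
            # Assumes ISO dates, e.g. 2025-01-31.
            if reviewed and date.today() - date.fromisoformat(reviewed) > STALE_AFTER:
                findings.append(f"{name}: last reviewed {reviewed}")
    return findings

if __name__ == "__main__":
    for finding in check_registry():
        print(finding)
```

The output is a short list of findings for the meeting. Whether anyone acts on them is the governance; the script is only the easy part.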
An organisation that has these four artefacts and this cadence can respond to a federal inquiry in forty-eight hours. An organisation that does not will spend the forty-eight hours building, retrospectively, what it should have built prospectively — and the retrospective version will not be as good, and investigators will know it.
The inventory is the starting line.
If your organisation has not started, the starting line is the inventory. Put four people in a room — a legal or compliance lead, an IT or data lead, an operations lead, and a recorder — for two hours. List, by name, every AI system your organisation uses to make or influence a decision. Include vendor-provided tools. Include internally built tools. Include the automation the operations team built in a spreadsheet.
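A worked example of what the session's output can look like, using the same illustrative column names as the check sketched earlier; every name and value here is a placeholder, and "unknown" is an acceptable entry at this stage:

```python
# Illustrative only: system names, columns, and the file name are placeholders.
import csv
from datetime import date

# Names captured in the two-hour session; gaps are deliberately left as "unknown".
version_zero = [
    {"system_name": "vendor_prior_auth_tool", "owner": "operations",
     "vendor_or_internal": "vendor", "decision_influenced": "prior authorisation",
     "risk_tier": "unknown", "last_reviewed": date.today().isoformat()},
    {"system_name": "finance_predictive_model", "owner": "finance",
     "vendor_or_internal": "internal", "decision_influenced": "unknown",
     "risk_tier": "unknown", "last_reviewed": date.today().isoformat()},
]

with open("registry.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=list(version_zero[0].keys()))
    writer.writeheader()
    writer.writerows(version_zero)
```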
The list will be incomplete. That is correct. Version zero of the inventory is not the destination — it is the starting line. By the third week of weekly updates, it will be more accurate than anything you could have produced in a single session. By the sixth week, you will have model cards for the highest-risk systems. By the twelfth week, you will have a governance position that can survive a Monday morning request.
The forty-eight hours are coming. The question is whether you will spend them responding or scrambling.
