Most organisations treat AI deployment as the finish line. They spend months evaluating vendors, negotiating contracts, running pilots, and managing change. The model goes live. The project is declared a success. The project team moves on. And then, six months later, something goes wrong — and no one is watching.
The something that goes wrong is usually not dramatic. It is not a catastrophic failure. It is a slow drift: the model's accuracy degrades as the real-world data it encounters diverges from the data it was trained on. The false positive rate climbs. The escalation rate — the proportion of cases the AI cannot handle and routes to a human — increases. The cost per query rises. No one notices because no one is measuring, and no one is measuring because no one scheduled the meeting.
The AI operating meeting is the meeting that catches these things. It is not a strategy meeting. It is not a vendor review. It is not a board presentation. It is a sixty-minute operational review of the AI systems your organisation has deployed, conducted by the people who operate them, at a regular cadence, with the authority to act on what they find.
This is how to run it.
Clearing the conceptual ground.
Before describing what the meeting is, it is worth being precise about what it is not — because the most common failure mode is conflating it with a different kind of meeting.
The AI operating meeting is not a strategy meeting. Strategy meetings ask: should we use AI? What AI should we invest in? What is our AI roadmap? Those are important questions. They belong in a different meeting, with a different attendee list, on a different cadence. The operating meeting asks: how is the AI we have deployed performing today? That is a different question, and it requires different data and different decision rights.
The AI operating meeting is not a vendor review. Vendor reviews ask: is the vendor meeting its contractual obligations? What is the roadmap for the product? Those questions belong in a quarterly business review with the vendor. The operating meeting is an internal review of your organisation's use of the system — which includes vendor-provided systems, but is not limited to them, and is not the vendor's meeting to run.
The AI operating meeting is not a governance committee. The governance committee (to the extent it should exist at all — see our previous essay on this topic) handles escalated decisions and policy questions. The operating meeting handles operational data and operational decisions. The two are connected — the operating meeting is the escalation path to the governance committee — but they are not the same.
The AI operating meeting is an operational review. It is the AI equivalent of the weekly sales pipeline review, the monthly financial close meeting, or the daily operations stand-up. It is the meeting where the people responsible for the AI systems look at the data, identify what is working and what is not, and make the operational decisions required to keep the systems performing.
Five roles, not five departments.
The meeting should have five roles represented. Not five departments — five roles. The distinction matters because in a mid-market organisation, one person may hold multiple roles, and the meeting should not be larger than it needs to be.
The AI program owner is the executive sponsor — the person who is accountable for the AI program at the organisational level. They chair the meeting. They have the authority to escalate to the governance committee or to the board. In a two-hundred-person company, this is typically the COO or the CTO.
The named system owner (or owners, if multiple systems are being reviewed) is the person who is operationally responsible for a specific AI system. They bring the performance data. They know what changed since the last meeting. They own the action items that come out of the review.
The risk or compliance lead is the person who tracks the regulatory and policy environment. They flag when a system's behaviour may be creating compliance exposure. They own the governance actions section of the agenda.
The technical lead is the person who understands the model: how it works, what its known failure modes are, and what a proposed change to the system would involve. They interpret the metrics and explain anomalies.
A recorder takes the minutes and tracks action items. This is not a junior role — the minutes are the governance record of the meeting, and they need to be accurate and complete.
Segment by segment.
The agenda is structured in five segments. The times are not suggestions — they are constraints. A meeting that runs over is a meeting that will not be repeated.
Minutes 0–5: Housekeeping and context. The chair opens the meeting, confirms the agenda, and notes any significant external context — a regulatory development, a vendor announcement, an industry incident — that is relevant to the review. This segment is not a discussion. It is a briefing.
Minutes 5–20: Performance metrics review. The named system owners present the performance data for each system under review. The data is pre-circulated — the meeting is not the place to read the numbers for the first time. The discussion focuses on anomalies: what changed since the last meeting, what is trending in the wrong direction, and what requires action. The technical lead interprets anomalies. The risk lead flags any metrics that are approaching policy thresholds.
Minutes 20–35: Incident review. Any AI-related incidents since the last meeting are reviewed. An incident is any event in which an AI system produced an output that was incorrect, unexpected, or harmful — including near-misses that were caught before they caused harm. The review covers: what happened, what the root cause was, what the immediate response was, and what the systemic fix is. The blameless post-mortem format applies: the goal is learning, not accountability.
Minutes 35–50: Pipeline review. Any changes to AI systems that are planned or in progress are reviewed. This includes model updates, data pipeline changes, new deployments, and retirements. The risk lead assesses whether any planned changes require governance committee review. The technical lead confirms readiness. The program owner approves or defers.
Minutes 50–60: Governance actions and close. The risk lead presents any governance actions — policy updates, regulatory developments, documentation requirements — that need to be addressed. Action items from the meeting are assigned with named owners and deadlines. The next meeting is confirmed.
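The time boundaries hold better when they are written down as structure rather than remembered. A minimal sketch of the agenda encoded as data, with a check that the segments are contiguous and total exactly sixty minutes; the lead-role labels and the validation are illustrative, not prescriptive:

```python
# The sixty-minute agenda as data. Segment names and times come from
# the text above; the lead-role labels and the check are illustrative.

AGENDA = [
    # (start_minute, end_minute, segment, lead_role)
    (0, 5, "Housekeeping and context", "program owner"),
    (5, 20, "Performance metrics review", "system owners"),
    (20, 35, "Incident review", "system owners"),
    (35, 50, "Pipeline review", "technical lead"),
    (50, 60, "Governance actions and close", "risk lead"),
]

def check_agenda(agenda, limit=60):
    """Confirm segments are contiguous and total exactly the limit."""
    cursor = 0
    for start, end, name, _lead in agenda:
        assert start == cursor and end > start, f"broken segment: {name}"
        cursor = end
    assert cursor == limit, "the agenda must total exactly sixty minutes"

check_agenda(AGENDA)
```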
The sixty-minute constraint is not arbitrary. It is the constraint that keeps the meeting operational. When meetings run long, they become strategy meetings. When they become strategy meetings, the operational data stops being reviewed. When the operational data stops being reviewed, the problems accumulate. — Field note, operating cadence review
What to put on the dashboard.
The performance metrics review is only as good as the metrics being reviewed. The wrong metrics — or too many metrics — produce a meeting that generates noise without signal. The right metrics are the ones that tell you, quickly, whether each system is performing within acceptable parameters.
For most AI systems in mid-market operations, the core metrics are four.
Accuracy drift measures whether the model's predictions are becoming less accurate over time. All models drift — the question is how fast and whether the drift is within acceptable bounds. Accuracy drift is typically measured against a held-out validation set that is updated periodically to reflect current real-world conditions. A drift of more than five percentage points from baseline is a trigger for review.
False positive rate measures the proportion of actual negatives that the model incorrectly flags as positive. In a customer service triage tool, a false positive is a case routed to a human that the AI could have handled. In a fraud detection model, a false positive is a legitimate transaction flagged as fraudulent. The acceptable false positive rate depends on the cost of the error, which is why it needs to be defined before deployment, not after.
Escalation rate measures the proportion of cases that the AI cannot handle and routes to a human. A rising escalation rate is an early indicator of model drift or data distribution shift — the model is encountering inputs it was not trained to handle. It is also a direct operational cost: escalated cases require human time.
Cost per query measures the total operational cost of running the AI system divided by the number of queries it processes. This metric catches infrastructure cost creep — as models are updated or as query volume changes, the cost per query can drift significantly from the original business case.
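To make the four metrics concrete, here is a minimal sketch of how they might be computed from a system's logged case outcomes. The record fields and the sample data are assumptions for illustration; the five-percentage-point drift trigger is the one named above.

```python
# The four core metrics from logged case outcomes. Field names and
# the sample records are illustrative assumptions.

def accuracy(records):
    """Share of predictions that matched the verified outcome."""
    return sum(r["prediction"] == r["actual"] for r in records) / len(records)

def false_positive_rate(records):
    """Share of actual negatives the model flagged as positive."""
    negatives = [r for r in records if not r["actual"]]
    return sum(r["prediction"] for r in negatives) / len(negatives)

def escalation_rate(records):
    """Share of cases the system routed to a human."""
    return sum(r["escalated"] for r in records) / len(records)

def cost_per_query(total_operating_cost, query_count):
    """Total operational cost divided by queries processed."""
    return total_operating_cost / query_count

def drift_exceeds_trigger(baseline_accuracy, current_accuracy):
    """The five-percentage-point review trigger from the text."""
    return (baseline_accuracy - current_accuracy) > 0.05

validation = [  # held-out cases with verified outcomes (sample data)
    {"prediction": True, "actual": True, "escalated": False},
    {"prediction": True, "actual": False, "escalated": False},
    {"prediction": False, "actual": False, "escalated": True},
    {"prediction": False, "actual": True, "escalated": False},
]
print(accuracy(validation))               # 0.5
print(false_positive_rate(validation))    # 0.5
print(escalation_rate(validation))        # 0.25
print(drift_exceeds_trigger(0.91, 0.84))  # True: seven points below baseline
```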
How often is often enough?
The right cadence depends on the risk tier of the systems being reviewed and the maturity of the governance program.
For newly deployed systems — systems in their first ninety days of production — the cadence should be weekly. New systems are most likely to encounter distribution shift, edge cases, and integration issues in the first ninety days. Weekly review catches these problems before they compound.
For High-risk systems in stable operation (systems that influence decisions about people, or that are subject to regulatory oversight), the cadence should be every two weeks. High-risk systems warrant more frequent review even when they are performing within parameters, because the consequences of undetected drift are more severe.
For Medium and Low-risk systems in stable operation, monthly review is appropriate. The monthly review should still use the same agenda structure — it is a shorter meeting, not a different meeting.
The cadence should be adjusted upward — more frequent — when a system is undergoing significant change, when a relevant regulatory development has occurred, or when a metric has crossed a threshold that requires closer monitoring. It should not be adjusted downward below monthly for any system in production.
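These rules are simple enough to encode, which is one way to keep them from being quietly relaxed. A minimal sketch, assuming a three-tier risk labelling; the function signature and return unit are illustrative:

```python
# The cadence rules as one function. The tiers and the ninety-day
# window come from the text; the signature is illustrative.

def review_interval_days(risk_tier, days_in_production,
                         significant_change=False,
                         threshold_crossed=False):
    """Days between operating reviews for one production system."""
    if days_in_production < 90:
        return 7    # newly deployed: weekly
    if significant_change or threshold_crossed:
        return 7    # adjusted upward while the condition persists
    if risk_tier == "high":
        return 14   # stable high-risk: every two weeks
    return 30       # stable medium/low-risk: monthly, never less often
```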
The governance record.
The minute from the AI operating meeting is not a transcript. It is a structured record of: the systems reviewed, the metrics discussed, the incidents reviewed, the pipeline changes approved or deferred, the governance actions assigned, and the action items with named owners and deadlines.
The minute should be one page. It should be distributed to the attendees and to the board observer within twenty-four hours of the meeting. It should be stored in the AI registry alongside the registry entries for the systems reviewed.
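The fields are few enough to define once and reuse. A minimal sketch of the minute as a structured record, mirroring the fields named above; the types and class names are illustrative:

```python
# The one-page minute as a structured record. Types and class names
# are illustrative assumptions.

from dataclasses import dataclass, field
from datetime import date

@dataclass
class ActionItem:
    description: str
    owner: str    # a named person, not a team
    due: date

@dataclass
class OperatingMinute:
    meeting_date: date
    systems_reviewed: list[str]
    metrics_discussed: list[str]
    incidents_reviewed: list[str]
    pipeline_changes: list[str]    # each noted as approved or deferred
    governance_actions: list[str]
    action_items: list[ActionItem] = field(default_factory=list)
```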
Read across a year, the stack of minutes is the organisation's actual governance record — not the policy document, not the committee charter, but the record of what was actually reviewed, what was actually decided, and what was actually done. This is the record that survives a regulatory inquiry. It is also the record that allows the organisation to learn from its own history.
The most common failure mode in operating cadences: the meeting runs, decisions are made, action items are assigned verbally — and then no one writes the minute. Three weeks later, no one can remember what was decided. The governance record does not exist.
A close second: the minute is written, but it sits in the recorder's personal drive. The board observer never sees it. The risk lead does not have access to it. When the regulator asks for governance records, the minute cannot be found.
The first meeting.
If your organisation has not run an AI operating meeting before, the first meeting is different from subsequent meetings. The first meeting is a baseline exercise: you are not reviewing changes since the last meeting, because there was no last meeting. You are establishing the baseline.
The first meeting agenda: spend the first twenty minutes reviewing the AI registry — if you have one — and confirming that it is current. If you do not have a registry, spend the first twenty minutes building a draft. Spend the next twenty minutes establishing the baseline metrics for each system: what are the current accuracy, false positive rate, escalation rate, and cost per query? These numbers, recorded in the minute, become the baseline against which future meetings measure drift. Spend the final twenty minutes establishing the governance actions: what documentation is missing, what reviews are overdue, what changes are pending?
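In practice the baseline is just a dated snapshot of the four metrics per system. A minimal sketch with placeholder values; the system name, date, and numbers are illustrative:

```python
# One baseline entry recorded at a first meeting. All values here
# are placeholders, not real data.

baseline = {
    "system": "customer-service-triage",  # illustrative name
    "recorded": "2026-01-15",             # date of the first meeting
    "accuracy": 0.91,
    "false_positive_rate": 0.06,
    "escalation_rate": 0.12,
    "cost_per_query_usd": 0.04,
}
# Future meetings measure against this entry: an accuracy fall of more
# than five percentage points from 0.91 triggers a review.
```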
The first meeting will not be clean. The data will be incomplete. The metrics will not all be available. That is expected. The first meeting is not the destination; it is the starting line. By the third meeting, the data will be more complete. By the sixth, the meeting will run to time and produce a clean minute. By the twelfth, the operating cadence will be the most reliable governance instrument your organisation has.
That is the whole argument for the AI operating meeting. Not that it is sophisticated; it is not. Not that it requires significant investment; it does not. But that it is the mechanism that separates the organisations that operate AI well from the organisations that do not. The difference is not the technology. It is the discipline of looking at the data, every week, with the people who can act on it.
Sixty minutes. Five people. Every week. That is the operating model.
