In regulated financial crime work, “good enough” AI is rarely good enough - you need decisions you can defend.
Dr Janet Bastiman is Chief Data Scientist at Napier AI, where she leads data science work supporting anti-money laundering and financial crime compliance.
She focuses on translating complex models into outcomes teams can audit, explain, and improve.
Regulation is pushing AI out of the “black box” era, and the practical response involves explainable workflows, proportionate oversight, and an evidence trail that stands up under challenge.
You can watch this video on YouTube or listen to the interview on Spotify.
Why Regulated AI Has Become a Board-Level Topic
Financial crime teams adopted machine learning early because transaction monitoring at scale is beyond what human teams can handle alone.
What's now changed is that models play a greater role in shaping who gets investigated, who gets delayed, and who gets treated as higher risk.
Regulation Cares About Outcomes, Not Just Models
The EU AI Act sets higher expectations where AI can materially affect people’s lives.
In the UK, the government’s pro-innovation approach still expects sector regulators to enforce accountability, documentation, and control.
That matters because “AI in compliance” is high stakes.
A wrong decision can mean missed criminal activity, or it can mean the wrong person is treated as suspicious.
Global standard-setters are also watching the threat landscape shift, including AI-enabled tactics such as deepfakes, which can complicate identity and verification controls.
The FATF’s horizon scan on AI and deepfakes is a useful reminder that model governance and fraud resilience are converging problems.
Model Risk Management Is Spreading Beyond Traditional Use Cases
Organisations whose models influence important areas of people's lives need clear ownership, validation, ongoing monitoring, and the ability to challenge model outputs.
Those disciplines are increasingly relevant when the “model” is a workflow that mixes rules, classifiers, and generative components.
Explainability That Helps a Financial Crime Analyst
“Explainability” becomes real when a case is reviewed and written up.
An analyst needs clarity they can use and evidence they can verify. An auditor may need to revisit that reasoning a year down the line.
Start With the Question, Not the Shiniest Model
Janet says explainability largely starts with matching the tool to the task.
"Not every DIY task in your house needs a power drill. It's definitely a case of 'the right tool for the job'."
Dr Janet Bastiman, Chief Data Scientist, Napier AI
LLMs are strong at summarising unstructured information, but they are not universal decision engines.
If the goal is “review or discount”, as in many AML workflows, you need a model designed for classification and engineered to show why it reached its conclusion.
Make Explanations Evidence-Led
When a model reduces false positives, the success metric is not just “fewer alerts”.
The real question is whether reviewers can trust the reasons for discounting a potential issue, especially when those decisions may later be challenged.
Janet says the practical test has to be: would you accept this explanation from a human analyst?
In AML settings, explainability usually needs four things:
How confident the system is - and when it is uncertain.
The key evidence points that drove the decision.
Links back to source records that can be checked quickly.
Plain language that fits into an audit narrative.
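As a rough illustration (the structure and field names below are assumptions for this article, not a description of any particular vendor's schema), a decision record that covers those four points might look something like this:

```python
from dataclasses import dataclass, field

@dataclass
class EvidencePoint:
    """One piece of evidence behind a decision, linked back to its source record."""
    description: str       # plain-language statement, e.g. "counterparty appears on a sanctions list"
    source_record_id: str  # hypothetical pointer back to the system of record, so it can be checked quickly
    weight: float          # relative contribution to the decision

@dataclass
class ExplainedDecision:
    """Minimal, audit-friendly record of a 'review or discount' decision."""
    alert_id: str
    outcome: str       # "review" or "discount"
    confidence: float  # how confident the system is, 0.0-1.0
    uncertain: bool    # flagged explicitly when confidence falls below a threshold
    narrative: str     # plain-language rationale that fits an audit write-up
    evidence: list[EvidencePoint] = field(default_factory=list)  # the key evidence points that drove the decision
```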
This lines up with the ICO guidance on explaining decisions made with AI, which emphasises transparency and accountability in ways people can act on.
Watch for “Models in a Trench Coat”
Teams are exploring agentic workflows: multiple models chained together, sometimes with automated actions attached.
Janet jokes that some of these systems may actually be “lots of different AI models sort of in a trench coat”, and some of them may be what we would think of as agentic.
But whether it’s one or the other, when a decision is produced by a chain, explainability must cover the chain, not just the final answer.
Oversight That Holds Up When Volumes Are High
Human oversight is essential, but simply stating “human in the loop” is not a control.
The control is whether review, escalation, and challenge still work when the queue is busy.
Scale Review Like a Regulated Process
Janet draws a direct parallel between model use and how institutions already manage human decision-making.
Most firms have second-line and third-line reviews for high-impact work.
AI-assisted work should follow the same pattern, with controls that are proportionate to risk.
"If you failed your driving test, you didn't hallucinate all of the bits you got wrong - you got them wrong."
Dr Janet Bastiman, Chief Data Scientist, Napier AI
Calling errors “hallucinations” can make them sound mystical and unmanageable, when most failures are plain errors: wrong evidence, wrong reasoning, or wrong context.
And they should be addressed as such. A workable oversight design often includes:
Tiered review: sample low-risk decisions, and require mandatory review for high-impact outcomes.
Rotating case sampling to detect drift and bias early.
Clear override routes, so analysts can challenge a model and record why.
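A deliberately simplified sketch of what that routing could look like in practice (the thresholds, tiers, and field names are illustrative assumptions, not a prescribed control framework):

```python
import random

# Hypothetical thresholds - proportionate to risk and tuned per institution
HIGH_IMPACT_THRESHOLD = 0.8   # e.g. a predicted customer-impact or exposure score
LOW_RISK_SAMPLE_RATE = 0.05   # sample 5% of low-risk decisions for rotating human review

def route_for_review(decision: dict) -> str:
    """Decide which oversight tier a model decision goes to."""
    if decision["impact_score"] >= HIGH_IMPACT_THRESHOLD:
        return "mandatory_human_review"          # high-impact outcomes are always reviewed
    if decision.get("analyst_override_requested"):
        return "escalate_with_override_record"   # analyst challenge is recorded, not lost
    if random.random() < LOW_RISK_SAMPLE_RATE:
        return "sampled_review"                  # rotating sample to catch drift and bias early
    return "auto_accept_with_logging"
```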
The Defensible Decision Trail: What Good Looks Like
If regulators ask “why did you do that?”, an AI programme succeeds or fails on documentation as much as accuracy.
The goal is to reconstruct a decision and show it was reasonable, controlled, and monitored.
Audit the Pipeline, Not Only the Outcome
Janet describes starting with an implementation audit: data in, outputs out, and how results were scored and reviewed.
In production, organisations should track which model version ran, what it was trained on, and how performance changed each time it retrained.
For complex systems, add traceability across the chain: at a given timestamp, which model IDs ran, on which data, and in what order.
That traceability is also what allows you to respond logically and proportionately when something goes wrong.
If bad input data poisons a retraining cycle, you need to isolate impact, roll back, and remediate quickly.
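A minimal sketch of the kind of record that makes this possible (the fields are illustrative assumptions, not a specific product's logging schema):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ChainStep:
    """One model invocation within a chained (or 'agentic') workflow."""
    model_id: str               # which model ran
    model_version: str          # the exact version that ran
    training_data_version: str  # what it was trained on at the time
    input_ref: str              # pointer to the input records used
    output_ref: str             # pointer to the stored output

@dataclass
class DecisionTrail:
    """Enough context to replay a decision, or roll back after a bad retraining cycle."""
    case_id: str
    timestamp: datetime
    steps: list[ChainStep]      # ordered: which models ran, on which data, in what sequence
    final_outcome: str
    review_actions: list[str]   # human review, escalation, and override notes
```

The point is not the exact schema; it is that every step in the chain is pinned to a version, a timestamp, and the data it touched, so a poisoned retraining cycle can be isolated and rolled back rather than guessed at.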
What To Ask Before You Scale AI in Financial Crime
Defensibility improves when the questions are embedded in governance early.
Can we explain the decision? Not how the model works, but why this case moved in this direction.
Can we prove the evidence? Explanations should link back to source records, not invented references.
Could we replay it? Model IDs, timestamps, training data, and performance logs should make this possible.
The simplest test is this: if you had to defend the decision to a regulator tomorrow, would you be comfortable with the record you have today?
FAQs
What Does “Explainable AI” Mean in AML?
It means showing confidence, the key drivers behind a decision, and links back to evidence so investigators can verify and write up the rationale.
Is a Large Language Model Enough for Financial Crime Decisioning?
LLMs can help with summarising and triage, but review or discount decisions usually need models designed for classification and engineered for audit and traceability.
How Do You Add Human Oversight Without Slowing Everything Down?
Use tiered controls: sample low-risk decisions, require mandatory review for high-impact outcomes, and rotate case sampling to detect drift early.
How Do You Make AI Decisions Defensible to Regulators?
Record inputs, model versions, timestamps, outputs, review actions, and evidence used, then monitor performance over time and be able to roll back safely.
Sam Kendall is a marketing strategist with over a decade of experience working on how organisations communicate with people through digital channels. At Beyond Encryption, he leads digital marketing, collaborating closely with product and sales on secure, trustworthy customer communications. His work is grounded in research, buying behaviour, and practical experience, with a focus on clarity, consistency, and long-term effectiveness rather than short-term tactics.