
Regulated AI: Explainability, Oversight, and Defensible Financial Crime Decisions

Posted by Sam Kendall

In regulated financial crime work, “good enough” AI is rarely good enough - you need decisions you can defend.

Dr Janet Bastiman is Chief Data Scientist at Napier AI, where she leads data science work supporting anti-money laundering and financial crime compliance.

She focuses on translating complex models into outcomes teams can audit, explain, and improve.

Regulation is pushing AI out of the “black box” era. In practice, that means explainable workflows, proportionate oversight, and an evidence trail that stands up under challenge.

You can watch this video on YouTube or listen to the interview on Spotify.

Why Regulated AI Has Become a Board-Level Topic

Financial crime teams adopted machine learning early because transaction monitoring at scale is beyond what human teams can handle manually.

What's now changed is that models play a greater role in shaping who gets investigated, who gets delayed, and who gets treated as higher risk.

Regulation Cares About Outcomes, Not Just Models

The EU AI Act sets higher expectations where AI can materially affect people’s lives.

In the UK, the government’s pro-innovation approach still expects sector regulators to enforce accountability, documentation, and control.

That matters because “AI in compliance” is high stakes.

A wrong decision can mean missed criminal activity, or it can mean the wrong person is treated as suspicious.

Global standard-setters are also watching the threat landscape shift, including AI-enabled tactics such as deepfakes, which can complicate identity and verification controls.

The FATF’s horizon scan on AI and deepfakes is a useful reminder that model governance and fraud resilience are converging problems.

Model Risk Management Is Spreading Beyond Traditional Use Cases

Organisations whose models affect important areas of people’s lives need clear ownership, independent validation, ongoing monitoring, and the ability to challenge model outputs.

Those disciplines are increasingly relevant when the “model” is a workflow that mixes rules, classifiers, and generative components.

Explainability That Helps a Financial Crime Analyst

“Explainability” becomes real when a case is reviewed and written up.

An analyst needs clarity they can use, and evidence they can verify. An auditor may need to revisit that reasoning a year down the line.

Start With the Question, Not the Shiniest Model

Janet says explainability largely starts with matching the tool to the task.

"Not every DIY task in your house needs a power drill. It's definitely a case of 'the right tool for the job'."

Dr Janet Bastiman, Chief Data Scientist, Napier AI

LLMs are strong at summarising unstructured information, but they are not universal decision engines.

If the goal is “review or discount”, as in many AML workflows, you need a model designed for classification and engineered to show why it reached its conclusion.

Make Explanations Evidence-Led

When a model reduces false positives, the success metric is not just “fewer alerts”.

The real question is whether reviewers can trust the reasons for discounting a potential issue, especially when those decisions may later be challenged.

Janet says the practical test has to be: would you accept this explanation from a human analyst?

In AML settings, explainability usually needs four things:

  • How confident the system is - and when it is uncertain.
  • The key evidence points that drove the decision.
  • Links back to source records that can be checked quickly.
  • Plain language that fits into an audit narrative.

This lines up with the ICO guidance on explaining decisions made with AI, which emphasises transparency and accountability in ways people can act on.
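As a rough illustration of those four elements, a minimal analyst-facing explanation record might look like the sketch below. All names, fields, and values are hypothetical, not Napier AI's implementation.

```python
from dataclasses import dataclass

@dataclass
class AlertExplanation:
    """Hypothetical record covering the four AML explainability elements."""
    decision: str             # e.g. "discount" or "escalate"
    confidence: float         # how confident the system is, 0.0-1.0
    key_evidence: list[str]   # evidence points that drove the decision
    source_records: list[str] # IDs linking back to checkable source data
    narrative: str            # plain language for the audit write-up

    def is_reviewable(self) -> bool:
        # A bare score with no evidence or source links is not an explanation.
        return bool(self.key_evidence) and bool(self.source_records)

explanation = AlertExplanation(
    decision="discount",
    confidence=0.93,
    key_evidence=["inflows match declared salary", "known counterparty"],
    source_records=["txn-2024-118802", "kyc-profile-55721"],
    narrative="Inflows match declared salary; counterparty verified at onboarding.",
)
assert explanation.is_reviewable()
```

The point of the structure is the `is_reviewable` check: an output that cannot link back to verifiable evidence should never reach an analyst as a finished explanation.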

Watch for “Models in a Trench Coat”

Teams are exploring agentic workflows: multiple models chained together, sometimes with automated actions attached.

Janet jokes that some of these systems may actually be “lots of different AI models sort of in a trench coat”, and some of them may be what we would think of as agentic.

But whether it’s one or the other, when a decision is produced by a chain, explainability must cover the chain, not just the final answer.

Oversight That Holds Up When Volumes Are High

Human oversight is essential, but simply stating “human in the loop” is not a control.

The control is whether review, escalation, and challenge still work when the queue is busy.

Scale Review Like a Regulated Process

Janet draws a direct parallel between model use and how institutions already manage human decision-making.

Most firms have second-line and third-line reviews for high-impact work.

AI-assisted work should follow the same pattern, with controls that are proportionate to risk.

"If you failed your driving test, you didn't hallucinate all of the bits you got wrong - you got them wrong."

Dr Janet Bastiman, Chief Data Scientist, Napier AI

Calling errors “hallucinations” can make them sound mystical and unmanageable, when most failures are plain errors: wrong evidence, wrong reasoning, or wrong context.

And they should be addressed as such. A workable oversight design often includes:

  • Tiered review: sample low-risk decisions, and require mandatory review for high-impact outcomes.
  • Rotating case sampling to detect drift and bias early.
  • Clear override routes, so analysts can challenge a model and record why.

Frameworks such as the NIST AI Risk Management Framework help by treating AI as a socio-technical system.

Bias Is Not a Side Issue in AML Models

AML is full of patterns that look “suspicious” but are actually normal life.

That is why model development needs diverse input from the people who build, review, and use these models.

Representative Data Is a Safety Feature

Janet gives an example: people on multiple zero-hours contracts can have cash-flow patterns that resemble pass-through money laundering.

“Money in is quickly money out” can reflect financial stress, not criminal intent.

Teams can combine engineering and governance choices to reduce avoidable bias:

  • Test whether training data under-represents certain locations, segments, or behaviours.
  • Where sensitive features are not valid predictors, remove access to them.
  • Use synthetic data carefully to improve coverage, then validate for realism and leakage risks.

This reflects the UK’s Data and AI Ethics Framework, particularly its focus on accountability and challenge.
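The first bullet above, testing for under-representation, can be sketched as a simple share comparison. The segment labels, floor threshold, and data are hypothetical; real checks would use the firm's own segmentation and statistical tests.

```python
from collections import Counter

def under_represented(train_segments, population_share, floor=0.5):
    """Flag segments whose share of the training data falls below
    a floor fraction of their share in the real population."""
    counts = Counter(train_segments)
    total = sum(counts.values())
    flagged = []
    for segment, pop_share in population_share.items():
        train_share = counts.get(segment, 0) / total
        if train_share < floor * pop_share:
            flagged.append(segment)
    return flagged

train = ["salaried"] * 90 + ["zero_hours"] * 5 + ["self_employed"] * 5
population = {"salaried": 0.70, "zero_hours": 0.15, "self_employed": 0.15}
print(under_represented(train, population))  # zero_hours and self_employed flagged
```

A gap like this is exactly how zero-hours cash-flow patterns end up mis-scored: the model has barely seen what "normal" looks like for that segment.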

The Defensible Decision Trail: What Good Looks Like

If regulators ask “why did you do that?”, an AI programme succeeds or fails on documentation as much as accuracy.

The goal is to reconstruct a decision and show it was reasonable, controlled, and monitored.

Audit the Pipeline, Not Only the Outcome

Janet describes starting with an implementation audit: data in, outputs out, and how results were scored and reviewed.

In production, organisations should track which model version ran, what it was trained on, and how performance changed each time it retrained.

For complex systems, add traceability across the chain: at a given timestamp, which model IDs ran, on which data, and in what order.

This is also what enables a logical, proportionate response when something goes wrong.

If bad input data poisons a retraining cycle, you need to isolate impact, roll back, and remediate quickly.
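The traceability described above can be pictured as a per-step trace entry, enough to replay which model versions ran, in what order, on which data. The field names and values here are illustrative assumptions, not a standard schema.

```python
import datetime
import json

def trace_entry(step, model_id, model_version, input_ref, output_summary):
    """Hypothetical chain-trace record for one step of a decision pipeline."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "step": step,
        "model_id": model_id,
        "model_version": model_version,
        "input_ref": input_ref,          # pointer to source data, not a copy
        "output_summary": output_summary,
    }

chain = [
    trace_entry(1, "screening-matcher", "2.4.1", "txn-batch-0097",
                "3 candidate matches"),
    trace_entry(2, "alert-classifier", "1.9.0", "alerts-0097",
                "2 discounted, 1 escalated"),
]
print(json.dumps(chain, indent=2))
```

Appended to an immutable audit log, records like these are what make it possible to isolate a poisoned retraining cycle to specific versions and roll back only what was affected.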

What To Ask Before You Scale AI in Financial Crime

Defensibility improves when the questions are embedded in governance early.

  • Can we explain the decision? Not how the model works, but why this case moved in this direction.
  • Can we prove the evidence? Explanations should link back to source records, not invented references.
  • Could we replay it? Model IDs, timestamps, training data, and performance logs should make this possible.

The simplest test is this: if you had to defend the decision to a regulator tomorrow, would you be comfortable with the record you have today?

 

FAQs

What Does “Explainable AI” Mean in AML?

It means showing confidence, the key drivers behind a decision, and links back to evidence so investigators can verify and write up the rationale.

Is a Large Language Model Enough for Financial Crime Decisioning?

LLMs can help with summarising and triage, but review or discount decisions usually need models designed for classification and engineered for audit and traceability.

How Do You Add Human Oversight Without Slowing Everything Down?

Use tiered controls: sample low-risk decisions, require mandatory review for high-impact outcomes, and rotate case sampling to detect drift early.

How Do You Make AI Decisions Defensible to Regulators?

Record inputs, model versions, timestamps, outputs, review actions, and evidence used, then monitor performance over time and be able to roll back safely.


 

References

Regulation (EU) 2024/1689 (Artificial Intelligence Act), European Union, 2024

A Pro-Innovation Approach to AI Regulation: Government Response, UK Government, 2024

SS1/23: Model Risk Management Principles for Banks, Prudential Regulation Authority, 2023

Explaining Decisions Made with AI, Information Commissioner's Office, 2020

Artificial Intelligence Risk Management Framework (AI RMF 1.0), National Institute of Standards and Technology, 2023

Data and AI Ethics Framework, UK Government, 2025

Horizon Scan: AI and Deepfakes, Financial Action Task Force, 2025

FCA Allows Firms to Experiment with AI Alongside NVIDIA, Financial Conduct Authority, 2025

 

Reviewed by

Sam Kendall, 04.02.2026

 

03.03.26

Posted by: Sam Kendall

Sam Kendall is a marketing strategist with over a decade of experience working on how organisations communicate with people through digital channels. At Beyond Encryption, he leads digital marketing, collaborating closely with product and sales on secure, trustworthy customer communications. His work is grounded in research, buying behaviour, and practical experience, with a focus on clarity, consistency, and long-term effectiveness rather than short-term tactics.
