Regulated AI: Explainability, Oversight, and Defensible Financial Crime Decisions

Posted by Sam Kendall

In regulated financial crime work, “good enough” AI is rarely good enough - you need decisions you can defend.

Dr Janet Bastiman is Chief Data Scientist at Napier AI, where she leads data science work supporting anti-money laundering and financial crime compliance.

She focuses on translating complex models into outcomes teams can audit, explain, and improve.

Regulation is pushing AI out of the “black box” era. The practical response is explainable workflows, proportionate oversight, and an evidence trail that stands up under challenge.

You can watch this interview on YouTube, or listen to it on Spotify or Apple Podcasts.

Created from episode transcript

Why Regulated AI Has Become a Board-Level Topic

Financial crime teams adopted machine learning early because transaction monitoring at scale involves more volume than human reviewers can handle alone.

What has changed is that models play a greater role in shaping who gets investigated, who gets delayed, and who gets treated as higher risk.

Regulation Cares About Outcomes and Evidence

The EU AI Act sets higher expectations where AI can materially affect people’s lives.

In the UK, the government’s pro-innovation approach still expects sector regulators to enforce accountability, documentation, and control.

“AI in compliance” is high stakes: a wrong decision can mean missed criminal activity, or the wrong person treated as suspicious.

Global standard-setters are also watching the threat landscape shift, including AI-enabled tactics such as deepfakes, which can complicate identity and verification controls.

The FATF’s horizon scan on AI and deepfakes shows model governance and fraud resilience converging in the same operational workflows.

The FCA’s AI sandbox work with NVIDIA shows how quickly experimentation is moving in regulated markets, which raises the bar on oversight as well as innovation.

Model Risk Management Is Spreading Beyond Traditional Use Cases

Organisations whose models affect important areas of people's lives need clear model ownership, validation, monitoring, and the ability to challenge model outputs.

The PRA’s SS1/23 model risk management principles make the same point for banks: model risk is both a governance problem and a technical one.

Those disciplines are increasingly relevant when the “model” is a workflow that mixes rules, classifiers, and generative components.

Explainability That Helps a Financial Crime Analyst

“Explainability” becomes real when a case is reviewed and written up.

An analyst needs clarity they can use, and evidence they can verify. An auditor may need to revisit that reasoning a year later.

Start With the Question, Not the Shiniest Model

Explainability largely starts with matching the tool to the task.

"Not every DIY task in your house needs a power drill. It's definitely a case of 'the right tool for the job'."

Dr Janet Bastiman, Chief Data Scientist, Napier AI

LLMs are strong at summarising unstructured information, but they are poor substitutes for classification models in review-or-discount workflows.

If the goal is “review or discount”, as in many AML workflows, you need a model designed for classification and engineered to show why it reached its conclusion.

Make Explanations Evidence-Led

When a model reduces false positives, the success metric is whether reviewers can trust the reasons for discounting a potential issue, especially when those decisions may later be challenged.

The practical test is straightforward: would you accept this explanation from a human analyst?

In AML settings, explainability usually needs four things:

  • How confident the system is - and when it is uncertain.
  • The key evidence points that drove the decision.
  • Links back to source records that can be checked quickly.
  • Plain language that fits into an audit narrative.

This lines up with the ICO guidance on explaining decisions made with AI, which emphasises transparency and accountability in ways people can act on.
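The four elements listed above can be pictured as a small record attached to each decision. The following is a minimal sketch rather than a description of any vendor's system: the class names, the record IDs, and the 0.7 uncertainty threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidencePoint:
    description: str    # what the model relied on, in plain language
    source_record: str  # hypothetical ID of a record a reviewer can open


@dataclass(frozen=True)
class Explanation:
    decision: str                 # e.g. "review" or "discount"
    confidence: float             # model confidence between 0 and 1
    evidence: tuple[EvidencePoint, ...]
    uncertain_below: float = 0.7  # assumed threshold, set per risk appetite

    @property
    def is_uncertain(self) -> bool:
        return self.confidence < self.uncertain_below

    def narrative(self) -> str:
        """Render the explanation as text that can go into a case write-up."""
        lines = [f"Decision: {self.decision} (confidence {self.confidence:.0%})."]
        if self.is_uncertain:
            lines.append("The system flagged this decision as uncertain.")
        for point in self.evidence:
            lines.append(f"- {point.description} [source: {point.source_record}]")
        return "\n".join(lines)


# Illustrative values only; the IDs are invented for this example.
explanation = Explanation(
    decision="discount",
    confidence=0.62,
    evidence=(
        EvidencePoint("Payments match the customer's regular payroll pattern", "TXN-1001"),
        EvidencePoint("Counterparty is a known employer", "KYC-2002"),
    ),
)
print(explanation.narrative())
```

Keeping the source record ID next to each evidence point is the design choice that matters here: a reviewer can check every claim in the narrative without going back to the model.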

Watch for “Models in a Trench Coat”

Teams are exploring agentic workflows: multiple models chained together, sometimes with automated actions attached.

Some of these systems may be genuinely agentic, while others may actually be “lots of different AI models sort of in a trench coat”.

When a decision is produced by a chain, explainability must cover the full chain, including every step that shaped the final answer.

A defensible AI decision needs a record someone else can follow without reopening the whole model.

That is why evidence links, version history, and review notes matter as much as the headline score or alert outcome.

Oversight That Holds Up When Volumes Are High

Human oversight is essential, but simply stating “human in the loop” does little on its own.

The control is whether review, escalation, and challenge still work when the queue is busy.

Scale Review Like a Regulated Process

Model use in financial crime should follow the same pattern institutions already use for high-impact human decision-making.

Most firms have second-line and third-line reviews for high-impact work.

AI-assisted work should sit inside those same lines of review, with controls that are proportionate to risk.

"If you failed your driving test, you didn't hallucinate all of the bits you got wrong - you got them wrong."

Dr Janet Bastiman, Chief Data Scientist, Napier AI

Calling errors “hallucinations” can make them sound mystical and unmanageable, when most failures are plain errors: wrong evidence, wrong reasoning, or wrong context.

And they should be addressed as such. A workable oversight design often includes:

  • Tiered review: sample low-risk decisions, and require mandatory review for high-impact outcomes.
  • Rotating case sampling to detect drift and bias early.
  • Clear override routes, so analysts can challenge a model and record why.
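As a rough illustration of the tiered design above, the sketch below routes high-impact outcomes to mandatory review, samples low-risk decisions, and refuses an override that has no recorded reason. The impact labels, the 5% sample rate, and the period label used to rotate the sample are assumptions, not figures from the interview.

```python
import hashlib
from dataclasses import dataclass


def needs_human_review(case_id: str, impact: str, period: str,
                       sample_rate: float = 0.05) -> bool:
    """Decide whether a model decision goes to a human reviewer.

    impact: "high" or "low" (hypothetical labels).
    period: label for the current sampling window, e.g. "2026-W10";
            changing it rotates which low-risk cases are sampled.
    """
    if impact == "high":
        return True  # mandatory review for high-impact outcomes
    # Deterministic sample: hash the case ID together with the period,
    # so the same case always gets the same answer within a period.
    digest = hashlib.sha256(f"{period}:{case_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < sample_rate


@dataclass(frozen=True)
class Override:
    case_id: str
    analyst: str
    model_decision: str
    analyst_decision: str
    reason: str


def record_override(log: list, override: Override) -> None:
    """Append an override, refusing entries that do not say why."""
    if not override.reason.strip():
        raise ValueError("An override must record why the model was challenged")
    log.append(override)


override_log: list = []
record_override(
    override_log,
    Override("CASE-7", "analyst-a", "discount", "review",
             "Counterparty appears on a recent internal watchlist"),
)
```

Hashing instead of calling a random number generator is a deliberate choice in this sketch: the sampling decision can be reproduced later, which helps when a reviewer asks why a given case was or was not checked.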

Frameworks such as the NIST AI Risk Management Framework help by treating AI as a socio-technical system.

"When a firm needs to defend an automated decision, the technical question is whether it can reconstruct what happened: which system ran, on what data, with what output, and who reviewed it."

Michael Wakefield, CTO, Beyond Encryption (Mailock)

The same standard applies wherever automated decisions affect customers: the record has to survive a busy queue and a later challenge.

Bias Is Not a Side Issue in AML Models

AML is full of patterns that look “suspicious” but are actually normal life.

That is why model development needs diverse input from the teams that build the models.

Representative Data Is a Safety Feature

People on multiple zero-hours contracts can have cash-flow patterns that resemble pass-through money laundering.

“Money in is quickly money out” can reflect financial stress, not criminal intent.

 

Teams can combine engineering and governance choices to reduce avoidable bias:

  • Test whether training data under-represents certain locations, segments, or behaviours.
  • Where sensitive features are not valid predictors, remove access to them.
  • Use synthetic data carefully to improve coverage, then validate for realism and leakage risks.

This reflects the UK’s Data and AI Ethics Framework, particularly its focus on accountability and challenge.
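The first two checks in the list above can be sketched in a few lines. This is an illustrative example under stated assumptions: the segment names, the reference shares, and the rule that flags a segment when its training share falls below half of its reference share are all hypothetical.

```python
from collections import Counter


def under_represented(training_segments, reference_shares, tolerance=0.5):
    """Return segments whose training share is below tolerance * reference share.

    training_segments: one segment label per training record.
    reference_shares: expected share of each segment in the real population.
    """
    counts = Counter(training_segments)
    total = sum(counts.values())
    flagged = {}
    for segment, expected in reference_shares.items():
        actual = counts.get(segment, 0) / total if total else 0.0
        if actual < tolerance * expected:
            flagged[segment] = (actual, expected)
    return flagged


def drop_sensitive(record: dict, sensitive: set) -> dict:
    """Remove features the model should not have access to."""
    return {key: value for key, value in record.items() if key not in sensitive}


# Invented data: zero-hours workers are 2% of training data
# but assumed to be 10% of the customer base.
training = ["salaried"] * 90 + ["zero_hours"] * 2 + ["self_employed"] * 8
reference = {"salaried": 0.75, "zero_hours": 0.10, "self_employed": 0.15}

flagged = under_represented(training, reference)
features = drop_sensitive(
    {"amount": 950, "nationality": "XX", "velocity": 3},
    sensitive={"nationality"},
)
```

In this example only the zero-hours segment is flagged. A real check would need a defensible source for the reference shares, which is a governance question as much as an engineering one.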

The Defensible Decision Trail: What Good Looks Like

If regulators ask “why did you do that?”, an AI programme succeeds or fails on documentation as much as accuracy.

The goal is to reconstruct a decision and show it was reasonable, controlled, and monitored.

Audit the Pipeline, Not Only the Outcome

A useful starting point is an implementation audit: data in, outputs out, and how results were scored and reviewed.

In production, organisations should track which model version ran, what it was trained on, and how performance changed each time it retrained.

For complex systems, add traceability across the chain: at a given timestamp, which model IDs ran, on which data, and in what order.
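One way to picture chain-level traceability is an append-only trace per case, where every step records which model ran, in what order, on what input, and when. The sketch below is a minimal illustration; the model IDs, version strings, and field names are invented, and hashing the inputs is an assumption about how a team might prove which data a step saw without copying it into the log.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Step:
    order: int          # position of this step in the chain
    model_id: str
    model_version: str
    input_hash: str     # SHA-256 of the inputs this step saw
    output: str
    ran_at: str         # UTC timestamp in ISO 8601 format


@dataclass
class DecisionTrace:
    case_id: str
    steps: list = field(default_factory=list)

    def record(self, model_id: str, model_version: str,
               inputs: dict, output: str) -> Step:
        # Sort keys so the same inputs always produce the same hash.
        payload = json.dumps(inputs, sort_keys=True).encode()
        step = Step(
            order=len(self.steps) + 1,
            model_id=model_id,
            model_version=model_version,
            input_hash=hashlib.sha256(payload).hexdigest(),
            output=output,
            ran_at=datetime.now(timezone.utc).isoformat(),
        )
        self.steps.append(step)
        return step


# Hypothetical two-step chain: a rules screen followed by a classifier.
trace = DecisionTrace("CASE-42")
trace.record("rules-screen", "3.1", {"amount": 950, "currency": "GBP"}, "alert")
trace.record("risk-classifier", "2026.02", {"alert": True, "velocity": 3}, "discount")

for step in trace.steps:
    print(step.order, step.model_id, step.model_version, step.output)
```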

What Good Traceability Looks Like

Inputs, model versions, timestamps, review actions, and evidence links should let a team replay a decision without guessing what the system did at the time.

This traceability is also what lets you respond logically and proportionately when something goes wrong.

If bad input data poisons a retraining cycle, you need to isolate impact, roll back, and remediate quickly.
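The isolate, roll back, and remediate steps depend on knowing which data each model version was trained on. The sketch below assumes a simple registry that records training batches per version, plus a log of which version decided each case; the structure and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelVersion:
    version: str
    trained_on: tuple[str, ...]  # IDs of data batches used in training
    deployed: date


def affected_versions(registry, bad_batch):
    """Isolate: every version whose training data included the bad batch."""
    return [m for m in registry if bad_batch in m.trained_on]


def rollback_target(registry, bad_batch):
    """Roll back: the most recently deployed clean version, or None."""
    clean = [m for m in registry if bad_batch not in m.trained_on]
    return max(clean, key=lambda m: m.deployed, default=None)


def decisions_to_recheck(decisions, registry, bad_batch):
    """Remediate: cases decided by an affected version."""
    tainted = {m.version for m in affected_versions(registry, bad_batch)}
    return [case_id for case_id, version in decisions if version in tainted]


# Invented registry: each retrain adds one new batch of data.
registry = [
    ModelVersion("v1", ("b1",), date(2026, 1, 5)),
    ModelVersion("v2", ("b1", "b2"), date(2026, 2, 2)),
    ModelVersion("v3", ("b1", "b2", "b3"), date(2026, 3, 2)),
]
decisions = [("C1", "v1"), ("C2", "v2"), ("C3", "v3")]

target = rollback_target(registry, "b2")               # the v1 entry
recheck = decisions_to_recheck(decisions, registry, "b2")  # ["C2", "C3"]
```

None of this works unless the version that made each decision was logged at the time, which is why the tracking described above needs to be in place before an incident.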

What To Ask Before You Scale AI in Financial Crime

Defensibility improves when the questions are embedded in governance early.

  • Can we explain the decision? Focus on why this case moved in this direction and what evidence supported it.
  • Can we prove the evidence? Explanations should link back to source records, not invented references.
  • Could we replay it? Model IDs, timestamps, training data, and performance logs should make this possible.

The simplest test is this: if you had to defend the decision to a regulator tomorrow, would you be comfortable with the record you have today?

 

FAQs

What Does “Explainable AI” Mean in AML?

It means showing confidence, the key drivers behind a decision, and links back to evidence so investigators can verify and write up the rationale.

Is a Large Language Model Enough for Financial Crime Decisioning?

LLMs can help with summarising and triage, but review-or-discount decisions usually need models designed for classification and engineered for audit and traceability.

How Do You Add Human Oversight without Slowing Everything Down?

Use tiered controls: sample low-risk decisions, require mandatory review for high-impact outcomes, and rotate case sampling to detect drift early.

How Do You Make AI Decisions Defensible to Regulators?

Record inputs, model versions, timestamps, outputs, review actions, and evidence used, then monitor performance over time and be able to roll back safely.

 

References

Dr Janet Bastiman, LinkedIn profile

Napier AI, company website

Regulation (EU) 2024/1689 (Artificial Intelligence Act), European Union, 2024

A Pro-Innovation Approach to AI Regulation: Government Response, UK Government, 2024

SS1/23: Model Risk Management Principles for Banks, Prudential Regulation Authority, 2023

Explaining Decisions Made with AI, Information Commissioner's Office, 2020

Artificial Intelligence Risk Management Framework (AI RMF 1.0), National Institute of Standards and Technology, 2023

Data and AI Ethics Framework, UK Government, 2025

Horizon Scan: AI and Deepfakes, Financial Action Task Force, 2025

FCA Allows Firms to Experiment with AI Alongside NVIDIA, Financial Conduct Authority, 2025

Regulated AI: Explainability, Oversight, and Defensible Financial Crime Decisions, Dr Janet, Napier AI (#35), Apple Podcasts, 2026

Reviewed by

Sam Kendall, 25.05.2026

This content is for general information only and is not legal advice.

 

Originally posted on March 3, 2026
Last updated on June 5, 2026

Sam Kendall works on digital marketing at Beyond Encryption, helping build B2B marketing activity around research, first principles, and sustainable growth. He writes about marketing effectiveness, positioning, customer communications, and digital culture, with longer-form work published at ATNL.
