Garry Evans & Adrian Crean
9 min

From Sampling to Scale: Using AI to Review More Conversations With Less Risk

Posted by Paul Holland

For regulated firms, narrow sampling can leave blind spots around customer outcomes, conduct risk, and emerging patterns that deserve attention.

Garry Evans, Chief Product & Commercial Officer, and Adrian Crean, Strategic Partnerships Director, explain how TCC Group & Recordsure is helping regulated firms move from narrow sampling to broader, faster, and more defensible oversight.

Watch the full episode above, or listen on Spotify or Apple Podcasts.

If firms want stronger evidence of customer outcomes, better visibility of conduct risk, and fewer unpleasant surprises when rules or scrutiny change, they need a way to review far more conversations and documents than manual teams can handle on their own.

Created from episode transcript

Why Small Samples Stop Working

Many firms still review only a small fraction of calls, files, and customer interactions, even though the Consumer Duty expects them to act to deliver good outcomes and keep improving where harm could emerge.

That gap can lead to compliance risks and cases falling through the cracks.

It could be a disclosure handled poorly, a vulnerable customer not supported properly, or a product journey that technically completes but leaves the customer confused.

Random sampling of a small share of interactions can mean key issues get missed and poor outcomes go unnoticed.

Sampling Is Not the Same as Evidence

In practice, some firms review only a very small proportion of interactions despite dealing with very large monthly volumes.

Small samples can spot anecdotes. They struggle to prove themes.
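A rough way to see why is to estimate how likely a sample is to contain even one affected interaction. The sketch below assumes interactions are sampled independently and that an issue affects a fixed share of them. Both are simplifying assumptions, and the function names and figures are illustrative, not taken from the episode.

```python
def chance_of_seeing_issue(prevalence: float, sample_size: int) -> float:
    """Probability that a random sample contains at least one affected interaction.

    Uses 1 - (1 - p)^n, which assumes each sampled interaction is
    independent and affected with the same probability p.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if sample_size < 0:
        raise ValueError("sample_size must not be negative")
    return 1.0 - (1.0 - prevalence) ** sample_size


def expected_hits(prevalence: float, sample_size: int) -> float:
    """Average number of affected interactions a sample of this size would contain."""
    return prevalence * sample_size


if __name__ == "__main__":
    # Hypothetical figures: an issue affecting 0.5% of interactions.
    for n in (50, 500, 5000):
        print(n, round(chance_of_seeing_issue(0.005, n), 3), expected_hits(0.005, n))
```

Under these assumptions, a sample that is likely to miss the issue altogether cannot show how widespread it is, which is the difference between spotting an anecdote and evidencing a theme.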

The FCA has reinforced this point in its work on outcomes monitoring, where it expects firms to assess, test, understand, and evidence the outcomes customers receive, not simply rely on static management information or narrow reviews.

What The FCA Expects On Outcomes Monitoring

Firms should assess, test, understand, and evidence the outcomes customers receive, rather than rely on static management information or narrow reviews alone.

That standard is hard to meet when review teams only see a thin slice of the contact and file data sitting in the business.

Retrospective Reviews Become Expensive for a Reason

There is also a second problem that regulated firms know well.

When oversight is weak in normal times, remediation becomes painfully expensive later.

One large remediation exercise was initially expected to run for years, but was completed far more quickly once AI-led review was applied to the groundwork.

If firms cannot search, classify, and prioritise historic evidence at scale, every retrospective review becomes slower, costlier, and harder to defend.

"Make sure you're doing this ongoing so there are no surprises."

Adrian Crean, Strategic Partnerships Director, Recordsure

That ongoing discipline matters most when firms start deciding where AI fits in the review workflow itself.

What AI Is Actually Doing Here

It is easy to hear "AI" and think immediately about chatbots and generated summaries.

Recent Bank of England and FCA survey work shows AI adoption is already broad in UK financial services, especially in areas such as data analysis, fraud detection, operational efficiency, and controls.

Where AI Use Is Already Common

The Bank of England and FCA survey points to data analysis, fraud detection, operational efficiency, and controls as areas where adoption is already widespread.

That helps explain why conversation review is moving from niche tooling into mainstream oversight design.

Predictive AI Narrows the Haystack

Predictive AI and generative AI solve different problems. Predictive models are good at finding, classifying, extracting, and scoring.

In this workflow, that means identifying whether key topics were covered, surfacing the exact part of a call that needs review, or flagging a higher-risk case for human attention.

Manual call review is slow by design. A human reviewer can easily spend longer reviewing a call than the call itself takes, whereas topic identification can remove a large share of that effort by taking reviewers straight to the relevant passages.

The practical gain is better triage. Instead of asking skilled reviewers to spend most of their day confirming that nothing went wrong, firms can concentrate human time on the cases most likely to matter.
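As a minimal sketch of that triage step, assume a predictive model has already attached a risk score and a list of flagged passages to each call. The `ScoredCall` structure, the 0.5 threshold, and the function names are hypothetical and chosen for illustration; they do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCall:
    call_id: str
    risk_score: float  # hypothetical model output between 0 and 1
    flagged_passages: tuple[tuple[int, int], ...]  # (start_second, end_second) pairs


def build_review_queue(calls, capacity, threshold=0.5):
    """Return calls at or above the threshold, highest risk first, capped at reviewer capacity."""
    flagged = [call for call in calls if call.risk_score >= threshold]
    flagged.sort(key=lambda call: call.risk_score, reverse=True)
    return flagged[:capacity]


def seconds_to_review(call):
    """Total length of the flagged passages, i.e. what a reviewer is taken to first."""
    return sum(end - start for start, end in call.flagged_passages)


if __name__ == "__main__":
    calls = [
        ScoredCall("call-a", 0.20, ()),
        ScoredCall("call-b", 0.90, ((30, 90),)),
        ScoredCall("call-c", 0.60, ((10, 40), (100, 130))),
    ]
    for call in build_review_queue(calls, capacity=2):
        print(call.call_id, call.risk_score, seconds_to_review(call))
```

A real workflow would likely also keep a random sample of low-scoring calls in the queue, so reviewers can check that the model is not missing cases.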

Generative AI Is Useful, but It Is Not a Control on Its Own

Generative AI can sound fluent without being reliably correct, which makes it a poor substitute for a controlled review workflow on its own.

That caution fits with the FCA's AI update and the ICO's guidance, which point to safe and responsible adoption, fairness, transparency, accountability, and effective risk management when firms use AI in regulated settings.

That is especially relevant where firms want dependable outputs from sensitive personal data, recorded calls, and customer files.

Large language models can be excellent for summarisation and productivity, but weaker when firms need deterministic extraction and confidence scoring on every data point.

Firms can still use generative AI productively. Using LLMs alone will require full human validation. Using them on top of structured, confidence-scored data, which can only be delivered by predictive AI models, will be both more accurate and more auditable.

The NIST AI Risk Management Framework reaches a similar conclusion in broader terms by treating trustworthiness, governance, and risk controls as design requirements rather than afterthoughts.

"When a firm uses AI to review customer conversations, the oversight question is the same as anywhere else: can it show which cases were reviewed, who checked them, and what happened next?"

Michael Wakefield, CTO, Beyond Encryption (Mailock)

Those controls become harder to ignore as Consumer Duty raises the standard of proof firms need around customer outcomes.

Why This Matters More Under Consumer Duty

Consumer Duty has raised the standard of proof for firms that want to say customers are getting good outcomes.

It is no longer enough to rely on a small stack of passed reviews, a dashboard that looks tidy, or a belief that front-line processes are probably working.

Outcome Testing Needs Broader Evidence

The FCA's publications on implementation and consumer support make that clear.

Firms are expected to identify issues that risk customer harm, act on them, and keep refining their approach as products, channels, and customer needs change.

Firms should not wait for a major remediation to discover what their historic data has been trying to tell them.

Reviewing more conversations makes oversight more representative, more timely, and more credible. It also strengthens the evidence base firms need under Consumer Duty and improves the speed and quality of review.

Vulnerability Often Appears in the Details

Many of the most important conduct signals are subtle.

These are not always obvious in headline metrics, yet they sit at the heart of fair treatment.

The FCA's guidance on vulnerable customers underlines why firms need to understand real customer experience as well as process completion.

 

Conversation analytics can help surface those moments faster, but the real value comes when firms combine that signal with human judgement and a clear escalation path.

What Good Looks Like in Practice

Scale alone is not the goal.

The goal is a review model that is faster and more defensible.

Build the Evidence Layer First

A good starting point is to build a reliable evidence layer from calls, meetings, and documents before asking a generative system to draft summaries or recommend actions.

That means prioritising transcription quality, topic detection, extraction, classification, confidence scoring, and auditability.

It also means knowing where the data came from, which models touched it, what level of confidence is attached to a result, and where a human should step in.

Bad data does not become safe just because the interface looks smart.
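One way to picture an evidence layer is as a record that carries its own provenance. The sketch below is an assumption about how such a record might look: the field names, the model version labels, and the 0.9 confidence cut-off are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceItem:
    source_id: str                    # where the data came from, e.g. a call reference
    data_point: str                   # what was extracted or classified
    value: str
    model_versions: tuple[str, ...]   # which models touched the result
    confidence: float                 # confidence attached to the result, 0 to 1


def needs_human_review(item, min_confidence=0.9):
    """A human should step in when confidence falls below the agreed cut-off."""
    return item.confidence < min_confidence


def split_for_review(items, min_confidence=0.9):
    """Separate results that can feed downstream tools from those escalated to a reviewer."""
    trusted = [i for i in items if not needs_human_review(i, min_confidence)]
    escalated = [i for i in items if needs_human_review(i, min_confidence)]
    return trusted, escalated


if __name__ == "__main__":
    items = [
        EvidenceItem("call-001", "risk_warning_given", "yes", ("asr-v1", "topic-v2"), 0.97),
        EvidenceItem("call-001", "vulnerability_indicator", "possible", ("asr-v1", "topic-v2"), 0.62),
    ]
    trusted, escalated = split_for_review(items)
    print([i.data_point for i in trusted], [i.data_point for i in escalated])
```

Keeping source, model versions, and confidence on every record is what lets a later summary or recommendation be traced back to the evidence behind it.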

Keep Humans Where Judgement Matters

UK and European regulators continue to make clear that firms remain accountable for how AI is governed, overseen, and applied.

ESMA's statement on AI in investment services and the EBA's work on AI in banking both point back to familiar responsibilities around governance, oversight, fairness, and client outcomes.

In practice, that means using AI to shrink the review burden while keeping human accountability intact.

People still need to decide how rules are interpreted, how edge cases are handled, and what remediation follows when patterns of harm are found.

Design for Change, Not Just Today

Rules move, products change, and firms enter new markets.

That makes flexible review design more valuable than a single static model.

Where firms operate across languages, jurisdictions, or product lines, the goal is not to build one perfect universal check.

It is to create a review framework that can apply different lenses to the same interaction, whether that means a different regulatory checklist, a different customer journey, or a new risk theme that emerges later.
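The idea of applying different lenses to the same interaction can be sketched as a set of named checklists run over one set of detected topics. This is an illustrative assumption about the design, not a description of a specific system, and the lens and topic names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lens:
    name: str
    required_topics: frozenset[str]  # topics this checklist expects to be covered


def apply_lens(lens, detected_topics):
    """Return the required topics the interaction did not cover, in sorted order."""
    return sorted(lens.required_topics - set(detected_topics))


def apply_lenses(lenses, detected_topics):
    """Run every lens over the same interaction and report the gaps per lens."""
    return {lens.name: apply_lens(lens, detected_topics) for lens in lenses}


if __name__ == "__main__":
    detected = {"fees", "risk_warning"}
    lenses = [
        Lens("advice_checklist", frozenset({"fees", "risk_warning", "suitability"})),
        Lens("vulnerability_theme", frozenset({"support_needs"})),
    ]
    print(apply_lenses(lenses, detected))
```

Because the lenses are data and not code, a new regulatory checklist or risk theme can be added later and run against historic interactions without re-processing the underlying calls.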

Checks Before Scaling Conversation Review

  • Can the firm search and classify historic calls and files by topic, risk, and outcome?
  • Does every AI-assisted result carry a confidence score and a clear human review route?
  • Can the review model adapt when rules, products, or jurisdictions change?

If firms can search and reinterpret their own evidence base quickly, they are in a much stronger position when scrutiny changes.

The Bigger Strategic Shift

The useful question for regulated firms is which AI belongs where, under what controls, and in support of which outcomes.

For regulated firms, the answer is unlikely to be a single model or a single workflow.

Predictive models find and structure evidence. Human reviewers test context and make judgements. Generative tools support summarisation, productivity, and next-step workflow once the underlying data is trustworthy.

That is a more grounded view than the current market noise, and it is also more useful.

It accepts that scale matters, that regulators want evidence, and that firms cannot reach that standard with manual sampling alone.

It also accepts that trust is earned in design.

For firms trying to review more conversations with less risk, AI works best as part of a stronger operating model.

 

FAQs

Why Do Small Review Samples Miss Customer Communication Issues?

Small samples can miss recurring friction, emotional signals, and vulnerable customer patterns that only appear across larger volumes of conversations.

How Can AI Support Conversation Review?

AI can help teams cluster themes, spot repeated issues, and review more interactions, while people remain responsible for judgement and action.

Why Does This Matter Under Consumer Duty?

Consumer Duty increases the need to evidence customer understanding, foreseeable harm, and whether communication journeys are working in practice.

 

References

PS22/9: A New Consumer Duty, Financial Conduct Authority, 2022

Consumer Duty Implementation: Good Practice and Areas for Improvement, Financial Conduct Authority, 2024

Insurance Multi-Firm Review of Outcomes Monitoring Under Consumer Duty, Financial Conduct Authority, 2024

Consumer Support Outcome: Good Practices and Areas for Improvement, Financial Conduct Authority, 2025

Guidance for Firms on the Fair Treatment of Vulnerable Customers, Financial Conduct Authority, 2021

Artificial Intelligence in UK Financial Services - 2024, Bank of England and Financial Conduct Authority, 2024

Artificial Intelligence (AI) Update - Further to the Government's Response to the AI White Paper, Financial Conduct Authority, 2024

Guidance on AI and Data Protection, Information Commissioner's Office, 2023

AI Risk Management Framework, National Institute of Standards and Technology, 2023

Public Statement on AI and Investment Services, European Securities and Markets Authority, 2024

Special Topic - Artificial Intelligence, European Banking Authority, 2024

Garry Evans, LinkedIn

Adrian Crean, LinkedIn

TCC Group, TCC Group

Recordsure, Recordsure

From Sampling to Scale: Using AI to Review More Conversations With Less Risk, TCC Group & Recordsure (#28), Apple Podcasts, 2026

From Sampling to Scale: Using AI to Review More Conversations With Less Risk, TCC Group & Recordsure (#28), Spotify, 2026

Reviewed by

Sam Kendall, 25.05.26

This content is for general information only and is not legal advice.

 

Originally posted on May 12, 2026
Last updated on June 5, 2026

Posted by:  Paul Holland

Paul, CEO and Founder of Beyond Encryption, is an expert in digital identity, fintech, cybersecurity, and business. He developed Webline, a leading UK comparison engine, and now drives Mailock, Nigel, and AssureScore to help regulated businesses secure customer data.
