
From Sampling to Scale: Using AI to Review More Conversations With Less Risk

Posted by Paul Holland

For regulated firms, narrow sampling can leave blind spots around customer outcomes, conduct risk, and emerging patterns that deserve attention.

In this discussion, Garry Evans, Chief Product & Commercial Officer, and Adrian Crean, Strategic Partnerships Director, explain how Recordsure, part of TCC Group, is helping regulated firms move from narrow sampling to broader, faster, and more defensible oversight.

The big idea is simple: if firms want stronger evidence of customer outcomes, better visibility of conduct risk, and fewer unpleasant surprises when rules or scrutiny change, they need a way to review far more conversations and documents than manual teams can handle on their own.

Why Small Samples Stop Working

Many firms still review only a small fraction of calls, files, and customer interactions, even though the Consumer Duty expects them to act to deliver good outcomes and keep improving where harm could emerge.

That gap can allow compliance risks to build and individual cases to fall through the cracks.

It could be a disclosure handled poorly, a vulnerable customer not supported properly, or a product journey that technically completes but leaves the customer confused.

Ultimately, random sampling can mean key issues get missed and outcomes stray far from ideal.

Sampling Is Not the Same as Evidence

Garry described examples where firms were reviewing only a very small proportion of interactions despite dealing with very large monthly volumes.

Small samples can spot anecdotes. They struggle to prove themes.

The FCA has reinforced this point in its work on outcomes monitoring, where it expects firms to assess, test, understand, and evidence the outcomes customers receive, not simply rely on static management information or narrow reviews.

Retrospective Reviews Become Expensive for a Reason

There is also a second problem that regulated firms know well.

When oversight is weak in normal times, remediation becomes painfully expensive later.

Garry described a large remediation exercise that was initially expected to run for years, but was completed far more quickly once AI-led review was applied to the groundwork.

If firms cannot search, classify, and prioritise historic evidence at scale, every retrospective review becomes slower, costlier, and harder to defend.

"Make sure you're doing this ongoing so there are no surprises."

Adrian Crean, Strategic Partnerships Director, Recordsure

What AI Is Actually Doing Here

It is easy to hear "AI" and think immediately about chatbots and generated summaries.

Recent Bank of England and FCA survey work shows AI adoption is already broad in UK financial services, especially in areas such as data analysis, fraud detection, operational efficiency, and controls.

That helps explain why conversation review is moving from niche tooling into mainstream oversight design.

Predictive AI Narrows the Haystack

One of the most useful distinctions in the episode is the difference between predictive AI and generative AI.

Garry's point was that these systems solve different problems.

Predictive models are good at finding, classifying, extracting, and scoring.

In this workflow, that means identifying whether key topics were covered, surfacing the exact part of a call that needs review, or flagging a higher-risk case for human attention.

That matters because manual call review is slow by design.

Garry said a human reviewer can easily spend longer reviewing a call than the call itself takes, whereas topic identification can remove a large share of that effort by taking reviewers straight to the relevant passages.

The real gain is not magic automation. It is better triage.

Instead of asking skilled reviewers to spend most of their day confirming that nothing went wrong, firms can concentrate human time on the cases most likely to matter.
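To make the triage idea concrete, here is a minimal sketch, in Python, of how predictive scores might route calls and point reviewers at the exact passages that need attention. The topic names, threshold, and structure are illustrative assumptions, not Recordsure's actual pipeline.

```python
from dataclasses import dataclass

# Hypothetical output of a predictive topic model. In a real deployment
# these scores would come from trained classifiers over call transcripts.
@dataclass
class SegmentScore:
    call_id: str
    start_seconds: float  # where in the call the passage begins
    topic: str            # e.g. "fee_disclosure", "vulnerability_cue"
    confidence: float     # model confidence, 0.0-1.0

REQUIRED_TOPICS = {"fee_disclosure", "risk_warning"}  # illustrative checklist
THRESHOLD = 0.85                                      # illustrative cut-off

def triage(call_id: str, scores: list[SegmentScore]) -> dict:
    """Route a call: auto-pass, or send targeted passages to a human."""
    found = {s.topic for s in scores if s.confidence >= THRESHOLD}
    missing = REQUIRED_TOPICS - found
    # Low-confidence passages go to a reviewer with timestamps, so the
    # reviewer jumps straight to the relevant part of the recording.
    uncertain = [s for s in scores if s.confidence < THRESHOLD]
    if missing or uncertain:
        return {
            "call_id": call_id,
            "route": "human_review",
            "missing_topics": sorted(missing),
            "review_at": [(s.topic, s.start_seconds) for s in uncertain],
        }
    return {"call_id": call_id, "route": "auto_pass"}
```

The point of the sketch is the shape of the workflow: the model never decides the outcome, it decides where human attention goes first.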

Generative AI Is Useful, but It Is Not a Control on Its Own

The other half of the discussion is a timely warning against using generative AI as though it were automatically reliable because it sounds fluent.

That caution fits with the FCA's AI update and the ICO's guidance, which point to safe and responsible adoption, fairness, transparency, accountability, and effective risk management when firms use AI in regulated settings.

That is especially relevant where firms want dependable outputs from sensitive personal data, recorded calls, and customer files.

Garry argued that large language models can be excellent for summarisation and productivity, but weaker when firms need deterministic extraction and confidence scoring on every data point.

The practical takeaway is not that firms should avoid generative AI.

Using LLMs alone requires 100% human validation. Using them on top of structured, confidence-scored data, which only predictive AI models can deliver, is both more accurate and more auditable.
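As one way to picture that layering, the sketch below gates a generative summary so it only sees facts a predictive model extracted above a confidence floor, and records what a human still needs to check. The field names, threshold, and generate_summary callback are assumptions for illustration, not a vendor API.

```python
CONFIDENCE_FLOOR = 0.9  # illustrative threshold, not a regulatory standard

def summarise_if_trusted(extracted: dict[str, tuple[str, float]],
                         generate_summary) -> dict:
    """Draft a summary only from confidence-scored, predictive-model output.

    extracted maps a field name to a (value, confidence) pair;
    generate_summary is any LLM call that drafts text from vetted facts.
    """
    trusted = {name: value for name, (value, conf) in extracted.items()
               if conf >= CONFIDENCE_FLOOR}
    flagged = sorted(name for name, (_, conf) in extracted.items()
                     if conf < CONFIDENCE_FLOOR)
    summary = generate_summary(trusted)  # the LLM never sees unvetted data
    # The audit trail shows what grounded the summary and what still
    # needs a human decision.
    return {"summary": summary,
            "grounded_on": sorted(trusted),
            "human_check": flagged}
```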

The NIST AI Risk Management Framework reaches a similar conclusion in broader terms by treating trustworthiness, governance, and risk controls as design requirements rather than afterthoughts.

Why This Matters More Under Consumer Duty

Consumer Duty has raised the standard of proof for firms that want to say customers are getting good outcomes.

It is no longer enough to rely on a small stack of passed reviews, a dashboard that looks tidy, or a belief that front-line processes are probably working.

Outcome Testing Needs Broader Evidence

The FCA's publications on implementation and consumer support make that clear.

Firms are expected to identify issues that risk customer harm, act on them, and keep refining their approach as products, channels, and customer needs change.

That aligns closely with the episode's argument that firms should not wait for a major remediation to discover what their historic data has been trying to tell them.

Reviewing more conversations is not just an efficiency play. It is a way to make oversight more representative, more timely, and more credible.

Vulnerability Often Appears in the Details

It also matters because many of the most important conduct signals are subtle: signs of vulnerability, hesitation, or confusion.

These are not always obvious in headline metrics, yet they sit at the heart of fair treatment.

The FCA's guidance on vulnerable customers underlines why firms need to understand real customer experience, not just process completion.

Conversation analytics can help surface those moments faster, but the real value comes when firms combine that signal with human judgement and a clear escalation path.

What Good Looks Like in Practice

Scale alone is not the goal.

What matters is designing a review model that is faster and more defensible.

Build the Evidence Layer First

A good starting point is to build a reliable evidence layer from calls, meetings, and documents before asking a generative system to draft summaries or recommend actions.

That means prioritising transcription quality, topic detection, extraction, classification, confidence scoring, and auditability.

It also means knowing where the data came from, which models touched it, what level of confidence is attached to a result, and where a human should step in.

Bad data does not become safe just because the interface looks smart.
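One way to make that concrete is to treat every extracted data point as a record that carries its own provenance. The sketch below is illustrative only, with assumed field names; it is not a description of any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class EvidenceRecord:
    """One extracted data point, with the provenance an auditor would ask for."""
    source_id: str     # which call, meeting, or document it came from
    field_name: str    # e.g. "product_term_disclosed"
    value: str
    confidence: float  # confidence attached by the extracting model
    model_versions: list[str] = field(default_factory=list)  # models that touched it
    extracted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def needs_human(self, threshold: float = 0.9) -> bool:
        # Below the threshold, a person steps in rather than the pipeline.
        return self.confidence < threshold
```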

Keep Humans Where Judgement Matters

UK and European regulators continue to make clear that firms remain accountable for how AI is governed, overseen, and applied.

ESMA's statement on AI in investment services and the EBA's work on AI in banking both point back to familiar responsibilities around governance, oversight, fairness, and client outcomes.

In practice, that means using AI to shrink the review burden, not to remove human accountability.

People still need to decide how rules are interpreted, how edge cases are handled, and what remediation follows when patterns of harm are found.

Design for Change, Not Just Today

One of the smarter points in the conversation is that rules move, products change, and firms enter new markets.

That makes flexible review design more valuable than a single static model.

Where firms operate across languages, jurisdictions, or product lines, the goal is not to build one perfect universal check.

It is to create a review framework that can apply different lenses to the same interaction, whether that means a different regulatory checklist, a different customer journey, or a new risk theme that emerges later.

If firms can search and reinterpret their own evidence base quickly, they are in a much stronger position when scrutiny changes.
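In code terms, that flexibility can be pictured as pluggable "lenses" over the same stored evidence: each lens is just a check that can be added, swapped, or re-run later. The lens and topic names below are hypothetical.

```python
from typing import Callable

# A "lens" is any check applied to the same stored interaction: a
# regulatory checklist, a customer-journey test, or a new risk theme.
Lens = Callable[[dict], list[str]]  # takes an interaction, returns findings

def review(interaction: dict, lenses: dict[str, Lens]) -> dict[str, list[str]]:
    """Apply every registered lens to one interaction's evidence."""
    return {name: lens(interaction) for name, lens in lenses.items()}

# A rule change or a new market means registering another function,
# not rebuilding the evidence base.
def consumer_duty_lens(interaction: dict) -> list[str]:
    findings = []
    if "fee_disclosure" not in interaction.get("topics", []):
        findings.append("Fee disclosure not detected")
    return findings

print(review({"topics": ["risk_warning"]},
             {"consumer_duty": consumer_duty_lens}))
# -> {'consumer_duty': ['Fee disclosure not detected']}
```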

The Bigger Strategic Shift

Perhaps the most important point from this episode is that the conversation is no longer really about whether AI belongs in compliance.

The more useful question is which AI belongs where, under what controls, and in support of which outcomes.

For regulated firms, the answer is unlikely to be a single model or a single workflow, but a combination:

Predictive models to find and structure evidence.

Human reviewers to test context and make judgements.

Generative tools to support summarisation, productivity, and next-step workflow once the underlying data is trustworthy.

That is a more grounded view than the current market noise, but it is also more useful.

It accepts that scale matters, that regulators want evidence, and that firms cannot reach that standard with manual sampling alone.

It also accepts that trust is earned in design.

For firms trying to review more conversations with less risk, that may be the real shift: from using AI as a shiny feature to using it as part of a stronger operating model.

 

FAQs

What Is the Difference Between Predictive AI and Generative AI?

Predictive AI is typically used to classify, detect, extract, or score information.

Generative AI is typically used to create content, such as summaries, drafts, or responses.

Why Is Small-Sample Call Review a Problem?

Small samples can miss recurring issues, make trend analysis weak, and leave firms with limited evidence when they need to show how customer outcomes are being monitored.

Can AI Help With Retrospective Remediation?

Yes, especially where firms need to search large back books of calls or documents for specific themes, disclosures, or conduct risks.

It does not remove the need for judgement, but it can reduce the manual burden significantly.

Does AI Remove the Need for Human Reviewers?

No.

It can reduce low-value manual effort and improve targeting, but firms still need human judgement for interpretation, escalation, and accountability.


 

References

PS22/9: A New Consumer Duty, Financial Conduct Authority, 2022

Consumer Duty Implementation: Good Practice and Areas for Improvement, Financial Conduct Authority, 2024

Insurance Multi-Firm Review of Outcomes Monitoring Under Consumer Duty, Financial Conduct Authority, 2024

Consumer Support Outcome: Good Practices and Areas for Improvement, Financial Conduct Authority, 2025

Guidance for Firms on the Fair Treatment of Vulnerable Customers, Financial Conduct Authority, 2021

Artificial Intelligence in UK Financial Services - 2024, Bank of England and Financial Conduct Authority, 2024

Artificial Intelligence (AI) Update - Further to the Government's Response to the AI White Paper, Financial Conduct Authority, 2024

Guidance on AI and Data Protection, Information Commissioner's Office, 2023

AI Risk Management Framework, National Institute of Standards and Technology, 2023

Public Statement on AI and Investment Services, European Securities and Markets Authority, 2024

Special Topic - Artificial Intelligence, European Banking Authority, 2024

Reviewed by

Sam Kendall, 16.04.2026

 

12.05.2026

Posted by: Paul Holland

Paul, CEO and Founder of Beyond Encryption, is an expert in digital identity, fintech, cybersecurity, and business. He developed Webline, a leading UK comparison engine, and now drives Mailock, Nigel, and AssureScore to help regulated businesses secure customer data.
