Voice AI handles the routine well. Answering common questions, collecting information, closing simple cases — contact centers have been automating this layer for years. The AI makes it faster and cheaper.
The hard part is what happens when AI hits its limit and needs to hand the call to a human.
Those calls are a minority by volume. They’re also the most complex, sensitive, and high-stakes interactions in the queue. And the transition is where most deployments silently break.
The numbers are worse than you’d expect
Jim Iyoob at Etech Global Services measured this across real deployments: 68% of bot-to-agent handoffs lose critical context. Handle times jump 23 seconds on average. Customer satisfaction drops 31% compared to calls that never needed escalation.
His framing is blunt:
“You’ve automated away the easy 40% but made the hard 60% worse. Your investment actually degraded the customer experience for your most difficult interactions.”
Cisco has the agent-side number: one in three agents don’t have the context they need after a transfer. A Qualtrics study from October 2025 found nearly one in five consumers who used AI for customer service got nothing useful from it — a failure rate four times higher than AI use in other contexts.
It’s not the LLM — it’s the plumbing
When I look at failed handoffs, the problem is almost never the model. The AI understood the caller fine. Intent was correctly classified. The handoff still broke.
What actually goes wrong: the caller has to re-verify identity even though the bot already confirmed it. The agent picks up with no idea what was discussed for the past four minutes. Troubleshooting the AI already attempted gets repeated. Resolution time spikes exactly where the call was already difficult.
These are systems problems, not AI problems. How context gets structured. Whether authentication state crosses the boundary between the AI platform and the agent desktop. Whether the data arrives before the agent answers — or after.
That last one is particularly bad in voice. The screen pop — the customer info that’s supposed to appear on the agent’s desktop when a call routes to them — sounds straightforward. In practice, network latency or a slow database query means the data shows up after the agent has already greeted the caller and asked for their account number. And legacy CTI middleware, much of it dating from around 2010, often caps transfer data at 256 characters. Enough for an account number, not a conversation summary.
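When you have to squeeze context through a legacy transfer field, the practical move is to prioritize: send the fields the agent needs in the first five seconds, and drop what doesn’t fit rather than truncating mid-value. A minimal sketch, assuming a 256-character limit; the field names and priority order are illustrative, not any CTI standard:

```python
# Sketch: packing handoff context into a size-limited legacy transfer field.
# Field names and the priority order are our own illustration.

def pack_transfer_data(context: dict, limit: int = 256) -> str:
    # Highest-priority fields first; skip anything that doesn't fit whole.
    priority = ["account_id", "auth_verified", "intent", "summary"]
    parts: list[str] = []
    used = 0
    for key in priority:
        value = str(context.get(key, ""))
        if not value:
            continue
        fragment = f"{key}={value}"
        cost = len(fragment) + (1 if parts else 0)  # +1 for the ';' separator
        if used + cost > limit:
            continue  # drop the field entirely rather than truncate mid-value
        parts.append(fragment)
        used += cost
    return ";".join(parts)

ctx = {
    "account_id": "A-88412",
    "auth_verified": "yes",
    "intent": "billing_dispute",
    "summary": "Charged twice on 2024-05-03; bot refund flow failed; wants reversal",
}
packed = pack_transfer_data(ctx)
assert len(packed) <= 256
```

The point of the priority list is that under pressure the payload degrades gracefully: account and authentication always make it through, and the summary is the first thing sacrificed, never the identity.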
The Klarna lesson
Klarna’s AI deployment became a much-cited case. In its first month, the assistant handled 2.3 million conversations, two-thirds of the company’s customer service chats, doing the work of roughly 700 full-time agents. Average resolution time dropped from 11 minutes to under 2. The numbers were striking.
Then Klarna’s CEO publicly acknowledged they’d gotten the balance wrong. Cost savings had been over-weighted; quality suffered in complex cases. He also pointed at something more structural: the existing service infrastructure — IVRs, FAQs, knowledge bases — hadn’t been well-maintained to begin with. They’d automated on top of a shaky foundation.
The fix wasn’t scaling back AI. It was redesigning escalation: confidence scoring so the system escalated when uncertain rather than guessing, and pre-handoff summaries so agents received full context before picking up.
After those changes, the AI handled more support, not less. Cleaner handoffs meant agents could actually use what the AI had collected.
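The confidence-scoring idea is simple to state in code. A minimal sketch, assuming an intent-confidence score from the model and a count of failed resolution attempts; the threshold and the two-strikes rule are illustrative, not Klarna’s actual values:

```python
# Sketch of confidence-gated escalation: hand off when uncertain,
# rather than guessing. Threshold and retry cap are assumed values.

ESCALATE_BELOW = 0.7  # tuned per deployment; illustrative here

def should_escalate(intent_confidence: float, failed_attempts: int) -> bool:
    # Escalate when the model is unsure about what the caller wants...
    if intent_confidence < ESCALATE_BELOW:
        return True
    # ...or when it has already tried and failed enough times that another
    # guess costs the caller more than a transfer would.
    if failed_attempts >= 2:
        return True
    return False

assert should_escalate(0.55, 0) is True    # uncertain: hand off, don't guess
assert should_escalate(0.92, 0) is False   # confident, first attempt: proceed
assert should_escalate(0.92, 2) is True    # confident but repeatedly failing
```

The design choice worth noting is the second branch: confidence alone isn’t enough, because a model can be confidently wrong three times in a row. Counting failed attempts catches the calls that are going in circles.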
What has to survive the transfer
The list of what needs to cross the AI-to-human boundary isn’t long, but teams skip items constantly.
Authentication state. The customer proved who they are — don’t make them do it again. A structured summary: issue, relevant dates, what they want. Not a raw transcript; an agent can’t scan five minutes of dialogue while the caller is already talking. What the AI already tried — if the bot suggested three things and the customer rejected all of them, the agent needs to know before opening their mouth. And the emotional temperature of the call: was this person frustrated before the transfer? Going in circles? That changes how you start.
Full transcript should be there if the agent needs it, but not in the way.
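The items above fit in a small, explicit structure. A sketch of what that payload might look like; the field names are our own, not a standard schema:

```python
# Sketch of a handoff payload covering what must survive the transfer.
# Field names are illustrative, not any industry schema.

from dataclasses import dataclass, field

@dataclass
class HandoffContext:
    authenticated: bool            # identity already verified by the bot
    issue: str                     # one-line statement of the problem
    relevant_dates: list[str]      # e.g. charge date, order date
    desired_outcome: str           # what the caller actually wants
    attempted_steps: list[str]     # what the AI tried and the caller rejected
    sentiment: str                 # "calm" / "frustrated" / "escalating"
    transcript_url: str = ""       # full dialogue: available, not in the way

ctx = HandoffContext(
    authenticated=True,
    issue="Double charge on monthly invoice",
    relevant_dates=["2024-05-03"],
    desired_outcome="Refund of the duplicate charge",
    attempted_steps=["self-service refund flow (failed at card step)"],
    sentiment="frustrated",
)
```

Note that the transcript is a link, not a field the agent has to read: it’s the default-empty last item, which mirrors the priority order above.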
Customers don’t always tell you when it breaks
The consumer numbers are uncomfortable. 68% of people would rather talk to a human agent. 63% say they’d leave a company if human support wasn’t available. 80% will only use chatbots if they know a human option exists.
The part that tends to get missed: fewer than a third of customers who have a bad experience tell the company about it. Half just spend less. The rest leave.
Failed handoffs may not show up in your complaint data. That’s not the same as not happening. Isabelle Zdatny of Qualtrics XM Institute: “Leaders risk mistaking this silence for a healthy relationship. But silence today often means disengagement, and disengagement has business consequences.”
How we approach it
We build on a modular ASR → LLM → TTS pipeline with LiveKit for audio infrastructure and native SIP into the client’s existing telephony stack. That modularity matters for handoffs because we have visibility at every stage — and can intervene at each one.
When a call needs a human, context gets structured before the transfer, not during it. The LLM generates a summary while the caller hears a transition message. Authentication state travels via SIP headers so the agent’s screen is populated before they pick up. The AI stays on the line during the handoff — silent, not contributing, but there if the agent needs to pull something from the earlier conversation.
The thing we work backward from: the caller should never have to repeat themselves.
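The sequence described above, summary generation overlapped with the transition message and context carried on the SIP leg, can be sketched with async primitives. Everything below is a simplified stand-in: summarize_call, play_transition_message, and sip_transfer are placeholder stubs for our own helpers, not a LiveKit API, and the X-* header names are assumptions.

```python
# Sketch of the handoff sequence. All helpers are stubs standing in for
# real LLM, TTS, and SIP integrations; header names are illustrative.

import asyncio
from dataclasses import dataclass

@dataclass
class Call:
    transcript: str
    authenticated: bool
    target_queue: str

async def summarize_call(transcript: str) -> str:
    # Stand-in for the LLM call that produces the agent-facing summary.
    await asyncio.sleep(0)
    return f"Summary: {transcript[:80]}"

async def play_transition_message(call: Call) -> None:
    # Stand-in for TTS playback of "connecting you to an agent...".
    await asyncio.sleep(0)

async def sip_transfer(call: Call, to: str, headers: dict, stay_on_line: bool) -> dict:
    # Stand-in for the SIP transfer with custom headers attached.
    return headers

async def hand_off(call: Call) -> dict:
    # Generate the summary WHILE the caller hears the transition message,
    # so the summary costs the caller no extra silence.
    summary, _ = await asyncio.gather(
        summarize_call(call.transcript),
        play_transition_message(call),
    )
    headers = {
        "X-Auth-Verified": "true" if call.authenticated else "false",
        "X-Call-Summary": summary[:256],  # stay within legacy size limits
    }
    # stay_on_line: the AI remains bridged after transfer, muted unless asked.
    return await sip_transfer(call, to=call.target_queue,
                              headers=headers, stay_on_line=True)

headers = asyncio.run(hand_off(Call("Billing dispute, double charge", True, "tier2")))
assert headers["X-Auth-Verified"] == "true"
```

The structural point is the asyncio.gather: the expensive step (the summary) runs concurrently with the one moment the caller expects to wait, so by the time the transfer fires, the context is already in flight.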
Escalation rate is the wrong metric
Most contact center dashboards optimize for deflection — what percentage of calls the AI resolved without a human. It’s understandable, but it’s the wrong number to be proud of in isolation.
A 90% deflection rate with broken handoffs means the 10% who needed help had the worst possible experience. They were already calling about something complex. The bot held them for three minutes, then dropped them into a queue with no context. Those are your highest-effort, highest-stakes callers — and the system just made their day worse.
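Concretely, the fix is to report handoff quality next to deflection instead of deflection alone. A sketch with made-up numbers; the definition of a "clean" handoff (agent received context, caller didn’t repeat themselves) is ours:

```python
# Sketch: deflection rate alone hides broken handoffs. Numbers are invented.

def deflection_rate(resolved_by_ai: int, total: int) -> float:
    return resolved_by_ai / total

def handoff_success_rate(clean_handoffs: int, escalations: int) -> float:
    # "Clean" = agent received context and the caller did not repeat themselves.
    return clean_handoffs / escalations if escalations else 1.0

total, resolved = 1000, 900
escalations = total - resolved   # the 100 hardest calls
clean = 30                       # only 30 of 100 carried full context

print(f"deflection: {deflection_rate(resolved, total):.0%}")               # 90%
print(f"handoff success: {handoff_success_rate(clean, escalations):.0%}")  # 30%
```

The first number goes on the dashboard and looks like a win. The second says 70% of the highest-stakes callers got a broken transfer. Reporting them as a pair is what keeps the deflection number honest.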
Replicant puts it well: escalation isn’t a failure. It’s a sign the system knows its limits.
The 80% your AI handles only works if the 20% that needs a human is designed with the same care.
Dealing with AI-to-human handoffs in production? Get in touch — happy to share what we’ve learned.