The quiet spread of AI agent washing in customer service

Every tech wave has its buzzwords. In the early days of the cloud, everything was “as a service.” During the Web3 boom, every startup had a “tokenomics” section. Today, it’s artificial intelligence (AI) and AI agents.
The term “AI agent” gets thrown around as a catchall. Sometimes it describes systems that plan, reason, and act autonomously; more often it describes little more than scripted workflows in a wrapper. Scratch the surface, and many of these “agents” can’t plan, can’t adapt, and definitely can’t decide.
More importantly, in domains like customer service, where automation directly affects user experience, calling something an “agent” when it can’t follow through is misleading and operationally risky. It’s also a widespread problem, and one we hope to help solve.
That’s the real cost of AI agent washing. It’s the gap between what systems actually do and what they’re marketed to do. And in a field where customer expectations shape adoption, it’s way more than just a branding issue; it’s a product one.
The trouble with “agentic” everything
Most of the confusion around this topic comes down to one word: autonomy.
In theory, an agent should be able to pursue goals with minimal oversight. In practice, the word gets stretched to fit whatever’s on the roadmap: finely tuned flows built on hand-coded if-else logic are suddenly “agentic.” Vendors that do this are treating AI like a vibe, not a solution.
But that distinction is key.
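To make that distinction concrete, here is a minimal, illustrative sketch in Python, not tied to any vendor’s product. The names `plan_next_step` and `execute` are hypothetical stand-ins for a planning model and a tool layer; the point is only that the scripted flow can never leave its branches, while the agent loop chooses its next action from the current state and adapts to what it observes.

```python
# Illustrative sketch only: scripted if-else routing vs. an agent-style loop.
# All function names are hypothetical stand-ins, not a real product API.

def scripted_flow(message: str) -> str:
    """Hand-coded if-else routing: deterministic, but it cannot re-plan."""
    if "refund" in message.lower():
        return "route_to_refund_form"
    if "password" in message.lower():
        return "send_reset_link"
    return "route_to_human"


def plan_next_step(state: dict) -> str:
    """Hypothetical planner stub; a real system would call a model here."""
    return "look_up_account" if not state["history"] else "resolve_and_confirm"


def execute(action: str) -> str:
    """Hypothetical tool layer stub; returns an observation for the agent."""
    return "goal_reached" if action == "resolve_and_confirm" else "account_found"


def agent_loop(goal: str, max_steps: int = 5) -> list[str]:
    """Agent-style loop: pick the next action from the current state,
    observe the result, and adapt until the goal is met or steps run out."""
    state = {"goal": goal, "history": []}
    for _ in range(max_steps):
        action = plan_next_step(state)
        observation = execute(action)
        state["history"].append((action, observation))
        if observation == "goal_reached":
            break
    return [action for action, _ in state["history"]]


if __name__ == "__main__":
    print(scripted_flow("I need a refund"))        # route_to_refund_form
    print(agent_loop("resolve billing question"))  # ['look_up_account', 'resolve_and_confirm']
```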
Recent research into proactive agents highlights how far most systems still are from true autonomy. In one benchmark, even top-performing models struggled to anticipate or initiate tasks without human prompts, the core behaviors that define agentic intelligence. The study explicitly notes that “most agent systems remain reactive,” and that training LLMs for genuine proactiveness requires novel data pipelines, human feedback, and careful fine-tuning.
It’s also a pattern that’s showing up across sectors—from enterprise IT to ecommerce—but customer service feels the impact faster, because the stakes are immediate and workflows are exposed.
Here, if an agent can’t manage context across a troubleshooting sequence, replan when an API fails, or escalate a case with the right metadata, it’s not automating anything. Instead, it’s just shifting the burden back onto humans. And plenty of systems on the market still fall into that trap.
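As a rough illustration of those behaviors, here is a hedged sketch in which a hypothetical troubleshooting agent keeps context, re-plans when a primary API call fails, and escalates with structured metadata if the fallback also fails. `crm_lookup` and `backup_lookup` are invented stand-ins, not real integrations.

```python
# Hedged sketch of the behaviors described above: keep context, re-plan on an
# API failure, and escalate with structured metadata. Hypothetical APIs only.

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("troubleshooting-agent")


class ApiError(Exception):
    pass


def crm_lookup(customer_id: str) -> dict:
    raise ApiError("primary CRM endpoint timed out")   # simulate a failure


def backup_lookup(customer_id: str) -> dict:
    return {"customer_id": customer_id, "plan": "premium"}


def handle_case(customer_id: str, issue: str) -> dict:
    context = {"customer_id": customer_id, "issue": issue, "steps": []}
    try:
        context["account"] = crm_lookup(customer_id)
        context["steps"].append("primary_lookup_ok")
    except ApiError as err:
        # Re-plan instead of giving up: fall back to a secondary source.
        log.info("primary lookup failed (%s); re-planning with backup", err)
        context["steps"].append("primary_lookup_failed")
        try:
            context["account"] = backup_lookup(customer_id)
            context["steps"].append("backup_lookup_ok")
        except ApiError:
            # Escalate with enough metadata for a human to pick the case up cleanly.
            context["escalation"] = {
                "reason": "all lookups failed",
                "attempted_steps": context["steps"],
            }
    return context


if __name__ == "__main__":
    print(handle_case("C-1042", "double-charged invoice"))
```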
Levels of AI agent autonomy
Academic frameworks are already trying to clarify this. A 2025 study by interface (formerly Stiftung Neue Verantwortung) outlines five levels of agent autonomy, from narrow, human-supervised task runners to fully independent decision-makers. But most systems we see in the wild land in the lower tiers.
And when marketing implies Level 5 autonomy while the system operates at Level 2, it sets the stage for misaligned expectations and accountability gaps. If buyers can’t reliably tell the difference between a Level 2 automation and a Level 5 autonomous agent, trust in the whole category starts to erode. This is what economists call the challenge of asymmetric information.
When every vendor calls their system an agent, the word stops meaning anything. And when labels outpace capabilities, buyers make the wrong bets, investors chase hype, and builders focus on promotion rather than performance.
It’s a “market for lemons” problem: high-quality systems compete with lookalikes, and over time, the signaling power of the term “agent” breaks down. That doesn’t just hurt vendors; it slows adoption, funding, and real progress.
The cost of overstating autonomy without the evidence
Across the AI agent ecosystem, the gap between branding and real capability is widening. In customer service specifically, agent washing sets teams up for failure.
When buyers expect a system that can coordinate actions end-to-end, and instead get a glorified autocomplete, they lose trust. When internal teams plan for delegation and end up with supervision, it slows everything down. And when investors chase fully autonomous demos, they fund vapor instead of value.
The real danger is that agent language overshadows the harder work of functional design. True autonomy comes down to architectural decisions that take time to build and test.
These claims inevitably invite accountability
It’s easy to call a system an agent. It’s harder to back that up. And agent washing raises the stakes for everyone in the industry. As systems take on more responsibility in high-impact workflows, the gap between label and capability becomes a liability, and it makes regulation inevitable.
The EU’s AI Act, which enters full effect in 2026, lays out strict accountability for systems deemed high-risk—including many of the use cases where agentic claims are being made today. It requires rigorous documentation, risk mitigation processes, and human oversight—especially for systems that make decisions affecting people’s access to services, employment, or legal outcomes.
The law also introduces transparency obligations for any AI system that interacts with the public or produces synthetic content. Systems must clearly disclose that users are engaging with an AI—not a human—and generative outputs must be labeled. The intent is clear: trust starts with clarity. And the more autonomy a system claims, the more scrutiny it invites.
This enforcement is already happening in the U.S., too, as the Securities and Exchange Commission (SEC) and Federal Trade Commission (FTC) tighten scrutiny of companies that market themselves as AI-driven yet lack the systems to back those claims.
Where Parloa stands
At Parloa, we use the term “AI agent” too, but with a clear emphasis on what that means in practice. Our platform offers systems that handle real customer conversations across phone, chat, and messaging. They can make sales, collect debt, schedule appointments, and process claims, among other things, and they integrate with business systems to resolve issues. That’s functionality.
Our agents are built to automate conversations, understand context across multiple turns and calls (with short- and long-term memory), and scale customer service reliably. That’s a meaningful kind of autonomy.
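For readers curious what “short- and long-term memory” can mean in practice, here is a purely illustrative data-structure sketch. It is an assumption-laden simplification, not Parloa’s implementation: a bounded window of recent turns for the current call, plus a per-customer store that persists across calls.

```python
# Illustrative sketch only (not Parloa's implementation) of short- and
# long-term conversational memory as a simple data structure.

from collections import deque


class ConversationMemory:
    def __init__(self, short_term_window: int = 10):
        # Short-term: the last N turns of the current conversation.
        self.short_term = deque(maxlen=short_term_window)
        # Long-term: facts kept across calls, keyed by customer.
        self.long_term: dict[str, dict] = {}

    def add_turn(self, speaker: str, text: str) -> None:
        self.short_term.append((speaker, text))

    def remember(self, customer_id: str, key: str, value: str) -> None:
        self.long_term.setdefault(customer_id, {})[key] = value

    def context_for(self, customer_id: str) -> dict:
        """Combine both stores into the context handed to the next model call."""
        return {
            "recent_turns": list(self.short_term),
            "known_facts": self.long_term.get(customer_id, {}),
        }


if __name__ == "__main__":
    memory = ConversationMemory()
    memory.add_turn("customer", "I called last week about my claim.")
    memory.remember("C-77", "open_claim", "CLM-2031")
    print(memory.context_for("C-77"))
```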
Research from the University of Toronto’s Schwartz Reisman Institute shows that fewer than 10% of so-called agents undergo any kind of external safety evaluation, and fewer than 20% disclose formal safety policies.
We also invest in what happens before deployment: analytics, simulation, and ongoing evaluations. It’s part of making sure these systems work in the real world. Additionally, we use internal evaluation frameworks that focus on actual system abilities: intent capture, data extraction, and workflow reliability. Because if we call it an agent, we want to know what it can do.
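As a hypothetical example of what such an evaluation might check, the sketch below scores canned test cases on intent capture and data extraction. The `run_agent` stub and the test cases are invented for illustration; this is not our internal framework.

```python
# Simplified, hypothetical evaluation harness: score transcripts on intent
# capture and data extraction instead of trusting the "agent" label.

from dataclasses import dataclass


@dataclass
class TestCase:
    utterance: str
    expected_intent: str
    expected_fields: dict


def run_agent(utterance: str) -> dict:
    """Stand-in for calling the deployed system; returns intent + extracted data."""
    return {"intent": "schedule_appointment", "fields": {"date": "2025-06-12"}}


def evaluate(cases: list[TestCase]) -> dict:
    intent_hits, field_hits, field_total = 0, 0, 0
    for case in cases:
        result = run_agent(case.utterance)
        intent_hits += result["intent"] == case.expected_intent
        for key, value in case.expected_fields.items():
            field_total += 1
            field_hits += result["fields"].get(key) == value
    return {
        "intent_accuracy": intent_hits / len(cases),
        "extraction_accuracy": field_hits / max(field_total, 1),
    }


if __name__ == "__main__":
    cases = [
        TestCase("Can I come in on June 12th?", "schedule_appointment", {"date": "2025-06-12"}),
        TestCase("Cancel my visit", "cancel_appointment", {}),
    ]
    print(evaluate(cases))  # illustrative numbers from the stub, not real results
```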
Agent washing thrives when language outpaces behavior; our goal is to keep the two aligned, backed by years of experience, expertise, and research.
For example, a global e-commerce and fintech company, using Parloa’s platform in partnership with Waterfield Tech, deployed an AI agent to manage payment reminder calls—traditionally a tense, high-stakes touchpoint.
In just two months, the agent outperformed humans on both fronts: 66% of customers promised to pay, and 62% followed through.
Let’s name things better
None of this is to say we should gatekeep the term “agent.” Language evolves. But if we want to move this space forward, we need to get more specific about what different systems can and can’t do.
Our takeaway: Let’s reserve the term “agent” for systems with at least some degree of autonomous decision-making, without the bait-and-switch. And let’s make space for necessary autonomy: systems that don’t do everything, but do something well, independently. This applies whether you're building internal tooling or customer-facing workflows.
Because the future of AI isn’t about pretending everything is an agent. It’s about knowing which parts need to be, and designing accordingly.