Parloa integrates and optimizes the latest AI models and services, so our AI agents provide the most accurate and natural customer experiences available today — and in the future.
When a customer calls, you’ve got one shot to ensure a great experience. Long wait times, misrouted calls, and having to repeat yourself drag down metrics like NPS and CSAT and leave everyone frustrated.
Parloa owns the entire audio stream, end to end, on our own proprietary phone infrastructure. Because we are constantly testing what’s new in the market, our customers are never locked into a particular audio AI technology. They always have access to the latest voices and the best models for customer service use cases.
So… how do we do it?
How we optimize every step of the call
Parloa’s AI Agent Management Platform (AMP) is built to make calling a customer service hotline as easy as speaking to a friend. This means ensuring that a customer can speak naturally, like they would in an everyday conversation. The more context you have, the easier it is to steer the conversation in the direction you want it to go.
Optimizing for context
When Parloa integrates into your system, we build in a number of optimizations for high listening accuracy. For example, we can integrate custom speech-to-text (STT) hints for each business, such as specific product names the AI agent should listen for. These contextual hints prime the AI agent by setting up the context before a caller gets on the phone. Because Parloa knows which words to expect, such as company names and product names, it can transcribe them correctly. If you connect Parloa to a CRM, it can even recognize a customer’s name.
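To make the idea of hints concrete, here is a minimal sketch in Python. It is not Parloa's implementation: production STT engines typically bias recognition while decoding, whereas this example only approximates the effect as a post-processing step. The function name `apply_hints` and the `cutoff` value are assumptions chosen for illustration.

```python
import difflib


def apply_hints(transcript: str, hints: list[str], cutoff: float = 0.8) -> str:
    """Snap words that closely resemble a hint to the hint's spelling.

    Hypothetical helper: it only illustrates how a list of expected
    words can steer a transcript toward business-specific vocabulary.
    """
    lookup = {hint.lower(): hint for hint in hints}
    corrected = []
    for word in transcript.split():
        # Find the single closest hint, if any is similar enough.
        match = difflib.get_close_matches(
            word.lower(), list(lookup), n=1, cutoff=cutoff
        )
        corrected.append(lookup[match[0]] if match else word)
    return " ".join(corrected)
```

Called with the transcript "i am calling about parlor" and the hint list `["Parloa"]`, the near miss "parlor" is replaced by "Parloa" while the other words stay untouched.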
We can also make the system aware of context, like the expected number of digits in a customer code. For instance, we can give the agent an instruction like “Don’t interrupt the customer until they’ve said their 9-digit customer number.” That way, a customer who pauses, needs to repeat something, or has to look up their number is not interrupted. This ensures that Parloa’s AI agent collects the most useful input to pass to the LLM to correctly advance the request.
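The 9-digit example can be sketched as a simple gate on the agent's turn. This is an illustrative Python sketch, not Parloa's actual logic. It assumes the STT output already renders spoken numbers as numerals, and the names `digits_heard` and `agent_may_respond` are hypothetical.

```python
import re


def digits_heard(partial_transcripts: list[str]) -> str:
    """Collect every digit the caller has said so far, in order."""
    return "".join(re.findall(r"\d", " ".join(partial_transcripts)))


def agent_may_respond(partial_transcripts: list[str], expected_digits: int = 9) -> bool:
    """Hold the agent's turn until the full customer number has arrived.

    Pauses and filler words add no digits, so they never trigger a reply.
    """
    return len(digits_heard(partial_transcripts)) >= expected_digits
```

With the partial transcripts "my number is 123 45" and "uh wait", the gate stays closed because only five digits have arrived. Once the caller adds "6789", all nine digits are present and the agent may respond.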
The highest voice quality comes from advanced listening and speaking
When a customer is speaking, echo cancellation ensures the best audio capture quality. We originally developed this for HSE, a live commerce business that receives a high volume of calls from older customers. The typical HSE caller was often on speakerphone with a TV blasting in the background. To improve transcription accuracy, we had to build technology that reduces the background noise. This audio quality now helps us fully automate 250k orders per month for HSE.
On the other side of the conversation, Parloa offers a number of optimizations to make an AI agent sound natural and human. If the agent needs more processing time for a complex request, it gives intermediate responses, like “Hold on a sec,” or “Give me just a moment,” instead of leaving awkwardly long pauses. We integrate the latest frontier models so that every sound and verbal cue makes the conversation feel lifelike.
Another reason our voice quality is so authentic is the latency optimizations we’re constantly enabling. For example, we recently moved to GPT-4o on Azure as soon as it was available, which reduced latency by 23%.
We can also use multiple speech-to-text (STT) models in the same flow to optimize alphanumeric recognition, for example for serial numbers and names. Because our models can prioritize the STT candidates, they can pick the best result from among them.
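One way to picture candidate prioritization is the Python sketch below. It is a simplified illustration under stated assumptions, not a description of how Parloa ranks results: the `Candidate` type, the `pick_best` function, and the rule "prefer candidates that match the expected format, then the highest confidence" are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    model: str         # which STT model produced this transcription
    text: str          # the transcribed text
    confidence: float  # the model's own confidence, from 0.0 to 1.0


def pick_best(candidates: list[Candidate], expected_pattern: str) -> Candidate:
    """Prefer candidates that fit the expected format, then higher confidence."""
    pattern = re.compile(expected_pattern)

    def score(candidate: Candidate) -> tuple[bool, float]:
        # Ignore spacing and letter case when checking the format.
        normalized = candidate.text.replace(" ", "").upper()
        return (bool(pattern.fullmatch(normalized)), candidate.confidence)

    return max(candidates, key=score)
```

For a serial number expected to look like two letters followed by three digits (`[A-Z]{2}\d{3}`), a candidate reading "AB 123" with confidence 0.81 wins over "a b one two three" with confidence 0.92, because only the first fits the expected format.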
Adding a human touch
Parloa owns the entire audio stream, which means we can easily insert intermediate responses like “Let me look that up for you” to buy processing time without a perceived delay. We can also add verbal cues like “uh” and “um,” typing sounds, and other ambient background noises to keep calls from feeling sterile. All this makes the experience sound and feel much more natural.
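The pattern of buying processing time with an intermediate response can be sketched with Python's `asyncio`. This is a minimal illustration, not Parloa's implementation. The function `respond_with_filler`, the `speak` callback, and the 0.7 second `patience` threshold are assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable


async def respond_with_filler(
    slow_task: Awaitable[str],
    speak: Callable[[str], None],
    filler: str = "Let me look that up for you.",
    patience: float = 0.7,
) -> str:
    """Speak a filler phrase only if the real answer is taking too long."""
    task = asyncio.ensure_future(slow_task)
    # Wait briefly; if the answer arrives in time, no filler is needed.
    done, _pending = await asyncio.wait({task}, timeout=patience)
    if not done:
        speak(filler)
    answer = await task
    speak(answer)
    return answer
```

A fast lookup produces only the answer. A slow one produces the filler first and the answer once it is ready, so the caller never sits through silence.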
We give enterprises total control of their caller experience
Parloa offers lower latency, better audio quality, and contextual feedback for smarter speech recognition. We can do this because we own the phone infrastructure and the end-to-end audio stream. This means easier integration with your existing telephony system and more natural-sounding conversations with your customers. What does this mean for you? Shorter or no wait times, higher NPS and CSAT scores, and higher conversion and cross-sell rates, as with HSE above.
Lukas Brückner is a Staff Product Manager at Parloa, where he is responsible for the call experience team with a focus on speech recognition technology and its enhancements. He also kicked off the development of our Real-Time Translation product and laid the foundations for the Solutions Engineering Team, ensuring the success of our first customers.