The hidden layer of personalization in AI agents
Part of the Agent Architect's Digest, a series from Parloa's Agent Architects team.
The ability to train AI agents on brand voice has become table stakes for any conversational AI solution, but what about the ability to adapt an agent’s tone in response to the individual it’s talking to? That capability is far less common, though just as important.
I was recently invited to a customer's service center to listen to how their human agents communicated with customers. Sitting next to an agent, I experienced the importance of this capability firsthand. Over the course of 20 minutes, the agent shifted between three different versions of herself: relaxed with me, reassuring with a routine caller, quieter and more careful with one who I could hear was upset. She wasn't performing. She had simply spent enough years on the phone to know what each person needed.
The reflex itself isn't a contact-center skill. It's a human one.
Linguistic Style Matching
In our daily lives, we tailor our language and style to who we're talking to and what we sense their mood to be, sometimes without even realizing the shift ourselves. Psychologists call it linguistic style matching [1]. By adjusting our tone to that of the people we’re talking to, we demonstrate empathy and understanding, earning trust in the relationship.
AI agents need to be able to do this, too.
Customers today crave human-like conversations. They want to feel understood. AI agents that pick up on these nuances earn trust and credibility with the customers they serve, helping enterprises deliver reliably fast service.
This is the work I focus on as an Agent Architect for Parloa.
Personality-Adaptive Conversational Agents
The question "What would it take for a conversational agent to understand a user's personality from how they communicate, and respond in a style that matches?" drove my PhD thesis at TU Braunschweig. There, I developed Personality-Adaptive Conversational Agents: systems designed to read cues in a user's language and adjust their own communication style in response [2].
The Big Five model
My agents were designed in alignment with the Big Five model of personality, a framework widely used in psychology to describe individual differences across five dimensions: openness, conscientiousness, extraversion, agreeableness, and neuroticism [3]. Decades of work across psychology and computational linguistics have shown that these traits leave measurable, consistent traces in language, observable as verbal and paraverbal cues. Verbal cues cover what is said: word choice, sentence length, formality, and the emotion conveyed. Paraverbal cues cover how it is said: pace, pitch, placement of pauses, and speed of response. Together, these cues serve as stable signals an AI can read and respond to, the same signals humans use intuitively when adjusting to each other in conversation.
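To make "signals an AI can read" concrete, here is a minimal sketch of what extracting verbal cues from raw text might look like. The three features and their thresholds are my illustrative choices, not the validated cue inventory from [3]:

```typescript
// Illustrative verbal-cue extraction. Feature choices and thresholds are
// simplified assumptions, not the full cue inventory from the research.
interface VerbalCues {
  avgSentenceLength: number; // words per sentence
  exclamationRate: number;   // "!" per sentence, a rough emotion marker
  formalityHint: number;     // share of long (7+ letter) words
}

function extractVerbalCues(text: string): VerbalCues {
  const sentences = text.split(/[.!?]+/).filter(s => s.trim().length > 0);
  const words = text.split(/\s+/).filter(w => w.length > 0);
  const exclamations = (text.match(/!/g) ?? []).length;
  const longWords = words.filter(w => w.replace(/\W/g, "").length >= 7).length;
  return {
    avgSentenceLength: words.length / Math.max(sentences.length, 1),
    exclamationRate: exclamations / Math.max(sentences.length, 1),
    formalityHint: longWords / Math.max(words.length, 1),
  };
}
```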
Testing the Big Five with AI
To test this framework with AI, my colleagues and I built one extraverted chatbot and one introverted chatbot, identical in every way except their tone of voice. We applied the Mairesse et al. cue inventory [3] by hand, using informal language, short sentences, and high-emotion words for the extraverted chatbot, and longer sentences, richer vocabulary, and fewer emotional markers for the introverted one. Then we verified the manipulation by running each chatbot's dialogue through IBM Watson Personality Insights (IBM's commercial personality-inference service, since retired), which scored them at the 83rd and 36th percentiles for extraversion, respectively. The percentiles are comparisons to a broader population: the extraverted bot's dialogue was more extraverted than 83% of the speakers Watson PI was trained on, while the introverted bot's was more extraverted than only 36%. The two stylistic profiles, in other words, were measurably distinct. When we tested both bots with end users, they consistently reported higher communication satisfaction when the chatbot's style matched their own (Wilcoxon signed-rank test, p < .001) [4].
Building on these results, we prototyped a third chatbot, codenamed Raffi. Raffi didn't embody a fixed personality but instead inferred the user's personality in real time and adapted in response [5]. We designed the bot around four functional requirements: Raffi had to integrate a personality-mining service, expose an interface where user text could be analyzed, combine both into a single interaction surface, and produce responses that a conversational designer could shape per personality dimension. The architecture was modular: Slack served as the user-facing interface, MongoDB stored both the user's messages and the inferred personality profile, and Node.js orchestrated the pipeline.
In operation, Raffi worked as follows: a user wrote a message in Slack, which was stored in MongoDB. Once a minimum word threshold was reached, the accumulated messages were sent to IBM Watson Personality Insights, which returned a JSON object containing the user's Big Five percentiles. That profile was persisted in MongoDB and made available to the chatbot engine (Recast.AI). Recast.AI then selected from a library of pre-designed conversational flows, each authored using the 148-cue framework we'd built up in a systematic literature review, to match the inferred personality.
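For a flavor of what that orchestration layer looked like, here is a TypeScript sketch under loose assumptions. Watson PI and Recast.AI are both retired, so `inferPersonality`, `replyWithDefaultFlow`, and `replyWithMatchedFlow` are hypothetical stand-ins rather than those products' real APIs:

```typescript
// Sketch of Raffi's orchestration loop. All service clients are
// hypothetical stand-ins; the prototype's real interfaces are not
// reproduced here.
import { MongoClient } from "mongodb";

const MIN_WORDS = 100; // Watson PI's documented minimum for any analysis

interface BigFiveProfile {
  openness: number; conscientiousness: number; extraversion: number;
  agreeableness: number; neuroticism: number; // percentiles, 0-100
}

// Hypothetical clients standing in for Watson PI and Recast.AI.
declare function inferPersonality(text: string): Promise<BigFiveProfile>;
declare function replyWithDefaultFlow(userId: string, text: string): Promise<void>;
declare function replyWithMatchedFlow(userId: string, text: string, p: BigFiveProfile): Promise<void>;

async function onSlackMessage(userId: string, text: string, client: MongoClient) {
  const messages = client.db("raffi").collection("messages");
  await messages.insertOne({ userId, text, ts: new Date() });

  // Accumulate the user's messages until the word threshold is reached.
  const all = await messages.find({ userId }).toArray();
  const corpus = all.map(m => m.text).join(" ");
  if (corpus.split(/\s+/).filter(w => w.length > 0).length < MIN_WORDS) {
    return replyWithDefaultFlow(userId, text); // neutral style until we know more
  }

  // Infer a Big Five profile and persist it alongside the messages.
  const profile = await inferPersonality(corpus);
  await client.db("raffi").collection("profiles")
    .updateOne({ userId }, { $set: { profile } }, { upsert: true });

  // Let the dialogue engine pick the pre-authored flow matching the profile.
  return replyWithMatchedFlow(userId, text, profile);
}
```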
A small qualitative study with three participants tested two versions of Raffi back-to-back: one was configured to match each participant's inferred personality (following the law of attraction), and one was inverted. The conversation in each case was a six-minute small-talk exchange about travel and personal interests. All three participants noticed the difference between the two versions. Two preferred the matched version, describing it as friendlier and more comfortable to interact with. The third preferred the inverted version, perceiving it as more human-like. Even with the small sample, the study confirmed what we needed to know to keep building: the inference-plus-adaptation loop produced dialogues that users could reliably distinguish, and style adaptation was something they responded to.
Unlike ExtraBot and IntroBot, Raffi's adaptation was not something the designer baked in at build time; it was something the system performed at runtime. The chatbot's response library was still hand-authored, but which path it ran for a given user was decided dynamically, from the user's own language.
From old-school engineering to modern LLMs
The work of ExtraBot, IntroBot, and Raffi was completed before large language models reshaped the AI field, when getting a chatbot to sound meaningfully different across contexts was a real engineering effort. Today, much of that engineering is unnecessary.
A modern LLM can read a user's messages, infer their style implicitly, and generate responses in a matching tone without the need for a separate personality-mining service, a rule-based dialogue manager, or a hand-authored library of conversational flows. What used to take Watson PI + Recast.AI + MongoDB + Node.js can, in principle, be done by a single well-prompted model.
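As a sketch only, the whole loop can collapse into one system prompt. The wording below is my illustration, not a production Parloa prompt, and `callChatModel` is a stand-in for whatever chat-completion client you use:

```typescript
// Sketch: style matching via a single well-prompted model. The prompt
// text and the brand name are illustrative assumptions.
const STYLE_MATCHING_SYSTEM_PROMPT = `
You are a customer-service agent for ACME (hypothetical brand).
Before each reply, silently assess the user's style from their messages:
vocabulary, sentence length, formality, and emotional tone.
Match that style in your reply: mirror their formality and pace,
acknowledge emotion when they express it, and keep answers as brief
or as detailed as they seem to prefer. Never state this analysis aloud.`;

type Turn = { role: "user" | "assistant"; content: string };

// Stand-in for any chat-completion client.
declare function callChatModel(system: string, history: Turn[]): Promise<string>;

async function reply(history: Turn[]): Promise<string> {
  return callChatModel(STYLE_MATCHING_SYSTEM_PROMPT, history);
}
```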
From research pilot to enterprise production
Modern LLMs are extraordinarily good at reading signals. They can pick up vocabulary, sentence length, formality, and emotional tone from just a few turns of conversation. What they still can't do reliably, however, is act on that reading. Deciding which style fits which user, when to shift, by how much, and how to keep these decisions scalable still relies on design and judgment. At Parloa, we believe this design requires three key actions from the agent (sketched in code after the list):
Inference: The agent needs to use vocabulary, sentence length, pacing, formality, and emotional register to infer the user's communication style from the conversation as it unfolds.
Running profile: Throughout a conversation, agents must maintain a running profile of conversation style across turns. They cannot drift from one stray sentence or break when the user pauses or switches register.
Output style selection: Agents do not need to (and should not) switch entire personalities, but they must adjust a small set of stylistic dimensions (warmth, brevity, formality, acknowledgment) within the constraints of the brand voice. This could mean making decisions about whether to greet warmly or get to business right away, whether to acknowledge an emotion explicitly or hold space for it implicitly, whether to use "we" or "I," and whether to confirm in five words or two.
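Here is the promised sketch of the profile-tracking and style-selection steps: per-turn readings are blended with an exponential moving average so one stray sentence cannot swing the profile, then clamped into the band the brand voice allows. The dimension names and the 0.3 blend rate are illustrative assumptions, not Parloa's production values:

```typescript
// Sketch of a running style profile. Per-turn cue readings are blended
// with an exponential moving average, so a single outlier turn barely
// moves the profile. Values and dimensions are illustrative.
interface StyleProfile {
  warmth: number;    // 0 = businesslike, 1 = warm
  brevity: number;   // 0 = expansive, 1 = terse
  formality: number; // 0 = casual, 1 = formal
}

const BLEND = 0.3; // weight given to the newest turn (assumed value)

function updateProfile(prev: StyleProfile, turn: StyleProfile): StyleProfile {
  const mix = (old: number, now: number) => (1 - BLEND) * old + BLEND * now;
  return {
    warmth: mix(prev.warmth, turn.warmth),
    brevity: mix(prev.brevity, turn.brevity),
    formality: mix(prev.formality, turn.formality),
  };
}

// Clamp each dimension into the band the brand voice allows, so the
// agent adjusts style without ever leaving the brand's range.
function constrainToBrand(p: StyleProfile, min: StyleProfile, max: StyleProfile): StyleProfile {
  const clamp = (v: number, lo: number, hi: number) => Math.min(Math.max(v, lo), hi);
  return {
    warmth: clamp(p.warmth, min.warmth, max.warmth),
    brevity: clamp(p.brevity, min.brevity, max.brevity),
    formality: clamp(p.formality, min.formality, max.formality),
  };
}
```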
To make this design tractable, we use agent composition. We don't define an agent as a single monolith; instead, we define a core agent with durable parts (voice, brand-aligned tone, skill set, and escalation logic), then layer context-specific overlays on top. Common overlays include regional (e.g. German vs. English), channel (e.g. voice vs. chat), or industry (e.g. retail vs. healthcare). Each overlay can override specific parameters of the core (the formality default, the pacing, the acknowledgment patterns) without breaking the agent. The result is a single source of truth for who the agent is, with deliberate room for who it needs to become depending on where it shows up. For most customers, that alone is already a significant step beyond "one voice for everyone."
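A minimal sketch of that composition idea, assuming overlays are plain partial configurations merged over a core definition. The field names are illustrative, not Parloa's schema:

```typescript
// Sketch of agent composition: a core definition plus partial overlays
// merged in order. Later overlays win, but only for the fields they
// explicitly set; everything else falls through to the core.
interface AgentConfig {
  voice: string;
  formalityDefault: number;   // 0 = casual, 1 = formal
  pacing: "brisk" | "measured";
  acknowledgmentPattern: string;
  escalationSkill: string;
}

type Overlay = Partial<AgentConfig>;

function composeAgent(core: AgentConfig, ...overlays: Overlay[]): AgentConfig {
  return overlays.reduce<AgentConfig>((acc, o) => ({ ...acc, ...o }), core);
}

// Example: a German-region, voice-channel variant of one core agent.
const core: AgentConfig = {
  voice: "neutral-warm",
  formalityDefault: 0.5,
  pacing: "measured",
  acknowledgmentPattern: "brief-explicit",
  escalationSkill: "handoff-to-human",
};
const germanOverlay: Overlay = { formalityDefault: 0.8 }; // "Sie" register
const voiceChannelOverlay: Overlay = { pacing: "brisk" };

const agent = composeAgent(core, germanOverlay, voiceChannelOverlay);
```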
A caveat
Full personality-adaptive behavior is not the right answer for every use case, especially since the data requirements and cost of adaptive behavior are far from trivial. Inferring a caller's communication style reliably takes data: IBM's Personality Insights required a minimum of 100 words for any analysis at all, with accuracy improving substantially with longer text [6]. A 90-second "what's my order status" call often produces only a sentence or two from the customer side, far below what reliable personality inference requires.
For longer or more emotional conversations like onboarding, claims handling, and support escalations, however, adaptation pays off.
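In practice this caveat becomes a simple gate: adapt only when the conversation has produced enough signal. A sketch, with the 100-word floor taken from [6] and the long-form use-case list as my own illustrative assumption:

```typescript
// Gate adaptive styling on available signal. The 100-word floor mirrors
// Watson PI's documented minimum [6]; the use cases listed are
// illustrative examples, not an exhaustive product list.
const ADAPTIVE_USE_CASES = new Set(["onboarding", "claims", "escalation"]);

function shouldAdapt(useCase: string, customerWordCount: number): boolean {
  return ADAPTIVE_USE_CASES.has(useCase) && customerWordCount >= 100;
}
```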
What’s Next
The research and production work have shown me that defining different communication styles for different agent segments, channels, and use cases is tractable today. What I'm working toward now is bringing personality adaptation to the individual user level at enterprise scale. The architecture is harder than what I built for Raffi: modern stacks have more moving parts, brand-voice constraints make it impossible to hand the LLM full stylistic discretion, and the reliability bar is higher in production than it ever was in academic prototypes. But the principle is the same one I started studying years ago. Read the person. Respond in kind.
Years of research told us it matters. Years of production work keep proving it: customers remember the agents that felt built for them — an agent that becomes a slightly different version of itself for every person it speaks to.
References
[1] Niederhoffer, K. G., & Pennebaker, J. W. (2002). Linguistic Style Matching in Social Interaction. Journal of Language and Social Psychology, 21(4), 337–360.
[2] Ahmad, R. (2023). The Value and Design of Personality-Adaptive Conversational Agents in Service Interactions (Doctoral dissertation). Technische Universität Braunschweig.
[3] Mairesse, F., Walker, M. A., Mehl, M. R., & Moore, R. K. (2007). Using Linguistic Cues for the Automatic Recognition of Personality in Conversation and Text. Journal of Artificial Intelligence Research, 30, 457–500.
[4] Ahmad, R., Siemon, D., & Robra-Bissantz, S. (2020). ExtraBot vs IntroBot: The Influence of Linguistic Cues on Communication Satisfaction. Proceedings of the 26th Americas Conference on Information Systems (AMCIS).
[5] Ahmad, R., Siemon, D., Fernau, D., & Robra-Bissantz, S. (2020). Introducing "Raffi": A Personality Adaptive Conversational Agent. Proceedings of the Pacific Asia Conference on Information Systems (PACIS).
[6] IBM Cloud Docs (2017). Personality Models. IBM. (Service retired in 2021.)
