
Multi-agent architecture: A look inside Parloa’s Subtask Agents

21 May 2026

Author(s)

Robiert Luque Pérez

Staff Product Manager


Parloa recently launched Subtask Agents, a new multi-agent orchestration model within the Agent Management Platform (AMP) designed to bring modularity, precision, and speed to complex enterprise AI workflows. Within a customer interaction, a Subtask Agent owns one user goal: authentication, order status, cancellation, FAQ, escalation to a human, and so on. To serve that single goal, each agent carries its own Activation Instructions, Resolution Instructions, and assigned skills.

Typically, multi-agent work such as this relies on a supervisor LLM at the routing layer. This LLM reads the conversation, considers the available sub-agents, and decides which one should run next. This style of routing is the easiest pattern to start with, and the path of least resistance in most agent frameworks. For voice AI agents, however, latency and security demand closer attention.

Below are the choices we made to build a strong Subtask Agent architecture for Parloa AI Agents:

Why voice AI requires a different approach

Building the architecture for Subtask Agents first required an assessment of what makes voice AI different. Three key properties defined the direction of our architecture:

Latency budgets are unforgiving. Round trip, a supervisor LLM adds hundreds of milliseconds before the next agent even starts thinking. In voice, these milliseconds are the difference between sounding natural and sounding like the line went dead.
Recovery is expensive. If a chat agent routes wrongly, the user types again. If a voice agent routes wrongly into a sub-agent that’s missing the expected state, the call ends, and the customer is lost.
Compliance is structural. A chat agent that leaks personally identifiable information (PII) through the wrong tool can usually be patched by tightening a prompt. A voice agent that lets an unverified caller into a billing flow is a regulatory nightmare.

With these properties in mind, we opted for a two-layer routing approach to our architecture.

Two-layer routing for increased governance

The first layer of routing is an eligibility gate: a set of restrictions that determines which Subtask Agents are available at any point in the conversation. The gate protects security and privacy in the most sensitive situations.

Parloa’s platform evaluates every Subtask Agent's Activation Restrictions against the current values of the conversation's Storage Variables. Restrictions are boolean conditions over named variables: is_authenticated equals "true", customer_tier in ["gold","platinum"]. They are code. They are not interpreted by a language model. A Subtask Agent whose restrictions do not pass the eligibility gate is excluded entirely from the next layer of routing.
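A minimal sketch of what such an evaluation could look like, in Python. The `Restriction` class, the two operator names, and the rule that every restriction must pass are assumptions modeled on the examples above, not Parloa's actual implementation.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence, Union


@dataclass(frozen=True)
class Restriction:
    """Hypothetical boolean condition over one named Storage Variable."""
    variable: str
    operator: str                        # "equals" or "in" (assumed operator set)
    expected: Union[str, Sequence[str]]

    def passes(self, storage: Mapping[str, str]) -> bool:
        value = storage.get(self.variable)   # None if the variable is unset
        if self.operator == "equals":
            return value == self.expected
        if self.operator == "in":
            return value is not None and value in self.expected
        raise ValueError(f"unknown operator: {self.operator}")


def is_eligible(restrictions: Sequence[Restriction], storage: Mapping[str, str]) -> bool:
    """Assumption: an agent is eligible only if all of its restrictions pass."""
    return all(r.passes(storage) for r in restrictions)


storage = {"is_authenticated": "true", "customer_tier": "silver"}
billing = [Restriction("is_authenticated", "equals", "true")]
vip = [Restriction("customer_tier", "in", ("gold", "platinum"))]

print(is_eligible(billing, storage))  # True
print(is_eligible(vip, storage))      # False
```

Nothing in this path calls a model: the result depends only on the configuration and the current variable values.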

The second routing layer is selection. Among the Subtask Agents that pass the first gate, the currently active agent's LLM chooses which one to hand off to, guided by each candidate's Activation Instructions: a short natural-language description of when that agent should run. Here, the LLM's role is bounded. It picks among already-eligible agents. It cannot route to one that has been gated out, and it does not need to consider any restriction logic itself.
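One way to picture the bounded selection step is the Python sketch below. The function name, the prompt wording, and the `choose` callable (a stand-in for the active agent's LLM) are all hypothetical. The explicit check on the model's answer is an assumption about how the bound could be enforced, not a description of the platform.

```python
from typing import Callable, Dict


def select_next_agent(
    eligible: Dict[str, str],        # agent name -> Activation Instructions
    utterance: str,
    choose: Callable[[str], str],    # stand-in for the active agent's LLM
) -> str:
    """Offer the model only eligible agents and reject any other answer."""
    menu = "\n".join(f"- {name}: {text}" for name, text in sorted(eligible.items()))
    prompt = f"Caller said: {utterance}\nPick one agent by name:\n{menu}"
    choice = choose(prompt).strip()
    if choice not in eligible:
        raise ValueError(f"{choice!r} is not in the eligible pool")
    return choice


# Billing has already been gated out, so it never appears in the pool.
pool = {"auth": "Activate when the caller needs to verify their identity."}
picked = select_next_agent(pool, "I want to check my invoice", lambda prompt: "auth")
print(picked)  # auth
```

Gated-out agents are absent from the prompt, and a name outside the pool is rejected, so the model's choice cannot widen the set.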

For example:

# Auth Subtask Agent
activation_restrictions:
  - is_authenticated equals "false"
activation_instructions: >
  Activate when the caller needs to verify their identity
  before accessing account information.

# Billing Subtask Agent
activation_restrictions:
  - is_authenticated equals "true"
activation_instructions: >
  Activate when the caller wants to discuss invoices,
  payments, or charges on a verified account.

In this example, when the caller first connects, is_authenticated is "false". The Auth Subtask Agent passes its restriction, but the Billing Subtask Agent does not, so Billing is not in the LLM's selection pool. When the Auth Subtask Agent completes its work and writes is_authenticated = "true" to Storage, the next turn flips the eligibility set. Billing becomes available. Auth is no longer eligible. The transition is not a request from the LLM but a consequence of state.
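The flip described above can be traced in a few lines of Python. The `AGENTS` dictionary and the `eligible_agents` helper are illustrative names only, and the sketch handles equality restrictions and nothing else.

```python
# Each agent maps to the variable values its restrictions require (equality only).
AGENTS = {
    "auth": {"is_authenticated": "false"},
    "billing": {"is_authenticated": "true"},
}


def eligible_agents(storage):
    """Return the agents whose restrictions all hold for the current Storage."""
    return sorted(
        name
        for name, required in AGENTS.items()
        if all(storage.get(var) == value for var, value in required.items())
    )


storage = {"is_authenticated": "false"}
before = eligible_agents(storage)

storage["is_authenticated"] = "true"   # the Auth agent's work writes to Storage
after = eligible_agents(storage)

print(before)  # ['auth']
print(after)   # ['billing']
```

No routing request appears anywhere in the sketch: the eligibility set changes only because a variable changed.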

When we began this two-layer approach, our engineers often asked two questions:

Why not let the LLM evaluate the authentication state itself?

LLMs are subject to drift. After a model update, a new system message, or an unusual conversation pattern, a model that previously gated reliably may "help" the user by skipping a step. Activation Restrictions do not allow steps to be skipped. There is no path to the Billing Agent that does not go through is_authenticated equals "true". That property is statically checkable against the configuration.

Does this make the system rigid?

It makes it predictable.

The LLM stays in the loop for the part of the problem it is genuinely good at solving: interpreting messy human language and matching it to one of several appropriate agents. It is taken out of the loop for the part it is bad at: enforcing invariants. That division of labor has held up across deployments.

The three primitives

Subtask Agents work because three primitives compose cleanly. None of them is novel on its own. The architectural choice was to keep them separate when most multi-agent frameworks blend them.

One goal per Subtask Agent

When Resolution Instructions stretch to cover more than two or three scenarios, treat it as a signal to split the agent. Overloaded agents produce inconsistent routing, slower per-turn LLM decisions, and harder debugging. Single-goal Subtask Agents, on the other hand, are easier to trace and debug: the state is named, transitions are logged, and there is no single supervisor prompt to interrogate.

The cost of this discipline is paid upfront, but bypassing the planning step is the single most common cause of late-stage rebuilds we see across customer deployments. In practice, the discipline means:

Builders have to plan their agent on paper before opening the builder.
Routing decisions need a Storage Variable that already exists.
Activation Restrictions need to reference variables that have been defined.
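The last two points lend themselves to an automated check. The Python sketch below assumes a simplified configuration shape (a list of defined variable names, plus the variables each agent's restrictions reference). The function name and the shape are hypothetical.

```python
def undefined_references(defined_variables, agents):
    """Return {agent: [missing names]} for restrictions that reference
    a Storage Variable the configuration never defines."""
    defined = set(defined_variables)
    problems = {}
    for agent, referenced in agents.items():
        missing = sorted(set(referenced) - defined)
        if missing:
            problems[agent] = missing
    return problems


defined = ["is_authenticated", "language"]
agents = {
    "auth": ["is_authenticated"],
    "vip_support": ["is_authenticated", "customer_tier"],
}

problems = undefined_references(defined, agents)
print(problems)  # {'vip_support': ['customer_tier']}
```

Running a check like this before building surfaces the missing variable at planning time rather than during a rebuild.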

Storage Variables

Storage Variables are the conversation's state. The builder defines them as part of the agent configuration, and every Subtask Agent within that conversation reads and writes the same set of variables. They are the medium through which one Subtask Agent's effect becomes another's eligibility, and they are the only state on which routing logic depends:

Hooks update Storage Variables at conversation boundaries.
Skills update them during turns.
Activation Restrictions read them on the way into each turn.

Storage Variables are the property that makes routing reasonable to audit.
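To illustrate why a shared, named state is auditable, here is a hedged Python sketch of a store that records every write. The `Storage` class, its `history` field, and the rejection of undeclared variables are assumptions made for illustration, not the platform's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Storage:
    """Hypothetical conversation state with an append-only write history."""
    values: Dict[str, str]
    # Each entry: (writer, variable, old value, new value)
    history: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def read(self, name: str) -> str:
        return self.values[name]

    def write(self, name: str, value: str, writer: str) -> None:
        if name not in self.values:
            # Assumption: only variables declared in the configuration exist.
            raise KeyError(f"undeclared Storage Variable: {name}")
        self.history.append((writer, name, self.values[name], value))
        self.values[name] = value


store = Storage({"is_authenticated": "false"})
store.write("is_authenticated", "true", writer="skill:verify_identity")

print(store.read("is_authenticated"))  # true
print(store.history)
```

With a record like this, any surprising routing decision can be traced back to the write that produced the variable's value.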

Hooks

Hooks are the deterministic execution that happens outside the conversation turn loop. They run platform code on lifecycle events without involving a language model.

Two are particularly load-bearing for the architecture:

The Conversation Start hook runs once before the agent's first turn. It populates routing-critical Storage Variables: authentication state from the telephony layer, customer tier from the CRM, and language preference from the channel. As a result, by the time the first Subtask Agent activates, everything it needs to make a routing decision is already in Storage, and the Entry Point Subtask Agent does not need to make an API call for routing.

The Conversation End hook runs after the conversation closes. This is where ticket creation, transcript persistence, and downstream system updates happen. Failures here do not affect the customer call.

Hooks are sequential, deterministic, and bounded by timeout. For this reason, they should not live where the LLM lives.
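As a rough illustration of "sequential, deterministic, and bounded by timeout", the Python sketch below runs hypothetical Conversation Start steps in order against a time budget. The step names, the lambda stand-ins for telephony, CRM, and channel lookups, and the function itself are assumptions. The budget is checked between steps only, so a step that is already running is not interrupted.

```python
import time


def run_conversation_start_hook(steps, storage, timeout_s):
    """Run each step in order and merge its result into Storage.
    Raises TimeoutError if the budget is spent before a step starts."""
    deadline = time.monotonic() + timeout_s
    for name, step in steps:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"hook timed out before step {name!r}")
        storage.update(step())
    return storage


# Stand-ins for real lookups; each returns the variables it populates.
steps = [
    ("telephony", lambda: {"is_authenticated": "false"}),
    ("crm", lambda: {"customer_tier": "gold"}),
    ("channel", lambda: {"language": "en"}),
]

storage = run_conversation_start_hook(steps, {}, timeout_s=1.0)
print(storage)
```

No model is involved at any point, which is why this work can sit outside the turn loop.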

Fast, compliant, and efficient

Returning to the three properties that set voice AI apart, here is how the three primitives address each one:

Latency that does not pay for routing

In Subtask Agents, eligibility evaluation is platform code at sub-millisecond latency, and the selection step is fused into the next agent's normal turn. There is no dedicated routing LLM call, and thus no additional handoff cost.

For a customer-service voice call with multiple handoffs, the saved time compounds. In live testing on a production deployment, one customer measured a perceived 24% average reduction in time-to-call-resolution under this architecture, with the fastest conversations resolving up to 47% sooner than the baseline.

Every handoff has a state-level cause

When a Subtask Agent activates, the platform logs which Activation Restrictions evaluated to true and which Storage Variable values produced that result. The debugging loop becomes "look at the variables and find the one that surprised you," compressing the cycle in a way that is hard to overstate to anyone who has spent any time inside a supervisor-graph debugger.
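A log entry of this kind might look like the Python sketch below. The record layout and the `activation_record` helper are hypothetical, and only equality restrictions are modeled.

```python
import json


def activation_record(agent, restrictions, storage):
    """Build a log entry from (variable, expected) equality restrictions."""
    checks = [
        {
            "variable": var,
            "expected": expected,
            "actual": storage.get(var),
            "passed": storage.get(var) == expected,
        }
        for var, expected in restrictions
    ]
    return {
        "agent": agent,
        "eligible": all(c["passed"] for c in checks),
        "checks": checks,
    }


record = activation_record(
    "billing",
    [("is_authenticated", "true")],
    {"is_authenticated": "false"},
)
print(json.dumps(record, indent=2))
```

Reading the entry answers the debugging question directly: the expected and actual values sit side by side for each restriction.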

Compliance gating that survives prompt drift

We have shipped agents to financial services pilots where the regulatory bar required proof that no path exists from the unauthenticated entry state to the account-data Subtask Agent. The proof is a static reachability check on the Activation Restriction graph: every Subtask Agent that touches account data has is_authenticated equals "true" (or a stronger condition) in its restrictions. The check runs in minutes. Because the property does not depend on the model, we have not had to rerun it after model upgrades.
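A simplified version of such a check can be sketched in Python. It verifies only the per-agent condition stated above and does not attempt a full graph traversal or recognize "stronger" conditions. The `touches_account_data` flag and the configuration shape are assumptions.

```python
def ungated_agents(agents):
    """Return agents that touch account data without requiring
    is_authenticated equals "true" in their restrictions."""
    required = ("is_authenticated", "true")
    return sorted(
        name
        for name, cfg in agents.items()
        if cfg["touches_account_data"] and required not in cfg["restrictions"]
    )


agents = {
    "auth": {
        "touches_account_data": False,
        "restrictions": [("is_authenticated", "false")],
    },
    "billing": {
        "touches_account_data": True,
        "restrictions": [("is_authenticated", "true")],
    },
    "refunds": {
        "touches_account_data": True,
        "restrictions": [],          # misconfigured: no authentication gate
    },
}

violations = ungated_agents(agents)
print(violations)  # ['refunds']
```

The check reads configuration only, so its result is unaffected by which model version is deployed.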

What we learned along the way

A few lessons we would offer to anyone building a multi-agent voice system, regardless of platform.

Plan on paper before opening the builder. The most common cause of rebuilds we see is starting in the platform, discovering that a routing decision needs a Storage Variable that was never defined, and then realizing several Activation Restrictions reference variables that do not exist. Thirty minutes of paper saves several hours of rework.

Do not fold conversation logic into skill logic. Skill Instructions tell the agent how to use tools. Resolution Instructions tell the agent how to talk to the caller. Mixing them produces agents that argue with their tools and tools that try to do conversation design.

Keep the platform's system prompt opaque. We deliberately do not expose the underlying system prompt template to builders. Visible system prompts get edited, edits drift, and drift breaks production. Instead, builders shape behavior at the Subtask Agent level, where the platform owns the contract between the LLM and the conversation.

Treat the LLM as a bounded participant. Every place we found ourselves wishing the LLM would "just figure it out" turned out to be a place where deterministic state would have been faster, safer, and cheaper to debug.

Why two-layer routing is essential to multi-agent architecture

The architectural choice for Subtask Agents stemmed from a single observation: LLMs are good at interpreting ambiguous human input and matching it to structured options but unreliable at enforcing invariants. The two-layer model (deterministic eligibility gating followed by LLM-driven selection among already-eligible agents) keeps each layer inside its competence boundary. The result is routing decisions that are auditable without reading a prompt, compliance properties that are statically checkable, and unexpected transitions that point to a variable value rather than a model decision. The three primitives that enable this (single-goal agents, shared Storage Variables, and lifecycle Hooks) are not novel in isolation, but together they create the conditions necessary for a system to work reliably in enterprise production. More upfront discipline is required, but for any multi-agent system where latency, compliance, or recovery costs are non-trivial, the pre-work is well rewarded.