How Does Telephony Work? From PSTN to Cloud and AI Voice Routing

Oliver Cook
VP Global BPO Partnerships
Parloa
May 29, 2026 | 5 mins

Your AI agent pilot scored well in testing. Leadership greenlit production deployment across three contact center regions. Then the voice team ran the numbers: your PSTN trunks cap out at 23 concurrent calls per circuit; every additional line requires a technician visit; and the narrowband audio feeding your speech-to-text engine captures only a fraction of what customers actually say.

The AI is ready. The telephony layer underneath it is the constraint that determines what happens next.

How does telephony work in an enterprise contact center, and why does it determine whether AI agents perform or stall? The answer starts with a switching architecture that was never designed to carry AI inference in the call path, and it ends with an infrastructure decision that shapes whether AI voice agents sound conversational or robotic.

The original architecture: The Public Switched Telephone Network (PSTN)

The Public Switched Telephone Network (PSTN) is the global aggregate of circuit-switched telephone networks: telephone lines, fiber-optic cables, microwave links, satellites, and undersea cables, all interconnected by switching centers. The service layer running over this infrastructure is Plain Old Telephone Service (POTS).

How a PSTN call is established and carried:

  • A dedicated path is reserved: When a call is placed, circuit switching creates a physical connection between the two endpoints for the entire duration of the conversation. This guarantees consistent bandwidth and low latency, but the circuit consumes fixed resources regardless of whether anyone is speaking.

  • Channels are multiplexed over shared lines: Modern PSTN circuit switching uses Time Division Multiplexing (TDM), which assigns each active call a recurring time slot on a shared transmission facility. Each voice channel occupies 64 kbps of bandwidth under the G.711 encoding standard (8,000 samples per second at 8 bits per sample).

  • Enterprise capacity is provisioned in fixed blocks: PSTN connectivity for contact centers typically arrives via a Primary Rate Interface (PRI) or a Session Initiation Protocol (SIP) trunk. In North America, a single PRI delivers 23 voice channels plus one signaling channel. Sizing PRI capacity for concurrent call volume is simple arithmetic: divide the required peak capacity by 23 and round up to the next whole circuit. Adding PRI capacity means ordering additional physical circuits and scheduling technician visits.

  • Call control runs on a separate signaling network: Signaling System 7 (SS7), first issued as an international standard by the CCITT (now ITU-T) in 1980, governs how network elements establish, manage, and release calls. SS7 uses out-of-band signaling: voice and control signals travel on separate channels. SS7 carries the signaling needed to set up the circuit; once the circuit is established, voice travels over the reserved TDM path, and SS7 signals the release when the call ends.
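
The channel and circuit arithmetic above can be written out as a short Python sketch. The function and constant names are illustrative choices, not part of any standard or vendor tool; the figures (8,000 samples per second, 8 bits per sample, 23 voice channels per North American PRI) are the ones described in the list.

```python
import math

G711_SAMPLE_RATE_HZ = 8000   # G.711 samples the voice signal 8,000 times per second
G711_BITS_PER_SAMPLE = 8     # each sample is encoded as 8 bits
PRI_VOICE_CHANNELS = 23      # North American PRI: 23 voice channels + 1 signaling channel


def channel_bitrate_bps() -> int:
    """Bit rate of one TDM voice channel under G.711."""
    return G711_SAMPLE_RATE_HZ * G711_BITS_PER_SAMPLE


def pri_circuits_needed(peak_concurrent_calls: int) -> int:
    """Whole PRI circuits required to carry the given peak concurrent calls."""
    if peak_concurrent_calls < 0:
        raise ValueError("peak_concurrent_calls must be non-negative")
    # Circuits come in fixed blocks of 23 channels, so always round up.
    return math.ceil(peak_concurrent_calls / PRI_VOICE_CHANNELS)


if __name__ == "__main__":
    print(channel_bitrate_bps())      # 64000
    print(pri_circuits_needed(100))   # 5
```

The rounding step is where fixed-block provisioning shows up: 100 concurrent calls need five circuits, which means paying for 115 channels.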

The voice call path today: cloud telephony and SIP

Cloud telephony removes the fixed-capacity constraints of dedicated circuits by transporting voice across packet-switched networks. Voice over IP (VoIP) breaks audio into discrete data packets routed across an IP network, decoupling capacity from physical infrastructure.
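
To make the packetization concrete, here is a rough Python sketch of the per-call bandwidth of a G.711 stream once it is carried as packets. It assumes a 20 ms packetization interval (a common default, not something this article specifies), IPv4, and standard RTP, UDP, and IP header sizes with no extensions. Layer 2 overhead is left out, and the function names are illustrative.

```python
RTP_HEADER_BYTES = 12    # fixed RTP header, no extensions
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20   # IPv4 header, no options
HEADER_BYTES = RTP_HEADER_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES


def payload_bytes(codec_bitrate_bps: int, ptime_ms: int) -> int:
    """Audio bytes carried in one packet covering ptime_ms of speech."""
    return codec_bitrate_bps * ptime_ms // 8000


def ip_bandwidth_bps(codec_bitrate_bps: int, ptime_ms: int) -> int:
    """One-way IP-layer bandwidth of a single voice stream."""
    packet_bits = (payload_bytes(codec_bitrate_bps, ptime_ms) + HEADER_BYTES) * 8
    packets_per_second = 1000 // ptime_ms
    return packet_bits * packets_per_second


if __name__ == "__main__":
    print(payload_bytes(64000, 20))      # 160 bytes of audio per packet
    print(ip_bandwidth_bps(64000, 20))   # 80000 bps including headers
```

Under these assumptions a 64 kbps voice channel costs about 80 kbps on the wire in each direction, but only while packets are actually being sent.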

How a modern cloud telephony call is established and carried:

  • SIP governs session setup: SIP is a signaling protocol for creating, modifying, and terminating multimedia sessions. It handles call setup and teardown. Voice audio travels separately via RTP (Real-Time Transport Protocol).

  • SIP trunks replace physical circuits: A SIP trunk is a virtual phone line connecting an enterprise's Private Branch Exchange (PBX) or cloud contact center platform to the PSTN. New lines are added via software configuration, and bandwidth is consumed only when packets are actively transmitted.

  • A Session Border Controller (SBC) sits at the network boundary: The SBC terminates the SIP trunk, handles SIP normalization to address interoperability differences between carrier and enterprise implementations, manages codec transcoding, and enforces security policy.

  • At the carrier edge, TDM converts to SIP: When a caller dials from a traditional phone, the call enters the PSTN as a circuit-switched connection. The carrier's network edge converts the TDM call into SIP signaling and RTP media, then sends a SIP INVITE to the enterprise's SBC.

  • The enterprise call control layer handles routing: From the SBC, the call passes to an on-premises IP PBX or cloud contact center platform, where routing logic applies: IVR treatment, ACD queue assignment, or an AI agent.
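
The split between signaling and media is easier to see in an actual message. The Python sketch below builds a minimal SIP INVITE of the kind a carrier edge might send to an enterprise SBC, with an SDP body offering G.711 (PCMU) audio over RTP. It is an illustration only: the host names, phone numbers, and the fixed branch, tag, and Call-ID values are placeholders (a real SIP stack generates unique values per call), and the helper names are hypothetical.

```python
def build_invite(caller: str, callee: str, sbc_host: str,
                 carrier_ip: str, rtp_port: int) -> str:
    """Return a minimal SIP INVITE with an SDP offer for G.711 (PCMU) audio."""
    # SDP body: describes where and how the RTP media should flow.
    sdp_lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {carrier_ip}",
        "s=-",
        f"c=IN IP4 {carrier_ip}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",   # payload type 0 is PCMU
        "a=rtpmap:0 PCMU/8000",
    ]
    body = "\r\n".join(sdp_lines) + "\r\n"
    # SIP headers: signaling only, no audio travels in this message.
    headers = [
        f"INVITE sip:{callee}@{sbc_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {carrier_ip};branch=z9hG4bK-placeholder",
        "Max-Forwards: 70",
        f"From: <sip:{caller}@{carrier_ip}>;tag=placeholder-tag",
        f"To: <sip:{callee}@{sbc_host}>",
        f"Call-ID: placeholder-call-id@{carrier_ip}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}@{carrier_ip}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + body


def media_port(message: str) -> int:
    """Extract the RTP port offered in the SDP body of a SIP message."""
    _, body = message.split("\r\n\r\n", 1)
    for line in body.split("\r\n"):
        if line.startswith("m=audio "):
            return int(line.split()[1])
    raise ValueError("no audio media line in SDP body")


if __name__ == "__main__":
    invite = build_invite("+15555550100", "+15555550199",
                          "sbc.enterprise.example", "192.0.2.10", 40000)
    print(invite)
```

The INVITE itself carries no audio. It only tells the SBC where the RTP stream will arrive (here, port 40000) and which codec is on offer, which is the information the SBC needs to accept, transcode, or reject the session.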

The emerging standard for intelligent call handling: AI voice routing

AI voice routing moves call handling from fixed menu trees to semantic understanding. Traditional Interactive Voice Response (IVR) systems rely on Dual-Tone Multi-Frequency (DTMF) touch-tone input or limited directed-dialog speech recognition. AI voice routing, by contrast, interprets full utterances and conversational context to determine intent and drive resolution.

AI voice routing calls are processed in what's commonly called the STT-LLM-TTS pipeline:

  • Inbound speech is transcribed: A speech-to-text (STT) engine converts the caller's audio to text in near real time.

  • A large language model (LLM) interprets intent: The LLM processes the transcribed text in the context of the full conversation and determines the appropriate response or action.

  • A text-to-speech (TTS) engine renders the response as speech: TTS synthesis converts the LLM's output back to audio for the caller. Optimized pipelines aim to keep end-to-end latency per conversational turn under one second, the threshold generally considered acceptable for interactive systems.

To keep the conversation moving, the pipeline uses sentence-level streaming: each completed sentence is sent to TTS immediately while the LLM continues generating the next one. All three modules run concurrently, so audio output begins as soon as the first sentence is ready. This concurrent, low-latency architecture is the primary mechanism enabling sub-second AI voice responses in production.
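
Sentence-level streaming can be sketched with two concurrent tasks and a queue. The Python example below is a simplified illustration, not Parloa's implementation: `fake_llm_tokens` stands in for a streaming LLM, the TTS worker only records what it would synthesize, the STT stage is omitted, and the sentence splitter is deliberately naive (it would break on abbreviations such as "Dr."). All names are hypothetical.

```python
import asyncio
import re

SENTENCE_END = re.compile(r"[.!?]$")


async def fake_llm_tokens(reply: str):
    """Stand-in for a streaming LLM: yields the reply one word at a time."""
    for word in reply.split():
        await asyncio.sleep(0)  # hand control back, as a network stream would
        yield word


async def llm_to_sentences(reply: str, sentence_queue: asyncio.Queue) -> None:
    """Forward each sentence to TTS as soon as it is complete."""
    buffer = []
    async for token in fake_llm_tokens(reply):
        buffer.append(token)
        if SENTENCE_END.search(token):
            await sentence_queue.put(" ".join(buffer))
            buffer.clear()
    if buffer:  # flush a trailing fragment without end punctuation
        await sentence_queue.put(" ".join(buffer))
    await sentence_queue.put(None)  # sentinel: no more sentences


async def tts_worker(sentence_queue: asyncio.Queue, spoken: list) -> None:
    """Consume sentences while the LLM task is still generating."""
    while True:
        sentence = await sentence_queue.get()
        if sentence is None:
            break
        await asyncio.sleep(0)  # placeholder for actual speech synthesis
        spoken.append(sentence)


async def run_turn(reply: str) -> list:
    sentence_queue = asyncio.Queue()
    spoken = []
    await asyncio.gather(
        llm_to_sentences(reply, sentence_queue),
        tts_worker(sentence_queue, spoken),
    )
    return spoken


if __name__ == "__main__":
    print(asyncio.run(run_turn("Your order shipped today. It should arrive Friday.")))
```

Because the two tasks run concurrently, the first sentence reaches the TTS worker before the LLM stand-in has finished producing the second one. That overlap is what lets audio start early in a real pipeline.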

Once the pipeline produces a response, the AI agent either resolves the issue within the conversation or routes to a human agent with full conversational context. AI agents in these architectures reason through multi-step processes and pull from multiple systems to resolve complex issues.

Why telephony modernization is an AI decision

The telephony infrastructure underneath your AI layer determines whether voice agents perform or stall. Legacy phone network architecture creates practical barriers in three areas that directly affect AI outcomes.

Old phone networks can't hear customers clearly

Traditional phone lines carry narrowband audio: G.711 samples speech at 8 kHz and passes roughly the 300 to 3,400 Hz band, a fraction of the frequency range that modern automatic speech recognition systems can use. That gap matters most when customers use specialized vocabulary: drug names, financial products, and legal terms. Cloud telephony can carry wideband audio, which translates into better transcription accuracy and fewer misrouted calls.

Legacy infrastructure adds a delay that AI can't absorb

AI voice systems need real-time access to the call audio as it happens. Traditional phone infrastructure wasn't built to provide that, so connecting AI to it requires additional conversion steps that introduce delays. Every one of those steps eats into the tight time budget that separates a voice agent that sounds natural from one that sounds like it's thinking too hard. Cloud-native phone infrastructure eliminates most of that overhead.
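
One way to reason about this is as a per-turn latency budget. The Python sketch below subtracts stage latencies from the one-second target discussed earlier. Every number in it is an illustrative placeholder chosen to show the arithmetic, not a measurement of any real system, and the stage names are hypothetical.

```python
TURN_BUDGET_MS = 1000  # sub-one-second target per conversational turn


def remaining_budget_ms(stage_latencies_ms: dict, budget_ms: int = TURN_BUDGET_MS) -> int:
    """Milliseconds left in the turn budget; negative means the budget is blown."""
    return budget_ms - sum(stage_latencies_ms.values())


# Placeholder values for illustration only (NOT measured data).
cloud_native_path = {
    "stt": 200,
    "llm_first_sentence": 400,
    "tts_first_audio": 200,
    "network": 100,
}

# The same pipeline with extra conversion steps in front of it (also placeholders).
legacy_path = {
    **cloud_native_path,
    "tdm_to_sip_gateway": 60,
    "transcoding": 40,
    "media_forking": 80,
}

if __name__ == "__main__":
    print(remaining_budget_ms(cloud_native_path))  # 100
    print(remaining_budget_ms(legacy_path))        # -80
```

With these made-up figures the AI stages are identical in both paths. The conversion steps alone move the turn from inside the budget to outside it, and no tuning of the STT, LLM, or TTS stages is counted on to win that time back.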

Fixed-capacity networks can't scale with AI

Traditional phone infrastructure requires capacity to be ordered and physically installed in advance. AI agent capacity, by contrast, expands in software. Cloud-based platforms handle higher call volumes on demand, without the lead times and hard limits of physical circuits.

The traditional phone network is being switched off

Germany's transition to all-IP is largely complete. The UK has set a hard switch-off date of January 31, 2027. In the US, retirement is proceeding on a market-by-market basis with no single national deadline. Organizations still relying on legacy phone infrastructure need a migration plan that supports AI deployment before the timeline is forced on them.

Modernize your telephony stack before your AI deployment hits its ceiling

Outdated telephony infrastructure is a common reason AI voice deployments underperform. Every legacy circuit, transcoding step, and protocol conversion adds latency that the STT-LLM-TTS pipeline can't recover.

Parloa's AI Agent Management Platform is built for this constraint. It connects to existing enterprise telephony via SIP, delivers low-latency performance across the full STT-LLM-TTS chain, and integrates with CCaaS platforms like Genesys through custom REST APIs.

Certifications and compliance coverage include ISO 27001:2022, ISO 17442:2020, SOC 2 Type I & II, PCI DSS, HIPAA, GDPR, and DORA. Parloa supports 130+ languages with AI agents tuned to regional nuances, and enterprises can go live in as little as a few weeks.

Book a demo to see how Parloa maps to your telephony environment.

FAQs

What is the difference between PSTN and VoIP?

PSTN uses circuit switching, dedicating a physical path between two endpoints for the duration of the call. VoIP uses packet switching, breaking voice into data packets routed independently across IP networks. PSTN costs traditionally scale with distance and time, while VoIP uses shared IP bandwidth and transmits packets as needed.

What is SIP trunking, and why does it matter for contact centers?

SIP trunking replaces physical PRI circuits with virtual IP-based connections to the PSTN. SIP trunks can be configured through software rather than hardware changes, capacity grows without technician visits, and the architecture supports the centralized call delivery model that cloud contact centers and AI voice agents require.

What is the difference between on-premises and cloud-based contact center infrastructure?

On-premises contact center infrastructure runs on hardware owned and managed by the enterprise, with capacity fixed at the time of installation. Cloud-based platforms host the same functionality on shared infrastructure managed by a vendor, with capacity that scales on demand through software configuration. 

For AI voice deployments, the cloud model matters because it provides the programmatic access to call audio, the elastic capacity, and the higher-quality audio codecs that AI systems require. On-premises systems can be integrated with AI, but each integration layer adds complexity and latency.

How does AI voice routing handle calls it can't resolve?

When an AI voice agent reaches the boundary of what it can handle, whether due to policy constraints, an unusual request, or a customer who explicitly asks for a human, it transfers the call to a live agent. 

In well-designed systems, that handoff includes the full conversation context: what the customer said, what the AI already attempted, and any data retrieved from backend systems during the call. This means the human agent picks up the conversation without the customer having to repeat themselves, which is one of the most measurable quality improvements AI routing delivers over traditional IVR escalation paths.
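
As a rough illustration of what such a handoff might carry, the Python sketch below models the context as a small data structure that can be serialized and attached to the transferred call. The class and field names are hypothetical and do not describe any particular platform's API.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HandoffContext:
    """Context passed to the human agent when the AI transfers a call."""
    call_id: str
    reason: str                                             # why the AI is escalating
    transcript: list = field(default_factory=list)          # what the customer said
    actions_attempted: list = field(default_factory=list)   # what the AI already tried
    backend_data: dict = field(default_factory=dict)        # data retrieved during the call

    def to_json(self) -> str:
        """Serialize the context for delivery alongside the transferred call."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    context = HandoffContext(
        call_id="example-call-001",
        reason="customer asked for a human",
        transcript=["I was charged twice for my last order."],
        actions_attempted=["looked up most recent order"],
        backend_data={"order_status": "delivered"},
    )
    print(context.to_json())
```

Keeping the three kinds of context in separate fields lets the agent desktop show them separately, so the human agent can see at a glance what was said, what was tried, and what was found.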
