Introduction to Active Call

Active Call is a standalone project, split out from its original parent project. It acts as a dedicated User Agent that handles all telephony protocols and media processing, and it exposes an easy-to-use WebSocket API for external control.

Decoupled Architecture

Active Call features an innovative decoupled architecture that completely separates telephony & media processing from business logic:

Server: Handles SIP, RTP, audio processing, and voice service (ASR/TTS) integration.
Client: Handles LLM integration and controls Active Call over WebSocket.
WebSocket Protocol: Enables real-time interaction through a Command/Event pattern.

This architecture enables developers to:

Focus on business logic: No need to understand low-level details such as audio processing, SIP, RTP, or the rest of the telephony stack.
Language Independence: Build your AI agent using any programming language (Python, Go, Node.js, Rust, etc.) that supports WebSockets.
Tech Stack Freedom: Use any AI framework or LLM (OpenAI, LangChain, etc.) without restriction.
Independent Scaling: Deploy and scale your AI logic and media processing components independently.
Simple Debugging: Isolated domains ensure that AI logic issues don’t interfere with stable media transmission.
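
As a rough illustration of this separation, the sketch below keeps the agent's business logic free of any socket, SIP, or audio code. The message shapes and names used here ("event", "command", "answer", "asr_final", "tts") are assumptions made for the example, not Active Call's actual schema.

```python
from typing import Callable

# Hypothetical message shape: the keys "event", "command", and "text" are
# assumptions for illustration, not Active Call's actual schema.
Message = dict


class GreetingAgent:
    """Business logic only: it never touches sockets, SIP, or audio."""

    def __init__(self, send: Callable[[Message], None]) -> None:
        # Any transport works: a WebSocket wrapper, a queue, or a test stub.
        self._send = send

    def handle(self, event: Message) -> None:
        if event.get("event") == "answer":
            self._send({"command": "tts", "text": "Hello, how can I help?"})
        elif event.get("event") == "asr_final":
            # A real agent would call an LLM here; this one echoes the transcript.
            self._send({"command": "tts", "text": f"You said: {event['text']}"})


# An in-memory list stands in for the WebSocket connection.
sent: list[Message] = []
agent = GreetingAgent(send=sent.append)
agent.handle({"event": "answer"})
agent.handle({"event": "asr_final", "text": "book a table"})
```

Because the agent depends only on a `send` callable, the same class can be exercised in a unit test with a list, then wired to a real WebSocket connection in production.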

Protocol Support

Active Call supports industry-standard communication protocols:

SIP/RTP: Full compatibility with standard SIP trunking and VoIP hardware.
WebRTC: Direct voice interaction for browsers and mobile applications.
WebSocket: Native audio stream transmission for low-latency integrations.

Command/Event Pattern

Active Call uses an intuitive Command/Event pattern for its WebSocket control interface:

Command: Clients send instructions to Active Call to control behaviors (e.g., dial, play, record, transfer).
Event: Active Call pushes real-time status updates and processing results (e.g., ASR results, call status changes).
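
A minimal sketch of how a client might model this pattern, assuming commands and events travel as JSON text frames. The envelope keys ("command", "event") and the event name "asr_final" are hypothetical; consult the actual API reference for the real message format.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Event:
    name: str
    data: dict[str, Any] = field(default_factory=dict)


def encode_command(name: str, **params: Any) -> str:
    """Serialize a command as a JSON text frame (hypothetical envelope)."""
    return json.dumps({"command": name, **params})


def decode_event(raw: str) -> Event:
    """Parse a JSON text frame pushed by the server into an Event."""
    payload = json.loads(raw)
    name = payload.pop("event")
    return Event(name=name, data=payload)


class Dispatcher:
    """Routes each decoded event to the handler registered for its name."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[Event], None]] = {}

    def on(self, name: str):
        def register(fn: Callable[[Event], None]):
            self._handlers[name] = fn
            return fn
        return register

    def dispatch(self, raw: str) -> bool:
        event = decode_event(raw)
        handler = self._handlers.get(event.name)
        if handler is None:
            return False  # unknown events are ignored rather than treated as fatal
        handler(event)
        return True


dispatcher = Dispatcher()
transcripts: list[str] = []


@dispatcher.on("asr_final")
def on_transcript(event: Event) -> None:
    transcripts.append(event.data["text"])


frame = encode_command("play", url="https://example.com/hello.wav")
handled = dispatcher.dispatch('{"event": "asr_final", "text": "hello"}')
ignored = dispatcher.dispatch('{"event": "something_else"}')
```

Keeping encoding and dispatch separate from the socket itself means the same functions work with whichever WebSocket library the client chooses.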

Plugin-based Architecture

Active Call features a plugin-based architecture that supports multiple mainstream service providers, giving you the flexibility to:

Freely switch providers: Select service providers based on cost, performance, and feature requirements.
Customize plugins: Implement your own ASR/TTS plugins to integrate self-deployed services.
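
The idea of swappable providers can be sketched as a small interface plus a registry. Everything below (the `TtsProvider` contract, the registry functions, and both stand-in providers) is hypothetical and only illustrates the concept; it is not Active Call's plugin API.

```python
from typing import Protocol


class TtsProvider(Protocol):
    """Hypothetical plugin contract: text in, bytes out."""

    def synthesize(self, text: str) -> bytes: ...


class SilenceTts:
    """Stand-in provider: 20 ms of silence per character (8 kHz, 16-bit mono)."""

    def synthesize(self, text: str) -> bytes:
        samples_per_char = 160  # 20 ms at 8000 Hz
        return b"\x00\x00" * samples_per_char * len(text)


class DebugTts:
    """Second stand-in provider, so that switching providers is observable."""

    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")


_registry: dict[str, TtsProvider] = {}


def register(name: str, provider: TtsProvider) -> None:
    _registry[name] = provider


def get_provider(name: str) -> TtsProvider:
    try:
        return _registry[name]
    except KeyError:
        raise ValueError(f"unknown TTS provider: {name!r}") from None


register("silence", SilenceTts())
register("debug", DebugTts())

# Switching providers becomes a configuration change, not a code change.
config = {"tts": "silence"}
audio = get_provider(config["tts"]).synthesize("hi")
```

A self-deployed service would be integrated the same way: implement the contract once, register it under a name, and select it in configuration.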

Audio Processing Capabilities

Active Call includes a complete audio processing pipeline, delivering enterprise-level voice quality:

Voice Activity Detection: Detects when a caller starts and stops speaking and notifies clients.
Intelligent Noise Reduction: Removes background noise in real-time, improving ASR recognition accuracy.
Gain Control: Automatically adjusts volume to ensure stable and clear speech.
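
To give a feel for what two of these stages do, here is a deliberately simple sketch of energy-based voice activity detection and RMS-based gain control on 16-bit mono PCM frames. This is a textbook simplification with assumed thresholds, not the algorithm Active Call uses; production pipelines typically rely on more robust, often model-based, techniques, and noise reduction is not covered here.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a little-endian 16-bit mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Energy-based VAD: a frame counts as speech if its RMS reaches the threshold."""
    return rms(frame) >= threshold


def apply_gain(frame: bytes, target_rms: float = 3000.0, max_gain: float = 8.0) -> bytes:
    """Scale a frame toward target_rms, capping the gain and clipping to int16."""
    level = rms(frame)
    if level == 0.0:
        return frame  # pure silence: nothing to amplify
    gain = min(target_rms / level, max_gain)
    count = len(frame) // 2
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    scaled = (max(-32768, min(32767, round(s * gain))) for s in samples)
    return struct.pack(f"<{count}h", *scaled)


def tone(amplitude: int, n: int = 160) -> bytes:
    """Synthetic 400 Hz sine frame at 8 kHz, used only to exercise the functions."""
    values = (round(amplitude * math.sin(2 * math.pi * 400 * i / 8000)) for i in range(n))
    return struct.pack(f"<{n}h", *values)


quiet = tone(100)   # low-level background signal
loud = tone(4000)   # speech-level signal

quiet_is_speech = is_speech(quiet)
loud_is_speech = is_speech(loud)
normalized = apply_gain(loud)
```

The threshold, target level, and gain cap are arbitrary example values; in a decoupled setup like the one described here, the client never needs to tune them because the server owns this stage.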

Comparison with Pipecat

Feature	Active Call	Pipecat/Monolithic Framework
Architecture Pattern	Decoupled architecture	Monolithic architecture
Deployment	Distributed deployment	Single process deployment
Learning Curve	Low, only requires understanding the API	High, requires understanding audio processing details
Debugging Difficulty	Low, problem domains are isolated	High, AI and media issues are tightly coupled
Performance	High-performance Rust implementation with multi-threaded parallelism	Single process with GIL limitations (Python)
Maintainability	Modular design with easy maintenance	High coupling, upgrades affect the entire stack

Use Cases

Enterprise applications and production environments.
Systems requiring high concurrency and high availability.
Large-scale projects with multi-team collaboration.
Complex voice interaction scenarios (IVR, intelligent customer service, voice assistants).
Applications with high performance and stability requirements.
Systems requiring integration with existing telephone infrastructure.

Summary

Low learning curve: No specialized knowledge of audio processing, SIP, or telephony hardware is required.
High development efficiency: Focus on business logic and rapidly iterate AI features without worrying about media processing.
Easy testing: WebSocket interface makes unit testing and integration testing straightforward.
Highly scalable: Supports distributed deployment and horizontal scaling.
Tech stack freedom: No restrictions on framework choices.
Production ready: Built-in monitoring, logging, and error handling mechanisms.