# Quick Start
This chapter explains how to build a voice assistant with Active Call.
## Prerequisites

Before starting, you need the following:
### Runtime Environment

- Docker: to run the Active Call service
- Go: to compile and run the client
### Cloud Service API Key

We use Alibaba Cloud as an example. You need to activate Aliyun Bailian and create an API Key.

The following services will be used:

- TTS (Text-to-Speech): CosyVoice / Sambert
- ASR (real-time speech recognition): Paraformer
- LLM (large language model): Tongyi Qianwen
## Start the Active Call Service

Create a configuration file, then start Active Call with Docker.
### Create a Configuration File

Create the `config.toml` configuration file in your working directory:

```shell
cat > config.toml << 'EOF'
http_addr = "0.0.0.0:8080"
log_level = "info"
stun_server = "stun.l.google.com:19302"
recorder_path = "/tmp/recorders"
media_cache_path = "/tmp/mediacache"

[ua]
addr = "0.0.0.0"
udp_port = 13050
EOF
```
`http_addr = "0.0.0.0:8080"` is the WebSocket listen address of Active Call, which the client connects to.
### Start Active Call with Docker

Run the following command, replacing `your_dashscope_api_key` with your API Key:

```shell
docker run -d \
  --name rustpbx \
  -p 8080:8080 \
  -p 15060:15060/udp \
  -p 13050:13050/udp \
  -p 20000-30000:20000-30000/udp \
  -e DASHSCOPE_API_KEY=your_dashscope_api_key \
  -v $(pwd)/config.toml:/app/config.toml \
  -v $(pwd)/recordings:/tmp/recorders \
  ghcr.io/restsend/rustpbx:latest \
  --conf /app/config.toml
```
## Download the SDK

Clone the Active Call Go SDK source code from GitHub:

```shell
git clone https://github.com/restsend/rustpbxgo.git
cd rustpbxgo
```

The `cmd` directory contains an example application, which the next chapter, Code Explanation, walks through in detail.
## Start the Client

Run the client, replacing `your_dashscope_api_key` with your API Key:

```shell
go run ./cmd \
  --endpoint ws://127.0.0.1:8080 \
  --tts aliyun --speaker longyumi_v2 \
  --asr aliyun \
  --openai-key your_dashscope_api_key \
  --model qwen-plus \
  --openai-endpoint https://dashscope.aliyuncs.com/compatible-mode/v1 \
  --greeting "Hello, how can I help you?"
```
### Parameters

* `--endpoint`: the Active Call WebSocket address, set by the `http_addr` field in the `config.toml` created earlier
* `--tts` / `--asr`: the TTS and ASR providers (`aliyun` here)
* `--speaker`: the TTS voice
* `--openai-key`: the large-model API Key
* `--model`: the model name
* `--openai-endpoint`: an OpenAI-compatible API endpoint; if you use another model, change it to the corresponding endpoint
* `--greeting`: the welcome message

## Test Voice Assistant
Now you can speak into the microphone. Try questions like:
- “How’s the weather today?”
- “Tell me a joke”
- “What’s 1+1?”
- “Who are you?”
## Architecture

1. The client creates a WebRTC peer
2. The client connects to Active Call and sends an Invite command
3. Active Call connects to the WebRTC peer created in step 1
4. When there is speech input, Active Call calls the ASR service and sends the recognized text to the client
5. The client sends the text to the large language model
6. The client sends a TTS command with the model's response
7. Active Call calls the TTS service to convert the text to speech and sends it to the WebRTC peer; steps 4-7 then repeat