# Quick Start
This chapter explains how to build a voice assistant with Active Call.
## Prerequisites

Before starting, you need the following:
### Runtime Environment

- Docker: to run the Active Call service
- Go: to compile and run the client
### Cloud Service API Key

We use Alibaba Cloud as an example. You need to activate Aliyun Bailian and create an API Key.

The following services will be used:

- TTS (Text-to-Speech): CosyVoice / Sambert
- ASR (real-time speech recognition): Paraformer
- LLM (large language model): Tongyi Qianwen
## Start the Active Call Service

Create a configuration file, then start Active Call with Docker.
### Create a Configuration File

Create the `config.toml` configuration file in your working directory:

```shell
cat > config.toml << 'EOF'
http_addr = "0.0.0.0:8080"
log_level = "info"
stun_server = "stun.l.google.com:19302"
recorder_path = "/tmp/recorders"
media_cache_path = "/tmp/mediacache"

[ua]
addr = "0.0.0.0"
udp_port = 13050
EOF
```
`http_addr = "0.0.0.0:8080"` is the WebSocket listen address of Active Call, which the client connects to.
### Start Active Call with Docker

Run the following command, replacing `your_dashscope_api_key` with your API Key:

```shell
docker run -d \
  --name rustpbx \
  -p 8080:8080 \
  -p 15060:15060/udp \
  -p 13050:13050/udp \
  -p 20000-30000:20000-30000/udp \
  -e DASHSCOPE_API_KEY=your_dashscope_api_key \
  -v $(pwd)/config.toml:/app/config.toml \
  -v $(pwd)/recordings:/tmp/recorders \
  ghcr.io/restsend/rustpbx:latest \
  --conf /app/config.toml
```
## Download the SDK

Clone the Active Call Go SDK source code from GitHub:

```shell
git clone https://github.com/restsend/rustpbxgo.git
cd rustpbxgo
```

The `cmd` directory contains an example application, which the next chapter, Code Explanation, walks through in detail.
## Start the Client

Run the client, replacing `your_dashscope_api_key` with your API Key:

```shell
go run ./cmd \
  --endpoint ws://127.0.0.1:8080 \
  --tts aliyun --speaker longyumi_v2 \
  --asr aliyun \
  --openai-key your_dashscope_api_key \
  --model qwen-plus \
  --openai-endpoint https://dashscope.aliyuncs.com/compatible-mode/v1 \
  --greeting "Hello, how can I help you?"
```
### Parameters

* `--endpoint`: the Active Call WebSocket address, set by the `http_addr` field in the `config.toml` created earlier
* `--tts` / `--asr`: the TTS and ASR providers (`aliyun` here)
* `--speaker`: the TTS voice
* `--openai-key`: the large-model API Key
* `--model`: the model name
* `--openai-endpoint`: an OpenAI-compatible API endpoint; if you use another model, change it to the corresponding endpoint
* `--greeting`: the welcome message

## Test Voice Assistant
Now you can speak into the microphone. Try questions like:
- “How’s the weather today?”
- “Tell me a joke”
- “What’s 1+1?”
- “Who are you?”
## Architecture

1. The client creates a WebRTC peer
2. The client connects to Active Call and sends an Invite command
3. Active Call connects to the WebRTC peer created in step 1
4. When there is speech input, Active Call calls the ASR service and sends the recognized text to the client
5. The client sends the text to the large language model
6. The client sends a TTS command with the model's response
7. Active Call calls the TTS service to convert the text to speech and sends it to the WebRTC peer; steps 4-7 then repeat