Quick Start
This chapter explains how to build a voice assistant with RustPBX.
Prerequisites
Before starting, we need the following:
Runtime Environment
- Docker: to run the RustPBX service
- Go compiler: to run the client
Cloud Service API Key
We use Alibaba Cloud as an example.
You need to activate Aliyun Bailian and create an API Key.
- We also support Tencent Cloud and Deepgram for TTS and ASR services.
- You are free to choose any large language model compatible with the OpenAI API.
The following services will be used:
- TTS: Text-to-Speech - CosyVoice/Sambert
- ASR: Real-time Speech Recognition - Paraformer
- Large Language Model: Tongyi Qianwen
Start RustPBX service
We need to create a configuration file, and then start RustPBX with Docker.
Create Configuration File
Create the config.toml configuration file in the working directory:
cat > config.toml << 'EOF'
http_addr = "0.0.0.0:8080"
log_level = "info"
stun_server = "stun.l.google.com:19302"
recorder_path = "/tmp/recorders"
media_cache_path = "/tmp/mediacache"
[ua]
addr = "0.0.0.0"
udp_port = 13050
EOF
http_addr = "0.0.0.0:8080" is the address on which RustPBX listens for WebSocket connections from the client.
Start RustPBX with Docker
Run the following command, replacing your_dashscope_api_key with your API Key:
docker run -d \
--name rustpbx \
-p 8080:8080 \
-p 15060:15060/udp \
-p 13050:13050/udp \
-p 20000-30000:20000-30000/udp \
-e DASHSCOPE_API_KEY=your_dashscope_api_key \
-v "$(pwd)/config.toml:/app/config.toml" \
-v "$(pwd)/recordings:/tmp/recorders" \
ghcr.io/restsend/rustpbx:latest \
--conf /app/config.toml
Download SDK
Download RustPBXGo source code from GitHub:
git clone https://github.com/restsend/rustpbxgo.git
cd rustpbxgo
The cmd directory in RustPBXGo contains an example application.
It is explained in detail in the next chapter, Code Explanation.
Start Client
Run the client, replacing your_dashscope_api_key with your API Key:
go run ./cmd \
--endpoint ws://127.0.0.1:8080 \
--tts aliyun --speaker longyumi_v2 \
--asr aliyun \
--openai-key your_dashscope_api_key \
--model qwen-plus \
--openai-endpoint https://dashscope.aliyuncs.com/compatible-mode/v1 \
--greeting "Hello, how can I help you?"
- --endpoint: RustPBX WebSocket address, set by the http_addr field in the config.toml file created earlier.
- --openai-key: API Key for the large language model.
- --model qwen-plus: model name.
- --openai-endpoint: OpenAI-compatible API endpoint. If you use a different model, change it to the corresponding endpoint.
- --greeting: welcome message.
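To make the --openai-key, --model, and --openai-endpoint options more concrete, the sketch below builds the kind of HTTP request that an OpenAI-compatible chat completions API expects. It illustrates the protocol only and is not the RustPBXGo implementation: newChatRequest, chatMessage, and chatRequest are hypothetical names, and reading the key from the DASHSCOPE_API_KEY environment variable is an assumption made for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
)

// chatMessage and chatRequest mirror the minimal JSON body of an
// OpenAI-compatible chat completions call. (Hypothetical type names.)
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest builds, but does not send, a chat completions request.
// endpoint corresponds to --openai-endpoint, apiKey to --openai-key,
// and model to --model.
func newChatRequest(endpoint, apiKey, model, userText string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: userText}},
	})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(endpoint, "/") + "/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newChatRequest(
		"https://dashscope.aliyuncs.com/compatible-mode/v1",
		os.Getenv("DASHSCOPE_API_KEY"), // assumption: key exported in the shell
		"qwen-plus",
		"Who are you?",
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Print the target only; sending is left to http.DefaultClient.Do(req).
	fmt.Println(req.Method, req.URL.String())
}
```

Switching to another provider would, under this sketch, only change the endpoint, key, and model arguments, which is why the client exposes them as separate options.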
Test Voice Assistant
Now you can speak into the microphone. Try the following questions:
- "How's the weather today?"
- "Tell me a joke"
- "What's 1+1?"
- "Who are you?"
Architecture
1. The client creates a WebRTC Peer.
2. The client connects to RustPBX, then sends an Invite command.
3. RustPBX connects to the WebRTC Peer created in step 1.
4. When there is speech input, RustPBX calls the ASR service and sends the text result to the client.
5. The client sends the text result to the large language model.
6. The client uses the TTS command to play the model's response.
7. RustPBX calls the TTS service to convert the text to speech and sends it to the WebRTC Peer. Steps 4-7 then repeat.
For simplicity, the client starts the WebRTC Peer itself. In real deployments, the WebRTC Peer (or SIP phone) is external, and the client only handles business logic and large language model interaction.
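The repeating part of this flow (steps 4-7) can be sketched as a simple loop on the client side. This is a minimal illustration with stand-in functions, not the RustPBXGo API: the names llm, speaker, and handleTranscripts are hypothetical, and the ASR results are modeled as a plain Go channel of strings.

```go
package main

import (
	"fmt"
	"strings"
)

// All names below are hypothetical; they are not the RustPBXGo API.

// llm turns recognized text into a reply (step 5).
type llm func(text string) (string, error)

// speaker asks the server to synthesize and play text (steps 6-7).
type speaker func(text string) error

// handleTranscripts consumes ASR results (step 4) until the channel
// is closed, sending each one to the model and speaking the reply.
func handleTranscripts(transcripts <-chan string, ask llm, say speaker) error {
	for text := range transcripts {
		if strings.TrimSpace(text) == "" {
			continue // ignore empty recognition results
		}
		reply, err := ask(text)
		if err != nil {
			return fmt.Errorf("llm: %w", err)
		}
		if err := say(reply); err != nil {
			return fmt.Errorf("tts: %w", err)
		}
	}
	return nil
}

func main() {
	// Stand-in for ASR text results pushed by the server.
	transcripts := make(chan string, 2)
	transcripts <- "Who are you?"
	transcripts <- "Tell me a joke"
	close(transcripts)

	// Stand-ins for the model call and the TTS command.
	echo := func(text string) (string, error) { return "You said: " + text, nil }
	play := func(text string) error { fmt.Println("TTS:", text); return nil }

	if err := handleTranscripts(transcripts, echo, play); err != nil {
		fmt.Println("error:", err)
	}
}
```

Keeping the model call and the TTS command behind small function types reflects the split described above: the media path stays in RustPBX, while the client only decides what to say.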
More Content
📄️ WebRTC Call
Detailed introduction to WebRTC call
📄️ TTS (Text-to-Speech)
Detailed introduction to TTS (Text-to-Speech)