Quick Start

This chapter explains how to build a voice assistant with RustPBX.

Prerequisites

Before starting, we need the following:

Runtime Environment

Docker: to run the RustPBX service
Go compiler: to run the client

Cloud Service API Key

We use Alibaba Cloud as an example.

You need to activate Aliyun Bailian and create an API Key.

info

RustPBX also supports Tencent Cloud and Deepgram for TTS and ASR services.
You are free to choose any large language model compatible with the OpenAI API.

The following services will be used:

TTS: Text-to-Speech - CosyVoice/Sambert
ASR: Real-time Speech Recognition - Paraformer
Large Language Model: Tongyi Qianwen

Start RustPBX service

We need to create a configuration file, and then start RustPBX with Docker.

Create Configuration File

Create the config.toml configuration file in the working directory:

cat > config.toml << 'EOF'
http_addr = "0.0.0.0:8080"
log_level = "info"
stun_server = "stun.l.google.com:19302"
recorder_path = "/tmp/recorders"
media_cache_path = "/tmp/mediacache"

[ua]
addr = "0.0.0.0"
udp_port = 13050
EOF

http_addr = "0.0.0.0:8080" is the WebSocket address of RustPBX, used for client connection.
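The relationship between http_addr and the address the client dials can be sketched in Go. This is only an illustration: clientEndpoint is a hypothetical helper, not part of RustPBX or RustPBXGo, and it assumes the client runs on the same host as the service, so the wildcard listen address is replaced with the loopback address.

```go
package main

import (
	"fmt"
	"net"
)

// clientEndpoint (hypothetical helper) turns a listen address such as
// "0.0.0.0:8080" into a WebSocket URL that a local client can dial.
// A wildcard host is not dialable, so it is replaced with 127.0.0.1.
func clientEndpoint(httpAddr string) (string, error) {
	host, port, err := net.SplitHostPort(httpAddr)
	if err != nil {
		return "", fmt.Errorf("invalid http_addr %q: %w", httpAddr, err)
	}
	if host == "" || host == "0.0.0.0" || host == "::" {
		host = "127.0.0.1"
	}
	return "ws://" + net.JoinHostPort(host, port), nil
}

func main() {
	endpoint, err := clientEndpoint("0.0.0.0:8080")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(endpoint) // prints ws://127.0.0.1:8080
}
```

If RustPBX runs on another machine, the host part would be that machine's address instead of 127.0.0.1.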

Start RustPBX with Docker

Run the following command, replacing your_dashscope_api_key with your API Key:

docker run -d \
  --name rustpbx \
  -p 8080:8080 \
  -p 15060:15060/udp \
  -p 13050:13050/udp \
  -p 20000-30000:20000-30000/udp \
  -e DASHSCOPE_API_KEY=your_dashscope_api_key \
  -v $(pwd)/config.toml:/app/config.toml \
  -v $(pwd)/recordings:/tmp/recorders \
  ghcr.io/restsend/rustpbx:latest \
  --conf /app/config.toml

Download SDK

Download RustPBXGo source code from GitHub:

git clone https://github.com/restsend/rustpbxgo.git
cd rustpbxgo

The cmd directory in RustPBXGo contains an example application.

It is explained in detail in the next chapter, Code Explanation.

Start Client

Run the client, replacing your_dashscope_api_key with your API Key:

go run ./cmd \
  --endpoint ws://127.0.0.1:8080 \
  --tts aliyun --speaker longyumi_v2 \
  --asr aliyun \
  --openai-key your_dashscope_api_key \
  --model qwen-plus \
  --openai-endpoint https://dashscope.aliyuncs.com/compatible-mode/v1 \
  --greeting "Hello, how can I help you?"

Parameters

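--endpoint: RustPBX WebSocket address. It corresponds to the http_addr field in the config.toml file created earlier.
--tts, --asr: TTS and ASR providers (aliyun in this example).
--speaker: TTS voice (longyumi_v2 in this example).
--openai-key: API Key for the large language model.
--model: model name (qwen-plus in this example).
--openai-endpoint: OpenAI-compatible API endpoint. If you use a different model provider, change it to the corresponding endpoint.
--greeting: welcome message spoken when the call starts.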
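To show how --openai-endpoint, --openai-key and --model fit together, here is a minimal Go sketch that builds (but does not send) a chat completion request in the OpenAI-compatible format. It is not taken from the RustPBXGo example; newChatRequest and the struct names are hypothetical, and the sketch assumes the provider serves the standard /chat/completions path under the configured endpoint.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// chatMessage and chatRequest mirror the minimal OpenAI-compatible
// chat completion payload: a model name and a list of messages.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest (hypothetical helper) builds the HTTP request without
// sending it. endpoint, apiKey and model correspond to --openai-endpoint,
// --openai-key and --model.
func newChatRequest(endpoint, apiKey, model, userText string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: userText}},
	})
	if err != nil {
		return nil, err
	}
	target := strings.TrimRight(endpoint, "/") + "/chat/completions"
	req, err := http.NewRequest(http.MethodPost, target, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newChatRequest(
		"https://dashscope.aliyuncs.com/compatible-mode/v1",
		"your_dashscope_api_key", "qwen-plus", "Who are you?")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Switching to another OpenAI-compatible provider would, under the same assumption, only change the three arguments passed in main.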

Test Voice Assistant

Now you can speak into the microphone. Try the following questions:

"How's the weather today?"
"Tell me a joke"
"What's 1+1?"
"Who are you?"

Architecture

1. Client creates a WebRTC Peer
2. Client connects to RustPBX, then sends an Invite command
3. RustPBX connects to the WebRTC Peer (created in step 1)
4. When there is speech input, RustPBX calls the ASR service and sends the text result to the client
5. Client sends the text result to the large language model
6. Client uses the TTS command to play the large language model's response
7. RustPBX calls the TTS service to convert the text to speech and sends it to the WebRTC Peer, then steps 4-7 repeat

info

For simplicity, the client starts the WebRTC Peer itself. In real deployments, the WebRTC Peer (or SIP phone) is external, and the client only handles business logic and large language model interaction.

More Content

WebRTC Call: Detailed introduction to WebRTC calls
TTS (Text-to-Speech): Detailed introduction to TTS (Text-to-Speech)
