Quick Start

This chapter explains how to build a voice assistant with RustPBX.

Prerequisites

Before starting, we need the following:

Runtime Environment

  • Docker: to run the RustPBX service
  • Go compiler: to run the client

Cloud Service API Key

We use Alibaba Cloud as an example.

You need to activate Aliyun Bailian and create an API Key.

info
  • We also support Tencent Cloud and Deepgram for TTS and ASR services.
  • You are free to choose any large language model that is compatible with the OpenAI API.
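
To make "compatible with the OpenAI API" concrete: such services accept a POST to a chat/completions path under their base URL, with a Bearer token and a JSON body naming the model and messages. The following is a minimal sketch of building that request in Go. The helper name newChatRequest and the struct names are illustrative assumptions, not part of RustPBX or RustPBXGo, and the request is only constructed, not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// chatMessage and chatRequest mirror the minimal JSON body of an
// OpenAI-style chat completion request. (Illustrative types.)
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest builds, but does not send, a POST to
// <baseURL>/chat/completions. It is a hypothetical helper: baseURL,
// apiKey and model are whatever your chosen provider requires.
func newChatRequest(baseURL, apiKey, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(baseURL, "/") + "/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder key; the base URL and model are the ones used later in this guide.
	req, err := newChatRequest(
		"https://dashscope.aliyuncs.com/compatible-mode/v1",
		"your_dashscope_api_key", "qwen-plus", "Who are you?")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Switching providers then amounts to changing the base URL, key, and model name, which is what the client's command-line options expose.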

The following services will be used: speech recognition (ASR), speech synthesis (TTS), and a large language model.

Start RustPBX Service

We need to create a configuration file, and then start RustPBX with Docker.

Create Configuration File

Create the config.toml configuration file in the working directory:

cat > config.toml << 'EOF'
http_addr = "0.0.0.0:8080"
log_level = "info"
stun_server = "stun.l.google.com:19302"
recorder_path = "/tmp/recorders"
media_cache_path = "/tmp/mediacache"

[ua]
addr = "0.0.0.0"
udp_port = 13050
EOF

http_addr = "0.0.0.0:8080" is the address on which RustPBX listens for WebSocket connections from clients.
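
Note that 0.0.0.0 is a bind address (listen on all interfaces), not something a client should dial. The sketch below shows one way to derive a local client endpoint from an http_addr value. The function name clientEndpoint is hypothetical, and the choice of 127.0.0.1 assumes the client runs on the same machine as RustPBX.

```go
package main

import (
	"fmt"
	"net"
)

// clientEndpoint turns a listen address such as "0.0.0.0:8080" into a
// WebSocket URL that a local client can dial. A wildcard or empty host
// is replaced with the loopback address (an assumption: client and
// server share a machine).
func clientEndpoint(httpAddr string) (string, error) {
	host, port, err := net.SplitHostPort(httpAddr)
	if err != nil {
		return "", err
	}
	if host == "" || host == "0.0.0.0" || host == "::" {
		host = "127.0.0.1"
	}
	return "ws://" + net.JoinHostPort(host, port), nil
}

func main() {
	ep, err := clientEndpoint("0.0.0.0:8080")
	if err != nil {
		panic(err)
	}
	fmt.Println(ep) // ws://127.0.0.1:8080
}
```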

Start RustPBX with Docker

Run the following command, replacing your_dashscope_api_key with your API Key:

docker run -d \
--name rustpbx \
-p 8080:8080 \
-p 15060:15060/udp \
-p 13050:13050/udp \
-p 20000-30000:20000-30000/udp \
-e DASHSCOPE_API_KEY=your_dashscope_api_key \
-v $(pwd)/config.toml:/app/config.toml \
-v $(pwd)/recordings:/tmp/recorders \
ghcr.io/restsend/rustpbx:latest \
--conf /app/config.toml

Download SDK

Download RustPBXGo source code from GitHub:

git clone https://github.com/restsend/rustpbxgo.git
cd rustpbxgo

The cmd directory in RustPBXGo contains an example application.

It is explained in detail in the next chapter, Code Explanation.

Start Client

Run the client, replacing your_dashscope_api_key with your API Key:

go run ./cmd \
--endpoint ws://127.0.0.1:8080 \
--tts aliyun --speaker longyumi_v2 \
--asr aliyun \
--openai-key your_dashscope_api_key \
--model qwen-plus \
--openai-endpoint https://dashscope.aliyuncs.com/compatible-mode/v1 \
--greeting "Hello, how can I help you?"
Parameters
  • --endpoint: RustPBX WebSocket address. It corresponds to the http_addr field in the config.toml file created earlier.
  • --tts: TTS provider (aliyun in this example).
  • --speaker: TTS voice to use.
  • --asr: ASR provider (aliyun in this example).
  • --openai-key: API Key for the large model.
  • --model: model name (qwen-plus in this example).
  • --openai-endpoint: OpenAI-compatible API endpoint. If you use another model provider, change it to the corresponding endpoint.
  • --greeting: welcome message.
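
For readers writing their own client, the following sketch shows how options like these could be declared with Go's standard flag package, which treats -name and --name as equivalent. It is not the actual source of the example application in cmd; the options struct, the parseOptions helper, and all default values are assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the values of the command-line flags listed above.
type options struct {
	Endpoint, TTS, Speaker, ASR, OpenAIKey, Model, OpenAIEndpoint, Greeting string
}

// parseOptions parses args (without the program name) into options.
// Defaults here are illustrative assumptions, not RustPBXGo's defaults.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("client", flag.ContinueOnError)
	fs.StringVar(&o.Endpoint, "endpoint", "ws://127.0.0.1:8080", "RustPBX WebSocket address")
	fs.StringVar(&o.TTS, "tts", "", "TTS provider")
	fs.StringVar(&o.Speaker, "speaker", "", "TTS voice")
	fs.StringVar(&o.ASR, "asr", "", "ASR provider")
	fs.StringVar(&o.OpenAIKey, "openai-key", "", "API key for the large model")
	fs.StringVar(&o.Model, "model", "", "model name")
	fs.StringVar(&o.OpenAIEndpoint, "openai-endpoint", "", "OpenAI-compatible API endpoint")
	fs.StringVar(&o.Greeting, "greeting", "", "welcome message")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions([]string{
		"--endpoint", "ws://127.0.0.1:8080",
		"--tts", "aliyun", "--speaker", "longyumi_v2",
		"--asr", "aliyun",
		"--model", "qwen-plus",
		"--greeting", "Hello, how can I help you?",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", o)
}
```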

Test Voice Assistant

Now you can speak into the microphone. Try the following questions:

  • "How's the weather today?"
  • "Tell me a joke"
  • "What's 1+1?"
  • "Who are you?"

Architecture

[Architecture diagram. Labels: Client, SDK, WebRTC Peer, RustPBX, UserAgent, Session Control, Media Session, Media Engine, TTS Command, ASR Event, Aliyun Bailian, LLM, ASR, TTS]

  1. The client creates a WebRTC Peer.
  2. The client connects to RustPBX, then sends an Invite command.
  3. RustPBX connects to the WebRTC Peer created in step 1.
  4. When there is speech input, RustPBX calls the ASR service and sends the text result to the client.
  5. The client sends the text result to the large model.
  6. The client uses a TTS command to play the large model's response.
  7. RustPBX calls the TTS service to convert the text to speech and sends the audio to the WebRTC Peer. Steps 4-7 then repeat.
info
  • For simplicity, the client starts the WebRTC Peer itself. In real deployments, the WebRTC Peer (or SIP phone) is external, and the client only handles business logic and large model interaction.