WebSocket API
Connect to RustPBX
RustPBX uses WebSocket connections for call control: clients send commands and receive events over the same connection.
Address
The listening address of RustPBX is configured in RustPBX's config.toml:
http_addr = "0.0.0.0:8080"
Paths
Different paths correspond to different voice call types:
- /call: Audio stream transmitted via WebSocket
- /call/sip: Audio stream transmitted via SIP/RTP
- /call/webrtc: Audio stream transmitted via WebRTC RTP
Parameters
- id (optional, string): Session ID. Defaults to server-generated UUID. (Should be set to dialogId when answering)
- dump (optional, bool, default: true): Whether to enable dump
- pingInterval (optional, int, unit: seconds, default: 20): WebSocket Ping interval. When enabled, will periodically receive SessionEvent::Ping events
- serverSideTrack (optional, string, default: serverSideTrack): Set server-side TrackID
Example
ws://localhost:8080/call/sip?id=session123&dump=true
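The URL above can also be assembled programmatically. The following is a minimal sketch using only Python's standard library. The helper name `build_call_url` and the call-type keys `websocket`, `sip`, and `webrtc` are hypothetical labels introduced for illustration. Booleans are rendered as `true`/`false` to match the example URL.

```python
from urllib.parse import urlencode

# Paths listed in this section, keyed by a hypothetical call-type label.
CALL_PATHS = {"websocket": "/call", "sip": "/call/sip", "webrtc": "/call/webrtc"}


def build_call_url(host: str, call_type: str, **params) -> str:
    """Build a connection URL (illustrative helper, not part of RustPBX)."""
    path = CALL_PATHS[call_type]
    query = {}
    for key, value in params.items():
        if value is None:
            continue  # omit unset parameters so the server defaults apply
        # Render booleans as lowercase true/false, as in the example URL.
        query[key] = str(value).lower() if isinstance(value, bool) else str(value)
    suffix = f"?{urlencode(query)}" if query else ""
    return f"ws://{host}{path}{suffix}"


url = build_call_url("localhost:8080", "sip", id="session123", dump=True)
print(url)
```

Parameters that are left out (or passed as `None`) are not added to the query string, so the documented defaults stay in effect.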
Commands
Commands are sent over the WebSocket connection as JSON messages, with the command field indicating the command type.
Invite - Initiate Call
Purpose: Initiate a new call.
- SIP calls require setting option.caller and option.callee to the caller and callee SIP addresses respectively
- WebRTC calls require setting option.offer to the call's SDP offer
Example:
{
"command": "invite",
"option": {
"denoise": true,
"callee": "sip:alice@192.168.3.197:12345",
"caller": "sip:192.168.3.197:3050",
"vad": {
"type": "silero",
"silenceTimeout": 5000
},
"asr": {
"provider": "tencent"
},
"tts": {
"provider": "tencent",
"speaker": "601003"
},
"sip": {}
}
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "invite" |
option | CallOption | ✓ | Call configuration parameters, see CallOption for details |
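
The per-call-type requirements for Invite can be checked on the client before the message is sent. The sketch below is one way to do that. The function name `build_invite` and the `call_type` labels are hypothetical, and the check covers only the fields this section names as required.

```python
import json

# Fields this section says each call type needs inside "option".
REQUIRED_OPTION_FIELDS = {
    "sip": ("caller", "callee"),
    "webrtc": ("offer",),
}


def build_invite(call_type: str, option: dict) -> str:
    """Serialize an invite command after a basic client-side check."""
    required = REQUIRED_OPTION_FIELDS.get(call_type, ())
    missing = [name for name in required if not option.get(name)]
    if missing:
        raise ValueError(f"{call_type} invite is missing option fields: {missing}")
    return json.dumps({"command": "invite", "option": option})


message = build_invite(
    "sip",
    {
        "caller": "sip:192.168.3.197:3050",
        "callee": "sip:alice@192.168.3.197:12345",
        "denoise": True,
    },
)
print(message)
```

The returned string is what would be sent as a text frame over the WebSocket connection.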
Accept - Answer Incoming Call
Purpose: Answer an incoming call.
Answering calls requires setting the id parameter in the connection URL to dialogId, which is provided by the webhook request.
Example:
{
"command": "accept",
"option": {
"denoise": true,
"vad": {
"type": "silero",
"silenceTimeout": 5000
},
"asr": {
"provider": "tencent"
},
"tts": {
"provider": "tencent",
"speaker": "601003"
}
}
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "accept" |
option | CallOption | ✓ | Call configuration parameters, see CallOption for details |
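
Answering a call involves two pieces: the connection URL, whose id must be the dialogId from the webhook request, and the accept message itself. The sketch below prepares both. The helper name `prepare_accept`, the `/call/sip` path, and the `dialog-42` value are assumptions made for illustration.

```python
import json
from urllib.parse import quote


def prepare_accept(ws_base: str, dialog_id: str, option: dict) -> tuple:
    """Return (connection URL, accept message) for answering an incoming call."""
    # The session id in the URL must equal the dialogId from the webhook request.
    url = f"{ws_base}?id={quote(dialog_id, safe='')}"
    message = json.dumps({"command": "accept", "option": option})
    return url, message


url, message = prepare_accept(
    "ws://localhost:8080/call/sip",  # assumed path for a SIP call
    "dialog-42",                     # placeholder dialogId
    {"denoise": True, "vad": {"type": "silero", "silenceTimeout": 5000}},
)
print(url)
print(message)
```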
Reject - Reject Incoming Call
Purpose: Reject an incoming call.
Request failure response code list: Request Failure 4xx
Example:
{
"command": "reject",
"reason": "Busy Here",
"code": 486
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
command | string | ✓ | - | Must be "reject" |
reason | string | ✓ | - | Rejection reason |
code | number | ✗ | - | SIP response code |
Ringing - Ringing
Purpose: Send ringing response for incoming call. (180 Ringing)
If recorder is set in the Ringing command, the recorder option in subsequent Accept commands will not override the recording settings during the ringing phase.
Example:
{
"command": "ringing",
"recorder": {
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000,
"ptime": 200
},
"earlyMedia": true,
"ringtone": "http://example.com/ringtone.wav"
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
command | string | ✓ | - | Must be "ringing" |
recorder | RecorderOption | ✗ | - | Call recording configuration |
earlyMedia | boolean | ✗ | false | Enable early media during ringing |
ringtone | string | ✗ | - | Custom ringtone URL |
TTS - Text-to-Speech
Purpose: Convert text to speech and play audio.
Example:
{
"command": "tts",
"text": "Hello, this is a test message",
"speaker": "xiaoyan",
"playId": "unique_play_id",
"autoHangup": false,
"streaming": false,
"endOfStream": false,
"waitInputTimeout": 30,
"option": {
"provider": "tencent",
"volume": 5,
"speed": 1.0
}
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
command | string | ✓ | - | Must be "tts" |
text | string | ✓ | - | Text to synthesize |
speaker | string | ✗ | - | Voice type, see provider voice lists |
playId | string | ✗ | - | TTS Track identifier. |
autoHangup | boolean | ✗ | false | Whether to automatically hang up after TTS playback completes |
streaming | boolean | ✗ | false | Whether to enable streaming TTS |
endOfStream | boolean | ✗ | false | Whether this is the last command for the current playId |
waitInputTimeout | number | ✗ | - | Maximum time to wait for user input, unit: seconds |
option | SynthesisOption | ✗ | - | TTS provider-specific options, can configure voice format, sample rate, etc. |
base64 | boolean | ✗ | false | Whether to use base64 encoding |
- When streaming TTS is enabled, TTS commands will be sent in the same WebSocket connection.
- Set endOfStream = true when all TTS commands for the playId have been sent. The TTS Track will exit after all command results finish playing and send a Track End event.
- Set base64=true to pass base64-encoded PCM audio through the text field.
- If playId is set, the Track End event sent by this TTS Track will include this playId.
- If the current playId is the same as a previous TTS command's playId, it will reuse the previous TTS Track; otherwise, it will terminate the previous TTS Track and create a new TTS Track.
For details, see TTS(Text-to-Speech)
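The streaming rules above can be sketched as a small generator that emits one tts command per text chunk, all sharing one playId, and marks only the last one with endOfStream. The function name `streaming_tts_commands` is hypothetical. The chunks would typically come from an incremental source such as an LLM response, which is an assumption about usage rather than something this document specifies.

```python
import json


def streaming_tts_commands(chunks, play_id: str):
    """Yield one JSON tts command per text chunk, all sharing play_id."""
    chunks = list(chunks)
    for i, text in enumerate(chunks):
        yield json.dumps({
            "command": "tts",
            "text": text,
            "playId": play_id,
            "streaming": True,
            # Only the final command closes the stream for this playId.
            "endOfStream": i == len(chunks) - 1,
        })


messages = list(streaming_tts_commands(["Hello, ", "this is ", "a test."], "llm-001"))
for m in messages:
    print(m)
```

Because every command carries the same playId, the documented behavior is that the same TTS Track is reused, and the Track End event for that Track carries `llm-001`.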
Play - Play Audio
Purpose: Play audio from URL.
Example:
{
"command": "play",
"url": "http://example.com/audio.mp3",
"autoHangup": false,
"waitInputTimeout": 30
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
command | string | ✓ | - | Must be "play" |
url | string | ✓ | - | Audio file URL to play (supports HTTP/HTTPS). This URL will be returned as playId in the trackEnd event |
autoHangup | boolean | ✗ | false | If true, will automatically hang up after playback completes |
waitInputTimeout | number | ✗ | - | Maximum time to wait for user input, unit: seconds |
Interrupt - Interrupt Playback
Purpose: Interrupt current TTS or audio playback.
- If the current TTS result has not finished playing, an Interruption event will be triggered, containing the played time and the time when audio was received from the TTS provider. If the provider supports subtitles, it will also include the estimated position of the played text.
- If graceful = true is set, it will wait for the current TTS command result to finish playing before exiting, see Interruption.
Example:
{
"command": "interrupt",
"graceful": false
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
command | string | ✓ | - | Must be "interrupt" |
graceful | boolean | ✗ | false | Whether to gracefully interrupt |
Refer - Transfer Call
Purpose: Transfer call to another party (SIP REFER).
See Transfer Call
Example:
{
"command": "refer",
"caller": "sip:alice@example.com",
"callee": "sip:charlie@example.com",
"options": {
"denoise": true,
"timeout": 30,
"moh": "http://example.com/hold_music.wav",
"autoHangup": true
}
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "refer" |
caller | string | ✓ | Transfer caller SIP address (currently connected local SIP address, e.g.: sip:{localIP}:13050) |
callee | string | ✓ | Transfer target SIP URI (e.g., sip:bob@example.com) |
options | ReferOption | ✗ | Transfer configuration, see ReferOption for details |
Mute - Mute
Purpose: Mute all or specified Tracks.
If trackId is specified, mute the corresponding Track; otherwise, mute all Tracks.
Example:
{
"command": "mute",
"trackId": "track-123"
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "mute" |
trackId | string | ✗ | Track ID to mute (if not specified, mute all tracks) |
Unmute - Unmute
Purpose: Unmute muted Tracks.
If trackId is specified, unmute the corresponding Track; otherwise, unmute all Tracks.
Example:
{
"command": "unmute",
"trackId": "track-123"
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "unmute" |
trackId | string | ✗ | Track ID to unmute (if not specified, unmute all tracks) |
Hangup - Hangup
Purpose: End the call.
Example:
{
"command": "hangup",
"reason": "user_requested",
"initiator": "user"
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "hangup" |
reason | string | ✗ | Hangup reason |
initiator | string | ✗ | Party that initiated the hangup (user, system, etc.) |
Events
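Events are received from the server in JSON format. Timestamps are in milliseconds unless stated otherwise (the Hangup event reports startTime, answerTime, ringingTime, and hangupTime as ISO 8601 strings). Each event contains an event field indicating the event type, and most events also include a trackId field indicating the related Track.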
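Since every event carries an event field, a client can route incoming messages with a simple lookup table. The sketch below shows one possible shape. The `EventRouter` class and its method names are hypothetical, and unknown event types are silently ignored as a design choice of this example.

```python
import json


class EventRouter:
    """Routes decoded events to handlers keyed by the "event" field."""

    def __init__(self):
        self._handlers = {}

    def on(self, event_type: str):
        """Decorator that registers a handler for one event type."""
        def register(fn):
            self._handlers[event_type] = fn
            return fn
        return register

    def dispatch(self, raw: str):
        """Decode one JSON text frame and call the matching handler, if any."""
        event = json.loads(raw)
        handler = self._handlers.get(event.get("event"))
        if handler is None:
            return None  # no handler registered for this event type
        return handler(event)


router = EventRouter()


@router.on("dtmf")
def handle_dtmf(event: dict) -> str:
    # trackId is read with .get() because not every event includes it.
    return f"digit {event['digit']} on {event.get('trackId')}"


result = router.dispatch(
    '{"event": "dtmf", "trackId": "track-abc123", "timestamp": 1640995200000, "digit": "1"}'
)
print(result)
```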
Incoming - Incoming Call Event
Trigger: When an incoming call is received (SIP calls only).
Example:
{
"event": "incoming",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"caller": "sip:alice@example.com",
"callee": "sip:bob@example.com",
"sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "incoming" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
caller | string | Caller's SIP address |
callee | string | Callee's SIP address |
sdp | string | SDP offer from the caller |
Answer - Answer Event
Trigger: When the call is answered and SDP negotiation is complete.
For SIP calls, the Answer event is triggered when 200 OK is received.
Example:
{
"event": "answer",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "answer" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
sdp | string | SDP answer from the callee |
Reject - Reject Event
Trigger: When the call is rejected.
Example:
{
"event": "reject",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"reason": "Busy",
"code": 486
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "reject" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
reason | string | Rejection reason |
code | number | SIP response code (optional) |
Request failure response code list: Request Failure 4xx
Ringing - Ringing Event
Trigger: When the call is ringing (SIP calls only).
Example:
{
"event": "ringing",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"earlyMedia": false
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "ringing" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
earlyMedia | boolean | Whether early media is available |
Hangup - Hangup Event
Trigger: When the call ends.
Example:
{
"event": "hangup",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"reason": "user_requested",
"initiator": "user",
"startTime": "2024-01-01T12:00:00Z",
"hangupTime": "2024-01-01T12:05:30Z",
"answerTime": "2024-01-01T12:00:05Z",
"from": {
"username": "alice",
"realm": "example.com",
"source": "sip:alice@example.com"
}
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "hangup" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
reason | string | Hangup reason (optional) |
initiator | string | Party that initiated the hangup (optional) |
startTime | string | ISO 8601 timestamp of call start |
hangupTime | string | ISO 8601 timestamp of call end |
answerTime | string | ISO 8601 timestamp of call answer (optional) |
ringingTime | string | ISO 8601 timestamp of call ringing start (optional) |
from | Attendee | Caller information (optional) |
to | Attendee | Callee information (optional) |
extra | object | Additional call metadata (optional) |
Speaking - Speaking Event
Trigger: When VAD detects speech. VAD must be configured in CallOption.
Example:
{
"event": "speaking",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"startTime": 1640995200000
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "speaking" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
startTime | number | Time when speech started (milliseconds) |
Silence - Silence Event
Trigger: When more than speechPadding milliseconds have passed since the current speech started, and more than silencePadding milliseconds have passed since the last Silence event was triggered (if any). VAD must be configured in CallOption.
Example:
{
"event": "silence",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"startTime": 1640995195000,
"duration": 5000
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "silence" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
startTime | number | Time when silence started (milliseconds) |
duration | number | Silence duration (milliseconds) |
AsrFinal - ASR Final Event
Trigger: Stable result of speech recognition.
Example:
{
"event": "asrFinal",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"index": 1,
"startTime": 1640995200000,
"endTime": 1640995205000,
"text": "Hello, how can I help you today?"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "asrFinal" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
index | number | ASR result sequence number |
startTime | number | Speech start time (milliseconds, optional) |
endTime | number | Speech end time (milliseconds, optional) |
text | string | Speech recognition result |
AsrDelta - ASR Delta Event
Trigger: Intermediate result of speech recognition (may change).
Example:
{
"event": "asrDelta",
"trackId": "track-abc123",
"index": 1,
"timestamp": 1640995200000,
"text": "Hello, how can"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "asrDelta" |
trackId | string | Unique identifier of the Track |
index | number | ASR result sequence number |
timestamp | number | Event timestamp (milliseconds) |
startTime | number | Speech start time (milliseconds, optional) |
endTime | number | Speech end time (milliseconds, optional) |
text | string | Speech recognition result (may change) |
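
A client usually wants one running transcript built from both ASR event types. The sketch below keeps the latest asrDelta text per index and replaces it when the asrFinal for that index arrives. It assumes that a delta and a final sharing the same index describe the same utterance, which this document does not state explicitly. The `Transcript` class is hypothetical.

```python
class Transcript:
    """Collects ASR events: deltas overwrite per index, finals are kept."""

    def __init__(self):
        self.partial = {}  # index -> latest asrDelta text
        self.final = {}    # index -> asrFinal text

    def feed(self, event: dict) -> None:
        index = event["index"]
        if event["event"] == "asrDelta":
            # Assumption: a late delta must not overwrite a finalized segment.
            if index not in self.final:
                self.partial[index] = event["text"]
        elif event["event"] == "asrFinal":
            self.final[index] = event["text"]
            self.partial.pop(index, None)

    def text(self) -> str:
        """Join all segments in index order, preferring final text."""
        merged = {**self.partial, **self.final}
        return " ".join(merged[i] for i in sorted(merged))


transcript = Transcript()
transcript.feed({"event": "asrDelta", "index": 1, "text": "Hello, how can"})
interim = transcript.text()
transcript.feed({"event": "asrFinal", "index": 1, "text": "Hello, how can I help you today?"})
print(transcript.text())
```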
TrackStart - Track Start Event
Trigger: When a Track starts (RTP, TTS, file playback, etc.).
Example:
{
"event": "trackStart",
"trackId": "track-tts-456",
"timestamp": 1640995200000,
"playId": "llm-001"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "trackStart" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
playId | string | TTS command's playId (optional) or Play command's URL (optional) |
Both TTS and Play commands create corresponding Tracks. The playId in TrackStart is:
- TTS Track: TTS command's playId (optional)
- Play Track: Play command's URL
TrackEnd - Track End Event
Trigger: When a Track ends (RTP ends, TTS completes, file playback completes, etc.).
Example:
{
"event": "trackEnd",
"trackId": "track-tts-456",
"timestamp": 1640995230000,
"duration": 30000,
"ssrc": 1234567890,
"playId": "llm-001"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "trackEnd" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
duration | number | Track duration (milliseconds) |
ssrc | number | RTP synchronization source identifier |
playId | string | TTS command's playId (optional) or Play command's URL (optional) |
Both TTS and Play commands create corresponding Tracks. The playId in TrackEnd is:
- TTS Track: TTS command's playId (optional)
- Play Track: Play command's URL
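Because the playId in TrackEnd is either the TTS command's playId or the Play command's URL, a client can use it to tell which playback has finished. The following sketch tracks outstanding playbacks in a set. The `PlaybackTracker` class is hypothetical and does not handle TTS commands sent without a playId.

```python
class PlaybackTracker:
    """Remembers started playbacks and resolves them on trackEnd."""

    def __init__(self):
        self.pending = set()

    def started(self, play_id: str) -> None:
        """Record a playId (TTS) or URL (Play) that was just sent."""
        self.pending.add(play_id)

    def on_event(self, event: dict) -> bool:
        """Return True if this event finishes a playback we were waiting for."""
        if event.get("event") != "trackEnd":
            return False
        play_id = event.get("playId")
        if play_id in self.pending:
            self.pending.discard(play_id)
            return True
        return False


tracker = PlaybackTracker()
tracker.started("llm-001")                       # playId of a TTS command
tracker.started("http://example.com/audio.mp3")  # URL of a Play command
done = tracker.on_event({"event": "trackEnd", "trackId": "track-tts-456", "playId": "llm-001"})
print(done, tracker.pending)
```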
Interruption - Interruption Event
Trigger: When an Interrupt command is received and there are unfinished TTS commands.
Example:
{
"event": "interruption",
"trackId": "track-tts-456",
"timestamp": 1640995215000,
"playId": "llm-001",
"subtitle": "Hello, this is a long message that was interrupted",
"position": 5,
"totalDuration": 30000,
"current": 15000
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "interruption" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
playId | string | For TTS commands, this is the playId from the TTS command (optional) |
subtitle | string | Current TTS text being played at interruption (optional) |
position | number | Word index position in subtitle at interruption (optional) |
totalDuration | number | Total duration of TTS content (milliseconds) |
current | number | Time elapsed since TTS start at interruption (milliseconds) |
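
One use of the Interruption event is estimating how much of the TTS content the caller heard. The sketch below derives a fraction from current and totalDuration. Treating their ratio as playback progress is an interpretation of the field descriptions above, and the function name `playback_progress` is hypothetical.

```python
def playback_progress(event: dict) -> float:
    """Return the fraction (0.0 to 1.0) of TTS content played before the interruption."""
    total = event["totalDuration"]
    if total <= 0:
        return 0.0  # nothing was synthesized yet, so avoid dividing by zero
    return min(event["current"] / total, 1.0)


interruption = {
    "event": "interruption",
    "trackId": "track-tts-456",
    "playId": "llm-001",
    "totalDuration": 30000,
    "current": 15000,
}
progress = playback_progress(interruption)
print(progress)  # 0.5
```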
Dtmf - DTMF Event
Trigger: When a keypress is detected.
Example:
{
"event": "dtmf",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"digit": "1"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "dtmf" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
digit | string | DTMF digit (0-9, *, #, A-D) |
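
Each Dtmf event carries a single digit, so multi-digit input such as an extension number has to be assembled on the client. The sketch below buffers digits until a terminator key arrives. Using `#` as the terminator is a common IVR convention assumed here, not something RustPBX prescribes, and the `DigitCollector` class is hypothetical.

```python
from typing import Optional


class DigitCollector:
    """Buffers DTMF digits until a terminator key arrives."""

    def __init__(self, terminator: str = "#"):
        self.terminator = terminator
        self._digits = []

    def feed(self, event: dict) -> Optional[str]:
        """Return the collected digits when the terminator is pressed, else None."""
        if event.get("event") != "dtmf":
            return None
        digit = event["digit"]
        if digit == self.terminator:
            collected = "".join(self._digits)
            self._digits.clear()  # start fresh for the next entry
            return collected
        self._digits.append(digit)
        return None


collector = DigitCollector()
results = [collector.feed({"event": "dtmf", "digit": d}) for d in "123#"]
print(results)  # [None, None, None, '123']
```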
Metrics - Metrics Event
Trigger: When performance metrics are available.
Example:
{
"event": "metrics",
"timestamp": 1640995200000,
"key": "ttfb.asr.tencent",
"duration": 150,
"data": {
"index": 1,
"provider": "tencent"
}
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "metrics" |
timestamp | number | Event timestamp (milliseconds) |
key | string | Metric key (e.g., "ttfb.asr.tencent") |
duration | number | Duration (milliseconds) |
data | object | Additional metric data |
Error - Error Event
Trigger: When an error occurs during processing.
Example:
{
"event": "error",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sender": "asr",
"error": "Connection timeout to ASR service",
"code": 408
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "error" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
sender | string | Component that generated the error (asr, tts, media, etc.) |
error | string | Error message description |
code | number | Error code (optional) |
Binary - Binary Event
Trigger: When binary audio data is sent (WebSocket calls only).
Example:
{
"event": "binary",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"data": []
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "binary" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
data | array | Binary audio data bytes |
Ping - Ping Event
Trigger: When periodic Ping messages are sent (if the pingInterval parameter is set in the connection URL), see Connect to RustPBX.
Example:
{
"event": "ping",
"timestamp": 1640995200000,
"payload": "optional_payload"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "ping" |
timestamp | number | Event timestamp (milliseconds) |
payload | string | Payload data (optional) |
Other - Other Event
Trigger: When custom or extended events are generated.
Example:
{
"event": "other",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sender": "custom_plugin",
"extra": {
"custom_field": "custom_value"
}
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "other" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
sender | string | Component that generated the event |
extra | object | Additional event data (optional) |
Options
CallOption
The CallOption object is used in Invite and Accept commands, containing call configuration.
Example:
{
"denoise": true,
"offer": "SDP offer string",
"callee": "sip:callee@example.com",
"caller": "sip:caller@example.com",
"codec": "g722",
"recorder": {
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000
},
"asr": {
"provider": "tencent",
"language": "zh-CN"
},
"tts": {
"provider": "tencent",
"speaker": "xiaoyan"
}
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
denoise | boolean | ✗ | false | Enable audio processing noise reduction |
offer | string | ✗ | - | SDP offer string for WebRTC/SIP negotiation |
callee | string | ✗ | - | Callee's SIP URI or phone number (e.g., "sip:bob@example.com") |
caller | string | ✗ | - | Caller's SIP URI or phone number (e.g., "sip:alice@example.com") |
codec | string | ✗ | "pcmu" | Audio codec: "pcmu", "pcma", "g722", "pcm" (only for WebSocket calls) |
recorder | RecorderOption | ✗ | - | Call recording configuration |
vad | VADOption | ✗ | - | Voice activity detection configuration |
asr | TranscriptionOption | ✗ | - | Automatic Speech Recognition (ASR) configuration |
tts | SynthesisOption | ✗ | - | Text-to-Speech configuration |
mediaPass | MediaPassOption | ✗ | - | Media Pass configuration |
handshakeTimeout | string | ✗ | - | Connection handshake timeout (e.g., "30s") |
enableIpv6 | boolean | ✗ | false | Enable IPv6 support |
sip | SipOption | ✗ | - | SIP registration account, password, and domain configuration |
extra | object | ✗ | - | Additional parameters |
RecorderOption
Call recording configuration options.
Example:
{
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000,
"ptime": 200
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
recorderFile | string | ✓ | - | Recording file path |
samplerate | number | ✗ | 16000 | Recording sample rate, unit: Hz |
ptime | number | ✗ | 200 | Packet time, unit: milliseconds |
TranscriptionOption
Automatic Speech Recognition (ASR) configuration options.
Example:
{
"provider": "tencent",
"language": "zh-CN",
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"samplerate": 16000,
"startWhenAnswer": true
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
provider | string | ✗ | - | ASR provider: tencent, aliyun, Deepgram, etc. |
language | string | ✗ | - | Language (e.g., "zh-CN", "en-US") (see corresponding provider documentation for details) |
appId | string | ✗ | - | Tencent Cloud's appId |
secretId | string | ✗ | - | Tencent Cloud's secretId |
secretKey | string | ✗ | - | Tencent Cloud's secretKey, or other provider's API Key |
modelType | string | ✗ | - | ASR model type (e.g., "16k_zh", "8k_en"), see provider documentation for details |
bufferSize | number | ✗ | - | Audio buffer size, unit: bytes |
samplerate | number | ✗ | 16000 | Sample rate |
endpoint | string | ✗ | - | Custom service endpoint URL |
extra | object | ✗ | - | Provider-specific parameters |
startWhenAnswer | boolean | ✗ | false | Request ASR service after call is answered |
SynthesisOption
Text-to-Speech (TTS) configuration options.
Example:
{
"provider": "tencent",
"speaker": "xiaoyan",
"volume": 5,
"speed": 1.0,
"emotion": "neutral",
"samplerate": 16000
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
provider | string | ✗ | - | TTS provider: "tencent", "aliyun", "deepgram", "voiceapi" |
speaker | string | ✗ | - | Voice, see provider documentation |
volume | number | ✗ | 5 | Volume (1-10) |
speed | number | ✗ | 1.0 | Speech rate |
samplerate | number | ✗ | 16000 | Sample rate, unit: Hz |
appId | string | ✗ | - | Tencent Cloud's appId |
secretId | string | ✗ | - | Tencent Cloud's secretId |
secretKey | string | ✗ | - | Tencent Cloud's secretKey, or other provider's API Key |
codec | string | ✗ | - | Encoding format |
subtitle | boolean | ✗ | false | Whether to enable subtitles |
endpoint | string | ✗ | - | Custom TTS service endpoint URL |
extra | object | ✗ | - | Additional provider-specific parameters |
maxConcurrentTasks | number | ✗ | 1 | Maximum concurrent tasks for non-streaming TTS commands |
VADOption
Voice Activity Detection (VAD) configuration options.
Example:
{
"type": "webrtc",
"samplerate": 16000,
"speechPadding": 250,
"silencePadding": 100,
"voiceThreshold": 0.5,
"maxBufferDurationSecs": 50
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
type | string | ✗ | "webrtc" | VAD algorithm type: "silero" , "ten" , "webrtc" |
samplerate | number | ✗ | 16000 | Sample rate |
speechPadding | number | ✗ | 250 | Start detection speechPadding milliseconds after speech starts |
silencePadding | number | ✗ | 100 | Silence event trigger interval, unit: milliseconds |
ratio | number | ✗ | 0.5 | Speech detection ratio threshold |
voiceThreshold | number | ✗ | 0.5 | Voice energy threshold |
maxBufferDurationSecs | number | ✗ | 50 | Maximum buffer duration, unit: seconds |
silenceTimeout | number | ✗ | - | Silence detection timeout, unit: milliseconds |
endpoint | string | ✗ | - | Custom VAD service endpoint |
secretKey | string | ✗ | - | VAD service authentication key |
secretId | string | ✗ | - | VAD service authentication ID |
MediaPassOption
Media pass-through configuration options, using WebSocket service to completely take over audio processing.
RustPBX will send all audio data to the configured WebSocket service and play audio received from that service to the other party of the audio connection.
See: Media Pass
Example:
{
"url": "ws://localhost:9090/media",
"inputSampleRate": 16000,
"outputSampleRate": 16000,
"packetSize": 2560
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | ✓ | - | WebSocket connection URL for media stream |
inputSampleRate | number | ✓ | - | Audio sample rate received from WebSocket server (also the track's sample rate) |
outputSampleRate | number | ✓ | - | Audio sample rate sent to WebSocket server |
packetSize | number | ✗ | 2560 | Packet size sent to WebSocket server, unit: bytes |
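
To get a feel for packetSize, it helps to convert bytes into audio duration. The sketch below does so under the assumption that the media stream is 16-bit mono PCM. This document does not state the sample format, so the result is an estimate only. The function name `packet_duration_ms` is hypothetical.

```python
def packet_duration_ms(
    packet_size: int,
    sample_rate: int,
    bytes_per_sample: int = 2,  # assumption: 16-bit PCM
    channels: int = 1,          # assumption: mono
) -> float:
    """Return how many milliseconds of audio one packet holds."""
    bytes_per_second = sample_rate * bytes_per_sample * channels
    return packet_size * 1000 / bytes_per_second


# Default packetSize with outputSampleRate = 16000 from the example above.
duration = packet_duration_ms(2560, 16000)
print(duration)  # 80.0
```

Under these assumptions the default packetSize of 2560 bytes corresponds to 80 ms of audio at 16000 Hz.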
ReferOption
Transfer configuration options.
Example:
{
"denoise": true,
"timeout": 30,
"moh": "http://example.com/hold_music.wav",
"asr": {
"provider": "tencent",
"language": "zh-CN"
},
"autoHangup": true,
"sip": {
"username": "transfer_user",
"password": "transfer_password"
}
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
denoise | boolean | ✗ | false | Enable noise reduction during transfer |
timeout | number | ✗ | - | Transfer timeout, unit: seconds |
moh | string | ✗ | - | Hold music URL to play during transfer |
asr | TranscriptionOption | ✗ | - | Automatic speech recognition configuration |
autoHangup | boolean | ✗ | false | Automatically hang up after transfer completes |
sip | SipOption | ✗ | - | SIP configuration |
SipOption
SIP protocol configuration options.
Example:
{
"username": "user",
"password": "password",
"realm": "example.com",
"headers": {
"X-Custom-Header": "value"
}
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
username | string | ✗ | - | SIP username for authentication |
password | string | ✗ | - | SIP password for authentication |
realm | string | ✗ | - | SIP domain/realm for authentication |
headers | object | ✗ | - | Additional SIP protocol headers (key-value pairs) |
Attendee
Call participant information.
Example:
{
"username": "alice",
"realm": "example.com",
"source": "sip:alice@example.com"
}
Fields:
| Field | Type | Description |
|---|---|---|
username | string | Username part of SIP URI |
realm | string | Domain/realm part of SIP URI |
source | string | Complete SIP URI or phone number |
REST API Endpoints
List Active Calls
Endpoint: GET /call/lists
Description: Returns a list of active calls.
Response:
{
"calls": [
{
"id": "session-id",
"call_type": "webrtc",
"created_at": "2024-01-01T12:00:00Z",
"option": {
"caller": "1234567890",
"callee": "0987654321"
}
}
]
}
Usage:
curl http://localhost:8080/call/lists
Terminate Call
Endpoint: POST /call/kill/{id}
Description: Terminate a specific active call.
Parameters:
id (path parameter, string): Session ID of the call to terminate
Response:
true
Usage:
curl -X POST http://localhost:8080/call/kill/session123
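The same request can be built from Python with the standard library. The sketch below only constructs the request object and does not send it, so it can be inspected without a running server. The helper name `kill_call_request` is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def kill_call_request(base_url: str, session_id: str) -> Request:
    """Build (but do not send) the POST request that terminates a call."""
    # Percent-encode the session ID so it is safe as a single path segment.
    path = f"/call/kill/{quote(session_id, safe='')}"
    return Request(f"{base_url}{path}", method="POST")


req = kill_call_request("http://localhost:8080", "session123")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. Per the response shown above, the body is expected to be the JSON value `true`.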
Get ICE Servers
Endpoint: GET /iceservers
Description: Returns ICE server configuration for WebRTC connections.
Response:
[
{
"urls": ["stun:restsend.com:3478"],
"username": null,
"credential": null
},
{
"urls": ["turn:restsend.com:3478"],
"username": "username",
"credential": "password"
}
]
Usage:
curl http://localhost:8080/iceservers
Error Handling
All endpoints return appropriate HTTP status codes:
- 200 OK: Success
- 400 Bad Request: Invalid parameters
- 404 Not Found: Resource not found
- 500 Internal Server Error: Server error
Notes
- All WebSocket endpoints support real-time bidirectional communication
- When WebSocket connection closes, call sessions are automatically cleaned up
- Event dumping can be disabled by setting the dump=false parameter
- ICE servers are automatically configured based on environment variables
- Audio codecs are automatically negotiated based on functionality
- VAD (Voice Activity Detection) events are used for speech detection
- ASR (Automatic Speech Recognition) provides real-time transcription
- TTS (Text-to-Speech) supports streaming synthesis
- All timestamps are in milliseconds
- trackId is used to identify which Track generated the event
- playId prevents interrupting previous TTS playback when using the same ID. For TTS commands, playId is the specified identifier; for Play commands, playId is the URL
- autoHangup automatically ends the call after TTS/Play completes