WebSocket API

Connect to RustPBX

RustPBX is controlled over WebSocket connections: the client sends commands and receives events on the same connection.

Address

The listening address of RustPBX is configured in RustPBX's config.toml:

config.toml
http_addr = "0.0.0.0:8080"

Paths

Different paths correspond to different voice call types:

/call: Audio stream transmitted via WebSocket
/call/sip: Audio stream transmitted via SIP/RTP
/call/webrtc: Audio stream transmitted via WebRTC RTP

Parameters

id (optional, string): Session ID. Defaults to a server-generated UUID. When answering an incoming call, set it to the dialogId.
dump (optional, bool, default: true): Whether to enable event dumping
pingInterval (optional, int, unit: seconds, default: 20): WebSocket Ping interval. When enabled, the client periodically receives SessionEvent::Ping events
serverSideTrack (optional, string, default: serverSideTrack): Set server-side TrackID

Example

ws://localhost:8080/call/sip?id=session123&dump=true
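The URL above can also be assembled programmatically. The following is a minimal sketch, assuming only the paths and query parameters listed in this section; `build_call_url` is a hypothetical helper, not part of RustPBX.

```python
from urllib.parse import urlencode


def build_call_url(host, call_type="", **params):
    """Build a RustPBX WebSocket URL such as ws://host/call/sip?id=...

    call_type is "" (audio over WebSocket), "sip", or "webrtc".
    """
    if call_type not in ("", "sip", "webrtc"):
        raise ValueError(f"unknown call type: {call_type!r}")
    path = "/call" + (f"/{call_type}" if call_type else "")
    # Booleans are written as lowercase true/false, matching the example URL.
    query = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
        if value is not None
    }
    url = f"ws://{host}{path}"
    return f"{url}?{urlencode(query)}" if query else url


print(build_call_url("localhost:8080", "sip", id="session123", dump=True))
```

Parameters that are left out fall back to the server defaults described above.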

Commands

Commands are sent over the WebSocket connection as JSON messages, with the command field indicating the command type.

Invite - Initiate Call

Purpose: Initiate a new call.

info

SIP calls require setting option.caller and option.callee to the caller and callee SIP addresses respectively
WebRTC calls require setting option.offer to the call's SDP offer

Example:

{
    "command": "invite",
    "option": {
        "denoise": true,
        "callee": "sip:alice@192.168.3.197:12345",
        "caller": "sip:192.168.3.197:3050",
        "vad": {
            "type": "silero",
            "silenceTimeout": 5000
        },
        "asr": {
            "provider": "tencent"
        },
        "tts": {
            "provider": "tencent",
            "speaker": "601003"
        },
        "sip": {}
    }
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"invite"`
`option`	CallOption	✓	Call configuration parameters, see CallOption for details
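As a rough sketch of the rule above (SIP calls need caller and callee, WebRTC calls need offer), the hypothetical helper below builds the invite message and rejects incomplete options before anything is sent. The function name and the `call_type` argument are assumptions made for illustration.

```python
import json


def build_invite(call_type, **option):
    """Return an invite command as a JSON string.

    call_type mirrors the connection path: "sip", "webrtc", or "" for /call.
    """
    if call_type == "sip" and not (option.get("caller") and option.get("callee")):
        raise ValueError("SIP calls require option.caller and option.callee")
    if call_type == "webrtc" and not option.get("offer"):
        raise ValueError("WebRTC calls require option.offer")
    return json.dumps({"command": "invite", "option": option})


message = build_invite(
    "sip",
    caller="sip:192.168.3.197:3050",
    callee="sip:alice@192.168.3.197:12345",
    denoise=True,
    asr={"provider": "tencent"},
)
print(message)
```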

Accept - Answer Incoming Call

Purpose: Answer an incoming call.

info

Answering a call requires setting the id parameter in the connection URL to the dialogId provided by the webhook request. See Answer Incoming Call and Connect to RustPBX.

Example:

{
    "command": "accept",
    "option": {
        "denoise": true,
        "vad": {
            "type": "silero",
            "silenceTimeout": 5000
        },
        "asr": {
            "provider": "tencent"
        },
        "tts": {
            "provider": "tencent",
            "speaker": "601003"
        }
    }
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"accept"`
`option`	CallOption	✓	Call configuration parameters, see CallOption for details
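The sketch below ties the two steps together: the dialogId from the webhook becomes the id query parameter, and the accept command is the first message sent on that connection. It assumes the call is answered on the /call/sip path; `accept_call` and the sample dialogId are hypothetical.

```python
import json
from urllib.parse import quote


def accept_call(host, dialog_id, option):
    """Return the WebSocket URL and the accept message for an incoming call."""
    # The session id in the URL must equal the dialogId from the webhook.
    url = f"ws://{host}/call/sip?id={quote(dialog_id, safe='')}"
    message = json.dumps({"command": "accept", "option": option})
    return url, message


url, message = accept_call(
    "localhost:8080",
    "dialog-42",  # hypothetical dialogId taken from the webhook request
    {"denoise": True, "asr": {"provider": "tencent"}},
)
print(url)
print(message)
```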

Reject - Reject Incoming Call

Purpose: Reject an incoming call.

info

For the list of request failure response codes, see Request Failure 4xx.

Example:

{
  "command": "reject",
  "reason": "Busy Here",
  "code": 486
}

Parameters:

Field	Type	Required	Default	Description
`command`	string	✓	-	Must be `"reject"`
`reason`	string	✓	-	Rejection reason
`code`	number	✗	-	SIP response code
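A small sketch of building this command, assuming the reason should default to the standard SIP reason phrase for the chosen code. The phrases below are the ones defined in RFC 3261; `build_reject` is a hypothetical helper.

```python
import json

# Reason phrases for a few SIP 4xx codes, as defined in RFC 3261.
SIP_REASONS = {
    403: "Forbidden",
    404: "Not Found",
    480: "Temporarily Unavailable",
    486: "Busy Here",
}


def build_reject(code=486, reason=None):
    """Return a reject command; reason defaults to the standard phrase."""
    if reason is None:
        reason = SIP_REASONS.get(code, "Rejected")
    return json.dumps({"command": "reject", "reason": reason, "code": code})


print(build_reject())
print(build_reject(480))
```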

Ringing - Ringing

Purpose: Send a ringing response (180 Ringing) for an incoming call.

Note

If recorder is set in the Ringing command, the recorder option in subsequent Accept commands will not override the recording settings during the ringing phase.

Example:

{
  "command": "ringing",
  "recorder": {
    "recorderFile": "/path/to/recording.wav",
    "samplerate": 16000,
    "ptime": 200
  },
  "earlyMedia": true,
  "ringtone": "http://example.com/ringtone.wav"
}

Parameters:

Field	Type	Required	Default	Description
`command`	string	✓	-	Must be `"ringing"`
`recorder`	RecorderOption	✗	-	Call recording configuration
`earlyMedia`	boolean	✗	`false`	Enable early media during ringing
`ringtone`	string	✗	-	Custom ringtone URL

TTS - Text-to-Speech

Purpose: Convert text to speech and play audio.

Example:

{
  "command": "tts",
  "text": "Hello, this is a test message",
  "speaker": "xiaoyan",
  "playId": "unique_play_id",
  "autoHangup": false,
  "streaming": false,
  "endOfStream": false,
  "waitInputTimeout": 30,
  "option": {
    "provider": "tencent",
    "volume": 5,
    "speed": 1.0
  }
}

Parameters:

Field	Type	Required	Default	Description
`command`	string	✓	-	Must be `"tts"`
`text`	string	✓	-	Text to synthesize
`speaker`	string	✗	-	Voice type, see provider voice lists
`playId`	string	✗	-	TTS Track identifier.
`autoHangup`	boolean	✗	`false`	Whether to automatically hang up after TTS playback completes
`streaming`	boolean	✗	`false`	Whether to enable streaming TTS
`endOfStream`	boolean	✗	`false`	Whether this is the last command for the current playId
`waitInputTimeout`	number	✗	-	Maximum time to wait for user input, unit: seconds
`option`	SynthesisOption	✗	-	TTS provider-specific options, can configure voice format, sample rate, etc.
`base64`	boolean	✗	`false`	Whether to use base64 encoding

tip

When streaming TTS is enabled, TTS commands will be sent in the same WebSocket connection.
Set endOfStream = true when all TTS commands for the playId have been sent. The TTS Track will exit after all command results finish playing and send a Track End event.
Set base64=true to pass base64-encoded PCM audio through the text field.
If playId is set, the Track End event sent by this TTS Track will include this playId.
If the current playId is the same as a previous TTS command's playId, it will reuse the previous TTS Track; otherwise, it will terminate the previous TTS Track and create a new TTS Track.

For details, see TTS(Text-to-Speech)
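The streaming rules above can be sketched as follows: every chunk of text is sent as its own tts command with the same playId, and only the last one sets endOfStream. The second helper shows the base64 variant. Both function names are hypothetical, and the PCM bytes in the usage are placeholder data.

```python
import base64
import json


def streaming_tts_commands(chunks, play_id, speaker=None):
    """Yield one streaming tts command per text chunk, all sharing play_id.

    Only the last command carries endOfStream = true.
    """
    chunks = list(chunks)
    for i, text in enumerate(chunks):
        command = {
            "command": "tts",
            "text": text,
            "playId": play_id,
            "streaming": True,
            "endOfStream": i == len(chunks) - 1,
        }
        if speaker is not None:
            command["speaker"] = speaker
        yield json.dumps(command)


def pcm_tts_command(pcm_bytes, play_id):
    """Return a tts command that carries base64-encoded PCM in the text field."""
    return json.dumps({
        "command": "tts",
        "text": base64.b64encode(pcm_bytes).decode("ascii"),
        "base64": True,
        "playId": play_id,
    })


commands = [json.loads(c) for c in streaming_tts_commands(["Hello, ", "world."], "llm-001")]
print([c["endOfStream"] for c in commands])
```

This is useful when the text comes from an LLM token stream: each sentence can be sent as soon as it is complete, without waiting for the full reply.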

Play - Play Audio

Purpose: Play audio from URL.

Example:

{
  "command": "play",
  "url": "http://example.com/audio.mp3",
  "autoHangup": false,
  "waitInputTimeout": 30
}

Parameters:

Field	Type	Required	Default	Description
`command`	string	✓	-	Must be `"play"`
`url`	string	✓	-	Audio file URL to play (supports HTTP/HTTPS). This URL will be returned as playId in the trackEnd event
`autoHangup`	boolean	✗	`false`	If true, will automatically hang up after playback completes
`waitInputTimeout`	number	✗	-	Maximum time to wait for user input, unit: seconds

Interrupt - Interrupt Playback

Purpose: Interrupt current TTS or audio playback.

info

If the current TTS result has not finished playing, an Interruption event will be triggered, containing the played time and the time when audio was received from the TTS provider. If the provider supports subtitles, it will also include the estimated position of the played text.
If graceful = true is set, it will wait for the current TTS command result to finish playing before exiting, see Interruption.

Example:

{
  "command": "interrupt",
  "graceful": false
}

Parameters:

Field	Type	Required	Default	Description
`command`	string	✓	-	Must be `"interrupt"`
`graceful`	boolean	✗	`false`	Whether to gracefully interrupt

Refer - Transfer Call

Purpose: Transfer call to another party (SIP REFER).

info

See Transfer Call

Example:

{
  "command": "refer",
  "caller": "sip:alice@example.com",
  "callee": "sip:charlie@example.com",
  "options": {
    "denoise": true,
    "timeout": 30,
    "moh": "http://example.com/hold_music.wav",
    "autoHangup": true
  }
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"refer"`
`caller`	string	✓	Transfer caller SIP address (currently connected local SIP address, e.g.: `sip:{localIP}:13050`)
`callee`	string	✓	Transfer target SIP URI (e.g., `sip:bob@example.com`)
`options`	ReferOption	✗	Transfer configuration, see ReferOption for details

Mute - Mute

Purpose: Mute all or specified Tracks.

info

If trackId is specified, mute the corresponding Track; otherwise, mute all Tracks.

Example:

{
  "command": "mute",
  "trackId": "track-123"
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"mute"`
`trackId`	string	✗	Track ID to mute (if not specified, mute all tracks)

Unmute - Unmute

Purpose: Unmute muted Tracks.

info

If trackId is specified, unmute the corresponding Track; otherwise, unmute all Tracks.

Example:

{
  "command": "unmute",
  "trackId": "track-123"
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"unmute"`
`trackId`	string	✗	Track ID to unmute (if not specified, unmute all tracks)

Hangup - Hangup

Purpose: End the call.

Example:

{
  "command": "hangup",
  "reason": "user_requested",
  "initiator": "user"
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"hangup"`
`reason`	string	✗	Hangup reason
`initiator`	string	✗	Party that initiated the hangup (user, system, etc.)

Events

Events are received from the server in JSON format. All timestamps are in milliseconds. Each event contains an event field indicating the event type, and most events also include a trackId field indicating the related Track.
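Because every event carries an event field, a client can route messages with a simple lookup table. The class below is a minimal sketch of that idea; `EventDispatcher` is a hypothetical name, and the transport that delivers the raw JSON text is left out.

```python
import json


class EventDispatcher:
    """Route decoded events to handlers registered by event type."""

    def __init__(self):
        self._handlers = {}

    def on(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, raw_message):
        """Decode one JSON text message and call the matching handlers."""
        event = json.loads(raw_message)
        handlers = self._handlers.get(event.get("event"), [])
        for handler in handlers:
            handler(event)
        return len(handlers)


seen = []
dispatcher = EventDispatcher()
dispatcher.on("dtmf", lambda e: seen.append(e["digit"]))
dispatcher.dispatch(
    '{"event": "dtmf", "trackId": "track-abc123", "timestamp": 1640995200000, "digit": "1"}'
)
print(seen)
```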

Incoming - Incoming Call Event

Trigger: When an incoming call is received (SIP calls only).

Example:

{
  "event": "incoming",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "caller": "sip:alice@example.com",
  "callee": "sip:bob@example.com",
  "sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}

Fields:

Field	Type	Description
`event`	string	Always `"incoming"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`caller`	string	Caller's SIP address
`callee`	string	Callee's SIP address
`sdp`	string	SDP offer from the caller

Answer - Answer Event

Trigger: When the call is answered and SDP negotiation is complete.

info

For SIP calls, the Answer event is triggered when 200 OK is received.

Example:

{
  "event": "answer",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}

Fields:

Field	Type	Description
`event`	string	Always `"answer"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`sdp`	string	SDP answer from the callee

Reject - Reject Event

Trigger: When the call is rejected.

Example:

{
  "event": "reject",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "reason": "Busy",
  "code": 486
}

Fields:

Field	Type	Description
`event`	string	Always `"reject"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`reason`	string	Rejection reason
`code`	number	SIP response code (optional)

info

For the list of request failure response codes, see Request Failure 4xx.

Ringing - Ringing Event

Trigger: When the call is ringing (SIP calls only).

Example:

{
  "event": "ringing",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "earlyMedia": false
}

Fields:

Field	Type	Description
`event`	string	Always `"ringing"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`earlyMedia`	boolean	Whether early media is available

Hangup - Hangup Event

Trigger: When the call ends.

Example:

{
  "event": "hangup",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "reason": "user_requested",
  "initiator": "user",
  "startTime": "2024-01-01T12:00:00Z",
  "hangupTime": "2024-01-01T12:05:30Z",
  "answerTime": "2024-01-01T12:00:05Z",
  "from": {
    "username": "alice",
    "realm": "example.com",
    "source": "sip:alice@example.com"
  }
}

Fields:

Field	Type	Description
`event`	string	Always `"hangup"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`reason`	string	Hangup reason (optional)
`initiator`	string	Party that initiated the hangup (optional)
`startTime`	string	ISO 8601 timestamp of call start
`hangupTime`	string	ISO 8601 timestamp of call end
`answerTime`	string	ISO 8601 timestamp of call answer (optional)
`ringingTime`	string	ISO 8601 timestamp of call ringing start (optional)
`from`	Attendee	Caller information (optional)
`to`	Attendee	Callee information (optional)
`extra`	object	Additional call metadata (optional)
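The ISO 8601 fields make it straightforward to derive call durations. The sketch below computes the total and answered durations from a hangup event; `call_durations` is a hypothetical helper, and the event dict reuses the example values above.

```python
from datetime import datetime


def parse_iso(value):
    # Replace a trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def call_durations(event):
    """Return (total_seconds, talk_seconds); talk_seconds is None if unanswered."""
    start = parse_iso(event["startTime"])
    end = parse_iso(event["hangupTime"])
    total = (end - start).total_seconds()
    answer = event.get("answerTime")
    talk = (end - parse_iso(answer)).total_seconds() if answer else None
    return total, talk


hangup = {
    "event": "hangup",
    "startTime": "2024-01-01T12:00:00Z",
    "hangupTime": "2024-01-01T12:05:30Z",
    "answerTime": "2024-01-01T12:00:05Z",
}
print(call_durations(hangup))
```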

Speaking - Speaking Event

Trigger: When VAD detects speech. Requires vad to be configured in CallOption.

Example:

{
  "event": "speaking",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "startTime": 1640995200000
}

Fields:

Field	Type	Description
`event`	string	Always `"speaking"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`startTime`	number	Time when speech started (milliseconds)

Silence - Silence Event

Trigger: When more than speechPadding milliseconds have passed since the current speech started, and more than silencePadding milliseconds have passed since the last Silence event was triggered (if any). Requires vad to be configured in CallOption.

Example:

{
  "event": "silence",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "startTime": 1640995195000,
  "duration": 5000
}

Fields:

Field	Type	Description
`event`	string	Always `"silence"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`startTime`	number	Time when silence started (milliseconds)
`duration`	number	Silence duration (milliseconds)

AsrFinal - ASR Final Event

Trigger: Stable result of speech recognition.

Example:

{
  "event": "asrFinal",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "index": 1,
  "startTime": 1640995200000,
  "endTime": 1640995205000,
  "text": "Hello, how can I help you today?"
}

Fields:

Field	Type	Description
`event`	string	Always `"asrFinal"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`index`	number	ASR result sequence number
`startTime`	number	Speech start time (milliseconds, optional)
`endTime`	number	Speech end time (milliseconds, optional)
`text`	string	Speech recognition result

AsrDelta - ASR Delta Event

Trigger: Intermediate result of speech recognition (may change).

Example:

{
  "event": "asrDelta",
  "trackId": "track-abc123",
  "index": 1,
  "timestamp": 1640995200000,
  "text": "Hello, how can"
}

Fields:

Field	Type	Description
`event`	string	Always `"asrDelta"`
`trackId`	string	Unique identifier of the Track
`index`	number	ASR result sequence number
`timestamp`	number	Event timestamp (milliseconds)
`startTime`	number	Speech start time (milliseconds, optional)
`endTime`	number	Speech end time (milliseconds, optional)
`text`	string	Speech recognition result (may change)
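One way to combine the two ASR events is to keep the latest asrDelta text per index and overwrite it once the asrFinal for that index arrives. This is a sketch under the assumption that delta and final events for the same utterance share an index; `Transcript` is a hypothetical class.

```python
class Transcript:
    """Collect ASR results, keyed by the event's index field."""

    def __init__(self):
        self.final = {}    # index -> stable text from asrFinal
        self.partial = {}  # index -> latest text from asrDelta

    def handle(self, event):
        index = event["index"]
        if event["event"] == "asrDelta":
            # Ignore late deltas for a segment that is already final.
            if index not in self.final:
                self.partial[index] = event["text"]
        elif event["event"] == "asrFinal":
            self.final[index] = event["text"]
            self.partial.pop(index, None)

    def text(self):
        merged = {**self.partial, **self.final}
        return " ".join(merged[i] for i in sorted(merged))


transcript = Transcript()
transcript.handle({"event": "asrDelta", "index": 1, "text": "Hello, how can"})
print(transcript.text())
transcript.handle({"event": "asrFinal", "index": 1, "text": "Hello, how can I help you today?"})
print(transcript.text())
```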

TrackStart - Track Start Event

Trigger: When a Track starts (RTP, TTS, file playback, etc.).

Example:

{
  "event": "trackStart",
  "trackId": "track-tts-456",
  "timestamp": 1640995200000,
  "playId": "llm-001"
}

Fields:

Field	Type	Description
`event`	string	Always `"trackStart"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`playId`	string	TTS command's playId or Play command's URL (optional)

info

Both TTS and Play commands create corresponding Tracks. The playId in TrackStart is:

TTS Track: TTS command's playId (optional)
Play Track: Play command's URL

TrackEnd - Track End Event

Trigger: When a Track ends (RTP ends, TTS completes, file playback completes, etc.).

Example:

{
  "event": "trackEnd",
  "trackId": "track-tts-456",
  "timestamp": 1640995230000,
  "duration": 30000,
  "ssrc": 1234567890,
  "playId": "llm-001"
}

Fields:

Field	Type	Description
`event`	string	Always `"trackEnd"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`duration`	number	Track duration (milliseconds)
`ssrc`	number	RTP synchronization source identifier
`playId`	string	TTS command's playId or Play command's URL (optional)

info

Both TTS and Play commands create corresponding Tracks. The playId in TrackEnd is:

TTS Track: TTS command's playId (optional)
Play Track: Play command's URL
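A client that needs to know which playback is still running can pair trackStart and trackEnd events by trackId. The class below is a minimal sketch of that bookkeeping; `TrackRegistry` is a hypothetical name, and the events reuse the example values above.

```python
class TrackRegistry:
    """Track which Tracks are active and which playIds have finished."""

    def __init__(self):
        self.active = {}     # trackId -> playId (None when absent)
        self.finished = []   # (playId, duration in milliseconds)

    def handle(self, event):
        if event["event"] == "trackStart":
            self.active[event["trackId"]] = event.get("playId")
        elif event["event"] == "trackEnd":
            self.active.pop(event["trackId"], None)
            self.finished.append((event.get("playId"), event.get("duration")))


registry = TrackRegistry()
registry.handle({"event": "trackStart", "trackId": "track-tts-456", "playId": "llm-001"})
print(registry.active)
registry.handle({"event": "trackEnd", "trackId": "track-tts-456", "duration": 30000, "playId": "llm-001"})
print(registry.active, registry.finished)
```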

Interruption - Interruption Event

Trigger: When an Interrupt command is received and there are unfinished TTS commands.

Example:

{
  "event": "interruption",
  "trackId": "track-tts-456",
  "timestamp": 1640995215000,
  "playId": "llm-001",
  "subtitle": "Hello, this is a long message that was interrupted",
  "position": 5,
  "totalDuration": 30000,
  "current": 15000
}

Fields:

Field	Type	Description
`event`	string	Always `"interruption"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`playId`	string	For TTS commands, this is the playId from the TTS command (optional)
`subtitle`	string	Current TTS text being played at interruption (optional)
`position`	number	Word index position in subtitle at interruption (optional)
`totalDuration`	number	Total duration of TTS content (milliseconds)
`current`	number	Time elapsed since TTS start at interruption (milliseconds)
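The current and totalDuration fields are enough to estimate how much of the TTS audio was heard before the interruption. A minimal sketch, with `playback_progress` as a hypothetical helper:

```python
def playback_progress(event):
    """Return the fraction of the TTS audio that had played, from 0.0 to 1.0."""
    total = event["totalDuration"]
    if total <= 0:
        return 0.0
    return min(event["current"] / total, 1.0)


interruption = {
    "event": "interruption",
    "playId": "llm-001",
    "totalDuration": 30000,
    "current": 15000,
}
print(playback_progress(interruption))
```

A dialogue system might use this value to decide whether to repeat the interrupted sentence or continue from where the caller cut in.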

Dtmf - DTMF Event

Trigger: When a keypress is detected.

Example:

{
  "event": "dtmf",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "digit": "1"
}

Fields:

Field	Type	Description
`event`	string	Always `"dtmf"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`digit`	string	DTMF digit (0-9, *, #, A-D)
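DTMF events arrive one digit at a time, so menu or PIN input usually has to be buffered on the client. The sketch below collects digits until a terminator key is pressed; `DtmfCollector` and the choice of `#` as terminator are assumptions for illustration.

```python
class DtmfCollector:
    """Buffer DTMF digits until a terminator key is pressed."""

    def __init__(self, terminator="#"):
        self.terminator = terminator
        self._digits = []

    def handle(self, event):
        """Return the collected input when the terminator arrives, else None."""
        digit = event["digit"]
        if digit != self.terminator:
            self._digits.append(digit)
            return None
        entered = "".join(self._digits)
        self._digits.clear()
        return entered


collector = DtmfCollector()
results = [collector.handle({"event": "dtmf", "digit": d}) for d in "123#"]
print(results)
```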

Metrics - Metrics Event

Trigger: When performance metrics are available.

Example:

{
  "event": "metrics",
  "timestamp": 1640995200000,
  "key": "ttfb.asr.tencent",
  "duration": 150,
  "data": {
    "index": 1,
    "provider": "tencent"
  }
}

Fields:

Field	Type	Description
`event`	string	Always `"metrics"`
`timestamp`	number	Event timestamp (milliseconds)
`key`	string	Metric key (e.g., "ttfb.asr.tencent")
`duration`	number	Duration (milliseconds)
`data`	object	Additional metric data
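Metrics events can be aggregated per key to monitor provider latency over a call. The following sketch averages the duration field by key; `MetricsSummary` is a hypothetical class, and the second sample value is made up for the usage.

```python
from collections import defaultdict
from statistics import mean


class MetricsSummary:
    """Average the duration of metrics events, grouped by key."""

    def __init__(self):
        self._durations = defaultdict(list)

    def handle(self, event):
        self._durations[event["key"]].append(event["duration"])

    def averages(self):
        return {key: mean(values) for key, values in self._durations.items()}


summary = MetricsSummary()
summary.handle({"event": "metrics", "key": "ttfb.asr.tencent", "duration": 150})
summary.handle({"event": "metrics", "key": "ttfb.asr.tencent", "duration": 250})
print(summary.averages())
```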

Error - Error Event

Trigger: When an error occurs during processing.

Example:

{
  "event": "error",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "sender": "asr",
  "error": "Connection timeout to ASR service",
  "code": 408
}

Fields:

Field	Type	Description
`event`	string	Always `"error"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`sender`	string	Component that generated the error (asr, tts, media, etc.)
`error`	string	Error message description
`code`	number	Error code (optional)

Binary - Binary Event

Trigger: When binary audio data is sent (WebSocket calls only).

Example:

{
  "event": "binary",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "data": []
}

Fields:

Field	Type	Description
`event`	string	Always `"binary"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`data`	array	Binary audio data bytes

Ping - Ping Event

Trigger: When periodic Ping messages are sent (if the pingInterval parameter is set in the connection URL), see Connect to RustPBX.

Example:

{
  "event": "ping",
  "timestamp": 1640995200000,
  "payload": "optional_payload"
}

Fields:

Field	Type	Description
`event`	string	Always `"ping"`
`timestamp`	number	Event timestamp (milliseconds)
`payload`	string	Payload data (optional)

Other - Other Event

Trigger: When custom or extended events are generated.

Example:

{
  "event": "other",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "sender": "custom_plugin",
  "extra": {
    "custom_field": "custom_value"
  }
}

Fields:

Field	Type	Description
`event`	string	Always `"other"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`sender`	string	Component that generated the event
`extra`	object	Additional event data (optional)

Options

CallOption

The CallOption object is used in Invite and Accept commands, containing call configuration.

Example:

{
  "denoise": true,
  "offer": "SDP offer string",
  "callee": "sip:callee@example.com",
  "caller": "sip:caller@example.com",
  "codec": "g722",
  "recorder": {
    "recorderFile": "/path/to/recording.wav",
    "samplerate": 16000
  },
  "asr": {
    "provider": "tencent",
    "language": "zh-CN"
  },
  "tts": {
    "provider": "tencent",
    "speaker": "xiaoyan"
  }
}

Fields:

Field	Type	Required	Default	Description
`denoise`	boolean	✗	`false`	Enable audio processing noise reduction
`offer`	string	✗	-	SDP offer string for WebRTC/SIP negotiation
`callee`	string	✗	-	Callee's SIP URI or phone number (e.g., "sip:bob@example.com")
`caller`	string	✗	-	Caller's SIP URI or phone number (e.g., "sip:alice@example.com")
`codec`	string	✗	`"pcmu"`	Audio codec: `"pcmu"`, `"pcma"`, `"g722"`, `"pcm"` (only for WebSocket calls)
`recorder`	RecorderOption	✗	-	Call recording configuration
`vad`	VADOption	✗	-	Voice activity detection configuration
`asr`	TranscriptionOption	✗	-	Automatic Speech Recognition (ASR) configuration
`tts`	SynthesisOption	✗	-	Text-to-Speech configuration
`mediaPass`	MediaPassOption	✗	-	Media Pass configuration
`handshakeTimeout`	string	✗	-	Connection handshake timeout (e.g., "30s")
`enableIpv6`	boolean	✗	`false`	Enable IPv6 support
`sip`	SipOption	✗	-	SIP registration account, password, and domain configuration
`extra`	object	✗	-	Additional parameters

RecorderOption

Call recording configuration options.

Example:

{
  "recorderFile": "/path/to/recording.wav",
  "samplerate": 16000,
  "ptime": 200
}

Fields:

Field	Type	Required	Default	Description
`recorderFile`	string	✓	-	Recording file path
`samplerate`	number	✗	`16000`	Recording sample rate, unit: Hz
`ptime`	number	✗	`200`	Packet time, unit: milliseconds

TranscriptionOption

Automatic Speech Recognition (ASR) configuration options.

Example:

{
  "provider": "tencent",
  "language": "zh-CN",
  "appId": "app_id",
  "secretId": "your_secret_id",
  "secretKey": "your_secret_key",
  "modelType": "16k_zh",
  "samplerate": 16000,
  "startWhenAnswer": true
}

Fields:

Field	Type	Required	Default	Description
`provider`	string	✗	-	ASR provider: `tencent`, `aliyun`, `deepgram`, etc.
`language`	string	✗	-	Language (e.g., "zh-CN", "en-US") (see corresponding provider documentation for details)
`appId`	string	✗	-	Tencent Cloud's appId
`secretId`	string	✗	-	Tencent Cloud's secretId
`secretKey`	string	✗	-	Tencent Cloud's secretKey, or other provider's API Key
`modelType`	string	✗	-	ASR model type (e.g., "16k_zh", "8k_en"), see provider documentation for details
`bufferSize`	number	✗	-	Audio buffer size, unit: bytes
`samplerate`	number	✗	`16000`	Sample rate
`endpoint`	string	✗	-	Custom service endpoint URL
`extra`	object	✗	-	Provider-specific parameters
`startWhenAnswer`	boolean	✗	`false`	Request ASR service after call is answered

SynthesisOption

Text-to-Speech (TTS) configuration options.

Example:

{
  "provider": "tencent",
  "speaker": "xiaoyan",
  "volume": 5,
  "speed": 1.0,
  "emotion": "neutral",
  "samplerate": 16000
}

Fields:

Field	Type	Required	Default	Description
`provider`	string	✗	-	TTS provider: `"tencent"`, `"aliyun"`, `"deepgram"`, `"voiceapi"`
`speaker`	string	✗	-	Voice, see provider documentation
`volume`	number	✗	`5`	Volume (1-10)
`speed`	number	✗	`1.0`	Speech rate
`samplerate`	number	✗	`16000`	Sample rate, unit: Hz
`appId`	string	✗	-	Tencent Cloud's appId
`secretId`	string	✗	-	Tencent Cloud's secretId
`secretKey`	string	✗	-	Tencent Cloud's secretKey, or other provider's API Key
`codec`	string	✗	-	Encoding format
`subtitle`	boolean	✗	`false`	Whether to enable subtitles
`endpoint`	string	✗	-	Custom TTS service endpoint URL
`extra`	object	✗	-	Additional provider-specific parameters
`maxConcurrentTasks`	number	✗	`1`	Maximum concurrent tasks for non-streaming TTS commands

VADOption

Voice Activity Detection (VAD) configuration options.

Example:

{
  "type": "webrtc",
  "samplerate": 16000,
  "speechPadding": 250,
  "silencePadding": 100,
  "voiceThreshold": 0.5,
  "maxBufferDurationSecs": 50
}

Fields:

Field	Type	Required	Default	Description
`type`	string	✗	`"silero"`	VAD algorithm type: `"silero"`, `"ten"`
`samplerate`	number	✗	`16000`	Sample rate
`speechPadding`	number	✗	`250`	Start detection `speechPadding` milliseconds after speech starts
`silencePadding`	number	✗	`100`	Silence event trigger interval, unit: milliseconds
`ratio`	number	✗	`0.5`	Speech detection ratio threshold
`voiceThreshold`	number	✗	`0.5`	Voice energy threshold
`maxBufferDurationSecs`	number	✗	`50`	Maximum buffer duration, unit: seconds
`silenceTimeout`	number	✗	-	Silence detection timeout, unit: milliseconds
`endpoint`	string	✗	-	Custom VAD service endpoint
`secretKey`	string	✗	-	VAD service authentication key
`secretId`	string	✗	-	VAD service authentication ID

MediaPassOption

Media pass-through configuration options. With media pass enabled, an external WebSocket service takes over audio processing completely.

RustPBX will send all audio data to the configured WebSocket service and play audio received from that service to the other party of the audio connection.

See: Media Pass

Example:

{
  "url": "ws://localhost:9090/media",
  "inputSampleRate": 16000,
  "outputSampleRate": 16000,
  "packetSize": 2560
}

Fields:

Field	Type	Required	Default	Description
`url`	string	✓	-	WebSocket connection URL for media stream
`inputSampleRate`	number	✓	-	Audio sample rate received from WebSocket server (also the track's sample rate)
`outputSampleRate`	number	✓	-	Audio sample rate sent to WebSocket server
`packetSize`	number	✗	`2560`	Packet size sent to WebSocket server, unit: bytes
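To relate packetSize to time, the sketch below converts a packet size in bytes to milliseconds of audio. It assumes 16-bit mono PCM (2 bytes per sample), which is not stated in this section; adjust `bytes_per_sample` and `channels` if the actual format differs. `packet_duration_ms` is a hypothetical helper.

```python
def packet_duration_ms(packet_size, sample_rate, bytes_per_sample=2, channels=1):
    """Return how many milliseconds of audio one packet holds.

    Assumes uncompressed PCM; 16-bit mono by default (an assumption).
    """
    samples = packet_size / (bytes_per_sample * channels)
    return samples * 1000 / sample_rate


# Default packetSize with the sample rate from the example above.
print(packet_duration_ms(2560, 16000))
```

Under that assumption, the default of 2560 bytes at 16000 Hz corresponds to 1280 samples, or 80 ms of audio per packet.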

ReferOption

Transfer configuration options.

Example:

{
  "denoise": true,
  "timeout": 30,
  "moh": "http://example.com/hold_music.wav",
  "asr": {
    "provider": "tencent",
    "language": "zh-CN"
  },
  "autoHangup": true,
  "sip": {
    "username": "transfer_user",
    "password": "transfer_password"
  }
}

Fields:

Field	Type	Required	Default	Description
`denoise`	boolean	✗	`false`	Enable noise reduction during transfer
`timeout`	number	✗	-	Transfer timeout, unit: seconds
`moh`	string	✗	-	Hold music URL to play during transfer
`asr`	TranscriptionOption	✗	-	Automatic speech recognition configuration
`autoHangup`	boolean	✗	`false`	Automatically hang up after transfer completes
`sip`	SipOption	✗	-	SIP configuration

SipOption

SIP protocol configuration options.

Example:

{
  "username": "user",
  "password": "password",
  "realm": "example.com",
  "headers": {
    "X-Custom-Header": "value"
  }
}

Fields:

Field	Type	Required	Default	Description
`username`	string	✗	-	SIP username for authentication
`password`	string	✗	-	SIP password for authentication
`realm`	string	✗	-	SIP domain/realm for authentication
`headers`	object	✗	-	Additional SIP protocol headers (key-value pairs)

Attendee

Call participant information.

Example:

{
  "username": "alice",
  "realm": "example.com",
  "source": "sip:alice@example.com"
}

Fields:

Field	Type	Description
`username`	string	Username part of SIP URI
`realm`	string	Domain/realm part of SIP URI
`source`	string	Complete SIP URI or phone number

REST API Endpoints

List Active Calls

Endpoint: GET /call/lists

Description: Returns a list of active calls.

Response:

{
  "calls": [
    {
      "id": "session-id",
      "call_type": "webrtc",
      "created_at": "2024-01-01T12:00:00Z",
      "option": {
        "caller": "1234567890",
        "callee": "0987654321"
      }
    }
  ]
}

Usage:

curl http://localhost:8080/call/lists

Terminate Call

Endpoint: POST /call/kill/{id}

Description: Terminate a specific active call.

Parameters:

id (path parameter, string): Session ID of the call to terminate

Response:

true

Usage:

curl -X POST http://localhost:8080/call/kill/session123
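The same call can be made from Python's standard library. This is a minimal sketch: `kill_call_request` only builds the request so it can be inspected without a server, and `send` performs it against a running RustPBX instance. Both function names are hypothetical.

```python
import json
import urllib.request


def kill_call_request(base_url, session_id):
    """Build (but do not send) the POST request that terminates a call."""
    return urllib.request.Request(f"{base_url}/call/kill/{session_id}", method="POST")


def send(request, timeout=5.0):
    """Send a request and decode the JSON body; needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = kill_call_request("http://localhost:8080", "session123")
print(request.get_method(), request.full_url)
```

With a server listening on localhost:8080, `send(request)` would return the decoded response body shown above.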

Get ICE Servers

Endpoint: GET /iceservers

Description: Returns ICE server configuration for WebRTC connections.

Response:

[
  {
    "urls": ["stun:restsend.com:3478"],
    "username": null,
    "credential": null
  },
  {
    "urls": ["turn:restsend.com:3478"],
    "username": "username",
    "credential": "password"
  }
]

Usage:

curl http://localhost:8080/iceservers
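The response is close to the iceServers list of a WebRTC RTCConfiguration, except for the null fields. The sketch below drops them and wraps the list accordingly; `to_rtc_configuration` is a hypothetical helper, and the input is the example response above.

```python
import json


def to_rtc_configuration(ice_servers):
    """Drop null fields so each entry only has the keys that are set."""
    cleaned = [
        {key: value for key, value in server.items() if value is not None}
        for server in ice_servers
    ]
    return {"iceServers": cleaned}


response_body = """
[
  {"urls": ["stun:restsend.com:3478"], "username": null, "credential": null},
  {"urls": ["turn:restsend.com:3478"], "username": "username", "credential": "password"}
]
"""
config = to_rtc_configuration(json.loads(response_body))
print(json.dumps(config, indent=2))
```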

Error Handling

All endpoints return appropriate HTTP status codes:

200 OK: Success
400 Bad Request: Invalid parameters
404 Not Found: Resource not found
500 Internal Server Error: Server error

Notes

All WebSocket endpoints support real-time bidirectional communication
When the WebSocket connection closes, call sessions are automatically cleaned up
Event dumping can be disabled by setting the dump=false parameter
ICE servers are automatically configured based on environment variables
Audio codecs are automatically negotiated based on functionality
VAD (Voice Activity Detection) events are used for speech detection
ASR (Automatic Speech Recognition) provides real-time transcription
TTS (Text-to-Speech) supports streaming synthesis
All timestamps are in milliseconds
trackId is used to identify which Track generated the event
Reusing a playId keeps the previous TTS Track playing instead of terminating it. For TTS commands, playId is the specified identifier; for Play commands, playId is the URL
autoHangup automatically ends the call after TTS/Play completes
