WebSocket API
Connect to Active Call
Active Call uses WebSocket connections.
Address
The listening address of Active Call is configured in Active Call’s config.toml:
http_addr = "0.0.0.0:8080"
Paths
Different paths correspond to different voice call types:
- /call: Audio stream transmitted via WebSocket
- /call/sip: Audio stream transmitted via SIP/RTP
- /call/webrtc: Audio stream transmitted via WebRTC RTP
Parameters
- id (optional, string): Session ID. Defaults to a server-generated UUID. (Should be set to dialogId when answering.)
- dump (optional, bool, default: true): Whether to enable event dumping
- pingInterval (optional, int, unit: seconds, default: 20): WebSocket Ping interval. When enabled, the client periodically receives SessionEvent::Ping events
- serverSideTrack (optional, string, default: serverSideTrack): Sets the server-side TrackID
Example
ws://localhost:8080/call/sip?id=session123&dump=true
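
The address, path, and query parameters above combine into a single connection URL. As a minimal sketch, the helper below assembles one with Python's standard library. `build_call_url` and the `CALL_PATHS` keys are hypothetical names chosen for this illustration; they are not part of Active Call.

```python
from urllib.parse import urlencode

# Hypothetical labels for the three documented paths.
CALL_PATHS = {"websocket": "/call", "sip": "/call/sip", "webrtc": "/call/webrtc"}


def build_call_url(host, call_type, session_id=None, dump=None, ping_interval=None):
    """Return a ws:// URL for the given call type with optional query parameters."""
    path = CALL_PATHS[call_type]
    params = {}
    if session_id is not None:
        params["id"] = session_id
    if dump is not None:
        # The documented example passes booleans as lowercase text.
        params["dump"] = "true" if dump else "false"
    if ping_interval is not None:
        params["pingInterval"] = str(ping_interval)
    query = urlencode(params)
    return f"ws://{host}{path}" + (f"?{query}" if query else "")


print(build_call_url("localhost:8080", "sip", session_id="session123", dump=True))
```

Parameters left as `None` are omitted, so the server-side defaults listed above apply.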
Commands
Commands are sent over the WebSocket connection as JSON messages, with the command field indicating the command type.
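
Since every command is a JSON object keyed by the command field, a small serializer can cover all of them. The sketch below is one possible shape; `make_command` is a hypothetical helper name, and sending the resulting text over the socket is left to whichever WebSocket client is in use.

```python
import json


def make_command(name, **fields):
    """Serialize a command as a JSON text message; None-valued fields are omitted."""
    message = {"command": name}
    message.update({k: v for k, v in fields.items() if v is not None})
    return json.dumps(message)


# Field names are taken from the Hangup command documented in this section.
hangup = make_command("hangup", reason="user_requested", initiator="user")
print(hangup)
```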
Invite - Initiate Call
Purpose: Initiate a new call.
Example:
{
"command": "invite",
"option": {
"denoise": true,
"callee": "sip:alice@192.168.3.197:12345",
"caller": "sip:192.168.3.197:3050",
"vad": {
"type": "silero",
"silenceTimeout": 5000
},
"asr": {
"provider": "tencent"
},
"tts": {
"provider": "tencent",
"speaker": "601003"
},
"sip": {}
}
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "invite" |
option | CallOption | ✓ | Call configuration parameters, see CallOption for details |
Accept - Answer Incoming Call
Purpose: Answer an incoming call.
Example:
{
"command": "accept",
"option": {
"denoise": true,
"vad": {
"type": "silero",
"silenceTimeout": 5000
},
"asr": {
"provider": "tencent"
},
"tts": {
"provider": "tencent",
"speaker": "601003"
}
}
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "accept" |
option | CallOption | ✓ | Call configuration parameters, see CallOption for details |
Reject - Reject Incoming Call
Purpose: Reject an incoming call.
Example:
{
"command": "reject",
"reason": "Busy Here",
"code": 486
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
command | string | ✓ | - | Must be "reject" |
reason | string | ✓ | - | Rejection reason |
code | number | ✗ | - | SIP response code |
Ringing - Ringing
Purpose: Send a ringing response (180 Ringing) for an incoming call.
Note
If `recorder` is set in the Ringing command, the `recorder` option in subsequent Accept commands will **not** override the recording settings during the ringing phase.
Example:
{
"command": "ringing",
"recorder": {
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000,
"ptime": 200
},
"earlyMedia": true,
"ringtone": "http://example.com/ringtone.wav"
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
command | string | ✓ | - | Must be "ringing" |
recorder | RecorderOption | ✗ | - | Call recording configuration |
earlyMedia | boolean | ✗ | false | Enable early media during ringing |
ringtone | string | ✗ | - | Custom ringtone URL |
TTS - Text-to-Speech
Purpose: Convert text to speech and play audio.
Example:
{
"command": "tts",
"text": "Hello, this is a test message",
"speaker": "xiaoyan",
"playId": "unique_play_id",
"autoHangup": false,
"streaming": false,
"endOfStream": false,
"waitInputTimeout": 30,
"option": {
"provider": "tencent",
"volume": 5,
"speed": 1.0
}
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
command | string | ✓ | - | Must be "tts" |
text | string | ✓ | - | Text to synthesize |
speaker | string | ✗ | - | Voice type, see provider voice lists |
playId | string | ✗ | - | TTS Track identifier. |
autoHangup | boolean | ✗ | false | Whether to automatically hang up after TTS playback completes |
streaming | boolean | ✗ | false | Whether to enable streaming TTS |
endOfStream | boolean | ✗ | false | Whether this is the last command for the current playId |
waitInputTimeout | number | ✗ | - | Maximum time to wait for user input, unit: seconds |
option | SynthesisOption | ✗ | - | TTS provider-specific options, can configure voice format, sample rate, etc. |
base64 | boolean | ✗ | false | Whether to use base64 encoding |
For details, see TTS (Text-to-Speech).
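
The streaming, endOfStream, and playId fields suggest that streamed text is delivered as several tts commands that share one playId, with the last one flagged as the end of the stream. That reading is an assumption drawn from the field descriptions above, not confirmed behavior. Under that assumption, a sketch of building the command sequence could look like this (`streaming_tts_commands` is a hypothetical name):

```python
import json


def streaming_tts_commands(chunks, play_id):
    """Build one tts command per text chunk, all sharing the same playId.

    Assumption: only the final chunk carries endOfStream=true.
    """
    commands = []
    last = len(chunks) - 1
    for i, text in enumerate(chunks):
        commands.append(json.dumps({
            "command": "tts",
            "text": text,
            "playId": play_id,
            "streaming": True,
            "endOfStream": i == last,
        }))
    return commands


for message in streaming_tts_commands(["Hello, ", "this is a test message"], "llm-001"):
    print(message)
```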
Play - Play Audio
Purpose: Play audio from URL.
Example:
{
"command": "play",
"url": "http://example.com/audio.mp3",
"autoHangup": false,
"waitInputTimeout": 30
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
command | string | ✓ | - | Must be "play" |
url | string | ✓ | - | Audio file URL to play (supports HTTP/HTTPS). This URL will be returned as playId in the trackEnd event |
autoHangup | boolean | ✗ | false | If true, will automatically hang up after playback completes |
waitInputTimeout | number | ✗ | - | Maximum time to wait for user input, unit: seconds |
Interrupt - Interrupt Playback
Purpose: Interrupt current TTS or audio playback.
Example:
{
"command": "interrupt",
"graceful": false
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
command | string | ✓ | - | Must be "interrupt" |
graceful | boolean | ✗ | false | Whether to gracefully interrupt |
Refer - Transfer Call
Purpose: Transfer call to another party (SIP REFER).
Example:
{
"command": "refer",
"caller": "sip:alice@example.com",
"callee": "sip:charlie@example.com",
"options": {
"denoise": true,
"timeout": 30,
"moh": "http://example.com/hold_music.wav",
"autoHangup": true
}
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "refer" |
caller | string | ✓ | Transfer caller SIP address (currently connected local SIP address, e.g.: sip:{localIP}:13050) |
callee | string | ✓ | Transfer target SIP URI (e.g., sip:bob@example.com) |
options | ReferOption | ✗ | Transfer configuration, see ReferOption for details |
Mute - Mute
Purpose: Mute all or specified Tracks.
Example:
{
"command": "mute",
"trackId": "track-123"
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "mute" |
trackId | string | ✗ | Track ID to mute (if not specified, mute all tracks) |
Unmute - Unmute
Purpose: Unmute muted Tracks.
Example:
{
"command": "unmute",
"trackId": "track-123"
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "unmute" |
trackId | string | ✗ | Track ID to unmute (if not specified, unmute all tracks) |
Hangup - Hangup
Purpose: End the call.
Example:
{
"command": "hangup",
"reason": "user_requested",
"initiator": "user"
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | ✓ | Must be "hangup" |
reason | string | ✗ | Hangup reason |
initiator | string | ✗ | Party that initiated the hangup (user, system, etc.) |
Events
Events are received from the server in JSON format. Numeric timestamps are in milliseconds; the call times in the Hangup event are ISO 8601 strings. Each event contains an event field indicating the event type, and most events also include a trackId field indicating the related Track.
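
Because the event field identifies the type of every message, a client can route incoming messages through a small dispatcher. The following is a minimal sketch; `EventRouter` is a hypothetical class written for this illustration, and reading messages from the socket is assumed to happen elsewhere.

```python
import json


class EventRouter:
    """Route decoded server events to handlers registered per event type."""

    def __init__(self):
        self._handlers = {}

    def on(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, raw_message):
        """Decode one JSON message and call its handlers; return how many ran."""
        event = json.loads(raw_message)
        handlers = self._handlers.get(event.get("event"), [])
        for handler in handlers:
            handler(event)
        return len(handlers)


router = EventRouter()
router.on("dtmf", lambda e: print("key pressed:", e["digit"]))
router.dispatch('{"event": "dtmf", "trackId": "track-abc123", "timestamp": 1640995200000, "digit": "1"}')
```

Unknown event types fall through without error, which keeps the client tolerant of events it does not handle.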
Incoming - Incoming Call Event
Trigger: When an incoming call is received (SIP calls only).
Example:
{
"event": "incoming",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"caller": "sip:alice@example.com",
"callee": "sip:bob@example.com",
"sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "incoming" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
caller | string | Caller’s SIP address |
callee | string | Callee’s SIP address |
sdp | string | SDP offer from the caller |
Answer - Answer Event
Trigger: When the call is answered and SDP negotiation is complete.
Example:
{
"event": "answer",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "answer" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
sdp | string | SDP answer from the callee |
Reject - Reject Event
Trigger: When the call is rejected.
Example:
{
"event": "reject",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"reason": "Busy",
"code": 486
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "reject" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
reason | string | Rejection reason |
code | number | SIP response code (optional) |
Ringing - Ringing Event
Trigger: When the call is ringing (SIP calls only).
Example:
{
"event": "ringing",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"earlyMedia": false
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "ringing" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
earlyMedia | boolean | Whether early media is available |
Hangup - Hangup Event
Trigger: When the call ends.
Example:
{
"event": "hangup",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"reason": "user_requested",
"initiator": "user",
"startTime": "2024-01-01T12:00:00Z",
"hangupTime": "2024-01-01T12:05:30Z",
"answerTime": "2024-01-01T12:00:05Z",
"from": {
"username": "alice",
"realm": "example.com",
"source": "sip:alice@example.com"
}
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "hangup" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
reason | string | Hangup reason (optional) |
initiator | string | Party that initiated the hangup (optional) |
startTime | string | ISO 8601 timestamp of call start |
hangupTime | string | ISO 8601 timestamp of call end |
answerTime | string | ISO 8601 timestamp of call answer (optional) |
ringingTime | string | ISO 8601 timestamp of call ringing start (optional) |
from | Attendee | Caller information (optional) |
to | Attendee | Callee information (optional) |
extra | object | Additional call metadata (optional) |
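
The ISO 8601 times in the Hangup event make it straightforward to derive call durations. The sketch below computes the total call length and, when answerTime is present, the time spent after the call was answered. `call_durations` is a hypothetical helper, not something Active Call provides.

```python
from datetime import datetime


def parse_iso(ts):
    """Parse an ISO 8601 string; a trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def call_durations(hangup_event):
    """Return (total_seconds, answered_seconds); the second is None if unanswered."""
    start = parse_iso(hangup_event["startTime"])
    end = parse_iso(hangup_event["hangupTime"])
    total = (end - start).total_seconds()
    answer = hangup_event.get("answerTime")
    answered = (end - parse_iso(answer)).total_seconds() if answer else None
    return total, answered


event = {
    "event": "hangup",
    "startTime": "2024-01-01T12:00:00Z",
    "hangupTime": "2024-01-01T12:05:30Z",
    "answerTime": "2024-01-01T12:00:05Z",
}
print(call_durations(event))  # (330.0, 325.0)
```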
Speaking - Speaking Event
Trigger: When VAD detects speech. Requires vad to be configured in CallOption.
Example:
{
"event": "speaking",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"startTime": 1640995200000
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "speaking" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
startTime | number | Time when speech started (milliseconds) |
Silence - Silence Event
Trigger: When more than speechPadding milliseconds have passed since the current speech started, and more than silencePadding milliseconds have passed since the last Silence event (if any). Requires vad to be configured in CallOption.
Example:
{
"event": "silence",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"startTime": 1640995195000,
"duration": 5000
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "silence" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
startTime | number | Time when silence started (milliseconds) |
duration | number | Silence duration (milliseconds) |
AsrFinal - ASR Final Event
Trigger: Stable result of speech recognition.
Example:
{
"event": "asrFinal",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"index": 1,
"startTime": 1640995200000,
"endTime": 1640995205000,
"text": "Hello, how can I help you today?"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "asrFinal" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
index | number | ASR result sequence number |
startTime | number | Speech start time (milliseconds, optional) |
endTime | number | Speech end time (milliseconds, optional) |
text | string | Speech recognition result |
AsrDelta - ASR Delta Event
Trigger: Intermediate result of speech recognition (may change).
Example:
{
"event": "asrDelta",
"trackId": "track-abc123",
"index": 1,
"timestamp": 1640995200000,
"text": "Hello, how can"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "asrDelta" |
trackId | string | Unique identifier of the Track |
index | number | ASR result sequence number |
timestamp | number | Event timestamp (milliseconds) |
startTime | number | Speech start time (milliseconds, optional) |
endTime | number | Speech end time (milliseconds, optional) |
text | string | Speech recognition result (may change) |
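
Since asrDelta results may change and asrFinal results are stable, a client typically keeps one entry per index and lets later results replace earlier ones. The sketch below assumes that a delta and the final result for the same utterance share an index, as the two examples above do; the source does not state this explicitly. `Transcript` is a hypothetical class.

```python
class Transcript:
    """Keep the latest recognized text per index, preferring final results."""

    def __init__(self):
        self.segments = {}  # index -> (text, is_final)

    def apply(self, event):
        kind = event.get("event")
        if kind not in ("asrDelta", "asrFinal"):
            return
        index = event["index"]
        already_final = self.segments.get(index, ("", False))[1]
        if kind == "asrDelta" and already_final:
            return  # a late delta must not overwrite a stable result
        self.segments[index] = (event["text"], kind == "asrFinal")

    def text(self):
        return " ".join(self.segments[i][0] for i in sorted(self.segments))


transcript = Transcript()
transcript.apply({"event": "asrDelta", "index": 1, "text": "Hello, how can"})
transcript.apply({"event": "asrFinal", "index": 1, "text": "Hello, how can I help you today?"})
print(transcript.text())
```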
TrackStart - Track Start Event
Trigger: When a Track starts (RTP, TTS, file playback, etc.).
Example:
{
"event": "trackStart",
"trackId": "track-tts-456",
"timestamp": 1640995200000,
"playId": "llm-001"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "trackStart" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
playId | string | TTS command’s playId (optional) or Play command’s URL (optional) |
TrackEnd - Track End Event
Trigger: When a Track ends (RTP ends, TTS completes, file playback completes, etc.).
Example:
{
"event": "trackEnd",
"trackId": "track-tts-456",
"timestamp": 1640995230000,
"duration": 30000,
"ssrc": 1234567890,
"playId": "llm-001"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "trackEnd" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
duration | number | Track duration (milliseconds) |
ssrc | number | RTP synchronization source identifier |
playId | string | TTS command’s playId (optional) or Play command’s URL (optional) |
Interruption - Interruption Event
Trigger: When an Interrupt command is received and there are unfinished TTS commands.
Example:
{
"event": "interruption",
"trackId": "track-tts-456",
"timestamp": 1640995215000,
"playId": "llm-001",
"subtitle": "Hello, this is a long message that was interrupted",
"position": 5,
"totalDuration": 30000,
"current": 15000
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "interruption" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
playId | string | For TTS commands, this is the playId from the TTS command (optional) |
subtitle | string | Current TTS text being played at interruption (optional) |
position | number | Word index position in subtitle at interruption (optional) |
totalDuration | number | Total duration of TTS content (milliseconds) |
current | number | Time elapsed since TTS start at interruption (milliseconds) |
Dtmf - DTMF Event
Trigger: When a keypress is detected.
Example:
{
"event": "dtmf",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"digit": "1"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "dtmf" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
digit | string | DTMF digit (0-9, *, #, A-D) |
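
Each dtmf event carries a single digit, so multi-digit input such as a PIN has to be assembled on the client. A minimal sketch follows; the `DtmfCollector` class, the choice of "#" as terminator, and the `max_digits` limit are all assumptions made for illustration.

```python
class DtmfCollector:
    """Accumulate DTMF digits until a terminator key or a length limit."""

    def __init__(self, terminator="#", max_digits=None):
        self.terminator = terminator
        self.max_digits = max_digits
        self._digits = []

    def feed(self, event):
        """Return the completed input string, or None while still collecting."""
        if event.get("event") != "dtmf":
            return None
        digit = event["digit"]
        if digit == self.terminator:
            return self._flush()
        self._digits.append(digit)
        if self.max_digits is not None and len(self._digits) >= self.max_digits:
            return self._flush()
        return None

    def _flush(self):
        result = "".join(self._digits)
        self._digits = []
        return result


collector = DtmfCollector()
for key in "123#":
    completed = collector.feed({"event": "dtmf", "digit": key})
print(completed)  # 123
```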
Metrics - Metrics Event
Trigger: When performance metrics are available.
Example:
{
"event": "metrics",
"timestamp": 1640995200000,
"key": "ttfb.asr.tencent",
"duration": 150,
"data": {
"index": 1,
"provider": "tencent"
}
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "metrics" |
timestamp | number | Event timestamp (milliseconds) |
key | string | Metric key (e.g., “ttfb.asr.tencent”) |
duration | number | Duration (milliseconds) |
data | object | Additional metric data |
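
Metrics events arrive one measurement at a time, so a client that wants averages has to aggregate them itself. The sketch below groups durations by key; `MetricsSummary` is a hypothetical class, and the second sample value is made up for the demonstration.

```python
from collections import defaultdict


class MetricsSummary:
    """Collect metric durations per key and report simple averages."""

    def __init__(self):
        self._durations = defaultdict(list)

    def record(self, event):
        if event.get("event") == "metrics":
            self._durations[event["key"]].append(event["duration"])

    def average(self, key):
        """Return the mean duration in milliseconds, or None if nothing was recorded."""
        values = self._durations.get(key)
        return sum(values) / len(values) if values else None


summary = MetricsSummary()
summary.record({"event": "metrics", "key": "ttfb.asr.tencent", "duration": 150})
summary.record({"event": "metrics", "key": "ttfb.asr.tencent", "duration": 250})
print(summary.average("ttfb.asr.tencent"))  # 200.0
```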
Error - Error Event
Trigger: When an error occurs during processing.
Example:
{
"event": "error",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sender": "asr",
"error": "Connection timeout to ASR service",
"code": 408
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "error" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
sender | string | Component that generated the error (asr, tts, media, etc.) |
error | string | Error message description |
code | number | Error code (optional) |
Binary - Binary Event
Trigger: When binary audio data is sent (WebSocket calls only).
Example:
{
"event": "binary",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"data": []
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "binary" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
data | array | Binary audio data bytes |
Ping - Ping Event
Trigger: When periodic Ping messages are sent (if the pingInterval parameter is set in the connection URL), see Connect to Active Call.
Example:
{
"event": "ping",
"timestamp": 1640995200000,
"payload": "optional_payload"
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "ping" |
timestamp | number | Event timestamp (milliseconds) |
payload | string | Payload data (optional) |
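
One possible use of ping events is detecting a stalled connection: if no ping arrives within some multiple of pingInterval, the client can treat the session as stale. The sketch below illustrates the idea; `PingWatchdog` and its tolerance factor are hypothetical, and it assumes the client clock and the event timestamps are comparable.

```python
class PingWatchdog:
    """Flag a connection as stale when pings stop arriving."""

    def __init__(self, ping_interval_s=20, tolerance=2.0):
        # Allow `tolerance` intervals to pass before declaring the link stale.
        self.timeout_ms = ping_interval_s * 1000 * tolerance
        self.last_ping_ms = None

    def observe(self, event):
        if event.get("event") == "ping":
            self.last_ping_ms = event["timestamp"]

    def is_stale(self, now_ms):
        if self.last_ping_ms is None:
            return False  # nothing observed yet, so no verdict
        return now_ms - self.last_ping_ms > self.timeout_ms


watchdog = PingWatchdog(ping_interval_s=20)
watchdog.observe({"event": "ping", "timestamp": 1640995200000})
print(watchdog.is_stale(1640995200000 + 45000))  # True
```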
Other - Other Event
Trigger: When custom or extended events are generated.
Example:
{
"event": "other",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sender": "custom_plugin",
"extra": {
"custom_field": "custom_value"
}
}
Fields:
| Field | Type | Description |
|---|---|---|
event | string | Always "other" |
trackId | string | Unique identifier of the Track |
timestamp | number | Event timestamp (milliseconds) |
sender | string | Component that generated the event |
extra | object | Additional event data (optional) |
Options
CallOption
The CallOption object is used in Invite and Accept commands, containing call configuration.
Example:
{
"denoise": true,
"offer": "SDP offer string",
"callee": "sip:callee@example.com",
"caller": "sip:caller@example.com",
"codec": "g722",
"recorder": {
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000
},
"asr": {
"provider": "tencent",
"language": "zh-CN"
},
"tts": {
"provider": "tencent",
"speaker": "xiaoyan"
}
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
denoise | boolean | ✗ | false | Enable audio processing noise reduction |
offer | string | ✗ | - | SDP offer string for WebRTC/SIP negotiation |
callee | string | ✗ | - | Callee’s SIP URI or phone number (e.g., “sip:bob@example.com”) |
caller | string | ✗ | - | Caller’s SIP URI or phone number (e.g., “sip:alice@example.com”) |
codec | string | ✗ | "pcmu" | Audio codec: "pcmu", "pcma", "g722", "pcm" (only for WebSocket calls) |
recorder | RecorderOption | ✗ | - | Call recording configuration |
vad | VADOption | ✗ | - | Voice activity detection configuration |
asr | TranscriptionOption | ✗ | - | Automatic Speech Recognition (ASR) configuration |
tts | SynthesisOption | ✗ | - | Text-to-Speech configuration |
mediaPass | MediaPassOption | ✗ | - | Media Pass configuration |
handshakeTimeout | string | ✗ | - | Connection handshake timeout (e.g., “30s”) |
enableIpv6 | boolean | ✗ | false | Enable IPv6 support |
sip | SipOption | ✗ | - | SIP registration account, password, and domain configuration |
extra | object | ✗ | - | Additional parameters |
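
As an illustration of the fields above, the sketch below assembles a CallOption dictionary and rejects codec names outside the documented list before anything is sent. `build_call_option` and `VALID_CODECS` are hypothetical names; whether the server performs the same validation is not stated in the source.

```python
VALID_CODECS = {"pcmu", "pcma", "g722", "pcm"}  # values listed in the table above


def build_call_option(codec="pcmu", **fields):
    """Return a CallOption dict, dropping fields whose value is None."""
    if codec not in VALID_CODECS:
        raise ValueError(f"unsupported codec: {codec!r}")
    option = {"codec": codec}
    option.update({k: v for k, v in fields.items() if v is not None})
    return option


option = build_call_option(
    codec="g722",
    denoise=True,
    callee="sip:callee@example.com",
    asr={"provider": "tencent", "language": "zh-CN"},
)
print(option)
```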
RecorderOption
Call recording configuration options.
Example:
{
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000,
"ptime": 200
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
recorderFile | string | ✓ | - | Recording file path |
samplerate | number | ✗ | 16000 | Recording sample rate, unit: Hz |
ptime | number | ✗ | 200 | Packet time, unit: milliseconds |
TranscriptionOption
Automatic Speech Recognition (ASR) configuration options.
Example:
{
"provider": "tencent",
"language": "zh-CN",
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"samplerate": 16000,
"startWhenAnswer": true
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
provider | string | ✗ | - | ASR provider: tencent, aliyun, deepgram, etc. |
language | string | ✗ | - | Language (e.g., “zh-CN”, “en-US”) (see corresponding provider documentation for details) |
appId | string | ✗ | - | Tencent Cloud’s appId |
secretId | string | ✗ | - | Tencent Cloud’s secretId |
secretKey | string | ✗ | - | Tencent Cloud’s secretKey, or other provider’s API Key |
modelType | string | ✗ | - | ASR model type (e.g., “16k_zh”, “8k_en”), see provider documentation for details |
bufferSize | number | ✗ | - | Audio buffer size, unit: bytes |
samplerate | number | ✗ | 16000 | Sample rate |
endpoint | string | ✗ | - | Custom service endpoint URL |
extra | object | ✗ | - | Provider-specific parameters |
startWhenAnswer | boolean | ✗ | false | Request ASR service after call is answered |
SynthesisOption
Text-to-Speech (TTS) configuration options.
Example:
{
"provider": "tencent",
"speaker": "xiaoyan",
"volume": 5,
"speed": 1.0,
"emotion": "neutral",
"samplerate": 16000
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
provider | string | ✗ | - | TTS provider: "tencent", "aliyun", "deepgram", "voiceapi" |
speaker | string | ✗ | - | Voice, see provider documentation |
volume | number | ✗ | 5 | Volume (1-10) |
speed | number | ✗ | 1.0 | Speech rate |
samplerate | number | ✗ | 16000 | Sample rate, unit: Hz |
appId | string | ✗ | - | Tencent Cloud’s appId |
secretId | string | ✗ | - | Tencent Cloud’s secretId |
secretKey | string | ✗ | - | Tencent Cloud’s secretKey, or other provider’s API Key |
codec | string | ✗ | - | Encoding format |
subtitle | boolean | ✗ | false | Whether to enable subtitles |
endpoint | string | ✗ | - | Custom TTS service endpoint URL |
extra | object | ✗ | - | Additional provider-specific parameters |
maxConcurrentTasks | number | ✗ | 1 | Maximum concurrent tasks for non-streaming TTS commands |
VADOption
Voice Activity Detection (VAD) configuration options.
Example:
{
"type": "webrtc",
"samplerate": 16000,
"speechPadding": 250,
"silencePadding": 100,
"voiceThreshold": 0.5,
"maxBufferDurationSecs": 50
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
type | string | ✗ | "silero" | VAD algorithm type: "silero", "ten" |
samplerate | number | ✗ | 16000 | Sample rate |
speechPadding | number | ✗ | 250 | Delay after speech starts before detection begins, unit: milliseconds |
silencePadding | number | ✗ | 100 | Silence event trigger interval, unit: milliseconds |
ratio | number | ✗ | 0.5 | Speech detection ratio threshold |
voiceThreshold | number | ✗ | 0.5 | Voice energy threshold |
maxBufferDurationSecs | number | ✗ | 50 | Maximum buffer duration, unit: seconds |
silenceTimeout | number | ✗ | - | Silence detection timeout, unit: milliseconds |
endpoint | string | ✗ | - | Custom VAD service endpoint |
secretKey | string | ✗ | - | VAD service authentication key |
secretId | string | ✗ | - | VAD service authentication ID |
MediaPassOption
Media pass-through configuration options, which let an external WebSocket service take over audio processing completely.
Active Call will send all audio data to the configured WebSocket service and play audio received from that service to the other party of the audio connection.
See: Media Pass
Example:
{
"url": "ws://localhost:9090/media",
"inputSampleRate": 16000,
"outputSampleRate": 16000,
"packetSize": 2560
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | ✓ | - | WebSocket connection URL for media stream |
inputSampleRate | number | ✓ | - | Audio sample rate received from WebSocket server (also the track’s sample rate) |
outputSampleRate | number | ✓ | - | Audio sample rate sent to WebSocket server |
packetSize | number | ✗ | 2560 | Packet size sent to WebSocket server, unit: bytes |
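
To get a feel for packetSize, it helps to convert bytes into playback time. The sketch below does that conversion under the assumption that the audio is 16-bit mono PCM, which the source does not state; with a different sample format the result changes accordingly. `packet_duration_ms` is a hypothetical helper.

```python
def packet_duration_ms(packet_size_bytes, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Return how many milliseconds of audio one packet holds.

    Assumption: uncompressed PCM, 16-bit mono unless overridden.
    """
    samples = packet_size_bytes / (bytes_per_sample * channels)
    return samples * 1000 / sample_rate_hz


# Default packetSize of 2560 bytes at an outputSampleRate of 16000 Hz.
print(packet_duration_ms(2560, 16000))  # 80.0
```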
ReferOption
Transfer configuration options.
Example:
{
"denoise": true,
"timeout": 30,
"moh": "http://example.com/hold_music.wav",
"asr": {
"provider": "tencent",
"language": "zh-CN"
},
"autoHangup": true,
"sip": {
"username": "transfer_user",
"password": "transfer_password"
}
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
denoise | boolean | ✗ | false | Enable noise reduction during transfer |
timeout | number | ✗ | - | Transfer timeout, unit: seconds |
moh | string | ✗ | - | Hold music URL to play during transfer |
asr | TranscriptionOption | ✗ | - | Automatic speech recognition configuration |
autoHangup | boolean | ✗ | false | Automatically hang up after transfer completes |
sip | SipOption | ✗ | - | SIP configuration |
SipOption
SIP protocol configuration options.
Example:
{
"username": "user",
"password": "password",
"realm": "example.com",
"headers": {
"X-Custom-Header": "value"
}
}
Fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
username | string | ✗ | - | SIP username for authentication |
password | string | ✗ | - | SIP password for authentication |
realm | string | ✗ | - | SIP domain/realm for authentication |
headers | object | ✗ | - | Additional SIP protocol headers (key-value pairs) |
Attendee
Call participant information.
Example:
{
"username": "alice",
"realm": "example.com",
"source": "sip:alice@example.com"
}
Fields:
| Field | Type | Description |
|---|---|---|
username | string | Username part of SIP URI |
realm | string | Domain/realm part of SIP URI |
source | string | Complete SIP URI or phone number |
REST API Endpoints
List Active Calls
Endpoint: GET /call/lists
Description: Returns a list of active calls.
Response:
{
"calls": [
{
"id": "session-id",
"call_type": "webrtc",
"created_at": "2024-01-01T12:00:00Z",
"option": {
"caller": "1234567890",
"callee": "0987654321"
}
}
]
}
Usage:
curl http://localhost:8080/call/lists
Terminate Call
Endpoint: POST /call/kill/{id}
Description: Terminate a specific active call.
Parameters:
- id (path parameter, string): Session ID of the call to terminate
Response:
true
Usage:
curl -X POST http://localhost:8080/call/kill/session123
Get ICE Servers
Endpoint: GET /iceservers
Description: Returns ICE server configuration for WebRTC connections.
Response:
[
{
"urls": ["stun:restsend.com:3478"],
"username": null,
"credential": null
},
{
"urls": ["turn:restsend.com:3478"],
"username": "username",
"credential": "password"
}
]
Usage:
curl http://localhost:8080/iceservers
Error Handling
All endpoints return appropriate HTTP status codes:
- 200 OK: Success
- 400 Bad Request: Invalid parameters
- 404 Not Found: Resource not found
- 500 Internal Server Error: Server error
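
The curl calls above translate directly to Python's standard library. The sketch below only builds the request objects and maps the documented status codes to their meanings; actually sending a request (for example with `urllib.request.urlopen`) needs a running server. The function names and `STATUS_MEANINGS` are hypothetical.

```python
import urllib.parse
import urllib.request

# Status codes and meanings as listed in the Error Handling section.
STATUS_MEANINGS = {
    200: "Success",
    400: "Invalid parameters",
    404: "Resource not found",
    500: "Server error",
}


def list_calls_request(base_url):
    """Build the GET request that lists active calls."""
    return urllib.request.Request(f"{base_url}/call/lists", method="GET")


def kill_call_request(base_url, session_id):
    """Build the POST request that terminates one call by session ID."""
    quoted = urllib.parse.quote(session_id, safe="")
    return urllib.request.Request(f"{base_url}/call/kill/{quoted}", method="POST")


def describe_status(code):
    return STATUS_MEANINGS.get(code, "Undocumented status")


request = kill_call_request("http://localhost:8080", "session123")
print(request.get_method(), request.full_url)
```

The session ID is percent-encoded before it is placed in the path, which matters if IDs ever contain characters such as "/" or spaces.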
Notes
- All WebSocket endpoints support real-time bidirectional communication
- When WebSocket connection closes, call sessions are automatically cleaned up
- Event dumping can be disabled by setting the dump=false parameter
- ICE servers are automatically configured based on environment variables
- Audio codecs are automatically negotiated based on functionality
- VAD (Voice Activity Detection) events are used for speech detection
- ASR (Automatic Speech Recognition) provides real-time transcription
- TTS (Text-to-Speech) supports streaming synthesis
- All timestamps are in milliseconds
- trackId is used to identify which Track generated the event
- playId prevents interrupting previous TTS playback when using the same ID. For TTS commands, playId is the specified identifier; for Play commands, playId is the URL
- autoHangup automatically ends the call after TTS/Play completes