WebSocket API

Connect to Active Call

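Clients control Active Call over a WebSocket connection: commands are sent to the server as JSON messages, and events are received from it as JSON messages.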

Address

The listening address of Active Call is configured in Active Call’s config.toml:

http_addr = "0.0.0.0:8080"

Paths

Different paths correspond to different voice call types:

/call: Audio stream transmitted via WebSocket
/call/sip: Audio stream transmitted via SIP/RTP
/call/webrtc: Audio stream transmitted via WebRTC RTP

Parameters

id (optional, string): Session ID. Defaults to a server-generated UUID. When answering an incoming call, set it to the `dialogId` from the webhook request.
dump (optional, bool, default: true): Whether to enable event dumping for the session
pingInterval (optional, int, unit: seconds, default: 20): WebSocket Ping interval. The client periodically receives SessionEvent::Ping events at this interval
serverSideTrack (optional, string, default: serverSideTrack): Track ID assigned to the server-side track

Example

ws://localhost:8080/call/sip?id=session123&dump=true
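The connection URL can also be assembled programmatically. The Python sketch below is illustrative only: the helper name `build_call_url` is hypothetical, and booleans are written in lowercase to match the `dump=true` example above.

```python
from urllib.parse import urlencode


def build_call_url(host: str, path: str = "/call", **params) -> str:
    """Build an Active Call WebSocket URL (hypothetical helper).

    Query parameters are the documented ones: id, dump, pingInterval, serverSideTrack.
    """
    # Render booleans as "true"/"false" to match the documented example URL.
    query = {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    url = f"ws://{host}{path}"
    return f"{url}?{urlencode(query)}" if query else url


print(build_call_url("localhost:8080", "/call/sip", id="session123", dump=True))
# ws://localhost:8080/call/sip?id=session123&dump=true
```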

Commands

Commands are sent over the WebSocket connection as JSON messages, with the command field indicating the command type.
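As a minimal sketch, a command frame is just a JSON object whose `command` field names the command. The helper `encode_command` is hypothetical; with a WebSocket client library (for example the third-party `websockets` package), the resulting string would be passed to the connection's `send()` method.

```python
import json


def encode_command(command: str, **fields) -> str:
    """Serialize one command as a JSON text frame; `command` names its type."""
    return json.dumps({"command": command, **fields})


frame = encode_command("hangup", reason="user_requested", initiator="user")
print(frame)
# {"command": "hangup", "reason": "user_requested", "initiator": "user"}
```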

Invite - Initiate Call

Purpose: Initiate a new call.

* SIP calls require setting `option.caller` and `option.callee` to the caller and callee SIP addresses, respectively.
* WebRTC calls require setting `option.offer` to the call's [SDP](https://en.wikipedia.org/wiki/Session_Description_Protocol) offer.

Example:

{
    "command": "invite",
    "option": {
        "denoise": true,
        "callee": "sip:alice@192.168.3.197:12345",
        "caller": "sip:192.168.3.197:3050",
        "vad": {
            "type": "silero",
            "silenceTimeout": 5000
        },
        "asr": {
            "provider": "tencent"
        },
        "tts": {
            "provider": "tencent",
            "speaker": "601003"
        },
        "sip": {}
    }
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"invite"`
`option`	CallOption	✓	Call configuration parameters, see CallOption for details
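The per-call-type requirements above can be checked before sending. This sketch assumes a hypothetical `make_invite` helper; it only verifies that the documented fields are present and non-empty, not that the SIP URIs or SDP are valid.

```python
import json

# Option fields the Invite command needs for each call type (from the list above).
REQUIRED_OPTIONS = {"sip": ("caller", "callee"), "webrtc": ("offer",)}


def make_invite(call_type: str, option: dict) -> str:
    """Build an Invite frame; call_type mirrors the /call/sip and /call/webrtc paths."""
    missing = [key for key in REQUIRED_OPTIONS.get(call_type, ()) if not option.get(key)]
    if missing:
        raise ValueError(f"{call_type} invite is missing option fields: {missing}")
    return json.dumps({"command": "invite", "option": option})


frame = make_invite("sip", {
    "caller": "sip:192.168.3.197:3050",
    "callee": "sip:alice@192.168.3.197:12345",
    "denoise": True,
})
```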

Accept - Answer Incoming Call

Purpose: Answer an incoming call.

To answer a call, set the `id` parameter in the connection URL to the `dialogId` provided in the webhook request. See:
* [Answer Incoming Call](/static/docs/active-call/guide/call.mdx#accept)
* [Connect to Active Call](#connect)

Example:

{
    "command": "accept",
    "option": {
        "denoise": true,
        "vad": {
            "type": "silero",
            "silenceTimeout": 5000
        },
        "asr": {
            "provider": "tencent"
        },
        "tts": {
            "provider": "tencent",
            "speaker": "601003"
        }
    }
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"accept"`
`option`	CallOption	✓	Call configuration parameters, see CallOption for details

Reject - Reject Incoming Call

Purpose: Reject an incoming call.

For the list of request failure response codes, see [Request Failure 4xx](https://datatracker.ietf.org/doc/html/rfc3261#autoid-216).

Example:

{
  "command": "reject",
  "reason": "Busy Here",
  "code": 486
}

Parameters:

Field	Type	Required	Default	Description
`command`	string	✓	-	Must be `"reject"`
`reason`	string	✓	-	Rejection reason
`code`	number	✗	-	SIP response code

Ringing - Ringing

Purpose: Send a ringing response (180 Ringing) for an incoming call.

Note

If `recorder` is set in the Ringing command, the `recorder` option in a subsequent Accept command will **not** override the recording settings established during the ringing phase.

Example:

{
  "command": "ringing",
  "recorder": {
    "recorderFile": "/path/to/recording.wav",
    "samplerate": 16000,
    "ptime": 200
  },
  "earlyMedia": true,
  "ringtone": "http://example.com/ringtone.wav"
}

Parameters:

Field	Type	Required	Default	Description
`command`	string	✓	-	Must be `"ringing"`
`recorder`	RecorderOption	✗	-	Call recording configuration
`earlyMedia`	boolean	✗	`false`	Enable early media during ringing
`ringtone`	string	✗	-	Custom ringtone URL

TTS - Text-to-Speech

Purpose: Convert text to speech and play audio.

Example:

{
  "command": "tts",
  "text": "Hello, this is a test message",
  "speaker": "xiaoyan",
  "playId": "unique_play_id",
  "autoHangup": false,
  "streaming": false,
  "endOfStream": false,
  "waitInputTimeout": 30,
  "option": {
    "provider": "tencent",
    "volume": 5,
    "speed": 1.0
  }
}

Parameters:

Field	Type	Required	Default	Description
`command`	string	✓	-	Must be `"tts"`
`text`	string	✓	-	Text to synthesize
`speaker`	string	✗	-	Voice type, see provider voice lists
`playId`	string	✗	-	TTS Track identifier.
`autoHangup`	boolean	✗	`false`	Whether to automatically hang up after TTS playback completes
`streaming`	boolean	✗	`false`	Whether to enable streaming TTS
`endOfStream`	boolean	✗	`false`	Whether this is the last command for the current playId
`waitInputTimeout`	number	✗	-	Maximum time to wait for user input, unit: seconds
`option`	SynthesisOption	✗	-	TTS provider-specific options, such as audio format and sample rate; see SynthesisOption
`base64`	boolean	✗	`false`	Whether `text` carries base64-encoded PCM audio instead of text to synthesize

* When streaming TTS is enabled, all TTS commands are sent over the same WebSocket connection.
* Set `endOfStream = true` once all TTS commands for the `playId` have been sent. The `TTS Track` exits after all command results have finished playing and then sends a [Track End](#track-end-event) event.
* Set `base64 = true` to pass base64-encoded PCM audio in the `text` field.
* If `playId` is set, the [Track End](#track-end-event) event sent by this TTS Track includes that `playId`.
* If a TTS command's `playId` matches the previous TTS command's `playId`, the existing TTS Track is reused; otherwise, the previous TTS Track is terminated and a new `TTS Track` is created.
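A streaming TTS session can be sketched as a series of TTS commands sharing one `playId`, with `endOfStream` set only on the last one. The helper name and the pre-split chunks are illustrative assumptions; in practice the chunks would typically arrive incrementally (for example from an LLM) and be sent as they are produced.

```python
import json
from typing import List


def streaming_tts_commands(chunks: List[str], play_id: str) -> List[str]:
    """Turn text chunks into streaming TTS command frames (hypothetical helper).

    Every frame shares one playId so the same TTS Track is reused; only the
    last frame sets endOfStream, so the track exits after playback finishes.
    """
    frames = []
    for i, text in enumerate(chunks):
        frames.append(json.dumps({
            "command": "tts",
            "text": text,
            "playId": play_id,
            "streaming": True,
            "endOfStream": i == len(chunks) - 1,
        }))
    return frames


frames = streaming_tts_commands(["Hello, ", "this is a test message"], "llm-001")
```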

For details, see TTS (Text-to-Speech).

Play - Play Audio

Purpose: Play audio from a URL.

Example:

{
  "command": "play",
  "url": "http://example.com/audio.mp3",
  "autoHangup": false,
  "waitInputTimeout": 30
}

Parameters:

Field	Type	Required	Default	Description
`command`	string	✓	-	Must be `"play"`
`url`	string	✓	-	Audio file URL to play (HTTP/HTTPS). This URL is returned as `playId` in the TrackEnd event.
`autoHangup`	boolean	✗	`false`	Whether to automatically hang up after playback completes
`waitInputTimeout`	number	✗	-	Maximum time to wait for user input, unit: seconds

Interrupt - Interrupt Playback

Purpose: Interrupt the current TTS or audio playback.

* If the current TTS result has not finished playing, an [Interruption event](#interruption-event) is triggered. It contains the elapsed playback time and the time at which audio was received from the TTS provider; if the provider supports subtitles, it also includes the estimated position in the played text.
* If `graceful = true` is set, the current TTS command's result finishes playing before the track exits; see [Interruption](/static/docs/active-call/guide/tts.mdx#interruption).

Example:

{
  "command": "interrupt",
  "graceful": false
}

Parameters:

Field	Type	Required	Default	Description
`command`	string	✓	-	Must be `"interrupt"`
`graceful`	boolean	✗	`false`	Whether to let the current TTS command's result finish playing before interrupting

Refer - Transfer Call

Purpose: Transfer the call to another party (SIP REFER).

See [Transfer Call](/static/docs/active-call/guide/call.mdx#refer)

Example:

{
  "command": "refer",
  "caller": "sip:alice@example.com",
  "callee": "sip:charlie@example.com",
  "options": {
    "denoise": true,
    "timeout": 30,
    "moh": "http://example.com/hold_music.wav",
    "autoHangup": true
  }
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"refer"`
`caller`	string	✓	SIP address of the transferring party, i.e. the local SIP address of the current connection (e.g., `sip:{localIP}:13050`)
`callee`	string	✓	Transfer target SIP URI (e.g., `sip:bob@example.com`)
`options`	ReferOption	✗	Transfer configuration, see ReferOption for details

Mute - Mute

Purpose: Mute all Tracks or a specified Track.

If `trackId` is specified, mute the corresponding Track; otherwise, mute all Tracks.

Example:

{
  "command": "mute",
  "trackId": "track-123"
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"mute"`
`trackId`	string	✗	Track ID to mute (if not specified, mute all tracks)

Unmute - Unmute

Purpose: Unmute muted Tracks.

If `trackId` is specified, unmute the corresponding Track; otherwise, unmute all Tracks.

Example:

{
  "command": "unmute",
  "trackId": "track-123"
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"unmute"`
`trackId`	string	✗	Track ID to unmute (if not specified, unmute all tracks)

Hangup - Hangup

Purpose: End the call.

Example:

{
  "command": "hangup",
  "reason": "user_requested",
  "initiator": "user"
}

Parameters:

Field	Type	Required	Description
`command`	string	✓	Must be `"hangup"`
`reason`	string	✗	Hangup reason
`initiator`	string	✗	Party that initiated the hangup (user, system, etc.)

Events

Events are received from the server as JSON messages. All numeric timestamps are in milliseconds. Each event contains an `event` field indicating the event type, and most events also include a `trackId` field identifying the related Track.
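Because every event carries an `event` field, a client can route messages with a simple lookup table. The `EventDispatcher` class below is a hypothetical sketch, not part of Active Call; it assumes each event arrives as a JSON text frame.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]


class EventDispatcher:
    """Route incoming Active Call events to handlers by their `event` field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        """Register the handler for one event type, e.g. "asrFinal"."""
        self._handlers[event_type] = handler

    def dispatch(self, message: str) -> bool:
        """Parse a JSON text frame; return True if a handler ran."""
        event = json.loads(message)
        handler = self._handlers.get(event.get("event"))
        if handler is None:
            return False  # event types without a handler are ignored
        handler(event)
        return True


dispatcher = EventDispatcher()
dispatcher.on("hangup", lambda e: print("call ended:", e.get("reason")))
```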

Incoming - Incoming Call Event

Trigger: When an incoming call is received (SIP calls only).

Example:

{
  "event": "incoming",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "caller": "sip:alice@example.com",
  "callee": "sip:bob@example.com",
  "sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}

Fields:

Field	Type	Description
`event`	string	Always `"incoming"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`caller`	string	Caller’s SIP address
`callee`	string	Callee’s SIP address
`sdp`	string	SDP offer from the caller

Answer - Answer Event

Trigger: When the call is answered and SDP negotiation is complete.

For SIP calls, the Answer event is triggered when `200 OK` is received.

Example:

{
  "event": "answer",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}

Fields:

Field	Type	Description
`event`	string	Always `"answer"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`sdp`	string	SDP answer from the callee

Reject - Reject Event

Trigger: When the call is rejected.

Example:

{
  "event": "reject",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "reason": "Busy",
  "code": 486
}

Fields:

Field	Type	Description
`event`	string	Always `"reject"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`reason`	string	Rejection reason
`code`	number	SIP response code (optional)

For the list of request failure response codes, see [Request Failure 4xx](https://datatracker.ietf.org/doc/html/rfc3261#autoid-216).

Ringing - Ringing Event

Trigger: When the call is ringing (SIP calls only).

Example:

{
  "event": "ringing",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "earlyMedia": false
}

Fields:

Field	Type	Description
`event`	string	Always `"ringing"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`earlyMedia`	boolean	Whether early media is available

Hangup - Hangup Event

Trigger: When the call ends.

Example:

{
  "event": "hangup",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "reason": "user_requested",
  "initiator": "user",
  "startTime": "2024-01-01T12:00:00Z",
  "hangupTime": "2024-01-01T12:05:30Z",
  "answerTime": "2024-01-01T12:00:05Z",
  "from": {
    "username": "alice",
    "realm": "example.com",
    "source": "sip:alice@example.com"
  }
}

Fields:

Field	Type	Description
`event`	string	Always `"hangup"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`reason`	string	Hangup reason (optional)
`initiator`	string	Party that initiated the hangup (optional)
`startTime`	string	ISO 8601 timestamp of call start
`hangupTime`	string	ISO 8601 timestamp of call end
`answerTime`	string	ISO 8601 timestamp of call answer (optional)
`ringingTime`	string	ISO 8601 timestamp of call ringing start (optional)
`from`	Attendee	Caller information (optional)
`to`	Attendee	Callee information (optional)
`extra`	object	Additional call metadata (optional)
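The ISO 8601 fields in the Hangup event are enough to derive call durations. This is a sketch under the assumption that `answerTime` is omitted for calls that were never answered; `call_durations` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional


def _parse_iso(ts: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def call_durations(event: dict) -> dict:
    """Return total and talk time in seconds for a Hangup event."""
    end = _parse_iso(event["hangupTime"])
    total = (end - _parse_iso(event["startTime"])).total_seconds()
    talk: Optional[float] = None
    answer = event.get("answerTime")  # optional; assumed absent if unanswered
    if answer:
        talk = (end - _parse_iso(answer)).total_seconds()
    return {"total": total, "talk": talk}


sample = {
    "startTime": "2024-01-01T12:00:00Z",
    "hangupTime": "2024-01-01T12:05:30Z",
    "answerTime": "2024-01-01T12:00:05Z",
}
print(call_durations(sample))  # {'total': 330.0, 'talk': 325.0}
```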

Speaking - Speaking Event

Trigger: When VAD detects speech. Requires `vad` to be configured in CallOption.

Example:

{
  "event": "speaking",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "startTime": 1640995200000
}

Fields:

Field	Type	Description
`event`	string	Always `"speaking"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`startTime`	number	Time when speech started (milliseconds)

Silence - Silence Event

Trigger: When more than `speechPadding` milliseconds have passed since the current speech started, and more than `silencePadding` milliseconds have passed since the last Silence event was triggered (if any). Requires `vad` to be configured in CallOption.

Example:

{
  "event": "silence",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "startTime": 1640995195000,
  "duration": 5000
}

Fields:

Field	Type	Description
`event`	string	Always `"silence"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`startTime`	number	Time when silence started (milliseconds)
`duration`	number	Silence duration (milliseconds)

AsrFinal - ASR Final Event

Trigger: When a stable (final) speech recognition result is available.

Example:

{
  "event": "asrFinal",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "index": 1,
  "startTime": 1640995200000,
  "endTime": 1640995205000,
  "text": "Hello, how can I help you today?"
}

Fields:

Field	Type	Description
`event`	string	Always `"asrFinal"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`index`	number	ASR result sequence number
`startTime`	number	Speech start time (milliseconds, optional)
`endTime`	number	Speech end time (milliseconds, optional)
`text`	string	Speech recognition result

AsrDelta - ASR Delta Event

Trigger: When an intermediate speech recognition result is available; its text may still change.

Example:

{
  "event": "asrDelta",
  "trackId": "track-abc123",
  "index": 1,
  "timestamp": 1640995200000,
  "text": "Hello, how can"
}

Fields:

Field	Type	Description
`event`	string	Always `"asrDelta"`
`trackId`	string	Unique identifier of the Track
`index`	number	ASR result sequence number
`timestamp`	number	Event timestamp (milliseconds)
`startTime`	number	Speech start time (milliseconds, optional)
`endTime`	number	Speech end time (milliseconds, optional)
`text`	string	Speech recognition result (may change)
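Since delta results may still change while final results are stable, a client typically overwrites the delta text per `index` and keeps only the finals. This sketch assumes that an `asrDelta` and an `asrFinal` with the same `index` describe the same utterance; the `Transcript` class is hypothetical.

```python
from typing import Dict


class Transcript:
    """Track ASR results by `index`: deltas overwrite, finals are kept."""

    def __init__(self) -> None:
        self.partial: Dict[int, str] = {}
        self.final: Dict[int, str] = {}

    def handle(self, event: dict) -> None:
        idx = event["index"]
        if event["event"] == "asrDelta":
            self.partial[idx] = event["text"]  # intermediate text, may change
        elif event["event"] == "asrFinal":
            self.partial.pop(idx, None)  # the stable result supersedes the delta
            self.final[idx] = event["text"]

    def text(self) -> str:
        """Final results joined in sequence-number order."""
        return " ".join(self.final[i] for i in sorted(self.final))
```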

TrackStart - Track Start Event

Trigger: When a Track starts (RTP, TTS, file playback, etc.).

Example:

{
  "event": "trackStart",
  "trackId": "track-tts-456",
  "timestamp": 1640995200000,
  "playId": "llm-001"
}

Fields:

Field	Type	Description
`event`	string	Always `"trackStart"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`playId`	string	TTS command’s playId (optional) or Play command’s URL (optional)

Both TTS and Play commands create corresponding Tracks. The `playId` in TrackStart is:
* TTS Track: the TTS command's `playId` (optional)
* Play Track: the Play command's URL

TrackEnd - Track End Event

Trigger: When a Track ends (RTP ends, TTS completes, file playback completes, etc.).

Example:

{
  "event": "trackEnd",
  "trackId": "track-tts-456",
  "timestamp": 1640995230000,
  "duration": 30000,
  "ssrc": 1234567890,
  "playId": "llm-001"
}

Fields:

Field	Type	Description
`event`	string	Always `"trackEnd"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`duration`	number	Track duration (milliseconds)
`ssrc`	number	RTP synchronization source identifier
`playId`	string	TTS command’s playId (optional) or Play command’s URL (optional)

Both TTS and Play commands create corresponding Tracks. The `playId` in TrackEnd is:
* TTS Track: the TTS command's `playId` (optional)
* Play Track: the Play command's URL

Interruption - Interruption Event

Trigger: When an Interrupt command is received and there are unfinished TTS commands.

Example:

{
  "event": "interruption",
  "trackId": "track-tts-456",
  "timestamp": 1640995215000,
  "playId": "llm-001",
  "subtitle": "Hello, this is a long message that was interrupted",
  "position": 5,
  "totalDuration": 30000,
  "current": 15000
}

Fields:

Field	Type	Description
`event`	string	Always `"interruption"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`playId`	string	For TTS commands, this is the playId from the TTS command (optional)
`subtitle`	string	Current TTS text being played at interruption (optional)
`position`	number	Word index position in subtitle at interruption (optional)
`totalDuration`	number	Total duration of TTS content (milliseconds)
`current`	number	Time elapsed since TTS start at interruption (milliseconds)
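The `current` and `totalDuration` fields indicate how much of the TTS audio was heard before the interruption. The sketch below computes that share and, assuming `position` counts whitespace-separated words in `subtitle` (which may not hold for languages written without spaces, such as Chinese), recovers the text that was likely spoken. Both helpers are hypothetical.

```python
from typing import Optional


def played_fraction(event: dict) -> float:
    """Share of the TTS audio that had played when the interruption happened."""
    total = event.get("totalDuration") or 0
    if total <= 0:
        return 0.0
    return min(event.get("current", 0) / total, 1.0)


def spoken_prefix(event: dict) -> Optional[str]:
    """Subtitle text up to `position`, assuming it counts whitespace-separated words."""
    subtitle, position = event.get("subtitle"), event.get("position")
    if subtitle is None or position is None:
        return None  # provider without subtitle support
    return " ".join(subtitle.split()[:position])


sample = {
    "subtitle": "Hello, this is a long message that was interrupted",
    "position": 5,
    "totalDuration": 30000,
    "current": 15000,
}
print(played_fraction(sample))  # 0.5
print(spoken_prefix(sample))    # Hello, this is a long
```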

Dtmf - DTMF Event

Trigger: When a keypress is detected.

Example:

{
  "event": "dtmf",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "digit": "1"
}

Fields:

Field	Type	Description
`event`	string	Always `"dtmf"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`digit`	string	DTMF digit (0-9, *, #, A-D)
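DTMF events arrive one digit at a time, so collecting multi-digit input (an account number, a menu choice) is left to the client. This hypothetical `DigitCollector` assumes `#` ends the input and caps the length; neither convention is defined by Active Call.

```python
from typing import Optional


class DigitCollector:
    """Accumulate DTMF digits until a terminator key or a length limit."""

    def __init__(self, terminator: str = "#", max_digits: int = 16) -> None:
        self.terminator = terminator
        self.max_digits = max_digits
        self.buffer = ""

    def feed(self, event: dict) -> Optional[str]:
        """Consume one Dtmf event; return the collected input once complete, else None."""
        digit = event["digit"]
        if digit == self.terminator:
            done, self.buffer = self.buffer, ""
            return done
        self.buffer += digit
        if len(self.buffer) >= self.max_digits:
            done, self.buffer = self.buffer, ""
            return done
        return None
```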

Metrics - Metrics Event

Trigger: When performance metrics are available.

Example:

{
  "event": "metrics",
  "timestamp": 1640995200000,
  "key": "ttfb.asr.tencent",
  "duration": 150,
  "data": {
    "index": 1,
    "provider": "tencent"
  }
}

Fields:

Field	Type	Description
`event`	string	Always `"metrics"`
`timestamp`	number	Event timestamp (milliseconds)
`key`	string	Metric key (e.g., “ttfb.asr.tencent”)
`duration`	number	Duration (milliseconds)
`data`	object	Additional metric data

Error - Error Event

Trigger: When an error occurs during processing.

Example:

{
  "event": "error",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "sender": "asr",
  "error": "Connection timeout to ASR service",
  "code": 408
}

Fields:

Field	Type	Description
`event`	string	Always `"error"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`sender`	string	Component that generated the error (asr, tts, media, etc.)
`error`	string	Error message description
`code`	number	Error code (optional)

Binary - Binary Event

Trigger: When binary audio data is sent (WebSocket calls only).

Example:

{
  "event": "binary",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "data": []
}

Fields:

Field	Type	Description
`event`	string	Always `"binary"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`data`	array	Binary audio data bytes

Ping - Ping Event

Trigger: Sent periodically, every `pingInterval` seconds, when the `pingInterval` parameter is set in the connection URL; see Connect to Active Call.

Example:

{
  "event": "ping",
  "timestamp": 1640995200000,
  "payload": "optional_payload"
}

Fields:

Field	Type	Description
`event`	string	Always `"ping"`
`timestamp`	number	Event timestamp (milliseconds)
`payload`	string	Payload data (optional)

Other - Other Event

Trigger: When custom or extended events are generated.

Example:

{
  "event": "other",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "sender": "custom_plugin",
  "extra": {
    "custom_field": "custom_value"
  }
}

Fields:

Field	Type	Description
`event`	string	Always `"other"`
`trackId`	string	Unique identifier of the Track
`timestamp`	number	Event timestamp (milliseconds)
`sender`	string	Component that generated the event
`extra`	object	Additional event data (optional)

Options

CallOption

The CallOption object holds the call configuration used by the Invite and Accept commands.

Example:

{
  "denoise": true,
  "offer": "SDP offer string",
  "callee": "sip:callee@example.com",
  "caller": "sip:caller@example.com",
  "codec": "g722",
  "recorder": {
    "recorderFile": "/path/to/recording.wav",
    "samplerate": 16000
  },
  "asr": {
    "provider": "tencent",
    "language": "zh-CN"
  },
  "tts": {
    "provider": "tencent",
    "speaker": "xiaoyan"
  }
}

Fields:

Field	Type	Required	Default	Description
`denoise`	boolean	✗	`false`	Enable audio processing noise reduction
`offer`	string	✗	-	SDP offer string for WebRTC/SIP negotiation
`callee`	string	✗	-	Callee’s SIP URI or phone number (e.g., “sip:bob@example.com”)
`caller`	string	✗	-	Caller’s SIP URI or phone number (e.g., “sip:alice@example.com”)
`codec`	string	✗	`"pcmu"`	Audio codec: `"pcmu"`, `"pcma"`, `"g722"`, `"pcm"` (only for WebSocket calls)
`recorder`	RecorderOption	✗	-	Call recording configuration
`vad`	VADOption	✗	-	Voice activity detection configuration
`asr`	TranscriptionOption	✗	-	Automatic Speech Recognition (ASR) configuration
`tts`	SynthesisOption	✗	-	Text-to-Speech configuration
`mediaPass`	MediaPassOption	✗	-	Media Pass configuration
`handshakeTimeout`	string	✗	-	Connection handshake timeout (e.g., “30s”)
`enableIpv6`	boolean	✗	`false`	Enable IPv6 support
`sip`	SipOption	✗	-	SIP account credentials (username, password, realm) and extra headers; see SipOption
`extra`	object	✗	-	Additional parameters

RecorderOption

Call recording configuration options.

Example:

{
  "recorderFile": "/path/to/recording.wav",
  "samplerate": 16000,
  "ptime": 200
}

Fields:

Field	Type	Required	Default	Description
`recorderFile`	string	✓	-	Recording file path
`samplerate`	number	✗	`16000`	Recording sample rate, unit: Hz
`ptime`	number	✗	`200`	Packet time, unit: milliseconds

TranscriptionOption

Automatic Speech Recognition (ASR) configuration options.

Example:

{
  "provider": "tencent",
  "language": "zh-CN",
  "appId": "app_id",
  "secretId": "your_secret_id",
  "secretKey": "your_secret_key",
  "modelType": "16k_zh",
  "samplerate": 16000,
  "startWhenAnswer": true
}

Fields:

Field	Type	Required	Default	Description
`provider`	string	✗	-	ASR provider: `tencent`, `aliyun`, `deepgram`, etc.
`language`	string	✗	-	Language (e.g., “zh-CN”, “en-US”); see the provider’s documentation for supported values
`appId`	string	✗	-	Tencent Cloud’s appId
`secretId`	string	✗	-	Tencent Cloud’s secretId
`secretKey`	string	✗	-	Tencent Cloud’s secretKey, or other provider’s API Key
`modelType`	string	✗	-	ASR model type (e.g., “16k_zh”, “8k_en”), see provider documentation for details
`bufferSize`	number	✗	-	Audio buffer size, unit: bytes
`samplerate`	number	✗	`16000`	Sample rate
`endpoint`	string	✗	-	Custom service endpoint URL
`extra`	object	✗	-	Provider-specific parameters
`startWhenAnswer`	boolean	✗	`false`	Start requesting the ASR service only after the call is answered

SynthesisOption

Text-to-Speech (TTS) configuration options.

Example:

{
  "provider": "tencent",
  "speaker": "xiaoyan",
  "volume": 5,
  "speed": 1.0,
  "emotion": "neutral",
  "samplerate": 16000
}

Fields:

Field	Type	Required	Default	Description
`provider`	string	✗	-	TTS provider: `"tencent"`, `"aliyun"`, `"deepgram"`, `"voiceapi"`
`speaker`	string	✗	-	Voice, see provider documentation
`volume`	number	✗	`5`	Volume (1-10)
`speed`	number	✗	`1.0`	Speech rate
`samplerate`	number	✗	`16000`	Sample rate, unit: Hz
`appId`	string	✗	-	Tencent Cloud’s appId
`secretId`	string	✗	-	Tencent Cloud’s secretId
`secretKey`	string	✗	-	Tencent Cloud’s secretKey, or other provider’s API Key
`codec`	string	✗	-	Encoding format
`subtitle`	boolean	✗	`false`	Whether to enable subtitles
`endpoint`	string	✗	-	Custom TTS service endpoint URL
`extra`	object	✗	-	Additional provider-specific parameters
`maxConcurrentTasks`	number	✗	`1`	Maximum number of concurrent tasks for non-streaming TTS commands

VADOption

Voice Activity Detection (VAD) configuration options.

Example:

{
  "type": "webrtc",
  "samplerate": 16000,
  "speechPadding": 250,
  "silencePadding": 100,
  "voiceThreshold": 0.5,
  "maxBufferDurationSecs": 50
}

Fields:

Field	Type	Required	Default	Description
`type`	string	✗	`"silero"`	VAD algorithm type: `"silero"`, `"ten"`
`samplerate`	number	✗	`16000`	Sample rate
`speechPadding`	number	✗	`250`	Start detection `speechPadding` milliseconds after speech starts
`silencePadding`	number	✗	`100`	Silence event trigger interval, unit: milliseconds
`ratio`	number	✗	`0.5`	Speech detection ratio threshold
`voiceThreshold`	number	✗	`0.5`	Voice energy threshold
`maxBufferDurationSecs`	number	✗	`50`	Maximum buffer duration, unit: seconds
`silenceTimeout`	number	✗	-	Silence detection timeout, unit: milliseconds
`endpoint`	string	✗	-	Custom VAD service endpoint
`secretKey`	string	✗	-	VAD service authentication key
`secretId`	string	✗	-	VAD service authentication ID

MediaPassOption

Media pass-through configuration options. With media pass-through, an external WebSocket service takes over audio processing entirely.

Active Call will send all audio data to the configured WebSocket service and play audio received from that service to the other party of the audio connection.

See: Media Pass

Example:

{
  "url": "ws://localhost:9090/media",
  "inputSampleRate": 16000,
  "outputSampleRate": 16000,
  "packetSize": 2560
}

Fields:

Field	Type	Required	Default	Description
`url`	string	✓	-	WebSocket connection URL for media stream
`inputSampleRate`	number	✓	-	Audio sample rate received from WebSocket server (also the track’s sample rate)
`outputSampleRate`	number	✓	-	Audio sample rate sent to WebSocket server
`packetSize`	number	✗	`2560`	Packet size sent to WebSocket server, unit: bytes
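`packetSize` is expressed in bytes, so the audio duration per packet depends on the sample format. Assuming 16-bit mono linear PCM (an assumption; the format is not stated above), the default 2560-byte packet at an `outputSampleRate` of 16000 Hz carries 2560 / (16000 × 2) s = 80 ms of audio. The hypothetical helper below generalizes that arithmetic.

```python
def packet_duration_ms(packet_size: int, sample_rate: int,
                       bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Milliseconds of audio in one packet, assuming uncompressed linear PCM."""
    bytes_per_second = sample_rate * bytes_per_sample * channels
    return packet_size * 1000 / bytes_per_second


print(packet_duration_ms(2560, 16000))  # 80.0
print(packet_duration_ms(2560, 8000))   # 160.0
```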

ReferOption

Transfer configuration options.

Example:

{
  "denoise": true,
  "timeout": 30,
  "moh": "http://example.com/hold_music.wav",
  "asr": {
    "provider": "tencent",
    "language": "zh-CN"
  },
  "autoHangup": true,
  "sip": {
    "username": "transfer_user",
    "password": "transfer_password"
  }
}

Fields:

Field	Type	Required	Default	Description
`denoise`	boolean	✗	`false`	Enable noise reduction during transfer
`timeout`	number	✗	-	Transfer timeout, unit: seconds
`moh`	string	✗	-	Hold music URL to play during transfer
`asr`	TranscriptionOption	✗	-	Automatic speech recognition configuration
`autoHangup`	boolean	✗	`false`	Automatically hang up after transfer completes
`sip`	SipOption	✗	-	SIP configuration

SipOption

SIP protocol configuration options.

Example:

{
  "username": "user",
  "password": "password",
  "realm": "example.com",
  "headers": {
    "X-Custom-Header": "value"
  }
}

Fields:

Field	Type	Required	Default	Description
`username`	string	✗	-	SIP username for authentication
`password`	string	✗	-	SIP password for authentication
`realm`	string	✗	-	SIP domain/realm for authentication
`headers`	object	✗	-	Additional SIP protocol headers (key-value pairs)

Attendee

Call participant information.

Example:

{
  "username": "alice",
  "realm": "example.com",
  "source": "sip:alice@example.com"
}

Fields:

Field	Type	Description
`username`	string	Username part of SIP URI
`realm`	string	Domain/realm part of SIP URI
`source`	string	Complete SIP URI or phone number

REST API Endpoints

List Active Calls

Endpoint: GET /call/lists

Description: Returns a list of active calls.

Response:

{
  "calls": [
    {
      "id": "session-id",
      "call_type": "webrtc",
      "created_at": "2024-01-01T12:00:00Z",
      "option": {
        "caller": "1234567890",
        "callee": "0987654321"
      }
    }
  ]
}

Usage:

curl http://localhost:8080/call/lists

Terminate Call

Endpoint: POST /call/kill/{id}

Description: Terminate a specific active call.

Parameters:

id (path parameter, string): Session ID of the call to terminate

Response:

true

Usage:

curl -X POST http://localhost:8080/call/kill/session123
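The two endpoints can be combined, for example to terminate every active call. This sketch uses only Python's standard library; `BASE_URL` assumes the server is reachable at `localhost:8080` as in the curl examples, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed address; matches the curl examples above.
BASE_URL = "http://localhost:8080"


def active_call_ids(body: str) -> list:
    """Extract session IDs from a GET /call/lists response body."""
    return [call["id"] for call in json.loads(body).get("calls", [])]


def kill_call(session_id: str, base_url: str = BASE_URL) -> bool:
    """POST /call/kill/{id}; the documented success body is `true`."""
    req = Request(f"{base_url}/call/kill/{quote(session_id, safe='')}", method="POST")
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8")) is True


def kill_all_calls(base_url: str = BASE_URL) -> int:
    """Terminate every listed call; return how many kills reported success."""
    with urlopen(f"{base_url}/call/lists") as resp:
        ids = active_call_ids(resp.read().decode("utf-8"))
    return sum(kill_call(sid, base_url) for sid in ids)
```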

Get ICE Servers

Endpoint: GET /iceservers

Description: Returns ICE server configuration for WebRTC connections.

Response:

[
  {
    "urls": ["stun:restsend.com:3478"],
    "username": null,
    "credential": null
  },
  {
    "urls": ["turn:restsend.com:3478"],
    "username": "username",
    "credential": "password"
  }
]

Usage:

curl http://localhost:8080/iceservers

Error Handling

All REST endpoints return standard HTTP status codes:

200 OK: Success
400 Bad Request: Invalid parameters
404 Not Found: Resource not found
500 Internal Server Error: Server error

Notes

All WebSocket endpoints support real-time bidirectional communication
When the WebSocket connection closes, the call session is cleaned up automatically
Event dumping can be disabled by setting the dump=false parameter
ICE servers are automatically configured based on environment variables
Audio codecs are negotiated automatically with the remote endpoint
VAD (Voice Activity Detection) produces the Speaking and Silence events
ASR (Automatic Speech Recognition) provides real-time transcription
TTS (Text-to-Speech) supports streaming synthesis
All numeric timestamps are in milliseconds; the call times in the Hangup event are ISO 8601 strings
trackId is used to identify which Track generated the event
Reusing the same playId across TTS commands keeps the current TTS Track playing instead of interrupting it. For TTS commands, playId is the specified identifier; for Play commands, playId is the URL
autoHangup automatically ends the call after TTS/Play completes