Go SDK

Download Source Code

Download from the RustPBXGo GitHub repository:

git clone https://github.com/restsend/rustpbxgo

Directory Structure

rustpbxgo/
├── README.md
├── client.go # SDK core implementation
├── cmd/ # Example application
│ ├── main.go # Program entry point
│ ├── llm.go # Large model interaction logic
│ ├── media.go # WebRTC media processing
│ └── webhook.go # Webhook handling
├── go.mod
└── go.sum

client.go contains the definitions of core data structures (commands, events, and callback functions).

The cmd/ directory contains an application example with SIP/WebRTC calls, incoming call handling with webhooks, and large model interaction logic.

Client

The Client struct fields mainly include:

  • endpoint: RustPBX listening address.
  • id: The session ID of the connection, mainly used when answering incoming calls.
  • OnXXX: Callback functions for handling events.

Flow Diagram

When the client calls the Connect method, it creates two goroutines (green parts in the diagram below).

  • One is responsible for reading and parsing WebSocket messages (top)
  • The other is responsible for processing messages and sending commands (bottom). When an event is received, it calls the corresponding callback function based on the event type.

[Flow diagram: RustPBX -> Event/WebSocket -> c.conn.ReadMessage() -> Event/c.eventChan -> c.processEvent() -> callbacks (OnAsrFinal, OnSpeaking, OnHangup); client -> c.conn.WriteJson() -> Command/WebSocket -> RustPBX]
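
The two-goroutine flow described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the SDK's actual code: the `event` type, the `run` function, the slice of raw messages standing in for the WebSocket, and the event names "speaking" and "hangup" are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// event mirrors the base Event struct: only the type name is decoded here.
type event struct {
	Event string `json:"event"`
}

// run starts a reader goroutine and a dispatcher goroutine, then waits for both.
// rawMessages stands in for frames read from the WebSocket (hypothetical).
func run(rawMessages []string, handlers map[string]func()) {
	eventChan := make(chan event, 8)
	var wg sync.WaitGroup
	wg.Add(2)

	// Reader: parse each message and hand it to the dispatcher.
	go func() {
		defer wg.Done()
		defer close(eventChan)
		for _, raw := range rawMessages {
			var ev event
			if err := json.Unmarshal([]byte(raw), &ev); err != nil {
				continue // skip malformed messages
			}
			eventChan <- ev
		}
	}()

	// Dispatcher: call the callback registered for the event type, if any.
	go func() {
		defer wg.Done()
		for ev := range eventChan {
			if h, ok := handlers[ev.Event]; ok {
				h()
			}
		}
	}()

	wg.Wait()
}

func main() {
	handlers := map[string]func(){
		"speaking": func() { fmt.Println("OnSpeaking called") },
		"hangup":   func() { fmt.Println("OnHangup called") },
	}
	run([]string{`{"event":"speaking"}`, `not json`, `{"event":"hangup"}`}, handlers)
}
```

Because a single dispatcher goroutine invokes the callbacks one after another, a callback that blocks for a long time would delay every later event in this model.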

Creating a Client

Use the NewClient function to create a client instance:

client := rustpbxgo.NewClient(endpoint, opts...)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| endpoint | string | | RustPBX server address |
| opts... | ClientOption | | Optional configuration options |

Options:

| Option | Parameter Type | Description |
| --- | --- | --- |
| WithLogger(logger) | *logrus.Logger | Set the logger |
| WithContext(ctx) | context.Context | Set the context used to stop the goroutines the client creates |
| WithID(id) | string | Set the session ID for answering calls |
| WithDumpEvents(enable) | bool | Enable event dumping |

Example:

client := rustpbxgo.NewClient(
"ws://localhost:8080",
rustpbxgo.WithLogger(logger),
rustpbxgo.WithContext(ctx),
rustpbxgo.WithID("my-session-id"),
rustpbxgo.WithDumpEvents(true),
)
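
The `opts...` parameter follows Go's functional-options pattern. The sketch below shows how such options are commonly implemented; the lowercase `client`, `option`, `withID`, `withDumpEvents`, and `newClient` names are hypothetical stand-ins, and the SDK's real internals may differ.

```go
package main

import "fmt"

// client is a hypothetical stand-in for the SDK's Client.
type client struct {
	endpoint   string
	id         string
	dumpEvents bool
}

// option mutates a client during construction (plays the role of ClientOption).
type option func(*client)

func withID(id string) option {
	return func(c *client) { c.id = id }
}

func withDumpEvents(enable bool) option {
	return func(c *client) { c.dumpEvents = enable }
}

// newClient applies the options in order on top of the defaults.
func newClient(endpoint string, opts ...option) *client {
	c := &client{endpoint: endpoint}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	c := newClient("ws://localhost:8080", withID("my-session-id"), withDumpEvents(true))
	fmt.Printf("%+v\n", *c)
}
```

With this pattern, omitted options keep their defaults, which is why only `endpoint` has to be passed positionally.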

Connecting to Server

Use the Connect method to connect to RustPBX:

err := client.Connect(callType)

Parameters:

| Parameter | Type | Optional Values | Description |
| --- | --- | --- | --- |
| callType | string | "sip", "webrtc", "" | Call type |

Closing Client

Use the Shutdown method to close the client connection:

err := client.Shutdown()

Sending Commands

The client provides various methods for sending commands to the server.

Invite - Initiate Call

Initiates a call and returns an AnswerEvent or an error. See: Initiate Call.

answer, err := client.Invite(ctx, callOption)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| ctx | context.Context | Context for cancellation |
| callOption | CallOption | Call configuration, see CallOption |

Return Values:

| Type | Description |
| --- | --- |
| *AnswerEvent | Answer event (if successful) |
| error | Error information |

Example:

// SIP call
callOption := rustpbxgo.CallOption{
	Caller: "sip:alice@example.com",
	Callee: "sip:bob@example.com",
}

answer, err := client.Invite(ctx, callOption)
if err != nil {
	log.Fatalf("Call failed: %v", err)
}
log.Printf("Call answered, track: %s", answer.TrackID)

Accept - Answer Incoming Call

Answer an incoming call. See: Answer/Reject Incoming Call.

err := client.Accept(callOption)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| callOption | CallOption | Call configuration, see CallOption |

Note: The CallOption configuration for Accept is the same as for Invite, except that the callee address does not need to be set.

Example:

cmd/webhook.go

server := gin.Default()
server.POST(prefix, func(c *gin.Context) {
	var form IncomingCall
	if err := c.ShouldBindJSON(&form); err != nil {
		c.JSON(400, gin.H{"error": err.Error()})
		return
	}

	client := createClient(parent, option, form.DialogID)

	go func() {
		ctx, cancel := context.WithCancel(parent)
		defer cancel()

		if err := client.Connect("sip"); err != nil {
			option.Logger.Errorf("Failed to connect to server: %v", err)
			return // nothing to answer without a connection
		}
		defer client.Shutdown()

		if err := client.Accept(option.CallOption); err != nil {
			option.Logger.Errorf("Failed to accept call: %v", err)
			return
		}
		<-ctx.Done() // keep the session alive until the parent context is cancelled
	}()

	c.JSON(200, gin.H{"message": "OK"})
})
server.Run(addr)

Ringing - Send Ringing

Send a ringing response. Used for SIP calls, see: 180 Ringing.

err := client.Ringing(ringtone, recorder)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| ringtone | string | | Ringtone URL |
| recorder | *RecorderOption | | Recording configuration, see RecorderOption |

Example:

client.Ringing("http://example.com/ringtone.wav", recorder)

Reject - Reject Incoming Call

Reject an incoming call. See: Answer/Reject Incoming Call.

err := client.Reject(reason)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| reason | string | Rejection reason |

Example:

client.Reject("Busy")

TTS - Text-to-Speech

Convert text to speech and play it. See: TTS (Text-to-Speech).

err := client.TTS(text, speaker, playID, endOfStream, autoHangup, option, waitInputTimeout)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | | Text to synthesize |
| speaker | string | | Voice |
| playID | string | | TTS Track identifier |
| endOfStream | bool | | Whether this is the last TTS command for the current playId |
| autoHangup | bool | | Whether to automatically hang up after TTS playback completes |
| option | *TTSOption | | TTS options, see TTSOption |
| waitInputTimeout | *uint32 | | Maximum time to wait for user input (seconds) |

Tip:
  • Set endOfStream = true to indicate that all TTS commands for the current playId have been sent. The TTS Track exits once every queued command has finished playing, then sends a Track End event.
  • If playId is set, the Track End event sent by this TTS Track includes that playId.
    • If the current playId matches the previous TTS command's playId, the existing TTS Track is reused; otherwise the previous TTS Track is terminated and a new one is created.

For details, see TTS (Text-to-Speech).

StreamTTS - Streaming TTS

Convert text to speech and play (for LLM streaming output).

err := client.StreamTTS(text, speaker, playID, endOfStream, autoHangup, option, waitInputTimeout)

Tip: StreamTTS differs from TTS only in that the resulting TTS command is sent with streaming = true; everything else is the same.

See Streaming TTS.
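
One way to feed LLM streaming output into StreamTTS is to forward it sentence by sentence and set endOfStream only on the final command. The sketch below is a standalone illustration: `sendFunc` is a hypothetical stand-in for a closure around `client.StreamTTS`, and splitting on `.`, `!`, `?` is an assumed chunking rule, not something the SDK prescribes.

```go
package main

import (
	"fmt"
	"strings"
)

// sendFunc stands in for a closure around client.StreamTTS (hypothetical).
type sendFunc func(text, playID string, endOfStream bool) error

// speakStream forwards LLM tokens sentence by sentence. The last complete
// sentence is held back so that the final command can carry endOfStream = true.
func speakStream(tokens <-chan string, playID string, send sendFunc) error {
	var buf strings.Builder
	pending := "" // last complete sentence, not yet sent
	for tok := range tokens {
		buf.WriteString(tok)
		if !strings.ContainsAny(tok, ".!?") {
			continue
		}
		if pending != "" {
			if err := send(pending, playID, false); err != nil {
				return err
			}
		}
		pending = buf.String()
		buf.Reset()
	}
	// Whatever is left is the final command for this playId.
	return send(pending+buf.String(), playID, true)
}

func main() {
	tokens := make(chan string, 5)
	for _, t := range []string{"Hello", " world.", " How", " are you?", " Bye"} {
		tokens <- t
	}
	close(tokens)

	_ = speakStream(tokens, "reply", func(text, playID string, endOfStream bool) error {
		fmt.Printf("StreamTTS(%q, playID=%s, endOfStream=%v)\n", text, playID, endOfStream)
		return nil
	})
}
```

Using the same playID for every chunk keeps all commands on one TTS Track, in line with the playId rules described for TTS.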

Play - Play Audio

Play audio file:

err := client.Play(url, autoHangup, waitInputTimeout)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | | Audio file URL |
| autoHangup | bool | | Whether to automatically hang up after playback completes |
| waitInputTimeout | *uint32 | | Wait for input timeout (seconds) |
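
Several parameters in this SDK are pointers (for example waitInputTimeout is a *uint32 and the Mute track ID is a *string), and Go cannot take the address of a literal directly. A small generic helper makes such calls less verbose. `ptr` is a hypothetical helper, not part of the SDK, and it needs Go 1.18 or later.

```go
package main

import "fmt"

// ptr returns a pointer to a copy of v (hypothetical helper, not part of the SDK).
func ptr[T any](v T) *T {
	return &v
}

func main() {
	timeout := ptr(uint32(10)) // e.g. a waitInputTimeout value
	trackID := ptr("track-123") // e.g. a track ID for Mute or Unmute
	fmt.Println(*timeout, *trackID)
}
```

A call could then look like `client.Play(url, false, ptr(uint32(10)))`, assuming the Play signature listed above.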

Interrupt - Interrupt Playback

Interrupt current TTS or audio playback:

err := client.Interrupt(graceful)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| graceful | bool | | Whether to interrupt gracefully (wait for the current TTS command to finish playing before exiting) |

Note:
  • If TTS has not finished playing, the server returns an Interruption event.
  • With graceful = true, the TTS Track waits for the current TTS command to finish playing before exiting; otherwise it exits immediately.
  • graceful only takes effect for non-streaming TTS (streaming = false).

See Interruption

Example:

client.OnSpeaking = func(event rustpbxgo.SpeakingEvent) {
// Immediately interrupt TTS when user speaks
client.Interrupt(false)
}

Hangup - Hangup Call

Once the call is established, use Hangup to end it:

err := client.Hangup(reason)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| reason | string | | Hangup reason |

Refer - Transfer Call

Transfer the call to another target, for example to hand it off to a human agent. See: Transfer Call.

err := client.Refer(caller, callee, options)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| caller | string | | Transfer caller SIP address |
| callee | string | | Transfer target SIP URI |
| options | *ReferOption | | Transfer options, see ReferOption |

Mute - Mute

Mute all or specified Tracks:

err := client.Mute(trackID)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| trackID | *string | Track ID (if nil, mute all Tracks) |

Example:

// Mute all Tracks
client.Mute(nil)

// Mute specific Track
trackID := "track-123"
client.Mute(&trackID)

Unmute - Unmute

Unmute all or specified Tracks:

err := client.Unmute(trackID)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| trackID | *string | Track ID (if nil, unmute all Tracks) |

Event Callbacks

Client defines several fields whose names start with On. Assign a function to one of them to handle the corresponding event.

OnAnswer - Answer Callback

Trigger: Triggered when the call is answered and SDP negotiation is complete. See: AnswerEvent.

Purpose: Perform initialization once the call is established. AnswerEvent contains the SDP answer.

client.OnAnswer = func(event rustpbxgo.AnswerEvent) {
log.Printf("Call answered: %s", event.TrackID)
// Start sending welcome message
client.TTS("Hello, welcome to call", "", "welcome", true, false, nil, nil)
}

OnReject - Reject Callback

Trigger: Triggered when the call is rejected. See RejectEvent.

Purpose: Handle rejection logic, record rejection reason (event.Reason) or perform follow-up processing.

client.OnReject = func(event rustpbxgo.RejectEvent) {
log.Printf("Call rejected: %s", event.Reason)
// Record rejection reason, clean up resources
}

OnRinging - Ringing Callback

Trigger: Triggered when the call is ringing (SIP calls). See RingingEvent.

Purpose: Monitor call progress, determine if early media (EarlyMedia) is available.

client.OnRinging = func(event rustpbxgo.RingingEvent) {
log.Printf("Ringing, early media: %v", event.EarlyMedia)
}

OnHangup - Hangup Callback

Trigger: Triggered when the call ends. See HangupEvent.

Purpose: Clean up resources and save call records. HangupEvent provides the hangup reason (event.Reason), the timestamps needed to compute the call duration (event.StartTime and event.HangupTime), caller/callee information (event.From, event.To), and more.

client.OnHangup = func(event rustpbxgo.HangupEvent) {
log.Printf("Call ended: %s, initiator: %s", event.Reason, event.Initiator)
// Save call record, clean up resources
}
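
StartTime and HangupTime are strings in HangupEvent, so the call duration has to be computed after parsing them. The sketch below assumes both values are RFC 3339 timestamps; that format is an assumption, not something this page states, so adjust the layout if the server sends something else. `callDuration` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// callDuration parses the two timestamps and returns their difference.
// Assumption: both strings are RFC 3339 (change the layout if they are not).
func callDuration(startTime, hangupTime string) (time.Duration, error) {
	start, err := time.Parse(time.RFC3339, startTime)
	if err != nil {
		return 0, fmt.Errorf("parse startTime: %w", err)
	}
	end, err := time.Parse(time.RFC3339, hangupTime)
	if err != nil {
		return 0, fmt.Errorf("parse hangupTime: %w", err)
	}
	return end.Sub(start), nil
}

func main() {
	d, err := callDuration("2024-01-01T10:00:00Z", "2024-01-01T10:02:30Z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d) // 2m30s
}
```

Inside OnHangup this would be called as `callDuration(event.StartTime, event.HangupTime)`; both fields are tagged omitempty, so an empty value should be expected and handled as a parse error.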

OnSpeaking - Speaking Callback

Trigger: Triggered when VAD detects user starts speaking. See SpeakingEvent.

Purpose: Detect user input, commonly used to interrupt TTS playback. Can get speech start time (event.StartTime) from the event.

client.OnSpeaking = func(event rustpbxgo.SpeakingEvent) {
log.Printf("User speaking detected, interrupt playback")
// Immediately interrupt current TTS
client.Interrupt(false)
}

OnSilence - Silence Callback

Trigger: Triggered when the user is detected to have stopped speaking. See SilenceEvent.

Purpose: Determine if user has finished speaking, can combine silence duration (event.Duration) to decide whether to start processing.

client.OnSilence = func(event rustpbxgo.SilenceEvent) {
log.Printf("Silence detected, duration %d ms", event.Duration)
// User might have finished speaking, prepare to process
}

OnAsrFinal - ASR Final Result Callback

Trigger: Triggered when speech recognition produces a stable result. See AsrFinalEvent.

Purpose: Get user's final speech input (event.Text), used for business logic processing or sending to LLM. Can distinguish different speech segments through sequence number (event.Index).

client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
	log.Printf("User said: %s", event.Text)
	// Send user input to the LLM; callLLM is a placeholder for your own integration
	response := callLLM(event.Text)
	client.TTS(response, "", "reply", true, false, nil, nil)
}

OnAsrDelta - ASR Delta Result Callback

Trigger: Triggered when an intermediate recognition result is available; its content may still change. See AsrDeltaEvent.

Purpose: Display recognition progress in real-time, improve user experience. Should not be used directly for business logic processing.

client.OnAsrDelta = func(event rustpbxgo.AsrDeltaEvent) {
log.Printf("Recognizing: %s", event.Text)
// Only for display, no business processing
}

OnTrackStart - Track Start Callback

Trigger: Triggered when a Track starts (RTP, TTS, file playback, etc.). See TrackStartEvent.

Purpose: Monitor audio playback start. For a TTS Track, the TTS command's playId is available via event.PlayId; for a Play Track, you can get the playback URL.

client.OnTrackStart = func(event rustpbxgo.TrackStartEvent) {
	playID := ""
	if event.PlayId != nil { // PlayId is optional
		playID = *event.PlayId
	}
	log.Printf("Track started: %s, PlayID: %s", event.TrackID, playID)
}

OnTrackEnd - Track End Callback

Trigger: Triggered when a Track ends (RTP ends, TTS completes, file playback completes, etc.). See TrackEndEvent.

Purpose: Monitor audio playback end; useful for controlling playback flow or cleaning up resources. The event provides the duration (event.Duration) and the play ID (event.PlayId).

client.OnTrackEnd = func(event rustpbxgo.TrackEndEvent) {
	playID := ""
	if event.PlayId != nil { // PlayId is optional
		playID = *event.PlayId
	}
	log.Printf("Track ended: %s, duration: %d ms, PlayID: %s",
		event.TrackID, event.Duration, playID)
	// TTS playback completed, can send next one
}
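
The "send the next one when playback completes" idea can be sketched as a small queue that advances on Track End. Everything here is hypothetical (`ttsQueue`, `speakFunc`, the `item-N` play IDs); `speak` stands in for a closure that calls `client.TTS` with endOfStream = true so that each item produces its own Track End event.

```go
package main

import "fmt"

// speakFunc stands in for a closure around client.TTS (hypothetical).
type speakFunc func(text, playID string) error

// ttsQueue plays queued texts one at a time. It is not goroutine-safe:
// guard it with a mutex if enqueue is called outside the event callbacks.
type ttsQueue struct {
	speak   speakFunc
	pending []string
	next    int    // sequence number used to build play IDs
	current string // play ID of the item being played, "" when idle
}

// enqueue adds a text and starts playback if nothing is playing.
func (q *ttsQueue) enqueue(text string) error {
	q.pending = append(q.pending, text)
	if q.current == "" {
		return q.playNext()
	}
	return nil
}

// playNext sends the next pending text, or marks the queue idle.
func (q *ttsQueue) playNext() error {
	if len(q.pending) == 0 {
		q.current = ""
		return nil
	}
	text := q.pending[0]
	q.pending = q.pending[1:]
	q.next++
	q.current = fmt.Sprintf("item-%d", q.next)
	if err := q.speak(text, q.current); err != nil {
		q.current = "" // do not wait for a Track End that will never come
		return err
	}
	return nil
}

// onTrackEnd advances the queue when the finished track is the current item.
func (q *ttsQueue) onTrackEnd(playID *string) error {
	if playID == nil || *playID != q.current {
		return nil // RTP track or an unrelated playback
	}
	return q.playNext()
}

func main() {
	q := &ttsQueue{speak: func(text, playID string) error {
		fmt.Printf("TTS %q with playId %s\n", text, playID)
		return nil
	}}
	q.enqueue("First sentence.")
	q.enqueue("Second sentence.")
	id := "item-1"
	q.onTrackEnd(&id) // simulates the Track End event for the first item
}
```

In a real handler, `q.onTrackEnd(event.PlayId)` would be called from OnTrackEnd.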

OnInterruption - Interruption Callback

Trigger: Triggered when an Interrupt command is received while TTS playback is unfinished. See InterruptionEvent.

Purpose: Get interruption information, such as the time already played (event.Current) and the text already played (event.Subtitle, if the provider supports subtitles).

client.OnInterruption = func(event rustpbxgo.InterruptionEvent) {
if event.Subtitle != nil {
log.Printf("Playback interrupted, played: %s", *event.Subtitle)
}
// Record interruption position for subsequent processing
}
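
Because Subtitle is a pointer and TotalDuration may be zero, it helps to guard both before using them. The sketch below summarizes an interruption with a local `interruption` struct that copies the relevant InterruptionEvent fields; `describe` is a hypothetical helper.

```go
package main

import "fmt"

// interruption copies the InterruptionEvent fields used in this sketch.
type interruption struct {
	Subtitle      *string
	TotalDuration uint32 // milliseconds
	Current       uint32 // milliseconds already played
}

// describe summarizes how far playback got before the interruption.
func describe(ev interruption) string {
	progress := 0.0
	if ev.TotalDuration > 0 { // avoid dividing by zero
		progress = float64(ev.Current) / float64(ev.TotalDuration)
	}
	text := "(no subtitle)"
	if ev.Subtitle != nil { // only set when the provider supports subtitles
		text = *ev.Subtitle
	}
	return fmt.Sprintf("%.0f%% played, subtitle: %s", progress*100, text)
}

func main() {
	played := "Hello, welcome"
	fmt.Println(describe(interruption{Subtitle: &played, TotalDuration: 2000, Current: 500}))
	fmt.Println(describe(interruption{}))
}
```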

OnDTMF - DTMF Callback

Trigger: Triggered when a keypress is detected. See DTMFEvent.

Purpose: Handle user keypress input, such as IVR menu selection. Can get keypress value (0-9, *, #, A-D) from event.Digit.

client.OnDTMF = func(event rustpbxgo.DTMFEvent) {
log.Printf("User pressed: %s", event.Digit)
// Handle keypress logic, such as menu navigation
if event.Digit == "1" {
client.TTS("You selected option 1", "", "menu", true, false, nil, nil)
}
}
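
IVR flows often need more than one digit, for example an account number terminated by #. A minimal sketch of a collector that could be fed from OnDTMF follows; `digitCollector` and the choice of # as terminator are assumptions for illustration only.

```go
package main

import "fmt"

// digitCollector accumulates DTMF digits until the terminator arrives.
type digitCollector struct {
	terminator string
	digits     string
}

// push records one digit. When the terminator is pressed it returns the
// collected input and true, and resets itself for the next entry.
func (c *digitCollector) push(digit string) (string, bool) {
	if digit == c.terminator {
		input := c.digits
		c.digits = ""
		return input, true
	}
	c.digits += digit
	return "", false
}

func main() {
	c := &digitCollector{terminator: "#"}
	for _, d := range []string{"1", "2", "3", "#"} {
		if input, done := c.push(d); done {
			fmt.Println("entered:", input) // entered: 123
		}
	}
}
```

Inside the callback this would be `c.push(event.Digit)`.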

OnError - Error Callback

Trigger: Triggered when an error occurs. See ErrorEvent.

Purpose: Handle error situations. event.Sender identifies the error source (asr, tts, media, etc.) and event.Error carries the error message.

client.OnError = func(event rustpbxgo.ErrorEvent) {
log.Printf("Error [%s]: %s", event.Sender, event.Error)
// Log error, perform fallback handling
}

OnMetrics - Metrics Callback

Trigger: Triggered when performance metrics are collected. See MetricsEvent.

Purpose: Monitor performance metrics. Can get metric name (event.Key) and duration (event.Duration) from the event.

client.OnMetrics = func(event rustpbxgo.MetricsEvent) {
log.Printf("Metric [%s]: %d ms", event.Key, event.Duration)
// Record performance data for analysis
}

OnClose - Connection Close

Trigger: Triggered when WebSocket connection closes.

Purpose: Handle connection disconnection logic, clean up resources or attempt reconnection.

client.OnClose = func(reason string) {
log.Printf("Connection closed: %s", reason)
// Clean up resources or reconnect
}
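
If reconnection is attempted from OnClose, retrying with exponential backoff avoids hammering the server. The sketch below is generic: `reconnect` and `errUnavailable` are hypothetical, and `connect` stands in for whatever the application does to re-establish a session (for instance creating a new client and calling Connect; whether an existing client can be reconnected is not stated on this page).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable is a sample error used by the demo in main.
var errUnavailable = errors.New("server unavailable")

// reconnect retries connect with exponential backoff (hypothetical helper).
// The delay doubles after every failed attempt, starting from base.
func reconnect(connect func() error, maxAttempts int, base time.Duration) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	delay := base
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = connect(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	err := reconnect(func() error {
		calls++
		if calls < 3 {
			return errUnavailable
		}
		return nil
	}, 5, 10*time.Millisecond)
	fmt.Println(calls, err) // 3 <nil>
}
```

Running the retry loop in its own goroutine keeps the OnClose callback from blocking while the delays elapse.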

OnEvent - Generic Event Handler

Trigger: Triggered when any event is received.

Purpose: Log all events or handle undefined special events. Receives raw event type (event) and JSON data (payload).

client.OnEvent = func(event string, payload string) {
log.Printf("Event received [%s]: %s", event, payload)
// Generic event logging or special handling
}
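
For event types without a dedicated callback, the raw payload passed to OnEvent can be decoded by hand. The sketch below decodes an end-of-utterance payload into a local copy of the EouEvent fields; the event name "eou" is an assumption (this page does not list the type strings), and `handleRaw` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eouEvent copies the EouEvent fields from the Event Types section.
type eouEvent struct {
	TrackID   string `json:"trackId"`
	Timestamp uint64 `json:"timestamp"`
	Complete  bool   `json:"complete"`
}

// handleRaw decodes payloads for event types that have no dedicated callback.
// Assumption: the event name for EouEvent is "eou".
func handleRaw(event, payload string) (string, error) {
	switch event {
	case "eou":
		var ev eouEvent
		if err := json.Unmarshal([]byte(payload), &ev); err != nil {
			return "", fmt.Errorf("decode %s: %w", event, err)
		}
		return fmt.Sprintf("end of utterance on %s (complete=%v)", ev.TrackID, ev.Complete), nil
	default:
		return "ignored " + event, nil
	}
}

func main() {
	msg, err := handleRaw("eou", `{"event":"eou","trackId":"t1","timestamp":1700000000000,"complete":true}`)
	fmt.Println(msg, err)
}
```

The function has the same parameter list as the OnEvent callback, so it can be called from it directly.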

Options

CallOption

Call configuration options:

type CallOption struct {
Denoise bool
Offer string
Callee string
Caller string
Recorder *RecorderOption
VAD *VADOption
ASR *ASROption
TTS *TTSOption
HandshakeTimeout string
EnableIPv6 bool
Sip *SipOption
Extra map[string]string
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Denoise | bool | Whether to enable noise reduction |
| Offer | string | SDP offer string for WebRTC/SIP negotiation |
| Callee | string | Callee's SIP URI or phone number |
| Caller | string | Caller's SIP URI or phone number |
| Recorder | *RecorderOption | Call recording configuration, see RecorderOption |
| VAD | *VADOption | Voice activity detection configuration, see VADOption |
| ASR | *ASROption | Automatic Speech Recognition (ASR) configuration, see ASROption |
| TTS | *TTSOption | Text-to-Speech configuration, see TTSOption |
| HandshakeTimeout | string | Connection handshake timeout |
| EnableIPv6 | bool | Enable IPv6 support |
| Sip | *SipOption | SIP registration account, password, and domain configuration, see SipOption |
| Extra | map[string]string | Additional parameters |

RecorderOption

Recording configuration options:

type RecorderOption struct {
RecorderFile string
Samplerate int
Ptime int
}

Field Description:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| RecorderFile | string | - | Recording file path |
| Samplerate | int | 16000 | Sample rate (Hz) |
| Ptime | int | 200 | Packet time (milliseconds) |

ASROption

Speech recognition configuration options:

type ASROption struct {
Provider string
Model string
Language string
AppID string
SecretID string
SecretKey string
ModelType string
BufferSize int
SampleRate uint32
Endpoint string
Extra map[string]string
StartWhenAnswer bool
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Provider | string | ASR provider: tencent, aliyun, deepgram, etc. |
| Model | string | Model name |
| Language | string | Language (e.g. zh-CN, en-US), see the corresponding provider documentation for details |
| AppID | string | Tencent Cloud's appId |
| SecretID | string | Tencent Cloud's secretId |
| SecretKey | string | Tencent Cloud's secretKey, or another provider's API key |
| ModelType | string | ASR model type (e.g. 16k_zh, 8k_en), see provider documentation for details |
| BufferSize | int | Audio buffer size, unit: bytes |
| SampleRate | uint32 | Sample rate |
| Endpoint | string | Custom service endpoint URL |
| Extra | map[string]string | Provider-specific parameters |
| StartWhenAnswer | bool | Request the ASR service after the call is answered |

TTSOption

Text-to-speech synthesis configuration options:

type TTSOption struct {
Samplerate int32
Provider string
Speed float32
AppID string
SecretID string
SecretKey string
Volume int32
Speaker string
Codec string
Subtitle bool
Emotion string
Endpoint string
Extra map[string]string
WaitInputTimeout uint32
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Samplerate | int32 | Sample rate, unit: Hz |
| Provider | string | TTS provider: tencent, aliyun, deepgram, voiceapi |
| Speed | float32 | Speech rate |
| AppID | string | Tencent Cloud's appId |
| SecretID | string | Tencent Cloud's secretId |
| SecretKey | string | Tencent Cloud's secretKey, or another provider's API key |
| Volume | int32 | Volume (1-10) |
| Speaker | string | Voice, see provider documentation |
| Codec | string | Encoding format |
| Subtitle | bool | Whether to enable subtitles |
| Emotion | string | Emotion: neutral, happy, sad, angry, etc. |
| Endpoint | string | Custom TTS service endpoint URL |
| Extra | map[string]string | Provider-specific parameters |
| WaitInputTimeout | uint32 | Maximum time to wait for user input (milliseconds) |

VADOption

Voice activity detection configuration options:

type VADOption struct {
Type string
Samplerate uint32
SpeechPadding uint64
SilencePadding uint64
Ratio float32
VoiceThreshold float32
MaxBufferDurationSecs uint64
Endpoint string
SecretKey string
SecretID string
SilenceTimeout uint
}

Field Description:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| Type | string | webrtc | VAD algorithm type: silero, ten, webrtc |
| Samplerate | uint32 | 16000 | Sample rate |
| SpeechPadding | uint64 | 250 | Detection starts speechPadding milliseconds after speech begins |
| SilencePadding | uint64 | 100 | Silence event trigger interval, unit: milliseconds |
| Ratio | float32 | 0.5 | Speech detection ratio threshold |
| VoiceThreshold | float32 | 0.5 | Voice energy threshold |
| MaxBufferDurationSecs | uint64 | 50 | Maximum buffer duration, unit: seconds |
| Endpoint | string | - | Custom VAD service endpoint |
| SecretKey | string | - | VAD service authentication key |
| SecretID | string | - | VAD service authentication ID |
| SilenceTimeout | uint | 5000 | Silence detection timeout, unit: milliseconds |

SipOption

SIP configuration options:

type SipOption struct {
Username string
Password string
Realm string
Headers map[string]string
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Username | string | SIP username for authentication |
| Password | string | SIP password for authentication |
| Realm | string | SIP domain/realm for authentication |
| Headers | map[string]string | Additional SIP protocol headers (key-value pairs) |

ReferOption

Transfer configuration options:

type ReferOption struct {
Denoise bool
Timeout uint32
MusicOnHold string
AutoHangup bool
Sip *SipOption
ASR *ASROption
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Denoise | bool | Whether to enable noise reduction |
| Timeout | uint32 | Timeout (seconds) |
| MusicOnHold | string | Hold music URL |
| AutoHangup | bool | Automatically hang up after the transfer completes |
| Sip | *SipOption | SIP configuration |
| ASR | *ASROption | ASR configuration |

Event Types

All event type definitions supported by Client:

Event

Base event structure, contains event type name.

type Event struct {
Event string `json:"event"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Event | string | Event type name |

IncomingEvent

Incoming call event, triggered when there is a new incoming call.

type IncomingEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Caller string `json:"caller"`
Callee string `json:"callee"`
Sdp string `json:"sdp"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Caller | string | Caller number |
| Callee | string | Callee number |
| Sdp | string | SDP offer string |

AnswerEvent

Answer event, triggered when the call is answered and SDP negotiation is complete.

type AnswerEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Sdp string `json:"sdp"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Sdp | string | SDP answer string |

RejectEvent

Reject event, triggered when the call is rejected.

type RejectEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Reason string `json:"reason"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Reason | string | Rejection reason |

RingingEvent

Ringing event, triggered when the call is ringing (SIP calls).

type RingingEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
EarlyMedia bool `json:"earlyMedia"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| EarlyMedia | bool | Whether early media is available |

HangupEvent

Hangup event, triggered when the call ends.

type HangupEventAttendee struct {
Username string `json:"username"`
Realm string `json:"realm"`
Source string `json:"source"`
}

type HangupEvent struct {
Timestamp uint64 `json:"timestamp"`
Reason string `json:"reason"`
Initiator string `json:"initiator"`
StartTime string `json:"startTime,omitempty"`
HangupTime string `json:"hangupTime,omitempty"`
AnswerTime *string `json:"answerTime,omitempty"`
RingingTime *string `json:"ringingTime,omitempty"`
From *HangupEventAttendee `json:"from,omitempty"`
To *HangupEventAttendee `json:"to,omitempty"`
Extra map[string]any `json:"extra,omitempty"`
}

HangupEvent Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Reason | string | Hangup reason |
| Initiator | string | Party that initiated the hangup |
| StartTime | string | Call start time |
| HangupTime | string | Hangup time |
| AnswerTime | *string | Answer time |
| RingingTime | *string | Ringing time |
| From | *HangupEventAttendee | Caller information |
| To | *HangupEventAttendee | Callee information |
| Extra | map[string]any | Additional information |

HangupEventAttendee Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Username | string | Username |
| Realm | string | Domain |
| Source | string | Source |

SpeakingEvent

Speaking event, triggered when VAD detects user starts speaking.

type SpeakingEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
StartTime uint64 `json:"startTime"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| StartTime | uint64 | Speech start time (milliseconds) |

SilenceEvent

Silence event, triggered when the user is detected to have stopped speaking.

type SilenceEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
StartTime uint64 `json:"startTime"`
Duration uint64 `json:"duration"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| StartTime | uint64 | Silence start time (milliseconds) |
| Duration | uint64 | Silence duration (milliseconds) |

EouEvent

End of utterance event, triggered when end of speech is detected.

type EouEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Complete bool `json:"complete"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Complete | bool | Whether the utterance ended completely |

AsrFinalEvent

ASR final result event, triggered when speech recognition obtains stable result.

type AsrFinalEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Index uint32 `json:"index"`
StartTime *uint64 `json:"startTime,omitempty"`
EndTime *uint64 `json:"endTime,omitempty"`
Text string `json:"text"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Index | uint32 | Speech segment index |
| StartTime | *uint64 | Speech start time (milliseconds) |
| EndTime | *uint64 | Speech end time (milliseconds) |
| Text | string | Recognized text content |

AsrDeltaEvent

ASR delta result event, intermediate result during speech recognition process, content may change.

type AsrDeltaEvent struct {
TrackID string `json:"trackId"`
Index uint32 `json:"index"`
Timestamp uint64 `json:"timestamp"`
StartTime *uint64 `json:"startTime,omitempty"`
EndTime *uint64 `json:"endTime,omitempty"`
Text string `json:"text"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Index | uint32 | Speech segment index |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| StartTime | *uint64 | Speech start time (milliseconds) |
| EndTime | *uint64 | Speech end time (milliseconds) |
| Text | string | Recognized text content (may change) |

TrackStartEvent

Track start event, triggered when a Track starts (RTP, TTS, file playback, etc.).

type TrackStartEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
PlayId *string `json:"playId,omitempty"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| PlayId | *string | Play ID (TTS/Play command) |

TrackEndEvent

Track end event, triggered when a Track ends (RTP ends, TTS completes, file playback completes, etc.).

type TrackEndEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Duration uint64 `json:"duration"`
PlayId *string `json:"playId,omitempty"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Duration | uint64 | Playback duration (milliseconds) |
| PlayId | *string | Play ID (TTS/Play command) |

InterruptionEvent

Interruption event, triggered when an Interrupt command is received while TTS playback is unfinished.

type InterruptionEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Subtitle *string `json:"subtitle,omitempty"`
Position *uint32 `json:"position,omitempty"`
TotalDuration uint32 `json:"totalDuration"`
Current uint32 `json:"current"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Subtitle | *string | Played subtitle text |
| Position | *uint32 | Playback position (character count) |
| TotalDuration | uint32 | Total duration (milliseconds) |
| Current | uint32 | Current playback duration (milliseconds) |

DTMFEvent

DTMF event, triggered when a keypress is detected.

type DTMFEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Digit string `json:"digit"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Digit | string | Keypress value (0-9, *, #, A-D) |

AnswerMachineDetectionEvent

Answering machine detection event, triggered when an answering machine is detected.

type AnswerMachineDetectionEvent struct {
Timestamp uint64 `json:"timestamp"`
StartTime uint64 `json:"startTime"`
EndTime uint64 `json:"endTime"`
Text string `json:"text"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| StartTime | uint64 | Detection start time (milliseconds) |
| EndTime | uint64 | Detection end time (milliseconds) |
| Text | string | Detected text |

LLMFinalEvent

LLM final result event, triggered when large language model generates final result.

type LLMFinalEvent struct {
Timestamp uint64 `json:"timestamp"`
Text string `json:"text"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Text | string | Final text generated by the LLM |

LLMDeltaEvent

LLM delta result event, triggered when large language model generates delta result.

type LLMDeltaEvent struct {
Timestamp uint64 `json:"timestamp"`
Word string `json:"word"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Word | string | Delta word generated by the LLM |

MetricsEvent

Metrics event, triggered when performance metrics are collected.

type MetricsEvent struct {
Timestamp uint64 `json:"timestamp"`
Key string `json:"key"`
Duration uint32 `json:"duration"`
Data map[string]any `json:"data"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Key | string | Metric name |
| Duration | uint32 | Duration (milliseconds) |
| Data | map[string]any | Metric data |

ErrorEvent

Error event, triggered when an error occurs.

type ErrorEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Sender string `json:"sender"`
Error string `json:"error"`
Code *uint32 `json:"code,omitempty"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Sender | string | Error source (asr, tts, media, etc.) |
| Error | string | Error information |
| Code | *uint32 | Error code |

AddHistoryEvent

Add history event, triggered when conversation history is added.

type AddHistoryEvent struct {
Sender string `json:"sender"`
Timestamp uint64 `json:"timestamp"`
Speaker string `json:"speaker"`
Text string `json:"text"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Sender | string | Sender |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Speaker | string | Speaker identifier |
| Text | string | Conversation text |

OtherEvent

Other event, used to handle undefined event types.

type OtherEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Sender string `json:"sender"`
Extra map[string]string `json:"extra,omitempty"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Sender | string | Sender |
| Extra | map[string]string | Additional information |

Complete Examples

SIP Call Example

package main

import (
"context"
"log"
"os"
"os/signal"
"syscall"

"github.com/restsend/rustpbxgo"
"github.com/sirupsen/logrus"
)

func main() {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

logger := logrus.New()
logger.SetLevel(logrus.InfoLevel)

// Create client
client := rustpbxgo.NewClient(
"ws://localhost:8080",
rustpbxgo.WithLogger(logger),
rustpbxgo.WithContext(ctx),
)

// Set event handlers
client.OnAnswer = func(event rustpbxgo.AnswerEvent) {
logger.Info("Call answered")
// Send welcome message
client.TTS("Hello, welcome to call", "", "greeting", true, false, nil, nil)
}

client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
logger.Infof("User said: %s", event.Text)
// Respond based on user input
client.TTS("I received your message", "", "response", true, false, nil, nil)
}

client.OnHangup = func(event rustpbxgo.HangupEvent) {
logger.Infof("Call ended: %s", event.Reason)
cancel()
}

// Connect to server
if err := client.Connect("sip"); err != nil {
log.Fatalf("Connection failed: %v", err)
}
defer client.Shutdown()

// Configure call
callOption := rustpbxgo.CallOption{
Caller: "sip:1000@example.com",
Callee: "sip:2000@example.com",
Denoise: true,
Sip: &rustpbxgo.SipOption{
Username: "user",
Password: "pass",
Realm: "example.com",
},
ASR: &rustpbxgo.ASROption{
Provider: "tencent",
Language: "zh-CN",
},
TTS: &rustpbxgo.TTSOption{
Provider: "tencent",
Speaker: "xiaoyan",
},
VAD: &rustpbxgo.VADOption{
Type: "webrtc",
SilenceTimeout: 5000,
},
}

// Initiate call
_, err := client.Invite(ctx, callOption)
if err != nil {
log.Fatalf("Call failed: %v", err)
}

// Wait for signal
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)

select {
case <-ctx.Done():
logger.Info("Call ended")
case <-sigChan:
logger.Info("Interrupt signal received")
client.Hangup("user_interrupt")
}
}

Answer Incoming Call Example

package main

import (
"context"
"encoding/json"
"log"
"net/http"

"github.com/restsend/rustpbxgo"
"github.com/sirupsen/logrus"
)

type WebhookRequest struct {
DialogID string `json:"dialogId"`
Caller string `json:"caller"`
Callee string `json:"callee"`
}

func main() {
logger := logrus.New()

// Set up Webhook handler
http.HandleFunc("/webhook", func(w http.ResponseWriter, r *http.Request) {
var req WebhookRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}

logger.Infof("Incoming call: %s -> %s", req.Caller, req.Callee)

// Handle incoming call
go handleIncomingCall(req.DialogID, req.Caller, req.Callee, logger)

w.WriteHeader(http.StatusOK)
})

log.Fatal(http.ListenAndServe(":8090", nil))
}

func handleIncomingCall(dialogID, caller, callee string, logger *logrus.Logger) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// Create client with dialogID
client := rustpbxgo.NewClient(
"ws://localhost:8080",
rustpbxgo.WithLogger(logger),
rustpbxgo.WithContext(ctx),
rustpbxgo.WithID(dialogID),
)

client.OnHangup = func(event rustpbxgo.HangupEvent) {
logger.Info("Call ended")
cancel()
}

// Connect to server
if err := client.Connect("sip"); err != nil {
logger.Errorf("Connection failed: %v", err)
return
}
defer client.Shutdown()

// Send ringing
recorder := &rustpbxgo.RecorderOption{
RecorderFile: "/recordings/" + dialogID + ".wav",
Samplerate: 16000,
}
client.Ringing("", recorder)

// Answer incoming call
callOption := rustpbxgo.CallOption{
Caller: caller,
Callee: callee,
ASR: &rustpbxgo.ASROption{
Provider: "tencent",
Language: "zh-CN",
},
TTS: &rustpbxgo.TTSOption{
Provider: "tencent",
Speaker: "xiaoyan",
},
}

if err := client.Accept(callOption); err != nil {
logger.Errorf("Answer failed: %v", err)
return
}

// Send welcome message
client.TTS("Hello, I am an intelligent assistant", "", "greeting", true, false, nil, nil)

// Wait for call to end
<-ctx.Done()
}
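
The webhook handler above can be exercised without a running RustPBX server. The sketch below uses the standard net/http/httptest package to post a payload to the handler in-process. The names newWebhookHandler and postWebhook are illustrative helpers, and the empty-dialogId check is an added assumption, not something the SDK requires.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// WebhookRequest matches the payload struct used in the example above.
type WebhookRequest struct {
	DialogID string `json:"dialogId"`
	Caller   string `json:"caller"`
	Callee   string `json:"callee"`
}

// newWebhookHandler decodes the payload and hands it to onCall.
func newWebhookHandler(onCall func(WebhookRequest)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req WebhookRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Added check (assumption): without a dialog ID the call cannot be answered.
		if req.DialogID == "" {
			http.Error(w, "missing dialogId", http.StatusBadRequest)
			return
		}
		onCall(req)
		w.WriteHeader(http.StatusOK)
	}
}

// postWebhook sends body to the handler in-process and returns the status code.
func postWebhook(h http.Handler, body string) int {
	r := httptest.NewRequest(http.MethodPost, "/webhook", strings.NewReader(body))
	w := httptest.NewRecorder()
	h.ServeHTTP(w, r)
	return w.Code
}

func main() {
	h := newWebhookHandler(func(req WebhookRequest) {
		fmt.Printf("incoming call %s: %s -> %s\n", req.DialogID, req.Caller, req.Callee)
	})
	fmt.Println(postWebhook(h, `{"dialogId":"d1","caller":"sip:1000@example.com","callee":"sip:2000@example.com"}`))
	fmt.Println(postWebhook(h, `not json`))
}
```

In a real handler, onCall would start handleIncomingCall in a goroutine, as the example above does.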

Streaming TTS Example (LLM Integration)

package main

import (
"bufio"
"context"
"encoding/json"
"log"
"net/http"

"github.com/restsend/rustpbxgo"
"github.com/sirupsen/logrus"
)

func streamLLMResponse(client *rustpbxgo.Client, userInput string) {
// Simulate calling LLM API
req, _ := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", nil)
req.Header.Set("Content-Type", "application/json")

resp, err := http.DefaultClient.Do(req)
if err != nil {
log.Printf("LLM request failed: %v", err)
return
}
defer resp.Body.Close()

playID := "llm-stream"
scanner := bufio.NewScanner(resp.Body)

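for scanner.Scan() {
line := scanner.Text()

// SSE payload lines start with "data:"; skip blank lines and other fields
if len(line) < 5 || line[:5] != "data:" {
continue
}

// Parse the JSON payload (simplified: a real LLM API nests the text deeper)
var data map[string]interface{}
if err := json.Unmarshal([]byte(line[5:]), &data); err != nil {
continue
}

// Get delta text
if content, ok := data["content"].(string); ok && content != "" {
// Send streaming TTS; mark the end of the stream on the final chunk
isEnd := data["finish_reason"] != nil
client.StreamTTS(content, "", playID, isEnd, false, nil, nil)
}
}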
}

func main() {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

logger := logrus.New()

client := rustpbxgo.NewClient(
"ws://localhost:8080",
rustpbxgo.WithLogger(logger),
rustpbxgo.WithContext(ctx),
)

client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
logger.Infof("User input: %s", event.Text)

// Interrupt current playback
client.Interrupt()

// Stream response
go streamLLMResponse(client, event.Text)
}

// ... other code
}
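
The example above simplifies the LLM response format. As a rough sketch of the parsing step alone, the helper below extracts the delta text from one server-sent-event line, assuming an OpenAI-style streaming chunk (choices[0].delta.content, a finish_reason field, and a final "[DONE]" line). parseSSELine and chatChunk are illustrative names and are not part of the SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// chatChunk models only the parts of an OpenAI-style streaming chunk used here.
type chatChunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
		FinishReason *string `json:"finish_reason"`
	} `json:"choices"`
}

// parseSSELine extracts the delta text from one SSE line.
// ok is false for lines without a usable payload (blank lines, other fields, bad JSON).
// done is true for the "[DONE]" sentinel or a chunk with a non-null finish_reason.
func parseSSELine(line string) (text string, done bool, ok bool) {
	if !strings.HasPrefix(line, "data:") {
		return "", false, false
	}
	payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
	if payload == "[DONE]" {
		return "", true, true
	}
	var chunk chatChunk
	if err := json.Unmarshal([]byte(payload), &chunk); err != nil || len(chunk.Choices) == 0 {
		return "", false, false
	}
	c := chunk.Choices[0]
	return c.Delta.Content, c.FinishReason != nil, true
}

func main() {
	lines := []string{
		`data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}`,
		``,
		`data: {"choices":[{"delta":{},"finish_reason":"stop"}]}`,
		`data: [DONE]`,
	}
	for _, l := range lines {
		text, done, ok := parseSSELine(l)
		fmt.Printf("ok=%v done=%v text=%q\n", ok, done, text)
	}
}
```

Returning done separately from text lets the caller signal the end of the stream even when the final chunk carries no text.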

Best Practices

Error Handling

Always check errors and handle them appropriately:

if err := client.Connect("sip"); err != nil {
log.Fatalf("Connection failed: %v", err)
}

if err := client.TTS("Hello", "", "1", true, false, nil, nil); err != nil {
log.Printf("TTS failed: %v", err)
// Retry or use fallback
}
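
One way to implement the "retry" branch is a small helper with exponential backoff. This is a generic sketch using only the standard library. The retry function and errTransient are illustrative names, and whether a failed command is safe to repeat depends on the command.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient stands in for a temporary failure in this demo.
var errTransient = errors.New("transient failure")

// retry calls fn up to attempts times, doubling the delay after each failure.
// It returns nil on the first success, otherwise the last error.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return err
}

func main() {
	calls := 0
	err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // fail the first two attempts
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

In the snippet above, fn would wrap the failing call, for example a closure that invokes client.TTS and returns its error.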

Resource Cleanup

Defer Shutdown as soon as Connect succeeds so that cleanup runs on every return path:

defer client.Shutdown()
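
Note that deferred calls do not run when the program exits through log.Fatal or os.Exit. A common Go pattern, sketched here with a stand-in cleanup function instead of a real client, is to keep the logic in a run function that returns an error and to exit only from main.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// run holds the program logic. Because it returns instead of exiting,
// its deferred cleanup always executes.
func run(cleanup func(), fail bool) error {
	defer cleanup() // stands in for: defer client.Shutdown()
	if fail {
		return errors.New("call failed")
	}
	return nil
}

func main() {
	err := run(func() { fmt.Println("cleanup ran") }, false)
	if err != nil {
		// Exiting here is safe: run has already returned, so its defers have run.
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```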

Context Management

Use a context to control the client's lifecycle. Cancelling it, or letting it time out, closes the goroutines the client created:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()

client := rustpbxgo.NewClient(
endpoint,
rustpbxgo.WithContext(ctx),
)
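
Application goroutines can follow the same context so that they stop together with the client. The sketch below shows the usual select loop on ctx.Done() with a stand-in job channel. worker and runDemo are illustrative names and do not reflect the SDK's internals.

```go
package main

import (
	"context"
	"fmt"
)

// worker handles jobs until the context is cancelled or the channel is closed.
func worker(ctx context.Context, jobs <-chan string, handle func(string)) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case j, ok := <-jobs:
			if !ok {
				return nil
			}
			handle(j)
		}
	}
}

// runDemo feeds two jobs to a worker, cancels the context, and reports
// what was handled and why the worker stopped.
func runDemo() ([]string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var handled []string
	jobs := make(chan string) // unbuffered: each send waits for the worker
	errc := make(chan error, 1)
	go func() {
		errc <- worker(ctx, jobs, func(j string) { handled = append(handled, j) })
	}()

	jobs <- "a"
	jobs <- "b"
	cancel()      // the same signal the client receives via WithContext
	err := <-errc // wait for the worker to stop before reading handled
	return handled, err
}

func main() {
	handled, err := runDemo()
	fmt.Println(handled, err)
}
```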

Logging

Use appropriate log levels:

logger := logrus.New()
logger.SetLevel(logrus.InfoLevel) // Production environment
// logger.SetLevel(logrus.DebugLevel) // Development environment

Event Handling

Avoid long-running operations in event handlers; run them in goroutines instead:

client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
go func() {
// Long-running operation
response := processWithLLM(event.Text)
client.TTS(response, "", "reply", true, false, nil, nil)
}()
}
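
Starting one goroutine per event is simple, but replies may finish out of order and the number of goroutines is unbounded. An alternative, sketched here with the standard library only, is a bounded queue drained by a single worker: the handler enqueues without blocking and returns at once. tryEnqueue and startWorker are illustrative names, and the queue size is an arbitrary choice.

```go
package main

import "fmt"

// tryEnqueue offers item to the queue without blocking.
// It returns false when the queue is full so the caller can drop or log the item.
func tryEnqueue(queue chan<- string, item string) bool {
	select {
	case queue <- item:
		return true
	default:
		return false
	}
}

// startWorker processes queued items one at a time, in arrival order,
// and closes the returned channel once the queue is closed and drained.
func startWorker(queue <-chan string, handle func(string)) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for item := range queue {
			handle(item)
		}
	}()
	return done
}

func main() {
	queue := make(chan string, 8) // bounded: at most 8 pending transcripts
	done := startWorker(queue, func(text string) {
		// Stand-in for the slow step (LLM call followed by a TTS command).
		fmt.Println("processing:", text)
	})

	// An event handler would only do this and then return immediately.
	for _, text := range []string{"hello", "what is my balance"} {
		if !tryEnqueue(queue, text) {
			fmt.Println("queue full, dropping:", text)
		}
	}

	close(queue)
	<-done
}
```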

Troubleshooting

Connection Issues

If the client cannot connect to the server:

  1. Check that the endpoint URL is correct
  2. Confirm the server is running
  3. Check firewall settings
  4. Enable debug logging to see detailed information:

logger.SetLevel(logrus.DebugLevel)
client := rustpbxgo.NewClient(
endpoint,
rustpbxgo.WithLogger(logger),
rustpbxgo.WithDumpEvents(true),
)
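
The first three checks can be partly automated before the SDK is involved. The sketch below validates the endpoint URL and then tries a plain TCP connection to it. It assumes the endpoint uses the ws or wss scheme, as in the examples on this page. checkEndpoint and probe are illustrative helpers.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"time"
)

// checkEndpoint validates a WebSocket endpoint URL and returns the host:port to dial.
func checkEndpoint(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", fmt.Errorf("parse endpoint: %w", err)
	}
	if u.Scheme != "ws" && u.Scheme != "wss" {
		return "", fmt.Errorf("unexpected scheme %q, want ws or wss", u.Scheme)
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("endpoint %q has no host", endpoint)
	}
	port := u.Port()
	if port == "" {
		// Default ports for ws and wss.
		port = "80"
		if u.Scheme == "wss" {
			port = "443"
		}
	}
	return net.JoinHostPort(u.Hostname(), port), nil
}

// probe reports whether a TCP connection to addr can be opened within timeout.
func probe(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	addr, err := checkEndpoint("ws://localhost:8080")
	if err != nil {
		fmt.Println("bad endpoint:", err)
		return
	}
	if err := probe(addr, 2*time.Second); err != nil {
		fmt.Println("server not reachable:", err)
		return
	}
	fmt.Println("TCP port reachable:", addr)
}
```

A successful probe only shows that the port accepts connections. It does not confirm that the WebSocket handshake will succeed.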

ASR Not Working

If ASR cannot recognize speech:

  1. Confirm ASR configuration is correct
  2. Check if API keys are valid
  3. Verify sample rate settings
  4. Check VAD configuration

TTS No Sound

If TTS playback produces no sound:

  1. Check if TTS configuration is correct
  2. Verify speaker parameter
  3. Confirm endOfStream is set correctly
  4. Check network connection

More Resources

Reference

For complete example code, see the cmd/ directory of the RustPBXGo repository: github.com/restsend/rustpbxgo