Go SDK
Download Source Code
Download from the RustPBXGo GitHub repository:
git clone github.com/restsend/rustpbxgo
Directory Structure
rustpbxgo/
├── README.md
├── client.go # SDK core implementation
├── cmd/ # Example application
│ ├── main.go # Program entry point
│ ├── llm.go # Large model interaction logic
│ ├── media.go # WebRTC media processing
│ └── webhook.go # Webhook handling
├── go.mod
└── go.sum
client.go contains the definitions of core data structures (commands, events, and callback functions).
The cmd/ directory contains an application example with SIP/WebRTC calls, incoming call handling with webhooks, and large model interaction logic.
Client
The Client struct fields mainly include:
endpoint: RustPBX listening address.id: Set the connection session ID, mainly used for answering calls.OnXXX: Callback functions for handling events.
Flow Diagram
When the client calls the Connect method, it creates two goroutines (green parts in the diagram below).
- One is responsible for reading and parsing WebSocket messages (top)
- The other is responsible for processing messages and sending commands (bottom). When an event is received, it calls the corresponding callback function based on the event type.
Creating a Client
Use the NewClient function to create a client instance:
client := rustpbxgo.NewClient(endpoint, opts...)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
endpoint | string | ✅ | RustPBX server address |
opts | ...ClientOption | ❌ | Optional configuration options |
Options:
| Option | Parameter Type | Description |
|---|---|---|
WithLogger(logger) | *logrus.Logger | Set logger |
WithContext(ctx) | context.Context | Set context for closing created goroutines |
WithID(id) | string | Set session ID for answering calls |
WithDumpEvents(enable) | bool | Enable event dumping |
Example:
client := rustpbxgo.NewClient(
"ws://localhost:8080",
rustpbxgo.WithLogger(logger),
rustpbxgo.WithContext(ctx),
rustpbxgo.WithID("my-session-id"),
rustpbxgo.WithDumpEvents(true),
)
Connecting to Server
Use the Connect method to connect to RustPBX:
err := client.Connect(callType)
Parameters:
| Parameter | Type | Optional Values | Description |
|---|---|---|---|
callType | string | "sip", "webrtc", "" | Call type |
Closing Client
Use the Shutdown method to close the client connection:
err := client.Shutdown()
Sending Commands
The client provides various methods for sending commands to the server.
Invite - Initiate Call
Initiate a call, returns AnswerEvent or error. See: Initiate Call.
answer, err := client.Invite(ctx, callOption)
Parameters:
| Parameter | Type | Description |
|---|---|---|
ctx | context.Context | Context for cancellation |
callOption | CallOption | Call configuration, see CallOption |
Return Values:
| Type | Description |
|---|---|
*AnswerEvent | Answer event (if successful) |
error | Error information |
Example:
// sip call
callOption := rustpbxgo.CallOption{
Caller: "sip:alice@example.com",
Callee: "sip:bob@example.com",
}
answer, err := client.Invite(ctx, callOption)
if err != nil {
log.Fatalf("Call failed: %v", err)
}
Accept - Answer Incoming Call
Answer an incoming call. Used for answering incoming calls, see: Answer/Reject Incoming Call.
err := client.Accept(callOption)
Parameters:
| Parameter | Type | Description |
|---|---|---|
callOption | CallOption | Call configuration, see CallOption |
- The CallOption configuration for Accept is the same as Invite, except that the callee address does not need to be set.
Example:
server := gin.Default()
server.POST(prefix, func(c *gin.Context) {
var form IncomingCall
if err := c.ShouldBindJSON(&form); err != nil {
c.JSON(400, gin.H{"error": err.Error()})
return
}
client := createClient(parent, option, form.DialogID)
go func() {
ctx, cancel := context.WithCancel(parent)
defer cancel()
err := client.Connect("sip")
if err != nil {
option.Logger.Errorf("Failed to connect to server: %v", err)
}
defer client.Shutdown()
client.Accept(option.CallOption)
<-ctx.Done()
}()
c.JSON(200, gin.H{"message": "OK"})
})
server.Run(addr)
Ringing - Send Ringing
Send ringing response. Used for SIP calls, see: 180 Ringing.
err := client.Ringing(ringtone, recorder)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
ringtone | string | ❌ | Ringtone URL |
recorder | *RecorderOption | ❌ | Recording configuration, see RecorderOption |
Example:
client.Ringing("http://example.com/ringtone.wav", recorder)
Reject - Reject Incoming Call
Reject an incoming call. Used for rejecting incoming calls, see: Answer/Reject Incoming Call.
err := client.Reject(reason)
Parameters:
| Parameter | Type | Description |
|---|---|---|
reason | string | Rejection reason |
Example:
client.Reject("Busy")
TTS - Text-to-Speech
Convert text to speech and play, see: TTS (Text-to-Speech).
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
text | string | ✅ | Text to synthesize |
speaker | string | ❌ | Voice |
playID | string | ❌ | TTS Track identifier |
endOfStream | bool | ✅ | Whether this is the last TTS command for the current playId |
autoHangup | bool | ✅ | Whether to automatically hang up after TTS playback completes |
option | *TTSOption | ❌ | TTS options, see TTSOption |
waitInputTimeout | *uint32 | ❌ | Maximum time to wait for user input (seconds) |
- Set
endOfStream = trueto indicate that all TTS commands for the current playId have been sent. The TTS Track will exit after all command results finish playing and send a Track End event. - If
playIdis set, the Track End event sent by this TTS Track will include thisplayId.- If the current
playIdis the same as a previous TTS command'splayId, it will reuse the previous TTS Track; otherwise, it will terminate the previous TTS Track and create a new TTS Track.
- If the current
For details, see TTS(Text-to-Speech)
StreamTTS - Streaming TTS
Convert text to speech and play (for LLM streaming output).
err := client.StreamTTS(text, speaker, playID, endOfStream, autoHangup, option, waitInputTimeout)
The difference from TTS is that the corresponding TTS command has streaming = true, everything else is the same.
See Streaming TTS.
Play - Play Audio
Play audio file:
err := client.Play(url, autoHangup, waitInputTimeout)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
url | string | ✅ | Audio file URL |
autoHangup | bool | ✅ | Whether to automatically hang up after playback completes |
waitInputTimeout | *uint32 | ❌ | Wait for input timeout (seconds) |
Interrupt - Interrupt Playback
Interrupt current TTS or audio playback:
err := client.Interrupt(graceful)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
graceful | bool | ✅ | Whether to gracefully interrupt (wait for current TTS command to finish playing before exiting) |
-
When TTS has not finished playing, will return Interruption event.
-
When
graceful=trueis set, the TTS Track will wait for the current TTS command to finish playing before exiting, otherwise it will exit immediately. -
gracefulonly takes effect for non-streaming TTS (streaming=false).
See Interruption
Example:
client.OnSpeaking = func(event rustpbxgo.SpeakingEvent) {
// Immediately interrupt TTS when user speaks
client.Interrupt(false)
}
Hangup - Hangup Call
When the call is already established, use Hangup to end the call:
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
reason | string | ❌ | Hangup reason |
Refer - Transfer Call
Transfer call to another target, used for transfer-to-human logic. See: Transfer Call
err := client.Refer(caller, callee, options)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
caller | string | ✅ | Transfer caller SIP address |
callee | string | ✅ | Transfer target SIP URI |
options | *ReferOption | ❌ | Transfer options, see ReferOption |
Mute - Mute
Mute all or specified Tracks:
err := client.Mute(trackID)
Parameters:
| Parameter | Type | Description |
|---|---|---|
trackID | *string | Track ID (if nil, mute all Tracks) |
Example:
// Mute all Tracks
client.Mute(nil)
// Mute specific Track
trackID := "track-123"
client.Mute(&trackID)
Unmute - Unmute
Unmute all or specified Tracks:
err := client.Unmute(trackID)
Parameters:
| Parameter | Type | Description |
|---|---|---|
trackID | *string | Track ID (if nil, unmute all Tracks) |
Event Callbacks
Client defines multiple fields (fields starting with On), used to set callback functions for events.
OnAnswer - Answer Callback
Trigger: Triggered when the call is answered and SDP negotiation is complete. See: AnswerEvent.
Purpose: Initialization operations after successful call. AnswerEvent contains SDP answer.
client.OnAnswer = func(event rustpbxgo.AnswerEvent) {
log.Printf("Call answered: %s", event.TrackID)
// Start sending welcome message
client.TTS("Hello, welcome to call", "", "welcome", true, false, nil, nil)
}
OnReject - Reject Callback
Trigger: Triggered when the call is rejected. See RejectEvent.
Purpose: Handle rejection logic, record rejection reason (event.Reason) or perform follow-up processing.
client.OnReject = func(event rustpbxgo.RejectEvent) {
log.Printf("Call rejected: %s", event.Reason)
// Record rejection reason, clean up resources
}
OnRinging - Ringing Callback
Trigger: Triggered when the call is ringing (SIP calls). See RingingEvent.
Purpose: Monitor call progress, determine if early media (EarlyMedia) is available.
client.OnRinging = func(event rustpbxgo.RingingEvent) {
log.Printf("Ringing, early media: %v", event.EarlyMedia)
}
OnHangup - Hangup Callback
Trigger: Triggered when the call ends. See HangupEvent.
Purpose: Clean up resources, save call records. Can get hangup reason (event.Reason), call duration (event.hangUpTime - event.startTime), caller/callee information (event.From,event.To), etc. from HangupEvent.
client.OnHangup = func(event rustpbxgo.HangupEvent) {
log.Printf("Call ended: %s, initiator: %s", event.Reason, event.Initiator)
// Save call record, clean up resources
}
OnSpeaking - Speaking Callback
Trigger: Triggered when VAD detects user starts speaking. See SpeakingEvent.
Purpose: Detect user input, commonly used to interrupt TTS playback. Can get speech start time (event.StartTime) from the event.
client.OnSpeaking = func(event rustpbxgo.SpeakingEvent) {
log.Printf("User speaking detected, interrupt playback")
// Immediately interrupt current TTS
client.Interrupt(false)
}
OnSilence - Silence Callback
Trigger: Triggered when user stops speaking is detected. See SilenceEvent.
Purpose: Determine if user has finished speaking, can combine silence duration (event.Duration) to decide whether to start processing.
client.OnSilence = func(event rustpbxgo.SilenceEvent) {
log.Printf("Silence detected, duration %d ms", event.Duration)
// User might have finished speaking, prepare to process
}
OnAsrFinal - ASR Final Result Callback
Trigger: Triggered when speech recognition obtains stable result. See AsrFinalEvent.
Purpose: Get user's final speech input (event.Text), used for business logic processing or sending to LLM. Can distinguish different speech segments through sequence number (event.Index).
client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
log.Printf("User said: %s", event.Text)
// Send user input to LLM for processing
response := callLLM(event.Text)
client.TTS(response, "", "reply", true, false, nil, nil)
}
OnAsrDelta - ASR Delta Result Callback
Trigger: Intermediate result during speech recognition process, content may change. See AsrDeltaEvent.
Purpose: Display recognition progress in real-time, improve user experience. Should not be used directly for business logic processing.
client.OnAsrDelta = func(event rustpbxgo.AsrDeltaEvent) {
log.Printf("Recognizing: %s", event.Text)
// Only for display, no business processing
}
OnTrackStart - Track Start Callback
Trigger: Triggered when a Track starts (RTP, TTS, file playback, etc.). See TrackStartEvent.
Purpose: Monitor audio playback start. For TTS Track, can get TTS command's playId via event.PlayID; for Play Track, can get the playback URL.
client.OnTrackStart = func(event rustpbxgo.TrackStartEvent) {
log.Printf("Track started: %s, PlayID: %s", event.TrackID, event.PlayID)
}
OnTrackEnd - Track End Callback
Trigger: Triggered when a Track ends (RTP ends, TTS completes, file playback completes, etc.). See TrackEndEvent.
Purpose: Monitor audio playback end, can be used to control playback flow or clean up resources. Can get duration (event.Duration) and PlayID (event.PlayID) from the event.
client.OnTrackEnd = func(event rustpbxgo.TrackEndEvent) {
log.Printf("Track ended: %s, duration: %d ms, PlayID: %s",
event.TrackID, event.Duration, event.PlayID)
// TTS playback completed, can send next one
}
OnInterruption - Interruption Callback
Trigger: Triggered when Interrupt command is received and there are unfinished TTS. See InterruptionEvent.
Purpose: Get interruption information, such as played time (event.PlayedMs) and played text position (event.Subtitle, if provider supports subtitles).
client.OnInterruption = func(event rustpbxgo.InterruptionEvent) {
if event.Subtitle != nil {
log.Printf("Playback interrupted, played: %s", *event.Subtitle)
}
// Record interruption position for subsequent processing
}
OnDTMF - DTMF Callback
Trigger: Triggered when a keypress is detected. See DTMFEvent.
Purpose: Handle user keypress input, such as IVR menu selection. Can get keypress value (0-9, *, #, A-D) from event.Digit.
client.OnDTMF = func(event rustpbxgo.DTMFEvent) {
log.Printf("User pressed: %s", event.Digit)
// Handle keypress logic, such as menu navigation
if event.Digit == "1" {
client.TTS("You selected option 1", "", "menu", true, false, nil, nil)
}
}
OnError - Error Callback
Trigger: Triggered when an error occurs. See ErrorEvent.
Purpose: Handle various error situations. From event.Sender you can know the error source (asr, tts, media, etc.), from event.Error get error information.
client.OnError = func(event rustpbxgo.ErrorEvent) {
log.Printf("Error [%s]: %s", event.Sender, event.Error)
// Log error, perform fallback handling
}
OnMetrics - Metrics Callback
Trigger: Triggered when performance metrics are collected. See MetricsEvent.
Purpose: Monitor performance metrics. Can get metric name (event.Key) and duration (event.Duration) from the event.
client.OnMetrics = func(event rustpbxgo.MetricsEvent) {
log.Printf("Metric [%s]: %d ms", event.Key, event.Duration)
// Record performance data for analysis
}
OnClose - Connection Close
Trigger: Triggered when WebSocket connection closes.
Purpose: Handle connection disconnection logic, clean up resources or attempt reconnection.
client.OnClose = func(reason string) {
log.Printf("Connection closed: %s", reason)
// Clean up resources or reconnect
}
OnEvent - Generic Event Handler
Trigger: Triggered when any event is received.
Purpose: Log all events or handle undefined special events. Receives raw event type (event) and JSON data (payload).
client.OnEvent = func(event string, payload string) {
log.Printf("Event received [%s]: %s", event, payload)
// Generic event logging or special handling
}
Options
CallOption
Call configuration options:
type CallOption struct {
Denoise bool
Offer string
Callee string
Caller string
Recorder *RecorderOption
VAD *VADOption
ASR *ASROption
TTS *TTSOption
HandshakeTimeout string
EnableIPv6 bool
Sip *SipOption
Extra map[string]string
}
Field Description:
| Field | Type | Description |
|---|---|---|
Denoise | bool | Whether to enable noise reduction |
Offer | string | SDP offer string for WebRTC/SIP negotiation |
Callee | string | Callee's SIP URI or phone number |
Caller | string | Caller's SIP URI or phone number |
Recorder | *RecorderOption | Call recording configuration, see RecorderOption |
VAD | *VADOption | Voice activity detection configuration, see VADOption |
ASR | *ASROption | Automatic Speech Recognition (ASR) configuration, see ASROption |
TTS | *TTSOption | Text-to-Speech configuration, see TTSOption |
HandshakeTimeout | string | Connection handshake timeout |
EnableIPv6 | bool | Enable IPv6 support |
Sip | *SipOption | SIP registration account, password, and domain configuration, see SipOption |
Extra | map[string]string | Additional parameters |
RecorderOption
Recording configuration options:
type RecorderOption struct {
RecorderFile string
Samplerate int
Ptime int
}
Field Description:
| Field | Type | Default | Description |
|---|---|---|---|
RecorderFile | string | - | Recording file path |
Samplerate | int | 16000 | Sample rate (Hz) |
Ptime | int | 200 | Packet time (milliseconds) |
ASROption
Speech recognition configuration options:
type ASROption struct {
Provider string
Model string
Language string
AppID string
SecretID string
SecretKey string
ModelType string
BufferSize int
SampleRate uint32
Endpoint string
Extra map[string]string
StartWhenAnswer bool
}
Field Description:
| Field | Type | Description |
|---|---|---|
Provider | string | ASR provider: tencent, aliyun, Deepgram, etc. |
Model | string | Model name |
Language | string | Language (e.g.: zh-CN, en-US), see corresponding provider documentation for details |
AppID | string | Tencent Cloud's appId |
SecretID | string | Tencent Cloud's secretId |
SecretKey | string | Tencent Cloud's secretKey, or other provider's API Key |
ModelType | string | ASR model type (e.g.: 16k_zh, 8k_en), see provider documentation for details |
BufferSize | int | Audio buffer size, unit: bytes |
SampleRate | uint32 | Sample rate |
Endpoint | string | Custom service endpoint URL |
Extra | map[string]string | Provider-specific parameters |
StartWhenAnswer | bool | Request ASR service after call is answered |
TTSOption
Text-to-speech synthesis configuration options:
type TTSOption struct {
Samplerate int32
Provider string
Speed float32
AppID string
SecretID string
SecretKey string
Volume int32
Speaker string
Codec string
Subtitle bool
Emotion string
Endpoint string
Extra map[string]string
WaitInputTimeout uint32
}
Field Description:
| Field | Type | Description |
|---|---|---|
Samplerate | int32 | Sample rate, unit: Hz |
Provider | string | TTS provider: tencent, aliyun, deepgram, voiceapi |
Speed | float32 | Speech rate |
AppID | string | Tencent Cloud's appId |
SecretID | string | Tencent Cloud's secretId |
SecretKey | string | Tencent Cloud's secretKey, or other provider's API Key |
Volume | int32 | Volume (1-10) |
Speaker | string | Voice, see provider documentation |
Codec | string | Encoding format |
Subtitle | bool | Whether to enable subtitles |
Emotion | string | Emotion: neutral, happy, sad, angry, etc. |
Endpoint | string | Custom TTS service endpoint URL |
Extra | map[string]string | Provider-specific parameters |
WaitInputTimeout | uint32 | Maximum time to wait for user input (milliseconds) |
VADOption
Voice activity detection configuration options:
type VADOption struct {
Type string
Samplerate uint32
SpeechPadding uint64
SilencePadding uint64
Ratio float32
VoiceThreshold float32
MaxBufferDurationSecs uint64
Endpoint string
SecretKey string
SecretID string
SilenceTimeout uint
}
Field Description:
| Field | Type | Default | Description |
|---|---|---|---|
Type | string | webrtc | VAD algorithm type: silero, ten, webrtc |
Samplerate | uint32 | 16000 | Sample rate |
SpeechPadding | uint64 | 250 | Start detection speechPadding milliseconds after speech starts |
SilencePadding | uint64 | 100 | Silence event trigger interval, unit: milliseconds |
Ratio | float32 | 0.5 | Speech detection ratio threshold |
VoiceThreshold | float32 | 0.5 | Voice energy threshold |
MaxBufferDurationSecs | uint64 | 50 | Maximum buffer duration, unit: seconds |
Endpoint | string | - | Custom VAD service endpoint |
SecretKey | string | - | VAD service authentication key |
SecretID | string | - | VAD service authentication ID |
SilenceTimeout | uint | 5000 | Silence detection timeout, unit: milliseconds |
SipOption
SIP configuration options:
type SipOption struct {
Username string
Password string
Realm string
Headers map[string]string
}
Field Description:
| Field | Type | Description |
|---|---|---|
Username | string | SIP username for authentication |
Password | string | SIP password for authentication |
Realm | string | SIP domain/realm for authentication |
Headers | map[string]string | Additional SIP protocol headers (key-value pairs) |
ReferOption
Transfer configuration options:
type ReferOption struct {
Denoise bool
Timeout uint32
MusicOnHold string
AutoHangup bool
Sip *SipOption
ASR *ASROption
}
Field Description:
| Field | Type | Description |
|---|---|---|
Denoise | bool | Whether to enable noise reduction |
Timeout | uint32 | Timeout (seconds) |
MusicOnHold | string | Hold music URL |
AutoHangup | bool | Automatically hang up after transfer completes |
Sip | *SipOption | SIP configuration |
ASR | *ASROption | ASR configuration |
Event Types
All event type definitions supported by Client:
Event
Base event structure, contains event type name.
type Event struct {
Event string `json:"event"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
Event | string | Event type name |
IncomingEvent
Incoming call event, triggered when there is a new incoming call.
type IncomingEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Caller string `json:"caller"`
Callee string `json:"callee"`
Sdp string `json:"sdp"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
Caller | string | Caller number |
Callee | string | Callee number |
Sdp | string | SDP offer string |
AnswerEvent
Answer event, triggered when the call is answered and SDP negotiation is complete.
type AnswerEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Sdp string `json:"sdp"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
Sdp | string | SDP answer string |
RejectEvent
Reject event, triggered when the call is rejected.
type RejectEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Reason string `json:"reason"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
Reason | string | Rejection reason |
RingingEvent
Ringing event, triggered when the call is ringing (SIP calls).
type RingingEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
EarlyMedia bool `json:"earlyMedia"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
EarlyMedia | bool | Whether early media is available |
HangupEvent
Hangup event, triggered when the call ends.
type HangupEventAttendee struct {
Username string `json:"username"`
Realm string `json:"realm"`
Source string `json:"source"`
}
type HangupEvent struct {
Timestamp uint64 `json:"timestamp"`
Reason string `json:"reason"`
Initiator string `json:"initiator"`
StartTime string `json:"startTime,omitempty"`
HangupTime string `json:"hangupTime,omitempty"`
AnswerTime *string `json:"answerTime,omitempty"`
RingingTime *string `json:"ringingTime,omitempty"`
From *HangupEventAttendee `json:"from,omitempty"`
To *HangupEventAttendee `json:"to,omitempty"`
Extra map[string]any `json:"extra,omitempty"`
}
HangupEvent Field Description:
| Field | Type | Description |
|---|---|---|
Timestamp | uint64 | Event timestamp (milliseconds) |
Reason | string | Hangup reason |
Initiator | string | Party that initiated the hangup |
StartTime | string | Call start time |
HangupTime | string | Hangup time |
AnswerTime | *string | Answer time |
RingingTime | *string | Ringing time |
From | *HangupEventAttendee | Caller information |
To | *HangupEventAttendee | Callee information |
Extra | map[string]any | Additional information |
HangupEventAttendee Field Description:
| Field | Type | Description |
|---|---|---|
Username | string | Username |
Realm | string | Domain |
Source | string | Source |
SpeakingEvent
Speaking event, triggered when VAD detects user starts speaking.
type SpeakingEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
StartTime uint64 `json:"startTime"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
StartTime | uint64 | Speech start time (milliseconds) |
SilenceEvent
Silence event, triggered when user stops speaking is detected.
type SilenceEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
StartTime uint64 `json:"startTime"`
Duration uint64 `json:"duration"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
StartTime | uint64 | Silence start time (milliseconds) |
Duration | uint64 | Silence duration (milliseconds) |
EouEvent
End of utterance event, triggered when end of speech is detected.
type EouEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Complete bool `json:"complete"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
Complete | bool | Whether it ended completely |
AsrFinalEvent
ASR final result event, triggered when speech recognition obtains stable result.
type AsrFinalEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Index uint32 `json:"index"`
StartTime *uint64 `json:"startTime,omitempty"`
EndTime *uint64 `json:"endTime,omitempty"`
Text string `json:"text"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
Index | uint32 | Speech segment index |
StartTime | *uint64 | Speech start time (milliseconds) |
EndTime | *uint64 | Speech end time (milliseconds) |
Text | string | Recognized text content |
AsrDeltaEvent
ASR delta result event, intermediate result during speech recognition process, content may change.
type AsrDeltaEvent struct {
TrackID string `json:"trackId"`
Index uint32 `json:"index"`
Timestamp uint64 `json:"timestamp"`
StartTime *uint64 `json:"startTime,omitempty"`
EndTime *uint64 `json:"endTime,omitempty"`
Text string `json:"text"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Index | uint32 | Speech segment index |
Timestamp | uint64 | Event timestamp (milliseconds) |
StartTime | *uint64 | Speech start time (milliseconds) |
EndTime | *uint64 | Speech end time (milliseconds) |
Text | string | Recognized text content (may change) |
TrackStartEvent
Track start event, triggered when a Track starts (RTP, TTS, file playback, etc.).
type TrackStartEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
PlayId *string `json:"playId,omitempty"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
PlayId | *string | Play ID (TTS/Play command) |
TrackEndEvent
Track end event, triggered when a Track ends (RTP ends, TTS completes, file playback completes, etc.).
type TrackEndEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Duration uint64 `json:"duration"`
PlayId *string `json:"playId,omitempty"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
Duration | uint64 | Playback duration (milliseconds) |
PlayId | *string | Play ID (TTS/Play command) |
InterruptionEvent
Interruption event, triggered when Interrupt command is received and there are unfinished TTS.
type InterruptionEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Subtitle *string `json:"subtitle,omitempty"`
Position *uint32 `json:"position,omitempty"`
TotalDuration uint32 `json:"totalDuration"`
Current uint32 `json:"current"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
Subtitle | *string | Played subtitle text |
Position | *uint32 | Playback position (character count) |
TotalDuration | uint32 | Total duration (milliseconds) |
Current | uint32 | Current playback duration (milliseconds) |
DTMFEvent
DTMF event, triggered when a keypress is detected.
type DTMFEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Digit string `json:"digit"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
Digit | string | Keypress value (0-9, *, #, A-D) |
AnswerMachineDetectionEvent
Answer machine detection event, triggered when an answer machine is detected.
type AnswerMachineDetectionEvent struct {
Timestamp uint64 `json:"timestamp"`
StartTime uint64 `json:"startTime"`
EndTime uint64 `json:"endTime"`
Text string `json:"text"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
Timestamp | uint64 | Event timestamp (milliseconds) |
StartTime | uint64 | Detection start time (milliseconds) |
EndTime | uint64 | Detection end time (milliseconds) |
Text | string | Detected text |
LLMFinalEvent
LLM final result event, triggered when large language model generates final result.
type LLMFinalEvent struct {
Timestamp uint64 `json:"timestamp"`
Text string `json:"text"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
Timestamp | uint64 | Event timestamp (milliseconds) |
Text | string | Final text generated by LLM |
LLMDeltaEvent
LLM delta result event, triggered when large language model generates delta result.
type LLMDeltaEvent struct {
Timestamp uint64 `json:"timestamp"`
Word string `json:"word"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
Timestamp | uint64 | Event timestamp (milliseconds) |
Word | string | Delta word generated by LLM |
MetricsEvent
Metrics event, triggered when performance metrics are collected.
type MetricsEvent struct {
Timestamp uint64 `json:"timestamp"`
Key string `json:"key"`
Duration uint32 `json:"duration"`
Data map[string]any `json:"data"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
Timestamp | uint64 | Event timestamp (milliseconds) |
Key | string | Metric name |
Duration | uint32 | Duration (milliseconds) |
Data | map[string]any | Metric data |
ErrorEvent
Error event, triggered when an error occurs.
type ErrorEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Sender string `json:"sender"`
Error string `json:"error"`
Code *uint32 `json:"code,omitempty"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
Sender | string | Error source (asr, tts, media, etc.) |
Error | string | Error information |
Code | *uint32 | Error code |
AddHistoryEvent
Add history event, triggered when conversation history is added.
type AddHistoryEvent struct {
Sender string `json:"sender"`
Timestamp uint64 `json:"timestamp"`
Speaker string `json:"speaker"`
Text string `json:"text"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
Sender | string | Sender |
Timestamp | uint64 | Event timestamp (milliseconds) |
Speaker | string | Speaker identifier |
Text | string | Conversation text |
OtherEvent
Other event, used to handle undefined event types.
type OtherEvent struct {
TrackID string `json:"trackId"`
Timestamp uint64 `json:"timestamp"`
Sender string `json:"sender"`
Extra map[string]string `json:"extra,omitempty"`
}
Field Description:
| Field | Type | Description |
|---|---|---|
TrackID | string | Call track ID |
Timestamp | uint64 | Event timestamp (milliseconds) |
Sender | string | Sender |
Extra | map[string]string | Additional information |
Complete Examples
SIP Call Example
package main
import (
"context"
"log"
"os"
"os/signal"
"syscall"
"github.com/restsend/rustpbxgo"
"github.com/sirupsen/logrus"
)
func main() {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
logger := logrus.New()
logger.SetLevel(logrus.InfoLevel)
// Create client
client := rustpbxgo.NewClient(
"ws://localhost:8080",
rustpbxgo.WithLogger(logger),
rustpbxgo.WithContext(ctx),
)
// Set event handlers
client.OnAnswer = func(event rustpbxgo.AnswerEvent) {
logger.Info("Call answered")
// Send welcome message
client.TTS("Hello, welcome to call", "", "greeting", true, false, nil, nil)
}
client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
logger.Infof("User said: %s", event.Text)
// Respond based on user input
client.TTS("I received your message", "", "response", true, false, nil, nil)
}
client.OnHangup = func(event rustpbxgo.HangupEvent) {
logger.Infof("Call ended: %s", event.Reason)
cancel()
}
// Connect to server
if err := client.Connect("sip"); err != nil {
log.Fatalf("Connection failed: %v", err)
}
defer client.Shutdown()
// Configure call
callOption := rustpbxgo.CallOption{
Caller: "sip:1000@example.com",
Callee: "sip:2000@example.com",
Denoise: true,
Sip: &rustpbxgo.SipOption{
Username: "user",
Password: "pass",
Realm: "example.com",
},
ASR: &rustpbxgo.ASROption{
Provider: "tencent",
Language: "zh-CN",
},
TTS: &rustpbxgo.TTSOption{
Provider: "tencent",
Speaker: "xiaoyan",
},
VAD: &rustpbxgo.VADOption{
Type: "webrtc",
SilenceTimeout: 5000,
},
}
// Initiate call
_, err := client.Invite(ctx, callOption)
if err != nil {
log.Fatalf("Call failed: %v", err)
}
// Wait for signal
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
select {
case <-ctx.Done():
logger.Info("Call ended")
case <-sigChan:
logger.Info("Interrupt signal received")
client.Hangup("user_interrupt")
}
}
Answer Incoming Call Example
package main
import (
"context"
"encoding/json"
"log"
"net/http"
"github.com/restsend/rustpbxgo"
"github.com/sirupsen/logrus"
)
type WebhookRequest struct {
DialogID string `json:"dialogId"`
Caller string `json:"caller"`
Callee string `json:"callee"`
}
func main() {
logger := logrus.New()
// Set up Webhook handler
http.HandleFunc("/webhook", func(w http.ResponseWriter, r *http.Request) {
var req WebhookRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
logger.Infof("Incoming call: %s -> %s", req.Caller, req.Callee)
// Handle incoming call
go handleIncomingCall(req.DialogID, req.Caller, req.Callee, logger)
w.WriteHeader(http.StatusOK)
})
log.Fatal(http.ListenAndServe(":8090", nil))
}
func handleIncomingCall(dialogID, caller, callee string, logger *logrus.Logger) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// Create client with dialogID
client := rustpbxgo.NewClient(
"ws://localhost:8080",
rustpbxgo.WithLogger(logger),
rustpbxgo.WithContext(ctx),
rustpbxgo.WithID(dialogID),
)
client.OnHangup = func(event rustpbxgo.HangupEvent) {
logger.Info("Call ended")
cancel()
}
// Connect to server
if err := client.Connect("sip"); err != nil {
logger.Errorf("Connection failed: %v", err)
return
}
defer client.Shutdown()
// Send ringing
recorder := &rustpbxgo.RecorderOption{
RecorderFile: "/recordings/" + dialogID + ".wav",
Samplerate: 16000,
}
client.Ringing("", recorder)
// Answer incoming call
callOption := rustpbxgo.CallOption{
Caller: caller,
Callee: callee,
ASR: &rustpbxgo.ASROption{
Provider: "tencent",
Language: "zh-CN",
},
TTS: &rustpbxgo.TTSOption{
Provider: "tencent",
Speaker: "xiaoyan",
},
}
if err := client.Accept(callOption); err != nil {
logger.Errorf("Answer failed: %v", err)
return
}
// Send welcome message
client.TTS("Hello, I am an intelligent assistant", "", "greeting", true, false, nil, nil)
// Wait for call to end
<-ctx.Done()
}
Streaming TTS Example (LLM Integration)
package main
import (
"bufio"
"context"
"encoding/json"
"log"
"net/http"
"github.com/restsend/rustpbxgo"
"github.com/sirupsen/logrus"
)
func streamLLMResponse(client *rustpbxgo.Client, userInput string) {
// Simulate calling LLM API
req, _ := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", nil)
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
log.Printf("LLM request failed: %v", err)
return
}
defer resp.Body.Close()
playID := "llm-stream"
scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
line := scanner.Text()
// Parse SSE data
var data map[string]interface{}
if err := json.Unmarshal([]byte(line), &data); err != nil {
continue
}
// Get delta text
if content, ok := data["content"].(string); ok && content != "" {
// Send streaming TTS
isEnd := data["finish_reason"] != nil
client.StreamTTS(content, "", playID, isEnd, false, nil, nil)
}
}
}
func main() {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
logger := logrus.New()
client := rustpbxgo.NewClient(
"ws://localhost:8080",
rustpbxgo.WithLogger(logger),
rustpbxgo.WithContext(ctx),
)
client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
logger.Infof("User input: %s", event.Text)
// Interrupt current playback
client.Interrupt()
// Stream response
go streamLLMResponse(client, event.Text)
}
// ... other code
}
Best Practices
Error Handling
Always check errors and handle them appropriately:
if err := client.Connect("sip"); err != nil {
log.Fatalf("Connection failed: %v", err)
}
if err := client.TTS("Hello", "", "1", true, false, nil, nil); err != nil {
log.Printf("TTS failed: %v", err)
// Retry or use fallback
}
Resource Cleanup
Ensure proper resource cleanup:
defer client.Shutdown()
Context Management
Use context to control lifecycle:
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
client := rustpbxgo.NewClient(
endpoint,
rustpbxgo.WithContext(ctx),
)
Logging
Use appropriate log levels:
logger := logrus.New()
logger.SetLevel(logrus.InfoLevel) // Production environment
// logger.SetLevel(logrus.DebugLevel) // Development environment
Event Handling
Avoid long-running operations in event handlers, use goroutines:
client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
go func() {
// Long-running operation
response := processWithLLM(event.Text)
client.TTS(response, "", "reply", true, false, nil, nil)
}()
}
Troubleshooting
Connection Issues
If unable to connect to server:
- Check if endpoint URL is correct
- Confirm server is running
- Check firewall settings
- Enable debug logging to see detailed information
logger.SetLevel(logrus.DebugLevel)
client := rustpbxgo.NewClient(
endpoint,
rustpbxgo.WithLogger(logger),
rustpbxgo.WithDumpEvents(true),
)
ASR Not Working
If ASR cannot recognize speech:
- Confirm ASR configuration is correct
- Check if API keys are valid
- Verify sample rate settings
- Check VAD configuration
TTS No Sound
If TTS has no sound:
- Check if TTS configuration is correct
- Verify speaker parameter
- Confirm
endOfStreamis set correctly - Check network connection
More Resources
Reference
For complete example code, see: /Users/yangli/Desktop/rustpbxgo/cmd