Go SDK

Download Source Code

Download from the Active Call Go SDK GitHub repository:

git clone https://github.com/restsend/rustpbxgo

Directory Structure

rustpbxgo/
├── README.md           
├── client.go           # SDK core implementation
├── cmd/                # Example application
│   ├── main.go         # Program entry point
│   ├── llm.go          # Large model interaction logic
│   ├── media.go        # WebRTC media processing
│   └── webhook.go      # Webhook handling
├── go.mod             
└── go.sum            

client.go contains the definitions of core data structures (commands, events, and callback functions).

The cmd/ directory contains an application example with SIP/WebRTC calls, incoming call handling with webhooks, and large model interaction logic.

Client

The Client struct mainly contains these fields:

  • endpoint: Active Call listening address.
  • id: Connection session ID, mainly used for answering calls.
  • OnXXX: Callback functions for handling events.

Flow Diagram

When the client calls the Connect method, it creates two goroutines (green parts in the diagram below).

  • One is responsible for reading and parsing WebSocket messages (top)
  • The other is responsible for processing messages and sending commands (bottom). When an event is received, it calls the corresponding callback function based on the event type.
[Diagram: GoArch]
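
The two-goroutine flow described above can be sketched without the SDK. This is a minimal illustration, not the SDK's actual implementation: the `runPipeline` helper, the `envelope` and `message` types, and the event names `answer` and `hangup` are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// envelope mirrors the base Event structure: every message carries an "event" name.
type envelope struct {
	Event string `json:"event"`
}

// message pairs a parsed event name with its raw JSON payload.
type message struct {
	name    string
	payload []byte
}

// runPipeline feeds raw frames through a reader goroutine and a dispatcher
// goroutine, then returns the event names in the order they were dispatched.
func runPipeline(frames [][]byte, handlers map[string]func([]byte)) []string {
	parsed := make(chan message)
	var seen []string
	var wg sync.WaitGroup
	wg.Add(2)

	// Reader: parse each frame and hand it to the dispatcher.
	go func() {
		defer wg.Done()
		defer close(parsed)
		for _, raw := range frames {
			var env envelope
			if err := json.Unmarshal(raw, &env); err != nil {
				continue // skip malformed frames
			}
			parsed <- message{name: env.Event, payload: raw}
		}
	}()

	// Dispatcher: look up the callback by event type and invoke it.
	go func() {
		defer wg.Done()
		for msg := range parsed {
			seen = append(seen, msg.name)
			if h, ok := handlers[msg.name]; ok {
				h(msg.payload)
			}
		}
	}()

	wg.Wait()
	return seen
}

func main() {
	frames := [][]byte{
		[]byte(`{"event":"answer","trackId":"t1"}`), // assumed wire format
		[]byte(`not json`),
		[]byte(`{"event":"hangup","reason":"bye"}`),
	}
	handlers := map[string]func([]byte){
		"answer": func(p []byte) { fmt.Println("answered:", string(p)) },
	}
	fmt.Println(runPipeline(frames, handlers))
}
```

Keeping parsing and dispatch on separate goroutines means a slow callback delays only dispatch, not reading from the socket.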

Creating a Client

Use the NewClient function to create a client instance:

client := rustpbxgo.NewClient(endpoint, opts...)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| endpoint | string | Yes | Active Call server address |
| opts… | ClientOption | No | Optional configuration options |

Options:

| Option | Parameter Type | Description |
| --- | --- | --- |
| WithLogger(logger) | *logrus.Logger | Set logger |
| WithContext(ctx) | context.Context | Set context for closing created goroutines |
| WithID(id) | string | Set session ID for answering calls |
| WithDumpEvents(enable) | bool | Enable event dumping |

Example:

client := rustpbxgo.NewClient(
    "ws://localhost:8080",
    rustpbxgo.WithLogger(logger),
    rustpbxgo.WithContext(ctx),
    rustpbxgo.WithID("my-session-id"),
    rustpbxgo.WithDumpEvents(true),
)
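
The `opts...` parameter follows Go's functional options pattern. The sketch below shows how that pattern works in general. The `settings`, `option`, `withID`, `withDumpEvents`, and `newSettings` names are stand-ins invented for the example and are not the SDK's internals.

```go
package main

import "fmt"

// settings holds the values that options can change (illustrative only).
type settings struct {
	id         string
	dumpEvents bool
}

// option mutates settings; it plays the role ClientOption plays in the SDK.
type option func(*settings)

func withID(id string) option { return func(s *settings) { s.id = id } }

func withDumpEvents(enable bool) option {
	return func(s *settings) { s.dumpEvents = enable }
}

// newSettings applies the options in order, so a later option overrides an earlier one.
func newSettings(opts ...option) settings {
	var s settings
	for _, opt := range opts {
		opt(&s)
	}
	return s
}

func main() {
	s := newSettings(withID("my-session-id"), withDumpEvents(true))
	fmt.Printf("id=%q dumpEvents=%v\n", s.id, s.dumpEvents)
}
```

Because every option is optional, callers pass only what they need and new options can be added without changing the constructor signature.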

Connecting to Server

Use the Connect method to connect to Active Call:

err := client.Connect(callType)

Parameters:

| Parameter | Type | Optional Values | Description |
| --- | --- | --- | --- |
| callType | string | "sip", "webrtc", "" | Call type |

Closing Client

Use the Shutdown method to close the client connection:

err := client.Shutdown()

Sending Commands

The client provides various methods for sending commands to the server.

Invite - Initiate Call

Initiates a call and returns an AnswerEvent or an error. See: Initiate Call.

answer, err := client.Invite(ctx, callOption)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| ctx | context.Context | Context for cancellation |
| callOption | CallOption | Call configuration, see CallOption |

Return Values:

| Type | Description |
| --- | --- |
| *AnswerEvent | Answer event (if successful) |
| error | Error information |

Example:

// sip call
callOption := rustpbxgo.CallOption{
    Caller: "sip:alice@example.com",
    Callee: "sip:bob@example.com",
}

answer, err := client.Invite(ctx, callOption)
if err != nil {
    log.Fatalf("Call failed: %v", err)
}

Accept - Answer Incoming Call

Answers an incoming call. See: Answer/Reject Incoming Call.

err := client.Accept(callOption)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| callOption | CallOption | Call configuration, see CallOption |

The [CallOption](#calloption) configuration for Accept is the same as for Invite, except that the callee address does not need to be set.

Example:

	server := gin.Default()
	server.POST(prefix, func(c *gin.Context) {
		var form IncomingCall
		if err := c.ShouldBindJSON(&form); err != nil {
			c.JSON(400, gin.H{"error": err.Error()})
			return
		}

		client := createClient(parent, option, form.DialogID)

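		go func() {
			ctx, cancel := context.WithCancel(parent)
			defer cancel()
			if err := client.Connect("sip"); err != nil {
				option.Logger.Errorf("Failed to connect to server: %v", err)
				return // nothing to accept without a connection
			}
			defer client.Shutdown()

			if err := client.Accept(option.CallOption); err != nil {
				option.Logger.Errorf("Failed to accept call: %v", err)
				return
			}
			<-ctx.Done()
		}()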

		c.JSON(200, gin.H{"message": "OK"})
	})
	server.Run(addr)

Ringing - Send Ringing

Send ringing response. Used for SIP calls, see: 180 Ringing.

err := client.Ringing(ringtone, recorder)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| ringtone | string | Ringtone URL |
| recorder | *RecorderOption | Recording configuration, see RecorderOption |

Example:

client.Ringing("http://example.com/ringtone.wav", recorder)

Reject - Reject Incoming Call

Rejects an incoming call. See: Answer/Reject Incoming Call.

err := client.Reject(reason)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| reason | string | Rejection reason |

Example:

client.Reject("Busy")

TTS - Text-to-Speech

Convert text to speech and play it. See: TTS (Text-to-Speech).

err := client.TTS(text, speaker, playID, endOfStream, autoHangup, option, waitInputTimeout)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| text | string | Text to synthesize |
| speaker | string | Voice |
| playID | string | TTS Track identifier |
| endOfStream | bool | Whether this is the last TTS command for the current playId |
| autoHangup | bool | Whether to automatically hang up after TTS playback completes |
| option | *TTSOption | TTS options, see TTSOption |
| waitInputTimeout | *uint32 | Maximum time to wait for user input (seconds) |

  • Set `endOfStream = true` to indicate that all TTS commands for the current [playId](/static/docs/active-call/guide/tts.mdx#playid) have been sent. The [TTS Track](/static/docs/active-call/guide/tts.mdx#tts-track) exits after all command results finish playing and sends a [Track End](#track-end-event) event.
  • If `playId` is set, the [Track End](#track-end-event) event sent by this TTS Track includes this `playId`.
  • If the current `playId` is the same as a previous TTS command's `playId`, the previous TTS Track is reused; otherwise, the previous TTS Track is terminated and a new TTS Track is created.

For details, see TTS(Text-to-Speech)
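
The `playId` and `endOfStream` rules above suggest one way to send a multi-sentence reply: reuse one `playId` for every sentence and set `endOfStream` only on the last. The sketch below plans such a sequence. The `ttsCall` type and `planTTS` helper are hypothetical names used only for illustration.

```go
package main

import "fmt"

// ttsCall records the arguments of one planned TTS command.
type ttsCall struct {
	Text        string
	PlayID      string
	EndOfStream bool
}

// planTTS turns a list of sentences into TTS commands that share one playID.
// Only the final command sets EndOfStream, so the same TTS Track is reused
// until the last sentence has been queued.
func planTTS(sentences []string, playID string) []ttsCall {
	calls := make([]ttsCall, 0, len(sentences))
	for i, s := range sentences {
		calls = append(calls, ttsCall{
			Text:        s,
			PlayID:      playID,
			EndOfStream: i == len(sentences)-1,
		})
	}
	return calls
}

func main() {
	for _, c := range planTTS([]string{"Hello.", "How can I help?"}, "greeting") {
		// With the SDK, each entry would correspond to a call such as:
		// client.TTS(c.Text, "", c.PlayID, c.EndOfStream, false, nil, nil)
		fmt.Printf("%+v\n", c)
	}
}
```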

StreamTTS - Streaming TTS

Convert text to speech and play (for LLM streaming output).

err := client.StreamTTS(text, speaker, playID, endOfStream, autoHangup, option, waitInputTimeout)

The only difference from TTS is that the corresponding [TTS command](/static/docs/active-call/sdk/websocket#tts) is sent with `streaming = true`; everything else is the same.

See Streaming TTS.

Play - Play Audio

Play audio file:

err := client.Play(url, autoHangup, waitInputTimeout)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| url | string | Audio file URL |
| autoHangup | bool | Whether to automatically hang up after playback completes |
| waitInputTimeout | *uint32 | Wait for input timeout (seconds) |

Interrupt - Interrupt Playback

Interrupt current TTS or audio playback:

err := client.Interrupt(graceful)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| graceful | bool | Whether to interrupt gracefully (wait for the current TTS command to finish playing before exiting) |

  • If TTS has not finished playing, an [Interruption event](/static/docs/active-call/sdk/websocket#interruption-event) is returned.
  • When graceful=true is set, the TTS Track waits for the current TTS command to finish playing before exiting; otherwise it exits immediately.
  • graceful only takes effect for non-streaming TTS (streaming=false).

See Interruption.

Example:

client.OnSpeaking = func(event rustpbxgo.SpeakingEvent) {
    // Immediately interrupt TTS when user speaks
    client.Interrupt(false)
}

Hangup - Hangup Call

When the call is already established, use Hangup to end the call:

err := client.Hangup(reason)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| reason | string | Hangup reason |

Refer - Transfer Call

Transfer call to another target, used for transfer-to-human logic. See: Transfer Call

err := client.Refer(caller, callee, options)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| caller | string | Transfer caller SIP address |
| callee | string | Transfer target SIP URI |
| options | *ReferOption | Transfer options, see ReferOption |

Mute - Mute

Mute all or specified Tracks:

err := client.Mute(trackID)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| trackID | *string | Track ID (if nil, mute all Tracks) |

Example:

// Mute all Tracks
client.Mute(nil)

// Mute specific Track
trackID := "track-123"
client.Mute(&trackID)
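
Several methods take optional pointer parameters (`*string` for `trackID`, `*uint32` for `waitInputTimeout`). A small generic helper avoids declaring a temporary variable each time. This `ptr` function is not part of the SDK; it is a convenience sketch that assumes Go 1.18 or later for generics.

```go
package main

import "fmt"

// ptr returns a pointer to a copy of v, handy for optional pointer parameters.
func ptr[T any](v T) *T { return &v }

func main() {
	trackID := ptr("track-123") // *string, the type Mute and Unmute take
	timeout := ptr(uint32(10))  // *uint32, the type waitInputTimeout takes
	fmt.Println(*trackID, *timeout)
}
```

With such a helper, muting a single Track could be written as `client.Mute(ptr("track-123"))`.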

Unmute - Unmute

Unmute all or specified Tracks:

err := client.Unmute(trackID)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| trackID | *string | Track ID (if nil, unmute all Tracks) |

Event Callbacks

Client exposes several fields whose names start with On. Assign a function to a field to handle the corresponding event.

OnAnswer - Answer Callback

Trigger: Triggered when the call is answered and SDP negotiation is complete. See: AnswerEvent.

Purpose: Initialization operations after successful call. AnswerEvent contains SDP answer.

client.OnAnswer = func(event rustpbxgo.AnswerEvent) {
    log.Printf("Call answered: %s", event.TrackID)
    // Start sending welcome message
    client.TTS("Hello, welcome to call", "", "welcome", true, false, nil, nil)
}

OnReject - Reject Callback

Trigger: Triggered when the call is rejected. See RejectEvent.

Purpose: Handle rejection logic, record rejection reason (event.Reason) or perform follow-up processing.

client.OnReject = func(event rustpbxgo.RejectEvent) {
    log.Printf("Call rejected: %s", event.Reason)
    // Record rejection reason, clean up resources
}

OnRinging - Ringing Callback

Trigger: Triggered when the call is ringing (SIP calls). See RingingEvent.

Purpose: Monitor call progress, determine if early media (EarlyMedia) is available.

client.OnRinging = func(event rustpbxgo.RingingEvent) {
    log.Printf("Ringing, early media: %v", event.EarlyMedia)
}

OnHangup - Hangup Callback

Trigger: Triggered when the call ends. See HangupEvent.

Purpose: Clean up resources and save call records. HangupEvent provides the hangup reason (event.Reason), the start and hangup times (event.StartTime and event.HangupTime, whose difference is the call duration), and caller/callee information (event.From, event.To).

client.OnHangup = func(event rustpbxgo.HangupEvent) {
    log.Printf("Call ended: %s, initiator: %s", event.Reason, event.Initiator)
    // Save call record, clean up resources
}
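
Because StartTime and HangupTime are strings in HangupEvent, computing the call duration requires parsing them first. The sketch below assumes both values are RFC 3339 timestamps, which this document does not confirm; `callDuration` is a hypothetical helper and the layout should be adjusted if the server uses a different format.

```go
package main

import (
	"fmt"
	"time"
)

// callDuration parses two timestamps and returns hangup minus start.
// The RFC 3339 layout is an assumption about the server's format.
func callDuration(startTime, hangupTime string) (time.Duration, error) {
	start, err := time.Parse(time.RFC3339, startTime)
	if err != nil {
		return 0, fmt.Errorf("parse start time: %w", err)
	}
	end, err := time.Parse(time.RFC3339, hangupTime)
	if err != nil {
		return 0, fmt.Errorf("parse hangup time: %w", err)
	}
	return end.Sub(start), nil
}

func main() {
	d, err := callDuration("2024-01-01T10:00:00Z", "2024-01-01T10:01:30Z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("call lasted", d)
}
```

Inside OnHangup this could be called as `callDuration(event.StartTime, event.HangupTime)`, logging the error instead of failing when a timestamp is empty.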

OnSpeaking - Speaking Callback

Trigger: Triggered when VAD detects user starts speaking. See SpeakingEvent.

Purpose: Detect user input, commonly used to interrupt TTS playback. Can get speech start time (event.StartTime) from the event.

client.OnSpeaking = func(event rustpbxgo.SpeakingEvent) {
    log.Printf("User speaking detected, interrupt playback")
    // Immediately interrupt current TTS
    client.Interrupt(false)
}

OnSilence - Silence Callback

Trigger: Triggered when the user is detected to have stopped speaking. See SilenceEvent.

Purpose: Determine if user has finished speaking, can combine silence duration (event.Duration) to decide whether to start processing.

client.OnSilence = func(event rustpbxgo.SilenceEvent) {
    log.Printf("Silence detected, duration %d ms", event.Duration)
    // User might have finished speaking, prepare to process
}

OnAsrFinal - ASR Final Result Callback

Trigger: Triggered when speech recognition obtains stable result. See AsrFinalEvent.

Purpose: Get user’s final speech input (event.Text), used for business logic processing or sending to LLM. Can distinguish different speech segments through sequence number (event.Index).

client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
    log.Printf("User said: %s", event.Text)
    // Send user input to LLM for processing
    response := callLLM(event.Text)
    client.TTS(response, "", "reply", true, false, nil, nil)
}

OnAsrDelta - ASR Delta Result Callback

Trigger: Intermediate result during speech recognition process, content may change. See AsrDeltaEvent.

Purpose: Display recognition progress in real-time, improve user experience. Should not be used directly for business logic processing.

client.OnAsrDelta = func(event rustpbxgo.AsrDeltaEvent) {
    log.Printf("Recognizing: %s", event.Text)
    // Only for display, no business processing
}

OnTrackStart - Track Start Callback

Trigger: Triggered when a Track starts (RTP, TTS, file playback, etc.). See TrackStartEvent.

Purpose: Monitor audio playback start. For a TTS Track, the TTS command's playId is available via event.PlayId; for a Play Track, the playback URL can be obtained.

client.OnTrackStart = func(event rustpbxgo.TrackStartEvent) {
    log.Printf("Track started: %s", event.TrackID)
    if event.PlayId != nil { // PlayId is an optional *string
        log.Printf("PlayID: %s", *event.PlayId)
    }
}

OnTrackEnd - Track End Callback

Trigger: Triggered when a Track ends (RTP ends, TTS completes, file playback completes, etc.). See TrackEndEvent.

Purpose: Monitor audio playback end, for example to control playback flow or clean up resources. The event provides the duration (event.Duration) and the play ID (event.PlayId).

client.OnTrackEnd = func(event rustpbxgo.TrackEndEvent) {
    log.Printf("Track ended: %s, duration: %d ms", event.TrackID, event.Duration)
    if event.PlayId != nil { // PlayId is an optional *string
        log.Printf("PlayID: %s", *event.PlayId)
    }
    // TTS playback completed, can send next one
}

OnInterruption - Interruption Callback

Trigger: Triggered when an Interrupt command is received while TTS playback is unfinished. See InterruptionEvent.

Purpose: Get interruption information, such as the elapsed playback time (event.Current) and the played text (event.Subtitle, if the provider supports subtitles).

client.OnInterruption = func(event rustpbxgo.InterruptionEvent) {
    if event.Subtitle != nil {
        log.Printf("Playback interrupted, played: %s", *event.Subtitle)
    }
    // Record interruption position for subsequent processing
}
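
InterruptionEvent also carries Position, described as a character count. One possible use is working out which part of the reply the user never heard. The sketch below assumes Position counts Unicode characters from the start of the text sent to TTS, which this document does not state precisely; `remainingText` is a hypothetical helper.

```go
package main

import "fmt"

// remainingText returns the part of full that comes after the first
// position characters. Counting is done in runes, not bytes, so that
// multi-byte text is split correctly.
func remainingText(full string, position uint32) string {
	runes := []rune(full)
	if int(position) >= len(runes) {
		return ""
	}
	return string(runes[position:])
}

func main() {
	reply := "Your order has shipped. It should arrive on Friday."
	fmt.Printf("unplayed: %q\n", remainingText(reply, 24))
}
```

Since Position is a `*uint32` in the event, check it for nil before dereferencing.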

OnDTMF - DTMF Callback

Trigger: Triggered when a keypress is detected. See DTMFEvent.

Purpose: Handle user keypress input, such as IVR menu selection. Can get keypress value (0-9, *, #, A-D) from event.Digit.

client.OnDTMF = func(event rustpbxgo.DTMFEvent) {
    log.Printf("User pressed: %s", event.Digit)
    // Handle keypress logic, such as menu navigation
    if event.Digit == "1" {
        client.TTS("You selected option 1", "", "menu", true, false, nil, nil)
    }
}
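
For IVR menus with more than one or two options, a lookup table tends to be easier to maintain than a chain of `if` statements. This is a sketch with made-up menu entries; `menu` and `promptFor` are illustrative names, not SDK features.

```go
package main

import "fmt"

// menu maps DTMF digits to prompts. The entries are invented for illustration.
var menu = map[string]string{
	"1": "You selected sales",
	"2": "You selected support",
	"#": "Returning to the main menu",
}

// promptFor returns the prompt for a digit and whether the digit is a known option.
func promptFor(digit string) (string, bool) {
	prompt, ok := menu[digit]
	return prompt, ok
}

func main() {
	for _, digit := range []string{"1", "9"} {
		if prompt, ok := promptFor(digit); ok {
			// In OnDTMF this is where the prompt would be passed to client.TTS.
			fmt.Println(digit, "->", prompt)
		} else {
			fmt.Println(digit, "-> unknown option")
		}
	}
}
```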

OnError - Error Callback

Trigger: Triggered when an error occurs. See ErrorEvent.

Purpose: Handle various error situations. From event.Sender you can know the error source (asr, tts, media, etc.), from event.Error get error information.

client.OnError = func(event rustpbxgo.ErrorEvent) {
    log.Printf("Error [%s]: %s", event.Sender, event.Error)
    // Log error, perform fallback handling
}

OnMetrics - Metrics Callback

Trigger: Triggered when performance metrics are collected. See MetricsEvent.

Purpose: Monitor performance metrics. Can get metric name (event.Key) and duration (event.Duration) from the event.

client.OnMetrics = func(event rustpbxgo.MetricsEvent) {
    log.Printf("Metric [%s]: %d ms", event.Key, event.Duration)
    // Record performance data for analysis
}

OnClose - Connection Close

Trigger: Triggered when WebSocket connection closes.

Purpose: Handle connection disconnection logic, clean up resources or attempt reconnection.

client.OnClose = func(reason string) {
    log.Printf("Connection closed: %s", reason)
    // Clean up resources or reconnect
}
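
If OnClose is used to reconnect, spacing out the attempts avoids hammering the server. The sketch below computes a capped exponential delay. The `backoff` helper and its parameters are assumptions for illustration, and the document does not say whether a closed Client can be reconnected or must be recreated.

```go
package main

import (
	"fmt"
	"time"
)

// backoff returns base doubled once per previous attempt, never exceeding limit.
func backoff(attempt int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= limit {
			return limit
		}
	}
	if d > limit {
		return limit
	}
	return d
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println("attempt", attempt, "wait", backoff(attempt, time.Second, 10*time.Second))
	}
}
```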

OnEvent - Generic Event Handler

Trigger: Triggered when any event is received.

Purpose: Log all events or handle undefined special events. Receives raw event type (event) and JSON data (payload).

client.OnEvent = func(event string, payload string) {
    log.Printf("Event received [%s]: %s", event, payload)
    // Generic event logging or special handling
}
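
Some documented event types, such as EouEvent, have no dedicated callback listed here, so OnEvent is one place to handle them. The sketch below decodes the raw payload into a local struct that mirrors the documented EouEvent fields. The wire name `"eou"` and the `handleRaw` helper are assumptions for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eouEvent mirrors the documented EouEvent fields.
type eouEvent struct {
	TrackID   string `json:"trackId"`
	Timestamp uint64 `json:"timestamp"`
	Complete  bool   `json:"complete"`
}

// handleRaw takes the same arguments as the OnEvent callback and reports
// whether the payload is a completed end-of-utterance event.
func handleRaw(event string, payload string) (bool, error) {
	if event != "eou" { // "eou" is an assumed event name
		return false, nil
	}
	var e eouEvent
	if err := json.Unmarshal([]byte(payload), &e); err != nil {
		return false, fmt.Errorf("decode eou payload: %w", err)
	}
	return e.Complete, nil
}

func main() {
	done, err := handleRaw("eou", `{"trackId":"t1","timestamp":1700000000000,"complete":true}`)
	fmt.Println(done, err)
}
```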

Options

CallOption

Call configuration options:

type CallOption struct {
    Denoise          bool
    Offer            string
    Callee           string
    Caller           string
    Recorder         *RecorderOption
    VAD              *VADOption
    ASR              *ASROption
    TTS              *TTSOption
    HandshakeTimeout string
    EnableIPv6       bool
    Sip              *SipOption
    Extra            map[string]string
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Denoise | bool | Whether to enable noise reduction |
| Offer | string | SDP offer string for WebRTC/SIP negotiation |
| Callee | string | Callee’s SIP URI or phone number |
| Caller | string | Caller’s SIP URI or phone number |
| Recorder | *RecorderOption | Call recording configuration, see RecorderOption |
| VAD | *VADOption | Voice activity detection configuration, see VADOption |
| ASR | *ASROption | Automatic Speech Recognition (ASR) configuration, see ASROption |
| TTS | *TTSOption | Text-to-Speech configuration, see TTSOption |
| HandshakeTimeout | string | Connection handshake timeout |
| EnableIPv6 | bool | Enable IPv6 support |
| Sip | *SipOption | SIP registration account, password, and domain configuration, see SipOption |
| Extra | map[string]string | Additional parameters |

RecorderOption

Recording configuration options:

type RecorderOption struct {
    RecorderFile string
    Samplerate   int
    Ptime        int
}

Field Description:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| RecorderFile | string | - | Recording file path |
| Samplerate | int | 16000 | Sample rate (Hz) |
| Ptime | int | 200 | Packet time (milliseconds) |

ASROption

Speech recognition configuration options:

type ASROption struct {
    Provider        string
    Model           string
    Language        string
    AppID           string
    SecretID        string
    SecretKey       string
    ModelType       string
    BufferSize      int
    SampleRate      uint32
    Endpoint        string
    Extra           map[string]string
    StartWhenAnswer bool
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Provider | string | ASR provider: tencent, aliyun, Deepgram, etc. |
| Model | string | Model name |
| Language | string | Language (e.g.: zh-CN, en-US), see corresponding provider documentation for details |
| AppID | string | Tencent Cloud’s appId |
| SecretID | string | Tencent Cloud’s secretId |
| SecretKey | string | Tencent Cloud’s secretKey, or other provider’s API Key |
| ModelType | string | ASR model type (e.g.: 16k_zh, 8k_en), see provider documentation for details |
| BufferSize | int | Audio buffer size, unit: bytes |
| SampleRate | uint32 | Sample rate |
| Endpoint | string | Custom service endpoint URL |
| Extra | map[string]string | Provider-specific parameters |
| StartWhenAnswer | bool | Request ASR service after call is answered |

TTSOption

Text-to-speech synthesis configuration options:

type TTSOption struct {
    Samplerate       int32
    Provider         string
    Speed            float32
    AppID            string
    SecretID         string
    SecretKey        string
    Volume           int32
    Speaker          string
    Codec            string
    Subtitle         bool
    Emotion          string
    Endpoint         string
    Extra            map[string]string
    WaitInputTimeout uint32
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Samplerate | int32 | Sample rate, unit: Hz |
| Provider | string | TTS provider: tencent, aliyun, deepgram, voiceapi |
| Speed | float32 | Speech rate |
| AppID | string | Tencent Cloud’s appId |
| SecretID | string | Tencent Cloud’s secretId |
| SecretKey | string | Tencent Cloud’s secretKey, or other provider’s API Key |
| Volume | int32 | Volume (1-10) |
| Speaker | string | Voice, see provider documentation |
| Codec | string | Encoding format |
| Subtitle | bool | Whether to enable subtitles |
| Emotion | string | Emotion: neutral, happy, sad, angry, etc. |
| Endpoint | string | Custom TTS service endpoint URL |
| Extra | map[string]string | Provider-specific parameters |
| WaitInputTimeout | uint32 | Maximum time to wait for user input (milliseconds) |

VADOption

Voice activity detection configuration options:

type VADOption struct {
    Type                  string
    Samplerate            uint32
    SpeechPadding         uint64
    SilencePadding        uint64
    Ratio                 float32
    VoiceThreshold        float32
    MaxBufferDurationSecs uint64
    Endpoint              string
    SecretKey             string
    SecretID              string
    SilenceTimeout        uint
}

Field Description:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| Type | string | silero | VAD algorithm type: silero, ten |
| Samplerate | uint32 | 16000 | Sample rate |
| SpeechPadding | uint64 | 250 | Start detection speechPadding milliseconds after speech starts |
| SilencePadding | uint64 | 100 | Silence event trigger interval, unit: milliseconds |
| Ratio | float32 | 0.5 | Speech detection ratio threshold |
| VoiceThreshold | float32 | 0.5 | Voice energy threshold |
| MaxBufferDurationSecs | uint64 | 50 | Maximum buffer duration, unit: seconds |
| Endpoint | string | - | Custom VAD service endpoint |
| SecretKey | string | - | VAD service authentication key |
| SecretID | string | - | VAD service authentication ID |
| SilenceTimeout | uint | 5000 | Silence detection timeout, unit: milliseconds |

SipOption

SIP configuration options:

type SipOption struct {
    Username string
    Password string
    Realm    string
    Headers  map[string]string
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Username | string | SIP username for authentication |
| Password | string | SIP password for authentication |
| Realm | string | SIP domain/realm for authentication |
| Headers | map[string]string | Additional SIP protocol headers (key-value pairs) |

ReferOption

Transfer configuration options:

type ReferOption struct {
    Denoise     bool
    Timeout     uint32
    MusicOnHold string
    AutoHangup  bool
    Sip         *SipOption
    ASR         *ASROption
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Denoise | bool | Whether to enable noise reduction |
| Timeout | uint32 | Timeout (seconds) |
| MusicOnHold | string | Hold music URL |
| AutoHangup | bool | Automatically hang up after transfer completes |
| Sip | *SipOption | SIP configuration |
| ASR | *ASROption | ASR configuration |

Event Types

All event type definitions supported by Client:

Event

Base event structure, contains event type name.

type Event struct {
    Event string `json:"event"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Event | string | Event type name |

IncomingEvent

Incoming call event, triggered when there is a new incoming call.

type IncomingEvent struct {
    TrackID   string `json:"trackId"`
    Timestamp uint64 `json:"timestamp"`
    Caller    string `json:"caller"`
    Callee    string `json:"callee"`
    Sdp       string `json:"sdp"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Caller | string | Caller number |
| Callee | string | Callee number |
| Sdp | string | SDP offer string |

AnswerEvent

Answer event, triggered when the call is answered and SDP negotiation is complete.

type AnswerEvent struct {
    TrackID   string `json:"trackId"`
    Timestamp uint64 `json:"timestamp"`
    Sdp       string `json:"sdp"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Sdp | string | SDP answer string |

RejectEvent

Reject event, triggered when the call is rejected.

type RejectEvent struct {
    TrackID   string `json:"trackId"`
    Timestamp uint64 `json:"timestamp"`
    Reason    string `json:"reason"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Reason | string | Rejection reason |

RingingEvent

Ringing event, triggered when the call is ringing (SIP calls).

type RingingEvent struct {
    TrackID    string `json:"trackId"`
    Timestamp  uint64 `json:"timestamp"`
    EarlyMedia bool   `json:"earlyMedia"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| EarlyMedia | bool | Whether early media is available |

HangupEvent

Hangup event, triggered when the call ends.

type HangupEventAttendee struct {
    Username string `json:"username"`
    Realm    string `json:"realm"`
    Source   string `json:"source"`
}

type HangupEvent struct {
    Timestamp   uint64               `json:"timestamp"`
    Reason      string               `json:"reason"`
    Initiator   string               `json:"initiator"`
    StartTime   string               `json:"startTime,omitempty"`
    HangupTime  string               `json:"hangupTime,omitempty"`
    AnswerTime  *string              `json:"answerTime,omitempty"`
    RingingTime *string              `json:"ringingTime,omitempty"`
    From        *HangupEventAttendee `json:"from,omitempty"`
    To          *HangupEventAttendee `json:"to,omitempty"`
    Extra       map[string]any       `json:"extra,omitempty"`
}

HangupEvent Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Reason | string | Hangup reason |
| Initiator | string | Party that initiated the hangup |
| StartTime | string | Call start time |
| HangupTime | string | Hangup time |
| AnswerTime | *string | Answer time |
| RingingTime | *string | Ringing time |
| From | *HangupEventAttendee | Caller information |
| To | *HangupEventAttendee | Callee information |
| Extra | map[string]any | Additional information |

HangupEventAttendee Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Username | string | Username |
| Realm | string | Domain |
| Source | string | Source |

SpeakingEvent

Speaking event, triggered when VAD detects user starts speaking.

type SpeakingEvent struct {
    TrackID   string `json:"trackId"`
    Timestamp uint64 `json:"timestamp"`
    StartTime uint64 `json:"startTime"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| StartTime | uint64 | Speech start time (milliseconds) |

SilenceEvent

Silence event, triggered when the user is detected to have stopped speaking.

type SilenceEvent struct {
    TrackID   string `json:"trackId"`
    Timestamp uint64 `json:"timestamp"`
    StartTime uint64 `json:"startTime"`
    Duration  uint64 `json:"duration"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| StartTime | uint64 | Silence start time (milliseconds) |
| Duration | uint64 | Silence duration (milliseconds) |

EouEvent

End of utterance event, triggered when end of speech is detected.

type EouEvent struct {
    TrackID   string `json:"trackId"`
    Timestamp uint64 `json:"timestamp"`
    Complete  bool   `json:"complete"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Complete | bool | Whether it ended completely |

AsrFinalEvent

ASR final result event, triggered when speech recognition obtains stable result.

type AsrFinalEvent struct {
    TrackID   string  `json:"trackId"`
    Timestamp uint64  `json:"timestamp"`
    Index     uint32  `json:"index"`
    StartTime *uint64 `json:"startTime,omitempty"`
    EndTime   *uint64 `json:"endTime,omitempty"`
    Text      string  `json:"text"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Index | uint32 | Speech segment index |
| StartTime | *uint64 | Speech start time (milliseconds) |
| EndTime | *uint64 | Speech end time (milliseconds) |
| Text | string | Recognized text content |

AsrDeltaEvent

ASR delta result event, intermediate result during speech recognition process, content may change.

type AsrDeltaEvent struct {
    TrackID   string  `json:"trackId"`
    Index     uint32  `json:"index"`
    Timestamp uint64  `json:"timestamp"`
    StartTime *uint64 `json:"startTime,omitempty"`
    EndTime   *uint64 `json:"endTime,omitempty"`
    Text      string  `json:"text"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Index | uint32 | Speech segment index |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| StartTime | *uint64 | Speech start time (milliseconds) |
| EndTime | *uint64 | Speech end time (milliseconds) |
| Text | string | Recognized text content (may change) |

TrackStartEvent

Track start event, triggered when a Track starts (RTP, TTS, file playback, etc.).

type TrackStartEvent struct {
    TrackID   string  `json:"trackId"`
    Timestamp uint64  `json:"timestamp"`
    PlayId    *string `json:"playId,omitempty"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| PlayId | *string | Play ID (TTS/Play command) |

TrackEndEvent

Track end event, triggered when a Track ends (RTP ends, TTS completes, file playback completes, etc.).

type TrackEndEvent struct {
    TrackID   string  `json:"trackId"`
    Timestamp uint64  `json:"timestamp"`
    Duration  uint64  `json:"duration"`
    PlayId    *string `json:"playId,omitempty"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Duration | uint64 | Playback duration (milliseconds) |
| PlayId | *string | Play ID (TTS/Play command) |

InterruptionEvent

Interruption event, triggered when an Interrupt command is received while TTS playback is unfinished.

type InterruptionEvent struct {
    TrackID       string  `json:"trackId"`
    Timestamp     uint64  `json:"timestamp"`
    Subtitle      *string `json:"subtitle,omitempty"`
    Position      *uint32 `json:"position,omitempty"`
    TotalDuration uint32  `json:"totalDuration"`
    Current       uint32  `json:"current"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Subtitle | *string | Played subtitle text |
| Position | *uint32 | Playback position (character count) |
| TotalDuration | uint32 | Total duration (milliseconds) |
| Current | uint32 | Current playback duration (milliseconds) |

DTMFEvent

DTMF event, triggered when a keypress is detected.

type DTMFEvent struct {
    TrackID   string `json:"trackId"`
    Timestamp uint64 `json:"timestamp"`
    Digit     string `json:"digit"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Digit | string | Keypress value (0-9, *, #, A-D) |

AnswerMachineDetectionEvent

Answering machine detection event, triggered when an answering machine is detected.

type AnswerMachineDetectionEvent struct {
    Timestamp uint64 `json:"timestamp"`
    StartTime uint64 `json:"startTime"`
    EndTime   uint64 `json:"endTime"`
    Text      string `json:"text"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| StartTime | uint64 | Detection start time (milliseconds) |
| EndTime | uint64 | Detection end time (milliseconds) |
| Text | string | Detected text |

LLMFinalEvent

LLM final result event, triggered when large language model generates final result.

type LLMFinalEvent struct {
    Timestamp uint64 `json:"timestamp"`
    Text      string `json:"text"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Text | string | Final text generated by LLM |

LLMDeltaEvent

LLM delta result event, triggered when large language model generates delta result.

type LLMDeltaEvent struct {
    Timestamp uint64 `json:"timestamp"`
    Word      string `json:"word"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Word | string | Delta word generated by LLM |

MetricsEvent

Metrics event, triggered when performance metrics are collected.

type MetricsEvent struct {
    Timestamp uint64         `json:"timestamp"`
    Key       string         `json:"key"`
    Duration  uint32         `json:"duration"`
    Data      map[string]any `json:"data"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Key | string | Metric name |
| Duration | uint32 | Duration (milliseconds) |
| Data | map[string]any | Metric data |

ErrorEvent

Error event, triggered when an error occurs.

type ErrorEvent struct {
    TrackID   string  `json:"trackId"`
    Timestamp uint64  `json:"timestamp"`
    Sender    string  `json:"sender"`
    Error     string  `json:"error"`
    Code      *uint32 `json:"code,omitempty"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Sender | string | Error source (asr, tts, media, etc.) |
| Error | string | Error information |
| Code | *uint32 | Error code |

AddHistoryEvent

Add history event, triggered when conversation history is added.

type AddHistoryEvent struct {
    Sender    string `json:"sender"`
    Timestamp uint64 `json:"timestamp"`
    Speaker   string `json:"speaker"`
    Text      string `json:"text"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| Sender | string | Sender |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Speaker | string | Speaker identifier |
| Text | string | Conversation text |

OtherEvent

Other event, used to handle undefined event types.

type OtherEvent struct {
    TrackID   string            `json:"trackId"`
    Timestamp uint64            `json:"timestamp"`
    Sender    string            `json:"sender"`
    Extra     map[string]string `json:"extra,omitempty"`
}

Field Description:

| Field | Type | Description |
| --- | --- | --- |
| TrackID | string | Call track ID |
| Timestamp | uint64 | Event timestamp (milliseconds) |
| Sender | string | Sender |
| Extra | map[string]string | Additional information |

Complete Examples

SIP Call Example

package main

import (
    "context"
    "log"
    "os"
    "os/signal"
    "syscall"
    
    "github.com/restsend/rustpbxgo"
    "github.com/sirupsen/logrus"
)

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    
    logger := logrus.New()
    logger.SetLevel(logrus.InfoLevel)
    
    // Create client
    client := rustpbxgo.NewClient(
        "ws://localhost:8080",
        rustpbxgo.WithLogger(logger),
        rustpbxgo.WithContext(ctx),
    )
    
    // Set event handlers
    client.OnAnswer = func(event rustpbxgo.AnswerEvent) {
        logger.Info("Call answered")
        // Send welcome message
        client.TTS("Hello, welcome to call", "", "greeting", true, false, nil, nil)
    }
    
    client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
        logger.Infof("User said: %s", event.Text)
        // Respond based on user input
        client.TTS("I received your message", "", "response", true, false, nil, nil)
    }
    
    client.OnHangup = func(event rustpbxgo.HangupEvent) {
        logger.Infof("Call ended: %s", event.Reason)
        cancel()
    }
    
    // Connect to server
    if err := client.Connect("sip"); err != nil {
        log.Fatalf("Connection failed: %v", err)
    }
    defer client.Shutdown()
    
    // Configure call
    callOption := rustpbxgo.CallOption{
        Caller: "sip:1000@example.com",
        Callee: "sip:2000@example.com",
        Denoise: true,
        Sip: &rustpbxgo.SipOption{
            Username: "user",
            Password: "pass",
            Realm: "example.com",
        },
        ASR: &rustpbxgo.ASROption{
            Provider: "tencent",
            Language: "zh-CN",
        },
        TTS: &rustpbxgo.TTSOption{
            Provider: "tencent",
            Speaker: "xiaoyan",
        },
        VAD: &rustpbxgo.VADOption{
            Type: "ten",
            SilenceTimeout: 5000,
        },
    }
    
    // Initiate call
    _, err := client.Invite(ctx, callOption)
    if err != nil {
        log.Fatalf("Call failed: %v", err)
    }
    
    // Wait for signal
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
    
    select {
    case <-ctx.Done():
        logger.Info("Call ended")
    case <-sigChan:
        logger.Info("Interrupt signal received")
        client.Hangup("user_interrupt")
    }
}

Answer Incoming Call Example

package main

import (
    "context"
    "encoding/json"
    "log"
    "net/http"
    
    "github.com/restsend/rustpbxgo"
    "github.com/sirupsen/logrus"
)

type WebhookRequest struct {
    DialogID string `json:"dialogId"`
    Caller   string `json:"caller"`
    Callee   string `json:"callee"`
}

func main() {
    logger := logrus.New()
    
    // Set up Webhook handler
    http.HandleFunc("/webhook", func(w http.ResponseWriter, r *http.Request) {
        var req WebhookRequest
        if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        
        logger.Infof("Incoming call: %s -> %s", req.Caller, req.Callee)
        
        // Handle incoming call
        go handleIncomingCall(req.DialogID, req.Caller, req.Callee, logger)
        
        w.WriteHeader(http.StatusOK)
    })
    
    log.Fatal(http.ListenAndServe(":8090", nil))
}

func handleIncomingCall(dialogID, caller, callee string, logger *logrus.Logger) {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    
    // Create client with dialogID
    client := rustpbxgo.NewClient(
        "ws://localhost:8080",
        rustpbxgo.WithLogger(logger),
        rustpbxgo.WithContext(ctx),
        rustpbxgo.WithID(dialogID),
    )
    
    client.OnHangup = func(event rustpbxgo.HangupEvent) {
        logger.Info("Call ended")
        cancel()
    }
    
    // Connect to server
    if err := client.Connect("sip"); err != nil {
        logger.Errorf("Connection failed: %v", err)
        return
    }
    defer client.Shutdown()
    
    // Send ringing
    recorder := &rustpbxgo.RecorderOption{
        RecorderFile: "/recordings/" + dialogID + ".wav",
        Samplerate: 16000,
    }
    client.Ringing("", recorder)
    
    // Answer incoming call
    callOption := rustpbxgo.CallOption{
        Caller: caller,
        Callee: callee,
        ASR: &rustpbxgo.ASROption{
            Provider: "tencent",
            Language: "zh-CN",
        },
        TTS: &rustpbxgo.TTSOption{
            Provider: "tencent",
            Speaker: "xiaoyan",
        },
    }
    
    if err := client.Accept(callOption); err != nil {
        logger.Errorf("Answer failed: %v", err)
        return
    }
    
    // Send welcome message
    client.TTS("Hello, I am an intelligent assistant", "", "greeting", true, false, nil, nil)
    
    // Wait for call to end
    <-ctx.Done()
}

Streaming TTS Example (LLM Integration)

package main

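import (
    "bufio"
    "context"
    "encoding/json"
    "log"
    "net/http"
    "strings"
    
    "github.com/restsend/rustpbxgo"
    "github.com/sirupsen/logrus"
)

func streamLLMResponse(client *rustpbxgo.Client, userInput string) {
    // Simulate calling an LLM API. A real request also needs a JSON body built
    // from userInput and an Authorization header; both are omitted here.
    req, err := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", nil)
    if err != nil {
        log.Printf("Failed to build LLM request: %v", err)
        return
    }
    req.Header.Set("Content-Type", "application/json")
    
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Printf("LLM request failed: %v", err)
        return
    }
    defer resp.Body.Close()
    
    // All chunks share one playID so they are played as a single stream
    playID := "llm-stream"
    scanner := bufio.NewScanner(resp.Body)
    
    for scanner.Scan() {
        // SSE lines carry their payload after a "data: " prefix
        line := strings.TrimPrefix(scanner.Text(), "data: ")
        
        // Parse SSE data; blank lines and non-JSON markers are skipped
        var data map[string]interface{}
        if err := json.Unmarshal([]byte(line), &data); err != nil {
            continue
        }
        
        // Get delta text (simplified payload with a top-level "content" field)
        if content, ok := data["content"].(string); ok && content != "" {
            // Send streaming TTS; mark the end once finish_reason is set
            isEnd := data["finish_reason"] != nil
            client.StreamTTS(content, "", playID, isEnd, false, nil, nil)
        }
    }
    if err := scanner.Err(); err != nil {
        log.Printf("Reading LLM stream failed: %v", err)
    }
}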

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    
    logger := logrus.New()
    
    client := rustpbxgo.NewClient(
        "ws://localhost:8080",
        rustpbxgo.WithLogger(logger),
        rustpbxgo.WithContext(ctx),
    )
    
    client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
        logger.Infof("User input: %s", event.Text)
        
        // Interrupt current playback
        client.Interrupt()
        
        // Stream response
        go streamLLMResponse(client, event.Text)
    }
    
    // ... other code
}

Best Practices

Error Handling

Always check errors and handle them appropriately:

if err := client.Connect("sip"); err != nil {
    log.Fatalf("Connection failed: %v", err)
}

if err := client.TTS("Hello", "", "1", true, false, nil, nil); err != nil {
    log.Printf("TTS failed: %v", err)
    // Retry or use fallback
}

Resource Cleanup

Ensure proper resource cleanup:

defer client.Shutdown()

Context Management

Use context to control lifecycle:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()

client := rustpbxgo.NewClient(
    endpoint,
    rustpbxgo.WithContext(ctx),
)

Logging

Use appropriate log levels:

logger := logrus.New()
logger.SetLevel(logrus.InfoLevel) // Production environment
// logger.SetLevel(logrus.DebugLevel) // Development environment

Event Handling

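Callbacks are invoked from the goroutine that processes incoming messages, so a handler that blocks delays every event after it. Avoid long-running operations in event handlers; run them in a goroutine instead: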

client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
    go func() {
        // Long-running operation
        response := processWithLLM(event.Text)
        client.TTS(response, "", "reply", true, false, nil, nil)
    }()
}
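
Starting one goroutine per event keeps the handler fast, but replies may then finish out of order when one takes longer than the next. An alternative sketch, assuming ordered processing is wanted, hands each event to a single worker through a buffered channel. The eventQueue type and its methods are hypothetical, not part of the SDK.

```go
package main

import (
    "fmt"
    "strings"
    "sync"
)

// eventQueue (hypothetical) passes work from a callback to one worker
// goroutine, so the callback returns at once and jobs run in arrival order.
type eventQueue struct {
    jobs chan string
    wg   sync.WaitGroup
}

func newEventQueue(size int, handle func(text string)) *eventQueue {
    q := &eventQueue{jobs: make(chan string, size)}
    q.wg.Add(1)
    go func() {
        defer q.wg.Done()
        for text := range q.jobs {
            handle(text)
        }
    }()
    return q
}

// enqueue never blocks: it reports false when the buffer is full so the
// caller can drop or log the event.
func (q *eventQueue) enqueue(text string) bool {
    select {
    case q.jobs <- text:
        return true
    default:
        return false
    }
}

// stop closes the queue and waits for queued jobs to finish.
func (q *eventQueue) stop() {
    close(q.jobs)
    q.wg.Wait()
}

func main() {
    var replies []string
    q := newEventQueue(8, func(text string) {
        // Stand-in for the slow LLM call followed by client.TTS(...).
        replies = append(replies, strings.ToUpper(text))
    })

    // In the SDK these calls would sit inside client.OnAsrFinal.
    q.enqueue("hello")
    q.enqueue("goodbye")
    q.stop()

    fmt.Println(replies)
}
```

The buffer size bounds how much work can pile up; when enqueue returns false the caller decides whether to drop the event or log it.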

Troubleshooting

Connection Issues

If the client cannot connect to the server:

  1. Check that the endpoint URL is correct
  2. Confirm that the server is running
  3. Check firewall settings
  4. Enable debug logging and event dumping to see detailed information

logger.SetLevel(logrus.DebugLevel)
client := rustpbxgo.NewClient(
    endpoint,
    rustpbxgo.WithLogger(logger),
    rustpbxgo.WithDumpEvents(true),
)
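
The first check in the list can be automated before the client is created. The sketch below assumes the endpoint is a WebSocket URL with a ws or wss scheme, as in the examples on this page; checkEndpoint is a hypothetical helper, and the SDK may accept other forms.

```go
package main

import (
    "fmt"
    "net/url"
)

// checkEndpoint (hypothetical helper) reports obvious mistakes in a
// WebSocket endpoint: unparsable URLs, a non-ws scheme, or a missing host.
func checkEndpoint(endpoint string) error {
    u, err := url.Parse(endpoint)
    if err != nil {
        return fmt.Errorf("invalid endpoint %q: %w", endpoint, err)
    }
    // Assumption: only ws:// and wss:// endpoints are expected.
    if u.Scheme != "ws" && u.Scheme != "wss" {
        return fmt.Errorf("endpoint %q: scheme must be ws or wss, got %q", endpoint, u.Scheme)
    }
    if u.Host == "" {
        return fmt.Errorf("endpoint %q: missing host", endpoint)
    }
    return nil
}

func main() {
    for _, ep := range []string{"ws://localhost:8080", "http://localhost:8080", "ws://"} {
        if err := checkEndpoint(ep); err != nil {
            fmt.Println("rejected:", err)
            continue
        }
        fmt.Println("ok:", ep)
    }
}
```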

ASR Not Working

If ASR cannot recognize speech:

  1. Confirm ASR configuration is correct
  2. Check if API keys are valid
  3. Verify sample rate settings
  4. Check VAD configuration

TTS No Sound

If TTS has no sound:

  1. Check if TTS configuration is correct
  2. Verify speaker parameter
  3. Confirm endOfStream is set correctly
  4. Check network connection

More Resources

Reference

For complete example code, see the cmd/ directory of the rustpbxgo repository (github.com/restsend/rustpbxgo).