Code Explanation

In the previous chapter, we successfully ran a voice assistant.

This chapter walks through its code in detail to help readers understand how to use the RustPBX Go SDK.

Project address: https://github.com/restsend/rustpbxgo

Directory Structure

rustpbxgo/
├── README.md
├── client.go          # SDK core definition
├── cmd/               # Example application
│   ├── main.go        # Program entry
│   ├── llm.go         # Large model interaction
│   ├── media.go       # WebRTC
│   └── webhook.go     # WebHook handling
├── go.mod
└── go.sum

  • client.go: Defines Client, the core data structure of RustPBX Go, and its methods.
  • cmd/ directory: Example code for the voice assistant, including SIP/WebRTC calls, incoming call handling with webhooks, and large model interaction logic.

client.go - Client Definition

Connect to RustPBX

  • NewClient: Create client
    • endpoint: specify RustPBX server address
  • Connect: Connect to server
    • callType: specify call type (here we use "webrtc")
  • Shutdown: Close client

Send Commands

RustPBXGo sends each command to RustPBX when the corresponding client method is called.

Commands used here include:

  • Invite: Initiate call
    • For WebRTC calls, set the Offer field of CallOption to the SDP offer for the call
  • TTS: Call text-to-speech service and play result
  • Play: Play audio file from URL
  • Interrupt: Interrupt current playback (TTS and file playback)
  • Hangup: Hang up
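
The flow chart later in this chapter shows commands leaving the client through c.conn.WriteJson(), so each of these method calls ends up as a JSON message on the WebSocket. The following is a minimal sketch of that idea using only the standard library. The struct types and JSON field names (ttsCommand, hangupCommand, "command", "text", "reason") are assumptions made for illustration; the real wire format is defined by RustPBX and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ttsCommand is a hypothetical wire format for illustration only;
// the actual JSON fields are defined by RustPBX and may differ.
type ttsCommand struct {
	Command string `json:"command"`
	Text    string `json:"text"`
}

// hangupCommand is likewise an assumed shape, not the SDK's real type.
type hangupCommand struct {
	Command string `json:"command"`
	Reason  string `json:"reason,omitempty"`
}

// encodeCommand turns a command value into the JSON text that would be
// written to the WebSocket connection.
func encodeCommand(cmd any) (string, error) {
	data, err := json.Marshal(cmd)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	commands := []any{
		ttsCommand{Command: "tts", Text: "Hello"},
		hangupCommand{Command: "hangup"},
	}
	for _, cmd := range commands {
		msg, err := encodeCommand(cmd)
		if err != nil {
			fmt.Println("encode failed:", err)
			continue
		}
		fmt.Println(msg)
	}
}
```

In the SDK itself this encoding is hidden behind methods such as TTS and Hangup, so application code never builds these messages by hand.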

Register Callback Functions

During the entire process from call establishment to termination, various events are triggered, as shown in the diagram:

[Diagram: SIP call flow between Phone and UserAgent (INVITE, 180 Ringing, 200 OK, 4xx, BYE) and the events triggered along the way: Track Start, Ringing, Reject, Answer, Track End, Hangup]

Each event is handled by assigning a callback function to the matching field of the client.

For example, to handle AsrFinal events (stable speech recognition results), set a callback function in the OnAsrFinal field.
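
The callback-field pattern can be sketched as follows. This is a simplified stand-in, not the SDK code: miniClient, asrFinalEvent, and emitAsrFinal are hypothetical names, and only the OnAsrFinal field name comes from the text above. The point is that a callback field is optional and is only invoked once the application has set it.

```go
package main

import "fmt"

// asrFinalEvent and miniClient are simplified stand-ins, not the SDK types.
type asrFinalEvent struct {
	Text string
}

type miniClient struct {
	// OnAsrFinal is optional; it stays nil until the application sets it.
	OnAsrFinal func(event asrFinalEvent)
}

// emitAsrFinal invokes the callback only when one is registered and
// reports whether the event was handled.
func (c *miniClient) emitAsrFinal(event asrFinalEvent) bool {
	if c.OnAsrFinal == nil {
		return false
	}
	c.OnAsrFinal(event)
	return true
}

func main() {
	client := &miniClient{}

	// No callback registered yet, so the event is dropped.
	fmt.Println("handled:", client.emitAsrFinal(asrFinalEvent{Text: "ignored"}))

	client.OnAsrFinal = func(event asrFinalEvent) {
		fmt.Println("recognized:", event.Text)
	}
	fmt.Println("handled:", client.emitAsrFinal(asrFinalEvent{Text: "hello"}))
}
```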

Flow Chart

When the client calls the Connect method, it starts two goroutines (the green parts in the diagram below):

  • One reads WebSocket messages and parses them into events (top).

  • The other processes events and sends commands (bottom). When it receives an event, it calls the processEvent method, which invokes the callback function that matches the event type.

[Diagram: RustPBX sends events over the WebSocket; c.conn.ReadMessage() reads them and passes them through c.eventChan to c.processEvent(), which calls callbacks such as OnAsrFinal, OnSpeaking, and OnHangup; the client sends commands back to RustPBX over the WebSocket with c.conn.WriteJson()]
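
The two-goroutine structure described above can be sketched with the standard library alone. In this sketch a slice of strings stands in for the WebSocket, and loopClient, event, readLoop, and run are hypothetical names; the event type strings "asrFinal" and "hangup" are also assumptions. Only eventChan, processEvent, OnAsrFinal, and OnHangup are names taken from the diagram.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// event is a simplified stand-in for a RustPBX event; field names are assumptions.
type event struct {
	Event string `json:"event"`
	Text  string `json:"text,omitempty"`
}

// loopClient mimics the structure described above with stdlib types only.
type loopClient struct {
	eventChan  chan event
	OnAsrFinal func(event)
	OnHangup   func(event)
}

// readLoop plays the role of the reading goroutine: it parses each raw
// message, forwards it, and closes the channel when the input ends.
func (c *loopClient) readLoop(rawMessages []string) {
	defer close(c.eventChan)
	for _, raw := range rawMessages {
		var ev event
		if err := json.Unmarshal([]byte(raw), &ev); err != nil {
			continue // skip malformed messages
		}
		c.eventChan <- ev
	}
}

// processEvent picks the callback that matches the event type.
func (c *loopClient) processEvent(ev event) {
	switch ev.Event {
	case "asrFinal":
		if c.OnAsrFinal != nil {
			c.OnAsrFinal(ev)
		}
	case "hangup":
		if c.OnHangup != nil {
			c.OnHangup(ev)
		}
	}
}

// run starts both goroutines and waits until every event is processed.
func (c *loopClient) run(rawMessages []string) {
	var wg sync.WaitGroup
	wg.Add(1)
	go c.readLoop(rawMessages)
	go func() {
		defer wg.Done()
		for ev := range c.eventChan {
			c.processEvent(ev)
		}
	}()
	wg.Wait()
}

func main() {
	c := &loopClient{eventChan: make(chan event, 8)}
	c.OnAsrFinal = func(ev event) { fmt.Println("asrFinal:", ev.Text) }
	c.OnHangup = func(ev event) { fmt.Println("hangup") }
	c.run([]string{
		`{"event":"asrFinal","text":"hello"}`,
		`not json`,
		`{"event":"hangup"}`,
	})
}
```

Because all callbacks run on the single processing goroutine, they are invoked in the order the events arrived, and a slow callback delays the events queued behind it.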

In the next section, we will see how to use these APIs to send commands and handle events.

main.go - Example Application Entry

main.go is the program entry point. Its main logic includes:

Create WebRTCPeer

Create a WebRTCPeer, generate an SDP offer, and write the offer into the callOption.Offer field.

main.go
    localSdp, err := mediaHandler.Setup(codec, iceServers)
    if err != nil {
        logger.Fatalf("Failed to get local SDP: %v", err)
    }
    logger.Infof("Offer SDP: %v", localSdp)
    callOption.Offer = localSdp

Note: WebRTC calls need callOption.Offer to be set; SIP calls need callOption.Caller and callOption.Callee to be set.
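
The rule about which fields each call type requires can be expressed as a small check. This is a sketch under assumptions: callOption here is a reduced, hypothetical struct holding only the three fields named in the text, validateCallOption is not an SDK function, and the "sip" call type string is assumed (the text only confirms "webrtc").

```go
package main

import (
	"errors"
	"fmt"
)

// callOption is a reduced, hypothetical version of the SDK's call options,
// containing only the three fields mentioned in the text.
type callOption struct {
	Offer  string
	Caller string
	Callee string
}

// validateCallOption checks that the fields required by the call type are set.
func validateCallOption(callType string, opt callOption) error {
	switch callType {
	case "webrtc":
		if opt.Offer == "" {
			return errors.New("webrtc call requires Offer")
		}
	case "sip": // assumed name for the SIP call type
		if opt.Caller == "" || opt.Callee == "" {
			return errors.New("sip call requires Caller and Callee")
		}
	default:
		return fmt.Errorf("unknown call type %q", callType)
	}
	return nil
}

func main() {
	webrtcErr := validateCallOption("webrtc", callOption{Offer: "v=0"})
	fmt.Println("webrtc:", webrtcErr)

	sipErr := validateCallOption("sip", callOption{Caller: "sip:alice@example.com"})
	fmt.Println("sip:", sipErr)
}
```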

Create Client and Register Callback Functions

Use the NewClient function to create a client. The main parameter here is option.Endpoint, which sets the RustPBX server address.

Here we mainly handle AsrFinal events by setting a callback function in the OnAsrFinal field.

The callback calls the LLMHandler.QueryStream method to pass the recognition result, event.Text, to the large model.

main.go
    client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
        ...
        response, err := option.LLMHandler.QueryStream(option.OpenaiModel, event.Text, option.StreamingTTS, client, option.ReferCaller)
        ...
    }

Connect/Disconnect RustPBX

main.go
    err = client.Connect(callType)
    if err != nil {
        logger.Fatalf("Failed to connect to server: %v", err)
    }
    defer client.Shutdown()

Here we make a WebRTC call, so the callType parameter is "webrtc".

Send Welcome Message Using TTS Command

main.go
    client.TTS(greeting, "", "1", true, false, nil, nil)

The greeting value is read from the command line and passed as the text parameter of the TTS command, that is, the text to be converted to speech.

llm.go - Large Model Interaction

llm.go is responsible for interacting with large models, including defining tools, making requests, and handling responses.

A response can contain text and tool calls. Text is played back through TTS commands.

Here we choose Go OpenAI as the large model client.

Define Tools

What is Tool Calling?

Tool Calling (also called Function Calling) allows the LLM to call externally defined functions when needed.

We wrap the Hangup and Refer commands as tools. For example:

  • When the user says "transfer me to a human", the LLM calls the Refer tool to transfer the call
  • When the user says "goodbye", the LLM calls the Hangup tool to hang up

See: OpenAI Function Calling

Here we define two tools: hangup and refer:

llm.go
    var hangupDefinition = openai.FunctionDefinition{
        Name:        "hangup",
        Description: "End the conversation and hang up the call",
        Parameters: json.RawMessage(`{
            "type": "object",
            "properties": {
                "reason": {
                    "type": "string",
                    "description": "Reason for hanging up the call"
                }
            },
            "required": []
        }`),
    }

    var referDefinition = openai.FunctionDefinition{
        Name:        "refer",
        Description: "Refer the call to another target",
        Parameters: json.RawMessage(`{
            "type": "object",
            "properties": {},
            "required": []
        }`),
    }

When the large model's response contains tool calls, we will call the Hangup and Refer methods respectively.

These two methods will then send Hangup and Refer commands to RustPBX.

Make Request

Here we use the CreateChatCompletionStream method to make requests.

llm.go
    stream, err := h.client.CreateChatCompletionStream(h.ctx, request)
    if err != nil {
        return "", err
    }
    defer stream.Close()

Since we set Stream: true in the request, it will return a streaming response.

llm.go
    request := openai.ChatCompletionRequest{
        Model:       model,
        Messages:    h.messages,
        Temperature: 0.7,
        Stream:      true,
    }

Streaming responses are delivered using Server-Sent Events (SSE). Each call to the stream.Recv() method returns a partial result.
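
To make the Server-Sent Events format more concrete, here is a minimal sketch of how a client could read such a stream by hand. It is not the code of the Go OpenAI library, which does this work inside stream.Recv(); readSSEData is a hypothetical helper, and the JSON payloads in the example are shortened placeholders rather than real API responses. OpenAI-style streams send each chunk on a "data:" line and finish with a "data: [DONE]" line.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readSSEData collects the payload of every "data:" line until the
// "[DONE]" sentinel that OpenAI-style streams send as their last message.
func readSSEData(body string) ([]string, error) {
	var payloads []string
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // blank separators, comments, and other fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		payloads = append(payloads, payload)
	}
	return payloads, scanner.Err()
}

func main() {
	// Placeholder payloads; real chunks are larger JSON objects.
	body := "data: {\"delta\":\"Hel\"}\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n\n"
	payloads, err := readSSEData(body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	for _, p := range payloads {
		fmt.Println(p)
	}
}
```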

Handle Response

When the LLM decides to call a tool, the response contains a toolCall.

We determine which tool to call based on the toolCall.Function.Name field, then call the corresponding tool.

llm.go
    if len(response.Choices) > 0 && len(response.Choices[0].Delta.ToolCalls) > 0 {
        for _, toolCall := range response.Choices[0].Delta.ToolCalls {
            // Dispatch on the tool name chosen by the LLM.
            switch toolCall.Function.Name {
            case "hangup":
                if err := tools.HandleHangup("LLM requested hangup"); err != nil {
                    return "", err
                }
            case "refer":
                if err := tools.HandleRefer(); err != nil {
                    return "", err
                }
            }
        }
    }
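
The hangup tool declares an optional reason parameter, which the snippet above does not read. With streamed responses, the arguments of a tool call can arrive split across several deltas, so they have to be joined before they can be parsed. The following sketch shows one way to do that with the standard library. toolCallDelta, hangupArgs, collectToolCall, and parseHangupReason are hypothetical names, and the sketch handles a single tool call only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// toolCallDelta is a simplified stand-in for one streamed tool-call fragment.
type toolCallDelta struct {
	Name      string // usually set only on the first fragment
	Arguments string // a piece of the JSON arguments
}

// hangupArgs matches the "reason" parameter declared in hangupDefinition.
type hangupArgs struct {
	Reason string `json:"reason"`
}

// collectToolCall joins the fragments of a single tool call and returns
// the tool name together with the complete argument string.
func collectToolCall(deltas []toolCallDelta) (string, string) {
	var name string
	var args strings.Builder
	for _, d := range deltas {
		if d.Name != "" {
			name = d.Name
		}
		args.WriteString(d.Arguments)
	}
	return name, args.String()
}

// parseHangupReason decodes the arguments; empty arguments mean that
// the model gave no reason.
func parseHangupReason(arguments string) (string, error) {
	if strings.TrimSpace(arguments) == "" {
		return "", nil
	}
	var parsed hangupArgs
	if err := json.Unmarshal([]byte(arguments), &parsed); err != nil {
		return "", err
	}
	return parsed.Reason, nil
}

func main() {
	name, arguments := collectToolCall([]toolCallDelta{
		{Name: "hangup"},
		{Arguments: `{"reason":`},
		{Arguments: `"user said goodbye"}`},
	})
	reason, err := parseHangupReason(arguments)
	if err != nil {
		fmt.Println("invalid arguments:", err)
		return
	}
	fmt.Printf("tool=%s reason=%q\n", name, reason)
}
```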

For final text responses, we play them through TTS commands.

    for {
        response, err := stream.Recv()
        if errors.Is(err, io.EOF) {
            break // Stream ended
        }
        if err != nil {
            return "", err // Any other error aborts the stream
        }
        if len(response.Choices) == 0 {
            continue // Skip chunks that carry no choices
        }

        // Get text content
        content := response.Choices[0].Delta.Content
        if content != "" {
            fullResponse += content

            // Immediately send to TTS (streaming playback).
            // endOfStream is true only for the last segment.
            ttsWriter.Write(content, isFinished, false)
        }
    }

TTSWriter Parameter Description

  • text: Text segment to play
  • endOfStream: Whether this is the last segment
  • autoHangup: Whether to automatically hang up after playback completes
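
One practical question with these parameters is how to know which segment is the last one while chunks are still arriving. The sketch below shows one possible approach, not the approach of the example application: it holds back one chunk so that the final chunk can be sent with endOfStream=true. segmentWriter, recorder, and forwardChunks are hypothetical names, and applying autoHangup only to the last segment is a design assumption of this sketch.

```go
package main

import "fmt"

// segmentWriter mirrors the three parameters listed above; it is a
// hypothetical interface, not the SDK's TTSWriter type.
type segmentWriter interface {
	Write(text string, endOfStream bool, autoHangup bool) error
}

// recorder is a fake writer that stores what it was asked to play.
type recorder struct {
	segments []string
}

func (r *recorder) Write(text string, endOfStream bool, autoHangup bool) error {
	r.segments = append(r.segments, fmt.Sprintf("%q end=%t hangup=%t", text, endOfStream, autoHangup))
	return nil
}

// forwardChunks sends every non-empty chunk to the writer, holding one chunk
// back so that the final one can be flagged with endOfStream=true.
func forwardChunks(chunks []string, w segmentWriter, autoHangup bool) error {
	pending := ""
	for _, chunk := range chunks {
		if chunk == "" {
			continue
		}
		if pending != "" {
			if err := w.Write(pending, false, false); err != nil {
				return err
			}
		}
		pending = chunk
	}
	if pending == "" {
		return nil // nothing to play
	}
	return w.Write(pending, true, autoHangup)
}

func main() {
	r := &recorder{}
	if err := forwardChunks([]string{"Hello, ", "", "world."}, r, false); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	for _, s := range r.segments {
		fmt.Println(s)
	}
}
```

The trade-off is a delay of one chunk before playback starts, in exchange for never having to send an empty closing segment.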

Summary

In this chapter, we introduced the code structure of RustPBX Go and showed how to connect to RustPBX, send commands, and handle events. We also briefly covered large model interaction and tool calling.

In the next chapter, we will add custom tools to implement a weather query Agent.