Code Explanation

In the previous chapter, we successfully ran a voice assistant.

This chapter will explain the code in detail to help readers understand how to use the Active Call SDK.

Project address: https://github.com/restsend/rustpbxgo

Directory Structure

rustpbxgo/
├── README.md           
├── client.go           # SDK core definition
├── cmd/                # Example application
│   ├── main.go         # Program entry
│   ├── llm.go          # Large model interaction
│   ├── media.go        # WebRTC
│   └── webhook.go      # WebHook handling
├── go.mod             
└── go.sum            
  • client.go: Contains the definition of the Active Call Go SDK's core Client data structure and its methods.
  • cmd/ directory: Voice assistant example code, including SIP/WebRTC calls, incoming call handling with webhooks, and large model interaction logic.

client.go - Client Definition

Connect to Active Call

  • NewClient: Create client
    • endpoint: specify Active Call server address
  • Connect: Connect to server
    • callType: specify call type (here we use "webrtc")
  • Shutdown: Close client

See also

* [Create Client](/static/docs/active-call/sdk/go.mdx#create-client)
* [Connect to Server](/static/docs/active-call/guide/connect.mdx)

Send Commands

The Active Call Go client sends commands to the Active Call server by calling the corresponding methods.

Commands used here include:

  • Invite: Initiate call
    • For WebRTC calls, the offer field of CallOption must be set to the locally generated SDP offer
  • TTS: Call text-to-speech service and play result
  • Play: Play audio file from URL
  • Interrupt: Interrupt current playback (TTS and file playback)
  • Hangup: Hang up
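The typical command flow can be sketched with a stand-in interface. Note that this is not the real SDK type; the method names mirror the list above, but the actual signatures live in client.go:

```go
package main

import "fmt"

// Commander is a stand-in for the real Client; method names mirror
// the commands listed above, with signatures simplified.
type Commander interface {
	Invite() error
	TTS(text string) error
	Interrupt() error
	Hangup() error
}

// fakeClient records the command order, purely for illustration.
type fakeClient struct{ log []string }

func (f *fakeClient) Invite() error         { f.log = append(f.log, "invite"); return nil }
func (f *fakeClient) TTS(text string) error { f.log = append(f.log, "tts:"+text); return nil }
func (f *fakeClient) Interrupt() error      { f.log = append(f.log, "interrupt"); return nil }
func (f *fakeClient) Hangup() error         { f.log = append(f.log, "hangup"); return nil }

func main() {
	var c Commander = &fakeClient{}
	c.Invite()         // initiate the call
	c.TTS("Welcome!")  // play a greeting
	c.Interrupt()      // the caller barges in: stop playback
	c.TTS("Go ahead.") // respond again
	c.Hangup()         // end the call
	fmt.Println(c.(*fakeClient).log)
}
```

The Interrupt-before-TTS ordering is the usual barge-in pattern: cut off the current playback before speaking the next response.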

See also

* [Call](/static/docs/active-call/guide/call.mdx#call)
* [Text-to-Speech (TTS)](/static/docs/active-call/guide/tts.mdx)
* [Play Audio (Play)](/static/docs/active-call/guide/play.mdx)

Register Callback Functions

During the entire process from call establishment to termination, various events are triggered, as shown in the diagram:

CallEvent

Handle corresponding events by setting callback functions.

For example, to handle AsrFinal events (stable speech recognition results), you can set a callback function in the OnAsrFinal field.

Flow Chart

When the client calls the Connect method, it creates two goroutines (green parts in the diagram below):

  • One reads WebSocket messages and parses them (top)

  • The other processes messages and sends commands (bottom); when it receives an event, it calls the processEvent method, which dispatches to the corresponding callback function based on the event type.
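The two-goroutine layout can be illustrated with a self-contained sketch. Channels stand in for the WebSocket connection, and the event and callback names are simplified stand-ins, not the real SDK types:

```go
package main

import (
	"fmt"
	"sync"
)

// event is a simplified stand-in for a parsed Active Call event.
type event struct {
	Type string
	Text string
}

// runPipeline feeds raw messages through the two goroutines described
// above and returns what the callbacks produced, in order.
func runPipeline(msgs []string) []string {
	raw := make(chan string)   // stands in for the WebSocket connection
	events := make(chan event) // parsed events handed to the processor
	var handled []string
	var wg sync.WaitGroup

	// Callback table, analogous to OnAsrFinal and friends.
	callbacks := map[string]func(event){
		"asrFinal": func(e event) { handled = append(handled, "final: "+e.Text) },
	}

	// Goroutine 1 (top): read raw messages and parse them into events.
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer close(events)
		for msg := range raw {
			events <- event{Type: "asrFinal", Text: msg} // parsing is faked here
		}
	}()

	// Goroutine 2 (bottom): process events and dispatch by type,
	// like the processEvent method.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for e := range events {
			if cb, ok := callbacks[e.Type]; ok {
				cb(e)
			}
		}
	}()

	for _, m := range msgs {
		raw <- m
	}
	close(raw)
	wg.Wait()
	return handled
}

func main() {
	fmt.Println(runPipeline([]string{"hello", "world"}))
}
```

Closing the raw channel lets both goroutines drain and exit cleanly, which is the same shutdown path a real WebSocket close would trigger.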

GoArch

In the next section, we will see how to use these APIs to send commands and handle events.

main.go - Example Application Entry

main.go is the program entry point. Its main logic includes:

Create WebRTCPeer

Create a WebRTCPeer and generate the SDP offer, then write it into the callOption.Offer field:

    localSdp, err := mediaHandler.Setup(codec, iceServers)
    if err != nil {
        logger.Fatalf("Failed to get local SDP: %v", err)
    }
    logger.Infof("Offer SDP: %v", localSdp)
    callOption.Offer = localSdp
WebRTC calls need `callOption.Offer` to be set, while SIP calls need `callOption.Caller` and `callOption.Callee`.
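The difference between the two call types can be sketched with a stand-in struct. The field names follow the text above, but the real CallOption in client.go has more fields:

```go
package main

import "fmt"

// callOptionSketch is a stand-in for the SDK's CallOption, reduced to
// the fields discussed here.
type callOptionSketch struct {
	Offer  string // WebRTC: locally generated SDP offer
	Caller string // SIP: caller URI
	Callee string // SIP: callee URI
}

// newCallOption fills in the fields required by each call type.
func newCallOption(callType, offer, caller, callee string) (callOptionSketch, error) {
	var opt callOptionSketch
	switch callType {
	case "webrtc":
		if offer == "" {
			return opt, fmt.Errorf("webrtc calls require an SDP offer")
		}
		opt.Offer = offer
	case "sip":
		if caller == "" || callee == "" {
			return opt, fmt.Errorf("sip calls require caller and callee")
		}
		opt.Caller, opt.Callee = caller, callee
	default:
		return opt, fmt.Errorf("unknown call type %q", callType)
	}
	return opt, nil
}

func main() {
	opt, err := newCallOption("webrtc", "v=0 ...", "", "")
	fmt.Println(opt.Offer != "", err)
}
```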

Create Client and Register Callback Functions

Use the NewClient function to create a client. The main parameter here is option.Endpoint, which sets the Active Call server address.

Detailed Parameters

* [Create Client](/static/docs/active-call/sdk/go.mdx#create-client)

Here we mainly handle AsrFinal events by setting a callback function in the OnAsrFinal field.

The callback passes the recognition result event.Text to the large model via the LLMHandler.QueryStream method.

    client.OnAsrFinal = func(event rustpbxgo.AsrFinalEvent) {
        ...
        response, err := option.LLMHandler.QueryStream(option.OpenaiModel, event.Text, option.StreamingTTS, client, option.ReferCaller)
        ...
    }

Connect/Disconnect Active Call

    err = client.Connect(callType)
    if err != nil {
        logger.Fatalf("Failed to connect to server: %v", err)
    }
    defer client.Shutdown()

Here we use WebRTC calls, so the callType parameter is "webrtc".

See also

* [Connect to Server](/static/docs/active-call/guide/connect.mdx#path)

Send Welcome Message Using TTS Command

    client.TTS(greeting, "", "1", true, false, nil, nil)

Here the greeting value is read from the command line and used as the text parameter of the TTS command, i.e., the text to be synthesized.

See also

* [TTS Command Details](/static/docs/active-call/guide/tts.mdx)
* [TTS Method](/static/docs/active-call/sdk/go.mdx#tts)

llm.go - Large Model Interaction

llm.go is responsible for interacting with large models, including defining tools, making requests, and handling responses.

Responses may contain text or tool calls. Text is played back through TTS commands.

Here we choose Go OpenAI as the large model client.

Define Tools

What is Tool Calling?

Tool Calling (also called Function Calling) allows the LLM to invoke externally defined functions when needed.

In this example, we wrap the Hangup and Refer commands as tools:

  • When the user says “transfer me to human”, the LLM will call the Refer tool to implement transfer
  • When the user says “goodbye”, the LLM will call the Hangup tool to implement hangup

See: OpenAI Function Calling

Here we define two tools: hangup and refer:

    var hangupDefinition = openai.FunctionDefinition{
        Name:        "hangup",
        Description: "End the conversation and hang up the call",
        Parameters: json.RawMessage(`{
            "type": "object",
            "properties": {
                "reason": {
                    "type": "string",
                    "description": "Reason for hanging up the call"
                }
            },
            "required": []
        }`),
    }

    var referDefinition = openai.FunctionDefinition{
        Name:        "refer",
        Description: "Refer the call to another target",
        Parameters: json.RawMessage(`{
            "type": "object",
            "properties": {
            },
            "required": []
        }`),
    }

When the large model’s response contains tool calls, we will call the Hangup and Refer methods respectively.

These two methods will then send Hangup and Refer commands to Active Call.

Make Request

Here we use the CreateChatCompletionStream method to make requests. Since we set Stream: true in the request, it returns a streaming response:

    request := openai.ChatCompletionRequest{
        Model:       model,
        Messages:    h.messages,
        Temperature: 0.7,
        Stream:      true,
    }

    stream, err := h.client.CreateChatCompletionStream(h.ctx, request)
    if err != nil {
        return "", err
    }
    defer stream.Close()

Streaming responses are implemented through Server-Sent Events technology. Each call to the stream.Recv() method returns partial results.

Handle Response

When the LLM decides to call a tool, the response will contain a toolCall.

We determine which tool to invoke based on the toolCall.Function.Name field, then call the corresponding handler.

    if len(response.Choices) > 0 && len(response.Choices[0].Delta.ToolCalls) > 0 {
        for _, toolCall := range response.Choices[0].Delta.ToolCalls {
            switch toolCall.Function.Name {
            case "hangup":
                if err := tools.HandleHangup("LLM requested hangup"); err != nil {
                    logger.Errorf("hangup failed: %v", err)
                }
            case "refer":
                if err := tools.HandleRefer(); err != nil {
                    logger.Errorf("refer failed: %v", err)
                }
            }
        }
    }

For final text responses, we play them through TTS commands.

    for {
        response, err := stream.Recv()
        if err == io.EOF {
            break // stream ended
        }
        if err != nil {
            return fullResponse, err
        }
        if len(response.Choices) == 0 {
            continue
        }

        // A non-empty FinishReason marks the last chunk of the stream
        isFinished := response.Choices[0].FinishReason != ""

        // Get the incremental text content
        content := response.Choices[0].Delta.Content
        if content != "" {
            fullResponse += content

            // Immediately send to TTS (streaming playback)
            if isFinished {
                ttsWriter.Write(content, true, false) // endOfStream=true, last segment
            } else {
                ttsWriter.Write(content, false, false) // endOfStream=false, intermediate segment
            }
        }
    }

TTSWriter Parameter Description

- `text`: Text segment to play
- `endOfStream`: Whether it's the last segment
- `autoHangup`: Whether to automatically hang up after playback completes
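The segment/endOfStream contract can be illustrated with a minimal stand-in writer. This is not the real TTSWriter, only a sketch of the buffering pattern its parameters imply:

```go
package main

import (
	"fmt"
	"strings"
)

// sketchWriter mimics the TTSWriter contract described above:
// segments accumulate until endOfStream=true closes the utterance.
type sketchWriter struct {
	segments   []string
	utterances []string // completed utterances, as a TTS engine would see them
}

func (w *sketchWriter) Write(text string, endOfStream, autoHangup bool) {
	if text != "" {
		w.segments = append(w.segments, text)
	}
	if endOfStream {
		// Close the current utterance and reset the buffer.
		w.utterances = append(w.utterances, strings.Join(w.segments, ""))
		w.segments = nil
		if autoHangup {
			fmt.Println("would hang up after playback")
		}
	}
}

func main() {
	w := &sketchWriter{}
	w.Write("Hello, ", false, false)  // intermediate segment
	w.Write("how can ", false, false) // intermediate segment
	w.Write("I help?", true, false)   // last segment closes the utterance
	fmt.Println(w.utterances)
}
```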

Summary

In this chapter, we introduced the code structure of the Active Call Go SDK: how to connect to Active Call, send commands, and handle events. We also briefly covered large model interaction and tool calling.

In the next chapter, we will show how to add custom tools to implement a weather-query agent.