Overview
GrokRealtimeLLMService provides real-time, multimodal conversation capabilities using xAI’s Grok Voice Agent API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management with low-latency response times.
Grok Realtime API Reference
Pipecat’s API methods for Grok Realtime integration
Example Implementation
Complete Grok Realtime conversation example
Grok Voice Documentation
Official xAI Grok Voice Agent API documentation
xAI Console
Access Grok models and manage API keys
Installation
To use Grok Realtime services, install the required dependencies.
Prerequisites
xAI Account Setup
Before using Grok Realtime services, you need:
- xAI Account: Sign up at xAI Console
- API Key: Generate a Grok API key from your account dashboard
- Model Access: Ensure access to Grok Voice Agent models
- Usage Limits: Configure appropriate usage limits and billing
Required Environment Variables
XAI_API_KEY: Your xAI API key for authentication
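A minimal sketch of reading the key at startup, so a missing variable fails early with a clear message. The helper name `load_xai_api_key` is hypothetical and not part of Pipecat; only the `XAI_API_KEY` variable name comes from this page.

```python
import os


def load_xai_api_key(env=None):
    """Return the xAI API key from XAI_API_KEY, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("XAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("XAI_API_KEY is not set; export it before starting the bot")
    return key
```

The returned string is what you would pass as the service's API key.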
Key Features
- Real-time Speech-to-Speech: Direct audio processing with low latency
- Multilingual Support: Support for multiple languages
- Voice Activity Detection: Server-side VAD for automatic speech detection
- Function Calling: Seamless support for external functions and tool integration
- Multiple Voice Options: Various voice personalities available
- WebSocket Support: Real-time bidirectional audio streaming
Configuration
GrokRealtimeLLMService
xAI API key for authentication.
WebSocket base URL for the Grok Realtime API. Override for custom deployments.
Configuration properties for the realtime session. If None, uses default SessionProperties with voice "Ara" and server-side VAD enabled. See SessionProperties below.
Runtime-updatable settings for this service. Use this to configure session properties, system instructions, and other model parameters. See GrokRealtimeLLMSettings below.
Whether to start with audio input paused.
GrokRealtimeLLMSettings
Runtime-updatable settings for the service. This is the preferred way to configure session properties and other parameters.
- session_properties: Grok Realtime session properties (voice, audio config, tools, etc.). Its instructions field is synced bidirectionally with the top-level system_instruction field.
- system_instruction: System instructions for the assistant. This field is synced with session_properties.instructions; updating either updates both.
- LLMSettings: model, temperature, max_tokens, top_p, top_k, frequency_penalty, presence_penalty, seed.
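The two-way sync between system_instruction and session_properties.instructions can be pictured with the sketch below. Both classes are hypothetical stand-ins written for illustration; this is one way such a sync could work, not Pipecat's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionPropsSketch:
    """Hypothetical stand-in for SessionProperties (only the field used here)."""
    instructions: Optional[str] = None


class SettingsSketch:
    """Hypothetical stand-in for GrokRealtimeLLMSettings showing the sync."""

    def __init__(self, system_instruction=None, session_properties=None):
        self.session_properties = session_properties or SessionPropsSketch()
        if system_instruction is not None:
            self.session_properties.instructions = system_instruction

    @property
    def system_instruction(self):
        # Both names read the same stored value, so they cannot drift apart.
        return self.session_properties.instructions

    @system_instruction.setter
    def system_instruction(self, value):
        self.session_properties.instructions = value


settings = SettingsSketch(system_instruction="You are a helpful assistant.")
# Writing through the nested field is visible through the top-level one.
settings.session_properties.instructions = "Answer briefly."
```

Storing the text in a single place and exposing it under two names is the simplest way to guarantee that "updating either will update both".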
SessionProperties
Session-level configuration that can be passed via the deprecated session_properties constructor argument or (preferred) via settings=GrokRealtimeLLMSettings(session_properties=...). These settings can be updated during the session using LLMUpdateSettingsFrame.
| Parameter | Type | Default | Description |
|---|---|---|---|
| instructions | str | None | System instructions for the assistant. |
| voice | Literal["Ara", "Rex", "Sal", "Eve", "Leo"] | "Ara" | Voice the model uses to respond. |
| turn_detection | TurnDetection | TurnDetection(type="server_vad") | Turn detection configuration. Set to None for manual turn detection. |
| audio | AudioConfiguration | None | Configuration for input and output audio formats. |
| tools | List[GrokTool] | None | Available tools: web_search, x_search, file_search, or custom function tools. |
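To make the defaults in the table concrete, the sketch below builds a plain dict with the same field names and validates the voice. `build_session_config` is a hypothetical helper, and the dict shape is an assumption for illustration; the real service takes a SessionProperties object.

```python
ALLOWED_VOICES = ("Ara", "Rex", "Sal", "Eve", "Leo")


def build_session_config(instructions=None, voice="Ara",
                         turn_detection="server_vad", tools=None):
    """Build a plain dict mirroring the SessionProperties table (hypothetical)."""
    if voice not in ALLOWED_VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {ALLOWED_VOICES}")
    config = {"voice": voice}
    if instructions is not None:
        config["instructions"] = instructions
    # None means manual turn detection, per the table above.
    config["turn_detection"] = (
        None if turn_detection is None else {"type": turn_detection}
    )
    if tools:
        config["tools"] = list(tools)
    return config


default_config = build_session_config()
manual_turns = build_session_config(voice="Rex", turn_detection=None)
```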
AudioConfiguration
The audio field in SessionProperties accepts an AudioConfiguration with input and output sub-configurations:
AudioInput (audio.input):
| Parameter | Type | Default | Description |
|---|---|---|---|
| format | AudioFormat | None | Input audio format. Supports PCMAudioFormat (configurable rate), PCMUAudioFormat (8 kHz), or PCMAAudioFormat (8 kHz). |
AudioOutput (audio.output):
| Parameter | Type | Default | Description |
|---|---|---|---|
| format | AudioFormat | None | Output audio format. Same format options as input. |
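The practical difference between the formats shows up in chunk sizes. The sketch below computes the byte size of a 20 ms chunk for each. It assumes 16-bit mono PCM and a 24 kHz pipeline rate, neither of which is stated on this page; G.711 (PCMU/PCMA) uses one byte per sample at 8 kHz by definition.

```python
def chunk_bytes(sample_rate_hz, bytes_per_sample, chunk_ms=20, channels=1):
    """Size in bytes of one audio chunk lasting chunk_ms milliseconds."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample * channels


# Assumption: PCM is 16-bit (2 bytes per sample) at a 24 kHz pipeline rate.
pcm_20ms = chunk_bytes(24000, 2)
# G.711 (PCMU/PCMA): 1 byte per sample at the fixed 8 kHz rate.
g711_20ms = chunk_bytes(8000, 1)
```

Under these assumptions a PCM chunk is six times larger than the G.711 one, which is why the 8 kHz formats suit telephony links.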
Built-in Tools
Grok provides several built-in tools in addition to custom function tools:
| Tool | Type | Description |
|---|---|---|
| WebSearchTool | web_search | Search the web for current information |
| XSearchTool | x_search | Search X (Twitter) for posts. Supports allowed_x_handles filter. |
| FileSearchTool | file_search | Search uploaded document collections by vector_store_ids |
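As a rough picture of how the three tool types differ, the sketch below builds one plain dict per tool using the type strings and field names from the table. The helper functions and the exact dict shape are assumptions for illustration; in Pipecat you would construct WebSearchTool, XSearchTool, and FileSearchTool objects instead. The handle and vector store ID are placeholders.

```python
def web_search_tool():
    """Web search takes no extra options in this sketch."""
    return {"type": "web_search"}


def x_search_tool(allowed_x_handles=None):
    """X search, optionally restricted to a list of handles."""
    tool = {"type": "x_search"}
    if allowed_x_handles:
        tool["allowed_x_handles"] = list(allowed_x_handles)
    return tool


def file_search_tool(vector_store_ids):
    """File search needs at least one vector store to search in."""
    if not vector_store_ids:
        raise ValueError("file_search requires at least one vector store ID")
    return {"type": "file_search", "vector_store_ids": list(vector_store_ids)}


# Placeholder values: "example_handle" and "vs_example" are not real identifiers.
tools = [
    web_search_tool(),
    x_search_tool(["example_handle"]),
    file_search_tool(["vs_example"]),
]
```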
Usage
Basic Setup
With Session Configuration (Recommended)
Using the settings parameter is the preferred approach.
The system_instruction field in GrokRealtimeLLMSettings is synced with session_properties.instructions. You can set it at either location.
With Built-in Tools
Updating Settings at Runtime
SessionProperties fields (like instructions, voice, tools) can be passed directly in the settings dictionary. They are automatically routed to the session_properties field.
Notes
- Audio format auto-configuration: If audio format is not specified in session_properties, the service automatically configures PCM input/output using the pipeline’s sample rates.
- Server-side VAD: Enabled by default. When VAD is enabled, the server handles speech detection and turn management automatically. Set turn_detection to None to manage turns manually.
- Audio before setup: Audio is not sent to Grok until the conversation setup is complete, preventing sample rate mismatches.
- Available voices: Ara (default), Rex, Sal, Eve, and Leo.
- G.711 support: PCMU and PCMA formats are supported at a fixed 8000 Hz rate, useful for telephony integrations.
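The routing of SessionProperties fields described under Updating Settings at Runtime can be pictured as splitting a flat dictionary. The function below is a hypothetical illustration of that idea, not the service's real code; the set of session field names is taken from the SessionProperties table.

```python
SESSION_FIELDS = {"instructions", "voice", "turn_detection", "audio", "tools"}


def route_settings(update):
    """Split a flat settings dict into top-level keys and session_properties."""
    routed = {}
    session = {}
    for key, value in update.items():
        if key in SESSION_FIELDS:
            session[key] = value  # belongs to the realtime session
        else:
            routed[key] = value   # ordinary model setting, left at top level
    if session:
        routed["session_properties"] = session
    return routed


routed = route_settings({"voice": "Rex", "temperature": 0.5})
```

Here `voice` ends up nested under session_properties while `temperature` stays at the top level.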
Event Handlers
| Event | Description |
|---|---|
| on_conversation_item_created | Called when a new conversation item is created in the session |
| on_conversation_item_updated | Called when a conversation item is updated or completed |