Class RealtimeEvent.Server.InputAudioBuffer

public static class RealtimeEvent.Server.InputAudioBuffer
Inheritance
object
RealtimeEvent.Server.InputAudioBuffer

Fields

Cleared

Returned when the input audio buffer is cleared by the client.

public const string Cleared = "input_audio_buffer.cleared"

Field Value

string

Committed

Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode.

public const string Committed = "input_audio_buffer.committed"

Field Value

string
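
A minimal sketch of reacting to these two events, assuming server events arrive as JSON text with a top-level type property (parsed here with System.Text.Json); the handler shape is illustrative, not part of this class:

using System;
using System.Text.Json;

static class BufferEventHandler
{
    public static void Handle(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var type = doc.RootElement.GetProperty("type").GetString();

        if (type == RealtimeEvent.Server.InputAudioBuffer.Cleared)
        {
            // The client's request to clear the buffer has taken effect.
            Console.WriteLine("Input audio buffer cleared.");
        }
        else if (type == RealtimeEvent.Server.InputAudioBuffer.Committed)
        {
            // Audio was committed, either explicitly by the client or
            // automatically by server VAD.
            Console.WriteLine("Input audio buffer committed.");
        }
    }
}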

SpeechStarted

Returned in server turn detection mode when speech is detected.

public const string SpeechStarted = "input_audio_buffer.speech_started"

Field Value

string

SpeechStopped

Returned in server turn detection mode when speech stops.

public const string SpeechStopped = "input_audio_buffer.speech_stopped"

Field Value

string
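
A minimal sketch of using these two constants in server turn detection mode; the stopPlayback delegate stands in for whatever audio output the application uses and is an assumption, not part of this class:

using System;
using System.Text.Json;

static class TurnDetectionHandler
{
    public static void Handle(string json, Action stopPlayback)
    {
        using var doc = JsonDocument.Parse(json);
        var type = doc.RootElement.GetProperty("type").GetString();

        if (type == RealtimeEvent.Server.InputAudioBuffer.SpeechStarted)
        {
            // The user started talking; a common pattern is to interrupt any
            // response audio that is still playing (barge-in).
            stopPlayback();
        }
        else if (type == RealtimeEvent.Server.InputAudioBuffer.SpeechStopped)
        {
            // Server VAD decided the user finished speaking; in server VAD
            // mode the buffer is committed automatically after this event.
            Console.WriteLine("Speech stopped; awaiting automatic commit.");
        }
    }
}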

TimeoutTriggered

Added 2024-09-19
Returned when the Server VAD timeout is triggered for the input audio buffer. This is configured with idle_timeout_ms in the turn_detection settings of the session, and it indicates that there hasn't been any speech detected for the configured duration.

The audio_start_ms and audio_end_ms fields indicate the segment of audio after the last model response up to the triggering time, as an offset from the beginning of audio written to the input audio buffer. This means the segment demarcates the audio that was silent, and the difference between the start and end values will roughly match the configured timeout.

The empty audio will be committed to the conversation as an input_audio item (there will be an input_audio_buffer.committed event) and a model response will be generated. There may be speech that didn't trigger VAD but is still detected by the model, so the model may respond with something relevant to the conversation or a prompt to continue speaking.

public const string TimeoutTriggered = "input_audio_buffer.timeout_triggered"

Field Value

string
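
A minimal sketch of handling this event and computing the length of the silent segment from audio_start_ms and audio_end_ms. The session.update payload enabling idle_timeout_ms is an assumption based on the description above; consult the service's API reference for the exact session shape.

using System;
using System.Text.Json;

static class IdleTimeoutHandler
{
    // Hypothetical session.update payload turning on the idle timeout (6 s).
    public const string EnableIdleTimeout =
        "{ \"type\": \"session.update\", \"session\": { " +
        "\"turn_detection\": { \"type\": \"server_vad\", \"idle_timeout_ms\": 6000 } } }";

    public static void Handle(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        if (root.GetProperty("type").GetString()
            != RealtimeEvent.Server.InputAudioBuffer.TimeoutTriggered)
        {
            return;
        }

        // audio_start_ms and audio_end_ms bracket the silent segment, so their
        // difference should roughly match the configured idle_timeout_ms.
        var startMs = root.GetProperty("audio_start_ms").GetInt32();
        var endMs = root.GetProperty("audio_end_ms").GetInt32();
        Console.WriteLine($"No speech for ~{endMs - startMs} ms; a model response will follow.");
    }
}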