Class SpeechToTextRequest

Namespace: Glitch9.AIDevKit.ElevenLabs

public class SpeechToTextRequest : IMultipartFormRequest

Inheritance: object → SpeechToTextRequest

Properties

AdditionalFormats

Optional. A list of additional formats to export the transcript to.

public List<ElevenLabsFormat> AdditionalFormats { get; set; }

Property Value

List<ElevenLabsFormat>

CloudStorageUrl

Optional. A valid AWS S3 or Google Cloud Storage URL of the file to transcribe. Exactly one of the file or cloud_storage_url parameters must be provided. The URL must point to a publicly accessible file and may be pre-signed. The file size must be less than 2GB.

public string CloudStorageUrl { get; set; }

Property Value

string

Diarize

Optional. Whether to annotate which speaker is currently talking in the uploaded file. Defaults to false.

public bool? Diarize { get; set; }

Property Value

bool?

File

Optional. The file to transcribe. All major audio and video formats are supported. Exactly one of the file or cloud_storage_url parameters must be provided. The file size must be less than 1GB.

public File<AudioClip> File { get; set; }

Property Value

File<AudioClip>
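The "exactly one of file or cloud_storage_url" rule can be checked on the client before a request is sent. The following is a minimal sketch only: SpeechToTextSourceCheck and HasExactlyOneSource are hypothetical names that are not part of Glitch9.AIDevKit, and the caller is assumed to know whether the File property was assigned.

```csharp
using System;

// Hypothetical helper, not part of Glitch9.AIDevKit: checks the documented
// "exactly one of file or cloud_storage_url" rule before a request is sent.
public static class SpeechToTextSourceCheck
{
    // hasFile: whether the File property was assigned.
    // cloudStorageUrl: value of the CloudStorageUrl property (may be null).
    public static bool HasExactlyOneSource(bool hasFile, string cloudStorageUrl)
    {
        bool hasUrl = !string.IsNullOrWhiteSpace(cloudStorageUrl);
        return hasFile ^ hasUrl; // true only when exactly one source is set
    }
}
```

A request that sets both File and CloudStorageUrl, or neither, fails this check.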

InputFormat

Optional. The format of the input audio. Options are 'pcm_s16le_16' or 'other'. For pcm_s16le_16, the input audio must be 16-bit PCM at a 16kHz sample rate, single channel (mono), and little-endian byte order. Latency will be lower than when passing an encoded waveform. Defaults to 'other'.

public ElevenLabsTypes.InputFormat InputFormat { get; set; }

Property Value

ElevenLabsTypes.InputFormat
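The pcm_s16le_16 layout fixes the data rate of the raw audio, which makes it easy to relate buffer size to duration. The sketch below works through that arithmetic. Pcm16kMono is an illustrative helper and is not part of Glitch9.AIDevKit.

```csharp
using System;

// Illustrative helper, not part of Glitch9.AIDevKit.
public static class Pcm16kMono
{
    public const int SampleRateHz = 16000;  // 16 kHz sample rate
    public const int BytesPerSample = 2;    // 16-bit samples
    public const int Channels = 1;          // mono

    // Bytes of raw audio per second: 16000 * 2 * 1 = 32000.
    public const int BytesPerSecond = SampleRateHz * BytesPerSample * Channels;

    // Duration in seconds of a raw pcm_s16le_16 buffer of the given length.
    public static double DurationSeconds(long byteLength)
    {
        if (byteLength < 0) throw new ArgumentOutOfRangeException(nameof(byteLength));
        return (double)byteLength / BytesPerSecond;
    }
}
```

At this rate, one minute of audio occupies 60 * 32000 = 1,920,000 bytes.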

LanguageCode

Optional. An ISO-639-1 or ISO-639-3 language_code corresponding to the language of the audio file. Can sometimes improve transcription performance if known beforehand. Defaults to null, in which case the language is predicted automatically.

public string LanguageCode { get; set; }

Property Value

string
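ISO-639-1 codes have two letters and ISO-639-3 codes have three, so a non-null LanguageCode can be given a basic shape check before sending. This is a sketch under assumptions: LanguageCodeShape is a hypothetical helper that is not part of Glitch9.AIDevKit, it assumes lowercase codes, and it does not verify that the code is actually registered. A null LanguageCode remains valid and means automatic detection.

```csharp
// Hypothetical helper, not part of Glitch9.AIDevKit.
public static class LanguageCodeShape
{
    // Returns true when the code is two or three lowercase ASCII letters,
    // the shape of ISO-639-1 ("en") and ISO-639-3 ("eng") codes.
    // Assumption: codes are passed in lowercase.
    public static bool LooksLikeIso639(string code)
    {
        if (code == null) return false;
        if (code.Length != 2 && code.Length != 3) return false;
        foreach (char c in code)
        {
            if (c < 'a' || c > 'z') return false;
        }
        return true;
    }
}
```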

Model

Required. The ID of the model to use for transcription. Currently only 'scribe_v1' and 'scribe_v1_experimental' are available.

public string Model { get; set; }

Property Value

string

NumSpeakers

Optional. The maximum number of speakers talking in the uploaded file. Can help with predicting who speaks when. At most 32 speakers can be predicted. Defaults to null, in which case the number of speakers is set to the maximum value the model supports.

public int? NumSpeakers { get; set; }

Property Value

int?
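The documented limit of 32 speakers can be enforced before the request is built. The sketch below is illustrative: NumSpeakersCheck is a hypothetical helper that is not part of Glitch9.AIDevKit, and the lower bound of 1 is an assumption, since the description above only states the upper limit.

```csharp
// Hypothetical helper, not part of Glitch9.AIDevKit.
public static class NumSpeakersCheck
{
    public const int MaxSpeakers = 32; // documented upper limit

    // Returns true for null (the model's maximum is used) or a value in 1..32.
    // Assumption: values below 1 are treated as invalid.
    public static bool IsValid(int? numSpeakers)
    {
        if (!numSpeakers.HasValue) return true;
        return numSpeakers.Value >= 1 && numSpeakers.Value <= MaxSpeakers;
    }
}
```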

TagAudioEvents

Optional. Whether to tag audio events like (laughter), (footsteps), etc. in the transcription. Defaults to true.

public bool? TagAudioEvents { get; set; }

Property Value

bool?

TimestampsGranularity

Optional. The granularity of the timestamps in the transcription. ‘word’ provides word-level timestamps and ‘character’ provides character-level timestamps per word. Allowed values: none, word, character. Defaults to 'word'.

public TimestampsGranularity? TimestampsGranularity { get; set; }

Property Value

TimestampsGranularity?
