Table of Contents

Namespace Glitch9.AIDevKit

Classes

AIApiException
AIAsset
AIAssetExtensions
AIBehaviour
AIClientException
AIClient<TSelf, TSettings>
AIDevKitHub

Central hub exposed by AIDevKit for customization and user context.

AIDevKitSettings
AIProviderSettings

Base class for AI client settings. This class is used to store API keys and other settings related to AI clients.

AIProviders
AIRequest
AllowedTools
Annotation
AnnotationJsonConverter

Polymorphic JSON converter for Annotation and its derived types. Uses the "type" discriminator field.

AnnotationWrapper

Non-flattened wrapper for different types of annotations.

AnthropicModel
ApiPopupAttribute
ApiRefAttribute
ApiSpecificAttribute
ApiSpecificPropertyAttribute
ApproximateLocation
AudioData
AudioDelta
AudioGenerationRequest<TSelf, TPrompt>
AudioIsolationParameters
AudioIsolationRequest
AudioPart
AudioPrice
AudioUsage
BaseApiSpecificPropertyAttribute
BinaryGenerativeAudioEvent

A generative audio event that contains binary audio data.

BinaryGenerativeAudioEventParser
BrokenResponseException
ClickAction
CodeGenerationRequest

Added 2025.05.28. Task for generating code snippets or scripts for Unity C#.

CodeInterpreter

A tool that runs Python code to help generate a response to a prompt.

CodeInterpreter.FileIdSet

Code interpreter container.

CodeInterpreterOutput

A tool call to run code.

CodeInterpreterOutputImage
CodeInterpreterOutputLogs
CodeInterpreterParameters
CodeInterpreterResult

Be careful. This is not a separate tool call, but a sub-object used within CodeInterpreterCall.

ComparisonFilter

A filter used to compare a specified attribute key to a given value using a defined comparison operation.

CompletionRequest

Legacy completion request for models that do not support chat-based interactions.

CompoundFilter

Combines multiple filters using 'and' or 'or'.
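
As a rough illustration of how a ComparisonFilter nests inside a CompoundFilter, the sketch below shows the OpenAI-style attribute-filter JSON these classes are assumed to model; the keys, values, and operators are placeholders, and the actual C# construction API may differ.

// Hedged sketch: an "and" compound filter wrapping two comparison filters,
// expressed as the underlying JSON these classes are assumed to represent.
string attributeFilterJson = @"{
  ""type"": ""and"",
  ""filters"": [
    { ""type"": ""eq"",  ""key"": ""region"", ""value"": ""us"" },
    { ""type"": ""gte"", ""key"": ""year"",   ""value"": 2024 }
  ]
}";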

ComputerAction
ComputerUse

A tool that controls a virtual computer.

ComputerUseCall

A tool call to a computer use tool. See the computer use guide for more information.

ComputerUseOutput

The output of a computer tool call.

ComputerUseParameters
ComputerUseSafetyCheck
ComputerUseScreenshotInfo
ContainerFileCitation
ContentPart

Base class for different types of content parts in a message. Each content part has a defined type, such as Text, Image(Url/Base64/FileId), Audio(Base64), or File(Base64/FileId).

ContentPart<T>

Base class for different types of content parts in a message. Each content part has a defined type, such as Text, Image(Url/Base64/FileId), Audio(Base64), or File(Base64/FileId).

ConversationItem
ConversationItemExtensions
ConversationItemStatus
ConversationItemType
CountTokensOutput
CountTokensRequest
CustomTool

A custom tool that processes input using a specified format.

CustomToolCall

A call to a custom tool created by the model.

CustomToolChoice
CustomToolFormat

Polymorphic format for custom tool input.

CustomToolFormatConverter

Polymorphic converter for CustomToolFormat.

CustomToolOutput

The output of a custom tool call from your code, being sent back to the model.

CustomToolParameters
DeleteModelRequest
DeltaEventBase
DeltaEvent<T>
DoubleClickAction
DragAction
DragAction.Coordinate
ElevenLabsModel
ElevenLabsTypes

Types used only by the ElevenLabs API.
These types live here instead of the ElevenLabs assembly because they are used by GENTask and the UnityEditor Generator Windows.

EmbeddingPrompt
EmbeddingRequest
EmbeddingResult
Embedding_OpenAI
EmptyResponseException
FieldRefAttribute
FileCatalog

ScriptableObject database for storing file data. This database is used to keep track of the files available in the AI library.

FileCatalog.Repo

Database for storing file data.

FileCitation
FileData
FileDeleteRequest
FileDownloadRequest
FilePart
FilePath
FileSearch

A tool that searches for relevant content from uploaded files.

FileSearch.RankingOptions

Ranking options for search.

FileSearchOutput
FileSearchParameters
FileSearchResult
FileUploadRequest
FindAction
FineTuningFile

A JSONL file is a text file where each line is a valid JSON object. This format is commonly used for training data in machine learning tasks, including fine-tuning.
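
For reference, the hypothetical snippet below appends one chat-formatted training example as a single JSONL line; the message content, file name, and exact schema are assumptions and depend on the provider and fine-tuning method.

// Hedged example: one JSON object per line (OpenAI-style chat format).
// Requires: using System.IO;
string line = @"{""messages"":[{""role"":""user"",""content"":""Hello""},{""role"":""assistant"",""content"":""Hi there!""}]}";
File.AppendAllText("training.jsonl", line + "\n");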

FineTuningRequest
FluentApiRequestBuilderExt
FluentApiRequestCallerExt

Beginner-friendly fluent extension methods that create request objects for generative AI. These helpers do not send any network calls until you invoke .ExecuteAsync().

  • Pattern: host.GENXxx().SetModel(...).ExecuteAsync()
  • Thin factories only; they return strongly-typed *Request objects.
  • No background work, no I/O, no async until .ExecuteAsync().
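
To make the pattern above concrete, here is a minimal sketch following the documented shape host.GENXxx().SetModel(...).ExecuteAsync(); the GENText() extension name, SetModel signature, and model id are assumptions, not the exact API.

// Hedged sketch of the fluent pattern: a thin factory, then configuration,
// then a single network call at ExecuteAsync().
string haiku = await "Write a haiku about Unity."
    .GENText()                 // builds a *Request object; no I/O yet
    .SetModel("gpt-4o-mini")   // assumed model identifier
    .ExecuteAsync();           // the request is sent only here
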
FluentApiRequestOptions

Originally RESTOptions was used for this, but it was too confusing, so the options were split out into a separate DTO.

FluentApiRequest<TSelf, TResult>
FreePrice
FrequencyPenalty

Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

Function

Structured representation of a function declaration as defined by the OpenAPI 3.0.3 specification. Included in this declaration are the function name and parameters. This declaration represents a block of code that can be used as a Tool by the model and executed by the client.

FunctionCall

A tool call to run a function. See the function calling guide for more information.

FunctionOutput

The output of a function tool call.

FunctionParameters
FunctionPropertyAttribute

Attribute for marking properties as function parameters in JSON Schema for LLM function calls.

Note: It's a duplicate of JsonSchemaPropertyAttribute for clarity and intent.

FunctionSchemaAttribute

OpenAI styled JSON Schema attribute for annotating classes for LLM function calls.

Note: It's a duplicate of StrictJsonSchemaAttribute for clarity and intent.
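
As a hedged sketch of how these two attributes are meant to be combined, the class below is annotated for an LLM function call; the attribute constructor arguments (the descriptions) are assumptions and may not match the real signatures.

// Hedged sketch: a function-call argument type annotated with the attributes above.
// Constructor arguments are illustrative only.
[FunctionSchema("Gets the current weather for a city.")]
public class GetWeatherArgs
{
    [FunctionProperty("City name, e.g. 'Seoul'")]
    public string City { get; set; }

    [FunctionProperty("Temperature unit: 'celsius' or 'fahrenheit'")]
    public string Unit { get; set; }
}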

FunctionToolChoice
GeneratedAudio
GeneratedBase<T>

You never know in advance whether an AI-generated result will be a single value or multiple values, so this class represents both cases: a single value or an array of values.

GeneratedExtensions
GeneratedImage

Represents the Url or the content of an image generated by Image Generation AI.

GeneratedOutput<T>
GeneratedText

Represents a generated text result from an AI model.

Generated<T>

New generic-type based design for File outputs

GenerativeRequestStreamCompleter<TEvent, TChunk, TOutput>
GenerativeRequest<TSelf, TInput, TOutput, TChunk, TEvent>

Abstract base class for all generative AI tasks. Provides common properties and methods for handling prompts, models, outputs, and execution.

GenerativeSequence

Orchestrates a series of generative tasks (text/image/audio) where each step can consume the previous output.

  • Build a pipeline with Append* methods, then run once with ExecuteAsync().
  • Keeps the most recent outputs (text/image/audio) in an internal buffer for the next step.
  • Type safety at runtime: each appended task must return the expected type (e.g., string for Text).
  • Stops on first exception; wrap ExecuteAsync() in your own try/catch if you need partial tolerance.
await new GENSequence()
    .AppendText(new GENResponseTask(new TextPrompt("Give me a short poem about the ocean.")))
    .AppendTextToImage(text => new GENImageTask(new TextPrompt($"Illustrate: {text}")))
    .AppendInterval(0.5f)
    .AppendImageToAudio(tex => new GENSpeechTask(new TextPrompt("Narrate the poem over ambient waves.")))
    .ExecuteAsync();
GetCreditsRequest

Get total credits purchased and used for the authenticated user

GetCustomModelRequest
GetModelRequest
GetVoiceRequest
GoogleModel
GoogleTypes

Types used only by the Google API.
These types live here instead of the Google assembly because they are used by GENTask and the UnityEditor Generator Windows.

GoogleTypes.UploadMetadata
GrammarCustomToolFormat

A grammar defined by the user.

HostedToolChoice

Only for Responses API. Indicates that the model should use a built-in tool to generate a response.
Learn more about built-in tools: https://platform.openai.com/docs/guides/tools

Allowed types (2025-09-21):

  • file_search
  • web_search_preview
  • computer_use_preview
  • code_interpreter
  • image_generation
HyperParameters

The hyperparameters used for the fine-tuning job.

IPromptExtensions
ImageCompressionLevel

output_compression (integer or null, optional, defaults to 100): the compression level (0-100%) for the generated images. This parameter is only supported for gpt-image-1 with the webp or jpeg output formats.

ImageData
ImageDelta
ImageGenerationOutput

An image generation request made by the model.

ImageGenerationRequest

Task for generating image(s) from text using supported models (e.g., OpenAI DALL·E, Google Imagen).

ImageGenerationTool

A tool that generates images using a model like gpt-image-1.

ImageInpaintingRequest

Task for editing an existing image based on a text prompt and optional mask (OpenAI or Google Gemini).

ImageParameters
ImagePart
ImagePrice
ImagePrompt

A specialized prompt for various image-related requests, such as image inpainting, rotation, animation, etc.
This class is used to pass the instruction and the image to the respective image model for processing.

ImageQualitySwitchAttribute
ImageReference

A reference to an image, either by file ID or base64-encoded data.

ImageSizeSwitchAttribute
ImageUsage
InappropriatePromptException
IncompleteDetails

Details on why the response is incomplete. Will be null if the response is not incomplete.

IncompleteResponseException
InputAudioBufferEvent
InterruptedResponseException
InvalidPromptException
ItemReference
JsonSchemaFormat
KeyboardTypeAction
LanguageModelRequest<TSelf, TInput, TOutput>

Base class for text generation tasks using LLM models. Supports instructions, role-based prompts, and attachments.

ListCustomModelsRequest
ListCustomVoicesRequest
ListFilesRequest
ListModelsRequest
ListVoicesRequest
LocalPropertyAttribute
LocalShell

A tool that allows the model to execute shell commands in a local environment.

LocalShellCall

A call to run a command on the local shell.

LocalShellOutput

The output from a local shell tool call.

LocalShellParameters
Location
LogProb
LogitBias

Optional. Modify the likelihood of specified tokens appearing in the completion.

Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling.

The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. Defaults to null.
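
For illustration, the sketch below builds the token-id-to-bias map described above; token id 50256 is just an example, since ids depend on the tokenizer of the model in use.

// Hedged sketch of a logit bias map. Requires: using System.Collections.Generic;
// On the wire this becomes: "logit_bias": { "50256": -100 }
var logitBias = new Dictionary<int, int>
{
    { 50256, -100 }   // effectively bans this token
};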

Logprobs

Whether to return log probabilities of the Output tokens or not. If true, returns the log probabilities of each Output token returned in the message content. This option is currently not available on the gpt-4-vision-preview model. Defaults to false.

Mcp

Give the model access to additional tools via remote Model Context Protocol (MCP) servers.

McpApprovalRequest

A request for human approval of a tool invocation.
Model > User

McpApprovalRequestEvent
McpApprovalResponse

A response to an MCP approval request.
User > Model

McpException
McpHttpException
McpListToolsCallOutput

A list of tools available on an MCP server.
Model > User

McpOutput

An invocation of a tool on an MCP server. This is both a tool call (from model to user) and a tool call output (from user to model).
Model > User, and you should send the corresponding output back to the model.

McpParameters
McpProtocalException
McpServerConnectorNotFoundException
McpServerNotFoundException
McpToolApprovalFilter

Specify which of the MCP server's tools require approval.

McpToolChoice
McpToolExecutionException
McpToolInfo

Information about a tool available on an MCP server.

McpToolPermissionInfo
McpToolRefList

List of allowed tool names or a filter object.

Message
MessageContent

Text:

  • ChatCompletion > ChatChoice[] > Message[] > MessageContent > StringOrPart > Text

MessageContentPart:

  • ChatCompletion > ChatChoice[] > Message[] > MessageContent > StringOrPart > MessageContentPart[]
MessageMapper
MicrosoftTypes
Model

ScriptableObject representation of a generative AI model with metadata, configuration, and pricing information. Supports token limits, ownership, creation time, and dynamic pricing for various content types (text, image, audio).

ModelCatalog

ScriptableObject database for storing model data. This database is used to keep track of the models available in the AI library.

ModelCatalog.Repo

Database for storing model data.

ModelFamily

Defines the family names of various AI models and services.

Warning!! DO NOT MAKE THIS INTO AN ENUM.
An enum would make this hard to maintain: inserting a new family in between existing families would break the order.

ModelNotFoundInLibraryException
ModelNotFoundOnApiException
ModelPopupAttribute
ModelPrice
ModelRefAttribute
ModelResponseException
ModelTypeExtensions
ModerationParameters
ModerationPrompt

Not directly used as a prompt, but other prompts can convert to this type for moderation requests.
This class is used to pass the text and optional images to the moderation model for processing.

ModerationRequest

Audio not supported yet.

ModerationResult
MouseActionBase
MoveAction
NCount

The number of responses to generate. Must be between 1 and 10.

NonStreamGenerativeRequest<TSelf, TInput, TOutput>
NoopGenerativeParameters
NotSupportedEndpointException

Exception thrown when a requested endpoint is not supported by the specified API.

OpenAIModel
OpenAITypes

Types used only by the OpenAI API.
These types live here instead of the OpenAI assembly because they are used by GENTask and the UnityEditor Generator Windows.

OpenPageAction
OpenRouterModel
PerplexityTypes
PresencePenalty

Number between -2.0 and 2.0; defaults to 0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

PressKeyAction
ProjectContext
Prompt
PromptBase<T>
PromptFeedback

A set of feedback metadata for the prompt specified in GenerateContentRequest.Contents.

PromptTemplate

A reference to a predefined prompt template stored on the AI provider's servers.
This allows you to use complex prompt templates without having to include the full text of the prompt in your request.
Instead, you can simply reference the prompt by its unique identifier and provide any necessary variables for substitution.
This can help to keep your requests smaller and more manageable, especially when working with large or complex prompts.

Example Template: "Write a daily report for ${name} about today's sales. Include top 3 products."
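
As a hedged illustration of the reference-plus-variables idea (OpenAI-style; other providers differ), the sketch below pairs a placeholder template id with the ${name} variable from the example above.

// Hedged sketch: referencing a stored prompt template by id and supplying
// variables for substitution. The id and variable value are placeholders.
var promptRef = new
{
    id = "pmpt_abc123",                  // server-side template identifier (example)
    variables = new { name = "Alice" }   // substituted into ${name}
};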

Prompt<T>
ProviderBridgeAttribute
ProviderBridgeRegistry
RateLimitExceededException
RealtimeApiException
RealtimeSessionStatusEvent
Reasoning
ReasoningOptions
RecordMergeOptions
RedactedReasoning

Anthropic-specific class. Represents a block of content where the model's internal reasoning or "thinking" has been intentionally hidden (redacted) before being returned to the client.

RequestPrice
RequestUsage
ResponseFormat
ResponseMessage
SafetyFeedback

Safety feedback for an entire request.

This field is populated if content in the input and/or response is blocked due to safety settings. SafetyFeedback may not exist for every HarmCategory. Each SafetyFeedback will return the safety settings used by the request as well as the lowest HarmProbability that should be allowed in order to return a result.

SafetyIdentifier

A stable identifier used to help detect users of your application who may be violating OpenAI's usage policies. The ID should be a string that uniquely identifies each user. We recommend hashing their username or email address to avoid sending OpenAI any identifying information. https://platform.openai.com/docs/guides/safety-best-practices#safety-identifiers

SafetyRating

A safety rating associated with a GenerateContentCandidate.

SafetySetting

Safety setting, affecting the safety-blocking behavior. Passing a safety setting for a category changes the allowed probability that content is blocked.

SafetySettingExtensions
ScreenshotAction
ScrollAction
SearchAction
Seed

Random seed for deterministic sampling (when supported):

  • Purpose – Reproduce the same output across runs with identical inputs.
  • Scope – Holds only if provider, model/deployment, version, and all params are unchanged.
  • null – Lets the service choose a random seed (non-deterministic).
  • Range – 0–4,294,967,295 (32-bit).
  • Support – Some models/services ignore seeds; if unsupported, this has no effect.
SegmentObject
ServerDictionary

Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
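
A minimal sketch of the metadata shape described above, assuming a plain dictionary view of it: up to 16 pairs, keys up to 64 characters, values up to 512 characters (strings, booleans, or numbers).

// Hedged example metadata. Requires: using System.Collections.Generic;
var metadata = new Dictionary<string, object>
{
    { "project", "my-game" },
    { "scene",   "Level_03" },
    { "debug",   true }
};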

ShellCommand
ShellCommandCatalog
ShellCommandCatalog.Repo
ShellCommandEntry
SoundEffectGenerationRequest

Task for generating sound effects based on a text prompt.

SpeechGenerationOptions
SpeechGenerationRequest

Task for generating synthetic speech (text-to-speech) using the specified model.

SpeechGenerationRequestBase<TSelf, TPrompt>
SpeechParameters
SpeechSpeed

The speed of the model's spoken response as a multiple of the original speed. 1.0 is the default speed. 0.25 is the minimum speed. 1.5 is the maximum speed. This value can only be changed in between model turns, not while a response is in progress.

This parameter is a post-processing adjustment to the audio after it is generated; it's also possible to prompt the model to speak faster or slower.

SpeechTranslationRequest

Task for translating speech into English text using the speech translation model.

SpokenLanguagePopupAttribute
StatefulItem
StreamEventArgs
StreamEventArgs<TType>
StreamOptions
StrictJsonSchema

OpenAI styled JSON Schema for strict response formatting.

StrictJsonSchemaAttribute

OpenAI styled Strict JSON Schema attribute for annotating classes.

StrictJsonSchemaAttribute > StrictJsonSchema > JsonSchemaFormat(ResponseFormat)
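
As a hedged sketch of the flow above (attribute to schema to response format), the class below is annotated for strict structured output; the attribute constructor argument is an assumption and the real attribute may take different or no arguments.

// Hedged sketch: a response type annotated for strict JSON Schema output.
[StrictJsonSchema("weather_report")]   // schema name is illustrative
public class WeatherReport
{
    public string City { get; set; }
    public float TemperatureC { get; set; }
}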

StrictJsonSchemaExtensions
StringOrTextAsset
StringOrTextAssetExtensions
StructuredOutputRequest<T>

Task for generating structured output (e.g., JSON) using an LLM model.

StructuredOutput<T>
SystemMessage
Temperature

Sampling temperature: controls randomness in output.

  • Lower = deterministic
  • Higher = creative
Range: 0.0–2.0 (typical: 0.7–1.0).
TextCustomToolFormat

Unconstrained free-form text format.

TextDelta
TextDeltaThrottler

A throttler that collects irregularly arriving text fragments in an internal buffer and emits them merged at a fixed interval (30Hz by default).
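
The sketch below shows only the general idea described above (buffer incoming chunks, flush on a fixed timer), not the class's actual API; the names, the 33 ms period, and the cancellation handling are assumptions.

// Hedged conceptual sketch: collect deltas, emit merged text at ~30 Hz.
// Requires: using System.Text; using System.Threading; using System.Threading.Tasks;
var buffer = new StringBuilder();

void OnDelta(string chunk) { lock (buffer) buffer.Append(chunk); }

async Task FlushLoopAsync(Action<string> emit, CancellationToken ct)
{
    while (!ct.IsCancellationRequested)
    {
        await Task.Delay(33, ct);   // ~30 Hz; throws OperationCanceledException on cancel
        string merged;
        lock (buffer) { merged = buffer.ToString(); buffer.Clear(); }
        if (merged.Length > 0) emit(merged);
    }
}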

TextEditorParameters
TextOutput
TextPart

Text, Refusal, InputText, OutputText content part.

TextResponseOptions
TextSpansEvent
TimeWindowExtensions
TokenCount

Used to set 'max_tokens', 'max_completion_tokens', 'max_output_tokens', etc. Must be greater than or equal to 1024. Set it to null to disable the limit.

TokenId
TokenPrice
TokenUsage
TokenizeOutput
TokenizeRequest
Tool

Base class for all tools, includes type.

ToolCall
ToolCallArgs
ToolCallEvent
ToolChoice

This can be a String or an Object. Specifies a tool the model should use; use it to force the model to call a specific tool.

ToolMessage

The API no longer sends these. This type is only used to send tool outputs from the client side.

ToolOutput
ToolOutputEvent
ToolOutputTimeoutException
ToolReference
ToolStatusEvent
ToolTypeExtensions
TopK
TopP
Transcript
TranscriptionParameters
TranscriptionPrice
TranscriptionRequest

Task for converting speech audio into text (speech-to-text).

TranscriptionRequestBase<TSelf, TOutput, TEvent>
TranscriptionUsage
TruncationStrategy
UnhandledToolCallException
UnknownAction
UnknownItem
UploadedFile
UploadedFileExtensions
UrlCitation
UrlSearchSource
Usage
UsageMetadata

Usage metadata returned by AI service providers after a generation request. Contains token usage details for billing and monitoring.

UserMessage
VerboseTranscript

Represents a verbose json transcription response returned by model, based on the provided input. Used by OpenAI, GroqCloud, and other compatible services.

VideoGenerationRequest
VideoParameters
VisualMediaGenerationRequest<TTask, TPrompt, TOutput, TChunk, TEvent>
Voice
VoiceCatalog

ScriptableObject database for storing voice data used for TTS (Text-to-Speech) and other voice-related tasks.

VoiceCatalog.Repo

Database for storing voice data.

VoiceChangeParameters
VoiceChangeRequest
VoiceData
VoicePopupAttribute
VoiceStyleConverter
VoiceUtil
WaitAction
WebSearch

Search the Internet for sources related to the prompt.

WebSearchAction
WebSearchFilter

Filters for the search.

WebSearchOptions
WebSearchOptionsWrapper
WebSearchOutput

A tool call to perform a web search action.
This tool call does not have a corresponding output class, as the results are returned via text messages.

WebSearchParameters
WebSearchPreview

This tool searches the web for relevant results to use in a response.

WebSearchPrice
WebSearchSource
WebSearchUsage
WordObject
XSearchParameters

Structs

FluentApiRequestType
ImageQuality

The quality of the image that will be generated. HD creates images with finer details and greater consistency across the image. This param is only supported for DallE3.

ImageSize
ServiceTier

The service tier to use for the request. "auto" lets the system choose the appropriate tier based on context. Different providers may have different tier names and meanings. See provider documentation for details.

StreamHeader
TextSpan
ToolType
TruncationType
VideoSize

Interfaces

IAssetData
IAssetFilter<T>
IChatApiListenerBase
IComputerUseResult
ICreditData
IDeltaListener<TEvent>
IErrorHandler
IEventListener<T>
IFileSearchFilter
IFineTuningResult
IGeneratedFiles
IGeneratedOutput
IGenerativeAudioEvent
IGenerativeEvent<TChunk, TOutput>
IGenerativeImageEvent
IGenerativeParameters
IGenerativeRequest
IImageDeltaListener
IInputAudioBufferListener
ILanguageModelRequest
IListener
IMcp
IModelData

Interface for model data retrieved from various AI APIs. (e.g., /v1/models) This interface defines the properties that all model data should implement. It is used to standardize the model data across different AI providers.

IModeratable
IMultiProviderJsonWriter<TModel>
INoopStreamEvent<TOutput>
INoopStreamEvent<TChunk, TOutput>
IPrompt

'Prompt' is a general term for the input given to an AI model to generate a response.
This can include text prompts, image prompts, audio prompts, or any combination thereof.
The prompt can be a simple string, a more complex object, or even a file.
The purpose of the prompt is to guide the AI model in generating a relevant and accurate response.

IPromptWithFiles

Prompts that require loading (e.g., from files) should implement this interface.
This ensures that any necessary loading operations are handled before the prompt is used.

IProviderBridge
IProviderData
IRealtimeApiListener
IResponsesApiListener
ISequentialRequest

Interface for tasks that can be executed as part of a sequence.

ITextDeltaListener
ITextSpanData
ITextSpansListener
IToolCallArgsListener
IToolCallOutput
IToolOutputListener
IToolParameters
IToolStatusListener
ITranscriptionEvent
IUploadedFile

Represents a file object retrieved from an AI provider (e.g., /v1/files).

  • Provides common properties such as file size, MIME type, and timestamps.
  • Normalizes file information across providers like OpenAI and Google.
  • Used as the return type for file-related tasks (e.g., UploadFileTask, DownloadFileTask).
IUsageHandler
IUserProfile

Attach this interface to your user class to enable AIDevKit features. This interface is used to provide user-specific context and settings.

IVoiceData

Interface for voice data retrieved from various AI APIs.
This interface defines the properties that all voice data should implement. It is used to standardize the voice data across different AI providers.

Enums

Api

Identifies the available AI service providers for API integrations. It is not named 'Provider' because some services are self-hosted/local (e.g., Ollama).

ArtStyle
ChatRole
CodeReferenceSource
ComparisonType
CompoundType
ContentFormat
CustomToolFormatType
ElevenLabsTypes.InputFormat

The format of input audio. Options are 'pcm_s16le_16' or 'other'. For pcm_s16le_16, the input audio must be 16-bit PCM at a 16kHz sample rate, single channel (mono), and little-endian byte order. Latency will be lower than with passing an encoded waveform.

ElevenLabsTypes.OutputFormat

Output format of the generated audio. Formatted as codec_sample_rate_bitrate. So an mp3 with a 22.05kHz sample rate at 32kbps is represented as mp3_22050_32. MP3 with a 192kbps bitrate requires you to be subscribed to Creator tier or above. PCM with a 44.1kHz sample rate requires you to be subscribed to Pro tier or above. Note that the μ-law format (sometimes written mu-law, often approximated as u-law) is commonly used for Twilio audio inputs. Default is mp3_44100_128.

EmbedTaskType

Google Only. Task type for embedding content.

GameGenre
GameTheme
Gender

Mainly used as a TTS Voice property.

GeneratedImageFormat
GoogleTypes.AspectRatio
GoogleTypes.PersonGeneration
GoogleTypes.Resolution
HarmBlockThreshold

Block at and beyond a specified harm probability.

HarmCategory

Represents the category of harm that a piece of content may fall into. This is used in moderation tasks to classify content based on its potential harm.

HarmProbability

Probability that a prompt or candidate matches a harm category.

ImageBackground
InputAudioBufferEvent.Type
ItemStatus

Used by Responses API.
Unified status for all response items (messages, tool calls, searches, code-interpreter runs, generations, etc.).

Core lifecycle values (in_progress, completed, incomplete, failed) map directly to the OpenAI Responses API and are populated when items are returned from the API.
Domain-phase values (searching, generating, interpreting, partial) are used by AIDevKit to represent more detailed sub-states for search, tool execution, code interpretation, and streaming generation.

  • For input messages (system, developer, user), this field is optional.
  • For assistant output items (assistant messages, tools, searches, code interpreter, etc.), this field is required when returned from the API or internal pipelines.
  • partial indicates a non-final, intermediate snapshot (e.g., while streaming).
LanguageTone
McpServerConnectionType
MediaGenOp
MicrosoftTypes.TranscriptionApi
Modalities

"Modality" refers to the type or form of data that a model is designed to process, either as input or output. In AI and machine learning contexts, modality describes the nature of the information being handled — such as text, image, audio, or video.

For example:

  • A text-to-text model like GPT-4 processes text inputs and generates text outputs.
  • A text-to-image model like DALL·E takes text prompts and produces images.
  • A multimodal model like Gemini can process multiple types of data simultaneously, such as combining text and image inputs.

The concept of modality helps categorize models based on the kinds of sensory or informational data they handle, and is especially important for understanding the capabilities and limitations of a model.

ModelCapabilities

Unified model capabilities enum. Combines capabilities across different model types for easier management.

ModelType

Types of AI Models. Multi-modal models such as Gemini should be classified under their primary function, typically as Language Models.

MouseButton
NamingRule

Defines rules for generating unique file names.

OSMask
OpenAITypes.AudioStreamFormat
OpenAITypes.Fidelity
OpenAITypes.ImageDetail
OpenAITypes.ImageStyle

The style of the generated images. Vivid causes the model to lean towards generating hyper-real and dramatic images. Natural causes the model to produce more natural, less hyper-real looking images. This param is only supported for DallE3.

OpenAITypes.MediaAspect
OpenAITypes.SpeechOutputFormat
OpenAITypes.UploadPurpose
PerplexityTypes.WebSearchMode
Platform
PromptType
RealtimeSessionStatus
ReasoningEffort
ReasoningFormat

GroqCloud-specific parameter

ReasoningOptions.SummaryLevel
RequireApproval
ResponseVerbosity
SearchContextSize
StopReason

The reason the model stopped generating tokens. It is also called "finish_reason" in some APIs.

TextSpanType
TextType
TimeWindow
TokenCountPreset
TokenType
ToolChoiceMode
TranscriptFormat
UsageType
VoiceAge
VoiceCategory

The category of the voice.

VoiceStyle
VoiceType
WebSearchLocationMode