Public Preview

GPT-4o documentation

Overview

GPT-4o is OpenAI’s multimodal model (text, vision, audio), announced in May 2024, with a focus on low-latency interaction and structured outputs.

Inputs and Outputs

  • Inputs: text, images, audio (streaming support as reported by the provider)
  • Outputs: text, JSON, audio
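As a concrete illustration of the input side, the sketch below assembles a text-plus-image request body in the Chat Completions shape; the prompt, image URL, and helper name are illustrative assumptions, and audio input is omitted for brevity.

```python
# Sketch: shape of a multimodal (text + image) request payload.
# The helper name and example URL are illustrative assumptions.

def build_multimodal_request(prompt: str, image_url: str) -> dict:
    """Assemble a text+image chat request body (audio omitted for brevity)."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_multimodal_request("Describe this image.",
                                   "https://example.com/cat.png")
print(payload["model"])  # → gpt-4o
```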

Inference & Decoding

Supports common decoding controls (e.g., temperature, top-p) and structured outputs via JSON Schema or function calling. Speculative/assisted generation is supported on the provider side.
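To make the structured-outputs path concrete, the sketch below builds a request body that constrains the reply to a JSON Schema via the `response_format` field; the schema, its field names, and the helper name are illustrative assumptions.

```python
# Sketch: a structured-outputs request body constrained by a JSON Schema.
# The schema and its field names ("answer", "confidence") are assumptions
# for illustration only.

def build_structured_request(question: str) -> dict:
    """Assemble a chat request whose reply must match a JSON Schema."""
    schema = {
        "type": "object",
        "properties": {
            "answer": {"type": "string"},
            "confidence": {"type": "number"},
        },
        "required": ["answer", "confidence"],
        "additionalProperties": False,
    }
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": question}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "qa_result", "strict": True, "schema": schema},
        },
    }

req = build_structured_request("What is the capital of France?")
print(req["response_format"]["type"])  # → json_schema
```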

Safety & Compliance

Use the moderation endpoint and published usage policies to filter inputs and outputs. Watermarking support is partial or experimental where applicable.
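A minimal sketch of a moderation gate, assuming the payload shape of OpenAI's moderation endpoint; the model name should be verified against current documentation, and the helper names are illustrative.

```python
# Sketch: building a moderation request and checking its response.
# "omni-moderation-latest" is the model name documented by OpenAI at the
# time of writing; verify before use. Helper names are assumptions.

def build_moderation_request(text: str) -> dict:
    """Assemble a moderation request body for a text input."""
    return {"model": "omni-moderation-latest", "input": text}

def is_flagged(moderation_response: dict) -> bool:
    """Return True if any result in a moderation response is flagged."""
    return any(r.get("flagged", False)
               for r in moderation_response.get("results", []))

# Usage with a simulated response (no network call):
fake_response = {"results": [{"flagged": False}]}
print(is_flagged(fake_response))  # → False
```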

Evals & Quality

OpenAI cites competitive results across reasoning and coding benchmarks; results from public third-party evaluations vary with the prompt and evaluation harness.

Deployment & Ops

Available via the OpenAI API and Azure OpenAI Service; runtime features include streaming and structured outputs.
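Streaming deliveries arrive as incremental delta chunks that the client concatenates into a full reply. The sketch below shows that accumulation logic over simplified dicts standing in for streamed events; the helper name and chunk contents are assumptions for illustration.

```python
# Sketch: accumulating streamed delta chunks into a full reply.
# The dicts below are simplified stand-ins for streamed chat events;
# the helper name is an assumption.

def accumulate_stream(chunks) -> str:
    """Concatenate the content deltas from a sequence of stream chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.get("choices", [{}])[0].get("delta", {})
        content = delta.get("content")
        if content is not None:
            parts.append(content)
    return "".join(parts)

demo = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [{"delta": {}}]},  # final chunk often carries no content
]
print(accumulate_stream(demo))  # → Hello
```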

Pricing & Licensing

Commercial API with per-token pricing; see the platform pricing pages for current rates.
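Per-token pricing means cost scales with input and output token counts. The toy calculator below shows the arithmetic; the rates in the example are placeholders, not actual GPT-4o prices, and the function name is an assumption.

```python
# Toy per-token cost calculator. The rates used in the example are
# PLACEHOLDERS, not actual GPT-4o prices; look up current rates on the
# platform pricing pages.

def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_1m_input: float, usd_per_1m_output: float) -> float:
    """Estimate USD cost from token counts and per-million-token rates."""
    return (input_tokens / 1_000_000 * usd_per_1m_input
            + output_tokens / 1_000_000 * usd_per_1m_output)

# Example with hypothetical rates of $5/M input and $15/M output tokens:
print(round(estimate_cost(10_000, 2_000, 5.0, 15.0), 4))  # → 0.08
```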

References

See YAML references.