GPT-4o

Technology

A multimodal AI model from OpenAI, faster and cheaper than its predecessors, able to process text, audio, image, and video inputs. The 'o' stands for 'omni'.


First Mentioned

10/12/2025, 6:12:42 AM

Last Updated

10/12/2025, 6:13:19 AM

Research Retrieved

10/12/2025, 6:13:19 AM

Summary

GPT-4o, an "omni" model developed by OpenAI, was released in May 2024 as a multilingual and multimodal generative pre-trained transformer capable of processing and generating text, images, and audio. Initially available for free on ChatGPT, it offered paid subscribers higher usage limits. Following the release of GPT-5 in August 2025, GPT-4o was temporarily removed from ChatGPT but later reintroduced for paid users due to customer feedback. Its audio generation capabilities were integrated into ChatGPT's Advanced Voice Mode, and a smaller version, GPT-4o mini, replaced GPT-3.5 Turbo on the ChatGPT interface in July 2024. GPT-4o's image generation feature was rolled out in March 2025, supplanting DALL-E 3 within ChatGPT. The model was highlighted as a faster and cheaper multimodal option during a discussion on the All-In Podcast, where its performance was benchmarked against models like Claude 3 Opus.

Referenced in 1 Document
Research Data
Extracted Attributes
  • Type

    Multilingual, multimodal generative pre-trained transformer

  • Developer

    OpenAI

  • Full Name

    GPT-4o

  • Capabilities

    Processes and generates text, images, and audio

  • Key Features

    Faster, cheaper, multimodal

  • Model Family

    GPT-4 family

  • Meaning of 'o'

    omni

  • Input Modalities

    Text, audio, image, video (any combination)

  • Output Modalities

    Text, audio, image (any combination)

  • Audio Response Time (average)

    320 milliseconds

  • Audio Response Time (minimum)

    232 milliseconds

  • Performance on Non-English Text

    Significant improvement over previous models

  • Performance on English Text and Code

    Matches GPT-4 Turbo

Timeline
  • Inception date of GPT-4o. (Source: Wikidata)

    2024-05-13

  • GPT-4o was released by OpenAI, initially available for free on ChatGPT with higher usage limits for paid subscribers. Its launch was discussed on the All-In Podcast, where its performance was benchmarked against models like Claude 3 Opus. (Source: Wikipedia, Summary, Related Documents)

    2024-05

  • OpenAI released GPT-4o mini, a smaller version of GPT-4o, which replaced GPT-3.5 Turbo on the ChatGPT interface. (Source: Wikipedia, Summary)

    2024-07-18

  • GPT-4o's image generation feature was released, replacing DALL-E 3 in ChatGPT. (Source: Wikipedia, Summary)

    2025-03

  • GPT-4o was removed from ChatGPT following the release of GPT-5. (Source: Wikipedia, Summary)

    2025-08

  • GPT-4o was reintroduced for paid subscribers in ChatGPT after users complained about its removal. (Source: Wikipedia, Summary)

    2025-08

GPT-4o

GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. Upon release, GPT-4o was free in ChatGPT, though paid subscribers had higher usage limits. GPT-4o was removed from ChatGPT in August 2025 when GPT-5 was released, but OpenAI reintroduced it for paid subscribers after users complained about the sudden removal. GPT-4o's audio-generation capabilities were used in ChatGPT's Advanced Voice Mode. On July 18, 2024, OpenAI released GPT-4o mini, a smaller version of GPT-4o which replaced GPT-3.5 Turbo on the ChatGPT interface. GPT-4o's ability to generate images was released later, in March 2025, when it replaced DALL-E 3 in ChatGPT.

Web Search Results
  • GPT-4o 101: What It Is and How It Works - Grammarly

    GPT-4o (the “o” stands for omni) is an advanced AI model developed by OpenAI, designed to power generative AI platforms such as ChatGPT. Unlike its predecessors, GPT-4o is the first version in the GPT series capable of processing text, audio, and images simultaneously. This multimodal capability enables the model to understand and generate responses across different formats much more quickly, making interactions more seamless and natural. [...] ## How does GPT-4o work? GPT-4o is a type of multimodal language model, which is an evolution of large language models (LLMs). LLMs are highly advanced machine learning models capable of identifying patterns in large amounts of text. Multimodal models can process text, images, and audio and return any of these as outputs. [...] The introduction of GPT-4o marks a significant evolution from earlier GPT models, which primarily focused on text processing. With its ability to handle multiple input types, GPT-4o supports a broader range of applications, from creating and analyzing images to transcribing and translating audio. This versatility allows for more dynamic and engaging user experiences, whether in creative, educational, or practical contexts. GPT-4o opens up new possibilities for innovative AI-driven solutions by

  • GPT-4o - Wikipedia

    GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. Upon release, GPT-4o was free in ChatGPT, though paid subscribers had higher usage limits. GPT-4o was removed from ChatGPT in August 2025 when GPT-5 was released, but OpenAI reintroduced it for paid subscribers after users complained about the sudden removal.

  • What Is GPT-4o? | IBM

    GPT-4o is a multimodal and multilingual generative pretrained transformer model released in May 2024 by artificial intelligence (AI) developer OpenAI. It is the flagship large language model (LLM) in the GPT-4 family of AI models, which also includes GPT-4o mini, GPT-4 Turbo and the original GPT-4. [...] The “o” in GPT-4o stands for omni and highlights that GPT-4o is a multimodal AI model with sound and vision capabilities. This means it can accept prompt datasets as a mixture of text, audio, image and video input. GPT-4o is also capable of image generation. GPT-4o brings multimedia input and output capabilities to the same transformer-powered GPT-4 intelligence fueling the other models in its line. [...] ## How is GPT-4o different from GPT-4 Turbo? GPT-4o is an “all-in-one” flagship model capable of processing multimodal inputs and outputs on its own as a single neural network. With previous models such as GPT-4 Turbo and GPT-3.5, users would need OpenAI APIs and other supporting models to input and generate varied content types. While GPT-4 Turbo can process image prompts, it is not capable of processing audio without API assistance.

  • What is GPT-4o? A summary of OpenAI's new multi-modal model

    In conclusion, GPT-4o is a truly multi-modal extension of GPT-4 Turbo, accepting text, audio, and images as inputs and generating them as outputs, in any combination. GPT-4o outperforms other SOTA multi-modal models on audio and vision tasks but doesn’t demonstrate any marked improvement on text and natural language tasks. Furthermore, OpenAI is rolling out these multi-modal capabilities iteratively, starting with text and vision capabilities. [...] The “o” in GPT-4o stands for “omni,” a nod to the model’s new multi-modal capabilities. In the context of AI models, multi-modal refers to the ability of a model to process and generate content in multiple forms of data or "modes." In the case of GPT-4o, these modes are text, audio, images and videos. Any combination of modes can be used as inputs and requested as outputs for GPT-4o. Here are a few examples: [...] GPT-4o uses a single neural network to process inputs and generate outputs, representing a departure from previous multi-modal products (like Voice Mode) offered by OpenAI. The single-model design of GPT-4o likely contributes to its increased efficiency, resulting in doubled speed and a 50% reduction in API usage costs. Here are a few examples of some practical tasks GPT-4o can be used for:

  • Hello GPT-4o - OpenAI

    GPT‑4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT‑4 Turbo performance on text in English and code, with significant improvement [...] GPT‑4o is our latest step in pushing the boundaries of deep learning, this time in the direction of practical usability. We spent a lot of effort over the last two years working on efficiency improvements at every layer of the stack. As a first fruit of this research, we’re able to make a GPT‑4 level model available much more broadly. GPT‑4o’s capabilities will be rolled out iteratively (with extended red team access starting today).
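
Example API Usage

The search results above note that GPT-4o handles mixed input types (text, audio, image, video) within a single model, whereas earlier models such as GPT-4 Turbo needed separate supporting APIs for audio. As a rough, non-authoritative sketch, the snippet below shows what a combined text-and-image request to GPT-4o could look like through the OpenAI Python SDK; the prompt text and image URL are placeholders, and audio/video inputs are not shown.

```python
# Illustrative sketch only: a combined text + image prompt sent to GPT-4o
# via the OpenAI Python SDK's chat completions endpoint. The image URL and
# prompt text below are placeholders, not values taken from this document.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is shown in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample-photo.jpg"},
                },
            ],
        }
    ],
)

# The model's text reply; GPT-4o can also return other modalities in
# different configurations, which this minimal example does not cover.
print(response.choices[0].message.content)
```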
