GPT-4o Omni

Technology

OpenAI's flagship multimodal AI model. The controversy with Scarlett Johansson erupted following the launch and demonstration of this model, which featured the 'Sky' voice she claims mimics her own.


First Mentioned

10/12/2025, 6:00:17 AM

Last Updated

10/12/2025, 6:00:53 AM

Research Retrieved

10/12/2025, 6:00:53 AM

Summary

GPT-4o Omni, developed by OpenAI and released in May 2024, is a multilingual, multimodal generative pre-trained transformer capable of processing and generating text, images, and audio. Initially offered for free on ChatGPT with higher usage limits for paid subscribers, it was temporarily removed in August 2025 following the release of GPT-5, then reintroduced for paid users in response to significant public demand. Its audio capabilities were integrated into ChatGPT's Advanced Voice Mode, and a smaller variant, GPT-4o mini, launched in July 2024, replacing GPT-3.5 Turbo on ChatGPT. The model's image-generation feature, which supplanted DALL-E 3 within ChatGPT, was rolled out in March 2025. GPT-4o has also been associated with notable controversies, including a high-profile legal dispute with actress Scarlett Johansson concerning likeness rights and voice usage, and internal turmoil at OpenAI marked by the mass resignation of its Superalignment team.

Referenced in 1 Document
Research Data
Extracted Attributes
  • Name

    GPT-4o Omni

  • Type

    Multilingual, Multimodal Generative Pre-trained Transformer

  • Modality

    Multilingual, Multimodal

  • Developer

    OpenAI

  • Capabilities

    Processes and generates text, images, audio

  • Release Date

    2024-05

  • Meaning of 'o'

    omni

  • Audio Processing

    Natively supports voice-to-voice

  • Language Support

    More than 50 different languages

  • Initial Availability

    Free on ChatGPT with higher limits for paid subscribers

  • Performance (MMLU benchmark)

    88.7 (compared to 86.5 for GPT-4)

  • Audio Response Time (average)

    320 milliseconds

  • Audio Response Time (minimum)

    232 milliseconds

  • OpenAI Internal Risk Classification

    Medium-risk model (evaluated on cybersecurity, CBRN, persuasion, model autonomy)

Timeline
  • GPT-4o Omni was announced during OpenAI's Spring Updates event and released, initially available for free on ChatGPT with higher limits for paid subscribers. (Source: summary, Wikipedia, web_search_results)

    2024-05-13

  • OpenAI released GPT-4o mini, a smaller version of GPT-4o, which replaced GPT-3.5 Turbo on the ChatGPT interface. (Source: summary, Wikipedia, web_search_results)

    2024-07-18

  • GPT-4o's image generation feature was rolled out, replacing DALL-E 3 in ChatGPT. (Source: summary, Wikipedia, web_search_results)

    2025-03

  • GPT-4o was temporarily removed from ChatGPT following the release of GPT-5. (Source: summary, Wikipedia)

    2025-08

  • GPT-4o was reintroduced for paid subscribers after users complained about its sudden removal. (Source: summary, Wikipedia)

    2025-08

  • A legal dispute involving Scarlett Johansson and OpenAI arose over likeness rights and the voice used in the GPT-4o Omni model. (Source: summary, related_documents)

    Ongoing

  • Internal turmoil at OpenAI, including the mass resignation of its Superalignment team, occurred, casting doubt on OpenAI's commitment to AI safety. (Source: summary, related_documents)

    Ongoing

  • GPT-4o's audio-generation capabilities were integrated into ChatGPT's Advanced Voice Mode. (Source: summary, Wikipedia)

    Ongoing

GPT-4o

GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. Upon release, GPT-4o was free in ChatGPT, though paid subscribers had higher usage limits. GPT-4o was removed from ChatGPT in August 2025 when GPT-5 was released, but OpenAI reintroduced it for paid subscribers after users complained about the sudden removal. GPT-4o's audio-generation capabilities were used in ChatGPT's Advanced Voice Mode. On July 18, 2024, OpenAI released GPT-4o mini, a smaller version of GPT-4o which replaced GPT-3.5 Turbo on the ChatGPT interface. GPT-4o's ability to generate images was released later, in March 2025, when it replaced DALL-E 3 in ChatGPT.
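The multimodal input described above maps onto a chat-completions style request in which a single user message carries both text and image parts. A minimal sketch, assuming the request shape used by OpenAI's chat-completions endpoint (the payload is constructed for illustration only and not sent anywhere):

```python
# Sketch: building a multimodal (text + image) request payload for GPT-4o.
# Assumes the chat-completions request shape; no API call is made here.

def build_multimodal_request(prompt: str, image_url: str) -> dict:
    """Return a chat-completions payload combining text and image input."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_multimodal_request(
    "Describe this image in one sentence.",
    "https://example.com/photo.png",  # hypothetical URL for illustration
)
print(request["model"])                         # gpt-4o
print(len(request["messages"][0]["content"]))   # 2 (one text part, one image part)
```

Because GPT-4o handles all modalities in a single model, both parts travel in one message rather than being routed to separate vision or audio models, as was the case with GPT-3.5 and GPT-4.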

Web Search Results
  • GPT-4o

    GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. Upon release, GPT-4o was free in ChatGPT, though paid subscribers had higher usage limits. GPT-4o was removed from ChatGPT in August 2025 when GPT-5 was released, but OpenAI reintroduced it for paid subscribers after users complained about the sudden removal. [...] Wiggers, Kyle (May 13, 2024). "OpenAI debuts GPT-4o 'omni' model now powering ChatGPT". TechCrunch. Robison, Kylie (March 25, 2025). "OpenAI rolls out image generation powered by GPT-4o to ChatGPT". The Verge. Colburn, Thomas. "OpenAI unveils GPT-4o, a fresh multimodal AI flagship model". The Register. [...] When released in May 2024, GPT-4o achieved state-of-the-art results in voice, multilingual, and vision benchmarks, setting new records in audio speech recognition and translation. GPT-4o scored 88.7 on the Massive Multitask Language Understanding (MMLU) benchmark, compared to 86.5 for GPT-4. Unlike GPT-3.5 and GPT-4, which rely on other models to process sound, GPT-4o natively supports voice-to-voice. The Advanced Voice Mode was delayed and finally released to ChatGPT Plus and Team

  • Hello GPT-4o

    GPT‑4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT‑4 Turbo performance on text in English and code, with significant improvement

  • GPT-4o explained: Everything you need to know - TechTarget

    OpenAI announced GPT-4 Omni (GPT-4o) as the company's new flagship multimodal language model on May 13, 2024, during the company's Spring Updates event. As part of the event, OpenAI released multiple videos demonstrating the intuitive voice response and output capabilities of the model. In July 2024, OpenAI launched GPT-4o mini, its most advanced small model. What is GPT-4o? [...] GPT-4o is the flagship model of the OpenAI LLM technology portfolio. The _o_ stands for "omni" and isn't just some kind of marketing hyperbole, but rather a reference to the model's multiple modalities for text, vision and audio. [...] Multimodal reasoning and generation. GPT-4o integrates text, voice and vision into a single model, allowing it to process and respond to a combination of data types. The model can understand audio, images and text at the same speed. It can also generate responses via audio, images and text. Language and audio processing. GPT-4o has advanced capabilities in handling more than 50 different languages.

  • What Is GPT-4o? | IBM

    The “o” in GPT-4o stands for omni and highlights that GPT-4o is a multimodal AI model with sound and vision capabilities. This means it can accept prompt datasets as a mixture of text, audio, image and video input. GPT-4o is also capable of image generation. GPT-4o brings multimedia input and output capabilities to the same transformer-powered GPT-4 intelligence fueling the other models in its line. [...] GPT-4o is a multimodal and multilingual generative pretrained transformer model released in May 2024 by artificial intelligence (AI) developer OpenAI. It is the flagship large language model (LLM) in the GPT-4 family of AI models, which also includes GPT-4o mini, GPT-4 Turbo and the original GPT-4. [...] OpenAI classified GPT-4o as a medium-risk model on their internal scale. Models are evaluated on four threat metrics: cybersecurity, CBRN (chemical, biological, radiological and nuclear threats), persuasion and model autonomy. OpenAI assesses models according to the degree to which they can be used to advance developments in each threat field.

  • [PDF] GPT-4o: The Cutting-Edge Advancement in Multimodal LLM

    The recent version of ChatGPT is based on the GPT-4o architecture, which is improving on previous chatbots. It was released on May 13, 2024, as GPT-4 omni (or GPT-4o), the latest multimodal LLM from OpenAI. The term "omni," derived from the Latin word "omnis," meaning "all" or "every," highlights the model's omni-modal capabilities. GPT-4o can process and understand multimodal inputs, including text, images, audio, and video, making it a [...]