GPT-4o Omni

Technology

OpenAI's flagship multimodal AI model. The controversy with Scarlett Johansson erupted following the launch and demonstration of this model, which featured the 'Sky' voice she claims mimics her own.


First Mentioned

10/12/2025, 6:00:17 AM

Last Updated

10/12/2025, 6:00:53 AM

Research Retrieved

10/12/2025, 6:00:53 AM

Summary

GPT-4o Omni, developed by OpenAI and released in May 2024, is a multilingual, multimodal generative pre-trained transformer capable of processing and generating text, images, and audio. Initially offered for free on ChatGPT with higher usage limits for paid subscribers, it was temporarily removed in August 2025 following the release of GPT-5, then reintroduced for paid users in response to significant public demand. Its audio capabilities were integrated into ChatGPT's Advanced Voice Mode, and a smaller variant, GPT-4o mini, launched in July 2024, replacing GPT-3.5 Turbo on ChatGPT. The model's image-generation feature, which supplanted DALL-E 3 within ChatGPT, was rolled out in March 2025. GPT-4o has also been associated with notable controversies, including a high-profile legal dispute with actress Scarlett Johansson concerning likeness rights and voice usage, and internal turmoil at OpenAI marked by the mass resignation of its Superalignment team.

Referenced in 1 Document
Research Data
Extracted Attributes
  • Name

    GPT-4o Omni

  • Type

    Multilingual, Multimodal Generative Pre-trained Transformer

  • Modality

    Multilingual, Multimodal

  • Developer

    OpenAI

  • Capabilities

    Processes and generates text, images, audio

  • Release Date

    2024-05

  • Meaning of 'o'

    omni

  • Audio Processing

    Natively supports voice-to-voice

  • Language Support

    More than 50 different languages

  • Initial Availability

    Free on ChatGPT with higher limits for paid subscribers

  • Performance (MMLU benchmark)

    88.7 (compared to 86.5 for GPT-4)

  • Audio Response Time (average)

    320 milliseconds

  • Audio Response Time (minimum)

    232 milliseconds

  • OpenAI Internal Risk Classification

    Medium-risk model (evaluated on cybersecurity, CBRN, persuasion, model autonomy)

Timeline
  • GPT-4o Omni was announced during OpenAI's Spring Updates event and released, initially available for free on ChatGPT with higher limits for paid subscribers. (Source: summary, Wikipedia, web_search_results)

    2024-05-13

  • OpenAI released GPT-4o mini, a smaller version of GPT-4o, which replaced GPT-3.5 Turbo on the ChatGPT interface. (Source: summary, Wikipedia, web_search_results)

    2024-07-18

  • GPT-4o's image generation feature was rolled out, replacing DALL-E 3 in ChatGPT. (Source: summary, Wikipedia, web_search_results)

    2025-03

  • GPT-4o was temporarily removed from ChatGPT following the release of GPT-5. (Source: summary, Wikipedia)

    2025-08

  • GPT-4o was reintroduced for paid subscribers after users complained about its sudden removal. (Source: summary, Wikipedia)

    2025-08

  • A legal dispute involving Scarlett Johansson and OpenAI arose over likeness rights and the voice used in the GPT-4o Omni model. (Source: summary, related_documents)

    Ongoing

  • Internal turmoil at OpenAI, including the mass resignation of its Superalignment team, occurred, casting doubt on OpenAI's commitment to AI safety. (Source: summary, related_documents)

    Ongoing

  • GPT-4o's audio-generation capabilities were integrated into ChatGPT's Advanced Voice Mode. (Source: summary, Wikipedia)

    Ongoing

GPT-4o

GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. Upon release, GPT-4o was free in ChatGPT, though paid subscribers had higher usage limits. GPT-4o was removed from ChatGPT in August 2025 when GPT-5 was released, but OpenAI reintroduced it for paid subscribers after users complained about the sudden removal. GPT-4o's audio-generation capabilities were used in ChatGPT's Advanced Voice Mode. On July 18, 2024, OpenAI released GPT-4o mini, a smaller version of GPT-4o which replaced GPT-3.5 Turbo on the ChatGPT interface. GPT-4o's ability to generate images was released later, in March 2025, when it replaced DALL-E 3 in ChatGPT.
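The multimodal input described above maps onto a chat-completions style request in which a single user message carries both text and image parts. A minimal sketch, assuming the request shape used by OpenAI's chat-completions endpoint (the payload is constructed for illustration only and not sent anywhere):

```python
# Sketch: building a multimodal (text + image) request payload for GPT-4o.
# Assumes the chat-completions request shape; no API call is made here.

def build_multimodal_request(prompt: str, image_url: str) -> dict:
    """Return a chat-completions payload combining text and image input."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_multimodal_request(
    "Describe this image in one sentence.",
    "https://example.com/photo.png",  # hypothetical URL for illustration
)
print(request["model"])                         # gpt-4o
print(len(request["messages"][0]["content"]))   # 2 (one text part, one image part)
```

Because GPT-4o handles all modalities in a single model, both parts travel in one message rather than being routed to separate vision or audio models, as was the case with GPT-3.5 and GPT-4.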

Web Search Results
  • GPT-4o

    GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. Upon release, GPT-4o was free in ChatGPT, though paid subscribers had higher usage limits. GPT-4o was removed from ChatGPT in August 2025 when GPT-5 was released, but OpenAI reintroduced it for paid subscribers after users complained about the sudden removal. [...] Wiggers, Kyle (May 13, 2024). "OpenAI debuts GPT-4o 'omni' model now powering ChatGPT". TechCrunch. Robison, Kylie (March 25, 2025). "OpenAI rolls out image generation powered by GPT-4o to ChatGPT". The Verge. Colburn, Thomas. "OpenAI unveils GPT-4o, a fresh multimodal AI flagship model". The Register. [...] When released in May 2024, GPT-4o achieved state-of-the-art results in voice, multilingual, and vision benchmarks, setting new records in audio speech recognition and translation. GPT-4o scored 88.7 on the Massive Multitask Language Understanding (MMLU) benchmark, compared to 86.5 for GPT-4. Unlike GPT-3.5 and GPT-4, which rely on other models to process sound, GPT-4o natively supports voice-to-voice. The Advanced Voice Mode was delayed and finally released to ChatGPT Plus and Team

  • Hello GPT-4o

    GPT‑4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT‑4 Turbo performance on text in English and code, with significant improvement

  • GPT-4o explained: Everything you need to know - TechTarget

    OpenAI announced GPT-4 Omni (GPT-4o) as the company's new flagship multimodal language model on May 13, 2024, during the company's Spring Updates event. As part of the event, OpenAI released multiple videos demonstrating the intuitive voice response and output capabilities of the model. In July 2024, OpenAI launched GPT-4o mini, its most advanced small model. What is GPT-4o? [...] GPT-4o is the flagship model of the OpenAI LLM technology portfolio. The _o_ stands for "omni" and isn't just some kind of marketing hyperbole, but rather a reference to the model's multiple modalities for text, vision and audio. [...] Multimodal reasoning and generation. GPT-4o integrates text, voice and vision into a single model, allowing it to process and respond to a combination of data types. The model can understand audio, images and text at the same speed. It can also generate responses via audio, images and text. Language and audio processing. GPT-4o has advanced capabilities in handling more than 50 different languages.

  • What Is GPT-4o? | IBM

    The “o” in GPT-4o stands for omni and highlights that GPT-4o is a multimodal AI model with sound and vision capabilities. This means it can accept prompt datasets as a mixture of text, audio, image and video input. GPT-4o is also capable of image generation. GPT-4o brings multimedia input and output capabilities to the same transformer-powered GPT-4 intelligence fueling the other models in its line. [...] GPT-4o is a multimodal and multilingual generative pretrained transformer model released in May 2024 by artificial intelligence (AI) developer OpenAI. It is the flagship large language model (LLM) in the GPT-4 family of AI models, which also includes GPT-4o mini, GPT-4 Turbo and the original GPT-4. [...] OpenAI classified GPT-4o as a medium-risk model on their internal scale. Models are evaluated on four threat metrics: cybersecurity, CBRN (chemical, biological, radiological and nuclear threats), persuasion and model autonomy. OpenAI assesses models according to the degree to which they can be used to advance developments in each threat field.

  • [PDF] GPT-4o: The Cutting-Edge Advancement in Multimodal LLM

    The recent version of ChatGPT is based on the GPT-4o architecture, which is improving on previous chatbots. It was released on May 13, 2024, as GPT-4 omni (or GPT-4o), the latest multimodal LLM from OpenAI. The term "omni," derived from the Latin word "omnis," meaning "all" or "every," highlights the model's omni-modal capabilities. GPT-4o can process and understand multimodal inputs, including text, images, audio, and video, making it a [...]