OpenAI on Monday announced GPT-4o, a new AI model that the company claims is a significant step toward “much more natural human-computer interaction.” The model accepts any combination of text, audio, and images as input and can generate output in all three formats. It also recognizes emotion in a speaker’s voice, allows interruptions mid-speech, and, by OpenAI’s measurements, responds to audio input in an average of roughly 320 milliseconds, comparable to human response times in conversation.
“The special thing about GPT-4o is that it brings GPT-4 level intelligence to everyone, including our free users,” said OpenAI CTO Mira Murati during a live-streamed presentation. “This is the first time we’re making a huge step forward in ease of use.”
During the presentation, OpenAI demonstrated GPT-4o’s capabilities, including live translation between English and Italian, real-time help solving a linear equation written on paper, and coaching another OpenAI executive through a deep-breathing exercise simply by listening to his breathing.
The “o” in GPT-4o stands for “omni,” reflecting the model’s multimodal design. OpenAI explained that GPT-4o was trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. This differs from the voice mode of the company’s previous models, GPT-3.5 and GPT-4, which transcribed spoken questions into text before responding, stripping out tone and emotion and slowing interactions.
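For developers, the unified model is exposed through the same Chat Completions interface as earlier GPT-4 models. Below is a minimal sketch of a mixed text-and-image request using the official openai Python SDK (v1+); the image URL and prompt are placeholders, and per OpenAI’s announcement, audio input and output were slated to reach the API only in the following weeks.

```python
# Minimal sketch: sending text and an image to GPT-4o in one request
# via OpenAI's Chat Completions API (openai Python SDK v1+).
# The image URL below is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What equation is written on this page?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/linear-equation.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Because text and images flow into a single model, there is no separate vision endpoint to call; the image is simply another content part in the same message.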
OpenAI plans to roll the new model out to everyone, including free ChatGPT users, over the next few weeks. The company is also releasing a desktop version of ChatGPT, initially for Mac, which paid users can access starting today.
Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN

Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx

— OpenAI (@OpenAI) May 13, 2024
OpenAI’s announcement comes just a day before Google I/O, the company’s annual developer conference. Shortly after OpenAI revealed GPT-4o, Google teased a version of Gemini, its own AI chatbot, with similar capabilities.
With GPT-4o, OpenAI aims to make sophisticated AI interactions more seamless and natural for all users, free and paid alike. The model’s multimodal inputs and outputs, faster real-time responses, and emotion recognition mark a significant evolution in the company’s technology. As OpenAI continues to push the boundaries of what’s possible with artificial intelligence, competition in the AI space, underscored by Google’s teaser of similar capabilities for Gemini, is set to intensify.