After much anticipation, OpenAI's new product has finally arrived.
At 1 a.m. Beijing time on May 14, OpenAI livestreamed an update to its products. During the half-hour online event, OpenAI Chief Technology Officer Mira Murati announced a series of upgrades to GPT-4. The main highlights were as follows:
OpenAI launched a new model, GPT-4o, where the "o" stands for "omni" (all-encompassing). GPT-4o is available to all users for free.
The new model has strong multimodal interaction capabilities. In the launch demonstration, GPT-4o communicated smoothly with humans through text, images, video, and voice, and could understand on-screen information.
A ChatGPT desktop application was released, currently available for macOS; a Windows version will follow later this year.
An AI assistant takes shape
Before the event, the reporter noticed that OpenAI's official website had changed its description of GPT-4 from "state-of-the-art model" to "advanced model," paving the way for the release of GPT-4o.
As OpenAI's most advanced model, GPT-4o's distinguishing feature is that it accepts any combination of text, audio, and images as input and can generate output in any of those modalities. This gives GPT-4o the basic shape of an AI assistant and marks another step on the road toward artificial general intelligence.
At the event, Murati demonstrated real-time voice conversation together with Mark Chen, OpenAI's head of frontier research, and Barret Zoph, head of the post-training team. Judging from the demonstration, interaction between GPT-4o and humans has become more immediate and natural. GPT-4o is reported to respond to audio input in as little as 232 milliseconds, close to human response times in conversation. Previously, talking to ChatGPT in voice mode incurred average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). GPT-4o not only responds in real time without awkward, lengthy pauses, it can also generate speech in a range of emotional styles.
For example, when asked "How have you been lately?", GPT-4o not only answers "I'm doing well" but also asks "How are you?" in return. When asked to tell a bedtime story about robots and love, GPT-4o is interrupted and asked to retell it in a more emotional, dramatic way. Its storytelling voice then becomes more theatrical and expressive, and it can even finish the tale in song.
From now on, coaxing children to sleep will be much easier for parents.
Beyond that, GPT-4o combines vision and voice interaction and can read handwritten equations. Zoph opened a video call on his phone and told GPT-4o, "I'm going to write a linear equation on a piece of paper. Don't tell me the answer, just walk me through how to solve it." He then wrote the equation 3x + 1 = 4 and asked how to approach it. Each time Zoph asked for help, GPT-4o gently prompted him with the next step, until he arrived at the correct answer, x = 1.
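For readers following along, the algebra behind the demo is simple; a reconstruction of the steps GPT-4o guided Zoph through (the model's exact prompts are not quoted in the source) might look like:

$$
\begin{aligned}
3x + 1 &= 4 \\
3x &= 4 - 1 = 3 \quad (\text{subtract } 1 \text{ from both sides}) \\
x &= \tfrac{3}{3} = 1 \quad (\text{divide both sides by } 3)
\end{aligned}
$$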
From now on, guiding children through their homework will be easier for parents, too.
In addition, GPT-4o can read on-screen information in real time, helping to answer coding questions and analyze charts; it can translate between languages in real time, switching without delay as the speakers alternate between Italian and English and even mimicking the speaker's tone; and it can recognize and analyze human emotions: when the presenter held up a selfie and asked it to judge his mood, GPT-4o responded, "You look very happy, maybe even a little excited; you must be in a good mood."
Although OpenAI CEO Sam Altman did not appear at the event, he posted real-time updates about it on his personal social media account. Afterward, he published a post containing a single word: "her." According to earlier reports in the foreign press, Altman has said his favorite artificial-intelligence film is "Her," and that his ultimate goal is to build a virtual AI assistant like the one in the movie, making existing voice assistants such as Apple's Siri more practical and intelligent.
"Cut Hu" Google, show goodwill to Apple
Rumors about OpenAI releasing new products had been circulating for a week. Some reports said OpenAI would release GPT-5; others said it was about to launch a ChatGPT-based AI search engine to take on Google. On May 11, Altman denied those rumors on his personal social media account, saying, "It's not GPT-5, and it's not a search engine, but we've been working hard on new things we think people will love! It feels like magic to me!"
It is worth noting that Google is holding its I/O developer conference on May 14 to announce updates to Android, Google Search, and other products. By scheduling its event one day before I/O, OpenAI clearly intended to steal Google's thunder. This is not the first time: on February 16 this year, OpenAI released its Sora text-to-video model without warning, drawing global attention. Google had just upgraded its Gemini Pro model at the time, but it was overshadowed by Sora's popularity.
With OpenAI declaring war again, the pressure falls squarely on Google, which is about to face it head-on. According to a research report by Huafu Securities, ChatGPT still ranks first in total traffic among mainstream overseas AI products. Among the other major players, Claude, Perplexity, and Character.ai all saw traffic grow in April, while Google's Gemini saw traffic decline 1.4% month over month. Google is clearly facing ever stronger competition from OpenAI in the large-model race.
By contrast, the quiet winner of this launch is arguably Apple. The reporter noted that the entire demonstration was conducted on an iPhone and a MacBook Pro, and that a Mac desktop version of ChatGPT was released, seemingly hinting that OpenAI will work with Apple to bring large-model capabilities to Apple devices.
In fact, signs of such a partnership had already appeared in OpenAI's earlier moves and in media reports. According to a May 10 Bloomberg report, Apple is in talks with OpenAI to finalize an agreement to bring OpenAI's large-model technology to the iPhone this year. Under the deal, Apple would be able to offer a ChatGPT-powered chatbot as part of the artificial-intelligence features in iOS 18. The report also noted, however, that Apple has been in talks with Google about licensing the Gemini chatbot, though no agreement has been reached.
Recently, Altman appeared on the "All-In" podcast, where he discussed a number of hot topics and directions in artificial intelligence. He said OpenAI will keep improving the quality of its voice features, and that he "believes voice interaction may be an important hint at how we will interact in the future." When the host asked whether he was working with Jony Ive (Apple's former chief design officer, often called the "father of the iPhone"), Altman replied, "Yes, we're exchanging some ideas."
In February this year, Apple CEO Tim Cook publicly revealed that the company is developing generative AI features and will introduce new Siri capabilities powered by large language models in iOS 18, though he did not say whether OpenAI is involved. Apple is expected to hold its WWDC (Worldwide Developers Conference) in June to showcase the latest innovations in iOS, iPadOS, macOS, watchOS, tvOS, and visionOS.
Analysts believe that a partnership with OpenAI would not only shorten Apple's product development cycle but also quickly raise the intelligence of its own products. Apple, which has fallen well behind in the generative-AI era, may stage a comeback by bringing world-leading large models into its hardware, and the answer may be revealed in June.