Is Google the king again? How strong is its latest large model, and can it challenge GPT-4?
我放心你带套猛
Posted on 2023-12-7 18:06:42
Technology giant Google has launched a long-awaited new model that can run on mobile phones and significantly reduce computing costs.
On December 6th local time, Google announced the launch of the "largest, strongest, and most versatile" new large-scale language model Gemini. Gemini will be the first large-scale model to run directly on a mobile phone, applied to Google Pixel 8 Pro smartphones and chatbot Bard. Google plans to license Gemini to customers through Google Cloud and will integrate it with other products in Google services in the coming months.
Google invented many of the computer science concepts that make generative AI applications possible, but it was put on the defensive when OpenAI released its chatbot ChatGPT last year. Faced with the threat posed by the partnership between OpenAI and Microsoft, one of Google's biggest competitors, Google announced its own chatbot Bard in February this year. Not long after, OpenAI released the more powerful GPT-4, which became a major benchmark in the field of AI. Now, in response to GPT-4, Google has launched Gemini.
"Google has found its rightful place in the AI competition"
Demis Hassabis, CEO of Google DeepMind and representative of the Gemini team, stated at a press conference that Google ran 32 comprehensive multimodal benchmarks to compare Gemini with OpenAI's GPT-4, and that Gemini is "significantly ahead in 30 of the 32 benchmarks."
According to Google, Gemini performed strongly on a variety of tasks during the later stages of training. For example, MMLU (Massive Multitask Language Understanding) is one of the most popular benchmarks for testing an AI model's knowledge and problem-solving abilities; Gemini scored 90.0% on MMLU, making it the first model to surpass human experts on the test.
Gemini's MMLU score surpassed that of human experts for the first time. Source: Official Video
Gemini comes in three sizes: Gemini Ultra is the largest and most powerful tier, positioned as a competitor to GPT-4; Gemini Pro is a mid-range model that performs better than GPT-3.5 and scales across a wide range of tasks; Gemini Nano is intended for specific tasks and mobile devices.
Of these, Gemini Nano will ship on the Pixel 8 Pro, the latest smartphone in the Google Pixel series, where it supports new features such as "summary" in the Recorder app and powers the "smart reply" function in Google's keyboard app Gboard. According to foreign media reports, Google has stated that Gemini Nano will run "locally" on the device and that the model is specially optimized for mobile devices, so Android developers can easily build AI applications and features that work offline or that use personal information kept on the device.
Analysts suggest that this progress can help solve a major economic problem in the technology sector. Using the computing power of mobile phones to run generative AI, rather than relying on cloud servers operated by large technology companies, will greatly reduce the cost of operating such systems. For users who want to keep their personal data on their devices, it also provides a layer of security. Previously, Samsung Electronics publicly showcased its first generative AI model, Gauss, in November, but it is limited to internal employees and is expected to be installed on the Galaxy S24 series phones in the first half of next year.
"I believe that the AI transformation we are witnessing will be the most profound in our lives, much larger than the previous transformations brought by mobile technology or the internet. This new era of models represents one of the largest scientific and engineering efforts our company has ever made," wrote Sundar Pichai, CEO of Alphabet, Google's parent company, in a blog post.
On the eve of Gemini's release, Pichai said in an interview that one of the main reasons Gemini has attracted attention is that it is fundamentally a multimodal model. He added that the transition to AI is very profound, is still in its early stages, and that there are infinite opportunities ahead: "When we developed Gemini, we applied a lot of previous experience. We spent more time developing Gemini Ultra, partly to conduct strict security testing. At the same time, we are also fine-tuning it to fully unleash its potential."
On the X (formerly Twitter) platform, Elon Musk commented "Impressive" under Pichai's post introducing Gemini. Musk also replied to a post by Hassabis to congratulate him, and agreed with a comment on Gemini by Tom Mueller, a founding employee of SpaceX. The comment reads: "I know it's difficult to define what AGI (artificial general intelligence) is, but no matter what it is, it's closer than you imagine."
According to Google, Gemini is a collaborative effort among various Google teams, including Google Research. It can extract insights from hundreds of thousands of documents by reading, filtering, and understanding information, and it also handles numerical data well. For example, given a chart and new data, Gemini can produce the code behind the chart and generate an updated chart that incorporates the new data.
Gemini generates the right image from the left image and new data. Source: Official Video
Beyond text, Gemini can understand and work with many forms of input and output, including code, audio, images, and video. Gemini is able to understand nuanced information and answer questions on complex topics, which makes it particularly skilled at explaining reasoning in complex subjects such as mathematics and physics.
Gemini is able to answer questions step by step based on photos. Source: Official Video
Google also released a six-minute video showcasing some interesting interactions between testers and Gemini, including asking Gemini to recognize images and describe them in multiple languages, designing a quiz game from a map, and playing a cup-shuffling game and reasoning games with Gemini.
Throughout the video, Gemini responded very quickly. It also generated audio and pictures to support its answers and used colloquial, even humorous, expressions, which made for an eye-opening demonstration. In the comments section, netizens praised the video as "shocking" and celebrated Google's return to its rightful position in the AI competition.
Gemini provides animal shapes that can be made based on two balls of yarn. Source: Official Video
When asked which direction the duck should go, Gemini said it should go to the left side with companions. Source: Official Video
In terms of coding, Gemini can understand, explain, and generate high-quality code in the world's most popular programming languages, including Python, Java, C++, and Go. It can work across languages and reason about complex information, and it can also serve as the engine for more advanced coding systems.
Starting from December 13th, developers and enterprise clients will be able to access Gemini Pro through the Gemini API (Application Programming Interface) in Google AI Studio or Google Cloud Vertex AI, and Android developers will be able to build using Gemini Nano.
Gemini will bring the largest update since its release to the Google chatbot Bard. Google announced that starting from the day of the launch event, Bard will use Gemini Pro to achieve advanced reasoning, planning, understanding, and other functions, providing English services in over 170 countries and regions. Google plans to expand to different modalities, support new languages and regions in the coming months. At the beginning of next year, Google will launch Bard Advanced, which will use Gemini Ultra.
However, for regulatory reasons, the Gemini-powered Bard will not be available in EU countries and the UK. "We will definitely work hard to solve this problem and are collaborating with local regulatory agencies to ensure that we have sufficient communication with relevant parties before launching the service in any specific region," said Sissie Hsiao, Google's Vice President and head of the Bard project.
Exaggerated promotional videos?
However, shortly after the release of Gemini, some netizens pointed out problems in the promotional materials.
According to a 60-page technical report released by Google, Gemini's MMLU result carries a small-print annotation, "CoT@32", indicating that it was obtained with chain-of-thought prompting over 32 attempts, with the best result selected from them. By comparison, GPT-4's figure was obtained with 5-shot prompting, in which the model is given 5 examples. Under that same 5-shot standard, Gemini Ultra's result is actually 83.7%, lower than GPT-4's 86.4%.
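To see why the two evaluation protocols are not directly comparable, consider the minimal Python sketch below. It scores the same toy questions twice: once using a single attempt per question, and once by aggregating several sampled attempts. The majority-vote rule, the function names, and the toy answers are all assumptions made for illustration; this is not Google's evaluation code, and the selection rule in the technical report may differ.

```python
from collections import Counter

def single_attempt(answers):
    """Use only the first sampled answer for a question."""
    return answers[0]

def majority_vote(answers):
    """Use the most common answer among all sampled attempts."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical toy data: (sampled answers, correct answer) per question.
samples = {
    "q1": (["B", "A", "A", "A"], "A"),  # first attempt wrong, majority right
    "q2": (["C", "C", "D", "C"], "C"),  # both right
    "q3": (["D", "D", "B", "D"], "B"),  # both wrong
}

def accuracy(picker):
    """Fraction of questions where the picked answer matches the key."""
    correct = sum(picker(answers) == key for answers, key in samples.values())
    return correct / len(samples)

single_acc = accuracy(single_attempt)
voted_acc = accuracy(majority_vote)
print(f"single attempt: {single_acc:.2f}, aggregated: {voted_acc:.2f}")
```

On this toy data the aggregated protocol scores higher than the single-attempt protocol even though the underlying "model" is the same, which is why scores are only meaningful when both models are measured under the same protocol.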
Moreover, in the chart comparing MMLU scores, Gemini's 90.0% is only slightly higher than the human expert score of 89.8%, yet the two points are drawn far apart.
Philipp Schmid, a technical lead at Hugging Face, redrew the chart using the data disclosed in the technical report. The two data points at the bottom show the GPT-4 (left) and Gemini (right) scores with 5-shot prompting. Source: X
Subsequently, Jeff Dean, Chief Scientist of Google DeepMind, responded to this question in a discussion on the X platform, writing, "We reported on these two methods. We believe it would be interesting for the community to see our newly developed CoT method and understand its differences from other methods."
Some viewers also found problems in the opening disclaimer of the much-praised interactive demonstration video. Machine learning instructor Santiago Valdarrama believes the statement may imply that the video was carefully curated and edited rather than recorded in real time. In the statement, Google wrote, "We have been capturing footage to test it on a range of challenges, showing it a series of images, and asking it to reason about what it sees."
Disclaimer at the beginning of the demonstration video. Source: Official Video
Subsequently, Google explained the multimodal interaction process in a blog post and indirectly acknowledged that the effects in the demonstration video could only be achieved by piecing together still images and multiple prompts. For example, in the video the tester shows a fist, a scissors gesture, and an open palm in turn, and Gemini immediately concludes that the tester is playing rock paper scissors. In the blog post, Google acknowledges that Gemini reached that conclusion only when it was shown all three gestures at once and told that it was a game.
Of course, even if the promotion was somewhat exaggerated, Gemini's performance should not be underestimated.
Who will win the competition among the technology giants?
Since the beginning of this year, the major technology giants have been making one move after another in the field of AI, each with its own approach.
Among them, Microsoft, one of Google's biggest competitors, stands out. In February of this year, Microsoft built the chatbot Bing AI into its search engine Bing. A month later, Microsoft launched Microsoft 365 Copilot, which brought the capabilities of the large language model GPT-4 into its Office software. In addition, to help Microsoft maintain its lead in bringing AI to office tools, the enterprise edition of Microsoft 365 Copilot was officially launched on November 1st, with a monthly subscription fee of $30. More than a month ago, Microsoft announced that the AI assistant Copilot would be officially integrated into Windows 11.
At its first developer conference in November, OpenAI also launched a new model, GPT-4 Turbo, as well as a series of upgrades to the chatbot ChatGPT, including custom GPTs. GPT-4 Turbo supports a context length of up to 128,000 tokens and has visual input capability. It joins the multimodal API together with the text-to-image model DALL·E 3 and a new text-to-speech (TTS) model.
For many years, Facebook's parent company Meta has also been an active participant in the AI field. In July of this year, Meta announced that its large model Llama 2, a competitor to GPT-4, was officially open source, and that anyone can download it, modify it, and add it to their products for free. This approach has won praise from some tech startups, which are concerned that Google, Microsoft, and OpenAI will try to monopolize the AI market and shut out competitors. But Meta's move has also been criticized for making it easier for people to use AI technology for malicious purposes, such as designing computer viruses or generating audio or images to commit fraud.
The e-commerce giant Amazon, long considered a laggard in the AI competition, is also accelerating. At the re:Invent 2023 conference last week, Amazon Web Services (AWS) launched a generative AI assistant called "Amazon Q", which can "easily chat, generate content, and take action." Amazon Q will focus on the workplace rather than on consumers. Amazon will charge enterprise users a monthly subscription fee of $20, while the version for developers and IT personnel will cost $25 per month.
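CandyLake.com is an information publishing platform and only provides information storage services.
Disclaimer: The views in this article are solely those of the author. This article does not represent the position of CandyLake.com and does not constitute advice; please treat it with caution.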