Google's Big Model Finally Takes a Big Step: Gemini vs GPT-4
阿豆学长长ov
Posted on 2023-12-08 10:00:39
On December 6, U.S. time, Google officially released the Gemini model. Google CEO Sundar Pichai called it Google's most powerful and versatile model to date.
It has been one year and one week since ChatGPT was released. That launch made OpenAI the most dazzling company in artificial intelligence, especially in the field of large models, and the catch-up target for every other technology company, including Google.
For the past eight years, Google has made "AI first" its corporate strategy, and AlphaGo, which defeated the human Go champion in 2016, came out of Google's DeepMind. It is no exaggeration to say that Google sparked a wave of AI that changed the course of the entire industry, but it now urgently needs to prove itself in the field of large models.
It is reported that Gemini 1.0 comes in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano. Gemini Nano is designed for on-device use, and the Pixel 8 Pro will be the first smartphone to ship with it. Gemini Pro is suited to scaling across a wide range of tasks, and Google plans to use it to upgrade its chatbot Bard, as well as more Google products including Search, Ads, Chrome, and others.
As for the most powerful model, Gemini Ultra, Google stated that it is currently undergoing trust and safety checks and is being further refined through fine-tuning and reinforcement learning from human feedback (RLHF). It is expected to roll out to developers and enterprise customers early next year.
Sundar Pichai stated that the release of Gemini is an important milestone in the development of artificial intelligence and the beginning of a new era for Google.
Beyond GPT-4?
According to Demis Hassabis, CEO of Google DeepMind, Gemini is a multimodal model built by the Google team from the ground up, which means it can seamlessly understand, process, and combine different types of information, including text, code, audio, images, and video.
In performance testing, Gemini Ultra exceeded current state-of-the-art results on 30 of 32 benchmarks widely used for large language models. In addition, on MMLU (Massive Multitask Language Understanding), Gemini Ultra scored 90%, becoming the first large model to surpass human experts.
Demis Hassabis stated that on image benchmarks, Gemini Ultra surpassed previous state-of-the-art models without assistance from optical character recognition (OCR) systems. These benchmarks highlight Gemini's multimodal abilities and also show early signs of more complex reasoning.
Until now, the standard approach to building multimodal models has been to train separate components for different modalities and then stitch them together. Models built this way can perform well on certain tasks, such as describing images, but often struggle with more complex reasoning.
"We designed Gemini to be natively multimodal, pretrained from the start on different modalities, and then fine-tuned with additional multimodal data to further improve its effectiveness," Demis Hassabis explained. "This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models, and its capabilities are state of the art in nearly every domain."
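The contrast Hassabis draws, separately encoding each modality and then stitching the results together versus mixing modalities in a single sequence from the start, can be sketched in a few lines of toy code. This is an illustrative sketch only, not Gemini's actual architecture; the embedding and attention functions below are simplified stand-ins.

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(0)

def embed(items):
    # Stand-in embedding: one random vector per input unit (token or patch).
    return rng.normal(size=(len(items), DIM))

def self_attention(x):
    # Minimal single-head self-attention: every position attends
    # to every position within the sequence it is given.
    scores = x @ x.T / np.sqrt(DIM)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ x

def late_fusion(text, image):
    # "Stitched-together" approach: each modality is processed in
    # isolation, so text positions never attend to image positions.
    return np.concatenate([self_attention(embed(text)),
                           self_attention(embed(image))])

def native_fusion(text, image):
    # One joint sequence from the start: attention mixes modalities
    # in every layer, the property the article highlights.
    joint = np.concatenate([embed(text), embed(image)])
    return self_attention(joint)

text = ["a", "cat"]
image = ["patch0", "patch1", "patch2"]
print(late_fusion(text, image).shape)    # (5, 8)
print(native_fusion(text, image).shape)  # (5, 8)
```

Both variants produce one vector per input position; the difference is that in `late_fusion` the attention weights never connect a text position to an image position, while in `native_fusion` every position can attend to every other from the first layer on.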
For example, in terms of reasoning, Gemini 1.0 can understand complex written and visual information. By reading, filtering, and understanding information, it can extract insights from hundreds of thousands of documents.
In addition, Gemini 1.0 was trained to recognize and understand text, images, audio, and more simultaneously, so it can better grasp nuanced information and answer questions on complex topics, such as reasoning in difficult subjects like mathematics and physics.
In terms of coding, Gemini 1.0 can understand, explain, and generate high-quality code in the world's most popular programming languages, such as Python, Java, C++, and Go. Two years ago, Google launched the AI code-generation platform AlphaCode. Now, powered by Gemini, the platform has iterated to AlphaCode 2, with greatly improved performance: it can solve nearly twice as many problems as its predecessor.
Safety is still being refined
Sundar Pichai stated that millions of people are now using generative AI in Google products to do things they could not do a year ago, from answering more complex questions to collaborating and creating with new tools. At the same time, developers are using Google's models and infrastructure to build new generative AI applications, and startups and businesses around the world continue to grow using Google's AI tools.
In his view, this momentum is already remarkable, but it is only the beginning.
"We are carrying out this work boldly and responsibly. That means our research must be ambitious, pursuing capabilities that can bring enormous benefits to humanity and society, while also building in safeguards and collaborating with governments and experts to address the risks that arise as AI becomes more powerful," said Sundar Pichai.
Accordingly, Google strengthened its safety reviews during Gemini's development. Demis Hassabis noted that, based on Google's AI Principles and product safety policies, the Google team is adding new protections to account for Gemini's multimodal capabilities.
Beyond that, Demis Hassabis emphasized that at every stage of development, Google considers potential risks and works to test and mitigate them.
It is reported that Gemini has undergone the most comprehensive safety evaluations of any Google AI model to date, including evaluations for bias and harmful content. Meanwhile, to identify blind spots in its internal evaluation approach, Google is also working with a range of external experts and partners to stress-test the model across a variety of issues.
Another noteworthy point is that Gemini was trained on Google's own Tensor Processing Units (TPUs), v4 and v5e, on which it runs faster and at lower cost than Google's earlier models. Alongside the new model, Google also announced a new TPU system, Cloud TPU v5p, designed specifically for training cutting-edge AI models, which will also be used in Gemini's development.
Industry insiders told reporters that although Google's Gemini surpasses GPT-4 on many performance measures, a time gap with OpenAI remains: GPT-4 has been out for more than half a year, and a next-generation model is presumably already in development.
"So for Google, beating GPT-4 on various benchmarks is only one way of demonstrating its current capabilities; the key is whether it can draw on its own accumulated strengths and vast resources to close the time gap with OpenAI," the person noted. Moreover, as the new infrastructure Google is building for the era of large models, the true test of Gemini is whether it can meet the needs of everyday users and enterprise customers, not benchmark scores.
Demis Hassabis said that Google has begun experimenting with Gemini in Search, where it has made the Search Generative Experience faster, cutting latency by 40% for English searches in the United States while also improving quality.
And as it accelerates the rollout of Gemini 1.0, Google is also extending the capabilities of future versions, including larger context windows to process more information and deliver better responses.
CandyLake.com is an information-publishing platform and provides only information-storage services.
Disclaimer: the views in this article are the author's own. They do not represent the position of CandyLake.com and do not constitute advice; please treat them with caution.