首页 News 正文

Google's Big Model Has Finally taken a Big Step Gemini vs GPT-4

阿豆学长长ov
222 0 0

On December 6th, US time, Google officially released the Gemini model. Google CEO Sundar Pichai stated that this is Google's most powerful and versatile model to date.
It has been one year and one week since ChatGPT was released. With the release of ChatGPT, OpenAI has become the most dazzling company in the field of artificial intelligence, especially in the field of large models. It is also a catch up target for all other technology companies, including Google.
For the past eight years, Google has been using AI first as its corporate strategy, and AlphaGo, which defeated the human Go champion in 2016, was created by Google. It is not an exaggeration to say that Google has sparked a wave of AI that has changed the development of the entire AI industry, but now it urgently needs to prove itself in the field of big models.
It is reported that the Gemini 1.0 version includes three different sizes, namely Gemini Ultra, Gemini Pro, and Gemini Nano. Among them, the Gemini Nano is mainly used on the device side, and the Pixel 8 Pro will be the first smartphone equipped with the Gemini Nano; Gemini Pro is suitable for expanding in various tasks, and Google plans to use Gemini Pro to upgrade its chatbot Bard, as well as more Google products including search, advertising, Chrome, and more.
For the most powerful Gemini Ultra, Google stated that it is currently undergoing trust and security checks, as well as further refining the model through fine-tuning and human feedback based reinforcement learning (RLHF). It is expected to be launched to developers and enterprise customers early next year.
Sandal Pichai stated that the release of Gemini is an important milestone in the development of artificial intelligence and the beginning of a new era for Gu Ge.
Beyond GPT-4?
According to Demis Hassabis, CEO of Google DeepMind, Gemini is a multimodal model built by the Google team from scratch, which means it can summarize and seamlessly understand and process different types of information, including text, code, audio, images, and videos.
In terms of performance testing, Gemini Ultra outperformed the current best performance in 30 out of 32 benchmark tests for large language models. Additionally, in MMLU (Massive Multi Task Language Understanding), Gemini Ultra scored 90%, becoming the first large model to surpass human experts.
Demis Hassabis stated that during the testing of image benchmarks, the Gemini Ultra surpassed previously state-of-the-art models without the help of image character recognition (OCR) systems. These benchmark tests highlight Gemini's multimodal ability and also show early signs of its more complex reasoning ability.
At present, the standard method for creating multimodal models is mainly to train individual components of different modalities and then concatenate them together. But the result of this operation is that these models sometimes perform well in performing certain tasks (such as describing images), but often find it difficult to handle more complex reasoning.
"We designed Gemini as a native multimodal model, which was pre trained for different modalities from the beginning, and then we fine tuned it with additional multimodal data to further improve its performance." Demis Hassabis explained, "This helps Gemini seamlessly understand and reason various inputs from the beginning, far superior to existing multimodal models, and its capabilities have reached the most advanced level in almost all fields."
For example, in terms of reasoning, Gemini 1.0 can understand complex written and visual information. By reading, filtering, and understanding information, it can extract insights from hundreds of thousands of documents.
In addition, Gemini 1.0 has been trained to recognize and understand text, images, audio, etc. at the same time, so it can better understand subtle information and answer questions related to complex topics, such as reasoning in complex disciplines such as mathematics and physics.
In terms of coding, Gemini 1.0 is able to understand, interpret, and generate high-quality code for the world's most popular programming languages, such as Python, Java, C++, and Go. Two years ago, Google launched the AI code generation platform AlphaCode. Now, with the help of Gemini, the platform has iterated to AlphaCode 2, and its performance has been greatly improved, which can solve almost twice the number of problems before.
Still continuously optimizing security
Sandal Pichai stated that millions of people are now using generative AI in Google products to do things they couldn't do a year ago, from answering more complex questions to collaborating and creating with new tools. At the same time, developers are using Google's models and infrastructure to build new generative AI applications, and startups and businesses around the world are also continuously growing using Google's AI tools.
In its view, this trend is already somewhat unbelievable, but it is only the beginning.
"We are boldly and responsibly carrying out this work. This means that our research needs to be ambitious, pursuing the ability to bring enormous benefits to humanity and society, while also establishing safeguards and collaborating with governments and experts to address the risks that arise as AI becomes stronger," said Sandal Pichai.
Therefore, during the development process of Gemini, Google also strengthened its security review work. Demis Hassabis introduced that based on Google's AI principles and product security policies, the Google team is adding new protection measures to Gemini's multimodal capabilities.
Not only that, Demis Hassabis also emphasized that at every stage of development, Google considers potential risks and strives to test and mitigate them.
It is reported that Gemini has the most comprehensive security assessment among all Google AI models to date, including assessment of bias and harmful information. Meanwhile, in order to identify blind spots in internal evaluation methods, Google is also collaborating with various external experts and teams to conduct stress tests on the Gemini model on various issues.
Another noteworthy point is that Gemini's training is based on Google's own Tensor Processing Units (TPUs) - v4 and v5e. On these TPUs, Gemini runs faster and has lower costs than previous models from Google. So in addition to the new model, Google has also announced the launch of a new TPU system - Cloud TPU v5p, which is designed specifically for training cutting-edge AI models and will also be used for Gemini development.
Industry insiders have told reporters that although Google's Gemini has surpassed GPT-4 in many aspects of performance, there is still a time gap between it and OpenAI. GPT-4 has been released for more than half a year, and the new generation model should also be in the development process.
"So for Google, comparing various benchmark tests with GPT-4 is only one aspect of demonstrating its current capabilities, and whether it can rely on its own accumulation and powerful resources to shorten the time gap with OpenAI is the key," the person pointed out. In addition, as a new infrastructure built by Google in the era of big models, whether Gemini can meet the needs of daily users and enterprise customers is the true standard for testing Gemini's capabilities, rather than testing data.
Demis Hassabis said that Google has started experimenting with Gemini in search, which makes user search generation experience faster, reducing latency by 40% in English searches in the United States, and also improving quality.
And in the process of accelerating the landing of Gemini 1.0, Google is also further expanding its future version's features, including adding context windows to process more information and provide better response.
CandyLake.com 系信息发布平台,仅提供信息存储空间服务。
声明:该文观点仅代表作者本人,本文不代表CandyLake.com立场,且不构成建议,请谨慎对待。
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

  •   每经AI快讯,据亿航智能官微消息,公司EH216-S无人驾驶电动垂直起降航空器(eVTOL)获得巴西国家民航局颁发的试验飞行许可证书,并计划在巴西进行测试和试飞。关于EH216-S无人驾驶eVTOL在巴西的认证,中国民航局 ...
    潇湘才子
    13 小时前
    支持
    反对
    回复
    收藏
  •   今年7月,美国三大海外“债主”所持美国国债齐刷刷缩水,其中日本美债持仓已降至去年10月以来最低。   根据美国财政部当地时间9月18日公布的国际资本流动报告(TIC),2024年7月,美国前三大海外“债主”日本 ...
    520hacker
    前天 20:44
    支持
    反对
    回复
    收藏
  •   上证报中国证券网讯(记者俞立严)9月19日,蔚来全新品牌乐道的首款车型——乐道L60正式上市。新车定位家庭智能电动SUV,在采用BaaS电池租用服务后,L60的售价可低至14.99万元,电池租用月费最低为599元。乐道L6 ...
    anhao007
    昨天 11:03
    支持
    反对
    回复
    收藏
  •   每经记者袁园   日前,国务院印发的《关于加强监管防范风险推动保险业高质量发展的若干意见》提出,以新能源汽车商业保险为重点,深化车险综合改革。   “车险综改”从2015年就已经开始逐步推进了,经过 ...
    moshulong
    昨天 21:50
    支持
    反对
    回复
    收藏
阿豆学长长ov 注册会员
  • 粉丝

    0

  • 关注

    0

  • 主题

    27