Google's "counterattack" has sparked investor discussions on the commercialization of multimodal models

白云追月素

Shortly after the first anniversary of ChatGPT's launch, Google abruptly released its multimodal model Gemini, a move the industry has called "Google's strongest counterattack". Discussion of Gemini among investment institutions has surged. Industry insiders believe Gemini shows significant improvements in visual recognition and reasoning, and that on the commercial side, real-time interaction scenarios may become the focus of multimodal AI model applications.
Gemini is "stunning"
Google CEO Sundar Pichai recently announced the official launch of Gemini 1.0. Eli Collins, Vice President of Product at Google DeepMind, said it is Google's most capable and most general large model to date.
Unlike existing large models on the market, Gemini was reportedly built as a multimodal model from the start, which means it can generalize across and seamlessly understand, operate on, and combine different types of information, including text, code, audio, images, and video. It is also flexible in deployment, running on everything from data centers to mobile devices.
After watching the Gemini demonstration videos, many investors described them as "shocking". "After watching Gemini's demonstration video, its ability to understand multiple modalities is astonishing. In addition, Gemini's reasoning ability currently seems to surpass ChatGPT's," said Sun Haifeng, an associate professor at the School of Computer Science at Beijing University of Posts and Telecommunications. On the one hand, he said, Gemini far surpasses OpenAI's ChatGPT in multimodal information processing: it supports both multimodal input and multimodal output. A defining feature of Gemini is that it accepts interleaved sequences of text, images, audio, and video as input, which is difficult for ChatGPT or traditional multimodal models to do. Generally speaking, ChatGPT only supports text output, and output in other modalities requires calls to third-party APIs. Gemini's interleaved-sequence input suits the needs of the vast majority of scenarios. On the other hand, according to Gemini's technical report, its accuracy on the MMLU benchmark reached 90.04%, surpassing human experts and marking a milestone in the evolution of its reasoning ability.
The day after Gemini was launched, Google faced outside criticism that its multimodal demonstration video had been edited and spliced together, and that its promotion of Gemini was exaggerated. Google offered an explanation: the video did involve post-production and editing, and the interactions with Gemini did not take place in real time but were produced from images and prompts supplied by staff. Gemini's ability to understand video therefore still needs further development.
Real-time interaction scenarios may be the commercial focus
Prompted by this news, domestic investors have been engaged in heated discussion of multimodal technology and its applications.
A primary-market investor focused on a technology sector said that, compared with ChatGPT-4, Gemini's image recognition and reasoning abilities, as well as its apparent response speed, have improved greatly. In his view, Gemini and OpenAI's products each have their own strengths, and commercial deployment depends on finding suitable scenarios. "Fitting the right scenarios and identifying needs where value can be added is still crucial, but Gemini has indeed further expanded what people can imagine AI models doing."
"One can boldly imagine that when a multimodal model runs on a robot, it may achieve embodied intelligence. And when a multimodal model is combined with Google Glass, it may be upgraded into a super-intelligent agent," said another investor.
A technical specialist explained that humans have five senses, and that the world we build and the media we consume are presented through them. Being multimodal means that Gemini can understand the world around it in the same way humans do, taking in and producing any type of input and output, whether text, code, audio, images, or video. The most crucial technology is how to mix all these modalities: how to collect as much data as possible from any number of inputs and senses, and then give equally diverse responses.
"Gemini is more like a human, closer to human visual recognition and to some degree of reasoning and judgment. OpenAI's ChatGPT is more like a large knowledge base that provides people with reference information. It is not a question of which one surpasses the other; they differ significantly in focus," said an investor.
Sun Haifeng said that Gemini's specific architecture is not yet clear, but that the ability to take interleaved multimodal information as input is badly needed in many scenarios, especially real-time interaction.
Another technology investor believes that Gemini's release shows large companies have a clearer first-mover advantage in artificial intelligence. For example, Google's Gemini has outstanding visual reasoning capabilities because Google can draw on a wide variety of material from its search engine as a large body of training data. More broadly, large companies have obvious advantages in data, traffic, capital, computing power, and application scenarios.