
Robin Lee punctures the "benchmark score" illusion of large models: leaderboards aren't everything, and the gap between models will widen in the future

丽颜美容院郧

Whenever a new version of a large model is released, companies eagerly cite third-party leaderboard data, put their own model side by side with GPT-4 to "run the scores", and claim to have surpassed it on certain metrics to demonstrate their strength in large-model technology.
But in a recent internal exchange with Baidu employees, Baidu Chairman Robin Lee pierced the large-model industry's thin veneer. "Every time a new model is released, people compare it with GPT-4o and say their scores are already close to it, or even exceed it on some individual items, but that does not mean there is no gap with the most advanced models."
He further explained that the differences between models are multidimensional. One dimension is basic capabilities such as comprehension, generation, logical reasoning, and memory; another is cost. Some models may achieve the same results, but because of their high cost and slow inference, they are in practice inferior to the advanced models.
"Another issue is overfitting to the test set. Every model that wants to prove its ability goes to the leaderboards, and to do well there, you have to guess what the others are testing and which techniques will get which questions right. So judging from the leaderboards or test sets, you may think the capabilities are very close, but in practical applications there is still a significant gap," Robin Lee said.
A large-model practitioner told a reporter that the test-set overfitting Robin Lee mentioned mainly refers to a phenomenon in which a model learns its training data too closely during training, so that it performs very well on the training data but poorly on test data it has never seen. This usually means the model is too complex, to the point where it can "memorize" noise and details in the training data that do not generalize, so it cannot generalize well to new data.
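To make the practitioner's description concrete, here is a minimal Python sketch under stated assumptions: a hypothetical true relation y = 2x + 1, a fixed +3/-3 pattern standing in for random noise (so the numbers are reproducible), and two toy models. `fit_memorizer`, a 1-nearest-neighbour lookup, plays the over-complex model that memorizes every training point; `fit_linear`, an ordinary least-squares line, is the simpler model. All names are illustrative and unrelated to any real large-model system.

```python
def true_relation(x):
    """Hypothetical underlying pattern the model is supposed to learn."""
    return 2 * x + 1

# Training set: the true pattern plus a fixed +3/-3 "noise" term
# (a deterministic stand-in for random noise, so results are reproducible).
train_x = [float(i) for i in range(20)]
train_y = [true_relation(x) + (3 if i % 2 == 0 else -3) for i, x in enumerate(train_x)]

# Unseen inputs lying between the training points, scored against the true pattern.
test_x = [i + 0.25 for i in range(20)]
test_y = [true_relation(x) for x in test_x]

def fit_linear(xs, ys):
    """Least-squares line y = a*x + b: too simple to memorize the noise."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def fit_memorizer(xs, ys):
    """1-nearest-neighbour lookup: reproduces every training label, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

results = {}
for name, fit in [("linear", fit_linear), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:9s} train MSE = {results[name][0]:.2f}  test MSE = {results[name][1]:.2f}")
```

Here the memorizer reaches a training error of exactly 0 but a test error of 9.25 on the unseen in-between points, while the straight line, which cannot reproduce the noise, keeps its test error below 0.2: perfect on seen data, worse on new data.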
The same practitioner believes that rankings and scores do have limitations. For example, because evaluation datasets are public, models can be trained specifically to climb the rankings, resulting in "leaderboard brushing". Even so, rankings are not entirely meaningless: they provide a quantitative evaluation standard that helps people quickly understand how different large models perform, drive continuous improvement in large-model technology through competition, and also have some promotional and advertising value.
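The "brushing" mechanism can be sketched in the same way. In this hypothetical Python example, a benchmark of two-digit additions is split into a published set and a private set of fresh questions; `honest_model` stands in for genuine ability (it forgets to carry from the ones digit), and `tuned_model` additionally memorizes the published answers. The task, the split, and both models are assumptions made purely for illustration.

```python
import random

def make_items(n, seed):
    """Hypothetical benchmark: n two-digit addition questions."""
    rng = random.Random(seed)
    return [(rng.randint(10, 99), rng.randint(10, 99)) for _ in range(n)]

def honest_model(a, b):
    """Stand-in for genuine ability: adds correctly but forgets to carry from the ones digit."""
    if a % 10 + b % 10 >= 10:
        return a + b - 10  # the dropped carry makes this answer wrong
    return a + b

public_set = make_items(100, seed=0)   # published, so anyone can train on it
private_set = make_items(100, seed=1)  # fresh questions nobody has trained on

# "Brushing the chart": memorize the published answers, fall back to real ability otherwise.
leaked_answers = {(a, b): a + b for a, b in public_set}

def tuned_model(a, b):
    return leaked_answers.get((a, b), honest_model(a, b))

def accuracy(model, items):
    """Fraction of questions answered correctly."""
    return sum(model(a, b) == a + b for a, b in items) / len(items)

for name, model in [("honest", honest_model), ("tuned", tuned_model)]:
    print(f"{name:6s} public = {accuracy(model, public_set):.2f}  "
          f"private = {accuracy(model, private_set):.2f}")
```

The tuned model scores 1.00 on the published set, yet on the private set it performs essentially like the honest model (about 55% in expectation, the share of questions with no carry). Only the held-out questions reveal the gap.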
In Robin Lee's view, "hype from some self-media outlets, combined with the urge to promote every new model at launch, gives people the impression that the capability gaps between models are already fairly small, but in fact that is not the case." Robin Lee said that in practice, Baidu does not let its engineers chase leaderboard rankings. The real measure of a large model's ability is whether, in specific application scenarios, it can meet user needs and create value.
As for the oft-cited notion in the large-model industry of "leading by 12 months or trailing by 18 months", he doesn't consider it that important either, because every company operates in a fully competitive market with many competitors in whatever direction it pursues. "If you can always keep a 12-to-18-month lead over your competitors, you are invincible. Don't think 12 to 18 months is a short time. Even if you can keep a 6-month lead over your competitors, you have won. Your market share may be 70%, while your competitors may have only 20% or even 10%."
He expects the gap between large models to keep widening in the future. Because large models have a high ceiling and are still far from ideal, they must be iterated, updated, and upgraded rapidly, which requires sustained investment over several years or even decades to meet user needs, cut costs, and improve efficiency.
Beyond the question of whether the large-model race has barriers to entry, Robin Lee also said that the outside world holds quite a few misconceptions about large models, including on the efficiency of open-source versus closed-source models and on AI agents.
Robin Lee is a firm supporter of closed-source models. "Before the era of large models, people were used to open source meaning free and low cost," he said. He explained that open-source Linux, for example, is free to use because the computers to run it already exist. But these assumptions no longer hold in the era of large models. Large-model inference is expensive, and open-source models do not come with computing power: you have to buy your own hardware, which cannot be utilized efficiently.
"Open-source models are not efficient," he said. "To be precise, closed-source models should be called commercial models: countless users share the R&D costs and share the machine resources and GPUs used for inference, so GPU utilization is the highest. Baidu's Wenxin (ERNIE) 3.5 and 4.0 models have GPU utilization rates above 90%."
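Robin Lee's utilization argument comes down to arithmetic: a GPU bought by one user sits idle whenever that user is idle, while a shared pool only needs to cover the busiest hour across everyone. The Python sketch below assumes 24 hypothetical users who each need one GPU for 6 consecutive hours starting at a random time of day; it does not model Baidu's actual workload and is not meant to reproduce the 90% figure.

```python
import random

HOURS = 24
USERS = 24
ACTIVE_HOURS = 6  # hypothetical: each user needs one GPU for 6 consecutive hours a day

rng = random.Random(42)
starts = [rng.randrange(HOURS) for _ in range(USERS)]  # hour each user's busy window begins

def active(start, hour):
    """True if a user whose window begins at `start` needs a GPU at `hour` (wraps at midnight)."""
    return (hour - start) % HOURS < ACTIVE_HOURS

demand = [sum(active(s, h) for s in starts) for h in range(HOURS)]  # GPUs in use each hour
busy_gpu_hours = sum(demand)  # equals USERS * ACTIVE_HOURS

# Self-hosted: every user buys one GPU and it idles outside that user's window.
dedicated_util = busy_gpu_hours / (USERS * HOURS)

# Shared service: one pool sized for the busiest hour across all users.
pooled_util = busy_gpu_hours / (max(demand) * HOURS)

print(f"dedicated utilization: {dedicated_util:.0%}")
print(f"pooled utilization:    {pooled_util:.0%}")
```

Self-hosting gives exactly 25% utilization here (144 busy GPU-hours out of 576). The pooled figure equals 6 divided by the busiest hour's demand, so it is higher whenever the users' busy windows do not all coincide.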
Robin Lee noted that open-source models are valuable in teaching and scientific research, but in business, where the goals are efficiency, effectiveness, and the lowest possible cost, open-source models have no advantage.
He also set out his view of how large-model applications will evolve. Copilots came first, assisting humans. Next come agents, which have a degree of autonomy and can use tools, reflect, and evolve on their own. If this automation keeps advancing, agents will become AI workers able to complete a wide range of tasks independently.
Agents are currently attracting growing attention from large-model companies and customers. Robin Lee believes that although many people are optimistic about this direction, agents have not yet become a consensus.
"The barrier to building agents is indeed very low," he said. Many people don't know how to turn large models into applications, and agents offer a very direct, efficient, and simple way to do so: building an agent on top of a model is quite convenient.
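To illustrate why the barrier is low, here is a minimal Python sketch of the tool-using loop at the core of an agent. `stub_model` is a hypothetical placeholder for a large model's decision step and the two tools are toy examples; a real agent would call an actual model and add the reflection and self-improvement Robin Lee mentions, none of which is shown here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; a real agent might wrap search, code execution, or APIs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda expr: str(sum(int(term) for term in expr.split("+"))),
    "shout": lambda text: text.upper(),
}

def stub_model(task: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for a large model's decision: returns (action, argument).
    The action is a tool name, or "finish" to return a final answer."""
    if not observations:
        return ("add", task) if "+" in task else ("shout", task)
    return ("finish", observations[-1])

def run_agent(task: str, max_steps: int = 5) -> str:
    """Loop: ask the model for an action, run the chosen tool, feed the result back."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, argument = stub_model(task, observations)
        if action == "finish":
            return argument
        observations.append(TOOLS[action](argument))
    return "step limit reached"

print(run_agent("12+30"))
print(run_agent("hello"))
```

`run_agent("12+30")` returns "42" and `run_agent("hello")` returns "HELLO". Almost all of the application logic sits in the model's choice of action, which is why wrapping an existing model as an agent takes so little code.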
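CandyLake.com is an information publishing platform and only provides information storage space services.
Disclaimer: The views in this article are solely the author's own, do not represent the position of CandyLake.com, and do not constitute advice. Please treat them with caution.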