
Robin Li punctures the "benchmark score" illusion around large models: leaderboards are not everything, and the gap between models will widen in the future

丽颜美容院郧

Whenever a new version of a large model is released, the industry eagerly cites third-party leaderboard data, benchmarking its own model against GPT-4 and claiming to have surpassed it on certain metrics in order to demonstrate its strength in large-model technology.
But in a recent conversation with employees, Baidu Chairman Robin Li pierced the "window paper" of the large-model industry, saying aloud what is usually left unsaid. "Every time a new model is released, it has to be compared with GPT-4o, with the claim that its score is already close to GPT-4o's and even higher on some individual items. But this does not mean there is no gap with the most advanced models."
He went on to explain that the gaps between models are multidimensional. One dimension is capability: basic abilities such as comprehension, generation, logical reasoning, and memory. Another dimension is cost: some models can achieve the same results, but their high cost and slow inference speed still leave them behind the advanced models.
"Another issue is overfitting to the test set. Every model that wants to prove its ability goes after the leaderboards, and to climb a leaderboard it has to guess what others are testing and which techniques will get which questions right. So from a leaderboard or a test set you may think the capabilities are very close, but in practical applications there is still a significant gap," Robin Li said.
A large-model practitioner told the reporter that the overfitting Robin Li mentioned mainly refers to a model learning its training data too closely during training, so that it performs very well on the training data but poorly on test data it has never seen. This usually means the model is complex enough to "memorize" noise and details in the training data that do not carry over to other data, so it fails to generalize to new data.
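The practitioner's description of overfitting can be illustrated with a small sketch. Everything in it is an assumption made for illustration and does not come from the article: the toy task (y = 2x + 1 plus Gaussian noise), the sample sizes, and the two hypothetical models, a straight-line fit and a "memorizer" that simply returns the label of the nearest training point.

```python
import random

def make_data(rng, n):
    """Draw n points from y = 2x + 1 plus Gaussian noise (an assumed toy task)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)) for x in xs]

def fit_line(data):
    """Ordinary least squares for a single feature; returns a predictor."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(data):
    """1-nearest-neighbour lookup: reproduces every training label, noise included."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(predict, data):
    """Mean squared error of a predictor on a list of (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
train, test = make_data(rng, 300), make_data(rng, 300)

line = fit_line(train)
memorizer = fit_memorizer(train)

results = {
    "line": (mse(line, train), mse(line, test)),
    "memorizer": (mse(memorizer, train), mse(memorizer, test)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name:10s} train MSE = {train_err:.3f}   test MSE = {test_err:.3f}")
```

The memorizer's training error is exactly zero because it looks up each training point's own label, yet its error on unseen points is higher than the simple line's, since it has also memorized the noise. A score measured on data the model has effectively seen says little about performance on new data, which is the concern raised about leaderboards.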
The practitioner believes that rankings and scores do have limitations. For example, because the evaluation datasets are public, models can be trained specifically to climb the rankings, which produces the phenomenon of "gaming the leaderboard." But leaderboards are not completely meaningless. They still provide a quantitative yardstick, help people quickly understand how different large models perform, push the technology forward through competition, and have some promotional value.
In Robin Li's view, "the hype from some self-media outlets, coupled with the urge to promote every new model at release, gives people the impression that the capability differences between models are already fairly small, but that is not actually the case." He said that in practice Baidu does not allow its engineers to chase leaderboard rankings. The real measure of a large model's ability is whether, in specific application scenarios, it can meet user needs and generate value.
As for the saying often heard in the large-model industry about "leading by 12 months or trailing by 18 months," he does not think that matters much either. Every company operates in a fully competitive market, with many competitors in whatever direction it pursues. "If you can always guarantee a lead of 12 to 18 months over your competitors, you are invincible. Don't think 12 to 18 months is a short time. Even if you can only guarantee a lead of 6 months over your competitors, you have won. Your market share may be 70%, while your competitors may have only 20% or even 10%."
He predicted that the gap between large models may keep widening in the future. Because the ceiling for large models is very high and today's models are still far from the ideal, models need to be iterated, updated, and upgraded quickly and continuously. It will take sustained investment over several years or even decades to keep meeting user needs, reducing costs, and improving efficiency.
Beyond the question of whether competition among large models has barriers, Robin Li also said the outside world holds quite a few misconceptions about large models, including on the efficiency of open-source versus closed-source models and on AI agents.
Robin Li is a firm supporter of closed-source models. "Before the era of large models, people were used to open source meaning free and low cost," he said. Open-source Linux, for example, is free to use because you already have the computer. But this does not hold in the era of large models. Large-model inference is expensive, and open-source models do not come with computing power. You have to buy your own hardware, which cannot achieve efficient utilization of computing power.
"Open-source models are not efficient," he said. "Strictly speaking, closed-source models should be called commercial models: countless users share the R&D costs and share the machine resources and GPUs used for inference. GPU utilization is highest this way. For Baidu's Wenxin large models 3.5 and 4.0, GPU utilization is above 90%."
Robin Li's analysis is that open-source models are valuable in teaching and scientific research. But in business, where the goals are efficiency, results, and the lowest cost, open-source models have no advantage.
He also gave his view on how large-model applications evolve. First came the Copilot, which assists humans. Next is the agent, which has a degree of autonomy and can use tools, reflect, and improve itself. If this degree of automation keeps developing, it becomes the AI worker, able to complete all kinds of tasks independently.
Agents are now drawing more and more attention from large-model companies and customers. Robin Li believes that although many people are optimistic about this direction, agents have so far not become a consensus view.
"The barrier to entry for agents is indeed very low," he said. Many people do not know how to turn large models into applications, and agents are a very direct, efficient, and simple way to do so. Building an agent on top of a model is quite convenient.