
Robin Li punctures the illusion of large-model "score-chasing": leaderboards aren't everything, and the gap between models will widen in the future

Posted by 丽颜美容院郧

Whenever a new version of a large model is released, the industry is keen to cite third-party leaderboard data, "running scores" of its own model against GPT-4 and claiming to have surpassed it on certain metrics to demonstrate its strength in large-model technology.
But in a recent internal communication with employees, Baidu Chairman Robin Li punctured this thin veil over the large-model industry. "Every time a new model is released, it gets compared with GPT-4o, with claims that its score is already close to GPT-4o's and even exceeds it on some individual items. But that does not mean there is no gap with the most advanced model."
He further explained that the differences between models are multidimensional. One dimension is the gap in basic capabilities such as comprehension, generation, logical reasoning, and memory; another is cost. Although some models can achieve the same results, their high cost and slow inference mean they actually fall short of the advanced models.
"Another issue is overfitting to the test set. Every model that wants to prove its ability goes to the leaderboards, and once there, it has to guess what others are testing and which techniques will get which questions right. So from a leaderboard or test set you may think the abilities are very close, but in practical applications there is still a significant gap," Robin Li said.
A large-model practitioner told reporters that the overfitting Robin Li mentioned mainly refers to a model learning its training data too closely during training, so that it performs very well on the training data but poorly on test data it has never seen. This usually means the model is too complex, to the point where it can "memorize" noise and details in the training data that do not generalize, so it cannot perform well on new data.
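The practitioner's description can be made concrete with a small toy experiment. The sketch below is purely illustrative and says nothing about how Baidu or anyone else trains large models: it assumes a hypothetical dataset where the true relationship is y = x and every training label carries ±1 noise, then compares a degree-9 polynomial that passes through all 10 training points (memorizing the noise) with a plain straight-line fit. The names `lagrange_predict`, `fit_line`, and `mse` are our own.

```python
# Toy illustration of overfitting (hypothetical data, not from the article):
# the true relationship is y = x; training labels carry alternating +/-1 noise.

def lagrange_predict(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial passing through every (xs, ys) point."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != k:
                basis *= (x - xj) / (xk - xj)
        total += yk * basis
    return total

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

train_x = list(range(10))
train_y = [x + (1 if x % 2 == 0 else -1) for x in train_x]  # noisy labels
test_x = [x + 0.5 for x in range(9)]                         # unseen inputs
test_y = list(test_x)                                        # noise-free truth

a, b = fit_line(train_x, train_y)

results = {
    "memorizing polynomial": (
        mse([lagrange_predict(train_x, train_y, x) for x in train_x], train_y),
        mse([lagrange_predict(train_x, train_y, x) for x in test_x], test_y),
    ),
    "straight line": (
        mse([a + b * x for x in train_x], train_y),
        mse([a + b * x for x in test_x], test_y),
    ),
}

for name, (train_err, test_err) in results.items():
    print(f"{name:>22}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

The polynomial reaches zero training error yet does far worse than the straight line on the unseen midpoints, which is the pattern the practitioner describes: a model complex enough to "remember" noise generalizes poorly.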
The same practitioner believes that rankings and scores do have limitations. For example, because evaluation datasets are public, models can be trained specifically to climb the rankings, leading to "leaderboard brushing." Even so, rankings are not completely meaningless. They still provide a quantitative evaluation standard that helps people quickly understand the performance of different large models, drive continuous improvement of large-model technology through competition, and also serve a certain promotional and advertising purpose.
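The "leaderboard brushing" problem can be sketched the same way. In this hypothetical toy (the question IDs, answers, and fallback guess are all invented for illustration), one model has genuinely weak ability, while a second has additionally memorized the answers of a public benchmark that leaked into its training data. Scoring both on the public set and on a fresh private set shows how contamination can inflate a leaderboard number without improving real ability.

```python
# Hypothetical toy benchmark: each item is (question_id, correct_answer).
PUBLIC_SET = [(f"public-{i}", i % 3) for i in range(30)]    # openly published
PRIVATE_SET = [(f"private-{i}", i % 3) for i in range(30)]  # never released

def make_model(leaked_items):
    """Build a toy 'model' that memorizes any leaked (question, answer) pairs."""
    memorized = dict(leaked_items)

    def answer(question_id):
        # Recall a memorized answer if this exact question leaked into training;
        # otherwise fall back to a fixed guess, standing in for weak real ability.
        return memorized.get(question_id, 0)

    return answer

def accuracy(model, dataset):
    """Fraction of items the model answers correctly."""
    return sum(model(q) == a for q, a in dataset) / len(dataset)

honest = make_model([])
contaminated = make_model(PUBLIC_SET)

for name, model in [("honest", honest), ("contaminated", contaminated)]:
    print(f"{name:>12}: public = {accuracy(model, PUBLIC_SET):.2f}, "
          f"private = {accuracy(model, PRIVATE_SET):.2f}")
```

On the public set the contaminated model jumps from 0.33 to 1.00, but on the private set both models score 0.33; only an evaluation the model has never seen exposes the gap.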
In Robin Li's view, "The hype from some self-media outlets, combined with the urge to promote every new model at release, gives people the impression that the capability differences between models are already fairly small, but in fact that is not the case." Robin Li said that in practice, Baidu does not let its engineers chase leaderboard rankings. The real measure of a large model's ability is whether, in specific application scenarios, it meets user needs and creates value.
As for the oft-repeated industry talk of "leading by 12 months or trailing by 18 months," he does not think that matters much either, because every company operates in a fully competitive market, with many competitors in whatever direction it pursues. "If you could always guarantee a 12-to-18-month lead over your competitors, you would be invincible. Don't think 12 to 18 months is a short time. Even if you can guarantee a 6-month lead over your competitors, you have won: your market share might be 70%, while your competitors might have only 20% or even 10%."
He judged that the gap between large models may continue to widen in the future. Because the ceiling for large models is very high and the technology is still far from ideal, models need to be iterated, updated, and upgraded constantly and quickly; it will take sustained investment over several years or even decades to meet user needs, reduce costs, and increase efficiency.
Besides discussing whether large-model competition has barriers to entry, Robin Li also said that the outside world holds many misconceptions about large models, on topics including the efficiency of open-source versus closed-source models and AI agents.
Robin Li is a firm supporter of closed-source models. "Before the era of large models, people were used to open source meaning free and low-cost," he explained. For example, open-source Linux is free to use because people already own the computers to run it. But this no longer holds in the era of large models. Large-model inference is expensive, and open-source models do not come with computing power: you have to buy your own hardware, and you cannot use that computing power efficiently.
"Open-source models are not efficient," he said. "To be precise, closed-source models should be called commercial models: countless users share the R&D costs as well as the machines and GPUs used for inference, so GPU utilization is the highest. Baidu's Wenxin (ERNIE) large models 3.5 and 4.0 have GPU utilization above 90%."
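Li's efficiency argument rests on simple arithmetic: if a GPU is paid for whether or not it is busy, the effective cost of each hour of useful work is the hourly price divided by utilization. The sketch below assumes a hypothetical price of 2.0 units per GPU-hour and a hypothetical 30% utilization for a self-hosted deployment; only the 90% figure comes from the article, and real costs depend on many other factors.

```python
def cost_per_useful_gpu_hour(hourly_price, utilization):
    """Effective price of one GPU-hour of actual work when idle time is still paid for."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_price / utilization

HOURLY_PRICE = 2.0  # hypothetical price of one GPU-hour (arbitrary currency units)

scenarios = [
    ("shared service, 90% utilization (figure from the article)", 0.90),
    ("self-hosted, 30% utilization (hypothetical)", 0.30),
]

for label, utilization in scenarios:
    cost = cost_per_useful_gpu_hour(HOURLY_PRICE, utilization)
    print(f"{label}: {cost:.2f} per useful GPU-hour")
```

Under these assumed numbers the self-hosted deployment pays three times as much per unit of useful work (6.67 versus 2.22), the kind of gap Li attributes to sharing GPUs across many users.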
Robin Li noted that open-source models are valuable in teaching and scientific research, but in business, where the goals are efficiency, effectiveness, and the lowest cost, open-source models have no advantage.
He also shared his view on how large-model applications will evolve: Copilots came first, assisting humans; next come agents, which have a degree of autonomy and can use tools, reflect, and evolve on their own; if this level of automation keeps developing, the result will be an AI worker capable of independently completing all kinds of tasks.
Agents are currently attracting more and more attention from large-model companies and customers. Robin Li believes that although many people are optimistic about this direction, there is not yet a consensus on agents.
"The barrier to entry for agents is indeed very low," he said. Many people don't know how to turn large models into applications, and agents offer a very direct, efficient, and simple way to do so: building agents on top of a model is quite convenient.
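CandyLake.com is an information publishing platform and provides only information storage services.
Disclaimer: The views in this article are solely those of the author. This article does not represent the position of CandyLake.com and does not constitute advice; please treat it with caution.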