
Meta's most powerful model surpasses GPT-4o as Zuckerberg reignites the open source versus closed source debate

楚一帆

After OpenAI abruptly launched its "small model" GPT-4o mini, Meta answered with a blockbuster of its own: a large model with a very high parameter count.
On July 24th, Meta released the open-source large model Llama 3.1 405B, along with upgraded models in two smaller sizes, 70B and 8B.
Llama 3.1 405B is widely regarded as the strongest open-source model currently available. According to information released by Meta, the model supports a context length of 128K and eight languages. Meta says it is comparable to flagship models such as GPT-4o and Claude 3.5 Sonnet in general knowledge, steerability, mathematics, tool use, and multilingual translation, and that in human evaluation comparisons its overall performance was even better than that of these two models.
The upgraded 8B and 70B models are also multilingual, and their context length has been extended to 128K.
Llama 3.1 405B is Meta's largest model to date. Meta stated that the model was trained on more than 15 trillion tokens. To achieve the desired results within a reasonable time, the team optimized the entire training stack and used more than 16,000 H100 GPUs, making this the first Llama model trained at such a scale of computing power.
The team broke this demanding training goal down into several key steps. To maximize training stability, Meta did not choose a mixture-of-experts (MoE) architecture and instead adopted a standard decoder-only Transformer architecture with minor adaptations.
According to Meta, the team also used an iterative post-training process in which each round applies supervised fine-tuning and direct preference optimization, and generates the highest-quality synthetic data it can to improve the performance of each capability. Compared with previous versions of Llama, the team improved both the quantity and the quality of the data used for pre-training and post-training.
Alongside the release of Llama 3.1 405B, Mark Zuckerberg published a letter titled "Open Source AI Is the Path Forward", once again emphasizing the significance and value of open source large models and taking direct aim at companies such as OpenAI that have chosen the closed source path.
Zuckerberg retold the story of open-source Linux and closed source Unix, noting that Linux supports more features and a broader ecosystem, and that it is now the industry-standard foundation for cloud computing and for the operating systems that run most mobile devices. "I believe that artificial intelligence will also develop in a similar way," he said.
He pointed out that several technology companies are developing leading closed source models, but open source models are rapidly closing the gap. The most direct evidence is that Llama 2 was only comparable to older-generation models, whereas Llama 3 is now comparable to the latest models and leads in certain areas.
He expects that, starting next year, Llama models will become the most advanced in the industry. Even before then, Llama already leads in openness, modifiability, and cost efficiency.
Zuckerberg gave many reasons why the world needs open source models. For developers, beyond a more transparent development environment in which to train, fine-tune, and distill their own models, another important factor is the need for a model that is efficient and affordable.
He explained that, for both user-facing and offline inference tasks, developers can run Llama 3.1 405B on their own infrastructure at roughly 50% of the cost of closed source models such as GPT-4o.
The industry has debated the open source and closed source paths at length before, but the prevailing view at the time was that each has its own value. Open source benefits developers in a cost-effective way and helps large language models themselves iterate and develop, while closed source can concentrate resources to break through performance bottlenecks faster and more thoroughly, and is more likely than open source to be first to achieve artificial general intelligence (AGI).
In other words, the industry has generally believed that open source models struggle to catch up with closed source models in performance. The arrival of Llama 3.1 405B may prompt the industry to reconsider that conclusion, which is likely to affect the many enterprises and developers who already lean toward closed source model services.
Meta's ecosystem is already very large. Following the launch of the Llama 3.1 models, more than 25 partners will offer related services, including Amazon AWS, Nvidia, Databricks, Groq, Dell, Microsoft Azure, and Google Cloud.
However, Zuckerberg expects the Llama models to take the lead only next year, and closed source models may overtake them again in the meantime. Until then, attention may turn to the closed source large models that cannot match the performance of Llama 3.1 405B, which now find themselves in a somewhat awkward position.
He also spoke specifically about competition between China and the United States in large models, arguing that it is unrealistic to expect the United States to stay years ahead of China in this area. But even a small lead of a few months can compound over time and give the United States a clear advantage.
"The advantage of the United States is decentralization and open innovation. Some people believe that we must close our models to prevent China from acquiring these models, but I think this will not work and will only put the United States and its allies at a disadvantage." In Zuckerberg's view, a world with only closed models would leave a few large companies and geopolitical rivals with access to leading models, while startups, universities, and small businesses would miss out. In addition, restricting American innovation to closed development increases the chance of failing to lead at all.
"On the contrary, I believe our best strategy is to establish a strong open ecosystem, allowing our leading companies to work closely with governments and allies to ensure they can make the best use of the latest developments and achieve sustainable first mover advantages in the long term," said Zuckerberg.