
Meta's most powerful model surpasses GPT-4o, and Zuckerberg reignites the open-source vs. closed-source debate

楚一帆

After OpenAI's surprise launch of the "small model" GPT-4o mini, Meta answered with a blockbuster at the opposite end of the parameter scale.
On July 24, Meta released the open-source Llama 3.1 405B, along with upgraded models in two smaller sizes: 70B and 8B.
Llama 3.1 405B is widely regarded as the strongest open-source model currently available. According to Meta, it supports a 128K context length and adds support for eight languages. It is comparable to flagship models such as GPT-4o and Claude 3.5 Sonnet in general knowledge, steerability, mathematics, tool use, and multilingual translation; in human evaluations, its overall performance even edges out those two models.
Meanwhile, the upgraded 8B and 70B models are also multilingual and have likewise been extended to a 128K context length.
Llama 3.1 405B is Meta's largest model to date. Meta said training involved over 15 trillion tokens, and to finish within a reasonable time the team optimized the entire training stack and used over 16,000 H100 GPUs, making this the first Llama model trained at such a scale of compute.
The team broke this demanding training run into several key steps. To maximize training stability, Meta did not adopt a mixture-of-experts (MoE) architecture, opting instead for a standard decoder-only Transformer with only minor adaptations.
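Meta's report does not include code, but the defining ingredient of a decoder-only Transformer, causal self-attention, in which each token may attend only to itself and earlier positions, can be sketched in a few lines. This is an illustrative single-head toy with made-up shapes, not Meta's implementation:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention: token i attends only to positions <= i."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Causal mask: future positions get -inf, so softmax assigns them zero weight.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 4, 8                       # 4 tokens, model dim 8 (toy sizes)
x = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)                  # one output vector per token: (4, 8)
```

Note that the first token's output depends only on itself, which is exactly the masking property that lets decoder-only models generate text left to right.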
According to Meta, the team also used an iterative post-training process: each round combined supervised fine-tuning with direct preference optimization, generating the highest-quality synthetic data available to improve each capability. Compared with previous Llama versions, the team improved both the quantity and quality of the data used for pre- and post-training.
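Direct preference optimization (DPO), one of the post-training steps mentioned above, tunes the policy directly on preference pairs without a separate reward model. The per-pair loss below is the generic textbook form, not Meta's training code, and the log-probability values are invented for illustration:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO per-pair loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Toy log-probabilities under the policy and the frozen reference model.
loss = dpo_loss(logp_chosen=-10.0, logp_rejected=-12.0,
                ref_chosen=-11.0, ref_rejected=-11.0, beta=0.1)
print(round(loss, 4))  # ≈ 0.5981
```

The loss shrinks as the policy assigns relatively more probability to the chosen answer than the reference model does, which is what pushes each round of post-training toward the preferred responses.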
Alongside the Llama 3.1 405B release, Mark Zuckerberg published a letter titled "Open Source AI Is the Path Forward", once again stressing the significance and value of open-source large models and taking direct aim at companies such as OpenAI that have gone the closed-source route.
Zuckerberg retold the story of open-source Linux versus closed-source Unix, noting that the former supports more features and a broader ecosystem and is now the industry-standard foundation both for cloud computing and for the operating systems running most mobile devices. "I believe artificial intelligence will develop in a similar way."
He pointed out that while several technology companies are developing leading closed-source models, open-source models are rapidly closing the gap. The most direct evidence: Llama 2 was only comparable to an older generation of models, yet Llama 3 is already competitive with the latest models and leads in certain areas.
He expects that, starting next year, future Llama models will become the most advanced in the industry; even before then, Llama already leads in openness, modifiability, and cost efficiency.
Zuckerberg cited many reasons the world needs open-source models. For developers, beyond a more transparent environment in which to train, fine-tune, and refine their own models, another important factor is the need for an efficient and affordable model.
He explained that for user-facing and offline inference tasks, developers can run Llama 3.1 405B on their own infrastructure at roughly 50% of the cost of closed-source models such as GPT-4o.
The debate over the two paths of open source and closed source has been argued at length in the industry, but the prevailing view had been that each has its own value: open source benefits developers cost-effectively and aids the technical iteration of large models themselves, while closed source can concentrate resources to break through performance bottlenecks faster and deeper, and is more likely than open source to be first to reach AGI (artificial general intelligence).
In other words, the industry has generally believed that open source would struggle to catch up with closed source on model performance. The arrival of Llama 3.1 405B may prompt a rethink of that conclusion, which could sway the large group of enterprises and developers currently inclined toward closed-source model services.
Meta's ecosystem is already very large: following the Llama 3.1 launch, more than 25 partners will offer related services, including Amazon AWS, Nvidia, Databricks, Groq, Dell, Microsoft Azure, and Google Cloud.
However, Zuckerberg's timeline for the Llama series to take the lead is next year, and closed-source models could leapfrog it in the meantime. During this period, outside attention may fall on closed-source large models that cannot match Llama 3.1 405B's performance, whose position is now indeed somewhat awkward.
He also spoke specifically about US-China competition in large models, arguing that it is unrealistic to expect the United States to hold a multi-year lead over China in this field, but that even a small lead of a few months can compound over time into a clear American advantage.
"The advantage of the United States is decentralized and open innovation. Some people argue that we must close our models to prevent China from obtaining them, but I think that will not work and will only put the United States and its allies at a disadvantage." In Zuckerberg's view, a world of only closed models would leave the leading models in the hands of a few big companies and geopolitical rivals, while startups, universities, and small businesses miss out; moreover, restricting American innovation to closed development raises the chance of not leading at all.
"On the contrary, I believe our best strategy is to build a robust open ecosystem and have our leading companies work closely with the government and allies, ensuring they can best take advantage of the latest advances and achieve a sustainable first-mover advantage over the long term," Zuckerberg said.