An open-source community watershed: Meta releases Llama 3, whose largest version may reach 400 billion parameters

To maintain its position in the field of open-source large AI (artificial intelligence) models, social media giant Meta has launched its latest open-source model.
On April 18th local time, Meta announced on its official website the release of its latest large model, Llama 3. For now, Llama 3 is available in two smaller versions, with 8 billion (8B) and 70 billion (70B) parameters, each with an 8K context window. Meta stated that thanks to higher-quality training data and instruction fine-tuning, Llama 3 achieves a "significant improvement" over the previous generation, Llama 2.
In the future, Meta will launch a larger version of Llama 3 with over 400 billion parameters. Meta will also add new capabilities such as multimodality and longer context windows, and will publish a Llama 3 research paper.
Meta wrote in the announcement, "Through Llama 3, we are committed to building open-source models that can compete with today's best proprietary models. We want to respond to developer feedback, improve the overall usefulness of Llama 3, and continue to play a leading role in the responsible use and deployment of LLMs (large language models)."
On the 18th, Meta's stock price (Nasdaq: META) closed at $501.80 per share, up 1.54%, with a total market value of $1.28 trillion.
"The best open source big model currently on the market"
According to Meta, Llama 3 demonstrates state-of-the-art performance on a range of industry benchmarks, offers new capabilities including improved reasoning, and is currently the best open-source large model on the market.
At the architecture level, Llama 3 uses a standard decoder-only Transformer, with a tokenizer whose vocabulary contains 128K tokens. Llama 3 was pre-trained on two 24K-GPU clusters custom-built by Meta, using over 15T tokens of publicly available data, of which 5% is non-English data covering more than 30 languages. The training data volume is seven times that of the previous-generation Llama 2, and it includes four times as much code.
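For readers who want to verify the tokenizer details, here is a minimal sketch using the Hugging Face transformers library; the gated hub ID meta-llama/Meta-Llama-3-8B is an assumption based on Meta's published naming and is not mentioned in this article:

```python
# Minimal sketch: inspect the Llama 3 tokenizer described above.
# Assumes the Hugging Face "transformers" library and the (gated) hub ID
# "meta-llama/Meta-Llama-3-8B"; access requires accepting Meta's license.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# The vocabulary should be on the order of 128K tokens, per Meta's announcement.
print(f"vocabulary size: {len(tokenizer)}")

# A larger vocabulary encodes text into fewer tokens, stretching the 8K context
# window further and lowering per-request inference cost.
ids = tokenizer.encode("Meta released Llama 3 on April 18.")
print(ids)
print(tokenizer.convert_ids_to_tokens(ids))
```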
According to Meta's test results, the Llama 3 8B model outperforms Gemma 7B and Mistral 7B Instruct on multiple benchmarks such as MMLU, GPQA, and HumanEval, while the 70B model surpasses Sonnet, the mid-tier version of the well-known closed-source model Claude 3, and scores three wins and two losses against Google's Gemini Pro 1.5.
Llama 3 performs exceptionally well on multiple performance benchmarks. Source: Meta official website
Beyond standard datasets, Meta also worked to optimize Llama 3's performance in real-world scenarios, developing a high-quality human-evaluation set for this purpose. The set contains 1,800 prompts covering 12 key use cases, such as asking for advice, closed-ended question answering, brainstorming, coding, and writing, and is kept confidential even from Meta's own modeling teams.
On this evaluation set, Llama 3 significantly outperforms Llama 2 and also beats well-known models such as Claude 3 Sonnet, Mistral Medium, and GPT-3.5.
Llama 3 achieved excellent results on the human-evaluation set. Source: Meta official website
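As an illustration of how such a head-to-head evaluation is typically scored, the sketch below tallies per-use-case win rates from rater verdicts. All names and votes here are hypothetical placeholders; Meta's actual 1,800-prompt set and its raw results are confidential.

```python
# Hypothetical sketch of scoring a human-preference evaluation like the one
# described above. Every prompt, use case, and rater verdict here is made up;
# Meta has not published the underlying data.
from collections import Counter, defaultdict

# Each record: (use_case, verdict), where verdict is which model the human
# rater preferred for that prompt ("llama3", "rival", or "tie").
verdicts = [
    ("coding", "llama3"), ("coding", "rival"),
    ("brainstorming", "llama3"), ("brainstorming", "tie"),
    ("closed_qa", "llama3"), ("asking_for_advice", "llama3"),
]

by_use_case = defaultdict(Counter)
for use_case, verdict in verdicts:
    by_use_case[use_case][verdict] += 1

for use_case, tally in by_use_case.items():
    total = sum(tally.values())
    win_rate = tally["llama3"] / total
    print(f"{use_case}: {win_rate:.0%} win rate over {total} prompts")
```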
Although the 400B+ version of Llama 3 is still in training, Meta has shown some of its test results, apparently benchmarking it against Opus, the strongest version of Claude 3. However, Meta has not released comparisons between the larger Llama 3 model and players of equivalent scale such as GPT-4.
The 400B+ version of Llama 3 is still in training. Source: Meta official website
The Llama 3 models will soon be available to developers on Amazon AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM watsonx, Microsoft Azure, Nvidia NIM, and Snowflake, with hardware platform support from AMD, AWS, Dell, Intel, Nvidia, and Qualcomm. To support responsible development with Llama 3, Meta will also provide new trust and safety tools, including Llama Guard 2, Code Shield, and CyberSec Eval 2.
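For developers picking Llama 3 up from one of these channels, local inference might look like the following minimal sketch. It assumes the Hugging Face transformers library with accelerate installed, plus the gated hub ID meta-llama/Meta-Llama-3-8B-Instruct (an assumption based on Meta's naming); none of this tooling is described in the article itself.

```python
# Minimal sketch: run the 8B instruct variant of Llama 3 locally through the
# Hugging Face "transformers" pipeline, one of the distribution channels named
# above. The hub ID is an assumption based on Meta's naming; the weights are
# gated behind Meta's license, and a GPU with roughly 16 GB of memory is assumed.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,  # half-precision weights to halve memory use
    device_map="auto",           # let accelerate place weights on the GPU
)

output = generator("The Llama 3 release matters because", max_new_tokens=48)
print(output[0]["generated_text"])
```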
Meanwhile, Meta has released an official web version of its Meta AI assistant built on Llama 3. The platform is still in its early stages, with only two main functions: chat and image generation. Chat can be used without registering, while image generation requires users to register and log in to an account.
Injecting vitality into the open-source community
Meta's AI strategy has always been closely tied to open source, and Llama 3 was warmly welcomed by the open-source community as soon as it launched.
Although there has been some grumbling about the size of Llama 3's 8K context window, Meta said it would soon be expanded. Matt Shumer, CEO and co-founder of email startup Otherside AI, is optimistic, saying, "We are entering a new world where GPT-4-level models are open source and freely accessible."
According to Jim Fan, a senior research scientist at Nvidia, the upcoming larger-parameter Llama 3 model marks a "watershed" for the open-source community that could change the calculus for much academic research and many startups, and "is expected to see a surge in vitality throughout the entire ecosystem."
However, it is worth noting that Meta has not released Llama 3's training data, saying only that it comes entirely from publicly available sources. Strictly speaking, "open source" software should be fully open throughout development and distribution, including the product's source code, training data, and other components. DBRX, the "strongest open-source model" previously released by data company Databricks, had the same issue, in addition to hardware requirements far beyond ordinary computers.
The launch of Llama 3 comes on the heels of progress in Meta's in-house chips. Just last week, Meta announced the latest version of its self-developed chip MTIA, a custom chip series Meta designed specifically for AI training and inference. Compared with MTIA v1, Meta's first-generation AI inference accelerator officially announced in May last year, the latest chip offers significantly improved performance and is designed specifically for the ranking and recommendation systems of Meta's social apps. Analysts say Meta's goal is to reduce dependence on chip makers such as Nvidia.