Musk Strikes Back: Open-Sourcing a Top-Tier Model Puts Pressure on OpenAI

六月清晨搅

Musk appears to have made a very different choice from Altman, one meant to demonstrate his commitment to open-source AI models. On March 17, Musk announced the open-sourcing of Grok-1. With 314 billion parameters, it is currently the largest open-source large language model by parameter count, far exceeding the 175 billion commonly cited for OpenAI's GPT-3.5.
Interestingly, the cover image for the Grok-1 open-source announcement was generated with Midjourney, a case of "AI helping AI".
Musk, who has long mocked OpenAI for not being open, naturally took the opportunity for a jab on X, writing that he would like to know more about the "open" part of OpenAI.
Grok-1's model weights and architecture are released under the Apache 2.0 license. This allows users to freely use, modify, and distribute the software, whether for personal or commercial purposes, which encourages broader research and application development. Since its release, the project has earned 6.5k stars on GitHub, and its popularity is still growing.
The project description stresses that because Grok-1 is a very large model (314B parameters), a machine with sufficient GPU memory is needed to test it with the example code. Netizens estimate that this may require a machine with 628 GB of GPU memory.
In addition, the MoE layer implementation in the repository is not efficient. It was chosen to avoid the need for custom kernels when validating the correctness of the model.
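The 628 GB estimate quoted above is consistent with storing each of the 314 billion parameters in 2 bytes (a 16-bit format such as bf16 or fp16). That precision is an assumption made here for illustration, since the article does not say how the netizens arrived at the number. A minimal sketch of the arithmetic, covering the weights only and ignoring activations and any other runtime overhead:

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


GROK1_PARAMS = 314e9  # parameter count reported in the article

# Bytes per parameter for a few common storage formats (assumed, not from the article).
FORMATS = {"fp32": 4, "bf16/fp16": 2, "int8": 1}

for name, nbytes in FORMATS.items():
    print(f"{name}: {weight_memory_gb(GROK1_PARAMS, nbytes):.0f} GB")
```

Under these assumptions, the 16-bit row matches the 628 GB figure, while 8-bit storage would halve it and 32-bit storage would double it.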
Popular open-source models today include Meta's Llama 2 and those from France's Mistral. Generally speaking, releasing a model as open source lets the community test it at scale and provide feedback, which in turn can speed up iteration on the model itself.
Grok-1 is a Mixture-of-Experts (MoE) large model developed over the past four months by xAI, Musk's AI startup. The development timeline is as follows:
After xAI was founded, its researchers first trained a prototype language model, Grok-0, with 33 billion parameters. This model approached the capabilities of LLaMA 2 (70B) on standard language-model benchmarks while using fewer training resources;
The researchers then made significant improvements to the model's reasoning and coding capabilities, ultimately producing Grok-1, which was released in November 2023. This more powerful, state-of-the-art language model scored 63.2% on the HumanEval coding task and 73% on MMLU, surpassing all other models in its compute class, including ChatGPT-3.5 and Inflection-1.
What advantages does Grok-1 have over other large models?
xAI emphasizes that Grok-1 is its own large model, trained from scratch in October 2023 with a custom training stack built on JAX and Rust, and not fine-tuned for any specific task (such as dialogue);
A unique and fundamental advantage of Grok-1 is its real-time knowledge of the world through the X platform, which lets it answer "spicy" questions that most other AI systems refuse. The released version of Grok-1 was trained on Internet data up to the third quarter of 2023 and on data provided by xAI's AI trainers;
It is a Mixture-of-Experts model with 314 billion parameters, of which 25% of the weights are active for any given token. The large total parameter count gives it strong language comprehension and generation capabilities.
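To illustrate how only a fraction of an MoE model's weights can be active per token, here is a toy sketch of top-k expert routing. It is not xAI's implementation. The expert count (8), the number of experts selected per token (2), the toy experts, and the helper names `top_k_route` and `moe_forward` are all assumptions chosen so that 2 of 8 experts, or 25% of them, run for each token. In a real model the shared layers (such as attention) are always active, so the exact weight ratio depends on the architecture.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their scores."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


NUM_EXPERTS, TOP_K = 8, 2  # assumed values, not stated in the article

# Toy "experts": expert n simply multiplies its input by n + 1.
experts = [lambda x, scale=n + 1: scale * x for n in range(NUM_EXPERTS)]

# Hypothetical router scores for one token.
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]

routes = top_k_route(logits, TOP_K)
print("selected experts:", [i for i, _ in routes])
print("fraction of experts active:", TOP_K / NUM_EXPERTS)
print("output for x = 10.0:", moe_forward(10.0, experts, logits, TOP_K))
```

A straightforward implementation like this favors readability over speed, which is the same kind of trade-off the repository describes for its own MoE layer.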
xAI has previously said that Grok-1 serves as the engine behind Grok for natural language processing tasks, including question answering, information retrieval, creative writing, and coding assistance. Going forward, long-context understanding and retrieval, along with multimodal capabilities, are among the directions the model will explore.