Musk Counterattacks: Open-Sourcing the Largest Model Puts Pressure on OpenAI
六月清晨搅
Posted on 2024-3-18 13:13:44
Musk has made a choice squarely opposed to Altman's, demonstrating his commitment to open-source AI models. On March 17th, he announced the open-sourcing of Grok-1, making it the largest open-source large language model by parameter count currently available, at 314 billion parameters, far exceeding the 175 billion of OpenAI's GPT-3.5.
Interestingly, the cover image for the Grok-1 open-source announcement was generated by Midjourney, a case of "AI helping AI".
Musk, who has long derided OpenAI for not being open, naturally took a dig on the social platform: "We want to know more about the open part of OpenAI."
Grok-1's model weights and architecture are released under the Apache 2.0 license, which allows users to freely use, modify, and distribute the software for both personal and commercial purposes. This openness encourages broader research and application development. Since its release, the project has earned 6.5k stars on GitHub, and its popularity is still climbing.
The project description clearly notes that because Grok-1 is a large (314B-parameter) model, a machine with sufficient GPU memory is needed to run the example code. Users estimate this may require around 628 GB of GPU memory.
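As a sanity check on that figure, here is a minimal back-of-the-envelope sketch. It assumes the checkpoint is stored in bf16 (2 bytes per parameter) and ignores activation and KV-cache overhead; the storage format is our assumption, not something stated in the repository:

```python
# Back-of-the-envelope check of the "628 GB" estimate: a hypothetical
# sketch assuming bf16 weights (2 bytes per parameter) and ignoring
# activation / KV-cache overhead.

PARAMS = 314e9          # Grok-1 parameter count
BYTES_PER_PARAM = 2     # bf16/fp16 storage (assumption)

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB of GPU memory")  # ~628 GB
```

The weights alone account for the full 628 GB, which is why multiple high-memory GPUs are needed just to load the model.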
In addition, the MoE layer in this repository is not implemented efficiently; this implementation was chosen deliberately so that the model's correctness could be verified without writing custom kernels. A minimal sketch of this kind of approach follows.
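To illustrate the tradeoff, here is a hypothetical sketch of a correctness-first MoE layer in JAX: every expert runs densely on every token and the outputs are mixed by the router's softmax weights, so no custom gather/scatter kernel is needed. The shapes, expert count, and function names are illustrative only, not Grok-1's actual implementation:

```python
import jax
import jax.numpy as jnp

def naive_moe(x, gate_w, expert_ws):
    """Correctness-first MoE: run ALL experts on ALL tokens, no custom kernels.
    x: [tokens, d_model]; gate_w: [d_model, n_experts];
    expert_ws: [n_experts, d_model, d_model] (one dense layer per expert)."""
    gates = jax.nn.softmax(x @ gate_w, axis=-1)        # [tokens, n_experts]
    # Dense computation of every expert: [n_experts, tokens, d_model]
    expert_out = jnp.einsum("td,edf->etf", x, expert_ws)
    # Mix expert outputs by their gate weights: [tokens, d_model]
    return jnp.einsum("te,etf->tf", gates, expert_out)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (4, 8))          # 4 toy tokens, d_model = 8
gate_w = jax.random.normal(key, (8, 2))     # router over 2 toy experts
expert_ws = jax.random.normal(key, (2, 8, 8))
print(naive_moe(x, gate_w, expert_ws).shape)  # (4, 8)
```

A production MoE would route each token to only its top-k experts via gather/scatter operations, which is far cheaper but requires exactly the kind of custom kernel this repository set out to avoid.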
Popular open-source models currently include Meta's Llama 2 and Mistral from France's Mistral AI. Generally speaking, releasing a model as open source lets the community conduct large-scale testing and provide feedback, which in turn accelerates the model's own iteration.
Grok-1 is a Mixture-of-Experts (MoE) large model that xAI, Musk's AI startup, developed over the past four months. A brief review of the model's development:
After announcing the founding of xAI, the researchers first trained a 33-billion-parameter prototype language model (Grok-0). This model approached the capabilities of LLaMA 2 (70B) on standard language-model benchmarks while using fewer training resources;
Subsequently, the researchers made significant improvements to the model's reasoning and coding capabilities, culminating in Grok-1, released in November 2023. This more powerful SOTA language model achieved 63.2% on the HumanEval coding task and 73% on MMLU, surpassing all other models in its compute class, including ChatGPT-3.5 and Inflection-1.
What are the advantages of Grok-1 compared to other large models?
xAI emphasizes that Grok-1 is a large model they trained from scratch, using a custom training stack built on JAX and Rust, with pre-training concluding in October 2023 and no fine-tuning for specific tasks (such as dialogue);
A unique and fundamental advantage of Grok-1 is its real-time knowledge of the world via the X platform, which allows it to answer spicy questions rejected by most other AI systems. The training data used in the released version of Grok-1 comes from Internet data up to the third quarter of 2023 and from data provided by xAI's AI trainers;
The 314-billion-parameter Mixture-of-Experts model activates 25% of its weights for each token, and this large parameter count gives it powerful language comprehension and generation capabilities.
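The 25% figure is consistent with top-2 routing over 8 experts, the configuration widely reported for Grok-1; the arithmetic below treats that configuration as an assumption and simplifies by counting all parameters as expert parameters:

```python
# Hypothetical arithmetic behind the "25% active weights" figure,
# assuming top-2 routing over 8 experts (an assumption, and a
# simplification: attention weights are always active in practice).
TOTAL_PARAMS = 314e9
N_EXPERTS, ACTIVE_EXPERTS = 8, 2

active_fraction = ACTIVE_EXPERTS / N_EXPERTS   # 0.25
print(f"Active fraction per token: {active_fraction:.0%}")            # 25%
print(f"Params touched per token: ~{TOTAL_PARAMS * active_fraction / 1e9:.1f}B")  # ~78.5B
```

In other words, each token pays the compute cost of roughly a 78.5B-parameter model while the full 314B parameters provide the model's capacity.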
xAI previously explained that Grok-1 will serve as the engine behind Grok for natural language processing tasks, including question answering, information retrieval, creative writing, and coding assistance. Going forward, long-context understanding and retrieval, as well as multimodal capabilities, are among the directions this model will explore.