Solving 99% of usage scenarios? Microsoft and Nvidia bet on small models as large models lose their shine
行走的蜗牛_
Posted on 2024-8-22 19:42:29
In the development of artificial intelligence, tech giants once raced to build ever larger language models, but a new trend has emerged: small language models (SLMs) are on the rise, challenging the long-held notion that "bigger is better".
On August 21 local time, Microsoft and Nvidia each released their latest small language models: Phi-3.5-mini-instruct and Mistral-NeMo-Minitron 8B. The main selling point of both models is that they strike a good balance between computing resource consumption and performance; in some respects they can even rival large models.
Clem Delangue, CEO of AI startup Hugging Face, has argued that up to 99% of usage scenarios can be addressed with SLMs, and predicted that 2024 will be the year of the SLM. By one incomplete count, tech giants including Meta, Microsoft, and Google have released nine small models this year alone.
Training costs for large models keep rising, while performance gains remain limited
The rise of SLMs is no accident; it is closely tied to the difficulties large language models (LLMs) face in delivering further performance gains while consuming enormous resources.
A performance comparison released in April by AI startups Vellum and Hugging Face shows that the gap between top LLMs is narrowing rapidly, especially on specific tasks such as multiple-choice questions, reasoning, and math problems, where the differences among the leading models are minimal. For example, on multiple-choice questions, Claude 3 Opus, GPT-4, and Gemini Ultra all scored above 83%, while on reasoning tasks Claude 3 Opus, GPT-4, and Gemini 1.5 Pro all achieved accuracy above 92%.
Gary Marcus, former head of Uber AI, pointed out that the latest research papers on LLMs all point in the same direction, with more than a dozen models now in the same league as GPT-4. "Some of them perform slightly better than GPT-4, but there has been no qualitative leap. I think everyone would agree that GPT-4 was a step ahead of GPT-3.5, but there has been no comparable leap in the year or more since."
Against these limited performance gains, the cost of training LLMs keeps climbing. Training these models requires massive amounts of data and billions or even trillions of parameters, resulting in extremely high resource consumption. The computing power and energy needed to train and run LLMs are staggering, making it hard for smaller organizations or individuals to participate in core LLM development.
The International Energy Agency estimates that the electricity consumption related to data centers, cryptocurrencies, and artificial intelligence will be roughly equivalent to the total electricity consumption of Japan by 2026.
OpenAI CEO Sam Altman once stated at an event at MIT that the cost of training GPT-4 is at least $100 million, while Anthropic CEO Dario Amodei predicts that the cost of training models in the future could reach $100 billion.
In addition, the complexity of the tools and techniques required to work with LLMs steepens the learning curve for developers. The whole process from training to deployment takes a long time, which slows development. A study by the University of Cambridge suggests that companies may need 90 days or longer to deploy a single machine learning model.
Another major problem with LLMs is their tendency to "hallucinate": the model produces output that looks plausible but is actually incorrect. This happens because LLMs are trained to predict the most likely next word based on patterns in their data, not to genuinely understand the information. As a result, an LLM may confidently produce false statements, fabricate facts, or combine unrelated concepts in absurd ways. Detecting and reducing these hallucinations is an ongoing challenge in building reliable, trustworthy language models.
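To make that mechanism concrete, the following minimal Python sketch (not from the article; the model name and prompt are illustrative assumptions) uses the Hugging Face transformers library to show that a causal language model simply ranks candidate next tokens by likelihood. Nothing in this step checks whether a continuation is factually true, which is why fluent but wrong output can still score highly.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM works for this demonstration; "gpt2" is an assumption.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The first person to walk on the Moon was"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Scores for the token that would follow the prompt
    next_token_logits = model(**inputs).logits[0, -1]

# The model only ranks tokens by probability learned from data;
# it never verifies facts before committing to a continuation.
probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {float(p):.3f}")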
Expanding parameters is not the only way to improve performance
Concerns about the enormous energy demands of LLMs, together with the market opportunity to offer enterprises a wider range of AI options, have led technology companies to gradually shift their attention to SLMs.
A reporter from the Daily Economic News noted that AI startups such as Arcee, Sakana AI, and Hugging Face, as well as the tech giants, are investing in SLMs and using them to serve customers in more cost-effective ways.
Previously, Google, Meta, OpenAI, and Anthropic had all released smaller language models that are more compact and flexible than their flagship LLMs. This not only lowers development and deployment costs but also gives business customers cheaper options. With investors increasingly worried about the high costs and uncertain returns of AI companies, more technology firms may take this path. Microsoft and Nvidia have now launched small models of their own.
SLMs are streamlined versions of LLMs, with fewer parameters and simpler designs. They require less data and can be trained in just minutes or a few hours. This makes SLMs more efficient and easier to deploy on small devices; for example, they can be embedded in mobile phones without drawing on supercomputing resources, reducing costs and significantly improving response speed.
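As a rough illustration of that deployment story (a sketch, assuming the model is published on Hugging Face under the repo id "microsoft/Phi-3.5-mini-instruct" and that the local machine has enough memory and the accelerate package installed), a small model can be downloaded once and run entirely on local hardware with the transformers library:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3.5-mini-instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the memory footprint modest
    device_map="auto",           # place weights on whatever hardware is available
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
prompt = "Explain in one sentence why small language models are cheaper to run."
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])

Because the weights stay on the local device, inference does not depend on a remote API, which is the cost and latency advantage described above.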
In its technical report on the small model, Microsoft noted that Phi-3.5-mini-instruct is a high-performance language model designed to run locally on mobile phones.
Another major advantage of SLMs is their specialization for specific applications. Because they focus on particular tasks or domains, they can be more effective in practice: in sentiment analysis, named entity recognition, or domain-specific question answering, for example, a specialized SLM often outperforms a general-purpose model. This customization lets enterprises build efficient models tailored to their own needs.
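For instance (a minimal sketch; the DistilBERT checkpoint named below is a standard public sentiment model used purely for illustration, not one mentioned in the article), a compact classifier fine-tuned only for sentiment analysis can handle that single task with a tiny fraction of an LLM's parameters:

from transformers import pipeline

# A roughly 67M-parameter DistilBERT fine-tuned on SST-2: orders of magnitude
# smaller than a frontier LLM, but specialized for exactly one task.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The latest update made the app noticeably faster."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]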
SLMs are also less prone to hallucinations within their domains, because they are typically trained on narrower, more targeted datasets, which helps the model learn the patterns and information most relevant to its task. This narrower focus reduces the likelihood of generating irrelevant, unexpected, or inconsistent output.
Despite their small scale, SLMs are in some respects no worse than large models. Microsoft's latest Phi-3.5-mini-instruct has only 3.8 billion parameters, yet it outperforms models with far more parameters, such as Llama 3.1 8B and Mistral 7B. Aaron Mueller, a language model researcher at Northeastern University in the United States, pointed out that expanding the parameter count is not the only way to improve model performance; training on higher-quality data can produce similar effects.
OpenAI CEO Sam Altman said at an event in April that he believes we are at the end of the era of giant models, and that "we will improve their performance in other ways."
It should be noted, however, that while the specialization of SLMs is a major advantage, it also brings limitations. These models may perform poorly outside their training domains, lack a broad knowledge base, and, compared with LLMs, cannot generate relevant content across as wide a range of topics. This can force organizations to deploy multiple SLMs to cover different needs, which may complicate their AI infrastructure.
As the AI field develops rapidly, the standard for what counts as a small model may keep shifting. David Ha, co-founder and CEO of Tokyo-based small model startup Sakana AI, said that an AI model that seemed huge a few years ago now looks "moderate". "Size is always relative," David Ha said.