Solving 99% of usage scenarios? Microsoft and Nvidia bet on small models as large models lose their shine
行走的蜗牛_
Posted on 2024-8-22 19:42:29
In the development of artificial intelligence, tech giants once raced to build ever larger language models, but a new trend has emerged: small language models (SLMs) are on the rise, challenging the long-held notion that "bigger is better".
On August 21 local time, Microsoft and Nvidia each released their latest small language models: Phi-3.5-mini-instruct and Mistral-NeMo-Minitron 8B. The main selling point of both models is that they strike a good balance between computing resource consumption and performance; in some respects they can even rival large models.
Clem Delangue, CEO of AI startup Hugging Face, has argued that up to 99% of usage scenarios can be addressed with SLMs, and predicted that 2024 will be the year of the SLM. By one incomplete count, tech giants including Meta, Microsoft, and Google have released nine small models this year alone.
Training costs for large models keep rising, while performance gains remain limited
The rise of SLMs is no accident; it is closely tied to the difficulties large language models (LLMs) face in delivering further performance gains while consuming enormous resources.
A performance comparison released in April by AI startups Vellum and Hugging Face shows that the gap between top LLMs is narrowing rapidly, especially on specific tasks such as multiple-choice questions, reasoning, and math problems, where the differences among the leading models are minimal. For example, on multiple-choice questions, Claude 3 Opus, GPT-4, and Gemini Ultra all scored above 83%, while on reasoning tasks Claude 3 Opus, GPT-4, and Gemini 1.5 Pro all achieved accuracy above 92%.
Gary Marcus, former head of Uber AI, pointed out that the latest research papers on LLMs all point in the same direction, with more than a dozen models now in the same league as GPT-4. "Some of them perform slightly better than GPT-4, but there has been no qualitative leap. I think everyone would agree that GPT-4 was a step ahead of GPT-3.5, but there has been no comparable leap in the year or more since."
Against these limited performance gains, the cost of training LLMs keeps climbing. Training these models requires massive amounts of data and billions or even trillions of parameters, resulting in extremely high resource consumption. The computing power and energy needed to train and run LLMs are staggering, making it hard for smaller organizations or individuals to participate in core LLM development.
The International Energy Agency estimates that the electricity consumption related to data centers, cryptocurrencies, and artificial intelligence will be roughly equivalent to the total electricity consumption of Japan by 2026.
OpenAI CEO Sam Altman once stated at an event at MIT that the cost of training GPT-4 is at least $100 million, while Anthropic CEO Dario Amodei predicts that the cost of training models in the future could reach $100 billion.
In addition, the complexity of the tools and techniques required to work with LLMs steepens the learning curve for developers. The whole process from training to deployment takes a long time, which slows development. A study by the University of Cambridge suggests that companies may need 90 days or longer to deploy a single machine learning model.
Another major problem with LLMs is their tendency to "hallucinate": the model produces output that looks plausible but is actually incorrect. This happens because LLMs are trained to predict the most likely next word based on patterns in their data, not to genuinely understand the information. As a result, an LLM may confidently produce false statements, fabricate facts, or combine unrelated concepts in absurd ways. Detecting and reducing these hallucinations is an ongoing challenge in building reliable, trustworthy language models.
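To make that mechanism concrete, the following minimal Python sketch (not from the article; the model name and prompt are illustrative assumptions) uses the Hugging Face transformers library to show that a causal language model simply ranks candidate next tokens by likelihood. Nothing in this step checks whether a continuation is factually true, which is why fluent but wrong output can still score highly.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM works for this demonstration; "gpt2" is an assumption.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The first person to walk on the Moon was"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Scores for the token that would follow the prompt
    next_token_logits = model(**inputs).logits[0, -1]

# The model only ranks tokens by probability learned from data;
# it never verifies facts before committing to a continuation.
probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {float(p):.3f}")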
Expanding parameters is not the only way to improve performance
Concerns about the enormous energy demands of LLMs, together with the market opportunity to offer enterprises a wider range of AI options, have led technology companies to gradually shift their attention to SLMs.
A reporter from the Daily Economic News noted that AI startups such as Arcee, Sakana AI, and Hugging Face, as well as the tech giants, are investing in SLMs and using them to serve customers in more cost-effective ways.
Previously, Google, Meta, OpenAI, and Anthropic had all released smaller language models that are more compact and flexible than their flagship LLMs. This not only lowers development and deployment costs but also gives business customers cheaper options. With investors increasingly worried about the high costs and uncertain returns of AI companies, more technology firms may take this path. Microsoft and Nvidia have now launched small models of their own.
SLMs are streamlined versions of LLMs, with fewer parameters and simpler designs. They require less data and can be trained in just minutes or a few hours. This makes SLMs more efficient and easier to deploy on small devices; for example, they can be embedded in mobile phones without drawing on supercomputing resources, reducing costs and significantly improving response speed.
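As a rough illustration of that deployment story (a sketch, assuming the model is published on Hugging Face under the repo id "microsoft/Phi-3.5-mini-instruct" and that the local machine has enough memory and the accelerate package installed), a small model can be downloaded once and run entirely on local hardware with the transformers library:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3.5-mini-instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the memory footprint modest
    device_map="auto",           # place weights on whatever hardware is available
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
prompt = "Explain in one sentence why small language models are cheaper to run."
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])

Because the weights stay on the local device, inference does not depend on a remote API, which is the cost and latency advantage described above.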
In its technical report on the small model, Microsoft noted that Phi-3.5-mini-instruct is a high-performance language model designed to run locally on mobile phones.
Another major advantage of SLMs is their specialization for specific applications. Because they focus on particular tasks or domains, they can be more effective in practice: in sentiment analysis, named entity recognition, or domain-specific question answering, for example, a specialized SLM often outperforms a general-purpose model. This customization lets enterprises build efficient models tailored to their own needs.
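For instance (a minimal sketch; the DistilBERT checkpoint named below is a standard public sentiment model used purely for illustration, not one mentioned in the article), a compact classifier fine-tuned only for sentiment analysis can handle that single task with a tiny fraction of an LLM's parameters:

from transformers import pipeline

# A roughly 67M-parameter DistilBERT fine-tuned on SST-2: orders of magnitude
# smaller than a frontier LLM, but specialized for exactly one task.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The latest update made the app noticeably faster."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]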
SLMs are also less prone to hallucinations within their domains, because they are typically trained on narrower, more targeted datasets, which helps the model learn the patterns and information most relevant to its task. This narrower focus reduces the likelihood of generating irrelevant, unexpected, or inconsistent output.
Despite their small scale, SLMs are in some respects no worse than large models. Microsoft's latest Phi-3.5-mini-instruct has only 3.8 billion parameters, yet it outperforms models with far more parameters, such as Llama 3.1 8B and Mistral 7B. Aaron Mueller, a language model researcher at Northeastern University in the United States, pointed out that expanding the parameter count is not the only way to improve model performance; training on higher-quality data can produce similar effects.
OpenAI CEO Sam Altman said at an event in April that he believes we are at the end of the era of giant models, and that "we will improve their performance in other ways."
It should be noted, however, that while the specialization of SLMs is a major advantage, it also brings limitations. These models may perform poorly outside their training domains, lack a broad knowledge base, and, compared with LLMs, cannot generate relevant content across as wide a range of topics. This can force organizations to deploy multiple SLMs to cover different needs, which may complicate their AI infrastructure.
As the AI field develops rapidly, the standard for what counts as a small model may keep shifting. David Ha, co-founder and CEO of Tokyo-based small model startup Sakana AI, said that an AI model that seemed huge a few years ago now looks "moderate". "Size is always relative," David Ha said.