Pika raises funding and Kwai launches Keling: why is Apple's AI product "lighting a cold stove"?
Posted on 2024-6-11 19:29:04
Apple Inc. (AAPL.US) unveiled an AI product called Apple Intelligence at its WWDC developer conference, yet its share price closed down 1.91% that day. Interestingly, on June 11, the Sora index (8841756.WI) in Wind data rose 1.55%.
Why is there such a difference?
Apple chose to steer clear of the currently hot video models and launched AI updates that focus mainly on text. The rise of domestic concept stocks, by contrast, is closely tied to the recent popularity of text-to-video models. Overseas, Pika, a star AI video generation startup, has completed a new Series B round totaling 80 million US dollars, which puts the company's valuation above 470 million US dollars. In China, Kwai (1024.HK) officially launched its "Keling" video generation model, which adopts a technical route similar to Sora's.
In the eyes of many industry insiders, Apple's decision to integrate text AI rather than video AI is driven mainly by cost and practicality.
Apple avoids Sora's "battle zone"
The built-in large language model launched by Apple allows iPhone, iPad, and Mac to understand and generate language and images. By connecting to ChatGPT, Siri gains semantic retrieval and can intelligently search photos, calendars, files, emails, and other content. Users can also access most of ChatGPT's functions without registering.
Ming-Chi Kuo, an analyst at TF International Securities, wrote in a brief review that the newly released Apple Intelligence suite demonstrates Apple's strengths in ecosystem integration and interface design. It is very practical for users but merely icing on the cake for investors, who are hoping to see original, must-have features.
Han Xu, Chief Researcher of Facewall Intelligence, told reporters that when AI is built into an operating system, Apple mainly needs it to understand human intent and call system-level interfaces. These requirements do not fully match Sora's starting point and are a better fit for large models with multimodal input and text output. Models like Sora that generate images or video are currently better suited to integration with software, especially visual processing software.
Why didn't Apple join Sora's "battle"?
A person at an AIGC video application company told reporters that, from a product and business perspective, Apple will only focus on areas that are relatively mature and offer a more visible return on investment. In mobile hardware interaction, text has more usage scenarios. Given Apple's current technical accumulation, text is also comparatively more cost-effective, from research and development investment through to actual inference costs.
Another industry technician said that today's LLM services (large language model services) have basically broken even in text, but not necessarily in text-to-image or text-to-video. This is also an important reason why Apple did not integrate video AIGC capabilities at WWDC.
Compared with Apple's approach, China's large model market currently has high expectations for video. In April this year, Professor Zhu Jun, vice president of the Artificial Intelligence Research Institute of Tsinghua University and co-founder and chief scientist of Shengshu Technology, released Vidu, China's first video large model, on behalf of Tsinghua University and Shengshu Technology. Not long ago, the "Keling" video model launched by Kwai also set off heated discussion.
The reporter used the prompt from one of Sora's representative demo videos as input to Kwai's "Keling" and compared the results. Taking "a girl walking down a Tokyo street" as an example, the Sora video at the time showed errors such as leg deformation, legs swapping positions as they crossed, and the right leg stepping forward twice in a row. Kwai's "Keling" shows similar problems.
Tianfeng Securities believes that the improvements Kwai's 3D VAE + DiT architecture brings in computing power, model, and data quality show that it can deliver commercially usable results. At the same time, customizable duration and aspect ratio greatly improve the usability of the generated material. Although it trails Sora in some complex semantic understanding, the gap is small in simpler scenes.
Multimodality becomes an opportunity in China's large model race
An excellent video generation model needs to consider four core elements - model design, data assurance, computational efficiency, and the expansion of model capabilities.
Regarding Sora's immaturity, OpenAI has stated that Sora may struggle to accurately simulate the physics of complex scenes, may not understand causal relationships, may confuse the spatial details of a prompt, and may have difficulty precisely describing events that unfold over time, such as following a specific camera trajectory.
But this looks more like a problem common to the field. Wang Changhu, founder of Aishi Technology, said previously that current video models learn physical knowledge directly from video data, but real videos contain so much information that it is hard to learn each physical law accurately in isolation. Feeding the model 3D modeling information, such as human hands or animal tails, as constraints alongside the visual input can help the large model learn and improve the results.
The Keling large model adopts a native text-to-video technical route, replacing the combination of an image generation model with a temporal module. At present, mainstream video generation models usually reuse the 2D VAE from Stable Diffusion for spatial compression when encoding to and decoding from latent space, but for video this leaves significant information redundancy. The Kwai large model team therefore developed its own 3D VAE network, trying to balance training performance and output quality. In addition, for temporal information modeling, the team designed a 3D Attention mechanism as its spatio-temporal modeling module.
Tang Jiayu, CEO of Shengshu Technology, said that research on multimodal large models is still at an early stage and the technology is not yet mature. This differs from the much-hyped language models, where overseas players already lead by a generation. Rather than slugging it out over language models, Tang Jiayu believes multimodality is an important opportunity for domestic teams to gain ground in the large model race. Zhou Zhifeng, a partner at Qiming Venture Capital, holds a similar view: today's large models have gradually moved from pure language toward multimodal exploration.
Lin Yonghua, Vice President and Chief Engineer of Beijing Zhiyuan Artificial Intelligence Research Institute, told First Financial reporters that China has some chance of overtaking on the curve in the multimodal field, but the success factors for multimodal models are still computing power, algorithms, and data. At present there is no significant gap between Chinese and American teams at the algorithm level, and the industry still has ways to work around computing power constraints. Obtaining massive amounts of high-quality data, however, remains very difficult.
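To make the redundancy point concrete, here is a rough back-of-the-envelope sketch in Python. It only counts how many latent positions a clip occupies when each frame is compressed independently (2D) versus when the time axis is compressed as well (3D). The clip size and the compression factors (8x spatial, 4x temporal) are illustrative assumptions, not Kwai's published settings, and the helper name `latent_positions` is hypothetical.

```python
def latent_positions(frames, height, width, spatial_factor, temporal_factor=1):
    """Count latent grid positions after downsampling a video clip.

    temporal_factor=1 models a per-frame (2D) encoder that leaves the
    time axis uncompressed; a value above 1 models a 3D encoder that
    also compresses along time.
    """
    # Ceiling division so a partial block still occupies one latent position.
    t = -(-frames // temporal_factor)
    h = -(-height // spatial_factor)
    w = -(-width // spatial_factor)
    return t * h * w


# Hypothetical clip and compression factors, chosen only for illustration.
frames, height, width = 120, 720, 1280

per_frame = latent_positions(frames, height, width, spatial_factor=8)
spatio_temporal = latent_positions(
    frames, height, width, spatial_factor=8, temporal_factor=4
)

print("2D (per-frame) latent positions:", per_frame)
print("3D (space + time) latent positions:", spatio_temporal)
print("reduction factor:", per_frame / spatio_temporal)
```

Under these assumed numbers, the downstream model has to attend over a quarter as many latent positions once the time axis is compressed too, which is the kind of saving a 3D VAE is meant to capture.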