Pika raises funding and Kwai's Kling goes live: why is Apple's AI product "burning a cold stove"?
123458039
Posted on 2024-6-11 19:29:04
Apple Inc. (AAPL.US) launched an AI product called Apple Intelligence at its WWDC developer conference, yet its stock closed down 1.91% the same day. Interestingly, on June 11th, the Sora index (8841756.WI) in Wind data rose 1.55%.
Why is there such a difference?
Apple chose to steer clear of the currently hot video models and instead launched AI updates that focus mainly on text. The rise of China's concept stocks, by contrast, is closely tied to the recent popularity of text-to-video models. Abroad, Pika, a star AI video generation startup, has completed a new round of financing, an 80 million US dollar Series B that will lift the company's valuation above 470 million US dollars. In China, Kwai (1024.HK) officially launched its "Kling" video generation model, which adopts a technical route similar to Sora's.
In the eyes of many industry insiders, Apple's focus on integrating text AI rather than video is driven mainly by considerations such as cost and practicality.
Apple avoids Sora's "battle zone"
The built-in large language model launched by Apple allows iPhone, iPad, and Mac to understand and generate language and images. By connecting to ChatGPT, Siri gains semantic search and can intelligently search photos, calendars, files, emails, and other content. Users can also access most of ChatGPT's functions without registering.
Ming-Chi Kuo (Guo Mingqi), an analyst at Tianfeng International Securities, posted a brief review stating that Apple's newly released Apple Intelligence suite demonstrates the advantages of ecosystem integration and interface design. It is very practical for users, but for investors it is merely icing on the cake; they are hoping to see original, must-have features.
Han Xu, Chief Researcher of Facewall Intelligence (ModelBest), told reporters that from the perspective of operating system integration, Apple mainly needs AI to understand human intent and call system-level interfaces. These requirements do not fully match Sora's starting point and are a better fit for large models with multimodal input and text output. Models like Sora that generate images or video are currently better suited to integration with software, especially visual processing software.
Why didn't Apple join Sora's "battle"?
A person at an AIGC video application company told reporters that, from a product and business perspective, Apple will only focus on areas that are relatively mature and offer a clearly visible return on investment. In mobile hardware interaction, text has more usage scenarios. From R&D investment to actual inference costs, text is also the more cost-effective area given Apple's current technical capabilities.
Another industry technician said that today's LLM (large language model) services have basically reached breakeven for text, but not necessarily for text-to-image and text-to-video. This is also an important reason why Apple did not integrate video AIGC capabilities at WWDC.
Compared with Apple's approach, China's large-model market currently has high expectations for video. In April this year, Professor Zhu Jun, vice president of the Artificial Intelligence Research Institute of Tsinghua University and co-founder and chief scientist of Shengshu Technology, released China's first video model, Vidu, on behalf of Tsinghua University and Shengshu Technology. More recently, the "Kling" video model launched by Kwai also sparked heated discussion.
The reporter used the text of one of Sora's showcase videos as the prompt and fed it into Kwai's Kling to generate a video for comparison. Take "a woman walking down a Tokyo street" as an example: the Sora video at the time contained errors such as leg deformation, legs swapping positions as they crossed, and the right leg stepping forward twice in a row. Kwai's Kling showed similar problems.
Tianfeng Securities believes that Kwai's improvements to its 3D VAE + DiT architecture in computing power, model, and data quality show that it can deliver commercially usable results. In addition, customizable duration and aspect ratio greatly improve the usability of the generated footage. Although Kling trails Sora in understanding some complex semantics, there is little difference in simpler scenarios.
Multimodality becomes an opportunity in China's large-model race
An excellent video generation model depends on four core elements: model design, data assurance, computational efficiency, and the expansion of model capabilities.
Regarding the immaturity of Sora, OpenAI has stated that Sora may have difficulty accurately simulating the physical principles of complex scenes, may not understand causal relationships, may confuse spatial details of prompts, and may have difficulty accurately describing events that occur over time, such as following specific camera trajectories.
But this looks more like a problem common to the whole field. Wang Changhu, founder of Aishi Technology, previously said that current video models learn physics directly from video data, but real videos contain so much information that it is hard to learn each physical law accurately in isolation. Feeding the model 3D modeling information, such as human hands and animal tails, as constraints alongside the visual input can help the large model learn and improve its results.
The Kling large model adopts a native text-to-video technical route, replacing the combination of image generation and a temporal module. At present, mainstream video generation models usually reuse Stable Diffusion's 2D VAE for spatial compression when encoding to and decoding from latent space, but this leaves significant information redundancy for video. The Kwai large-model team therefore developed its own 3D VAE network, trying to balance training performance and quality. In addition, for temporal information modeling, the team designed a 3D Attention mechanism as the spatio-temporal modeling module.
Tang Jiayu, CEO of Shengshu Technology, said that research on multimodal large models is still in its early stages and the technology is not yet mature. This differs from the currently hot language models, where foreign teams are already a generation ahead. Rather than struggling to catch up in language models, Tang Jiayu believes multimodality is an important opportunity for domestic teams to gain ground in the large-model race. Zhou Zhifeng, a partner at Qiming Venture Capital, holds a similar view: today's large models are gradually moving from pure language toward multimodal exploration.
Lin Yonghua, Vice President and Chief Engineer of the Beijing Zhiyuan Artificial Intelligence Research Institute (BAAI), told First Financial (Yicai) reporters that China has some chance of "overtaking on the bend" in the multimodal field, but the success of multimodal models still comes down to computing power, algorithms, and data. At present there is no significant gap between Chinese and American teams at the algorithmic level, and the industry still has ways to solve its computing power problems. Obtaining massive amounts of high-quality data, however, remains very difficult.
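To make the redundancy argument concrete, here is a minimal sketch that counts latent positions for a single clip under a per-frame 2D VAE versus a 3D VAE that also compresses along time. Every number in it is an illustrative assumption (the clip size, the 8x spatial factor, the 4x temporal factor, and the function names); none of them come from Kwai or from this article.

```python
from math import ceil

def latent_positions_2d(frames, height, width, spatial_factor=8):
    """Per-frame 2D VAE: each frame is compressed spatially; time is untouched."""
    return frames * ceil(height / spatial_factor) * ceil(width / spatial_factor)

def latent_positions_3d(frames, height, width, spatial_factor=8, temporal_factor=4):
    """3D VAE: compresses along the time axis as well as the spatial axes."""
    return (ceil(frames / temporal_factor)
            * ceil(height / spatial_factor)
            * ceil(width / spatial_factor))

# Hypothetical clip: 120 frames at 720x1280 (assumed values, for illustration only).
n2d = latent_positions_2d(120, 720, 1280)
n3d = latent_positions_3d(120, 720, 1280)
print(n2d, n3d, n2d / n3d)
```

Under these assumed factors the 3D variant leaves a quarter as many latent positions for the downstream model to process, which is the kind of saving the trade-off between training cost and quality is about. The actual compression factors of Kling's 3D VAE are not stated in the article.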
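CandyLake.com is an information publishing platform and provides information storage space services only.
Disclaimer: The views in this article are solely those of the author. This article does not represent the position of CandyLake.com and does not constitute advice. Please treat it with caution.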