Can Sora make video generation more imaginative and help popularize Apple Vision Pro?

Last week, US technology stocks enjoyed a "capital feast." Boosted by the release of the Sora model, Nvidia led a surge in the shares of AI companies. Its market value reached $2 trillion for the first time, and the rally also lifted the stock prices of companies such as Meta and Microsoft. Market insiders expect AI to push Nvidia higher still, and now that it has joined the "$2 trillion club," overtaking Apple in market value is no longer out of reach.
Apple, however, has not managed to share in this wave of technology driven by large AI models. Since Microsoft overtook it in market value last month, Apple's stock price has declined steadily. The launch of Vision Pro, the company's flagship mixed reality headset, also failed to lift the stock. In the nearly one month since Vision Pro officially went on sale, Apple's stock price has fallen by more than 6%, leaving its market value at $2.8 trillion, nearly $200 billion behind Microsoft.
When Vision Pro was first unveiled, the market had high expectations for the device. Apple CEO Tim Cook presented it as the arrival of the era of spatial computing. He said, "Vision Pro is the most advanced consumer electronics device ever, and its revolutionary and magical user interface will redefine the way we connect, create, and explore."
After the release of the Sora model, users quickly began converting Sora-generated videos into 3D spatial videos that can be viewed on Apple Vision Pro. A tech blogger said, "Sora + Vision Pro means you can describe a world and exist in it."
From this perspective, the release of Sora has brought more content to Vision Pro. But experts stressed to First Financial reporters that the videos Sora creates are still ordinary videos, not spatial videos based on spatial computing, so Sora cannot yet create content directly for Vision Pro.
"In theory, any video can be converted into a spatial video. Sora does not understand spatial computing, and the videos it generates are ordinary videos, so there is no direct connection with Vision Pro," the technical director of a 3D generative AI startup told a First Financial reporter.
He said that he has also used Apple's Vision Pro to make some 3D videos, but only for demonstration purposes, and that he does not yet have a mature idea of how to develop future application scenarios for Vision Pro.
He told the First Financial reporter that 3D videos need to include spatial information, such as the position of each pixel in space, also known as "depth." By generating depth for each frame, an ordinary video can be converted into a spatial video with a 3D effect that can be watched on Vision Pro.
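The depth-based conversion described above can be sketched in a few lines. This is only a minimal illustration, not the method used by any actual converter or by Vision Pro: it assumes a depth value is already available for every pixel, works on a single row of pixels, and synthesizes a second (right-eye) view by shifting nearer pixels farther than distant ones. The function names, the `max_disparity` parameter, and the naive hole filling are hypothetical choices made for this sketch.

```python
def disparity_from_depth(depth, max_disparity=3, near=1.0):
    """Hypothetical mapping: nearer pixels (small depth) get a larger shift."""
    return round(max_disparity * near / max(depth, near))


def synthesize_right_view(row, depth_row, max_disparity=3):
    """Build a right-eye row from one row of pixels plus per-pixel depth."""
    width = len(row)
    out = [None] * width
    out_depth = [float("inf")] * width

    # Shift each pixel left by its disparity; the nearest pixel wins a conflict.
    for x, (pixel, depth) in enumerate(zip(row, depth_row)):
        target = x - disparity_from_depth(depth, max_disparity)
        if 0 <= target < width and depth < out_depth[target]:
            out[target] = pixel
            out_depth[target] = depth

    # Naive hole filling: copy from the right neighbor, which is the
    # background side when foreground objects have been shifted left.
    fill = None
    for x in range(width - 1, -1, -1):
        if out[x] is None:
            out[x] = fill
        else:
            fill = out[x]

    # Any holes left at the right edge are filled from the left neighbor.
    for x in range(1, width):
        if out[x] is None:
            out[x] = out[x - 1]
    return out


# Pixels "c" and "d" are close to the camera; the rest are far away.
left_row = list("abcdef")
depth_row = [10, 10, 1, 1, 10, 10]
right_row = synthesize_right_view(left_row, depth_row, max_disparity=2)
print(left_row, right_row)
```

In this toy row the two near pixels move two positions to the left in the right-eye view, while the distant pixels stay put; the difference between the two views is what produces the 3D effect.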
The technical director believes that Sora's video generation will not extend to 3D generation in the short term, because 3D generation differs fundamentally from text, image, and video generation. The difficulty lies not in obtaining highly consistent, continuous multi-angle images, but in meeting industry standards. "Otherwise, photo-based modeling would have dominated model production work long ago," he said.
In the industrial sector, Apple is already seeking cooperation with 3D software companies. Philippe Laufer, Executive Vice President for Global Brands at Dassault Systèmes, confirmed to First Financial reporters that the company is collaborating with Apple to develop a new design experience for Vision Pro. "Dassault's commercial customers are also exactly what Apple needs," Laufer said.
Liu Yaodong, Vice President of Liade Group and CEO of Virtual Moving Point, told First Financial reporters that a major point of controversy about the Sora model is whether it understands the physical world, including whether it has a sense of space. Based on the information released so far, Sora does not yet have this ability. "To put it bluntly, Sora can currently generate something like a moving jigsaw puzzle world, but that world is still two-dimensional," he said.
Zhang Hongjiang, former chairman of Beijing Zhiyuan Artificial Intelligence Research Institute, also told First Financial reporters that the Sora model and spatial computing are two different things, and that Sora does not involve the concept of spatial computing.
Wu Fei, Director of the Institute of Artificial Intelligence at Zhejiang University, explained from a technical perspective that Sora first maps text tokens and visual patches into isomorphic low-dimensional latent spaces, applies a diffusion model in that latent space, refines the visual information over repeated iterations, and in the process captures the correlations among text tokens, spatial patches, and spatiotemporal patches.
"This approach is like first projecting heterogeneous information such as text and vision into an isomorphic space, in the spirit of 'carts on the same track, writing in the same script,' and then iteratively 'destroying first (adding noise)' and 'reconstructing afterward (removing noise)' to understand the temporal and spatial relationships among the units in a video, thereby identifying and learning complex visual and physical regularities such as texture, motion, lighting, occlusion, and interaction," Wu Fei wrote in a popular science article.
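The "add noise, then remove noise" idea can be illustrated with a toy example. The sketch below is not Sora's implementation, whose details are not given here; it assumes the standard noising formula used by DDPM-style diffusion models, in which a latent vector is mixed with Gaussian noise according to a weight `alpha_bar`. In a real model a neural network learns to predict the noise; here the true noise is passed back in, simply to show that a correct prediction recovers the original latent. All names and values are hypothetical.

```python
import math
import random


def add_noise(latent, alpha_bar, rng):
    """'Destroy': blend the latent with Gaussian noise (0 < alpha_bar <= 1)."""
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(latent, noise)
    ]
    return noisy, noise


def remove_noise(noisy, predicted_noise, alpha_bar):
    """'Reconstruct': invert the blend, given a prediction of the noise."""
    return [
        (y - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for y, e in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)
latent = [0.5, -1.0, 2.0]  # stand-in for one patch in latent space
noisy, noise = add_noise(latent, alpha_bar=0.6, rng=rng)

# A trained network would estimate `noise` from `noisy`; the true noise
# is used here as a stand-in for a perfect prediction.
restored = remove_noise(noisy, noise, alpha_bar=0.6)
print(noisy)
print(restored)
```

Training a diffusion model amounts to teaching a network to make that noise prediction well across many noise levels, which is where the learned regularities Wu Fei describes would come from.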
However, some industry insiders believe that combining the Sora model with spatial computing could eventually open up deeper ways of simulating and understanding the physical world, accelerating the realization of the so-called "metaverse."
Liu Jiansen, an analyst at research firm Canalys, told First Financial reporters, "It can be said that generative AI will help in building virtual worlds similar to the metaverse, so Sora will to some extent promote the application ecosystem of Vision Pro. However, the initial users of Vision Pro are likely to be industry developers, and bringing it to individual consumers will still take time."
Canalys previously predicted that Vision Pro may face shortages within the first year after its launch, and that production may rise to 12.6 million units in five years, equivalent to approximately 1% of the current iPhone installed base. By then, the number of Vision Pro users is expected to reach 20 million, equivalent to 15% of the MacBook installed base.
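As a rough check on the scale of these figures, the installed bases they imply can be worked out directly. The calculation below uses only the numbers quoted above; the resulting installed-base sizes are back-of-the-envelope inferences, not figures stated by Canalys, and the function name is hypothetical.

```python
def implied_base(units, share):
    """Installed base implied by `units` being `share` of it."""
    return units / share


# 12.6 million units is about 1% of the iPhone installed base.
iphone_base = implied_base(12_600_000, 0.01)
# 20 million users is about 15% of the MacBook installed base.
macbook_base = implied_base(20_000_000, 0.15)

print(f"Implied iPhone installed base: {iphone_base / 1e9:.2f} billion")
print(f"Implied MacBook installed base: {macbook_base / 1e6:.0f} million")
```

On these assumptions, the forecast implies an iPhone installed base of roughly 1.26 billion devices and a MacBook installed base of roughly 133 million.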
Although rumors have recently circulated online that some of the first Vision Pro buyers have returned their devices, Liu Jiansen told First Financial that the firm is maintaining its original demand forecast for Vision Pro.
