One killer move, or everything at once? Google's developer conference launches 22 releases in a row to counter OpenAI
胡胡胡美丽_ss
Posted on 2024-5-15 20:20:13
Faced with OpenAI's precisely timed "sniper" release, Google introduced 22 new features and products in a row at its annual I/O developer conference in the early hours of May 15th Beijing time, hoping that a "blossom on all fronts" strategy would win back the user attention OpenAI had captured.
By contrast, on May 14th OpenAI showcased the stunning interactive capabilities of GPT-4o in a 26-minute live stream. The keynote at Google's developer conference ran 1 hour and 52 minutes, with product-line leaders taking turns demonstrating Google's capabilities in intelligent assistants, video generation, image generation, music creation, AI search, and more. In all, as many as 22 new features and upgrades were unveiled.
A reporter from Beijing News Shell Finance watched the entire keynote and found that Google launched many impressive new features and concepts: Project Astra, an intelligent assistant that answers questions about what its owner sees through a phone camera or AR glasses; Veo, a video model benchmarked against Sora; and new AI search capabilities such as the Ask Photos feature and the direct integration of Gemini into the underlying Android architecture.
However, as a veteran search engine and the previous AI leader, Google has not forgotten its "original calling" of search. Liz Reid, head of Google's search business, demonstrated a series of new features combining search and AI on stage, summing them up with the line "just ask": "Google can help you search, investigate, plan, brainstorm... all you need to do is ask."
AI assistant Astra can solve problems and find things through the camera, but only in a pre-recorded video demo
At the press conference, Demis Hassabis, co-founder and CEO of DeepMind, played a video. In it, a tester holding a phone or wearing prototype AR glasses "looks" at the surrounding scene while quizzing the Google AI assistant, for example: "Tell me when you see something that can make a sound." Project Astra, the intelligent assistant built on the Gemini large model, answers fluently: "This is a speaker." The tester then drew a red arrow on screen pointing at part of the black speaker: "What is this part called?" "That is the tweeter."
In this demonstration, the Google AI assistant performed on par with a human expert. Even when the tester pointed the camera out of the window, the assistant immediately identified the location: "This appears to be the King's Cross area of London." It could also understand drawings and diagrams, for instance offering advice on a system flowchart sketched on a whiteboard: "Adding a cache between the server and the database could improve speed."
Demis said that Project Astra is the prototype of the universal AI assistant he has looked forward to for decades, and a preview of the future of general-purpose AI. "The AI personal assistant processes information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall."
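The "timeline of events plus cache for recall" idea Demis describes can be illustrated with a toy sketch. Everything here (the class name, the event shape, the cache size) is invented for illustration and has nothing to do with Astra's actual implementation:

```python
import time
from collections import deque

class EventTimeline:
    """Toy event timeline: merges video-frame and speech events into one
    chronological stream and caches a bounded window for later recall."""

    def __init__(self, max_events=1000):
        # Bounded cache: the oldest events are dropped automatically.
        self.events = deque(maxlen=max_events)

    def add(self, kind, description, timestamp=None):
        # kind is e.g. "frame" or "speech"; description is what was seen or heard.
        self.events.append({
            "t": time.time() if timestamp is None else timestamp,
            "kind": kind,
            "desc": description,
        })

    def recall(self, keyword):
        # Return the most recent cached event mentioning the keyword,
        # mimicking questions like "where did you last see my glasses?"
        for event in reversed(self.events):
            if keyword.lower() in event["desc"].lower():
                return event
        return None

timeline = EventTimeline()
timeline.add("frame", "desk with red glasses next to an apple", timestamp=1.0)
timeline.add("speech", "user asks about the speaker", timestamp=2.0)
timeline.add("frame", "whiteboard with a system flowchart", timestamp=3.0)

hit = timeline.recall("glasses")
print(hit["desc"])  # prints: desk with red glasses next to an apple
```

The bounded deque stands in for the cache of encoded frames; a real system would store embeddings rather than text descriptions and retrieve by similarity rather than keyword match.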
Google CEO Sundar Pichai said that Google plans to add Astra capabilities to the Gemini app and other products starting this year. He also emphasized that although the ultimate goal is to bring Astra's capabilities seamlessly into Google's software, the product will be rolled out cautiously and "the path to commercialization will be driven by quality."
However, Astra did not appear to match the ability to understand user emotions that GPT-4o had demonstrated the previous day, and while OpenAI's event was a live demonstration, Astra's capabilities were shown only on video. For his part, Demis firmly stated that the demonstration video was neither staged nor edited.
Pichai said that Project Astra's multimodal chat capabilities will come to the Gemini chatbot later this year.
Gemini 1.5 Pro upgraded: context window doubles from 1 million to 2 million tokens
Behind Google's smart assistant, the Gemini large model has also been upgraded. At the developer conference, Pichai announced a major update to Gemini 1.5 Pro: Google has increased its context length from 1 million tokens (the units in which models consume text) to 2 million tokens. The upgrade greatly expands how much data the model can process in a single request, making it better suited to complex, data-heavy tasks.
The upgraded Gemini 1.5 Pro shows significant improvements across multiple public benchmarks, with particularly strong performance in image and video understanding. The model can not only understand text but also accurately interpret the information in images and videos.
It is understood that Gemini 1.5 Pro can reason over video and audio uploaded in Google AI Studio. In addition, Google has integrated 1.5 Pro into its products, such as Gemini Advanced and the Workspace apps. On pricing, Gemini 1.5 Pro charges $3.50 per 1 million tokens.
Google also launched Gemini 1.5 Flash, a model optimized for speed and efficiency: it is the fastest Gemini model available through the API (interface). Tuned for large-scale, high-frequency tasks, it offers a more cost-effective service while retaining a 1-million-token long-context window.
Google announced that Gemini 1.5 Pro will be opened to developers worldwide, meaning both professional developers and hobbyists can explore and build on this powerful model.
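Taken together, the article's figures (a 2-million-token context and a quoted price of $3.50 per million tokens) allow a back-of-the-envelope cost estimate. This is purely an illustration of the arithmetic; actual Gemini billing distinguishes input from output tokens and has prompt-size tiers not mentioned here:

```python
# Price as quoted in the article for Gemini 1.5 Pro; real billing is
# more granular (input vs. output tokens, prompt-size tiers).
PRICE_PER_MILLION_TOKENS_USD = 3.5

def estimate_cost(tokens: int) -> float:
    """Cost in USD for processing `tokens` tokens at the quoted flat rate."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD

# Filling the full upgraded 2-million-token context once:
print(f"${estimate_cost(2_000_000):.2f}")  # prints: $7.00
```

So a single request using the entire doubled context window would cost on the order of seven dollars at the quoted rate, which puts the "2 million tokens" headline number in practical terms.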
Text-to-everything? Flexing muscle across video, image, and music generation
Beyond answering the new assistant features OpenAI had launched the day before, Google also showcased a series of generative AI models: Veo, a text-to-video model benchmarked against Sora; Music AI Sandbox, an AI music-creation tool benchmarked against Suno; and Imagen 3, Google's highest-quality text-to-image model.
The most anticipated of these was Google's text-to-video model: when Demis brought up the Veo logo, the audience erupted in the most enthusiastic applause of the event.
Demis introduced Veo as the culmination of Google's years of work on video generation, including technologies such as the Generative Query Network developed over the years. From a single text, image, or video prompt, Veo can generate and edit high-quality 1080p videos in a range of visual styles lasting over 70 seconds, and the length can be extended further.
The Veo-generated video Google showed at the press conference is a sequence of shots of a car driving from a cyberpunk-style night into a modern, realistic daytime scene. The footage is relatively blurry in the dark sections and sharp and high-quality in the daylight sections. However, the Shell Finance reporter noticed that for most of the clip the camera stays fixed on the rear of the car; compared with Sora, which showed more shots from varied angles, Veo's output appeared somewhat less polished.
According to a promotional video, a film director has already used Veo. "Veo helps us turn inspiration into reality," the director said. "Artificial intelligence can help us quickly spot flaws in our ideas and correct them, improving efficiency." Google said that with a deep understanding of natural language and visual semantics, the Veo model has made breakthroughs in understanding video content, rendering high-definition images, and simulating physical principles, so that the videos it generates can express the user's creative intent accurately and in detail.
Starting May 15th, Google will offer a preview of Veo to select creators in VideoFX, and other creators can join a waitlist. Google will also bring some Veo features to products such as YouTube Shorts.
Notably, responding to earlier reports that OpenAI had trained the Sora model on YouTube video content (YouTube is owned by Google), Pichai said that if Google confirms those reports, it will "resolve the problem."
"All you need to do is ask"
Pichai said in his speech that one of the most exciting transformations Gemini brings is to Google Search. "One of our biggest areas of investment and innovation is our founding product, Search." He recalled that Google created Search 25 years ago; now, in the Gemini era, Search is reaching a new level.
Pichai demonstrated a new feature called "Ask Photos" on stage. A user paying in a parking lot who has forgotten their license plate number would once have had to search keywords in their phone's photos and scroll through years of pictures to find it. Now Google Photos is smart enough to work out which car is theirs, based on where it was photographed, how often it appears in photos over the years, and other signals, and to return the actual plate number in a text reply along with the photo that confirms it.
Another new feature is AI Overview, which, compared with traditional search-engine results, presents users with a complete answer including viewpoints, insights, and links. Users type a question into the search box to receive an AI-summarized answer, and the feature can handle very long queries.
For example, a user looking for a suitable yoga or Pilates studio must weigh time, price, and distance all at once. AI search can extract and integrate that information and present it in the AI search overview, ultimately showing discount details for the best yoga studios in Boston and the walking time from home, saving the user hours. The feature also applies to planning trips, gatherings, and meal schedules.
Pichai said Google's AI search overview has three unique advantages: real-time information, Google's ranking and quality systems, and the capabilities of the Gemini models. The AI Overview feature will roll out gradually to users in the United States and then in other countries.
Google will also soon launch a video search feature. Rose Yao, Vice President of Search Products, demonstrated on stage how to film a malfunctioning record player with a phone camera and then ask Google about it, receiving answers about what was broken and how to repair it.
Notably, as the developer of Android, Google has declared its intention to do "system-level AI," meaning Gemini built into the foundation of the Android system. With Gemini running at the system level, users will not need to install any AI apps; the related capabilities will be available directly in the mobile operating system.
For example, while a user is watching a video, the phone can pop up a prompt asking whether they want to know more about it; when the user asks about details in the video, Gemini can find the answer directly from the video itself.
Google took pains to emphasize that these experiences are available only on Android phones, a seemingly pointed contrast with OpenAI, which used Apple phones and computers for its demonstrations. The "battle of immortals" between Google and OpenAI will thus extend to the operating-system front.
However, Pichai also said in a post-event interview that Google does not rule out maintaining its partnership with Apple. "We have always been committed to providing an excellent experience for the Apple ecosystem, and I believe we have many ways to make sure our products are accessible. Today we see that AI Overview has become a popular feature on iOS, so we will keep working at it."
CandyLake.com is an information publishing platform and provides information storage space services only.
Disclaimer: The views in this article are solely the author's own; they do not represent the position of CandyLake.com and do not constitute advice. Please treat them with caution.