Google, a subsidiary of Alphabet Inc., is releasing an upgraded AI model. Dubbed Gemini 1.5 Pro, the model represents a notable advance in generative AI: it is built to handle larger amounts of text and video than rival models.
Gemini 1.5 Pro is slated to roll out to cloud customers and developers on Thursday, part of Google's ongoing push to demonstrate its strength in the fast-moving field of artificial intelligence.
Oriol Vinyals, a Google vice president and co-technical lead on Gemini, emphasized the importance of sharing the research that underpins the new approach. "We're excited to see what the world will make of the new capabilities tomorrow," Vinyals told reporters at a briefing.
Google says the mid-size Gemini 1.5 Pro performs comparably to the larger Gemini 1.0 Ultra model. The announcement follows the breakout success of OpenAI's ChatGPT in late 2022, which pushed Google to position itself as a leading innovator in cutting-edge generative AI technology that can produce text, graphics, and video in response to human prompts.
When Google debuted Gemini in December, it came in three versions tailored to different workloads and compatible with a range of devices, from smartphones to massive data centers. Responding to advances from OpenAI and Microsoft Corp., Google aims to draw users in with even more powerful tools.
Gemini 1.5 Pro sets itself apart with its capacity to process large volumes of data in response to user prompts and its ability to support faster, more efficient training. Google boasts that the model has the "longest context window" of any large-scale AI model, allowing it to process up to an hour of video, 11 hours of audio, or a document of more than 700,000 words. Google claims this lets it digest more data than the most recent AI models from OpenAI and Anthropic.
Google demonstrated Gemini 1.5 Pro's capabilities in a pre-recorded video. In one example, after ingesting a 402-page PDF transcript of the Apollo 11 moon landing, the model identified passages corresponding to "three funny moments." In another, it was given a rough sketch and asked to locate the matching scene in a 44-minute Buster Keaton film.
Google notes that, like any generative model, Gemini 1.5 Pro is not perfect despite its sophisticated features. It can still hallucinate, occasionally respond slowly, and struggle to interpret user intent, so users may need different prompting strategies to get accurate answers. Vinyals stressed that the model remains in an experimental, research stage and that work to optimize its performance is ongoing.
Developers can test Gemini 1.5 Pro in Google's AI Studio, while select cloud customers can access the model in private preview on the enterprise platform Vertex AI. Google also announced that its large-scale Gemini 1.0 Ultra is now available to a broader global customer base on Vertex AI.
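For developers experimenting outside the AI Studio web interface, access is also available programmatically. The sketch below is illustrative rather than official documentation: it shows how a long document, such as the Apollo 11 transcript mentioned above, might be sent to the model through Google's google-generativeai Python SDK. The model identifier "gemini-1.5-pro-latest", the file name, and the prompt text are assumptions for this example, and preview access may require an allowlisted account.

```python
# Minimal sketch: querying a long document with Gemini via the
# google-generativeai Python SDK. Model name, file path, and prompt
# are illustrative assumptions, not values from Google's announcement.
import google.generativeai as genai

# Configure the SDK with an API key obtained from Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# Instantiate the model; "gemini-1.5-pro-latest" is an assumed identifier.
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Load a long transcript (e.g., a plain-text export of the Apollo 11 PDF).
with open("apollo11_transcript.txt", encoding="utf-8") as f:
    transcript = f.read()

# Ask the model to surface comedic moments, relying on its long context window.
response = model.generate_content(
    ["Find three comedic moments in this mission transcript:", transcript]
)
print(response.text)
```

The same kind of prompt can also be tried interactively in AI Studio without writing any code.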