How Google is using its latest Gemini AI model to train robots to navigate the world
By Isha Sharma
At I/O 2024, Google showcased the multimodal capabilities of its Gemini 1.5 AI model. This means the large language model can accept photos, videos, and audio, along with text, as inputs, process that information, and generate responses. The company's AI unit is now leveraging this capability to train robots to navigate their surroundings.