Google’s I/O 2024 event has been a hotbed of AI announcements, with the tech giant unveiling several significant updates to its AI offerings. One of the most exciting revelations is the introduction of multimodal capabilities to Gemini Nano, Google’s on-device large language model (LLM).
Currently, Gemini Nano, a lightweight model designed for on-device AI tasks, can only process text inputs. With the addition of multimodal capabilities, however, it will be able to accept and process audio, images, and files in addition to text.
This advancement will enable Gemini Nano to gather contextual information from various sources, including sounds, images, and spoken language, significantly enhancing its capabilities and usefulness. For instance, users will be able to ask Gemini Nano to extract information from YouTube videos or interpret diagrams and graphs, unlocking a whole new realm of possibilities.
Coming to Pixel later this year, we’ll be introducing our latest model, Gemini Nano with Multimodality.
This means your phone will not just be able to process text input but also understand more information in context like sights, sounds and spoken language. #GoogleIO pic.twitter.com/1yTujAl1W7
— Made by Google (@madebygoogle) May 14, 2024
While Google has announced that multimodal capabilities will be rolled out to Gemini Nano starting with Pixel phones later this year, the specifics of which Pixel models will receive the update and the exact timeline are yet to be revealed.
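To make the idea of on-device multimodal prompting a bit more concrete, here is a minimal Kotlin sketch. The types and names below (PromptPart, OnDeviceModel, FakeNanoModel) are hypothetical stand-ins invented purely for illustration; they are not Google's actual Gemini Nano APIs, which Google has not detailed for this feature. The point is simply that one request can mix modalities, and that inference happens on the device rather than in the cloud.

```kotlin
// Purely illustrative: these types are NOT the real Gemini Nano SDK.
// A prompt is a list of parts, each of which can be a different modality.
sealed interface PromptPart {
    data class Text(val text: String) : PromptPart
    data class Image(val bytes: ByteArray, val mimeType: String = "image/png") : PromptPart
    data class Audio(val bytes: ByteArray, val mimeType: String = "audio/wav") : PromptPart
}

// Stand-in for an on-device model runtime; a real implementation would load
// model weights and run inference locally, never sending the inputs off the phone.
interface OnDeviceModel {
    fun generate(parts: List<PromptPart>): String
}

// Stub so the sketch actually runs end to end.
class FakeNanoModel : OnDeviceModel {
    override fun generate(parts: List<PromptPart>): String {
        val modalities = parts.joinToString { it::class.simpleName ?: "?" }
        return "(stub) response to a prompt containing: $modalities"
    }
}

fun main() {
    val model: OnDeviceModel = FakeNanoModel()
    // The core idea behind "multimodal" input: text and an image (or audio clip)
    // are combined into a single request, so the model gets context from both.
    val answer = model.generate(
        listOf(
            PromptPart.Text("Summarize the trend shown in this chart in one sentence."),
            PromptPart.Image(bytes = ByteArray(0)) // placeholder image bytes
        )
    )
    println(answer)
}
```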
In addition to the multimodal capabilities, Google unveiled several other AI-powered features coming to Android devices. One notable addition is enhanced homework support in Circle to Search: students will be able to circle a word problem in subjects like math or physics directly on the screen and receive step-by-step guidance for solving it.
Circle to Search can now help with homework—directly from your Pixel phone or tablet.
When you circle the exact part of a prompt you're stuck on, you'll get step-by-step guidance to solve physics word problems without leaving your digital info sheet or syllabus. #GoogleIO pic.twitter.com/Fsmtcu7emn
— Made by Google (@madebygoogle) May 14, 2024
Moreover, Google is working on a more convenient Gemini overlay interface for Android. It will simplify tasks like dropping generated images into apps, and it will let users ask Gemini to extract information from YouTube videos or answer questions about the contents of PDF files (the latter for Gemini Advanced subscribers).
Other upcoming AI-powered features include real-time scam detection during phone calls and multimodal support in TalkBack, which will help better describe images to the visually impaired, with or without a network connection.
Gemini Nano’s multimodal capabilities are coming to TalkBack later this year.
People who experience blindness or low vision will get richer & clearer details of what’s happening in an image—whether it’s about a photo in a text or style of clothes when shopping online. #GoogleIO pic.twitter.com/JIs2DYhkg4
— Made by Google (@madebygoogle) May 14, 2024
As Google continues to push the boundaries of AI integration across its products and services, it’s clear that Pixel phones, and Android devices in general, are poised to become even smarter and more useful thanks to Gemini Nano’s multimodal capabilities and other AI-driven advancements. That said, I personally think Google needs to take it easy with the “AI” talk. Android Authority’s Rita El Khoury highlighted that Google said “Gemini” (including “Gem” and “Gems”) 170 times during its keynote. Phew! So much so that the folks at TechCrunch had to make things easier for everyone with a quick recap of the event:
In case you missed today's #GoogleIO keynote presentation, we summed it up for you pic.twitter.com/TdMDTSmc88
— TechCrunch (@TechCrunch) May 14, 2024