Google’s I/O 2024 event has been a hotbed of AI announcements, with the tech giant unveiling significant updates across its AI offerings. One of the most exciting revelations is the introduction of multimodal capabilities to Gemini Nano, Google’s on-device large language model (LLM).

Currently, Gemini Nano, a lightweight LLM designed for on-device AI tasks, can only process text inputs. With the addition of multimodal capabilities, however, Gemini Nano will be able to accept and process audio, images, and files in addition to text.

This advancement will enable Gemini Nano to gather contextual information from sounds, images, and spoken language, making it considerably more capable and useful. For instance, users will be able to ask Gemini Nano to extract information from YouTube videos or interpret diagrams and graphs.
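Google hasn’t published a developer API for multimodal Gemini Nano yet, but the existing cloud-facing Gemini Kotlin SDK already shows what a mixed image-and-text prompt looks like. Here’s a minimal sketch using that SDK’s content builder; the model name and API key are placeholders, and the idea that the on-device interface will follow a similar pattern is purely an assumption on my part:

```kotlin
import android.graphics.Bitmap
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

// Sketch only: the on-device Gemini Nano API is not public yet, so this
// uses the cloud-facing Gemini Kotlin SDK to illustrate the multimodal
// prompt pattern. Model name and API key are placeholders, not Nano.
suspend fun describeDiagram(diagram: Bitmap): String? {
    val model = GenerativeModel(
        modelName = "gemini-1.5-flash", // hypothetical stand-in for Nano
        apiKey = "YOUR_API_KEY"
    )
    // A single prompt can mix modalities: an image plus a text instruction.
    val response = model.generateContent(
        content {
            image(diagram)
            text("Explain what this diagram shows, step by step.")
        }
    )
    return response.text
}
```

If the shipping Nano API ends up looking anything like this, moving from the cloud model to the on-device one could be little more than a configuration change, but that’s speculation until Google documents it.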

While Google has announced that multimodal capabilities will be rolled out to Gemini Nano starting with Pixel phones later this year, the specifics of which Pixel models will receive the update and the exact timeline are yet to be revealed.

In addition to the multimodal capabilities, Google unveiled several other AI-powered features coming to Android devices. One notable addition is enhanced homework support in Circle to Search, which will let students work through word problems in subjects like math and physics by simply highlighting the problem on the screen to receive step-by-step instructions for solving it.

Moreover, Google is working on a more convenient Gemini overlay interface for Android. It will simplify tasks like dropping generated images into apps, and it will let users ask Gemini to extract information from YouTube videos or answer questions based on the contents of PDF files (the latter for Gemini Advanced subscribers).

Other upcoming AI-powered features include real-time scam detection during phone calls and multimodal support in TalkBack, which will provide richer image descriptions for visually impaired users, with or without a network connection.

As Google continues to push the boundaries of AI integration across its products and services, it’s clear that Pixel phones, and Android devices in general, are poised to become even smarter and more useful thanks to the multimodal capabilities of Gemini Nano and other AI-driven advancements. That said, I personally think Google needs to take it easy with the “AI” talk. Android Authority’s Rita El Khoury highlighted that Google said “Gemini” (counting “Gem” and “Gems” too) 170 times during its keynote. Phew! So much so that the folks at TechCrunch put together a quick recap of the event to make things easier for everyone.
