We’re proud to highlight Google Research’s contributions to improving Clear Calling, the background noise reduction feature on Pixel, which can now handle full-band audio & is powered by an audio-to-audio ML model that was optimized to run at low latency on Google Tensor. pic.twitter.com/adCJzfQUrM
— Google AI (@GoogleAI) December 14, 2023
Clear Calling, one of the exclusive features of modern Pixel phones, just got better this month: a collaboration with Google's Research team has brought a new audio-to-audio ML model that enables full-band audio handling.
Google Research's new audio-to-audio ML model now powers 'Clear Calling' on Pixel devices
Clear Calling, available on the Pixel 7 series onwards (including the Pixel Fold), offers noise cancellation during calls in noisy environments to make voices much clearer.
From the beginning, Clear Calling has been powered by a machine learning (ML) model. This month, however, Google announced a major improvement resulting from its work with the Research team: a new audio-to-audio ML model optimized for full-band audio processing, which should make noise cancellation even more effective during calls.
That said, the official post does not make it clear whether the improved Clear Calling has already been implemented on Pixel phones or is set to roll out in an upcoming update. The latest December Feature Drop patch did not mention improved Clear Calling in its changelog, so we may have to wait a bit to find out how it behaves on a day-to-day basis.
Google working on Pixel Aligned Language Models
In addition to the above, Google is working on Pixel Aligned Language Models, which should improve the image-understanding capabilities of its AI-powered services by localizing content at the pixel level, allowing multiple possible outputs for a given object or region in an image.
Although it may sound relatively complex (and its inner workings surely are), the company published a simple example to illustrate what it is aiming for. In the short video below, it shows a picture of a cat touching a TV screen with its paw:
Google announces Pixel Aligned Language Models
paper page: https://t.co/mA8pGX6Yp2
Large language models have achieved great success in recent years, so as their variants in vision. Existing vision-language models can describe images in natural languages, answer visual-related… pic.twitter.com/LUJpseQJuD
— AK (@_akhaliq) December 15, 2023
In this case, the model they are working on can identify more than one situation in the same image, increasing its 'understanding' capabilities. For example, it detected both a cat raising its paw to touch something and a dog on a TV screen.
There is still no specific information about integrating this model into Gemini Pro or Gemini Ultra. However, continuous developments like this demonstrate the company's current focus on fully exploring what AI models can offer to improve its products.