
Google announced voice and video search capabilities for Google Lens at I/O 2024 in May of this year. Now the company is bringing the features to Google Lens, where you can long-press the shutter button and ask a question directly with your voice. It makes searching much easier and more convenient.

The video search feature in Lens uses a custom Gemini model

Google is currently rolling out this feature in Search Labs on Android and iOS. However, as of now, the voice search feature in Lens is only available for English queries.

It surfaces an AI Overview and search results based on the video’s contents and your question. In a preview video released at the I/O event, Google showed how someone curious about fish at an aquarium can hold up their phone to the exhibit, open the Google Lens app, and long-press the shutter button.

When Lens starts recording, users can ask questions based on what they are seeing. In response to the question “Why are they swimming together?” Lens responded using Google Gemini.


The ability to search with a video lets you show your phone how objects are moving and ask questions about it, which makes Google Lens much more useful in certain scenarios. You can try the feature by joining the “AI Overviews and more” experiment in Search Labs.

For those curious about how the feature works, Rajan Patel, the vice president of engineering at Google, revealed that Google is capturing the video “as a series of image frames and then applying the same computer vision techniques” that Lens already uses. Google also revealed that the responses come from a custom Gemini model the company has designed to understand multiple frames in sequence. After processing the frames, the model draws on information from the web related to the topic to generate the response.
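To make the described pipeline concrete, here is a minimal sketch of the general idea: sample a video into evenly spaced image frames, run a per-frame analysis step, and hand the frame sequence plus the question to a sequence-aware model. All function names here are hypothetical illustrations, not Google's actual code, and the model and web-lookup steps are stubbed out.

```python
# Hypothetical sketch of the frame-based video search pipeline.
# Nothing here reflects Google's real implementation; the names are
# illustrative and the model/web steps are stubs.

def sample_frames(num_frames: int, fps: float, samples_per_second: float = 2.0):
    """Pick evenly spaced frame indices, e.g. two frames per second."""
    step = max(1, round(fps / samples_per_second))
    return list(range(0, num_frames, step))

def answer_question(frame_indices, question: str) -> str:
    # 1. Apply existing Lens-style computer vision to each sampled frame
    #    (stubbed as placeholder feature strings).
    features = [f"features(frame {i})" for i in frame_indices]
    # 2. A sequence-aware model would combine the per-frame features with
    #    related information from the web to generate a response (stubbed).
    return f"answer from {len(features)} frames for: {question!r}"

# A 3-second clip at 30 fps, sampled at 2 frames per second:
indices = sample_frames(num_frames=90, fps=30.0)
print(indices)  # [0, 15, 30, 45, 60, 75]
print(answer_question(indices, "Why are they swimming together?"))
```

The key design point, per Patel's description, is that no special video encoder is required on the capture side: the video is reduced to a sequence of still frames, and the novelty sits in a model that can reason over that sequence.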

In summary, this is a good use of existing technologies that results in a valuable addition to Google Lens.
