Google has released a new AI model called Gemini 2.5 Computer Use. The model allows AI agents to interact with websites and user interfaces the way a human would. It is now available in public preview via the Gemini API on Google AI Studio and Vertex AI.

Gemini 2.5 Computer Use

The model builds on Gemini 2.5 Pro’s visual understanding and reasoning capabilities. It can perform a wide range of browser-based actions, such as clicking, typing, scrolling, hovering, opening dropdowns, and navigating to URLs. Google says the model outperforms competing tools on several benchmarks, including Online-Mind2Web, WebVoyager, and AndroidWorld, while maintaining lower latency.

Unlike conventional integrations, in which an AI model works with software through structured APIs, Gemini 2.5 Computer Use processes screenshots of web interfaces and generates specific UI actions in response. The agent receives a task prompt, a screenshot of the digital environment, and a history of recent actions. It then analyzes the interface and returns a UI action, such as clicking a button or typing into a field. The client executes that action and sends a new screenshot back to the model, and the loop repeats until the task is finished.
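
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's client code: `UIAction`, `run_agent_loop`, `FakeBrowser`, and `scripted_model` are hypothetical names, the action names are made up for the example, and the "model" is a hard-coded stand-in rather than a call to the Gemini API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UIAction:
    """A single UI step proposed by the model (action names are illustrative)."""
    name: str
    args: dict = field(default_factory=dict)


def run_agent_loop(
    task: str,
    propose_action: Callable[[str, bytes, List[UIAction]], UIAction],
    execute_action: Callable[[UIAction], None],
    take_screenshot: Callable[[], bytes],
    max_steps: int = 10,
    history_size: int = 5,
) -> List[UIAction]:
    """Screenshot -> model proposes action -> client executes -> repeat."""
    history: List[UIAction] = []
    screenshot = take_screenshot()
    for _ in range(max_steps):
        # The model sees the task, the current screenshot, and recent actions.
        action = propose_action(task, screenshot, history[-history_size:])
        if action.name == "done":
            break
        execute_action(action)          # executed on the client side
        history.append(action)
        screenshot = take_screenshot()  # fresh screenshot for the next turn
    return history


class FakeBrowser:
    """Hypothetical stand-in for a real browser, used only for this demo."""

    def __init__(self) -> None:
        self.text = ""
        self.submitted = False

    def screenshot(self) -> bytes:
        # A real client would return image bytes; here the state is encoded as text.
        return f"text={self.text!r};submitted={self.submitted}".encode()

    def execute(self, action: UIAction) -> None:
        if action.name == "type":
            self.text += action.args["text"]
        elif action.name == "click":
            self.submitted = True


def scripted_model(task: str, screenshot: bytes, history: List[UIAction]) -> UIAction:
    """Hard-coded policy standing in for the model (an assumption for the demo)."""
    state = screenshot.decode()
    if "text=''" in state:
        return UIAction("type", {"text": "hello"})
    if "submitted=False" in state:
        return UIAction("click", {"target": "submit"})
    return UIAction("done")


browser = FakeBrowser()
steps = run_agent_loop("Fill in the form and submit it",
                       scripted_model, browser.execute, browser.screenshot)
print([step.name for step in steps])
```

In a real integration, `propose_action` would wrap a request to the model and `execute_action` would drive a browser automation tool. The `max_steps` cap is a design choice added here so that a confused agent cannot loop forever.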

Google demonstrated the model’s performance with examples that show the agent sorting sticky notes on a digital whiteboard and transferring pet details from one website to a CRM system. The demo videos are sped up rather than shown in real time.

The model currently supports 13 actions and works best in web browsers. Google said it is not yet optimized for desktop OS-level tasks, though it has shown potential on mobile benchmarks.

Google has also implemented safety measures to prevent misuse. Each action proposed by the model is reviewed by a safety service before execution. Developers can restrict certain actions or require explicit user confirmation for high-risk tasks like financial transactions.
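
To make the idea of restricting actions and requiring confirmation concrete, here is a minimal client-side sketch. It does not reproduce Google's safety service or any real API: the `Decision` enum, the `BLOCKED_ACTIONS` set, the keyword list, and the `review_action` and `gate` functions are all hypothetical, and a keyword match is a deliberately crude stand-in for real risk detection.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"   # needs explicit user approval
    BLOCK = "block"       # never executed


# Hypothetical developer policy: both collections are assumptions for illustration.
BLOCKED_ACTIONS = {"drag_and_drop"}
HIGH_RISK_KEYWORDS = ("pay", "purchase", "transfer")


def review_action(name: str, target_text: str = "") -> Decision:
    """Classify a proposed action before the client executes it."""
    if name in BLOCKED_ACTIONS:
        return Decision.BLOCK
    lowered = target_text.lower()
    if any(keyword in lowered for keyword in HIGH_RISK_KEYWORDS):
        return Decision.CONFIRM
    return Decision.ALLOW


def gate(name: str, target_text: str, confirm: Callable[[str], bool]) -> bool:
    """Return True only if the action may be executed."""
    decision = review_action(name, target_text)
    if decision is Decision.BLOCK:
        return False
    if decision is Decision.CONFIRM:
        # Ask the user; `confirm` is supplied by the application.
        return confirm(f"Allow '{name}' on '{target_text}'?")
    return True


# Usage: a user who declines every prompt.
always_decline = lambda prompt: False
print(gate("click", "Search", always_decline))
print(gate("click", "Pay now", always_decline))
print(gate("drag_and_drop", "Sticky note", always_decline))
```

The point of the sketch is the ordering: hard restrictions are checked first, then high-risk actions are escalated to the user, and only the remainder runs unattended.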

Several internal Google teams are already using the model in production. It supports UI testing and automation tasks across platforms such as Search and Firebase. External developers in the early access program have used the model to build workflow automation and assistant tools.

Developers can start using the model through Google AI Studio or Vertex AI. Google also provides a demo environment via Browserbase for testing and experimentation.
