Google launches browser-navigating AI model amid agent race

TL;DR:

  • Google has released Gemini 2.5 Computer Use, an AI model that navigates web browsers through clicking, scrolling and typing
  • The model uses visual reasoning to analyse user requests and complete tasks like form submissions
  • Release follows OpenAI’s Dev Day announcements by one day, highlighting intensifying competition in AI agent capabilities

Google has unveiled Gemini 2.5 Computer Use, an AI model designed to navigate and interact with web browsers through visual understanding and reasoning capabilities. The model can analyse user requests and carry out tasks such as filling forms or navigating interfaces without requiring API access.

Context and Background

The model supports 13 actions including opening browsers, typing text, and drag-and-drop functionality. Developers can access Gemini 2.5 Computer Use through Google AI Studio and Vertex AI, with a demonstration environment available via Browserbase where users can observe the AI completing tasks like playing games or browsing forums.
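The cycle described above, in which the model looks at what is on screen and then chooses a click, keystroke or other action, can be pictured as a simple observe-then-act loop. The sketch below is only a toy simulation under that assumption. It does not call Gemini, Google AI Studio or Vertex AI. The `FakeBrowser` class, the `stub_model` policy and action names such as `open_browser` and `type_text` are hypothetical placeholders, not Google's actual action set or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Action:
    """Hypothetical action: a name plus arguments (not Google's API)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class FakeBrowser:
    """Toy stand-in for a browser holding a single web form."""
    is_open: bool = False
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; here we return a copy of visible state.
        return {"open": self.is_open, "fields": dict(self.fields),
                "submitted": self.submitted}


def execute(browser: FakeBrowser, action: Action) -> None:
    """Apply one hypothetical action to the toy browser."""
    if action.name == "open_browser":
        browser.is_open = True
    elif action.name == "type_text":
        browser.fields[action.args["field"]] = action.args["text"]
    elif action.name == "click_submit":
        browser.submitted = True
    else:
        raise ValueError(f"unsupported action: {action.name}")


def stub_model(task: dict, obs: dict) -> Optional[Action]:
    """Hypothetical stand-in for the model: picks the next action from the
    current observation, or returns None once the task looks complete."""
    if not obs["open"]:
        return Action("open_browser")
    for name, value in task.items():
        if obs["fields"].get(name) != value:
            return Action("type_text", {"field": name, "text": value})
    if not obs["submitted"]:
        return Action("click_submit")
    return None


def run_agent(task: dict, max_steps: int = 10) -> List[str]:
    """Observe, decide, act; stop when done or when max_steps is reached."""
    browser = FakeBrowser()
    log: List[str] = []
    for _ in range(max_steps):
        action = stub_model(task, browser.screenshot())
        if action is None:
            break
        execute(browser, action)
        log.append(action.name)
    return log
```

Calling `run_agent({"name": "Ada", "email": "ada@example.com"})` returns `['open_browser', 'type_text', 'type_text', 'click_submit']`. In a real system the stub would be replaced by a call to the model with a screenshot, and a step cap like `max_steps` is one simple guard against an agent looping indefinitely.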

Google reports the model “outperforms leading alternatives on multiple web and mobile benchmarks,” though the company acknowledges it is currently optimised for browser-level control rather than full desktop (operating-system-level) access. Earlier versions of this technology have powered features in Google’s AI Mode and the Project Mariner research prototype.

Looking Forward

The announcement positions Google in direct competition with other recent AI agent developments. OpenAI revealed new apps within ChatGPT during its Dev Day just 24 hours earlier, whilst Anthropic introduced a similar “computer use” capability for its Claude models in October 2024.

The browser-focused approach represents a strategic choice, enabling AI to interact with interfaces designed for human use whilst avoiding the complexity and security considerations of full system-level control. This could prove particularly valuable for UI testing and for accessing data unavailable through traditional APIs, though the model’s current restriction to the browser suggests further development lies ahead.
