TL;DR: Google has released Gemini 3, which sets a record 1501 Elo on LMArena and scores 37.5% on Humanity’s Last Exam. The release includes the Gemini 3 Deep Think mode for enhanced reasoning, Google Antigravity, a new agentic development platform, and immediate availability across Search, the Gemini app, AI Studio, and Vertex AI. The model also excels at multimodal understanding, scoring 81% on MMMU-Pro, and tops WebDev Arena with 1487 Elo.

Google has unveiled Gemini 3, marking what CEO Sundar Pichai describes as “a new era of intelligence” that combines advanced reasoning, multimodal understanding, and agentic capabilities. The model represents a significant leap from Gemini 2.5 Pro, which topped LMArena for over six months.

Gemini 3 Pro achieves state-of-the-art performance across major AI benchmarks, including 91.9% on GPQA Diamond for PhD-level reasoning and 23.4% on MathArena Apex for advanced mathematics. Beyond text, the model demonstrates exceptional multimodal reasoning with 81% on MMMU-Pro and 87.6% on Video-MMMU.

“It’s state-of-the-art in reasoning, built to grasp depth and nuance — whether it’s perceiving the subtle clues in a creative idea, or peeling apart the overlapping layers of a difficult problem,” writes Pichai. The model is designed to understand context and intent with less prompting, representing what Google characterizes as evolution “from simply reading text and images to reading the room.”

Deep Think Mode and Agentic Development

Gemini 3 Deep Think mode pushes reasoning capabilities further, achieving 41.0% on Humanity’s Last Exam (without tools) and 45.1% on ARC-AGI-2, a benchmark designed to measure the ability to solve novel problems. The enhanced mode will initially be available to safety testers before rolling out to Google AI Ultra subscribers.

Google has introduced Antigravity, a new agentic development platform that turns AI assistance from a toolkit into an active partner. Powered by Gemini 3’s advanced reasoning and tool-use capabilities, Antigravity agents have direct access to the editor, terminal, and browser, autonomously planning and executing complex end-to-end software tasks while validating their own code.

The platform integrates the Gemini 2.5 Computer Use model for browser control and Nano Banana (Gemini 2.5 Flash Image) for image editing, enabling agents to independently plan, write application code, and validate execution through browser-based computer use.

Superior Coding and Long-Horizon Planning

Gemini 3 demonstrates exceptional coding capability, topping the WebDev Arena leaderboard with 1487 Elo and achieving 76.2% on SWE-bench Verified. It also scores 54.2% on Terminal-Bench 2.0, which tests a model’s ability to operate a computer through the terminal.

For long-horizon planning, Gemini 3 Pro tops Vending-Bench 2, sustaining consistent tool use and decision-making across a full simulated year of running a vending machine business. This improved planning enables more reliable multi-step workflows, from booking local services to organizing inboxes.

“This means Gemini 3 can better help you get things done in everyday life,” notes the announcement. Google AI Ultra subscribers can access these agentic capabilities through Gemini Agent in the Gemini app, with expansion to more Google products planned.

Immediate Availability and Safety Focus

Gemini 3 launches today across Google’s ecosystem, including the Gemini app, AI Mode in Search, AI Studio, Vertex AI, and Gemini CLI. Third-party platform integrations include Cursor, GitHub, JetBrains, Manus, and Replit.
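
For developers, availability in AI Studio and Vertex AI means the model can be reached through the Google Gen AI SDK. Below is a minimal sketch, assuming a model identifier of "gemini-3-pro-preview"; that ID is an assumption here, and the exact name should be confirmed in AI Studio or the Vertex AI model catalog.

```python
from google import genai

# Minimal sketch using the google-genai Python SDK (pip install google-genai).
# The model ID "gemini-3-pro-preview" is an assumption; confirm the exact
# identifier in AI Studio or the Vertex AI model catalog before relying on it.
client = genai.Client(api_key="YOUR_API_KEY")  # or set GEMINI_API_KEY in the environment

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Summarize the trade-offs between depth-first and breadth-first search.",
)
print(response.text)
```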

The model has undergone what Google describes as its most comprehensive safety evaluations to date, showing reduced sycophancy, increased resistance to prompt injection, and stronger protection against misuse for cyberattacks. Google partnered with leading subject-matter experts and provided early access to bodies such as UK AISI for independent assessment.

With AI Overviews now reaching 2 billion users monthly and the Gemini app surpassing 650 million monthly users, the Gemini 3 release represents Google’s push toward AGI through what it describes as a “differentiated full stack approach to AI innovation.”

Source: Google Blog
