OpenAI introduces GDPval: new evaluation for real-world AI performance
OpenAI has launched GDPval, an evaluation that measures AI model performance on economically valuable, real-world tasks across 44 occupations. According to OpenAI, the results indicate that today’s frontier models are already approaching the quality of work produced by industry experts on professional tasks.
Context and Background
GDPval departs from traditional AI benchmarks by focusing on actual work deliverables rather than synthetic academic tests. The evaluation encompasses 1,320 specialised tasks drawn from the nine industries that contribute most to U.S. GDP, covering occupations such as software development, legal work, nursing, and mechanical engineering.
Each task was crafted and vetted by experienced professionals averaging more than 14 years in their fields. Rather than consisting of a simple text prompt, each GDPval task comes with reference files and context, and the expected deliverables span documents, slides, diagrams, spreadsheets, and multimedia, mirroring real workplace scenarios.
The evaluation also highlights a large efficiency gap: frontier models completed the tasks roughly 100 times faster and 100 times cheaper than industry experts. Claude Opus 4.1 was the strongest performer among the models tested, excelling particularly on aesthetics such as document formatting and slide layout, whilst GPT-5 stood out for accuracy on tasks requiring domain-specific knowledge.
Looking Forward
The results suggest that AI models can already handle routine, well-specified professional tasks, potentially freeing human workers to focus on creative and judgement-heavy aspects of their roles. OpenAI plans to expand GDPval to include more occupations, industries, and interactive workflows that better reflect real-world complexity.
However, the current version is limited to one-shot evaluations, so it does not capture the iterative processes common in professional work, such as revising a document after client feedback or working through ambiguous requirements before settling on a solution approach.
Source Attribution:
- Source: OpenAI
- Original: https://openai.com/index/gdpval/
- Published: 29 September 2025