TL;DR
The UK AI Safety Institute will present ten research papers at NeurIPS 2025, covering topics from model safeguard robustness to emerging loss-of-control risks. The institute is also hosting a full-day workshop on AI evaluation practices and participating in an Agents Safety Panel.
Comprehensive AI Safety Research Agenda
The UK AI Safety Institute (AISI) is showcasing a substantial body of work at the 39th Annual Conference on Neural Information Processing Systems (NeurIPS) in San Diego. The research spans critical areas of AI safety and security, grounded in empirical evidence.
Strengthening Model Safeguards
AISI’s Red Team has been stress-testing safeguards that prevent AI model misuse. Key findings include:
- Research demonstrating limitations in defending against malicious fine-tuning of language models through public APIs
- Collaborative work with EleutherAI showing that filtering harmful data during training is ten times more effective at resisting adversarial fine-tuning than post-training defences
Improving AI Evaluation Standards
A comprehensive review of 445 language model benchmarks found that popular benchmarks often fail to reliably measure the phenomena they are intended to measure. The paper offers eight key recommendations to address this problem. AISI also contributed to the Agentic Benchmark Checklist (ABC), which establishes best practices for building rigorous agentic benchmarks.
Understanding Emerging Risks
The institute has developed RepliBench, a dedicated benchmark tracking capabilities required for AI model self-replication, including resource acquisition and model weight exfiltration. The benchmark comprises 20 agent evaluations spanning 65 tasks, exploring the conditions under which replication-relevant behaviours might emerge.
Additionally, AISI partnered with Gray Swan AI to conduct the largest public red-teaming competition to date: 2,000 participants elicited over 60,000 policy violations across more than 40 realistic scenarios.
Looking Forward
This research represents critical groundwork for ensuring AI systems remain secure and controllable as capabilities advance. For UK businesses exploring AI adoption, understanding these safety considerations will become increasingly important for responsible implementation.
Source: UK AI Safety Institute