Dewi S.W. Gould

I have recently joined the new Arcadia Impact Alignment Project (in collaboration with the UK AISI alignment team). My research is motivated by understanding and reducing risks from increasingly capable AI systems.

Previously I…

Was an Astra Fellow at Redwood Research, working on general-purpose elicitation and no-CoT time horizons.
Was a post-doc at the Alan Turing Institute on Project Bluebird, working on synthetic scenario generation for autonomous air-traffic control. I gave a Pint of Science talk about it.
Did a PhD in mathematical physics at the University of Oxford. My thesis was on Generalized Symmetries in String Theory Realizations of Quantum Field Theories. I wrote a blog post about my research.

news

May 11, 2026	Started new role as a researcher on the Arcadia Impact Alignment Project in London, UK.
May 01, 2026	Our paper “A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior” was accepted to ICML 2026.
Apr 27, 2026	Our work “A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior” was presented as a poster at the ICLR 2026 Trustworthy-AI Workshop.

recent research

A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior

H. Mayne, J. S. Kang, Dewi S. W. Gould, and 3 more authors

In Proceedings of the 43rd International Conference on Machine Learning (ICML), 2026

SKATE, a Scalable Tournament Eval: Weaker LLMs differentiate between stronger ones using verifiable challenges

Dewi S. W. Gould, B. Mlodozeniec, and S. F. Brown

arXiv preprint, 2025

PAC Apprenticeship Learning with Bayesian Active Inverse Reinforcement Learning

O. Bajgar, Dewi S. W. Gould, J. Liu, and 3 more authors

In Reinforcement Learning Conference, 2025

Earlier version at NeurIPS 2024 Workshop on Bayesian Decision-making and Uncertainty