About me
I co-lead the Superalignment Team at OpenAI, where I’ve been involved in the development of InstructGPT, ChatGPT, and the alignment of GPT-4. I developed OpenAI’s approach to alignment research and co-authored the Superalignment Team’s research roadmap. Prior to OpenAI, I was an alignment researcher at DeepMind, where I prototyped reinforcement learning from human feedback. I hold a PhD in reinforcement learning theory from the Australian National University. In 2023, TIME magazine listed me as one of the 100 most influential people in AI.
My Research
My research aims to solve the hard problem of alignment:
How can we train AI systems to follow human intent on tasks that are difficult for humans to evaluate directly?
My team at OpenAI is researching how to align an automated alignment researcher. Our projects include scalable oversight, easy-to-hard generalization, automated interpretability, and model organisms, among others.
Selected Publications
- Language models can explain neurons in language models
Steven Bills, Nick Cammarata, Dan Mossing, Henk Tillman, Leo Gao, Gabriel Goh, Ilya Sutskever, Jan Leike, Jeff Wu, William Saunders. 2023.
- Self-critiquing models for assisting human evaluators
William Saunders, Catherine Yeh, Jeff Wu, Steven Bills, Long Ouyang, Jonathan Ward, Jan Leike. 2022.
- Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe. Neural Information Processing Systems, 2022.
- Scalable agent alignment via reward modeling: a research direction
Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg. 2018.
- Deep Reinforcement Learning from Human Preferences
Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei. Neural Information Processing Systems, 2017.
- Nonparametric General Reinforcement Learning
Jan Leike. PhD Thesis, supervised by Marcus Hutter, 2016.
For more details, see my publication list and my Google Scholar profile.