My research aims to help make machine learning robust and beneficial; I work on the safety and alignment of reinforcement learning agents. My current research is motivated by the following question:

*How can we design competitive and scalable machine learning algorithms
that make sequential decisions in the absence of a reward function?*

My current research direction centers on Recursive Reward Modeling, a scalable technique for training RL agents from human feedback. It recursively breaks down the evaluation of individual tasks until the resulting subtasks can be solved directly with reward modeling.

My publication list is available on Google Scholar.

Since joining DeepMind in 2016, I have been working on empirical research related to learning reward functions for deep reinforcement learning.

**Quantifying Differences in Reward Functions**

Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, and Jan Leike. International Conference on Learning Representations, 2021. **Spotlight.**

**Pitfalls of Learning a Reward Function Online**

Stuart Armstrong, Jan Leike, Laurent Orseau, and Shane Legg. International Joint Conference on Artificial Intelligence, 2021.

**Learning Human Objectives by Evaluating Hypothetical Behavior**

Siddharth Reddy, Anca D Dragan, Sergey Levine, Shane Legg, and Jan Leike. International Conference on Machine Learning, 2020. Blog post.

**Scalable agent alignment via reward modeling: a research direction**

Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg. 2018. Blog post. Video.

**Reward learning from human preferences and demonstrations in Atari**

Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, and Dario Amodei. Neural Information Processing Systems, 2018.

**Learning to understand goal specifications by modelling reward**

Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Pushmeet Kohli, and Edward Grefenstette. International Conference on Learning Representations, 2019.

**AI Safety Gridworlds**

Jan Leike, Miljan Martic, Victoria Krakovna, Pedro Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, and Shane Legg. 2017. Blog post. Video.

**Deep Reinforcement Learning from Human Preferences**

Paul F Christiano, Jan Leike, Tom B Brown, Miljan Martic, Shane Legg, and Dario Amodei. Neural Information Processing Systems, 2017. Blog post. Video.

I wrote my thesis on general reinforcement learning, that is, reinforcement learning in non-ergodic, partially observable environments. My most interesting results show that Bayesian reinforcement learning agents can misbehave drastically if given a bad prior and that Thompson sampling learns to act optimally in any environment; the thesis also gives a formal solution to an open problem in game theory. If this interests you, take a look at my short introduction to general reinforcement learning.

**Nonparametric General Reinforcement Learning**.

Jan Leike. PhD Thesis, 2016.

**Exploration Potential**.

Jan Leike. European Workshop on Reinforcement Learning, 2016.

**A Formal Solution to the Grain of Truth Problem**.

Jan Leike, Jessica Taylor, and Benya Fallenstein. Uncertainty in Artificial Intelligence, 2016.

**Thompson Sampling is Asymptotically Optimal in General Environments**.

Jan Leike, Tor Lattimore, Laurent Orseau, and Marcus Hutter. Uncertainty in Artificial Intelligence, 2016. **Best student paper award**.

**Loss Bounds and Time Complexity for Speed Priors**.

Daniel Filan, Jan Leike, and Marcus Hutter. AI & Statistics, 2016.

**On the Computability of Solomonoff Induction and Knowledge-Seeking**.

Jan Leike and Marcus Hutter. Algorithmic Learning Theory, 2015.

**Solomonoff Induction Violates Nicod’s Criterion**.

Jan Leike and Marcus Hutter. Algorithmic Learning Theory, 2015.

**Sequential Extensions of Causal and Evidential Decision Theory**.

Tom Everitt, Jan Leike, and Marcus Hutter. Algorithmic Decision Theory, 2015. Source code to the examples.

**On the Computability of AIXI**.

Jan Leike and Marcus Hutter. Uncertainty in Artificial Intelligence, 2015.

**Bad Universal Priors and Notions of Optimality**.

Jan Leike and Marcus Hutter. Conference on Learning Theory, 2015.

**A Definition of Happiness for Reinforcement Learning Agents**.

Mayank Daswani and Jan Leike. Artificial General Intelligence, 2015.

**Indefinitely Oscillating Martingales**.

Jan Leike and Marcus Hutter. Algorithmic Learning Theory, 2014.

During my Master’s degree at the University of Freiburg, I developed the termination analysis tool Ultimate LassoRanker together with Matthias Heizmann. This tool can automatically prove termination and nontermination properties of C programs. It won two second places and two first places in the termination category of SV-COMP from 2015 to 2018. The following papers are mostly related to that work.

**Geometric Nontermination Arguments**.

Jan Leike and Matthias Heizmann. Tools and Algorithms for the Construction and Analysis of Systems, 2018.

**Ranking Templates for Linear Loops**.

Jan Leike and Matthias Heizmann. Logical Methods in Computer Science, 2015.

**Geometric Series as Nontermination Arguments for Linear Lasso Programs**.

Jan Leike and Matthias Heizmann. International Workshop on Termination, 2014.

**Ranking Templates for Linear Loops**.

Jan Leike and Matthias Heizmann. Tools and Algorithms for the Construction and Analysis of Systems, 2014.

**Synthesis for Polynomial Lasso Programs**.

Jan Leike and Ashish Tiwari. Verification, Model Checking, and Abstract Interpretation, 2014. Source code to the experiments.

**Linear Ranking for Linear Lasso Programs**.

Matthias Heizmann, Jochen Hoenicke, Jan Leike, and Andreas Podelski. Automated Technology for Verification and Analysis, 2013.

**Ranking Function Synthesis for Linear Lasso Programs**.

Jan Leike. Master’s Thesis. University of Freiburg, 2013.

Although I take great care when polishing a paper, sometimes technical errors remain. Please see my list of errata. If you find a mistake not listed there, please let me know!