Section outline
- Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, Free online version
- David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, Free online version
- CJCH Watkins, P Dayan, Q-learning, Machine Learning, 1992, PDF
- Mnih et al, Human-level control through deep reinforcement learning, Nature, 2015, PDF
- van Hasselt et al, Deep Reinforcement Learning with Double Q-learning, AAAI, 2016, PDF
- Wang et al, Dueling Network Architectures for Deep Reinforcement Learning, ICML, 2016, PDF
- Schaul et al, Prioritized Experience Replay, ICLR, 2016, PDF
- Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 1992, PDF
- Sutton et al, Policy gradient methods for reinforcement learning with function approximation, NIPS, 2000, PDF
- Peters & Schaal, Reinforcement learning of motor skills with policy gradients, Neural Networks, 2008, PDF
- Mnih et al, Asynchronous methods for deep reinforcement learning, ICML, 2016, PDF
- Lillicrap et al, Continuous control with deep reinforcement learning, ICLR, 2016, PDF
- Gu et al, Q-Prop: sample-efficient policy gradient with an off-policy critic, ICLR, 2017, PDF
- Schulman et al, Trust Region Policy Optimization, ICML, 2015, PDF
- Duan et al, Benchmarking Deep Reinforcement Learning for Continuous Control, ICML, 2016, PDF
- Kocsis and Szepesvari, Bandit based Monte-Carlo planning, ECML, 2006, PDF
- Gelly and Silver, Combining Online and Offline Knowledge in UCT, ICML, 2007, PDF
- Silver et al, Mastering the game of Go with deep neural networks and tree search, Nature, 2016, Online
- Silver et al, Mastering the game of Go without human knowledge, Nature, 2017, Online
- Auer et al, Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, 2002, PDF
- Dudik et al, Efficient Optimal Learning for Contextual Bandits, UAI, 2011, PDF
- Agarwal et al, Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, ICML, 2014, PDF
- Lihong Li, Generalized Thompson Sampling for Contextual Bandits, PDF
- Russo et al, A tutorial on Thompson Sampling, PDF
- Stadie et al, Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2016, PDF
- Pomerleau, ALVINN: An Autonomous Land Vehicle in a Neural Network, NIPS 1989, PDF
- Bojarski et al, End to End Learning for Self-Driving Cars, 2016, PDF
- Ross et al, A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, AISTATS 2011, PDF
- Rusu et al, Policy distillation, ICLR 2016, PDF
- Levine and Koltun, Guided policy search, ICML 2013, PDF
- Reddy et al, SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards, ICLR 2020, PDF
- Ho and Ermon, Generative Adversarial Imitation Learning, NIPS 2016, PDF
- Ng and Russell, Algorithms for Inverse Reinforcement Learning, ICML 2000, PDF
- Wulfmeier et al, Maximum Entropy Deep Inverse Reinforcement Learning, 2015, PDF
- Finn et al, Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, ICML 2016, PDF
- Hausman et al, Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets, NIPS 2017, PDF