Section: Bibliography | Reinforcement Learning (SSSA 2020) | INF - e-learning

Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, Free online version
David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, Free online version
CJCH Watkins, P Dayan, Q-learning, Machine Learning, 1992, PDF
Mnih et al, Human-level control through deep reinforcement learning, Nature, 2015, PDF
van Hasselt et al, Deep Reinforcement Learning with Double Q-learning, AAAI, 2016, PDF
Wang et al, Dueling Network Architectures for Deep Reinforcement Learning, ICML, 2016, PDF
Schaul et al, Prioritized Experience Replay, ICLR, 2016, PDF
Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 1992, PDF
Sutton et al, Policy gradient methods for reinforcement learning with function approximation, NIPS, 2000, PDF
Peters & Schaal, Reinforcement learning of motor skills with policy gradients, Neural Networks, 2008, PDF
Mnih et al, Asynchronous methods for deep reinforcement learning, ICML, 2016, PDF
Lillicrap et al, Continuous control with deep reinforcement learning, ICLR, 2016, PDF
Gu et al, Q-Prop: Sample-efficient policy gradient with an off-policy critic, ICLR, 2017, PDF
Schulman et al, Trust Region Policy Optimization, ICML, 2015, PDF
Duan et al, Benchmarking Deep Reinforcement Learning for Continuous Control, ICML, 2016, PDF
Kocsis and Szepesvari, Bandit based Monte-Carlo planning, ECML, 2006, PDF
Gelly and Silver, Combining Online and Offline Knowledge in UCT, ICML, 2007, PDF
Silver et al, Mastering the game of Go with deep neural networks and tree search, Nature, 2016, Online
Silver et al, Mastering the game of Go without human knowledge, Nature, 2017, Online
Auer et al, Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, 2002, PDF
Dudik et al, Efficient Optimal Learning for Contextual Bandits, UAI, 2011, PDF
Agarwal et al, Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, ICML, 2014, PDF
Lihong Li, Generalized Thompson Sampling for Contextual Bandits, PDF
Russo et al, A Tutorial on Thompson Sampling, PDF
Stadie et al, Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2016, PDF
Pomerleau, ALVINN: An Autonomous Land Vehicle in a Neural Network, NIPS 1989, PDF
Bojarski et al, End to End Learning for Self-Driving Cars, 2016, PDF
Ross et al, A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, AISTATS 2011, PDF
Rusu et al, Policy distillation, ICLR 2016, PDF
Levine and Koltun, Guided policy search, ICML 2013, PDF
Reddy et al, SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards, ICLR 2020, PDF
Ho and Ermon, Generative Adversarial Imitation Learning, NIPS 2016, PDF
Ng and Russell, Algorithms for Inverse Reinforcement Learning, ICML 2000, PDF
Wulfmeier et al, Maximum Entropy Deep Inverse Reinforcement Learning, 2015, PDF
Finn et al, Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, ICML 2016, PDF
Hausman et al, Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets, NIPS 2017, PDF