Topic outline

  • Reinforcement Learning, Academic Year 2020-2021

    Credits (ECTS): 3, Semester: 2, Official Language: English

    Instructor: Davide Bacciu - Special Guest Instructor: Maurizio Parton

    Contact: email - phone 050 2212749

    Office: Room 367, Dipartimento di Informatica, Largo B. Pontecorvo 3, Pisa

    Office Hours: Email to arrange meeting

  • Course Information

    The course is open to M.Sc. students of the AI Curriculum and to Ph.D. students. Please check the course prerequisites.


    The course is held in the second term, in online form. Lectures are delivered through Teams.

    The course does not have a fixed schedule: register for Moodle notifications to receive information about upcoming lectures.


    Course Prerequisites

    Course prerequisites include knowledge of machine learning fundamentals (e.g. as covered in the ML course) and knowledge of deep learning models and probabilistic learning (e.g. as covered in the ISPR course). Knowledge of elements of probability and statistics, calculus and optimization algorithms is also expected. Previous programming experience with Python is expected for project assignments.

    Course Overview

    The course will introduce students to the fundamentals of reinforcement learning (RL). We will start by introducing the RL problem formulation and its core challenges, and by surveying consolidated approaches from the literature, including dynamic programming, value-function learning and policy learning. We will then cover model-based RL and exploration strategies. Finally, the course will discuss more recent models that combine RL with deep learning techniques. The course will leverage a combination of theoretical and applicative lectures.

    A student successfully completing the course should be able to lay out the key aspects differentiating RL from other machine learning approaches. Given an application, the student should be able to (i) determine whether it can be adequately formulated as an RL problem; (ii) formalise it as such; and (iii) identify the set of techniques best suited to solve the task, together with the software tools to implement the solution.

    Textbook and Teaching Materials

    The course textbook is the classical reference book for RL courses, although it might not cover all the contents of the lectures.

    Note that the book also has an electronic version which is freely available online.

    [RL] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second Edition, MIT Press, 2018 (PDF)

  • Lectures & Calendar

    The official language of the course is English: all materials, references and books are in English.

    Lecture slides will be made available here, together with suggested readings.

    Schedule (lecture number, date, topic, reference and additional material, where available):

    • Lecture 1 (29/03/2021): Reinforcement learning fundamentals. Reference: [RL] Chapter 1
    • Lecture 2 (01/04/2021, online): Markov Decision Processes. Reference: [RL] Chapter 3
    • Planning by Dynamic Programming. Reference: [RL] Chapter 4
      Additional material: Dynamic programming demo on Gridworld in Javascript (with code)
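To give a concrete feel for what the Gridworld demo computes, here is a minimal value-iteration sketch on an invented 4-state chain MDP (a toy example of ours, not the demo's code):

```python
# Value iteration on an invented 4-state chain MDP.
# States 0..3; state 3 is terminal; reaching it pays reward 1.
GAMMA = 0.9
N_STATES = 4
ACTIONS = (-1, +1)        # move left / move right

def step(s, a):
    """Deterministic transition, clamped at the chain borders."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 and s != N_STATES - 1 else 0.0
    return s2, r

V = [0.0] * N_STATES
for _ in range(100):      # synchronous Bellman optimality backups
    V = [0.0 if s == N_STATES - 1 else
         max(step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in ACTIONS)
         for s in range(N_STATES)]

# read off the greedy policy from the converged values
greedy = [max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
          for s in range(N_STATES)]
```

The sweep repeatedly applies the Bellman optimality backup until the values stop changing; the greedy policy is then read off from the converged values.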
    • Model-Free Prediction. Reference: [RL] Sections 5.1, 5.6, 6.1-6.3, 7.1, 12.1, 12.2
    • Model-Free Control. Reference: [RL] Sections 5.3, 5.4, 5.5, 6.4, 6.5, 6.6, 7.2, 12.7
      Additional reading:
      [3] The original Q-learning paper
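The tabular Q-learning update from [3] fits in a few lines; the toy chain environment below is an invented example, not course code:

```python
import random

# Tabular Q-learning (Watkins & Dayan) on an invented 4-state chain.
# States 0..3; state 3 is terminal and pays reward 1; actions 0=left, 1=right.
random.seed(0)
N, GAMMA, ALPHA, EPS = 4, 0.9, 0.5, 0.1
Q = [[0.0, 0.0] for _ in range(N)]

def env_step(s, a):
    """Deterministic chain dynamics, clamped at the borders."""
    s2 = min(max(s + (1 if a == 1 else -1), 0), N - 1)
    done = s2 == N - 1
    return s2, (1.0 if done else 0.0), done

for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy
        a = random.randrange(2) if random.random() < EPS else Q[s].index(max(Q[s]))
        s2, r, done = env_step(s, a)
        target = r if done else r + GAMMA * max(Q[s2])  # off-policy max backup
        Q[s][a] += ALPHA * (target - Q[s][a])           # TD error update
        s = s2
```

Because the max backup bootstraps off the greedy action regardless of the behaviour policy, the learned Q-values converge to the optimal ones even under epsilon-greedy exploration.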


    • Value-function Approximation. Reference: [RL] Sections 9.1-9.5, 9.8, 10.1, 11.1-11.5
      Additional from [RL]:
      • Section 11.6 - Bellman error learnability
      • Section 11.7 - Gradient-TD properties
      Additional reading:
      [4] Original DQN paper
      [5] Double Q-learning
      [6] Dueling Network Architectures
      [7] Prioritized Replay
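As a minimal illustration of function approximation, here is semi-gradient TD(0) with a linear value function on an invented 5-state random walk (one-hot features, so it reduces to tabular TD(0), but the update is written in its general linear form):

```python
import random

# Semi-gradient TD(0) with a linear value function (illustrative sketch).
# Random-walk policy on a 5-state chain; episodes terminate off either end,
# with reward 1 on the right and 0 on the left.
random.seed(0)
N, GAMMA, ALPHA = 5, 1.0, 0.1
w = [0.0] * N                      # weights, one per one-hot feature

def features(s):
    x = [0.0] * N
    x[s] = 1.0
    return x

def v(s):
    return sum(wi * xi for wi, xi in zip(w, features(s)))

for _ in range(5000):              # episodes starting from the centre state
    s = N // 2
    while True:
        s2 = s + random.choice((-1, 1))
        if s2 < 0:
            target = 0.0           # terminated left, reward 0
        elif s2 >= N:
            target = 1.0           # terminated right, reward 1
        else:
            target = GAMMA * v(s2) # reward 0 on non-terminal steps
        # semi-gradient TD(0): w += alpha * (target - v(s)) * grad_w v(s)
        delta = target - v(s)
        x = features(s)
        for i in range(N):
            w[i] += ALPHA * delta * x[i]
        if s2 < 0 or s2 >= N:
            break
        s = s2
```

The true values for this walk are (s+1)/6, so the learned weights should increase roughly linearly from left to right.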

    • Policy gradient methods - Part I (guest lecture by Maurizio Parton). Reference: [RL] Chapter 13
      Additional reading:
      [8] Original REINFORCE paper
      [9] Learning with the actor-critic architecture
      [10] Accessible reference to natural policy gradient
      [11] A3C paper
      [12] Deep Deterministic Policy Gradient
      [13] Off-policy policy gradient
      [14] A generalization of natural policy gradient
      [15] Benchmarking article for continuous actions and learning to control
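As a minimal illustration of the policy gradient idea, here is a REINFORCE update (as in [8]) for a softmax policy on an invented two-armed bandit; the environment and all names are ours, not from the lectures:

```python
import math, random

# REINFORCE (Monte-Carlo policy gradient) on an invented two-armed bandit:
# arm 1 always pays reward 1, arm 0 always pays 0.
random.seed(0)
ALPHA = 0.1
theta = [0.0, 0.0]                 # softmax preferences, one per arm

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

for _ in range(2000):
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1
    r = 1.0 if a == 1 else 0.0
    # gradient of log pi(a) wrt theta[k] is 1[k == a] - pi(k)
    for k in range(2):
        theta[k] += ALPHA * r * ((1.0 if k == a else 0.0) - probs[k])

probs = softmax(theta)             # final policy, concentrated on arm 1
```

Each update pushes probability mass towards actions that obtained reward, weighted by the score function; a baseline (Chapter 13 of [RL]) would reduce the variance of this estimator.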
    • Policy gradient methods - Part II (guest lecture by Maurizio Parton) (notes1, notes2)
    • TRPO and PPO papers (guest lecture by Maurizio Parton)
    • Integrating Learning and Planning. Reference: [RL] Chapter 8, Section 16.6
      Additional reading:
      [16] UCT paper: the introduction of Monte-Carlo planning
      [17] MoGo: the grandfather of AlphaGo (RL using offline and online experience)
      [18] AlphaGo paper
      [19] AlphaGo without human bootstrap
    • Bandits, Exploration and Exploitation. Reference: [RL] Sections 2.1-2.4, 2.6, 2.7, 2.9, 2.10
      Additional reading:
      [20] Seminal UCB1 and UCB2 paper (upper confidence bound algorithms for the context-free setting)
      [21] Randomized UCB algorithm for contextual bandits
      [22] Efficient learning of contextual bandits with an oracle
      [23] A deep learning based approach to generating exploration bonuses via model-based predictions
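The UCB1 rule from [20] can be sketched in a few lines; the two-armed Bernoulli bandit below is a made-up example:

```python
import math, random

# UCB1 (Auer et al.) on a made-up two-armed Bernoulli bandit.
random.seed(0)
MEANS = [0.3, 0.7]        # true success probabilities, unknown to the agent
T = 5000
counts = [0, 0]           # number of pulls per arm
values = [0.0, 0.0]       # empirical mean reward per arm

for t in range(1, T + 1):
    if 0 in counts:       # initialisation: play each arm once
        a = counts.index(0)
    else:
        # pick the arm maximising empirical mean + confidence bonus
        ucb = [values[i] + math.sqrt(2.0 * math.log(t) / counts[i])
               for i in range(2)]
        a = ucb.index(max(ucb))
    r = 1.0 if random.random() < MEANS[a] else 0.0
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]  # incremental mean update
```

The confidence bonus shrinks as an arm is pulled more often, so exploration concentrates on arms whose estimates are still uncertain; the better arm ends up pulled far more frequently.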
    • Imitation Learning (wrap-up & project info)
      Additional reading:
      [24] Seminal paper on data augmentation for handling distribution shift (aka self-driving in 1989)
      [25] NVIDIA self-driving trick
      [26] DAgger paper
      [27] Using distillation in reinforcement learning
      [28] Imitation learning with importance sampling
      [29] Imitation learning with off-policy Q-learning
      [30] Generative Adversarial Imitation Learning
      [31] An early compendium of inverse RL
      [32] Deep inverse RL
      [33] Guided cost learning
      [34] Handling multimodality with GANs
    • Final student seminars - PART I
      1. Alessandro Cudazzo - Deep Reinforcement Learning at scale: DQN and beyond
      2. Edoardo Federici - Offline Q-Learning Pitfalls and How to Avoid Them
      3. Fabio Murgese - Monte-Carlo Tree Search in Autonomous Vehicles
      4. Luigi Quarantiello - Automated Curriculum Learning
      5. Lisa Lavorati - From Imitation Learning to Inverse Reinforcement Learning
      6. Mattia Sangermano - Multi-agent RL: agents modeling agents

    • Final student seminars - PART II

  • Final Projects & Seminars

    Successful course completion will be assessed by either a seminar or a coding project.

    M.Sc. students need to prepare a seminar on an RL topic, or to develop a programming project involving RL, to be presented in front of the class on one of the two available dates (22/07/2021 or 16/09/2021). Delivery of exam material NEEDS to be performed through the Moodle assignments below (within the given deadlines).

    Ph.D. students can complete the course by means of several types of final assignment, examples of which are listed on these slides. Delivery of exam materials for Ph.D. students is by email and there is no fixed date (to be arranged by email with the instructor), as long as it is within the year 2021.

  • Bibliography

    1. Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, Free online version
    2. David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, Free online version
    3. CJCH Watkins, P Dayan, Q-learning, Machine Learning, 1992, PDF
    4. Mnih et al, Human-level control through deep reinforcement learning, Nature, 2015, PDF
    5. van Hasselt et al, Deep Reinforcement Learning with Double Q-learning, AAAI, 2015, PDF
    6. Wang et al, Dueling Network Architectures for Deep Reinforcement Learning, ICML, 2016, PDF
    7. Schaul et al, Prioritized Experience Replay, ICLR, 2016, PDF
    8. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 1992, PDF
    9. Sutton et al, Policy gradient methods for reinforcement learning with function approximation, NIPS, 2000, PDF
    10. Peters & Schaal, Reinforcement learning of motor skills with policy gradients, Neural Networks, 2008, PDF
    11. Mnih et al, Asynchronous methods for deep reinforcement learning, ICLR, 2016, PDF
    12. Lillicrap et al., Continuous control with deep reinforcement learning, ICLR, 2016, PDF
    13. Gu et al. Q-Prop: sample-efficient policy gradient with an off-policy critic, ICLR, 2017, PDF
    14. Schulman et al, Trust Region Policy Optimization, ICML, 2015, PDF
    15. Duan et al, Benchmarking Deep Reinforcement Learning for Continuous Control, ICML, 2016, PDF
    16. Kocsis and Szepesvari, Bandit based Monte-Carlo planning, ECML, 2006, PDF
    17. Gelly and Silver, Combining Online and Offline Knowledge in UCT, ICML, 2007, PDF
    18. Silver et al, Mastering the game of Go with deep neural networks and tree search, Nature, 2016, Online
    19. Silver et al, Mastering the game of Go without human knowledge, Nature, 2017, Online
    20. Auer et al, Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, 2002, PDF
    21. Dudik et al, Efficient Optimal Learning for Contextual Bandits, ICML, 2011, PDF
    22. Agarwal et al, Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, ICML, 2014, PDF
    23. Stadie et al, Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2016, PDF
    24. Pomerleau, ALVINN: An Autonomous Land Vehicle in a Neural Network, NIPS 1989, PDF
    25. Bojarski et al., End to End Learning for Self-Driving Cars, 2016, PDF
    26. Ross et al, A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, AISTATS 2011, PDF
    27. Rusu et al, Policy distillation, ICLR 2016, PDF
    28. Levine and Koltun, Guided policy search, ICML 2013, PDF
    29. Reddy et al, SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards, ICLR 2020, PDF
    30. Ho and Ermon, Generative Adversarial Imitation Learning, NIPS 2016, PDF
    31. Ng and Russell, Algorithms for Inverse Reinforcement Learning, ICML 2000, PDF
    32. Wulfmeier et al, Maximum Entropy Deep Inverse Reinforcement Learning, 2015, PDF
    33. Finn et al, Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, ICML 2016, PDF
    34. Hausman et al, Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets, NIPS 2017, PDF