Section: Lectures & Calendar | Reinforcement Learning 2021 | INF - e-learning

The official language of the course is English: all materials, references and books are in English.

Lecture slides will be made available here, together with suggested readings.


N.	Date	Where	Topic	Reference	Additional Material
1	29/03/2021 (16.00-17.45)	ONLINE	Reinforcement learning fundamentals (slides)	[RL] Chapter 1
2	01/04/2021 (14.15-16.00)	ONLINE	Markov Decision Processes (slides)	[RL] Chapter 3
3	07/04/2021 (14.15-16.00)	ONLINE	Planning by Dynamic Programming (slides)	[RL] Chapter 4	Software: Dynamic programming demo on Gridworld in JavaScript (with code)
4	27/04/2021 (14.15-16.00)	ONLINE	Model-Free Prediction (slides)	[RL] Sections 5.1, 5.6, 6.1-6.3, 7.1, 12.1, 12.2
5	04/05/2021 (13.30-16.00)	ONLINE	Model-Free Control (slides)	[RL] Sections 5.3, 5.4, 5.5, 6.4, 6.5, 6.6, 7.2, 12.7	Additional reading: [3] The original Q-learning paper. Software: TD learning demo on Gridworld in JavaScript (with code); a JavaScript demo environment based on AIXI models
6	10/05/2021 (16.00-18.00)	ONLINE	Value-function Approximation (pdf)	[RL] Sections 9.1-9.5, 9.8, 10.1, 11.1-11.5	Additional sections from [RL]: Section 11.6 (Bellman error learnability); Section 11.7 (Gradient-TD properties). Additional reading: [4] Original DQN paper; [5] Double Q-learning; [6] Dueling Network Architectures; [7] Prioritized Replay. Software: DQN original code; DQN tutorial in PyTorch; DQN tutorial with TensorFlow Agents; KERAS-RL suite of deep RL, including DQN implementation
7	18/05/2021 (14.00-16.00)	ONLINE	Policy gradient methods - Part I. Guest lecture by Maurizio Parton (slides)	[RL] Chapter 13	Additional reading: [8] Original REINFORCE paper; [9] Learning with the actor-critic architecture; [10] Accessible reference to natural policy gradient; [11] A3C paper; [12] Deep Deterministic Policy Gradient; [13] Off-policy policy gradient; [14] A generalization of natural policy gradient; [15] Benchmarking article for continuous actions and learning to control
8	25/05/2021 (14.00-16.00)	ONLINE	Policy gradient methods - Part II. Guest lecture by Maurizio Parton (notes1, notes2)
9	01/06/2021 (14.00-16.00)	ONLINE	TRPO and PPO papers. Guest lecture by Maurizio Parton (slides)
10	09/06/2021 (16.00-18.00)	ONLINE	Integrating Learning and Planning (slides)	[RL] Chapter 8, Section 16.6	Additional reading: [16] UCT paper: the introduction of Monte-Carlo planning; [17] MoGo: the grandfather of AlphaGo (RL using offline and online experience); [18] AlphaGo paper; [19] AlphaGo without human bootstrap
11	16/06/2021 (14.00-16.00)	ONLINE	Bandits, Exploration and Exploitation (slides)	[RL] Sections 2.1-2.4, 2.6, 2.7, 2.9, 2.10	Additional reading: [20] Seminal UCB1 and UCB2 paper (upper confidence bounds algorithm for context-free); [21] Randomized UCB algorithm for contextual bandits; [22] Efficient learning of contextual bandit with an oracle; [23] A deep learning based approach to generate exploration bonuses via model-based
12	22/06/2021 (14.00-16.00)	ONLINE	Imitation Learning (slides) (wrap-up & project info)		Additional reading: [24] Seminal paper on data augmentation for handling distribution shift (aka self-driving in 1989); [25] NVIDIA self-driving trick; [26] DAgger paper; [27] Using distillation in reinforcement learning; [28] Imitation learning with importance sampling; [29] Imitation learning with off-policy Q-learning; [30] Generative Adversarial Imitation Learning; [31] An early compendium of inverse RL; [32] Deep inverse RL; [33] Guided cost learning; [34] Handling multimodality with GANs
13	22/07/2021 (14.00-17.00)	ONLINE	Final student seminars - PART I: Alessandro Cudazzo - Deep Reinforcement learning at scale: DQN and beyond; Edoardo Federici - Offline Q-Learning Pitfalls and How to Avoid Them; Fabio Murgese - Monte-Carlo Tree Search in Autonomous Vehicles; Luigi Quarantiello - Automated Curriculum Learning; Lisa Lavorati - From Imitation Learning to Inverse Reinforcement Learning; Mattia Sangermano - Multi-agent RL: agents modeling agents
14	16/09/2021	ONLINE	Final student seminars - PART II