Section outline
The official language of the course is English: all materials, references and books are in English.
Lecture slides will be made available here, together with suggested readings.
Each entry below gives the lecture number, date and time, location, topic, reference, and additional material.

Lecture 1: 22/04/2020 (16.00-18.00), ONLINE
Topic: Course Introduction and Machine Learning Refresher (slides) [video]
Additional material:
- [1] Reference book for neural networks and deep learning fundamentals
- [2] Reference book for Bayesian and probabilistic methods

Lecture 2: 24/04/2020 (15.00-17.00), ONLINE
Topic: Reinforcement learning fundamentals (slides) [video]
Reference: [RL] Chapter 1

Lecture 3: 29/04/2020 (16.00-18.00), ONLINE
Topic: Markov Decision Processes (slides) [video]
Reference: [RL] Chapter 3
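
As a warm-up for the Bellman equations introduced in this lecture, the sketch below evaluates a fixed policy on a hypothetical two-state MDP by solving the Bellman expectation equation as a linear system (the transition matrix, rewards and discount are illustrative inventions, not course material):

    import numpy as np

    gamma = 0.9
    # Two states; the fixed policy is already folded into P_pi and R_pi
    P_pi = np.array([[0.8, 0.2],
                     [0.1, 0.9]])   # P_pi[s, s'] under the policy
    R_pi = np.array([1.0, 0.0])     # expected reward per state

    # Bellman expectation equation V = R_pi + gamma * P_pi V, solved exactly
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
    print(V)                        # state values under the fixed policy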

Lecture 4: 06/05/2020 (16.00-18.00), ONLINE
Topic: Planning by Dynamic Programming (slides) [video]
Reference: [RL] Chapter 4
Software:
- Dynamic programming demo on Gridworld in Javascript (with code)
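
In the spirit of the Gridworld demo linked above, here is a minimal value-iteration sketch on a made-up four-state chain MDP (the step function and all parameters are illustrative, not the demo's code):

    import numpy as np

    # Toy 4-state chain: state 3 is terminal, reaching it pays reward 1.
    # Actions: 0 = left, 1 = right.
    n_states, n_actions, gamma = 4, 2, 0.9

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    V = np.zeros(n_states)
    for _ in range(100):                      # sweeps of Bellman optimality backups
        V_new = V.copy()
        for s in range(n_states - 1):         # terminal state keeps V = 0
            V_new[s] = max(r + gamma * V[s2]
                           for s2, r in (step(s, a) for a in range(n_actions)))
        if np.max(np.abs(V_new - V)) < 1e-8:  # stop once a sweep changes nothing
            break
        V = V_new

    print(V)                                  # values grow toward the goal state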

Lecture 5: 08/05/2020 (16.00-18.00), ONLINE
Topic: Model-Free Prediction (slides) [video]
Reference: [RL] Sections 5.1, 5.6, 6.1-6.3, 7.1, 12.1
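
A minimal sketch of TD(0) prediction, the core method of this lecture, estimating state values for a uniformly random policy on the same illustrative chain environment as above:

    import random

    n_states, gamma, alpha = 4, 0.9, 0.1
    V = [0.0] * n_states

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(5000):                 # episodes
        s = 0
        while s != n_states - 1:
            a = random.randint(0, 1)      # uniformly random behaviour policy
            s2, r = step(s, a)
            # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s2] - V[s])
            s = s2

    print([round(v, 2) for v in V])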

Lecture 6: 15/05/2020 (16.00-18.00), ONLINE
Topic: Model-Free Control (slides) (coding addendum) [video]
Reference: [RL] Sections 5.3, 5.4, 5.5, 6.4, 6.5, 6.6, 7.2, 12.7
Additional reading:
- [3] The original Q-learning paper
Software:
- TD learning demo on Gridworld in Javascript (with code)
- A Javascript demo environment based on AIXI models
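
To complement the coding addendum, a tabular Q-learning sketch with epsilon-greedy exploration in the spirit of [3], again on the illustrative toy chain (not course code):

    import random

    n_states, n_actions = 4, 2
    gamma, alpha, eps = 0.9, 0.1, 0.1
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(2000):                 # episodes
        s = 0
        while s != n_states - 1:
            # epsilon-greedy behaviour policy
            a = random.randrange(n_actions) if random.random() < eps \
                else max(range(n_actions), key=lambda x: Q[s][x])
            s2, r = step(s, a)
            # off-policy target: bootstrap from the greedy action in s'
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2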

Lecture 7: 22/05/2020 (16.00-18.00), ONLINE
Topic: Value-function Approximation (slides) [video]
Reference: [RL] Sections 9.1-9.5, 9.8, 10.1, 11.1-11.5
Additional reading:
- [RL] Section 11.6, Bellman error learnability
- [RL] Section 11.7, Gradient-TD properties
- [4] Original DQN paper
- [5] Double Q-learning
- [6] Dueling Network Architectures
- [7] Prioritized Replay
Software:
- DQN original code
- DQN tutorial in PyTorch
- DQN tutorial with TensorFlow Agents
- KERAS-RL suite of deep RL algorithms, including a DQN implementation
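
A sketch of semi-gradient Q-learning with a linear function approximator, the stepping stone from the tabular methods above to DQN [4]: a DQN replaces the linear map with a neural network and adds experience replay and a periodically synced target network. One-hot features make the example tiny; all names and parameters below are illustrative:

    import numpy as np

    n_states, n_actions = 4, 2
    gamma, alpha, eps = 0.9, 0.05, 0.1
    rng = np.random.default_rng(0)
    W = np.zeros((n_actions, n_states))       # one weight vector per action

    def phi(s):                               # one-hot state features
        x = np.zeros(n_states)
        x[s] = 1.0
        return x

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(2000):
        s = 0
        while s != n_states - 1:
            q = W @ phi(s)                    # Q(s, .) estimates
            a = int(rng.integers(n_actions)) if rng.random() < eps \
                else int(np.argmax(q))
            s2, r = step(s, a)
            target = r + gamma * np.max(W @ phi(s2))
            # semi-gradient: the bootstrapped target is treated as a constant
            W[a] += alpha * (target - q[a]) * phi(s)
            s = s2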

Lecture 8: 27/05/2020 (16.00-18.00), ONLINE
Topic: Policy Gradient Methods (I) (slides) [video]
Reference: [RL] Chapter 13
Additional reading:
- [8] Original REINFORCE paper
- [9] Learning with the actor-critic architecture
- [10] An accessible reference on the natural policy gradient
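
A minimal REINFORCE sketch in the spirit of [8], with a tabular softmax policy on the illustrative chain environment; the gamma^t factor on each update, present in the strict derivation, is omitted here as is common in practice:

    import numpy as np

    n_states, n_actions, gamma, alpha = 4, 2, 0.9, 0.1
    rng = np.random.default_rng(0)
    theta = np.zeros((n_states, n_actions))    # tabular policy logits

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(2000):
        s, episode = 0, []
        while s != n_states - 1:               # sample one full episode
            p = softmax(theta[s])
            a = int(rng.choice(n_actions, p=p))
            s2, r = step(s, a)
            episode.append((s, a, r, p))
            s = s2
        G = 0.0
        for s, a, r, p in reversed(episode):   # Monte-Carlo returns, backwards
            G = r + gamma * G
            grad_log_pi = -p.copy()
            grad_log_pi[a] += 1.0              # grad of log softmax policy
            theta[s] += alpha * G * grad_log_pi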

Lecture 9: 29/05/2020 (16.00-18.00), ONLINE
Topic: Policy Gradient Methods (II) (slides) [video]
Reference: [RL] Chapter 13, Section 16.5
Additional reading:
- [11] A3C paper
- [12] Deep Deterministic Policy Gradient
- [13] Off-policy policy gradient
- [14] A generalization of natural policy gradient
- [15] Benchmarking article for continuous actions and learning to control
Software:
- RLlib (Ray), a framework for scalable RL with a tutorial-like implementation of A3C
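
As a bridge between the two policy-gradient lectures, a one-step actor-critic sketch: the critic's TD error replaces REINFORCE's Monte-Carlo return, and A3C [11] can be read as many asynchronous copies of this update with n-step returns. Everything below is an illustrative toy, not course code:

    import numpy as np

    n_states, n_actions = 4, 2
    gamma, alpha_pi, alpha_v = 0.9, 0.1, 0.1
    rng = np.random.default_rng(0)
    theta = np.zeros((n_states, n_actions))   # actor: policy logits
    V = np.zeros(n_states)                    # critic: state values

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(2000):
        s = 0
        while s != n_states - 1:
            p = softmax(theta[s])
            a = int(rng.choice(n_actions, p=p))
            s2, r = step(s, a)
            td_error = r + gamma * V[s2] - V[s]   # critic's one-step TD error
            V[s] += alpha_v * td_error            # critic update
            grad_log_pi = -p.copy()
            grad_log_pi[a] += 1.0                 # grad of log softmax policy
            theta[s] += alpha_pi * td_error * grad_log_pi
            s = s2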

Lecture 10: 05/06/2020 (16.00-18.00), ONLINE
Topic: Integrating Learning and Planning (slides) [video]
Reference: [RL] Chapter 8, Section 16.6
Additional reading:
- [16] UCT paper: the introduction of Monte-Carlo planning
- [17] MoGo: the grandfather of AlphaGo (RL using offline and online experience)
- [18] AlphaGo paper
- [19] AlphaGo without human bootstrap
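
A minimal Dyna-Q sketch ([RL] Chapter 8) that interleaves real Q-learning updates with planning updates replayed from a learned tabular model; the environment and parameters are the same illustrative toy as above:

    import random

    n_states, n_actions = 4, 2
    gamma, alpha, eps, n_planning = 0.9, 0.1, 0.1, 10
    Q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}                                 # (s, a) -> (s', r)

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(500):
        s = 0
        while s != n_states - 1:
            a = random.randrange(n_actions) if random.random() < eps \
                else max(range(n_actions), key=lambda x: Q[s][x])
            s2, r = step(s, a)                 # real experience
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            model[(s, a)] = (s2, r)            # deterministic learned model
            for _ in range(n_planning):        # planning with simulated steps
                ps, pa = random.choice(list(model))
                ps2, pr = model[(ps, pa)]
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
            s = s2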

Lecture 11: 10/06/2020 (16.00-18.00), ONLINE
Topic: Bandits, Exploration and Exploitation (slides) [video1, video2, video3]
Reference: [RL] Sections 2.1-2.4, 2.6, 2.7, 2.9, 2.10
Additional reading:
- [20] Seminal UCB1 and UCB2 paper (upper confidence bound algorithms for the context-free setting)
- [21] Randomized UCB algorithm for contextual bandits
- [22] Efficient learning of contextual bandits with an oracle
- [23] (Generalized) Thompson sampling in contextual bandits
- [24] Tutorial on Thompson Sampling
- [25] A deep learning based approach to generating exploration bonuses via model-based prediction
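
A minimal UCB1 sketch in the spirit of [20], on a hypothetical three-armed Bernoulli bandit (the arm probabilities are made up):

    import math
    import random

    means = [0.2, 0.5, 0.7]                 # hypothetical arm success rates
    n_arms = len(means)
    counts, sums = [0] * n_arms, [0.0] * n_arms

    for t in range(1, 10001):
        if t <= n_arms:
            arm = t - 1                      # play each arm once to initialise
        else:
            # pick the arm maximising empirical mean + exploration bonus
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                                  + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if random.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    print(counts)                            # pulls concentrate on the best arm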

Lecture 12: 19/06/2020 (16.00-18.00), ONLINE
Topic: Imitation Learning (slides) [video]
Additional reading:
- [26] Seminal paper on data augmentation for handling distribution shift (a.k.a. self-driving in 1989)
- [27] NVIDIA self-driving trick
- [28] DAgger paper
- [29] Using distillation in reinforcement learning
- [30] Imitation learning with importance sampling
- [31] Imitation learning with off-policy Q-learning
- [32] Generative Adversarial Imitation Learning
- [33] An early compendium of inverse RL
- [34] Deep inverse RL
- [35] Guided cost learning
- [36] Handling multimodality with GANs
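
A DAgger-style loop in the spirit of [28]: roll out the current learner, label every visited state with a scripted "expert", and retrain on the aggregated dataset. States here are discrete, so the supervised step degenerates to a majority vote; the expert, environment and parameters are illustrative inventions:

    import random
    from collections import defaultdict

    n_states, n_actions = 4, 2

    def expert(s):
        return 1                             # scripted expert: always move right

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, s2 == n_states - 1        # (next state, done flag)

    dataset = []                             # aggregated (state, expert label)
    policy = defaultdict(lambda: random.randrange(n_actions))

    for _ in range(10):                      # DAgger iterations
        s = 0
        for _ in range(20):                  # roll out the current learner
            dataset.append((s, expert(s)))   # expert labels the visited state
            s, done = step(s, policy[s])
            if done:
                break
        # "train" on the aggregate dataset: majority vote per state
        votes = defaultdict(lambda: [0] * n_actions)
        for st, ea in dataset:
            votes[st][ea] += 1
        for st, v in votes.items():
            policy[st] = max(range(n_actions), key=lambda x: v[x])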

Lecture 13: 03/07/2020 (16.00-18.00), ONLINE
Topic: Concluding lecture and advanced RL topics (slides) [video]

Lecture 14: 21/10/2020 (14.30-18.30), ONLINE
Topic: Student seminars
- Federico Dragoni, Q-learning
- Lorenzo Ferrini, Proximal Policy Optimization Algorithms
- Mario Bonsembiante, Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
- Christian Esposito, Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics
- Marco Biasizzo, SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
- Daniele Morra, Learning Body Shape Variation in Physics-based Characters
- Luca Girardi, Cooperative Multi-Agent Control Using Deep Reinforcement Learning
- Leonardo Lai, Control of a Quadrotor with Reinforcement Learning