Section outline
The official language of the course is English: all materials, references and books are in English.
Lecture slides will be made available here, together with suggested readings.
Each entry below gives the lecture number, date and time, location, topic, reference, and additional material.

Lecture 1: 22/04/2020 (16.00-18.00), ONLINE
Topic: Course Introduction and Machine Learning Refresher (slides) [video]
Additional material:
- [1] Reference book for neural networks and deep learning fundamentals
- [2] Reference book for Bayesian and probabilistic methods

Lecture 2: 24/04/2020 (15.00-17.00), ONLINE
Topic: Reinforcement learning fundamentals (slides) [video]
Reference: [RL] Chapter 1

Lecture 3: 29/04/2020 (16.00-18.00), ONLINE
Topic: Markov Decision Processes (slides) [video]
Reference: [RL] Chapter 3
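
As a warm-up for the Bellman equations introduced in this lecture, the sketch below evaluates a fixed policy on a hypothetical two-state MDP by solving the Bellman expectation equation as a linear system (the transition matrix, rewards and discount are illustrative inventions, not course material):

    import numpy as np

    gamma = 0.9
    # Two states; the fixed policy is already folded into P_pi and R_pi
    P_pi = np.array([[0.8, 0.2],
                     [0.1, 0.9]])   # P_pi[s, s'] under the policy
    R_pi = np.array([1.0, 0.0])     # expected reward per state

    # Bellman expectation equation V = R_pi + gamma * P_pi V, solved exactly
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
    print(V)                        # state values under the fixed policy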

Lecture 4: 06/05/2020 (16.00-18.00), ONLINE
Topic: Planning by Dynamic Programming (slides) [video]
Reference: [RL] Chapter 4
Software:
- Dynamic programming demo on Gridworld in Javascript (with code)
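
In the spirit of the Gridworld demo linked above, here is a minimal value-iteration sketch on a made-up four-state chain MDP (the step function and all parameters are illustrative, not the demo's code):

    import numpy as np

    # Toy 4-state chain: state 3 is terminal, reaching it pays reward 1.
    # Actions: 0 = left, 1 = right.
    n_states, n_actions, gamma = 4, 2, 0.9

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    V = np.zeros(n_states)
    for _ in range(100):                      # sweeps of Bellman optimality backups
        V_new = V.copy()
        for s in range(n_states - 1):         # terminal state keeps V = 0
            V_new[s] = max(r + gamma * V[s2]
                           for s2, r in (step(s, a) for a in range(n_actions)))
        if np.max(np.abs(V_new - V)) < 1e-8:  # stop once a sweep changes nothing
            break
        V = V_new

    print(V)                                  # values grow toward the goal state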

Lecture 5: 08/05/2020 (16.00-18.00), ONLINE
Topic: Model-Free Prediction (slides) [video]
Reference: [RL] Sections 5.1, 5.6, 6.1-6.3, 7.1, 12.1
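
A minimal sketch of TD(0) prediction, the core method of this lecture, estimating state values for a uniformly random policy on the same illustrative chain environment as above:

    import random

    n_states, gamma, alpha = 4, 0.9, 0.1
    V = [0.0] * n_states

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(5000):                 # episodes
        s = 0
        while s != n_states - 1:
            a = random.randint(0, 1)      # uniformly random behaviour policy
            s2, r = step(s, a)
            # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s2] - V[s])
            s = s2

    print([round(v, 2) for v in V])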

Lecture 6: 15/05/2020 (16.00-18.00), ONLINE
Topic: Model-Free Control (slides) (coding addendum) [video]
Reference: [RL] Sections 5.3, 5.4, 5.5, 6.4, 6.5, 6.6, 7.2, 12.7
Additional reading:
- [3] The original Q-learning paper
Software:
- TD learning demo on Gridworld in Javascript (with code)
- A Javascript demo environment based on AIXI models
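
To complement the coding addendum, a tabular Q-learning sketch with epsilon-greedy exploration in the spirit of [3], again on the illustrative toy chain (not course code):

    import random

    n_states, n_actions = 4, 2
    gamma, alpha, eps = 0.9, 0.1, 0.1
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(2000):                 # episodes
        s = 0
        while s != n_states - 1:
            # epsilon-greedy behaviour policy
            a = random.randrange(n_actions) if random.random() < eps \
                else max(range(n_actions), key=lambda x: Q[s][x])
            s2, r = step(s, a)
            # off-policy target: bootstrap from the greedy action in s'
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2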

Lecture 7: 22/05/2020 (16.00-18.00), ONLINE
Topic: Value-function Approximation (slides) [video]
Reference: [RL] Sections 9.1-9.5, 9.8, 10.1, 11.1-11.5
Additional reading:
- [RL] Section 11.6, Bellman error learnability
- [RL] Section 11.7, Gradient-TD properties
- [4] Original DQN paper
- [5] Double Q-learning
- [6] Dueling Network Architectures
- [7] Prioritized Replay
Software:
- DQN original code
- DQN tutorial in PyTorch
- DQN tutorial with TensorFlow Agents
- KERAS-RL suite of deep RL algorithms, including a DQN implementation
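
A sketch of semi-gradient Q-learning with a linear function approximator, the stepping stone from the tabular methods above to DQN [4]: a DQN replaces the linear map with a neural network and adds experience replay and a periodically synced target network. One-hot features make the example tiny; all names and parameters below are illustrative:

    import numpy as np

    n_states, n_actions = 4, 2
    gamma, alpha, eps = 0.9, 0.05, 0.1
    rng = np.random.default_rng(0)
    W = np.zeros((n_actions, n_states))       # one weight vector per action

    def phi(s):                               # one-hot state features
        x = np.zeros(n_states)
        x[s] = 1.0
        return x

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(2000):
        s = 0
        while s != n_states - 1:
            q = W @ phi(s)                    # Q(s, .) estimates
            a = int(rng.integers(n_actions)) if rng.random() < eps \
                else int(np.argmax(q))
            s2, r = step(s, a)
            target = r + gamma * np.max(W @ phi(s2))
            # semi-gradient: the bootstrapped target is treated as a constant
            W[a] += alpha * (target - q[a]) * phi(s)
            s = s2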

Lecture 8: 27/05/2020 (16.00-18.00), ONLINE
Topic: Policy Gradient Methods (I) (slides) [video]
Reference: [RL] Chapter 13
Additional reading:
- [8] Original REINFORCE paper
- [9] Learning with the actor-critic architecture
- [10] An accessible reference on the natural policy gradient
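
A minimal REINFORCE sketch in the spirit of [8], with a tabular softmax policy on the illustrative chain environment; the gamma^t factor on each update, present in the strict derivation, is omitted here as is common in practice:

    import numpy as np

    n_states, n_actions, gamma, alpha = 4, 2, 0.9, 0.1
    rng = np.random.default_rng(0)
    theta = np.zeros((n_states, n_actions))    # tabular policy logits

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(2000):
        s, episode = 0, []
        while s != n_states - 1:               # sample one full episode
            p = softmax(theta[s])
            a = int(rng.choice(n_actions, p=p))
            s2, r = step(s, a)
            episode.append((s, a, r, p))
            s = s2
        G = 0.0
        for s, a, r, p in reversed(episode):   # Monte-Carlo returns, backwards
            G = r + gamma * G
            grad_log_pi = -p.copy()
            grad_log_pi[a] += 1.0              # grad of log softmax policy
            theta[s] += alpha * G * grad_log_pi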

Lecture 9: 29/05/2020 (16.00-18.00), ONLINE
Topic: Policy Gradient Methods (II) (slides) [video]
Reference: [RL] Chapter 13, Section 16.5
Additional reading:
- [11] A3C paper
- [12] Deep Deterministic Policy Gradient
- [13] Off-policy policy gradient
- [14] A generalization of natural policy gradient
- [15] Benchmarking article for continuous actions and learning to control
Software:
- RLlib (Ray), a framework for scalable RL with a tutorial-like implementation of A3C
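
As a bridge between the two policy-gradient lectures, a one-step actor-critic sketch: the critic's TD error replaces REINFORCE's Monte-Carlo return, and A3C [11] can be read as many asynchronous copies of this update with n-step returns. Everything below is an illustrative toy, not course code:

    import numpy as np

    n_states, n_actions = 4, 2
    gamma, alpha_pi, alpha_v = 0.9, 0.1, 0.1
    rng = np.random.default_rng(0)
    theta = np.zeros((n_states, n_actions))   # actor: policy logits
    V = np.zeros(n_states)                    # critic: state values

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(2000):
        s = 0
        while s != n_states - 1:
            p = softmax(theta[s])
            a = int(rng.choice(n_actions, p=p))
            s2, r = step(s, a)
            td_error = r + gamma * V[s2] - V[s]   # critic's one-step TD error
            V[s] += alpha_v * td_error            # critic update
            grad_log_pi = -p.copy()
            grad_log_pi[a] += 1.0                 # grad of log softmax policy
            theta[s] += alpha_pi * td_error * grad_log_pi
            s = s2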

Lecture 10: 05/06/2020 (16.00-18.00), ONLINE
Topic: Integrating Learning and Planning (slides) [video]
Reference: [RL] Chapter 8, Section 16.6
Additional reading:
- [16] UCT paper: the introduction of Monte-Carlo planning
- [17] MoGo: the grandfather of AlphaGo (RL using offline and online experience)
- [18] AlphaGo paper
- [19] AlphaGo without human bootstrap
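
A minimal Dyna-Q sketch ([RL] Chapter 8) that interleaves real Q-learning updates with planning updates replayed from a learned tabular model; the environment and parameters are the same illustrative toy as above:

    import random

    n_states, n_actions = 4, 2
    gamma, alpha, eps, n_planning = 0.9, 0.1, 0.1, 10
    Q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}                                 # (s, a) -> (s', r)

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, 1.0 if s2 == n_states - 1 else 0.0

    for _ in range(500):
        s = 0
        while s != n_states - 1:
            a = random.randrange(n_actions) if random.random() < eps \
                else max(range(n_actions), key=lambda x: Q[s][x])
            s2, r = step(s, a)                 # real experience
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            model[(s, a)] = (s2, r)            # deterministic learned model
            for _ in range(n_planning):        # planning with simulated steps
                ps, pa = random.choice(list(model))
                ps2, pr = model[(ps, pa)]
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
            s = s2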

Lecture 11: 10/06/2020 (16.00-18.00), ONLINE
Topic: Bandits, Exploration and Exploitation (slides) [video1, video2, video3]
Reference: [RL] Sections 2.1-2.4, 2.6, 2.7, 2.9, 2.10
Additional reading:
- [20] Seminal UCB1 and UCB2 paper (upper confidence bound algorithms for the context-free setting)
- [21] Randomized UCB algorithm for contextual bandits
- [22] Efficient learning of contextual bandits with an oracle
- [23] (Generalized) Thompson sampling in contextual bandits
- [24] Tutorial on Thompson Sampling
- [25] A deep learning based approach to generating exploration bonuses via model-based prediction
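
A minimal UCB1 sketch in the spirit of [20], on a hypothetical three-armed Bernoulli bandit (the arm probabilities are made up):

    import math
    import random

    means = [0.2, 0.5, 0.7]                 # hypothetical arm success rates
    n_arms = len(means)
    counts, sums = [0] * n_arms, [0.0] * n_arms

    for t in range(1, 10001):
        if t <= n_arms:
            arm = t - 1                      # play each arm once to initialise
        else:
            # pick the arm maximising empirical mean + exploration bonus
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                                  + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if random.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    print(counts)                            # pulls concentrate on the best arm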

Lecture 12: 19/06/2020 (16.00-18.00), ONLINE
Topic: Imitation Learning (slides) [video]
Additional reading:
- [26] Seminal paper on data augmentation for handling distribution shift (a.k.a. self-driving in 1989)
- [27] NVIDIA self-driving trick
- [28] DAgger paper
- [29] Using distillation in reinforcement learning
- [30] Imitation learning with importance sampling
- [31] Imitation learning with off-policy Q-learning
- [32] Generative Adversarial Imitation Learning
- [33] An early compendium of inverse RL
- [34] Deep inverse RL
- [35] Guided cost learning
- [36] Handling multimodality with GANs
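
A DAgger-style loop in the spirit of [28]: roll out the current learner, label every visited state with a scripted "expert", and retrain on the aggregated dataset. States here are discrete, so the supervised step degenerates to a majority vote; the expert, environment and parameters are illustrative inventions:

    import random
    from collections import defaultdict

    n_states, n_actions = 4, 2

    def expert(s):
        return 1                             # scripted expert: always move right

    def step(s, a):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        return s2, s2 == n_states - 1        # (next state, done flag)

    dataset = []                             # aggregated (state, expert label)
    policy = defaultdict(lambda: random.randrange(n_actions))

    for _ in range(10):                      # DAgger iterations
        s = 0
        for _ in range(20):                  # roll out the current learner
            dataset.append((s, expert(s)))   # expert labels the visited state
            s, done = step(s, policy[s])
            if done:
                break
        # "train" on the aggregate dataset: majority vote per state
        votes = defaultdict(lambda: [0] * n_actions)
        for st, ea in dataset:
            votes[st][ea] += 1
        for st, v in votes.items():
            policy[st] = max(range(n_actions), key=lambda x: v[x])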

Lecture 13: 03/07/2020 (16.00-18.00), ONLINE
Topic: Concluding lecture and advanced RL topics (slides) [video]

Lecture 14: 21/10/2020 (14.30-18.30), ONLINE
Topic: Student seminars
- Federico Dragoni, Q-learning
- Lorenzo Ferrini, Proximal Policy Optimization Algorithms
- Mario Bonsembiante, Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
- Christian Esposito, Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics
- Marco Biasizzo, SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
- Daniele Morra, Learning Body Shape Variation in Physics-based Characters
- Luca Girardi, Cooperative Multi-Agent Control Using Deep Reinforcement Learning
- Leonardo Lai, Control of a Quadrotor with Reinforcement Learning