Section outline

  • The official language of the course is English: all materials, references and books are in English.

    Lecture slides will be made available here, together with suggested readings.


    Each entry lists: lecture number, date and time, where, topic (with slides and video links), reference, and additional material.
    1. 22/04/2020 (16.00-18.00), ONLINE
       Course Introduction and Machine Learning Refresher
       (slides) [video]
       [1] Reference book for neural networks and deep learning fundamentals
       [2] Reference book for Bayesian and probabilistic methods
    2. 24/04/2020 (15.00-17.00), ONLINE
       Reinforcement learning fundamentals
       (slides) [video]
       [RL] Chapter 1
    3. 29/04/2020 (16.00-18.00), ONLINE
       Markov Decision Processes
       (slides) [video]
       [RL] Chapter 3
    4. 06/05/2020 (16.00-18.00), ONLINE
       Planning by Dynamic Programming
       (slides) [video]
       [RL] Chapter 4
       Software:
       Dynamic programming demo on Gridworld in JavaScript (with code)
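As a companion to the Gridworld demo, here is a minimal value-iteration sketch in Python. The 5-state chain MDP, constants, and function names are illustrative assumptions for this example, not taken from the course materials:

```python
# Minimal value iteration on a hypothetical 5-state chain MDP:
# states 0..4, state 4 is terminal with reward 1 on entry, step reward 0.
# Actions: 0 = left, 1 = right; transitions are deterministic.
N_STATES, GAMMA, THETA = 5, 0.9, 1e-8

def step(s, a):
    """Deterministic transition model: returns (next_state, reward)."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

V = [0.0] * N_STATES                     # value estimates; terminal stays 0
while True:
    delta = 0.0
    for s in range(N_STATES - 1):        # sweep all non-terminal states
        # Bellman optimality backup: max over actions of r + gamma * V(s')
        best = max(r + GAMMA * V[s2]
                   for s2, r in (step(s, a) for a in (0, 1)))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < THETA:                    # stop when a sweep barely changes V
        break
# After convergence V is approximately [0.729, 0.81, 0.9, 1.0, 0.0]
```

The stopping threshold THETA bounds the distance to the fixed point by THETA / (1 - GAMMA), which is the standard stopping rule for value iteration.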
    5. 08/05/2020 (16.00-18.00), ONLINE
       Model-Free Prediction
       (slides) [video]
       [RL] Sections 5.1, 5.6, 6.1-6.3, 7.1, 12.1

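To make the model-free prediction setting concrete, below is a TD(0) sketch that estimates the state values of a fixed random-walk policy on a toy 5-state chain. The environment and constants are illustrative assumptions, not course material:

```python
import random

# TD(0) prediction sketch: estimate state values of a fixed random-walk
# policy on a toy 5-state chain (states 0 and 4 are terminal; a reward of
# 1 is received on reaching the right end). Setup is illustrative only.
random.seed(0)
N, GAMMA, ALPHA = 5, 1.0, 0.05
V = [0.0] * N                          # terminal values stay 0

for _ in range(5000):                  # episodes
    s = 2                              # every episode starts in the middle
    while 0 < s < N - 1:
        s2 = s + random.choice((-1, 1))        # equiprobable random walk
        r = 1.0 if s2 == N - 1 else 0.0
        # TD(0): move V[s] toward the bootstrapped target r + gamma * V[s2]
        V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])
        s = s2
# True values are V[1..3] = [0.25, 0.5, 0.75]; estimates land close to them.
```

Unlike Monte-Carlo prediction, the update happens after every step by bootstrapping from the current estimate of the next state, which is the distinction these lecture sections develop.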
    6. 15/05/2020 (16.00-18.00), ONLINE
       Model-Free Control
       (slides) (coding addendum) [video]
       [RL] Sections 5.3, 5.4, 5.5, 6.4, 6.5, 6.6, 7.2, 12.7
       Additional reading:
       [3] The original Q-learning paper
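Since [3] introduces Q-learning, a tabular sketch may help fix ideas. The toy chain environment and all constants below are illustrative assumptions, not the paper's setup:

```python
import random

# Tabular Q-learning sketch on a toy chain: states 0..4, state 4 is
# terminal with reward 1 on entry; actions 0 = left, 1 = right.
# All constants are illustrative, not from the course.
N, GAMMA, ALPHA, EPS = 5, 0.9, 0.5, 0.3

def step(s, a):
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N - 1 else 0.0)

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N)]
for _ in range(2000):                              # episodes
    s = 0
    while s != N - 1:
        # epsilon-greedy behaviour policy
        a = random.randrange(2) if random.random() < EPS else Q[s].index(max(Q[s]))
        s2, r = step(s, a)
        # off-policy target: greedy value of the next state
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
# The greedy policy learned here moves right in every state.
```

The max over the next state's action values is what makes Q-learning off-policy: it learns the greedy policy's values while following an epsilon-greedy behaviour policy.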
    7. 22/05/2020 (16.00-18.00), ONLINE
       Value-function Approximation
       (slides) [video]
       [RL] Sections 9.1-9.5, 9.8, 10.1, 11.1-11.5
       Additional from [RL]:
       • Section 11.6 - Bellman error learnability
       • Section 11.7 - Gradient-TD properties
       Additional reading:
       [4] Original DQN paper
       [5] Double Q-learning
       [6] Dueling Network Architectures
       [7] Prioritized Replay
    8. 27/05/2020 (16.00-18.00), ONLINE
       Policy Gradient Methods (I)
       (slides) [video]
       [RL] Chapter 13
       Additional reading:
       [8] Original REINFORCE paper
       [9] Learning with the actor-critic architecture
       [10] Accessible reference to natural policy gradient

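As an illustration of the REINFORCE idea from [8], here is a minimal sketch for a softmax policy on a hypothetical one-step task. The reward values and hyperparameters are made up for the example:

```python
import math, random

# REINFORCE sketch for a softmax policy on a hypothetical one-step task
# with two actions whose rewards are 1.0 and 0.2 (illustrative numbers).
random.seed(0)
theta = [0.0, 0.0]                    # action preferences (policy parameters)
REWARDS, LR = [1.0, 0.2], 0.1

def softmax(prefs):
    m = max(prefs)                    # subtract max for numerical stability
    e = [math.exp(p - m) for p in prefs]
    z = sum(e)
    return [x / z for x in e]

for _ in range(3000):
    pi = softmax(theta)
    a = 0 if random.random() < pi[0] else 1     # sample action from policy
    r = REWARDS[a]
    # REINFORCE update: theta += lr * r * grad log pi(a | theta);
    # for a softmax policy, grad log pi(a) = one_hot(a) - pi
    for i in range(2):
        theta[i] += LR * r * ((1.0 if i == a else 0.0) - pi[i])
# The policy concentrates on the higher-reward action 0.
```

The sampled update r * grad log pi(a) is an unbiased estimate of the gradient of expected reward, which is the core of the policy gradient theorem covered in Chapter 13.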
    9. 29/05/2020 (16.00-18.00), ONLINE
       Policy Gradient Methods (II)
       (slides) [video]
       [RL] Chapter 13, Section 16.5
       Additional reading:
       [11] A3C paper
       [12] Deep Deterministic Policy Gradient
       [13] Off-policy policy gradient
       [14] A generalization of natural policy gradient
       [15] Benchmarking article for continuous actions and learning to control
       Software:
       RLlib - a framework for scalable RL with a tutorial-like implementation of A3C
    10. 05/06/2020 (16.00-18.00), ONLINE
        Integrating Learning and Planning
        (slides) [video]
        [RL] Chapter 8, Section 16.6
        Additional reading:
        [16] UCT paper: the introduction of Monte-Carlo planning
        [17] MoGo: the grandfather of AlphaGo (RL using offline and online experience)
        [18] AlphaGo paper
        [19] AlphaGo without human bootstrap
    11. 10/06/2020 (16.00-18.00), ONLINE
        Bandits, Exploration and Exploitation
        (slides) [video1, video2, video3]
        [RL] Sections 2.1-2.4, 2.6, 2.7, 2.9, 2.10
        Additional reading:
        [20] Seminal UCB1 and UCB2 paper (upper confidence bound algorithms for the context-free setting)
        [21] Randomized UCB algorithm for contextual bandits
        [22] Efficient learning of contextual bandits with an oracle
        [23] (Generalized) Thompson sampling in contextual bandits
        [24] Tutorial on Thompson sampling
        [25] A deep learning based approach to generating exploration bonuses via model-based
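To illustrate the UCB1 index from [20], here is a minimal sketch on a hypothetical 3-armed Bernoulli bandit. The arm means and horizon are illustrative choices, not from the paper:

```python
import math, random

# UCB1 sketch on a hypothetical 3-armed Bernoulli bandit; the arm means
# and horizon are illustrative, not from the paper or the course.
random.seed(0)
MEANS = [0.2, 0.5, 0.8]               # true success probability per arm
counts = [0, 0, 0]                    # number of pulls per arm
values = [0.0, 0.0, 0.0]              # running average reward per arm
T = 5000

for t in range(1, T + 1):
    if t <= 3:
        a = t - 1                     # initialise by pulling each arm once
    else:
        # UCB1 index: empirical mean plus exploration bonus sqrt(2 ln t / n_a)
        a = max(range(3),
                key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))
    r = 1.0 if random.random() < MEANS[a] else 0.0
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]   # incremental mean update
# The best arm (index 2) receives the vast majority of pulls.
```

The bonus term shrinks as an arm is pulled more, so suboptimal arms are pulled only O(log T) times, which is the regret guarantee proved in [20].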
    12. 19/06/2020 (16.00-18.00), ONLINE
        Imitation Learning
        (slides) [video]
        Additional reading:
        [26] Seminal paper on data augmentation for handling distribution shift (a.k.a. self-driving in 1989)
        [27] NVIDIA self-driving trick
        [28] DAgger paper
        [29] Using distillation in reinforcement learning
        [30] Imitation learning with importance sampling
        [31] Imitation learning with off-policy Q-learning
        [32] Generative Adversarial Imitation Learning
        [33] An early compendium of inverse RL
        [34] Deep inverse RL
        [35] Guided cost learning
        [36] Handling multimodality with GANs
    13. 03/07/2020 (16.00-18.00), ONLINE
        Concluding lecture and advanced RL topics
        (slides) [video]
    14. 21/10/2020 (14.30-18.30), ONLINE
        Student seminars:
        • Federico Dragoni, Q-learning
        • Lorenzo Ferrini, Proximal Policy Optimization Algorithms
        • Mario Bonsembiante, Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
        • Christian Esposito, Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics
        • Marco Biasizzo, SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
        • Daniele Morra, Learning Body Shape Variation in Physics-based Characters
        • Luca Girardi, Cooperative Multi-Agent Control Using Deep Reinforcement Learning
        • Leonardo Lai, Control of a Quadrotor with Reinforcement Learning