  • The official language of the course is English: all materials, references and books are in English.

    Lecture slides will be made available here, together with suggested readings.

    N. Date Where Topic Reference Additional Material
    1 29/03/2021
    (16.00-17.45)
    ONLINE
    Reinforcement learning fundamentals
    (slides)
    [RL] Chapter 1
    2 01/04/2021
    (14.15-16.00)
    ONLINE
    Markov Decision Processes
    (slides)
    [RL] Chapter 3
    3
    07/04/2021
    (14.15-16.00)
    ONLINE
    Planning by Dynamic Programming
    (slides)
    [RL] Chapter 4
    Software:
    Dynamic programming demo on Gridworld in Javascript (with code)
    4
    27/04/2021
    (14.15-16.00)
    ONLINE
    Model-Free Prediction
    (slides)
    [RL] Sections 5.1, 5.6, 6.1-6.3, 7.1, 12.1, 12.2

    5
    04/05/2021
    (13.30-16.00)
    ONLINE
    Model-Free Control
    (slides)
    [RL] Sections 5.3, 5.4, 5.5, 6.4, 6.5, 6.6, 7.2, 12.7
    Additional Reading:
    [3] The original Q-learning paper

    6
    10/05/2021
    (16.00-18.00)
    ONLINE
    Value-function Approximation
    (pdf)
    [RL] Sections 9.1-9.5, 9.8, 10.1, 11.1-11.5
    Additional material from [RL]:
    • Section 11.6 - Bellman error learnability
    • Section 11.7 - Gradient-TD properties
    Additional Reading:
    [4] Original DQN paper
    [5] Double Q-learning
    [6] Dueling Network Architectures
    [7] Prioritized Replay

    7
    18/05/2021
    (14.00-16.00)
    ONLINE
    Policy gradient methods - Part I
    Guest lecture by Maurizio Parton
    (slides)
    [RL] Chapter 13
    Additional Reading:
    [8] Original REINFORCE paper
    [9] Learning with the actor-critic architecture
    [10] Accessible reference to natural policy gradient
    [11] A3C paper
    [12] Deep Deterministic Policy Gradient
    [13] Off-policy policy gradient
    [14] A generalization of natural policy gradient
    [15] Benchmarking article for continuous actions and learning to control
    8
    25/05/2021
    (14.00-16.00)
    ONLINE
    Policy gradient methods - Part II
    Guest lecture by Maurizio Parton
    (notes1, notes2)


    9
    01/06/2021
    (14.00-16.00)
    ONLINE
    TRPO and PPO papers
    Guest lecture by Maurizio Parton
    (slides)


    10
    09/06/2021
    (16.00-18.00)
    ONLINE
    Integrating Learning and Planning
    (slides)
    [RL] Chapter 8, Section 16.6
    Additional Reading:
    [16] UCT paper: the introduction of Monte-Carlo planning
    [17] MoGo: the grandfather of AlphaGo (RL using offline and online experience)
    [18] AlphaGo paper
    [19] AlphaGo without human bootstrap
    11
    16/06/2021
    (14.00-16.00)
    ONLINE
    Bandits, Exploration and Exploitation
    (slides)
    [RL] Sect. 2.1-2.4, 2.6, 2.7, 2.9, 2.10
    Additional Reading:
    [20] Seminal UCB1 and UCB2 paper (upper confidence bound algorithms for context-free bandits)
    [21] Randomized UCB algorithm for contextual bandits
    [22] Efficient learning of contextual bandit with an oracle
    [23] A deep learning based approach to generate exploration bonuses via model-based
    12
    22/06/2021
    (14.00-16.00)
    ONLINE
    Imitation Learning
    (slides)
    (wrap-up & project info)

    Additional Reading:
    [24] Seminal paper on data augmentation for handling distribution shift (aka self-driving in 1989)
    [25] NVIDIA Self-driving trick
    [26] DAgger paper
    [27] Using distillation in reinforcement learning
    [28] Imitation learning with importance sampling
    [29] Imitation learning with off-policy Q-learning
    [30] Generative Adversarial Imitation Learning
    [31] An early compendium of inverse RL
    [32] Deep inverse RL
    [33] Guided cost learning
    [34] Handling multimodality with GANs
    13
    22/07/2021
    (14.00-17.00)
    ONLINE
    Final student seminars - PART I
    1. Alessandro Cudazzo - Deep Reinforcement learning at scale: DQN and beyond
    2. Edoardo Federici - Offline Q-Learning Pitfalls and How to Avoid Them
    3. Fabio Murgese - Monte-Carlo Tree Search in Autonomous Vehicles
    4. Luigi Quarantiello - Automated Curriculum Learning
    5. Lisa Lavorati - From Imitation Learning to Inverse Reinforcement Learning
    6. Mattia Sangermano - Multi-agent RL: agents modeling agents


    14
    16/09/2021
    ONLINE
    Final student seminars - PART II