
  • Reinforcement Learning (SSSA 2020)

    Credits (ECTS): 3, Semester: 2, Official Language: English

    Instructor: Davide Bacciu

    Contact: email - phone 050 2212749

    Office: Room 367, Dipartimento di Informatica, Largo B. Pontecorvo 3, Pisa

    Office Hours: Email to arrange a meeting


  • Course Information

    The course is open to Allievi of the Scuola Superiore Sant'Anna and Ph.D. students. Please check the course prerequisites.

    Schedule

    The course is held in the second term in online form. Lectures are delivered through Teams, accessible through this link.

    The course does not have a fixed schedule: subscribe to Moodle notifications to receive information about the next lectures.

    Objective

    Course Prerequisites

    Course prerequisites include knowledge of machine learning fundamentals (e.g. covered through the ML course) and knowledge of deep learning models and probabilistic learning (e.g. covered through the ISPR course). Knowledge of the elements of probability and statistics, calculus and optimization algorithms is also expected. Previous programming experience with Python is expected for project assignments.

    Course Overview

    The course will introduce students to the fundamentals of reinforcement learning (RL). The course will start by recalling the machine learning and statistics fundamentals needed to fully understand the specifics of RL. Then, it will introduce the RL problem formulation, its core challenges and a survey of consolidated approaches from the literature. Finally, the course will cover more recent reinforcement learning models that combine RL with deep learning techniques.

    Space will be devoted to presenting RL applications in areas that are relevant for students of industrial and information engineering, such as robotics, pattern recognition, life sciences, material sciences, signal processing, computer vision and natural language processing. The course will leverage a combination of theoretical and applied lectures.

    A student successfully completing the course should be able to lay out the key aspects differentiating RL from other machine learning approaches. Given an application, the student should be able to (i) determine whether it can be adequately formulated as an RL problem, (ii) formalize it as such, and (iii) identify a set of techniques best suited to solve the task, together with the software tools to implement the solution.

    Textbook and Teaching Materials

    The course textbook is the classical reference book for RL courses, although it might not cover all the content of the lectures.

    Note that the book also has an electronic version, which is freely available online.

    [RL] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second Edition, MIT Press, 2018 (PDF)

  • Lectures & Calendar

    The official language of the course is English: all materials, references and books are in English.

    Lecture slides will be made available here, together with suggested readings.


    N., Date, Where, Topic, Reference, Additional Material

    1. 22/04/2020 (16.00-18.00), ONLINE, Course Introduction and Machine Learning Refresher
    (slides) [video]
    [1] Reference book for neural networks and deep learning fundamentals
    [2] Reference book for Bayesian and probabilistic methods

    2. 24/04/2020 (15.00-17.00), ONLINE, Reinforcement learning fundamentals
    (slides) [video]
    [RL] Chapter 1

    3. 29/04/2020 (16.00-18.00), ONLINE, Markov Decision Processes
    (slides) [video]
    [RL] Chapter 3

    4. 06/05/2020 (16.00-18.00), ONLINE, Planning by Dynamic Programming
    (slides) [video]
    [RL] Chapter 4
    Software:
    Dynamic programming demo on Gridworld in Javascript (with code)

    5. 08/05/2020 (16.00-18.00), ONLINE, Model-Free Prediction
    (slides) [video]
    [RL] Sections 5.1, 5.6, 6.1-6.3, 7.1, 12.1

    6. 15/05/2020 (16.00-18.00), ONLINE, Model-Free Control
    (slides) (coding addendum) [video]
    [RL] Sections 5.3, 5.4, 5.5, 6.4, 6.5, 6.6, 7.2, 12.7
    Additional Reading:
    [3] The original Q-learning paper
    Software:

    7. 22/05/2020 (16.00-18.00), ONLINE, Value-function Approximation
    (slides) [video]
    [RL] Sections 9.1-9.5, 9.8, 10.1, 11.1-11.5
    Additional sections from [RL]:
    • Section 11.6 - Bellman error learnability
    • Section 11.7 - Gradient-TD properties
    Additional Reading:
    [4] Original DQN paper
    [5] Double Q-learning
    [6] Dueling Network Architectures
    [7] Prioritized Replay
    Software:

    8. 27/05/2020 (16.00-18.00), ONLINE, Policy Gradient Methods (I)
    (slides) [video]
    [RL] Chapter 13
    Additional Reading:
    [8] Original REINFORCE paper
    [9] Learning with the actor-critic architecture
    [10] Accessible reference to natural policy gradient

    9. 29/05/2020 (16.00-18.00), ONLINE, Policy Gradient Methods (II)
    (slides) [video]
    [RL] Chapter 13, Sect. 16.5
    Additional Reading:
    [11] A3C paper
    [12] Deep Deterministic Policy Gradient
    [13] Off-policy policy gradient
    [14] A generalization of natural policy gradient
    [15] Benchmarking article for continuous actions and learning to control
    Software:
    RLlib - a framework for scalable RL with a tutorial-like implementation of A3C

    10. 05/06/2020 (16.00-18.00), ONLINE, Integrating Learning and Planning
    (slides) [video]
    [RL] Chapter 8, Sect. 16.6
    Additional Reading:
    [16] UCT paper: the introduction of Monte-Carlo planning
    [17] MoGo: the grandfather of AlphaGo (RL using offline and online experience)
    [18] AlphaGo paper
    [19] AlphaGo without human bootstrap

    11. 10/06/2020 (16.00-18.00), ONLINE, Bandits, Exploration and Exploitation
    (slides) [video1, video2, video3]
    [RL] Sect. 2.1-2.4, 2.6, 2.7, 2.9, 2.10
    Additional Reading:
    [20] Seminal UCB1 and UCB2 paper (upper confidence bounds algorithm for context-free bandits)
    [21] Randomized UCB algorithm for contextual bandits
    [22] Efficient learning of contextual bandit with an oracle
    [23] (Generalized) Thompson sampling in contextual bandits
    [24] Tutorial on Thompson Sampling
    [25] A deep learning based approach to generate exploration bonuses via model-based

    12. 19/06/2020 (16.00-18.00), ONLINE, Imitation Learning
    (slides) [video]
    Additional Reading:
    [26] Seminal paper on data augmentation for handling distribution shift (aka self-driving in 1989)
    [27] NVIDIA Self-driving trick
    [28] DAgger paper
    [29] Using distillation in reinforcement learning
    [30] Imitation learning with importance sampling
    [31] Imitation learning with off-policy Q-learning
    [32] Generative Adversarial Imitation Learning
    [33] An early compendium of inverse RL
    [34] Deep inverse RL
    [35] Guided cost learning
    [36] Handling multimodality with GANs

    13. 03/07/2020 (16.00-18.00), ONLINE, Concluding lecture and advanced RL topics
    (slides) [video]

    14. 21/10/2020 (14.30-18.30), ONLINE, Student seminars
    Federico Dragoni, Q-learning
    Lorenzo Ferrini, Proximal Policy Optimization Algorithms
    Mario Bonsembiante, Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
    Christian Esposito, Robust Reinforcement Learning via Adversarial training with Langevin Dynamics
    Marco Biasizzo, SQIL - Imitation Learning via Reinforcement Learning with Sparse Rewards
    Daniele Morra, Learning Body Shape Variation in Physics-based Characters
    Luca Girardi, Cooperative Multi-Agent Control Using Deep Reinforcement Learning
    Leonardo Lai, Control of a Quadrotor with Reinforcement Learning





  • Final Projects & Seminars

    Successful course completion will be assessed by either a seminar or a coding project.

    Final course seminars will be discussed by SSSA students on Wednesday the 21st of October 2020, 14.30-18.30.

    Discussions will be aired on MS Teams.

    The program for the discussion day is the following (subject to last-minute changes in the presentation order).

    14.30 -    Federico Dragoni, Q-learning
    14.55 -    Lorenzo Ferrini, Proximal Policy Optimization Algorithms
    15.20 -    Mario Bonsembiante, Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
    15.45 -    Christian Esposito, Robust Reinforcement Learning via Adversarial training with Langevin Dynamics
    16.10 -    Marco Biasizzo, SQIL - Imitation Learning via Reinforcement Learning with Sparse Rewards
    16.35 -    Daniele Morra, Learning Body Shape Variation in Physics-based Characters
    17.00 -    Luca Girardi, Cooperative Multi-Agent Control Using Deep Reinforcement Learning
    17.25 -    Leonardo Lai, Control of a Quadrotor with Reinforcement Learning



  • Bibliography

    1. Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, Free online version
    2. David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, Free online version
    3. CJCH Watkins, P Dayan, Q-learning, Machine Learning, 1992, PDF
    4. Mnih et al, Human-level control through deep reinforcement learning, Nature, 2015, PDF
    5. van Hasselt et al, Deep Reinforcement Learning with Double Q-learning, AAAI, 2015, PDF
    6. Wang et al, Dueling Network Architectures for Deep Reinforcement Learning, ICML, 2016, PDF
    7. Schaul et al, Prioritized Experience Replay, ICLR, 2016, PDF
    8. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 1992, PDF
    9. Sutton et al, Policy gradient methods for reinforcement learning with function approximation, NIPS, 2000, PDF
    10. Peters & Schaal, Reinforcement learning of motor skills with policy gradients, Neural Networks, 2008, PDF
    11. Mnih et al, Asynchronous methods for deep reinforcement learning, ICML, 2016, PDF
    12. Lillicrap et al, Continuous control with deep reinforcement learning, ICLR, 2016, PDF
    13. Gu et al, Q-Prop: sample-efficient policy gradient with an off-policy critic, ICLR, 2017, PDF
    14. Schulman et al, Trust Region Policy Optimization, ICML, 2015, PDF
    15. Duan et al, Benchmarking Deep Reinforcement Learning for Continuous Control, ICML, 2016, PDF
    16. Kocsis and Szepesvari, Bandit based Monte-Carlo planning, ECML, 2006, PDF
    17. Gelly and Silver, Combining Online and Offline Knowledge in UCT, ICML, 2007, PDF
    18. Silver et al, Mastering the game of Go with deep neural networks and tree search, Nature, 2016, Online
    19. Silver et al, Mastering the game of Go without human knowledge, Nature, 2017, Online
    20. Auer et al, Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, 2002, PDF
    21. Dudik et al, Efficient Optimal Learning for Contextual Bandits, ICML, 2011, PDF
    22. Agarwal et al, Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, ICML, 2014, PDF
    23. Lihong Li, Generalized Thompson Sampling for Contextual Bandits, PDF
    24. Russo et al, A tutorial on Thompson Sampling, PDF
    25. Stadie et al, Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2016, PDF
    26. Pomerleau, ALVINN: An Autonomous Land Vehicle in a Neural Network, NIPS 1989, PDF
    27. Bojarski et al, End to End Learning for Self-Driving Cars, 2016, PDF
    28. Ross et al, A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, AISTATS 2011, PDF
    29. Rusu et al, Policy distillation, ICLR 2016, PDF
    30. Levine and Koltun, Guided policy search, ICML 2013, PDF
    31. Reddy et al, SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards, ICLR 2020, PDF
    32. Ho and Ermon, Generative Adversarial Imitation Learning, NIPS 2016, PDF
    33. Ng and Russell, Algorithms for Inverse Reinforcement Learning, ICML 2000, PDF
    34. Wulfmeier et al, Maximum Entropy Deep Inverse Reinforcement Learning, 2015, PDF
    35. Finn et al, Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, ICML 2016, PDF
    36. Hausman et al, Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets, NIPS 2017, PDF