Section outline

  • Code: 0075A, Credits (ECTS): 9, Semester: 2, Official Language: English

    Instructor: Davide Bacciu  - Co-Instructor: Riccardo Massidda

    Contact: Instructor's email at UNIPI

    Office: Room 331O, Dipartimento di Informatica, Largo B. Pontecorvo 3, Pisa

    Office Hours: (email to arrange meeting)

  • Weekly Schedule

The course is held in the second term. The schedule for A.A. 2025/26 is provided in the table below.

The first lecture of the course will be ON FEBRUARY 19th 2026 at h. 11.00. The course will be held in person; lecture videos will be recorded and made available afterwards to course students (with no guarantee of quality or completeness).

Day        Time          Room
Tuesday    11.15-13.00   C1 - Polo Fibonacci
Wednesday  16.15-18.00   C1 - Polo Fibonacci
Thursday   11.15-13.00   C1 - Polo Fibonacci

     

    Course Prerequisites

Course prerequisites include knowledge of machine learning fundamentals ("Machine Learning" course) and of elements of probability and statistics, calculus, and optimization algorithms ("Computational mathematics for learning and data analysis" course). Previous programming experience with Python is a plus for the practical lectures.

    Course Overview

The course introduces students to the analysis and design of deep and generative learning models and discusses how to realize advanced applications exploiting modern machine learning techniques. Particular focus will be given to methodological aspects and foundational knowledge of modern neural networks and machine learning. The course is targeted at students pursuing specializations in Artificial Intelligence and Machine Learning, but it is also of interest for mathematicians, physicists, data scientists, information retrieval specialists, roboticists and those with a bioinformatics curriculum.

The course is articulated in five parts. The first part introduces basic concepts and foundations of probabilistic models and causality; the second deals with the formalization of learning in the probabilistic paradigm and with models and methods leveraging approximate inference in learning. The third and fourth parts delve into deep learning and deep generative learning models, respectively. The final part of the course presents selected recent works, advanced models and applications in modern machine learning.

    Presentation of the theoretical models and associated algorithms will be complemented by introductory classes on the most popular software libraries used to implement them.

    The official language of the course is English: all materials, references and books are in English. Lecture slides will be made available here, together with suggested readings.

Topics covered: graphical models (Bayesian networks, Markov Random Fields), causality, Expectation-Maximization, approximated inference and learning (variational, sampling), Bayesian models, fundamentals of deep learning (CNN, gated recurrent networks, attention, transformers), propagation issues in neural networks, generative deep learning (autoencoders, VAE, GANs, diffusion models, normalizing flows, score-based models), deep graph networks, principles of reinforcement learning and deep reinforcement learning, ML and deep learning libraries.

    Textbooks and Teaching Materials

Much of the course content will be available through lecture slides and associated bibliographic references. Slides will be complemented by course notes.

    We will use two main textbooks, one covering foundational knowledge on probabilistic models and the other more oriented towards deep learning models.

Note that both books have an electronic version freely available online.

    [BRML] David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press (PDF)

    [SD] Simon J.D. Prince, Understanding Deep Learning, MIT Press (2023) (book online and additional materials)

  • Introduction to the course philosophy, its learning goals and expected outcomes. We will prospectively discuss the overall structure of the course and the interrelations between its parts. Exam modalities and schedule are also discussed.

      Date Topic References
    1 19/02/2026
    (11-13)

    Introduction to the course

    Motivations and aim; course housekeeping (exams, timetable, materials); introduction to generative and deep learning.
     

     

  • The module introduces probabilistic and causal models.  We will refresh useful knowledge from probability and statistics, and introduce fundamental concepts for working with probabilistic models, including conditional independence, d-separation, and causality. We will discuss the graphical models formalism to represent probabilistic relationships in directed/undirected models. 

      Date Topic References
    Additional Material
    2

    20/02/2026
    (16-18)

    ROOM E

    Introduction to probabilistic learning & models

    module overview; basic concepts of probability and statistics; random variables and probability distributions; Bayes rule, marginalization, family of distributions and their properties; inference in probabilistic models. 

    RECOVERY LECTURE FOR 18/02/2026

     [BRML] Ch. 1 

     [SD] Appendix C1.1-C3.3

     
    3 24/02/2026
    (11-13)

    Graphical models: representation
    Bayesian networks; representing joint distributions; conditional independence;

    Lecture by Riccardo Massidda

    [BRML] Sect. 3.1, 3.2 and 3.3.1 (conditional independence)  
    4 25/02/2026
    (16-18)

    Graphical models: Markov properties
    d-separation; Markov properties; faithfulness; Markov models

    Lecture by Riccardo Massidda

    [BRML] Sect. 3.3 (conditional independence, d-separation)

    [BRML] Sect. 4.1, 4.2.0-4.2.2 (Undirected Models and Markov Properties) 
    [BRML] Sect. 4.5 (Expressiveness)

     
    5 26/02/2026
    (11-13)

    Graphical Causal Models 

    causation and correlation; causal Bayesian networks; structural causal models; causal Inference

    Lecture by Riccardo Massidda

       
    6 03/03/2026
    (11-13)

    Structure Learning and Causal Discovery 

    constraint-based methods; score-based methods; parametric assumptions 

    Lecture by Riccardo Massidda

       

     

  • The module introduces learning in probabilistic models. We will discuss fundamental algorithms and concepts, including Expectation-Maximization, sampling and variational approximations, and we will study relevant models from the three fundamental paradigms of probabilistic learning, namely Bayesian networks, Markov networks and dynamic models. Models covered include: Hidden Markov Models, Markov Random Fields, Boltzmann Machines, latent topic models.
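To fix ideas on the Expectation-Maximization algorithm covered in this module, here is a minimal NumPy sketch of EM for a two-component 1D Gaussian mixture. The data, initialization and number of iterations are illustrative assumptions, not taken from the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two well-separated Gaussian clusters.
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 0.8, 300)])

# Initial parameter guesses (illustrative).
pi = np.array([0.5, 0.5])      # mixing proportions
mu = np.array([-1.0, 1.0])     # component means
sigma = np.array([1.0, 1.0])   # component standard deviations

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    dens = np.stack([pi[k] * gauss(x, mu[k], sigma[k]) for k in range(2)])
    resp = dens / dens.sum(axis=0)
    # M-step: closed-form updates of the expected complete-data log-likelihood.
    nk = resp.sum(axis=1)
    pi = nk / len(x)
    mu = (resp * x).sum(axis=1) / nk
    sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)

print(np.sort(mu))  # the estimated means approach the true values (-2, 3)
```

The alternation of responsibilities (E-step) and closed-form parameter updates (M-step) is exactly the structure generalized later in the module by variational EM.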

      Date Topic References
    Additional Material
    7

    04/03/2026
    (16-18)

    Learning with fully observed variables

    learning as inference;  flavors of probabilistic learning; Maximum Likelihood learning with fully observed variables; Naïve Bayes

    [BRML] Sect. 9.1.1-9.1.1.3, 9.3, 10.1, 10.2

    A dedicated chapter to deepen knowledge on fitting distributions by ML or MAP.
    8

    05/03/2026
    (11-13)

    Learning with hidden variables

    Latent/hidden variable models, maximum likelihood learning with latent variables; 
    Expectation-Maximization algorithm; exact learning in mixture models

    [BRML] Sect. 11.1 (learning with latent variables)

    [BRML] 20.1, 20.2.1, 20.3 (mixture models)

     
    9

    10/03/2026
    (11-13)

    Hidden Markov Models  - Part I
    generative models for sequential data; inference problems on sequential data; forward-backward algorithm

    [BRML] Sect. 23.1.0 (Markov Models) 

    [BRML] Sect. 23.2.0-23.2.4 (HMM and forward backward) 

    Additional Readings
    [1]  A classical tutorial introduction to HMMs
    10

    11/03/2026
    (16-18)

    Hidden Markov Models  - Part II

    EM learning in HMMs; Viterbi algorithm; advanced models

    [BRML] Sect. 23.3.1-23.3.4 (EM and learning)

    [BRML] Sect. 23.2.6 (Viterbi)

    Software

    HMMLearn - Scikit-like library for HMMs

    HMMS - discrete and continuous time HMMs

    11

    12/03/2026
    (11-13)

    Variational Inference

    learning and inference in intractable latent variable models; evidence lower bound (ELBO); generalized expectation maximization

    [BRML] Sect. 11.2.1 (Variational EM)

     
    12

    17/03/2026
    (11-13)

    Latent Dirichlet Allocation (LDA)

    latent topic models; probabilities as random variables; Dirichlet distribution; LDA learning by variational inference; LDA applications

    [BRML] Sect. 20.4-20.6.1  (LDA)

    Additional Readings
    [2] LDA foundation paper 
    [3] A gentle introduction to latent topic models

    Software
    13

    18/03/2026
    (16-18)

    Sampling methods

    sampling fundamentals; ancestral sampling; Gibbs Sampling; approximated LDA parameter learning via sampling 

    [BRML] Sect. 27.1 (sampling), Sect. 27.2 (ancestral sampling), Sect. 27.3 (Gibbs sampling)

    Additional Readings
    [4] A step-by-step derivation of collapsed Gibbs sampling for LDA
    14

    19/03/2026
    (11-13)

    Markov Random Fields 

    learning in undirected graphical models; conditional random fields; restricted Boltzmann machines; contrastive divergence and Gibbs sampling in use

    [BRML] Sect. 4.2.2, 4.2.5 (MRF)

    [BRML] Sect. 4.4 (Factor Graphs)

    [BRML] Sect. 5.1.1 (Variable Elimination and Inference on Chain) 

    [BRML] Sect. 9.6.0, 9.6.1, 9.6.4, 9.6.5 (Learning in MRF/CRF)

    Additional Readings
    [5] A clean and clear introduction to RBM from its author

  • The module presents the fundamental concepts, challenges, architectures and methodologies of deep learning. We introduce the learning of neural representations from data of heterogenous nature (vectorial, image, sequential), discussing inductive biases for each data type along with relevant foundational architectures. We will discuss key concepts to understand and address issues in deep neural architectures, with focus on information propagation and stability of learning processes. Models covered in this module include:  convolutional neural networks, long-short term memory, gated recurrent units, randomized networks, sequence-to-sequence, neural attention, transformers, neural ODEs. Methodological lectures will be complemented by introductory seminars to Keras-TF and Pytorch.
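Since this module builds up to neural attention and transformers, the core computation can be previewed with a minimal NumPy sketch of scaled dot-product attention. Shapes and variable names are illustrative assumptions, not taken from the course materials.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # query-key similarities
    weights = softmax(scores, axis=-1)    # each row is a distribution over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8 (illustrative)
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 8))
out, w = attention(Q, K, V)
print(out.shape)  # (4, 8): one attended vector per query
```

The 1/sqrt(d_k) scaling keeps the softmax logits in a reasonable range as the key dimension grows, a point revisited in the lectures on gradient propagation.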

      Date Topic References
    Additional Material
    15

    23/03/2026
    (11-13)

    Convolutional Neural Networks I
    Introduction to the deep learning module; introduction to CNN; basic CNN elements

    [SD] Chapter 10

    Additional Readings

    [6-10] Original papers for LeNet, AlexNet, VGGNet, GoogLeNet and ResNet.

    16

    24/03/2026
    (16-18)

    Convolutional Neural Networks II
    CNN training, notable CNN architectures; advanced topics (deconvolution, causal convolutions, dilated convolutions); vision tasks with CNNs

    [SD] Chapter 10 (CNNs)

    [SD] Chapter 11 (residual nets)

    Additional Readings
    [11] Complete summary of convolution arithmetic

    [12] Seminal paper on batch normalization

    [13] Seminal paper on dilated convolutions

    [14] Object detection by Faster RCNN

    16b

    25/03/2026
    (11-13)

    Lecture time reduced by the student assessment survey.

    Remaining time dedicated to concluding the CNN lecture.

     

     

    17

    31/03/2026
    (11-13)

    Information Propagation in Deep Networks

    sequential data processing; RNNs refresher; exploding and vanishing gradient problem; information propagation beyond the gradient

      Additional Readings
    [15] Paper describing gradient vanishing/explosion
    18

    01/04/2026
    (16-18)

    Advanced Recurrent Models

    gating neurons; LSTM, GRU; randomized RNNs; autoregressive modeling with RNNs

    Coverage of the Prince book for this lecture is inadequate (for reasons unknown to me). Use the course slides for this topic; if you like, you can integrate them with Chapter 10 of the Deep Learning Book.

    Additional Readings
    [16] Original LSTM paper

    [17] A historical view on gated RNNs

    [18] Gated recurrent units paper

    [19] Seminal paper on dropout regularization 

    Software 

       

    02/04/2026-07/04/2026 - EASTER BREAK

    Lectures resume on: 08/04/2026

       
    19

    08/04/2026
    (16-18)

    Attention-based architectures I: recurrent encoder-decoder

    sequence-to-sequence task; encoder-decoder architectures; cross-attention mechanism; general view on neural attention

    Coverage of the Prince book for this lecture is inadequate. You can use the course slides.

    Additional Readings

    [20,21] Seminal papers on encoder-decoder architectures with cross-attention

     20 09/04/2026
    (11-13)

    Attention-based architectures II: Transformers

    self-attention; transformer models; inductive bias; gradient propagation; self-supervised training

    [SD] Chapter 12

    Additional Readings

    [22] Seminal paper on Transformers 
    [23] Transformers in vision

    21 14/04/2026
    (11-13)

    Coding practice I - Lecture by Riccardo Massidda

     

     

    22 15/04/2026
    (16-18)

    Coding practice II - Lecture by Riccardo Massidda

     

     

  • We close the gap between neural networks and probabilistic learning by discussing generative deep learning models. We introduce a taxonomy of existing generative deep learning approaches and study in depth relevant families of models for each element of the taxonomy, including: autoregressive generation, variational autoencoders, generative adversarial networks, diffusion models, flow-based methods and score matching.
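A recurring ingredient of this module is the reparameterization trick used by variational autoencoders: sampling z = mu + sigma * eps with eps ~ N(0, I) makes the sampling step differentiable with respect to the encoder outputs. Below is a minimal NumPy sketch; the parameter values are illustrative assumptions, not from the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder-predicted parameters of q(z|x) for one input (illustrative values).
mu = np.array([0.5, -1.0])       # predicted mean
log_var = np.array([0.0, 0.4])   # predicted log-variance (numerically stabler than sigma)

# Reparameterized sampling: randomness is isolated in eps, so gradients
# can flow through mu and log_var in a real VAE implementation.
eps = rng.standard_normal((10000, 2))
z = mu + np.exp(0.5 * log_var) * eps

# Empirical moments of z match the target Gaussian N(mu, diag(exp(log_var))).
print(z.mean(axis=0), z.var(axis=0))
```

In an actual VAE these samples feed the decoder inside the ELBO; here only the distributional identity is checked.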

      Date Topic References
    Additional Material
    23

    16/04/2026
    (11-13)

    Neural Autoencoders
    Introduction to the generative deep learning module; generative models taxonomy; undercomplete neural autoencoders; deep autoencoders.

    Coverage of the Prince book for this lecture is inadequate; use the lecture slides and, if necessary, complement them with the additional material (e.g., Chapter 14 of the Deep Learning Book).

    Additional Readings

    [24] DBN: the paper that started deep learning
    [25] Deep Boltzmann machines paper
    [26] Review paper on deep generative models
    [27] Long review paper on autoencoders from the perspective of representation learning

    24

    21/04/2026
    (11-13)

    Variational Autoencoders

    explicit distribution models; score learning in DAE; neural ELBO; variational approximation; reparameterization trick; latent space properties

    [SD] Chapter 14 

    [SD] Chapter 17

     

    25

    22/04/2026
    (16-18)

    Generative Adversarial Networks

    learning a sampling process; adversarial learning principles; Wasserstein GANs; conditional generation; notable GANs; adversarial autoencoders

    [SD] Chapter 15

     

    26

    23/04/2026
    (11-13)

    Coding practice III - Lecture by Riccardo Massidda

     

     

    27

    28/04/2026
    (11-13)

    Normalizing flow models I

    tractable explicit likelihood; autoregressive generative learning; probabilistic change of variable; forward/normalization pass; from 1D to multidimensional flows

    [SD] Chapter 16
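The probabilistic change of variable covered in this lecture can be illustrated with a minimal NumPy sketch for a 1D affine flow x = a*z + b with standard normal base: log p_x(x) = log p_z(z) - log|a|, where z = (x - b)/a is the normalization pass. The flow parameters are illustrative assumptions.

```python
import numpy as np

a, b = 2.0, 1.0   # illustrative affine flow parameters

def log_base(z):
    # log density of the standard normal base distribution p_z
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

def flow_log_likelihood(x):
    z = (x - b) / a                       # normalization (inverse) pass
    return log_base(z) - np.log(abs(a))   # log|det Jacobian| correction term

# Sanity check: the implied density over x should integrate to ~1.
xs = np.linspace(-20.0, 20.0, 20001)
dx = xs[1] - xs[0]
total = np.exp(flow_log_likelihood(xs)).sum() * dx
print(total)  # ≈ 1.0
```

Multidimensional flows replace log|a| with the log-determinant of the transformation's Jacobian, which coupling and autoregressive architectures keep cheap to compute.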

     

     

    29/04/2026
    (16-18)

    LECTURE CANCELLED DUE TO STUDENTS' ASSEMBLY

     

     

    28

    30/04/2026
    (11-13)

    Normalizing flow models II

    coupling flows; masking & squeezing; invertible convolutions; autoregressive flows; residual & continuous normalizing flows

    [SD] Chapter 16

     

     

    05/05/2026
    (11-13)

    LECTURE CANCELLED DUE TO LECTURER UNAVAILABILITY

     

     

    29

    06/05/2026
    (16-18)

    Diffusion models

    noising-denoising processes; kernelized diffusion; latent space diffusion

     

     

  • Course grading will preferentially follow a modality comprising in-itinere assignments and a final oral exam. Completing the in-itinere assignments waives the final project.

    Midterms are only available to students regularly attending the course: a mechanism to monitor attendance will be in place. Students who do not regularly attend the course can use the traditional exam modality.

    Midterm Assignments

    Midterms consist of interim coding assignments involving a quick-and-dirty (but working) implementation of models introduced during the lectures (e.g. as a Colab notebook), with and without the use of supporting deep learning libraries.

    There will be 3 interim midterms, which will have to be developed individually, roughly aligned with the conclusion of the major modules of the course (expect midterms to be scheduled roughly every 4 weeks). 

    There will also be a final assignment (midterm n. 4), which will consist of a presentation of a recent research paper on topics/models related to the course content. This final assignment will be carried out in groups.

    Coding midterms will be automatically tested for correctness but not scored. During the final assignment the instructors will ask questions to assess knowledge of the paper: again, no score is provided, only pass/fail.

    Oral Exam

    The oral examination will test knowledge of the course contents (models, algorithms and applications).

    Exam Grading (with Midterms)

    The final exam grade is given by the oral grade. The midterms only waive the final project and do not contribute to the grade: you can only pass or fail a midterm. You need to pass all midterms in order to successfully waive the final project.

    Traditional Exam Modality (No Midterms / Non attending students)

    Working students, those not attending lectures, and those who have failed midterms or simply do not wish to do them, can complete the course by delivering a final project and taking an oral exam. Final project topics will be released in the final weeks of the course: contact the instructor by email to arrange the choice of topic once these are published.

    The final project concerns the realization of software implementing and validating a non-trivial learning model and/or an AI-based application relevant to the course. The content of the final project will be discussed in front of the instructor and anybody interested during the oral examination. Students are expected to prepare slides for a 15-minute presentation summarizing the ideas, models and results in the report. The exposition should demonstrate a solid understanding of the main ideas in the report.

    Grade for this exam modality is determined as

     \( G = 0.5 \cdot (G_P + G_O) \)

    where \( G_P \in [1,30] \) is the project grade and \( G_O \in [1,32] \) is the oral grade.
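As a minimal sketch of the grade rule above, assuming only the stated ranges for \( G_P \) and \( G_O \):

```python
def final_grade(g_p: float, g_o: float) -> float:
    # G = 0.5 * (G_P + G_O), with G_P in [1, 30] and G_O in [1, 32]
    # (oral grades above 30 allow the average to exceed 30).
    assert 1 <= g_p <= 30 and 1 <= g_o <= 32
    return 0.5 * (g_p + g_o)

print(final_grade(28, 30))  # -> 29.0
```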

    1. Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989, pages 257-286, Online Version
    2. D. Blei, A. Y. Ng, M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003
    3. D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012, Free Online Version
    4. W. M. Darling, A Theoretical and Practical Implementation Tutorial on Topic Modeling and Gibbs Sampling, Lecture notes
    5. Geoffrey Hinton, A Practical Guide to Training Restricted Boltzmann Machines, Technical Report 2010-003, University of Toronto, 2010
    6. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel. Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems, NIPS, 1989
    7. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, NIPS, 2012
    8. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition, ICLR 2015, Free Online Version
    9. C. Szegedy et al,  Going Deeper with Convolutions, CVPR 2015, Free Online Version
    10. K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. CVPR 2016, Free Online Version
    11. V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learning, Arxiv
    12. S. Ioffe, C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, ICML 2015, Arxiv
    13. F. Yu et al, Multi-Scale Context Aggregation by Dilated Convolutions, ICLR 2016, Arxiv
    14. S. Ren et al, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NeurIPS 2015
    15. Y. Bengio, P. Simard and P. Frasconi, Learning long-term dependencies with gradient descent is difficult. TNN, 1994, Free Online Version
    16. S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation, 1997, Free Online Version
    17. K. Greff et al, LSTM: A Search Space Odyssey, TNNLS 2016, Arxiv
    18. K. Cho et al, Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, EMNLP 2014, Arxiv
    19. N. Srivastava et al, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014
    20. Bahdanau et al, Neural machine translation by jointly learning to align and translate, ICLR 2015, Arxiv
    21. Xu et al, Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, ICML 2015, Arxiv
    22. A. Vaswani et al, Attention Is All You Need, NIPS 2017, Arxiv
    23. A. Dosovitskiy et al,  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021
    24. G.E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science 313.5786 (2006): 504-507, Free Online Version
    25. R. Salakhutdinov, G. E. Hinton. Deep Boltzmann Machines. AISTATS 2009, Free online version.
    26. R. R. Salakhutdinov. Learning Deep Generative Models, Annual Review of Statistics and Its Application, 2015, Free Online Version
    27. Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. Pattern Analysis and Machine Intelligence, IEEE Transactions on, Vol. 35(8) (2013): 1798-1828, Arxiv.