-
Code: 0075A, Credits (ECTS): 9, Semester: 2, Official Language: English
Instructor: Davide Bacciu - Co-Instructor: Riccardo Massidda
Contact: Instructor's email at UNIPI
Office: Room 331O, Dipartimento di Informatica, Largo B. Pontecorvo 3, Pisa
Office Hours: (email to arrange meeting)
-
Weekly Schedule
The course is held in the second term. The schedule for A.A. 2025/26 is provided in the table below.
The first lecture of the course will be ON FEBRUARY 19th 2026, h. 11.00. The course will be held in person, with lecture videos recorded and made available a posteriori to course students (with no guarantee of quality or completeness).
Day  Time
Tuesday  11.15-13.00 (Room C1 - Polo Fibonacci)
Wednesday  16.15-18.00 (Room C1 - Polo Fibonacci)
Thursday  11.15-13.00 (Room C1 - Polo Fibonacci)

Course Prerequisites
Course prerequisites include knowledge of machine learning fundamentals ("Machine Learning" course); knowledge of elements of probability and statistics, calculus and optimization algorithms ("Computational mathematics for learning and data analysis" course). Previous programming experience with Python is a plus for the practical lectures.
Course Overview
The course introduces students to the analysis and design of deep and generative learning models, and discusses how to realize advanced applications exploiting advanced machine learning techniques. Particular focus will be given to methodological aspects and foundational knowledge of modern neural networks and machine learning. The course is targeted at students pursuing specializations in Artificial Intelligence and Machine Learning, but it is also of interest to mathematicians, physicists, data scientists, information retrieval specialists, roboticists and those with a bioinformatics background.
The course is articulated in five parts. The first part introduces basic concepts and foundations of probabilistic models and causality, followed by a module dealing with the formalization of learning in the probabilistic paradigm and with models and methods leveraging approximate inference in learning. The third and fourth parts delve into deep learning and deep generative learning models, respectively. The final part of the course presents selected recent works, advanced models and applications in modern machine learning.
Presentation of the theoretical models and associated algorithms will be complemented by introductory classes on the most popular software libraries used to implement them.
The official language of the course is English: all materials, references and books are in English. Lecture slides will be made available here, together with suggested readings.
Topics covered: graphical models (Bayesian networks, Markov Random Fields), causality, Expectation-Maximization, approximated inference and learning (variational, sampling), Bayesian models, fundamentals of deep learning (CNN, gated recurrent networks, attention, transformers), propagation issues in neural networks, generative deep learning (autoencoders, VAE, GANs, diffusion models, normalizing flows, score-based models), deep graph networks, principles of reinforcement learning and deep reinforcement learning, ML and deep learning libraries.
Textbooks and Teaching Materials
Much of the course content will be available through lecture slides and the associated bibliographic references. Slides will be complemented by course notes.
We will use two main textbooks, one covering foundational knowledge on probabilistic models and the other more oriented towards deep learning models.
Note that both books have an electronic version freely available online.
[BRML] David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press (PDF)
[SD] Simon J.D. Prince, Understanding Deep Learning, MIT Press (2023) (book online and additional materials)
-
Introduction to the course philosophy, its learning goals and expected outcomes. We will give a prospective overview of the overall structure of the course and the interrelations between its parts. Exam modalities and schedule are also discussed.
Date Topic References
1. 19/02/2026 (11-13): Introduction to the course
Motivations and aim; course housekeeping (exams, timetable, materials); introduction to generative and deep learning.
-
The module introduces probabilistic and causal models. We will refresh useful knowledge from probability and statistics, and introduce fundamental concepts for working with probabilistic models, including conditional independence, d-separation, and causality. We will discuss the graphical models formalism to represent probabilistic relationships in directed/undirected models.
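As a concrete warm-up for this module, here is a minimal sketch (plain NumPy; the chain structure and all CPT values are illustrative assumptions, not course material) that assembles the joint distribution of a three-node Bayesian network A -> B -> C and checks numerically the conditional independence C ⊥ A | B implied by d-separation:

```python
import numpy as np

# Illustrative CPTs for a binary chain A -> B -> C (values are arbitrary).
p_a = np.array([0.6, 0.4])               # P(A)
p_b_given_a = np.array([[0.7, 0.3],      # P(B | A=0)
                        [0.2, 0.8]])     # P(B | A=1)
p_c_given_b = np.array([[0.9, 0.1],      # P(C | B=0)
                        [0.4, 0.6]])     # P(C | B=1)

# The joint factorizes as P(A, B, C) = P(A) P(B|A) P(C|B).
joint = (p_a[:, None, None]
         * p_b_given_a[:, :, None]
         * p_c_given_b[None, :, :])
assert np.isclose(joint.sum(), 1.0)

# d-separation in the chain implies C ⊥ A | B, i.e. P(C | A, B) = P(C | B).
p_ab = joint.sum(axis=2)                  # P(A, B) by marginalizing out C
p_c_given_ab = joint / p_ab[:, :, None]   # P(C | A, B)
print(np.allclose(p_c_given_ab[0], p_c_given_ab[1]))  # True: A is irrelevant given B
```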
Date Topic References Additional Material
2. 20/02/2026 (16-18), ROOM E: Introduction to probabilistic learning & models
module overview; basic concepts of probability and statistics; random variables and probability distributions; Bayes rule, marginalization, families of distributions and their properties; inference in probabilistic models
RECOVERY LECTURE FOR 18/02/2026
[BRML] Ch. 1
[SD] Appendix C1.1-C3.3
3. 24/02/2026 (11-13): Graphical models: representation
Bayesian networks; representing joint distributions; conditional independence
Lecture by Riccardo Massidda
[BRML] Sect. 3.1, 3.2 and 3.3.1 (conditional independence)
4. 25/02/2026 (16-18): Graphical models: Markov properties
d-separation; Markov properties; faithfulness; Markov models
Lecture by Riccardo Massidda
[BRML] Sect. 3.3 (conditional independence, d-separation)
[BRML] Sect. 4.1, 4.2.0-4.2.2 (Undirected Models and Markov Properties)
[BRML] Sect. 4.5 (Expressiveness)
5. 26/02/2026 (11-13): Graphical Causal Models
causation and correlation; causal Bayesian networks; structural causal models; causal inference
Lecture by Riccardo Massidda
6. 03/03/2026 (11-13): Structure Learning and Causal Discovery
constraint-based methods; score-based methods; parametric assumptions
Lecture by Riccardo Massidda
-
The module introduces learning in probabilistic models. We will discuss fundamental algorithms and concepts, including Expectation-Maximization, sampling and variational approximations, and we will study relevant models from the three fundamental paradigms of probabilistic learning, namely Bayesian networks, Markov networks and dynamic models. Models covered include: Hidden Markov Models, Markov Random Fields, Boltzmann Machines, and latent topic models.
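To give a first taste of the algorithms in this module, the sketch below (plain NumPy; the synthetic data, initialization and iteration count are illustrative assumptions, not course code) runs Expectation-Maximization on a two-component 1D Gaussian mixture:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1D data drawn from two Gaussians (illustrative only).
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

# Initial mixing weights w, means mu and variances var.
w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):
    # E-step: responsibilities r[n, k] proportional to w_k * N(x_n | mu_k, var_k).
    log_r = (np.log(w)
             - 0.5 * np.log(2 * np.pi * var)
             - 0.5 * (x[:, None] - mu) ** 2 / var)
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the expected sufficient statistics.
    nk = r.sum(axis=0)
    w = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(w.round(2), mu.round(2), var.round(2))  # roughly [0.3 0.7], [-2. 3.], [1. 1.]
```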
Date Topic References Additional Material
7. 04/03/2026 (16-18): Learning with fully observed variables
learning as inference; flavors of probabilistic learning; Maximum Likelihood learning with fully observed variables; Naïve Bayes
[BRML] Sect. 9.1.1-9.1.1.3, 9.3, 10.1, 10.2
A dedicated chapter to deepen knowledge on fitting distributions by ML or MAP.
8. 05/03/2026 (11-13): Learning with hidden variables
latent/hidden variable models; maximum likelihood learning with latent variables; Expectation-Maximization algorithm; exact learning in mixture models
[BRML] Sect. 11.1 (learning with latent variables)
[BRML] 20.1, 20.2.1, 20.3 (mixture models)
9. 10/03/2026 (11-13): Hidden Markov Models - Part I
generative models for sequential data; inference problems on sequential data; forward-backward algorithm
[BRML] Sect. 23.1.0 (Markov Models)
[BRML] Sect. 23.2.0-23.2.4 (HMM and forward-backward)
Additional Readings
[1] A classical tutorial introduction to HMMs
10. 11/03/2026 (16-18): Hidden Markov Models - Part II
EM learning in HMMs; Viterbi algorithm; advanced models
[BRML] Sect. 23.3.1-23.3.4 (EM and learning)
[BRML] Sect. 23.2.6 (Viterbi)
Software
HMMLearn - Scikit-like library for HMMs
HMMS - discrete and continuous time HMMs
11. 12/03/2026 (11-13): Variational Inference
learning and inference in intractable latent variable models; evidence lower bound (ELBO); generalized expectation maximization
[BRML] Sect. 11.2.1 (Variational EM)
12. 17/03/2026 (11-13): Latent Dirichlet Allocation (LDA)
latent topic models; probabilities as random variables; Dirichlet distribution; LDA learning by variational inference; LDA applications
[BRML] Sect. 20.4-20.6.1 (LDA)
Additional Readings
[2] LDA foundation paper
[3] A gentle introduction to latent topic models
Software
- A tutorial implementation of LDA in R that does not use any high-level API
- A step-by-step tutorial on using LDA for text topic modelling in Gensim
- A chatty demo on using LDA for image understanding
13. 18/03/2026 (16-18): Sampling methods
sampling fundamentals; ancestral sampling; Gibbs sampling; approximated LDA parameter learning via sampling
[BRML] Sect. 27.1 (sampling), Sect. 27.2 (ancestral sampling), Sect. 27.3 (Gibbs sampling)
Additional Readings
[4] A step-by-step derivation of collapsed Gibbs sampling for LDA
14. 19/03/2026 (11-13): Markov Random Fields
learning in undirected graphical models; conditional random fields; restricted Boltzmann machines; contrastive divergence and Gibbs sampling in use
[BRML] Sect. 4.2.2, 4.2.5 (MRF)
[BRML] Sect. 4.4 (Factor Graphs)
[BRML] Sect. 5.1.1 (Variable Elimination and Inference on Chain)
[BRML] Sect. 9.6.0, 9.6.1, 9.6.4, 9.6.5 (Learning in MRF/CRF)
Additional Readings
[5] A clean and clear introduction to RBM from its author
-
The module presents the fundamental concepts, challenges, architectures and methodologies of deep learning. We introduce the learning of neural representations from data of heterogeneous nature (vectorial, image, sequential), discussing inductive biases for each data type along with relevant foundational architectures. We will discuss key concepts to understand and address issues in deep neural architectures, with a focus on information propagation and stability of learning processes. Models covered in this module include: convolutional neural networks, long short-term memory, gated recurrent units, randomized networks, sequence-to-sequence, neural attention, transformers, and neural ODEs. Methodological lectures will be complemented by introductory seminars on Keras-TF and PyTorch.
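As a preview of the architectural building blocks covered in this module, the sketch below (PyTorch; tensor sizes and random projections are illustrative assumptions, not course code) implements single-head scaled dot-product self-attention, the core operation of transformer models:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (minimal sketch).

    x: (batch, seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projections.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5   # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)   # each row: a distribution over positions
    return weights @ v                    # attention-weighted mixture of values

# Illustrative shapes: batch of 2 sequences, 5 tokens, 16-dim embeddings.
d_model, d_k = 16, 8
x = torch.randn(2, 5, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)   # torch.Size([2, 5, 8])
```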
Date Topic References Additional Material
15. 23/03/2026 (11-13): Convolutional Neural Networks I
Introduction to the deep learning module; introduction to CNNs; basic CNN elements
[SD] Chapter 10
Additional Readings
[6-10] Original papers for LeNet, AlexNet, VGGNet, GoogLeNet and ResNet.
16. 24/03/2026 (16-18): Convolutional Neural Networks II
CNN training; notable CNN architectures; advanced topics (deconvolution, causal convolutions, dilated convolutions); vision tasks with CNNs
[SD] Chapter 10 (CNNs)
[SD] Chapter 11 (residual nets)
Additional Readings
[11] Complete summary of convolution arithmetic
[12] Seminal paper on batch normalization
[13] Seminal paper on dilated convolutions
[14] Object detection by Faster RCNN
16b. 25/03/2026 (11-13): Lecture time reduced due to the student assessment survey; the remaining time was dedicated to concluding the CNN lecture.
17. 31/03/2026 (11-13): Information Propagation in Deep Networks
sequential data processing; RNN refresher; exploding and vanishing gradient problem; information propagation beyond the gradient
Additional Readings
[15] Paper describing gradient vanishing/explosion
18. 01/04/2026 (16-18): Advanced Recurrent Models
gating neurons; LSTM, GRU; randomized RNNs; autoregressive modeling with RNNs
Coverage of the Prince book for this lecture is inadequate (for reasons unknown to me). You can use the course slides for this topic and, if you like, integrate them with Chapter 10 of the Deep Learning Book.
Additional Readings
[16] Original LSTM paper
[17] A historical view on gated RNNs
[18] Gated recurrent units paper
[19] Seminal paper on dropout regularization
Software
- A simple introduction to generative use of LSTM
02/04/2026-07/04/2026 - EASTER BREAK
Lectures resume on: 08/04/2026
19. 08/04/2026 (16-18): Attention-based architectures I: recurrent encoder-decoder
sequence-to-sequence tasks; encoder-decoder architectures; cross-attention mechanism; general view on neural attention
Coverage of the Prince book for this lecture is inadequate. You can use the course slides.
Additional Readings
[20,21] Seminal papers on encoder-decoder architectures with cross-attention
20. 09/04/2026 (11-13): Attention-based architectures II: Transformers
self-attention; transformer models; inductive bias; gradient propagation; self-supervised training
[SD] Chapter 12
Additional Readings
[22] Seminal paper on Transformers
[23] Transformers in vision
21. 14/04/2026 (11-13): Coding practice I - Lecture by Riccardo Massidda
22. 15/04/2026 (16-18): Coding practice II - Lecture by Riccardo Massidda
-
We close the gap between neural networks and probabilistic learning by discussing generative deep learning models. We introduce a taxonomy of existing generative deep learning approaches and study in depth relevant families of models for each element of the taxonomy, including: autoregressive generation, variational autoencoders, generative adversarial networks, diffusion models, flow-based methods and score matching.
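To preview the central mechanism of this module, here is a minimal sketch (PyTorch; the one-layer encoder/decoder, latent size and loss form are illustrative assumptions, not course code) of a variational autoencoder using the reparameterization trick and a negative-ELBO loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    # Minimal VAE sketch: all sizes are illustrative, not course material.
    def __init__(self, d_in=784, d_z=8):
        super().__init__()
        self.enc = nn.Linear(d_in, 2 * d_z)   # outputs mean and log-variance
        self.dec = nn.Linear(d_z, d_in)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        x_logits = self.dec(z)
        # Negative ELBO = reconstruction term + KL(q(z|x) || N(0, I)).
        rec = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum')
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return (rec + kl) / x.shape[0]

loss = TinyVAE()(torch.rand(32, 784))   # one illustrative forward pass
loss.backward()                         # gradients flow through the sampled z
```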
Date Topic References Additional Material
23. 16/04/2026 (11-13): Neural Autoencoders
Introduction to the generative deep learning module; generative models taxonomy; undercomplete neural autoencoders; deep autoencoders
Coverage of the Prince book for this lecture is inadequate, but you can use the lecture slides and complement them with the additional material if necessary (e.g. Chapter 14 of the Deep Learning Book).
Additional Readings
[24] DBN: the paper that started deep learning
[25] Deep Boltzmann machines paper
[26] Review paper on deep generative models
[27] Long review paper on autoencoders from the perspective of representation learning
24. 21/04/2026 (11-13): Variational Autoencoders
explicit distribution models; score learning in DAE; neural ELBO; variational approximation; reparameterization trick; latent space properties
[SD] Chapter 14
[SD] Chapter 17
25. 22/04/2026 (16-18): Generative Adversarial Networks
learning a sampling process; adversarial learning principles; Wasserstein GANs; conditional generation; notable GANs; adversarial autoencoders
[SD] Chapter 15
26. 23/04/2026 (11-13): Coding practice III - Lecture by Riccardo Massidda
27. 28/04/2026 (11-13): Normalizing flow models I
tractable explicit likelihood; autoregressive generative learning; probabilistic change of variable; forward/normalization pass; from 1D to multidimensional flows
[SD] Chapter 16
29/04/2026 (16-18): LECTURE CANCELLED DUE TO STUDENTS' ASSEMBLY
28. 30/04/2026 (11-13): Normalizing flow models II
coupling flows; masking & squeezing; invertible convolutions; autoregressive flows; residual & continuous normalizing flows
[SD] Chapter 16
05/05/2026 (11-13): LECTURE CANCELLED DUE TO LECTURER UNAVAILABILITY
29. 06/05/2026 (16-18): Diffusion models
noising-denoising processes; kernelized diffusion; latent space diffusion
-
Course grading will preferentially follow a modality comprising in-itinere assignments and a final oral exam. In-itinere assignments waive the final project.
Midterms are only available to students regularly following the course: mechanisms to verify attendance will be in place. Students who do not regularly follow the course can use the traditional exam modality.
Midterm Assignments
Midterms consist of interim coding assignments involving a quick-and-dirty (but working) implementation of models introduced during the lectures (e.g. as a Colab notebook), with and without the use of supporting deep learning libraries.
There will be three interim midterms, to be developed individually, roughly aligned with the conclusion of the major modules of the course (expect midterms to be scheduled roughly every 4 weeks).
There will also be a final assignment (midterm n. 4), consisting of a presentation of a recent research paper on topics/models related to the course content. This final assignment will be carried out in groups.
Coding midterms will be automatically tested for correctness but not scored. During the final assignment, the instructors will ask questions about the paper to assess knowledge of it; again, no score is given, only pass/fail.
Oral Exam
The oral examination will test knowledge of the course contents (models, algorithms and applications).
Exam Grading (with Midterms)
The final exam grade is given by the oral grade. The midterms only waive the final project and do not contribute to the grade; in other words, you can only pass or fail a midterm. You need to pass all midterms in order to successfully waive the final project.
Traditional Exam Modality (No Midterms / Non attending students)
Working students, students not attending lectures, students who have failed midterms, and those who simply do not wish to take them can complete the course by delivering a final project and taking an oral exam. Final project topics will be released in the final weeks of the course: contact the instructor by email to arrange the choice of topic once these are published.
The final project concerns the realization of software implementing and validating a non-trivial learning model and/or an AI-based application relevant to the course. The content of the final project will be discussed in front of the instructor, and anybody interested, during the oral examination. Students are expected to prepare slides for a 15-minute presentation summarizing the ideas, models and results in the report. The exposition should demonstrate a solid understanding of the main ideas in the report.
Grade for this exam modality is determined as
\( G = 0.5 \cdot (G_P + G_O) \)
where \( G_P \in [1,30] \) is the project grade and \( G_O \in [1,32] \) is the oral grade.
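To illustrate with hypothetical grades: \( G_P = 28 \) and \( G_O = 30 \) would give \( G = 0.5 \cdot (28 + 30) = 29 \).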
-
Opened: Monday, 9 March 2026, 6:00 PM
Due: Monday, 23 March 2026, 6:00 PM
-
Opened: Thursday, 2 April 2026, 9:00 AM
Due: Saturday, 18 April 2026, 11:00 PM
-
[1] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989, pages 257-286, Online Version
[2] D. Blei, A. Y. Ng, M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003
[3] D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77-84, 2012, Free Online Version
[4] W. M. Darling. A Theoretical and Practical Implementation Tutorial on Topic Modeling and Gibbs Sampling. Lecture notes
[5] Geoffrey Hinton. A Practical Guide to Training Restricted Boltzmann Machines. Technical Report 2010-003, University of Toronto, 2010
[6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel. Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems, NIPS, 1989
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, NIPS, 2012
[8] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. ICLR 2015, Free Online Version
[9] C. Szegedy et al. Going Deeper with Convolutions. CVPR 2015, Free Online Version
[10] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. CVPR 2016, Free Online Version
[11] V. Dumoulin, F. Visin. A guide to convolution arithmetic for deep learning. Arxiv
[12] S. Ioffe, C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML 2015, Arxiv
[13] F. Yu et al. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR 2016, Arxiv
[14] S. Ren et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NeurIPS 2015
[15] Y. Bengio, P. Simard and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. TNN, 1994, Free Online Version
[16] S. Hochreiter, J. Schmidhuber. Long short-term memory. Neural Computation, 1997, Free Online Version
[17] K. Greff et al. LSTM: A Search Space Odyssey. TNNLS 2016, Arxiv
[18] K. Cho et al. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. EMNLP 2014, Arxiv
[19] N. Srivastava et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR 2014
[20] D. Bahdanau et al. Neural machine translation by jointly learning to align and translate. ICLR 2015, Arxiv
[21] K. Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015, Arxiv
[22] A. Vaswani et al. Attention Is All You Need. NIPS 2017, Arxiv
[23] A. Dosovitskiy et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021
[24] G. E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science 313.5786 (2006): 504-507, Free Online Version
[25] R. R. Salakhutdinov, G. E. Hinton. Deep Boltzmann Machines. AISTATS 2009, Free Online Version
[26] R. R. Salakhutdinov. Learning Deep Generative Models. Annual Review of Statistics and Its Application, 2015, Free Online Version
[27] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35(8) (2013): 1798-1828, Arxiv