Section outline

  • Code: 760AA, Credits (ECTS): 9, Semester: 2, Official Language: English

    Instructor: Davide Bacciu 

    Contact: email - phone 050 2212749

    Office: Room 331, Dipartimento di Informatica, Largo B. Pontecorvo 3, Pisa

    Office Hours: (email to arrange meeting)

  • Weekly Schedule

    The course is held in the second term. The schedule for academic year 2023/24 is provided in the table below.

    The first lecture of the course will be on FEBRUARY 20th 2024 at 11:00. The course will be hybrid, held both in person and online on the dedicated MS Team.

    Recordings of the lectures will be made available to the students following the course.

    Day Time
    Tuesday 11.00-12.45 (Room E, online)
    Wednesday 16.15-18.00 (Room C1, online)
    Thursday 14.15-16.00 (Room C1, online)


    Objectives

    Course Prerequisites

    Course prerequisites include knowledge of machine learning fundamentals (e.g. as covered in the ML course). Knowledge of elements of probability and statistics, calculus and optimization algorithms is also expected. Previous programming experience with Python is a plus for the practical lectures.

    Course Overview

    The course introduces students to the analysis and design of advanced machine learning and deep learning models for modern pattern recognition problems and discusses how to realize advanced applications exploiting computational intelligence techniques.

    The course is articulated in five parts. The first part introduces basic concepts and algorithms concerning traditional pattern recognition, in particular as it pertains to sequence and image analysis. The next two parts introduce advanced models from two major learning paradigms, namely deep neural networks and probabilistic models, and their use in pattern recognition applications. The fourth part will cover generative deep learning and the intersection between probabilistic and neural models. The final part of the course will present selected recent works, advanced models and applications of learning models.

    Presentation of the theoretical models and associated algorithms will be complemented by introductory classes on the most popular software libraries used to implement them.

    The course hosts guest seminars by national and international researchers working in the field, as well as by companies engaged in the development of advanced applications using machine learning models.

    The official language of the course is English: all materials, references and books are in English. Lecture slides will be made available here, together with suggested readings.

    Topics covered: Bayesian learning, graphical models, learning with sampling and variational approximations, fundamentals of deep learning (CNNs, AE, DBN, GRNs), deep learning for machine vision and signal processing, advanced deep learning models (transformers, foundational models, NTMs), generative deep learning (VAE, GANs, diffusion models, score-based models), deep graph networks, reinforcement learning and deep reinforcement learning, signal processing and time-series analysis, image processing, filters and visual feature detectors, pattern recognition applications (machine vision, bio-informatics, robotics, medical imaging, etc.), introduction to programming libraries and frameworks.

    Textbooks and Teaching Materials

    The course textbooks are being changed this year. For the sake of continuity, in the lectures I will provide references to both the old and the new set of books (whenever a double reference is possible). Feel free to use the set of books you find yourself most comfortable with, although I warmly invite you to prioritize the new and most up-to-date books.

    Note that all books have an electronic version freely available online.

    NEW BOOKS

    [CHB] Chris Bishop, Hugh Bishop, Deep Learning: Foundations and Concepts, Springer (2024) (PDF)

    [SD] Simon J.D. Prince, Understanding Deep Learning, MIT Press (2023) (PDF)

    OLD BOOKS

    [BRML] David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press (PDF)

    [DL] Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press (ONLINE)


  • Introduction to the course philosophy, its learning goals and expected outcomes. We will discuss prospectively the overall structure of the course and the interrelations between its parts. Exam modalities and schedule are also discussed (for both M.Sc. and Ph.D. students).
    Date  Topic  References  Additional Material
    1 20/02/2024
    (11-13)
    Introduction to the course
    Motivations and aim; course housekeeping (exams, timetable, materials); introduction to modern pattern recognition applications


    The module will provide a brief introduction to classical pattern recognition for signals/time series and for images. We will cover approaches working in the spatial (temporal) and frequency (spectral) domains, presenting methods to represent temporal and visual information in static descriptors, as well as approaches to identify relevant patterns in the data (feature descriptors). Methodologies covered include correlation analysis, Fourier analysis, wavelets, intensity gradient-based descriptors and detectors, and normalized cut segmentation. A minimal NumPy sketch of spectral analysis is included at the end of this module.

    Date  Topic  References  Additional Material
    2
    21/02/2024
    (16-18)
    Signal processing
    Time series; time-domain analysis (statistics, correlation); spectral analysis; Fourier analysis.


    3 22/02/2024
    (14-16)
     Image Processing I
     Spatial feature descriptors (color histograms, SIFT); spectral analysis.
      Additional readings
     [1] Survey on visual descriptors

    Software:
    • A tweakable and fast implementation of SIFT in C (on top of OpenCV)
    27/02/2024
    28/02/2024
    29/02/2024

    LECTURES CANCELLED (WILL BE RECOVERED)
     4 01/03/2024
    (14-16) Room L1
    (Recovery Lecture)
     Image Processing II
     Feature detectors (edge, blobs); image segmentation; wavelet decompositions
      Additional readings 
    [2] Survey on visual feature detectors

    A reference book for the pattern recognition part is "S. THEODORIDIS, K. KOUTROUMBAS, Pattern Recognition, 4th edition". It is not needed for the sake of the course, but it is a useful reference if you are interested in the topic. It is not available online for free (legally; what you do with Google is none of my business).

    You can find the original NCUT paper freely available from the authors here.
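
    To make the spectral-analysis topics of this module concrete, here is a minimal sketch (NumPy only) of reading the dominant frequency of a time series off its Fourier spectrum. The signal, sampling rate and noise level are made-up values used purely for illustration.

        # Toy spectral analysis of a time series (signal and sampling rate are illustrative).
        import numpy as np

        fs = 100.0                                       # sampling frequency in Hz (assumed)
        t = np.arange(0, 10, 1.0 / fs)                   # 10 seconds of samples
        x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(t.size)   # 5 Hz tone plus noise

        spectrum = np.abs(np.fft.rfft(x))                # magnitude spectrum
        freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)      # frequency axis in Hz

        dominant = freqs[np.argmax(spectrum[1:]) + 1]    # skip the DC component
        print(f"Dominant frequency: {dominant:.1f} Hz")  # should be close to 5 Hz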

  • The module introduces probabilistic learning, causal models, generative modelling and Bayesian learning. We will discuss fundamental algorithms and concepts, including Expectation-Maximization, sampling and variational approximations, and we will study relevant models from the three fundamental paradigms of probabilistic learning, namely Bayesian networks, Markov networks and dynamic models. Models covered include: Bayesian Networks, Hidden Markov Models, Markov Random Fields, Boltzmann Machines, Latent topic models. A toy NumPy sketch of the HMM forward recursion is included at the end of this module.


    Date  Topic  References (OLD)  References (NEW)  Additional Material
      5 
    05/03/2024
    (11-13)
    Introduction to Generative Graphical Models.
    Probability refresher; graphical model representation; directed and undirected models
    [BRML] Ch. 1 and 2 (Refresher)
    [BRML] Sect. 3.1, 3.2 and 3.3.1
    (conditional independence)
     [CHB] Sect. 2.1-2.4 (refresher)
     [CHB] Sect. 2.5 (ML probabilities)
     [CHB] Sect. 11.1 and Sect. 11.2.1 (graphical models + conditional independence)
    Software
    • Pyro - Python library based on PyTorch
    • PyMC3 - Python library based on Theano
    • Edward - Python library based on TensorFlow
    • TensorFlow Probability - Probabilistic models and deep learning in Tensorflow

    06/03/2024
    (16-18)
    LECTURE CANCELLED DUE TO STUDENT ASSEMBLY

     
    6
    07/03/2024
    (14-16)
    Conditional independence and causality - Part I
    Bayesian networks; Markov networks; conditional independence;

    [BRML] Sect. 3.3 (Directed Models)
    [BRML] Sect. 4.1, 4.2.0-4.2.2 (Undirected Models)
    [BRML] Sect. 4.5 (Expressiveness)
     [CHB] 11.1-11.3, 11.6 Graphical Models
     [CHB] 11.2 Conditional Independence

    Disclaimer: the Bishop book's coverage of this lecture is partial. I suggest using Barber's book.

    7
    08/03/2024
    (14-16)
    ROOM L1
    RECOVERY LECTURE

    Conditional independence and causality - Part II
    d-separation; structure learning in Bayesian Networks
    [BRML] Sect. 9.5.1 (PC algorithm)
    [BRML] Sect. 9.5.2 (Independence testing)
    [BRML] Sect. 9.5.3 (Structure scoring)
     Disclaimer: the Bishop book's coverage of this lecture is inadequate. I suggest using Barber's book.  Additional readings
    [3] A short review of BN structure learning
    [4] PC algorithm with consistent ordering for large scale data
    [5] MMHC - Hybrid structure learning algorithm

    If you are interested in deepening your knowledge of causality, this is an excellent book (also freely available online): Jonas Peters, Dominik Janzing, Bernhard Schölkopf, Elements of Causal Inference: Foundations and Learning Algorithms, MIT Press.

    Software
    - A selection of BN structure learning libraries in Python: pgmpy, bnlearn, pomegranate.
    - bnlearn: the most consolidated and efficient library for BN structure learning (in R)
    - Causal learner: a mixed R-Matlab package integrating over 26 BN structure learning algorithms.
    8
    12/03/2024
    (11-13)
    Hidden Markov Models  - Part I
    learning in directed graphical models; forward-backward algorithm;  generative models for sequential data
     [BRML] Sect. 23.1.0 (Markov Models)
    [BRML] Sect. 23.2.0-23.2.4 (HMM and forward backward) 
    [CHB] 11.3 Sequence models

    The Bishop book's coverage of this lecture is inadequate. I suggest using Barber's book.


    13/03/2024
    (16-18)
    LECTURE CANCELLED (RECOVERY LECTURE ON FRIDAY)

     
    9
    14/03/2024
    (14-16)
    Hidden Markov Models - Part II
    EM algorithm, learning as inference, Viterbi Algorithm
    [BRML] Sect. 23.2.6 (Viterbi)
    [BRML] Sect. 23.3.1-23.3.4 (EM and learning)
     The Bishop book's coverage of this lecture is inadequate. I suggest using Barber's book. Additional Readings
    [6]  A classical tutorial introduction to HMMs
    10
    15/03/2024
    (14-16)
    ROOM L1
    RECOVERY LECTURE

    Markov Random Fields I
    learning in undirected graphical  models;
    [BRML] Sect. 4.2.2, 4.2.5 (MRF)
    [BRML] Sect. 4.4 (Factor Graphs)

     The Bishop book's coverage of this lecture is inadequate. I suggest using Barber's book.
    11
    19/03/2024
    (11-13)
    Markov Random Fields II
    conditional random fields; pattern recognition applications

    [BRML] Sect. 5.1.1 (Variable Elimination and Inference on Chain) 
    [BRML] Sect. 9.6.0, 9.6.1, 9.6.4, 9.6.5 (Learning in MRF/CRF)
    The Bishop book's coverage of this lecture is inadequate. I suggest using Barber's book.
    Additional Readings
    [7,8] Two comprehensive tutorials on CRF ([7] more introductory and [8] more focused on vision)
    [9] A nice application of CRF to image segmentation

    Software
    12
    20/03/2024
    (16-18)
    Bayesian Learning I
    Principles of Bayesian learning; EM algorithm objective; principles of variational approximation; latent topic models; Latent Dirichlet Allocation (LDA).
    [BRML] Sect. 11.2.1 (Variational EM)
    [CHB] 15.4 Evidence Lower Bound and the generalized EM
    13 21/03/2024
    (14-16)
    Bayesian Learning II
    LDA learning; machine vision application of latent topic models;

    Bayesian Learning III
    sampling methods; ancestral sampling;

    [BRML] Sect. 20.4-20.6.1  (LDA)

    [BRML] Sect. 27.1 (sampling), Sect. 27.2 (ancestral sampling)

    Bishop's book does not cover LDA: I suggest using Barber's book for this.

    [CHB] 14.1.1-2 (Sampling) 14.2.5 (ancestral)

      Additional Readings
    [10] LDA foundation paper
    [11] A gentle introduction to latent topic models
    [12] Foundations of bag of words image representation

    Software
    14
    26/03/2024
    (11-13)
    Bayesian Learning III
    Gibbs sampling

    Boltzmann Machines
    bridging neural networks and generative models; stochastic neuron; restricted Boltzmann machine; contrastive divergence and Gibbs sampling in use
    [BRML] Sect. 27.3 (Gibbs sampling)

    [DL] Sections 20.1 and 20.2 (RBM)
     [CHB] 14.2.4 (Gibbs)

    Bishop's book does not cover RBMs: the slides (possibly supplemented by reference [14]) are enough for this part.
     Additional Readings
    [13] A step-by-step derivation of collapsed Gibbs sampling for LDA
    [14] A clean and clear introduction to RBM from its author

    Software
    Matlab code for Deep Belief Networks (i.e. stacked RBM) and Deep Boltzmann Machines.
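
    As a companion to the Hidden Markov Model lectures (8 and 9), here is a toy sketch of the forward recursion computing the likelihood of an observation sequence under a discrete HMM. The transition, emission and initial distributions below are made-up values for illustration only.

        # Toy forward algorithm for a discrete HMM (lectures 8-9); parameters are illustrative.
        import numpy as np

        pi = np.array([0.6, 0.4])                  # initial state distribution
        A = np.array([[0.7, 0.3],                  # A[i, j] = P(s_{t+1} = j | s_t = i)
                      [0.4, 0.6]])
        B = np.array([[0.9, 0.1],                  # B[i, k] = P(o_t = k | s_t = i)
                      [0.2, 0.8]])
        obs = [0, 1, 0, 0]                         # observed symbol indices

        alpha = pi * B[:, obs[0]]                  # alpha_1(i) = pi_i * B_i(o_1)
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]          # recursion: marginalize over previous states
        print(f"Sequence likelihood: {alpha.sum():.6f}")   # P(o_1, ..., o_T)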

  • The module presents the fundamental concepts, challenges, architectures and methodologies of deep learning. We introduce the learning of neural representations from vectorial, sequential and image data, covering both supervised and unsupervised learning, and hinting at various forms of weak supervision. Models covered include: deep autoencoders, convolutional neural networks, long short-term memory, gated recurrent units, advanced recurrent architectures, sequence-to-sequence, neural attention, Transformers, neural Turing machines. Methodological lectures will be complemented by introductory seminars on Keras-TF and PyTorch. A minimal PyTorch sketch of scaled dot-product attention is included at the end of this module.

    Date  Topic  References (OLD)  References (NEW)  Additional Material
    15
    27/03/2024
    (16-18)
    Convolutional Neural Networks I
    Introduction to the deep learning module; introduction to CNNs; basic CNN elements
    [DL] Chapter 9
    [CHB] Chapter 10
    [SD] Chapter 10
    Additional Readings
    [15-19] Original papers for LeNet, AlexNet, VGGNet, GoogLeNet and ResNet.
    16
    28/03/2024
    (14-16)
     Convolutional Neural Networks II
    CNN architectures for image recognition; convolution visualization; advanced topics (deconvolution, dense nets); applications and code
     [DL] Chapter 9 [CHB] Chapter 10
    [SD] Chapter 10
    Additional Readings
    [20] Complete summary of convolution arithmetics
    [21] Seminal paper on batch normalization
    [22] CNN interpretation using deconvolutions
    [23] CNN interpretation with GradCAM


     EASTER BREAK



    17
    03/04/2024
    (16-18)
    Deep Autoencoders
    Sparse, denoising and contractive AE; deep RBM
    [DL] Chapter 14, Sect 20.3, 20.4.0 (from 20.4.1 onwards not needed)
    [CHB] Section 19.1

    [SD] The Prince book's coverage of this lecture is inadequate.
    Additional Readings
    [24] DBN: the paper that started deep learning
    [25] Deep Boltzmann machines paper
    [26] Review paper on deep generative models
    [27] Long review paper on autoencoders from the perspective of representation learning
    [28] Paper discussing regularized autoencoder as approximations of likelihood gradient
    18 04/04/2024
    (14-16)
    Gated Recurrent Networks I
    Deep learning for sequence processing; gradient issues;
    [DL] Sections 10.1-10.3, 10.5-10.7, 10.10, 10.11
    The Bishop and Prince books' coverage of this lecture is inadequate (for reasons I do not understand). Please use the DL book or the slides, supplemented by the Additional Readings.
    Additional Readings
    [29] Paper describing gradient vanishing/explosion
    [30] Original LSTM paper
    [31] An historical view on gated RNN
    19
    05/04/2024
    (16-18)
    ROOM E
    Gated Recurrent Networks II
    long-short term memory; gated recurrent units; generative use of RNN
    RECOVERY LECTURE
    [DL] Sections 10.12, 12.4.5
    The Bishop and Prince books' coverage of this lecture is inadequate (for reasons I do not understand). Please use the DL book or the slides, supplemented by the Additional Readings.
    Additional Readings
    [32] Gated recurrent units paper
    [33] Seminal paper on dropout regularization

     Software
    20
    09/04/2024
    (11-13)
    Coding practice I
    Pytorch and principles of autograd

    Guest lecture by Valerio De Caro




    21
    10/04/2024
    (16-18)
    Coding practice II
    Keras/TF and programming exercises

    Guest lecture by Valerio De Caro

    Github with the notebooks for the lecture: https://github.com/vdecaro/intro-tf-keras/



    22
    11/04/2024
    (14-16)
    Attention-based architectures
    sequence-to-sequence;  attention modules; transformers
     [DL] Sections 10.12, 12.4.5
    [CHB] Chapter 12
    [SD] Chapter 12
    Additional Readings
    [34,35] Models of sequence-to-sequence and image-to-sequence transduction with attention
    [36] Seminal paper on Transformers
    [37] Transformers in vision
    23
    12/04/2024
    (16-18)
    ROOM E
    RECOVERY LECTURE

    Memory-based models
    multiscale network; hierarchical models; memory networks; neural Turing machines
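
    As lecture 22 covers attention modules and Transformers, here is a minimal PyTorch sketch of scaled dot-product attention, the building block of the Transformer architecture. Tensor shapes and dimensions are illustrative only.

        # Minimal scaled dot-product attention (lecture 22); shapes are illustrative.
        import torch
        import torch.nn.functional as F

        def scaled_dot_product_attention(q, k, v):
            # q, k, v: (batch, seq_len, d_model)
            d_k = q.size(-1)
            scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, seq_len, seq_len)
            weights = F.softmax(scores, dim=-1)             # attention distribution over keys
            return weights @ v                              # weighted sum of the values

        q = k = v = torch.randn(2, 5, 16)                   # toy batch: 2 sequences of length 5
        out = scaled_dot_product_attention(q, k, v)
        print(out.shape)                                    # torch.Size([2, 5, 16])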

  • We formalise the reinforcement learning problem by rooting it in Markov decision processes, and we provide an overview of the main approaches to designing reinforcement learning agents, including model-based, model-free, value and policy learning. We link classical approaches with modern deep-learning-based approximators (deep reinforcement learning). Methodologies covered include: dynamic programming, MC learning, TD learning, SARSA, Q-learning, deep Q-learning, policy gradient and deep policy gradient. A toy sketch of the tabular Q-learning update is given right below.
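
    As a concrete illustration of the tabular Q-learning update mentioned above, here is a toy sketch on a 5-state chain environment. The environment, hyperparameters and number of episodes are made up for illustration and are not part of the course material.

        # Toy tabular Q-learning (model-free, off-policy TD control); all values are illustrative.
        import numpy as np

        n_states, n_actions = 5, 2             # actions: 0 = move left, 1 = move right
        alpha, gamma, eps = 0.1, 0.9, 0.3      # learning rate, discount factor, exploration rate
        Q = np.zeros((n_states, n_actions))
        rng = np.random.default_rng(0)

        def step(s, a):
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            reward = 1.0 if s_next == n_states - 1 else 0.0    # reward only at the right end
            return s_next, reward, s_next == n_states - 1

        for episode in range(300):
            s, done = 0, False
            while not done:
                a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
                s_next, r, done = step(s, a)
                # Q-learning update: bootstrap on the greedy value of the next state
                Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
                s = s_next

        print(np.round(Q, 2))   # the greedy policy should prefer action 1 (right) in every state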


    Date  Topic  References (OLD)  References (NEW)  Additional Material
    24
    16/04/2024
    (11-13)
    Explicit Density Learning
    explicit distribution models; neural ELBO; variational autoencoders (a toy ELBO sketch is given at the end of this module)
     [DL] Sections 20.9, 20.10.1-20.10.3
     
    [CHB] Section 19.2
    [SD] Chapter 14 (generative learning),        Chapter 17 (VAE)
    Additional Readings
    [38] PixelCNN - Explicit likelihood model
    [39] Tutorial on VAE

    Software
    25
    17/04/2024
    (16-18)
    Implicit models - Adversarial Learning
    generative adversarial networks; Wasserstein GANs; conditional generation; notable GANs; adversarial autoencoders
     [DL] Section 20.10.4
     
    [CHB] Chapter 17
    [SD] Chapter 15
    Additional Readings
    [40] Tutorial on GAN (here another online resource with GAN tips)
    [41] Wasserstein GAN
    [42] Tutorial on sampling neural networks
    [43] Progressive GAN
    [44] CycleGAN
    [45] Seminal paper on Adversarial AEs

    Software
    26
    18/04/2024
    (14-16)
    Diffusion models
    noising-denoising processes; kernelized diffusion; latent space diffusion; conditional diffusion models

    Not covered
    [CHB] Chapter 20
    [SD] Chapter 18
    Additional Readings
    [46] Introductory and survey paper on diffusion models
    [47] Seminal paper introducing diffusion models
    [48] An interpretation of diffusion models as score matching
    [49] Paper introducing the diffusion model reparameterization
    [50] Diffusion beats GAN paper

    23-25/04/2024
    NO LECTURE DURING THIS WEEK

     
    27
    30/04/2024
    (11-13)
    Normalizing flow models
    probabilistic change of variable; forward/normalization pass; from 1D to multidimensional flows; survey of notable flow models; wrap-up of deep generative learning
    Not covered
    [CHB] Chapter 18
    [SD] Chapter 16
    Additional Readings
    [51] Survey paper on normalizing flows
    [52] RealNVP paper
    [53] GLOW paper
    [54] MADE autoregressive flow
    Software
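
    To complement the generative deep learning lectures above (in particular the neural ELBO and variational autoencoders of lecture 24), here is a toy PyTorch sketch of the negative ELBO used to train a VAE. The network sizes, the Bernoulli (binary cross-entropy) likelihood and the random batch are assumptions made only for illustration.

        # Toy negative ELBO for a VAE (lecture 24); sizes and likelihood choice are illustrative.
        import torch
        import torch.nn.functional as F
        from torch import nn

        class TinyVAE(nn.Module):
            def __init__(self, x_dim=784, z_dim=10):
                super().__init__()
                self.enc = nn.Linear(x_dim, 2 * z_dim)   # outputs [mean, log-variance]
                self.dec = nn.Linear(z_dim, x_dim)

            def forward(self, x):
                mu, logvar = self.enc(x).chunk(2, dim=-1)
                z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
                return self.dec(z), mu, logvar

        def neg_elbo(model, x):
            logits, mu, logvar = model(x)
            recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")  # -E_q[log p(x|z)]
            kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())            # KL(q(z|x) || N(0, I))
            return (recon + kl) / x.size(0)

        x = torch.rand(32, 784)           # toy batch of 32 "images" with values in [0, 1]
        loss = neg_elbo(TinyVAE(), x)
        loss.backward()                   # gradients flow through the reparameterized sample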


  • The module covers some recent and interesting developments and research topics in the field of machine learning. The choice of topics is likely to vary with each edition. Example topics include: deep learning for graphs, continual learning, distributed learning, learning-reasoning integration, edge AI, learning beyond backpropagation, neural networks inspired by dynamical systems, ... The module concludes with a final lecture which discusses the course content retrospectively and details the exam modalities, topics and deadlines. A minimal PyTorch sketch of a message-passing layer for graphs is included at the end of this module.

    Date  Topic  References (OLD)  References (NEW)  Additional Material
     28 02/05/2024
    (14-16)
    Fundamentals of deep learning for graphs I
    learning with structured data, learning tasks on graphs, message-passing architectures, survey of foundational models for graphs

       [CHB] Chapter 13
     [SD] Chapter 13
     Software
    - PyDGN: our in-house DLG library
    - PyTorch geometric
    - Deep graph library

    Additional readings
    [55-56] Seminal works on neural networks for graphs
    [57] Recent tutorial paper
    29
    07/05/2024
    (11-13)
    Reservoir Computing
    Guest lecture by Andrea Ceni

    The content of this lecture is not part of the exam topics

      
    30
    08/05/2024
    (16-18)
    Alternatives to backpropagation training of (deep) neural models
    Guest lecture by Andrea Cossu

    The content of this lecture is not part of the exam topics
        
    31 14/05/2024
    (11-13)
    Fundamentals of deep learning for graphs II
    graph convolutional networks, graph pooling, generative learning on graphs, probabilistic graph models, non-dissipative graph message passing, neural algorithmic reasoning
      [CHB] Chapter 13
    [SD] Chapter 13
    Additional readings 
    [58] A work on generalizing pooling to graphs
    [59] Probabilistic learning on graphs
    [60] Non-dissipative message passing via neural graph ODEs
    [61] Survey on deep learning for dynamic graphs
    [62] Neural algorithmic reasoning following duality structure in optimization problems
    32 15/05/2024
    (16-18)
    Beyond accuracy: auditing LLMs based on exams designed for humans
    Guest lecture by Wagner Meira Jr

    The content of this lecture is not part of the exam topics
        
    33
    16/05/2024
    (14-16)
     (Deep) Reinforcement Learning fundamentals
      [SD] Sections 19.1-19.3.1, 19.4, 19.5 (no derivation of policy gradient)
     Additional readings 
    [63] Original Q-Learning algorithm
    [64] Original DQN paper
    [65] Learning with the actor-critic architecture
    [66] A masterpiece paper deriving trust-region policy optimization (technical but worth the read)
     34  21/05/2024
    (11-13)
    RECOVERY LECTURE - ROOM C
    An introduction to causality and causal learning
    Guest lecture by Riccardo Massidda
        
    35 22/05/2024
    (16-18)
    RECOVERY LECTURE - ROOM C1

     Final lecture
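
    To complement the deep learning for graphs lectures (28 and 31), here is a minimal sketch of a mean-aggregation message-passing layer written in plain PyTorch with a dense adjacency matrix. Layer sizes and the toy graph are illustrative only; in practice one would use the full-featured libraries listed above (e.g. PyTorch Geometric or PyDGN).

        # Minimal mean-aggregation message-passing layer (lectures 28 and 31); sizes are illustrative.
        import torch
        from torch import nn

        class MessagePassingLayer(nn.Module):
            def __init__(self, in_dim, out_dim):
                super().__init__()
                self.self_lin = nn.Linear(in_dim, out_dim)    # transform of the node itself
                self.neigh_lin = nn.Linear(in_dim, out_dim)   # transform of aggregated neighbours

            def forward(self, h, adj):
                # h: (num_nodes, in_dim); adj: (num_nodes, num_nodes) with 0/1 entries
                deg = adj.sum(dim=1, keepdim=True).clamp(min=1)   # avoid division by zero
                neigh_mean = (adj @ h) / deg                      # mean of neighbour features
                return torch.relu(self.self_lin(h) + self.neigh_lin(neigh_mean))

        # Toy graph: 4 nodes on a cycle, 8-dimensional input features
        adj = torch.tensor([[0., 1., 0., 1.],
                            [1., 0., 1., 0.],
                            [0., 1., 0., 1.],
                            [1., 0., 1., 0.]])
        out = MessagePassingLayer(8, 16)(torch.randn(4, 8), adj)
        print(out.shape)   # torch.Size([4, 16])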
        

    The typical course examination (for students attending the lectures) is performed in two stages: midterm assignments and an oral exam. Midterms waive the final project.

    Midterm Assignment

    Midterms consist of short assignments involving one of the following tasks:

    • A quick and dirty (but working) implementation of a simple pattern recognition algorithm
    • A report concerning the experience of installing and running a demo application realized using available deep learning and machine learning libraries
    • A summary of a recent research paper on topics/models related to the course content.

    The midterms can consist of either the delivery of code (e.g. a Colab notebook) or a short slide deck (no more than 10 slides) presenting the key/most interesting aspects of the assignment.

    Students may be given some freedom in the choice of assignments, provided the topic chosen is a reasonable one. Assignments will be scheduled roughly every 3-4 weeks.

    Oral Exam

    The oral examination will test knowledge of the course contents (models, algorithms and applications).

    Exam Grading (with Midterms)

    The final grade is determined by the oral exam. The midterms only waive the final project and do not contribute to the grade; in other words, you can only pass or fail a midterm. You need to pass all midterms in order to successfully waive the final project.

    Alternative Exam Modality (No Midterms / Non attending students)

    Working students, students not attending lectures, and those who have failed the midterms or simply do not wish to take them can complete the course by delivering a final project and taking an oral exam. Final project topics will be released in the final weeks of the course: contact the instructor by email to arrange the choice of topic once these are published.

    The final project involves preparing a report on a topic relevant to the course content, or realizing software implementing a non-trivial learning model and/or a pattern recognition application relevant to the course. The content of the final project will be discussed in front of the instructor, and anybody interested, during the oral examination. Students are expected to prepare slides for a 15-minute presentation summarizing the ideas, models and results in the report. The exposition should demonstrate a solid understanding of the main ideas in the report.

    The grade for this exam modality is determined as

     \( G = 0.5 \cdot (G_P + G_O) \)

    where \( G_P \in [1,32] \) is the project grade and \( G_O \in [1,30] \) is the oral grade.


    1. Scott Krigg, Interest Point Detector and Feature Descriptor Survey, Computer Vision Metrics, pp 217-282, Open Access Chapter
    2. Tinne Tuytelaars and Krystian Mikolajczyk, Local Invariant Feature Detectors: A Survey, Foundations and Trends in Computer Graphics and Vision, Vol. 3, No. 3 (2007) 177–2, Online Version
    3. C. Glymour, Kun Zhang and P. Spirtes, Review of Causal Discovery Methods Based on Graphical Models Front. Genet. 2019, Online version
    4. Bacciu, D., Etchells, T. A., Lisboa, P. J., & Whittaker, J. (2013). Efficient identification of independence networks using mutual information. Computational Statistics, 28(2), 621-646, Online version
    5. Tsamardinos, I., Brown, L.E. & Aliferis, C.F. The max-min hill-climbing Bayesian network structure learning algorithm. Mach Learn 65, 31–78 (2006), Online version
    6. Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989, pages 257-286, Online Version
    7. Charles Sutton and Andrew McCallum,  An Introduction to Conditional Random Fields, Arxiv
    8. Sebastian Nowozin and Christoph H. Lampert, Structured Learning and Prediction, Foundations and Trends in Computer Graphics and Vision, Online Version
    9. Philipp Krahenbuhl, Vladlen Koltun, Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, Proc.of NIPS 2011, Arxiv
    10. D. Blei, A. Y. Ng, M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003
    11. D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012, Free Online Version
    12. G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual Categorization with Bags of Keypoints. Workshop on Statistical Learning in Computer Vision. ECCV 2004, Free Online Version
    13. W. M. Darling, A Theoretical and Practical Implementation Tutorial on Topic Modeling and Gibbs Sampling, Lecture notes
    14. Geoffrey Hinton, A Practical Guide to Training Restricted Boltzmann Machines, Technical Report 2010-003, University of Toronto, 2010
    15. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel. Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems, NIPS, 1989
    16. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, NIPS, 2012
    17. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition, ICLR 2015, Free Online Version
    18. C. Szegedy et al,  Going Deeper with Convolutions, CVPR 2015, Free Online Version
    19. K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. CVPR 2016, Free Online Version
    20. V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learning, Arxiv
    21. S. Ioffe, C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, ICML 2015, Arxiv
    22. M.D. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014, Arxiv
    23. J. Adebayo et al, Sanity Checks for Saliency Maps, NeurIPS, 2018
    24. G.E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks.Science 313.5786 (2006): 504-507, Free Online Version
    25. G.E. Hinton, R. R. Salakhutdinov. Deep Boltzmann Machines. AISTATS 2009, Free online version.
    26. R. R. Salakhutdinov. Learning Deep Generative Models, Annual Review of Statistics and Its Application, 2015, Free Online Version
    27. Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. Pattern Analysis and Machine Intelligence, IEEE Transactions on, Vol. 35(8) (2013): 1798-1828, Arxiv.
    28. G. Alain, Y. Bengio. What Regularized Auto-Encoders Learn from the Data-Generating Distribution, JMLR, 2014.
    29. Y. Bengio, P. Simard and P. Frasconi, Learning long-term dependencies with gradient descent is difficult. TNN, 1994, Free Online Version
    30. S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation, 1997, Free Online Version
    31. K. Greff et al, LSTM: A Search Space Odyssey, TNNLS 2016, Arxiv
    32. C. Kyunghyun et al, Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, EMNLP 2014, Arxiv
    33. N. Srivastava et al, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014
    34. Bahdanau et al, Neural machine translation by jointly learning to align and translate, ICLR 2015, Arxiv
    35. Xu et al, Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, ICML 2015, Arxiv
    36. A. Vaswani et al, Attention Is All You Need, NIPS 2017, Arxiv
    37. A. Dosovitskiy et al,  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021
    38. A. van den Oord et al., Pixel Recurrent Neural Networks, 2016, Arxiv
    39. C. Doersch, A Tutorial on Variational Autoencoders, 2016, Arxiv
    40. Ian Goodfellow, NIPS 2016 Tutorial: Generative Adversarial Networks, 2016, Arxiv
    41. Arjovsky et al, Wasserstein GAN, 2017, Arxiv
    42. T. White, Sampling Generative Network, NIPS 2016, Arxiv
    43. T. Karras et al, Progressive Growing of GANs for Improved Quality, Stability, and Variation, ICLR 2018, Arxiv
    44. Jun-Yan Zhu et al, Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017 Arxiv
    45. Alireza Makhzani et al, Adversarial Autoencoders, NIPS 2016, Arxiv
    46. Ling Yang et al, Diffusion Models: A Comprehensive Survey of Methods and Applications, 2023, Arxiv
    47. Jascha Sohl-Dickstein et al, Deep Unsupervised Learning using Nonequilibrium Thermodynamics, ICML 2015, PDF
    48. Y. Song & S. Ermon, Generative Modeling by Estimating Gradients of the Data Distribution, NeurIPS 2019, PDF
    49. Jonathan Ho et al, Denoising Diffusion Probabilistic Models, NeurIPS 2020, Arxiv
    50. P. Dhariwal & A. Nichol, Diffusion Models Beat GANs on Image Synthesis, NeurIPS 2021, PDF 
    51. I. Kobyzev et al Normalizing Flows: An Introduction and Review of Current Methods, Arxiv
    52. L Dinh et al, Density Estimation using real NVP, ICLR 2017, PDF
    53. D. Kingma & P. Dhariwal, Glow: Generative flow with invertible 1x1 convolutions, NeurIPS 2018, PDF
    54. G. Papamakarios et al, Masked Autoregressive Flow for Density Estimation, NeurIPS 2017, PDF
    55. A. Micheli, Neural Network for Graphs: A Contextual Constructive Approach. IEEE TNN, 2009, Online
    56. Scarselli et al, The graph neural network model, IEEE TNN, 2009, Online
    57. Bacciu et al, A Gentle Introduction to Deep Learning for Graphs, Neural Networks, 2020, Arxiv
    58. Bacciu et al, Generalizing downsampling from regular data to graphs, AAAI, 2023, PDF
    59. Bacciu et al,  Probabilistic Learning on Graphs via Contextual Architectures, 2020, JMLR
    60. Gravina et al, Anti-Symmetric DGN: a Stable Architecture for Deep Graph Networks, ICLR, 2023, Arxiv
    61. A. Gravina and D. Bacciu, Deep learning for dynamic graphs: models and benchmarks, 2024, TNNLS
    62. Numeroso et al, Dual Algorithmic Reasoning, ICLR, 2023, Arxiv
    63. CJCH Watkins, P Dayan, Q-learning, Machine Learning, 1992, PDF
    64. Mnih et al, Human-level control through deep reinforcement learning, Nature, 2015, PDF
    65. Sutton et al, Policy gradient methods for reinforcement learning with function approximation, NIPS, 2000, PDF
    66. Schulman et al, Trust Region Policy Optimization, ICML, 2015, PDF