Section outline

    1. Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989, pages 257-286, Online Version
    2. D. Blei, A. Y. Ng, M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003
    3. D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012, Free Online Version
    4. W. M. Darling, A Theoretical and Practical Implementation Tutorial on Topic Modeling and Gibbs Sampling, Lecture notes
    5. Geoffrey Hinton, A Practical Guide to Training Restricted Boltzmann Machines, Technical Report 2010-003, University of Toronto, 2010
    6. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel. Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems, NIPS, 1989
    7. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, NIPS, 2012
    8. S. Simonyan and A. Zisserman.  Very deep convolutional networks for large-scale image recognition, ICLR 2015, Free Online Version
    9. C. Szegedy et al,  Going Deeper with Convolutions, CVPR 2015, Free Online Version
    10. K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. CVPR 2016, Free Online Version
    11. V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learning, Arxiv
    12. S. Ioffe, C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, ICML 2013,  Arxiv
    13. F. Yu et al, Multi-Scale Context Aggregation by Dilated Convolutions, ICLR 2016, Arxiv
    14. S. Ren et al, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NeurIPS 2015
    15. Y. Bengio, P. Simard and P. Frasconi, Learning long-term dependencies with gradient descent is difficult. TNN, 1994, Free Online Version
    16. S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation, 1997, Free Online Version
    17. K. Greff et al, LSTM: A Search Space Odyssey, TNNLS 2016, Arxiv
    18. C. Kyunghyun et al, Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, EMNLP 2014, Arxiv
    19. N. Srivastava et al, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JLMR 2014
    20. Bahdanau et al, Neural machine translation by jointly learning to align and translate, ICLR 2015, Arxiv
    21. Xu et al, Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, ICML 2015, Arxiv
    22. A. Vaswan et al, Attention Is All You Need, NIPS 2017, Arxiv
    23. A. Dosovitskiy et al,  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021
    24. G.E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science 313.5786 (2006): 504-507, Free Online Version
    25. G.E. Hinton, R. R. Salakhutdinov. Deep Boltzmann Machines. AISTATS 2009, Free online version.
    26. R. R. Salakhutdinov. Learning Deep Generative Models, Annual Review of Statistics and Its Application, 2015, Free Online Version
    27. Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. Pattern Analysis and Machine Intelligence, IEEE Transactions on, Vol. 35(8) (2013): 1798-1828, Arxiv.