  • Objectives

    The course covers text analytics systems and applications that respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. The objectives are to learn to recognize situations in which text analytics techniques can meet information processing needs, to identify the analytic task or process that best models the business problem, to select the most appropriate resources, methods and tools, and to collect text data and apply those methods to it. Several application contexts will be presented: information extraction, sentiment analysis (what is the nature of commentary on an issue), spam and fake post detection, quantification problems, summarization, etc.

    1. Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning
    2. Mathematical background: Probability, Statistics and Algebra
    3. Linguistic essentials: words, lemmas, morphology, PoS, syntax
    4. Basic text processing: regular expressions, tokenisation
    5. Data gathering: Twitter API, scraping
    6. Basic modelling: collocations, language models
    7. Introduction to Machine Learning: theory and practical tips
    8. Libraries and tools: NLTK, TensorFlow/Keras, PyTorch
    9. Applications:
      • Classification/Clustering
      • Sentiment Analysis/Opinion Mining
      • Information Extraction/Relation Extraction
      • Entity Linking
      • Spam Detection: mail spam & phishing, blog spam, review spam
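
    As a rough illustration of topics 4 and 6 above (tokenisation with a regular expression and a simple language model), the following minimal sketch uses only the Python standard library. The function names, the token pattern and the three-sentence toy corpus are assumptions made up for this example; they are not taken from the course notebooks.

```python
import re
from collections import Counter

# Hypothetical token pattern: words (with an optional apostrophe part)
# or any single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN_RE.findall(text.lower())

def bigram_model(sentences):
    """Count contexts and bigrams, padding each sentence with <s> and </s>."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + tokenize(sentence) + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Unsmoothed maximum-likelihood estimate of P(word | prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# Toy corpus, invented for illustration only.
corpus = [
    "The movie was great.",
    "The movie was boring.",
    "The plot was great.",
]
unigrams, bigrams = bigram_model(corpus)
print(tokenize("Don't panic!"))
print(bigram_prob("was", "great", unigrams, bigrams))
```

    The estimate is deliberately unsmoothed, so any unseen bigram gets probability zero; a practical model would add smoothing or back-off.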

    • Schedule

      Day       Hour   Room
      Tuesday   9-11   X1, Polo Fibonacci
      Thursday  14-16  N1, Polo Fibonacci
      • Jupyter Notebooks

        A server is available for running Jupyter Notebooks. You can log into the server using your University credentials.

        • Lectures

          16/09/2019 Introduction  
          17/09/2019 Introduction to Probability  slides
          24/09/2019 Language Modeling  slides
          26/09/2019 Language Modeling  
          30/09/2019 Python Tutorial See notebooks: Introduction to Python and Introduction to Python 2.
          01/10/2019 Python Tutorial and Examples  
          03/10/2019 Building Language Models
          See notebook: Language Models
          08/10/2019 Introduction to NLTK See notebook: Introduction to NLTK
          10/10/2019 Representation of Words  slides
          15/10/2019 Word Embeddings: Word2Vec, GloVe, FastText, ELMo  See notebook: Language Models
          17/10/2019 Text Classification  slides
          22/10/2019 Classifiers  slides
          24/10/2019 Sequence Labeling  slides
          29/10/2019 Named Entity Recognition  slides
          31/10/2019 Deep Learning  slides
          Notebook: Introduction to Keras
          05/11/2019 Neural Language Models slides 
          Notebook: DocumentEmbeddings.ipynb
          07/11/2019 Transformer models
          12/11/2019 Parsing  slides
          14/11/2019 Deep Learning for NLP  
          19/11/2019 Deep Learning for NLP  
          21/11/2019 Introduction to Sentiment Analysis
          26/11/2019 Lexical Resources
          Notebook: pmi-lex-IMDB.ipynb
          28/11/2019 Sentiment Classification  slides
          Notebook: Classification sklearn.ipynb
          03/12/2019 Sentiment Classification  Notebooks:
          Classification sklearn-feats.ipynb
          Classification - lstmNet.ipynb
          Classification - cnnNet.ipynb
          05/12/2019 Transfer Learning/Opinion Extraction  slides
          10/12/2019 Quantification  slides
          12/12/2019 Spam Detection: mail spam & phishing, web spam, review spam slides
          • Textbooks

            1. D. Jurafsky, J.H. Martin. Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
            2. S. Bird, E. Klein, E. Loper. Natural Language Processing with Python.

            Further Readings

            1. J. Eisenstein. Introduction to Natural Language Processing. MIT Press, 2019.
            2. I. Goodfellow, Y. Bengio, A. Courville. Deep Learning. MIT Press, 2016.
            3. B. Liu, Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.

            • Earlier Editions