Section outline

  • The course targets text analytics systems and applications that respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. The objective is to learn to recognize situations in which text analytics techniques can meet information processing needs, to identify the analytic task/process that best models the business problem, to select the most appropriate resources, methods and tools, and to collect text data and apply these methods to it. Several application contexts will be presented: information extraction, sentiment analysis (determining the nature of commentary on an issue), spam and fake post detection, quantification problems, summarization, etc.

    1. Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning
    2. Mathematical background: Probability, Statistics and Algebra
    3. Linguistic essentials: words, lemmas, morphology, PoS, syntax
    4. Basic text processing: regular expressions, tokenisation
    5. Data gathering: Twitter API, scraping
    6. Basic modelling: collocations, language models
    7. Introduction to Machine Learning: theory and practical tips
    8. Libraries and tools: NLTK, TensorFlow/Keras, PyTorch
    9. Applications:
      • Classification/Clustering
      • Sentiment Analysis/Opinion Mining
      • Information Extraction/Relation Extraction
      • Entity Linking
      • Spam Detection: mail spam & phishing, blog spam, review spam

  • Schedule
    Day       Hour   Room
    Tuesday   9-11   X1, Polo Fibonacci
    Thursday  14-16  N1, Polo Fibonacci
  • A server is available for running Jupyter Notebooks. You can log into the server using your University credentials.

  • Date  Topic  Material
    16/09/2019  Introduction
    17/09/2019  Introduction to Probability  slides
    24/09/2019  Language Modeling  slides
    26/09/2019  Language Modeling
    30/09/2019  Python Tutorial  Notebooks: Introduction to Python, Introduction to Python 2
    01/10/2019  Python Tutorial and Examples
    03/10/2019  Building Language Models  Notebook: Language Models
    08/10/2019  Introduction to NLTK  Notebook: Introduction to NLTK
    10/10/2019  Representation of Words  slides
    15/10/2019  Word Embeddings: Word2Vec, GloVe, FastText, ELMo  Notebook: Language Models
    17/10/2019  Text Classification  slides
    22/10/2019  Classifiers  slides
    24/10/2019  Sequence Labeling  slides
    29/10/2019  Named Entity Recognition  slides
    31/10/2019  Deep Learning  slides; notebook: Introduction to Keras
    05/11/2019  Neural Language Models  slides; notebook: DocumentEmbeddings.ipynb
    07/11/2019  Transformer Models  slides
    12/11/2019  Parsing  slides
    14/11/2019  Deep Learning for NLP
    19/11/2019  Deep Learning for NLP
    21/11/2019  Introduction to Sentiment Analysis  slides
    26/11/2019  Lexical Resources  slides; notebook: pmi-lex-IMDB.ipynb
    28/11/2019  Sentiment Classification  slides; notebooks: VADER.ipynb, Classification sklearn.ipynb
    03/12/2019  Sentiment Classification  notebooks: Classification sklearn-feats.ipynb, Classification - lstmNet.ipynb, Classification - cnnNet.ipynb
    05/12/2019  Transfer Learning/Opinion Extraction  slides
    10/12/2019  Quantification  slides
    12/12/2019  Spam Detection: mail spam & phishing, web spam, review spam  slides

    Textbooks

    1. D. Jurafsky, J.H. Martin. Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
    2. S. Bird, E. Klein, E. Loper. Natural Language Processing with Python. O'Reilly Media, 2009.

    Further Readings

    1. J. Eisenstein. Introduction to Natural Language Processing. MIT Press, 2019.
    2. I. Goodfellow, Y. Bengio, A. Courville. Deep Learning. MIT Press, 2016.
    3. B. Liu. Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.