Experiment with Zipf's law and with a Naive Bayes classifier.
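As a starting point for the Zipf's law experiment, the following is a minimal sketch that builds a rank-frequency table and fits a straight line to it in log-log space. The function names `rank_frequency` and `zipf_slope`, the regex tokeniser and the toy `sample` text are assumptions made for illustration, not part of the course material. Under Zipf's law the fitted slope on a large corpus is expected to be close to -1; a toy text this small will not reproduce that value.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    # Crude tokeniser (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]


def zipf_slope(table):
    """Least-squares slope of log(count) against log(rank)."""
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Toy corpus; replace with a real text to observe the law more clearly.
sample = ("the cat sat on the mat and the dog sat on the log "
          "while the bird sang")

table = rank_frequency(sample)
for rank, word, count in table[:5]:
    # Under Zipf's law, rank * count stays roughly constant.
    print(rank, word, count, rank * count)
print("log-log slope:", zipf_slope(table))
```

Running the same functions on a larger corpus and plotting `log(rank)` against `log(count)` makes the near-linear relationship easier to see.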
The course covers text analytics systems and applications that address business problems by discovering and presenting knowledge otherwise locked in textual form. The objective is to learn to recognize situations in which text analytics techniques can meet information processing needs, to identify the analytic task or process that best models the business problem, to select the most appropriate resources, methods and tools, and to collect text data and apply those methods to it. Several application contexts will be presented: information extraction, sentiment analysis (what is the nature of commentary on an issue), spam and fake post detection, quantification problems, summarization, etc.
- Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning
- Mathematical background: Probability, Statistics and Algebra
- Linguistic essentials: words, lemmas, morphology, PoS, syntax
- Basic text processing: regular expressions, tokenisation
- Data gathering: Twitter API, scraping
- Basic modelling: collocations, language models
- Introduction to Machine Learning: theory and practical tips
- Libraries and tools: NLTK, TensorFlow/Keras, PyTorch
- Applications:
  - Classification/Clustering
  - Sentiment Analysis/Opinion Mining
  - Information Extraction/Relation Extraction
  - Entity Linking
  - Spam Detection: mail spam & phishing, blog spam, review spam
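To make a few of the listed topics concrete (regular-expression tokenisation, classification, spam detection), here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The class name `NaiveBayes`, the `tokenize` helper and the four training messages are hypothetical and invented for illustration; they do not come from the course notebooks.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens with a regex."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.class_docs = Counter()              # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.class_docs[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            # log P(class) + sum of log P(token | class)
            score = math.log(n_docs / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore tokens never seen in training
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set (made up for this example).
docs = ["win a free prize now", "cheap pills free offer",
        "meeting agenda for tuesday", "lecture notes and slides attached"]
labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(docs, labels)
print(model.predict("free prize offer"))
print(model.predict("slides for the lecture"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing constant keeps a token that was seen in only one class from driving the other class's probability to zero.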
Schedule

| Day | Hour | Room |
| --- | --- | --- |
| Tuesday | 9-11 | X1, Polo Fibonacci |
| Thursday | 14-16 | N1, Polo Fibonacci |

A server is available for running Jupyter Notebooks. You can log into the server using your University credentials.
| Date | Topic | Material |
| --- | --- | --- |
| 16/09/2019 | Introduction | |
| 17/09/2019 | Introduction to Probability | slides |
| 24/09/2019 | Language Modeling | slides |
| 26/09/2019 | Language Modeling | |
| 30/09/2019 | Python Tutorial | Notebooks: Introduction to Python, Introduction to Python 2 |
| 01/10/2019 | Python Tutorial and Examples | |
| 03/10/2019 | Building Language Models | Notebook: Language Models |
| 08/10/2019 | Introduction to NLTK | Notebook: Introduction to NLTK |
| 10/10/2019 | Representation of Words | slides |
| 15/10/2019 | Word Embeddings: Word2Vec, GloVe, FastText, ELMo | Notebook: Language Models |
| 17/10/2019 | Text Classification | slides |
| 22/10/2019 | Classifiers | slides |
| 24/10/2019 | Sequence Labeling | slides |
| 29/10/2019 | Named Entity Recognition | slides |
| 31/10/2019 | Deep Learning | slides; Notebook: Introduction to Keras |
| 05/11/2019 | Neural Language Models | slides; Notebook: DocumentEmbeddings.ipynb |
| 07/11/2019 | Transformer models | slides |
| 12/11/2019 | Parsing | slides |
| 14/11/2019 | Deep Learning for NLP | |
| 19/11/2019 | Deep Learning for NLP | |
| 21/11/2019 | Introduction to Sentiment Analysis | slides |
| 26/11/2019 | Lexical Resources | slides; Notebook: pmi-lex-IMDB.ipynb |
| 28/11/2019 | Sentiment Classification | slides; Notebooks: VADER.ipynb, Classification sklearn.ipynb |
| 03/12/2019 | Sentiment Classification | Notebooks: Classification sklearn-feats.ipynb, Classification - lstmNet.ipynb, Classification - cnnNet.ipynb |
| 05/12/2019 | Transfer Learning/Opinion Extraction | slides |
| 10/12/2019 | Quantification | slides |
| 12/12/2019 | Spam Detection: mail spam & phishing, web spam, review spam | slides |

- D. Jurafsky, J.H. Martin. Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
- S. Bird, E. Klein, E. Loper. Natural Language Processing with Python.
Further Readings
- J. Eisenstein. Introduction to NLP. MIT Press, 2019.
- I. Goodfellow, Y. Bengio, A. Courville. Deep Learning. MIT Press, 2016.
- B. Liu, Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.