Midterm 3 (2025)
Assignment Rules and Execution
The third midterm covers the deep learning techniques up to Lecture 27. To pass the midterm you should
- perform one (only one) of the assignments described in the following;
- report your results as a Colab notebook or a 10-slide presentation (both formats are equally fine) and upload it here by the (strict) deadline.
In case you are delivering a Colab notebook, please upload a txt file with the link to your Colab notebook (already run).
You can use library functions to perform the analysis unless explicitly indicated otherwise. You can use whatever programming language you like, but I strongly suggest using either Python or Matlab, for which you have coding examples. Python will further allow you to deliver your midterm as a Colab notebook, but this is not a requirement (you can deliver a presentation instead).
Your report (irrespective of whether it is a notebook or a presentation) needs to cover at least the following aspects (different assignments might have additional requirements):
- A title with the assignment number and your name
- The full code to run your analysis (for Colabs) or a few slides (for presentations) with code snippets highlighting the key aspects of your code
- A section reporting results of the analysis and your brief comments on it
- A final section with your personal considerations (fun things, weak aspects, possible ways to enhance the analysis, etc.).
Do not waste time and space describing the dataset or the assignment you are solving, as we are all already informed about them.
Assignment 1
DATASET (CIFAR10): https://www.cs.toronto.edu/~kriz/cifar.html
Train a denoising autoencoder on the CIFAR10 dataset. The depth of the autoencoder and the number of neurons in each layer are up to you. Train two versions of the autoencoder, one using dense layers and one using convolutional layers. Show an accuracy comparison between the two autoencoders.
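As a starting point, the denoising setup requires a corrupted copy of each input image that the autoencoder learns to map back to the clean one. A minimal numpy sketch of one common corruption step (additive Gaussian noise; the function name and noise level are illustrative choices, not requirements):

```python
import numpy as np

def add_gaussian_noise(images, noise_std=0.1, seed=0):
    """Corrupt images in [0, 1] with additive Gaussian noise, clipping back to [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = images + rng.normal(0.0, noise_std, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

# Stand-in batch shaped like CIFAR10 images (N, 32, 32, 3); replace with the real data.
clean = np.random.default_rng(1).random((8, 32, 32, 3))
noisy = add_gaussian_noise(clean, noise_std=0.2)
```

During training, the autoencoder receives `noisy` as input and is optimized to reconstruct `clean`; the noise standard deviation is a hyperparameter you can vary.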
Assignment 2
DATASET (HORSES): www.kaggle.com/datasets/ztaihong/weizmann-horse-database/data
Implement your own convolutional network to solve the problem of semantically segmenting horses from the background in the Horses dataset. Choose your favourite convolutional-type architecture to solve the problem and motivate your choice. The design decisions about how many layers, the type of layers and how they are interleaved, the type of pooling, the use of residual connections, etc., are also up to you. Train and validate the model on the data as appropriate in machine learning, and provide a measure of segmentation accuracy.
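One widely used measure of segmentation accuracy for a binary foreground/background task is Intersection-over-Union (IoU). A minimal numpy sketch, assuming masks encoded as 1 = horse and 0 = background (the function name is illustrative):

```python
import numpy as np

def iou_score(pred_mask, true_mask):
    """Intersection-over-Union between two binary masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return inter / union if union > 0 else 1.0

# Toy 2x3 masks: intersection has 2 pixels, union has 4, so IoU = 0.5.
pred = np.array([[1, 1, 0], [0, 1, 0]])
true = np.array([[1, 0, 0], [0, 1, 1]])
score = iou_score(pred, true)
```

Per-pixel accuracy is also acceptable, but IoU is less inflated by the large background region.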
Assignment 3
DATASET (AIR QUALITY): https://archive.ics.uci.edu/dataset/360/air+quality
Train a neural network for sequences of your choice (LSTM, GRU, convolutional, Clockwork RNN, ...) to predict Benzene concentration (the C6H6 column) from the sensor measurement time series (the PT08.* columns) fed as input to the recurrent model. Evaluate the predictive accuracy of the network on the task (using appropriate training/validation splits). Compare the performance of this model with that of another recurrent neural network trained to predict benzene one step ahead, i.e. given the current benzene measurement, predict its next value. Show and compare the performance of both settings.
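Either setting requires slicing the raw time series into fixed-length input windows paired with a prediction target. A minimal numpy sketch of one possible windowing scheme (the function name, window length, and next-step framing are illustrative assumptions, not prescriptions):

```python
import numpy as np

def make_windows(inputs, targets, window=24):
    """Slice aligned (input window, next-step target) pairs from a time series.

    inputs:  (T, F) matrix of sensor channels (e.g. the PT08.* columns)
    targets: (T,)   series to predict (e.g. the C6H6 column)
    Returns X of shape (T - window, window, F) and y of shape (T - window,).
    """
    X = np.stack([inputs[t:t + window] for t in range(len(inputs) - window)])
    y = targets[window:]
    return X, y

# Toy stand-in: 100 time steps of 5 sensor channels; replace with the real data.
sensors = np.random.default_rng(0).random((100, 5))
benzene = np.random.default_rng(1).random(100)
X, y = make_windows(sensors, benzene, window=24)
```

For the one-step-ahead baseline, the same function can be reused with the benzene series itself as `inputs` (reshaped to a single channel).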
Assignment 4
DATASET (Airline reviews): https://www.kaggle.com/datasets/khushipitroda/airline-reviews
The dataset contains the text of online travel reviews (in the Review column) with an associated rating (the Overall_Rating column). The objective is to train a classifier to predict the rating from the review text. You are free to choose the model's architecture, but you should describe and justify your design choices. Train the model and assess it as appropriate in machine learning. You are allowed to preprocess the data however you want (e.g. using pretrained embeddings, dropping some features, or using just a bag-of-words), but the predictive model must be trained by yourself from scratch (no pretrained predictor).
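If you opt for the bag-of-words route, the preprocessing amounts to building a vocabulary of frequent tokens and turning each review into a count vector. A minimal stdlib sketch (function name, tokenization by whitespace, and vocabulary size are illustrative simplifications):

```python
from collections import Counter

def bag_of_words(reviews, vocab_size=1000):
    """Build a vocabulary of the most frequent tokens and vectorize each review."""
    counts = Counter(tok for text in reviews for tok in text.lower().split())
    vocab = {w: i for i, (w, _) in enumerate(counts.most_common(vocab_size))}
    vectors = []
    for text in reviews:
        vec = [0] * len(vocab)
        for tok in text.lower().split():
            if tok in vocab:
                vec[vocab[tok]] += 1
        vectors.append(vec)
    return vectors, vocab

# Toy stand-in reviews; replace with the Review column from the CSV.
reviews = ["great flight great crew", "terrible delay"]
vectors, vocab = bag_of_words(reviews)
```

The resulting count vectors can then feed any classifier you train from scratch; a proper submission would also handle punctuation and rare tokens more carefully.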
Assignment 5
DATASET (PROTEIN): https://www.kaggle.com/code/danofer/deep-protein-sequence-family-classification
The dataset contains proteins represented as sequences of amino acids. To reduce the complexity of the problem, select only protein sequences with length smaller than 200. Train a sequence-to-sequence architecture to learn the "dumb" task of reconstructing, as the output sequence, the reversed version of the input sequence. The details of the sequence-to-sequence architecture are up to you, including which backbone neural network to use, whether and how to use attention, etc. Assess the reconstruction error of your model and comment on the results.
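The data preparation for this task reduces to filtering by length and pairing each kept sequence with its reversal as the target. A minimal sketch, assuming sequences are plain amino-acid strings (the function name is illustrative; token encoding for your chosen model is up to you):

```python
def make_reversal_pairs(sequences, max_len=200):
    """Keep sequences shorter than max_len and pair each with its reversed copy."""
    kept = [s for s in sequences if len(s) < max_len]
    return [(s, s[::-1]) for s in kept]

# Toy stand-in sequences; the real ones come from the Kaggle CSV.
seqs = ["MKT", "GAVLI", "A" * 250]
pairs = make_reversal_pairs(seqs)
```

Reconstruction error can then be measured, for instance, as the fraction of output positions where the predicted amino acid differs from the reversed target.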