
Statistical machine learning




Academic year 2022/2023

Course ID
Teacher: Silvia Montagna
Year: 2nd year
Teaching period: First semester
Type: D.M. 270 TAF C - Related or integrative
Course disciplinary sector (SSD): SECS-S/01 - Statistics
Type of examination
Prerequisites: Students should have a background in linear algebra, calculus (including multivariate calculus), basic probability and statistics at the level of a BSc in mathematics or above. Students must also be proficient in at least one programming language (e.g., data structures, control flow, basic functions for data manipulation, visualisation tools, reading/writing files). R is highly recommended.

Course summary


Course objectives

Machine learning studies methods that automatically extract patterns and trends from large amounts of data and then use these patterns to predict future data or other outcomes of interest.

This course introduces master's students to fundamental concepts and techniques in machine learning from a statistical perspective, with a focus on supervised learning. Topics include regression, classification, model selection and regularisation, and tree-based methods, among others. The course emphasises selecting appropriate methods and justifying that choice, using programming to implement the chosen methods, and evaluating and effectively communicating results in data analysis reports.



Learning outcomes

On completion of the course, the student is expected to:

  • Describe and apply a range of machine learning techniques for prediction and classification
  • Translate applied problems and data into statistical models, based on an understanding of how suitable each model is for different tasks
  • Implement a range of techniques using statistical software
  • Communicate findings in written reports using appropriate statistical language

The acquired skills will give students the opportunity to improve and deepen their knowledge of statistical modelling.


Course delivery

The course is delivered through a combination of lectures and tutorials for a total of 48 hours, comprising two 2-hour classes per week.

Lectures (approx. 30 hours) are devoted to the methodological/theoretical aspects of statistical machine learning, with reproducible examples given to support the understanding of methods.

Tutorials are devoted to the detailed analysis of case studies using current, relevant data, including the practical implementation of the analysis. In tutorials, students will conduct a meaningful exploratory data analysis of the case study at hand, formulate and fit a statistical model, assess model adequacy, propose graphical displays that illustrate the results and "tell the story", and consider potential threats to the validity of the analysis. Tutorials are conducted in R, and students can use RMarkdown to create HTML and PDF documents. For the exam (see below), students may use any programming language and any programming environment or operating system; however, R is the officially supported language for this course.

A module on Causal Inference in Machine Learning, included in the overall course load, will be taught by Visiting Professor Vinayak Rao (Purdue University, USA).

All classes are expected to be taught in person. All course material will be posted on Moodle.



Learning assessment methods

The final examination consists of a closed-book written test on theory and a data analysis. Specifically:

1) Winter (January/February) exam session: your final grade will be a weighted average of a data analysis project (60%) and a closed-book written exam on theory (40%). For the data analysis, students can work individually or in teams (maximum 3 people) on a project of their choosing. Each student or team must submit a project report by Friday, December 16. Each student (individually, not as a group) will then be asked to provide feedback and comments on one case study submitted by another student or team. The review process will be "double blind", meaning that both the project report and the review report will be anonymous. A small part of the project grade will be based on completion of the assigned review.

2) Summer and Fall exam sessions: the final exam consists of a long written test (4 hours) comprising both theory (50%, closed-book) and the analysis of one assigned case study (50%).

Note: for students who choose option (1) but fail to submit their project report on time, option (2) above applies.



Course contents

Introduction to Statistical Learning:

  • Context and motivations
  • Trade-off between goodness of fit and model complexity (i.e., the bias-variance trade-off)
  • Training and test sets


Linear regression:

  • Exploratory data analysis
  • Simple & multiple linear regression
  • Residual analysis & model checking

Classification: logistic regression, k-nearest neighbours (KNN)

Resampling methods

Shrinkage methods and regularisation

Tree-based methods:

  • Regression & classification trees
  • Bagging, boosting, random forests

Introduction to neural networks & deep learning:

  • Single- and multi-hidden-layer back-propagation networks
  • Issues in training neural networks
  • Convolutional neural networks
  • Autoencoders

Fundamentals of causal inference in machine learning

Suggested readings and bibliography

Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. An Introduction to Statistical Learning with Applications in R (2nd Edition).

Course literature

The main books used during the course are:

Slides for the course will be provided. If you spot any typos in my notes (no matter how small), please let me know! Doing so will benefit not only you, but also me, your classmates and any future students of this course.

Supplementary reading

There are many books on machine learning, and new ones appear all the time. These books approach the field from different perspectives (e.g., statistics, computer science, probability). Here are links to a few additional resources:


Class schedule



Electronic communication: I will occasionally send e-mails to the class (to the account listed for you in the SDS directory), so please check that account regularly.

Last update: 17/10/2022 12:23