
Statistical machine learning

Academic year 2022/2023

Course ID
MAT0043
Teacher
Silvia Montagna
Year
2nd year
Teaching period
First semester
Type
D.M. 270 TAF C - Related or integrative
Credits/Recognition
6
Course disciplinary sector (SSD)
SECS-S/01 - statistics
Delivery
Blended
Language
English
Attendance
Optional
Type of examination
Written
Prerequisites
Students must have training in linear algebra, calculus and multivariate calculus, and basic probability and statistics at the level of a BSc in mathematics or above. Students must also be proficient in at least one programming language (e.g., data structures, control flow, basic functions for data manipulation, visualisation tools, reading/writing files). R is highly recommended.

Course summary

Course objectives

Machine learning studies methods that can automatically extract patterns and trends from large amounts of data, and then use these patterns to predict future data or other outcomes of interest.

This course introduces master's students to fundamental concepts and techniques in machine learning from a statistical perspective, with a focus on supervised learning. Topics include regression, classification, model selection and regularisation, and tree-based methods, among others. The course emphasises the selection of appropriate methods and the justification of that choice, the use of programming to implement the methods, and the evaluation and effective communication of results in data analysis reports.

Learning outcomes

On completion of the course, the student is expected to:

  • Describe and apply a range of machine learning techniques to perform prediction/classification
  • Convert various applied problems and data into statistical models, based on an appreciation of their relative suitability to different tasks
  • Implement a range of techniques using statistical software
  • Communicate findings using appropriate statistical language in written reports

The acquired skills will give students the opportunity to improve and deepen their knowledge of statistical modelling.

Course delivery

The course is delivered through a combination of lectures and tutorials for a total of 48 hours, consisting of two 2-hour classes per week.

Lectures (approx. 30 hours) are devoted to the methodological/theoretical aspects of statistical machine learning, with reproducible examples given to support the understanding of methods.

Tutorials are devoted to the detailed analysis of case studies using current, relevant data, including the practical implementation of the analysis. Tutorials will involve conducting meaningful exploratory data analysis for the case study at hand, formulating and fitting a statistical model, assessing model adequacy, proposing graphical displays that illustrate the results of the analysis and “tell the story”, and thinking about potential threats to the validity of the analysis. Tutorials will be done in R, and students will be able to use R Markdown to create HTML and PDF documents. For the exam (see below), students may use any programming language and any programming environment/OS; however, R is the officially supported language for this course.

A module of the course, included in the overall course load, will be taught by Visiting Professor Vinayak Rao (Purdue University, USA) on Causal Inference in Machine Learning.

All classes are planned to be taught in person. All course material will be posted on Moodle.

Learning assessment methods

The final examination consists of a closed-book written test on theory and a data analysis. Specifically:

1) Winter (January/February) exam session: your final grade will be a weighted average of a data analysis project (60%) and a closed-book written exam on theory (40%). For the data analysis, students can work individually or in teams (max. 3 people) on a project of their choosing. Each student/team must submit a project report by Friday, December 16. Each student will then be asked (individually, not as a group) to provide feedback and comments on one case study presented by another student/team. The review process will be "double blind", meaning that both the project report and the review report will be anonymous. A small part of the case study project grade will be based on completion of the assigned review.

2) Summer and Fall exam sessions: the final exam consists of a long written test (4 hours) comprising both theory (50%, closed-book) and the analysis of one assigned case study (50%).

Note: for students who choose option (1) but fail to submit their case study on time, option (2) above applies.
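
The weighting used in the two assessment options can be illustrated with a short sketch. It assumes, purely for illustration, that every component is marked on the same scale (for example the Italian 0 to 30 scale), that the credit for completing the peer review is already folded into the project mark (the syllabus does not state its exact weight), and that no rounding rule is applied. The names `weighted_grade`, `WINTER_WEIGHTS` and `SUMMER_FALL_WEIGHTS` are hypothetical and not part of any official grading tool.

```python
# Hypothetical sketch: the 60/40 and 50/50 weights come from the syllabus;
# the common 0-30 scale, the names, and the lack of rounding are assumptions.

WINTER_WEIGHTS = {"project": 0.6, "theory": 0.4}          # option (1)
SUMMER_FALL_WEIGHTS = {"theory": 0.5, "case_study": 0.5}  # option (2)


def weighted_grade(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted average of component scores marked on the same scale."""
    # Refuse inputs that do not match the components of the chosen option.
    if set(scores) != set(weights):
        raise ValueError(f"expected components {sorted(weights)}, got {sorted(scores)}")
    return sum(weights[name] * scores[name] for name in weights)


# Option (1): project marked 28/30, theory exam 25/30 -> 26.8
print(round(weighted_grade({"project": 28, "theory": 25}, WINTER_WEIGHTS), 2))
# Option (2): theory 24/30, assigned case study 30/30 -> 27.0
print(weighted_grade({"theory": 24, "case_study": 30}, SUMMER_FALL_WEIGHTS))
```

Keeping the weights as data rather than hard-coding them makes it easy to check that each option's weights sum to 1 and to see how much each component contributes.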

Program

Introduction to Statistical Learning:

  • Context and motivations
  • Trade-off between goodness-of-fit and model complexity (i.e., variance and bias)
  • Training and test set

Regression:

  • Exploratory data analysis
  • Simple & multiple linear regression
  • Residual analysis & model checking

Classification: logistic regression, k-nearest neighbours (KNN)

Resampling methods

Shrinkage methods and regularisation

Tree-based methods:

  • Regression & classification trees
  • Bagging, boosting, random forests

Introduction to neural networks & deep learning: 

  • Single- and multi-hidden-layer back-propagation networks
  • Issues in training neural networks
  • Convolutional neural networks
  • Autoencoders

Fundamentals of causal inference in machine learning

Suggested readings and bibliography

Title:  
An Introduction to Statistical Learning with Applications in R (2nd Edition)
Year of publication:  
2021
Publisher:  
Springer
Author:  
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
Required:  
No

Course literature

The main books used during the course are:

Slides for the course will be provided. If you spot any typos in my notes (no matter how small), please tell me about them! Doing so will benefit not only you but also me, your classmates and any future students of this course.

Supplementary reading

There are many books written on machine learning, and new books keep appearing all the time. These books can approach the field from different perspectives (e.g., statistics, computer science, probability). Here are links to a few additional resources:

Class schedule

Note

Electronic communication: I will occasionally send e-mails to the class (to the account listed for you in the SDS directory), so please check that account regularly.

Last update: 17/10/2022 12:23