Statistical machine learning

Academic year 2021/2022

Course ID

MAT0043

Teacher

Silvia Montagna

Year

2nd year

Teaching period

First semester

Type

D.M. 270 TAF C - Related or integrative

Credits/Recognition

Course disciplinary sector (SSD)

SECS-S/01 - Statistics

Delivery

Blended

Language

English

Attendance

Optional

Type of examination

Written

Prerequisites

MAT0035 Statistical Inference
MAT0041 Multivariate Statistical Analysis
Good knowledge of R is required

Course objectives

The course introduces methods and models for extracting important patterns and trends from large amounts of data, and presents the basic concepts of machine learning and data mining from a statistical perspective. Topics covered include modern regression, classification, cross-validation, model selection and regularisation, and tree-based methods, among others. The course emphasizes the selection of appropriate methods and the justification of that choice, the use of programming to implement the methods, and the evaluation and effective communication of results in data analysis reports.

Expected learning outcomes

Knowledge and understanding

Advanced knowledge of parametric and nonparametric models for prediction and classification

Applying knowledge and understanding

Ability to translate a variety of problems and data into statistical models for prediction and classification

Making judgements

Students will be able to discern the different aspects of statistical learning in modern settings

Communication skills

Students will properly use statistical language to communicate their findings

Learning skills

The acquired skills will give students the opportunity to improve and deepen their knowledge of statistical modeling

Course delivery

The course consists of 48 hours of lectures. Two-thirds of the lectures are devoted to the methodological and theoretical aspects of statistical machine learning, supported by reproducible examples that aid the understanding of the methods. The remaining lectures are devoted to the practical implementation of these methods in R. Students are free to use any programming language; however, R is the officially supported language for this course. Students will be able to use RMarkdown to create HTML and PDF documents.

Please check this page for the teaching arrangements planned for the academic year 2021/22.

All course material will be posted on Moodle.

Learning assessment methods

Until the end of the Covid-19 emergency (including Sept. 2021), the written exam will be held remotely via Webex with video surveillance. More specific instructions will be sent to students registered for the exam at their institutional email addresses.

1) Winter (January/February) exam session: your final grade will be a weighted average of a data analysis project (60%) and a closed-book written exam on theory (40%). For the data analysis, students may work individually or in teams (max. 3 people) on a project of their choosing. Each student/team must submit a project report by Friday, December 18. Each team's report will be reviewed by another team. The review process will be "double-blind": both the project report and the review report will be anonymous.

Students who do not submit their data analysis project, in full or in part, fall under case (2) below.

2) Summer and Fall exam sessions: the final exam consists of a long written test (4 hours) on theory (50%) and data analysis in R (50%).
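As a sketch of the two weighting schemes above, assuming every component is marked on the same scale (the grading scale itself is not stated on this page), write P for the project mark, T for the theory mark and D for the data analysis mark; the symbols are introduced here only for illustration:

```latex
\begin{aligned}
G_{\text{winter}}      &= 0.6\,P + 0.4\,T \\
G_{\text{summer/fall}} &= 0.5\,T + 0.5\,D
\end{aligned}
```

For instance, hypothetical marks P = 28 and T = 25 in the winter session would give 0.6 × 28 + 0.4 × 25 = 16.8 + 10 = 26.8.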

Program

Introduction to Statistical Learning:

Context and motivations
Trade-off between goodness-of-fit and model complexity (i.e., variance and bias)
Training and test sets

Regression:

Exploratory data analysis
Simple & multiple linear regression
Residual analysis & model checking

Classification: Logistic regression; Multinomial logit/probit regression

Resampling methods: Cross-validation, bootstrap

Model selection:

Shrinkage methods
Dimension reduction methods

Beyond linearity:

Polynomial regression
Step functions
Splines, smoothing splines, thin-plate splines
Generalised additive models
Kernels

Tree-based methods:

Regression & classification trees
Bagging, boosting, random forests

Support vector machines

Introduction to neural networks & deep learning:

Single- and multi-hidden-layer back-propagation networks
Issues in training neural networks
Convolutional neural networks
Autoencoders

Gaussian processes

Class schedule

Note

Electronic communication: I will occasionally send e-mails to the class (to the account listed for you in the SDS directory), so please check that account regularly.

Laurea Magistrale (M.Sc.) in Stochastics and Data Science
