
Statistical machine learning

Academic year 2019/2020

Course ID: MAT0043
Teacher: Silvia Montagna
Year: 2nd year
Teaching period: First semester
Type: D.M. 270 TAF C - Related or integrative
Credits/Recognition: 6
Course disciplinary sector (SSD): SECS-S/01 - Statistics
Delivery: Formal authority
Language: English
Attendance: Optional
Type of examination: Mixed
Prerequisites:
  • MAT0035 Statistical Inference
  • MAT0041 Multivariate Statistical Analysis
  • Good knowledge of R is required

Course summary

Course objectives

The course introduces methods and models for extracting important patterns and trends from large amounts of data, and presents basic concepts of machine learning and data mining from a statistical perspective. Topics covered include modern regression, classification, cross-validation, model selection and regularisation, and tree-based methods, among others. The course emphasises selecting appropriate methods and justifying the choice, implementing the methods in code, and evaluating and effectively communicating results in data analysis reports.

Learning outcomes

Knowledge and understanding 

  • Advanced knowledge of parametric and nonparametric models for prediction and classification

Applying knowledge and understanding

  • Ability to convert various problems and data into statistical models to perform prediction/classification

Making judgements

  • Students will be able to discern the different aspects of statistical learning in modern settings

Communication skills

  • Students will properly use statistical language to communicate their findings

Learning skills

  • The skills acquired will give students the opportunity to improve and deepen their knowledge of statistical modeling

Course delivery

The course consists of 48 hours of class lectures. Half of the lectures are devoted to the theoretical aspects of statistical machine learning, and the remaining half to their practical implementation. We will use R as the programming language for data analysis, together with existing R packages that support the course. Students will use RMarkdown to create HTML and PDF documents.

Learning assessment methods

1) Winter (January/February) exam session: your final grade will be a weighted average of a data analysis project (60%) and a closed-book written exam on theory (40%). For the data analysis, students can work individually or in teams (max 3 people) on a project of their choosing. Each student/team is required to prepare a short presentation summarising the project to the class. Oral project presentations will be held during the last week of classes and will be evaluated, counting towards the final grade for the data analysis part. In addition, each student must submit an individual project report by Friday, December 20.

For students who have not submitted their data analysis (in whole or in part), case (2) below applies.

2) Summer and Fall exam sessions: the final exam consists of a long written test (4 hours) on theory (50%) and data analysis in R (50%). 
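
As a rough illustration of the two weighting schemes above, the sketch below computes a weighted average of the theory and data analysis marks. The function name, the session labels and the example marks are hypothetical, and the syllabus does not state a marking scale or a rounding rule, so none is applied here.

```python
# Minimal sketch of the weighting described above. The names and the
# example marks are illustrative assumptions, not part of the syllabus.

# Percentage weights as (theory, data analysis) for each case.
WEIGHTS = {
    "winter": (40, 60),       # case 1: written theory exam + project
    "summer_fall": (50, 50),  # case 2: single long written test
}


def final_grade(theory, data_analysis, session):
    """Return the weighted average of the two marks for the given session."""
    w_theory, w_data = WEIGHTS[session]
    return (w_theory * theory + w_data * data_analysis) / 100


if __name__ == "__main__":
    # Hypothetical marks: 26 on theory, 29 on the data analysis part.
    print(final_grade(26, 29, "winter"))
    print(final_grade(26, 29, "summer_fall"))
```

With the same two marks, the winter scheme gives more weight to the data analysis part than the summer/fall scheme does.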

Until the end of the Covid-19 emergency, the written exam will be held remotely via Webex with video surveillance. More specific instructions will be sent to students registered for the exam at their institutional email addresses.

Program

Introduction to Statistical Learning

  • Context and motivations
  • Trade-off between goodness-of-fit and model complexity (i.e. variance and bias)
  • Training and test set

Regression

  • Exploratory data analysis
  • Simple & multiple linear regression
  • Residual analysis & model checking

Classification: Logistic regression; Multinomial logit/probit regression

Resampling methods: Cross-validation, bootstrap

Model selection

  • Subset selection 
  • Shrinkage methods
  • Dimension reduction methods

Beyond linearity:

  • Polynomial regression
  • Step functions
  • Splines & smoothing splines
  • Generalised additive models
  • Kernels

Tree-based methods

  • Regression & classification trees
  • Bagging, boosting, random forests

Support vector machines

Introduction to neural networks

Suggested readings and bibliography

This is an applied course, which will be based on:

  • JAMES, WITTEN, HASTIE, TIBSHIRANI. An introduction to statistical learning with applications in R. Springer. 

The book is freely available at www-bcf.usc.edu/~gareth/ISL. You are welcome to download it and print it out.

Another useful resource (also available freely online) is:

  • HASTIE, TIBSHIRANI AND FRIEDMAN. The elements of statistical learning: data mining, inference and prediction. Springer-Verlag.

Slides for the course will be provided. If you see any typos in my notes (no matter how small), please tell me about them! Doing so will benefit not only you, but also me, your classmates and any future students of this course.

 




Note

This course will be delivered at the ESOMAS Department.

Electronic communication: I will occasionally send e-mails to the class (to the account listed for you in the SDS directory), so please check that account regularly.

Last update: 28/04/2020 12:53