Statistical machine learning
Academic year 2022/2023
- Course ID
- MAT0043
- Teacher
- Silvia Montagna
- Year
- 2nd year
- Teaching period
- First semester
- Type
- D.M. 270 TAF C - Related or integrative
- Credits/Recognition
- 6
- Course disciplinary sector (SSD)
- SECS-S/01 - Statistics
- Delivery
- Blended
- Language
- English
- Attendance
- Optional
- Type of examination
- Written
- Prerequisites
- Students must have training in linear algebra, calculus and multivariate calculus, basic probability, and statistics at mathematics BSc level or above. Students must also master at least one programming language (e.g., knowledge of data structures, control flow, basic functions for data manipulation, visualisation tools, reading/writing files). R is highly recommended.
Course summary
Course objectives
Machine learning studies methods that automatically extract patterns and trends from large amounts of data, and then use these patterns to predict future data or other outcomes of interest.
This course introduces master's students to fundamental concepts and techniques in machine learning from a statistical perspective, with a focus on supervised learning. Topics include regression, classification, model selection and regularisation, and tree-based methods, among others. The course emphasises selecting appropriate methods and justifying that choice, implementing the methods in code, and evaluating and effectively communicating results in data analysis reports.
Learning outcomes
On completion of the course, the student is expected to:
- Describe and apply a range of machine learning techniques to perform prediction/classification
- Convert various applied problems and data into statistical models, based on an appreciation of their relative suitability to different tasks
- Implement a range of techniques using statistical software
- Communicate findings using appropriate statistical language in written reports
The acquired skills will give students the opportunity to improve and deepen their knowledge of statistical modelling.
Course delivery
The course is delivered through a combination of lectures and tutorials for a total of 48 hours, comprising two 2-hour classes per week.
Lectures (approx. 30 hours) are devoted to the methodological/theoretical aspects of statistical machine learning, with reproducible examples given to support the understanding of methods.
Tutorials are devoted to the detailed analysis of case studies using current, relevant data, including the practical implementation of the analysis. Tutorials will involve conducting meaningful exploratory data analysis for the case study at hand, formulating and fitting a statistical model, assessing model adequacy, proposing graphical displays to illustrate the results of the analysis and “tell the story”, and thinking about potential threats to the validity of the analysis. Tutorials will be done in R, and students will be able to use RMarkdown for creating HTML and pdf documents. For the exam (see below), students are permitted to use any programming language and any programming environment/OS; however, R is the officially supported language for this course.
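As a rough illustration of the tutorial workflow described above (formulate and fit a model, then assess its adequacy), here is a minimal sketch of a simple linear regression with a basic residual check. It is written in plain Python purely for illustration; the tutorials themselves use R. The simulated data, the true coefficients (2.0 and 0.5), and the function name `fit_simple_linear_regression` are assumptions made for this example and are not course material.

```python
import random


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: a linear signal plus Gaussian noise.
random.seed(1)
x = [random.uniform(0, 10) for _ in range(100)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

# Fit the model and compute fitted values and residuals.
intercept, slope = fit_simple_linear_regression(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Basic adequacy summaries: residuals should average to zero,
# and R^2 measures the share of variance explained by the fit.
y_bar = sum(y) / len(y)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"intercept={intercept:.3f}, slope={slope:.3f}, R^2={r_squared:.3f}")
print(f"mean residual={sum(residuals) / len(residuals):.2e}")
```

In a real case study, these summaries would be complemented by graphical checks, such as plotting residuals against fitted values.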
A module of the course, included in the overall courseload, will be taught by Visiting Professor Vinayak Rao (Purdue University, USA) on Causal Inference in Machine Learning.
All classes are expected to be taught in person. All course material will be posted on Moodle.
Learning assessment methods
The final examination consists of a written, closed-book test on theory and a data analysis. Specifically:
1) Winter (January/February) exam session: your final grade will be based on a weighted average of a data analysis project (60%) and a closed-book written exam on theory (40%). For the data analysis, students can work individually or in teams (max 3 people) on a project of their choosing. Each student/team must submit a project report by Friday, December 16. Students will then be asked (on an individual, not group, basis) to provide feedback and comments on one case study presented by another student/group. The review process will be "double blind", meaning that both the project report and the review report will be anonymous. A small part of the case study project grade will be based on completion of the assigned review.
2) Summer and Fall exam sessions: the final exam consists of a long written test (4 hours) comprising both theory (50%, closed-book) and analysis of one (assigned) case study (50%).
Note: for students who choose option (1) but fail to submit their case study on time, option (2) above applies.
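The weighting of the two options can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated above: the function name `final_grade`, the session labels, and the example marks are hypothetical, and the sketch does not model how the review task contributes to the project grade.

```python
# Weights (theory, data analysis) taken from the two options above.
WEIGHTS = {
    "winter": (0.40, 0.60),       # option (1)
    "summer_fall": (0.50, 0.50),  # option (2)
}


def final_grade(theory, analysis, session):
    """Weighted average of the theory and data analysis marks.

    Both marks are assumed to be on the same scale (hypothetical).
    """
    w_theory, w_analysis = WEIGHTS[session]
    return w_theory * theory + w_analysis * analysis


# Made-up marks, for illustration only.
print(round(final_grade(24, 28, "winter"), 2))
print(round(final_grade(24, 28, "summer_fall"), 2))
```

With the same two marks, a stronger data analysis counts for more under option (1) than under option (2).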
Program
Introduction to Statistical Learning:
- Context and motivations
- Trade-off between goodness-of-fit and model complexity (i.e., the bias-variance trade-off)
- Training and test set
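To make the last two points concrete, the following minimal sketch splits simulated data into a training and a test set and compares training and test error for k-nearest-neighbour regression at several values of k (small k means a more flexible model). It uses plain Python for illustration only. The data-generating process, the 75/25 split, and the helper names `knn_predict` and `mse` are assumptions made for this example.

```python
import math
import random


def knn_predict(train, x0, k):
    """Average the responses of the k training points closest to x0."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of k-NN predictions (fitted on `train`) over `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


# Hypothetical data: y = sin(x) plus Gaussian noise.
random.seed(0)
xs = [random.uniform(0, 6) for _ in range(200)]
data = [(x, math.sin(x) + random.gauss(0, 0.3)) for x in xs]

# Hold out 25% of the observations as a test set.
random.shuffle(data)
train, test = data[:150], data[150:]

# Training error always favours the most flexible model (k = 1 reproduces
# the training data exactly); test error is the honest measure.
results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 5, 25, 100)}
for k, (train_err, test_err) in results.items():
    print(f"k={k:3d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Because each training point is its own nearest neighbour, the training error at k = 1 is zero, which is why model choice should rely on held-out data rather than on goodness-of-fit alone.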
Regression:
- Exploratory data analysis
- Simple & multiple linear regression
- Residual analysis & model checking
Classification: Logistic regression, KNN
Resampling methods
Shrinkage methods and regularisation
Tree-based methods:
- Regression & classification trees
- Bagging, boosting, random forests
Introduction to neural networks & deep learning:
- Single- and multi-hidden-layer back-propagation networks
- Issues in training neural networks
- Convolutional neural networks
- Autoencoders
Fundamentals of causal inference in machine learning
Suggested readings and bibliography
- Title:
- An Introduction to Statistical Learning with Applications in R (2nd Edition)
- Year of publication:
- 2021
- Publisher:
- Springer
- Author:
- Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
- Required:
- No
Course literature
The main books used during the course are:
- HASTIE, TIBSHIRANI AND FRIEDMAN. The elements of statistical learning: data mining, inference and prediction. Springer-Verlag
- JAMES, WITTEN, HASTIE, TIBSHIRANI. An introduction to statistical learning with applications in R. Springer. Provides a nice introduction to the field of statistical machine learning for non-mathematical sciences
Slides for the course will be provided. If you see any typos in my notes (no matter how small), please tell me about them! Doing so will benefit not only you, but also me, your classmates and any future students of this course.
Supplementary reading
There are many books on machine learning, and new ones keep appearing. These books approach the field from different perspectives (e.g., statistics, computer science, probability). Here are a few additional resources:
- Bishop. Pattern Recognition and Machine Learning, Springer
- Murphy. Machine Learning: A Probabilistic Perspective, MIT Press
- Barber. Bayesian Reasoning and Machine Learning, Cambridge University Press
Note
Electronic communication: I will occasionally send e-mails to the class (to the account listed for you in the SDS directory), so please check that account regularly.