Statistical machine learning
Academic year 2018/2019
- Course ID
- MAT0043
- Teacher
- Silvia Montagna
- Year
- 2nd year
- Teaching period
- First semester
- Type
- D.M. 270 TAF C - Related or integrative
- Credits/Recognition
- 6
- Course disciplinary sector (SSD)
- SECS-S/01 - Statistics
- Delivery
- Formal authority
- Language
- English
- Attendance
- Optional
- Type of examination
- Mixed
- Prerequisites
- Familiarity with linear & logistic regression; Good knowledge of R is required
Course summary
Course objectives
The course introduces methods and models for extracting important patterns and trends from large amounts of data, and presents basic concepts of machine learning and data mining from a statistical perspective. Topics covered include modern regression, classification, cross-validation, model selection and regularisation, and tree-based methods, among others. The course emphasizes the selection of appropriate methods and the justification of that choice, the use of programming to implement the methods, and the evaluation and effective communication of results in data analysis reports.
Results of learning outcomes
Knowledge and understanding
- Advanced knowledge of parametric and nonparametric models for prediction and classification
Applying knowledge and understanding
- Ability to convert various problems and data into statistical models to perform prediction/classification
Making judgements
- Students will be able to discern the different aspects of statistical learning in modern settings
Communication skills
- Students will properly use statistical language to communicate their findings
Learning skills
- The skills acquired will give students the opportunity to improve and deepen their knowledge of statistical modeling
Course delivery
The course consists of 48 hours of class lectures. Half of the lectures are devoted to the theoretical aspects of statistical machine learning, and the remaining half to their practical implementation. We will use R as the programming language for data analysis, together with existing R packages that support the course. Students will use RMarkdown to create HTML and PDF documents.
Learning assessment methods
1) Winter (January/February) exam session: your grade will be based on a weighted average of homework (30%), data analysis in R (40%), and a closed-book final exam on theory (30%). There will be three problem sets assigned throughout the course. You may work with other students on the homework assignments, but all students must write and turn in their own solutions. Assignments may be submitted either electronically or as a printed copy, but must be prepared as a .pdf file (i.e., no Word or hand-written documents). Assignments may also involve writing code to analyze data. Late submissions of homework assignments will not be accepted. Students who do not submit their homework solutions are assessed under case (2) below.
2) Summer and Fall exam sessions: the final exam consists of a long written test (4h) on theory (50%) and data analysis in R (50%).
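The weightings of the two assessment schemes above can be worked through with a small calculation. The sketch below is only illustrative: the function names and component scores are hypothetical, and it assumes that every component is marked on the same scale (a 0-30 scale is assumed here, which the text above does not state). It is written in Python for convenience, although the course itself uses R.

```python
def winter_grade(homework, data_analysis, theory_exam):
    """Winter session: homework 30%, data analysis in R 40%, theory exam 30%."""
    return 0.30 * homework + 0.40 * data_analysis + 0.30 * theory_exam


def later_session_grade(theory, data_analysis):
    """Summer and Fall sessions: theory 50%, data analysis in R 50%."""
    return 0.50 * theory + 0.50 * data_analysis


if __name__ == "__main__":
    # Hypothetical component scores on an assumed 0-30 scale.
    print(round(winter_grade(homework=28, data_analysis=26, theory_exam=24), 2))
    print(round(later_session_grade(theory=27, data_analysis=25), 2))
```

Under these assumptions, the data analysis component carries the most weight in the winter session, while in the later sessions theory and data analysis count equally.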
Program
Introduction to Statistical Learning
- Context and motivations
- Trade-off between goodness-of-fit and model complexity (i.e., the bias-variance trade-off)
- Training and test set
Regression
- Exploratory data analysis
- Simple & multiple linear regression
- Residual analysis & model checking
Classification: Logistic regression; Multinomial logit/probit regression
Resampling methods: Cross-validation, bootstrap
Model selection
- Subset selection
- Shrinkage methods
- Dimension reduction methods
Beyond linearity
- Polynomial regression
- Step functions
- Splines & smoothing splines
- Generalised additive models
- Kernels
- Local regression
Tree-based methods
- Regression & classification trees
- Bagging, boosting, random forests, overview of Bayesian additive regression trees
Support vector machines
Introduction to neural networks
Suggested readings and bibliography
This is an applied course, which will be based on:
- JAMES, WITTEN, HASTIE, TIBSHIRANI. An introduction to statistical learning with applications in R. Springer.
The book is freely available at www-bcf.usc.edu/~gareth/ISL. You are welcome to download it and print it out.
Another useful resource (also available freely online) is:
- HASTIE, TIBSHIRANI AND FRIEDMAN. The elements of statistical learning: data mining, inference and prediction. Springer-Verlag.
Slides for the course will be provided. If you see any typos in my notes (no matter how small), please tell me about them! Doing so will benefit not only you, but also me, your classmates and any future students of this course.
Class schedule
Note
This course will be delivered at the ESOMAS Department.
Electronic communication: I will occasionally send e-mails to the class (to the account listed for you in the SDS directory), so please check that account regularly.