Department of Mathematics "Giuseppe Peano"

# Laurea Magistrale (M.Sc.) in Stochastics and Data Science

## Multivariate statistical analysis

- Course ID: MAT0041
- Teacher: Pierpaolo De Blasi
- Year: 1st year
- Teaching period: Second semester
- Type: D.M. 270 TAF C - Related or integrative
- Credits/Recognition: 6
- Course disciplinary sector (SSD): SECS-S/01 - Statistics
- Delivery: Formal authority
- Language: English
- Attendance: Mandatory
- Type of examination: Written
- Prerequisites: Undergraduate-level courses in Probability and in Statistics. Some basic knowledge of linear algebra and analysis is also required.
- Propaedeutic for: Statistical Machine Learning

## Course objectives

The course introduces students to multivariate analysis in statistical modeling. Multivariate data arise when researchers measure several variables on each unit in their sample, and the majority of data sets collected by researchers in all disciplines are multivariate. Multivariate analysis includes methods both for describing and exploring such data and for making formal inferences about them. The computations involved in applying most multivariate techniques are considerable, and their routine use requires a suitable software package. In addition, most analyses of multivariate data should involve the construction of appropriate graphs and diagrams, which also needs to be carried out by the same package. In this course, the methods are illustrated on real datasets using the R statistical software.

## Learning outcomes

- Knowledge and understanding

The student will learn the basic techniques for analyzing multi-dimensional data, including visualization, and will study multivariate distributions and their properties.

- Applying knowledge and understanding

The student will be able to discuss various methods for dimension reduction.

- Making judgements

The student will be able to select the appropriate multivariate method to analyze large datasets, with the help of statistical software, in both supervised and unsupervised learning.

- Communication skills

Students will use statistical language properly to communicate their findings.

## Program

- Introduction:
  - summary statistics for multivariate data
  - multivariate data visualization
  - multivariate Gaussian distributions
- Principal Component Analysis (PCA):
  - geometric and algebraic basics of PCA
  - calculation and choice of components
  - plotting PCs, interpretation
- Factor Analysis (FA):
  - model definition and assumptions
  - choice of the number of factors
  - factor rotation
- Canonical Correlation Analysis:
  - computation and interpretation
  - relationship with multiple regression
- Discriminant Analysis and Classification:
  - classification rules
  - error rates
- Cluster Analysis:
  - measure of similarity
  - hierarchical clustering
  - K-means clustering
  - model-based clustering

## Course delivery

The course consists of 48 hours of class lectures. Examples and exercises will be worked through in class using the R language.

If the COVID-19 health alert status persists in spring 2021, lectures will be held remotely, either live-streamed or pre-recorded, and made available online, together with slides and other course material, through the Moodle platform at:

https://math.i-learn.unito.it/course/view.php?id=1406

Some additional activities meant to encourage direct interaction between professors and students may be organised as in-person meetings, under appropriate conditions of social distancing and in compliance with the regulations in force at the time. Students who are unable to participate will be able to follow these activities through the online course material.

Details will be announced to registered students through Moodle.

## Learning assessment methods

Problem Sets:
There will be 2 problem sets assigned throughout the course.  They will be posted in due time on
together with an indication of the deadline.
Problem sets must be submitted by the deadline; late submissions are not accepted. They are an essential part of the course, giving students a guide to how well they are grasping the material on a "real time" basis. They consist of exercises whose solution may require the use of statistical software. Students are encouraged to work in groups on the problem sets. However, students should understand the material on their own and hand in their own problem sets.

Exam:
There will be a final exam; check the dates at
http://www.master-sds.unito.it

The final examination consists of a written test, either short or long depending on whether both problem sets have been completed. Specifically,

(1) First 2 exam dates: the course grade is determined by the problem sets and the final exam. The final exam consists of a short written test (1h) on the part of the program not covered by the problem sets, followed by an oral examination. The final grade is a combination of the problem set grades (70%) and the final exam grade (30%). Students who have not submitted solutions to the problem sets fall under case (2) below.

(2) From the 3rd exam date on: the final exam consists of a long written test (3h) on the whole program, and the final grade is determined solely by it.
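As an informal illustration of the two cases, the grading rule can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the official computation: the function name `course_grade` and its parameters are hypothetical, the problem sets are assumed to be summarised by a single overall grade on the same scale as the exam grade (the text does not say how the two sets are aggregated), and no rounding is applied.

```python
def course_grade(exam_grade, problem_set_grade=None, exam_date_index=1):
    """Illustrative final grade under cases (1) and (2).

    exam_grade        -- grade of the final exam
    problem_set_grade -- overall problem set grade (assumed to be a single
                         number), or None if the problem sets were not both
                         submitted
    exam_date_index   -- 1 for the first exam date, 2 for the second, ...
    """
    # Case (1): first two exam dates, with both problem sets submitted.
    if problem_set_grade is not None and exam_date_index <= 2:
        return 0.7 * problem_set_grade + 0.3 * exam_grade
    # Case (2): the long written test alone determines the grade.
    return float(exam_grade)
```

For instance, under these assumptions `course_grade(30, problem_set_grade=20, exam_date_index=1)` weights the problem sets at 70% and the exam at 30%, while the same call with `exam_date_index=3` returns the exam grade alone.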

During the COVID-19 emergency, the final exam, in both its long and short versions, will consist of a written test with video surveillance on Webex. This applies to the September 2020 and summer 2021 sessions.

The bibliography, to be confirmed at the beginning of the course, is:

- Johnson R. A., Wichern D. W. (2007). Applied Multivariate Statistical Analysis, 6th ed., Prentice-Hall
- Hastie T., Tibshirani R., Friedman J. (2009). The Elements of Statistical Learning, 2nd ed., Springer
- Rencher A. C., Christensen W. F. (2012). Methods of Multivariate Analysis, 3rd ed., Wiley
- Rencher A. C. (1992). Interpretation of canonical discriminant functions, canonical variates, and principal components. The American Statistician 46, 217-225

## Class schedule

- Enrollment status: Open
- Enrollment opening date: 01/09/2020 at 00:00
- Enrollment closing date: 30/06/2021 at 00:00

Last update: 30/04/2021 14:29