Department of Mathematics "Giuseppe Peano"

# Laurea Magistrale (M.Sc.) in Stochastics and Data Science

## Multivariate statistical analysis

### Academic year 2023/2024

- Course ID: MAT0041
- Teacher: Pierpaolo De Blasi
- Year: 1st year
- Teaching period: Second semester
- Type: D.M. 270 TAF C - Related or integrative
- Credits/Recognition: 6
- Course disciplinary sector (SSD): SECS-S/01 - statistics
- Delivery: Class Lectures
- Language: English
- Attendance: Optional
- Type of examination: Written
- Prerequisites: Undergraduate level courses in Probability and in Statistics. Some basic knowledge of linear algebra and analysis is also required.

## Course objectives

The course introduces students to multivariate analysis in statistical modeling. Multivariate data arise when researchers measure several variables on each unit in their sample. The majority of data sets collected by researchers in all disciplines are multivariate. Multivariate analysis includes methods both for describing and exploring such data and for making formal inferences about them. The computations involved in applying most multivariate techniques are considerable, and their routine use requires a suitable software package. In addition, most analyses of multivariate data should involve the construction of appropriate graphs and diagrams, which the same package should also be able to produce. In this course, illustrations are provided on real datasets with the help of the R statistical software.

## Learning outcomes

- Knowledge and understanding

The student will learn the basic techniques for analyzing multi-dimensional data, including visualization, and will study multivariate distributions and their properties.

- Applying knowledge and understanding

The student will be able to discuss various methods for dimension reduction.

- Making judgements

The student will be able to select the appropriate multivariate method to analyze large datasets with the help of statistical software, in both supervised and unsupervised learning.

- Communication skills

Students will properly use statistical language to communicate the results of their findings.

## Program

- Introduction
  - summary statistics for multivariate data
  - multivariate data visualization
  - multivariate Gaussian distributions
- Principal Component Analysis (PCA):
  - geometric and algebraic basics of PCA
  - calculation and choice of components
  - plotting PCs, interpretation
- Factor Analysis (FA):
  - model definition and assumptions
  - choice of the number of factors
  - factor rotation
  - blind source separation and independent component analysis
- Discriminant Analysis and Classification:
  - classification rules
  - linear and quadratic discrimination
  - error rates
- Cluster Analysis:
  - measure of similarity
  - hierarchical clustering
  - K-means clustering
  - model based clustering

Additional topics (if time permits)

- Canonical Correlation Analysis:
  - computation and interpretation
  - relationship with multiple regression

## Course delivery

The course consists of 48 hours of class lectures. Examples and exercises will be worked through in the R language.

Lectures are held in person, with exceptions in accordance with university regulations.

Teaching materials and updates will be delivered via Moodle (see button below).

## Learning assessment methods

Problem Sets:
There will be 2 problem sets assigned throughout the course. They will be posted in due time on the course's Moodle page (see button below) together with an indication of the deadline.

Problem sets must be submitted by the deadline; late submissions are not accepted. They are an essential part of the course, giving students "real time" feedback on how well they are grasping the material. They consist of exercises whose solution might require the use of statistical software. Students are encouraged to work in groups on the problem sets. However, students should understand the material on their own, and hand in their own problem sets.

Exam:
There will be a final exam; check the dates at
http://www.master-sds.unito.it

The final examination consists of a written test with open-ended questions, either a short or a long test depending on whether both problem sets have been completed. Specifically,

(1) First 2 exam dates: the course grade is determined by the problem sets and the final exam. The final exam consists of a short written test (1h) on the part of the program not covered by problem sets. The final grade will be a combination of the problem set grades (70%) and the final exam grade (30%). For students who have failed to submit the solutions of the problem sets, case (2) below applies.

(2) From the 3rd exam date on: the final exam consists of a long written test (3h) on the whole program and the final grade will be determined solely by it.
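
As a rough illustration of the two cases, the grade computation might be sketched as follows. The function name `final_grade`, its parameters, and the equal weighting of the two problem sets (a simple mean) are assumptions made for this example only; the syllabus does not state how the two problem set grades are combined, nor the grading scale.

```python
from statistics import mean

# Weights stated in the syllabus for case (1).
PROBLEM_SET_WEIGHT = 0.7
EXAM_WEIGHT = 0.3


def final_grade(exam_grade, problem_set_grades=None, exam_date=1):
    """Return the final grade under the two cases described above.

    exam_grade: grade of the written test.
    problem_set_grades: grades of the 2 problem sets, or None if not submitted.
    exam_date: 1 for the first exam date, 2 for the second, and so on.

    Assumption (not in the syllabus): the two problem set grades are
    combined by a simple mean before the 70% weight is applied.
    """
    both_submitted = problem_set_grades is not None and len(problem_set_grades) == 2
    if exam_date <= 2 and both_submitted:
        # Case (1): short test, weighted combination.
        return PROBLEM_SET_WEIGHT * mean(problem_set_grades) + EXAM_WEIGHT * exam_grade
    # Case (2): long test, the exam alone determines the grade.
    return float(exam_grade)


# Hypothetical grades, for illustration only.
print(final_grade(26, [28, 30], exam_date=1))  # case (1)
print(final_grade(26, [28, 30], exam_date=3))  # case (2)
print(final_grade(24, None, exam_date=1))      # problem sets missing: case (2)
```

With these hypothetical inputs, the first call weights the problem set mean at 70% and the test at 30%, while the other two calls return the test grade unchanged.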

The final assessment takes place in person.

## Suggested readings and bibliography

The bibliography, to be confirmed at the beginning of the course, is:

- Lecture notes

- Johnson, R.A. and Wichern, D.W. (2008). Applied Multivariate Statistical Analysis, 6th ed. Pearson.

- Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed. Springer.

E-copies can be found through the Biblioteca di Economia e Management at

https://www.bem.unito.it/it/che-cosa-cerchi/testi-desame-e-altri-materiali-didattici

## Class schedule

- Enrollment: closed
- Enrollment opening date: 01/09/2022 at 00:00
- Enrollment closing date: 30/06/2023 at 00:00
Last update: 09/05/2023 08:51