Multivariate statistical analysis
Academic year 2020/2021
- Course ID
- Pierpaolo De Blasi
- 1st year
- Teaching period: Second semester
- D.M. 270 TAF C - Related or integrative
- Course disciplinary sector (SSD): SECS-S/01 - Statistics
- Formal authority
- Type of examination
- Prerequisites: undergraduate-level courses in Probability and in Statistics. Some basic knowledge of linear algebra and analysis is also required.
- Prerequisite for: Statistical Machine Learning
Course summary
The course introduces students to multivariate analysis in statistical modeling. Multivariate data arise when researchers measure several variables on each unit in their sample, and most data sets collected across disciplines are multivariate. Multivariate analysis includes methods for describing and exploring such data as well as for making formal inferences about them. Most multivariate techniques are computationally demanding, so their routine use requires a suitable software package. Most analyses should also involve appropriate graphs and diagrams, which the same package needs to produce. In this course, illustrations are based on real datasets and carried out with the R statistical software.
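As a minimal illustration of what "several variables on each unit" means, the sketch below stores a small data set as rows (units) by columns (variables) and computes its sample mean vector and sample covariance matrix, the basic summary statistics for multivariate data. The numbers are invented, the function names `mean_vector` and `covariance_matrix` are hypothetical, and plain Python is used only to keep the example dependency-free; the course itself works in R.

```python
# Toy data, invented purely for illustration:
# each row is one unit, each column is one measured variable.
data = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 6.2, 0.9],
    [4.0, 7.9, 0.7],
]


def mean_vector(rows):
    """Return the sample mean of each variable (column)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Return the p x p sample covariance matrix (divisor n - 1)."""
    n = len(rows)
    means = mean_vector(rows)
    # Center every observation by subtracting the mean vector.
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    p = len(means)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


xbar = mean_vector(data)
S = covariance_matrix(data)

print("mean vector:", xbar)
for row in S:
    print(row)
```

In R, the same quantities are returned by `colMeans()` and `cov()` applied to a numeric matrix or data frame.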
Learning outcomes
- Knowledge and understanding
The student will learn the basic techniques for analyzing multi-dimensional data, including visualization, and will study multivariate distributions and their properties.
- Applying knowledge and understanding
Ability to discuss various methods for dimension reduction.
- Making judgements
The student will be able to select the appropriate multivariate method to analyze large datasets with the help of statistical software, in both supervised and unsupervised learning settings.
- Communication skills
Students will use statistical language properly to communicate their findings.
Program
- summary statistics for multivariate data
- multivariate data visualization
- multivariate Gaussian distributions
- Principal Component Analysis (PCA):
  - geometric and algebraic basics of PCA
  - calculation and choice of components
  - plotting PCs, interpretation
- Factor Analysis (FA):
  - model definition and assumptions
  - estimation of loadings and communalities
  - choice of the number of factors
  - factor rotation
- Canonical Correlation Analysis:
  - computation and interpretation
  - relationship with multiple regression
- Discriminant Analysis and Classification:
  - classification rules
  - linear and quadratic discrimination
  - error rates
- Cluster Analysis:
  - measures of similarity
  - hierarchical clustering
  - K-means clustering
  - model-based clustering
The course consists of 48 hours of class lectures. Examples and exercises will be worked through in class using the R language.
If the COVID-19 health alert persists in spring 2021, lectures will be held remotely, either live-streamed or pre-recorded, and made available online through the Moodle platform together with slides and other course material.
Some additional activities meant to encourage direct interaction between professors and students may be organised as in-person meetings, under appropriate social distancing and in compliance with the regulations in force at the time. Students who cannot attend will be able to follow these activities through the online course material.
Details will be announced to registered students through Moodle.
Learning assessment methods
There will be 2 problem sets assigned throughout the course. They will be posted in due time, together with their deadlines.
Problem sets must be submitted by the deadline; late submissions are not accepted. They are an essential part of the course, giving students "real time" feedback on how well they are grasping the material. They consist of exercises, some of which may require the use of statistical software. Students are encouraged to work in groups on the problem sets. However, each student should understand the material on their own and hand in their own solutions.
There will be a final exam; check the published exam dates.
The final examination is a written test, either short or long depending on whether both problem sets have been completed. Specifically:
(1) First 2 exam dates: the course grade is determined by the problem sets and the final exam. The final exam consists of a short written test (1h) on the part of the program not covered by the problem sets, followed by an oral examination. The final grade will be a combination of the problem set grades (70%) and the final exam grade (30%). For students who have not submitted solutions to the problem sets, case (2) below applies.
(2) From the 3rd exam date on: the final exam consists of a long written test (3h) on the whole program, and the final grade will be determined solely by it.
During the COVID-19 emergency the final exam, in both its long and short versions, will be a written test with video surveillance on Webex. This applies to the September 2020 and summer 2021 sessions.
Suggested readings and bibliography
The bibliography, to be confirmed at the beginning of the course, is:
- R.A. Johnson and D.W. Wichern (2007). Applied Multivariate Statistical Analysis. Prentice-Hall, 6th Ed.
- T. Hastie, R. Tibshirani and J. Friedman (2009). The Elements of Statistical Learning. Springer, 2nd Ed.
- A.C. Rencher and W.F. Christensen (2012). Methods of Multivariate Analysis. Wiley, 3rd Ed.
- A.C. Rencher (1992). Interpretation of canonical discriminant functions, canonical variates, and principal components. The American Statistician 46, 217-225.
- Enrollment opening date: 01/09/2020 at 00:00
- Enrollment closing date: 30/06/2021 at 00:00