Multivariate statistical analysis
Academic year 2023/2024
- Course ID
- Pierpaolo De Blasi
- 1st year
- Teaching period: Second semester
- D.M. 270 TAF C - Related or integrative
- Course disciplinary sector (SSD): SECS-S/01 - statistics
- Class Lectures
- Type of examination
- Prerequisites: undergraduate-level courses in probability and in statistics. Some basic knowledge of linear algebra and analysis is also required.
Course summary
The course introduces students to multivariate analysis in statistical modeling. Multivariate data arise when researchers measure several variables on each unit in their sample, and most data sets collected across disciplines are of this kind. Multivariate analysis includes methods both for describing and exploring such data and for making formal inferences about them. Most multivariate techniques are computationally demanding, so their routine use requires a suitable software package. Most analyses of multivariate data should also involve the construction of appropriate graphs and diagrams, which the same package needs to handle. In this course, the methods are illustrated on real datasets using the R statistical software.
Learning outcomes
- Knowledge and understanding
The student will learn the basic techniques for analyzing multi-dimensional data, including visualization, and will study multivariate distributions and their properties.
- Applying knowledge and understanding
Ability to discuss various methods for dimension reduction.
- Making judgements
The student will be able to select the appropriate multivariate method to analyze large datasets, with the help of statistical software, in both supervised and unsupervised learning.
- Communication skills
Students will properly use statistical language to communicate their findings.
- summary statistics for multivariate data
- multivariate data visualization
- multivariate Gaussian distributions
- Principal Component Analysis (PCA):
  - geometric and algebraic basics of PCA
  - calculation and choice of components
  - plotting PCs, interpretation
- Factor Analysis (FA):
  - model definition and assumptions
  - estimation of loadings and communalities
  - choice of the number of factors
  - factor rotation
- blind source separation and independent component analysis
- Discriminant Analysis and Classification:
  - classification rules
  - linear and quadratic discrimination
  - error rates
- Cluster Analysis:
  - measures of similarity
  - hierarchical clustering
  - K-means clustering
  - model-based clustering
Additional topics (if time permits)
- Canonical Correlation Analysis:
  - computation and interpretation
  - relationship with multiple regression
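As a small illustration of the first program topic (summary statistics for multivariate data), the sketch below computes a sample mean vector and a sample covariance matrix. The course itself works in R; plain Python is used here only for illustration, and the three-row data set and the function names `mean_vector` and `covariance_matrix` are made up for the example.

```python
# Illustrative sketch only: a tiny made-up data set, not course material.
# Rows are units (observations), columns are variables.
data = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
]


def mean_vector(rows):
    """Return the sample mean of each variable (column)."""
    n = len(rows)
    p = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(p)]


def covariance_matrix(rows):
    """Return the p x p sample covariance matrix (divisor n - 1)."""
    n = len(rows)
    m = mean_vector(rows)
    p = len(m)
    return [
        [
            sum((row[j] - m[j]) * (row[k] - m[k]) for row in rows) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


means = mean_vector(data)
cov = covariance_matrix(data)
print(means)  # [2.0, 4.0]
print(cov)    # [[1.0, 2.0], [2.0, 4.0]]
```

In R, `colMeans()` and `cov()` compute the same two quantities for a numeric matrix or data frame.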
The course consists of 48 hours of class lectures. Examples and exercises are worked through in R.
Lectures are held in person, with exceptions in accordance with university regulations.
Teaching materials and updates will be delivered via Moodle.
Learning assessment methods
Two problem sets will be assigned during the course. They will be posted on the course's Moodle page, together with their deadlines.
Problem sets must be submitted by the deadline; late submissions are not accepted. They are an essential part of the course, giving students "real time" feedback on how well they are grasping the material. They consist of exercises whose solution may require the use of statistical software. Students are encouraged to work in groups on the problem sets. However, each student should understand the material on their own and hand in their own solutions.
There will be a final exam, check out for dates on
The final examination consists of a written test with open-ended questions. The test is either short or long, depending on whether both problem sets have been completed. Specifically:
(1) First 2 exam dates: the course grade is determined by the problem sets and the final exam. The final exam consists of a short written test (1h) on the part of the program not covered by the problem sets. The final grade is a combination of the problem set grades (70%) and the final exam grade (30%). For students who have not submitted solutions to the problem sets, case (2) below applies.
(2) From the 3rd exam date on: the final exam consists of a long written test (3h) on the whole program, and the final grade is determined solely by it.
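The two cases above can be sketched as a small calculation. This is a minimal illustration that assumes the "combination" is a plain weighted average of two grades on the same scale. The function name `course_grade`, the example grades, and the absence of rounding are all assumptions, not part of the official rules.

```python
def course_grade(exam_grade, problem_set_grade=None):
    """Combine grades under an assumed 70/30 weighted average.

    Case (1): problem sets submitted, so they weigh 70% and the exam 30%.
    Case (2): no problem-set grade, so the exam grade alone counts.
    """
    if problem_set_grade is None:
        return exam_grade
    return 0.7 * problem_set_grade + 0.3 * exam_grade


# Hypothetical grades, for illustration only.
short_test_case = course_grade(exam_grade=24, problem_set_grade=28)
long_test_case = course_grade(exam_grade=26)
print(round(short_test_case, 1))
print(long_test_case)
```

With these hypothetical numbers, case (1) gives roughly 0.7 × 28 + 0.3 × 24 = 26.8, while case (2) simply returns the exam grade.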
The final assessment takes place in person.
Suggested readings and bibliography
The bibliography, to be confirmed at the beginning of the course, is:
- Lecture notes
- R.A. Johnson and D.W. Wichern (2008). Applied Multivariate Statistical Analysis, 6th ed. Pearson.
- T. Hastie, R. Tibshirani and J. Friedman (2009). The Elements of Statistical Learning, 2nd ed. Springer.
E-copies can be found through Biblioteca di Economia e Management at
Courses that borrow this teaching
- Enrollment opening date: 01/09/2022 at 00:00
- Enrollment closing date: 30/06/2023 at 00:00