MULTIVARIATE STATISTICAL ANALYSIS
Academic year 2015/2016
- Course ID
- Pierpaolo De Blasi
- 1st year
- Teaching period: Second semester
- D.M. 270 TAF C - Related or integrative
- Course disciplinary sector (SSD): SECS-S/01 - Statistics
- Class Lecture
- Type of examination
- Probability Theory
- Prerequisite for: Statistical Machine Learning
Course summary
The course introduces multivariate analysis in statistical modeling. All methods will be implemented on real datasets in the R language.
Learning outcomes
The student will learn the basic techniques for analyzing multi-dimensional data (including visualization), study multivariate distributions and their properties, and discuss various methods for dimension reduction.
The course consists of 48 hours of class lectures. Examples and exercises will be worked through in class using the R language.
Learning assessment methods
There will be two or three problem sets assigned throughout the course. They will be posted in due time on
together with an indication of the deadline.
Problem sets must be submitted by the deadline; late submissions are not accepted. They are an essential part of the course, giving students real-time feedback on how well they are grasping the material. Each problem set asks for the solution of two or more exercises, which may require the use of statistical software. Students are encouraged to work in groups on the problem sets. However, each student should understand the material independently and hand in their own problem sets.
There will be a final exam; check the exam dates on
The final examination consists of either a written or an oral test, depending on whether the problem sets were submitted.
Specifically, on the first two exam dates, the grade will be determined either by
(1.1) problem sets (50%)
(1.2) final exam via oral test (50%)
or by
(2) final exam via written test (100%)
for students who have failed to submit the problem sets. From the third exam date on, only case (2) above applies.
- summary statistics for multivariate data
- multivariate data visualization
- multivariate Normal distributions
- Principal Component Analysis (PCA):
  - geometric and algebraic basics of PCA
  - calculation and choice of components
  - plotting PCs, interpretation
- Factor Analysis (FA):
  - model definition and assumptions
  - estimation of loadings and communalities
  - choice of the number of factors
  - factor rotation
- Canonical Correlation Analysis:
  - computation and interpretation
  - relationship with multiple regression
- Discriminant Analysis and Classification:
  - classification rules
  - linear and quadratic discrimination
  - error rates
- Cluster Analysis:
  - measure of similarity
  - hierarchical clustering
  - K-means clustering
  - model based clustering
Suggested readings and bibliography
The bibliography, to be confirmed at the beginning of the course, is:
- Johnson R.A., Wichern D.W. (2007). Applied Multivariate Statistical Analysis, 6th ed., Prentice-Hall
- Afifi A., May S., Clark V.A. (2012). Practical Multivariate Analysis, 5th ed., Chapman & Hall/CRC
- Everitt B. (2005). An R and S-PLUS Companion to Multivariate Analysis. Springer
- Rencher A.C., Christensen W.F. (2012). Methods of Multivariate Analysis, 3rd ed., Wiley
- Rencher A.C. (1992). Interpretation of canonical discriminant functions, canonical variates, and principal components. The American Statistician 46, 217-225.