Introduction to data mining


Academic year 2023/2024

Course ID
Lecturers: Mirko Polato, Robert Birke
2nd year
Teaching period: First semester
D.M. 270 TAF C - Related or integrative
Course disciplinary sector (SSD): INF/01 - informatics
Class lectures
Type of examination: Written and oral
Prerequisites:
- Acquaintance with the basic concepts of linear algebra, calculus, probability and statistics.
- A basic knowledge of the Python programming language.

Course summary


Course objectives

The course is part of the Master's Degree in Stochastic and Data Science. It contributes to the objectives of the degree by providing the theoretical and practical knowledge needed to perform real data science tasks on different types of data and to reason about the properties of the Data Mining (and Machine Learning) models and algorithms used to solve specific mining/learning tasks.

The course first focuses on data, which is essential to any mining/learning task. It introduces the main techniques for handling data, cleaning and pre-processing it, and assessing its quality.
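As an illustration of these steps, the following is a minimal sketch using pandas on a small hypothetical table (the column names and values are invented for the example). It reports two basic data-quality indicators, missing values per column and duplicate rows, and then applies three common pre-processing steps: dropping duplicates, imputing missing numeric values with the column median, and standardizing numeric columns. Median imputation and z-score scaling are assumptions chosen for illustration, not necessarily the techniques taught in the course.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data set: values and column names are invented for illustration.
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 32],
    "income": [30_000, 45_000, 52_000, np.nan, 45_000],
    "city":   ["Padova", "Venezia", "Padova", "Verona", "Venezia"],
})

# --- Data quality assessment ---
missing_per_column = df.isna().sum()       # number of missing values in each column
n_duplicates = int(df.duplicated().sum())  # rows identical to an earlier row

# --- Pre-processing ---
clean = df.drop_duplicates().copy()
numeric_cols = clean.select_dtypes(include="number").columns

# Impute missing numeric values with the column median (one simple strategy among many).
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())

# Standardize numeric columns to zero mean and unit variance (z-scores).
means = clean[numeric_cols].mean()
stds = clean[numeric_cols].std(ddof=0)
clean[numeric_cols] = (clean[numeric_cols] - means) / stds

print(missing_per_column.to_dict())
print("duplicate rows:", n_duplicates)
print(clean)
```

The median is used here rather than the mean because it is less sensitive to outliers; in practice, the right imputation strategy depends on why values are missing.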

Starting from data, the course explains the differences between tasks and models and introduces students to popular Data Mining models.

Specifically, the course provides an overview of the main supervised and unsupervised learning tasks, from classification to clustering, discussing the corresponding algorithms from both a theoretical and a practical standpoint.

Particular attention will be given to practical aspects, introducing students to some of the most popular Python libraries for data science (e.g., numpy, pandas, scikit-learn).
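To give a flavour of these libraries, the sketch below (an illustrative assumption, not course material) loads the Iris data set bundled with scikit-learn as a pandas DataFrame, estimates the accuracy of a decision-tree classifier with 5-fold cross-validation, and then groups the same samples with k-means, scoring the clustering with the silhouette coefficient. The data set and the hyperparameters (`max_depth=3`, `n_clusters=3`, `random_state=0`) are arbitrary choices for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Load the Iris data set as a pandas DataFrame (features) and Series (labels).
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# --- Classification: decision tree evaluated with 5-fold cross-validation ---
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
accuracies = cross_val_score(tree, X, y, cv=5, scoring="accuracy")
print(f"decision tree accuracy: {accuracies.mean():.3f} +/- {accuracies.std():.3f}")

# --- Clustering: k-means on standardized features, ignoring the labels ---
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_scaled)

# Silhouette coefficient lies in [-1, 1]: higher means better-separated clusters.
silhouette = silhouette_score(X_scaled, cluster_labels)
print("silhouette:", round(silhouette, 3))
print("cluster sizes:", np.bincount(cluster_labels))
```

Features are standardized before k-means because the algorithm relies on Euclidean distances, so features measured on larger scales would otherwise dominate the cluster assignments.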

Finally, the course will introduce the students to the main concepts of privacy-preserving and federated learning.

Expected learning outcomes

Through the course, students will acquire the knowledge and skills to perform real data science tasks on different types of data and the ability to reason about the properties of the models and algorithms used to solve specific mining/learning tasks.

Knowledge and understanding
Students will master some of the main concepts in Data Mining and Machine Learning.

Applying knowledge and understanding
Students will be able to apply this knowledge using a modern programming language and libraries to solve real data science tasks (e.g., classifying examples into a given set of classes or clustering data into meaningful groups).

Making judgments
Students will learn to judge the suitability of a given model, or the properties of a learning algorithm, for a given data set.
These abilities will be refined by practicing with different Machine Learning and Data Mining methods applied to real-world problems.

Learning skills
Students will have the opportunity to test and self-assess their knowledge and skills through quizzes and polls.


Course contents

- Introduction to the course
- Data
  - types of data
  - data quality
  - data pre-processing
  - similarity and dissimilarity
  - avoiding false discoveries
- Classification
  - basic concepts
  - decision trees
  - model evaluation
- Advanced classification
  - rule-based classifiers
  - nearest-neighbor classifiers
  - support vector machines
  - neural networks
- Clustering
  - overview
  - k-means
  - hierarchical clustering
  - DBSCAN
  - cluster evaluation
  - graph-based clustering
- Anomaly detection
  - clustering-based approach
  - one-class classification
- Privacy and federated learning

Course delivery

The course will be held in person.

The course is mainly lecture-based, with some laboratory sessions.

Lectures (17 sessions, 34 hours) will cover the theoretical aspects, while laboratory sessions (7 sessions, 14 hours) will focus on the practical ones.

During lectures, students are encouraged to take part in live polls, Q&A sessions, and quizzes (using tools such as Mentimeter, Kahoot, or Slido) to test their understanding and to detect any misconceptions or biases.

Laboratory sessions will use the Python programming language and Python notebooks (e.g., on Google Colaboratory). Small assignments will help students deepen their understanding of the concepts and algorithms presented.

Learning assessment methods

All exams will be held in person.

The final exam is divided into two parts: (i) a written test and (ii) an oral examination:

i) The written test will consist of open-ended questions, each covering a different course topic. It will assess the understanding of the presented concepts and the student's ability to solve small practical problems. The written test will be graded up to 32 points.

ii) The oral examination will test the student's ability to interpret the results of a data analysis session and to reason about the properties of the models and algorithms used to solve specific learning/mining tasks. It also allows the student to elaborate on the answers given in the written test. The oral examination will be graded up to 32 points.

To pass the exam, students need to obtain at least 18 points in both the written test and the oral examination.
The final grade will take both results into account.

Suggested readings and bibliography

Introduction to Data Mining
Year of publication:  
Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar


Class schedule

Enrollment status: Closed
Enrollment opening date: 01/09/2022 at 00:00
Enrollment closing date: 30/06/2023 at 00:00
Last update: 13/09/2023 10:25