Introduction to data mining
Academic year 2024/2025
- Course ID: MAT0051
- Teachers: Mirko Polato (Lecturer), Robert Birke (Lecturer)
- Year: 2nd year
- Teaching period: First semester
- Type: D.M. 270 TAF C - Related or integrative
- Credits/Recognition: 6
- Course disciplinary sector (SSD): INF/01 - informatics
- Delivery: Class Lectures
- Language: English
- Attendance: Optional
- Type of examination: Written and oral
- Prerequisites:
  - Acquaintance with the basic concepts of linear algebra, calculus, probability and statistics.
  - A basic knowledge of the Python programming language.
Course summary
Course objectives
The course is part of the Master's Degree in Stochastic and Data Science. It contributes to the objectives of the degree by providing theoretical and practical knowledge to perform real data science tasks on different types of data and to reason about the properties of the Data Mining (and Machine Learning) models and algorithms used to solve specific mining/learning tasks.
The course's first focus is data, which is essential to any mining/learning task. The course introduces the main techniques to handle data, perform data cleaning and pre-processing, and assess data quality.
Starting from data, the course teaches the differences between tasks and models and introduces the students to popular Data Mining models.
Specifically, the course provides an overview of the main supervised and unsupervised learning tasks, ranging from classification to clustering algorithms, discussed theoretically and practically.
Particular attention will be given to the practical aspects by introducing the students to some of the most popular Python libraries for data science (e.g., numpy, pandas, scikit-learn).
Finally, the course will introduce the students to the main concepts of privacy-preserving and federated learning.
Learning outcomes
Through the course, students will acquire the knowledge and skills to perform real data science tasks on different types of data and the ability to reason about the properties of the models and algorithms used to solve specific mining/learning tasks.
Knowledge and understanding: Students will master some of the main concepts in Data Mining and Machine Learning.
Applying knowledge and understanding: Students will be able to apply what they have learned, using a modern programming language and its libraries, to solve real data science tasks (such as classifying examples into a set of given classes or clustering data into meaningful groups).
Making judgments: Students will learn to judge the suitability of a given model, or the properties of a learning algorithm, for a given data set. These abilities will be refined by practicing with different Machine Learning and Data Mining methods applied to real-case problems.
Learning skills: Students will be given the opportunity to test and self-assess their own knowledge and skills via quizzes and polls.
Program
- Introduction to the course
- Data
  - types of data
  - data quality
  - data pre-processing
  - similarity and dissimilarity
  - avoiding false discoveries
- Classification
  - basic concepts
  - decision trees
  - model evaluation and selection
- Advanced Classification
  - rule-based classifiers
  - nearest neighbours classifier
  - support-vector machines
  - neural networks
- Clustering
  - overview
  - partitional clustering
  - hierarchical clustering
  - density-based clustering
  - graph-based clustering
  - cluster evaluation
- Anomaly detection
  - clustering-based approach
  - one-class classification
- Privacy and Federated Learning
Course delivery
The course will be held in person.
The course is mainly lecture-based, with some laboratory sessions.
Lectures will cover the theoretical aspects (17 lectures, 34 hours), while the laboratory sessions will focus on the practical aspects (7 laboratory sessions, 14 hours).
During the lectures, students are encouraged to participate in live polls, Q&A, and quizzes to test their understanding and to uncover any misconceptions or biases.
Laboratory sessions aim to further develop students' understanding of the presented concepts and algorithms. Examples will be based on Python notebooks (e.g., Google Colaboratory).
Learning assessment methods
All exams will be held in person. Students will be assessed on their knowledge and understanding of the presented concepts and their ability to apply them to solve small practical problems, choosing the right tool and methodology for a given learning/mining task.
The final exam consists of two parts: (i) a written test and (ii) an optional oral examination. The written test will consist of open questions, each covering a different course topic. The optional oral examination will allow the student to elaborate on the answers given in the written test and deepen the assessment. The written test will be graded up to 27 points out of 30. The optional oral examination will provide extra grade points in the range from -27 to +5. To request the optional oral exam, students need to obtain at least 17 points in the written test. Students need a final mark of at least 18 to pass the exam.
Suggested readings and bibliography
- Book
  - Title: Introduction to Data Mining
  - Year of publication: 2020
  - Publisher: Pearson
  - Author: Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar
  - Required: No