
Introduction to data mining


Academic year 2024/2025

Course ID: MAT0051
Teachers: Mirko Polato (Lecturer), Robert Birke (Lecturer)
Year: 2nd year
Teaching period: First semester
Type: D.M. 270 TAF C - Related or integrative
Credits/Recognition: 6
Course disciplinary sector (SSD): INF/01 - Informatics
Delivery: Class Lectures
Language: English
Attendance: Optional
Type of examination: Written and oral
Prerequisites:
- Acquaintance with the basic concepts of linear algebra, calculus, probability and statistics.
- A basic knowledge of the Python programming language.

Course summary


Course objectives

The course is part of the Master's Degree in Stochastic and Data Science. It contributes to the objectives of the degree by providing the theoretical and practical knowledge needed to perform real data science tasks on different types of data and to reason about the properties of the Data Mining (and Machine Learning) models and algorithms used to solve specific mining/learning tasks.

The course's first focus is data, which is essential to any mining/learning task. The course introduces the main techniques to handle data, perform data cleaning and pre-processing, and assess data quality.

Starting from data, the course teaches the differences between tasks and models and introduces the students to popular Data Mining models.

Specifically, the course provides an overview of the main supervised and unsupervised learning tasks, ranging from classification to clustering algorithms, discussed theoretically and practically.

Particular attention will be given to the practical aspects by introducing the students to some of the most popular Python libraries for data science (e.g., numpy, pandas, scikit-learn).

Finally, the course will introduce the students to the main concepts of privacy-preserving learning and federated learning.

Learning outcomes

Through the course, students will acquire the knowledge and skills to perform real data science tasks on different types of data and the ability to reason about the properties of the models and algorithms used to solve specific mining/learning tasks.

Knowledge and understanding
Students will master some of the main concepts in Data Mining and Machine Learning.

Applying knowledge and understanding
Students will be able to apply what they have learned, using a modern programming language and its libraries, to solve real data science tasks (such as classifying examples into a set of given classes, clustering data into meaningful groups, etc.).

Making judgments
Students will learn to judge the suitability of a given model or the properties of a learning algorithm for a given data set.
These abilities will be refined by practicing with different Machine Learning and Data Mining methods applied to real-world problems.

Learning skills
Students will be given the opportunity to test and self-assess their own knowledge and skills via quizzes and polls.

Program

- Introduction to the course
- Data
  - types of data
  - data quality
  - data pre-processing
  - similarity and dissimilarity
  - avoiding false discoveries
- Classification
  - basic concepts
  - decision trees
  - model evaluation and selection
- Advanced Classification
  - rule-based classifiers
  - nearest neighbours classifier
  - support-vector machines
  - neural networks
- Clustering
  - overview
  - partitional clustering
  - hierarchical clustering
  - density-based clustering
  - graph-based clustering
  - cluster evaluation
- Anomaly detection
  - clustering-based approach
  - one-class classification
- Privacy and Federated Learning

Course delivery

The course will be held in person.

The course is mainly lecture-based, with some laboratory sessions.

Lectures will cover the theoretical aspects (17 lectures, 34 hours), while the laboratory sessions will focus on the practical aspects (7 laboratory sessions, 14 hours).

During the lectures, students are encouraged to participate in live polls, Q&A, and quizzes to test their understanding and to uncover any misconceptions or biases.

Laboratory sessions aim to further develop the students' understanding of the presented concepts and algorithms. Examples will be based on Python notebooks (e.g., run in Google Colaboratory).

Learning assessment methods

All exams will be held in person.

Students will be assessed on their knowledge and understanding of the presented concepts and on their ability to apply them to solve small practical problems, choosing the right tool and methodology for a given learning/mining task.

The final exam consists of two parts: (i) a written test and (ii) an optional oral examination.

The written test will consist of open questions, each covering a different course topic. The optional oral examination will allow the student to elaborate on the answers given in the written test and deepen the assessment. The written test will be graded up to 27 points out of 30. The optional oral examination will provide extra grade points in the range from -27 to +5.

To request the optional oral exam, students need to obtain at least 17 points in the written test. Students need a final mark of at least 18 to pass the exam.
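
The grading rules above can be sketched in a few lines of Python. This is only an illustration, not an official grading tool: the function and constant names are hypothetical, and the sketch assumes that the final mark is the written score plus the oral adjustment, that scores are integers starting from 0, and that no cap is applied to the sum. None of these points is stated explicitly in the syllabus.

```python
from typing import Optional

WRITTEN_MAX = 27            # written test is graded up to 27 out of 30
ORAL_THRESHOLD = 17         # minimum written score needed to request the oral
ORAL_MIN, ORAL_MAX = -27, 5  # range of extra points from the oral
PASS_MARK = 18              # minimum final mark to pass


def final_mark(written: int, oral_adjustment: Optional[int] = None) -> int:
    """Return the final mark (assumed to be written score + oral adjustment)."""
    if not 0 <= written <= WRITTEN_MAX:
        raise ValueError("written score must be between 0 and 27")
    if oral_adjustment is None:
        # No oral examination taken: the written score stands on its own.
        return written
    if written < ORAL_THRESHOLD:
        raise ValueError("the oral exam requires at least 17 in the written test")
    if not ORAL_MIN <= oral_adjustment <= ORAL_MAX:
        raise ValueError("oral adjustment must be between -27 and +5")
    return written + oral_adjustment


def passed(mark: int) -> bool:
    """A final mark of at least 18 passes the exam."""
    return mark >= PASS_MARK
```

Under these assumptions, a student with 17 in the written test does not pass on that score alone, but can request the oral and reaches the pass mark with an adjustment of +1.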

Suggested readings and bibliography



Book
Title: Introduction to Data Mining
Year of publication: 2020
Publisher: Pearson
Authors: Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar
Required: No


Last update: 20/09/2024 19:03