Introduction to data mining

Academic year 2022/2023

Course ID
MAT0051
Teachers
Prof. Roberto Esposito
Prof. Ruggero Gaetano Pensa
Year
2nd year
Teaching period
First semester
Type
D.M. 270 TAF C - Related or integrative
Credits/Recognition
6
Course disciplinary sector (SSD)
INF/01 - informatics
Delivery
Class Lectures
Language
English
Attendance
Optional
Type of examination
Written and oral
Prerequisites
Databases and Algorithms, Programming

Course summary

Course objectives

The course introduces students to the fields of Data Mining and Machine Learning, which combine competencies from statistics and computer science.

The course teaches the differences between tasks and models and introduces the students to some of the popular models in Data Mining such as:

  • binary classification and related tasks
  • tree models, rule models, and subgroup discovery
  • linear models (least squares, regression)
  • support vector machines and kernel methods
  • clustering
  • anomaly detection
  • fairness and privacy

The laboratory part of the course introduces students to Python libraries and proposes programming exercises to deepen their understanding of the concepts and algorithms presented.

Learning outcomes

As regards “Knowledge and understanding”, students will master some of the main concepts in Data Mining and Machine Learning.

As regards “Applying knowledge and understanding”, students will apply what they have learnt using modern languages and libraries to solve real data science tasks (such as classifying examples into a set of given classes or clustering data into meaningful groups).

As regards “Making judgements”, students will learn to judge the suitability of a given model, or the properties of a learning algorithm, for a given set of data.
These abilities will be refined through hands-on practice with Machine Learning and Data Mining methods, applied to real-case problems assigned by the teachers during the laboratory sessions.

As regards “Learning skills”, students will learn to reason about models and their properties, and will develop the skills needed to program software tools for machine learning.

As regards “Communication skills”, students will learn to communicate the results of a data analysis session by producing and interpreting the diagrams, pictures, and infographics that result from applying a model to a set of data.

 

Program

  1. Introduction
  2. Data
    1. types of data
    2. data quality
    3. data pre-processing
    4. similarity and dissimilarity
    5. avoiding false discoveries
  3. Classification
    1. basic concepts
    2. decision trees
    3. model evaluation
  4. Clustering
    1. overview
    2. k-means
    3. hierarchical clustering
    4. DBSCAN
    5. cluster evaluation
  5. Advanced Classification
    1. rule-based classifiers
    2. nearest-neighbor classifiers
    3. support-vector machines
  6. Graph-based Clustering
  7. Anomaly detection
    1. clustering-based approaches
    2. one-class classification
  8. Fairness and Privacy

Course delivery

The course consists of both theoretical lectures and practical laboratory sessions.

The classes are divided into 18 theoretical lectures and 6 laboratory sessions.

 

Learning assessment methods

The final exam consists of a written test with an optional oral examination. The written test assesses the understanding of the concepts presented as well as the student's ability to solve small practical problems. The written test is graded up to 27 points out of 30. The optional oral examination can adjust the grade by between -27 and +5 points.

Suggested readings and bibliography



Book
Title: Introduction to Data Mining
Publication year: 2020
Publisher: Pearson
Authors: Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar
Required: Yes

 




Enroll
  • Closed
    Enrollment opening date
    01/09/2022 at 00:00
    Enrollment closing date
    30/06/2023 at 00:00
    Last update: 19/07/2022 14:35