Introduction to data mining
Academic year 2022/2023
- Course ID: MAT0051
- Teaching staff: Prof. Roberto Esposito, Prof. Ruggero Gaetano Pensa
- Year: 2nd year
- Teaching period: First semester
- Type: D.M. 270 TAF C - Related or integrative
- Credits/Recognition: 6
- Course disciplinary sector (SSD): INF/01 - Informatics
- Delivery: Class lectures
- Language: English
- Attendance: Optional
- Type of examination: Written and oral
Course summary
Course objectives
The course introduces students to the fields of Data Mining and Machine Learning, which combine competencies from statistics and computer science.
The course teaches the difference between tasks and models and introduces students to some of the popular tasks, models and topics in Data Mining, such as:
- binary classification and related tasks
- tree models and rule models
- association analysis
- Support Vector Machines and kernel methods
- clustering
- anomaly detection
- fairness and privacy
The laboratory part of the course introduces students to Python libraries and proposes programming exercises to further develop their understanding of the presented concepts and algorithms.
Learning outcomes
Knowledge and understanding: students will master some of the main concepts in Data Mining and Machine Learning.
Applying knowledge and understanding: students will be able to apply what they have learned using modern languages and libraries in order to solve some real data science tasks (such as classifying examples into a set of given classes or clustering data into meaningful groups).
Making judgements: students will learn to judge the suitability of a given model, or the properties of a learning algorithm, for a given data set. These abilities will be refined through hands-on practice with Machine Learning and Data Mining methods applied to real-case problems assigned by the teachers during the laboratory sessions.
Learning skills: students will acquire the skill of reasoning about models and their properties. Students will also develop the skill of programming software tools for machine learning.
Communication skills: students will learn to communicate the results of a data analysis session by producing and interpreting the diagrams, pictures and infographics that result from applying a model to a data set.
Course delivery
The course will be held in person.
It comprises theoretical lectures (18 lectures, 36 hours) and practical laboratory sessions (6 sessions, 12 hours).
Laboratory sessions will use Jupyter and the Python language. A basic working knowledge of Python is assumed.
Learning assessment methods
The final exam consists of a written test with an optional oral examination.
All exams will be held in person.
The written test will assess the understanding of the presented concepts as well as the student's ability to solve small practical problems. The written test will be graded up to 27 points out of 30. The optional oral examination will adjust the grade by a number of points in the range from -27 to +5.
Program
- Introduction
- Data
  - types of data
  - data quality
  - data pre-processing
  - similarity and dissimilarity
  - avoiding false discoveries
- Classification
  - basic concepts
  - decision trees
  - model evaluation
- Association Analysis
  - frequent itemset generation
  - rule generation
  - compact representation of frequent itemsets
- Advanced Classification
  - rule-based classifiers
  - nearest-neighbor classifiers
  - support vector machines
- Clustering
  - overview
  - k-means
  - hierarchical clustering
  - DBSCAN
  - cluster evaluation
- Graph-based Clustering
- Anomaly Detection
  - clustering-based approach
  - one-class classification
- Fairness and Privacy
Suggested readings and bibliography
- Title: Introduction to Data Mining
- Author: Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar
- Publisher: Pearson
- Year of publication: 2020
- Documentation of the scikit-learn software suite: https://scikit-learn.org/stable/documentation.html