Introduction to data mining
Academic year 2021/2022
- Course ID
- MAT0051
- Teaching staff
- Prof. Roberto Esposito, Prof. Rosa Meo
- Year
- 2nd year
- Teaching period
- First semester
- Type
- D.M. 270 TAF C - Related or integrative
- Credits/Recognition
- 6
- Course disciplinary sector (SSD)
- INF/01 - informatica
- Delivery
- Blended
- Language
- English
- Attendance
- Optional
- Type of examination
- Oral
- Prerequisites
- Databases and Algorithms, Programming
Course summary
Course objectives
The course aims to introduce students to the fields of Data Mining and Machine Learning, which combine competencies from statistics and computer science.
The course will teach the difference between tasks and models and will introduce students to some popular tasks and models in Machine Learning, such as:
- binary classification and related tasks,
- transformation of a binary classification model into a multiple class model,
- concept learning by means of logical formulas,
- tree models, rule models and subgroup discovery,
- linear models (least squares, regression),
- Support Vector Machines and Kernel methods.
The course will also introduce the algorithms used to train these models.
The laboratory part of the course will introduce students to a practical open-source software suite that implements the learning algorithms of the models covered in the course (and many more).
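As an illustration of one topic listed above, the transformation of a binary classification model into a multiple class model, the following is a minimal sketch of a one-vs-rest scheme. It is not taken from the course material: the choice of a perceptron as the binary learner and all function names (`train_perceptron`, `train_one_vs_rest`, `predict_one_vs_rest`) are assumptions made only for this example.

```python
from typing import Dict, Hashable, List, Sequence


def _score(weights: List[float], x: Sequence[float]) -> float:
    """Linear score w.x + b, with the bias stored as the last weight."""
    return sum(w * xi for w, xi in zip(weights, x)) + weights[-1]


def train_perceptron(X: Sequence[Sequence[float]], y: Sequence[int],
                     max_epochs: int = 1000) -> List[float]:
    """Train a binary perceptron; labels in y must be +1 or -1."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            # Update the weights only on misclassified examples.
            if label * _score(weights, x) <= 0:
                for i, xi in enumerate(x):
                    weights[i] += label * xi
                weights[-1] += label
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights


def train_one_vs_rest(X: Sequence[Sequence[float]],
                      labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Train one binary model per class: that class (+1) against all others (-1)."""
    return {
        c: train_perceptron(X, [1 if label == c else -1 for label in labels])
        for c in set(labels)
    }


def predict_one_vs_rest(models: Dict[Hashable, List[float]],
                        x: Sequence[float]) -> Hashable:
    """Predict the class whose binary model gives the highest score."""
    return max(models, key=lambda c: _score(models[c], x))
```

Any binary learner that returns a real-valued score could replace the perceptron here; the one-vs-rest wrapper itself does not depend on how the binary models are trained.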
Expected learning outcomes
As regards “Knowledge and understanding”, students will master some of the main concepts in Data Mining and Machine Learning.
As regards “Applying knowledge and understanding”, students will apply what they have learnt using a practical open-source software suite for data analysis and machine learning, with the goal of solving some real data science tasks (such as classifying examples into a set of given classes, or inferring a logical formula for an unknown concept represented by examples in a training set).
As regards “Making judgements”, students will learn to judge the suitability of a given model, or the properties of a learning algorithm, for a given set of data.
These abilities will be refined through hands-on practice with Machine Learning and Data Mining methods applied to real-case problems assigned by the teachers during the laboratory sessions.
As regards “Learning skills”, students will acquire the ability to reason about models and their properties, and will learn to use and program a software platform for machine learning.
As regards “Communication skills”, students will learn to communicate the results of a data analysis session by producing and interpreting the diagrams, pictures and infographics that result from applying a model to a set of data.
Course delivery
The course lessons will be both theoretical and practical (laboratory sessions).
The classes of this course (6 CFU) are shared with the first part of a larger course (9 CFU), Apprendimento Automatico (Machine Learning), offered by the Computer Science Department.
This course consists of the first 24 classes of the larger course, divided into 18 theoretical classes and 6 laboratory sessions.
Learning assessment methods
The final exam will be an oral examination in which students are asked to show that they have mastered the theoretical lessons (knowledge of the models and of their purposes) and that they can use the practical software suite (scikit-learn) for data analysis in some use cases.
Due to the COVID-19 emergency, the exam will be held online: students will connect from their own computers to the Moodle site of the course, using the built-in camera and sharing their screen.
Support activities
Machine learning experiments in the laboratory with a software suite for Data Mining.
The laboratory sessions will support the theoretical lessons through practical data analysis assignments on public datasets (from the UCI KDD Archive and from Kaggle, a platform for Data Science challenges).
Program
- Tasks and models;
- Binary classification and related tasks;
- Beyond binary classification (transformation of a binary classification model into a multiple class model);
- Concept learning by means of logical formulas; Version Space;
- Tree models (decision trees, regression trees, feature trees, ranking trees);
- Rule models (lists of rules and sets of rules);
- Subgroup discovery;
- Linear models (least squares, regression);
- Models based on distances and neighbours: k-nearest neighbours, K-means, DBSCAN, hierarchical clustering;
- Support Vector Machines and Kernel methods.
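To give a concrete flavour of the distance-based models in the program, the following is a minimal sketch of k-nearest neighbours classification in plain Python. It assumes Euclidean distance and simple majority voting; the function name `knn_predict` and its parameters are hypothetical and are not part of the course material or of scikit-learn.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_X: Sequence[Sequence[float]],
                train_y: Sequence[Hashable],
                query: Sequence[float],
                k: int = 3) -> Hashable:
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the training examples by Euclidean distance to the query point.
    by_distance = sorted(zip(train_X, train_y),
                         key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest examples and return the most frequent.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
```

In the laboratory sessions the same task would more likely be carried out with the estimators provided by the software suite; this sketch only makes the underlying idea explicit.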
Suggested readings and bibliography
- Machine Learning: The Art and Science of Algorithms that Make Sense of Data
Author: Peter Flach
Edition: First Edition
Publisher: Cambridge University Press
ISBN: 9780511973000
Url: https://www.cambridge.org/core/books/machine-learning/621D3E616DF879E494B094CC93ED36A4
- Documentation of the scikit-learn software suite:
https://scikit-learn.org/stable/documentation.html
Class schedule
Note
This course is borrowed from Machine Learning and will be delivered within Apprendimento Automatico / Introduction to Data Mining, academic year 18/19.
This course (6 CFU) includes the classes offered in the first part of a larger course (9 CFU), Apprendimento Automatico (Machine Learning), held at the Computer Science Department.
The classes in this course are the first 24 classes of the larger course, divided into 18 theoretical classes and 6 laboratory sessions.