Programming for data science
Academic year 2021/2022
- Course ID
- Prof. Marco Beccuti
- 1st year
- Teaching period
- First semester
- D.M. 270 TAF F - Other activities
- Course disciplinary sector (SSD)
- INF/01 - Informatica (Computer Science)
- Type of examination
Basic knowledge of calculus, as provided by the first-year Mathematics course.
No specific computer science knowledge is required.
Course summary
The aim of the course is to introduce methods, techniques, and related computing tools for the analysis of experimental data.
It provides the basic knowledge needed to use programming languages for statistical computing and graphics (e.g. the R programming language).
Learning outcomes
KNOWLEDGE AND UNDERSTANDING – On completing the course, students will be able to:
1) use suitable descriptive and inferential statistical techniques to describe and understand the phenomena being studied;
2) use suitable computing tools, such as spreadsheets or dedicated software, for statistical data analysis.
APPLYING KNOWLEDGE AND UNDERSTANDING – Students will perform the statistical analyses required by the problem under study, selecting the software tools best suited to the computation and graphics involved.
MAKING JUDGEMENTS – Students will decide which statistical techniques to use according to the available data sets to describe and understand the phenomena under consideration.
COMMUNICATION – Students will be able to justify the choice of analyses to be performed and to give a concise description of the techniques employed and of the results obtained.
The course consists of 10 hours of lectures and 14 hours of laboratory sessions. The laboratory sessions consist exclusively of practical activities.
The slides presented during lectures are available to students as online materials.
Attendance is not mandatory but is highly recommended, since the course requires learning and using specific software tools.
Please check this page for the teaching arrangements planned for the academic year 2021/22.
Learning assessment methods
During the Covid-19 emergency, learning is assessed through a written exam with video surveillance on Webex.
The exam is a written test in two parts:
- ten multiple-choice questions on course topics (four options each, of which anywhere from zero to four may be correct);
- a practical exercise in the R programming language.
The maximum possible score is 30 cum laude: the first part is worth up to 3 points and the second part up to 28 points.
Introduction to data science;
Visualization using ggplot2;
Basic R functionalities:
- Data structures: vector, matrix, list, data frame and tibble;
- Packages and libraries.
Tidy data in R;
Programming with R:
- Flow control: if, for, while, break, ... statements;
- Debugging in R.
Creating packages in R.
Suggested readings and bibliography
- Garrett Grolemund and Hadley Wickham, R for Data Science, O'Reilly Media, Inc, USA, 2017.
- P. Dalgaard, Introductory Statistics with R, Springer, 2008.
- The R Manuals: An Introduction to R (http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf)
The teaching material used in the lessons and a set of practical exercises are available on the course website.