Data Analysis (06_XDADD)
- Coefficient: 3
- Total hours: 80.0h (including 45.0h supervised)
- CTD: 36h supervised
- Lab: 9h supervised
- Independent work outside scheduled hours: 35h
Description
This course offers a progressive, applied approach to the main methods of statistical analysis and data modeling. Through projects, case studies, and targeted theoretical input (just-in-time teaching), students will learn to:
- Model data using appropriate distributions, estimate their parameters, and interpret confidence intervals (AAV1).
- Construct and interpret hypothesis tests (p-values, ANOVA, MANOVA, etc.) and explain their scope and limitations (AAV2).
- Perform multiple regressions and master the underlying methodological foundations (least squares, normal equations, etc.) (AAV3).
- Reduce the dimensionality of data using appropriate methods (PCA, etc.) and draw relevant interpretations from the results (AAV4).
- Implement unsupervised partitioning methods (K-means, etc.) and use the resulting clusters for predictive purposes (AAV5).
- Evaluate the quality of models (goodness-of-fit, interpretation of coefficients, confusion matrices, precision/recall scores, etc.) in order to validate their relevance (AAV6).
Emphasis is placed on the link between statistical rigor and critical interpretation, in order to develop skills that transfer to a variety of data analysis contexts.
Learning Outcomes (AAV)
AAV1 [hours: 0, B1, B4]: At the end of the data analysis course, students will be able to choose a parametric distribution to model a given data set, justify that choice, and estimate its parameter(s) by constructing confidence intervals, while being able to explain the meaning and scope of these estimates.
AAV2 [hours: 0, B3, B4]: At the end of the data analysis course, students will be able to construct hypothesis tests (p-values, ANOVA, MANOVA, etc.) and calculate the associated statistics in order to draw conclusions about the data sets provided. In addition, students will be able to explain the scope and validity of these conclusions.
AAV3 [hours: 0, B3, B2]: At the end of the data analysis course, students will be able to perform regressions (mainly multiple linear regressions) on the data sets provided, mastering the underlying methods (least squares, normal equations, etc.).
AAV4 [hours: 0, B2, B3]: At the end of the data analysis course, students will be able to choose and implement an appropriate dimensionality reduction method (such as PCA) on the data sets provided, and interpret the results.
AAV5 [hours: 0, B2, B3]: At the end of the data analysis course, students will be able to select and implement unsupervised partitioning methods (such as K-means) on the data sets provided. In addition, students will be able to use the resulting clusters for predictive purposes.
AAV6 [hours: 0, B3, B4]: At the end of the data analysis course, students will be able to analyze the quality of a model built using the methods taught in the course.
Assessment methods
Each AAV will be assessed continuously through the projects and case studies.
In addition, students' written submissions will serve as the basis for critical validation tests of the AAVs: each student will receive a version of a report (usually their own) into which certain conceptual, methodological, or consistency errors have been deliberately introduced.
Within a limited time, without documents, and possibly orally, they will have to:
- Identify the errors;
- Explain why they are errors (justification);
- Propose appropriate corrections.
Key Words
- Statistical analysis
- Probabilistic modeling
- Estimation and confidence intervals
- Hypothesis testing (p-value, ANOVA, MANOVA)
- Multiple linear regression
- Dimensionality reduction (PCA)
- Partitioning and clustering (K-means, unsupervised methods)
- Model validation (goodness-of-fit, precision, recall, confusion matrices)
- Critical thinking and interpretation of results
Prerequisites
Mathematics: basic concepts in probability, linear algebra, matrix calculus.
Statistics: mean, variance, correlation, common distributions.
Computer science: data manipulation with Python, basic programming concepts.
Methodology: ability to read and interpret numerical data, aptitude for critical analysis.