https://studiegids.vu.nl/en/courses/2024-2025/X_405078

In this course, students get acquainted with the most common experimental designs and regression models; nonparametric tests and bootstrap methods are also discussed. Upon completion of this course, students will, among other things:
- be acquainted with some of the most commonly used statistical models (knowledge and understanding);
- understand the basic theory behind these models (knowledge and understanding);
- know how to apply these models in valid settings (applying knowledge and understanding).

The course EDDA introduces several widely used types of models and techniques: paired and independent t-tests, nonparametric rank-based tests, bootstrap and permutation techniques, analysis of variance (ANOVA), ANCOVA, lasso, linear regression models, generalized linear and nonlinear models, and survival analysis. We discuss how to decide which models are appropriate for different settings, how to estimate model parameters based on available data, and how to validate the fit of a proposed model.

In the case of regression models, the study must be well designed in order to draw sound conclusions from an experiment. In this course, a few well-known designs (completely randomized, randomized block, etc.) and the associated analyses of variance are discussed.

A part of the course is dedicated to nonparametric testing methods and bootstrap methods: the Wilcoxon test for one and two samples, the Kolmogorov-Smirnov test, rank correlation tests, and permutation and bootstrap tests.

All methods are illustrated with practical examples, many of them based on real data. To analyze these data, students will use the statistical package R and the modeling techniques discussed.

Lectures, practical sessions.

Two (or three) regular assignments carried out in teamwork during the course and a digital (individual) test at the end.
The final grade is a weighted average of the assignment grade (tentatively 60% or more) and the test grade (tentatively 40% or less), where the test grade must be above a lower bound (to be announced at the beginning of the course). Mode of re-examination: if insufficient, only the test is retaken.

The lecture slides contain all essential theory for this course. A short list of introductory (online) books on statistics and probability containing the prerequisite knowledge for this course:
1. Probability and Statistics for Computer Scientists, by Michael Baron, CRC Press, 2nd edition.
2. https://www.probabilitycourse.com (Introduction to Probability, Statistics, and Random Processes).
3. https://probstatsdata.com/ (Probability, Statistics, and Data: A Fresh Approach Using R).

These references also cover some parts of the material for this course; reference 3 also contains R-related material. For more theory and background on some (not all) topics in this course, the following books may be useful:
- Linear Models with R and Extending the Linear Model with R, by J.J. Faraway, Chapman & Hall/CRC.
- R by Example, by Jim Albert and Maria Rizzo, Springer.

MSc Computer Science

Assignments are to be made using the statistical package R (http://www.r-project.org).

- A basic course in statistics at the same level as Statistical Methods (X_401020) or Statistical Methods for AI (XB_0080) is absolutely mandatory for following this course. All the material of these courses and its prerequisites (basic mathematical and programming skills) will be assumed known in the lectures, assignments, and exam of the course EDDA.
- A good background in probability, at least at the same level as in one of the above-mentioned courses.
- Basic knowledge of the statistical software R and its application to data analysis.

All the information about the course will be available on Canvas.