https://studiegids.vu.nl/en/courses/2024-2025/XM_0078

The amount of omics data being gathered, analysed, integrated and interpreted increases every year. The number of algorithms, packages and web services is also growing fast. With the high number of omics features measured (e.g. genes, proteins or metabolites), the chances of drawing incorrect conclusions are rather high. In this course we will discuss properties of the measured data, assumptions of the data analysis methods being used, data preprocessing and pretreatment, and validation of the conclusions and interpretations.

Upon completion of this course, students will have reached the following goals:
1. Students are able to use R for multivariate analysis of omics data.
2. Students understand linear algebra concepts that are used in data analysis methods.
3. Students are able to apply data preprocessing methods (variance stabilization, scaling, transformation), exploratory data analysis methods (PCA, Clustering) and supervised data analysis methods (PCDA, ASCA, Random Forest).
4. Students are able to evaluate which preprocessing methods and which data analysis method to use for a given problem.
5. Students are able to analyse omics data and interpret the results.
6. Students are able to evaluate the quality of the data analysis models using cross validation and permutation methods.
7. Students understand how the data analysis methods work theoretically.
8. Students are able to critically review data analysis applications in which the above-mentioned methods have been used.

In the analysis of biochemical systems, the tendency is to measure more and more variables on just a few samples, leading to complex multivariate data sets. Multivariate data analysis methods are often used to explore such sets. This course covers a broad range of multivariate data analysis methods, for data preprocessing (e.g.
variance stabilization), for exploratory analysis (PCA, Clustering, nonlinear mapping) and for supervised analysis (ASCA, Classification, Random Forest). Furthermore, validation techniques such as cross validation and permutation testing are introduced, and the interpretation of selected features in terms of function and networks is discussed. The course starts with an introduction to the properties of the different types of functional genomics data. The main goal of this course is to teach students how to interpret the results of the multivariate methods and how they relate to the biological problem being studied.

o 10 lectures (2h each)
o 10 laptop seminars (4h each)
o Self-study for each of the 10 topics

There is a data analysis project in which students apply several methods to a dataset. There is also a theoretical test with open questions on the data analysis methods, their properties and especially on how to interpret their results. For the final grade, the project counts for 1/3 and the theoretical test for 2/3. Further assessment and grading details (resits and compensation rules) will be posted on Canvas.

o Will be provided through Canvas.

Software:
o R.

Admission to the course will depend on capacity, total number of applications, date of registration and background of the individual student. If the number of applications exceeds the capacity of the course, students may have to be selected, and priority will be given in the following order:
o First-year students of the master Bioinformatics and Systems Biology (JD UvA and VU).
o Second-year students of the master Bioinformatics and Systems Biology (JD UvA and VU).
o Students of the master Computational Science (JD UvA and VU).
o Students of other master programmes.

Bachelor in any science discipline, with an interest in biological data analysis. Basic level of linear algebra required. When in doubt, please contact the coordinator.

Linear Algebra, introduction-level Statistics.