Learning Objectives

This document describes the learning objectives for BIO144: Data Analysis in Biology.

The learning objectives describe what you must know (i.e., material that could appear in the exam). In addition to the learning objectives, the content of each chapter of the course book is also “must know”.

Course-level learning objectives

After successfully completing BIO144, students will be able to:

Formulate biological questions as statistical models, identifying appropriate response and explanatory variables.
Use R and RStudio to import, explore, visualise, and analyse biological data reproducibly.
Fit, interpret, and compare statistical models commonly used in biology, including linear models and generalised linear models.
Evaluate model assumptions, diagnose violations, and understand the consequences for inference.
Interpret model output quantitatively and biologically, rather than mechanically reporting p-values.
Communicate statistical results clearly, using figures, tables, and written explanations appropriate for biological audiences.

Unit-level learning objectives

Introduction

After this unit, students will be able to:

Explain the goals, structure, and expectations of the BIO144 course.
Identify the prior knowledge and skills required to succeed in the course.
Understand the appropriate use of AI tools in data analysis work and during the course.
Describe a typical data analysis workflow, from developing a question to communicating the answer.

R and RStudio

After this unit, students will be able to:

Navigate the RStudio interface and explain the purpose of scripts, the console, and the environment.
Run R code and understand basic R syntax.
Import data into R and inspect its structure.
Know the many ways to get help with R and RStudio.
Understand what add-on packages in R are and how to work with them.
Perform basic data manipulation operations, including importing, viewing, and transforming data.
Produce simple exploratory data visualisations.
Use scripts to support reproducible data analysis workflows.

Simple linear regression – Part 1

After this unit, students will be able to:

Explain the purpose of simple linear regression in biological data analysis.
Understand the mathematical form of a simple linear regression model.
Describe how the slope and intercept are calculated.
Describe how the error / variation is modelled.
State the assumptions underlying simple linear regression.
Assess if the assumptions are reasonably met using diagnostic plots.
Recognise and remedy common problems encountered when fitting linear regression models.
Use R to perform these tasks.

Simple linear regression – Part 2

After this unit, students will be able to:

Understand how to measure how well the regression fits the data (correlation and R-squared).
Test and explain whether the parameter estimates are compatible with a specific value (t-test).
Understand how to find the range of parameter values that are compatible with the data (confidence intervals).
Understand which regression lines are compatible with the data (confidence band) and how to construct this band in R.
Understand what the plausible values of newly collected data are (prediction band) and how to calculate them in R.
Interpret the biological meaning of regression coefficients.
Know how to communicate the results of a linear regression analysis effectively.
Critically assess the strengths and limitations of linear regression models.

One-way ANOVA

After this unit, students will be able to:

Explain how ANOVA fits within the linear model framework.
Fit a one-way ANOVA model using lm().
Interpret group means and differences between groups.
Explain the concept of variance partitioning.
Understand the connection between ANOVA and regression with categorical predictors.
Decide when ANOVA is an appropriate modelling approach for biological data.
Perform ANOVA in R, assess model assumptions, and interpret results in a biological context.
Communicate ANOVA results effectively using text, tables, and visualisations.

Multiple regression

After this unit, students will be able to:

Understand what kinds of questions multiple regression can help answer.
Know the mathematical form of multiple regression models.
Fit linear models with multiple explanatory variables.
Assess if model assumptions are reasonably met using diagnostic plots.
Assess whether the explanatory variables, taken together, significantly explain variation in the response.
Describe which variables are associated with the response, and the direction of these associations.
Measure the amount of variation explained by the model (R-squared, adjusted R-squared).
Assess the relative importance of different explanatory variables.
Make conditional predictions from multiple regression models.
Understand what collinearity is and how it affects multiple regression.
Use R to fit, diagnose, and interpret multiple regression models.
Communicate multiple regression results effectively using text, tables, and visualisations.

Interactions

After this unit, students will be able to:

Explain what an interaction between explanatory variables means biologically and mathematically.
Fit models that include interaction terms.
Interpret interaction coefficients correctly.
Visualise and explain how effects depend on the values of other variables.
Distinguish additive from non-additive biological effects.
Judge when interactions are biologically meaningful and justified.
Understand interactions involving categorical and continuous explanatory variables, in the context of ANCOVA, two-way ANOVA, and multiple regression.
Communicate results involving interactions clearly and accurately.
Use R to fit, diagnose, interpret, and communicate models with interactions.

Generalised linear models for count data

After this unit, students will be able to:

Describe the kinds of biological questions and processes that involve count data.
Recognise when, why, and how count data violate the assumptions of linear models.
Explain why normal error assumptions are inappropriate for count data.
Describe the key components of a generalised linear model (GLM): the linear predictor, the distribution family, and the link function.
Fit Poisson regression models using glm().
Interpret model coefficients on the appropriate scale.
Assess whether a Poisson model provides an adequate description of the data.
Use R to fit, diagnose, and interpret GLMs for count data.
Communicate results from count data analyses effectively.

Generalised linear models for binary data

After this unit, students will be able to:

Identify binary response variables in biological datasets, and be familiar with common examples (e.g., presence–absence, success–failure) and the types of questions in which they arise.
Explain why linear models are unsuitable for binary outcomes.
Fit logistic regression models using glm().
Interpret model coefficients in terms of probabilities and odds.
Visualise fitted relationships for binary data.
Understand how logistic regression supports biological inference about presence–absence or success–failure outcomes.
Use R to fit, diagnose, and interpret GLMs for binary data.
Communicate results from binary data analyses effectively.

Ordination and multivariate data

After this unit, students will be able to:

Recognise when biological questions involve many response variables simultaneously.
Explain the goals of ordination methods.
Interpret ordination plots in terms of similarities and differences among observations.
Understand ordination as a tool for dimensionality reduction.
Critically assess what information ordination methods do and do not provide.
Use R to perform basic ordination analyses (i.e., PCA, NMDS).
Communicate ordination results effectively using visualisations and text.

Mixed models and what next

Part 1: Mixed models

After completing this part, students will be able to:

Recognise biological situations in which data are grouped, nested, or hierarchical.
Explain why standard linear models may be inappropriate for grouped data.
Describe the conceptual difference between fixed effects and random effects.
Understand random effects as a way of modelling structured sources of variation.
Interpret mixed models as extensions of linear models, rather than as fundamentally different tools.
Identify common biological examples where mixed models are appropriate (e.g., repeated measures, individuals within populations, sites within regions).
Interpret mixed model output at a conceptual level, focusing on biological meaning rather than technical detail.
Appreciate the role of partial pooling in balancing information across groups.
Recognise the limitations and assumptions of mixed models without needing to master implementation details.
Use R to fit simple linear mixed models using the lme4 package.
Communicate results from mixed model analyses effectively, focusing on biological interpretation.

Part 2: What next?

After completing this part, students will be able to:

Place the statistical models learned in BIO144 within a broader landscape of data analysis methods.
Recognise when more advanced models may be required to address biological questions.
Identify directions for further learning in statistics and data analysis.
Understand that statistical modelling is an iterative and evolving process, not a fixed set of rules.
Reflect critically on the limits of the models used in the course.
Appreciate the importance of biological reasoning alongside statistical tools.
Approach future quantitative methods courses and analyses with confidence and curiosity.

Review (L12)

No additional learning objectives.