What next (L11-2)

During this course, you have learned a variety of data analysis techniques using R, including data manipulation, visualization, and statistical modeling. You now have a solid foundation in using R for data analysis: you can analyse data that meet the assumptions of linear models, and some types of data that do not (e.g., count data and binary data, using GLMs).

Of course, there is much more to learn! There are many opportunities for you to further develop your skills in R and data analysis. What is the next step after this course? What are the options to further improve your skills in data analysis in R? What other types of analyses could you learn about, and when might you need them?

Here is a list of other types of problem or question that you might encounter, and the types of analysis that could be relevant. The list is by no means exhaustive, but it should give you some ideas of what to explore next, and what to look into when you encounter specific types of data or research questions.

Time series analysis: If your data are collected over time (e.g., daily, monthly, yearly), you might need to learn about time series analysis techniques such as ARIMA models, seasonal decomposition, and forecasting methods. A key feature of time series data is that observations are not independent, which violates assumptions of many standard statistical methods. This needs to be carefully handled in the analysis.
Spatial analysis: If your data have a spatial component (e.g., locations, regions), you might need to learn about spatial statistics, geostatistics, and spatial modeling techniques. This could include methods such as kriging, spatial autocorrelation analysis, and spatial regression models. Again, spatial data often violate independence assumptions, requiring specialized methods.
Non-linear regression: If the relationship between your explanatory variables and response variable is not linear, you might need to learn about non-linear regression techniques. These allow you to estimate the parameters of specific non-linear functions and to assess goodness of fit.
Breakpoint analysis: If you suspect that there are changes in the relationship between variables at certain points (e.g., before and after an intervention), you might need to learn about breakpoint analysis techniques, such as piecewise regression or change point detection methods.
Generalized Additive Models (GAMs): If you want to model complex, non-linear relationships between explanatory variables and response variables while maintaining some interpretability, you might need to learn about GAMs. These models use smooth functions to capture non-linear effects. They are rather elegant!
Structural Equation Modeling (SEM): If you want to analyze complex relationships among multiple variables, including latent variables, you might need to learn about SEM techniques. SEM allows for the modeling of direct and indirect effects, as well as measurement error. Effectively, we can build and test hypothesised causal models, in which a variable can be both an explanatory variable and a response at the same time.
Machine Learning: If you want to make predictions or classify data based on patterns, you might need to learn about machine learning techniques such as decision trees, random forests, support vector machines, and neural networks. These methods can handle large datasets and complex relationships but may sacrifice some interpretability.
Meta-analysis: If you want to synthesize results from multiple studies to draw broader conclusions, you might need to learn about meta-analysis techniques. This involves combining effect sizes from different studies and assessing heterogeneity among them.
Survival analysis: If your data involve time-to-event response variables (e.g., time until failure, time until death), you might need to learn about survival analysis techniques such as Kaplan-Meier estimation, Cox proportional hazards models, and parametric survival models.
Non-parametric methods: If your data do not meet the assumptions of parametric tests (e.g., normality, homoscedasticity), and you really can’t figure out how to build a suitable parametric model (e.g., an LM or GLM), you might need to learn about non-parametric methods such as rank-based tests, bootstrapping, and permutation tests.
Power analysis and sample size estimation: If you want to design studies with adequate statistical power, you might need to learn about power analysis techniques. This involves calculating the required sample size based on effect sizes, significance levels, and desired power.
Bayesian statistics: If you want to incorporate prior knowledge and uncertainty into your analyses, you might need to learn about Bayesian statistical methods. This involves using Bayes’ theorem to update prior beliefs based on observed data.
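
Several items in this list (time series and spatial data in particular) come down to observations that are not independent. As a minimal sketch of what that looks like in practice, the following base R code simulates an autocorrelated series, measures its lag-1 autocorrelation, and fits a simple AR(1) model. The simulated data, the coefficient of 0.7, and the object names are all assumptions made for illustration; a real time series would usually also need checks for trend and seasonality.

```r
# Illustrative only: simulated data, not a real data set.
set.seed(42)

# Simulate 200 observations from an AR(1) process, in which each value
# depends on the previous one (assumed coefficient: 0.7).
x <- arima.sim(model = list(ar = 0.7), n = 200)

# Lag-1 autocorrelation: values well above 0 indicate that
# neighbouring observations are not independent.
lag1 <- acf(x, plot = FALSE)$acf[2]

# Fit an AR(1) model that explicitly accounts for this dependence.
fit <- arima(x, order = c(1, 0, 0))
ar_estimate <- unname(coef(fit)["ar1"])

print(lag1)
print(ar_estimate)
```

If the lag-1 autocorrelation of your own data (or of the residuals of a linear model) is clearly different from zero, that is a hint that methods assuming independence may not be appropriate.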

These are just a few examples of the many types of analyses that you might encounter in your data analysis journey. The choice of which techniques to learn next will depend on your specific research questions, data characteristics, and goals. Ideally you will plan your analyses when you design your study, so that you can collect the right type of data to answer your questions. When you don’t, or when something changes, you can then explore, and discuss with experts, which techniques are most appropriate.
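
Planning an analysis at the design stage often includes a power calculation. As a rough sketch, base R's `power.t.test()` can estimate the sample size needed for a two-sample t-test. The effect size (a difference of 0.5 standard deviations), the 5% significance level, and the 80% power used here are assumed values chosen purely for illustration; you would replace them with values justified by your own study.

```r
# Assumed inputs, for illustration only.
res <- power.t.test(
  delta = 0.5,       # assumed difference between the two group means
  sd = 1,            # assumed standard deviation within each group
  sig.level = 0.05,  # significance level
  power = 0.80       # desired statistical power
)

# Round up, since we cannot sample a fraction of an individual.
n_per_group <- ceiling(res$n)
print(n_per_group)
```

For more complex designs, such as GLMs, a closed-form calculation like this is often not available, and power is typically estimated by simulation instead.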

A final word of advice… try not to be driven by techniques. Instead, be driven by your research questions. After all, we are not doing data analysis for its own sake, but to answer questions about the world around us. Let your questions guide your learning journey!