This is a reproduction of the anaylses presented in the paper Chaos in a long-term experiment with a plankton community, by Elisa Benincà and others (the paper on the Nature website). Details of the methods are in the Supplement to the Nature paper.
This reproduction was made as part of the Reproducible Research in Ecology, Evolution, Behaviour, and Environmental Studies (RREEBES) Course, lead by Owen Petchey at the University of Zurich. More information about the course here on github.
The code and data for the reproduction are here on github.
The known issues / problems with this reproduction are with the prefix “Beninca” here. Please add and or solve issues there.
The data are available as an Excel file supplement to an Ecology Letters publication. The Excel file contains several datasheets. Two are particularly important, as they are the source of the raw data (one contains original species abundances, the one with the nutrient concentrations). Another datasheet in the ELE supplement contains transformed variables. We also got some data direct from Steve Ellner, see below for details.
In the code below, the data and any other files are read from github, which means there must be a connection to github.
All required libraries:
rm(list=ls())
library(tidyr)
library(dplyr)
library(lubridate)
library(stringr)
library(ggplot2)
library(RCurl)
library(pracma)
library(oce)
library(tseriesChaos)
library(reshape2)
library(mgcv)
library(repmis)
## Warning: package 'repmis' was built under R version 3.1.3
spp.abund <- read.csv(text=getURL("https://raw.githubusercontent.com/opetchey/RREEBES/master/Beninca_etal_2008_Nature/data/species_abundances_original.csv"), skip=7, header=T)
spp.abund <- select(spp.abund, -X, -X.1)
spp.abund <- spp.abund[-804:-920,]
str(spp.abund)
## 'data.frame': 803 obs. of 12 variables:
## $ Date : Factor w/ 803 levels "","01/02/91",..: 306 440 465 498 520 601 628 673 699 778 ...
## $ Day.number : int 1 6 7 8 9 12 13 15 16 19 ...
## $ Cyclopoids : num 0 0 0.0353 0 0.0353 ...
## $ Calanoid.copepods : num 1.04 2.03 1.72 2.41 1.71 ...
## $ Rotifers : num 7.7 10.19 8.08 6.06 5.94 ...
## $ Protozoa : Factor w/ 330 levels "","0","0,000001",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Nanophytoplankton : num 0.106 0.212 0.212 0.212 0.212 ...
## $ Picophytoplankton : num 1 2 1.52 1.52 1.98 ...
## $ Filamentous.diatoms: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Ostracods : num 0 0 0 0.0187 0 ...
## $ Harpacticoids : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Bacteria : num 2.15 1.97 1.79 1.61 1.43 ...
The Protozoa variable contains some numbers with comman as the decimal separator. This creates a question about what dataset was used for the original analyses, as it could not have been this one.
spp.abund$Protozoa <- as.numeric(str_replace(spp.abund$Protozoa, ",", "."))
Format the dates as dates
spp.abund$Date <- dmy(spp.abund$Date)
Ooops… R assumes the experiment was done in the 21st century. Shouldn’t matter too much.
Check dates match the Day.number (should give true):
sum(spp.abund$Day.number == 1+as.numeric((spp.abund$Date - spp.abund$Date[1]) / 24 / 60 / 60)) == length(spp.abund$Date)
## [1] TRUE
Check for duplicate dates:
spp.abund$Date[duplicated(spp.abund$Date)]
## [1] "2096-10-28 UTC"
which(duplicated(spp.abund$Date))
## [1] 702
Original dataset contains a duplicated date: 28/10/1996 (row 709 and 710 in excel sheet). Lets change the date in row 709 to 26/10/1996, which will put it half way between the two surrounding dates:
which(spp.abund$Date==ymd("2096-10-28 UTC"))
## [1] 701 702
spp.abund$Date[701] <- ymd("2096-10-26 UTC")
Check dates match the Day.number (should give true):
sum(spp.abund$Day.number == 1+as.numeric((spp.abund$Date - spp.abund$Date[1]) / 24 / 60 / 60)) == length(spp.abund$Date)
## [1] FALSE
Fix the Day.number problem:
spp.abund$Day.number <- 1+as.numeric((spp.abund$Date - spp.abund$Date[1]) / 24 / 60 / 60)
Data is in wide format, so change it to long:
spp.abund <- gather(spp.abund, "variable", "value", 3:12)
str(spp.abund)
## 'data.frame': 8030 obs. of 4 variables:
## $ Date : POSIXct, format: "2090-07-12" "2090-07-17" ...
## $ Day.number: num 1 6 7 8 9 12 13 15 16 19 ...
## $ variable : Factor w/ 10 levels "Cyclopoids","Calanoid.copepods",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ value : num 0 0 0.0353 0 0.0353 ...
Bring in the nutrient data:
nuts <- read.csv(text=getURL("https://raw.githubusercontent.com/opetchey/RREEBES/master/Beninca_etal_2008_Nature/data/nutrients_original.csv"), skip=7, header=T)
#nuts <- read.csv("~/Dropbox (Dept of Geography)/RREEBES/Beninca_etal_2008_Nature/data/nutrients_original.csv", skip=7, header=T)
nuts <- select(nuts, -X, -X.1)
nuts <- nuts[-349:-8163,]
nuts$Date <- dmy(nuts$Date)
nuts <- select(nuts, -NO2, -NO3, -NH4)
#nuts$Date[duplicated(nuts$Date)]
#which(duplicated(nuts$Date))
nuts <- gather(nuts, "variable", "value", 3:4)
str(nuts)
## 'data.frame': 696 obs. of 4 variables:
## $ Date : POSIXct, format: "2090-09-18" "2090-09-24" ...
## $ Day.number: int 69 75 82 90 96 103 110 117 124 131 ...
## $ variable : Factor w/ 2 levels "Total.dissolved.inorganic.nitrogen",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ value : num 28.32 20.84 11.15 15.5 5.92 ...
Now put the two datasets together
all.data <- rbind(spp.abund, nuts)
Now select only the date range used in the Nature paper. From the supplment The analysis in Benincà et al. (Nature 2008) covered all data from 16/06/1991 until 20/10/1997. (Remembering dates in the R dataframes are 2090s.)
all.data <- filter(all.data, Date>=dmy("15/06/2091") & Date<=dmy("21/10/2097"))
#all.data[all.data$Date==dmy("16/06/2091"),]
#all.data[all.data$Date==dmy("20/10/2097"),]
(No attempt to reproduce Figure 1a, as its a food web diagram.)
First quick go:
ggplot(all.data, aes(x=Day.number, y=value)) + geom_line() +
facet_wrap(~variable, scales="free_y")
Now we add a column that gives the variable types, same as in figure 1b through 1g. First make a lookup table giving species type:
tt <- data.frame(variable=unique(all.data$variable),
Type=c("Cyclopoids", "Herbivore", "Herbivore", "Herbivore",
"Phytoplankton", "Phytoplankton", "Phytoplankton",
"Detritivore", "Detritivore", "Bacteria", "Nutrient", "Nutrient"))
#tt
And add the Type variable to the new dataset:
all.data <- merge(all.data, tt)
First lets set the colours as in the original:
species.colour.mapping <- c("Cyclopoids"="pink",
"Calanoid.copepods"="red",
"Rotifers"="blue",
"Protozoa"="green",
"Nanophytoplankton"="red",
"Picophytoplankton"="black",
"Filamentous.diatoms"="green",
"Ostracods"="lightblue",
"Harpacticoids"="purple",
"Bacteria"="black",
"Total.dissolved.inorganic.nitrogen"="red",
"Soluble.reactive.phosphorus"="black")
Next change the order of the levels in the Type variable, so plots appear in the same order as in the original figure:
all.data$Type <- factor(all.data$Type, levels=c("Cyclopoids", "Herbivore", "Phytoplankton", "Nutrient",
"Bacteria", "Detritivore"))
The graph with abundances fourth root transformed, as this is the transformation used in the ms.
g1 <- qplot(as.numeric(Day.number), value^0.25, col=variable, data=all.data) +
facet_wrap(~Type, ncol=2, scales="free_y") +
geom_point() + geom_line() +
scale_colour_manual(values = species.colour.mapping)
g1