This report reproduces the analyses in Ward, E.J., Holmes, E.E., Thorson, J.T. & Collen, B. (2014) Complexity is costly: A meta-analysis of parametric and non-parametric methods for short-term population forecasting. Oikos, 123, 652–661.
Eric Ward kindly provided a subset of the raw data, the processed data, and the R scripts used for model fitting; they can be found in the respective folders. His original repository can be found here. Because the data originate from a collaborative project, not all of them are freely available yet; however, the data provided by Eric are sufficient to reproduce the analyses of the fish time series shown in the paper.
##
## Attaching package: 'lubridate'
##
## The following object is masked from 'package:plyr':
##
## here
##
## Loading required package: bitops
## Loading required package: nlme
## This is mgcv 1.8-6. For overview type 'help("mgcv-package")'.
## Nonparametric Kernel Methods for Mixed Datatypes (version 0.60-2)
## [vignette("np_faq",package="np") provides answers to frequently asked questions]
## locfit 1.5-9.1 2013-03-22
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Loading required package: timeDate
## This is forecast 6.1
##
##
## Attaching package: 'forecast'
##
## The following object is masked from 'package:nlme':
##
## getResponse
##
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
Read the time series data and the metadata from the repository:
# getURL() comes from the RCurl package
ts <- read.csv(text=getURL("https://raw.githubusercontent.com/opetchey/RREEBES/WARD_etal_2014_Oikos/WARD_etal_2014_Oikos/processed%20data/masterDat%20052015.csv"), header=TRUE, stringsAsFactors=FALSE)
metainfo <- read.csv(text=getURL("https://raw.githubusercontent.com/opetchey/RREEBES/WARD_etal_2014_Oikos/WARD_etal_2014_Oikos/processed%20data/Data%20and%20metadata%20052015.csv"), header=TRUE, stringsAsFactors=FALSE)
#look at data structure
str(ts)
## 'data.frame': 53130 obs. of 8 variables:
## $ X : int 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 ...
## $ ID : int 62 62 62 62 62 62 62 62 62 62 ...
## $ Database: chr "salmon" "salmon" "salmon" "salmon" ...
## $ Spcode : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Year : int 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 ...
## $ Species : chr "Chinook" "Chinook" "Chinook" "Chinook" ...
## $ Class : chr "Actinopterygii" "Actinopterygii" "Actinopterygii" "Actinopterygii" ...
## $ Value : num 10 10.1 10.2 10.3 9.1 ...
head(ts)
## X ID Database Spcode Year Species Class Value
## 1 2380 62 salmon 1 1951 Chinook Actinopterygii 9.998798
## 2 2381 62 salmon 1 1952 Chinook Actinopterygii 10.126631
## 3 2382 62 salmon 1 1953 Chinook Actinopterygii 10.239960
## 4 2383 62 salmon 1 1954 Chinook Actinopterygii 10.275051
## 5 2384 62 salmon 1 1955 Chinook Actinopterygii 9.104980
## 6 2385 62 salmon 1 1956 Chinook Actinopterygii 8.294050
# number of fish time series
length(unique(ts$ID))
## [1] 1266
# recode the IDs as consecutive integers (1, 2, ...); the original IDs are kept in ID_old
ts$ID_old <- as.factor(ts$ID)
ts$ID_new <- as.numeric(ts$ID_old)
ts$ID <- ts$ID_new
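The three assignments above replace the original IDs with consecutive integers by converting them to a factor and back to numbers. A minimal illustration of that round trip, using a made-up vector (`ids` and its values are hypothetical, not taken from the data):

```r
# Hypothetical IDs with gaps, standing in for ts$ID
ids <- c(62, 62, 70, 105, 70)

# as.factor() sorts the distinct values into levels;
# as.numeric() then returns each element's level index
new_ids <- as.numeric(as.factor(ids))

new_ids
# 1 1 2 3 2
```

Because the levels of a numeric vector are sorted numerically, the new IDs preserve the ordering of the original ones.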
Explore a random sample of time series visually to get an idea of their variability. This figure does not appear in the paper. The time series are centered so that they can easily be plotted together [i.e. Year - mean(Year) and Value - mean(Value)].
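set.seed(12345678)
# sample 15 distinct series; sampling ts$ID directly draws rows, so the same ID can be drawn more than once
ggplot(data=subset(ts, ID %in% sample(unique(ts$ID), size=15)), aes(x=Year-mean(Year), y=Value-mean(Value))) + geom_line() + facet_wrap(ID~Species, ncol=5, nrow=3, scales="free")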
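In the plot call, `mean()` inside `aes()` is, as far as we understand, evaluated over all plotted rows rather than separately for each series. If the intent is to center every series on its own mean, one possible approach is to compute the centered columns beforehand with base R's `ave()`. The sketch below uses a small made-up data frame (`toy` and the column names `Year_c` and `Value_c` are hypothetical):

```r
# Toy data standing in for `ts` (hypothetical values, not from the repository)
toy <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2),
  Year  = c(1951, 1952, 1953, 1980, 1981, 1982),
  Value = c(10, 11, 12, 4, 5, 9)
)

# ave() computes the group mean (its default FUN) within each ID
# and returns a vector aligned with the rows of the data frame
toy$Year_c  <- toy$Year  - ave(toy$Year,  toy$ID)
toy$Value_c <- toy$Value - ave(toy$Value, toy$ID)

toy
```

The centered columns could then be mapped directly, e.g. `aes(x=Year_c, y=Value_c)`. With `scales="free"` the shape of each panel is the same under either centering; only the axis values differ.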