The original paper by Hekstra and Leibler concerns whether population dynamics have a common statistical structure, in the same way that the frequency distribution of earthquake strengths has a characteristic strucutre. Important for this is understanding the role of historical contingency, and this is addressed in the paper also.
The data analysed comes from closed ecosystems containing Escherichia coli, Chlamydomonas reinhardtii, and Tetrahymena thermophila. Abundances were estimated via selective plane microscopy, allowing accurate and non-invasive estimates for months with high temporal resolution. The closed ecosystems were 3ml in volume held in fluorimetric cuvettes. Organisms in the ecosystes have been observed to persist for over 1000 days.
Through time, the ecosystems developed spatial heterogeneity and phenotypic changes, such as large Tetrahymena (that may have been able to consume algae) and filamentous E. coli and colonies that may have been resistant to consumption by Tetrahymena. Replicates differed from each other in the development of this complexity.
There were over 50 replicates distributed across temporal blocks. Observations of density were made over ~100 days, every day to every eight weeks.
This R code for this reproduction appears on the RREEBES repository.
The authors will provide the data to Owen, but have not given permission to make it public. Owen will put it on his group server, access to which is controlled by University of Zurich administrative systems (not by Owen directly).
In order to minimise the chance of us leaking the data, please follow these rules: - Do not make a copy of the data. Absolutely. Do. Not. - From within R, read the data from the group server. This will require you to be connected to it. If you’re off site, you will need to VPN in. - Do not perform any save() or write() operations in R. (We may amend this if analyses are intensive and we need to save intermediate steps. However, these should be stored on the server, and not on local machines.)
This is obviously a bit of hassle compared to if we could make the data public, but we cannot, so must adhere to this practice.
These rules / guidelines are provided and discussed in the wiki.
Basis Curated_workspace_20091115.mat in Matlab/Work on “SecondBackup” (LeiblerLab).
Organization for variables: G_S: Group of ecosystems (A-F) and Species (Algae, Bacteria, Ciliates) The first experiment consisted of what is called group F. The second experiment of groups A-E, with A measured twice per week, B once per week, etc.
Within each table, the entries are organized as follows
| —- system 1 —- | | —- system 2 —- | | —- system 3 —- | … time CPF NF time CPF NF time CPF NF … … … … … … … … … … … With each row a measurement day; CPF = counts per frame; NF: number of frames. Density = CPF * calibration factor. Calibration factors convert counts per frame to cells per mL (not necessary for most analysis): coeff_A = 10^3.92; coeff_B = 10^5.15; coeff_C = 10^4.36;
Total counts: CPF * NF, used for estimation of measurement error (see below).(Note to self: For F_B, I removed the columns tracking alga-related fluorescence “leakage” > into the red channel to conform to the format of the other data sets.
Ecosystems excluded because of excessive leakage (>0.1 mg/day) [Bootstrap_sigma_20090714_2.m]: Group A: 7, 9 Group B: 2, 8, 9
Group C: 2, 6 Group D: 10 Group E: 5, 10 Group F: none.For some analysis, replicate 4 of set F was excluded (this ecosystem was used for a number of other experiments after day 70).
Grouping of ecosystems with approximately the same (sub) measurement schedule. Group A, rows (measurement days) 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21, 23, 25, 26 Group B rows 2-15. Group F: rows: 4, 11, 18, 25, 31, 37, 42, 46, 50, 54, 58, 59, 59, 60. (time point 59 was duplicated for this analysis)
Calculated Escape Rates Motivation: the counts of individuals are not entirely independent between time points as cells often spend a few seconds within the observation volume. A correction is > thus necessary with respect to Poisson counting statistics.
Probability that object (identified as A, B or C, respectively) is not observed in the next frame (at 1 Hz): p_e_A = 0.237; p_e_B = 0.3; p_e_C = 0.9;
Correlation times are defined as 1/p_e. The number of independent counts goes as N*p_e. The error model is described in my thesis as well ( https://dspace.rockefeller.edu/bitstream/10209/413/1/DoekeHekstraFinalThesis.pdf ).
Other • Figure 4: to determine correspondence of eigenvectors between time points, I worked backwards in time, using the dot products of (normalized) eigenvectors as a measure of their similarity. • Treatment of zero counts: as 0.5 counts over the total number of frames.
Read the data and tidy (here and in many other places in this document code is hidden, but can be viewed on the repository mentioned above.)
Show the population dynamics for each species (different rows) in each block (capital letters / column) and replicate (colours).
Figure 2E is the mean and spread of the population dynamics, with a couple of individual replicates shown. There is something a bit wrong with the chosen replicates, in that ther are sometimes multiple measures of density on the same day. Also, there are quantitative difference between the published figure and the reproduced one. The dynamics exhibit a common trend across replicates, but with considerable fluctuations around this trend.
The aim here is to characterise across replicate variability in (species log transformed) densities. I.e., is it the case, at a particular point in time, that replicates with high Tetrahymena abundance also have high abundance of other the other two species, or high of one and low of the other. Do get this, we calculate covariance among species abundances across replicates. This hould be done at “weekly time points”, and separately for experiment 1 and 2 (The first experiment consisted of what is called group F. The second experiment of groups A-E, with A measured twice per week, B once per week, etc.).
An example of an eigen analysis of one time slice, across replicates:
## Importance of components:
## Comp.1 Comp.2 Comp.3
## Standard deviation 1.1588898 1.0797786 0.7007516
## Proportion of Variance 0.4476752 0.3886406 0.1636842
## Cumulative Proportion 0.4476752 0.8363158 1.0000000
##
## Loadings:
## Comp.1 Comp.2 Comp.3
## Algae 0.539 0.623 -0.567
## Bacteria 0.345 -0.777 -0.526
## Ciliate 0.769 0.633
##
## Comp.1 Comp.2 Comp.3
## SS loadings 1.000 1.000 1.000
## Proportion Var 0.333 0.333 0.333
## Cumulative Var 0.333 0.667 1.000
At about half way through the experiment (time bin = 10), the first PC axis is positively correlated with all species abundances. This is the dominant axes of variation among replicates. No we do find these axes of variation and how they change through time…
Reproduce original article figure 4a:
Qualitative patterns reproduced… though quite a bit messier than in the original report.
Reproduce original article figure 4b:
At least the qualitative patterns are reproduced quite nicely.
The remainder of the analysis in the paper are not yet reproduced:
Analysis of growth rate fluctuations
Analysis of Hurst exponent
Taylor law analysis
Effect of difference in gas seal quality among replicates
There is a qualitative match between the reproduced results and those in the original article. Hence, the reproduction supports the conclusion that “despite wide variations of the population dynamics in individual systems around their mean and an increase of vari- ance with time, the variations of the three species were corre- lated. Well-defined ecomodes that describe these correlations emerged and stabilized after an initial period of about 3 weeks. The existence of these ecomodes reflects the fact that fluctua- tions of the three species’ densities around the replicate-average dynamics are coupled through ecological interactions that are common to all replicates.”
Could think about what the analyses above assume about linearity of interactions / relationship among species. A non-linear approach (e.g., multispecies GAM) might give interesting insight.