Marine phytoplankton are photosynthetic microorganisms that account for up to half of global net primary production [1]. As such, the population dynamics of these organisms are crucial to understanding the global carbon cycle [2, 3]. One key aspect of phytoplankton populations is the growth rate, typically defined as the rate of increase in population biomass over time per unit of existing biomass. Direct in-situ measurement of this quantity cannot be obtained from abundance or carbon biomass alone, which are a composite of cell growth, cell mortality, and other biological and physical processes [4]. Several different methodologies have been employed to estimate in-situ phytoplankton growth rates; however, previous estimates relied on analytically challenging and low-throughput methods such as the radiometric turnover times of14C labeled chlorophyll [5] and 32P labeled ATP [6], cell cycle analysis [7], and the dilution method [8]. While taxon-specific growth rates might be estimated with these methods, they often suffer from large uncertainties caused by coarse sample time resolution or experimental artifacts (collectively known as “bottle effects”; e.g., [9]). The emergence of continuous flow cytometry in ocean surveys [1012] provides high resolution, taxon-specific measurements of the abundance and size of individual phytoplankton cells and offers a high-throughput in-situ alternative. In principle, measurements of cell abundance across different sizes over time provide a means to derive rates of carbon fixation and cell division [4], motivating the use of size-based mechanistic modeling frameworks to isolate these biological rates.

We focus on a class of mechanistic models known as stage-structured matrix population models (MPMs), which can be used to estimate demographic rates from measurements of abundance across life-cycle stages [13], often defined by the age or size of individuals. For example, tree species produce seeds once they have reached a particular size [14] and fish species maximize reproduction at a critical age [15]. These models assume that individuals in a population can be classified into m discrete stages that define their response to the environment modeled as a discrete-time process. MPMs assume that the state of the population at time t + 1 can be written in terms of the state of the population at time t and a set of transition rates [16]:
where Bt(θ) is a projection matrix that defines the possibly time-dependent population dynamics, θ is a parameter vector, and nt is a vector representing the number of individuals in each stage at time t, which defines the state of the population. The vector θ includes biological parameters to model population dynamics and is the target of parameter estimation [

In recent years, size-structured MPMs have gained popularity for estimating division rates of phytoplankton populations by mechanistically describing changes in microbial cell size distributions over the day-night cycle [1824]. For instance, MPMs have been employed to estimate daily division rates of the picocyanobacterium Synechococcus and picoeukaryotic phytoplankton based on a 13-year hourly time series from a coastal location in the Atlantic Ocean using a submersible flow cytometer [19, 23, 24]. In the North Pacific Subtropical Gyre, similar MPMs were used to estimate daily and hourly division rates of another picocyanobacterium, Prochlorococcus, based on continuous flow cytometry measurements taken over two research cruises [21]. In these studies, cell size measurements provided by high-frequency flow cytometry were used to define the life-cycle stages of the population. These models assumed that changes in the cell size distribution over the day-night cycle are driven by two biological processes: 1) carbon fixation via photosynthesis and 2) cell division; other processes such as respiration and exudation, which lead to cell shrinkage, are omitted. In previous investigations, model results were validated against estimates from dilution experiments [19] or DNA-based cell cycle analysis [21]. The focus of these models was to estimate division rates rather than carbon fluxes; as a consequence, these MPMs [18, 19, 21, 24] can lead to transition matrices with biologically implausible estimates. Furthermore, uncertainty quantification for model parameters typically involved refitting methods.

Here, we extend existing size-structured MPMs to model additional biological processes describing population dynamics over the day-night cycle and to improve parameter interpretability and model performance. Model estimates are computed using a Bayesian implementation in the probabilistic programming language Stan [25], through which we provide statistically rigorous parameter uncertainty intervals while allowing for the incorporation of prior scientific knowledge about model parameters. This approach enabled an evaluation of the sensitivity of posterior distributions to sampling size, sampling frequency, and initial conditions. In the following, we develop five MPMs that differ in their complexity and flexibility in parameterizing three transition terms: cell division, carbon fixation, and carbon loss (Fig 1), which describe the dynamics of the picocyanobacterium Prochlorococcus, Earth’s most abundant phytoplankton [26]. Four additional models (S1 Table) with intermediate levels of complexity, are examined in the Supporting Information (S1 and S2 Figs).


Fig 1. MPM size classes and transitions.

Schematic of the MPM’s cell size classes and its three class transitions: carbon fixation, division, and carbon loss. The boundaries of the m cell size classes (vi for i = 1, 2, …, m + 1) are logarithmically spaced, so that cells can transition to a size class that is exactly half their original size when they divide. For this purpose, the integer j is selected so that for i > j; cells in the first j size classes cannot divide.

We evaluated the performance of our models using laboratory experiment time-series measurements of a highly synchronized population of a high-light adapted strain of Prochlorococcus [27] collected during the exponential phase of batch growth over two simulated day-night cycles (Fig 2). This dataset contains cell size distributions derived from flow cytometry (Fig 2A and 2B), cell abundance and light measurements (Fig 2C), and measurements of carbon fixation (Fig 2D) at two-hour intervals. Division rates are derived from changes in cell abundances, while carbon loss is estimated from other measurements (see Experimental data below). We fit our models to the size distribution data (Fig 2A and 2B) and evaluated the ability of each model to reproduce the observed parameters at daily and hourly time scales. All models used a logarithmically-spaced discrete cell size distribution, permitting cells to divide into two daughter cells that are half their size (Fig 1). While our simplest model has no size-dependence for carbon fixation and lacks a carbon loss term, the more complex models include size-dependence for all three transitions, explained below. Finally, we converted model parameters to estimates of biological rates such as carbon fixation and carbon loss, allowing for more direct interpretation of estimated parameter values.


Fig 2. Laboratory Prochlorococcus time series measurements.

(A) Heatmap of the number of cells and (B) relative cell abundances in each size class measured every two hours over a 48-hour period. (C) Cell abundance and photosynthetically active radiation (PAR). (D) Hourly carbon fixation, carbon loss, and division rates. Error bars indicate one standard deviation based on two technical and two biological replicates.



Past work has assumed that changes in cell size result from two processes: carbon fixation and cell division [1824]. We built upon these studies by developing models that include an additional process: cell shrinkage through exudation and respiration. Another assumption of past models is that division is a monotonically increasing function of size, i.e. larger cells are more likely to divide than smaller cells. This implies that the highest rates of cell division should coincide with the highest proportion of large cells in the size distribution. However, the peak of cell size in Synechococcus and Prochlorococcus occurs during daylight while the peak of division usually occurs at night [28]. In the Prochlorococcus culture dataset used in our work, hourly cell division lagged 4–8 hours behind the peak of cell size (Fig 3A). In fact, hourly division rates showed little correlation with mean cell size (Fig 3B). Note that these trends hold not only for the mean cell size, but also with respect to larger cells, e.g. for the 70th, 80th, 90th, and 95th percentiles. When comparing the size distribution at 15 hours (peak in cell division) and at 35 hours (almost no division) after the start of the experiment, we see that many more large cells are present at hour 35, but the division rate is much higher at hour 15 (Fig 3C). However, we observed a strong correlation (r = 0.84) between hourly division rate and mean cell size with a 6-hour lag (Fig 3D). These results were also consistent for the 70th, 80th, 90th, and 95th percentiles, suggesting that cell division is dependent on cell size as well as additional processes. Cell division in photosynthetic organisms is tightly regulated by light, although the onset of the cell cycle in Prochlorococcus does not seem to be strictly light dependent [29]. We therefore tested two different parameterizations for estimating cell division. In the first, cell division is constrained to be an increasing monotonic function of cell size, but constant over time, as in previous studies. In the second, cell division still increases monotonically with cell size but is allowed to vary over time. We also considered size dependence in carbon fixation through power-law relationships supported by experimental evidence [30]. Finally, we implemented a “free” parameterization in which carbon fixation and carbon loss rates are estimated separately for each size class, in order to provide enough flexibility for the model to capture biological processes that are not explicitly accounted for in our models.


Fig 3. Hourly division rates vs. cell size.

(A) Phytoplankton size distribution overlaid with hourly division rates (red curve; error bars indicate one standard deviation based on two technical and two biological replicates). Division rate and size distribution at t = 15 (blue box) and t = 35 (gold box). (B) Hourly division rates vs. mean cell size. (C) Cell size distribution at time t = 15 (blue) and t = 35 (gold). (D) hourly division rate at time t vs. mean cell size at time t − 6.

We distilled our assumptions into a set of five models of differing parameterizations (Table 1). Each model is identified by a subscript consisting of three letters corresponding to the parameterizations of carbon fixation, division, and carbon loss, respectively. The first letter in each model name corresponds to the carbon fixation parameterization. The letter b in carbon fixation indicates a basic parameterization in which carbon fixation is assumed to be constant as a function of size. The letter p indicates a power-law relationship with respect to size and f represents a free parameterization where each size class may have its own rate of carbon fixation. With respect to division, represented by the second letter of the model name, the letter m indicates a monotone increasing division rate as a function of size with no time-dependence, while t indicates a parameterization that also includes time-dependence in division. The third letter, indicating the carbon loss parameterization, can be b (basic) or f (free parameterization) as in carbon fixation, or x for a model with no carbon loss. As an example, we refer to our simplest model as mbmx, denoting that it has basic carbon fixation without size-dependence, division rates that monotonically increase with cell size, and no carbon loss term. Examples of the functional forms used for the model parameters can be found in the Materials and methods section.

Models contain more parameters down the rows of Table 1. Thus, model mbmx is the simplest model and most closely represents previous MPMs applied to microbial communities, while model mftf is the most complex with respect to the number of parameters. We fit mbmb and mftf on model-generated data to verify that our models were able to recover the values of the biological rate parameters (S3 Text).

We fit these five models to a dataset gathered in a laboratory experiment. Rates of division, carbon fixation, and carbon loss were estimated on both daily and hourly timescales. In the following section, we examine daily rate estimates, which have been the primary target of inference in past work. Then, we further assess the model rate estimates at an hourly timescale to inspect the behavior of our models within diel cycles. Furthermore, we explore the relationship between cell size and division, carbon fixation, and carbon loss. Finally, we examine the relationships between the estimated parameter values and perform observation sensitivity experiments.

Estimation of daily rates

We first assessed our models’ ability to recreate the observed Prochlorococcus cell size distribution. Then, we examined whether an improved fit to the size distribution data resulted in improved model performance by comparing model estimates of daily average carbon fixation, carbon loss, and division rates to independent measurements from laboratory data. Finally, we investigated model estimated photosynthetic parameters.

As expected, the MSE of the estimated cell size distribution decreased as the number of model parameters increased (Fig 4A). Critically, however, this improved fit did not correlate with better daily rate estimates. One of the most important parameters estimated by the models is the daily rate of cell division, see Eq (25). The observed daily division rate in the population was 0.63 ± 0.01 d-1 (1 standard deviation interval). However, the simplest model mbmx overestimated this rate by nearly a factor of two (Fig 4B; 1.06 ± 0.05 d-1). This may stem from the fact that this model did not include carbon loss; thus, it attributed any reduction in cell size to cell division. Model mbmb, which adds respiratory/exudative carbon loss, was able to accurately estimate the daily division rate (0.63 ± 0.02 d−1), while all other models produced less accurate estimates, despite lower MSE of the estimated cell size distribution.


Fig 4. Model estimated daily rate parameters.

(A) Mean squared error (MSE) of estimated proportions to the observed particle size distribution (PSD). (B) Estimated daily division rates. (C) Estimated daily carbon fixation. (D) Estimated daily carbon loss. (E) Estimated photosynthetic saturation parameter. (F) Estimated maximum photosynthetic rate. (B-F) Green vertical lines indicate ground truth calculated from data. Green shaded areas indicate uncertainty surrounding ground truth measurements. Model estimates shown as posterior distributions.

Model mbmb also performed well in estimating daily rates of carbon fixation and loss (Fig 4C and 4D). Again, the models with the best fit to the size distribution (mfmf, mftf) exhibited lower accuracy in their estimates of these rates. Interestingly, the addition of size-dependent carbon fixation (mpmb) resulted in underestimation of daily carbon fixation (75.57 ± 1.00 fg C cell−1 d−1) and cell division (0.33 ± 0.02 d−1) but improved estimates of daily carbon loss. The further addition of size-dependence in carbon loss (mfmf) led to overestimates of daily carbon loss and even lower division rate estimates, indicating that this model attributes too much of the observed decreases in cell size to carbon loss rather than cell division. Model mftf, with added time-dependent division, underestimated daily rates of carbon fixation, division, and carbon loss.

Finally, we examined the photosynthetic saturation parameter Ek and the maximum light-saturated photosynthetic rate Pmax, two components of the mechanics of carbon fixation (see Carbon fixation section). Model mbmx shows the worst performance for these parameters, but mbmb also greatly overestimates both quantities despite accurate estimation of daily carbon fixation, highlighting potentially weak identifiability—i.e. similar daily carbon fixation rates can be obtained by different means, as carbon fixation decreases with higher values of Ek but increases with higher values of Pmax. These are examined further via simulation studies in the Supporting Information (S3 Text). Interestingly, mpmb had much more accurate estimates of the photosynthetic parameters, despite lower accuracy in overall daily carbon fixation. Size-dependent carbon loss (mfmf) and time-dependent division (mftf) resulted in poorer estimates of the photosynthetic parameters relative to mpmb.

Overall, the simplest model mbmx showed the poorest performance in estimation for nearly every category, highlighting the importance of accounting for carbon loss in our models. There is no model that performed best with respect to all the daily rate estimates we included in our tests; mbmb created the best division and carbon fixation estimates, while mpmb provided the best performance for Ek, Pmax, and daily carbon loss.

Estimation of hourly rates

In addition to the analysis of daily rate parameters, we examined the models’ abilities to recreate population dynamics at hourly resolution (Fig 5) to determine whether discrepancies between model estimates and observations occur at a particular time of the diel cycle and to help us identify the relevant biological processes at play. While some of our models were able to estimate the daily rates of cell division, carbon fixation, and carbon loss accurately, the hourly patterns were more difficult to replicate (Fig 5A–5C). As expected by the relationship between cell size and hourly division rates (Fig 3), models that assume that cell division is only size-dependent (mbmx, mbmb, mpmb, mfmf) estimated the timing of cell division to be 4 to 8 hours too early (Fig 5A). On the other hand, model mftf, with both time-dependent division and size-dependent carbon fixation, was able to more accurately estimate the timing of cell division. However, this model underestimated division at dusk, thus leading to the inaccurate daily rates as discussed above. All models were able to capture the timing of carbon fixation, which is tied to the amount of incident light (Fig 5B). Yet, most models tended to underestimate the amount of fixed carbon, with mbmb coming closest to capturing the dynamics observed in the data. Surprisingly, the timing of carbon loss computed from the data (Fig 5C) closely matched that of carbon fixation. Our models tended to underestimate carbon loss during daytime peaks and overestimate it at night.


Fig 5. Model estimated hourly rate parameters.

(A) Observed (black) and estimated (colored bands) hourly division rates. (B) Observed (black) and estimated (colored bands) hourly carbon fixation. (C) Observed (black) and estimated (colored bands) hourly carbon loss. (A-C) Black points indicate ground truth calculated from data. (D) Estimated cell division fraction as a function of cell size. (E) Estimated light-saturated cell growth (carbon fixation) fraction as a function of cell size. (F) Estimated cell shrinkage (carbon loss) fraction as a function of cell size. (A-F) Colored bands indicate model estimates. Shading indicates the first to third quartiles of the posterior distributions. (D-F) Fractions correspond to MPM transitions over a 20-minute time period.

To further explore the estimated dynamics of division, carbon fixation, and carbon loss, we investigated the estimated proportions of cells undergoing each of these transitions as a function of cell size (Fig 5D–5F). The estimated shape of the size-division relationship tended to follow a sigmoidal pattern for all models: the fraction of dividing cells increases sharply above a critical size, which varied from 60 to 110 fg C depending on the model (Fig 5D). We note that the model that best estimated the daily division rate (mbmb) expected cell division to occur mostly in the largest size classes (> 110 fg C), which resulted in accurate amplitudes of hourly cell division rates, albeit at a 6-hour phase shift. In general, models that underestimated division (mfmf, mftf) estimated smaller proportions of dividing cells within the larger size classes. However, mbmx, which generally estimates a comparable or lower division fraction than mbmb at a given size, overestimates cell division. Because mbmx contains no carbon loss, it estimates more large cells to be present in the distribution, hence increasing the division rate relative to mbmb even if the division fraction is lower.

Meanwhile, model estimates of the size-dependence of carbon fixation generally estimated high values for the peak maximum growth fraction (Fig 5E). Models that assumed constant maximum growth (mbmx, mbmb) estimated this fraction to be near one. Interestingly, models with a free parameterization of size-dependent carbon fixation (mfmf, mftf) generally estimated larger cells to have a lower maximum growth fraction, as in the power-law formulation (mpmb). The estimated fractions of cell shrinkage tended to be significantly lower than the fractions of maximum growth, ranging from negligible to about one-fifth of the peak maximum growth fraction (Fig 5E and 5F). In the two models with size-dependent carbon loss rates (mfmf, mftf), the estimated fraction of cell shrinkage generally increased with cell size. However, both models estimated a sharp drop near the same critical sizes at which the division fraction sharply rose, suggesting that the models assign the decreases in cell size to cell division rather than carbon loss for larger but not smaller cells. These results suggest a trade-off of daily and hourly rate estimates between our models: models that produced some of the most accurate daily estimates of cell division, carbon fixation, and carbon loss showed a systematic offset in timing of cell division, while the models which accurately captured the timing often performed less well in estimating the daily average rate.

Posterior parameter distributions

As the cell size distribution is used for model fitting, a model may be able to accurately capture the net effect of the parameters despite failing to accurately capture the value of each parameter individually, highlighting potential weak parameter identifiability. We therefore examined the bivariate joint posterior distributions of estimated rates of daily cell division, carbon fixation, and carbon loss (which are composites of many model parameters) as well as photosynthetic parameters to better understand the mechanics of the MPMs and the interdependencies of their parameters. We focused on two models: mbmb, which had the best overall performance on daily rates of cell division, carbon fixation, and carbon loss but failed to predict the timing of cell division, and mftf, which was best able to predict the timing of cell division but failed to provide accurate daily rates (Fig 6). A strong correlation between daily carbon fixation and carbon loss was observed in the posterior distributions of both models (r = 0.61 and 0.81 for mbmb and mftf, respectively; Fig 6J and 6K), which was expected since the carbon fixed by photosynthesis fuels respiration and exudation. However, the relationship between carbon fixation and cell division differed between the two models (Fig 6F and 6O). Carbon fixation and cell division were positively correlated (r = 0.66) in mbmb, which makes intuitive sense since the faster the cells grow, the faster they divide (Fig 6F), while a negative correlation (r = -0.47) was observed in mftf (Fig 6O). This negative relationship likely stems from the fact that daily division rate and carbon loss in mftf were strongly negatively correlated (r = -0.89, Fig 6L), while this relationship was much weaker in mbmb (r = -0.16, Fig 6I). As carbon fixation and carbon loss are tightly correlated, carbon loss may mediate the observed negative relationship between carbon fixation and daily division in mftf, making it more difficult for this model to disentangle these two processes than in mbmb.

The shape of the posterior distribution highlights the strong relationship between Pmax and Ek (Fig 6A and 6T); increases in Pmax and reduction of Ek both increase carbon fixation in different ways (see Eq (7)), which would explain why mbmb could accurately estimate daily carbon fixation albeit with inaccurate estimates of photosynthetic parameters. The strong dependence structure between parameters shows that it is important to consider the joint distributions of the parameters and not solely focus on the marginal posterior distribution for each parameter. It also demonstrates that the size-distribution data itself cannot uniquely constrain all parameters, emphasizing the importance of prior knowledge and the prior distribution for limiting the parameter distributions.

Observation sensitivity experiments

In order to quantify the impact of changes in the size distribution data on model parameter estimates, we performed two sets of experiments. In the first, we used a sliding window approach to assess the effect of changing the start time of the 48-hour time series on model output. In the second, we studied the robustness of the models to changes in the sampling resolution of observations.

In the sliding window experiment, we extended the normalized size distribution time series by appending the data to itself, thereby creating a four-day dataset. Then, we estimated parameters and initial conditions within a 48-hour window that was moved forward in time in four-hour increments. Details about the setup of the sliding window experiments and their results can be found in the Supporting Information (S1 Text). With the exception of mbmx, all models exhibited a high degree of stability in their estimates for each window, indicating that the starting time of the model fitting procedure had a very limited effect on the models’ parameter estimates. Some deviations were noticeable, however, when the window start time was near the peak of the cell size distribution, at which the difference between observations and model estimates is most pronounced. For mbmx, estimates showed a high degree of variability among windows, suggesting that the results of this model may not be as stable or reliable as the others.

In the second set of experiments, we performed holdout validation experiments, in which time points of the size distribution data were withheld from the training data used for model fitting. This holdout data was then used as a testing set, and we computed the error for both datasets in order to examine our models’ out-of-sample performance and the stability of the parameter estimation relative to the full dataset. We conducted three experiments, sequentially removing an increasing amount of equally spaced data, roughly mimicking lab experiments in which measurements were collected at lower resolution. This procedure ensured that the daily cycle was sampled well, and both days are represented equally in the training data. More details of this analysis can be found in the Supporting Information (S2 Text). We found that parameter estimates and the observed cell size distribution remained stable when up to half of the data was removed from training, but out-of-sample performance deteriorated and parameter estimates differed significantly from those computed from the full data when two-thirds of the data was removed. This result suggests that our model could be applied to time series data collected at 4 hour intervals and still provide accurate estimated daily rates of cell division, carbon fixation, and carbon loss.


In this work, we extended size-structured MPMs to recover additional key biological processes that dictate phytoplankton cell growth, shrinkage, and division. Our investigation focused on a laboratory culture of the picocyanobacterium Prochlorococcus, whose dynamics over the diel cycle have been extensively studied [27]. We developed five models that differed in their parameterizations of changes in cell size. In addition to a size-dependent relationship for cell growth and time-dependence in cell division, we considered respiratory and exudative carbon loss in our models, which had previously been omitted in similar studies [1824]. We implemented our models within a Bayesian framework, which permitted us to incorporate prior information into the analysis to regularize parameter inference and avoid biologically implausible parameter values.

The selection of priors is a requirement for the Bayesian inference procedure. When scientific knowledge is available to determine plausible parameter values, this can be formally incorporated into the inference and resulting estimates; otherwise, uninformative priors [31] can be used, which include broad uniform distributions or the so-called Jeffreys prior [32]. However, constraining complex models with uninformative priors may lead to poor identifiability and numerical instability. In this case, it may be useful to conduct additional studies to learn about plausible parameter ranges so that information can be brought into model fitting or use an approach known as Empirical Bayes, which aims to construct a prior distribution that is consistent with the data before formally fitting the model [33].

Herein, we showed that size-structured MPMs can be used to estimate not only rates of cell division but also carbon fluxes, offering the potential to connect microbial growth processes to the carbon cycle. The addition of carbon loss, which allows cells to shrink in size through a process other than cell division, improved the accuracy of model estimates and the fit to the size distribution data, with mbmb successfully recovering the measured daily rates of cell division, carbon fixation, and carbon loss (Fig 4B–4D). More complex models, such as those with size-dependent carbon fixation and time-dependent cell division, provided better fits to the cell size distribution and photosynthetic parameter estimates but showed worse model performance in recovering the observed daily rate parameter values. This result indicates that model fit to the observed cell size distribution cannot be used alone as a proxy for overall model performance.

As expected from the lack of correlation between cell size and hourly division rate in the laboratory data (Fig 3), most of our models consistently predicted the peak of cell division about 4–8 hours earlier than observed in the data (Fig 5 and S1 Fig). This offset stemmed from the assumption that cell division (i.e. the separation of a single cell into two daughter cells) occurs instantaneously once the cells reach a certain size. While this assumption may be reasonable on daily time scales, it becomes problematic at hourly resolution; cell division is a complex process involving many components, each highly regulated to ensure that the separation of the cell into two daughter cells occurs only once DNA synthesis is completed, which takes between 4 and 6 hours depending on the strain and culture conditions [27, 29]. Here, the peak of DNA synthesis coinciding with the peak of cell size [27] suggests that cell size dictates the onset of DNA replication rather than the final separation of the cell into two daughter cells. Due to their greater flexibility, models with time-dependent division and size-dependent carbon fixation successfully captured the timing of cell division but failed to obtain accurate rate estimates. Interestingly, models with a free parameterization of size-dependent carbon fixation (mfmf and mftf) estimated less carbon fixation and more carbon loss in the large size classes which contains a large fraction of dividing cells (Fig 5E and 5F). This result suggests that dividing and non-dividing Prochlorococcus cells may have a different carbon metabolism, as observed in other organisms [34]. Ultimately, the choice of model will depend on the goal of the particular application. Our simpler models offered greater interpretability and accuracy of daily rate parameters, while more complex models were able to recover the timing of cell division at the cost of additional computation time and the requirement of stronger prior information.

Finally, we consider further potential future directions for this work. One of the interesting results in this study is the offset in the predicted and observed timing of division for the models with the most accurate daily division rate estimates. While the addition of time- and size-dependencies for cell division, carbon fixation, and loss allowed our more complex models to capture the timing of cell division, their estimates of the magnitude of division and other rate parameters suffered. As stated above, we hypothesize that carbon metabolism differs between dividing and non-dividing cells, yet our current modeling framework requires extension of the stage structure to encapsulate this information in order to test such a hypothesis. A hybrid age- and size-structured MPM may therefore be better suited to assess the importance of including cell division duration to more accurately simulate the timing of Prochlorococcus division, though this would expand the state-space of our models and require additional computation.

The flexibility of our modeling framework provides a valuable basis for examining and evaluating MPM results in the face of more complex datasets, which could further improve our understanding of the dynamics of marine microorganisms and their contributions to the carbon cycle. An exciting future extension of this work is application to in-situ Prochlorococcus and Synechococcus datasets obtained from shipboard flow cytometers [35]. Here, we tested our models on a highly synchronized population of Prochlorococcus grown under laboratory conditions, but we expect natural populations to be less synchronized and exhibit noisier dynamics over the diel cycle. Additional processes not accounted for in this study, such as grazing and viral lysis, which could potentially affect phytoplankton size distributions, will need to be tested. The application of our models to field data will be addressed in future work.

Materials and methods

Experimental data

A dataset of laboratory experiment time-series measurements of a high-light adapted strain of Prochlorococcus [27] collected during the exponential phase of batch growth over two simulated day-night cycles (Fig 2) was used to estimate biological parameters. We used changes in cell abundance over time to calculate division rates, since cell mortality is assumed to be negligible in exponentially growing cultures. A suite of measurements, which include cell size distributions and rates of carbon fixation, were collected at 2 hour intervals for a period of 50 hours to capture two complete diel cycles. Cell size distributions were inferred from flow-cytometry based forward-angle light scatter measurements (FALS). FALS signals normalized by calibration beads were converted to a proxy of mass using the relationships M = FALS1/1.74 [36] and then converted to carbon quotas, assuming an average carbon quota of 53 fg C cell−1 [27]. 14C-Photosynthetron experiments were conducted in duplicate at each time point to derive carbon fixation rates, maximum photosynthesis rates, and the photosynthetic saturation parameter. Short (1 hour) incubation times were used to approximate gross carbon fixation rates. Using the 2-hourly cell abundance measurements (at), average cell size measurements (st) and approximate carbon fixation rates (ft), we then estimated carbon loss rates (lt) every 2 hours, using
where dt is the two hour time step between measurements. Measurements of photosynthetically active radiation (PAR) were collected every 2 hours. Note that of these measurements, only the cell size distribution and PAR data were used in model fitting. Estimates of division, carbon fixation, carbon loss, and photosynthetic parameters were used only to provide ground truth values and are not used in the model fitting procedure.

Microbial matrix population models

The aim of the MPM applied to microbial populations is to mechanistically describe the evolution of the population over a day/night cycle. Typically, the target of inference is the daily division rate, which cannot be measured directly from changes in cell abundance measured in the field due to cell mortality caused by grazing and viral lysis as well as physical processes that can add or remove cells from the sampled population. Thus, in order to estimate this quantity, we infer it via observed changes in the relative abundance distribution over time. Past work has accomplished this by focusing on modeling two cellular processes: cell division and carbon fixation; in this work, we additionally consider carbon loss. We tested five MPMs involving these processes that varied in their complexity. All inference was carried out using the Bayesian modeling software Stan, see the Implementation section below.


Our models aim to quantify the rates of three key biological processes: cell division, carbon fixation, and carbon loss. These rates are deterministic functions of the parameter vector θ, which describes the dynamics of these processes, while the concentration parameter σ allows for overdispersion in the data. We can divide the parameter vector θ into four components θ = (θδ, θγ, θρ, ω0). The first three components correspond to each of the three cellular process we aim to model: cell division, carbon fixation, and carbon loss, respectively. The fourth defines the statistical mean of the cell size distribution at t = 0. The mean of the cell size distribution at each time point is a deterministic function of the model parameters θ; we consider the data to be stochastically distributed around this mean; see Observation model for details. We use Stan’s default prior for the initial condition simplex ω0 ∈ Δm−1. We describe the parameterizations of the remaining three components in the following; see also Fig 7.

Fig 7. Functional forms of model rate parameter dependencies on size, light, and time (compare Table 1).

(A) Examples of a monotonic relationship between size and division rate, and (B) a time-dependent relationship between the time of day and division rate. (C) The shape the light-dependent carbon fixation rate and examples of (D) power-law size-dependence of carbon fixations for 5 values of the parameter βγ; the “basic” parameterization is identical to βγ = 0. (E) The “basic” size-dependence of carbon loss. Vertical lines in panels with size-dependence denote the center of each size class. Examples of “free” carbon fixation or carbon loss parameterizations are not shown.

Cell division.

The cell division proportions δi(t) are parameterized as
where δmax ∈ [0, 24dt−1] is the maximum division quotient, q(t) is a function that induces time-dependence in division, and δincr ∈ Δmj is a simplex that defines the relative increase in the division quotient for each size class. Recall that the division parameterization of each model is indicated by the second letter in its subscript (
Table 1). For models with time-invariant division (mm), q(t) = 1. The parameter δmax is normalized by dt in units of days to better facilitate comparisons among models that vary in their values of dt; hence, (dt/24)δmax ∈ [0, 1]. The parameter δincr allows us to constrain cell division to be monotone without imposing a specific functional form of the relationship between cell size and cell division. For models with time-dependent division (mt), q(t) is estimated using a periodic cubic spline with 6 knots and associated control points . Thus, we have

Projection matrix.

The projection matrices are the core of a MPM: they project the population state forward in time. The population state at a given time point can be thought of as the statistical mean estimate of the Prochlorococcus cell size distribution. In this application, the projection matrices are deterministic functions of the parameters θ; the dependence on light E and the time-dependent division parameterization (for mftf) add time-dependence. The projection matrices define the dynamics of the microbial population through the three cellular processes described above: cell division, carbon fixation, and carbon loss. We assume that any individual cell can only undergo one of these three processes in each dt time step (it may also remain in the same size class). Thus, for each k, we first construct a set of matrices , where rk ≔ (tk+1tk)/dt is the number of dt time steps between time tk and time tk+1. Once these matrices are defined, we have for each k:
Each matrix projects the population from time to time .

Let δi(t) ∈ [0, 1] denote the proportion of cells in size class i that divide in one dt time step at time t, ρi ∈ [0, 1] the proportion of cells in size class i that shrink one size class in one dt time step given that they do not divide, and γi(t) ∈ [0, 1] the proportion of cells in size class i that grow one size class in one dt time step at time t given that they neither divide nor shrink. Then, recalling that j denotes the index of the smallest size class which can undergo division, the entries of each matrix are defined as follows:
where again . Here, only cell growth and stasis involve the PAR measurements E. The coefficient 2 in
Eq (14) reflects the fact that when a cell divides, it creates two daughter cells. This is the reason the normalization step (18) is needed to maintain the sum-to-one constraint and also the reason (26), which omits the normalization, is able to estimate the rate of cell division.

MPMs for microbial populations make projections differently from the formulation in (1). The counts are normalized at each time step so that we model the mean relative abundance:
Note that ωk(θ, E) is a deterministic function of the model parameters θ and the interpolated PAR E. Thus, we do not use the counts to estimate division rate directly, allowing for valid estimates even when mortality and physical movement of cells occur, so long as these processes do not affect the relative size distribution. We estimate the posterior distributions of the model parameters from their prior distributions and the likelihood of the data given the parameters (see Observation model section).

Model output.

The primary goal of inference is the daily division rate μ, defined as the exponential growth constant:
Recall that T = tK−1 is the time of the last observation in hours; thus, T/24 is the length of the time series in days. Because populations in their natural environments undergo cell loss due to cell mortality (due to grazing and viral lysis) and physical processes that can add or remove cells, a normalization step
(18) was applied to estimate division rate based on relative cell abundance, as in past applications [18, 19, 21]. By removing the normalization step, we estimate the relative increase in cell number caused by cell division. Given parameter estimates , projection matrix estimates , and initial state estimate , we obtain the following estimator of the daily division rate:


Parameter inference was carried out in the software package Stan [25]. This software performs Bayesian inference, where the target is the posterior distribution of the parameters, which reflects the probable values of these parameters given the model, our prior beliefs, and the data [38]. In order to generate samples from the posterior distribution, Stan implements a variant of the Hamiltonian Monte Carlo (HMC) algorithm [39, 40] which has been shown to have superior speed and performance for fitting complex, high-dimensional population dynamics models relative to other Markov Chain Monte Carlo (MCMC) methods for sampling from the posterior [41]. In particular, we use Stan’s implementation of the No-U-Turn Sampler (NUTS) [42] to avoid manual selection of application-specific tuning parameters. We initially tried using Stan’s implementation of variational inference, which, while faster, creates approximate results and provided less stable estimates than HMC using NUTS in our experiments. Thus, in all of the experiments presented here, we used HMC, which generated stable results in this study and generally provides asymptotic consistency [40]. The implementation of HMC in Stan uses automatic differentiation to provide the gradients needed to integrate Hamiltonian dynamics. The reader is directed to [43] for additional details on HMC in Stan.

Six HMC chains were run for 2000 MCMC iterations for each model. In accordance with the Stan default settings, the first 1000 samples of each of these chains were discarded as a warm-up period for the sampler to reach its stationary distribution. The convergence diagnostic [44] was monitored for all model fits to ensure ; otherwise, the sampling procedure was considered divergent. A comparison of the prior distributions of mbmb and mftf with their corresponding posteriors can be found in the Supporting Information (S3 Fig).

In order to benchmark our results, we used Stan’s optimization to compute the maximum likelihood estimator (MLE) for mbmb and mftf. However, the results were unstable and sensitive to initialization. To investigate the sensitivity of our inference to the prior distributions, we implemented mbmb and mftf with flat priors, so that the posterior distribution is proportional to the likelihood. For mbmb, this gave virtually identical results. For mftf, the model failed to converge, indicating that stronger prior information is necessary to remove potential identifiability issues introduced by the additional parameters for size- and time-dependent processes.

All code used to process the data, fit models, and produce visualizations is publicly available on GitHub [45].

Prior distributions

The prior distributions are shown in Table 2. Priors were defined for the model parameters θ and σ, and not on rates of division, carbon fixation, and loss directly. Maximum cell division, carbon fixation and loss along with photosynthetic parameter values were chosen within biologically feasible ranges using information derived from literature [27, 46], otherwise the Stan default priors were used, corresponding to uniform priors [25].

Supporting information


  1. 1.

    Johnson ZI, Zinser ER, Coe A, McNulty NP, Woodward EM, Chisholm SW. Niche Partitioning Among Prochlorococcus Ecotypes Along Ocean-Scale Environmental Gradients. Science. 2006;311(5768):1737–40. pmid:16556835
  2. 2.

    Longhurst A, Sathyendranat S, Platt T, Caverhill C. An estimate of global primary production in the ocean from satellite radiometer data. Journal of Plankton Research. 1995;17:1245–1271.
  3. 3.

    Harding L. Long-term trends in the distribution of phytoplankton in Chesapeake Bay: roles of light, nutrients and streaniflow. Marine Ecology Progress Series. 1994;104:267–291.
  4. 4.

    Laws EA. Evaluation of in-situ phytoplankton growth rates: A synthesis of data from varied approaches. Annual Review of Marine Science. 2013;5:247–268. pmid:22809184
  5. 5.

    Goericke R, Welschmeyer NA. The chlorophyll-labeling method: Measuring specific rates of chlorophyll a synthesis in cultures and in the open ocean. Limnology and Oceanography. 1993;38(1):80–95.
  6. 6.

    Bossard P, Karl DM. The direct measurement of ATP and adenine nucleotide pool turnover in microorganisms: A new method for environmental assessment of metabolism, energy flux and phosphorus dynamics. Journal of Plankton Research. 1986;8(1):1–13.
  7. 7.

    Steward GF, Azam F. Bromodeoxyuridine as an alternative to 3H-thymidine for measuring bacterial productivity in aquatic samples. Aquatic Microbial Ecology. 1999;19(1):57–66.
  8. 8.

    Liu HB, Campbell L, Landry MR. Growth and mortality rates of Prochlorococcus and Synechococcus measured with a selective inhibitor technique. Marine Ecology Progress Series. 1995;116(1-3):277–288.
  9. 9.

    Ross ON, Geider RJ, Berdalet E, Artigas ML, Piera J. Modelling the effect of vertical mixing on bottle incubations for determining in situ phytoplankton dynamics. I. Growth rates. Marine Ecology Progress Series. 2011;435:13–31.
  10. 10.

    Dubelaar GBJ, Gerritzen PL, Beeker AER, Jonker RR, Tangen K. Design and first results of CytoBuoy: A wireless flow cytometer for in situ analysis of marine and fresh waters. Cytometry. 1999;37(4):247–254. pmid:10547609
  11. 11.

    Olson RJ, Shalapyonok A, Sosik HM. An automated submersible flow cytometer for analyzing pico- and nanophytoplankton: FlowCytobot. Deep Sea Research Part I: Oceanographic Research Papers. 2003;50(2):301–315.
  12. 12.

    Swalwell JE, Ribalet F, Armbrust EV. SeaFlow: A novel underway flow-cytometer for continuous observations of phytoplankton in the ocean. Limnology and Oceanography: Methods. 2011;9(10):466–477.
  13. 13.

    Caswell H. Matrix Population Models: Construction, Analysis, and Interpretation. Oxford, United Kingdom: Oxford University Press; 2006.
  14. 14.

    Lytle D, Merritt D. Hydrologic regimes and riparian forests: A structured population model for cottonwood. Ecology. 2004;85:2493–2503.
  15. 15.

    Forbes LS, Peterman RM. Simple size structured model of recruitment and harvest in Pacific Salmon (Oncorhynchus spp.). Canadian Journal of Fisheries and Aquatic Sciences. 1994;51:603–616.
  16. 16.

    Keyfitz N, Caswell H. Applied Mathematical Demography. 233 Spring Street, New York, NY 10013, USA: Springer Science+Business Media, Inc.; 2005.
  17. 17.

    Mcarthur L, Boland J, Tiver F. Parameter estimation for stage-structured projection models using real data. Modelling and Simulation Society of Australia and New Zealand; 2003.
  18. 18.

    Sosik HM, Olson RJ, Neubert MG, Shalapyonok A, Solow AR. Growth rates of coastal phytoplankton from time-series measurements with a submersible flow cytometer. Limnology and Oceanography. 2003;48(5):1756–1765.
  19. 19.

    Hunter-Cevera KR, Neubert MG, Solow AR, Olson RJ, Shalapyonok A, Sosik HM. Diel size distributions reveal seasonal growth dynamics of a coastal phytoplankter. PNAS. 2014;111(27):9852–9857. pmid:24958866
  20. 20.

    Dugenne M, Thyssen M, Nerini D, Mante C, Poggiale JC, Garcia N, et al. Consequence of a sudden wind event on the dynamics of a coastal phytoplankton community: an insight into specific population growth rates using a single cell high frequency approach. Frontiers in Microbiology. 2014;5:485. pmid:25309523
  21. 21.

    Ribalet F, Swalwell J, Clayton S, Jiménez V, Sudek S, Lin Y, et al. Light-driven synchrony of Prochlorococcus growth and mortality in the subtropical Pacific gyre. PNAS. 2015;112(26):8008–8012. pmid:26080407
  22. 22.

    Hynes AM, Blythe BJ, Binder BJ. An individual-based model for the analysis of Prochlorococcus diel cycle behavior. Ecological Modelling. 2015;301:1–15.
  23. 23.

    Hunter-Cevera KR, Neubert MG, Olson RJ, Solow AR, Shalapyonok A, Sosik HM. Physiological and ecological drivers of early spring blooms of a coastal phytoplankter. Science. 2016;354(6310):326–329. pmid:27846565
  24. 24.

    Fowler BL, Neubert MG, Hunter-Cevera KR, Olson RJ, Shalapyonok A, Solow AR, et al. Dynamics and functional diversity of the smallest phytoplankton on the Northeast US Shelf. Proceedings of the National Academy of Sciences. 2020;117(22):12215–12221. pmid:32414929
  25. 25.
    Stan Development Team. Stan Modeling Language Users Guide and Reference Manual, 2.19.1; 2020.
  26. 26.

    Partensky F, Garczareck L. Prochlorococcus: Advantages and Limits of Minimalism. Annual Review of Marine Science. 2010;2:305–331. pmid:21141667
  27. 27.

    Zinser ER, Lindell D, Johnson ZI, Futschik ME, Steglich C, Coleman ML, et al. Choreography of the transcriptome, photophysiology, and cell cycle of a minimal photoautotroph, Prochlorococcus. PLOS ONE. 2009;4(4):e5135. pmid:19352512
  28. 28.

    Binder BJ, DuRand MD. Diel cycles in surface waters of the equatorial Pacific. Deep Sea Research Part II: Topical Studies in Oceanography. 2002;49(13-14):2601–2617.
  29. 29.

    Jacquet S, Partensky F, Marie D, Casotti R, Vaulot D. Cell cycle regulation by light in prochlorococcus strains. Applied and Environmental Microbiology. 2001;67(2):782–790. pmid:11157244
  30. 30.

    Casey JR, Bjorkman KM, Ferron S, Karl DM. Size-dependence of metabolism within marine picoplankton populations. Limnology and Oceanography. 2019;64:1819–1827.
  31. 31.

    Kass RE, Wasserman L. The selection of prior distributions by formal rules. Journal of the American statistical Association. 1996;91(435):1343–1370.
  32. 32.

    Jeffreys H. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London Series A Mathematical and Physical Sciences. 1946;186(1007):453–461. pmid:20998741
  33. 33.

    Casella G. An Introduction to Empirical Bayes Data Analysis. The American Statistician. 1985;39(2):83–87.
  34. 34.

    Björklund M. Cell size homeostasis: Metabolic control of growth and cell division. Biochimica et Biophysica Acta—Molecular Cell Research. 2019;1866(3):409–417. pmid:30315834
  35. 35.

    Ribalet F, Berthiaume C, Hynes A, Swalwell J, Carlson M, Clayton S, et al. SeaFlow data v1, high-resolution abundance, size and biomass of small phytoplankton in the North Pacific. Scientific Data. 2019;6(1):277. pmid:31757971
  36. 36.

    Burbage CD, Binder BJ. Relationship between cell cycle and light-limited growth rate in oceanic Prochlorococcus (MIT9312) and Synechococcus (WH8103) (cyanobacteria). Journal of Phycology. 2007;43(2):266–274.
  37. 37.

    Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. Chapman and Hall/CRC; 1995.
  38. 38.

    van de Schoot R, Depaoli S, King R, Kramer B, Märtens K, Tadesse MG, et al. Bayesian statistics and modelling. Nature Reviews Methods Primers. 2021;1(1).
  39. 39.

    Neal RM. MCMC using Hamiltonian dynamics. In: Handbook of Markov Chain Monte Carlo. vol. 2. Chapman & Hall/CRC Press; 2011. p. 113–160.
  40. 40.
    Betancourt M. A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:170102434. 2017;.
  41. 41.

    Monnahan CC, Thorson JT, Branch TA. Faster estimation of Bayesian models in ecology using Hamiltonian Monte Carlo. Methods in Ecology and Evolution. 2017;8(3):339–348.
  42. 42.

    Hoffman MD, Gelman A. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. The Journal of Machine Learning Research. 2014;15(1):1593–1623.
  43. 43.

    Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, et al. Stan: A probabilistic programming language. Journal of Statistical Software. 2017;76(1).
  44. 44.

    Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science. 1992;7(4):457–472.
  45. 45.

    Mattern JP, Glauninger K, Britten GL, Casey JR, Hyun S, Wu Z, et al. Bayesian matrix population model; 2021. GitHub repository with data, material and results for “A Bayesian approach to modeling phytoplankton population dynamics from size distribution time series”.
  46. 46.

    Casey JR, Mardinoglu A, Nielsen J, Karl DM. Adaptive Evolution of Phosphorus Metabolism in Prochlorococcus. mSystems. 2016;1(6):1–15. pmid:27868089

Source link

Please disable your adblocker or whitelist this site!