9  Sprout data modeling

First, we’ll load some libraries and our data.

Code
library(lme4)
library(modelsummary)
library(marginaleffects)
library(ggdist)
library(ggridges)
library(emmeans)
library(multcomp)
library(multcompView)
library(dplyr)
library(ggplot2)
library(tidyr)
library(purrr)
library(patchwork)
library(DHARMa)
library(glmmTMB)
library(ggeffects)
library(DHARMa)
library(stringr)

load("wrangled_sprouts.rda")
source("./scripts/rnd_color_brewer.r")

9.1 Objectives

  1. Summarize differences in species response
  2. For each species, is there a difference in height increment between treatments?
  3. For each species/treatment combination, is there a difference in height increment between the first and second periods?
  4. Is there a treatment difference in redwood or tanoak sprout heights at year 10, if so, what is the magnitude of the difference?

9.2 Modeling

We have nested data where we have multiple observations within trees (over time), trees within plots and plots within sites. Any of these levels of nesting might result in correlations between observations, which violates the assumption of independence for OLS models, but can be accounted for using multi-level models.

I may want to include a categorical variable (spp) as a random slope (on the LHS of the random parts) to allow random effects to be estimated for each species separately. Michael Clark explores this scenario, with some simpler data.

My approach will be to start with a simple model and build towards more complex models.

9.2.1 Height increment

By starting with height increment modeling, I will be able to answer question 1 in the objectives. This conclusion may be helpful for informing total height modeling. I am curious whether height in the first 10 years increases linearly with age. Is there a growth slow down in year 5-10 compared to years 1-5?

We saw in Figure 8.6 that increment does slow down somewhat in the second measurement period (year 5-10), at least for redwoods, but we may be justified in overlooking this. Lets look at individual tree’s height trajectories.

Code
dht <- lengthen_data(sprouts, "ht")

dht |>
  ggplot(aes(year, ht, color = spp, group = tree)) +
  geom_line(alpha = 0.6) +
  facet_wrap(~ treat * site) +
  scale_color_brewer(palette = "Set2") +
  theme(
    legend.position = "bottom"
  )
Figure 9.1: Data plot showing trajectory of individual tree heights across the three measurements. Height increase appears fairly linear with year.

For the most part, it looks like height increases linearly with year. Lets test this with a model.

  1. ht_inc ~ year

    Ignoring everything else, there is a 0.08 m / yr slow down in the second period.

  2. ht_inc ~ year + spp + year:spp

    That slow down has more to do with redwood than tanoak.

  3. ht_inc ~ year + spp + treat + year:spp + year:treat + spp:treat + year:spp:treat

    There are overall treatment effect on growth which varies by species (redwood is more affected) but there are no more significant interactions with year. Our data does not detect significant differences in slow down by treatment or species. j

  4. ht_inc ~ year + spp + treat

    This includes only additive effects and the effect of year is the same as model 1.

  5. ht_inc ~ year + spp + treat + spp:treat

    Again, the species by treatment interactions seem to be significant.

  6. ht_inc ~ year + spp + treat + spp:treat + year:spp

    The species year interaction remains important.

  7. ht_inc ~ year + spp + treat + spp:treat + year:spp + year:treat

    Again, the treatment by year interactions do not seem important/estimable.

  8. ht_inc ~ treat + year + spp + treat:spp + spp:year + (1 | plot) + (1 | tree)

    Now I introduce random intercepts for plot and tree. These lead to much smaller AIC’s.

  9. ht_inc ~ treat + year + spp + (1 | plot) + (1 | tree) + treat:spp + year:spp + treat:year

    But treatment by year is still (mostly) not significant.

  10. ht_inc ~ treat + year + spp + (1 | tree) + (0 + spp | plot) + treat:spp + year:spp

    Adding a random slope for species by plot results in another big jump in AIC, but also reduces confidence in the treatment*species interaction. It would seem that the potential differences in species by treatment growth rate are obscured by redwoods greater variability across plots.

  11. ht_inc ~ treat + year + spp + (1 | tree) + (0 + spp | plot) + year:spp

    Removing the treat * species interaction results in a solid AIC boost.

  12. ht_inc ~ treat + year * spp + (1 | tree) + (0 + spp | plot)

    Switching to glmmTMB and adding a dispersion model that includes a three-way interaction reduces AIC, and results in more well-behaved residuals.

The coefficients are different for the glmmTMB model because it includes a model for dispersion. Here the dispersion model estimates are ommitted.

Model 12 has the lowest AIC and the residuals are relatively well behaved.

(1) (2)
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept) 0.505*** 0.505***
(0.034) (0.034)
treatLD -0.101* -0.101*
(0.046) (0.046)
treatHA -0.183*** -0.183***
(0.048) (0.048)
treatHD -0.196*** -0.196***
(0.051) (0.051)
year10 -0.041*** -0.041***
(0.007) (0.007)
sppSESE 0.418*** 0.418***
(0.044) (0.044)
year10 x sppSESE -0.091*** -0.091***
(0.018) (0.018)
SD (Intercept tree) 0.115 0.115
SD (sppLIDE plot) 0.059 0.059
SD (sppSESE plot) 0.162 0.162
Cor (sppLIDE~sppSESE plot) 0.152 0.152
SD (Intercept site) 0.000
Num.Obs. 1384 1384
AIC -656.5 -654.5
BIC -515.3 -508.0
RMSE 0.15 0.15
minc1 minc2 minc3 minc4 minc5 minc6 minc7 minc8 minc9 minc10 minc11 minc12
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept) 0.585*** 0.387*** 0.499*** 0.577*** 0.516*** 0.496*** 0.506*** 0.496*** 0.506*** 0.494*** 0.504*** 0.505***
(0.012) (0.014) (0.025) (0.016) (0.019) (0.020) (0.022) (0.049) (0.049) (0.038) (0.037) (0.034)
year10 -0.082*** -0.042* -0.048 -0.082*** -0.082*** -0.042* -0.061* -0.042*** -0.061*** -0.042*** -0.042*** -0.041***
(0.018) (0.020) (0.035) (0.013) (0.013) (0.018) (0.028) (0.011) (0.017) (0.011) (0.011) (0.007)
sppSESE 0.412*** 0.544*** 0.366*** 0.489*** 0.530*** 0.530*** 0.526*** 0.525*** 0.528*** 0.414*** 0.418***
(0.020) (0.035) (0.013) (0.025) (0.028) (0.028) (0.031) (0.031) (0.086) (0.045) (0.044)
year10 x sppSESE -0.083** -0.109* -0.083** -0.083** -0.083*** -0.083*** -0.083*** -0.083*** -0.091***
(0.029) (0.050) (0.026) (0.026) (0.015) (0.015) (0.015) (0.015) (0.018)
treatLD -0.091* -0.132*** -0.086*** -0.086*** -0.086** -0.090 -0.091 -0.092+ -0.098+ -0.101*
(0.037) (0.019) (0.026) (0.026) (0.032) (0.070) (0.071) (0.054) (0.052) (0.046)
treatHA -0.171*** -0.248*** -0.169*** -0.169*** -0.193*** -0.172* -0.195** -0.167** -0.181*** -0.183***
(0.035) (0.018) (0.025) (0.025) (0.030) (0.069) (0.070) (0.053) (0.052) (0.048)
treatHD -0.182*** -0.290*** -0.176*** -0.176*** -0.189*** -0.175* -0.189** -0.174** -0.194*** -0.196***
(0.035) (0.018) (0.025) (0.025) (0.030) (0.069) (0.070) (0.053) (0.052) (0.051)
year10 x treatLD 0.010 0.001
(0.052) (0.037)
year10 x treatHA 0.004 0.048
(0.049) (0.035)
year10 x treatHD 0.012 0.026
(0.049) (0.035)
sppSESE x treatLD -0.082 -0.092* -0.092* -0.092*
(0.052) (0.037) (0.037) (0.037)
sppSESE x treatHA -0.206*** -0.161*** -0.161*** -0.161***
(0.050) (0.035) (0.035) (0.035)
sppSESE x treatHD -0.250*** -0.236*** -0.236*** -0.236***
(0.050) (0.036) (0.035) (0.035)
year10 x sppSESE x treatLD -0.019
(0.074)
year10 x sppSESE x treatHA 0.090
(0.070)
year10 x sppSESE x treatHD 0.029
(0.071)
treatLD x sppSESE -0.080+ -0.080+ -0.070
(0.045) (0.045) (0.122)
treatHA x sppSESE -0.154*** -0.154*** -0.158
(0.043) (0.043) (0.121)
treatHD x sppSESE -0.232*** -0.232*** -0.230+
(0.043) (0.043) (0.121)
treatLD x year10 0.001
(0.022)
treatHA x year10 0.048*
(0.021)
treatHD x year10 0.026
(0.021)
SD (Intercept tree) 0.176 0.176 0.162 0.162 0.115
SD (Intercept plot) 0.088 0.088
SD (sppLIDE plot) 0.064 0.063 0.059
SD (sppSESE plot) 0.167 0.173 0.162
Cor (sppLIDE~sppSESE plot) 0.283 0.259 0.152
SD (Observations) 0.143 0.143 0.143 0.143
Num.Obs. 1384 1384 1384 1384 1384 1384 1384 1384 1384 1384 1384 1384
AIC 827.1 283.0 -30.5 12.4 -28.9 -37.5 -33.9 -438.9 -439.6 -488.6 -490.4 -656.5
BIC 842.8 309.2 58.5 49.0 23.5 20.1 39.4 -370.9 -355.9 -410.1 -427.6 -515.3
RMSE 0.33 0.27 0.24 0.24 0.24 0.24 0.24 0.11 0.11 0.11 0.11 0.15
Code
simulateResiduals(minc[[12]]) |> plot()

9.2.2 Height increment results

The final selected model, based on AIC and examination of residuals had the form:

ht_inc ~ treat + year * spp + (1 | tree) + (0 + spp | plot)

The model selected based on AIC lacks a treatment x species interaction, suggesting that there is not evidence that treatments affected species differentially. It also lacks a treatment x year interaction. This means that there was not enough evidence to support that treatment was related to changes in growth rate.

The presence of treatment in the model (0.001 ≤ p < 0.03) suggests that the levels of treatment were associated with different growth rates across species and years. And the species x year interaction (p < 0.001) suggests changes in growth rates are different for redwood and tanoak.

Averaging over growth periods, treatment specific height increments for redwood ranged from 0.66 to 0.86 m yr-1, and for tanoak, from 0.29 to 0.49, with the slowest growth in the HD treatment and the fastest in the GS treatment. height increment was greater in the GS treatment than the HA and HD treatments by about 0.19 m yr-1 (p < 0.001). The LD treatment was intermediate, but not statistically distinguishable from the other treatments (0.13 < p < 0.28).

Code
emm_treat <- emmeans(minc_sel, "treat", by = "spp")
plot(emm_treat, comparisons = TRUE)
Figure 9.2: Estimated marginal means for the effect of harvest treatment on redwood and tanoak sprout height increment, averaged over two growth periods, ten years after harvest. Purple bars represent confidence intervals and statistical significance (α = 0.05) is indicated by non-overlaping red arrows.

Redwood growth slowed from 0.80 to 0.67 m yr-1 in the second period and tanoak slowed from 0.39 to 0.34 m yr-1.

Redwood grew faster than tanoak, but slowed down more relative to it in the second period. Height increment for redwood was 0.42 m yr-1 greater in the first period and 0.33 m yr-1 greater in the second period than tanoak height increment.

Code
emmeans(minc_sel, c("year"), by = "spp") |> plot(comparisons = TRUE)
Figure 9.3: Estimated marginal means for the effect of growth period on redwood and tanoak sprout height increment, averaged over four harvest treatments, ten years after harvest. Purple bars represent confidence intervals and statistical significance (α = 0.05) is indicated by non-overlaping red arrows.
Code
#|   Growth period 1 is year 1-5, growth period 2 is year 5-10. Growth
#|   significantly decreased in the second period for both species.

minc11con |> knitr::kable(digits = 3)
contrast spp estimate SE df lower.CL upper.CL t.ratio p.value
year10 - year5 LIDE -0.041 0.007 1357 -0.056 -0.026 -5.476 0
year10 - year5 SESE -0.132 0.017 1357 -0.165 -0.099 -7.848 0
Code
minc11cld |>
  ggplot(aes(year, emmean, fill = spp, color = spp)) +
  geom_pointrange(
    aes(ymin = lower.CL, ymax = upper.CL),
    position = position_nudge(x = -0.15),
    size = 0.9,
    linewidth = 1
  ) +
  geom_dots(data = dinc, aes(year, ht_inc), alpha = 0.4) +
  facet_grid(~spp, switch = "x") +
  theme(
    panel.spacing = unit(0, "lines"),
    strip.background = element_blank(),
    strip.text = element_blank(),
  ) +
  geom_text(
    aes(y = upper.CL, label = .group),
    position = position_nudge(x = -0.15, y = 0.1)
  ) +
  scale_color_brewer(palette = "Set2", aesthetics = c("color", "fill")) +
  labs(
    x = "Growth period",
    y = expression(Height ~ increment ~ (m %.% year^-1)),
    color = "Species",
    fill = "Species"
  ) +
  scale_x_discrete(labels = c("1", "2"))
Figure 9.4: There was a significant difference in height increment between period 1 and 2 for both species.

9.2.3 Heights - year 10 only

The simplest model that I’m willing to look at includes a treatment/species interaction, and no random effects. It will serve as a baseline for comparison.

# I'm calling the height data "d"
dht10 <- sprouts |> lengthen_data("ht") |> filter(year == 10)

mht101 <- lm(ht ~ treat * spp, data = dht10)
mht102 <- lmer(ht ~ treat * spp + (1 | plot) + (1 | site), data = dht10)
boundary (singular) fit: see help('isSingular')
mht103 <- lmer(ht ~ treat * spp + (1 | plot), data = dht10)
mht104 <- lmer(ht ~ treat * spp + (0 + spp | plot), data = dht10)
mht105 <- glmmTMB(
  ht ~ treat * spp + (0 + spp | plot),
  family = Gamma(link = "log"),
  data = dht10
)
mht106 <- glmmTMB(
  ht ~ treat * spp + (0 + spp | plot),
  family = Gamma(link = "log"),
  data = dht10,
  dispformula = ~spp
)
mht107 <- glmmTMB(
  ht ~ treat + spp + (0 + spp | plot),
  family = Gamma(link = "log"),
  data = dht10,
  dispformula = ~spp
)
mht108 <- glmmTMB(
  ht ~ treat + spp + (0 + spp | plot),
  family = Gamma(link = "log"),
  data = dht10,
  dispformula = ~ spp + treat
)
mht109 <- glmmTMB(
  ht ~ treat + spp + (0 + spp | plot),
  family = gaussian(link = "log"),
  data = dht10,
  dispformula = ~ spp * treat
)

mht1010 <- glmmTMB(
  ht ~ treat + spp + (0 + spp | plot) + (1 | site),
  family = gaussian(link = "log"),
  data = dht10,
  dispformula = ~ spp * treat
)

mht10_list <- lst(
  mht101,
  mht102,
  mht103,
  mht104,
  mht105,
  mht106,
  mht107,
  mht108,
  mht109
)

# nolint start
# # Here I'm testing if adding a random effect for site improves the model. It
# # doesn't
# modelsummary(
#   list(mht109, mht1010),
#   stars = TRUE,
#   output = "markdown",
#   shape = term:component + statistic ~ model,
#   coef_omit = ".*dispersion$",
#   coef_rename = rearrange_coef_names,
#   metrics = c("AIC", "BIC", "RMSE"),
#   estimator = "ML",
#   gof_omit = "F|Log.Lik"
# )
# nolint end

modelsummary(
  mht10_list,
  stars = TRUE,
  output = "markdown",
  shape = term:component + statistic ~ model,
  coef_omit = ".*dispersion$",
  coef_rename = rearrange_coef_names,
  metrics = c("AIC", "BIC", "RMSE"),
  estimator = "ML",
  gof_omit = "F|Log.Lik"
)
mht101 mht102 mht103 mht104 mht105 mht106 mht107 mht108 mht109
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept) 5.235*** 5.246*** 5.246*** 5.221*** 1.634*** 1.633*** 1.636*** 1.634*** 1.634***
(0.228) (0.576) (0.576) (0.424) (0.085) (0.086) (0.080) (0.081) (0.079)
treatLD -1.214*** -1.290 -1.290 -1.306* -0.299* -0.302* -0.297** -0.298** -0.287**
(0.337) (0.820) (0.820) (0.607) (0.121) (0.122) (0.111) (0.112) (0.110)
treatHA -2.055*** -2.086* -2.086* -2.044*** -0.476*** -0.475*** -0.457*** -0.453*** -0.463***
(0.320) (0.813) (0.813) (0.599) (0.120) (0.121) (0.110) (0.114) (0.111)
treatHD -2.087*** -2.088* -2.088* -2.076*** -0.492*** -0.492*** -0.525*** -0.526*** -0.523***
(0.322) (0.814) (0.814) (0.600) (0.120) (0.121) (0.111) (0.115) (0.111)
sppSESE 5.308*** 5.239*** 5.239*** 5.259*** 0.710*** 0.711*** 0.705*** 0.706*** 0.707***
(0.324) (0.296) (0.296) (0.867) (0.103) (0.103) (0.054) (0.055) (0.054)
treatLD x sppSESE -1.253** -1.102* -1.102* -1.016 0.012 0.016
(0.483) (0.443) (0.443) (1.234) (0.148) (0.148)
treatHA x sppSESE -1.589*** -1.494*** -1.494*** -1.527 0.054 0.053
(0.459) (0.419) (0.419) (1.226) (0.146) (0.146)
treatHD x sppSESE -2.564*** -2.508*** -2.508*** -2.479* -0.096 -0.096
(0.462) (0.422) (0.422) (1.227) (0.146) (0.146)
SD (Intercept plot) 1.073 1.073
SD (sppLIDE plot) 0.753 0.153 0.159 0.160 0.168 0.160
SD (sppSESE plot) 1.870 0.193 0.188 0.192 0.195 0.198
Cor (sppLIDE~sppSESE plot) 0.486 0.490 0.484 0.448 0.444 0.450
SD (Intercept site) 0.000
SD (Observations) 2.000 2.000 1.879
Num.Obs. 692 692 692 692 692 692 692 692 692
AIC 3059.2 2978.5 2976.5 2923.0 2825.0 2809.6 2804.7 2753.6 2711.0
BIC 3100.1 3028.4 3021.9 2977.4 2879.5 2868.6 2850.1 2812.6 2783.6
RMSE 2.18 1.97 1.97 1.84 1.85 1.85 1.85 1.85 1.84

We can see the residuals are not perfect, but nor are they very offensive.

Code
plot(simulateResiduals(mht109))

9.2.4 Here I make my selection

Code
ht10_sel <- mht109

9.2.5 Year 10 only Results

9.2.5.1 Random effects

The best model according to AIC includes a random intercept for plot and a random slope for spp. Because spp is a factor, this effectively estimates a species specific intercept, but also takes into account the covariance of the species plot effect.

9.2.5.2 Fixed effects

The best model included speceis and treatment, but no interactions. This suggests that treatments do not affect species differentially.

Because the best model did not contain a species x treatment interaction, comparisons between treatments is the same for both species. The GS treatment resulted in greater heights in year 10 than the other treatments (0.001 < p < 0.04). Predicted mean height for redwood ranged from 10.29 m in the GS treatment to 6.16 m in the HD treatment. For tanoak, predicted mean height ranged from 5.12 in the GS treatment to 3.04 in the HD treatment. Predicted mean heights followed the pattern GS > LD > HA > HD.

Code
dht10 |>
  ggplot(aes(treat, ht, color = spp, fill = spp)) +
  facet_grid(~spp, switch = "x") +
  theme(
    panel.spacing = unit(0, "lines"),
    strip.background = element_blank(),
    strip.text = element_blank(),
  ) +
  geom_dots() +
  scale_color_brewer(palette = "Set2", aesthetics = c("color", "fill")) +
  geom_pointrange(
    data = emmcomp,
    aes(y = the.emmean, ymin = lower.CL, ymax = upper.CL),
    color = "gray60",
    position = position_nudge(x = -0.07),
    size = 0.7,
    linewidth = 2.3,
    show.legend = FALSE
  ) +
  geom_segment(
    data = emmcomp,
    aes(y = the.emmean, xend = treat, yend = rcmpl),
    color = "blue4",
    lineend = "butt",
    linejoin = "bevel",
    arrow = arrow(length = grid::unit(2.5, "mm")),
    position = position_nudge(x = -0.07),
    linewidth = 1
  ) +
  geom_segment(
    data = emmcomp,
    aes(y = the.emmean, xend = treat, yend = lcmpl),
    color = "blue4",
    lineend = "butt",
    linejoin = "bevel",
    arrow = arrow(length = grid::unit(2.5, "mm")),
    position = position_nudge(x = -0.07),
    linewidth = 1
  )
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_segment()`).
Removed 1 row containing missing values or values outside the scale range
(`geom_segment()`).
Removed 1 row containing missing values or values outside the scale range
(`geom_segment()`).
Removed 1 row containing missing values or values outside the scale range
(`geom_segment()`).
Figure 9.5: Predicted mean height and 95% confidence intervals (gray bars) for redwood and tanoak stump sprouts 10 years after harvest using four different harvest treatments. Non-overlaping blue arrows indicate statistically significant differences between treatments within a species.

9.2.6 Heights - multiple years

By including year as a numeric variable, we are assuming a liner relationship between year and height. This term (and it’s interactions) can than be interpreted as a modeled growth increment, and although we only have observations at years 1, 5, and 10, other years predictions will be linearly interpolated.

The following models were fit, and are summarized in Table 9.1.

  1. OLS linear model including year in a 3-way interaction.
  2. Eliminate the 3-way interaction (spp:treat:year), keeping others. This results in a significantly lower AIC.
  3. First random-effects model including interactions and universal random slopes for site and plot, and time-series.
  4. Allow the random effects to (co-) vary by species.
  5. Same as 5, but attempt to remove the 3-way interaction again.
  6. Same as 6, but attempt to simplify the random effects by forcing them to be estimated independently, correlation between species is not estimated.
  7. Re-introduce the 3-way interaction again.
  8. Omit the random effect for site. This is supported by the interpretation of
  9. Like 9, but try simplified (independent) random effects.

One problem with this is that I’m comparing AIC for different random effects using ML, when I should be using REML. This may be a problem throughout my model selection.

dht <- lengthen_data(sprouts, "ht")

mht2 <- lm(ht ~ treat * spp * year, data = dht)

mht3 <- lm(
  ht ~ treat + spp + year + treat:year + spp:year + treat:spp,
  data = dht
)

mht4 <- lmer(
  ht ~ treat * spp * year + (1 | site) + (1 | plot) + (1 | tree),
  data = dht
)
boundary (singular) fit: see help('isSingular')
mht5 <- lmer(
  ht ~
    treat * spp * year + (0 + spp | site) + (0 + spp | plot) + (0 + spp | tree),
  data = dht
)
boundary (singular) fit: see help('isSingular')
mht6 <- lmer(
  ht ~
    treat +
      spp +
      year +
      treat:spp +
      year:spp +
      treat:year +
      (0 + spp | site) +
      (0 + spp | plot) +
      (0 + spp | tree),
  data = dht
)
boundary (singular) fit: see help('isSingular')
mht7 <- lmer(
  ht ~
    treat +
      spp +
      year +
      treat:spp +
      year:spp +
      treat:year +
      (1 | site:spp) +
      (1 | plot:spp) +
      (1 | tree:spp),
  data = dht
)

mht8 <- lmer(
  ht ~ treat * spp * year + (1 | site:spp) + (1 | plot:spp) + (1 | tree:spp),
  data = dht
)

mht9 <- lmer(
  ht ~ treat * spp * year + (0 + spp | plot) + (0 + spp | tree),
  data = dht
)
boundary (singular) fit: see help('isSingular')
mht10 <- lmer(
  ht ~ treat * spp * year + (1 | plot:spp) + (1 | tree:spp),
  data = dht
)

# I tried fitting year as a factor, but then could not fit a model that included
# the 3-way interaction and random slopes for species. The model without the
# 3-way (and spp random slopes) was close, but about 12 AIC larger.

modelsummary(
  list(
    "Model 2" = mht2,
    "Model 3" = mht3,
    "Model 4" = mht4,
    "Model 5" = mht5,
    "Model 6" = mht6,
    "Model 7" = mht7,
    "Model 8" = mht8,
    "Model 9" = mht9,
    "Model 10" = mht10
  ),
  stars = TRUE,
  output = "markdown",
  metrics = c("AIC", "BIC", "RMSE"),
  estimator = "ML",
  gof_omit = "F|Log.Lik"
)
Table 9.1: Summary table for models 2-10, described above. Standard error in parantheses.
Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8 Model 9 Model 10
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept) 0.552*** 0.232+ 0.559 0.546* 0.226 0.226 0.546 0.546* 0.546
(0.156) (0.136) (0.350) (0.261) (0.255) (0.425) (0.428) (0.260) (0.428)
treatLD -0.369 -0.120 -0.421 -0.441 -0.191 -0.193 -0.442 -0.441 -0.442
(0.230) (0.190) (0.499) (0.364) (0.361) (0.593) (0.599) (0.371) (0.609)
treatHA -0.372+ 0.035 -0.390 -0.367 0.040 0.041 -0.366 -0.367 -0.366
(0.219) (0.180) (0.495) (0.361) (0.358) (0.589) (0.595) (0.368) (0.605)
treatHD -0.348 0.258 -0.350 -0.345 0.262 0.262 -0.345 -0.345 -0.345
(0.220) (0.181) (0.495) (0.361) (0.358) (0.589) (0.596) (0.368) (0.605)
sppSESE 0.572** 1.215*** 0.527* 0.535 1.178* 1.182* 0.539 0.535 0.539
(0.221) (0.158) (0.205) (0.473) (0.474) (0.596) (0.606) (0.474) (0.606)
year 0.471*** 0.531*** 0.471*** 0.471*** 0.531*** 0.531*** 0.471*** 0.471*** 0.471***
(0.024) (0.019) (0.017) (0.017) (0.013) (0.013) (0.017) (0.017) (0.017)
treatLD × sppSESE -0.311 -0.807*** -0.224 -0.181 -0.677 -0.669 -0.173 -0.175 -0.173
(0.330) (0.188) (0.307) (0.533) (0.500) (0.828) (0.849) (0.679) (0.862)
treatHA × sppSESE -0.100 -0.925*** -0.039 -0.057 -0.882+ -0.885 -0.060 -0.055 -0.059
(0.314) (0.179) (0.291) (0.524) (0.493) (0.824) (0.843) (0.672) (0.857)
treatHD × sppSESE -0.245 -1.492*** -0.204 -0.186 -1.432** -1.434+ -0.187 -0.186 -0.187
(0.316) (0.180) (0.293) (0.525) (0.494) (0.825) (0.844) (0.673) (0.857)
treatLD × year -0.085* -0.132*** -0.085*** -0.085*** -0.132*** -0.132*** -0.085*** -0.085*** -0.085***
(0.036) (0.026) (0.025) (0.025) (0.018) (0.018) (0.025) (0.025) (0.025)
treatHA × year -0.169*** -0.245*** -0.169*** -0.169*** -0.245*** -0.245*** -0.169*** -0.169*** -0.169***
(0.034) (0.024) (0.023) (0.023) (0.017) (0.017) (0.023) (0.023) (0.023)
treatHD × year -0.175*** -0.288*** -0.175*** -0.175*** -0.288*** -0.288*** -0.175*** -0.175*** -0.175***
(0.034) (0.024) (0.023) (0.023) (0.017) (0.017) (0.023) (0.023) (0.023)
sppSESE × year 0.481*** 0.360*** 0.481*** 0.481*** 0.360*** 0.360*** 0.481*** 0.481*** 0.481***
(0.034) (0.018) (0.024) (0.024) (0.012) (0.012) (0.024) (0.024) (0.024)
treatLD × sppSESE × year -0.093+ -0.093** -0.093** -0.093** -0.093** -0.093**
(0.051) (0.035) (0.035) (0.035) (0.035) (0.035)
treatHA × sppSESE × year -0.155** -0.155*** -0.155*** -0.155*** -0.155*** -0.155***
(0.048) (0.033) (0.033) (0.033) (0.033) (0.033)
treatHD × sppSESE × year -0.234*** -0.234*** -0.234*** -0.234*** -0.234*** -0.234***
(0.049) (0.034) (0.034) (0.034) (0.034) (0.034)
SD (Observations) 1.016 1.016 1.031 1.034 1.016 1.016 1.016
SD (Intercept treespp) 0.844 0.852 0.852
SD (Intercept plotspp) 0.795 0.795 0.809
SD (Intercept site) 0.000
SD (sppLIDE site) 0.103 0.000
SD (sppSESE site) 0.488 0.629
Cor (sppLIDE~sppSESE site) -1.000
SD (Intercept sitespp) 0.153 0.153
SD (Intercept tree) 0.925
SD (sppLIDE tree) 0.004 0.000 0.000
SD (sppSESE tree) 1.227 1.223 1.227
Cor (sppLIDE~sppSESE tree) -0.103 0.450 0.284
SD (Intercept plot) 0.638
SD (sppLIDE plot) 0.463 0.474 0.474
SD (sppSESE plot) 0.914 0.919 1.037
Cor (sppLIDE~sppSESE plot) 0.783 0.782 0.570
Num.Obs. 2076 2076 2076 2076 2076 2076 2076 2076 2076
AIC 7518.3 7536.9 6890.7 6631.3 6677.1 6896.9 6851.9 6629.7 6849.9
BIC 7614.1 7615.9 7003.5 6777.9 6806.8 6992.7 6964.7 6759.4 6957.1
RMSE 1.47 1.48 0.88 0.94 0.96 0.91 0.89 0.94 0.89

9.2.7 Multi-year heights V2

The problem with all of the above models is (a) a normal distribution is not a good fit for the data because it results in negative height predictions for small trees (year 1) and (b) variance is not constant. Finally, we are fitting year as a linear effect because fitting it as a factor with 3 levels led to model convergence problems, but ideally, we would like to allow different relationships between height and year, because, as we saw in the previous section, height growth is not constant in the first 10 years.

For these reasons it makes sense to try another modeling framework, particularly, a gamma distributed model, with a modeled dispersion parameter. It allows only positive predictions, can right skewed distributions, and the dispersion can be modeled as a function of covariates.

mht11 <- glmmTMB(
  ht ~ treat * spp * year + (1 | plot) + (1 | tree),
  data = dht,
  family = Gamma(link = "log")
)

mht12 <- glmmTMB(
  ht ~
    treat +
      spp +
      year +
      treat:spp +
      treat:year +
      spp:year +
      (1 | plot) +
      (1 | tree),
  data = dht,
  family = Gamma(link = "log")
)

mht13 <- glmmTMB(
  ht ~
    treat +
      spp +
      year +
      treat:spp +
      treat:year +
      spp:year +
      (0 + spp | plot) +
      (0 + spp | tree),
  data = dht,
  family = Gamma(link = "log")
)

mht14 <- glmmTMB(
  ht ~
    treat +
      spp +
      year +
      treat:spp +
      treat:year +
      spp:year +
      (1 | plot) +
      (1 | tree),
  data = mutate(dht, year = factor(year)),
  family = Gamma(link = "log")
)

mht15 <- glmmTMB(
  ht ~
    treat +
      spp +
      year +
      treat:spp +
      treat:year +
      spp:year +
      (1 | plot) +
      (1 | tree),
  dispformula = ~year,
  data = mutate(dht, year = factor(year)),
  family = Gamma(link = "log")
)

mht16 <- glmmTMB(
  ht ~
    treat +
      spp +
      year +
      treat:spp +
      treat:year +
      spp:year +
      (1 | plot) +
      (1 | tree),
  dispformula = ~ year * spp,
  data = mutate(dht, year = factor(year)),
  family = Gamma(link = "log")
)

mht17 <- glmmTMB(
  ht ~
    treat +
      spp +
      year +
      treat:spp +
      treat:year +
      spp:year +
      (1 | plot) +
      (1 | tree),
  dispformula = ~ year * spp + treat + treat:spp,
  data = mutate(dht, year = factor(year)),
  family = Gamma(link = "log")
)

mht18 <- glmmTMB(
  ht ~ treat + spp + year + treat:year + spp:year + (1 | plot) + (1 | tree),
  dispformula = ~ year + spp + treat + treat:spp,
  data = mutate(dht, year = factor(year)),
  family = gaussian(link = "log")
)

do_model_summary_table <- function(model_list) {
  modelsummary(
    model_list,
    output = "markdown",
    shape = term:component + statistic ~ model,
    metrics = c("AIC", "BIC", "RMSE"),
    estimator = "ML",
    gof_omit = "F|Log.Lik",
    stars = TRUE
  )
}

do_model_summary_table(list(
  "Model 11" = mht11,
  "Model 12" = mht12,
  "Model 13" = mht13,
  "Model 14" = mht14,
  "Model 15" = mht15,
  "Model 16" = mht16,
  "Model 17" = mht17,
  "Model 18" = mht18
))
Table 9.2: Summary table for models 2-10, described above. Standard error in parantheses.
Model 11 Model 12 Model 13 Model 14 Model 15 Model 16 Model 17 Model 18
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept) conditional -0.096 -0.118 -0.114 -0.104 -0.101 -0.095 -0.101 -0.107
(0.101) (0.098) (0.101) (0.090) (0.089) (0.085) (0.079)
treatLD conditional -0.608*** -0.612*** -0.625 -0.663*** -0.628*** -0.628*** -0.614*** -0.616***
(0.145) (0.138) (0.143) (0.126) (0.126) (0.124) (0.112)
treatHA conditional -0.723*** -0.659*** -0.655 -0.675*** -0.621*** -0.643*** -0.629*** -0.665***
(0.142) (0.136) (0.141) (0.124) (0.124) (0.121) (0.110)
treatHD conditional -0.679*** -0.656*** -0.660 -0.696*** -0.669*** -0.675*** -0.663*** -0.740***
(0.143) (0.137) (0.141) (0.124) (0.124) (0.121) (0.110)
sppSESE conditional 0.710*** 0.754*** 0.751 0.739*** 0.763*** 0.766*** 0.742*** 0.762***
(0.076) (0.058) (0.056) (0.065) (0.065) (0.062) (0.038)
year conditional 0.186*** 0.190*** 0.190
(0.008) (0.006)
treatLD × sppSESE conditional -0.007 0.002 0.014 -0.008 0.000 0.000 0.003
(0.115) (0.075) (0.078) (0.079) (0.079) (0.078)
treatHA × sppSESE conditional 0.232* 0.103 0.082 0.100 0.026 0.026 0.028
(0.108) (0.071) (0.074) (0.075) (0.074) (0.073)
treatHD × sppSESE conditional -0.053 -0.100 -0.116 -0.126+ -0.134+ -0.134+ -0.129+
(0.110) (0.071) (0.074) (0.075) (0.075) (0.074)
treatLD × year conditional 0.036** 0.037*** 0.037
(0.011) (0.008)
treatHA × year conditional 0.027* 0.015+ 0.016
(0.011) (0.008)
treatHD × year conditional 0.020+ 0.016* 0.017
(0.011) (0.008)
sppSESE × year conditional 0.000 -0.008 -0.008
(0.011) (0.006)
treatLD × sppSESE × year conditional 0.002
(0.016)
treatHA × sppSESE × year conditional -0.024
(0.015)
treatHD × sppSESE × year conditional -0.009
(0.016)
(Intercept) dispersion 0.430 0.431 0.429 0.330 1.370*** 1.599*** 1.950*** -1.147***
(0.053) (0.075) (0.114) (0.055)
year5 conditional 1.176*** 1.147*** 1.142*** 1.147*** 1.158***
(0.039) (0.043) (0.041) (0.033) (0.028)
year10 conditional 1.748*** 1.720*** 1.714*** 1.719*** 1.721***
(0.039) (0.044) (0.041) (0.034) (0.030)
treatLD × year5 conditional 0.326*** 0.286*** 0.286*** 0.275*** 0.279***
(0.052) (0.057) (0.056) (0.053) (0.042)
treatHA × year5 conditional 0.110* 0.081 0.103+ 0.091+ 0.155***
(0.049) (0.054) (0.053) (0.047) (0.038)
treatHD × year5 conditional 0.168*** 0.138* 0.144** 0.132** 0.158***
(0.050) (0.055) (0.054) (0.049) (0.038)
treatLD × year10 conditional 0.367*** 0.330*** 0.329*** 0.315*** 0.318***
(0.052) (0.058) (0.057) (0.054) (0.044)
treatHA × year10 conditional 0.141** 0.110* 0.133* 0.117* 0.178***
(0.049) (0.055) (0.054) (0.048) (0.040)
treatHD × year10 conditional 0.172*** 0.143* 0.149** 0.137** 0.178***
(0.050) (0.056) (0.055) (0.050) (0.040)
sppSESE × year5 conditional -0.028 -0.031 -0.035 -0.016 -0.049+
(0.036) (0.039) (0.040) (0.036) (0.029)
sppSESE × year10 conditional -0.062+ -0.069+ -0.071+ -0.043 -0.073*
(0.036) (0.040) (0.041) (0.037) (0.030)
year5 dispersion 2.957*** 2.650*** 2.716*** -0.339**
(0.146) (0.193) (0.202) (0.115)
year10 dispersion 2.409*** 2.200*** 2.188*** 0.863***
(0.116) (0.164) (0.160) (0.050)
sppSESE dispersion -0.422*** 0.610*** 0.301***
(0.107) (0.166) (0.074)
year5 × sppSESE dispersion 0.568* 0.460
(0.286) (0.297)
year10 × sppSESE dispersion 0.389+ 0.291
(0.231) (0.225)
treatLD dispersion -0.571*** -0.169*
(0.151) (0.077)
treatHA dispersion -0.441** -0.434***
(0.144) (0.074)
treatHD dispersion -0.379** -0.411***
(0.145) (0.074)
sppSESE × treatLD dispersion -1.054*** 0.299**
(0.216) (0.110)
sppSESE × treatHA dispersion -1.024*** 0.546***
(0.208) (0.108)
sppSESE × treatHD dispersion -1.312*** 0.424***
(0.211) (0.108)
SD (Intercept tree) conditional 0.225 0.225 0.294 0.344 0.344 0.341 0.330
SD (sppLIDE tree) conditional 0.101
SD (sppSESE tree) conditional 0.291
Cor (sppLIDE~sppSESE tree) conditional 0.000
SD (Intercept plot) conditional 0.170 0.170 0.182 0.141 0.141 0.141 0.138
SD (sppLIDE plot) conditional 0.180
SD (sppSESE plot) conditional 0.214
Cor (sppLIDE~sppSESE plot) conditional 0.637
Num.Obs. 2076 2076 2076 2076 2076 2076 2076 2076
AIC 6173.1 6170.5 6119.9 5448.4 4544.8 4534.7 4398.6 4278.4
BIC 6280.2 6260.7 6232.6 5566.8 4674.5 4681.3 4579.0 4430.6
RMSE 1.16 1.15 1.12 0.73 0.45 0.45 0.46 0.50

The model with the lowest AIC is model 18.

9.2.8 Multi-year model checking

Code
res <- simulateResiduals(mht18)
plot(res)
Figure 9.6: Residual vs fitted plot for the selected multi-year linear regression model for heights.

9.2.9 Multi-year results

Using model 18, we can now answer the questions we posed in the Objectives section.

Code
mht18emmeans <- emmeans(
  mht18,
  c("spp", "treat", "year"),
  type = "response"
)

mht18emmeans |>
  as_tibble() |>
  mutate(
    year = as.numeric(levels(year))[year]
  ) |>
  rename(any_of(c(lower.CL = "asymp.LCL", upper.CL = "asymp.UCL"))) |>
  ggplot(aes(year, response, color = spp, fill = spp, group = spp)) +
  geom_line() +
  facet_wrap(~treat) +
  geom_ribbon(
    aes(ymin = lower.CL, ymax = upper.CL, color = NULL),
    alpha = .2
  ) +
  stat_slab(data = dht, aes(year, ht), alpha = 0.4) +
  scale_color_brewer(palette = "Set2", aesthetics = c("color", "fill")) +
  theme(legend.position = "bottom") +
  labs(x = "Year", y = "Height (m)", color = "Species", fill = "Species")
Figure 9.7: Average species and treatment specific height as a linear function of year. Density distributions represent the raw data associated with the estimate at that point. The grey line is the OLS fit for reference.
Code
mht18_compare <- emmeans(
  mht18,
  spec = "treat",
  by = c("spp"),
  at = list(year = "10"),
  type = "response"
) |>
  cld(Letters = letters) |>
  as_tibble() |>
  mutate(
    .group = str_replace_all(.group, " ", "")
  ) |>
  rename(any_of(c(lower.CL = "asymp.LCL", upper.CL = "asymp.UCL")))

multiple_years_plot <- filter(dht, year == 10) |>
  ggplot(aes(treat, ht, color = spp, fill = spp)) +
  facet_grid(~spp, switch = "x") +
  theme(
    panel.spacing = unit(0, "lines"),
    strip.background = element_blank(),
    strip.text = element_blank(),
  ) +
  geom_dots() +
  scale_color_brewer(palette = "Set2", aesthetics = c("color", "fill")) +
  geom_pointrange(
    aes(y = response, ymin = lower.CL, ymax = upper.CL),
    data = mht18_compare,
    color = "gray50",
    position = position_nudge(x = -0.07),
    size = 0.7,
    linewidth = 1,
    show.legend = FALSE
  ) +
  geom_text(
    aes(y = upper.CL, label = .group),
    data = mht18_compare,
    color = "black",
    position = position_nudge(x = -0.3, y = 0.2)
  ) +
  labs(x = "Treatment", y = "Height (m)", color = "Species", fill = "Species") +
  theme(legend.position = "bottom")

multiple_years_plot
Figure 9.8: Marginal means at year 10 based on height data collected three times over 10 years. The same letter within a species indicates that there was not enough evidence to differentiate those treatments.

9.2.10 Multi-year vs. year 10 only

Code
standard_model_col_names <- c(
  LCL = "lower.CL",
  UCL = "upper.CL",
  LCL = "asymp.LCL",
  UCL = "asymp.UCL",
  emmean = "response",
  emmean = "the.emmean"
)

list("year 10 only" = emmcomp, "multi-year" = mht18_compare) |>
  map(\(x) rename(x, any_of(standard_model_col_names))) |>
  bind_rows(.id = "model") |>
  ggplot(aes(treat, ht, color = model, fill = model)) +
  facet_grid(~spp) +
  geom_pointrange(
    aes(y = emmean, ymin = LCL, ymax = UCL),
    position = position_dodge(width = 0.4),
    size = 0.7,
    linewidth = 1
  ) +
  geom_dots(
    data = dht10,
    color = "gray50",
    fill = "gray70",
    position = position_nudge(x = 0.2),
    scale = 0.7,
    binwidth = 0.18
  ) +
  theme(legend.position = "bottom") +
  scale_color_manual(
    values = rnd_color_brewer("Set2", c(1, 4)),
    aesthetics = c("color", "fill")
  )
[1] 1 4

This plot attempts to demonstrate the model fit by showing observed and predictied height distributions for our observed dataset.

Code
sprouts |>
  lengthen_data("ht") |>
  mutate(
    Predicted = predict(
      mht18,
      lst(site, plot, tree, treat, spp, year),
      type = "response"
    ),
    Observed = ht
  ) |>
  pivot_longer(c(Observed, Predicted)) |>
  ggplot(aes(value, treat, color = name, fill = name)) +
  stat_slab(alpha = 0.4, normalize = "xy") +
  facet_grid(vars(spp), vars(year), scales = "free", labeller = label_both) +
  labs(x = "Height (m)", y = "Treatment") +
  scale_color_brewer(palette = "Set2", aesthetics = c("color", "fill")) +
  theme(legend.position = "bottom", legend.title = element_blank())
Figure 9.9: Oberved and predicted height distribution of our observed dataset using the final, chosen, multi-year model. Predictions were made using all random effects.
Code
#| label: fig-height-pred-fit-10-only
#| fig-cap: >
#|   Oberved and predicted height distribution of our observed dataset using the
#|   selected year-10-only model.
#|   Predictions were made using all random effects.

sprouts |>
  lengthen_data("ht") |>
  filter(year == "10") |>
  mutate(
    Predicted = predict(
      ht10_sel,
      lst(site, plot, tree, treat, spp, year),
      type = "response"
    ),
    Observed = ht
  ) |>
  pivot_longer(c(Observed, Predicted)) |>
  ggplot(aes(value, treat, color = name, fill = name)) +
  stat_slab(alpha = 0.4, normalize = "xy") +
  facet_grid(vars(spp), scales = "free", labeller = label_both) +
  labs(x = "Height (m)", y = "Treatment") +
  scale_color_brewer(palette = "Set2", aesthetics = c("color", "fill")) +
  theme(legend.position = "bottom", legend.title = element_blank())

9.3 Output

saved data for later use

Code
save(ht10_sel, minc_sel, dht, dinc, dht10, mht18, file = "sprout_modeling.rda")