7  Sprout height data description

7.1 Getting started

First, we’ll load some libraries and our data.

Code
library(lme4)
library(modelsummary)
library(marginaleffects)
library(ggdist)
library(ggridges)
library(emmeans)
library(multcomp)
library(multcompView)
library(mgcv)
library(patchwork)
library(DHARMa)
library(glmmTMB)
library(ggeffects)
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
library(purrr)
Site Plot Trtmt Tree Species HT1yr_m HT5yr_m HT10yr_m HTI1-5yr HTI5-10yr DBH10yr_cm LCBH10yr CR10yr HD10yr SDIinit AggregatedYN ResidRWonClumpYN HT10rank
Waldo North 2 GS 285 LIDE 1.72 3.55 6.30 0.4575 0.550 NA NA NA NA 0.00 0 NA 4
Waldo South 4 GS 13 SESE 1.14 3.11 6.00 0.4925 0.578 6.6 0.60 0.900000 90.90909 0.00 0 0 21
Whiskey Springs 4 HA 1904 LIDE 0.29 0.96 1.94 0.1675 0.196 NA NA NA NA 536.33 1 NA 24
Whiskey Springs 2 HD 1405 SESE 1.83 5.21 9.20 0.8450 0.798 12.5 0.01 0.998913 73.60000 509.90 0 0 6

The data is in a wide format, variables are as follows:

Table 7.1: Descriptions of variables in dataset.
Variable Description
Site One of four sites where treatments were replicated. Sites were located on similar slope positions, but across a range of aspects.
Plot There were 4 plots at each site and each was randomly assigned a treatment
Trtmt Treatemnt type: GS = group selection, which is basically a small clearing, LD = low density–fewer trees remaining; HD = high density–more trees remain and they are dispersed; HA = high density aggregated–more trees remain and they are grouped into clumps.
Tree Unique sprout ID within Site, Plot, and Species
Species SESE = coast redwood, and LIDE = tanoak.
HT1yr_m Sprout height one year after treatment.
HT5yr_m Same as above, but for year 5
HT10yr_m Same as above, but for year 10
HTI1-5yr Height growth between years 1 and 5, (4 growth periods)
HTI5-10yr Height growth between years 5 and 10 (5 growth periods)
DBH10yr_cm Diameter at breast height in cm at year 10. Only collected for redwood
LCBH10yr Live crown base height (height to first live branch) at year 10. Only collected for redwood.
CR10yr Crown ratio (live crown length / total height) at year 10, only for redwood
HD10yr Unknown.
SDIinit Stand density index of plot immediately after treatment.
AggregatedYN Indicator for treatment HA.
ResidRWonClumpYN Unknown.
HT10rank Trees species specific height ranking within plot.

7.2 Wrangle

I’m going to change some variable names to make them more ergonomic

Code
newnames <- c(
  site = "Site",
  plot = "Plot",
  treat = "Trtmt",
  tree = "Tree",
  spp = "Species",
  ht1 = "HT1yr_m",
  ht5 = "HT5yr_m",
  ht10 = "HT10yr_m",
  ht_inc5 = "HTI1-5yr",
  ht_inc10 = "HTI5-10yr",
  dbh10 = "DBH10yr_cm",
  lcbh10 = "LCBH10yr",
  sdi_init = "SDIinit",
  agg = "AggregatedYN",
  ht_rnk10 = "HT10rank"
)

tibble(old = newnames, new = names(newnames)) |> knitr::kable()
old new
Site site
Plot plot
Trtmt treat
Tree tree
Species spp
HT1yr_m ht1
HT5yr_m ht5
HT10yr_m ht10
HTI1-5yr ht_inc5
HTI5-10yr ht_inc10
DBH10yr_cm dbh10
LCBH10yr lcbh10
SDIinit sdi_init
AggregatedYN agg
HT10rank ht_rnk10

Our data are nested. We have plots within sites, trees within plots, and observeations within trees (multiple observations per tree).

Sites
└─ Plots
   └─ Trees
      └─ Observations

Each Treatment is represented by one site/plot combination. Each site belongs to each treatment and vice versa. I think the terminology here is that Sites and treatments are crossed.

Currently, plot (integer) is only unique within site and tree is only unique within site, treat, and spp. It will be more convenient if plot and tree are globally unique identifiers. This makes the nesting structure implicit and we can simplify our model syntax.

Code
make_site_plot_unique <- function(data) {
  data |>
    mutate(plot = cur_group_id(), .by = c(site, treat)) |>
    mutate(tree = row_number())
}

I’m also going to order the treatments acording to our expectations about the most to least productive. This will affect how they are plotted and reported.

Code
treat_order <- c("GS", "LD", "HA", "HD")

set_expected_treatment_order <- function(data) {
  data |>
    mutate(treat = fct_relevel(treat, treat_order))
}

For modeling purposes, I will make sure that the data are of the correct type. Grouping variables and character data should be factors.

Code
# Assuming variable names have already been changed
set_data_types <- function(data) {
  data |>
    mutate(
      across(c(site, plot, treat, tree, spp), as.factor)
    )
}

And I’ll apply all the above steps in one go.

Code
sprouts <- sprouts_fresh |>
  dplyr::select(all_of(newnames)) |>
  make_site_plot_unique() |>
  set_expected_treatment_order() |>
  set_data_types()

I will need a function to convert a variable in the data from wide to long format.

Code
lengthen_data <- function(data, var) {
  # I need regex for first part to handle ht and ht_inc and
  pref <- switch(var, ht = "(ht)", ht_inc = "(ht_inc)")
  # and the year
  suf <- "(\\d+)"
  str_to_match <- paste0(pref, suf)
  pivot_longer(
    data,
    matches(str_to_match),
    names_to = c(".value", "year"),
    names_pattern = str_to_match,
    names_transform = list(year = as.integer)
  ) |>
    relocate(year, matches(paste0(pref, "$")), .after = spp)
}

Save the needed objects for the next script

Code
save(sprouts, lengthen_data, treat_order, file = "wrangled_sprouts.rda")