7 Sprout height data description

7.1 Getting started

First, we’ll load some libraries and our data.

Code

library(lme4)
library(modelsummary)
library(marginaleffects)
library(ggdist)
library(ggridges)
library(emmeans)
library(multcomp)
library(multcompView)
library(mgcv)
library(patchwork)
library(DHARMa)
library(glmmTMB)
library(ggeffects)
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
library(purrr)

Site	Plot	Trtmt	Tree	Species	HT1yr_m	HT5yr_m	HT10yr_m	HTI1-5yr	HTI5-10yr	DBH10yr_cm	LCBH10yr	CR10yr	HD10yr	SDIinit	AggregatedYN	ResidRWonClumpYN	HT10rank
Waldo North	2	GS	285	LIDE	1.72	3.55	6.30	0.4575	0.550	NA	NA	NA	NA	0.00	0	NA	4
Waldo South	4	GS	13	SESE	1.14	3.11	6.00	0.4925	0.578	6.6	0.60	0.900000	90.90909	0.00	0	0	21
Whiskey Springs	4	HA	1904	LIDE	0.29	0.96	1.94	0.1675	0.196	NA	NA	NA	NA	536.33	1	NA	24
Whiskey Springs	2	HD	1405	SESE	1.83	5.21	9.20	0.8450	0.798	12.5	0.01	0.998913	73.60000	509.90	0	0	6

The data is in a wide format, variables are as follows:

Table 7.1: Descriptions of variables in dataset.

Variable	Description
Site	One of four sites where treatments were replicated. Sites were located on similar slope positions, but across a range of aspects.
Plot	There were 4 plots at each site and each was randomly assigned a treatment
Trtmt	Treatemnt type: GS = group selection, which is basically a small clearing, LD = low density–fewer trees remaining; HD = high density–more trees remain and they are dispersed; HA = high density aggregated–more trees remain and they are grouped into clumps.
Tree	Unique sprout ID within Site, Plot, and Species
Species	SESE = coast redwood, and LIDE = tanoak.
HT1yr_m	Sprout height one year after treatment.
HT5yr_m	Same as above, but for year 5
HT10yr_m	Same as above, but for year 10
HTI1-5yr	Height growth between years 1 and 5, (4 growth periods)
HTI5-10yr	Height growth between years 5 and 10 (5 growth periods)
DBH10yr_cm	Diameter at breast height in cm at year 10. Only collected for redwood
LCBH10yr	Live crown base height (height to first live branch) at year 10. Only collected for redwood.
CR10yr	Crown ratio (live crown length / total height) at year 10, only for redwood
HD10yr	Unknown.
SDIinit	Stand density index of plot immediately after treatment.
AggregatedYN	Indicator for treatment HA.
ResidRWonClumpYN	Unknown.
HT10rank	Trees species specific height ranking within plot.

7.2 Wrangle

I’m going to change some variable names to make them more ergonomic

Code

newnames <- c(
  site = "Site",
  plot = "Plot",
  treat = "Trtmt",
  tree = "Tree",
  spp = "Species",
  ht1 = "HT1yr_m",
  ht5 = "HT5yr_m",
  ht10 = "HT10yr_m",
  ht_inc5 = "HTI1-5yr",
  ht_inc10 = "HTI5-10yr",
  dbh10 = "DBH10yr_cm",
  lcbh10 = "LCBH10yr",
  sdi_init = "SDIinit",
  agg = "AggregatedYN",
  ht_rnk10 = "HT10rank"
)

tibble(old = newnames, new = names(newnames)) |> knitr::kable()

old	new
Site	site
Plot	plot
Trtmt	treat
Tree	tree
Species	spp
HT1yr_m	ht1
HT5yr_m	ht5
HT10yr_m	ht10
HTI1-5yr	ht_inc5
HTI5-10yr	ht_inc10
DBH10yr_cm	dbh10
LCBH10yr	lcbh10
SDIinit	sdi_init
AggregatedYN	agg
HT10rank	ht_rnk10

Our data are nested. We have plots within sites, trees within plots, and observeations within trees (multiple observations per tree).

Sites
└─ Plots
   └─ Trees
      └─ Observations

Each Treatment is represented by one site/plot combination. Each site belongs to each treatment and vice versa. I think the terminology here is that Sites and treatments are crossed.

Currently, plot (integer) is only unique within site and tree is only unique within site, treat, and spp. It will be more convenient if plot and tree are globally unique identifiers. This makes the nesting structure implicit and we can simplify our model syntax.

Code

make_site_plot_unique <- function(data) {
  data |>
    mutate(plot = cur_group_id(), .by = c(site, treat)) |>
    mutate(tree = row_number())
}

I’m also going to order the treatments acording to our expectations about the most to least productive. This will affect how they are plotted and reported.

Code

treat_order <- c("GS", "LD", "HA", "HD")

set_expected_treatment_order <- function(data) {
  data |>
    mutate(treat = fct_relevel(treat, treat_order))
}

For modeling purposes, I will make sure that the data are of the correct type. Grouping variables and character data should be factors.

Code

# Assuming variable names have already been changed
set_data_types <- function(data) {
  data |>
    mutate(
      across(c(site, plot, treat, tree, spp), as.factor)
    )
}

And I’ll apply all the above steps in one go.

Code

sprouts <- sprouts_fresh |>
  dplyr::select(all_of(newnames)) |>
  make_site_plot_unique() |>
  set_expected_treatment_order() |>
  set_data_types()

I will need a function to convert a variable in the data from wide to long format.

Code

lengthen_data <- function(data, var) {
  # I need regex for first part to handle ht and ht_inc and
  pref <- switch(var, ht = "(ht)", ht_inc = "(ht_inc)")
  # and the year
  suf <- "(\\d+)"
  str_to_match <- paste0(pref, suf)
  pivot_longer(
    data,
    matches(str_to_match),
    names_to = c(".value", "year"),
    names_pattern = str_to_match,
    names_transform = list(year = as.integer)
  ) |>
    relocate(year, matches(paste0(pref, "$")), .after = spp)
}

Save the needed objects for the next script

Code

save(sprouts, lengthen_data, treat_order, file = "wrangled_sprouts.rda")