The data are in wide format. The variables are as follows:
Table 7.1: Descriptions of variables in dataset.
| Variable | Description |
|---|---|
| Site | One of four sites where treatments were replicated. Sites were located on similar slope positions, but across a range of aspects. |
| Plot | There were 4 plots at each site, and each was randomly assigned a treatment. |
| Trtmt | Treatment type: GS = group selection, which is basically a small clearing; LD = low density (fewer trees remaining); HD = high density (more trees remain and they are dispersed); HA = high density aggregated (more trees remain and they are grouped into clumps). |
| Tree | Unique sprout ID within Site, Plot, and Species. |
| Species | SESE = coast redwood, LIDE = tanoak. |
| HT1yr_m | Sprout height one year after treatment. |
| HT5yr_m | Same as above, but for year 5. |
| HT10yr_m | Same as above, but for year 10. |
| HTI1-5yr | Height growth between years 1 and 5 (4 growth periods). |
| HTI5-10yr | Height growth between years 5 and 10 (5 growth periods). |
| DBH10yr_cm | Diameter at breast height in cm at year 10. Only collected for redwood. |
| LCBH10yr | Live crown base height (height to first live branch) at year 10. Only collected for redwood. |
| CR10yr | Crown ratio (live crown length / total height) at year 10. Only collected for redwood. |
| HD10yr | Unknown. |
| SDIinit | Stand density index of plot immediately after treatment. |
| AggregatedYN | Indicator for treatment HA. |
| ResidRWonClumpYN | Unknown. |
| HT10rank | Tree's species-specific height ranking within plot. |
7.2 Wrangle
I’m going to change some variable names to make them more ergonomic.
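As a rough sketch, the renaming could look something like the following with dplyr. The one-row data frame is invented for illustration, and the new names are assumptions: `site`, `plot`, `treat`, `tree`, and `spp` are taken from how the columns are referred to later in this section, while `ht1`, `ht5`, `ht10`, `ht_inc5`, and `ht_inc10` are simply one choice that fits an `ht<year>` / `ht_inc<year>` naming pattern.

```r
library(dplyr)

# Toy stand-in for the raw data; the values are made up for illustration.
raw <- tibble(
  Site = "A", Plot = 1L, Trtmt = "GS", Tree = 1L, Species = "SESE",
  HT1yr_m = 1.2, HT5yr_m = 4.8, HT10yr_m = 9.5,
  `HTI1-5yr` = 3.6, `HTI5-10yr` = 4.7
)

# The new names are assumptions. Names containing "-" need backticks.
d <- raw |>
  rename(
    site = Site, plot = Plot, treat = Trtmt, tree = Tree, spp = Species,
    ht1 = HT1yr_m, ht5 = HT5yr_m, ht10 = HT10yr_m,
    ht_inc5 = `HTI1-5yr`, ht_inc10 = `HTI5-10yr`
  )

names(d)
```

Short, lower-case names without special characters avoid the need for backticks in every later call.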
Our data are nested. We have plots within sites, trees within plots, and observations within trees (multiple observations per tree).
Sites
└─ Plots
   └─ Trees
      └─ Observations
Each treatment occurs in exactly one plot at each site, so every site contains every treatment and every treatment occurs at every site. I think the terminology here is that sites and treatments are crossed.
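One way to check for a crossed structure is to count plots in each site-by-treatment cell. The sketch below builds a toy design rather than using the real data: the site labels (`S1` to `S4`) and the plot-to-treatment assignment are invented, and the column names `site`, `treat`, and `plot` are assumed.

```r
library(dplyr)

# Toy design: 4 sites x 4 treatments, one plot per combination.
# Site labels and plot numbering are invented for illustration.
design <- expand.grid(
  site = c("S1", "S2", "S3", "S4"),
  treat = c("GS", "LD", "HA", "HD"),
  stringsAsFactors = FALSE
) |>
  arrange(site) |>
  group_by(site) |>
  mutate(plot = row_number()) |>
  ungroup()

# If sites and treatments are fully crossed, every site x treat cell
# appears, and here each cell holds exactly one plot.
cells <- count(design, site, treat, name = "n_plots")

# The same information as a site-by-treatment table.
table(design$site, design$treat)
```

A missing cell, or a cell with a count other than one, would indicate that the design is not the balanced crossed layout described above.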
Currently, plot (integer) is only unique within site and tree is only unique within site, treat, and spp. It will be more convenient if plot and tree are globally unique identifiers. This makes the nesting structure implicit and we can simplify our model syntax.
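A minimal sketch of one way to build globally unique identifiers is to paste the enclosing levels into the ID. The toy rows are invented, and the `_` separator and the column names are assumptions.

```r
library(dplyr)

# Toy data: plot and tree numbers repeat across sites (values invented).
d <- tibble(
  site = c("A", "A", "A", "B", "B"),
  plot = c(1L, 1L, 2L, 1L, 1L),
  treat = c("GS", "GS", "LD", "HD", "HD"),
  spp = c("SESE", "LIDE", "SESE", "SESE", "SESE"),
  tree = c(1L, 1L, 1L, 1L, 2L)
)

d <- d |>
  mutate(
    # Plot becomes unique across sites, e.g. "A_1".
    plot = paste(site, plot, sep = "_"),
    # mutate() evaluates sequentially, so `plot` here is already the new ID.
    tree = paste(plot, spp, tree, sep = "_")
  )

d
```

With IDs like these, a grouping term no longer has to spell out the nesting. In lme4-style formula syntax, for example, `(1 | site/plot/tree)` and `(1 | site) + (1 | plot) + (1 | tree)` describe the same grouping once `plot` and `tree` are globally unique.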
I’m also going to order the treatments according to our expectations, from most to least productive. This will affect how they are plotted and reported.
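Ordering is controlled by the factor levels. In the sketch below the specific order (GS, LD, HA, HD) is an assumption for illustration only, as is the short example vector. Substitute whatever order matches your expectations.

```r
# Assumed order, from most to least productive (illustrative only).
treat_levels <- c("GS", "LD", "HA", "HD")

# A few example values in arbitrary order.
treat <- c("HD", "GS", "HA", "LD", "GS")

# Setting `levels` explicitly overrides the default alphabetical order.
treat <- factor(treat, levels = treat_levels)

levels(treat)
```

Plot legends, facets, and model summaries generally follow the level order, so the first level also tends to become the reference category in treatment contrasts.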
I will need a function to convert a variable in the data from wide to long format.
Code
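# Requires dplyr (relocate) and tidyr (pivot_longer); both provide matches().
library(dplyr)
library(tidyr)

lengthen_data <- function(data, var) {
  # Regex for the first part of the column name: handles ht and ht_inc
  pref <- switch(
    var,
    ht = "(ht)",
    ht_inc = "(ht_inc)",
    stop("`var` must be \"ht\" or \"ht_inc\"")
  )
  # ... and for the year
  suf <- "(\\d+)"
  str_to_match <- paste0(pref, suf)
  pivot_longer(
    data,
    matches(str_to_match),
    names_to = c(".value", "year"),
    names_pattern = str_to_match,
    names_transform = list(year = as.integer)
  ) |>
    # Put year and the new long column right after spp
    relocate(year, matches(paste0(pref, "$")), .after = spp)
}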
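To see what this kind of reshaping does, here is a small self-contained sketch that calls `pivot_longer()` directly on invented data. The column names `ht1`, `ht5`, and `ht10` are assumed, and the `^` and `$` anchors are an addition here so that only names of exactly the form `ht<digits>` are picked up.

```r
library(dplyr)
library(tidyr)

# Toy wide data: one row per tree, one height column per year (values invented).
wide <- tibble(
  tree = c("t1", "t2"),
  spp = c("SESE", "LIDE"),
  ht1 = c(1.2, 0.9),
  ht5 = c(4.8, 3.1),
  ht10 = c(9.5, 5.6)
)

pattern <- "^(ht)(\\d+)$"

# ".value" sends the first capture group ("ht") to the output column name;
# the second capture group becomes the integer `year` column.
long <- pivot_longer(
  wide,
  matches(pattern),
  names_to = c(".value", "year"),
  names_pattern = pattern,
  names_transform = list(year = as.integer)
)

long
```

Each tree now has one row per measurement year, with a single `ht` column holding the heights.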