Data

load the standard libraries we use to do work

# bread-and-butter
library(tidyverse) # the tidyverse
library(viridis) # viridis colors
library(harrypotter) # hp colors
library(palettetown) # poke colors
library(RColorBrewer) # brewer colors
library(scales) # work with number and plot scales
library(latex2exp)

# visualization
library(mapview) # interactive html maps
library(kableExtra) # tables
library(patchwork) # combine plots
library(ggnewscale) # new scale
library(ggrepel) # repel labels

# spatial analysis
library(terra) # raster
library(sf) # simple features
library(lidR) # lidR
library(cloud2trees) # cloud2trees

Though not necessary for cloud2trees data processing, let’s quickly check out the location and structure of the data we have

we got a folder of point cloud data: N1_400AGL_20MPH_TFOFF … let’s see what’s in that folder

# directory with the downloaded .las|.laz files
point_cld_folder <- "../data/N1_400AGL_20MPH_TFOFF"
# is there data?
list.files(point_cld_folder, pattern = ".*\\.(laz|las)$") %>% length()
## [1] 1
# what files are in here?
list.files(point_cld_folder, pattern = ".*\\.(laz|las)$")[1]
## [1] "cloud0.las"

again, this is not necessary for cloud2trees data processing but we can use lidR to read the point cloud folder as a catalog which doesn’t read in the actual points but just the point cloud header data which includes information on things like the spatial location of the data, the point density, and other point attributes

# read folder as LAScatalog
ctg_temp <- lidR::readLAScatalog(point_cld_folder)
# what information do we get about the point cloud?
ctg_temp
## class       : LAScatalog (v1.2 format 3)
## extent      : 489922, 490406.8, 4330448, 4331090 (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 13N 
## area        : 311012.7 m²
## points      : 53.71 million points
## type        : airborne
## density     : 172.7 points/m²
## density     : 140.2 pulses/m²
## num. files  : 1

that’s a lot of points…can an ordinary or sub-optimal laptop handle it? we’ll find out.

let’s look at the point cloud extent on a map to orient ourselves in space

ctg_temp %>% 
  cloud2trees:::check_las_ctg_empty() %>% 
  purrr::pluck("data") %>% 
  mapview::mapview(popup = F, layer.name = "point cloud tile")

i told you that we didn’t need to do any of that for cloud2trees data processing and to prove it, we’ll remove the ctg_temp object from our session

remove(list = ls()[grep("_temp",ls())])
gc()
##           used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells 3993004 213.3    7506972 401.0  5181989 276.8
## Vcells 5886724  45.0   12255594  93.6  8388606  64.0

ITD Tuning

from the cloud2trees readme:

The itd_tuning() function is used to visually assess tree crown delineation results from different window size functions used for the detection of individual trees. itd_tuning() allows users to test different window size functions on a sample of data to determine which function is most suitable for the area being analyzed. The preferred function can then be used in the ws parameter in raster2trees() and cloud2trees().

let’s take this cloud2trees::itd_tuning() function for a spin with the parameter settings chm_res_m = 0.25 since we plan on processing the entire data extent using a 0.25 m CHM resolution, n_samples = 4 to get four 0.1 ha sample areas on which to visually assess the window functions, and min_height = 1.37 to require that any potential tree has a height of at least 1.37 m to be considered a “tree”

itd_tuning_ans <- 
  cloud2trees::itd_tuning(
    input_las_dir = "../data/N1_400AGL_20MPH_TFOFF/"
    , n_samples = 4
    , min_height = 1.37
    , chm_res_m = 0.25
  )

what does cloud2trees::itd_tuning() give us?

names(itd_tuning_ans)
## [1] "plot_samples"        "ws_fn_list"          "plot_sample_summary"
## [4] "crowns"

it’s a named list. let’s check out the plot_* results

itd_tuning_ans$plot_samples
# write the file to the disk for posterity
ggplot2::ggsave(filename = "../data/itd_tuning_plot_samples1.jpg", height = 11, width = 8, dpi = "print")

the most noticeable difference between the window functions is with the segmentation of tall areas of the CHM using the log_fn compared to the other two. this function appears to not be separating tall trees appropriately: it is under-segmenting these areas resulting in too few tall trees. this under-segmentation by the log_fn can be seen most clearly in the top-left tall tree group of sample 1 and the top-left tall tree group of sample 4

comparing the lin_fn (linear function) and exp_fn (exponential function) shows that both functions result in similar segmentation results for taller trees but the primary differences appear for shorter trees. The difference in short-tree segmentation between these two functions is most evident in sample 2 where the lin_fn predicts many more small trees than the exp_fn. This appears to be an over-segmentation (too many trees) by the lin_fn for shorter trees which is perhaps most clear in the area just above the very center of sample 2

let’s check out the resulting tree distribution of the different window functions over these sample areas

itd_tuning_ans$plot_sample_summary
# write the file to the disk for posterity
ggplot2::ggsave(filename = "../data/itd_tuning_plot_sample_summary1.jpg", height = 11, width = 8, dpi = "print")

these results confirm what we identified about the log_fn compared to the other two: that it under-segments taller trees. predicting too few tall trees by not separating multiple tree crowns has the effect of producing tall trees with very wide crowns. this result is shown by the outlier yellow points in the RHS plots where the crown is predicted to be wider than the tree is tall (unlikely for this forest type)

let’s look at the default window function named lin_fn (i.e. linear function) from cloud2trees to see how we might adjust it to obtain different tree segmentation results

itd_tuning_ans$ws_fn_list$lin_fn
## function (x) 
## {
##     y <- dplyr::case_when(is.na(x) ~ 0.001, x < 0 ~ 0.001, x < 
##         2 ~ 1, x > 30 ~ 5, TRUE ~ 0.75 + (x * 0.14))
##     return(y)
## }
## <bytecode: 0x000002620145f428>
## <environment: 0x000002620145ac98>

it’s a function defining the window size (called y) based on the CHM height (called x)

we can plot the two functions that we did not rule out based on the first set of cloud2trees::itd_tuning() samples

# plot the ws fn
ggplot2::ggplot() +
  ggplot2::geom_function(fun = itd_tuning_ans$ws_fn_list$lin_fn, aes(color = "lin_fn"), lwd = 1) +
  ggplot2::geom_function(fun = itd_tuning_ans$ws_fn_list$log_fn, aes(color = "log_fn"), lwd = 1) +
  ggplot2::geom_function(fun = itd_tuning_ans$ws_fn_list$exp_fn, aes(color = "exp_fn"), lwd = 1) +
  ggplot2::xlim(-5,42) +
  ggplot2::labs(x = "heights", y = "ws", color = "") +
  ggplot2::theme_light()

the lin_fn (linear function) and exp_fn (exponential function) result in similar window sizes for higher CHM values but the exp_fn expands the window size for shorter areas compared to the lin_fn.

let’s make a custom linear function that expands the search window for shorter trees but keeps roughly the same window size for taller trees between 15-25 m (we don’t expect many trees above this in our study area). looking at the plot of the two functions, we want to make a new, piecewise linear function that passes through (4,1.5) which is a point between the exp_fn and lin_fn in the 2-7.5 x-range

# define a custom function
my_linear <- function(x){
    y <- dplyr::case_when(
      is.na(x) ~ 0.001
      , x < 0.01 ~ 0.001
      # next piece:
        # we want it to pass through ~ (1.5,0.75)
        # m = (y2-y1)/(x2-x1) >> m = (0.75-0.001)/(1.5-0.01) = 0.5026846
        # b = y-(m*x) >> b = 0.001-(0.5026846*0.01) = -0.004026846
      , x < 1.5 ~ -0.004026846 + x*0.5026846
      # next piece:
        # at x=1.5, first segment y = -0.004026846+1.5*0.5026846 = 0.75 >> (1.5,0.75)
        # connect to (3.0,1.0)
        # m = (1.0-0.75)/(3.0-1.5) = 0.1666667
        # b = 0.75-(0.1666667*1.5) = 0.4999999
      , x < 3.0 ~ 0.4999999 + x*0.1666667 
      # next piece:
        # at x=3.0, first segment y = 0.4999999+3.0*0.1666667 = 1.0 >> (3.0,1.0)
        # connect to (25,3.75)
        # m = (3.75-1.0)/(25-3.0) = 0.125
        # b = 1.0-(0.125*3.0) = 0.625
      , x < 25 ~ 0.625 + x*0.125
      # upper limit starts at (32.5,5)
      , x > 32.5 ~ 5
      # next piece:
        # connect to (32.5,5)
        # m = (5-3.75)/(32.5-25) = 0.1666667
        # b = 3.75-(0.1666667*25) = -0.4166675
      , T ~ -0.4166675 + x*0.1666667
    )
    return(y)
}
# my_linear

let’s visualize these functions we’ll use for the second cloud2trees::itd_tuning() sampling

ggplot2::ggplot() +
  ggplot2::geom_function(fun = itd_tuning_ans$ws_fn_list$lin_fn, aes(color = "lin_fn"), lwd = 1) +
  ggplot2::geom_function(fun = itd_tuning_ans$ws_fn_list$exp_fn, aes(color = "exp_fn"), lwd = 1) +
  ggplot2::geom_function(fun = my_linear, aes(color = "my_linear"), lwd = 1) +
  ggplot2::xlim(-5,40) +
  ggplot2::labs(x = "heights", y = "ws", color = "") +
  ggplot2::theme_light()

re-run tuning with new function

# let's put these in a list to test with the best default function we saved from above
my_fn_list <- list(
  lin_fn = cloud2trees::itd_ws_functions()[["lin_fn"]]
  , exp_fn = cloud2trees::itd_ws_functions()[["exp_fn"]]
  , my_linear = my_linear
)
itd_tuning_ans2 <- 
  cloud2trees::itd_tuning(
    input_las_dir = "../data/N1_400AGL_20MPH_TFOFF/"
    , ws_fn_list = my_fn_list
    , n_samples = 4
    , min_height = 1.37
    , chm_res_m = 0.25
  )

let’s check out the plot_* results

itd_tuning_ans2$plot_samples
# write the file to the disk for posterity
ggplot2::ggsave(filename = "../data/itd_tuning_plot_samples2.jpg", height = 11, width = 8, dpi = "print")

Our custom my_linear function appears to do a good job striking a balance between the default lin_fn which tended to undersegment taller trees an the exp_fn which tended to undersegment shorter trees. The custom function and the exp_fn yield similar tree segmentation results for sample 1 which is what we expected since that area consists primarily of taller trees and the biggest change we made to my_linear function was to limit the search window at the taller range and expand the search window for shorter trees so that less were detected compared to the default lin_fn. One notable difference with the exp_fn (exponential function) in sample 1 is in the lower left corner where the custom function limited crown sprawl for one of the taller objects in the scene an yielded more realistic crown shapes. Our custom my_linear function yields fewer small trees than the default lin_fn (but more than the exp_fn) since we allowed a larger window size at lower portions of the CHM with this effect most clear for the short trees (purple CHM regions) on sample 2.

let’s check out the resulting tree distribution of the different window functions over these sample areas

itd_tuning_ans2$plot_sample_summary
# write the file to the disk for posterity
ggplot2::ggsave(filename = "../data/itd_tuning_plot_sample_summary2.jpg", height = 11, width = 8, dpi = "print")
# save the crowns too?
itd_tuning_ans2$crowns %>% sf::st_write(dsn = "../data/itd_tuning_crowns.gpkg", quiet = T, append = F) 

let’s check out the crown polygon data we got from our second itd_tuining() iteration

itd_tuning_ans2$crowns %>% dplyr::glimpse()
## Rows: 492
## Columns: 9
## $ sample_number    <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ ws_fn            <chr> "lin_fn", "lin_fn", "lin_fn", "lin_fn", "lin_fn", "li…
## $ treeID           <chr> "1_490149.4_4330784.4", "2_490163.9_4330783.1", "3_49…
## $ tree_height_m    <dbl> 21.6631, 21.3840, 24.3616, 21.9198, 20.9949, 24.1610,…
## $ tree_x           <dbl> 490149.4, 490163.9, 490154.6, 490179.9, 490169.9, 490…
## $ tree_y           <dbl> 4330784, 4330783, 4330782, 4330782, 4330781, 4330779,…
## $ crown_area_m2    <dbl> 1.8750, 35.5625, 40.3750, 27.2500, 49.6875, 27.4375, …
## $ crown_diameter_m <dbl> 1.952562, 10.201103, 9.154917, 8.634958, 10.398317, 7…
## $ geom             <MULTIPOLYGON [m]> MULTIPOLYGON (((490148.5 43..., MULTIPOL…

this data includes the crown polygons for each window function and 0.1 ha sample plot

itd_tuning_ans2$crowns %>% 
  sf::st_drop_geometry() %>% 
  dplyr::count(sample_number,ws_fn)
##    sample_number     ws_fn  n
## 1              1    exp_fn 31
## 2              1    lin_fn 25
## 3              1 my_linear 33
## 4              2    exp_fn 52
## 5              2    lin_fn 56
## 6              2 my_linear 57
## 7              3    exp_fn 43
## 8              3    lin_fn 40
## 9              3 my_linear 44
## 10             4    exp_fn 36
## 11             4    lin_fn 34
## 12             4 my_linear 41

let’s investigate the suspicious segments where crown diameter is greater than the tree height but limit our search to only trees > 2m in height since it is believable that a 5 foot tree could have a 5 foot crown diameter but it is not believable that a 90 foot tree could have a 90 foot crown diameter, for example

itd_tuning_ans2$crowns %>% 
  sf::st_drop_geometry() %>% 
  dplyr::filter(
    crown_diameter_m>tree_height_m &
    tree_height_m>2
  ) %>%
  dplyr::count(sample_number,ws_fn)
##   sample_number  ws_fn n
## 1             4 exp_fn 1

the exp_fn predicts a single tree with a crown diameter greater than height, let’s see the tree details for those questionable predictions

itd_tuning_ans2$crowns %>% 
  dplyr::filter(
    crown_diameter_m>tree_height_m &
    tree_height_m>2
  ) %>%
  dplyr::select(sample_number,ws_fn,tree_height_m,crown_diameter_m,tree_x,tree_y)
## Simple feature collection with 1 feature and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 490049.5 ymin: 4330561 xmax: 490051 ymax: 4330563
## Projected CRS: WGS 84 / UTM zone 13N
##   sample_number  ws_fn tree_height_m crown_diameter_m   tree_x  tree_y
## 1             4 exp_fn        2.2017         2.358495 490050.1 4330562
##                             geom
## 1 MULTIPOLYGON (((490049.5 43...

let’s check the geometry types

sf::st_geometry_type(itd_tuning_ans2$crowns) %>% 
  droplevels() %>% 
  table()
## .
## MULTIPOLYGON 
##          492

note that there are MULTIPOLYGON geometries which means that the tree crown is, potentially, not a continuous, wholly connected object. This result is common when using rasterized data since the raster cells have potential to “connect” at the corners during segmentation. cloud2trees has functionality to handle these MULTIPOLYGON geometries by selecting the largest area polygon part by predicted segment (i.e. treeID from cloud2trees). let’s simplify the MULTIPOLYGON geometries, calculate the diameter of the new, simplified geometries

crowns_simplified <- itd_tuning_ans2$crowns %>% 
  # we're going to replace the treeID to ensure we have a 
  # unique identifier since the data is a special case where 
  # trees were segmented differently on the same geographic area
  dplyr::ungroup() %>% 
  dplyr::mutate(treeID = dplyr::row_number()) %>% 
  cloud2trees::simplify_multipolygon_crowns() %>% 
  # get the diameter which will be named `diameter_m`
  cloud2trees:::st_calculate_diameter() %>% 
  dplyr::select(-crown_diameter_m) %>% 
  dplyr::rename(crown_diameter_m=diameter_m)
# crowns_simplified %>% dplyr::glimpse()

we should have the same number of predicted trees

identical(
  nrow(crowns_simplified)
  , nrow(itd_tuning_ans2$crowns)
)
## [1] TRUE

we can verify the geometry type in the data now

sf::st_geometry_type(crowns_simplified) %>% 
  droplevels() %>% 
  table()
## .
## POLYGON 
##     492

now let’s look for trees where the where crown diameter is greater than the tree height but limit our search to only trees > 2m in height

crowns_simplified %>% 
  sf::st_drop_geometry() %>% 
  dplyr::filter(
    crown_diameter_m>tree_height_m &
    tree_height_m>2
  ) %>%
  dplyr::count(sample_number,ws_fn)
## # A tibble: 1 × 3
##   sample_number ws_fn      n
##           <int> <chr>  <int>
## 1             4 exp_fn     1

now there is only a single record from the exp_fn where the crown diameter is larger than the tree height. let’s look at the specific record

crowns_simplified %>% 
  sf::st_drop_geometry() %>% 
  dplyr::filter(
    crown_diameter_m>tree_height_m &
    tree_height_m>2
  ) %>%
  dplyr::select(sample_number,ws_fn,tree_height_m,crown_diameter_m,tree_x,tree_y)
## # A tibble: 1 × 6
##   sample_number ws_fn  tree_height_m crown_diameter_m  tree_x   tree_y
##           <int> <chr>          <dbl>            <dbl>   <dbl>    <dbl>
## 1             4 exp_fn          2.20             2.36 490050. 4330562.

the diameter is barely larger than the height for this record which is a relatively small tree…seems like the issue is resolved

we can quickly look at the height versus crown diameter scatter plot with the refined polygon geometries

crowns_simplified %>% 
  sf::st_drop_geometry() %>% 
  dplyr::rename(sample = sample_number) %>%
  ggplot2::ggplot(mapping = ggplot2::aes(x=tree_height_m,y=crown_diameter_m,color=ws_fn)) +
  ggplot2::geom_abline() +
  ggplot2::geom_point(size = 3,alpha=0.88) +
  ggplot2::facet_grid(
    # cols = dplyr::vars(ws_fn)
    cols = dplyr::vars(sample)
    , labeller = ggplot2::label_both
  ) +
  ggplot2::scale_color_viridis_d(name="") +
  ggplot2::scale_x_continuous(limits=c(0,NA), breaks = scales::breaks_extended(n=6)) +
  ggplot2::scale_y_continuous(limits=c(0,NA), breaks = scales::breaks_extended(n=6)) +
  ggplot2::labs(x = "height (m)", y = "crown diameter (m)") +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text = ggplot2::element_text(color = "black", size = 9, face = "bold")
  ) + 
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, size = 6))
  )

looks pretty clean. let’s do one more visualization but we’re leaning toward our custom my_linear function

RGB itd_tunining Visualizations

with the crowns data we can explore alternative visualizations including overlaying the detected trees on the RGB data if available…we happen to have RGB data for a section of this study area, let’s load it

rgb_rast_fnm <- "../data/dom/dom.tif"
rgb_rast <- terra::rast(rgb_rast_fnm) %>% 
  # keep only rgb bands
  terra::subset(c(1,2,3))
# rgb_rast %>% 
#   terra::subset(c(4,1,2)) %>% #nir-r-g
#   terra::plotRGB()
# rgb_rast %>% 
#    terra::subset(c(1,2,3)) %>% 
#   terra::plotRGB()

this is very fine-resolution data

rgb_rast
## class       : SpatRaster 
## size        : 28196, 23887, 3  (nrow, ncol, nlyr)
## resolution  : 0.035, 0.035  (x, y)
## extent      : 489747.1, 490583.1, 4330274, 4331261  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 13N (EPSG:32613) 
## source      : dom.tif 
## names       : dom_1, dom_2, dom_3

make a function to plot the crowns overlaid on the RGB

# make a function to plot these detected crowns with rgb data
plt_rgb_rast_itd_crowns <- function(sample_nmbr = 1, rgb_rast, itd_crowns, plt_lwd = 0.5, my_title = "") {
  # crop
  crp_rgb_rast_temp <- rgb_rast %>% 
    terra::subset(c(1,2,3)) %>% 
    terra::crop(
      itd_crowns %>% 
        dplyr::ungroup() %>% 
        dplyr::filter(sample_number == sample_nmbr) %>% 
        sf::st_union() %>% 
        sf::st_bbox() %>% 
        sf::st_as_sfc() %>% 
        sf::st_buffer(0.2) %>% 
        sf::st_transform(terra::crs(rgb_rast)) %>% 
        terra::vect()
    )
  # convert raster to a data frame and create hex colors
  # ?grDevices::rgb
  rgb_df_temp <-
    crp_rgb_rast_temp %>% 
    terra::as.data.frame(xy = TRUE) %>%
    dplyr::rename(
      red = 3, green = 4, blue = 5
    ) %>%
    dplyr::mutate(
      # rows that have missing color data
      is_missing = is.na(red) | is.na(green) | is.na(blue)
      # hex using 0s for NAs to avoid grDevices::rgb error
      , hex_col = grDevices::rgb(
        ifelse(is_missing, 0, red)
        , ifelse(is_missing, 0, green)
        , ifelse(is_missing, 0, blue)
        , maxColorValue = 255
      )
      # back to NA
      , hex_col = ifelse(is_missing, as.character(NA), hex_col)
    ) %>%
    dplyr::select(-c(is_missing))
      
    # dplyr::glimpse()
  
  # plt
  plt <- ggplot2::ggplot() +
    # add rgb base map
    ggplot2::geom_tile(data = rgb_df_temp, mapping = ggplot2::aes(x = x, y = y, fill = hex_col), color = NA) +
    # use identity scale so the hex codes are used directly
    ggplot2::scale_fill_identity(na.value = "transparent") + # !!! don't take this out or RGB plot will kill your computer
    # overlay polygons
    # ggplot2::geom_sf(data = polys, fill = NA, color = "red", linewidth = 0.5) +
    ggplot2::geom_sf(
      data = itd_crowns %>% 
        dplyr::filter(sample_number==sample_nmbr) %>% 
        # cloud2trees::simplify_multipolygon_crowns() %>% 
        # sf::st_make_valid() %>% 
        # dplyr::filter(sf::st_is_valid(.)) %>% 
        sf::st_transform(terra::crs(crp_rgb_rast_temp))
      , mapping = ggplot2::aes(color = ws_fn)
      , fill = NA
      , lwd = plt_lwd
      , inherit.aes = F
    ) +
    ggplot2::facet_grid(cols = dplyr::vars(ws_fn)) +
    ggplot2::scale_color_viridis_d(name = "") +
    ggplot2::coord_sf(expand = F) +
    ggplot2::labs(subtitle = my_title) +
    ggplot2::theme_void() +
    ggplot2::theme(
      legend.position = "none"
      , strip.text = ggplot2::element_text(face = "bold", color = "black", margin = ggplot2::margin(t = 4, b = 4))
      , strip.background = ggplot2::element_rect(fill = "gray88", color = "gray88")
      , panel.spacing = ggplot2::unit(1,"lines")
    )
  return(plt)
}

plot the trees detected in sample 1 on the RGB

plt_rgb_rast_itd_crowns(
  sample_nmbr = 1
  , rgb_rast = rgb_rast
  , itd_crowns = crowns_simplified
  , my_title = "sample #1"
)
ggplot2::ggsave(filename = "../data/itd_tuning2_rgb_crowns_sample1.jpg", height = 5.5, width = 7.5, dpi = "print")

plot the trees detected in sample 2 on the RGB

plt_rgb_rast_itd_crowns(
  sample_nmbr = 2
  , rgb_rast = rgb_rast
  , itd_crowns = crowns_simplified
  , my_title = "sample #2"
)
ggplot2::ggsave(filename = "../data/itd_tuning2_rgb_crowns_sample2.jpg", height = 4, width = 7.5, dpi = "print")

plot the trees detected in sample 3 on the RGB

plt_rgb_rast_itd_crowns(
  sample_nmbr = 3
  , rgb_rast = rgb_rast
  , itd_crowns = crowns_simplified
  , my_title = "sample #3"
)
ggplot2::ggsave(filename = "../data/itd_tuning2_rgb_crowns_sample3.jpg", height = 4, width = 7.5, dpi = "print")

looks good. let’s go with our custom my_linear function for tree detection over the full stand

Point Cloud Processing

This is a demonstration for a single example data set. We won’t show the processing of each individual data set of the study here to reduce clutter. Instead we’ll process all datasets using the methods demonstrated in this section and review the outputs and analyze the tree detection accuracy in the following sections.

the cloud2trees::cloud2trees() function combines methods in the cloud2trees package for an all-in-one approach, we’ll the function to:

We’ll set the options in the function to:

  • accuracy_level = 2 - uses triangulation with high point density (20 pts/m2) to height normalize the points
  • keep_intrmdt = T - keeps intermediate files created in the processing (e.g. height-normalized points)
  • dtm_res_m = 0.5 - sets the output DTM resolution to 0.5x0.5 m
  • chm_res_m = 0.25 - sets the output CHM resolution to 0.25x0.25 m which is also used for tree detection
  • min_height = 1.37 - the minimum height of a predicted segment that is required to be kept as a detected “tree”
  • ws = my_linear - the window function used in ITD
  • estimate_tree_dbh = TRUE - estimates DBH based on tree height
  • estimate_tree_type = TRUE - estimates FIA forest type group for the tree list
  • estimate_tree_hmd = TRUE - estimates height of maximum crown diameter for the trees
  • hmd_tree_sample_n = 20000 - samples 20k trees to build HMD model for filling missing values
  • hmd_estimate_missing_hmd = TRUE - fills in HMD for values not successfully extracted from the point cloud
  • estimate_tree_cbh = TRUE - estimates crown base height for the trees
  • cbh_tree_sample_n = 7500 - samples 7.5k trees to build CBH model for filling missing values
  • cbh_estimate_missing_cbh = TRUE - fills in CBH for values not successfully extracted from the point cloud
  • estimate_biomass_method = c("landfire","cruz") - estimates tree crown biomass and CBD
# outdir
c2t_output_dir <- "../data/processed_N1_400AGL_20MPH_TFOFF"
if(!dir.exists(c2t_output_dir)) dir.create(c2t_output_dir)
c2t_process_dir <- file.path(c2t_output_dir, "point_cloud_processing_delivery")
##############################################################
# cloud2trees::cloud2trees
##############################################################
if(
  !file.exists( file.path(c2t_process_dir, "chm_0.25m.tif") )
  || !file.exists( file.path(c2t_process_dir, "dtm_0.5m.tif") )
  || !file.exists( file.path(c2t_process_dir, "final_detected_tree_tops.gpkg") )
  || !file.exists( file.path(c2t_process_dir, "final_detected_crowns.gpkg") )
){
  # run it
  # cloud2trees
  cloud2trees_ans <- cloud2trees::cloud2trees(
    output_dir = c2t_output_dir
    , input_las_dir = point_cld_folder
    , accuracy_level = 2
    , keep_intrmdt = T
    , dtm_res_m = 0.5
    , chm_res_m = 0.25
    , min_height = 1.37 # 1.37 = DBH
    , ws = my_linear # a custom function
    ###################################
    # optional parameters to get more tree attributes
    ###################################
    , estimate_tree_dbh = TRUE # DBH
    , estimate_tree_type = TRUE # FIA type
    , estimate_tree_hmd = TRUE # HMD
    , hmd_tree_sample_n = 20000 # HMD
    , hmd_estimate_missing_hmd = TRUE # HMD
    , estimate_tree_cbh = TRUE # CBH
    , cbh_tree_sample_n = 7500
    , cbh_estimate_missing_cbh = TRUE # CBH
    , estimate_biomass_method = c("landfire","cruz") # crown biomass and CBD
  )
  
}else{
  dtm_temp <- terra::rast( file.path(c2t_process_dir, "dtm_0.5m.tif") )
  chm_temp <- terra::rast( file.path(c2t_process_dir, "chm_0.25m.tif") )
  crowns_temp <- sf::st_read(file.path(c2t_process_dir, "final_detected_crowns.gpkg"), quiet = T)
  ttops_temp <- sf::st_read(file.path(c2t_process_dir, "final_detected_tree_tops.gpkg"), quiet = T)
  
  cloud2trees_ans <- list(
    "dtm_rast" = dtm_temp
    , "chm_rast" = chm_temp
    , "crowns_sf" = crowns_temp
    , "treetops_sf" = ttops_temp
  )
}

Raster Products

there’s a DTM

# plot to check out the fine-resolution DTM raster
cloud2trees_ans$dtm_rast %>% 
  terra::plot(col = harrypotter::hp(n=100, option = "mischief"), main = "DTM (m)")

there’s a CHM

# plot to check out the fine-resolution CHM raster
cloud2trees_ans$chm_rast %>% 
  terra::plot(col = viridis::plasma(n=100), main = "CHM (m)")

let’s see some details about the CHM

# what chm?
cloud2trees_ans$chm_rast
## class       : SpatRaster 
## size        : 2566, 1939, 1  (nrow, ncol, nlyr)
## resolution  : 0.25, 0.25  (x, y)
## extent      : 489922, 490406.8, 4330448, 4331090  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 13N (EPSG:32613) 
## source      : chm_0.25m.tif 
## name        :     chm 
## min value   :  1.3718 
## max value   : 27.3758

Tree list

let’s check out the tree top point data (the crowns data will have the same data attributes but have polygon geometry instead of point geometry)

cloud2trees_ans$treetops_sf %>% 
  dplyr::glimpse()
## Rows: 13,150
## Columns: 5
## $ treeID        <chr> "1_490327.1_4331074.4", "2_490327.6_4331074.4", "3_49032…
## $ tree_height_m <dbl> 1.8375, 1.8375, 1.8375, 2.7713, 2.9873, 2.9040, 4.8339, …
## $ crown_area_m2 <dbl> 0.2500, 0.1250, 0.1250, 0.2500, 0.6250, 0.6250, 2.1250, …
## $ dbh_cm        <dbl> 3.665011, 3.638346, 3.658594, 4.939101, 5.246245, 5.1429…
## $ geom          <POINT [m]> POINT (490327.1 4331074), POINT (490327.6 4331074)…

that’s a lot of data

let’s check the relationship between height and DBH as estimated by the regional allometric relationship

cloud2trees_ans$treetops_sf %>% 
  sf::st_drop_geometry() %>% 
  dplyr::slice_sample(prop = 0.55) %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(x = tree_height_m, y = dbh_cm)) + 
  ggplot2::geom_point(color = "navy", alpha = 0.6) +
  ggplot2::labs(x = "tree ht. (m)", y = "tree DBH (cm)") +
  ggplot2::scale_x_continuous(limits = c(0,NA), breaks = scales::breaks_extended(n=8)) +
  ggplot2::scale_y_continuous(limits = c(0,NA), breaks = scales::breaks_extended(n=8)) +
  ggplot2::theme_light()

Let’s look at the distribution of tree diameter in our study area

cloud2trees_ans$treetops_sf %>% 
  sf::st_drop_geometry() %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(x = dbh_cm)) +
  ggplot2::geom_density(fill = "brown", color = "brown", alpha = 0.2) +
  ggplot2::scale_x_continuous(breaks = scales::breaks_extended(11)) +
  ggplot2::labs(x = "tree DBH (cm)", y = "") +
  ggplot2::theme_light() +
  ggplot2::theme(axis.text.y = ggplot2::element_blank(), axis.ticks.y = ggplot2::element_blank())

let’s look at the summary statistics

cloud2trees_ans$treetops_sf$dbh_cm %>% summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.027   5.341  11.006  18.007  31.627  54.495

lots of small trees. what if we only count trees that are at least 2 m in height?

cloud2trees_ans$treetops_sf %>% 
  sf::st_drop_geometry() %>% 
  dplyr::filter(tree_height_m>=2) %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(x = dbh_cm)) +
  ggplot2::geom_density(fill = "brown", color = "brown", alpha = 0.2) +
  ggplot2::scale_x_continuous(breaks = scales::breaks_extended(11)) +
  ggplot2::labs(x = "tree DBH (cm)", y = "") +
  ggplot2::theme_light() +
  ggplot2::theme(axis.text.y = ggplot2::element_blank(), axis.ticks.y = ggplot2::element_blank())

still a two-aged stand

Tree Crowns

let’s quickly look at the central 2.5 ha area of our RGB extent and the tree crowns detected

aoi_temp <- terra::ext(rgb_rast) %>% 
  terra::vect() %>% 
  sf::st_as_sf() %>% 
  sf::st_set_crs(terra::crs(rgb_rast)) %>% 
  sf::st_centroid() %>% 
  sf::st_buffer(sqrt(25000/4),endCapStyle = "SQUARE") ## numerator = desired plot size in m2
# plot
rgb_rast %>% 
  terra::crop(
    aoi_temp %>% sf::st_buffer(10) %>% terra::vect()
    , mask = T
  ) %>% 
  terra::plotRGB()
# add crowns
crowns_temp <- 
  cloud2trees_ans$crowns_sf %>% 
  dplyr::inner_join(
    cloud2trees_ans$treetops_sf %>% 
      dplyr::filter(tree_height_m>=2) %>% 
      sf::st_transform(terra::crs(rgb_rast)) %>% 
      sf::st_intersection(aoi_temp) %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(treeID)
    , by = "treeID"
  ) %>% 
  cloud2trees::simplify_multipolygon_crowns()
# add to plot
crowns_temp %>% 
  sf::st_transform(terra::crs(rgb_rast)) %>% 
  terra::vect() %>% 
  terra::plot(col = NA, border = "cyan", add = T, lwd = 1)

looks good

and the same area on the CHM

# plot
cloud2trees_ans$chm_rast %>% 
  terra::crop(
    aoi_temp %>% 
      sf::st_buffer(10) %>% 
      sf::st_transform(terra::crs(cloud2trees_ans$chm_rast)) %>% 
      terra::vect()
    , mask = T
  ) %>% 
  terra::plot(col = viridis::plasma(n=100), alpha = 0.9, main = "CHM (m)", axes = F)
# add crowns
# add to plot
crowns_temp %>% 
  sf::st_transform(terra::crs(cloud2trees_ans$chm_rast)) %>% 
  terra::vect() %>% 
  terra::plot(col = NA, border = "cyan", add = T, lwd = 1)

interesting

Review of All Data Sets

We have now processed all of the data sets for this study following the methods outlined in the previous section. Here, we’ll review the output data generated by each data set where each data set represents a unique UAS flight plan. Remember, the objective of the study is to compare the tree detection and forest structure outcomes across different UAS-lidar flight plan settings.

Tracking

get a list of the data available and parse the folder names to create a data set with all relevant variables for tracking the different flight setting parameters

tracking_df <-
  list.dirs("../data/processed", recursive = F) %>% 
  dplyr::tibble() %>%
  dplyr::rename(path = 1) %>% 
  dplyr::mutate(
    folder = basename(path) %>% tolower()
    # , path = normalizePath(path)
  ) %>%
  tidyr::extract(
    col = folder
    , into = c("agl", "mph", "tf")
    , regex = "([0-9]+)agl_([0-9]+)mph_tf(on|off)"
    , remove = F
    , convert = T
  ) %>% 
  dplyr::arrange(agl, mph, tf) %>% 
  dplyr::mutate(
    dataset_factor = 
      stringr::str_glue("AGL: {agl} | MPH: {mph} | TF: {tf}") %>%
      forcats::as_factor() %>%
      forcats::fct_inorder(ordered = T)
    , tf = tf=="on"
  ) %>% 
  dplyr::rename_with(.cols = -c(path,folder,dataset_factor), .fn = ~paste0("flight_",.x,recycle0 = T) )
# huh?
dplyr::glimpse(tracking_df)
## Rows: 9
## Columns: 6
## $ path           <chr> "../data/processed/processed_N1_200AGL_10MPH_TFOn", "..…
## $ folder         <chr> "processed_n1_200agl_10mph_tfon", "processed_n1_200agl_…
## $ flight_agl     <int> 200, 200, 200, 300, 300, 300, 400, 400, 400
## $ flight_mph     <int> 10, 20, 20, 10, 20, 20, 10, 20, 20
## $ flight_tf      <lgl> TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE
## $ dataset_factor <ord> AGL: 200 | MPH: 10 | TF: on, AGL: 200 | MPH: 20 | TF: o…
# View(tracking_df)

Raster Data and Study Boundary

now, we’re going to load in the CHM data for each flight

chm_rast <-
  tracking_df$path %>% 
  purrr::map(
    \(x)
    terra::rast(
      file.path(x, "point_cloud_processing_delivery", "chm_0.25m.tif")
    )
  )
names(chm_rast) <- tracking_df$folder
# huh?
chm_rast[1:min(length(chm_rast),2)]
## $processed_n1_200agl_10mph_tfon
## class       : SpatRaster 
## size        : 2175, 1725, 1  (nrow, ncol, nlyr)
## resolution  : 0.25, 0.25  (x, y)
## extent      : 489942.8, 490374, 4330489, 4331033  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 13N (EPSG:32613) 
## source      : chm_0.25m.tif 
## name        : focal_mean 
## min value   :     1.3708 
## max value   :    27.4069 
## 
## $processed_n1_200agl_20mph_tfoff
## class       : SpatRaster 
## size        : 2171, 1735, 1  (nrow, ncol, nlyr)
## resolution  : 0.25, 0.25  (x, y)
## extent      : 489945.8, 490379.5, 4330496, 4331039  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 13N (EPSG:32613) 
## source      : chm_0.25m.tif 
## name        : focal_mean 
## min value   :     1.3702 
## max value   :    27.4061

and we’ll load in the stand boundary to get an idea of how our UAS data relates

stand_boundary <-
  sf::st_read(
    "../data/N1_Boundary/N1_Boundary.shp"
    # "c:/data/usfs/uas_sfm_tree_detection/data/field_validation/field_data_Boundary.shp"
    , quiet = T
  ) %>% 
  dplyr::rename_with(tolower) %>% 
  dplyr::filter(
    name=="N1"
    # site=="N1"
  ) %>% 
  sf::st_transform(terra::crs(chm_rast[[1]])) %>% 
  dplyr::mutate(stand_area_m2 = sf::st_area(.) %>% as.numeric())
# what
stand_boundary %>% dplyr::glimpse()
## Rows: 1
## Columns: 6
## $ name          <chr> "N1"
## $ acres         <dbl> 22.488
## $ comment       <chr> "Unmanaged Stand 99% PIPO - dense regen patches - comple…
## $ inventory     <chr> "Stem Map"
## $ geometry      <POLYGON [m]> POLYGON ((490012.8 4330596,...
## $ stand_area_m2 <dbl> 92941.05

plot CHMs

graphics::par(mfrow = c(ceiling(length(chm_rast)/3),3)) # mfrow = c(nr, nc)
# use purrr::walk() because we only want it to draw the plot
purrr::iwalk(
  chm_rast
  , function(x,nm){
    terra::plot(
      x
      # , mar = c(b,l,t,r)
      , mar = c(0.5,0.5,2,1.3)
      , col = viridis::plasma(n=100)
      , axes = F
      , main = nm
      , plg = list(
        # title = "CHM (m)"
        cex = 0.8, title.cex = 0.9, shrink = 0.6
      )
    )
    # add stand
    terra::plot(terra::vect(stand_boundary), lwd = 3, col = NA, border = "blue", axes = F, add = T)
})

# reset graphics::par
graphics::par(mfrow = c(1, 1))

what about the DTMs?

dtm_rast <- 
  tracking_df$path %>% 
  purrr::map(
    \(x)
    terra::rast(
      file.path(x, "point_cloud_processing_delivery", "dtm_0.5m.tif")
    )
  )
names(dtm_rast) <- tracking_df$folder
# huh?
dtm_rast[1:min(length(dtm_rast),2)]
## $processed_n1_200agl_10mph_tfon
## class       : SpatRaster 
## size        : 1088, 863, 1  (nrow, ncol, nlyr)
## resolution  : 0.5, 0.5  (x, y)
## extent      : 489942.5, 490374, 4330489, 4331033  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 13N (EPSG:32613) 
## source      : dtm_0.5m.tif 
## name        : 1_dtm_0.5m 
## min value   :   2332.557 
## max value   :   2370.654 
## 
## $processed_n1_200agl_20mph_tfoff
## class       : SpatRaster 
## size        : 1087, 869, 1  (nrow, ncol, nlyr)
## resolution  : 0.5, 0.5  (x, y)
## extent      : 489945.5, 490380, 4330496, 4331040  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 13N (EPSG:32613) 
## source      : dtm_0.5m.tif 
## name        : 1_dtm_0.5m 
## min value   :   2332.226 
## max value   :   2369.860

plot DTMs

graphics::par(mfrow = c(ceiling(length(dtm_rast)/3),3)) # mfrow = c(nr, nc)
# use purrr::walk() because we only want it to draw the plot
purrr::iwalk(
  dtm_rast
  , function(x,nm){
    terra::plot(
      x
      # , mar = c(b,l,t,r)
      , mar = c(0.5,0.5,2,1.3)
      , col = harrypotter::hp(n=100, option = "mischief", direction = -1)
      , axes = F
      , main = nm
      , plg = list(
        # title = "DTM (m)"
        cex = 0.8, title.cex = 0.9, shrink = 0.6
      )
    )
    # add stand
    terra::plot(terra::vect(stand_boundary), lwd = 3, col = NA, border = "blue", axes = F, add = T)
})

# reset graphics::par
graphics::par(mfrow = c(1, 1))

because the “300 AGL, 20 MPH, TF on” flight is missing most of the plot area we’ll remove it from further analysis. It began to rain during this flight which cut the flight short and the water content in the air likely impacted the laser scanning sensor results

# remove from tracking
tracking_df <- 
  tracking_df %>% 
  dplyr::filter(folder != "processed_n1_300agl_20mph_tfon")
# remove from chm
chm_rast <- chm_rast %>% 
  purrr::discard_at("processed_n1_300agl_20mph_tfon")

let’s see what data we are left with for evaluation

# quality
tracking_df %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
        flight_tf ~ "Terrain Follow: On"
        , !flight_tf ~ "Terrain Follow: Off"
        , T ~ "error"
      )
    , dplyr::across(tidyselect::starts_with("flight_"), as.factor)
    , flight_mph = forcats::fct_rev(flight_mph)
  ) %>% 
  dplyr::arrange(flight_tf,desc(flight_mph),desc(flight_agl)) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = flight_agl, y = flight_mph, label = flight_tf, color = flight_tf, fontface = "bold")
  ) +
    ggplot2::geom_tile(fill = NA, color = "black") +
    ggrepel::geom_text_repel(direction="y",nudge_x = 0,seed=1) +
    ggplot2::labs() +
    ggplot2::scale_x_discrete(position = "top") +
    ggplot2::coord_cartesian(expand = F) +
    ggplot2::scale_color_manual(values = c("gray55","gray11"), guide = "none") +
    ggplot2::theme_light() +
    ggplot2::theme(
      panel.grid = ggplot2::element_blank()
      , axis.text = ggplot2::element_text(size = 11, color = "black")
      , axis.title = ggplot2::element_text(size = 12, face = "bold", color = "black")
      , panel.border = ggplot2::element_rect(color = "black")
    )

the structure of the data (unbalanced) we have available means statistical testing of the flight settings using frequentist methods will be limited. we only have variation in the terrain following at the 20 mph speed, thus we will not be able to test for the impact of terrain following at different speeds.

Field Trees

let’s load the field data of stem mapped trees with individual tree measurements for use in validation

# filter for height to match UAS filtering
my_tree_ht_m <- 2
# read and filter
validation_trees <-
  # sf::st_read("c:/data/usfs/uas_sfm_tree_detection/data/field_validation/N1.gpkg", quiet = T) %>%
  readxl::read_excel("../data/N1_2024_Field_Trees.xlsx") %>% 
  dplyr::rename_with(
    ~ tolower(.x) %>% 
      stringr::str_squish() %>% 
      make.names() %>% 
      stringr::str_replace_all( "\\.{2,}", ".") %>% 
      stringr::str_remove( "\\.$") %>% 
      stringr::str_replace_all( "\\.", "_")
  ) %>% 
  sf::st_as_sf(
    coords = c("easting", "northing")
    , crs = 26913
    , remove=F
  ) %>% 
  dplyr::rename(
    # field_dbh_cm = dbh_cm
    # , field_tree_height_m = ht_m
    field_dbh_in = dbh_24
    , field_tree_height_ft = ht_24
  ) %>% 
  # dplyr::select(field_dbh_in, field_tree_height_ft) %>% summary()
  dplyr::mutate(
    field_dbh_cm = field_dbh_in/0.394
    , field_tree_height_m = field_tree_height_ft/3.281
  ) %>% 
  # dplyr::select(field_dbh_cm, field_tree_height_m) %>% summary()
  sf::st_set_geometry("geometry") %>% 
  dplyr::filter(
    !is.na(field_dbh_cm)
    & !is.na(field_tree_height_m)
    & sf::st_is_valid(geometry)
    # only keep trees that are above height threshold used for uas processing
    & field_tree_height_m >= my_tree_ht_m
    # & field_dbh_cm >= min_tree_dbh_cm # if know min field dbh for field sampling
  ) %>% 
  sf::st_transform(terra::crs(chm_rast[[1]])) %>% 
  sf::st_intersection(
    stand_boundary %>%
      dplyr::select(stand_area_m2)
  ) %>%
  dplyr::mutate(
    field_tree_id = dplyr::row_number()
    , tree_utm_x = sf::st_coordinates(geometry)[,1] #lon
    , tree_utm_y = sf::st_coordinates(geometry)[,2] #lat
    # basal area
    , basal_area_m2 = pi * field_dbh_cm^2 / (4*10000)
  ) %>% 
  dplyr::relocate(field_tree_id)
# huh?
validation_trees %>% dplyr::glimpse()
## Rows: 5,217
## Columns: 22
## $ field_tree_id        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
## $ tag_2024             <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
## $ sect                 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ northing             <dbl> 4330895, 4330896, 4330896, 4330890, 4330889, 4330…
## $ easting              <dbl> 489992.3, 489993.1, 489994.1, 489996.5, 489995.7,…
## $ ns                   <dbl> 985.7, 987.6, 986.9, 968.2, 963.4, 958.7, 949.5, …
## $ ew                   <dbl> 1.0, 3.8, 7.1, 13.6, 10.6, 18.1, 16.9, 17.6, 21.8…
## $ spp                  <chr> "Pipo", "Pipo", "Pipo", "Pipo", "Pipo", "Pipo", "…
## $ status_24            <chr> "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",…
## $ field_dbh_in         <dbl> 14.9, 7.4, 7.5, 7.3, 16.8, 6.8, 14.3, 15.8, 18.2,…
## $ field_tree_height_ft <dbl> 66, 33, 23, 37, 77, 39, 66, 80, 79, 33, 22, 68, 6…
## $ lll_24               <chr> "26", "16", "11", "8", "29", "3", "31", "40", "42…
## $ height_defect        <chr> NA, "DT", NA, NA, NA, NA, "DT", NA, NA, NA, NA, N…
## $ field_check          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ comments_24          <chr> NA, "dead top", NA, NA, NA, NA, "dead top", NA, N…
## $ field_dbh_cm         <dbl> 37.817259, 18.781726, 19.035533, 18.527919, 42.63…
## $ field_tree_height_m  <dbl> 20.115818, 10.057909, 7.010058, 11.277050, 23.468…
## $ stand_area_m2        <dbl> 92941.05, 92941.05, 92941.05, 92941.05, 92941.05,…
## $ geometry             <POINT [m]> POINT (489992.3 4330895), POINT (489993.1 4…
## $ tree_utm_x           <dbl> 489992.3, 489993.1, 489994.1, 489996.5, 489995.7,…
## $ tree_utm_y           <dbl> 4330895, 4330896, 4330896, 4330890, 4330889, 4330…
## $ basal_area_m2        <dbl> 0.112323331, 0.027705174, 0.028459022, 0.02696144…

look at a map of the validation trees over the study boundary

mapview::mapview(
  stand_boundary
  , color = "blue"
  , lwd = 1.5
  , alpha.regions = 0
  , label = F
  , legend = F
  , popup = F
  , layer.name = "study boundary"
  , leaflet.options = list(maxZoom = 16)
) + 
mapview::mapview(
  validation_trees
  , zcol = "field_tree_height_m"
  , col.regions = viridis::plasma(n=100)
  , cex = 3
  , layer.name = "field tree ht. (m)"
)

let’s check out very high-level stand summary metrics

validation_trees %>% 
  sf::st_drop_geometry() %>% 
  dplyr::group_by(stand_area_m2) %>% 
  dplyr::summarise(
    dplyr::across(
      tidyselect::starts_with("field_")
      , .fns = list(mean = mean, sd = sd)
    )
    , n_trees = dplyr::n()
    , basal_area_m2 = sum(basal_area_m2)
  ) %>% 
  # add area
  dplyr::mutate(stand_area_ha = stand_area_m2/10000) %>% 
  dplyr::mutate(
    ht = paste0(
      field_tree_height_m_mean %>% 
        round(1) %>% 
        scales::comma(accuracy = 0.1)
      , "<br>("
      , field_tree_height_m_sd %>% 
          round(1) %>% 
          scales::comma(accuracy = 0.1)
      , ")"
    )
    , dbh = paste0(
      field_dbh_cm_mean %>% round(1) %>% scales::comma(accuracy = 0.1)
      , "<br>("
      , field_dbh_cm_sd %>% round(1) %>% scales::comma(accuracy = 0.1)
      , ")"
    )
    , trees_ha = n_trees/stand_area_ha
    , basal_area_m2_per_ha = basal_area_m2/stand_area_ha
    , stand_area_ha = stand_area_ha %>% round(1) %>% scales::comma(accuracy = 0.1)
    , n_trees = scales::comma(n_trees,accuracy=1)
  ) %>% 
  dplyr::select(
    stand_area_ha, n_trees, trees_ha, basal_area_m2_per_ha, ht, dbh
  ) %>% 
  kableExtra::kbl(
    caption = "Stand Summary of Field Data"
    , escape = F
    , digits = 1
    , col.names = c(
      "Hectares", "# trees"
      , "Trees<br>ha<sup>-1</sup>"
      , "Basal Area<br>m<sup>2</sup> ha<sup>-1</sup>"
      , "Height (m)", "DBH (cm)"
    )
  ) %>% 
  kableExtra::kable_styling()
Stand Summary of Field Data
Hectares # trees Trees
ha-1
Basal Area
m2 ha-1
Height (m) DBH (cm)
9.3 5,217 561.3 24.8 9.9
(7.3)
17.7
(15.7)

UAS Predicted Trees

let’s load the UAS predicted trees

predicted_trees <- 
  tracking_df$path %>% 
  # .[1:2] %>%
  purrr::map(function(i){
    dta <- 
      file.path(i, "point_cloud_processing_delivery", "final_detected_tree_tops.gpkg") %>% 
        sf::st_read(quiet=F) %>% 
        dplyr::filter(tree_height_m >= my_tree_ht_m) %>% 
        # clip to study boundary
        sf::st_intersection(
          stand_boundary %>% 
            dplyr::select(stand_area_m2) %>% 
            sf::st_buffer(3/2) # 0.5*max_dist_m where max_dist_m=allowable search distance for TP match
        ) %>% 
        dplyr::mutate(
          basal_area_m2 = pi * dbh_cm^2 / (4*10000)
          , las_overlap_stand_area_m2 = 
            file.path(i, "point_cloud_processing_delivery", "raw_las_ctg_info.gpkg") %>% 
              sf::st_read(quiet=T) %>% 
              sf::st_union() %>% 
              # sf::st_area() %>% as.numeric()
              # clip to study boundary
              sf::st_intersection(stand_boundary) %>% 
              sf::st_area() %>% as.numeric() %>% 
              sum()
        )
    # flag if in stand...do this to allow for trees outside of stand but might still match with reference due to lean, gps error, etc
    dta <- 
      dta %>% 
        dplyr::left_join(
          dta %>% 
            # clip to study boundary
            sf::st_intersection(
              stand_boundary %>% 
                dplyr::select(stand_area_m2) %>% 
                dplyr::mutate(is_in_stand = T)
            ) %>% 
            sf::st_drop_geometry() %>% 
            dplyr::select(treeID, is_in_stand)
          , by = "treeID"
        ) %>% 
        dplyr::mutate(
          is_in_stand = dplyr::coalesce(is_in_stand,F)
        )
    return (dta)
  })
names(predicted_trees) <- tracking_df$folder

let’s check the number of predicted trees for each data set

predicted_trees %>% 
  purrr::map_dfr(\(x) x %>% dplyr::filter(is_in_stand) %>% nrow()) %>% 
  tidyr::pivot_longer(dplyr::everything()) %>% 
  dplyr::rename(folder=name) %>% 
  dplyr::inner_join(tracking_df, by="folder") %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "On"
      , !flight_tf ~ "Off"
      , T ~ "error"
    )
  ) %>% 
  dplyr::select(tidyselect::starts_with("flight_"), value) %>% 
  dplyr::arrange(flight_agl) %>% 
  kableExtra::kbl(
    caption = "Predicted trees by flight"
    , col.names = c(
      "altitude (ft)", "speed (mph)", "Terrain Follow", "predicted trees"
    )
    , escape = F
  ) %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::collapse_rows(columns = c(1:2), valign = "top")
Predicted trees by flight
altitude (ft) speed (mph) Terrain Follow predicted trees
200 10 On 5267
20 Off 5513
On 5486
300 10 On 5408
20 Off 5493
400 10 On 5438
20 Off 5579
On 5516

check the DBH and height distribution for each data set

predicted_trees %>% 
  dplyr::bind_rows(.id = "folder") %>% 
  sf::st_drop_geometry() %>% 
  dplyr::filter(is_in_stand) %>% 
  dplyr::select(folder, tree_height_m, dbh_cm) %>% 
  tidyr::pivot_longer(cols = -c(folder)) %>% 
  dplyr::mutate(
    name = dplyr::recode_values(
      name
      , "tree_height_m" ~ "Height (m)"
      , "dbh_cm" ~ "DBH (cm)"
    )
  ) %>% 
  dplyr::inner_join(tracking_df, by="folder") %>% 
  dplyr::mutate(
    dplyr::across(tidyselect::starts_with("flight_"), as.factor)
    , dataset_factor = forcats::fct_rev(dataset_factor)
  ) %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(x=value,y=dataset_factor,fill=flight_agl)) +
  ggplot2::geom_boxplot(width=0.6, outliers = F, ) +
  ggplot2::facet_grid(
    cols = dplyr::vars(name)
    , scales = "free_x"
    # , axes = "all"
  ) +
  ggplot2::scale_fill_brewer(palette = "Blues") +
  ggplot2::scale_x_continuous(labels = scales::comma, breaks = scales::breaks_extended(7)) +
  ggplot2::theme_light() +
  ggplot2::labs(x = "", y = "", subtitle = "Predicted tree height and DBH by flight") +
  ggplot2::theme(
    legend.position = "none"
    , axis.text.y = ggplot2::element_text(size = 9, face = "bold")
    , axis.text.x = ggplot2::element_text(size = 8)
    , strip.text = ggplot2::element_text(size = 10, color = "black", face = "bold")
  )

both the count of trees and the height and DBH distributions look very similar across the different flight settings…this may be an early indication of limited influence of the flight settings on tree detection and forest structure quantification

let’s check out very high-level stand summary metrics

predicted_trees %>% 
  purrr::map_dfr(
    \(x)
    x %>% 
      sf::st_drop_geometry() %>% 
      dplyr::filter(is_in_stand) %>% 
      dplyr::group_by(las_overlap_stand_area_m2) %>% 
      dplyr::summarise(
        dplyr::across(
          c(tree_height_m, dbh_cm)
          , .fns = list(mean = mean, sd = sd)
        )
        , n_trees = dplyr::n()
        , basal_area_m2 = sum(basal_area_m2)
      ) %>% 
      # add area
      dplyr::mutate(stand_area_ha = las_overlap_stand_area_m2/10000) %>% 
      dplyr::mutate(
        ht = paste0(
          tree_height_m_mean %>% 
            round(1) %>% 
            scales::comma(accuracy = 0.1)
          , "<br>("
          , tree_height_m_sd %>% 
              round(1) %>% 
              scales::comma(accuracy = 0.1)
          , ")"
        )
        , dbh = paste0(
          dbh_cm_mean %>% round(1) %>% scales::comma(accuracy = 0.1)
          , "<br>("
          , dbh_cm_sd %>% round(1) %>% scales::comma(accuracy = 0.1)
          , ")"
        )
        , trees_ha = n_trees/stand_area_ha
        , basal_area_m2_per_ha = basal_area_m2/stand_area_ha
        , stand_area_ha = stand_area_ha %>% round(1) %>% scales::comma(accuracy = 0.1)
      ) %>% 
      dplyr::select(
        stand_area_ha, n_trees, trees_ha, basal_area_m2_per_ha, ht, dbh
      )
    , .id = "folder"
  ) %>% 
  # dplyr::glimpse()
  dplyr::inner_join(
    tracking_df %>% dplyr::select(folder, tidyselect::starts_with("flight_"))
    , by="folder"
  ) %>% 
  dplyr::select(-folder) %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "On"
      , !flight_tf ~ "Off"
      , T ~ "error"
    )
  ) %>% 
  dplyr::arrange(flight_agl) %>% 
  dplyr::relocate(
    c(stand_area_ha, n_trees, trees_ha, basal_area_m2_per_ha, ht, dbh)
    , .after = dplyr::last_col()
  ) %>% 
  # dplyr::glimpse() %>% 
  kableExtra::kbl(
    caption = "Stand Summary of Predicted Data"
    , col.names = c(
      "altitude (ft)", "speed (mph)", "Terrain Follow"
      , "Coverage<br>Hectares", "# trees"
      , "Trees<br>ha<sup>-1</sup>"
      , "Basal Area<br>m<sup>2</sup> ha<sup>-1</sup>"
      , "Height (m)", "DBH (cm)"
    )
    , escape = F
    , digits = 1
  ) %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::collapse_rows(columns = c(1:2), valign = "top")
Stand Summary of Predicted Data
altitude (ft) speed (mph) Terrain Follow Coverage
Hectares
# trees Trees
ha-1
Basal Area
m2 ha-1
Height (m) DBH (cm)
200 10 On 9.3 5267 566.7 27.2 10.1
(7.5)
19.2
(15.6)
20 Off 9.3 5513 593.2 27.7 9.9
(7.5)
18.9
(15.4)
On 9.3 5486 590.3 27.3 9.9
(7.5)
18.8
(15.4)
300 10 On 9.3 5408 581.9 27.8 10.1
(7.5)
19.2
(15.5)
20 Off 9.3 5493 591.0 27.8 10.0
(7.5)
19.0
(15.4)
400 10 On 9.3 5438 585.1 28.3 10.1
(7.5)
19.4
(15.4)
20 Off 9.3 5579 600.3 28.5 9.9
(7.5)
19.2
(15.4)
On 9.3 5516 593.5 28.3 10.0
(7.5)
19.3
(15.4)

Accuracy Evaluation

We’ll follow Tinkham and Swayze (2021) and Tinkham and Woolsey (2024) who describe a methodology for matching UAS detected trees with stem mapped trees identified via traditional field methods. Note, detected trees in the excerpt below references UAS detected trees.

The UAS-SfM detected trees were matched with the field-inventoried trees through an iterative process…Within a data set, a UAS-SfM target tree was selected and intersected with all field trees within a 3 m radius and within 2 m in height of the target UAS tree location and height. If there were multiple matched field trees, then the tree closest in height was considered the True Positive match and the matched UAS and field trees were removed from further matching. The remining UAS and field trees were matched iteratively until no additional tree pairs could be identified using this process. If no field tree could be identified after this matching process, the UAS tree was considered a False Positive with any remaining unmatched field trees classified as False Negatives. (Tinkham and Woolsey 2024, p.7).

After performing instance matching, we’ll calculate detection accuracy metrics by aggregating raw true positive (TP), false positive (FP; commission), and false negative (FN; omission) counts to quantify the method’s ability to find the reference (ground truth; validation) trees. Then, based on the TP matches, we’ll quantify the

Detection Accuracy Functions

we’ll make a function that uses the following input data parameters to match predicted trees with validation/reference trees:

  • pred_data: predicted data point locations as sf data
  • ref_data: reference/validation/ground truth data point locations as sf data
  • max_dist_m: radius of search for matching
  • max_height_error_m: maximum allowable difference in height for to be considered correct match

For instance matching using the point locations, we’ll use the RANN package which enables KD-tree searching/processing to build the tree which is a super-fast way to find the x number of near neighbors for each point in an input dataset (see RANN::nn2()).

Then, we’ll aggregate the instance matching results to evaluate omission rate (false negative rate or miss rate), commission rate (false positive rate), precision, recall (detection rate), and the F-score metric. As a reminder, true positive (TP) instances correctly match ground truth instances with a prediction, commission predictions do not match a ground truth instance (false positive; FP), and omissions are ground truth instances for which no predictions match (false negative; FN)

\[\textrm{omission rate} = \frac{FN}{TP+FN}\]

\[\textrm{commission rate} = \frac{FP}{TP+FP}\]

\[\textrm{precision} = \frac{TP}{TP+FP}\]

\[\textrm{recall} = \frac{TP}{TP+FN}\]

\[ \textrm{F-score} = 2 \times \frac{\bigl(precision \times recall \bigr)}{\bigl(precision + recall \bigr)} \]

###############################################################
###############################################################
###############################################################
###############################################################
# check df function
###############################################################
###############################################################
###############################################################
###############################################################
validate_df_fn <- function(data, required_cols = c("tree_height_m"), check_numeric = F) {
  # check data
  if(!inherits(data,"sf") && !inherits(data,"sfc")){stop("all input data requires POINT sf data")}
  if(
    !all( sf::st_is(data, c("POINT")) )
  ){
    stop("all input data requires POINT sf data")
  }
  # names
  # uses base::setdiff and base::names
  missing_cols <- setdiff(required_cols, names(data))
  
  if(length(missing_cols) > 0){
    stop("all input data requires columns: ", paste(missing_cols, collapse = ", "))
  }
  
  # numeric
  if(check_numeric){
    non_numeric <- data %>%
      sf::st_drop_geometry() %>% 
      dplyr::select(dplyr::all_of(required_cols)) %>%
      dplyr::summarise(dplyr::across(dplyr::everything(), ~ !is.numeric(.x))) %>%
      tidyr::pivot_longer(dplyr::everything()) %>%
      dplyr::filter(value == TRUE)

    if (nrow(non_numeric) > 0) {
      stop("columns must be numeric: ", paste(non_numeric$name, collapse = ", "))
    }
  }
  
  # is.na() catches both NA and NaN; is.infinite() catches Inf
  invalid_dta <- data %>%
    sf::st_drop_geometry() %>% 
    dplyr::summarise(
      dplyr::across(
        dplyr::all_of(required_cols)
        , ~ sum(is.na(.x) | is.infinite(.x))
      )
    ) %>%
    tidyr::pivot_longer(
      cols = dplyr::everything()
      , names_to = "column"
      , values_to = "count"
    ) %>%
    dplyr::filter(count > 0)

  if (nrow(invalid_dta) > 0) {
    bad_cols <- paste(invalid_dta$column, collapse = ", ")
    stop("invalid values (NA, NaN, or Inf) found in: ", bad_cols)
  }
  return(TRUE)
}
###############################################################
###############################################################
###############################################################
###############################################################
# match function
###############################################################
###############################################################
###############################################################
###############################################################
reference_prediction_match <- function(
  pred_data
  , ref_data
  , max_dist_m
  , max_height_error_m
) {
  #######################################################################
  # threshold checks
  #######################################################################
  if(
    any(is.na(max_dist_m)) || 
    any(is.null(max_dist_m)) || 
    !inherits(as.numeric(max_dist_m),"numeric")
  ){
    stop("`max_dist_m` must be numeric and not missing")
  }else{max_dist_m <- as.numeric(max_dist_m)[1]}
  if(
    any(is.na(max_height_error_m)) || 
    any(is.null(max_height_error_m)) || 
    !inherits(as.numeric(max_height_error_m),"numeric")
  ){
    stop("`max_height_error_m` must be numeric and not missing")
  }else{max_height_error_m <- as.numeric(max_height_error_m)[1]}
  #######################################################################
  # data checks
  #######################################################################
  validate_df_fn(data = pred_data, required_cols = c("tree_height_m"), check_numeric = T)
  validate_df_fn(data = ref_data, required_cols = c("tree_height_m"), check_numeric = T)
  if(nrow(pred_data)==0 | nrow(ref_data)==0){
    warning("no data so...?????")
    return(NULL)
  }
  # proj
  if(!identical(
    sf::st_crs(pred_data, parameters = T)[["epsg"]]
    , sf::st_crs(ref_data, parameters = T)[["epsg"]]
  )){stop("data must have same projection. see `sf::st_crs()`")}
  # sort to start with largest
  pred_data <- pred_data %>% 
    dplyr::arrange(desc(tree_height_m)) %>% 
    dplyr::mutate(pred_match_idx = dplyr::row_number())
  ref_data <- ref_data %>% 
    dplyr::arrange(desc(tree_height_m)) %>% 
    dplyr::mutate(ref_match_idx = dplyr::row_number())
  #######################################################################
  # build a 2D kd-tree
  #######################################################################
    # Use the X and Y coordinates of the predicted to build the tree.
    pred_coords_temp <- 
      pred_data %>% 
      dplyr::mutate(
        X = sf::st_coordinates(.)[,1] %>% as.numeric()
        , Y = sf::st_coordinates(.)[,2] %>% as.numeric()
      ) %>% 
      dplyr::select(X,Y) %>% 
      sf::st_drop_geometry()
    # Use the X and Y coordinates of the reference to build the tree.
    ref_coords_temp <- 
      ref_data %>% 
      dplyr::mutate(
        X = sf::st_coordinates(.)[,1] %>% as.numeric()
        , Y = sf::st_coordinates(.)[,2] %>% as.numeric()
      ) %>% 
      dplyr::select(X,Y) %>% 
      sf::st_drop_geometry()
    
    # perform a fixed-radius search using RANN::nn2
    nn_ans_temp <- RANN::nn2(
      # the point cloud xy data (builds the kd-tree on this)
      data = pred_coords_temp
      # the seed point xy data (number of columns must be the same as data)
      , query = ref_coords_temp
      # maximum number of neighbors to find per query
      , k = min(nrow(pred_coords_temp),nrow(ref_coords_temp))
      # the search radius (rcyl)
      , radius = max_dist_m
      , searchtype = "radius"
    )
    # huh?
    # nn_ans_temp %>% class()
    # nn_ans_temp %>% names()
    # nn_ans_temp$nn.idx %>% nrow()
    # nrow(ref_coords_temp)
    # nn_ans_temp$nn.idx %>% ncol()
    # nrow(pred_coords_temp)
    # nn_ans_temp$nn.dists %>% nrow()
    # nrow(ref_coords_temp)
    # nn_ans_temp$nn.dists %>% ncol()
    # nrow(pred_coords_temp)
    # nn_ans_temp$nn.dists[1:11,] %>% dplyr::as_tibble() %>% dplyr::glimpse()
  #######################################################################  
  # iterate through the kd-tree to make matches
  #######################################################################
    if(nrow(ref_coords_temp)!=nrow(ref_data)){stop("bad kd-tree")}
    # add pred match tracker
    ref_data$pred_match_idx <- 0
    ref_data$pred_match_height_diff_m <- as.numeric(NA)
    ref_data$pred_tree_height_m <- as.numeric(NA)
    ref_data$pred_match_dist_m <- as.numeric(NA)
    for(i in 1:nrow(ref_coords_temp)){
      # get indices of points in the current cylinder (0 means no point found)
      nn_idx_pts <- nn_ans_temp$nn.idx[i, ]
      cyl_pts <- nn_idx_pts[nn_idx_pts > 0] 
      # remove already matched
      cyl_pts <- cyl_pts[!(cyl_pts %in% ref_data$pred_match_idx)] 
      # if available matches
      if(dplyr::coalesce(length(cyl_pts),0) >= 1) {
        # get heights
        # ref_data %>% dplyr::select(field_tree_id, tree_height_m) %>% dplyr::glimpse()
        # ref_data$tree_height_m[i]
        # pred_data$tree_height_m[cyl_pts]
        ht_diffs <- abs(ref_data$tree_height_m[i]-pred_data$tree_height_m[cyl_pts])
        # get the minimum value among elements below the threshold
        min_val <- min(ht_diffs[ht_diffs <= max_height_error_m], na.rm = T)
        # if not none within threshold
        if(!is.infinite(min_val)){
          # the index of that value in the original full vector
          cyl_idx <- which(ht_diffs == min_val)[1] # if multiple matches, takes the nearest
          match_idx <- cyl_pts[cyl_idx]
          # distance
          nn_dist_idx <- which(nn_idx_pts == match_idx)
          nn_dist <- nn_ans_temp$nn.dists[i,nn_dist_idx]
          # update match
          ref_data$pred_match_height_diff_m[i] <- min_val
          ref_data$pred_tree_height_m[i] <- pred_data$tree_height_m[match_idx]
          ref_data$pred_match_idx[i] <- match_idx
          ref_data$pred_match_dist_m[i] <- nn_dist
          
        }
      }
    }
  #######################################################################
  # add ref id to pred and return
  #######################################################################
  # pred_data %>% dplyr::glimpse()
  # ref_data %>% dplyr::glimpse()
  pred_data <- pred_data %>% 
    # get rid of columns we'll create
    dplyr::select( -dplyr::any_of(c(
      "hey_xxxxxxxxxx"
      , "pred_match_height_diff_m"
      , "pred_match_dist_m"
      , "ref_tree_height_m"
      , "ref_match_idx"
    ))) %>% 
    dplyr::left_join(
      ref_data %>% 
        sf::st_drop_geometry() %>% 
        dplyr::select(ref_match_idx, pred_match_idx, pred_match_height_diff_m, pred_match_dist_m, tree_height_m) %>% 
        dplyr::rename(ref_tree_height_m=tree_height_m)
      , by = "pred_match_idx"
    )
  return(list(
    ref_data = ref_data
    , pred_data = pred_data
  ))
}
###############################################################
###############################################################
###############################################################
###############################################################
# combine match to classify reference/predictions
###############################################################
###############################################################
###############################################################
###############################################################
ground_truth_prediction_match <- function(
  reference_prediction_match_ans
  , comp_cols = c("tree_height_m")
) {
  # check for comparison cols
  comp_cols <- 
    comp_cols %>% 
    # remove whitespace
    stringr::str_trim() %>% 
    # keep only those with a value
    stringr::str_subset(".")
  if(inherits(comp_cols, "character") && length(comp_cols) > 0){
    validate_df_fn(data = reference_prediction_match_ans$ref_data, required_cols = comp_cols, check_numeric = T)
    validate_df_fn(data = reference_prediction_match_ans$pred_data, required_cols = comp_cols, check_numeric = T)
  }
  # tp matches
  return_df <-
    reference_prediction_match_ans$ref_data %>%
    dplyr::filter(pred_match_idx!=0 & !is.na(pred_match_idx)) %>% 
    dplyr::select(dplyr::all_of(c(
      "ref_match_idx"
      , "pred_match_idx"
      , "pred_match_dist_m"
      # , "pred_tree_height_m"
      # , "tree_height_m"
      , comp_cols
    ))) %>% 
    dplyr::rename_with(
      .cols = dplyr::all_of(comp_cols)
      , .fn = ~paste0("ref_", .x, recycle0 = T)
    ) %>% 
    sf::st_set_geometry("ref_geometry") %>% 
    # add on pred geom
    dplyr::inner_join(
      reference_prediction_match_ans$pred_data %>% 
        dplyr::select(dplyr::all_of(c(
          "ref_match_idx"
          , comp_cols
        ))) %>% 
        dplyr::rename_with(
          .cols = dplyr::all_of(comp_cols)
          , .fn = ~paste0("pred_", .x, recycle0 = T)
        ) %>% 
        dplyr::mutate(pred_geometry = sf::st_geometry(.)) %>%
        sf::st_drop_geometry()
      , by = "ref_match_idx"
    ) %>% 
    dplyr::mutate(match_grp = "true positive")
  
  # add omissions
  return_df <-
    return_df %>% 
    dplyr::bind_rows(
      reference_prediction_match_ans$ref_data %>%
        dplyr::filter(pred_match_idx==0 | is.na(pred_match_idx)) %>% 
        dplyr::select(dplyr::all_of(c(
          "ref_match_idx"
          , "pred_match_idx"
          , "pred_match_dist_m"
          # , "pred_tree_height_m"
          # , "tree_height_m"
          , comp_cols
        ))) %>% 
        dplyr::rename_with(
          .cols = dplyr::all_of(comp_cols)
          , .fn = ~paste0("ref_", .x, recycle0 = T)
        ) %>% 
        sf::st_set_geometry("ref_geometry") %>% 
        dplyr::mutate(match_grp = "omission")
    )
    
  # add commissions
  return_df <-
    return_df %>% 
    dplyr::bind_rows(
      reference_prediction_match_ans$pred_data %>%
        dplyr::filter(ref_match_idx==0 | is.na(ref_match_idx)) %>% 
        dplyr::select(dplyr::all_of(c(
          "ref_match_idx"
          , "pred_match_idx"
          , "pred_match_dist_m"
          # , "pred_tree_height_m"
          # , "tree_height_m"
          , comp_cols
        ))) %>% 
        dplyr::rename_with(
          .cols = dplyr::all_of(comp_cols)
          , .fn = ~paste0("pred_", .x, recycle0 = T)
        ) %>% 
        dplyr::mutate(pred_geometry = sf::st_geometry(.)) %>%
        sf::st_drop_geometry() %>% 
        dplyr::mutate(match_grp = "commission")
    )
  
  # make match_grp factor
  return_df <- return_df %>% 
    dplyr::mutate(
      match_grp = factor(
        match_grp
        , ordered = T
        , levels = c(
          "true positive"
          , "commission"
          , "omission"
        )
      ) %>% forcats::fct_rev()
    )
  
  
  if(inherits(comp_cols, "character") && length(comp_cols) > 0){
    return_df <- 
      return_df %>% 
      dplyr::mutate(
        # 'diff_' columns are calculated as the predicted value minus the actual value
        dplyr::across(
          .cols = paste0(
            "ref_"
            , comp_cols # c("tree_height_m", "dbh_cm")
            , recycle0 = T
          )
          , .fns = ~ base::get(stringr::str_replace(dplyr::cur_column(), "^ref_", "pred_")) - .x
          , .names = "diff_{stringr::str_remove(.col, '^ref_')}"
        )
        # 'pct_diff_' columns are calculated as the actual value minus the predicted value divided by the actual value
        , dplyr::across(
          .cols = paste0(
            "ref_"
            , comp_cols # c("tree_height_m", "dbh_cm")
            , recycle0 = T
          )
          , .fns = ~ (.x-base::get(stringr::str_replace(dplyr::cur_column(), "^ref_", "pred_")))/.x
          , .names = "pct_diff_{stringr::str_remove(.col, '^ref_')}"
        )
      ) 
  }
  # return
  if(nrow(return_df)==0){
    warning("no records found for ground truth to predition matching")
    return(NULL)
  }else{
    return(return_df)
  }
  
}

Example: Instance Match

here, we’ll demonstrate usage of the reference_prediction_match() and ground_truth_prediction_match() functions and what they return

the reference_prediction_match() function takes the point data of the predicted and reference tree locations and performs the instance segmentation routine described above, returning the input point location data with links to the other data. It is more of an intermediate function to be used by the developer, but we’ll check it out anyway since the next function, ground_truth_prediction_match(), relies on it’s output as input.

reference_prediction_match_ans_temp <- reference_prediction_match(
  pred_data = predicted_trees[[2]]
  , ref_data = validation_trees %>% dplyr::rename(tree_height_m=field_tree_height_m, dbh_cm=field_dbh_cm)
  , max_dist_m = 3
  , max_height_error_m = 2
)
# wha?
reference_prediction_match_ans_temp$ref_data %>% dplyr::glimpse()
## Rows: 5,217
## Columns: 27
## $ field_tree_id            <int> 2076, 2968, 2483, 1201, 2036, 2969, 2649, 268…
## $ tag_2024                 <dbl> 694, 1327, 963, 374, 669, 1328, 1092, 1111, 2…
## $ sect                     <dbl> 29, 52, 39, 15, 28, 52, 43, 44, 7, 16, 25, 10…
## $ northing                 <dbl> 4330830, 4330722, 4330824, 4330862, 4330838, …
## $ easting                  <dbl> 490262.7, 490048.1, 490251.1, 490128.5, 49021…
## $ ns                       <dbl> 710.2, 405.0, 692.9, 844.5, 748.0, 414.1, 521…
## $ ew                       <dbl> 870.8, 144.0, 831.6, 438.9, 722.6, 144.5, 252…
## $ spp                      <chr> "Pipo", "Pipo", "Pipo", "Pipo", "Pipo", "Pipo…
## $ status_24                <chr> "sd", "a", "a", "a", "a", "a", "a", "a", "a",…
## $ field_dbh_in             <dbl> 19.3, 24.2, 18.0, 23.8, 22.7, 23.4, 17.6, 22.…
## $ field_tree_height_ft     <dbl> 87.5, 86.5, 86.0, 85.0, 85.0, 84.5, 84.0, 84.…
## $ lll_24                   <chr> "NA", "46.5", "18", "11.5", "38", "47", "14",…
## $ height_defect            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ field_check              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ comments_24              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ dbh_cm                   <dbl> 48.98477, 61.42132, 45.68528, 60.40609, 57.61…
## $ tree_height_m            <dbl> 26.66870, 26.36391, 26.21152, 25.90674, 25.90…
## $ stand_area_m2            <dbl> 92941.05, 92941.05, 92941.05, 92941.05, 92941…
## $ geometry                 <POINT [m]> POINT (490262.7 4330830), POINT (490048…
## $ tree_utm_x               <dbl> 490262.7, 490048.1, 490251.1, 490128.5, 49021…
## $ tree_utm_y               <dbl> 4330830, 4330722, 4330824, 4330862, 4330838, …
## $ basal_area_m2            <dbl> 0.18845691, 0.29629762, 0.16392396, 0.2865836…
## $ ref_match_idx            <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…
## $ pred_match_idx           <dbl> 5, 1, 3, 14, 21, 2, 15, 11, 45, 4, 8, 64, 7, …
## $ pred_match_height_diff_m <dbl> 0.43949820, 0.99148664, 0.67767933, 0.4771359…
## $ pred_tree_height_m       <dbl> 26.2292, 27.3554, 26.8892, 25.4296, 25.0664, …
## $ pred_match_dist_m        <dbl> 2.63477069, 1.44906125, 2.40452486, 2.3700828…
reference_prediction_match_ans_temp$pred_data %>% dplyr::glimpse()
## Rows: 5,595
## Columns: 23
## $ treeID                    <chr> "7224_490048.9_4330722.9", "7163_490047.9_43…
## $ tree_height_m             <dbl> 27.3554, 27.1074, 26.8892, 26.5202, 26.2292,…
## $ crown_area_m2             <dbl> 57.1250, 35.6250, 67.5000, 76.5625, 43.5625,…
## $ fia_est_dbh_cm            <dbl> 53.10043, 52.62266, 52.26746, 51.57614, 51.2…
## $ fia_est_dbh_cm_lower      <dbl> 31.64323, 31.27280, 31.13277, 31.06520, 30.5…
## $ fia_est_dbh_cm_upper      <dbl> 82.13051, 80.96018, 81.03985, 79.07248, 78.9…
## $ dbh_cm                    <dbl> 53.10043, 52.62266, 52.26746, 51.57614, 51.2…
## $ is_training_data          <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ dbh_m                     <dbl> 0.5310043, 0.5262266, 0.5226746, 0.5157614, …
## $ radius_m                  <dbl> 0.2655021, 0.2631133, 0.2613373, 0.2578807, …
## $ basal_area_m2             <dbl> 0.2214552, 0.2174881, 0.2145619, 0.2089236, …
## $ basal_area_ft2            <dbl> 2.383744, 2.341042, 2.309544, 2.248854, 2.21…
## $ ptcld_extracted_dbh_cm    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ ptcld_predicted_dbh_cm    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ stand_area_m2             <dbl> 92941.05, 92941.05, 92941.05, 92941.05, 9294…
## $ las_overlap_stand_area_m2 <dbl> 92941.05, 92941.05, 92941.05, 92941.05, 9294…
## $ is_in_stand               <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR…
## $ pred_match_idx            <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ ref_match_idx             <int> 2, 6, 3, 10, 1, 32, 13, 11, 19, NA, 8, NA, 3…
## $ pred_match_height_diff_m  <dbl> 0.99148664, 1.35305771, 0.67767933, 1.223035…
## $ pred_match_dist_m         <dbl> 1.4490612, 0.6924579, 2.4045249, 0.9042397, …
## $ ref_tree_height_m         <dbl> 26.36391, 25.75434, 26.21152, 25.29717, 26.6…
## $ geom                      <POINT [m]> POINT (490048.9 4330723), POINT (49004…

the ground_truth_prediction_match() uses the output from reference_prediction_match() and creates a nice, clean data frame with the instances classified as TP, FP, or FN for use in aggregating to calculate detection accuracy metrics. Optionally, users can set the foundation for quantification accuracy assessment via the comp_cols argument (see the diff_* and pct_diff_* columns in the output)

gt_temp <- ground_truth_prediction_match(
  reference_prediction_match_ans = reference_prediction_match_ans_temp
  , comp_cols = c("tree_height_m", "dbh_cm")
)
# huh?
gt_temp %>% dplyr::glimpse()
## Rows: 7,463
## Columns: 14
## $ ref_match_idx          <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
## $ pred_match_idx         <dbl> 5, 1, 3, 14, 21, 2, 15, 11, 45, 4, 8, 64, 7, 59…
## $ pred_match_dist_m      <dbl> 2.63477069, 1.44906125, 2.40452486, 2.37008286,…
## $ ref_tree_height_m      <dbl> 26.66870, 26.36391, 26.21152, 25.90674, 25.9067…
## $ ref_dbh_cm             <dbl> 48.98477, 61.42132, 45.68528, 60.40609, 57.6142…
## $ pred_tree_height_m     <dbl> 26.2292, 27.3554, 26.8892, 25.4296, 25.0664, 27…
## $ pred_dbh_cm            <dbl> 51.21488, 53.10043, 52.26746, 49.89470, 49.3736…
## $ match_grp              <ord> true positive, true positive, true positive, tr…
## $ ref_geometry           <POINT [m]> POINT (490262.7 4330830), POINT (490048.1…
## $ pred_geometry          <POINT [m]> POINT (490261.9 4330827), POINT (490048.9…
## $ diff_tree_height_m     <dbl> -0.43949820, 0.99148664, 0.67767933, -0.4771359…
## $ diff_dbh_cm            <dbl> 2.230111, -8.320893, 6.582176, -10.511396, -8.2…
## $ pct_diff_tree_height_m <dbl> 0.016479927, -0.037607719, -0.025854255, 0.0184…
## $ pct_diff_dbh_cm        <dbl> -0.04552661, 0.13547240, -0.14407653, 0.1740121…

this is neat, we have the geometry of both the reference and prediction trees as well as the difference between the metric columns we gave the function via the comp_cols argument

let’s use the different geometries to plot the instance matching classification

# custom palette
pal_match_grp = c(
  "omission"=viridis::cividis(3)[1]
  # , "commission"= viridis::cividis(3)[2]
  , "commission"= "gray77"
  , "true positive"=viridis::cividis(3)[3]
)
# scales::show_col(pal_match_grp)
# plt
ggplot2::ggplot() +
  ggplot2::geom_sf(
    data = gt_temp %>% dplyr::filter(match_grp %in% c("true positive","omission"))
    , mapping = ggplot2::aes(geometry = ref_geometry, color = match_grp)
    , size = 3, alpha = 0.8
  ) +
  ggplot2::geom_sf(
    data = gt_temp %>% dplyr::filter(match_grp=="commission")
    , mapping = ggplot2::aes(geometry = pred_geometry, color = match_grp)
    , size = 3, alpha = 0.8
  ) +
  ggplot2::scale_color_manual(values = pal_match_grp) +
  ggplot2::theme_void() +
  ggplot2::theme(legend.position = "top", legend.title = ggplot2::element_blank()) 

Example: Detection Accuracy

let’s make a function to calculate the detection accuracy metrics based on aggregated TP, FP, and FN counts

###############################################################
###############################################################
###############################################################
###############################################################
# first function takes df with cols tp_n, fp_n, and fn_n to calculate rates
###############################################################
###############################################################
###############################################################
###############################################################
confusion_matrix_scores_fn <- function(df) {
  df %>% 
  dplyr::mutate(
    omission_rate = dplyr::case_when(
      dplyr::coalesce(tp_n,0) == 0 & dplyr::coalesce(fn_n,0) == 0 ~ 0 # if there are no actual trees, there is nothing to miss
      , dplyr::coalesce(tp_n,0) == 0 & dplyr::coalesce(fn_n,0) > 0 ~ 1 # every single actual tree was missed
      , dplyr::coalesce(fn_n,0) == 0 & dplyr::coalesce(tp_n,0) > 0 ~ 0
      , T ~ fn_n/(tp_n+fn_n)
    ) # False Negative Rate or Miss Rate
    , commission_rate = dplyr::case_when(
      dplyr::coalesce(tp_n,0) == 0 & dplyr::coalesce(fp_n,0) == 0 ~ 0 # if no predictions are made, the model could not have made any commission errors
      , dplyr::coalesce(fp_n,0) == 0 & dplyr::coalesce(tp_n,0) > 0 ~ 0
      , T ~ fp_n/(tp_n+fp_n)
    ) # False Positive Rate
    , precision = dplyr::case_when(
      dplyr::coalesce(tp_n,0) == 0 & dplyr::coalesce(fp_n,0) == 0 ~ 1 # if no predictions are made, the model made zero incorrect positive claims
      , dplyr::coalesce(tp_n,0) == 0 & dplyr::coalesce(fp_n,0) > 0 ~ 0
      , T ~ tp_n/(tp_n+fp_n)
    )
    , recall = dplyr::case_when(
      dplyr::coalesce(tp_n,0) == 0 & dplyr::coalesce(fn_n,0) == 0 ~ 1 # if there are no actual trees, there is nothing to miss
      , dplyr::coalesce(tp_n,0) == 0 & dplyr::coalesce(fn_n,0) > 0 ~ 0 # every single actual tree was missed
      , T ~ tp_n/(tp_n+fn_n)
    )
    , f_score = dplyr::case_when(
      dplyr::coalesce(precision,0) == 0 | dplyr::coalesce(recall,0) == 0 ~ 0
      , T ~ 2 * ( (precision*recall)/(precision+recall) )
    )    
  ) 
}
###############################################################
###############################################################
###############################################################
###############################################################
# aggregate results from ground_truth_prediction_match()
###############################################################
###############################################################
###############################################################
###############################################################
agg_ground_truth_match <- function(ground_truth_prediction_match_ans) {
  if(nrow(ground_truth_prediction_match_ans)==0){return(NULL)}
  if( !(names(ground_truth_prediction_match_ans) %>% stringr::str_equal("match_grp") %>% any()) ){stop("ground_truth_prediction_match_ans must contain `match_grp` column")}
  
  # check for difference columns (contains "_diff") and calc rmse for only those to return a single line df with colums for each diff_rmse
    if(
      (ground_truth_prediction_match_ans %>% 
        sf::st_drop_geometry() %>% 
        dplyr::select(tidyselect::starts_with("diff_") | tidyselect::starts_with("pct_diff_")) %>% 
        ncol() )>0
    ){
      # get rmse and mean difference/error for all columns with "_diff" but not "pct_diff"
      # get mape for all columns with "pct_diff" but not "diff_"
      rmse_df <- 
        ground_truth_prediction_match_ans %>% 
        sf::st_drop_geometry() %>% 
        dplyr::ungroup() %>% 
        dplyr::select(
          tidyselect::starts_with("diff_")
          | tidyselect::starts_with("pct_diff_")
        ) %>%
        tidyr::pivot_longer(dplyr::everything(), values_drop_na = T) %>% 
        dplyr::group_by(name) %>% 
        dplyr::summarise(
          sq = sum(value^2, na.rm = T)
          , mean = mean(value, na.rm = T)
          , sumabs = sum(abs(value), na.rm = T)
          , nomiss = sum(!is.na(value))
        ) %>% 
        dplyr::ungroup() %>% 
        dplyr::mutate(
          rmse = dplyr::case_when(
            dplyr::coalesce(nomiss,0)==0 ~ as.numeric(NA)
            , T ~ sqrt(sq/nomiss)
          )
          , mape = dplyr::case_when(
            dplyr::coalesce(nomiss,0)==0 ~ as.numeric(NA)
            , T ~ sumabs/nomiss
          )
        ) %>% 
        # NA nonsense values
        dplyr::mutate(
          mape = dplyr::case_when(
            stringr::str_starts(name, "pct_diff_") ~ mape
            , T ~ as.numeric(NA)
          )
          , rmse = dplyr::case_when(
            stringr::str_starts(name, "pct_diff_") ~ as.numeric(NA)
            , T ~ rmse
          )
          , mean = dplyr::case_when(
            stringr::str_starts(name, "pct_diff_") ~ as.numeric(NA)
            , T ~ mean
          )
        ) %>% 
        dplyr::select(name,rmse,mean,mape) %>% 
        tidyr::pivot_wider(
          names_from = name
          , values_from = c(rmse,mean,mape)
          , names_glue = "{name}_{.value}"
        ) %>% 
        # remove columns with NA in all rows
        dplyr::select( dplyr::where( ~!all(is.na(.x)) ) )
      
      if(
        dplyr::coalesce(nrow(rmse_df),0)==0
        || dplyr::coalesce(ncol(rmse_df),0)==0
      ){
        # empty df
        rmse_df <- dplyr::tibble()
      }
    }else{
      # empty df
      rmse_df <- dplyr::tibble()
    }
  
  # count by match group
    agg <-
      ground_truth_prediction_match_ans %>% 
      sf::st_drop_geometry() %>% 
      dplyr::ungroup() %>% 
      dplyr::count(match_grp) %>% 
      dplyr::mutate(
        match_grp = dplyr::recode_values(
          match_grp
          , "true positive"~"tp"
          , "commission"~"fp"
          , "omission"~"fn"
        )
      )
    # true positive, false positive, false negative rates
    return_df <- dplyr::tibble(match_grp = c("tp","fp","fn")) %>% 
      dplyr::left_join(agg, by = "match_grp") %>% 
      dplyr::mutate(dplyr::across(.cols = c(n), .fn = ~dplyr::coalesce(.x,0))) %>% 
      tidyr::pivot_wider(
        names_from = match_grp
        , values_from = c(n)
        , names_glue = "{match_grp}_{.value}"
      )
    # rates, precision, recall, f-score
    return_df <- confusion_matrix_scores_fn(return_df)
    # add rmse
    if(nrow(rmse_df)>0){
      return_df <- return_df %>% dplyr::bind_cols(rmse_df)
    }
  # return
  return(return_df)
}

test it out on the example data (but without the diff_* and pct_diff_* columns which we’ll review next)

gt_temp %>% 
  # remove FP that were outside boundary since we included them to allow for edge reference trees to be matched
  dplyr::left_join(
    reference_prediction_match_ans_temp$pred_data %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(pred_match_idx, is_in_stand)
    , by = "pred_match_idx"
  ) %>% 
  dplyr::filter(
    !(!is_in_stand & match_grp=="commission")
  ) %>% 
  # remove diff cols
  dplyr::select(
    -tidyselect::starts_with("diff_")
    , -tidyselect::starts_with("pct_diff_")
  ) %>% 
  agg_ground_truth_match()
## # A tibble: 1 × 8
##    tp_n  fp_n  fn_n omission_rate commission_rate precision recall f_score
##   <dbl> <dbl> <dbl>         <dbl>           <dbl>     <dbl>  <dbl>   <dbl>
## 1  3349  2179  1868         0.358           0.394     0.606  0.642   0.623

Quantification Accuracy Metrics

Quantification accuracy metrics such as RMSE, MAPE, and Mean Error of tree form measurements (e.g. height, diameter) are calculated by aggregating the differences between the estimated tree attributes and the ground truth values for instances classified as True Positives. These metrics tell us about the method’s ability to accurately quantify the form of the trees it successfully identified.

We only have field-collected slash tree height and diameter measurements from a single study site, the PSINF Mixed Conifer Site. Data from this one study site will be used to test the slash tree quantification accuracies acheived by the proposed methodology, while data from all study sites will be used to validate the slash tree detection methodology.

to prepare our results for analysis, we will develop a function that aggregates the tree-level data into a single record for each metric-error measurement combination. this function will calculate detection performance metrics such as F-score, precision, and recall (using the confusion_matrix_scores_fn() we defined above), as well as quantification accuracy metrics including Root Mean Squared Error (RMSE), Mean Error (ME), and Mean Absolute Percentage Error (MAPE) to assess the accuracy of our tree form measurements. this could be a valuable function for any future analysis comparing predictions to ground truth data.

here are the quantification accuracy metric formulas:

\[ \textrm{RMSE} = \sqrt{ \frac{ \sum_{i=1}^{N} (y_{i} - \hat{y_{i}})^{2}}{N}} \]

\[ \textrm{ME} = \frac{ \sum_{i=1}^{N} (\hat{y_{i}} - y_{i})}{N} \]

\[ \textrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_{i} - \hat{y_{i}}}{y_{i}} \right| \]

Where \(N\) is equal to the total number of correctly matched trees, \(y_i\) is the ground truth measured value and \(\hat{y_i}\) is the predicted value of \(i\)

There is a lot going on in our agg_ground_truth_match() function but it’s application is straightforward enough:

  • The minimum required input is a data frame of the raw instance matches with a column named match_grp which is string/factor with the levels “true positive”, “commission”, and “omission” as returned by the ground_truth_prediction_match() function
  • Optionally, if the data contain columns with the prefix “diff_” the mean error (ME) is calculated for those columns with the return having the suffix “_mean” and the RMSE is calculated for those columns with the return having the suffix “_rmse”
    • interpretation of the ME is enhanced if these “diff_” columns are calculated as the predicted value minus the actual value (e.g. pred_diameter_m - gt_diameter_m)
  • Optionally, if the data contain columns with the prefix “pct_diff_” the mean absolute percent error (MAPE) is calculated for those columns with the return having the suffix “_mape”
    • these “pct_diff_” columns are calculated as the actual value minus the predicted value divided by the actual value (e.g. (gt_diameter_m - pred_diameter_m)/gt_diameter_m) ### Example: Quantification Accuracy

we already demonstrated minimum use of the agg_ground_truth_match() function to get detection accuracy metrics, now we’ll demonstrate the quantification accuracy metrics by keeping the diff_* and pct_diff_* columns calculated with the ground_truth_prediction_match() function

gt_temp %>% 
  agg_ground_truth_match() %>% 
  dplyr::glimpse()
## Rows: 1
## Columns: 14
## $ tp_n                        <dbl> 3349
## $ fp_n                        <dbl> 2246
## $ fn_n                        <dbl> 1868
## $ omission_rate               <dbl> 0.3580602
## $ commission_rate             <dbl> 0.4014298
## $ precision                   <dbl> 0.5985702
## $ recall                      <dbl> 0.6419398
## $ f_score                     <dbl> 0.6194969
## $ diff_dbh_cm_rmse            <dbl> 5.197515
## $ diff_tree_height_m_rmse     <dbl> 0.7583071
## $ diff_dbh_cm_mean            <dbl> 1.171105
## $ diff_tree_height_m_mean     <dbl> 0.158625
## $ pct_diff_dbh_cm_mape        <dbl> 0.2942456
## $ pct_diff_tree_height_m_mape <dbl> 0.08461773

nice, that could be useful for evaluation

remove(list = ls()[grep("_temp",ls())])
gc()
##            used  (Mb) gc trigger  (Mb)  max used  (Mb)
## Ncells  5020681 268.2    7506972 401.0   7506972 401.0
## Vcells 11145894  85.1  109757999 837.4 111785778 852.9

Apply Instance Match

let’s apply instance matching to all data sets of the flights we are evaluating

ground_truth_prediction_match_ans <- 
  predicted_trees %>% 
  purrr::map(function(x){
    rp <- 
      reference_prediction_match(
        pred_data = x
        , ref_data = validation_trees %>% dplyr::rename(tree_height_m=field_tree_height_m, dbh_cm=field_dbh_cm)
        , max_dist_m = 3
        , max_height_error_m = 2
      )
    # gt match and filter
    if(is.null(rp)){return(NULL)}
    ret <- 
      ground_truth_prediction_match(
        rp
        , comp_cols = c("tree_height_m", "dbh_cm", "basal_area_m2")
      ) %>% 
      # remove FP that were outside boundary since we included them to allow for edge reference trees to be matched
      dplyr::left_join(
        rp$pred_data %>% 
          sf::st_drop_geometry() %>% 
          dplyr::select(pred_match_idx, is_in_stand, treeID) %>% 
          dplyr::rename(pred_tree_id = treeID)
        , by = "pred_match_idx"
      ) %>% 
      dplyr::left_join(
        rp$ref_data %>% 
          sf::st_drop_geometry() %>% 
          dplyr::select(ref_match_idx, field_tree_id) %>% # we know the id is this
          dplyr::rename(ref_tree_id = field_tree_id)
        , by = "ref_match_idx"
      ) %>% 
      dplyr::filter(
        !(!is_in_stand & match_grp=="commission")
      ) %>% 
      dplyr::select(-tidyselect::ends_with("diff_basal_area_m2")) # this is the same as dbh_cm
    return(ret)
  })
  # purrr::list_rbind(names_to = "folder")

# # huh?
# # names(ground_truth_prediction_match_ans)
# ground_truth_prediction_match_ans %>%
#   purrr::list_rbind(names_to = "folder") %>%
#   dplyr::glimpse()

we’ll plot each result spatially

ground_truth_prediction_match_ans %>%
  purrr::list_rbind(names_to = "folder") %>%
  # sf::st_drop_geometry() %>% dplyr::count(folder) %>% 
  dplyr::inner_join(
    tracking_df %>% 
      dplyr::select(-c(path))
    , by = "folder"
  ) %>%
  # sf::st_drop_geometry() %>% dplyr::count(dataset_factor, match_grp) %>%
  dplyr::arrange(dataset_factor, match_grp) %>% 
  # plot it
  # plt
  ggplot2::ggplot() +
    ggplot2::geom_sf(
      data = stand_boundary
      , fill = NA, color = "blue"
    ) +
    ggplot2::geom_sf(
      data = ~ dplyr::filter(.x, match_grp=="commission")
      , mapping = ggplot2::aes(geometry = pred_geometry, color = match_grp)
      , size = 0.6, alpha = 0.8
    ) +
    ggplot2::geom_sf(
      data = ~ dplyr::filter(.x, match_grp %in% c("true positive","omission"))
      , mapping = ggplot2::aes(geometry = ref_geometry, color = match_grp)
      , size = 0.6, alpha = 0.8
    ) +
    ggplot2::scale_color_manual(values = pal_match_grp) +
    ggplot2::facet_wrap(
      facets = dplyr::vars(dataset_factor)
      , ncol = 3
    ) +
    ggplot2::theme_light() +
    ggplot2::theme(
      legend.position = "top"
      , legend.title = ggplot2::element_blank()
      , legend.key = ggplot2::element_blank()
      , strip.text = ggplot2::element_text(face = "bold", color = "black", margin = ggplot2::margin(t = 4, b = 4))
      , strip.background = ggplot2::element_rect(fill = "gray88", color = "gray88")
      # , panel.background = ggplot2::element_rect(fill = NA, color = "black")
      # , panel.spacing = ggplot2::unit(1,"lines")
      , panel.grid.major = ggplot2::element_blank()
      , panel.grid.minor = ggplot2::element_blank()
      , axis.text = ggplot2::element_blank()
      , axis.title = ggplot2::element_blank()
      , axis.ticks = ggplot2::element_blank()
    ) +
    ggplot2::guides(
      color = ggplot2::guide_legend(override.aes = list(shape = 15, linetype = 0, size = 6, alpha = 1, fill = NA))
      , fill = "none"
    )

# save
# ggplot2::ggsave(filename = "../data/matchedpts_facets.jpg", height = 9.2, width = 8.5, dpi = "print")

interesting

let’s zoom in on an smaller area (0.20 ha) to better visualize the instance matching results

# re-read rgb
rgb_rast_temp <- terra::rast(rgb_rast_fnm) %>% 
  # keep only rgb bands
  terra::subset(c(1,2,3))
# aoi
aoi_temp <- 
  stand_boundary %>% 
  sf::st_centroid() %>% 
  sf::st_set_crs(terra::crs(rgb_rast_temp)) %>% 
  sf::st_buffer(sqrt(2000/4),endCapStyle = "SQUARE") ## numerator = desired plot size in m2
# sf::st_area(aoi_temp) %>% as.numeric() %>% `/`(10000)
# crop
rgb_rast_temp <- 
  rgb_rast_temp %>% 
  terra::subset(c(1,2,3)) %>% 
  terra::crop(
    aoi_temp %>% 
      sf::st_union() %>% 
      sf::st_bbox() %>% 
      sf::st_as_sfc() %>% 
      sf::st_buffer(2) %>% 
      sf::st_transform(terra::crs(rgb_rast_temp)) %>% 
      terra::vect()
  )
# convert raster to a data frame and create hex colors
# ?grDevices::rgb
rgb_df_temp <-
  rgb_rast_temp %>% 
  terra::as.data.frame(xy = TRUE) %>%
  dplyr::rename(
    red = 3, green = 4, blue = 5
  ) %>%
  dplyr::mutate(
    # rows that have missing color data
    is_missing = is.na(red) | is.na(green) | is.na(blue)
    # hex using 0s for NAs to avoid grDevices::rgb error
    , hex_col = grDevices::rgb(
      ifelse(is_missing, 0, red)
      , ifelse(is_missing, 0, green)
      , ifelse(is_missing, 0, blue)
      , maxColorValue = 255
    )
    # back to NA
    , hex_col = ifelse(is_missing, as.character(NA), hex_col)
  ) %>%
  dplyr::select(-c(is_missing))

# plt
ggplot2::ggplot() +
  # add rgb base map
  ggplot2::geom_tile(data = rgb_df_temp, mapping = ggplot2::aes(x = x, y = y, fill = hex_col), color = NA) +
  # use identity scale so the hex codes are used directly
  ggplot2::scale_fill_identity(na.value = "transparent") + # !!! don't take this out or RGB plot will kill your computer
  # overlay trees
  ggplot2::geom_sf(
    data = 
        ground_truth_prediction_match_ans %>%
        purrr::list_rbind(names_to = "folder") %>%
        dplyr::filter(match_grp %in% c("true positive","omission")) %>% 
        dplyr::inner_join(
          tracking_df %>% 
            dplyr::select(-c(path))
          , by = "folder"
        ) %>%
        dplyr::arrange(dataset_factor, match_grp) %>% 
        sf::st_set_geometry("ref_geometry") %>% 
        sf::st_transform(terra::crs(rgb_rast_temp)) %>% 
        sf::st_intersection(aoi_temp)
    , mapping = ggplot2::aes(geometry = ref_geometry, color = match_grp)
    , size = 1
  ) +
  ggplot2::geom_sf(
    data = 
      ground_truth_prediction_match_ans %>%
      purrr::list_rbind(names_to = "folder") %>%
      dplyr::filter(match_grp=="commission") %>% 
      dplyr::inner_join(
        tracking_df %>% 
          dplyr::select(-c(path))
        , by = "folder"
      ) %>%
      dplyr::arrange(dataset_factor, match_grp) %>% 
      sf::st_set_geometry("pred_geometry") %>% 
      sf::st_transform(terra::crs(rgb_rast_temp)) %>% 
      sf::st_intersection(aoi_temp)
    , mapping = ggplot2::aes(geometry = pred_geometry, color = match_grp)
    , size = 1
  ) +
  ggplot2::scale_color_manual(values = pal_match_grp) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(dataset_factor)
    , ncol = 3
  ) +
  ggplot2::coord_sf(expand = F) +
  ggplot2::labs(subtitle = "") +
  ggplot2::theme_void() +
  ggplot2::theme(
    legend.position = "top"
    , legend.title = ggplot2::element_blank()
    , legend.key = ggplot2::element_blank()
    , strip.text = ggplot2::element_text(face = "bold", color = "black", margin = ggplot2::margin(t = 4, b = 4))
    , strip.background = ggplot2::element_rect(fill = "gray88", color = "gray88")
    # , panel.background = ggplot2::element_rect(fill = NA, color = "black")
    # , panel.spacing = ggplot2::unit(1,"lines")
    , panel.grid.major = ggplot2::element_blank()
    , panel.grid.minor = ggplot2::element_blank()
    , axis.text = ggplot2::element_blank()
    , axis.title = ggplot2::element_blank()
    , axis.ticks = ggplot2::element_blank()
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, linetype = 0, size = 6, alpha = 1, fill = NA))
    , fill = "none"
  )

# ggplot2::ggsave(filename = "../data/matchedpts_rgb_facets.jpg", height = 10.2, width = 9.5, dpi = "print")
remove(list = ls()[grep("_temp",ls())])
gc()
##            used  (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells  5280185 282.0   16628972  888.1  30896822 1650.1
## Vcells 38295006 292.2  525148856 4006.6 656079562 5005.5

Accuracy Results

aggregate the instance matching to get detection and quantification accuracy for each data set

agg_ground_truth_match_ans <- 
  ground_truth_prediction_match_ans %>% 
  purrr::compact() %>% 
  purrr::map(
    \(x)
    agg_ground_truth_match(x) %>% 
      dplyr::bind_cols(
        # add ba and stand area
        x %>% 
          sf::st_drop_geometry() %>% 
          dplyr::ungroup() %>% 
          dplyr::summarise(
            dplyr::across(
              c(ref_basal_area_m2,pred_basal_area_m2)
              , ~sum(.x,na.rm=T)
            )
          ) %>% 
          dplyr::mutate(
            stand_area_ha = stand_boundary$stand_area_m2[1]/10000
            , dplyr::across(
              c(ref_basal_area_m2,pred_basal_area_m2)
              , ~ .x/stand_area_ha
              , .names = "{.col}_per_ha"
            )
          )
      ) %>% 
      # comps not done in agg_ground_truth_match()
      dplyr::mutate(
        ref_trees_per_ha = (tp_n+fn_n)/stand_area_ha
        , pred_trees_per_ha = (tp_n+fp_n)/stand_area_ha
        # 'diff_' columns are calculated as the predicted value minus the actual value
        , diff_trees_per_ha = pred_trees_per_ha-ref_trees_per_ha
        , diff_basal_area_m2 = pred_basal_area_m2-ref_basal_area_m2
        , diff_basal_area_m2_per_ha = pred_basal_area_m2_per_ha-ref_basal_area_m2_per_ha
        # 'abs_diff_' columns are calculated as the predicted value minus the actual value
        , abs_diff_trees_per_ha = abs(diff_trees_per_ha)
        , abs_diff_basal_area_m2 = abs(diff_basal_area_m2)
        , abs_diff_basal_area_m2_per_ha = abs(diff_basal_area_m2_per_ha)
        # 'pct_diff_' columns are calculated as the predicted value minus the actual value divided by the actual value
        , pct_diff_trees_per_ha = (pred_trees_per_ha-ref_trees_per_ha)/ref_trees_per_ha
        , pct_diff_basal_area_m2 = (pred_basal_area_m2-ref_basal_area_m2)/ref_basal_area_m2
        , pct_diff_basal_area_m2_per_ha = (pred_basal_area_m2_per_ha-ref_basal_area_m2_per_ha)/ref_basal_area_m2_per_ha
      )
  ) %>% 
  purrr::list_rbind(names_to = "folder")
# add tracking data cols
agg_ground_truth_match_ans <- tracking_df %>% 
  dplyr::left_join(
    agg_ground_truth_match_ans
    , by = "folder"
  )
# huh?
# agg_ground_truth_match_ans %>% dplyr::glimpse()

Detection Accuracy Results

let’s table that

agg_ground_truth_match_ans %>% 
  dplyr::arrange(dataset_factor) %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "On"
      , !flight_tf ~ "Off"
      , T ~ "error"
    )
  ) %>% 
  dplyr::select(
    tidyselect::starts_with("flight_")
    , tidyselect::ends_with("_n")
    , precision, recall, f_score
  ) %>% 
  dplyr::mutate(
    dplyr::across(
      tidyselect::starts_with("flight_")
      , as.factor
    )
    , dplyr::across(
      tidyselect::ends_with("_n")
      , ~scales::comma(.x,accuracy=1)
    )
    , dplyr::across(
      dplyr::where(is.numeric)
      , ~scales::percent(.x,accuracy=0.1)
    )
  ) %>% 
  kableExtra::kbl(
    caption = "Detection Accuracy"
    , col.names = c(
      "altitude (ft)", "speed (mph)", "Terrain Follow"
      , "TP predictions", "FP predictions", "FN predictions"
      , "Precision", "Recall", "F-score"
    )
    , escape = F
    # , digits = 2
  ) %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::column_spec(c(3), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1:2), valign = "top")
Detection Accuracy
altitude (ft) speed (mph) Terrain Follow TP predictions FP predictions FN predictions Precision Recall F-score
200 10 On 3,244 2,038 1,973 61.4% 62.2% 61.8%
20 Off 3,349 2,179 1,868 60.6% 64.2% 62.3%
On 3,299 2,201 1,918 60.0% 63.2% 61.6%
300 10 On 3,289 2,132 1,928 60.7% 63.0% 61.8%
20 Off 3,346 2,162 1,871 60.7% 64.1% 62.4%
400 10 On 3,303 2,146 1,914 60.6% 63.3% 61.9%
20 Off 3,377 2,215 1,840 60.4% 64.7% 62.5%
On 3,326 2,205 1,891 60.1% 63.8% 61.9%

and plot it

#pal
pal_eval_metric <- c(
  "F-score" = RColorBrewer::brewer.pal(3,"Oranges")[3]
  , "Precision" = RColorBrewer::brewer.pal(3,"Purples")[3]
  , "Recall" = RColorBrewer::brewer.pal(3,"Greys")[3]
)
#plt
agg_ground_truth_match_ans %>% 
  dplyr::arrange(dataset_factor) %>% 
  dplyr::select(
    dataset_factor
    , precision, recall, f_score
  ) %>% 
  tidyr::pivot_longer(
    cols = c(precision,recall,f_score)
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  dplyr::mutate(
    metric = metric %>%
      forcats::fct_relevel("f_score", "recall", "precision") %>%
      forcats::fct_recode("F-score" = "f_score", "Recall" = "recall", "Precision" = "precision")
  ) %>% 
  dplyr::mutate(
    dplyr::across(
      dplyr::where(is.numeric)
      , list(lab = ~scales::percent(.x,accuracy=0.1))
    )
  ) %>% 
  dplyr::mutate(dataset_factor = forcats::fct_rev(dataset_factor)) %>% 
  # dplyr::glimpse()
  # dplyr::pull(metric) %>% levels()
  ggplot2::ggplot(
    mapping = ggplot2::aes(
      x = value, y = dataset_factor
      , fill = metric
    )
  ) +
  ggplot2::geom_col(width = 0.5) +
  ggplot2::geom_text(
    mapping = ggplot2::aes(label = value_lab, fontface = "bold")
    , color = "black"
    # , size = 2.3
    , hjust = -0.1
  ) +
  ggplot2::scale_fill_manual(values=pal_eval_metric) +
  ggplot2::scale_x_continuous(
    limits = c(0,1)
    , labels=scales::percent
    , breaks = scales::breaks_extended(n=6)
    , expand = ggplot2::expansion(mult = c(0,0.03))
  ) +
  ggplot2::facet_grid(
    cols = dplyr::vars(metric)
    , scales = "free_x"
    # , axes = "all"
  ) +
  ggplot2::labs(
    x="",y=""
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , axis.text.y = ggplot2::element_text(size = 9, face = "bold")
    , axis.text.x = ggplot2::element_blank()
    , strip.text = ggplot2::element_text(size = 10, color = "black", face = "bold")
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, linetype = 0, size = 6, alpha = 1))
  )

It doesn’t look like there is much variation in detection accuracy across the different flight settings. Furthermore, since we have exactly one observation per flight (e.g. no repeated measures with these flight settings such as from different study sites) we do not have any variation within the flight groups and have measured the “truth” for the particular flight settings. As such, we cannot statistically test one flight (with those exact settings) against another. However, we can calculate the raw differnces.

# calc pairwise diffs
# dplyr::glimpse(agg_ground_truth_match_ans)
pairwise_diffs_temp <-
  tidyr::crossing(
    agg_ground_truth_match_ans %>% 
      dplyr::select(dataset_factor,f_score) %>% 
      dplyr::rename_with(~paste0(.x,"_1",recycle0 = T))
    , agg_ground_truth_match_ans %>% 
      dplyr::select(dataset_factor,f_score) %>% 
      dplyr::rename_with(~paste0(.x,"_2",recycle0 = T))
  ) %>% 
  # filter to remove self-joins and duplicates
  dplyr::filter(dataset_factor_1 < dataset_factor_2) %>%
  dplyr::mutate(
    diff = f_score_1 - f_score_2
    , abs_diff = abs(diff)
    , comp = paste(dataset_factor_1, "vs", dataset_factor_2)
  )

pairwise difference table

pairwise_diffs_temp %>%
  dplyr::select(dataset_factor_1, dataset_factor_2, f_score_1, f_score_2, diff) %>%
  dplyr::mutate(dplyr::across(dplyr::where(is.numeric), ~scales::percent(.x,accuracy=0.1) )) %>% 
  kableExtra::kbl(
    caption = "Pairwise Differences in F-Score between Flights"
    , col.names = c(
      "Flight"
      , "Comparison Flight"
      , "F-score"
      , "Comparison F-score"
      , "F-score Difference"
    )
    , digits = 3
  ) %>%
  kableExtra::kable_styling() %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top")
Pairwise Differences in F-Score between Flights
Flight Comparison Flight F-score Comparison F-score F-score Difference
AGL: 200 | MPH: 10 | TF: on AGL: 200 | MPH: 20 | TF: off 61.8% 62.3% -0.5%
AGL: 200 | MPH: 20 | TF: on 61.8% 61.6% 0.2%
AGL: 300 | MPH: 10 | TF: on 61.8% 61.8% 0.0%
AGL: 300 | MPH: 20 | TF: off 61.8% 62.4% -0.6%
AGL: 400 | MPH: 10 | TF: on 61.8% 61.9% -0.1%
AGL: 400 | MPH: 20 | TF: off 61.8% 62.5% -0.7%
AGL: 400 | MPH: 20 | TF: on 61.8% 61.9% -0.1%
AGL: 200 | MPH: 20 | TF: off AGL: 200 | MPH: 20 | TF: on 62.3% 61.6% 0.8%
AGL: 300 | MPH: 10 | TF: on 62.3% 61.8% 0.5%
AGL: 300 | MPH: 20 | TF: off 62.3% 62.4% -0.1%
AGL: 400 | MPH: 10 | TF: on 62.3% 61.9% 0.4%
AGL: 400 | MPH: 20 | TF: off 62.3% 62.5% -0.1%
AGL: 400 | MPH: 20 | TF: on 62.3% 61.9% 0.4%
AGL: 200 | MPH: 20 | TF: on AGL: 300 | MPH: 10 | TF: on 61.6% 61.8% -0.3%
AGL: 300 | MPH: 20 | TF: off 61.6% 62.4% -0.8%
AGL: 400 | MPH: 10 | TF: on 61.6% 61.9% -0.4%
AGL: 400 | MPH: 20 | TF: off 61.6% 62.5% -0.9%
AGL: 400 | MPH: 20 | TF: on 61.6% 61.9% -0.3%
AGL: 300 | MPH: 10 | TF: on AGL: 300 | MPH: 20 | TF: off 61.8% 62.4% -0.6%
AGL: 400 | MPH: 10 | TF: on 61.8% 61.9% -0.1%
AGL: 400 | MPH: 20 | TF: off 61.8% 62.5% -0.7%
AGL: 400 | MPH: 20 | TF: on 61.8% 61.9% -0.1%
AGL: 300 | MPH: 20 | TF: off AGL: 400 | MPH: 10 | TF: on 62.4% 61.9% 0.5%
AGL: 400 | MPH: 20 | TF: off 62.4% 62.5% -0.1%
AGL: 400 | MPH: 20 | TF: on 62.4% 61.9% 0.5%
AGL: 400 | MPH: 10 | TF: on AGL: 400 | MPH: 20 | TF: off 61.9% 62.5% -0.5%
AGL: 400 | MPH: 20 | TF: on 61.9% 61.9% 0.0%
AGL: 400 | MPH: 20 | TF: off AGL: 400 | MPH: 20 | TF: on 62.5% 61.9% 0.6%

pairwise difference plot

ggplot2::ggplot(
  data = pairwise_diffs_temp
  , mapping = ggplot2::aes(y = reorder(comp, diff), x = diff)
) +
  ggplot2::geom_vline(xintercept = 0, color = "black") +
  ggplot2::geom_segment(
    mapping = ggplot2::aes(yend = comp, xend = 0)
    , color = "gray44"
  ) +
  ggplot2::geom_point(
    mapping = ggplot2::aes(color = diff > 0)
    , size = 3
  ) +
  ggplot2::scale_color_manual(values = c("firebrick","navy")) +
  ggplot2::scale_x_continuous(
    limits = c(-0.05,0.05)
    , labels = scales::percent
    , breaks = scales::breaks_extended(n=10)
    , sec.axis = ggplot2::dup_axis()
  ) +
  ggplot2::labs(
    y = "" # "comparison"
    , x = "difference F-score"
    , subtitle = "positive = first flight in pair scored higher"
  ) +
  ggplot2::theme_minimal() +
  ggplot2::theme(legend.position = "none", plot.subtitle = ggplot2::element_text(hjust = 1))

so, the difference in F-score is no more than 0.92% across all flights. yes, that is less than 1% on a percentage point basis

agg_ground_truth_match_ans %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "terrain follow: On"
      , !flight_tf ~ "terrain follow: Off"
      , T ~ "error"
    )
  ) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = flight_mph, y = f_score, color = as.factor(flight_agl))
  ) +
  ggplot2::geom_point(size = 5,alpha=0.9) +
  ggplot2::geom_line(
    mapping = ggplot2::aes(group = flight_agl)
  ) +
  ggplot2::geom_text(
    mapping = ggplot2::aes(label = scales::percent(f_score,accuracy=0.1))
    , color = "black"
    , vjust = -1
  ) +
  ggplot2::facet_grid(cols = dplyr::vars(flight_tf)) +
  ggplot2::scale_y_continuous(
    # limits = c(0,1)
    # , breaks = seq(0,1,by=0.2)
    labels = scales::percent
    , expand = ggplot2::expansion(mult = 0.4)
  ) +
  ggplot2::scale_color_viridis_d(option = "mako", begin = 0.5, end = 0.1) +
  ggplot2::labs(
    subtitle = "comparison of speed and altitude across terrain follow"
    , x = "speed (mph)"
    , y = "F-score"
    , color = "altitude (ft)"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text = ggplot2::element_text(color = "black", size = 9, face = "bold")
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, size = 6))
  )

note that y-axis range ::laugh_cry_emoji::

let’s see the relationship between tree detection (recall rate) and tree height within each flight. we’ll look at true positive and omissions (FN) and use the field-measured tree height to quantify the proportion of trees within a height bin that were detected by the UAS-lidar

# make ggplot2::cut_* bins pretty
make_bins_pretty <- function(bin_factor, suffix = "") {
  # convert factor to character and remove brackets and parens
  clean_labels <- bin_factor %>%
    as.character() %>%
    stringr::str_replace_all(c("\\[" = "", "\\(" = "", "\\]" = "", "\\)" = "")) %>%
    stringr::str_replace(",", "-")
  
  # suffix
  if(stringr::str_squish(suffix) != "" && inherits(suffix,"character")){
    clean_labels <- stringr::str_c(clean_labels, " ", suffix) %>% stringr::str_squish()
  }
  
  # reorder based on the original factor levels 
  forcats::fct_reorder(factor(clean_labels), as.numeric(bin_factor))
}

dta_temp <-
  ground_truth_prediction_match_ans %>%
  purrr::list_rbind(names_to = "folder") %>%
  sf::st_drop_geometry() %>% 
  dplyr::filter(match_grp %in% c("true positive","omission")) %>% 
  dplyr::inner_join(
    tracking_df %>% 
      dplyr::select(-c(path))
    , by = "folder"
  ) %>%
  # dplyr::glimpse()
  # metric cuts/bins
  dplyr::mutate(
    tree_height_m_bin = ggplot2::cut_width(ref_tree_height_m, width = 5, boundary = 0) %>% 
      make_bins_pretty(suffix = "m") %>% 
      forcats::fct_relabel(~ stringr::str_replace(.x, "^0-", paste0(my_tree_ht_m,"-")) )
    , is_detected = match_grp %in% c("true positive")
    , is_detected = as.numeric(is_detected)
  )
dta_temp %>% 
  dplyr::count(dataset_factor, match_grp, tree_height_m_bin) %>% 
  dplyr::group_by(dataset_factor, tree_height_m_bin) %>% 
  dplyr::mutate(pct = n/sum(n,na.rm = T)) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    lab = ifelse(
      pct>0.095
      , paste0(
        scales::percent(pct,accuracy=1) 
        ,"\n("
        , scales::comma(n,accuracy=1) 
        , ")"
      )
      , ""
    )
  ) %>% 
  dplyr::ungroup() %>% 
  # dplyr::glimpse()
  ggplot2::ggplot(
    mapping = ggplot2::aes(y = pct, x = tree_height_m_bin)
  ) +
  ggplot2::geom_col(
    mapping = ggplot2::aes(fill = match_grp)
    , width = 0.6
    , color = NA, alpha = 0.9
  ) +
  ggplot2::geom_text(
    mapping = ggplot2::aes(label = lab, group = match_grp, fontface = "bold", color = match_grp)
    # , color = "black"
    , size = 2.5
    , position = ggplot2::position_stack(vjust = 0.5)
  ) +
  ggplot2::scale_fill_manual(values = pal_match_grp) +
  ggplot2::scale_color_manual(values = c("white","black"), guide = "none") +
  ggplot2::scale_x_discrete(drop = F) +
  ggplot2::scale_y_continuous(
    breaks = seq(0,1,by=0.2)
    , labels = scales::percent
    , expand = ggplot2::expansion(mult = c(0,0.02))
  ) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(dataset_factor)
    # , rows = dplyr::vars(trtmnt_block)
    , ncol = 3
    , axes = "all"
    # , switch = "y"
  ) +
  ggplot2::labs(
    x = "tree height", y = "", fill = ""
    , subtitle = "Recall rate by tree height bin"
  ) +
  ggplot2::theme_light()+
  ggplot2::theme(
    legend.position = "top"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7, angle = 45, hjust = 1, vjust = 1)
    , axis.text.y = ggplot2::element_blank()
    , axis.ticks.y = ggplot2::element_blank()
  )

# ggplot2::ggsave("c:/users/georg/Downloads/n1_recall_by_htbin.jpg", height = 10, width = 8)

the trend of increasing detection rate with tree height demonstrated with this data aligns well with prior work using aerial point cloud data for single-tree forest inventories

let’s look at the commission rate (commission error) by tree height using the UAS-detected tree heights to determine if our ITD variable window function could be improved

dta_temp <-
  ground_truth_prediction_match_ans %>%
  purrr::list_rbind(names_to = "folder") %>%
  sf::st_drop_geometry() %>% 
  dplyr::filter(match_grp %in% c("true positive","commission")) %>% 
  dplyr::inner_join(
    tracking_df %>% 
      dplyr::select(-c(path))
    , by = "folder"
  ) %>%
  # dplyr::glimpse()
  # metric cuts/bins
  dplyr::mutate(
    tree_height_m_bin = ggplot2::cut_width(pred_tree_height_m, width = 5, boundary = 0) %>% 
      make_bins_pretty(suffix = "m") %>% 
      forcats::fct_relabel(~ stringr::str_replace(.x, "^0-", paste0(my_tree_ht_m,"-")) )
    , is_detected = match_grp %in% c("true positive")
    , is_detected = as.numeric(is_detected)
  )
dta_temp %>% 
  dplyr::count(dataset_factor, match_grp, tree_height_m_bin) %>% 
  dplyr::group_by(dataset_factor, tree_height_m_bin) %>% 
  dplyr::mutate(pct = n/sum(n,na.rm = T)) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    lab = ifelse(
      pct>0.095
      , paste0(
        scales::percent(pct,accuracy=1) 
        ,"\n("
        , scales::comma(n,accuracy=1) 
        , ")"
      )
      , ""
    )
    , match_grp = forcats::fct_rev(match_grp)
  ) %>% 
  dplyr::ungroup() %>% 
  # dplyr::glimpse()
  ggplot2::ggplot(
    mapping = ggplot2::aes(y = pct, x = tree_height_m_bin)
  ) +
  ggplot2::geom_col(
    mapping = ggplot2::aes(fill = match_grp)
    , width = 0.6
    , color = NA, alpha = 0.9
  ) +
  ggplot2::geom_text(
    mapping = ggplot2::aes(label = lab, group = match_grp, fontface = "bold", color = match_grp)
    , color = "black"
    , size = 2.5
    , position = ggplot2::position_stack(vjust = 0.5)
  ) +
  ggplot2::scale_fill_manual(values = pal_match_grp) +
  # ggplot2::scale_color_manual(values = c("black","white"), guide = "none") +
  ggplot2::scale_x_discrete(drop = F) +
  ggplot2::scale_y_continuous(
    breaks = seq(0,1,by=0.2)
    , labels = scales::percent
    , expand = ggplot2::expansion(mult = c(0,0.02))
  ) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(dataset_factor)
    # , rows = dplyr::vars(trtmnt_block)
    , ncol = 3
    , axes = "all"
    # , switch = "y"
  ) +
  ggplot2::labs(
    x = "tree height", y = "", fill = ""
    , subtitle = "Commission rate by tree height bin"
  ) +
  ggplot2::theme_light()+
  ggplot2::theme(
    legend.position = "top"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7, angle = 45, hjust = 1, vjust = 1)
    , axis.text.y = ggplot2::element_blank()
    , axis.ticks.y = ggplot2::element_blank()
  )

# ggplot2::ggsave("c:/users/georg/Downloads/n1_commission_by_htbin.jpg", height = 10, width = 8)

the commission error (percentage shown in the gray bar) is highest for the intermediate trees which suggests we could potentially improve our ITD window function to decrease the number of trees detected in this middle height range. However, going back to the detection rate above, this same height range also had the lowest detection rate which would suggest we increase the number of trees identified in this height range. This conflict suggests that the trees in this height range are likely difficult to detect for the study area using the data and ITD methods used in this study. For example, they may be subordinate to directly adjacent taller trees with the crowns interlocking, so they are often grouped with the taller tree during the CHM-based ITD.

Quantification Accuracy Results

Tree-Level

let’s table the tree-level aggregated measurement error

agg_ground_truth_match_ans %>% 
  dplyr::mutate(
    ref_trees = tp_n+fn_n
    , det_trees = tp_n+fp_n
  ) %>% 
  dplyr::select(
    dataset_factor, tidyselect::starts_with("flight_")
    # , site_area_m2
    # detection
    , ref_trees
    , det_trees
    , tp_n
    , omission_rate,commission_rate,recall,precision,f_score
    # quantification
    , tidyselect::ends_with("_mean")
    , tidyselect::ends_with("_rmse")
    , tidyselect::ends_with("_mape")
  ) %>% 
  # second select to arrange tree_metric
  dplyr::select(
    dataset_factor, tidyselect::starts_with("flight_")
    # , site_area_m2
    # detection
    , ref_trees
    , det_trees
    , tp_n
    , omission_rate,commission_rate,recall,precision,f_score
    # quantification
    # , c(tidyselect::contains("volume") & !tidyselect::contains("paraboloid"))
    , tidyselect::contains("area")
    , tidyselect::contains("height")
    , tidyselect::contains("diameter")
    , tidyselect::contains("dbh")
  ) %>% 
  # names()
  dplyr::mutate(
    dplyr::across(
      .cols = c(
        f_score, recall, precision, tidyselect::ends_with("_mape")
        , tidyselect::ends_with("_rate")
      )
      , .fn = ~ scales::percent(.x, accuracy = 1)
    )
    , dplyr::across(
      .cols = c(tidyselect::ends_with("_mean"))
      , .fn = ~ scales::comma(.x, accuracy = 0.01)
    )
    , dplyr::across(
      .cols = c(tidyselect::ends_with("_rmse"))
      , .fn = ~ scales::comma(.x, accuracy = 0.01)
    )
  ) %>% 
  dplyr::select(
    !tidyselect::contains("_area_")
    & !tidyselect::contains("diff_diameter_")
    & !tidyselect::ends_with("_trees")
    # & !tidyselect::ends_with("_n")
    & !tidyselect::ends_with("_rate")
  ) %>% 
  dplyr::arrange(dataset_factor) %>% 
  dplyr::select(
    -c(
      recall,precision,f_score
      , dataset_factor
    )
  ) %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "On"
      , !flight_tf ~ "Off"
      , T ~ "error"
    )
    , dplyr::across(
      tidyselect::starts_with("flight_")
      , as.factor
    )
  ) %>% 
  # dplyr::glimpse()
  
  kableExtra::kbl(
    caption = "Quantification Accuracy"
    , col.names = c(
      "altitude (ft)", "speed (mph)", "Terrain Follow"
      , "TP predictions"
      , rep(c("ME","RMSE","MAPE"), times = 2)
    )
    , escape = F
    # , digits = 2
  ) %>% 
  kableExtra::kable_styling(font_size = 11.5) %>% 
  kableExtra::add_header_above(c(
    " "=4
    # , "Detection" = 3
    # , "Area" = 3
    , "Height (m)" = 3
    , "DBH (cm)" = 3
  )) %>% 
  kableExtra::column_spec(seq(4,9,by=3), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1:2), valign = "top")
Quantification Accuracy
Height (m)
DBH (cm)
altitude (ft) speed (mph) Terrain Follow TP predictions ME RMSE MAPE ME RMSE MAPE
200 10 On 3244 0.20 0.77 9% 1.25 5.23 29%
20 Off 3349 0.16 0.76 8% 1.17 5.20 29%
On 3299 0.17 0.75 8% 1.20 5.19 30%
300 10 On 3289 0.18 0.76 8% 1.23 5.20 30%
20 Off 3346 0.16 0.77 9% 1.17 5.21 29%
400 10 On 3303 0.16 0.74 8% 1.47 5.21 32%
20 Off 3377 0.15 0.74 8% 1.47 5.23 33%
On 3326 0.14 0.72 8% 1.36 5.22 31%

and plot predicted versus reference values for correct predictions (TP)

####################################
# area scatter
####################################
dta_temp <-
  ground_truth_prediction_match_ans %>% 
  purrr::compact() %>% 
  purrr::imap(
    \(x,nm) 
    dplyr::filter(x,match_grp=="true positive") %>% 
    sf::st_drop_geometry() %>% 
    dplyr::mutate(folder = nm)
  )%>% 
  dplyr::bind_rows() %>% 
  dplyr::left_join(
    agg_ground_truth_match_ans %>% 
      dplyr::select(folder, dataset_factor, tidyselect::ends_with("_rmse") , tidyselect::ends_with("_mape") )
    , by = "folder"
  )
  # dplyr::glimpse()
# plt fn temp
plt_fn_temp <- function(
    x
    , which_col = "dbh_cm"
    , title_lab = "DBH"
    , x_lab = "reference DBH (cm)" # latex2exp::TeX("reference area ($m^2$)")
    , y_lab = "predicted DBH (cm)" # latex2exp::TeX("predicted area ($m^2$)")
){
    d <- dta_temp %>% 
      dplyr::filter(folder==x)
    # construct the formula: "y_var ~ x_var"
    y_var <- paste0("pred_",which_col)
    x_var <- paste0("ref_",which_col)
    formula_obj <- stats::as.formula(
      paste( 
        y_var
        , "~"
        , x_var 
      )
    )
    lm_temp <- lm(
      formula = formula_obj
      , data = d
    )
    # summary(lm_temp)
    # scales::percent(summary(lm_temp)$r.squared, accuracy = 0.1)
    lmf_temp <- paste0(
      "y = "
      , scales::number(summary(lm_temp)$coefficients[2,1], accuracy = 0.01) #slope
      , "x"
      , ifelse(summary(lm_temp)$coefficients[1,1]<0," - ", " + ")
      , scales::number(abs(summary(lm_temp)$coefficients[1,1]), accuracy = 0.01) #intercept
    )
    # scale limits
    max_val <- max( 
      max(dta_temp[[y_var]],na.rm=T)
      , max(dta_temp[[x_var]],na.rm=T)
    )
    
    # plot
    d %>% 
      ggplot2::ggplot(mapping = ggplot2::aes(y = .data[[y_var]], x = .data[[x_var]])) +
      ggplot2::geom_abline(color = "gray33", lwd = 1) +
      ggplot2::geom_point(color = "navy", alpha = 0.7, size = 0.9) +
      ggplot2::geom_smooth(lwd = 0.8, method = "lm", se=F, color = "gray", linetype = "dashed") +
      ggplot2::annotate(
        "text", 
        x = -Inf, y = Inf # Top Left
        , label = 
          # paste0(
          #   "R^2 == "
          #   , scales::number((summary(lm_temp)$r.squared)*100, accuracy = 0.1)
          #   , "*'%'"
          # )
          paste0(
            "RMSE = "
            , d %>% 
              dplyr::slice(1) %>% 
              dplyr::select(dplyr::all_of(paste0("diff_",which_col, "_rmse"))) %>% 
              dplyr::pull(1) %>% 
              unique() %>% 
              # scales::number(accuracy = 0.1, suffix = " (m²)")
              scales::number(accuracy = 0.1, suffix = ifelse(which_col=="dbh_cm"," (cm)"," (m)") )
            , "\nMAPE = "
            , d %>% 
              dplyr::select(dplyr::all_of(paste0("pct_diff_",which_col, "_mape"))) %>% 
              dplyr::pull(1) %>% 
              unique() %>% 
              scales::percent(accuracy = 0.1)
            , "\n"
            , lmf_temp
            , "\n R² = "
            # , scales::number((summary(lm_temp)$r.squared)*100, accuracy = 0.1, suffix = "%")
            , scales::number((summary(lm_temp)$r.squared), accuracy = 0.01)
          )
        , hjust = -0.1, vjust = 1.1
        , parse = F
        , size = 3
      ) +
      ggplot2::facet_wrap(
        facets = dplyr::vars(dataset_factor), scales = "free", axis.labels = "all"
        # , labeller = ggplot2::labeller(class = ggplot2::label_wrap_gen(width = 10))
      ) +
      ggplot2::scale_x_continuous(limits = c(0, max_val ), expand = ggplot2::expansion(mult = c(0,0.1))) +
      ggplot2::scale_y_continuous(limits = c(0, max_val ), expand = ggplot2::expansion(mult = c(0,0.1))) +
      ggplot2::labs(
        x = x_lab
        , y = y_lab
        # , color = "image-field\ndiameter diff."
        # , subtitle = latex2exp::TeX("bulk volume ($m^3$) comparison")
        # , subtitle = paste("Predicted", title_lab,"versus Reference", title_lab)
      ) +
      ggplot2::theme_light() +
      ggplot2::theme(
        strip.text = ggplot2::element_text(face = "bold", color = "black")
      ) 
  }
# get the plottttttttttttttttttt
dbh_plts_temp <- 
  unique(dta_temp$folder) %>% 
  purrr::map(
    \(z)
    plt_fn_temp(x=z, which_col = "dbh_cm", title_lab = "DBH", x_lab = "reference DBH (cm)", y_lab = "predicted DBH (cm)")
  )
# dbh_plts_temp[[1]]
dbh_plt_temp <- 
  patchwork::wrap_plots(dbh_plts_temp, ncol = 3) + 
  patchwork::plot_annotation(
    subtitle = "Predicted DBH versus Reference DBH"
    , theme = ggplot2::theme(plot.subtitle = ggplot2::element_text(face = "bold"))
  ) &
  ggplot2::theme(
    axis.title = ggplot2::element_text(size = 6)
    , axis.text = ggplot2::element_text(size = 6)
  )
# dbh_plt_temp
# get the plottttttttttttttttttt
ht_plts_temp <- 
  unique(dta_temp$folder) %>% 
  purrr::map(
    \(z)
    plt_fn_temp(x=z, which_col = "tree_height_m", title_lab = "Height", x_lab = "reference Height (m)", y_lab = "predicted Height (m)")
  )
# ht_plts_temp[[1]]
ht_plt_temp <- 
  patchwork::wrap_plots(ht_plts_temp, ncol = 3) + 
  patchwork::plot_annotation(
    subtitle = "Predicted Height versus Reference Height"
    , theme = ggplot2::theme(plot.subtitle = ggplot2::element_text(face = "bold"))
  ) &
  ggplot2::theme(
    axis.title = ggplot2::element_text(size = 6)
    , axis.text = ggplot2::element_text(size = 6)
  )
# ht_plt_temp
# ggplot2::ggsave(plot = ht_plt_temp, filename = "../data/heyhtscatter.jpg", height = 9, width = 8.95, dpi = "print")

DBH

dbh_plt_temp

the UAS-predicted DBH values are directionally correct as evidenced by the high correlation coefficient of approximately 0.91 across all tested flights. there is a size-dependent bias of DBH predictions with the UAS tending to overpredict the diameter of small and intermediate trees while underpredicting the diameter of the largest trees. the total measurement error remains reasonable for forest inventory with the RMSE of approximately 5.21 cm (2.05 in) and the average magnitude of the DBH prediction error quantified by MAPE is approximately 30% across the flights

height

ht_plt_temp

UAS-predicted height is highly accurate compared to field-measured values, a finding supported by many prior studies. also, remember how we conditionally matched UAS-predicted trees with reference trees based on a maximum height difference threshold of 2 m ?

full distribution of predicted and reference values irrespective of correct or incorrect prediction match

ground_truth_prediction_match_ans %>%
  purrr::list_rbind(names_to = "folder") %>%
  sf::st_drop_geometry() %>%
  dplyr::inner_join(
    tracking_df %>% 
      dplyr::select(-c(path))
    , by = "folder"
  ) %>%
  dplyr::select(
    dataset_factor
    , ref_tree_height_m, pred_tree_height_m
    , ref_dbh_cm, pred_dbh_cm
  ) %>% 
  tidyr::pivot_longer(
    cols = -c(dataset_factor)
    , names_to = "metric"
    , values_to = "value"
    , values_drop_na = T
  ) %>% 
  dplyr::mutate(
    which_data = dplyr::case_when(
        stringr::str_starts(metric,"field_") | stringr::str_starts(metric,"ref_") ~ "field"
        , stringr::str_starts(metric,"pred_") ~ "prediction"
        , T ~ "error"
      ) %>% 
      ordered()
    , tree_metric = metric %>% 
      stringr::str_remove("(_rmse|_rrmse|_mean|_mape)$") %>% 
      stringr::str_extract("(height|dbh)") %>% 
      factor(
        ordered = T
        , levels = c(
          "height"
          , "dbh"
        )
        , labels = c(
          "Height (m)"
          , "DBH (cm)"
        )
      )
  ) %>% 
  dplyr::mutate(
    dataset_factor = dataset_factor %>% forcats::fct_relabel(~ str_replace_all(.x, "\\|", "\n"))
  ) %>% 
  # dplyr::glimpse()
# plot dist
  ggplot2::ggplot(mapping = ggplot2::aes(x = value, color = which_data, fill = which_data)) +
  ggplot2::geom_density(mapping = ggplot2::aes(y=ggplot2::after_stat(scaled)), alpha = 0.7) +
  ggplot2::facet_grid(
    rows = dplyr::vars(dataset_factor)
    , cols = dplyr::vars(tree_metric)
    , scales = "free_x", axes = "all_x"
    , switch = "y"
  ) +
  harrypotter::scale_color_hp_d(option = "hermionegranger") +
  harrypotter::scale_fill_hp_d(option = "hermionegranger") +
  ggplot2::scale_y_continuous(NULL,breaks=NULL) +
  ggplot2::scale_x_continuous(breaks=scales::breaks_extended(n=6)) +
  ggplot2::labs(
    color="",fill="",x=""
    # , subtitle = "comparison of height and DBH distributions"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text.x = ggplot2::element_text(size = 10, color = "black", face = "bold")
    , strip.text.y.left =  ggplot2::element_text(size = 9, angle = 0, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 6)
    , panel.grid = ggplot2::element_blank()
  )

# ggplot2::ggsave(filename = "../data/heydist.jpg", height = 10.5, width = 8, dpi = "print")

Stand-Level

In addition to the tree-level accuracy metrics (e.g. height RMSE and MAPE), we can also calculate stand-level accuracy metrics by aggregating estimates from the reference (ground truth) and predicted trees irrespective of wheter or not they were matched (i.e. true positive) during instance matching.

We’ll evaluate stand level aggregation accuracy for stand basal area and trees per hectare. For these stand metrics, we calculated the difference (\(predicted - reference\)) to measure the bias in the original units (e.g., \(\frac{m^{2}}{ha}\) for basal area) which indicates the over- or under-estimation of the stand structure. We also calculated the percent difference (\(\frac{predicted-reference}{reference}\)) to measure the relative bias (scale-independent) to capture the magnitude of the error.

let’s table the stand-level measurement error

agg_ground_truth_match_ans %>% 
  dplyr::mutate(
    ref_trees = tp_n+fn_n
    , pred_trees = tp_n+fp_n
  ) %>% 
  dplyr::select(
    dataset_factor, tidyselect::starts_with("flight_")
    # , site_area_m2
    # detection
    , ref_trees
    , pred_trees
    # quantification
    , tidyselect::ends_with("_basal_area_m2_per_ha")
    , tidyselect::ends_with("_trees_per_ha")
  ) %>% 
  dplyr::select(!tidyselect::starts_with("abs_diff_")) %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "On"
      , !flight_tf ~ "Off"
      , T ~ "error"
    )
    , dplyr::across(
      .cols = c(
        tidyselect::starts_with("pct_diff_")
        , tidyselect::ends_with("_rate")
      )
      , .fn = ~ scales::percent(.x, accuracy = .1)
    )
    , dplyr::across(
      .cols = c(flight_agl,flight_mph,tidyselect::ends_with("_trees"))
      , .fn = ~ scales::comma(.x, accuracy = 1)
    )
    , dplyr::across(
      .cols = dplyr::where(is.numeric)
      , .fn = ~ scales::comma(.x, accuracy = 0.1)
    )
    # # one column for diff/pct
    , diff_basal_area_m2_per_ha = paste0(diff_basal_area_m2_per_ha, "<br>(", pct_diff_basal_area_m2_per_ha, ")")
    , diff_trees_per_ha = paste0(diff_trees_per_ha, "<br>(", pct_diff_trees_per_ha, ")")
  ) %>% 
  dplyr::select(
    !tidyselect::starts_with("pct_diff_")
  ) %>% 
  dplyr::arrange(dataset_factor) %>% 
  dplyr::select(
    -c(
      dataset_factor
    )
  ) %>% 
  # dplyr::glimpse()
  
  kableExtra::kbl(
    caption = "Stand-Level Accuracy"
    , col.names = c(
      "altitude (ft)", "speed (mph)", "Terrain Follow"
      , "reference", "predicted"
      , rep(c("reference","predicted","difference"), times = 2)
    )
    , escape = F
    # , digits = 2
  ) %>% 
  kableExtra::kable_styling(font_size = 11.5) %>% 
  kableExtra::add_header_above(
    c(
      " "=3
      , "# Trees" = 2
      # , "Area" = 3
      , "Basal Area<br />m<sup>2</sup> ha<sup>-1</sup>" = 3
      , "Trees<br />ha<sup>-1</sup>" = 3
    )
    , escape = F
  ) %>% 
  kableExtra::column_spec( c(3,seq(5,11,by=3) ), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1:2), valign = "top")
Stand-Level Accuracy
# Trees

Basal Area
m2 ha-1

Trees
ha-1

altitude (ft) speed (mph) Terrain Follow reference predicted reference predicted difference reference predicted difference
200 10 On 5,217 5,282 24.8 27.3 2.5
(10.3%)
561.3 568.3 7.0
(1.2%)
20 Off 5,217 5,528 24.8 27.8 3.1
(12.4%)
561.3 594.8 33.5
(6.0%)
On 5,217 5,500 24.8 27.5 2.7
(10.9%)
561.3 591.8 30.4
(5.4%)
300 10 On 5,217 5,421 24.8 27.9 3.1
(12.6%)
561.3 583.3 21.9
(3.9%)
20 Off 5,217 5,508 24.8 27.9 3.2
(12.8%)
561.3 592.6 31.3
(5.6%)
400 10 On 5,217 5,449 24.8 28.4 3.6
(14.6%)
561.3 586.3 25.0
(4.4%)
20 Off 5,217 5,592 24.8 28.5 3.8
(15.3%)
561.3 601.7 40.3
(7.2%)
On 5,217 5,531 24.8 28.4 3.6
(14.7%)
561.3 595.1 33.8
(6.0%)

let’s plot the percent difference in BA and TPH by flight

agg_ground_truth_match_ans %>% 
  dplyr::select(
    dataset_factor
    , pct_diff_basal_area_m2_per_ha
    , pct_diff_trees_per_ha
  ) %>% 
  dplyr::mutate(
    tot = -1* (abs(pct_diff_basal_area_m2_per_ha) + abs(pct_diff_trees_per_ha))
  ) %>% 
  tidyr::pivot_longer(cols = -c(dataset_factor,tot)) %>% 
  dplyr::mutate(
    name = dplyr::recode_values(
      name
      , "pct_diff_basal_area_m2_per_ha" ~ latex2exp::TeX("Basal Area $m^{2} \\cdot ha^{-1}$", output = "character")
      , "pct_diff_trees_per_ha" ~ latex2exp::TeX("Trees $ha^{-1}$", output = "character")
    )
    , value_lab = scales::percent(value,accuracy=0.1)
  ) %>% 

ggplot2::ggplot(
  mapping = ggplot2::aes(
    y = reorder(dataset_factor, tot)
    , x = value
    , group = name
  )
) +
  ggplot2::geom_vline(xintercept = 0, color = "gray95", lwd = 1.5, alpha = 0.7) +
  ggplot2::geom_vline(xintercept = c(0.05,-0.05), color = "gray77", lwd = 0.5, alpha = 0.7) +
  ggplot2::geom_vline(xintercept = c(0.1,-0.1), color = "gray55", lwd = 0.5, alpha = 0.7) +
  ggplot2::geom_vline(xintercept = c(0.15,-0.15), color = "gray33", lwd = 0.5, alpha = 0.7) +
  ggplot2::geom_vline(xintercept = c(0.2,-0.2), color = "gray11", lwd = 0.5) +
  ggplot2::geom_segment(
    mapping = ggplot2::aes(yend = dataset_factor, xend = 0,color = value > 0)
    # , color = "gray44"
  ) +
  ggplot2::geom_point(
    mapping = ggplot2::aes(color = value > 0)
    , size = 3
  ) +
  # lhs labs
  ggplot2::geom_text(
    data = function(x){dplyr::filter(x,value<0)}
    , mapping = ggplot2::aes(label = value_lab, fontface = "bold")
    # , color = "black"
    , size = 3
    , nudge_x = -0.006
    , hjust = 1
  ) +
  # rhs labs
  ggplot2::geom_text(
    data = function(x){dplyr::filter(x,value>=0)}
    , mapping = ggplot2::aes(label = value_lab, fontface = "bold")
    # , color = "black"
    , size = 3
    , nudge_x = 0.006
    , hjust = 0
  ) +
  ggplot2::facet_grid(
    cols = dplyr::vars(name)
    # , labeller = ggplot2::label_parsed
    , labeller = ggplot2::as_labeller(
      function(x) {
        paste0("atop(", x, ", 'percent error')")
      }
      , default = ggplot2::label_parsed
    )
  ) +
  ggplot2::scale_color_manual(values = c("firebrick","navy")) +
  ggplot2::scale_x_continuous(
    limits = c(-0.2,0.2)
    , labels = scales::percent
    # , breaks = scales::breaks_extended(n=10)
    # , breaks = function(x) seq(min(x), max(x), by = 0.05)
  ) +
  ggplot2::labs(
    y = "" # "comparison"
    , x = "percent error"
    , subtitle = "positive = UAS-measured higher than field-measured\nnegative = UAS-measured lower than field-measured"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , axis.text.y = ggplot2::element_text(size = 9, face = "bold")
    # , plot.subtitle = ggplot2::element_text(hjust = 1)
    , strip.text = ggplot2::element_text(size = 10, color = "black", face = "bold")
    , panel.grid.major.x = element_blank()
    , panel.grid.minor.x = element_blank()
  )

let’s take a slightly different look to explore the UAS bias in the original units for basal area which indicates the over- or under-estimation

agg_ground_truth_match_ans %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "terrain follow: On"
      , !flight_tf ~ "terrain follow: Off"
      , T ~ "error"
    )
  ) %>% 
  dplyr::mutate(
    dataset_factor = forcats::fct_rev(dataset_factor)
    , lab = paste0(
      ifelse(diff_basal_area_m2_per_ha>=0,"+","")
      , scales::comma(diff_basal_area_m2_per_ha,accuracy=0.1)
      , "\n("
      , scales::percent(pct_diff_basal_area_m2_per_ha,accuracy=0.1)
      , ")"
    )
  ) %>% 
  # dplyr::mutate(
  #   name = dplyr::recode_values(
  #     name
  #     , "diff_basal_area_m2_per_ha" ~ latex2exp::TeX("Basal Area $m^{2} \\cdot ha^{-1}$", output = "character")
  #     , "diff_trees_per_ha" ~ latex2exp::TeX("Trees $ha^{-1}$", output = "character")
  #   )
  #   , value_lab = scales::comma(value,accuracy=0.1)
  # ) %>% 
ggplot2::ggplot(
  mapping = ggplot2::aes(
    y = dataset_factor
  )
) +
  # ref
  ggplot2::geom_vline(
    mapping = ggplot2::aes(
      xintercept = ref_basal_area_m2_per_ha
      , color = "field"
    )
    , lwd = 1.5, alpha = 0.15
  ) +
  ggplot2::geom_segment(
    mapping = ggplot2::aes(
      x = ref_basal_area_m2_per_ha
      , xend = pred_basal_area_m2_per_ha
      , yend = dataset_factor
    )
    , color = "gray70", lwd = 1.5
  ) +
  # ref
  ggplot2::geom_point(
    mapping = ggplot2::aes(
      x = ref_basal_area_m2_per_ha
      , color = "field"
    )
    , size = 4
  ) +
  # pred
  ggplot2::geom_point(
    mapping = ggplot2::aes(
      x = pred_basal_area_m2_per_ha
      , color = "prediction"
    )
    , size = 4
  ) +
  # ref labs
  ggplot2::geom_text(
    mapping = ggplot2::aes(x = ref_basal_area_m2_per_ha, label = scales::comma(ref_basal_area_m2_per_ha, accuracy = 0.1))
    , size = 2.5
    , hjust = 0.5, vjust = 2.2
  ) +
  # pred labs
  ggplot2::geom_text(
    mapping = ggplot2::aes(x = pred_basal_area_m2_per_ha, label = scales::comma(pred_basal_area_m2_per_ha, accuracy = 0.1))
    , size = 2.5
    , hjust = 0.5, vjust = 2.2
  ) +
  # diff labs
  ggplot2::geom_text(
    mapping = ggplot2::aes(
      x = (ref_basal_area_m2_per_ha+pred_basal_area_m2_per_ha)/2
      , label = lab
    )
    , size = 2.1
    , hjust = 0.5, vjust = -0.5
    , color = "gray22"
  ) +
  harrypotter::scale_color_hp_d(option = "hermionegranger") +
  ggplot2::scale_x_continuous(
    limits = c(0,NA)
    , labels = scales::comma
    , breaks = scales::breaks_extended(n=10)
    , expand = ggplot2::expansion(mult = c(0.02,0.2))
    , sec.axis = ggplot2::dup_axis()
  ) +
  ggplot2::labs(
    y = "" # "comparison"
    , x = latex2exp::TeX("Basal Area $m^{2} \\cdot ha^{-1}$")
    , color = ""
    , subtitle = latex2exp::TeX("Predicted vs Reference Stand Basal Area ($m^{2} \\cdot ha^{-1}$)")
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    # , plot.subtitle = ggplot2::element_text(hjust = 1)
    , axis.text.y = ggplot2::element_text(size = 9, face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7)
    , axis.title.x = ggplot2::element_text(size = 7)
    , panel.grid.major.x = element_blank()
    , panel.grid.minor.x = element_blank()
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(
      override.aes = list(shape = 15, size = 6, linetype = 0)
    )
  )

above we saw that the UAS generally overpredicted the DBH and the figure above shows that this bias propagated through to the stand-level basal area. as a result, basal area was overestimated across all flights with overestimations ranging from 10.3% to 15.3% across the tested flights

now we’ll look at the UAS bias in the original units for trees per hectare

agg_ground_truth_match_ans %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "terrain follow: On"
      , !flight_tf ~ "terrain follow: Off"
      , T ~ "error"
    )
  ) %>% 
  dplyr::mutate(
    dataset_factor = forcats::fct_rev(dataset_factor)
    , lab = paste0(
      ifelse(diff_trees_per_ha>=0,"+","")
      , scales::comma(diff_trees_per_ha,accuracy=0.1)
      , "\n("
      , scales::percent(pct_diff_trees_per_ha,accuracy=0.1)
      , ")"
    )
  ) %>% 
  # dplyr::mutate(
  #   name = dplyr::recode_values(
  #     name
  #     , "diff_trees_per_ha" ~ latex2exp::TeX("Basal Area $m^{2} \\cdot ha^{-1}$", output = "character")
  #     , "diff_trees_per_ha" ~ latex2exp::TeX("Trees $ha^{-1}$", output = "character")
  #   )
  #   , value_lab = scales::comma(value,accuracy=0.1)
  # ) %>% 
ggplot2::ggplot(
  mapping = ggplot2::aes(
    y = dataset_factor
  )
) +
  # ref
  ggplot2::geom_vline(
    mapping = ggplot2::aes(
      xintercept = ref_trees_per_ha
      , color = "field"
    )
    , lwd = 1.5, alpha = 0.15
  ) +
  ggplot2::geom_segment(
    mapping = ggplot2::aes(
      x = ref_trees_per_ha
      , xend = pred_trees_per_ha
      , yend = dataset_factor
    )
    , color = "gray70", lwd = 1.5
  ) +
  # ref
  ggplot2::geom_point(
    mapping = ggplot2::aes(
      x = ref_trees_per_ha
      , color = "field"
    )
    , size = 4
  ) +
  # pred
  ggplot2::geom_point(
    mapping = ggplot2::aes(
      x = pred_trees_per_ha
      , color = "prediction"
    )
    , size = 4
  ) +
  # ref labs
  ggplot2::geom_text(
    mapping = ggplot2::aes(x = ref_trees_per_ha, label = scales::comma(ref_trees_per_ha, accuracy = 0.1))
    , size = 2.5
    , hjust = 0.5, vjust = 2.2
  ) +
  # pred labs
  ggplot2::geom_text(
    mapping = ggplot2::aes(
      x = pred_trees_per_ha
      , label = ifelse(
        abs(pct_diff_trees_per_ha)<0.01
        , ""
        , scales::comma(pred_trees_per_ha, accuracy = 0.1)  
      )
    )
    , size = 2.5
    , hjust = 0.5, vjust = 2.2
  ) +
  # diff labs
  ggplot2::geom_text(
    mapping = ggplot2::aes(
      x = (ref_trees_per_ha+pred_trees_per_ha)/2
      , label = lab
    )
    , size = 2.1
    , hjust = 0.5, vjust = -0.5
    , color = "gray22"
  ) +
  harrypotter::scale_color_hp_d(option = "hermionegranger") +
  ggplot2::scale_x_continuous(
    labels = scales::comma
    , breaks = scales::breaks_extended(n=10)
    , expand = ggplot2::expansion(mult = c(0.5,0.2))
    , sec.axis = ggplot2::dup_axis()
  ) +
  ggplot2::labs(
    y = "" # "comparison"
    , x = latex2exp::TeX("Trees $ha^{-1}$")
    , color = ""
    , subtitle = latex2exp::TeX("Predicted vs Reference Stand Trees $ha^{-1}$")
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    # , plot.subtitle = ggplot2::element_text(hjust = 1)
    , axis.text.y = ggplot2::element_text(size = 9, face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7)
    , axis.title.x = ggplot2::element_text(size = 7)
    , panel.grid.major.x = element_blank()
    , panel.grid.minor.x = element_blank()
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(
      override.aes = list(shape = 15, size = 6, linetype = 0)
    )
  )

stand-level predictions for trees per hectare vary by flight setting but all flights resulted in TPH estimates near the field-measured data. the largest TPH error was a 7.2% underestimation while the best performing flight resulted in a 1.2% error (i.e. nearly identical to the reference value). The remaining flights achieved TPH error rates between 4.1% and 6.0%.

the plots maybe don’t help us see the impact of different flight settings, let’s attempt that all on a single plot

basal area

agg_ground_truth_match_ans %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "terrain follow: On"
      , !flight_tf ~ "terrain follow: Off"
      , T ~ "error"
    )
  ) %>% 
  dplyr::select(
    dataset_factor, tidyselect::starts_with("flight_")
    # , diff_basal_area_m2_per_ha
    , tidyselect::ends_with("_basal_area_m2_per_ha")
  ) %>% 
  dplyr::select(!tidyselect::starts_with("abs_diff_")) %>% 
  tidyr::pivot_longer(cols = c(ref_basal_area_m2_per_ha, pred_basal_area_m2_per_ha)) %>% 
  dplyr::mutate(
    name = dplyr::recode_values(
      name
      , "ref_basal_area_m2_per_ha" ~ "field"
      , "pred_basal_area_m2_per_ha" ~ "prediction"
    )
    , value_lab = scales::comma(value,accuracy=0.1)
  ) %>% 
  
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = name, y = value, group = dataset_factor)
  ) +
  ggplot2::geom_line(key_glyph = "point", alpha = 0.7, color = "gray", lwd = 1.1) +
  # ref labs
  ggplot2::geom_text(
    data = function(x){dplyr::filter(x,name=="field")}
    , mapping = ggplot2::aes(label = value_lab)
    , size = 3
    , hjust = 0.5, vjust = 2.2
  ) +
  # pred labs
  ggrepel::geom_text_repel(
    data = function(x){dplyr::filter(x,name=="prediction")}
    , mapping = ggplot2::aes(label = value_lab)
    , size = 2.3
    , hjust = 0.5, vjust = 2.2
  ) +
  ggplot2::geom_point(
    mapping = ggplot2::aes(color = flight_tf)
    # mapping = ggplot2::aes(color = name)
    # , alpha = 0.7
    , size = 4
  ) +
  ggplot2::facet_grid(
    cols = dplyr::vars(flight_agl)
    , rows = dplyr::vars(flight_mph)
    , labeller = ggplot2::labeller( 
      flight_agl = ~paste("Altitude:", .x, " (ft)") 
      , flight_mph = ~paste("Speed:", .x, " (mph)") 
    )
    , axes = "all_x"
  ) +
  # harrypotter::scale_color_hp_d(option = "hermionegranger") +
  ggplot2::scale_color_manual(values = c("gray55","gray11"), guide = "none") +
  ggplot2::scale_y_continuous(
    limits = c(0,NA)
    , labels = scales::comma
    , breaks = scales::breaks_extended(n=10)
    , expand = ggplot2::expansion(mult = c(0.01,0.1))
  ) +
  ggplot2::labs(
    x = "" # "comparison"
    , y = latex2exp::TeX("Basal Area $m^{2} \\cdot ha^{-1}$")
    , color = ""
    , subtitle = latex2exp::TeX("Predicted vs Reference Stand Basal Area ($m^{2} \\cdot ha^{-1}$)")
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , axis.text.x = ggplot2::element_text(size = 10, face = "bold", color = "black")
    , axis.text.y = ggplot2::element_text(size = 7)
    , axis.title.y = ggplot2::element_text(size = 7)
    , strip.text = ggplot2::element_text(size = 10, color = "black", face = "bold")
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(
      override.aes = list(shape = 15, size = 6, linetype = 0)
    )
  )

trees per ha

agg_ground_truth_match_ans %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "terrain follow: On"
      , !flight_tf ~ "terrain follow: Off"
      , T ~ "error"
    )
  ) %>% 
  dplyr::select(
    dataset_factor, tidyselect::starts_with("flight_")
    # , diff_basal_area_m2_per_ha
    , tidyselect::ends_with("_trees_per_ha")
  ) %>% 
  dplyr::select(!tidyselect::starts_with("abs_diff_")) %>% 
  tidyr::pivot_longer(cols = c(ref_trees_per_ha, pred_trees_per_ha)) %>% 
  dplyr::mutate(
    name = dplyr::recode_values(
      name
      , "ref_trees_per_ha" ~ "field"
      , "pred_trees_per_ha" ~ "prediction"
    )
    , value_lab = scales::comma(value,accuracy=0.1)
  ) %>% 
  
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = name, y = value, group = dataset_factor)
  ) +
  ggplot2::geom_line(key_glyph = "point", alpha = 0.7, color = "gray", lwd = 1.1) +
  # ref labs
  ggplot2::geom_text(
    data = function(x){dplyr::filter(x,name=="field")}
    , mapping = ggplot2::aes(label = value_lab)
    , size = 3
    , hjust = 0.5, vjust = 2.2
  ) +
  # pred labs
  ggrepel::geom_text_repel(
    data = function(x){dplyr::filter(x,name=="prediction")}
    , mapping = ggplot2::aes(label = value_lab)
    , size = 2.3
    , hjust = 0.5, vjust = 2.2
  ) +
  ggplot2::geom_point(
    mapping = ggplot2::aes(color = flight_tf)
    # mapping = ggplot2::aes(color = name)
    # , alpha = 0.7
    , size = 4
  ) +
  ggplot2::facet_grid(
    cols = dplyr::vars(flight_agl)
    , rows = dplyr::vars(flight_mph)
    , labeller = ggplot2::labeller( 
      flight_agl = ~paste("Altitude:", .x, " (ft)") 
      , flight_mph = ~paste("Speed:", .x, " (mph)") 
    )
    , axes = "all_x"
  ) +
  # harrypotter::scale_color_hp_d(option = "hermionegranger") +
  ggplot2::scale_color_manual(values = c("gray55","gray11"), guide = "none") +
  ggplot2::scale_y_continuous(
    limits = c(0,NA)
    , labels = scales::comma
    , breaks = scales::breaks_extended(n=10)
    , expand = ggplot2::expansion(mult = c(0.01,0.1))
  ) +
  ggplot2::labs(
    x = "" # "comparison"
    , y = latex2exp::TeX("Trees $ha^{-1}$")
    , color = ""
    , subtitle = latex2exp::TeX("Predicted vs Reference Stand Trees $ha^{-1}$")
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , axis.text.x = ggplot2::element_text(size = 10, face = "bold", color = "black")
    , axis.text.y = ggplot2::element_text(size = 7)
    , axis.title.y = ggplot2::element_text(size = 7)
    , strip.text = ggplot2::element_text(size = 10, color = "black", face = "bold")
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(
      override.aes = list(shape = 15, size = 6, linetype = 0)
    )
  )

Statistical Analysis

Aggregated Stand-Level

We’ll quickly use a frequentist approach to test for differences in the detection (F-score) and quantification (e.g. DBH RMSE) accuracy at different levels of the flight setting variables. Due to the limited data, we’ll test if the effect of speed changes with altitude (interaction) while controlling for terrain following (intercept) using multiple linear regression.

Tree Detection

influence of flight setting parameters on F-score

# lm
lm_temp <- lm(
  # f_score ~ 0 + flight_agl * flight_mph * flight_tf # full three and two-way interactions
  f_score ~ 0 + flight_tf + flight_agl*flight_mph # metric interactions only
  , data = agg_ground_truth_match_ans
)
# summary(lm_temp)
broom::tidy(lm_temp) %>%
  dplyr::mutate(
    term = term %>% 
      stringr::str_replace_all("flight_tf","terrain follow") %>% 
      stringr::str_replace_all("flight_agl","altitude") %>% 
      stringr::str_replace_all("flight_mph","mph") %>% 
      stringr::str_replace_all("FALSE"," = off") %>% 
      stringr::str_replace_all("TRUE"," = on")
  ) %>% 
  kableExtra::kbl(
    caption = "MLR: influence of flight setting parameters on F-score"
    , digits = 4
  ) %>%
  kableExtra::kable_styling()
MLR: influence of flight setting parameters on F-score
term estimate std.error statistic p.value
terrain follow = off 0.6260 0.0026 240.5018 0.0000
terrain follow = on 0.6192 0.0025 247.5768 0.0000
altitude 0.0000 0.0000 0.2543 0.8157
mph -0.0003 0.0001 -1.8733 0.1577
altitude:mph 0.0000 0.0000 1.0648 0.3651

I don’t think we can make many inferences from this model other than “there is no impact (or no evidence of impact) of the various flight settings on overall tree detection accuracy.” because the terrain following was only tested at 20 mph, the model might be attributing some of the speed effect (mph) to the terrain following effect because they are essentially tied together in the data.

Basal Area

influence of flight setting parameters on basal area based on the absolute difference (total error) from the field-measured data

# lm
lm_temp <- lm(
  # f_score ~ 0 + flight_agl * flight_mph * flight_tf # full three and two-way interactions
  abs_diff_basal_area_m2_per_ha ~ 0 + flight_tf + flight_agl*flight_mph # metric interactions only
  , data = agg_ground_truth_match_ans
)
# summary(lm_temp)
broom::tidy(lm_temp) %>%
  dplyr::mutate(
    term = term %>% 
      stringr::str_replace_all("flight_tf","terrain follow") %>% 
      stringr::str_replace_all("flight_agl","altitude") %>% 
      stringr::str_replace_all("flight_mph","mph") %>% 
      stringr::str_replace_all("FALSE"," = off") %>% 
      stringr::str_replace_all("TRUE"," = on")
  ) %>% 
  kableExtra::kbl(
    caption = "MLR: influence of flight setting parameters on basal area error"
    , digits = 4
  ) %>%
  kableExtra::kable_styling()
MLR: influence of flight setting parameters on basal area error
term estimate std.error statistic p.value
terrain follow = off 1.2272 0.6780 1.8100 0.1680
terrain follow = on 1.0477 0.6515 1.6080 0.2062
altitude 0.0065 0.0021 3.1333 0.0519
mph 0.0441 0.0382 1.1543 0.3320
altitude:mph -0.0001 0.0001 -0.9969 0.3923

altitude alone is significant at the 0.1 level. this finding is evidence that the altitude flight setting influences basal area error in a meaningful way, but note that the estimated coefficient on the variable is very small.

let’s make pairwise contrasts (emmeans::contrast()) based on the estimated marginal means (emmeans::emmeans()) over the range of possible flight settings for altitude and speed to test if the practical settings (i.e. we cannot fly at 1000 ft or 0.05 mph or 99 mph) are different. if any of these contrasts are statistically significant (i.e. there is not a zero difference in the basal area error), it will help us identify if there is a flight setting that we should use as an operational rule if our objective is to minimize basal area error at the stand level

preds_temp <- emmeans::emmeans(
  lm_temp
  , ~ flight_agl * flight_mph
  , at = list(
      flight_agl = seq(
        floor( min(agg_ground_truth_match_ans$flight_agl) )
        , floor( max(agg_ground_truth_match_ans$flight_agl) )
        , by = 100
      )
      , flight_mph = seq(
        floor( min(agg_ground_truth_match_ans$flight_mph) )
        , floor( max(agg_ground_truth_match_ans$flight_mph) )
        , by = 5
      )
    )
)

contrast of altitude levels at different flight speeds (see the table footnote for interpretation) for basal area per hectare error

preds_temp %>% 
  emmeans::contrast(method = "pairwise", simple = "each", combine = TRUE) %>% 
  dplyr::as_tibble() %>%
  dplyr::filter(flight_mph!=".",!is.na(flight_mph)) %>% 
  dplyr::select(flight_mph, contrast,estimate,SE,t.ratio,p.value) %>%
  dplyr::arrange(flight_mph) %>% 
  dplyr::mutate(
    contrast = contrast %>% 
      # clean agl
      stringr::str_replace("flight_agl", "xxx") %>% 
      stringr::str_remove_all("flight_agl") %>% 
      stringr::str_replace("xxx", "altitude (ft): ") %>% 
      # clean mph
      stringr::str_replace("flight_mph", "xxx") %>% 
      stringr::str_remove_all("flight_mph") %>% 
      stringr::str_replace("xxx", "speed (mph): ")
      # rest
    , dplyr::across(
      c(estimate,SE,t.ratio,p.value)
      , ~scales::comma(.x, accuracy = 0.001)
    )
  ) %>% 
  kableExtra::kbl(
    caption = "pairwise tests for non-zero differences in basal area error (m2/ha)"
    , col.names = c(
      "speed (mph)", "contrast"
      # , "speed (mph)", "altitude (ft)"
      , "estimate", "SE", "t.ratio", "p.value"
    )
    , digits = 3
  ) %>%
  kableExtra::kable_styling() %>% 
  kableExtra::column_spec(c(2), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1), valign = "top") %>% 
  kableExtra::footnote(
    general_title = "interpretation:"
    , general = c(
      "`estimate`: difference in error; positive means the first setting has higher error."
      , "`SE`: measures the precision of the estimate (smaller values indicate higher precision)"
      , "`t.ratio`: The test statistic; indicates distance from the null (zero difference)"
      , "`p.value`: probability of observing this result by chance; values < 0.05 are considered significant"
    )
    , footnote_as_chunk = FALSE
    , escape = F
  )
pairwise tests for non-zero differences in basal area error (m2/ha)
speed (mph) contrast estimate SE t.ratio p.value
10 altitude (ft): 200 - 300 -0.532 0.098 -5.426 0.221
altitude (ft): 200 - 400 -1.064 0.196 -5.426 0.221
altitude (ft): 300 - 400 -0.532 0.098 -5.426 0.221
15 altitude (ft): 200 - 300 -0.472 0.060 -7.863 0.077
altitude (ft): 200 - 400 -0.944 0.120 -7.863 0.077
altitude (ft): 300 - 400 -0.472 0.060 -7.863 0.077
20 altitude (ft): 200 - 300 -0.412 0.069 -5.947 0.171
altitude (ft): 200 - 400 -0.824 0.139 -5.947 0.171
altitude (ft): 300 - 400 -0.412 0.069 -5.947 0.171
interpretation:
estimate: difference in error; positive means the first setting has higher error.
SE: measures the precision of the estimate (smaller values indicate higher precision)
t.ratio: The test statistic; indicates distance from the null (zero difference)
p.value: probability of observing this result by chance; values < 0.05 are considered significant

contrast of speed levels at different flight speeds (see the table footnote for interpretation) for basal area per hectare error

preds_temp %>% 
  emmeans::contrast(method = "pairwise", simple = "each", combine = TRUE) %>% 
  dplyr::as_tibble() %>%
  dplyr::filter(flight_agl!=".",!is.na(flight_agl)) %>% 
  dplyr::select(flight_agl, contrast,estimate,SE,t.ratio,p.value) %>%
  dplyr::arrange(flight_agl) %>% 
  dplyr::mutate(
    contrast = contrast %>% 
      # clean agl
      stringr::str_replace("flight_agl", "xxx") %>% 
      stringr::str_remove_all("flight_agl") %>% 
      stringr::str_replace("xxx", "altitude (ft): ") %>% 
      # clean mph
      stringr::str_replace("flight_mph", "xxx") %>% 
      stringr::str_remove_all("flight_mph") %>% 
      stringr::str_replace("xxx", "speed (mph): ")
      # rest
    , dplyr::across(
      c(estimate,SE,t.ratio,p.value)
      , ~scales::comma(.x, accuracy = 0.001)
    )
  ) %>% 
  kableExtra::kbl(
    caption = "pairwise tests for non-zero differences in basal area error (m2/ha)"
    , col.names = c(
      "altitude (ft)", "contrast"
      # , "speed (mph)", "altitude (ft)"
      , "estimate", "SE", "t.ratio", "p.value"
    )
    , digits = 3
  ) %>%
  kableExtra::kable_styling() %>% 
  kableExtra::column_spec(c(2), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1), valign = "top") %>% 
  kableExtra::footnote(
    general_title = "interpretation:"
    , general = c(
      "`estimate`: difference in error; positive means the first setting has higher error."
      , "`SE`: measures the precision of the estimate (smaller values indicate higher precision)"
      , "`t.ratio`: The test statistic; indicates distance from the null (zero difference)"
      , "`p.value`: probability of observing this result by chance; values < 0.05 are considered significant"
    )
    , footnote_as_chunk = FALSE
    , escape = F
  )
pairwise tests for non-zero differences in basal area error (m2/ha)
altitude (ft) contrast estimate SE t.ratio p.value
200 speed (mph): 10 - 15 -0.101 0.087 -1.154 1.000
speed (mph): 10 - 20 -0.201 0.174 -1.154 1.000
speed (mph): 15 - 20 -0.101 0.087 -1.154 1.000
300 speed (mph): 10 - 15 -0.041 0.063 -0.645 1.000
speed (mph): 10 - 20 -0.082 0.127 -0.645 1.000
speed (mph): 15 - 20 -0.041 0.063 -0.645 1.000
400 speed (mph): 10 - 15 0.019 0.087 0.218 1.000
speed (mph): 10 - 20 0.038 0.174 0.218 1.000
speed (mph): 15 - 20 0.019 0.087 0.218 1.000
interpretation:
estimate: difference in error; positive means the first setting has higher error.
SE: measures the precision of the estimate (smaller values indicate higher precision)
t.ratio: The test statistic; indicates distance from the null (zero difference)
p.value: probability of observing this result by chance; values < 0.05 are considered significant

the significant linear trends from the MLR indicate that altitude influences basal area error, but the lack of significant pairwise contrasts suggests these settings are practically equivalent within the operationally feasible range of these flight settings. this result implies that the UAS-lidar and point cloud processing framework is not meaningfully influenced by changes in the flight settings over the range tested here. additionally, given the small sample size with no cross-site replication these findings should be viewed with caution as the lack of significance may reflect low statistical power rather than absence of effect.

Trees per hectare

influence of flight setting parameters on trees per hectare based on the absolute difference (total error) from the field-measured data

# agg_ground_truth_match_ans$diff_trees_per_ha
# lm
lm_temp <- lm(
  # f_score ~ 0 + flight_agl * flight_mph * flight_tf # full three and two-way interactions
  abs_diff_trees_per_ha ~ 0 + flight_tf + flight_agl*flight_mph # metric interactions only
  , data = agg_ground_truth_match_ans
)
# summary(lm_temp)
broom::tidy(lm_temp) %>%
  dplyr::mutate(
    term = term %>% 
      stringr::str_replace_all("flight_tf","terrain follow") %>% 
      stringr::str_replace_all("flight_agl","altitude") %>% 
      stringr::str_replace_all("flight_mph","mph") %>% 
      stringr::str_replace_all("FALSE"," = off") %>% 
      stringr::str_replace_all("TRUE"," = on")
  ) %>% 
  kableExtra::kbl(
    caption = "MLR: influence of flight setting parameters on trees per hectare error"
    , digits = 4
  ) %>%
  kableExtra::kable_styling()
MLR: influence of flight setting parameters on trees per hectare error
term estimate std.error statistic p.value
terrain follow = off -39.4964 19.5197 -2.0234 0.1362
terrain follow = on -42.4194 18.7567 -2.2616 0.1088
altitude 0.1541 0.0599 2.5744 0.0822
mph 3.3435 1.0991 3.0419 0.0558
altitude:mph -0.0064 0.0035 -1.8598 0.1599

speed alone and altitude alone are significant at the 0.1 level. this finding is weak evidence that the flight settings of altitude and speed influence tree density error, but note that the estimated coefficient on those variables is small relative to the TPH of the site.

let’s make pairwise contrasts (emmeans::contrast()) based on the estimated marginal means (emmeans::emmeans()) over the range of possible flight settings for altitude and speed to test if the practical settings (i.e. we cannot fly at 1000 ft or 0.05 mph or 99 mph) are different. if any of these contrasts are statistically significant (i.e. there is not a zero difference in the trees per hectare error), it will help us identify if there is a flight setting that we should use as an operational rule if our objective is to minimize trees per hectare error at the stand level

preds_temp <- emmeans::emmeans(
  lm_temp
  , ~ flight_agl * flight_mph
  , at = list(
      flight_agl = seq(
        floor( min(agg_ground_truth_match_ans$flight_agl) )
        , floor( max(agg_ground_truth_match_ans$flight_agl) )
        , by = 100
      )
      , flight_mph = seq(
        floor( min(agg_ground_truth_match_ans$flight_mph) )
        , floor( max(agg_ground_truth_match_ans$flight_mph) )
        , by = 5
      )
    )
)

contrast of altitude levels at different flight speeds (see the table footnote for interpretation) for trees per hectare per hectare error

preds_temp %>% 
  emmeans::contrast(method = "pairwise", simple = "each", combine = TRUE) %>% 
  dplyr::as_tibble() %>%
  dplyr::filter(flight_mph!=".",!is.na(flight_mph)) %>% 
  dplyr::select(flight_mph, contrast,estimate,SE,t.ratio,p.value) %>%
  dplyr::arrange(flight_mph) %>% 
  dplyr::mutate(
    contrast = contrast %>% 
      # clean agl
      stringr::str_replace("flight_agl", "xxx") %>% 
      stringr::str_remove_all("flight_agl") %>% 
      stringr::str_replace("xxx", "altitude (ft): ") %>% 
      # clean mph
      stringr::str_replace("flight_mph", "xxx") %>% 
      stringr::str_remove_all("flight_mph") %>% 
      stringr::str_replace("xxx", "speed (mph): ")
      # rest
    , dplyr::across(
      c(estimate,SE,t.ratio,p.value)
      , ~scales::comma(.x, accuracy = 0.001)
    )
  ) %>% 
  kableExtra::kbl(
    caption = "pairwise tests for non-zero differences in trees per hectare error"
    , col.names = c(
      "speed (mph)", "contrast"
      # , "speed (mph)", "altitude (ft)"
      , "estimate", "SE", "t.ratio", "p.value"
    )
    , digits = 3
  ) %>%
  kableExtra::kable_styling() %>% 
  kableExtra::column_spec(c(2), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1), valign = "top") %>% 
  kableExtra::footnote(
    general_title = "interpretation:"
    , general = c(
      "`estimate`: difference in error; positive means the first setting has higher error."
      , "`SE`: measures the precision of the estimate (smaller values indicate higher precision)"
      , "`t.ratio`: The test statistic; indicates distance from the null (zero difference)"
      , "`p.value`: probability of observing this result by chance; values < 0.05 are considered significant"
    )
    , footnote_as_chunk = FALSE
    , escape = F
  )
pairwise tests for non-zero differences in trees per hectare error
speed (mph) contrast estimate SE t.ratio p.value
10 altitude (ft): 200 - 300 -8.984 2.822 -3.183 0.899
altitude (ft): 200 - 400 -17.968 5.645 -3.183 0.899
altitude (ft): 300 - 400 -8.984 2.822 -3.183 0.899
15 altitude (ft): 200 - 300 -5.770 1.728 -3.338 0.800
altitude (ft): 200 - 400 -11.540 3.457 -3.338 0.800
altitude (ft): 300 - 400 -5.770 1.728 -3.338 0.800
20 altitude (ft): 200 - 300 -2.555 1.996 -1.280 1.000
altitude (ft): 200 - 400 -5.111 3.991 -1.280 1.000
altitude (ft): 300 - 400 -2.555 1.996 -1.280 1.000
interpretation:
estimate: difference in error; positive means the first setting has higher error.
SE: measures the precision of the estimate (smaller values indicate higher precision)
t.ratio: The test statistic; indicates distance from the null (zero difference)
p.value: probability of observing this result by chance; values < 0.05 are considered significant

contrast of speed levels at different flight speeds (see the table footnote for interpretation) for trees per hectare per hectare error

preds_temp %>% 
  emmeans::contrast(method = "pairwise", simple = "each", combine = TRUE) %>% 
  dplyr::as_tibble() %>%
  dplyr::filter(flight_agl!=".",!is.na(flight_agl)) %>% 
  dplyr::select(flight_agl, contrast,estimate,SE,t.ratio,p.value) %>%
  dplyr::arrange(flight_agl) %>% 
  dplyr::mutate(
    contrast = contrast %>% 
      # clean agl
      stringr::str_replace("flight_agl", "xxx") %>% 
      stringr::str_remove_all("flight_agl") %>% 
      stringr::str_replace("xxx", "altitude (ft): ") %>% 
      # clean mph
      stringr::str_replace("flight_mph", "xxx") %>% 
      stringr::str_remove_all("flight_mph") %>% 
      stringr::str_replace("xxx", "speed (mph): ")
      # rest
    , dplyr::across(
      c(estimate,SE,t.ratio,p.value)
      , ~scales::comma(.x, accuracy = 0.001)
    )
  ) %>% 
  kableExtra::kbl(
    caption = "pairwise tests for non-zero differences in trees per hectare error"
    , col.names = c(
      "altitude (ft)", "contrast"
      # , "speed (mph)", "altitude (ft)"
      , "estimate", "SE", "t.ratio", "p.value"
    )
    , digits = 3
  ) %>%
  kableExtra::kable_styling() %>% 
  kableExtra::column_spec(c(2), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1), valign = "top") %>% 
  kableExtra::footnote(
    general_title = "interpretation:"
    , general = c(
      "`estimate`: difference in error; positive means the first setting has higher error."
      , "`SE`: measures the precision of the estimate (smaller values indicate higher precision)"
      , "`t.ratio`: The test statistic; indicates distance from the null (zero difference)"
      , "`p.value`: probability of observing this result by chance; values < 0.05 are considered significant"
    )
    , footnote_as_chunk = FALSE
    , escape = F
  )
pairwise tests for non-zero differences in trees per hectare error
altitude (ft) contrast estimate SE t.ratio p.value
200 speed (mph): 10 - 15 -10.289 2.511 -4.097 0.473
speed (mph): 10 - 20 -20.578 5.022 -4.097 0.473
speed (mph): 15 - 20 -10.289 2.511 -4.097 0.473
300 speed (mph): 10 - 15 -7.074 1.822 -3.883 0.545
speed (mph): 10 - 20 -14.149 3.644 -3.883 0.545
speed (mph): 15 - 20 -7.074 1.822 -3.883 0.545
400 speed (mph): 10 - 15 -3.860 2.511 -1.537 1.000
speed (mph): 10 - 20 -7.720 5.022 -1.537 1.000
speed (mph): 15 - 20 -3.860 2.511 -1.537 1.000
interpretation:
estimate: difference in error; positive means the first setting has higher error.
SE: measures the precision of the estimate (smaller values indicate higher precision)
t.ratio: The test statistic; indicates distance from the null (zero difference)
p.value: probability of observing this result by chance; values < 0.05 are considered significant

the significant linear trends from the MLR indicate that altitude and speed influence trees per hectare error, but the lack of significant pairwise contrasts suggests these settings are practically equivalent within the operationally feasible range of these flight settings. this result implies that the UAS-lidar and point cloud processing framework is not meaningfully influenced by changes in the flight settings over the range tested here. additionally, given the small sample size with no cross-site replication these findings should be viewed with caution as the lack of significance may reflect low statistical power rather than absence of effect.

Pr(tree detected): Frequentist

Let’s shift to the tree-level detection data to statistically analyze the influence of the flight settings (and tree characteristics) on the probability of individual tree detection. We’ll use logistic multiple linear regression

here is our analysis data that includes only the reference trees (TP and FN) and whether or not they were detected during a flight

# agg_ground_truth_match_ans %>% dplyr::glimpse()
# ground_truth_prediction_match_ans[[1]] %>% dplyr::glimpse()
tree_detect_df <- ground_truth_prediction_match_ans %>%
  purrr::list_rbind(names_to = "folder") %>%
  sf::st_drop_geometry() %>% 
  dplyr::filter(match_grp %in% c("true positive","omission")) %>% 
  dplyr::inner_join(
    tracking_df %>% 
      dplyr::select(-c(path))
    , by = "folder"
  ) %>% 
  dplyr::mutate(
    is_detected = match_grp=="true positive"
    , is_detected = as.numeric(is_detected)
  )
# dplyr::glimpse(tree_detect_df)
tree_detect_df %>% 
  dplyr::group_by(dataset_factor) %>% 
  dplyr::summarise(
    n = dplyr::n()
    , n_detected = sum(is_detected)
    , prop_detected = mean(is_detected)
  ) %>% 
  dplyr::mutate(
    n = scales::comma(n,accuracy=1)
    , n_detected = scales::comma(n_detected,accuracy=1)
    , prop_detected = scales::percent(prop_detected,accuracy=0.1)
  ) %>% 
  kableExtra::kbl(
    caption = "summary of single tree dataset for Pr(detection) analysis"
    , col.names = c("flight", "reference trees", "detected trees","% detected<br>(recall)")
    , escape = F
  ) %>%
  kableExtra::kable_styling()
summary of single tree dataset for Pr(detection) analysis
flight reference trees detected trees % detected
(recall)
AGL: 200 | MPH: 10 | TF: on 5,217 3,244 62.2%
AGL: 200 | MPH: 20 | TF: off 5,217 3,349 64.2%
AGL: 200 | MPH: 20 | TF: on 5,217 3,299 63.2%
AGL: 300 | MPH: 10 | TF: on 5,217 3,289 63.0%
AGL: 300 | MPH: 20 | TF: off 5,217 3,346 64.1%
AGL: 400 | MPH: 10 | TF: on 5,217 3,303 63.3%
AGL: 400 | MPH: 20 | TF: off 5,217 3,377 64.7%
AGL: 400 | MPH: 20 | TF: on 5,217 3,326 63.8%

note, this aggregation is exactly how recall (detection rate) is calculated

fit the logistic model on this data without interacting the flight parameters of speed and altitude with terrain following since we don’t have the data to estimate the influence of terrain following by speed or altitude

# logistic
mod_glm_temp <- stats::glm(
  is_detected ~ 
    # main effects
    0 + flight_tf + flight_agl + flight_mph + ref_tree_height_m +     
    # two-way interactions
    flight_agl:flight_mph +
    ref_tree_height_m:flight_agl +
    ref_tree_height_m:flight_mph
    # ref_tree_height_m:flight_tf
  , data = tree_detect_df
  , family = stats::binomial(link = "logit")
)
# summary(mod_glm_temp)

logistic model of Pr(tree detected) coefficient table

broom::tidy(mod_glm_temp) %>%
  dplyr::select(term,estimate,std.error,p.value) %>% 
  dplyr::mutate(
    odds_ratio = exp(estimate)
    , term = term %>% 
      stringr::str_replace_all("flight_tf","terrain follow") %>% 
      stringr::str_replace_all("flight_agl","altitude") %>% 
      stringr::str_replace_all("flight_mph","mph") %>% 
      stringr::str_replace_all("ref_tree_height_m","tree ht") %>% 
      stringr::str_replace_all("FALSE"," = off") %>% 
      stringr::str_replace_all("TRUE"," = on")
  ) %>%
  kableExtra::kbl(
    caption = "GLM coefficients and odds ratios"
    , digits = 3
  ) %>%
  kableExtra::kable_styling()
GLM coefficients and odds ratios
term estimate std.error p.value odds_ratio
terrain follow = off 0.006 0.156 0.971 1.006
terrain follow = on -0.032 0.151 0.831 0.968
altitude 0.000 0.000 0.481 1.000
mph 0.008 0.008 0.364 1.008
tree ht 0.044 0.007 0.000 1.045
altitude:mph 0.000 0.000 0.603 1.000
altitude:tree ht 0.000 0.000 0.730 1.000
mph:tree ht 0.000 0.000 0.743 1.000

the tree height is the only coefficient estimated to influence the probability of tree detection in a meaningful (and statistically significant with a p-value of < 0.001) way. the odds ratio on reference tree height of 1.045 means that for every one unit increase in tree height, the odds of UAS-lidar detection increase by 4.5%

let’s look at the marginal effects at different levels of the metric variables in the model (tree height, flight altitude, flight speed) using the emmeans package to plot the predicted detection probabilities

we’ll get the predictions over a range of each of the independent variables using emmeans::emmeans() with the range of the observed data for each metric variable (so we aren’t making out of sample predictions). then, we’ll look at the predictions at select tree heights

# emmeans
preds_temp <-
  emmeans::emmeans(
    mod_glm_temp
    , ~ ref_tree_height_m + flight_agl + flight_mph
    , at = list(
      ref_tree_height_m = seq(
        floor( min(tree_detect_df$ref_tree_height_m) )
        , floor( max(tree_detect_df$ref_tree_height_m)*1.05 )
        , by = 1
      )
      , flight_agl = seq(
        floor( min(tree_detect_df$flight_agl) )
        , floor( max(tree_detect_df$flight_agl) )
        , by = 100
      )
      , flight_mph = seq(
        floor( min(tree_detect_df$flight_mph) )
        , floor( max(tree_detect_df$flight_mph) )
        , by = 5
      )
    )
    , type = "response" # probabilities
  ) %>% 
 dplyr::as_tibble()
# huh?
# dplyr::glimpse(preds_temp)
# table
# fivenum(tree_detect_df$ref_tree_height_m)[c(1,3,5)]
preds_temp %>%
  dplyr::filter(
    ref_tree_height_m %in% 
      (quantile(tree_detect_df$ref_tree_height_m, probs = c(0.1,0.5,0.9)) %>% 
        round() %>% 
        unique())
  ) %>%
  dplyr::mutate(dplyr::across(
    c(prob, asymp.LCL, asymp.UCL)
    , ~scales::percent(.x,accuracy=0.1)
  )) %>% 
  dplyr::select(ref_tree_height_m, flight_agl, flight_mph, prob, asymp.LCL, asymp.UCL) %>%
  dplyr::arrange(ref_tree_height_m, flight_agl, flight_mph) %>%
  # dplyr::select(ref_tree_height_m, flight_mph, flight_agl, prob, asymp.LCL, asymp.UCL) %>%
  # dplyr::arrange(ref_tree_height_m, flight_mph, flight_agl) %>%
  kableExtra::kbl(
    caption = "tree detection probability at 10%, 50%, and 90% tree heights"
    , col.names = c(
      "tree ht (m)"
      , "altitude (ft)", "speed (mph)"
      # , "speed (mph)", "altitude (ft)"
      , "Pr(detection)", "lower bound", "upper bound"
    )
    , escape = F
    # , digits = 2
  ) %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::column_spec(c(3), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1:2), valign = "top")
tree detection probability at 10%, 50%, and 90% tree heights
tree ht (m) altitude (ft) speed (mph) Pr(detection) lower bound upper bound
2 200 10 54.7% 53.0% 56.5%
15 55.3% 54.2% 56.5%
20 55.9% 54.6% 57.2%
300 10 55.2% 53.9% 56.6%
15 55.7% 54.9% 56.5%
20 56.1% 55.2% 57.0%
400 10 55.8% 54.0% 57.5%
15 56.0% 54.9% 57.2%
20 56.3% 55.0% 57.6%
6 200 10 59.0% 57.6% 60.5%
15 59.6% 58.7% 60.4%
20 60.1% 59.1% 61.1%
300 10 59.6% 58.5% 60.6%
15 60.0% 59.4% 60.6%
20 60.3% 59.7% 61.0%
400 10 60.1% 58.7% 61.6%
15 60.4% 59.5% 61.2%
20 60.6% 59.6% 61.6%
21 200 10 73.5% 71.8% 75.2%
15 73.8% 72.6% 75.0%
20 74.1% 72.8% 75.4%
300 10 74.2% 72.8% 75.5%
15 74.3% 73.5% 75.1%
20 74.5% 73.5% 75.4%
400 10 74.8% 73.0% 76.4%
15 74.8% 73.6% 76.0%
20 74.8% 73.5% 76.1%

let’s check out the conditional influence of flight speed at different altitude levels

ggplot2::ggplot(
  preds_temp
  , mapping = ggplot2::aes(
    x = ref_tree_height_m
    , y = prob
    , color = as.factor(flight_mph)
  )
) +
  ggplot2::geom_ribbon(
    mapping = ggplot2::aes(
      ymin = asymp.LCL
      , ymax = asymp.UCL
      , fill = as.factor(flight_mph)
    )
    , alpha = 0.05, color = NA
  ) +
  ggplot2::geom_line(size = 1.2) +
  ggplot2::facet_grid(
    cols = dplyr::vars(flight_agl)
    , labeller = ggplot2::labeller( flight_agl = ~paste("Altitude:", .x, " (ft)") )
  ) +
  ggplot2::scale_color_viridis_d(option = "rocket", begin = 0.9, end = 0.5) +
  ggplot2::scale_fill_viridis_d(option = "rocket", begin = 0.9, end = 0.5) +
  ggplot2::scale_y_continuous(
    limits = c(0,1)
    , labels=scales::percent
    , breaks = scales::breaks_extended(n=6)
    # , expand = ggplot2::expansion(mult = c(0,0.03))
  ) +
  ggplot2::scale_x_continuous(
    labels=scales::comma
    , breaks = scales::breaks_extended(n=8)
  ) +
  ggplot2::labs(
    color = "Speed (mph)", fill = "Speed (mph)"
    , x = "tree height (m)", y = "detection probability"
    , title = "detection probability curves"
    , subtitle = "conditional effect of flight speed"
    , caption = "higher lines = better detection. steeper lines = higher sensitivity to tree height"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text.x = ggplot2::element_text(size = 10, color = "black", face = "bold")
    , strip.text.y = ggplot2::element_text(size = 10, color = "black", face = "bold")
    , plot.title = ggplot2::element_text(size = 10)
    , plot.subtitle = ggplot2::element_text(size = 10)
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(
      override.aes = list(shape = 15, lwd = 8, fill = NA)
    )
    , fill = "none"
  )

The wide confidence bounds with all lines contained within the overlapping bounds indicate that at all flight altitudes, the flight speed does not impact tree detection. There is some evidence that the higher the flight altitude, the less impactful speed is on the resulting tree detection (but not statistically significant).

now the conditional influence of flight altitude at different speeds

ggplot2::ggplot(
  preds_temp
  , mapping = ggplot2::aes(
    x = ref_tree_height_m
    , y = prob
    , color = as.factor(flight_agl)
  )
) +
  ggplot2::geom_ribbon(
    mapping = ggplot2::aes(
      ymin = asymp.LCL
      , ymax = asymp.UCL
      , fill = as.factor(flight_agl)
    )
    , alpha = 0.05, color = NA
  ) +
  ggplot2::geom_line(size = 1.2) +
  ggplot2::facet_grid(
    cols = dplyr::vars(flight_mph)
    , labeller = ggplot2::labeller( flight_mph = ~paste("Speed:", .x, " (mph)") )
  ) +
  ggplot2::scale_color_viridis_d(option = "mako", begin = 0.5, end = 0.1) +
  ggplot2::scale_fill_viridis_d(option = "mako", begin = 0.5, end = 0.1) +
  ggplot2::scale_y_continuous(
    limits = c(0,1)
    , labels=scales::percent
    , breaks = scales::breaks_extended(n=6)
    # , expand = ggplot2::expansion(mult = c(0,0.03))
  ) +
  ggplot2::scale_x_continuous(
    labels=scales::comma
    , breaks = scales::breaks_extended(n=8)
  ) +
  ggplot2::labs(
    color = "altitude (ft)", fill = "altitude (ft)"
    , x = "tree height (m)", y = "detection probability"
    , title = "detection probability curves"
    , subtitle = "conditional effect of flight altitude"
    , caption = "higher lines = better detection. steeper lines = higher sensitivity to tree height"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text.x = ggplot2::element_text(size = 10, color = "black", face = "bold")
    , strip.text.y = ggplot2::element_text(size = 10, color = "black", face = "bold")
    , plot.title = ggplot2::element_text(size = 10)
    , plot.subtitle = ggplot2::element_text(size = 10)
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, lwd = 8, fill = NA))
    , fill = "none"
  )

The wide confidence bounds with all lines contained within the overlapping bounds indicate that at all flight speeds, the flight altitude does not impact tree detection. There is some evidence that the higher the flight speed, the less impactful altitude is on the resulting tree detection (but not statistically significant).

let’s do some contrasts of the probability of detection while holding the tree height constant at the median (6.5 m) to test an altitude level against every other altitude and each speed against every other speed.

em_temp <- 
  emmeans::emmeans(
    mod_glm_temp
    , ~ flight_agl + flight_mph
    , at = list(
      ref_tree_height_m = median(tree_detect_df$ref_tree_height_m)
      , flight_agl = seq(
        floor( min(tree_detect_df$flight_agl) )
        , floor( max(tree_detect_df$flight_agl) )
        , by = 100
      )
      , flight_mph = seq(
        floor( min(tree_detect_df$flight_mph) )
        , floor( max(tree_detect_df$flight_mph) )
        , by = 5
      )
    )
    , type = "response" # probabilities
  )

contrast of altitude levels at different flight speeds (see the table footnote for interpretation) for the median (6.5 m) tree

# em_temp
# pairwise test for non-zero differences
# simple = "each" tests altitudes within each speed and vice versa
em_temp %>% 
  emmeans::contrast(method = "pairwise", simple = "each", combine = TRUE) %>% 
  dplyr::as_tibble() %>%
  dplyr::filter(flight_mph!=".",!is.na(flight_mph)) %>% 
  dplyr::select(flight_mph, contrast,odds.ratio,SE,z.ratio,p.value) %>% 
  dplyr::arrange(flight_mph) %>% 
  dplyr::mutate(
    contrast = contrast %>% 
      # clean agl
      stringr::str_replace("flight_agl", "xxx") %>% 
      stringr::str_remove_all("flight_agl") %>% 
      stringr::str_replace("xxx", "altitude (ft): ") %>% 
      # clean mph
      stringr::str_replace("flight_mph", "xxx") %>% 
      stringr::str_remove_all("flight_mph") %>% 
      stringr::str_replace("xxx", "speed (mph): ")
      # rest
    , dplyr::across(
      c(odds.ratio,SE,z.ratio,p.value)
      , ~scales::comma(.x, accuracy = 0.001)
    )
  ) %>% 
  kableExtra::kbl(
    caption = "pairwise tests for non-zero differences (at median tree height)"
    , col.names = c(
      "speed (mph)", "contrast"
      # , "speed (mph)", "altitude (ft)"
      , "odds.ratio", "SE", "z.ratio", "p.value"  
    )
    , digits = 3
  ) %>%
  kableExtra::kable_styling() %>% 
  kableExtra::column_spec(c(2), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1), valign = "top") %>% 
  kableExtra::footnote(
    general_title = "interpretation:"
    , general = c(
      "`odds.ratio`: <br> values near 1 mean levels perform similarly <br> >1 means the first level is better <br> <1 means the second is better at tree detection"
      , "`SE`: measures the precision of the estimate (smaller values indicate higher precision)"
      # , "z.ratio: The test statistic; indicates how many standard deviations the result is from the null (0)."
      , "`z.ratio`: >0 means the first level detection is higher, <0 means the second level detection is higher"
      , "`p.value`: probability of observing this result by chance; values < 0.05 are considered significant"
    )
    , footnote_as_chunk = FALSE
    , escape = F
  )
pairwise tests for non-zero differences (at median tree height)
speed (mph) contrast odds.ratio SE z.ratio p.value
10 altitude (ft): 200 / 300 0.977 0.021 -1.107 1.000
altitude (ft): 200 / 400 0.955 0.040 -1.107 1.000
altitude (ft): 300 / 400 0.977 0.021 -1.107 1.000
15 altitude (ft): 200 / 300 0.983 0.013 -1.250 1.000
altitude (ft): 200 / 400 0.967 0.026 -1.250 1.000
altitude (ft): 300 / 400 0.983 0.013 -1.250 1.000
20 altitude (ft): 200 / 300 0.990 0.015 -0.666 1.000
altitude (ft): 200 / 400 0.980 0.030 -0.666 1.000
altitude (ft): 300 / 400 0.990 0.015 -0.666 1.000
interpretation:
odds.ratio:
values near 1 mean levels perform similarly
>1 means the first level is better
<1 means the second is better at tree detection
SE: measures the precision of the estimate (smaller values indicate higher precision)
z.ratio: >0 means the first level detection is higher, <0 means the second level detection is higher
p.value: probability of observing this result by chance; values < 0.05 are considered significant

contrast of speed levels at different flight altitudes (see the table footnote for interpretation) for the median (6.5 m) tree

# em_temp
# pairwise test for non-zero differences
# simple = "each" tests altitudes within each speed and vice versa
em_temp %>% 
  emmeans::contrast(method = "pairwise", simple = "each", combine = TRUE) %>% 
  dplyr::as_tibble() %>%
  dplyr::filter(flight_agl!=".",!is.na(flight_agl)) %>% 
  dplyr::select(flight_agl, contrast,odds.ratio,SE,z.ratio,p.value) %>% 
  dplyr::arrange(flight_agl) %>% 
  dplyr::mutate(
    contrast = contrast %>% 
      # clean agl
      stringr::str_replace("flight_agl", "xxx") %>% 
      stringr::str_remove_all("flight_agl") %>% 
      stringr::str_replace("xxx", "altitude (ft): ") %>% 
      # clean mph
      stringr::str_replace("flight_mph", "xxx") %>% 
      stringr::str_remove_all("flight_mph") %>% 
      stringr::str_replace("xxx", "speed (mph): ")
      # rest
    , dplyr::across(
      c(odds.ratio,SE,z.ratio,p.value)
      , ~scales::comma(.x, accuracy = 0.001)
    )
  ) %>% 
  kableExtra::kbl(
    caption = "pairwise tests for non-zero differences (at median tree height)"
    , col.names = c(
      "altitude (ft)", "contrast"
      # , "speed (mph)", "altitude (ft)"
      , "odds.ratio", "SE", "z.ratio", "p.value"  
    )
    , digits = 3
  ) %>%
  kableExtra::kable_styling() %>% 
  kableExtra::column_spec(c(2), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1), valign = "top") %>% 
  kableExtra::footnote(
    general_title = "interpretation:"
    , general = c(
      "`odds.ratio`: <br> values near 1 mean levels perform similarly <br> >1 means the first level is better <br> <1 means the second is better at tree detection"
      , "`SE`: measures the precision of the estimate (smaller values indicate higher precision)"
      # , "z.ratio: The test statistic; indicates how many standard deviations the result is from the null (0)."
      , "`z.ratio`: >0 means the first level detection is higher, <0 means the second level detection is higher"
      , "`p.value`: probability of observing this result by chance; values < 0.05 are considered significant"
    )
    , footnote_as_chunk = FALSE
    , escape = F
  )
pairwise tests for non-zero differences (at median tree height)
altitude (ft) contrast odds.ratio SE z.ratio p.value
200 speed (mph): 10 / 15 0.978 0.018 -1.183 1.000
speed (mph): 10 / 20 0.957 0.036 -1.183 1.000
speed (mph): 15 / 20 0.978 0.018 -1.183 1.000
300 speed (mph): 10 / 15 0.985 0.014 -1.123 1.000
speed (mph): 10 / 20 0.969 0.027 -1.123 1.000
speed (mph): 15 / 20 0.985 0.014 -1.123 1.000
400 speed (mph): 10 / 15 0.991 0.019 -0.481 1.000
speed (mph): 10 / 20 0.982 0.037 -0.481 1.000
speed (mph): 15 / 20 0.991 0.019 -0.481 1.000
interpretation:
odds.ratio:
values near 1 mean levels perform similarly
>1 means the first level is better
<1 means the second is better at tree detection
SE: measures the precision of the estimate (smaller values indicate higher precision)
z.ratio: >0 means the first level detection is higher, <0 means the second level detection is higher
p.value: probability of observing this result by chance; values < 0.05 are considered significant

plot the probability of detection for the median (6.5 m) tree for all settings with 95% confidence intervals. if the error bars do not overlap, the difference is likely non-zero and vice-versa

the influence of speed at different altitudes

em_temp %>% 
  as.data.frame() %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(
      x = 0
        # as.factor(flight_agl)
      , y = prob
      , color = as.factor(flight_mph)
    )
  ) +
  ggplot2::geom_point(
    size = 4
    , position = ggplot2::position_dodge(width = 0.5)
  ) +
  ggplot2::geom_errorbar(
    ggplot2::aes(ymin = asymp.LCL, ymax = asymp.UCL)
    , width = 0.2
    , position = ggplot2::position_dodge(width = 0.5)
  ) +
  ggplot2::facet_grid(
    cols = dplyr::vars(flight_agl)
    , labeller = ggplot2::labeller( flight_agl = ~paste("Altitude:", .x, " (ft)") )
  ) +
  ggplot2::scale_color_viridis_d(option = "rocket", begin = 0.9, end = 0.5) +
  ggplot2::scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  ggplot2::scale_x_continuous(NULL, breaks = NULL) +
  ggplot2::labs(
    title = "comparison of detection probabilities at median tree height"
    , y = "detection probability"
    , x = "altitude (ft)"
    , color = "speed (mph)"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text.x = ggplot2::element_text(size = 10, color = "black", face = "bold")
    , strip.text.y = ggplot2::element_text(size = 10, color = "black", face = "bold")
    , plot.title = ggplot2::element_text(size = 10)
    , plot.subtitle = ggplot2::element_text(size = 10)
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, lwd = 8, fill = NA))
    , fill = "none"
  )

the influence of altitude at different speeds

em_temp %>% 
  as.data.frame() %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(
      x = 0
      , y = prob
      , color = as.factor(flight_agl)
    )
  ) +
  ggplot2::geom_point(
    size = 4
    , position = ggplot2::position_dodge(width = 0.5)
  ) +
  ggplot2::geom_errorbar(
    ggplot2::aes(ymin = asymp.LCL, ymax = asymp.UCL)
    , width = 0.2
    , position = ggplot2::position_dodge(width = 0.5)
  ) +
  ggplot2::facet_grid(
    cols = dplyr::vars(flight_mph)
    , labeller = ggplot2::labeller( flight_mph = ~paste("Speed:", .x, " (mph)") )
  ) +
  ggplot2::scale_color_viridis_d(option = "mako", begin = 0.5, end = 0.1) +
  ggplot2::scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  ggplot2::scale_x_continuous(NULL, breaks = NULL) +
  ggplot2::labs(
    title = "comparison of detection probabilities at median tree height"
    , y = "detection probability"
    , x = "speed (mph)"
    , color = "altitude (ft)"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text.x = ggplot2::element_text(size = 10, color = "black", face = "bold")
    , strip.text.y = ggplot2::element_text(size = 10, color = "black", face = "bold")
    , plot.title = ggplot2::element_text(size = 10)
    , plot.subtitle = ggplot2::element_text(size = 10)
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, lwd = 8, fill = NA))
    , fill = "none"
  )

yeah, no difference in the detection accuracy across the flight settings tested for this forest stand

Volume Estimates

We’ll try to predict volume using National Volume Estimator Library (NVEL) which contains the standard equations used by the USFS for estimating merchantable and total volume.

If the RForInvt package was not broken at the time of analysis, I would have used it. Sadly, it was broken and I didn’t want to try to fix it (but see here?) so….

try the rFIA package, get the tree list which already has volume applied to the trees, make a volume lookup table by species, DBH (1 inch increments), and height (1 foot increments)

# install.packages("rFIA")
library(rFIA)
library(USAboundaries)
# get data
# ?rFIA::getFIA
# what state are we in?
st_temp <- USAboundaries::us_states() %>% 
  sf::st_intersection(stand_boundary %>% sf::st_transform( sf::st_crs(USAboundaries::us_states()) )) %>% 
  dplyr::group_by(state_abbr) %>% 
  dplyr::mutate(
    area_xxx = sf::st_union(.) %>% 
      sf::st_area() %>% 
      as.numeric()
  ) %>% 
  dplyr::ungroup() %>% 
  sf::st_drop_geometry() %>% 
  dplyr::slice_max(order_by = area_xxx, n = 1, with_ties = F) %>% 
  dplyr::pull(state_abbr)
# unfortunately, rFIA::getFIA() does not recognize if we already downloaded the data...
# :\
fnm_temp <- file.path("../data", paste0(st_temp,"_TREE.csv"))
if(
  !file.exists(fnm_temp)
){
  fiadb_temp <- rFIA::getFIA(
    states = st_temp
    , dir = "../data", common = TRUE, tables = c("TREE")
  )  
}else{
  fiadb_temp <- list(
    TREE = readr::read_csv(file = fnm_temp, progress = F, show_col_types = F)
  )
}
# filter this dirty data
fiadb_temp$TREE <- 
  fiadb_temp$TREE %>% 
  dplyr::rename_with(.cols = dplyr::everything(), tolower) %>% 
  dplyr::select(spcd,dia,ht,volcfnet,volcfgrs) %>% 
  tidyr::drop_na()
# huh?
fiadb_temp$TREE %>% dplyr::glimpse()
## Rows: 213,643
## Columns: 5
## $ spcd     <dbl> 746, 108, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 1…
## $ dia      <dbl> 1.4, 8.1, 13.2, 7.4, 9.1, 1.9, 13.1, 19.1, 35.4, 19.3, 14.6, …
## $ ht       <dbl> 9, 24, 39, 19, 23, 8, 42, 51, 56, 57, 32, 27, 31, 30, 28, 23,…
## $ volcfnet <dbl> 0.000000, 3.340663, 13.829286, 1.792516, 3.634316, 0.000000, …
## $ volcfgrs <dbl> 0.000000, 3.360063, 14.231275, 1.800650, 3.667301, 0.000000, …

no one knows what these numeric species codes are….get reference species so a human can understand what is happening

# holy smokes, no one knows what these numeric species codes are
  # ....get reference species so a human can understand these
url_temp <- "https://apps.fs.usda.gov/fia/datamart/CSV/FIADB_REFERENCE.zip"
zip_temp <- "FIADB_REFERENCE.zip"
dir_temp <- file.path("../data", str_remove(basename(zip_temp), "\\.[^.]+$"))
# dl if needed
if (!dir.exists(dir_temp)){
  download.file(url = url_temp, destfile = zip_temp, mode = "wb")
  unzip(zipfile = zip_temp, exdir = dir_temp)
  file.remove(zip_temp)
}
# read
ref_species_temp <- readr::read_csv(file = file.path(dir_temp, "REF_SPECIES.csv"), progress = F, show_col_types = F) %>% 
  dplyr::rename_with(.cols = dplyr::everything(), tolower) %>% 
  dplyr::select(spcd, species_symbol, common_name, genus, species)
# huh
ref_species_temp %>% dplyr::glimpse()
## Rows: 2,697
## Columns: 5
## $ spcd           <dbl> 7783, 7792, 7793, 7794, 7795, 7798, 7799, 7800, 7801, 7…
## $ species_symbol <chr> "MEPO5", "MERU2", "METRO", "METR5", "MEWA", "MEAM4", "M…
## $ common_name    <chr> "ohia lehua", "lehua papa", "lehua", "lehua ahihi", "Ka…
## $ genus          <chr> "Metrosideros", "Metrosideros", "Metrosideros", "Metros…
## $ species        <chr> "polymorpha", "rugosa", "spp.", "tremuloides", "waialea…

join and make the volume lookup table by species, DBH (1 inch increments), and height (1 foot increments). for both of the height and diameter increments we’ll use the floor to take a conservative approach for volume estimation. for example, DBH values of 5.0 to 5.999999 will be grouped in the 5.0 inch class. we’ll apply this same logic to our field and predicted tree list for joining.

vol_lookup <-
  fiadb_temp$TREE %>% 
  dplyr::inner_join(
    ref_species_temp
    , by = "spcd"
    , relationship = "many-to-one"
  ) %>% 
  # aggregate
  dplyr::mutate(dplyr::across(
    .cols = c(ht,dia)
    , ~ floor(.x) %>% as.integer()
  )) %>% 
  dplyr::group_by(
    dplyr::across(dplyr::all_of(
      c(
        names(ref_species_temp)
        ,"ht","dia"
      )
    ))
  ) %>% 
  dplyr::summarise(dplyr::across(
   .cols = tidyselect::starts_with("volcf") 
   , median
  )) %>% 
  dplyr::ungroup() %>% 
  dplyr::rename_with(
    .cols = tidyselect::starts_with("volcf")
    , ~ stringr::str_remove(.x,"^volcf") %>% stringr::str_c("_volume_ft3")
  ) %>% 
  dplyr::mutate(
    dplyr::across(
      tidyselect::ends_with("volume_ft3")
       # 1 cubic foot = 0.0283168 cubic meters
      , ~ .x*0.0283168
      , .names = "{.col}_m3"
    )
    , species_symbol = factor(species_symbol)
  ) %>% 
  dplyr::rename_with(
    .cols = tidyselect::ends_with("volume_ft3_m3")
    , ~ stringr::str_remove(.x,"_ft3")
  )
vol_lookup %>% dplyr::glimpse()
## Rows: 11,559
## Columns: 11
## $ spcd           <dbl> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,…
## $ species_symbol <fct> ABCO, ABCO, ABCO, ABCO, ABCO, ABCO, ABCO, ABCO, ABCO, A…
## $ common_name    <chr> "white fir", "white fir", "white fir", "white fir", "wh…
## $ genus          <chr> "Abies", "Abies", "Abies", "Abies", "Abies", "Abies", "…
## $ species        <chr> "concolor", "concolor", "concolor", "concolor", "concol…
## $ ht             <int> 5, 6, 7, 8, 8, 8, 9, 10, 10, 11, 11, 11, 12, 12, 13, 13…
## $ dia            <int> 3, 1, 1, 1, 2, 3, 1, 1, 2, 1, 2, 5, 1, 5, 1, 2, 5, 8, 1…
## $ net_volume_ft3 <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, …
## $ grs_volume_ft3 <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, …
## $ net_volume_m3  <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000…
## $ grs_volume_m3  <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000…

let’s see what this looks like for the net volume where…

The net volume of wood in the central stem of timber species (trees where diameter is measured at breast height >= 5.0 inches d.b.h., from a 1-foot stump to a minimum 4-inch top diameter, or to where the central stem breaks into limbs all of which are <4.0 inches in diameter…Does not include rotten, missing, and form cull (volume loss due to rotten, missing, and form cull defect has been deducted)…gross volume is identical to the net volume definition except that gross includes volume from portions of the stem that are rotten, missing, and considered form cull. (https://doserlab.com/files/rfia/articles/biomassvolume)

vol_lookup %>% 
  dplyr::filter(
    # species_symbol!="ABCO"
    # , species_symbol=="POTR5"
    dia<=60
    , ht<=100
    , species_symbol %in% unique(toupper(validation_trees$spp))
  ) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(
      y = dia
      , x = ht
      , fill = net_volume_ft3
    )
  ) +
  ggplot2::geom_tile(
    color = NA
    , width = 1 # 5
    , height = 1
  ) +
  # ggplot2::geom_text(color = "white", size = 3) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(species_symbol)
    # , ncol = 2
    # , scales = "free"
    , axes = "all"
  ) + 
  ggplot2::scale_x_continuous(expand = c(0, 0), breaks = scales::breaks_extended(n=10)) +
  ggplot2::scale_y_continuous(expand = c(0, 0), breaks = scales::breaks_extended(n=10)) +
  # ggplot2::scale_fill_viridis_c(option = "mako", direction=-1, begin = 0.2, end = 0.8) +
  ggplot2::scale_fill_distiller(palette = "Greens", direction = 1, labels = scales::comma) +
  ggplot2::labs(
    x = "height (ft)"
    , y = "DBH (in)"
    , fill = "net volume (ft3)"
  ) +
  ggplot2::theme_light() + 
  ggplot2::theme(
    legend.position = "top"
    # , axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5, hjust = 1)
    , panel.background = ggplot2::element_blank()
    , panel.grid = ggplot2::element_blank()
    , plot.subtitle = ggplot2::element_text(hjust = 0.5)
    , strip.text = ggplot2::element_text(color = "black", face = "bold")
  )

sure? we’ll use a linear model to fill in volume where we don’t have a DBH, height, species observation in the FIA data

net_vol_mod <- lm(
  net_volume_ft3 ~ 0 + species_symbol*dia*ht
  , data = vol_lookup
)
grs_vol_mod <- lm(
  grs_volume_ft3 ~ 0 + species_symbol*dia*ht
  , data = vol_lookup
)
# summary(net_vol_mod)
# get predictions for full dbh,ht range
vol_lookup_pred <- tidyr::crossing(
  dplyr::tibble(
    dia = seq(
      from = 1
      , to = max(
        max(vol_lookup$dia)
        , max(as.integer(ceiling(validation_trees$field_dbh_in)))
        , max(as.integer(ceiling(predicted_trees[[1]]$dbh_cm*0.394)))
      )
      , by = 1
    )
  )
  , dplyr::tibble(
    ht = seq(
      from = 1
      , to = max(
        max(vol_lookup$ht)
        , max(as.integer(ceiling(validation_trees$field_tree_height_ft)))
        , max(as.integer(ceiling(predicted_trees[[1]]$tree_height_m*3.281)))
      )
      , by = 1
    )
  )
  , vol_lookup %>% 
    # dplyr::filter(species_symbol %in% unique(toupper(validation_trees$spp))) %>% 
    dplyr::distinct(species_symbol)
)
# predict
vol_lookup_pred$net_volume_ft3 <- predict(net_vol_mod, newdata = vol_lookup_pred)
vol_lookup_pred$grs_volume_ft3 <- predict(grs_vol_mod, newdata = vol_lookup_pred)
# updt
vol_lookup_pred <- vol_lookup_pred %>% 
  dplyr::mutate(
    net_volume_ft3 = ifelse(net_volume_ft3<0,0,net_volume_ft3)
    , grs_volume_ft3 = ifelse(grs_volume_ft3<0,0,grs_volume_ft3)
  ) %>% 
  dplyr::mutate(
    dplyr::across(
      tidyselect::ends_with("volume_ft3")
       # 1 cubic foot = 0.0283168 cubic meters
      , ~ .x*0.0283168
      , .names = "{.col}_m3"
    )
    , species_symbol = factor(species_symbol)
  ) %>% 
  dplyr::rename_with(
    .cols = tidyselect::ends_with("volume_ft3_m3")
    , ~ stringr::str_remove(.x,"_ft3")
  ) %>% 
  dplyr::rename_with(
    .cols = tidyselect::contains("_volume_")
    , ~ paste0("pred_",.x,recycle0 = T)
  )
# vol_lookup_pred %>% dplyr::glimpse()

here’s a plot of the imputed volume for each species which we’ll use when we don’t have a DBH, height, species observation in the FIA data

# plot preds
vol_lookup_pred %>% 
  dplyr::filter(
    # species_symbol==vol_lookup_pred$species_symbol[1]
    # , species_symbol=="POTR5"
    dia<=60
    , ht<=100
    , species_symbol %in% unique(toupper(validation_trees$spp))
  ) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(
      y = dia
      , x = ht
      , fill = pred_net_volume_ft3
    )
  ) +
  ggplot2::geom_tile(
    color = NA
    , width = 1 # 5
    , height = 1
  ) +
  # ggplot2::geom_text(color = "white", size = 3) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(species_symbol)
    # , ncol = 2
    # , scales = "free"
    , axes = "all"
  ) + 
  ggplot2::scale_x_continuous(expand = c(0, 0), breaks = scales::breaks_extended(n=10)) +
  ggplot2::scale_y_continuous(expand = c(0, 0), breaks = scales::breaks_extended(n=10)) +
  # ggplot2::scale_fill_viridis_c(option = "mako", direction=-1, begin = 0.2, end = 0.8) +
  ggplot2::scale_fill_distiller(palette = "Greens", direction = 1, labels = scales::comma) +
  ggplot2::labs(
    x = "height (ft)"
    , y = "DBH (in)"
    , fill = "IMPUTED\nnet volume (ft3)"
  ) +
  ggplot2::theme_light() + 
  ggplot2::theme(
    legend.position = "top"
    # , axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5, hjust = 1)
    , panel.background = ggplot2::element_blank()
    , panel.grid = ggplot2::element_blank()
    , plot.subtitle = ggplot2::element_text(hjust = 0.5)
    , strip.text = ggplot2::element_text(color = "black", face = "bold")
  )

Field Data

get the volume for the field data (remembering we’re working in imperial units)

validation_trees <- 
  validation_trees %>% 
  # increments for dbh,ht
  dplyr::mutate(
    dia = floor(field_dbh_in) %>% as.integer()
    , ht = floor(field_tree_height_ft) %>% as.integer()
    , species_symbol = toupper(spp) %>% 
      stringr::str_squish() %>% 
      stringr::word() %>% 
      stringr::str_replace("JUSC","JUSC2")
  ) %>% 
  # add on preds
  dplyr::left_join(
    vol_lookup_pred
    , dplyr::join_by(species_symbol,ht,dia)
    , relationship = "many-to-one"
  ) %>% 
  # add on fia
  dplyr::left_join(
    vol_lookup %>% dplyr::select(species_symbol,ht,dia,tidyselect::contains("_volume_"))
    , dplyr::join_by(species_symbol,ht,dia)
    , relationship = "many-to-one"
  ) %>% 
  # fill in missing fia volume
  dplyr::mutate(
    net_volume_ft3 = ifelse(is.na(net_volume_ft3), pred_net_volume_ft3, net_volume_ft3) %>% 
      dplyr::coalesce(0) # smallest of trees
    , grs_volume_ft3 = ifelse(is.na(grs_volume_ft3), pred_grs_volume_ft3, grs_volume_ft3) %>% 
      dplyr::coalesce(0) # smallest of trees
    , net_volume_m3 = ifelse(is.na(net_volume_m3), pred_net_volume_m3, net_volume_m3) %>% 
      dplyr::coalesce(0) # smallest of trees
    , grs_volume_m3 = ifelse(is.na(grs_volume_m3), pred_grs_volume_m3, grs_volume_m3) %>% 
      dplyr::coalesce(0) # smallest of trees
  ) %>% 
  dplyr::select(
    -tidyselect::starts_with("pred_net_volume")
    ,-tidyselect::starts_with("pred_grs_volume")
    , -c(dia,ht)
  )
# validation_trees %>% dplyr::glimpse()
# add all these volumes to the agg data
agg_ground_truth_match_ans <- 
  agg_ground_truth_match_ans %>% 
  dplyr::bind_cols(
    validation_trees %>% 
      sf::st_drop_geometry() %>% 
      dplyr::ungroup() %>% 
      dplyr::summarise(
        dplyr::across(
          c(tidyselect::ends_with("_volume_ft3"),tidyselect::ends_with("_volume_m3"))
          , .fns = sum
        )
      ) %>% 
      dplyr::rename_with(.cols=dplyr::everything(),~paste0("ref_",.x,recycle0 = T))
  )

let’s check out the stand summary of volume

validation_trees %>% 
  sf::st_drop_geometry() %>% 
  dplyr::group_by(stand_area_m2) %>% 
  dplyr::summarise(
    dplyr::across(
      c(tidyselect::ends_with("_volume_ft3"),tidyselect::ends_with("_volume_m3"))
      , .fns = sum
    )
    , n_trees = dplyr::n()
  ) %>% 
  # add area
  dplyr::mutate(stand_area_ha = stand_area_m2/10000) %>% 
  dplyr::mutate(
    stand_area_ha = stand_area_ha %>% round(1) %>% scales::comma(accuracy = 0.1)
    , n_trees = scales::comma(n_trees,accuracy=1)
    , dplyr::across(
      .cols = tidyselect::contains("_volume_")
      , ~ scales::comma(.x,accuracy=0.1)
    )
  ) %>% 
  dplyr::select(
    stand_area_ha, n_trees
    , tidyselect::ends_with("_volume_ft3"),tidyselect::ends_with("_volume_m3")
  ) %>% 
  kableExtra::kbl(
    caption = "Field Data: Volume-Focused Stand Summary"
    , escape = F
    , digits = 1
    , col.names = c(
      "Hectares", "# trees"
      , rep(c("net", "gross"), 2)
    )
  ) %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::add_header_above(
    c(
      " "=2
      , "Volume<br />ft<sup>3</sup>" = 2
      , "Volume<br />m<sup>3</sup>" = 2
    )
    , escape = F
  ) %>% 
  kableExtra::column_spec( seq(2,6,by=2), border_right = TRUE, include_thead = TRUE)
Field Data: Volume-Focused Stand Summary

Volume
ft3

Volume
m3

Hectares # trees net gross net gross
9.3 5,217 56,386.2 57,076.5 1,596.7 1,616.2

Predicted Data

Species

cloud2trees does not currently include functionality to attach species to it’s tree list. there are a few paths forward for performing species-specific workflows with the cloud2trees tree list:

  1. use the FIA forest type group attached to the tree list via cloud2trees::cloud2trees(..., estimate_tree_type = T) or cloud2trees::trees_type(). since this process currently only attributes trees with a forest type group, we would need to manually map these groups to a specific species, for example, by selecting the main species for a given type.
  2. use the regional_dbh_height_model_training_data.csv file written to the point cloud processing directory when we execute cloud2trees::cloud2trees(..., estimate_tree_dbh = T) or cloud2trees::trees_dbh() to proportionally allocate tree species to the tree list randomly. For example, if there are 66% PIPO and 34% PSME in the regional_dbh_height_model_training_data.csv data, then we could randomly assign 66% of the cloud2trees tree list as PIPO and 34% as PSME
  3. use the regional_dbh_height_model_training_data.csv file written to the point cloud processing directory when we execute cloud2trees::cloud2trees(..., estimate_tree_dbh = T) or cloud2trees::trees_dbh() to build a species classification model based on height that would inherently account for the proportional distribution of different species. For example, using Softmax regression or a random forest classifier and then use that model to predict the tree species for our cloud2trees list
Proportional Species

let’s start by trying out option (2) which is to simply proportionally allocate the tree species to our tree list

here’s what that regional_dbh_height_model_training_data.csv file includes, fyi

readr::read_csv(
  file.path(
    tracking_df$path[1]
    , "point_cloud_processing_delivery"
    , "regional_dbh_height_model_training_data.csv"
  )
  , show_col_types = F
  , progress = F
) %>% 
dplyr::glimpse()
## Rows: 276
## Columns: 7
## $ tm_id          <dbl> 4891, 4891, 4891, 4891, 4891, 4891, 4891, 4891, 4891, 4…
## $ cn             <dbl> 3.537319e+13, 3.537319e+13, 3.537319e+13, 3.537319e+13,…
## $ species_symbol <chr> "PIPO", "PIPO", "PIPO", "PIPO", "PIPO", "PIPO", "PIPO",…
## $ tree_weight    <dbl> 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26,…
## $ dbh_cm         <dbl> 57.912, 22.606, 12.700, 13.462, 9.652, 18.542, 16.256, …
## $ tree_height_m  <dbl> 20.4216, 10.6680, 7.0104, 10.3632, 5.7912, 8.8392, 8.22…
## $ which_treemap  <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2…

do it (attach species) for all predictions

ground_truth_prediction_match_ans_updt <-
  tracking_df$folder %>% 
  purrr::map(function(x){
    # read training
    training_df <-
      readr::read_csv(
        file.path(
          tracking_df %>% dplyr::filter(folder==x) %>% dplyr::pull(path)
          , "point_cloud_processing_delivery"
          , "regional_dbh_height_model_training_data.csv"
        )
        , show_col_types = F
        , progress = F
      ) %>% 
      # get proportion by species
      dplyr::group_by(species_symbol) %>% 
      dplyr::summarise(tree_weight = sum(tree_weight)) %>% 
      dplyr::ungroup() %>% 
      dplyr::mutate(species_proportion = tree_weight/sum(tree_weight))
    # training_df %>% dplyr::glimpse()  
    # make a df of species to join with our predictions
    pred_ids <-
      ground_truth_prediction_match_ans[[x]]$pred_match_idx %>% 
      purrr::discard(~.x==0)
    set.seed(33)
    pred_spec_df <-
      dplyr::tibble(
        pred_match_idx = pred_ids
        , pred_species_symbol = 
          sample(
            x = training_df$species_symbol
            , size = length(pred_ids)
            , prob = training_df$species_proportion
            , replace = T
          ) 
      )
    # add on to precitions
    ground_truth_prediction_match_ans <- 
      ground_truth_prediction_match_ans[[x]] %>% 
      dplyr::left_join(
        pred_spec_df
        , by = "pred_match_idx"
      ) 
    # add volume
    ground_truth_prediction_match_ans <- 
      ground_truth_prediction_match_ans %>% 
      # increments for dbh,ht
      dplyr::mutate(
        dia = floor(pred_dbh_cm*0.394) %>% as.integer()
        , ht = floor(pred_tree_height_m*3.281) %>% as.integer()
        , species_symbol = pred_species_symbol
      ) %>% 
      # add on preds
      dplyr::left_join(
        vol_lookup_pred
        , dplyr::join_by(species_symbol,ht,dia)
        , relationship = "many-to-one"
      ) %>% 
      # add on fia
      dplyr::left_join(
        vol_lookup %>% dplyr::select(species_symbol,ht,dia,tidyselect::contains("_volume_"))
        , dplyr::join_by(species_symbol,ht,dia)
        , relationship = "many-to-one"
      ) %>% 
      # fill in missing fia volume
      dplyr::mutate(
        net_volume_ft3 = ifelse(is.na(net_volume_ft3), pred_net_volume_ft3, net_volume_ft3) %>% 
          # any species not in the FIA data are set to zero (probs not timber...JUSC2,QUGA)
          dplyr::coalesce(0) # smallest of trees
        , grs_volume_ft3 = ifelse(is.na(grs_volume_ft3), pred_grs_volume_ft3, grs_volume_ft3) %>% 
          # any species not in the FIA data are set to zero (probs not timber...JUSC2,QUGA)
          dplyr::coalesce(0) # smallest of trees
        , net_volume_m3 = ifelse(is.na(net_volume_m3), pred_net_volume_m3, net_volume_m3) %>% 
          # any species not in the FIA data are set to zero (probs not timber...JUSC2,QUGA)
          dplyr::coalesce(0) # smallest of trees
        , grs_volume_m3 = ifelse(is.na(grs_volume_m3), pred_grs_volume_m3, grs_volume_m3) %>% 
          # any species not in the FIA data are set to zero (probs not timber...JUSC2,QUGA)
          dplyr::coalesce(0) # smallest of trees
      ) %>% 
      dplyr::select(
        -tidyselect::starts_with("pred_net_volume")
        ,-tidyselect::starts_with("pred_grs_volume")
        , -c(dia,ht,species_symbol)
      ) %>% 
      dplyr::rename_with(
        .cols = c(tidyselect::starts_with("grs_volume_"),tidyselect::starts_with("net_volume_"))
        , ~ paste0("pred_",.x,recycle0 = T)
      )
    return(ground_truth_prediction_match_ans)
  })
names(ground_truth_prediction_match_ans_updt) <- tracking_df$folder

let’s check out an example distribution of species for one of the UAS flights

# wut
ground_truth_prediction_match_ans_updt[[1]] %>%
  dplyr::filter(pred_match_idx!=0) %>% 
  sf::st_drop_geometry() %>% 
  dplyr::count(pred_species_symbol) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    pct = scales::percent(n/sum(n), accuracy = 0.1)
    , n = scales::comma(n)
  ) %>% 
  kableExtra::kbl(
    caption = "predicted species distribution based on proportional allocation"
    , col.names = c("species symbol","n", "%")
  ) %>% 
  kableExtra::kable_styling()
predicted species distribution based on proportional allocation
species symbol n %
ABCO 176 3.3%
JUSC2 16 0.3%
PIFL2 30 0.6%
PIPO 3,960 75.0%
POTR5 352 6.7%
PSME 605 11.5%
QUGA 143 2.7%

looks good

aggregate predicted volume within the stand by flight and attach to the aggregated data so we can calculate prediction error

agg_ground_truth_match_ans <- 
  agg_ground_truth_match_ans %>% 
  dplyr::left_join(
    ground_truth_prediction_match_ans_updt %>%
      purrr::list_rbind(names_to = "folder") %>%
      sf::st_drop_geometry() %>% 
      dplyr::filter(match_grp %in% c("true positive","commission")) %>% 
      dplyr::select(
        folder
        , pred_match_idx
        , c(tidyselect::starts_with("pred_") & tidyselect::contains("_volume_"))
      ) %>% 
      dplyr::group_by(folder) %>% 
      dplyr::summarize(
        dplyr::across(
          .cols = tidyselect::contains("_volume_")
          , sum
        )
        # , n_trees = dplyr::n()
      ) %>% 
      dplyr::ungroup()
    , by = "folder"
  ) %>% 
  # diff cols
  dplyr::mutate(
    # 'diff_' columns are calculated as the predicted value minus the actual value
    diff_net_volume_m3 = pred_net_volume_m3-ref_net_volume_m3
    , diff_grs_volume_m3 = pred_grs_volume_m3-ref_grs_volume_m3
    # 'abs_diff_' columns are calculated as the predicted value minus the actual value
    , abs_diff_net_volume_m3 = abs(diff_net_volume_m3)
    , abs_diff_grs_volume_m3 = abs(diff_grs_volume_m3)
    # 'pct_diff_' columns are calculated as the predicted value minus the actual value divided by the actual value
    , pct_diff_net_volume_m3 = (pred_net_volume_m3-ref_net_volume_m3)/ref_net_volume_m3
    , pct_diff_grs_volume_m3 = (pred_grs_volume_m3-ref_grs_volume_m3)/ref_grs_volume_m3
  )

table that

agg_ground_truth_match_ans %>% 
  dplyr::mutate(
    ref_trees = tp_n+fn_n
    , pred_trees = tp_n+fp_n
  ) %>% 
  dplyr::select(
    dataset_factor, tidyselect::starts_with("flight_")
    # , site_area_m2
    # detection
    , ref_trees
    , pred_trees
    # quantification
    , tidyselect::ends_with("_volume_m3")
  ) %>% 
  dplyr::select(!tidyselect::starts_with("abs_diff_")) %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "On"
      , !flight_tf ~ "Off"
      , T ~ "error"
    )
    , dplyr::across(
      .cols = c(
        tidyselect::starts_with("pct_diff_")
        , tidyselect::ends_with("_rate")
      )
      , .fn = ~ scales::percent(.x, accuracy = .1)
    )
    , dplyr::across(
      .cols = c(flight_agl,flight_mph,tidyselect::ends_with("_trees"))
      , .fn = ~ scales::comma(.x, accuracy = 1)
    )
    , dplyr::across(
      .cols = dplyr::where(is.numeric)
      , .fn = ~ scales::comma(.x, accuracy = 0.1)
    )
    # # one column for diff/pct
    , diff_net_volume_m3 = paste0(diff_net_volume_m3, "<br>(", pct_diff_net_volume_m3, ")")
    , diff_grs_volume_m3 = paste0(diff_grs_volume_m3, "<br>(", pct_diff_grs_volume_m3, ")")
  ) %>% 
  dplyr::select(
    !tidyselect::starts_with("pct_diff_")
  ) %>% 
  dplyr::arrange(dataset_factor) %>% 
  dplyr::select(
    -c(
      dataset_factor
    )
  ) %>% 
  dplyr::select(
    tidyselect::starts_with("flight_")
    , tidyselect::ends_with("_trees")
    , tidyselect::contains("grs_volume")
    , tidyselect::contains("net_volume")
  ) %>% 
  # dplyr::glimpse()
  kableExtra::kbl(
    caption = "Volume: Stand-Level Accuracy"
    , col.names = c(
      "altitude (ft)", "speed (mph)", "Terrain Follow"
      , "reference", "predicted"
      , rep(c("reference","predicted","difference"), times = 2)
    )
    , escape = F
    # , digits = 2
  ) %>% 
  kableExtra::kable_styling(font_size = 11.5) %>% 
  kableExtra::add_header_above(
    c(
      " "=3
      , "# Trees" = 2
      # , "Area" = 3
      , "Gross<br />Volume m<sup>3</sup>" = 3
      , "Net<br />Volume m<sup>3</sup>" = 3
    )
    , escape = F
  ) %>% 
  kableExtra::column_spec( c(3,seq(5,11,by=3) ), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1:2), valign = "top")
Volume: Stand-Level Accuracy
# Trees

Gross
Volume m3

Net
Volume m3

altitude (ft) speed (mph) Terrain Follow reference predicted reference predicted difference reference predicted difference
200 10 On 5,217 5,282 1,616.2 1,741.1 124.9
(7.7%)
1,596.7 1,721.9 125.2
(7.8%)
20 Off 5,217 5,528 1,616.2 1,769.7 153.4
(9.5%)
1,596.7 1,753.1 156.4
(9.8%)
On 5,217 5,500 1,616.2 1,746.1 129.9
(8.0%)
1,596.7 1,727.9 131.2
(8.2%)
300 10 On 5,217 5,421 1,616.2 1,769.9 153.7
(9.5%)
1,596.7 1,751.5 154.8
(9.7%)
20 Off 5,217 5,508 1,616.2 1,772.0 155.7
(9.6%)
1,596.7 1,754.1 157.5
(9.9%)
400 10 On 5,217 5,449 1,616.2 1,774.5 158.3
(9.8%)
1,596.7 1,756.3 159.6
(10.0%)
20 Off 5,217 5,592 1,616.2 1,778.8 162.6
(10.1%)
1,596.7 1,757.4 160.8
(10.1%)
On 5,217 5,531 1,616.2 1,753.3 137.0
(8.5%)
1,596.7 1,732.9 136.3
(8.5%)

plot

agg_ground_truth_match_ans %>% 
  dplyr::select(
    dataset_factor
    , pct_diff_net_volume_m3
    , pct_diff_grs_volume_m3
  ) %>% 
  dplyr::mutate(
    tot = -1* (abs(pct_diff_net_volume_m3) + abs(pct_diff_grs_volume_m3))
  ) %>% 
  tidyr::pivot_longer(cols = -c(dataset_factor,tot)) %>% 
  dplyr::mutate(
    name = dplyr::recode_values(
      name
      , "pct_diff_net_volume_m3" ~ latex2exp::TeX("Net volume $m^{3}$", output = "character")
      , "pct_diff_grs_volume_m3" ~ latex2exp::TeX("Gross volume $m^{3}$", output = "character")
    )
    , value_lab = scales::percent(value,accuracy=0.1)
  ) %>% 

ggplot2::ggplot(
  mapping = ggplot2::aes(
    y = reorder(dataset_factor, tot)
    , x = value
    , group = name
  )
) +
  ggplot2::geom_vline(xintercept = 0, color = "gray95", lwd = 1.5, alpha = 0.7) +
  ggplot2::geom_vline(xintercept = c(0.05,-0.05), color = "gray77", lwd = 0.5, alpha = 0.7) +
  ggplot2::geom_vline(xintercept = c(0.1,-0.1), color = "gray55", lwd = 0.5, alpha = 0.7) +
  ggplot2::geom_vline(xintercept = c(0.15,-0.15), color = "gray33", lwd = 0.5, alpha = 0.7) +
  ggplot2::geom_vline(xintercept = c(0.2,-0.2), color = "gray11", lwd = 0.5) +
  ggplot2::geom_segment(
    mapping = ggplot2::aes(yend = dataset_factor, xend = 0,color = value > 0)
    # , color = "gray44"
  ) +
  ggplot2::geom_point(
    mapping = ggplot2::aes(color = value > 0)
    , size = 3
  ) +
  # lhs labs
  ggplot2::geom_text(
    data = function(x){dplyr::filter(x,value<0)}
    , mapping = ggplot2::aes(label = value_lab, fontface = "bold")
    # , color = "black"
    , size = 3
    , nudge_x = -0.006
    , hjust = 1
  ) +
  # rhs labs
  ggplot2::geom_text(
    data = function(x){dplyr::filter(x,value>=0)}
    , mapping = ggplot2::aes(label = value_lab, fontface = "bold")
    # , color = "black"
    , size = 3
    , nudge_x = 0.006
    , hjust = 0
  ) +
  ggplot2::facet_grid(
    cols = dplyr::vars(name)
    # , labeller = ggplot2::label_parsed
    , labeller = ggplot2::as_labeller(
      function(x) {
        paste0("atop(", x, ", 'percent error')")
      }
      , default = ggplot2::label_parsed
    )
  ) +
  ggplot2::scale_color_manual(values = c("firebrick","navy")) +
  ggplot2::scale_x_continuous(
    limits = c(-0.2,0.2)
    , labels = scales::percent
    # , breaks = scales::breaks_extended(n=10)
    # , breaks = function(x) seq(min(x), max(x), by = 0.05)
  ) +
  ggplot2::labs(
    y = "" # "comparison"
    , x = "percent error"
    , subtitle = "positive = UAS-measured higher than field-measured\nnegative = UAS-measured lower than field-measured"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , axis.text.y = ggplot2::element_text(size = 9, face = "bold")
    # , plot.subtitle = ggplot2::element_text(hjust = 1)
    , strip.text = ggplot2::element_text(size = 10, color = "black", face = "bold")
    , panel.grid.major.x = element_blank()
    , panel.grid.minor.x = element_blank()
  )

another plot

agg_ground_truth_match_ans %>% 
  dplyr::mutate(
    flight_tf = dplyr::case_when(
      flight_tf ~ "terrain follow: On"
      , !flight_tf ~ "terrain follow: Off"
      , T ~ "error"
    )
  ) %>% 
  dplyr::select(
    dataset_factor, tidyselect::starts_with("flight_")
    # , diff_basal_area_m2_per_ha
    , tidyselect::ends_with("_net_volume_m3")
  ) %>% 
  dplyr::select(!tidyselect::starts_with("abs_diff_")) %>% 
  tidyr::pivot_longer(cols = c(ref_net_volume_m3, pred_net_volume_m3)) %>% 
  dplyr::mutate(
    name = dplyr::recode_values(
      name
      , "ref_net_volume_m3" ~ "field"
      , "pred_net_volume_m3" ~ "prediction"
    )
    , value_lab = scales::comma(value,accuracy=0.1)
  ) %>% 
  
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = name, y = value, group = dataset_factor)
  ) +
  ggplot2::geom_line(key_glyph = "point", alpha = 0.7, color = "gray", lwd = 1.1) +
  # ref labs
  ggplot2::geom_text(
    data = function(x){dplyr::filter(x,name=="field")}
    , mapping = ggplot2::aes(label = value_lab)
    , size = 3
    , hjust = 0.5, vjust = 2.2
  ) +
  # pred labs
  ggrepel::geom_text_repel(
    data = function(x){dplyr::filter(x,name=="prediction")}
    , mapping = ggplot2::aes(label = value_lab)
    , size = 2.3
    , hjust = 0.5, vjust = 2.2
  ) +
  ggplot2::geom_point(
    mapping = ggplot2::aes(color = flight_tf)
    # mapping = ggplot2::aes(color = name)
    # , alpha = 0.7
    , size = 4
  ) +
  ggplot2::facet_grid(
    cols = dplyr::vars(flight_agl)
    , rows = dplyr::vars(flight_mph)
    , labeller = ggplot2::labeller( 
      flight_agl = ~paste("Altitude:", .x, " (ft)") 
      , flight_mph = ~paste("Speed:", .x, " (mph)") 
    )
    , axes = "all_x"
  ) +
  # harrypotter::scale_color_hp_d(option = "hermionegranger") +
  ggplot2::scale_color_manual(values = c("gray55","gray11"), guide = "none") +
  ggplot2::scale_y_continuous(
    limits = c(0,NA)
    , labels = scales::comma
    , breaks = scales::breaks_extended(n=10)
    , expand = ggplot2::expansion(mult = c(0.01,0.1))
  ) +
  ggplot2::labs(
    x = "" # "comparison"
    , y = latex2exp::TeX("Net Volume $m^{3}$")
    , color = ""
    , subtitle = latex2exp::TeX("Predicted vs Reference Stand Net Volume ($m^{3}$)")
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , axis.text.x = ggplot2::element_text(size = 10, face = "bold", color = "black")
    , axis.text.y = ggplot2::element_text(size = 7)
    , axis.title.y = ggplot2::element_text(size = 7)
    , strip.text = ggplot2::element_text(size = 10, color = "black", face = "bold")
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(
      override.aes = list(shape = 15, size = 6, linetype = 0)
    )
  )

ok

Predict Species

let’s demo option (3) to use the TreeMap imputed FIA plot tree list to build a species classification model based on height and tree location to attach tree species to our tree list. such a model would inherently account for the proportional distribution of different species. We’ll try using Softmax regression with a model trained on the FIA tree list data to predict the tree species for our cloud2trees list.

For the Softmax regression model the categorical (i.e. nominal) tree species will be our dependent variable and the most simple model will use tree height and tree location (X and Y coordinates) as independent variables. The training FIA tree list data (identified from the TreeMap imputed plots of the area of interest) includes tree height and we’ll use the centroid of the TreeMap 30m raster cell to represent the individual tree X and Y location (i.e. each tree within the same cell will have the same coordinate). Considering the combined tree height and spatial location via a full interaction model (i.e. height*x*y) allows the model to capture how the species-height relationship varies across space.

An alternative model specification would be to use a hierarchical model grouped by imputed FIA plot identifier that allows the intercept and the slope of height to vary by plot. This model form would inherently capture spatial clustering without needing explicit X and Y coordinates in the model while also allowing the use of model weights to handle redundant TreeMap imputed plots. However, hierarchical models in brms can be computationally expensive because the model has to estimate a separate coefficients for every species. Thus, we’ll start with the interaction model (height*x*y) and treat space as a smooth surface since interpretation of the model coefficients is not really needed for our purposes, just the predictions.

Training Data

load in the TreeMap data and clip to our trees as in cloud2trees::trees_dbh()

# where?
treemap_data_finder_ans_temp <- cloud2trees:::find_ext_data()[["treemap_dir"]] %>% 
  cloud2trees:::treemap_data_finder()
# treemap_data_finder_ans_temp

# read in treemap data
# read in treemap (no memory is taken)
treemap_rast <- terra::rast(treemap_data_finder_ans_temp$treemap_rast)
# treemap_rast
### filter treemap based on trees now in memory
treemap_rast <- treemap_rast %>%
  terra::crop(
    predicted_trees %>%
      purrr::list_rbind() %>% 
      sf::st_bbox() %>%
      sf::st_as_sfc() %>%
      sf::st_buffer(50) %>%
      sf::st_union() %>%
      sf::st_transform(terra::crs(treemap_rast)) %>%
      terra::vect()
  ) %>%
  terra::subset(1)
# treemap_rast
# terra::plot(treemap_rast)
# terra::plot(treemap_rast, levels = NULL)
# treemap_rast %>% 
#   terra::catalyze() %>% 
#   terra::subset(1) %>% 
#   terra::as.factor() %>%
#   terra::plot()

### get weights for weighting each tree in the population models
# treemap id = tm_id for linking to tabular data
tm_id_weight_temp <- treemap_rast %>% ## works
  terra::values() %>%
  table() %>%
  dplyr::as_tibble() %>%
  dplyr::rename(tm_id=1,tree_weight=n) %>%
  dplyr::mutate(tm_id = cloud2trees:::as_character_safe(tm_id))
# tm_id_weight_temp

############################################################################
### get the TreeMap FIA tree list for only the plots included
############################################################################
if(treemap_data_finder_ans_temp$which_treemap==2022){
  treemap_cols <- c(
    "TM_ID"
    , "PLT_CN"
    , "SPECIES_SYMBOL"
    , "STATUSCD"
    , "DIA"
    , "HT"
    , "CR" # crown ratio
  )
}else if(treemap_data_finder_ans_temp$which_treemap==2016){
  treemap_cols <- c(
    "tm_id"
    , "CN"
    , "SPECIES_SYMBOL"
    , "STATUSCD"
    , "DIA"
    , "HT"
    , "CR" # crown ratio
  )
}else{
  stop("unknown TreeMap vintage")
}
### read it
# treemap_data_finder_ans$treemap_trees
treemap_trees_df <-
  readr::read_csv(
    treemap_data_finder_ans_temp$treemap_trees
    , col_select = treemap_cols
    , progress = F
    , show_col_types = F
  ) %>%
  dplyr::rename_with(tolower)
# rename cn to fit with original table str
if(treemap_data_finder_ans_temp$which_treemap==2022){
  treemap_trees_df <- treemap_trees_df %>% dplyr::rename(cn = plt_cn)
}
# clean it
treemap_trees_df <-
  treemap_trees_df %>%
  dplyr::mutate(
    cn = cloud2trees:::as_character_safe(cn)
    , tm_id = cloud2trees:::as_character_safe(tm_id)
    , cr = ifelse(cr>100|cr<0,NA,cr)*0.01
  ) %>%
  dplyr::inner_join(
    tm_id_weight_temp
    , by = dplyr::join_by("tm_id")
  ) %>%
  dplyr::filter(
    # keep live trees only: 1=live;2=dead
    statuscd == 1
    & !is.na(dia)
    & !is.na(ht)
    & !is.na(tree_weight)
  ) %>%
  dplyr::mutate(
    dbh_cm = dia*2.54
    , tree_height_m = ht*0.3048
    , tm_id = as.factor(tm_id)
    , species_symbol = factor(species_symbol)
    # # normalize ht
    # , ht_wt_mean = sum(tree_height_m * tree_weight) / sum(tree_weight)
    # , ht_wt_sd = sqrt( sum(tree_weight * (tree_height_m - ht_wt_mean)^2 ) / sum(tree_weight) )
    # , ht_z = (tree_height_m - ht_wt_mean) / ht_wt_sd
  ) %>%
  dplyr::select(-c(statuscd,dia,ht,cr)) # maybe we can use cr in a future version with and NSUR approach

# !!!!!!! use this if using hierarchical model with slope and intercept varying by plot and weights for repeated plot ids
# !!!!!!! since spatial location is captured by the plot id (tm_id)
# treemap_trees_df %>% dplyr::glimpse()

# we need the spatial information from the raster...
treemap_tree_list_df <- 
  treemap_rast %>% 
  terra::catalyze() %>% 
  terra::subset(1) %>% 
  terra::as.data.frame(xy = T) %>% 
  dplyr::rename(tm_id=3) %>% 
  dplyr::mutate(
    tm_id = cloud2trees:::as_character_safe(tm_id)
  ) %>% 
  # add on the tree data in which a row is unique by tm_id and the tree (which doesn't have an id so we'll fake one)
  dplyr::inner_join(
    treemap_trees_df %>% 
      dplyr::ungroup() %>% 
      dplyr::mutate(
        tree_id = dplyr::row_number()
        , tm_id = as.character(tm_id)
      )
    , by = "tm_id"
    , relationship = "many-to-many"
  ) %>% 
  dplyr::mutate(
    # to factor
    tm_id = forcats::fct_relevel(tm_id, levels(treemap_trees_df$tm_id) )
  )
  # a row is now unique by x,y,tree_id and the number of rows per tree_id = tree_weight

what’s in the FIA tree list using the TreeMap imputed plots filtered for our area of interest?

treemap_tree_list_df %>% dplyr::glimpse()
## Rows: 6,237
## Columns: 9
## $ x              <dbl> -779640, -779640, -779640, -779640, -779640, -779640, -…
## $ y              <dbl> 1826640, 1826640, 1826640, 1826640, 1826640, 1826640, 1…
## $ tm_id          <fct> 5291, 5291, 5291, 5291, 5291, 5291, 5291, 5291, 5291, 5…
## $ cn             <chr> "37263092010690", "37263092010690", "37263092010690", "…
## $ species_symbol <fct> PIPO, PIPO, PIPO, PIPO, PIPO, PIPO, PIFL2, ABCO, PSME, …
## $ tree_weight    <int> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,…
## $ dbh_cm         <dbl> 33.782, 31.242, 23.876, 28.956, 51.816, 19.812, 18.034,…
## $ tree_height_m  <dbl> 15.5448, 11.2776, 11.2776, 15.5448, 18.5928, 8.8392, 10…
## $ tree_id        <int> 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,…

this is essentially the same thing as regional_dbh_height_model_training_data.csv that we reviewed above, but we added the spatial x and y coordinates

before we build the model we need to scale the x, y, and height data but we need to be sure to scale both the training and data to be predicted using the same scale so we use data on the same scale across both data. Centering and scaling the x, y, and height variables puts the center of the area of interest (weighted by number of trees) at the “zero” point and resolves the sign (i.e. +/-) conflicts of the coordinate system (i.e. UTMs) by making the signs represent relative direction (north vs south) rather than arbitrary coordinate origins. we also need to convert the tree species in the training data to factor from character data type.

# scale fn
my_scale_fn <- function(x, ref_mean, ref_sd) {
  (x - ref_mean) / ref_sd
}

# ref values from training data
train_stats_temp <- list(
  x_mean = mean(treemap_tree_list_df$x, na.rm = T)
  , x_sd   = sd(treemap_tree_list_df$x, na.rm = T)
  , y_mean = mean(treemap_tree_list_df$y, na.rm = T)
  , y_sd   = sd(treemap_tree_list_df$y, na.rm = T)
  , h_mean = mean(treemap_tree_list_df$tree_height_m, na.rm = T)
  , h_sd   = sd(treemap_tree_list_df$tree_height_m, na.rm = T)
)
# train_stats_temp

# apply to training data
treemap_tree_list_df <- 
  treemap_tree_list_df %>% 
  dplyr::mutate(
    x_z = my_scale_fn(x, train_stats_temp$x_mean, train_stats_temp$x_sd)
    , y_z = my_scale_fn(y, train_stats_temp$y_mean, train_stats_temp$y_sd)
    , ht_z = my_scale_fn(tree_height_m, train_stats_temp$h_mean, train_stats_temp$h_sd)
    , species_symbol = factor(species_symbol)
    , tm_id = as.factor(tm_id)
  )
# treemap_tree_list_df %>% dplyr::glimpse()
# treemap_tree_list_df %>% dplyr::select(x_z,y_z,ht_z) %>% summary()
# apply to prediction data
# !!! use the train_stats_temp here, NOT the prediction data mean/sd
# predicted_trees %>% purrr::list_rbind(names_to = "folder") %>% dplyr::glimpse()

predicted_trees <-
  predicted_trees %>% 
  purrr::map(function(x){
    orig_crs <- sf::st_crs(x)
    # z val of x,y,ht
    dta_temp <- 
      x %>% 
      sf::st_transform(terra::crs(treemap_rast)) %>% 
      dplyr::mutate(
        xxx = sf::st_coordinates(.)[,1]
        , yyy = sf::st_coordinates(.)[,2]
        , x_z = my_scale_fn(xxx, train_stats_temp$x_mean, train_stats_temp$x_sd)
        , y_z = my_scale_fn(yyy, train_stats_temp$y_mean, train_stats_temp$y_sd)
        , ht_z = my_scale_fn(tree_height_m, train_stats_temp$h_mean, train_stats_temp$h_sd)
      ) %>% 
      dplyr::select(-c(xxx,yyy))
    # add tm_id
    tmids_temp <- 
      terra::extract(
        x = treemap_rast %>% 
          terra::catalyze() %>% 
          terra::subset(1)
        , y = dta_temp %>% 
          terra::vect()
      ) %>% 
      dplyr::rename_with(tolower) %>% 
      dplyr::mutate(
        tm_id = cloud2trees:::as_character_safe(tm_id) %>% 
          forcats::as_factor() %>% 
          forcats::fct_expand( levels(treemap_trees_df$tm_id) ) %>% 
          forcats::fct_relevel( levels(treemap_trees_df$tm_id) )
      )
    dta_temp$tm_id <- tmids_temp$tm_id
    # orig_crs
    dta_temp <- dta_temp %>% sf::st_transform(orig_crs)
    return(dta_temp)
  })
# predicted_trees %>% purrr::list_rbind(names_to = "folder") %>% dplyr::glimpse()
# predicted_trees[[1]] %>% ggplot2::ggplot() + ggplot2::geom_sf(mapping = ggplot2::aes(color=tm_id))

# align the factors
validation_trees <- 
  validation_trees %>% 
  dplyr::mutate(
    species_symbol = 
      species_symbol %>% 
      forcats::as_factor() %>% 
      forcats::fct_expand( levels(treemap_trees_df$species_symbol) ) %>% 
      forcats::fct_relevel( levels(treemap_trees_df$species_symbol) )
  )

pal_species <- 
  forcats::lvls_union(list(
    validation_trees$species_symbol
    , treemap_trees_df$species_symbol
  )) %>% 
  length() %>% 
  # RColorBrewer::brewer.pal(name = "Set2") %>% 
  viridis::turbo() %>% 
  setNames(
    forcats::lvls_union(list(
      validation_trees$species_symbol
      , treemap_trees_df$species_symbol
    )) %>% 
    as.character()
  )
# scales::show_col(pal_species)

let’s quickly look at the training data predictions over the independent variables of height and x, y

# x,y
plt1_temp <-
  treemap_tree_list_df %>% 
  dplyr::slice_sample(n = 1111) %>% 
  dplyr::mutate(flab = "Y coord vs X coord") %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = x_z, y = y_z, fill = species_symbol, color = species_symbol, label = species_symbol)
  ) +
  ggplot2::geom_hline(yintercept = 0, color = "black") + 
  ggplot2::geom_vline(xintercept = 0, color = "black") + 
  # ggplot2::geom_text(size = 3) + 
  ggrepel::geom_text_repel(
    size = 1.8
    , segment.color = NA
    , box.padding = 0.1
    , point.padding = 0
    , force = 0.5
    , max.overlaps = Inf
  ) +
  ggplot2::facet_wrap(facets = dplyr::vars(flab)) +
  # ggplot2::scale_color_discrete(palette = "Set2") + 
  ggplot2::scale_color_manual(values = pal_species) +
  ggplot2::coord_equal() + 
  ggplot2::labs(
    x = "x", y = "y"
  ) + 
  ggplot2::theme_minimal() + 
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(color = "black", size = 9, face = "bold")
    , strip.background = ggplot2::element_rect(fill = "gray88", color = "gray88")
  )
# plt1_temp

# ht,x
plt2_temp <-
  treemap_tree_list_df %>% 
  dplyr::slice_sample(n = 1111) %>% 
  dplyr::mutate(flab = "Height vs X coord") %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = x_z, y = ht_z, fill = species_symbol, color = species_symbol, label = species_symbol)
  ) +
  ggplot2::geom_hline(yintercept = 0, color = "black") + 
  ggplot2::geom_vline(xintercept = 0, color = "black") + 
  # ggplot2::geom_text(size = 1.8) + 
  ggrepel::geom_text_repel(
    size = 1.8
    , segment.color = NA
    , box.padding = 0.1
    , point.padding = 0
    , force = 0.5
    , max.overlaps = Inf
  ) +
  ggplot2::facet_wrap(facets = dplyr::vars(flab)) +
  # ggplot2::scale_color_discrete(palette = "Set2") + 
  ggplot2::scale_color_manual(values = pal_species) +
  ggplot2::coord_equal() + 
  ggplot2::labs(
    x = "x", y = "tree height (z)"
  ) + 
  ggplot2::theme_minimal() + 
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(color = "black", size = 9, face = "bold")
    , strip.background = ggplot2::element_rect(fill = "gray88", color = "gray88")
  )
# plt2_temp

# ht,y
plt3_temp <-
  treemap_tree_list_df %>% 
  dplyr::slice_sample(n = 1111) %>% 
  dplyr::mutate(flab = "Height vs Y coord") %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = y_z, y = ht_z, fill = species_symbol, color = species_symbol, label = species_symbol)
  ) +
  ggplot2::geom_hline(yintercept = 0, color = "black") + 
  ggplot2::geom_vline(xintercept = 0, color = "black") + 
  # ggplot2::geom_text(size = 1.8) + 
  ggrepel::geom_text_repel(
    size = 1.8
    , segment.color = NA
    , box.padding = 0.1
    , point.padding = 0
    , force = 0.5
    , max.overlaps = Inf
  ) +
  ggplot2::facet_wrap(facets = dplyr::vars(flab)) +
  # ggplot2::scale_color_discrete(palette = "Set2") + 
  ggplot2::scale_color_manual(values = pal_species) +
  ggplot2::coord_equal() + 
  ggplot2::labs(
    x = "y", y = "tree height (z)"
  ) + 
  ggplot2::theme_minimal() + 
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(color = "black", size = 9, face = "bold")
    , strip.background = ggplot2::element_rect(fill = "gray88", color = "gray88")
  )
# plt3_temp
patchwork::wrap_plots(list(plt2_temp,plt3_temp,plt1_temp), ncol = 2)

# ggplot2::ggsave(filename = "../data/spec_predictors_ex.jpg", height = 10.5, width = 9, dpi = "print")

Predictions

Softmax model it

# softmax regression with spatial interaction
brms_spec_i_mod <- brms::brm(
  formula = species_symbol ~ ht_z * (x_z + y_z)
  , data = treemap_tree_list_df
  , family = brms::categorical(link = "logit")
  # , prior = c(
  #   brms::prior(normal(0, 5), class = "b"),
  #   brms::prior(normal(0, 5), class = "Intercept")
  # )
  , iter = 5000, warmup = 2500, chains = 4
  , cores = lasR::half_cores()
  , file = paste0("../data/", "species_model")
  # , file_refit = "always" # uncomment for c2t
)

9.3 hours later…

this model with tree-level data was super slowwwwww and won’t work for integration within cloud2trees

let’s try the hierarchical model grouped by imputed FIA plot identifier that allows the intercept and the slope of height to vary by plot. This model form would inherently capture spatial clustering without needing explicit X and Y coordinates in the model while also allowing the use of model weights to handle redundant TreeMap imputed plots.

# treemap_trees_df %>% dplyr::glimpse()
# hierarchical model with slope and intercept varying by plot and weights for repeated plot ids
# treemap_trees_df %>% dplyr::distinct(tm_id,tree_id,tree_weight,tree_height_m,species_symbol) %>% dplyr::glimpse()
brms_spec_h_mod <- 
  brms::brm(
    formula = brms::bf(species_symbol | weights(tree_weight) ~ 1 + tree_height_m + (1 + tree_height_m | tm_id))
    # formula = brms::bf(species_symbol | weights(tree_weight) ~ 1 + ht_z + (1 + ht_z | tm_id))
    , data = treemap_trees_df
    , family = brms::categorical(link = "logit")
    # , prior = c(
    #   brms::prior(normal(0, 5), class = "b")
    #   brms::prior(exponential(1), class = "sd")
    # )
    , iter = 5000, warmup = 2500, chains = 4
    , cores = lasR::half_cores()
    , file = paste0("../data/", "species_model_h")
    # , control = list(adapt_delta = 0.995)
    # , file_refit = "always" # uncomment for c2t
  )

that ran much faster (~20-30 mins) but still not great on the estimation time

let’s get the model estimates for our predicted tree list so that the trees have a species probabilistically estimated

predicted_trees <-
  predicted_trees %>% 
  purrr::map(function(x){
    # take 1 draw to get a single species per tree
    preds_temp <- brms::posterior_predict(
      object = brms_spec_h_mod
      , newdata = x
      , ndraws = 1
      , cores = lasR::half_cores()
    )
    # preds_temp %>% dplyr::glimpse()
    # attr(preds_temp, "levels")
    
    # preds back to tree list
    dta_temp <-
      x %>% 
      dplyr::mutate(
        species_symbol = 
          preds_temp %>% 
          as.vector() %>% 
          factor(
            levels = seq_along(attr(preds_temp, "levels"))
            , labels = attr(preds_temp, "levels")
          )
      )
    # dta_temp %>% 
    #   sf::st_drop_geometry() %>% 
    #   dplyr::count(species_symbol) %>% 
    #   dplyr::mutate(pct=n/sum(n))
    return(dta_temp)
  })
# predicted_trees %>% purrr::list_rbind(names_to = "folder") %>% dplyr::glimpse()
# predicted_trees[[1]] %>% ggplot2::ggplot() + ggplot2::geom_sf(mapping = ggplot2::aes(color=species_symbol))

let’s look at the predicted species by height and location for a single flight predicted trees

# x,y
plt1_temp <-
  predicted_trees[[1]] %>% 
  dplyr::slice_sample(n = 999) %>% 
  dplyr::mutate(flab = "Y coord vs X coord") %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = x_z, y = y_z, fill = species_symbol, color = species_symbol, label = species_symbol)
  ) +
  ggplot2::geom_hline(yintercept = 0, color = "black") + 
  ggplot2::geom_vline(xintercept = 0, color = "black") + 
  # ggplot2::geom_text(size = 3) + 
  ggrepel::geom_text_repel(
    size = 1.8
    , segment.color = NA
    , box.padding = 0.1
    , point.padding = 0
    , force = 0.5
    , max.overlaps = Inf
  ) +
  ggplot2::facet_wrap(facets = dplyr::vars(flab)) +
  # ggplot2::scale_color_discrete(palette = "Set2") + 
  ggplot2::scale_color_manual(values = pal_species) +
  ggplot2::coord_equal() + 
  ggplot2::labs(
    x = "x", y = "y"
  ) + 
  ggplot2::theme_minimal() + 
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(color = "black", size = 9, face = "bold")
    , strip.background = ggplot2::element_rect(fill = "gray88", color = "gray88")
  )
# plt1_temp

# ht,x
plt2_temp <-
  predicted_trees[[1]] %>% 
  dplyr::slice_sample(n = 999) %>% 
  dplyr::mutate(flab = "Height vs X coord") %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = x_z, y = ht_z, fill = species_symbol, color = species_symbol, label = species_symbol)
  ) +
  ggplot2::geom_hline(yintercept = 0, color = "black") + 
  ggplot2::geom_vline(xintercept = 0, color = "black") + 
  # ggplot2::geom_text(size = 1.8) + 
  ggrepel::geom_text_repel(
    size = 1.8
    , segment.color = NA
    , box.padding = 0.1
    , point.padding = 0
    , force = 0.5
    , max.overlaps = Inf
  ) +
  ggplot2::facet_wrap(facets = dplyr::vars(flab)) +
  # ggplot2::scale_color_discrete(palette = "Set2") + 
  ggplot2::scale_color_manual(values = pal_species) +
  ggplot2::coord_equal() + 
  ggplot2::labs(
    x = "x", y = "tree height (z)"
  ) + 
  ggplot2::theme_minimal() + 
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(color = "black", size = 9, face = "bold")
    , strip.background = ggplot2::element_rect(fill = "gray88", color = "gray88")
  )
# plt2_temp

# ht,y
plt3_temp <-
  predicted_trees[[1]] %>% 
  dplyr::slice_sample(n = 999) %>% 
  dplyr::mutate(flab = "Height vs Y coord") %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = y_z, y = ht_z, fill = species_symbol, color = species_symbol, label = species_symbol)
  ) +
  ggplot2::geom_hline(yintercept = 0, color = "black") + 
  ggplot2::geom_vline(xintercept = 0, color = "black") + 
  # ggplot2::geom_text(size = 1.8) + 
  ggrepel::geom_text_repel(
    size = 1.8
    , segment.color = NA
    , box.padding = 0.1
    , point.padding = 0
    , force = 0.5
    , max.overlaps = Inf
  ) +
  ggplot2::facet_wrap(facets = dplyr::vars(flab)) +
  # ggplot2::scale_color_discrete(palette = "Set2") + 
  ggplot2::scale_color_manual(values = pal_species) +
  ggplot2::coord_equal() + 
  ggplot2::labs(
    x = "y", y = "tree height (z)"
  ) + 
  ggplot2::theme_minimal() + 
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(color = "black", size = 9, face = "bold")
    , strip.background = ggplot2::element_rect(fill = "gray88", color = "gray88")
  )
# plt3_temp
patchwork::wrap_plots(list(plt2_temp,plt3_temp,plt1_temp), ncol = 2)

# ggplot2::ggsave(filename = "../data/spec_predictors_pred_ex.jpg", height = 10.5, width = 9, dpi = "print")

yep, that resembles the training data. let’s map predicted species in space for each flight

predicted_trees %>%
  purrr::list_rbind(names_to = "folder") %>%
  # sf::st_drop_geometry() %>% dplyr::count(folder) %>% 
  dplyr::inner_join(
    tracking_df %>% 
      dplyr::select(-c(path))
    , by = "folder"
  ) %>%
  dplyr::arrange(dataset_factor, desc(species_symbol)) %>% 
  # plot it
  # plt
  ggplot2::ggplot() +
    ggplot2::geom_sf(
      data = stand_boundary
      , fill = NA, color = "blue"
    ) +
    ggplot2::geom_sf(
      mapping = ggplot2::aes(geometry = geom, color = species_symbol)
      , size = 0.6, alpha = 0.88
    ) +
    ggplot2::scale_color_manual(values = pal_species, drop = F) +
    ggplot2::facet_wrap(
      facets = dplyr::vars(dataset_factor)
      , ncol = 3
    ) +
    ggplot2::theme_light() +
    ggplot2::theme(
      legend.position = "top"
      , legend.title = ggplot2::element_blank()
      , legend.key = ggplot2::element_blank()
      , strip.text = ggplot2::element_text(face = "bold", color = "black", margin = ggplot2::margin(t = 4, b = 4))
      , strip.background = ggplot2::element_rect(fill = "gray88", color = "gray88")
      # , panel.background = ggplot2::element_rect(fill = NA, color = "black")
      # , panel.spacing = ggplot2::unit(1,"lines")
      , panel.grid.major = ggplot2::element_blank()
      , panel.grid.minor = ggplot2::element_blank()
      , axis.text = ggplot2::element_blank()
      , axis.title = ggplot2::element_blank()
      , axis.ticks = ggplot2::element_blank()
    ) +
    ggplot2::guides(
      color = ggplot2::guide_legend(override.aes = list(shape = 15, linetype = 0, size = 6, alpha = 1, fill = NA))
      , fill = "none"
    )

# ggplot2::ggsave(filename = "../data/spec_predicted.jpg", height = 9.8, width = 8.7, dpi = "print")

and table that

dta_temp <- predicted_trees %>%
  purrr::list_rbind(names_to = "folder") %>%
  # sf::st_drop_geometry() %>% dplyr::count(folder) %>% 
  dplyr::inner_join(
    tracking_df %>% 
      dplyr::select(-c(path))
    , by = "folder"
  ) %>%
  dplyr::select(
    dataset_factor
    , tidyselect::starts_with("flight_")
    , species_symbol
  ) %>% 
  dplyr::count(dplyr::pick(dplyr::everything())) %>% 
  dplyr::group_by(dataset_factor) %>% 
  dplyr::mutate(pct=n/sum(n,na.rm = T)) %>% 
  dplyr::ungroup() 
# plt
dta_temp %>% 
  dplyr::mutate(
    # # one column for diff/pct
    n = paste0(scales::comma(n,accuracy=1), "<br />(", scales::percent(pct,accuracy=0.1), ")")
  ) %>% 
  dplyr::select(-c(dataset_factor,pct)) %>% 
  tidyr::pivot_wider(values_from = n, names_from = species_symbol) %>% 
  dplyr::rename(
    `altitude (ft)` = flight_agl
    , `speed (mph)` = flight_mph
    , `Terrain Follow` = flight_tf
  ) %>% 
  # dplyr::glimpse()
  kableExtra::kbl(
    caption = "Predicted Species"
    , escape = F
    # , digits = 2
  ) %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::column_spec(c(3), border_right = TRUE, include_thead = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1:2), valign = "top")
Predicted Species
altitude (ft) speed (mph) Terrain Follow ABCO PIFL2 PIPO POTR5 PSME QUGA JUSC2
200 10 TRUE 7
(0.1%)
8
(0.1%)
4,166
(77.8%)
460
(8.6%)
665
(12.4%)
46
(0.9%)
NA
20 FALSE 15
(0.3%)
3
(0.1%)
4,131
(73.8%)
781
(14.0%)
662
(11.8%)
1
(0.0%)
2
(0.0%)
TRUE 10
(0.2%)
2
(0.0%)
4,336
(77.9%)
479
(8.6%)
659
(11.8%)
2
(0.0%)
81
(1.5%)
300 10 TRUE 15
(0.3%)
3
(0.1%)
4,287
(78.1%)
462
(8.4%)
664
(12.1%)
43
(0.8%)
15
(0.3%)
20 FALSE 10
(0.2%)
8
(0.1%)
4,355
(78.1%)
502
(9.0%)
633
(11.4%)
33
(0.6%)
36
(0.6%)
400 10 TRUE 10
(0.2%)
9
(0.2%)
4,157
(75.4%)
395
(7.2%)
653
(11.8%)
272
(4.9%)
17
(0.3%)
20 FALSE 10
(0.2%)
5
(0.1%)
4,140
(73.2%)
750
(13.3%)
569
(10.1%)
185
(3.3%)
NA
TRUE 14
(0.2%)
3
(0.1%)
4,356
(77.7%)
458
(8.2%)
744
(13.3%)
8
(0.1%)
20
(0.4%)

notice the proportional allocation of species varies across the flights even though we used the same training data and species classification model. this variation arises from the uncertainty in the model which is integrated into our predicted tree list via the brms::posterior_predict(... , ndraws = 1) setting.

we can plot the proportional distribution by flight

dta_temp %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    lab = ifelse(
      pct>0.0005
      , paste0(
        scales::percent(pct,accuracy=1) 
        ,"\n("
        , scales::comma(n,accuracy=1) 
        , ")"
      )
      , ""
    )
  ) %>% 
  dplyr::ungroup() %>% 
  # dplyr::glimpse()
  ggplot2::ggplot(
    mapping = ggplot2::aes(y = pct, x = species_symbol)
  ) +
  ggplot2::geom_col(
    mapping = ggplot2::aes(fill = species_symbol)
    , width = 0.6
    , color = NA, alpha = 0.8
  ) +
  ggplot2::geom_text(
    mapping = ggplot2::aes(label = lab)
    , color = "black", size = 2.5, vjust = -0.2
  ) +
  ggplot2::scale_fill_manual(values = pal_species) +
  # ggplot2::scale_color_manual(values = c("black","white"), guide = "none") +
  ggplot2::scale_x_discrete(drop = F) +
  ggplot2::scale_y_continuous(
    # breaks = seq(0,1,by=0.2)
    labels = scales::percent
    , expand = ggplot2::expansion(mult = c(0,0.15))
  ) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(dataset_factor)
    # , rows = dplyr::vars(trtmnt_block)
    , ncol = 3
    , axes = "all"
    # , switch = "y"
  ) +
  ggplot2::labs(
    x = "", y = "", fill = ""
    , subtitle = "Predicted Species"
  ) +
  ggplot2::theme_light()+
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7, angle = 45, hjust = 1, vjust = 1)
    , axis.text.y = ggplot2::element_blank()
    , axis.ticks.y = ggplot2::element_blank()
  )

lastly, let’s look at the height distribution within each species since species was predicted based on height and location

predicted_trees %>%
  purrr::list_rbind(names_to = "folder") %>%
  # sf::st_drop_geometry() %>% dplyr::count(folder) %>% 
  dplyr::inner_join(
    tracking_df %>% 
      dplyr::select(-c(path))
    , by = "folder"
  ) %>%
  dplyr::select(
    dataset_factor
    , tidyselect::starts_with("flight_")
    , species_symbol, tree_height_m
  ) %>% 
  # dplyr::glimpse()
  ggplot2::ggplot(
    mapping = ggplot2::aes(y = tree_height_m, x = species_symbol)
  ) +
  ggplot2::geom_violin(
    mapping = ggplot2::aes(fill = species_symbol)
    , width = 0.6
    , color = NA, alpha = 0.8
  ) +
  ggplot2::geom_boxplot(width = 0.1, fill = NA, outliers = F) +
  ggplot2::scale_fill_manual(values = pal_species) +
  # ggplot2::scale_color_manual(values = c("black","white"), guide = "none") +
  ggplot2::scale_x_discrete(drop = F) +
  ggplot2::scale_y_continuous(
    labels = scales::comma
    , breaks = scales::breaks_extended(n=7)
  ) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(dataset_factor)
    # , rows = dplyr::vars(trtmnt_block)
    , ncol = 3
    , axes = "all"
    # , switch = "y"
  ) +
  ggplot2::labs(
    y = "tree height (m)", x = "", fill = ""
    , subtitle = "Predicted Species height distribution"
  ) +
  ggplot2::theme_light()+
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7, angle = 45, hjust = 1, vjust = 1)
    , axis.text.y = ggplot2::element_text(size = 8)
  )

let’s try another way to visualize the height distribution by species

predicted_trees %>%
  purrr::list_rbind(names_to = "folder") %>%
  dplyr::slice_sample(prop = 0.44, by = folder) %>% 
  dplyr::inner_join(
    tracking_df %>% 
      dplyr::select(-c(path))
    , by = "folder"
  ) %>%
  dplyr::select(
    dataset_factor
    , tidyselect::starts_with("flight_")
    , species_symbol, tree_height_m
  ) %>% 
  # dplyr::glimpse()
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = tree_height_m, y = species_symbol)
  ) +
  ggplot2::geom_point(
    mapping = ggplot2::aes(color = species_symbol)
    , alpha = 0.9
    , shape = "|"
  ) +
  ggplot2::scale_color_manual(values = pal_species) +
  # ggplot2::scale_color_manual(values = c("black","white"), guide = "none") +
  ggplot2::scale_y_discrete(drop = F) +
  ggplot2::scale_x_continuous(
    labels = scales::comma
    , breaks = scales::breaks_extended(n=7)
  ) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(dataset_factor)
    # , rows = dplyr::vars(trtmnt_block)
    , ncol = 3
    , axes = "all"
    # , switch = "y"
  ) +
  ggplot2::labs(
    x = "tree height (m)", y = ""
    , subtitle = "Predicted Species height distribution"
  ) +
  ggplot2::theme_light()+
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.y = ggplot2::element_text(size = 7)
    , axis.text.x = ggplot2::element_text(size = 8)
  )

Field Data

let’s check out the reference data species distribution

let’s map the reference species in space

# plt
validation_trees %>% 
  dplyr::arrange(desc(species_symbol)) %>% 
  dplyr::mutate(dataset_factor="Reference Trees") %>% 
  # plot it
  # plt
  ggplot2::ggplot() +
    ggplot2::geom_sf(
      data = stand_boundary
      , fill = NA, color = "blue"
    ) +
    ggplot2::geom_sf(
      mapping = ggplot2::aes(color = species_symbol)
      , size = 1.2, alpha = 0.88
    ) +
    ggplot2::scale_color_manual(values = pal_species) +
    ggplot2::facet_wrap(
      facets = dplyr::vars(dataset_factor)
      , ncol = 3
    ) +
    ggplot2::theme_light() +
    ggplot2::theme(
      legend.position = "top"
      , legend.title = ggplot2::element_blank()
      , legend.key = ggplot2::element_blank()
      , strip.text = ggplot2::element_text(face = "bold", color = "black", margin = ggplot2::margin(t = 4, b = 4))
      , strip.background = ggplot2::element_rect(fill = "gray88", color = "gray88")
      # , panel.background = ggplot2::element_rect(fill = NA, color = "black")
      # , panel.spacing = ggplot2::unit(1,"lines")
      , panel.grid.major = ggplot2::element_blank()
      , panel.grid.minor = ggplot2::element_blank()
      , axis.text = ggplot2::element_blank()
      , axis.title = ggplot2::element_blank()
      , axis.ticks = ggplot2::element_blank()
    ) +
    ggplot2::guides(
      color = ggplot2::guide_legend(override.aes = list(shape = 15, linetype = 0, size = 6, alpha = 1, fill = NA))
      , fill = "none"
    )

and table that

dta_temp <- validation_trees %>%
  sf::st_drop_geometry() %>% 
  dplyr::mutate(dataset_factor="Reference Trees") %>% 
  dplyr::select(
    dataset_factor
    , species_symbol
  ) %>% 
  dplyr::count(dplyr::pick(dplyr::everything())) %>% 
  dplyr::group_by(dataset_factor) %>% 
  dplyr::mutate(pct=n/sum(n,na.rm = T)) %>% 
  dplyr::ungroup() 
# plt
dta_temp %>% 
  dplyr::mutate(
    # # one column for diff/pct
    n = paste0(scales::comma(n,accuracy=1), "<br />(", scales::percent(pct,accuracy=0.1), ")")
  ) %>% 
  dplyr::select(-c(dataset_factor,pct)) %>% 
  tidyr::pivot_wider(values_from = n, names_from = species_symbol) %>% 
  # dplyr::glimpse()
  kableExtra::kbl(
    caption = "Reference Species"
    , escape = F
    # , digits = 2
  ) %>% 
  kableExtra::kable_styling()
Reference Species
JUSC2 PIPO PSME PIPU PIEN
1
(0.0%)
5,169
(99.1%)
44
(0.8%)
2
(0.0%)
1
(0.0%)

pretty simple species mix…

plot the proportional distribution

dta_temp %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    lab = ifelse(
      pct>0.0005
      , paste0(
        scales::percent(pct,accuracy=1) 
        ,"\n("
        , scales::comma(n,accuracy=1) 
        , ")"
      )
      , ""
    )
  ) %>% 
  dplyr::ungroup() %>% 
  # dplyr::glimpse()
  ggplot2::ggplot(
    mapping = ggplot2::aes(y = pct, x = species_symbol)
  ) +
  ggplot2::geom_col(
    mapping = ggplot2::aes(fill = species_symbol)
    , width = 0.6
    , color = NA, alpha = 0.8
  ) +
  ggplot2::geom_text(
    mapping = ggplot2::aes(label = lab)
    , color = "black", size = 2.5, vjust = -0.2
  ) +
  ggplot2::scale_fill_manual(values = pal_species) +
  # ggplot2::scale_color_manual(values = c("black","white"), guide = "none") +
  ggplot2::scale_x_discrete(drop = F) +
  ggplot2::scale_y_continuous(
    # breaks = seq(0,1,by=0.2)
    labels = scales::percent
    , expand = ggplot2::expansion(mult = c(0,0.15))
  ) +
  ggplot2::labs(
    x = "", y = "", fill = ""
    , subtitle = "Reference Species"
  ) +
  ggplot2::theme_light()+
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7, angle = 45, hjust = 1, vjust = 1)
    , axis.text.y = ggplot2::element_blank()
    , axis.ticks.y = ggplot2::element_blank()
  )

Predicted Species Accuracy

accuracy metrics for a species classification model are calculated using a confusion matrix comparing the predicted and reference data

for this demonstration, we’ll limit our evaluation to a single flight since all flights used the same species prediction model and we are evaluating the method moreso than the flight settings when it comes to the species prediction. since we have data across the different flight settings we’ll just use the flight that resulted in the highest F-score noting that the flights were very similar in their detection accuracy

species_comp_temp <- 
  forcats::lvls_union(list(
    validation_trees$species_symbol
    , predicted_trees %>% purrr::list_rbind() %>% dplyr::pull(species_symbol)
  )) %>% 
  dplyr::as_tibble() %>% 
  dplyr::rename(species_symbol=value) %>% 
  # count reference trees
  dplyr::left_join(
    validation_trees %>% 
      sf::st_drop_geometry() %>% 
      dplyr::count(species_symbol) %>% 
      dplyr::mutate(pct=n/sum(n)) %>% 
      dplyr::rename(
        ref_n=n
        , ref_pct=pct
      )
    , by = "species_symbol"
  ) %>% 
  # count pred trees
  dplyr::left_join(
    predicted_trees %>% 
      purrr::pluck(
        agg_ground_truth_match_ans %>% 
          dplyr::slice_max(n=1, order_by = f_score, with_ties = F) %>% 
          dplyr::pull(folder)    
      ) %>% 
      sf::st_drop_geometry() %>% 
      dplyr::count(species_symbol) %>% 
      dplyr::mutate(pct=n/sum(n)) %>% 
      dplyr::rename(
        pred_n=n
        , pred_pct=pct
      )
    , by = "species_symbol"
  ) %>% 
  dplyr::mutate(
    dplyr::across(
      dplyr::where(is.numeric)
      , ~ dplyr::coalesce(.x,0)
    )
    # calc intersecting shared area per species (Schoener's Overlap index component)
    , overlap_pct = pmin(pred_pct, ref_pct)
    , sq_pct_error = (pred_pct - ref_pct)^2
  )
# species_comp_temp  

let’s start by plotting the predicted versus reference counts by species

species_comp_temp %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x=ref_n,y=pred_n,color=species_symbol,label=species_symbol,fontface="bold")
  ) +
  ggplot2::geom_abline() +
  ggplot2::geom_point(size = 3, alpha = 0.88) +
  ggrepel::geom_text_repel(
    size = 4
    , segment.color = NA
    , box.padding = 0.1
    , point.padding = 0
    , force = 0.5
    # , max.overlaps = Inf
  ) +
  ggplot2::scale_color_manual(values = pal_species) +
  ggplot2::scale_x_continuous(
    limits = c(0,max(species_comp_temp$pred_n,species_comp_temp$ref_n))
    , labels = scales::comma
  ) +
  ggplot2::scale_y_continuous(
    limits = c(0,max(species_comp_temp$pred_n,species_comp_temp$ref_n))
    , labels = scales::comma
  ) +
  ggplot2::labs(
    x = "# field trees"
    , y = "# predicted trees"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
      legend.position = "top"
      , legend.title = ggplot2::element_blank()
      , legend.key = ggplot2::element_blank()
    ) +
    ggplot2::guides(
      color = ggplot2::guide_legend(override.aes = list(shape = 15, linetype = 0, size = 6, alpha = 1, fill = NA))
      , fill = "none"
    )  

table it

species_comp_temp %>% 
  dplyr::mutate(
    # # one column for diff/pct
    pred_pct = paste0(scales::comma(pred_n,accuracy=1), "<br />(", scales::percent(pred_pct,accuracy=0.1), ")")
    , ref_pct = paste0(scales::comma(ref_n,accuracy=1), "<br />(", scales::percent(ref_pct,accuracy=0.1), ")")
  ) %>% 
  dplyr::select(species_symbol, pred_pct, ref_pct) %>% 
  tidyr::pivot_longer(
    cols = c(pred_pct, ref_pct)
    , names_to = "dta"
    , values_to = "prop"
  ) %>% 
  dplyr::mutate(
    dta = ifelse(dta == "pred_pct", "predicted", "field")
  ) %>% 
  tidyr::pivot_wider(
    names_from = species_symbol
    , values_from = prop
  ) %>% 
  dplyr::rename(`_`=dta) %>% 
  kableExtra::kbl(
    caption = paste0(
      "Species Proportional Comparision"
      , "<br />Total Overlap Index: "
      , sum(species_comp_temp$overlap_pct) %>% scales::percent(accuracy = 0.1 )
      , "<br />Proportion RMSE: "
      , sqrt(mean(species_comp_temp$sq_pct_error)) %>% scales::percent(accuracy = 0.1 )
    )
    , escape = F
  ) %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::add_header_above(
    c(" " = 1, "Species Tree Counts and Proportions" = nrow(species_comp_temp))
  )
Species Proportional Comparision
Total Overlap Index: 74.0%
Proportion RMSE: 10.2%
Species Tree Counts and Proportions
_ ABCO JUSC2 PIFL2 PIPO POTR5 PSME QUGA PIPU PIEN
predicted 10
(0.2%)
0
(0.0%)
5
(0.1%)
4,140
(73.2%)
750
(13.3%)
569
(10.1%)
185
(3.3%)
0
(0.0%)
0
(0.0%)
field 0
(0.0%)
1
(0.0%)
0
(0.0%)
5,169
(99.1%)
0
(0.0%)
44
(0.8%)
0
(0.0%)
2
(0.0%)
1
(0.0%)

we can plot the predicted vs reference proportional distribution by species

species_comp_temp %>% 
  dplyr::select(species_symbol, pred_pct, ref_pct) %>% 
  tidyr::pivot_longer(
    cols = c(pred_pct, ref_pct)
  ) %>% 
  dplyr::mutate(name = ifelse(name == "pred_pct", "predicted", "field")) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = species_symbol, y = value, fill = name, group = name)
  ) +
  ggplot2::geom_area(position = "identity", alpha = 0.4) +
  ggplot2::geom_line(
    mapping = ggplot2::aes(color = name)
    , linewidth = 1
  ) +
  ggplot2::scale_y_continuous(labels = scales::percent, breaks = scales::breaks_extended(n=9)) +
  harrypotter::scale_color_hp_d(option = "hermionegranger") +
  harrypotter::scale_fill_hp_d(option = "hermionegranger") +
  ggplot2::labs(
    subtitle = "Relative Species Abundance"
    , x = ""
    , y = "relative proportion of total"
    , fill = "", color = ""
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(legend.position = "top")

and the proportion of total

species_comp_temp %>% 
  dplyr::select(species_symbol, pred_pct, ref_pct) %>% 
  tidyr::pivot_longer(
    cols = c(pred_pct, ref_pct)
  ) %>% 
  dplyr::mutate(
    lab = ifelse(
      value>0.05
      , paste0(
        species_symbol
        ,"\n("
        , scales::percent(value,accuracy=1)
        , ")"
      )
      , ""
    )
  ) %>% 
  dplyr::mutate(
    name = ifelse(name == "pred_pct", "predicted", "field")
    , species_symbol = forcats::fct_reorder(species_symbol, value, .fun = sum)
  ) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = name, y = value, fill = species_symbol)
  ) +
  ggplot2::geom_col(width = 0.5, color = NA, alpha = 0.9) +
  ggplot2::geom_text(
    mapping = ggplot2::aes(label = lab, group = species_symbol, fontface = "bold")
    # , color = "black"
    , size = 2.5
    , position = ggplot2::position_stack(vjust = 0.5)
  ) +
  ggplot2::scale_y_continuous(labels = scales::percent, breaks = scales::breaks_extended(n=9)) +
  ggplot2::scale_fill_manual(values = pal_species) +
  ggplot2::labs(
    subtitle = "Species Composition Comparison"
    , x = ""
    , y = ""
    , fill = ""
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(legend.position = "top")