Section 6 Data Fusion

We’ll now test and approach that uses both aerial point cloud data (for structural information) and RGB imagery (for spectral information) which is a data fusion approach. First, we’ll identify initial candidate slash piles based on their structural form using our raster-based watershed segmentation approach. Then, we’ll use the RGB imagery to filter these candidates spectrally: relevant RGB vegetation indices such as Green Red Vegetation Index (GRVI), Red Green Ratio Index (RGRI), Visible Band-Difference Vegetation Index (VDVI), Red Green Ratio Index (RGRI), Red Green Blue Vegetation Index (RGBVI), and Excess Green (ExG) are calculated for each candidate segment, and thresholds are applied to remove those exhibiting high greenness. Index thresholds tested include those found to perform well in distinguising green vegetation in previous research (Motohka et al. 2010; Wang et al. 2025; Riehle et al. 2020). Filtering candidate segments using spectral data will enhance our method’s ability to distinguish non-photosynthetic slash piles from living vegetation (e.g., small trees or shrubs) that might share similar structural profiles.

While RGB vegetation indices are good for separating green biomass, differentiating between various shades of brown, black, and white of non-vegetated surfaces like slash piles, rocks, and bare soil can be more challenging. Many common rock-forming minerals are “spectrally featureless” in the visible range, often appearing pale grey to white (Harris et al. 2010). As such, we will attempt to filter out very dark (black) or very light (white) boulders from the reddish-brown tones common of wood in slash piles using RGB indices that focus on overall brightness and color purity (Kior et al. 2024). The Brightness Index (BI) can help us exclude extremely dark (black; low BI) or bright (white; high BI) objects. Setting a minimum threshold for saturation (i.e. color purity) can remove achromatic features (like grey, black, or white objects that have very low saturation values close to 0) to help us isolate more chromatic (brown) slash piles.

This data fusion methodology leverages the complementary strengths of both data types, using 3D geometry for initial object segmentation and 2D spectral data to refine detections by excluding green biomass and potential rock and soil features that appear very dark or very light, leading to more accurate slash pile identification.

6.1 RGB Indices

For these formulas, \(R\), \(G\), and \(B\) represent the raw pixel values for the Red, Green, and Blue bands, respectively. For indices that use normalized values, \(r\), \(g\), and \(b\) are defined as:

\(r = \frac{R}{R+G+B}\) \(g = \frac{G}{R+G+B}\) \(b = \frac{B}{R+G+B}\)

Green Red Vegetation Index (GRVI)

\[GRVI = \frac{G - R}{G + R}\]

Red Green Ratio Index (RGRI)

\[RGRI = \frac{R}{G}\]

Visible Band-Difference Vegetation Index (VDVI)

\[VDVI = \frac{2G - R - B}{2G + R + B}\]

Red Green Blue Vegetation Index (RGBVI)

\[RGBVI = \frac{G^2 - (B \cdot R)}{G^2 + (B \cdot R)}\]

Excess Green (ExG)

\[ExG = 2g - r - b = \frac{2G - R - B}{R+G+B}\]

Excessive Red (ExR) \[ExR = 1.4r - g = \frac{1.4R - G}{R+G+B}\]

Excess Green-Excess Red (ExGR)

\[ExGR = 3g - 2.4r - b = \frac{3G - 2.4R - B}{R+G+B}\]

Brightness Index (BI)

\[BI = \sqrt{\frac{R^2+G^2+B^2}{3}}\]

Saturation

\[Saturation = \frac{\max(R,G,B) - \min(R,G,B)}{\max(R,G,B)}\]

6.1.1 Spectral Index Functions

Although there are dedicated R packages to calculate various spectral indicies (e.g. RStoolbox [Muller et al. 2025]), we’ll make our own to ensure data quality checks and to limit the spectral indices to those highlighted above.

# check raster bands and extract rgb layers
check_raster_bands <- function(rast, red_band_idx, green_band_idx, blue_band_idx) {
  # convert to SpatRaster if input is from 'raster' package
  if(
    inherits(rast, "RasterStack") 
    || inherits(rast, "RasterBrick")
  ){
    rast <- terra::rast(rast)
  }else if(!inherits(rast, "SpatRaster")){
    stop("Input 'rast' must be a SpatRaster from the `terra` package")
  }

  # check if band indices are valid
  num_bands <- terra::nlyr(rast)
  # let 999999 be a cheat code
  cheat_code <- 999999
  
  if(
    ( red_band_idx!=cheat_code && red_band_idx > num_bands )
    || ( green_band_idx!=cheat_code && green_band_idx > num_bands )
    || ( blue_band_idx!=cheat_code &&  blue_band_idx > num_bands )
    || ( red_band_idx!=cheat_code && red_band_idx < 1 )
    || ( green_band_idx!=cheat_code && green_band_idx < 1 )
    || ( blue_band_idx!=cheat_code &&  blue_band_idx < 1 )
    || length(unique(c(red_band_idx,green_band_idx,blue_band_idx)))!=3
  ){
    stop("Invalid band index provided. Band indices must correspond to existing, unique layers in the raster object.")
  }

  # extract bands
  if(red_band_idx!=cheat_code){ R <- rast[[red_band_idx]] }else{ R <- rast[[1]] }
  if(green_band_idx!=cheat_code){ G <- rast[[green_band_idx]] }else{ G <- rast[[1]] }
  if(blue_band_idx!=cheat_code){ B <- rast[[blue_band_idx]] }else{ B <- rast[[1]] }

  return(list(R = R, G = G, B = B))
}
# check_raster_bands(ortho_rast, red_band_idx=1, green_band_idx=3, blue_band_idx = 999999)

# calculate green red vegetation index (GRVI)
# (G - R) / (G + R)
spectral_index_grvi <- function(rast, red_band_idx, green_band_idx) {
  bands <- check_raster_bands(rast, red_band_idx, green_band_idx, blue_band_idx = 999999) # blue_band_idx is dummy here
  R <- bands$R
  G <- bands$G

  grvi <- (G - R) / (G + R)
  names(grvi) <- "grvi"
  return(grvi)
}
# spectral_index_grvi(ortho_rast,red_band_idx = 1, green_band_idx = 2) %>% terra::plot()

# red green ratio index (RGRI)
# R/G
spectral_index_rgri <- function(rast, red_band_idx, green_band_idx) {
  bands <- check_raster_bands(rast, red_band_idx, green_band_idx, blue_band_idx = 999999) # blue_band_idx is dummy here
  R <- bands$R
  G <- bands$G

  rgri <- R/G
  names(rgri) <- "rgri"
  return(rgri)
}
# spectral_index_grvi(ortho_rast,red_band_idx = 1, green_band_idx = 2) %>% terra::plot()

# calculate visible band-difference vegetation index (VDVI)
# (2G - R - B) / (2G + R + B)
spectral_index_vdvi <- function(rast, red_band_idx, green_band_idx, blue_band_idx) {
  bands <- check_raster_bands(rast, red_band_idx, green_band_idx, blue_band_idx)
  R <- bands$R
  G <- bands$G
  B <- bands$B

  vdvi <- (2 * G - R - B) / (2 * G + R + B)
  names(vdvi) <- "vdvi"
  return(vdvi)
}
# spectral_index_vdvi(ortho_rast,red_band_idx = 1, green_band_idx = 2, blue_band_idx = 3) %>% terra::plot()

# calculate red green blue vegetation index (RGBVI)
# (G^2 - (B * R)) / (G^2 + (B * R))
spectral_index_rgbvi <- function(rast, red_band_idx, green_band_idx, blue_band_idx) {
  bands <- check_raster_bands(rast, red_band_idx, green_band_idx, blue_band_idx)
  R <- bands$R
  G <- bands$G
  B <- bands$B

  rgbvi <- (G^2 - (B * R)) / (G^2 + (B * R))
  names(rgbvi) <- "rgbvi"
  return(rgbvi)
}
# spectral_index_rgbvi(ortho_rast,red_band_idx = 1, green_band_idx = 2, blue_band_idx = 3) %>% terra::plot()

# calculate excess green (ExG)
# (2G - R - B) / (R + G + B) (using normalized RGB values)
# 2G - R - B (using raw values, then normalized by sum of R+G+B)
spectral_index_exg <- function(rast, red_band_idx, green_band_idx, blue_band_idx) {
  bands <- check_raster_bands(rast, red_band_idx, green_band_idx, blue_band_idx)
  R <- bands$R
  G <- bands$G
  B <- bands$B

  # Calculate normalized RGB values
  sum_rgb <- R + G + B
  r_norm <- R / sum_rgb
  g_norm <- G / sum_rgb
  b_norm <- B / sum_rgb

  exg <- (2 * g_norm - r_norm - b_norm)
  names(exg) <- "exg"
  return(exg)
}
# spectral_index_exg(ortho_rast,red_band_idx = 1, green_band_idx = 2, blue_band_idx = 3) %>% terra::plot()

# calculate brightness index (BI)
# sqrt((R^2 + G^2 + B^2) / 3)
spectral_index_bi <- function(rast, red_band_idx, green_band_idx, blue_band_idx) {
  bands <- check_raster_bands(rast, red_band_idx, green_band_idx, blue_band_idx)
  R <- bands$R
  G <- bands$G
  B <- bands$B

  bi <- sqrt((R^2 + G^2 + B^2) / 3)
  # normalize to 0-1 range
  max_brightness = max(
    terra::minmax(R)[2]
    , terra::minmax(G)[2]
    , terra::minmax(B)[2]
    , na.rm = T
  )
  
  bi <- bi/max_brightness
  names(bi) <- "bi"
  return(bi)
}
# spectral_index_bi(ortho_rast,red_band_idx = 1, green_band_idx = 2, blue_band_idx = 3) %>% terra::plot()

# calculate excessive red (ExR)
# 1.4r - g, where r and g are normalized RGB values.
spectral_index_exr <- function(rast, red_band_idx, green_band_idx, blue_band_idx){
  bands <- check_raster_bands(rast, red_band_idx, green_band_idx, blue_band_idx)
  R <- bands$R
  G <- bands$G
  B <- bands$B

  # Calculate normalized RGB values
  sum_rgb <- R + G + B
  r_norm <- R / sum_rgb
  g_norm <- G / sum_rgb

  exr <- (1.4 * r_norm - g_norm)
  names(exr) <- "exr"
  return(exr)
}

#' calculate excess green-excess red (ExGR)
#' 3g - 2.4r - b, where r, g, b are normalized RGB values.
#' equivalent to ExG - ExR.
spectral_index_exgr <- function(rast, red_band_idx, green_band_idx, blue_band_idx) {
  bands <- check_raster_bands(rast, red_band_idx, green_band_idx, blue_band_idx)
  R <- bands$R
  G <- bands$G
  B <- bands$B

  # Calculate normalized RGB values
  sum_rgb <- R + G + B
  r_norm <- R / sum_rgb
  g_norm <- G / sum_rgb
  b_norm <- B / sum_rgb

  exgr <- (3 * g_norm - 2.4 * r_norm - b_norm)
  names(exgr) <- "exgr"
  return(exgr)
}

# calculate saturation (SAT)
# (max(R,G,B) - min(R,G,B)) / max(R,G,B)
spectral_index_saturation <- function(rast, red_band_idx, green_band_idx, blue_band_idx) {
  bands <- check_raster_bands(rast, red_band_idx, green_band_idx, blue_band_idx)
  R <- bands$R
  G <- bands$G
  B <- bands$B

  max_rgb <- max(R, G, B)
  min_rgb <- min(R, G, B)
  sat <- (max_rgb - min_rgb) / max_rgb
  names(sat) <- "sat"
  return(sat)
}
# spectral_index_saturation(ortho_rast,red_band_idx = 1, green_band_idx = 2, blue_band_idx = 3) %>% terra::plot()

# calculate all indices
calculate_all_rgb_indices <- function(raster_obj, red_band_idx, green_band_idx, blue_band_idx) {
  # call individual index functions
  grvi_layer <- spectral_index_grvi(raster_obj, red_band_idx, green_band_idx)
  rgri_layer <- spectral_index_rgri(raster_obj, red_band_idx, green_band_idx)
  vdvi_layer <- spectral_index_vdvi(raster_obj, red_band_idx, green_band_idx, blue_band_idx)
  rgbvi_layer <- spectral_index_rgbvi(raster_obj, red_band_idx, green_band_idx, blue_band_idx)
  exg_layer <- spectral_index_exg(raster_obj, red_band_idx, green_band_idx, blue_band_idx)
  exr_layer <- spectral_index_exr(raster_obj, red_band_idx, green_band_idx, blue_band_idx)
  exgr_layer <- spectral_index_exgr(raster_obj, red_band_idx, green_band_idx, blue_band_idx)
  bi_layer <- spectral_index_bi(raster_obj, red_band_idx, green_band_idx, blue_band_idx)
  sat_layer <- spectral_index_saturation(raster_obj, red_band_idx, green_band_idx, blue_band_idx)

  # stack all calculated indices into a single spatraster
  all_indices <- c(
    grvi_layer
    , rgri_layer
    , vdvi_layer
    , rgbvi_layer
    , exg_layer
    , exr_layer
    , exgr_layer
    , bi_layer
    , sat_layer
  )

  return(all_indices)
}

let’s test this calculate_all_rgb_indices() function with our orthomosaic

# calculate_all_rgb_indices
all_rgb_indices_rast <- calculate_all_rgb_indices(ortho_rast,red_band_idx = 1, green_band_idx = 2, blue_band_idx = 3)
# huh?
all_rgb_indices_rast %>% names()
## [1] "grvi"  "rgri"  "vdvi"  "rgbvi" "exg"   "exr"   "exgr"  "bi"    "sat"

let’s plot all of those indices for the entire extent of our orthomoasic

# plot
terra::plot(
  all_rgb_indices_rast
  , nc = 3
  # , nr = 3
  , mar = c(0.5,0.5,2,0.5)
  , axes = FALSE
  , legend = F
)

we can also look at the correlation between the different indices

# investigate correlation among covariates
all_rgb_indices_rast %>% 
  terra::pairs(
    maxcells = min(11111, terra::ncell(all_rgb_indices_rast)*.01)
  )

many of these spectral indices are highly (even perfectly) correlated, meaning they provide redundant information. to streamline our method and ensure we utilize unique spectral insights for distinguishing slash piles from the surrounding terrain and other objects, we will select only one index from each correlated group. GRVI, RGRI, and ExR are highly correlated, as are VDVI, RGBVI, and ExG. we’ll choose one from each of these sets.

6.1.2 Spectral Index of Polygons

we now need a function to crop the raster with all spectral indices given a polygon input data and return the spectral index values as columns attached to the polygon

extract_rast_values <- function(sf_data, rast, fun_agg = mean) {
  # checks
  if(!inherits(rast, "SpatRaster")){
    stop("Input `rast` must be a SpatRaster object.")
  }
  if(!inherits(sf_data, "sf")){
    stop("Input `sf_data` must be an sf data frame.")
  }
  if(!all(sf::st_geometry_type(sf_data) %in% c("POLYGON", "MULTIPOLYGON"))) {
    stop("Input `sf_data` must contain polygon geometries.")
  }
  if(!is.function(fun_agg)) {
    stop("Argument `fun_agg` must be a function (e.g., mean, median, sum).")
  }

  # crs
  sf_data <- sf_data %>% sf::st_transform(terra::crs(rast))

  # extract values for each layer within each polygon
  extracted_values <- terra::extract(
    x = rast
    , y = sf_data
    , fun = fun_agg
    , na.rm = TRUE
  )

  # clean data
  fun_name <- deparse(substitute(fun_agg))
  extracted_values <- extracted_values %>% 
    dplyr::select(-ID) %>% 
    dplyr::rename_with(
      ~ paste0(
        "rast_"
        # , fun_name ### if we want to have custom output depending on the fun_agg
        , "agg"
        , "_"
        , .x
        , recycle0 = TRUE
      )
    )

  # Merge the extracted values back to the original sf data frame
  # The row order is preserved by terra::extract, so a direct cbind is safe
  # if no rows were dropped due to spatial mismatch.
  # For robustness, we can explicitly join by row ID if needed, but for simple cases, cbind works.
  # Assuming sf_data has a unique ID column or row order is stable:
  sf_data_with_indices <- sf_data %>% dplyr::bind_cols(extracted_values)

  return(sf_data_with_indices)
}
# extract_rast_values(slash_piles_polys, rast = ortho_rast) %>% dplyr::glimpse()

6.2 Ground Truth Pile Spectral Summary

let’s calculate the various spectral indices on our ground truth slash pile polygons by getting the median value within the bounds of the pile

# extract_rast_values
rgb_indices_df <- extract_rast_values(slash_piles_polys, rast = all_rgb_indices_rast, fun_agg = median)
# quick summary
rgb_indices_df %>% 
  sf::st_drop_geometry() %>% 
  dplyr::select(tidyselect::starts_with("rast_agg_")) %>%
  summary()
##  rast_agg_grvi       rast_agg_rgri    rast_agg_vdvi      rast_agg_rgbvi     
##  Min.   :-0.044498   Min.   :0.7723   Min.   :-0.03340   Min.   :-0.057555  
##  1st Qu.:-0.018684   1st Qu.:0.9955   1st Qu.:-0.01230   1st Qu.:-0.023593  
##  Median :-0.006677   Median :1.0134   Median : 0.00838   Median : 0.019351  
##  Mean   :-0.002969   Mean   :1.0073   Mean   : 0.00156   Mean   : 0.005411  
##  3rd Qu.: 0.002257   3rd Qu.:1.0381   3rd Qu.: 0.01486   3rd Qu.: 0.031555  
##  Max.   : 0.128458   Max.   :1.0931   Max.   : 0.02562   Max.   : 0.053403  
##   rast_agg_exg        rast_agg_exr     rast_agg_exgr       rast_agg_bi     
##  Min.   :-0.044049   Min.   :0.02682   Min.   :-0.20026   Min.   :0.05338  
##  1st Qu.:-0.016327   1st Qu.:0.13408   1st Qu.:-0.15482   1st Qu.:0.32600  
##  Median : 0.011205   Median :0.14114   Median :-0.13148   Median :0.48446  
##  Mean   : 0.002176   Mean   :0.13711   Mean   :-0.13468   Mean   :0.41320  
##  3rd Qu.: 0.019909   3rd Qu.:0.15109   3rd Qu.:-0.11505   3rd Qu.:0.51972  
##  Max.   : 0.034451   Max.   :0.17660   Max.   :-0.03147   Max.   :0.59641  
##   rast_agg_sat    
##  Min.   :0.03437  
##  1st Qu.:0.06765  
##  Median :0.08387  
##  Mean   :0.09973  
##  3rd Qu.:0.11523  
##  Max.   :0.37712

let’s plot it

# pivot
agg_df_temp <- rgb_indices_df %>% 
  sf::st_drop_geometry() %>% 
  dplyr::select(tidyselect::starts_with("rast_agg_")) %>% 
  tidyr::pivot_longer(cols = dplyr::everything()) %>% 
  dplyr::mutate(name = stringr::str_remove_all(name,"rast_agg_") %>% stringr::str_to_upper())
# plot
ggplot2::ggplot() +
  ggplot2::geom_density(
    data = agg_df_temp
    , mapping = ggplot2::aes(x = value, fill = name)
    , color = NA, alpha = 0.8
  ) +
  ggplot2::geom_vline(
    data = agg_df_temp %>% dplyr::group_by(name) %>% dplyr::mutate(value=median(value,na.rm=T))
    , mapping = ggplot2::aes(xintercept = value, color = "median")
    , linetype = "solid", lwd = 1
  ) +
    ggplot2::geom_vline(
    data = agg_df_temp %>% dplyr::group_by(name) %>% dplyr::mutate(value=quantile(value,na.rm=T,probs=0.025))
    , mapping = ggplot2::aes(xintercept = value, color = "p95")
    , linetype = "solid", lwd = 1
  ) +
  ggplot2::geom_vline(
    data = agg_df_temp %>% dplyr::group_by(name) %>% dplyr::mutate(value=quantile(value,na.rm=T,probs=(1-0.025)))
    , mapping = ggplot2::aes(xintercept = value, color = "p95")
    , linetype = "solid", lwd = 1
  ) +
  ggplot2::scale_fill_viridis_d(option = "turbo", begin = 0.1, end = 0.9, alpha = 0.7) +
  # ggplot2::scale_fill_brewer(palette = "Set2") +
  ggplot2::scale_color_manual(values = c("gray22","gray","gray")) +
  ggplot2::scale_x_continuous(breaks = scales::breaks_extended(8)) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(name)
    , ncol = 3
    , scales = "free"
  ) +
  ggplot2::labs(
    x = "", y = "", color = "", fill = ""
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7)
    , axis.text.y = ggplot2::element_blank()
  ) + 
  ggplot2::guides(fill = "none")

that’s interesting. let’s see how those values compare with the thresholds identified in the research to identify green crops, trees, and shrubs using RGB spectral indicies

Index Likely Crop Likely Tree Likely Shrub Source
VDVI x > 0.06 x > 0.04 x > 0.03 Wang et al. 2025
RGRI x < 0.63 x < 0.70 x < 0.52 Wang et al. 2025
RGBVI x > -20.0 x > -22.0 x > -18.0 Wang et al. 2025
ExG x > 58.0 x > 59.5 x > 59.0 Wang et al. 2025
GRVI x > 0.0 x > 0.0 Motohka et al. 2010
ExGR x > 0.0 x > 0.0 x > 0.0 Riehle et al. 2020
  • VDVI: setting a threshold to remove candidate slash piles with VDVI>0.03 should work well
  • RGRI: setting a threshold to remove candidate slash piles with RGRI<0.70 should work well
  • RGBVI: setting a threshold to remove candidate slash piles with RGBVI>-18.0 should not work well
    • There appears to be a mis-match between the range of values we calculated ([-0.72, 1.00]) and their range ([-50, 78])
  • ExG: setting a threshold to remove candidate slash piles with ExG>59.5 should not work well
    • There appears to be a mis-match between the range of values we calculated ([-1.00, 2.00]) and their range ([0, 150])
  • GRVI: setting a threshold to remove candidate slash piles with GRVI>0.0 should work moderately well
    • We may need to raise this threshold to remove candidate slash piles with GRVI>0.05, for example
  • ExGR: setting a threshold to remove candidate slash piles with ExGR>0.0 should work well
  • BI: setting a threshold to remove candidate slash piles that are extremely dark (BI<0.10) or bright (BI>0.90) should work well
  • Saturation: setting a threshold to remove candidate slash piles that have low saturation values (SAT<0.05) should work well

Based on these thresholds identified in the literature and our objective to remove very highly correlated indices, we’ll proceed with the following five indices to refine the structurally detected candidate slash piles:

  1. RGRI
  2. VDVI
  3. ExGR
  4. BI
  5. Saturation (SAT)

6.2.1 Voting System

let’s consider a voting system approach for filtering candidate slash piles using the multiple spectral indices. a voting system could allow for a more robust and nuanced decision than relying on a single index.

let’s make a highly specialized function using the data returned by out extract_rast_values() function

rgb_indices_threshold_voting <- function(
  rgb_indices_df
  # define ranges to *keep* piles
  , th_rgri = c((0.7+0.01),Inf) # increase each by 0.01 since we'll be checking lower<=x<=upper
  , th_vdvi = c(-Inf,(0.03+0.01)) # increase each by 0.01 since we'll be checking lower<=x<=upper
  , th_exgr = c(-Inf,0)
  , th_bi = c(0.1,0.9)
  , th_sat = c(0.05,Inf)
){

  # checks
  if(!inherits(rgb_indices_df, "data.frame")){
      stop("Input `rgb_indices_df` must be an data.frame.")
  }
  # names
  agg_cols <- c("rast_agg_exgr","rast_agg_rgri","rast_agg_vdvi","rast_agg_bi","rast_agg_sat") # "rast_agg_rgbvi",
  nm_diff <- base::setdiff(
    agg_cols
    , names(rgb_indices_df)
  )
  if(length(nm_diff)>0){
    stop(paste0("required variables missing:\n", "... ", paste(nm_diff, collapse = ", ") ))
  }
  # thresholds
  if(length(th_exgr)!=2 || th_exgr[1]>th_exgr[2]){
    stop("Input `th_exgr` must be of length 2 as: c(lower,upper) defining the range of values to keep where lower<=upper")
  }
  if(length(th_rgri)!=2 || th_rgri[1]>th_rgri[2]){
    stop("Input `th_rgri` must be of length 2 as: c(lower,upper) defining the range of values to keep where lower<=upper")
  }
  if(length(th_vdvi)!=2 || th_vdvi[1]>th_vdvi[2]){
    stop("Input `th_vdvi` must be of length 2 as: c(lower,upper) defining the range of values to keep where lower<=upper")
  }
  if(length(th_bi)!=2 || th_bi[1]>th_bi[2]){
    stop("Input `th_bi` must be of length 2 as: c(lower,upper) defining the range of values to keep where lower<=upper")
  }
  if(length(th_sat)!=2 || th_sat[1]>th_sat[2]){
    stop("Input `th_sat` must be of length 2 as: c(lower,upper) defining the range of values to keep where lower<=upper")
  }
  
  # get rid of columns we'll create
  rgb_indices_df <- rgb_indices_df %>%
    # throw in hey_xxxxxxxxxx to test it works if we include non-existant columns
    dplyr::select( -dplyr::any_of(c(
      "hey_xxxxxxxxxx"
      , "inrange_th_exgr"
      , "inrange_th_rgri"
      , "inrange_th_vdvi"
      , "inrange_th_bi"
      , "inrange_th_sat"
    )))
  
  # check threshold
  ret_df <- rgb_indices_df %>% 
    # dplyr::select(dplyr::all_of(agg_cols)) %>% 
    dplyr::mutate(
      inrange_th_exgr = ifelse(
          !is.na(rast_agg_exgr)
          & rast_agg_exgr >= th_exgr[1]
          & rast_agg_exgr <= th_exgr[2]
          , 1, 0
        )
      , inrange_th_rgri = ifelse(
          !is.na(rast_agg_rgri)
          & rast_agg_rgri >= th_rgri[1]
          & rast_agg_rgri <= th_rgri[2]
          , 1, 0
        )
      , inrange_th_vdvi = ifelse(
          !is.na(rast_agg_vdvi)
          & rast_agg_vdvi >= th_vdvi[1]
          & rast_agg_vdvi <= th_vdvi[2]
          , 1, 0
        )
      , inrange_th_bi = ifelse(
          !is.na(rast_agg_bi)
          & rast_agg_bi >= th_bi[1]
          & rast_agg_bi <= th_bi[2]
          , 1, 0
        )
      , inrange_th_sat = ifelse(
          !is.na(rast_agg_sat)
          & rast_agg_sat >= th_sat[1]
          & rast_agg_sat <= th_sat[2]
          , 1, 0
        )
    ) %>% 
    dplyr::rowwise() %>%
    dplyr::mutate(
      inrange_th_votes = sum(
        dplyr::c_across(tidyselect::starts_with("inrange_th_"))
        , na.rm = T
      ) %>% 
      dplyr::coalesce(0)
    ) %>%
    ungroup()
  
  #return
  return(ret_df)
}

let’s look at the columns we get from our rgb_indices_threshold_voting() function

rgb_indices_df <- rgb_indices_threshold_voting(rgb_indices_df=rgb_indices_df)
# huh?
rgb_indices_df %>% 
  sf::st_drop_geometry() %>% 
  dplyr::select(tidyselect::starts_with("inrange_th_")) %>% 
  summary()
##  inrange_th_exgr inrange_th_rgri inrange_th_vdvi inrange_th_bi  
##  Min.   :1       Min.   :1       Min.   :1       Min.   :0.000  
##  1st Qu.:1       1st Qu.:1       1st Qu.:1       1st Qu.:1.000  
##  Median :1       Median :1       Median :1       Median :1.000  
##  Mean   :1       Mean   :1       Mean   :1       Mean   :0.929  
##  3rd Qu.:1       3rd Qu.:1       3rd Qu.:1       3rd Qu.:1.000  
##  Max.   :1       Max.   :1       Max.   :1       Max.   :1.000  
##  inrange_th_sat   inrange_th_votes
##  Min.   :0.0000   Min.   :4.000   
##  1st Qu.:1.0000   1st Qu.:5.000   
##  Median :1.0000   Median :5.000   
##  Mean   :0.9126   Mean   :4.842   
##  3rd Qu.:1.0000   3rd Qu.:5.000   
##  Max.   :1.0000   Max.   :5.000

notice the “Mean” value in the summary above is the proportion of ground truth piles that successfully met the spectral index threshold criteria (i.e. piles to be “kept”). it looks like the SAT threshold was most unaligned with these ground truth piles, something we anticipated by looking at the distributions above compared the threshold value recommended in the literature for detecting green vegetation.

let’s look at the proportional distribution of ground truth piles meeting the threshold by individual spectral index

rgb_indices_df %>% 
  sf::st_drop_geometry() %>% 
  dplyr::select(tidyselect::starts_with("inrange_th_")) %>% 
  tidyr::pivot_longer(cols = dplyr::everything()) %>% 
  dplyr::mutate(name = stringr::str_remove_all(name,"inrange_th_") %>% stringr::str_to_upper()) %>% 
  dplyr::filter(name!="VOTES") %>% 
  dplyr::count(name,value) %>% 
  dplyr::group_by(name) %>% 
  dplyr::mutate(
    pct = n/sum(n)
    , value = factor(value, levels = 0:1, labels = c("outside threshold","within threshold"), ordered = T)
    , lab = paste0(scales::percent(pct,accuracy=0.1), "\n(n=", scales::comma(n,accuracy=1), ")")
  ) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = value, y = pct, label = lab, fill = value)
  ) +
  ggplot2::geom_col(
    width = 0.6
    , color = NA, alpha = 0.8
  ) +
  ggplot2::geom_text(color = "black", size = 3, vjust = -0.2) +
  ggplot2::scale_fill_viridis_d(option = "magma", begin = 0.3, end = 0.7) +
  ggplot2::scale_y_continuous(
    breaks = seq(0,1,by=0.2)
    , labels = scales::percent
    , expand = ggplot2::expansion(mult = c(0,0.2))
  ) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(name)
    , ncol = 2
  ) +
  ggplot2::labs(
    x = "", y = "", color = "", fill = ""
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 12)
    , axis.text.y = ggplot2::element_blank()
  )

and let’s look at the distribution of ground truth piles based on the number of individual spectral index thresholds met. we’ll use this count as our voting system.

dplyr::tibble(value=0:5) %>% 
  dplyr::left_join(
    rgb_indices_df %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(tidyselect::starts_with("inrange_th_votes")) %>% 
      tidyr::pivot_longer(cols = dplyr::everything()) %>% 
      dplyr::mutate(name = stringr::str_remove_all(name,"inrange_th_") %>% stringr::str_to_upper()) %>% 
      dplyr::count(name,value)
    , by = dplyr::join_by(value)
  ) %>% 
  dplyr::mutate(
    n = dplyr::coalesce(n,0)
    , pct = n/sum(n)
    , lab = paste0(scales::percent(pct,accuracy=0.1), "\n(n=", scales::comma(n,accuracy=1), ")")
    , value = factor(value)
  ) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = value, y = pct, label = lab, fill = value)
  ) +
  ggplot2::geom_col(
    width = 0.6
    , color = NA, alpha = 0.8
  ) +
  ggplot2::geom_text(color = "black", size = 4, vjust = -0.2) +
  ggplot2::scale_fill_viridis_d(option = "mako", direction=-1) +
  ggplot2::scale_y_continuous(
    labels = scales::percent
    , expand = ggplot2::expansion(mult = c(0,0.15))
  ) +
  ggplot2::labs(
    y = "", x = "spectral index threshold votes", color = "", fill = ""
    , subtitle = "distribution of ground truth piles meeting spectral index thresholds"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , axis.text.x = ggplot2::element_text(size = 12)
    , axis.text.y = ggplot2::element_blank()
  )

we can use this spectral index voting system to filter candidate slash piles with a user-defined parameter which defines the sensitivity of filtering based on the spectral information. For example, a value of “5” would heavily weight the spectral information in determining which piles to keep while a value of “1” would put less weight on the spectral data.

let’s see what some of those piles with the fewest votes look like on the imagery

# filter for the plots with the fewest votes
dta_temp <- rgb_indices_df %>% 
  dplyr::filter(inrange_th_votes<=4) %>% 
  dplyr::slice_sample(n=12) %>% 
  sf::st_transform(terra::crs(ortho_rast))
# dta_temp %>% sf::st_drop_geometry() %>% dplyr::select(tidyselect::starts_with("rast_agg")) %>% summary()
# plot on ortho
plts_temp <-
  1:nrow(dta_temp) %>%
  purrr::map(function(x){
    dta <- dta_temp %>% dplyr::slice(x)
    dta <- dta %>% 
      dplyr::bind_cols(
        dta %>% 
          sf::st_drop_geometry() %>% 
          dplyr::select(tidyselect::starts_with("inrange_th_")) %>% 
          tidyr::pivot_longer(cols = dplyr::everything()) %>% 
          dplyr::mutate(name = stringr::str_remove_all(name,"inrange_th_") %>% stringr::str_to_upper()) %>% 
          dplyr::filter(name!="VOTES" & value == 0) %>% 
          dplyr::select(-value) %>% 
          tidyr::pivot_wider(values_from = name) %>% 
          dplyr::rowwise() %>% 
          dplyr::mutate(
            lab = paste(
                dplyr::c_across(dplyr::everything())
                , collapse = ", "
              )
          ) %>% 
          dplyr::select(lab)
      )
    ortho_plt_fn(dta) +
      ggplot2::geom_sf(data = dta, fill = NA, color = "white", lwd = 0.6) +
      ggplot2::labs(subtitle = paste0("Outside Threshold:\n",dta$lab)) + 
      ggplot2::theme(plot.subtitle = ggplot2::element_text(size = 9))  
  })
# combine
patchwork::wrap_plots(
  plts_temp
  , ncol = 4
)

all of those piles would have been voted out based on SAT or BI since they are either in shadows or the dead wood appears very bright/white (perhaps we need to adjust those thresholds?). let’s see a summary of those spectral index values for these piles with the fewest votes

dta_temp %>% sf::st_drop_geometry() %>% dplyr::select(rast_agg_bi, rast_agg_sat) %>% summary()
##   rast_agg_bi       rast_agg_sat    
##  Min.   :0.05338   Min.   :0.03437  
##  1st Qu.:0.36397   1st Qu.:0.04338  
##  Median :0.50831   Median :0.04702  
##  Mean   :0.40218   Mean   :0.10587  
##  3rd Qu.:0.52536   3rd Qu.:0.09826  
##  Max.   :0.57442   Max.   :0.37712

6.3 Candidate Polygon Spectral Filtering Function

let’s put all of this together to define a function that takes as input: 1) a spatial data frame of candidate polygons; 2) a raster with RGB spectral data; 3) user-defined spectral weighting (voting system)

polygon_spectral_filtering <- function(
  sf_data
  , rgb_rast
  # define the band index
  , red_band_idx
  , green_band_idx
  , blue_band_idx
  # spectral weighting
  , spectral_weight = 3
) {
  ### could make these parameters
  th_rgri <- c((0.7+0.01),Inf) # increase each by 0.01 since we'll be checking lower<=x<=upper
  th_vdvi <- c(-Inf,(0.03+0.01)) # increase each by 0.01 since we'll be checking lower<=x<=upper
  th_exgr = c(-Inf,0)
  th_bi <- c(0.1,0.9)
  th_sat <- c(0.05,Inf)
  
  # checks
  if(!inherits(rgb_rast, "SpatRaster")){
    stop("Input `rast` must be a SpatRaster object.")
  }
  if(!inherits(sf_data, "sf")){
    stop("Input `sf_data` must be an sf data frame.")
  }
  if(!all(sf::st_geometry_type(sf_data) %in% c("POLYGON", "MULTIPOLYGON"))) {
    stop("Input `sf_data` must contain polygon geometries.")
  }
  spectral_weight <- as.numeric(spectral_weight)
  if( !(spectral_weight %in% c(0:5)) ){
    stop("Input `spectral_weight` must be a number between 0 (no filtering based on spectral) and 5 (highest weighting of spectral data)")
  }
  # if you don't want to do it, then why do it?
  if(spectral_weight==0){
    return(sf_data)
  }
  ##################################################
  # calculate_all_rgb_indices
  ##################################################
  all_rgb_indices_rast <- calculate_all_rgb_indices(
    raster_obj = rgb_rast
    , red_band_idx = red_band_idx
    , green_band_idx = green_band_idx
    , blue_band_idx = blue_band_idx
  )
  ##################################################
  # extract_rast_values
  ##################################################
  rgb_indices_df <- extract_rast_values(
    sf_data = sf_data %>% dplyr::ungroup()
    , rast = all_rgb_indices_rast
    , fun_agg = median
  )
  ##################################################
  # rgb_indices_threshold_voting
  ##################################################
  rgb_indices_df <- rgb_indices_threshold_voting(
    rgb_indices_df=rgb_indices_df
    , th_rgri = th_rgri
    , th_vdvi = th_vdvi
    , th_exgr = th_exgr
    , th_bi = th_bi
    , th_sat = th_sat
  )
  ##################################################
  # filtering
  ##################################################
  rgb_indices_df <- rgb_indices_df %>% dplyr::filter(inrange_th_votes>=spectral_weight)
  # return
  return(rgb_indices_df)
}
# polygon_spectral_filtering(
#   sf_data = slash_piles_polys
#   , rgb_rast = ortho_rast
#   , red_band_idx = 1
#   , green_band_idx = 2
#   , blue_band_idx = 3
#   , spectral_weight = 4
# ) %>%
# nrow()
# # dplyr::glimpse()
# nrow(slash_piles_polys)

6.4 Sensitivity Testing - Structure+Spectral

We already tested the sensitivity of the parameter settings for our structural, raster-based slash pile detection methodology (see here). Now, we’ll use the set of predicted piles based on the combinations of the different parameter settings in the raster-based method to test the sensitivity of our spectral based method. In short, we’re going to use the RGB imagery to filter these candidates spectrally using different weighting (voting system) settings for the spectral data. We’ll test the spectral_weight parameter from the lowest weighting of the spectral data of “1” (only one spectral index threshold must be met) to the highest weighting of spectral data “5” (all spectral index thresholds must be met).

For evaluating our structural and spectral data fusion approach, the RGB imagery will simply serve to spectrally filter candidate slash piles initially identified by our structural, raster-based watershed segmentation approach. This means we don’t need to re-evaluate these initial candidates against ground truth to calculate performance metrics (recall, precision, F-score). Instead, we will determine which candidates to retain or remove from the structural list based on their spectral properties. If a structurally detected true positive is filtered out by spectral data, it will be reclassified as an omission. Additionally, any commission (false positive) from the structural detection that is filtered out spectrally will be removed from consideration since the data fusion approach successfully eliminated these incorrect predictions. No other shifts within the confusion matrix are possible, since incorporating the spectral data under this approach does not introduce any new candidate piles for evaluation against the ground truth data.

if( length(ls()[grep("param_combos_piles",ls())])!=1 ){
  param_combos_piles <- sf::st_read("../data/param_combos_piles.gpkg", quiet = T)
  # let's attach a flag to only work with piles that intersect with the stand boundary
  # add in/out to piles data
  param_combos_piles <- param_combos_piles %>% 
    dplyr::left_join(
      param_combos_piles %>% 
        sf::st_intersection(
          stand_boundary %>% 
            sf::st_transform(sf::st_crs(param_combos_piles))
        ) %>% 
        sf::st_drop_geometry() %>% 
        dplyr::select(rn,pred_id) %>% 
        dplyr::mutate(is_in_stand = T)
      , by = dplyr::join_by(rn,pred_id)
    ) %>% 
    dplyr::mutate(
      is_in_stand = dplyr::coalesce(is_in_stand,F)
    )
}
if( length(ls()[grep("param_combos_gt",ls())])!=1 ){
  param_combos_gt <- readr::read_csv("../data/param_combos_gt.csv", progress = F, show_col_types = F)
}
# param_combos_piles %>% dplyr::glimpse()
# param_combos_gt %>% dplyr::glimpse()

# map over our polygon_spectral_filtering and save the result
f_temp <- "../data/param_combos_spectral_gt.csv"
if(!file.exists(f_temp)){
  param_combos_spectral_gt <- 
    c(1:5) %>%
    purrr::map(function(sw){
      # polygon_spectral_filtering
      param_combos_piles_filtered <- polygon_spectral_filtering(
          sf_data = param_combos_piles
          , rgb_rast = ortho_rast
          , red_band_idx = 1
          , green_band_idx = 2
          , blue_band_idx = 3
          , spectral_weight = sw
        ) %>%
        dplyr::mutate(spectral_weight=sw)
      
      # dplyr::glimpse(param_combos_piles_filtered)
      # param_combos_piles_filtered %>% sf::st_drop_geometry() %>% dplyr::count(rn) %>%  
      #   dplyr::inner_join(
      #     param_combos_piles %>% sf::st_drop_geometry() %>% dplyr::count(rn) %>% dplyr::rename(orig_n=n)
      #   ) %>% 
      #   dplyr::arrange(desc(orig_n-n))
      
      # now we need to reclassify the combinations against the ground truth data
      gt_reclassify <-
        param_combos_gt %>% 
          # add info from predictions
          dplyr::left_join(
            param_combos_piles_filtered %>%
              sf::st_drop_geometry() %>%
              dplyr::select(
                rn,pred_id,spectral_weight
              )
            , by = dplyr::join_by(rn,pred_id)
          ) %>%
          # dplyr::count(match_grp)
          # reclassify
          dplyr::mutate(
            # reclassify match_grp
            match_grp = dplyr::case_when(
              match_grp == "true positive" & is.na(spectral_weight) ~ "omission" # is.na(spectral_weight) => removed by spectral filtering
              , match_grp == "commission" & is.na(spectral_weight) ~ "remove" # is.na(spectral_weight) => removed by spectral filtering
              , T ~ match_grp
            )
            # update pred_id
            , pred_id = dplyr::case_when(
              match_grp == "omission" ~ NA
              , T ~ pred_id
            )
            # update spectral_weight (just adds it to this iteration's omissions)
            , spectral_weight = sw
          ) %>% 
          # remove old commissions
          # dplyr::count(match_grp)
          dplyr::filter(match_grp!="remove")
      # return
      return(gt_reclassify)
    }) %>% 
    dplyr::bind_rows()
  
    # save
    param_combos_spectral_gt %>% readr::write_csv(f_temp, append = F, progress = F)
}else{
  param_combos_spectral_gt <- readr::read_csv(f_temp, progress = F, show_col_types = F)
}
# what?
param_combos_spectral_gt %>% dplyr::glimpse()
## Rows: 1,302,012
## Columns: 8
## $ pile_id         <dbl> 142, 140, 88, 137, 50, 52, 145, 53, 141, 101, 80, 102,…
## $ i_area          <dbl> 40.700000, 41.700000, 39.400000, 33.000000, 16.700000,…
## $ u_area          <dbl> 113.916297, 100.318477, 70.470793, 79.777859, 43.19390…
## $ iou             <dbl> 0.3572799, 0.4156762, 0.5590969, 0.4136486, 0.3866287,…
## $ pred_id         <dbl> 5588, 4135, 5160, 5688, 7886, 5921, 5283, 6538, 6852, …
## $ match_grp       <chr> "true positive", "true positive", "true positive", "tr…
## $ rn              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ spectral_weight <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
# # a row in `param_combos_spectral_gt` is unique by:
#   # spectral_weight, rn, pile_id, pred_id 
#   # where rn is a link to the `param_combos_df` table defining the 
#   # unique combination of parameters tested in our raster-based watershed pile detection method
# identical(
#   param_combos_spectral_gt %>% dplyr::distinct(spectral_weight, rn, pile_id, pred_id) %>% nrow()
#   , nrow(param_combos_spectral_gt)
# )

let’s attach a flag to only work with piles that intersect with the stand boundary and add the area of the ground truth and predicted piles

# add it to validation
param_combos_spectral_gt <-
  param_combos_spectral_gt %>% 
  # add original candidate piles from the structural watershed method with spectral_weight=0
  dplyr::bind_rows(
    param_combos_gt %>% 
      dplyr::mutate(spectral_weight=0) %>% 
      dplyr::select(names(param_combos_spectral_gt))
  ) %>% 
  # make a description of spectral_weight
  dplyr::mutate(
    spectral_weight_desc = factor(
      spectral_weight
      , ordered = T
      , levels = 0:5
      , labels = c(
        "no spectral criteria"
        , "1 spectral criteria req."
        , "2 spectral criteria req."
        , "3 spectral criteria req."
        , "4 spectral criteria req."
        , "5 spectral criteria req."
      )
    )
  ) %>% 
  # add area of gt
  dplyr::left_join(
    slash_piles_polys %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(pile_id,image_gt_area_m2,field_gt_area_m2,image_gt_volume_m3,field_gt_volume_m3,height_m) %>% 
      dplyr::rename(gt_height_m = height_m) %>% 
      dplyr::mutate(pile_id=as.numeric(pile_id))
    , by = "pile_id"
  ) %>% 
  # add info from predictions
  dplyr::left_join(
    param_combos_piles %>%
      sf::st_drop_geometry() %>%
      dplyr::select(
        rn,pred_id
        , is_in_stand
        , area_m2, volume_m3, max_height_m
      ) %>% 
      dplyr::rename(
        pred_area_m2 = area_m2, pred_volume_m3 = volume_m3, pred_height_m = max_height_m
      ) %>% 
      # add volume assuming cone or conical shape
      dplyr::mutate(
        pred_cone_volume_m3 = (1/3)*pi*(sqrt(pred_area_m2/pi)^2)*pred_height_m # sqrt(pred_area_m2/pi) = radius assuming of circle covering same area
      )
    , by = dplyr::join_by(rn,pred_id)
  ) %>%
  dplyr::mutate(
    is_in_stand = dplyr::case_when(
      is_in_stand == T ~ T
      , is_in_stand == F ~ F
      , match_grp == "omission" ~ T
      , T ~ F
    )
    ### calculate these based on the formulas below...agg_ground_truth_match() depends on those formulas
    # area diffs
    , height_diff = pred_height_m-gt_height_m
    , pct_diff_height = (gt_height_m-pred_height_m)/gt_height_m
    # area diffs
    , area_diff_image = pred_area_m2-image_gt_area_m2
    , pct_diff_area_image = (image_gt_area_m2-pred_area_m2)/image_gt_area_m2
    , area_diff_field = pred_area_m2-field_gt_area_m2
    , pct_diff_area_field = (field_gt_area_m2-pred_area_m2)/field_gt_area_m2
    # volume diffs
    , volume_diff_image = pred_volume_m3-image_gt_volume_m3
    , pct_diff_volume_image = (image_gt_volume_m3-pred_volume_m3)/image_gt_volume_m3
    , volume_diff_field = pred_volume_m3-field_gt_volume_m3
    , pct_diff_volume_field = (field_gt_volume_m3-pred_volume_m3)/field_gt_volume_m3
    # volume diffs cone
    , cone_volume_diff_image = pred_cone_volume_m3-image_gt_volume_m3
    , cone_volume_diff_field = pred_cone_volume_m3-field_gt_volume_m3
  )
# what?
# param_combos_spectral_gt %>% dplyr::glimpse()

6.4.1 Example comparison by spectral weighting

let’s check out a quick plot of the slash pile prediction classification by spectral_weight which controls the strictness of our spectral filtering. as the spectral_weight parameter increases from its lowest setting (“1”, requiring only one spectral index threshold to be met) to its highest (“5”, requiring all spectral index thresholds be met), we expect a clear trade-off: commissions (false positives) should decrease while true positive matches should decrease and omissions should increase. remember, the objective of this sensitivity testing is to find the parameter combination for our rules-based slash pile detection method that best balances correctly identifying ground truth piles while minimizing incorrect positive predictions as measured by F-score to quantify overall accuracy

param_combos_spectral_gt %>% 
  dplyr::count(spectral_weight,spectral_weight_desc,match_grp) %>% 
  dplyr::group_by(spectral_weight,spectral_weight_desc) %>% 
  dplyr::mutate(pct = n/sum(n)) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(y = n, x = spectral_weight_desc, color = match_grp, group = match_grp)
  ) +
  ggplot2::geom_line(lwd = 1.5, alpha = 0.8) +
  ggplot2::geom_point(size = 2) +
  ggplot2::scale_color_manual(values = pal_match_grp, name = "", na.value = "red") +
  ggplot2::labs(x = "", y = "", color = "", fill = "") +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , axis.text.y = ggplot2::element_blank()
    , axis.ticks.y = ggplot2::element_blank()
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, linetype = 0, size = 5, alpha = 1))
    , shape = "none"
  )

the trends here are as expected. however, for this data it appears that two criteria are highly correlated since we don’t see a change in the prediction classification until we require that at least three spectral index threshold criteria are met.

let’s look at a single parameter combination and the difference between different levels of spectral filtering

# plot it
ortho_plt_temp <- ortho_plt_fn(
    stand = stand_boundary %>% 
      sf::st_transform(terra::crs(ortho_rast))
    , buffer = 30
  ) +
    ggplot2::geom_sf(
      data = stand_boundary %>% 
        sf::st_transform(terra::crs(ortho_rast))
      , color = "black", fill = NA, lwd = 1.1
    )
# ortho_plt_temp
# filter for one combination

df_temp <- param_combos_spectral_gt %>% 
  # dplyr::filter(rn == unique(param_combos_spectral_gt$rn) %>% sample(1))
  dplyr::filter(rn == 172)

# add piles
plt_list_temp <- c(0:5) %>% 
  purrr::map(\(x)
    ortho_plt_temp +
      ggplot2::geom_sf(
        data = 
          slash_piles_polys %>% 
          dplyr::filter(is_in_stand) %>% 
          sf::st_transform(terra::crs(ortho_rast)) %>% 
            dplyr::left_join(
              df_temp %>% 
                dplyr::filter(spectral_weight==x) %>%
                dplyr::select(pile_id,match_grp,spectral_weight_desc) %>% 
                dplyr::mutate(pile_id=as.character(pile_id))
              , by = "pile_id"
            )
        , mapping = ggplot2::aes(color = match_grp)
        , inherit.aes = F
        , fill = NA , lwd = 0.7
      )+
      ggplot2::geom_sf(
        data =
          param_combos_piles %>% 
            dplyr::inner_join(
              df_temp %>%
                dplyr::filter(is_in_stand & match_grp == "commission") %>% 
                dplyr::filter(spectral_weight==x) %>%
                dplyr::select(pred_id,match_grp,spectral_weight_desc)
              , by = "pred_id"
            )
        , mapping = ggplot2::aes(color = match_grp)
        , inherit.aes = F
        , fill = NA , lwd = 0.4
      ) +
      ggplot2::scale_color_manual(
        values = c(
            "omission"= "blue2" # viridis::cividis(3)[1]
            , "commission"= "gray88" #viridis::cividis(3)[2]
            , "true positive"= "yellow2" #viridis::cividis(3)[3]
          )
        , name = ""
      ) +
      ggplot2::labs(
        subtitle = levels(param_combos_spectral_gt$spectral_weight_desc)[x+1]
      ) +
      ggplot2::theme(
        legend.position = "bottom"
        , plot.subtitle = ggplot2::element_text(size = 9, hjust = 0.4, face = "bold")
      ) +
      ggplot2::guides(
        color = ggplot2::guide_legend(override.aes = list(lwd = 9))
        , fill = "none"
      )
  )
patchwork::wrap_plots(
    plt_list_temp
    , ncol = 2
    , guides = "collect"
  ) &
  ggplot2::theme(legend.position = "bottom")

ggplot2::ggsave("../data/best_detect_ever_spect_index_filtered.jpeg", height = 10.5, width = 8.5)

the spectral filtering performed as intended, successfully removing candidate piles that visually appear as green vegetation on the imagery and it also eliminated some candidates that appeared to be rocks or rock outcroppings, though with less success visually than for vegetation

6.4.2 Aggregate assessment metrics

now we need to aggregate these single tree level results to a single line for each parameter combination in this analysis with the confusion matrix performance metrics (e.g. precision, recall, F-score)

now let’s apply it at the parameter combination level

# unique combinations
combo_temp <- param_combos_spectral_gt %>% dplyr::distinct(rn,spectral_weight)
# aggregate over combinations
param_combos_spectral_gt_agg <- 
  1:nrow(combo_temp) %>% 
  purrr::map(
    \(x)
    agg_ground_truth_match(
      param_combos_spectral_gt %>% 
        dplyr::filter(
          is_in_stand
          & rn == combo_temp$rn[x]
          & spectral_weight == combo_temp$spectral_weight[x]
        )
    ) %>% 
    dplyr::mutate(
      rn = combo_temp$rn[x]
      , spectral_weight = combo_temp$spectral_weight[x]
    )
  ) %>% 
  dplyr::bind_rows() %>% 
  # add in info on all parameter combinations
  dplyr::inner_join(
    param_combos_df
    , by = "rn"
    , relationship = "many-to-one"
  )

# we can also add the relative rmse (rrmse) by comparing the rmse to the mean value of the ground truth data
param_combos_spectral_gt_agg <- param_combos_spectral_gt_agg %>%
  dplyr::bind_cols(
    # add means of gt
    slash_piles_polys %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(image_gt_area_m2,field_gt_area_m2,image_gt_volume_m3,field_gt_volume_m3,height_m) %>% 
      dplyr::ungroup() %>% 
      dplyr::summarise(dplyr::across(dplyr::everything(), ~mean(.x,na.rm=T))) %>% 
      dplyr::rename_with(~ paste0("gt_", .x, recycle0 = TRUE))
  ) %>% 
  dplyr::mutate(
    area_diff_field_rrmse = area_diff_field_rmse/gt_field_gt_area_m2
    , area_diff_image_rrmse = area_diff_image_rmse/gt_image_gt_area_m2
    , volume_diff_field_rrmse = volume_diff_field_rmse/gt_field_gt_volume_m3
    , volume_diff_image_rrmse = volume_diff_image_rmse/gt_image_gt_volume_m3
    , cone_volume_diff_field_rrmse = cone_volume_diff_field_rmse/gt_field_gt_volume_m3
    , cone_volume_diff_image_rrmse = cone_volume_diff_image_rmse/gt_image_gt_volume_m3
    , height_diff_rrmse = height_diff_rmse/gt_height_m
  ) %>% 
  dplyr::select(!tidyselect::starts_with("gt_"))
# what is this?
param_combos_spectral_gt_agg %>% dplyr::glimpse()
## Rows: 7,560
## Columns: 41
## $ tp_n                         <dbl> 109, 108, 106, 95, 71, 19, 112, 111, 105,…
## $ fp_n                         <dbl> 179, 148, 113, 62, 32, 7, 168, 142, 112, …
## $ fn_n                         <dbl> 12, 13, 15, 26, 50, 102, 9, 10, 16, 28, 5…
## $ omission_rate                <dbl> 0.09917355, 0.10743802, 0.12396694, 0.214…
## $ commission_rate              <dbl> 0.6215278, 0.5781250, 0.5159817, 0.394904…
## $ precision                    <dbl> 0.3784722, 0.4218750, 0.4840183, 0.605095…
## $ recall                       <dbl> 0.9008264, 0.8925620, 0.8760331, 0.785124…
## $ f_score                      <dbl> 0.5330073, 0.5729443, 0.6235294, 0.683453…
## $ area_diff_field_rmse         <dbl> 5.738122, 5.765817, 5.809439, 6.033033, 6…
## $ area_diff_image_rmse         <dbl> 15.18938, 15.25870, 15.38346, 16.09463, 1…
## $ cone_volume_diff_field_rmse  <dbl> 12.393136, 12.452527, 12.573711, 13.21744…
## $ cone_volume_diff_image_rmse  <dbl> 26.90997, 27.04027, 27.30445, 28.72903, 3…
## $ height_diff_rmse             <dbl> 0.6570244, 0.6571267, 0.6537457, 0.660446…
## $ volume_diff_field_rmse       <dbl> 9.631694, 9.664515, 9.757351, 10.245587, …
## $ volume_diff_image_rmse       <dbl> 23.66330, 23.77412, 24.00697, 25.26748, 2…
## $ area_diff_field_mean         <dbl> -3.205005, -3.232908, -3.243510, -3.33351…
## $ area_diff_image_mean         <dbl> -9.207075, -9.276893, -9.351781, -9.75963…
## $ cone_volume_diff_field_mean  <dbl> -4.005182, -4.056353, -4.126757, -4.42177…
## $ cone_volume_diff_image_mean  <dbl> -10.099390, -10.200505, -10.364435, -11.1…
## $ height_diff_mean             <dbl> -0.065212636, -0.072121496, -0.084214989,…
## $ volume_diff_field_mean       <dbl> -1.16513141, -1.22739111, -1.26469155, -1…
## $ volume_diff_image_mean       <dbl> -7.259339, -7.371543, -7.502369, -8.13330…
## $ pct_diff_area_field_mape     <dbl> 0.2883263, 0.2906851, 0.2886247, 0.285357…
## $ pct_diff_area_image_mape     <dbl> 0.5143481, 0.5173229, 0.5166946, 0.515038…
## $ pct_diff_height_mape         <dbl> 0.1929837, 0.1919159, 0.1890983, 0.189716…
## $ pct_diff_volume_field_mape   <dbl> 0.3734381, 0.3672460, 0.3678187, 0.364309…
## $ pct_diff_volume_image_mape   <dbl> 0.3525434, 0.3491698, 0.3497211, 0.343597…
## $ rn                           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13…
## $ spectral_weight              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ max_ht_m                     <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ min_area_m2                  <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ max_area_m2                  <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 4…
## $ convexity_pct                <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0…
## $ circle_fit_iou_pct           <dbl> 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.3, 0.4, 0…
## $ area_diff_field_rrmse        <dbl> 0.5409350, 0.5435458, 0.5476580, 0.568736…
## $ area_diff_image_rrmse        <dbl> 0.7958969, 0.7995289, 0.8060665, 0.843330…
## $ volume_diff_field_rrmse      <dbl> 0.9452834, 0.9485046, 0.9576158, 1.005532…
## $ volume_diff_image_rrmse      <dbl> 1.4047235, 1.4113022, 1.4251248, 1.499952…
## $ cone_volume_diff_field_rrmse <dbl> 1.2162997, 1.2221285, 1.2340219, 1.297199…
## $ cone_volume_diff_image_rrmse <dbl> 1.597455, 1.605190, 1.620873, 1.705440, 1…
## $ height_diff_rrmse            <dbl> 0.3010821, 0.3011289, 0.2995796, 0.302650…

6.5 Parameter Sensitivity Test Results

let’s get a quick summary of the evaluation metrics across all parameter combinations

# pal
pal_eval_metric <- c(
  RColorBrewer::brewer.pal(3,"Oranges")[3]
  , RColorBrewer::brewer.pal(3,"Greys")[3]
  , RColorBrewer::brewer.pal(3,"Purples")[3]
)
# summary
param_combos_spectral_gt_agg %>% 
  dplyr::select(f_score,recall,precision,tidyselect::ends_with("_mape")) %>% 
  summary()
##     f_score           recall          precision      pct_diff_area_field_mape
##  Min.   :0.1374   Min.   :0.07438   Min.   :0.2304   Min.   :0.1693          
##  1st Qu.:0.4626   1st Qu.:0.55372   1st Qu.:0.4052   1st Qu.:0.2716          
##  Median :0.5683   Median :0.75207   Median :0.5669   Median :0.2826          
##  Mean   :0.5349   Mean   :0.65858   Mean   :0.5783   Mean   :0.2760          
##  3rd Qu.:0.6667   3rd Qu.:0.88430   3rd Qu.:0.7368   3rd Qu.:0.2921          
##  Max.   :0.8327   Max.   :0.95868   Max.   :1.0000   Max.   :0.3202          
##  pct_diff_area_image_mape pct_diff_height_mape pct_diff_volume_field_mape
##  Min.   :0.4076           Min.   :0.1182       Min.   :0.2185            
##  1st Qu.:0.4984           1st Qu.:0.1895       1st Qu.:0.3751            
##  Median :0.5072           Median :0.1968       Median :0.3894            
##  Mean   :0.5004           Mean   :0.1954       Mean   :0.3909            
##  3rd Qu.:0.5126           3rd Qu.:0.2024       3rd Qu.:0.4015            
##  Max.   :0.5350           Max.   :0.2748       Max.   :0.6351            
##  pct_diff_volume_image_mape
##  Min.   :0.1968            
##  1st Qu.:0.3329            
##  Median :0.3460            
##  Mean   :0.3378            
##  3rd Qu.:0.3543            
##  Max.   :0.3858

6.5.1 Main effect of spectral parameter on performance

let’s average across all other factors to look at the main effect of spectral_weight which controls the strictness of our spectral filtering. as the spectral_weight parameter increases from its lowest setting (“1”, requiring only one spectral index threshold to be met) to its highest (“5”, requiring all spectral index thresholds be met)

param_combos_spectral_gt_agg %>% 
  dplyr::select(c(spectral_weight,precision,recall,f_score)) %>% 
  tidyr::pivot_longer(
    cols = c(precision,recall,f_score)
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  tidyr::pivot_longer(
    cols = c(spectral_weight)
    , names_to = "param"
    , values_to = "param_value"
  ) %>% 
  dplyr::group_by(param, param_value, metric) %>%
  dplyr::summarise(
    median = median(value,na.rm=T)
    , q25 = stats::quantile(value,na.rm=T,probs = 0.25)
    , q75 = stats::quantile(value,na.rm=T,probs = 0.75)
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    metric = dplyr::case_when(
        metric == "f_score" ~ 1
        , metric == "recall" ~ 2
        , metric == "precision" ~ 3
      ) %>% 
      factor(
        ordered = T
        , levels = 1:3
        , labels = c(
          "F-score"
          , "Recall"
          , "Precision"
        )
      )
    , param_value = factor(x = param_value, labels = levels(param_combos_spectral_gt$spectral_weight_desc))
  ) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(y = median, x = param_value, color = metric, fill = metric, group = metric, shape = metric)
  ) +
  ggplot2::geom_ribbon(
    mapping = ggplot2::aes(ymin = q25, ymax = q75)
    , alpha = 0.2, color = NA
  ) +
  ggplot2::geom_line(lwd = 1.5, alpha = 0.8) +
  ggplot2::geom_point(size = 2) +
  ggplot2::facet_wrap(facets = dplyr::vars(param), scales = "free_x") +
  # ggplot2::scale_color_viridis_d(begin = 0.2, end = 0.8) +
  ggplot2::scale_fill_manual(values = pal_eval_metric) +
  ggplot2::scale_color_manual(values = pal_eval_metric) +
  ggplot2::scale_y_continuous(limits = c(0,1), labels = scales::percent, breaks = scales::breaks_extended(10)) +
  ggplot2::labs(x = "", y = "median value", color = "", fill = "") +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, linetype = 0, size = 5, alpha = 1))
    , shape = "none"
  )

ggplot2::ggsave("../data/best_pile_detect_test.jpeg", height = 8, width = 8)

based on these main effect aggregated results:

  • increasing the spectral_weight (where “5” requires all spectral index thresholds to be met) had minimal impact on metrics until a value of “3”, at which point F-score and precision both saw slight improvements. at a spectral_weight of “4”, the F-score significantly improved due to a substantial increase in precision. setting spectral_weight to “5” resulted in a significant drop in recall (detection rate) and a proportionally inverse increase in precision, leading to only a minor overall change in F-score.

6.5.2 Main effect of spectral parameter on form quantification

let’s average across all other factors to look at the main effect by parameter for the MAPE metrics quantifying the pile form accuracy

param_combos_spectral_gt_agg %>% 
  dplyr::select(
    spectral_weight
    , pct_diff_volume_field_mape
    , pct_diff_volume_image_mape
    , pct_diff_area_field_mape
    , pct_diff_area_image_mape
    , pct_diff_height_mape
  ) %>% 
  tidyr::pivot_longer(
    cols = c(tidyselect::ends_with("_mape"))
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  tidyr::pivot_longer(
    cols = c(spectral_weight)
    , names_to = "param"
    , values_to = "param_value"
  ) %>% 
  dplyr::group_by(param, param_value, metric) %>%
  dplyr::summarise(
    median = median(value,na.rm=T)
    , q25 = stats::quantile(value,na.rm=T,probs = 0.25)
    , q75 = stats::quantile(value,na.rm=T,probs = 0.75)
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    metric = dplyr::case_when(
        metric == "f_score" ~ 1
        , metric == "recall" ~ 2
        , metric == "precision" ~ 3
        # rmse
        , metric == "pct_diff_volume_field_mape" ~ 4
        , metric == "pct_diff_volume_image_mape" ~ 5
        , metric == "pct_diff_area_field_mape" ~ 6
        , metric == "pct_diff_area_image_mape" ~ 7
        , metric == "pct_diff_height_mape" ~ 8
      ) %>% 
      factor(
        ordered = T
        , levels = 1:8
        , labels = c(
          "F-score"
          , "Recall"
          , "Precision"
          , "MAPE: Volume field (%)"
          , "MAPE: Volume image (%)"
          , "MAPE: Area field (%)"
          , "MAPE: Area image (%)"
          , "MAPE: Height (%)"
        )
      )
    , param_value = factor(x = param_value, labels = levels(param_combos_spectral_gt$spectral_weight_desc))
  ) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(y = median, x = param_value, color = metric, fill = metric, group = metric, shape = metric)
  ) +
  # ggplot2::geom_ribbon(
  #   mapping = ggplot2::aes(ymin = q25, ymax = q75)
  #   , alpha = 0.2, color = NA
  # ) +
  ggplot2::geom_line(lwd = 1.5, alpha = 0.8) +
  ggplot2::geom_point(size = 2) +
  ggplot2::facet_wrap(facets = dplyr::vars(param), scales = "free_x") +
  ggplot2::scale_fill_manual(values = pal_get_pairs_fn(6)[1:5]) +
  ggplot2::scale_color_manual(values = pal_get_pairs_fn(6)[1:5]) +
  ggplot2::scale_shape_manual(values = c(15,16,17,18,0)) +
  ggplot2::scale_y_continuous(limits = c(0,1), labels = scales::percent, breaks = scales::breaks_extended(10)) +
  ggplot2::labs(x = "", y = "median value", color = "", fill = "") +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, linetype = 0, size = 5, alpha = 1))
    , shape = "none"
  )

these results indicate that the method’s accuracy in outlining slash pile form is robust to parameter changes, as the relevant shape accuracy metrics remain consistent across various settings of the spectral_weight parameter

6.5.3 Best results

we can cut to the chase and just look at the parameter combinations that achieved the best results based on the evaluation metrics

rank_th_temp <- 18
df_temp <-
  param_combos_spectral_gt_agg %>% 
  dplyr::ungroup() %>% 
  dplyr::arrange(desc(f_score)) %>% 
  dplyr::mutate(
    rank_f_score = dplyr::row_number()
  ) %>% 
  dplyr::arrange(desc(precision)) %>% 
  dplyr::mutate(
    rank_precision = dplyr::row_number()
  ) %>% 
  dplyr::arrange(desc(recall)) %>% 
  dplyr::mutate(
    rank_recall = dplyr::row_number()
  ) %>% 
  dplyr::arrange(desc(f_score), desc(recall), desc(precision)) %>% 
  # now get the max of these pct ranks by row
  dplyr::rowwise() %>% 
  dplyr::mutate(
    rank_min = min(
      dplyr::c_across(
        tidyselect::starts_with("rank")
      )
      , na.rm = T
    )
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    rank_lab = dplyr::case_when(
      rank_f_score<=rank_th_temp ~ 1
      , rank_recall<=rank_th_temp ~ 2
      , rank_precision<=rank_th_temp ~ 3
      , (rank_f_score>=(nrow(param_combos_spectral_gt_agg)-rank_th_temp) & dplyr::coalesce(f_score,0)>0) ~ 4
    ) %>% 
      factor(
        ordered = T
        , levels = 1:4
        , labels = c(
          paste0("top ", scales::comma(rank_th_temp, accuracy = 1), " F-score")
          , paste0("top ", scales::comma(rank_th_temp, accuracy = 1), " Recall")
          , paste0("top ", scales::comma(rank_th_temp, accuracy = 1), " Precision")
          , paste0("bottom ", scales::comma(rank_th_temp, accuracy = 1), " F-score")
        )
      )
  ) %>% 
  dplyr::filter(
    !is.na(rank_lab) # rank_min>=0.99
  ) %>% 
  dplyr::mutate(
    lab = stringr::str_c(max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct,spectral_weight, sep = ":")
  ) 
# plot
df_temp %>% 
  tidyr::pivot_longer(
    cols = c(precision,recall,f_score)
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  dplyr::mutate(
    metric = dplyr::case_when(
        metric == "f_score" ~ 1
        , metric == "recall" ~ 2
        , metric == "precision" ~ 3
      ) %>% 
      factor(
        ordered = T
        , levels = 1:3
        , labels = c(
          "F-score"
          , "Recall"
          , "Precision"
        )
      )
    , lab = forcats::fct_reorder(lab, desc(rank_f_score))
    , val_lab = scales::percent(value, accuracy = 0.1)
  ) %>%
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = value, y = lab, fill = metric, label = val_lab)
  ) +
  ggplot2::geom_col(width = 0.6) +
  ggplot2::geom_text(color = "black", size = 2, hjust = -0.2) +
  ggplot2::scale_fill_manual(values = pal_eval_metric) +
  ggplot2::scale_x_continuous(
    labels = scales::percent_format(accuracy = 1)
    , limits = c(0,1.05)
    # , expand = expansion(mult = c(0, .08))
  ) +
  ggplot2::facet_grid(cols = dplyr::vars(metric), rows = dplyr::vars(rank_lab), scales = "free_y") +
  ggplot2::labs(
    x = "", y = "max_ht_m : max_area_m2 : convexity_pct : circle_fit_iou_pct : spectral_weight"
    , fill = ""
    , subtitle = "top (and bottom) parameter combinations by evaluation metrics"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text = ggplot2::element_text(size = 6.2)
  )

let’s make a table of these results

df_temp %>% 
  dplyr::filter(rank_min<=rank_th_temp) %>% 
  dplyr::select(
    rank_lab
    , max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct,spectral_weight
    , f_score, recall, precision
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::arrange(rank_lab,desc(f_score), desc(recall), desc(precision)) %>% 
  dplyr::mutate(dplyr::across(
    .cols = c(f_score, recall, precision)
    , .fn = ~ scales::percent(.x, accuracy = 1)
  )) %>% 
  dplyr::mutate(blank= "   " ) %>% 
  dplyr::relocate(blank, .before = f_score) %>% 
  kableExtra::kbl(
    caption = "parameter combinations that achieved the best slash pile detection results"
    , col.names = c(
      "."
      ,"max_ht_m","max_area_m2","convexity_pct","circle_fit_iou_pct","spectral_weight"
      , "   "
      , "F-score", "Recall", "Precision"
    )
    , escape = F
  ) %>% 
  kableExtra::kable_styling(font_size = 10.5) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>% 
  kableExtra::add_header_above(c(" "=7, "Evaluation Metric" = 3))
Table 6.1: parameter combinations that achieved the best slash pile detection results
Evaluation Metric
. max_ht_m max_area_m2 convexity_pct circle_fit_iou_pct spectral_weight F-score Recall Precision
top 18 F-score 4 70 0.4 0.5 4 83% 88% 79%
4 60 0.8 0.5 4 83% 85% 81%
4 60 0.3 0.5 4 83% 90% 77%
4 50 0.5 0.6 4 83% 82% 84%
4 110 0.8 0.5 4 83% 82% 84%
4 120 0.6 0.6 4 83% 83% 82%
4 100 0.5 0.6 4 83% 81% 84%
4 90 0.5 0.5 4 83% 88% 78%
4 90 0.5 0.6 4 83% 80% 85%
4 90 0.8 0.5 4 82% 83% 81%
4 50 0.5 0.5 4 82% 89% 76%
4 100 0.8 0.5 4 82% 83% 81%
4 90 0.8 0.3 4 82% 87% 78%
4 90 0.8 0.4 4 82% 87% 78%
4 50 0.2 0.5 4 82% 88% 76%
4 70 0.3 0.5 4 82% 88% 76%
4 110 0.7 0.5 4 82% 88% 77%
4 80 0.3 0.6 4 82% 82% 82%
top 18 Recall 5 90 0.6 0.3 4 72% 95% 58%
5 80 0.5 0.3 4 71% 96% 57%
5 70 0.3 0.3 4 69% 96% 54%
6 110 0.4 0.3 4 65% 95% 49%
6 100 0.3 0.3 4 64% 95% 48%
6 110 0.6 0.3 4 63% 95% 48%
5 90 0.6 0.3 3 50% 95% 34%
5 80 0.5 0.3 3 49% 96% 33%
5 70 0.3 0.3 3 48% 96% 32%
5 90 0.6 0.3 1 48% 95% 32%
5 90 0.6 0.3 2 48% 95% 32%
5 90 0.6 0.3 0 48% 95% 32%
5 80 0.5 0.3 1 47% 96% 31%
5 80 0.5 0.3 2 47% 96% 31%
5 80 0.5 0.3 0 47% 96% 31%
5 70 0.3 0.3 2 47% 96% 31%
5 70 0.3 0.3 1 46% 96% 31%
5 70 0.3 0.3 0 46% 96% 31%
top 18 Precision 6 120 0.2 0.8 5 27% 16% 100%
5 100 0.8 0.8 5 26% 15% 100%
6 90 0.3 0.8 5 26% 15% 100%
4 50 0.5 0.8 5 25% 14% 100%
5 40 0.3 0.8 5 25% 14% 100%
5 60 0.3 0.8 5 25% 14% 100%
5 70 0.3 0.8 5 25% 14% 100%
5 130 0.8 0.8 5 25% 14% 100%
6 70 0.3 0.8 5 25% 14% 100%
4 100 0.4 0.8 5 23% 13% 100%
4 120 0.7 0.8 5 23% 13% 100%
5 50 0.8 0.8 5 23% 13% 100%
6 40 0.2 0.8 5 23% 13% 100%
6 40 0.6 0.8 5 23% 13% 100%
6 40 0.7 0.8 5 23% 13% 100%
6 60 0.7 0.8 5 23% 13% 100%
6 90 0.2 0.8 5 23% 13% 100%
6 110 0.3 0.8 5 23% 13% 100%

if we look at only the top 5% of parameter combinations by F-score, what is the distribution of individual parameter settings?

pal_param <- viridis::plasma(n=5, begin = 0.1, end = 0.9, alpha = 0.8)
df_temp <- param_combos_spectral_gt_agg %>% 
  dplyr::ungroup() %>% 
  dplyr::arrange(desc(f_score)) %>% 
  dplyr::mutate(
    pct_rank_f_score = dplyr::percent_rank(f_score)
  ) %>% 
  dplyr::filter(pct_rank_f_score>=pct_rank_th_temp) %>%
  dplyr::select(tidyselect::contains("f_score"), max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct,spectral_weight) %>% 
  tidyr::pivot_longer(
    cols = c(max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct,spectral_weight)
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  dplyr::count(metric, value) %>% 
  dplyr::group_by(metric) %>% 
  dplyr::mutate(
    pct=n/sum(n)
    , lab = paste0(scales::percent(pct,accuracy=0.1), "\nn=", scales::comma(n,accuracy=1))
  )
# save best for later
best_spectral_weight <- df_temp %>% 
  dplyr::filter(metric=="spectral_weight") %>% 
  dplyr::arrange(desc(n)) %>% 
  dplyr::slice(1) %>% 
  dplyr::pull(value)
# plot
df_temp %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = factor(value), y=pct, label=lab, fill = metric)
  ) +
  ggplot2::geom_col(width = 0.6) +
  ggplot2::geom_text(color = "black", size = 2.5, vjust = -0.2) +
  ggplot2::facet_wrap(facets = dplyr::vars(metric), ncol=2, scales = "free_x") +
  ggplot2::scale_y_continuous(
    breaks = seq(0,1,by=0.2)
    , labels = scales::percent
    , expand = ggplot2::expansion(mult = c(0,0.2))
  ) +
  ggplot2::scale_fill_manual(values = pal_param) +
  ggplot2::labs(
    x = "parameter setting", y = ""
    , fill = ""
    , subtitle = paste0("parameter settings of top ", scales::percent(1-pct_rank_th_temp,accuracy=1), " of combinations by F-score")
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.y = ggplot2::element_text(size = 6)
    , axis.text.x = ggplot2::element_text(size = 8)
  )

let’s use the value of spectral_weight that most frequently results in the best pile detection based on F-score (spectral_weight of “4”) to compare results to not using any spectral data (i.e. just using structural data as outlined in our method here). We’ll then average across all other factors to look at the main effect of spectral_weight after creating a dichotomous factor of structural only versus structural+spectral

# filter data
combos_df_temp <-
  param_combos_spectral_gt_agg %>% 
  dplyr::mutate(
    combo = forcats::fct_cross(
      factor(max_ht_m)
      , factor(max_area_m2)
      , factor(convexity_pct)
      , factor(circle_fit_iou_pct)
    )
  ) %>% 
  dplyr::select(c(
    combo,spectral_weight
    ,precision,recall,f_score
    , pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape
  )) %>% 
  dplyr::filter(
    spectral_weight == 0
    | spectral_weight == best_spectral_weight
  ) %>% 
  dplyr::mutate(
    spectral_weight_fact = ifelse(spectral_weight==0,"structural only","structural+spectral") %>% 
      factor()
  )

# quick mod
dep_vars_temp <- c("f_score","recall","precision")
# list of formulas
formulas_temp <- dep_vars_temp %>% 
  purrr::map(~stats::reformulate(c("spectral_weight_fact","-1"), response = .x))
names(formulas_temp) <- dep_vars_temp
# formulas_temp
models_temp <- formulas_temp %>% 
  purrr::map(\(x) lm(formula = x, data = combos_df_temp))
names(models_temp) <- dep_vars_temp
# models_temp
# combine into df
pred_df_temp <- purrr::map_df(models_temp, broom::tidy, .id = "metric") %>% 
  dplyr::select(-c(statistic)) %>% 
  dplyr::mutate(
    dep_var = dplyr::case_when(
        metric == "f_score" ~ 1
        , metric == "recall" ~ 2
        , metric == "precision" ~ 3
      ) %>% 
      factor(
        ordered = T
        , levels = 1:3
        , labels = c(
          "F-score"
          , "Recall"
          , "Precision"
        )
      )
    , term = stringr::str_remove_all(term, "spectral_weight_fact") %>% factor()
    # to obtain the 95% confidence interval...
      # ...1.96 times the standard error is added and subtracted from the sample mean
    , ul = estimate+std.error*1.96
    , ll = estimate-std.error*1.96
  ) %>%
  dplyr::rename(spectral_weight_fact=term) %>% 
  dplyr::relocate(dep_var) %>% 
  dplyr::arrange(dep_var,spectral_weight_fact)

# pred_df_temp
# table it
pred_df_temp %>% 
  dplyr::select(-c(metric,ul,ll)) %>% 
  dplyr::mutate(
    estimate = scales::percent(estimate, accuracy = 0.1)
    , std.error = scales::percent(std.error, accuracy = 0.01)
    , p.value = ifelse(
        p.value < 0.001
        , "< 0.001"
        , scales::comma(p.value, accuracy = 0.0001)
      )
  ) %>% 
  kableExtra::kbl(
    caption = "Predictions for performance metrics by detection method<br>averaging across all other structural detection parameters"
    , col.names = c(".","method","predicted mean", "SE", "p-value")
  ) %>%
  kableExtra::kable_styling() %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top")
Table 6.2: Predictions for performance metrics by detection method
averaging across all other structural detection parameters
. method predicted mean SE p-value
F-score structural only 48.8% 0.47% < 0.001
structural+spectral 64.5% 0.47% < 0.001
Recall structural only 68.7% 0.77% < 0.001
structural+spectral 68.7% 0.77% < 0.001
Precision structural only 47.1% 0.38% < 0.001
structural+spectral 73.4% 0.38% < 0.001

plot the change from including spectral data to the pile detection methodology

# scales::show_col(pal_eval_metric)
# plot
ggplot2::ggplot(
  data = pred_df_temp
  , mapping = ggplot2::aes(x = spectral_weight_fact)
) +
  ggplot2::geom_line(
    data = combos_df_temp %>% 
      dplyr::slice_sample(prop = 0.3) %>% 
      tidyr::pivot_longer(
        cols = c(precision,recall,f_score)
        , names_to = "metric"
        , values_to = "value"
      ) %>% 
      dplyr::mutate(
        dep_var = dplyr::case_when(
          metric == "f_score" ~ 1
          , metric == "recall" ~ 2
          , metric == "precision" ~ 3
        ) %>% 
        factor(
          ordered = T
          , levels = 1:3
          , labels = c(
            "F-score"
            , "Recall"
            , "Precision"
          )
        )
      )
    , mapping = ggplot2::aes(y = value, group = combo, color = metric)
    , lwd = 0.6, alpha = 0.1
    # , color = pal_eval_metric[1]
  ) +
  geom_ribbon(
    mapping = ggplot2::aes(ymin = ll, ymax = ul, group = 1)
    , fill = "black", alpha = 0.2
  ) +
  ggplot2::geom_line(
    mapping = ggplot2::aes(y = estimate, group = 1)
    , lwd = 1, color = "black" # pal_eval_metric[1]
  ) +
  ggplot2::geom_text(
    mapping = ggplot2::aes(y=estimate, label = scales::percent(estimate,accuracy=0.1), group = 1)
    , vjust = -0.25
  ) +
  ggplot2::facet_wrap(facets = dplyr::vars(dep_var)) +
  ggplot2::scale_color_manual(values = c(pal_eval_metric[1],pal_eval_metric[3],pal_eval_metric[2]) ) + # pal_eval_metric
  ggplot2::scale_y_continuous(limits = c(0,1), labels = scales::percent, breaks = scales::breaks_extended(10)) +
  ggplot2::labs(
    x = "", y = "", color = "", fill = ""
    , caption = paste0(
      "*The spectral+structural group has the `spectral_weight` parameter set to '"
      , best_spectral_weight, "'"
    )
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , plot.caption = ggplot2::element_text(size = 8)
  )

remember, these values appear low but the value is not the important takeaway here because we averaged across all other parameter settings, including those that resulted in poor pile detection structurally, instead the focus is on the change from including the spectral data. based on the data used here, these results indicate that including the spectral data resulted in a 15.7 percentage point increase in the F-score compared to just using the structural data irrespective of the parameter settings used in the structural detection method.

if we look at only the best pile detection methodology based on F-score using the structural data and the best using both structural and spectral data, what is the change in our performance metrics from including the spectral data?

combos_df_temp <- combos_df_temp %>% 
  # get the top by spectral_weight
  dplyr::group_by(spectral_weight) %>% 
  dplyr::mutate(
    is_top = f_score == max(f_score)
    , top_what = ifelse(is_top, spectral_weight, NA)
  ) %>% 
  dplyr::group_by(combo) %>% 
  dplyr::filter(any(is_top)) %>% 
  dplyr::mutate(
    top_what = dplyr::first(top_what, na_rm = T)
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    top_what = ifelse(top_what==0,"Top structural only","Top structural+spectral") %>% 
      factor()
    , desc = paste0(
      top_what
      , " : max_ht_m=", stringr::word(combo,1,sep = ":")
      , ", max_area_m2=", stringr::word(combo,2,sep = ":")
      , ", convexity_pct=", stringr::word(combo,3,sep = ":")
      , ", circle_fit_iou_pct=", stringr::word(combo,4,sep = ":")
    )
  ) 
# plot
combos_df_temp %>% 
  tidyr::pivot_longer(
    cols = c(precision,recall,f_score)
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  dplyr::mutate(
    dep_var = dplyr::case_when(
      metric == "f_score" ~ 1
      , metric == "recall" ~ 2
      , metric == "precision" ~ 3
    ) %>% 
    factor(
      ordered = T
      , levels = 1:3
      , labels = c(
        "F-score"
        , "Recall"
        , "Precision"
      )
    )
  ) %>% 
# plot
ggplot2::ggplot(
  mapping = ggplot2::aes(x = spectral_weight_fact,y = value,label = scales::percent(value,accuracy=0.1), group = combo, color = desc)
) +
  ggplot2::geom_line(key_glyph = "point") +
  ggplot2::geom_text(
    vjust = -0.25
    , show.legend = FALSE
  ) +
  ggplot2::facet_wrap(facets = dplyr::vars(dep_var)) +
  ggplot2::scale_color_viridis_d(begin = 0.2,end=0.8) +
  ggplot2::scale_y_continuous(limits = c(0,1), labels = scales::percent, breaks = scales::breaks_extended(10)) +
  ggplot2::labs(
    x = "", y = "", color = "", fill = ""
    , caption = paste0(
      "*The spectral+structural group has the `spectral_weight` parameter set to '"
      , best_spectral_weight, "'"
    )
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , legend.direction = "vertical"
    , legend.justification = "left"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , plot.caption = ggplot2::element_text(size = 8)
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, size = 10, alpha = 1))
  )

table it

# table it
combos_df_temp %>% 
  dplyr::ungroup() %>% 
  dplyr::select(c(
    desc,spectral_weight_fact
    ,f_score, recall, precision
  )) %>% 
  dplyr::mutate(dplyr::across(
    .cols = c(f_score, recall, precision)
    , .fn = ~ scales::percent(.x, accuracy = 1)
  )) %>% 
  dplyr::arrange(desc,spectral_weight_fact) %>% 
  kableExtra::kbl(
    caption = paste0(
      "parameter combinations that achieved the best slash pile detection results"
    )
    , col.names = c(
      "."
      , "method"
      , "F-score", "Recall", "Precision"
    )
    , escape = F
  ) %>% 
  kableExtra::kable_styling(font_size = 12) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>% 
  kableExtra::add_header_above(c(" "=2, "Evaluation Metric" = 3)) %>% 
  kableExtra::footnote(
    symbol = paste0(
      "the spectral+structural group has the `spectral_weight` parameter set to '"
      , best_spectral_weight, "'"
    )
  )
Table 6.3: parameter combinations that achieved the best slash pile detection results
Evaluation Metric
. method F-score Recall Precision
Top structural only : max_ht_m=4, max_area_m2=90, convexity_pct=0.5, circle_fit_iou_pct=0.6 structural only 70% 80% 62%
structural+spectral 83% 80% 85%
Top structural+spectral : max_ht_m=4, max_area_m2=70, convexity_pct=0.4, circle_fit_iou_pct=0.5 structural only 65% 88% 51%
structural+spectral 83% 88% 79%
* the spectral+structural group has the spectral_weight parameter set to ‘4’

looking at the best-performing methods:

  • the top structural-only method achieved an F-score of 70%
  • when spectral data was included with spectral_weight of “4”, the F-score for this same structural setting increased to 83%
  • the absolute best-performing method using both structural and spectral data with spectral_weight of “4” achieved an F-score of 83%

this clearly demonstrates the value of integrating spectral information for more accurate slash pile identification.

6.5.4 Best results: form quantification

we can cut to the chase and just look at the parameter combinations that achieved the best results based on the form quantification

pct_rank_th_temp <- 0.998
df_temp <-
  param_combos_spectral_gt_agg %>% 
  dplyr::arrange(pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape) %>% 
  dplyr::mutate(dplyr::across(
    .cols = c(pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape)
    , .fn = dplyr::percent_rank
    , .names = "pct_rank_{.col}"
  )) %>% 
  # now get the max of these pct ranks by row
  dplyr::rowwise() %>% 
  dplyr::mutate(
    pct_rank_max = max(
      dplyr::c_across(
        tidyselect::starts_with("pct_rank")
      )
      , na.rm = T
    )
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    rank_lab = dplyr::case_when(
      pct_rank_pct_diff_volume_field_mape<=(1-pct_rank_th_temp) ~ 1
      , pct_rank_pct_diff_area_field_mape<=(1-pct_rank_th_temp) ~ 2
      , pct_rank_pct_diff_height_mape<=(1-pct_rank_th_temp) ~ 3
    ) %>% 
      factor(
        ordered = T
        , levels = 1:3
        , labels = c(
          paste0("top ", scales::percent(ceiling((1-pct_rank_th_temp)*100)/100, accuracy = 1), " MAPE Volume field")
          , paste0("top ", scales::percent(ceiling((1-pct_rank_th_temp)*100)/100, accuracy = 1), " MAPE Area field")
          , paste0("top ", scales::percent(ceiling((1-pct_rank_th_temp)*100)/100, accuracy = 1), " MAPE Height")
        )
      )
  ) %>% 
  dplyr::filter(
    !is.na(rank_lab) # pct_rank_max>=0.99
  ) %>%
  dplyr::mutate(
    lab = stringr::str_c(max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct,spectral_weight, sep = ":")
  ) 
df_temp %>% 
  tidyr::pivot_longer(
    cols = c(pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape)
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  dplyr::mutate(
    metric = dplyr::case_when(
        metric == "pct_diff_volume_field_mape" ~ 1
        , metric == "pct_diff_area_field_mape" ~ 2
        , metric == "pct_diff_height_mape" ~ 3
      ) %>% 
      factor(
        ordered = T
        , levels = 1:3
        , labels = c(
          "MAPE Volume field"
          , "MAPE Area field"
          , "MAPE Height"
        )
      )
    , lab = forcats::fct_reorder(lab, pct_rank_pct_diff_volume_field_mape) %>% forcats::fct_rev()
    , val_lab = scales::percent(value, accuracy = 0.1)
  ) %>%
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = value, y = lab, fill = metric, label = val_lab)
  ) +
  ggplot2::geom_col(width = 0.6) +
  ggplot2::geom_text(color = "black", size = 2, hjust = -0.2) +
  ggplot2::scale_fill_manual(values = pal_get_pairs_fn(6)[1:5][c(1,3,5)]) +
  ggplot2::scale_x_continuous(
    labels = scales::percent_format(accuracy = 1)
    , limits = c(0,1.05)
    # , expand = expansion(mult = c(0, .08))
  ) +
  ggplot2::facet_grid(cols = dplyr::vars(metric), rows = dplyr::vars(rank_lab), scales = "free_y") +
  ggplot2::labs(
    x = "", y = "max_ht_m : max_area_m2 : convexity_pct : circle_fit_iou_pct : spectral_weight"
    , fill = ""
    , subtitle = "top (and bottom) parameter combinations by evaluation metrics"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text = ggplot2::element_text(size = 6)
  )

let’s make a table of these results

df_temp %>% 
  dplyr::select(
    rank_lab
    , max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct,spectral_weight
    , pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::arrange(rank_lab,pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape) %>% 
  dplyr::mutate(dplyr::across(
    .cols = c(pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape)
    , .fn = ~ scales::percent(.x, accuracy = 1)
  )) %>% 
  dplyr::mutate(blank= "   " ) %>% 
  dplyr::relocate(blank, .before = pct_diff_volume_field_mape) %>% 
  kableExtra::kbl(
    caption = "parameter combinations that achieved the best slash pile form quantification results"
    , col.names = c(
      "."
      ,"max_ht_m","max_area_m2","convexity_pct","circle_fit_iou_pct","spectral_weight"
      , "   "
      , "MAPE Volume field"
      , "MAPE Area field"
      , "MAPE Height"
    )
    , escape = F
  ) %>% 
  kableExtra::kable_styling(font_size = 10.5) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>% 
  kableExtra::add_header_above(c(" "=7, "Evaluation Metric" = 3))
Table 6.4: parameter combinations that achieved the best slash pile form quantification results
Evaluation Metric
. max_ht_m max_area_m2 convexity_pct circle_fit_iou_pct spectral_weight MAPE Volume field MAPE Area field MAPE Height
top 1% MAPE Volume field 5 50 0.4 0.8 5 22% 30% 17%
5 70 0.7 0.8 5 23% 27% 20%
4 120 0.8 0.8 5 24% 23% 15%
5 120 0.6 0.8 5 25% 27% 20%
5 70 0.7 0.8 1 26% 30% 23%
5 70 0.7 0.8 2 26% 30% 23%
5 70 0.7 0.8 3 26% 30% 23%
5 70 0.7 0.8 4 26% 30% 23%
5 70 0.7 0.8 0 26% 30% 23%
5 120 0.5 0.8 1 26% 28% 19%
5 120 0.5 0.8 2 26% 28% 19%
5 120 0.5 0.8 3 26% 28% 19%
5 120 0.5 0.8 4 26% 28% 19%
5 120 0.5 0.8 5 26% 28% 19%
5 120 0.5 0.8 0 26% 28% 19%
6 40 0.8 0.8 5 27% 23% 14%
top 1% MAPE Area field 5 70 0.5 0.8 1 42% 18% 18%
5 70 0.5 0.8 2 42% 18% 18%
5 70 0.5 0.8 3 42% 18% 18%
5 70 0.5 0.8 4 42% 18% 18%
5 70 0.5 0.8 0 42% 18% 18%
4 50 0.7 0.8 1 46% 18% 20%
4 50 0.7 0.8 2 46% 18% 20%
4 50 0.7 0.8 3 46% 18% 20%
4 50 0.7 0.8 4 46% 18% 20%
4 50 0.7 0.8 0 46% 18% 20%
4 50 0.6 0.8 1 51% 18% 19%
4 50 0.6 0.8 2 51% 18% 19%
4 50 0.6 0.8 3 51% 18% 19%
4 50 0.6 0.8 4 51% 18% 19%
4 50 0.6 0.8 0 51% 18% 19%
5 110 0.5 0.8 1 60% 17% 19%
5 110 0.5 0.8 2 60% 17% 19%
5 110 0.5 0.8 3 60% 17% 19%
5 110 0.5 0.8 4 60% 17% 19%
5 110 0.5 0.8 0 60% 17% 19%
top 1% MAPE Height 4 40 0.8 0.8 5 38% 24% 13%
6 130 0.7 0.8 1 41% 22% 13%
6 130 0.7 0.8 2 41% 22% 13%
6 130 0.7 0.8 3 41% 22% 13%
6 130 0.7 0.8 4 41% 22% 13%
6 130 0.7 0.8 0 41% 22% 13%
4 40 0.8 0.8 1 44% 22% 12%
4 40 0.8 0.8 2 44% 22% 12%
4 40 0.8 0.8 3 44% 22% 12%
4 40 0.8 0.8 4 44% 22% 12%
4 40 0.8 0.8 0 44% 22% 12%
4 80 0.6 0.8 1 46% 24% 13%
4 80 0.6 0.8 2 46% 24% 13%
4 80 0.6 0.8 3 46% 24% 13%
4 80 0.6 0.8 4 46% 24% 13%
4 80 0.6 0.8 5 46% 24% 13%
4 80 0.6 0.8 0 46% 24% 13%

let’s use the value of spectral_weight that most frequently results in the best pile detection based on F-score (“4”) to compare results to not using any spectral data (i.e. just using structural data as outlined in our method here). We’ll then average across all other factors to look at the main effect of spectral_weight after creating a dichotomous factor of structural only versus structural+spectral

# quick mod
dep_vars_temp <- c("pct_diff_volume_field_mape", "pct_diff_area_field_mape", "pct_diff_height_mape")
# list of formulas
formulas_temp <- dep_vars_temp %>% 
  purrr::map(~stats::reformulate(c("spectral_weight_fact","-1"), response = .x))
names(formulas_temp) <- dep_vars_temp
# formulas_temp
models_temp <- formulas_temp %>% 
  purrr::map(\(x) lm(formula = x, data = combos_df_temp))
names(models_temp) <- dep_vars_temp
# models_temp
# combine into df
pred_df_temp <- purrr::map_df(models_temp, broom::tidy, .id = "metric") %>% 
  dplyr::select(-c(statistic)) %>% 
  dplyr::mutate(
    dep_var = dplyr::case_when(
        metric == "pct_diff_volume_field_mape" ~ 1
        , metric == "pct_diff_area_field_mape" ~ 2
        , metric == "pct_diff_height_mape" ~ 3
      ) %>% 
      factor(
        ordered = T
        , levels = 1:3
        , labels = c(
          "MAPE Volume field"
          , "MAPE Area field"
          , "MAPE Height"
        )
      )
    , term = stringr::str_remove_all(term, "spectral_weight_fact") %>% factor()
    # to obtain the 95% confidence interval...
      # ...1.96 times the standard error is added and subtracted from the sample mean
    , ul = estimate+std.error*1.96
    , ll = estimate-std.error*1.96
  ) %>%
  dplyr::rename(spectral_weight_fact=term) %>% 
  dplyr::relocate(dep_var) %>% 
  dplyr::arrange(dep_var,spectral_weight_fact)

# pred_df_temp
# table it
pred_df_temp %>% 
  dplyr::select(-c(metric,ul,ll)) %>% 
  dplyr::mutate(
    estimate = scales::percent(estimate, accuracy = 0.1)
    , std.error = scales::percent(std.error, accuracy = 0.01)
    , p.value = ifelse(
        p.value < 0.001
        , "< 0.001"
        , scales::comma(p.value, accuracy = 0.0001)
      )
  ) %>% 
  kableExtra::kbl(
    caption = "Predictions for performance metrics by detection method<br>averaging across all other structural detection parameters"
    , col.names = c(".","method","predicted mean", "SE", "p-value")
  ) %>%
  kableExtra::kable_styling() %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top")
Table 6.5: Predictions for performance metrics by detection method
averaging across all other structural detection parameters
. method predicted mean SE p-value
MAPE Volume field structural only 39.7% 1.10% < 0.001
structural+spectral 39.7% 1.10% < 0.001
MAPE Area field structural only 28.7% 0.12% < 0.001
structural+spectral 28.7% 0.12% < 0.001
MAPE Height structural only 20.2% 0.44% < 0.001
structural+spectral 20.2% 0.44% < 0.001

plot the change from including spectral data to the pile detection methodology

# plot
ggplot2::ggplot(
  data = pred_df_temp
  , mapping = ggplot2::aes(x = spectral_weight_fact)
) +
  ggplot2::geom_line(
    data = combos_df_temp %>% 
      dplyr::slice_sample(prop = 0.6) %>% 
      tidyr::pivot_longer(
        cols = c(pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape)
        , names_to = "metric"
        , values_to = "value"
      ) %>% 
      dplyr::mutate(
        dep_var = dplyr::case_when(
          metric == "pct_diff_volume_field_mape" ~ 1
          , metric == "pct_diff_area_field_mape" ~ 2
          , metric == "pct_diff_height_mape" ~ 3
        ) %>% 
        factor(
          ordered = T
          , levels = 1:3
          , labels = c(
            "MAPE Volume field"
            , "MAPE Area field"
            , "MAPE Height"
          )
        )
      )
    , mapping = ggplot2::aes(y = value, group = combo, color = metric)
    , lwd = 0.6, alpha = 0.1
    # , color = pal_eval_metric[1]
  ) +
  geom_ribbon(
    mapping = ggplot2::aes(ymin = ll, ymax = ul, group = 1)
    , fill = "black", alpha = 0.2
  ) +
  ggplot2::geom_line(
    mapping = ggplot2::aes(y = estimate, group = 1)
    , lwd = 1, color = "black" # pal_eval_metric[1]
  ) +
  ggplot2::geom_text(
    mapping = ggplot2::aes(y=estimate, label = scales::percent(estimate,accuracy=0.1), group = 1)
    , vjust = -0.25
  ) +
  ggplot2::facet_wrap(facets = dplyr::vars(dep_var)) +
  ggplot2::scale_color_manual(values = pal_get_pairs_fn(6)[1:5][c(1,3,5)]) +
  ggplot2::scale_y_continuous(limits = c(0,1), labels = scales::percent, breaks = scales::breaks_extended(10)) +
  ggplot2::labs(x = "", y = "", color = "", fill = "") +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    # , panel.grid.major.x = ggplot2::element_line(color = "gray33")
  )

the spectral data does not alter the quantification of slash pile form. this is because spectral information is used solely to filter candidate piles, meaning it neither reshapes existing ones nor introduces new detections.