Section 7 Sensitivity Testing Method
In this section we introduce a slash pile detection workflow for parameter sensitivity testing that produces a set of point estimates of the method’s detection and quantification accuracy. In the next section, we’ll run this workflow with CHM data of different resolutions as input to generate the data used to evaluate the method’s performance.
Parameter sensitivity testing is a systematic process for evaluating how changes to the specific thresholds and settings within the detection methodology impact the final results. Since our method is rules-based and does not use training data, its performance is highly dependent on these manually defined parameters. The objective of this testing is to understand the robustness of the methodology and identify the optimal combination of settings that yields the best detection performance, balancing factors like detection rate (recall), accuracy of positive predictions (precision), and form quantification (e.g., height or diameter MAPE and RMSE).
The primary objective of the sensitivity testing is to obtain a set of point estimates for both the detection and quantification accuracy metrics, which will then serve as the input dataset for subsequent statistical modeling to quantify the influence of parameters and input data on accuracy. Sensitivity testing is performed by repeatedly executing the entire rules-based detection method while systematically varying the values of one or more of its parameters (such as minimum area or maximum height) to generate a range of empirical results. The results of the sensitivity tests are individual point estimates of detection accuracy (F-score) and quantification accuracy (e.g., RMSE and MAPE) for each parameter combination. These sensitivity test results are distinct from the statistical modeling, which does not calculate accuracy itself but instead uses these point estimates as dependent variables to model the functional relationship between the tested detection parameters and the resulting accuracy. This modeling approach will provide insight into the potentially complex interactions between the input data and parameter settings, allowing us to generalize which parameter combinations optimize the methodology’s performance and to quantify parameter influence probabilistically while accounting for uncertainty.
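for concreteness, here is a minimal sketch (with made-up counts and differences, not actual results) of how these point estimates are computed; the confusion_matrix_scores_fn() and agg_ground_truth_match() functions we use below are the authoritative versions
# minimal sketch with made-up values: detection accuracy from TP/FP/FN counts
tp_n <- 83; fp_n <- 33; fn_n <- 38
precision <- tp_n / (tp_n + fp_n) # accuracy of positive predictions
recall <- tp_n / (tp_n + fn_n) # detection rate
f_score <- 2 * (precision * recall) / (precision + recall)
# quantification accuracy from paired predicted vs. ground truth measurements (m)
pred_ht <- c(2.1, 3.4, 1.8) # made-up predicted heights
gt_ht <- c(2.3, 3.0, 2.0) # made-up ground truth heights
rmse <- sqrt(mean((pred_ht - gt_ht)^2)) # root mean squared error
me <- mean(pred_ht - gt_ht) # mean error
mape <- mean(abs((gt_ht - pred_ht) / gt_ht)) # mean absolute percentage error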
here are the general steps for sensitivity testing and statistical modeling that we’ll follow:
- Define Parameter Ranges and Increments: For each parameter in the rules-based method, determine a reasonable range of values to test and the step size for incrementing through that range. For example, if a threshold is currently 0.5, you might test from 0.3 to 0.7 in increments of 0.05.
- Automate the Detection Workflow: Create a script or automated process that can run the entire rules-based slash pile detection method using different combinations of these parameter values.
- Execute Tests: Run the automated workflow for each defined parameter combination.
- Collect Performance Metrics: For every run, calculate the key performance metrics against the ground truth data (e.g., recall, precision, F-score, and height/diameter MAPE). These calculated metrics will be used as the point estimates for our statistical models.
- Statistical Modeling: Models will be built with the detection and quantification accuracy metrics as the dependent variables. These models will provide insight into the complex relationships between different input data and parameter settings. They will be used to help visualize trends, identify sweet spots where performance is maximized, and understand trade-offs. For example, increasing recall might decrease precision, or one parameter change might be particularly impactful on form quantification accuracy.
- Select Optimal Parameters: Based on the statistical analysis and the specific goals of the project (e.g., maximizing detection accuracy versus maximizing overall accuracy F-score or maximizing form quantification accuracy), select the parameter set that provides the most desirable performance balance.
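conceptually, the whole procedure reduces to a loop like the sketch below, where detect_and_score() is a hypothetical stand-in for the detection and validation steps we build in the rest of this section and param_combos_df is the parameter grid we define next
# hypothetical sketch: one row of accuracy point estimates per parameter combination
sensitivity_results <-
1:nrow(param_combos_df) %>%
purrr::map(\(x)
detect_and_score(param_combos_df[x, ]) # hypothetical wrapper returning f_score, rmse, mape, ...
) %>%
dplyr::bind_rows()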
7.1 Workflow over parameter combinations
let’s define parameter ranges and increments for our sensitivity testing and store all combinations in a data frame (4 max heights × 6 max areas × 5 convexity values × 5 circle-fit values, with min area fixed = 600 combinations)
param_combos_df <-
tidyr::crossing(
max_ht_m = seq(from = 2, to = 5, by = 1) # set the max expected pile height (could also do a minimum??)
, min_area_m2 = c(2) # fixed in workflow defined below so only use one value
, max_area_m2 = seq(from = 10, to = 60, by = 10) # set the max expected pile area
, convexity_pct = seq(from = 0.1, to = 0.9, by = 0.2) # min required overlap between the predicted pile and the convex hull of the predicted pile
, circle_fit_iou_pct = seq(from = 0.1, to = 0.9, by = 0.2)
) %>%
dplyr::mutate(rn = dplyr::row_number()) %>%
dplyr::relocate(rn)
# huh?
param_combos_df %>% dplyr::glimpse()
## Rows: 600
## Columns: 6
## $ rn <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ max_ht_m <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ min_area_m2 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ max_area_m2 <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
## $ convexity_pct <dbl> 0.1, 0.1, 0.1, 0.1, 0.1, 0.3, 0.3, 0.3, 0.3, 0.3, 0…
## $ circle_fit_iou_pct <dbl> 0.1, 0.3, 0.5, 0.7, 0.9, 0.1, 0.3, 0.5, 0.7, 0.9, 0…
to automate the detection workflow we could simply map these combinations over our slash_pile_detect_watershed() function. however, that would mean running the same watershed segmentation process repeatedly with the same settings. instead, let’s build functions that perform the detection workflow efficiently by reusing a single watershed segmentation and circle fit across the different filtering combinations.
#########################################################################
# 1)
# function to apply watershed segmentation over a list of different maximum height thresholds (`max_ht_m`), each of which determines the "slice" of the CHM to use
#########################################################################
chm_watershed_seg_fn <- function(chm_rast,max_ht_m) {
# get unique hts
max_ht_m <- unique(as.numeric(max_ht_m))
max_ht_m <- max_ht_m[!is.na(max_ht_m)]
if(
dplyr::coalesce(length(max_ht_m),0)<1
){stop("could not detect `max_ht_m` parameter setting which should be numeric list")}
# checks
if(!inherits(chm_rast,"SpatRaster")){stop("`chm_rast` must be raster data with the class `SpatRaster` ")}
# just get the first layer
chm_rast <- chm_rast %>% terra::subset(subset = 1)
# # first, calculate the area of each cell
# area_rast <- terra::cellSize(chm_rast)
# names(area_rast) <- "area_m2"
# map over the max_ht_m to get the raster slice
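# note: terra::clamp() with values = F sets cells outside [0, upper] to NA, which creates the CHM "slice" at each max height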
chm_ret_rast <- max_ht_m %>%
purrr::map(\(x)
terra::clamp(chm_rast, upper = x, lower = 0, values = F)
) %>%
terra::rast()
# name
names(chm_ret_rast) <- as.character(max_ht_m)
# chm_ret_rast
# chm_ret_rast %>% terra::subset(1) %>% terra::plot()
# chm_ret_rast %>% terra::subset(2) %>% terra::plot()
# chm_rast[[1]]
# # map over the volume
# # then, multiply area by the CHM (elevation) for each cell to get a raster with cell volumes
# vol_ret_rast <-
# 1:terra::nlyr(chm_ret_rast) %>%
# purrr::map(function(x){
# vol_rast <- area_rast*chm_ret_rast[[x]]
# names(vol_rast) <- "volume_m3"
# return(vol_rast)
# }) %>%
# terra::rast()
# # name
# names(vol_ret_rast) <- as.character(max_ht_m)
# map over the watershed
# let's run watershed segmentation using `lidR::watershed()` which is based on the Bioconductor package `EBImage`
# return is a raster with the first layer representing the identified watershed segments
ret_rast <-
1:terra::nlyr(chm_ret_rast) %>%
purrr::map(\(x)
lidR::watershed(
chm = chm_ret_rast[[x]]
, th_tree = 0.1
)()
) %>%
terra::rast()
# name
names(ret_rast) <- as.character(max_ht_m)
# ret_rast
# ret_rast %>% terra::subset(1) %>% terra::as.factor() %>% terra::plot()
# ret_rast %>% terra::subset(2) %>% terra::as.factor() %>% terra::plot()
return(list(
chm_rast = chm_ret_rast # length = length(max_ht_m)
# , area_rast = area_rast # length = 1
# , volume_rast = vol_ret_rast # length = length(max_ht_m)
, watershed_rast = ret_rast # length = length(max_ht_m)
))
}
# xxx <- chm_watershed_seg_fn(
# cloud2raster_ans$chm_rast %>%
# terra::crop(
# stand_boundary %>%
# dplyr::slice(1) %>%
# sf::st_buffer(-80) %>%
# sf::st_transform(terra::crs(cloud2raster_ans$chm_rast)) %>%
# terra::vect()
# )
# ,c(4,5)
# )
# xxx$chm_rast %>% terra::subset( as.character(unique(c(4,5))[1]) ) %>% terra::plot()
# xxx$watershed_rast %>% terra::subset( as.character(unique(c(4,5))[1]) ) %>% terra::as.factor() %>% terra::plot()
# # xxx$volume_rast %>% terra::subset( as.character(unique(c(4,5))[1]) ) %>% terra::plot()
# # xxx$area_rast[[1]] %>% terra::minmax()
# # xxx$area_rast %>% terra::plot()
#########################################################################
# 2)
# now we need to define a function to apply the filters to the resulting watershed segmented rasters
# based on a data frame of different combinations of filters for irregularity using the comparison to the convex hull
#########################################################################
param_combos_detect_convexity_fn <- function(
chm_watershed_seg_fn_ans
#### height and area thresholds for the detected piles
# these should be based on data from the literature or expectations based on the prescription
, max_ht_m = 4 # set the max expected pile height (could also do a minimum??)
#### irregularity filtering
# 1 = perfectly convex (no inward angles); values near 0 = highly irregular (many inward angles)
# values closer to 1 remove more irregular segments;
# values closer to 0 keep more irregular segments (and also regular segments)
# these will all be further filtered for their circularity and later smoothed to remove blocky edges
# and most inward angles by applying a convex hull to the original detected segment
, convexity_pct = 0.7 # min required overlap between the predicted pile and the convex hull of the predicted pile
) {
# extract individual elements from chm_watershed_seg_fn
chm_rast <- chm_watershed_seg_fn_ans$chm_rast %>% terra::subset( as.character(unique(max_ht_m)[1]) )
watershed_ans <- chm_watershed_seg_fn_ans$watershed_rast %>% terra::subset( as.character(unique(max_ht_m)[1]) )
names(watershed_ans) <- "pred_id"
# checks
if(!inherits(chm_rast,"SpatRaster")){stop("`chm_rast` must be raster data with the class `SpatRaster` ")}
if(!inherits(watershed_ans,"SpatRaster")){stop("`watershed_ans` must be raster data with the class `SpatRaster` ")}
########################################################################################
## convert to polys and area filters
########################################################################################
# let's convert the watershed-detected segments from raster to vector data
# and create a convex hull of the shapes for comparison
# vectors of segments
watershed_ans_poly <-
watershed_ans %>%
terra::as.polygons(round = F, aggregate = T, values = T, extent = F, na.rm = T) %>%
sf::st_as_sf() %>%
sf::st_simplify() %>%
sf::st_make_valid() %>%
dplyr::filter(sf::st_is_valid(.)) %>%
dplyr::mutate(treeID=pred_id) %>%
cloud2trees::simplify_multipolygon_crowns() %>%
dplyr::select(-treeID)
if(dplyr::coalesce(nrow(watershed_ans_poly),0)<1){
return(NULL)
}
########################################################################################
## 2) irregularity filtering
########################################################################################
# let's first filter out segments that have holes in them
# or are very irregularly shaped by comparing the area of the polygon and convex hull
# convexity_pct = min required overlap between the predicted pile and the convex hull of the predicted pile
if(convexity_pct>0){
# apply the irregularity filtering on the polygons
watershed_ans_poly <- watershed_ans_poly %>%
st_irregular_remove(pct_chull_overlap = convexity_pct)
}
if(dplyr::coalesce(nrow(watershed_ans_poly),0)==0){
return(NULL)
}else{
return(watershed_ans_poly %>% dplyr::mutate(max_ht_m = max_ht_m, convexity_pct = convexity_pct))
}
}
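as a reminder, st_irregular_remove() was defined in an earlier section; its core convexity test is roughly the sketch below (a hypothetical simplification for illustration, not the actual implementation)
# hypothetical sketch of the convexity test at the core of st_irregular_remove()
# keep polygons whose area covers at least pct_chull_overlap of their convex hull's area
st_irregular_remove_sketch <- function(sf_data, pct_chull_overlap = 0.7) {
pct_overlap <- as.numeric(sf::st_area(sf_data)) /
as.numeric(sf::st_area(sf::st_convex_hull(sf_data)))
sf_data %>% dplyr::filter(pct_overlap >= pct_chull_overlap)
}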
#########################################################################
# 3)
# function to smooth the watershed raster and CHM for the remaining segments
# and compute raster-based area, volume, and max height for each segment
#########################################################################
raster_smooth_smoother <- function(
chm_watershed_seg_fn_ans
, max_ht_m = 4
, watershed_ans_poly
, min_area_m2 = 2
) {
min_ht_m <- 0.5 ## !!!!!!!!!!!!!!!! fixed for testing
# extract individual elements from chm_watershed_seg_fn
chm_rast <- chm_watershed_seg_fn_ans$chm_rast %>% terra::subset( as.character(unique(max_ht_m)[1]) )
watershed_ans <- chm_watershed_seg_fn_ans$watershed_rast %>% terra::subset( as.character(unique(max_ht_m)[1]) )
names(watershed_ans) <- "pred_id"
# checks
if(!inherits(chm_rast,"SpatRaster")){stop("`chm_rast` must be raster data with the class `SpatRaster` ")}
if(!inherits(watershed_ans,"SpatRaster")){stop("`watershed_ans` must be raster data with the class `SpatRaster` ")}
chm_res <- max(terra::res(chm_rast)[1:2],na.rm = T)
ws_for_smooth <- ws_for_smooth_fn(chm_res = chm_res, min_area_m2 = min_area_m2) # 3 # needs to be the same for the watershed seg and CHM smooth
# search_area = (res^2) * (ws^2)
########################################################################################
## 5) raster smoothing
########################################################################################
########################################
# use the remaining segments that meet the geometric and area filtering
# to smooth the watershed raster
########################################
smooth_watershed_ans <- watershed_ans %>%
terra::mask(
watershed_ans_poly %>% #these are irregularity and area filtered already
terra::vect()
, updatevalue=NA
)
if(dplyr::coalesce(ws_for_smooth,0)>=3){
# smooths the raster using the majority value
smooth_watershed_ans <- smooth_watershed_ans %>%
terra::focal(w = ws_for_smooth, fun = "modal", na.rm = T, na.policy = "only") # only fill NA cells
}
names(smooth_watershed_ans) <- "pred_id"
########################################
# mask the chm rast to these remaining segments and smooth to match the smoothing for the segments
########################################
smooth_chm_rast <- chm_rast %>%
terra::mask(smooth_watershed_ans)
if(dplyr::coalesce(ws_for_smooth,0)>=3){
# smooths the raster to match the smoothing in the watershed segments
smooth_chm_rast <- smooth_chm_rast %>%
terra::focal(w = ws_for_smooth, fun = "mean", na.rm = T, na.policy = "only") #only for cells that are NA
}
# now mask the watershed_ans raster to only keep cells that are in the originating CHM
smooth_watershed_ans <- smooth_watershed_ans %>%
terra::mask(smooth_chm_rast)
########################################################################################
## calculate raster-based area and volume
########################################################################################
# first, calculate the area of each cell
area_rast <- terra::cellSize(smooth_chm_rast)
names(area_rast) <- "area_m2"
# area_rast %>% terra::plot()
# then, multiply area by the CHM (elevation) for each cell to get a raster with cell volumes
vol_rast <- area_rast*smooth_chm_rast
names(vol_rast) <- "volume_m3"
# vol_rast %>% terra::plot()
# sum area within each segment to get the total area
area_df <- terra::zonal(x = area_rast, z = smooth_watershed_ans, fun = "sum", na.rm = T)
# sum volume within each segment to get the total volume
vol_df <- terra::zonal(x = vol_rast, z = smooth_watershed_ans, fun = "sum", na.rm = T)
# max ht within each segment to get the max ht
ht_df <- terra::zonal(x = smooth_chm_rast, z = smooth_watershed_ans, fun = "max", na.rm = T) %>%
dplyr::rename(max_height_m=2)
# let's convert the smoothed and filtered watershed-detected segments from raster to vector data
# vectors of segments
watershed_ans_poly <-
smooth_watershed_ans %>%
terra::as.polygons(round = F, aggregate = T, values = T, extent = F, na.rm = T) %>%
sf::st_as_sf() %>%
sf::st_simplify() %>%
sf::st_make_valid() %>%
dplyr::filter(sf::st_is_valid(.)) %>%
dplyr::mutate(treeID=pred_id) %>%
cloud2trees::simplify_multipolygon_crowns() %>%
dplyr::select(-treeID)
# add area and volume to our vector data
# we'll do this with a slick trick to perform multiple joins succinctly using purrr::reduce
watershed_ans_poly <-
purrr::reduce(
list(watershed_ans_poly, area_df, vol_df, ht_df)
, dplyr::left_join
, by = 'pred_id'
) %>%
dplyr::mutate(
volume_per_area = volume_m3/area_m2
) %>%
dplyr::filter(
dplyr::coalesce(max_height_m,0) >= min_ht_m
)
# # filter out the segments that don't meet the size thresholds
# dplyr::filter(
# dplyr::coalesce(area_m2,0) >= min_area_m2
# & dplyr::coalesce(area_m2,0) <= max_area_m2
# ) %>%
# # do one more pass of the irregularity filtering
# st_irregular_remove(pct_chull_overlap = convexity_pct)
if(dplyr::coalesce(nrow(watershed_ans_poly),0)==0){
return(NULL)
}else{
return(watershed_ans_poly)
}
}
#########################################################################
# 4)
# function to run the whole detection workflow over all parameter combinations
#########################################################################
param_combos_piles_detect_fn <- function(
chm
, param_combos_df
, smooth_segs = T
, ofile = "../data/param_combos_piles.gpkg"
) {
########################################################################################
## 1) chm slice and watershed segmentation
########################################################################################
# apply chm_watershed_seg_fn
### takes ~37 mins
chm_watershed_seg_ans <- chm_watershed_seg_fn(chm, max_ht_m = unique(param_combos_df$max_ht_m))
# # what did we get?
# chm_watershed_seg_ans %>%
# terra::subset( as.character(unique(param_combos_df$max_ht_m)[3]) ) %>%
# terra::plot()
########################################################################################
## 2) irregularity filter
########################################################################################
# just get the unique combinations needed for this param_combos_detect_convexity_fn function
param_combos_convexity_df <-
param_combos_df %>%
dplyr::distinct(max_ht_m,convexity_pct)
# apply this using our data frame to map over the combinations
### takes ~40 mins
param_combos_detect_convexity_ans <-
1:nrow(param_combos_convexity_df) %>%
purrr::map(\(x)
param_combos_detect_convexity_fn(
chm_watershed_seg_fn_ans = chm_watershed_seg_ans
, max_ht_m = param_combos_convexity_df$max_ht_m[x]
, convexity_pct = param_combos_convexity_df$convexity_pct[x]
)
, .progress = "convexity filtering"
) %>%
dplyr::bind_rows()
# param_combos_detect_convexity_ans %>% dplyr::glimpse()
# param_combos_detect_convexity_ans %>% sf::st_drop_geometry() %>% dplyr::count(max_ht_m,convexity_pct)
# param_combos_detect_convexity_ans %>%
# dplyr::mutate(area_xxxx = sf::st_area(.) %>% as.numeric()) %>%
# sf::st_drop_geometry() %>%
# dplyr::group_by(max_ht_m,convexity_pct) %>%
# dplyr::summarise(dplyr::across(area_xxxx, list(min = min, max = max), .names = "{.col}.{.fn}"))
########################################################################################
## 3) area filter
########################################################################################
# now just join to filter for area
# expands to row unique by max_ht_m,convexity_pct,min_area_m2,max_area_m2,pred_id
param_combos_area <-
param_combos_detect_convexity_ans %>%
dplyr::mutate(area_xxxx = sf::st_area(.) %>% as.numeric()) %>%
dplyr::inner_join(
# add area ranges
param_combos_df %>%
dplyr::distinct(max_ht_m,convexity_pct,min_area_m2,max_area_m2)
, by = dplyr::join_by(
max_ht_m, convexity_pct
, area_xxxx>=min_area_m2
, area_xxxx<=max_area_m2
)
, relationship = "many-to-many"
) %>%
dplyr::select(-area_xxxx)
# dplyr::filter(
# area_xxxx>=min_area_m2
# , area_xxxx<=max_area_m2
# )
# param_combos_area %>% dplyr::glimpse()
# param_combos_area %>%
# sf::st_drop_geometry() %>%
# dplyr::group_by(max_ht_m,convexity_pct,min_area_m2,max_area_m2) %>%
# dplyr::summarise(dplyr::across(area_xxxx, list(min = min, max = max), .names = "{.col}.{.fn}"))
# !!!!!!!!!!!!!!!!!!!!!!geometries are unique by max_ht_m,pred_id!!!!!!!!!!!!!!!!!!!!!!
# param_combos_area %>%
# dplyr::filter(
# pred_id == param_combos_area$pred_id[111]
# , max_ht_m == param_combos_area$max_ht_m[111]
# ) %>%
# dplyr::mutate(lab = stringr::str_c(max_ht_m,convexity_pct,min_area_m2,max_area_m2, sep = ":")) %>%
# ggplot() + geom_sf(aes(linetype = lab, color = lab),fill=NA) + facet_wrap(facets = dplyr::vars(lab)) + theme_void()
# param_combos_area %>%
# dplyr::group_by(max_ht_m,pred_id) %>%
# dplyr::filter(dplyr::row_number()==1) %>%
# dplyr::select(max_ht_m,pred_id) %>%
# dplyr::ungroup() %>%
# # dplyr::glimpse()
# ggplot() + geom_sf(aes(color = factor(max_ht_m)),fill=NA) + facet_wrap(facets = dplyr::vars(max_ht_m))
########################################################################################
## 4) circle filter
########################################################################################
# !!!!!!!!!!!!!!!!!!!!!!geometries are unique by max_ht_m,pred_id!!!!!!!!!!!!!!!!!!!!!!
unique_geometries <- param_combos_area %>%
dplyr::group_by(max_ht_m,pred_id) %>%
dplyr::filter(dplyr::row_number()==1) %>%
dplyr::ungroup() %>%
dplyr::select(max_ht_m,pred_id) %>%
dplyr::mutate(record_id = dplyr::row_number())
# param_combos_detect_convexity_ans %>% dplyr::glimpse()
# and we'll apply the circle fitting to this spatial data
# apply the sf_data_circle_fit() which takes each segment polygon, transforms it to points, and the fits the best circle
unique_geometries_cf <- sf_data_circle_fit(unique_geometries)
# param_combos_detect_cf_ans %>% sf::st_write("../data/param_combos_detect_cf_ans.gpkg")
# param_combos_detect_cf_ans %>% dplyr::glimpse()
# nrow(param_combos_detect_cf_ans)
# nrow(param_combos_detect_convexity_ans)
# we'll use the IoU function we defined
# we map over this to only compare each segment to its own best circle fit...not all
# we should consider doing this in bulk.....another day
param_combos_detect_cf_iou <-
unique_geometries$record_id %>%
purrr::map(\(x)
ground_truth_single_match(
gt_inst = unique_geometries %>%
dplyr::filter(record_id == x)
, gt_id = "record_id"
, predictions = unique_geometries_cf %>%
dplyr::filter(record_id == x) %>%
dplyr::select(record_id) %>% # keeping other columns causes error?
dplyr::rename(circ_record_id = record_id)
, pred_id = "circ_record_id"
, min_iou_pct = 0 # set to 0 just to return pct
)
) %>%
dplyr::bind_rows()
# param_combos_detect_cf_iou %>% dplyr::glimpse()
### now we need a function to map over all the different filters for circularity by combination of the other settings
### this is going to be highly custom to this particular task :\
param_combos_circle_fit <-
param_combos_area %>%
# expands row to unique by max_ht_m,min_area_m2,max_area_m2,convexity_pct,circle_fit_iou_pct,pred_id
dplyr::inner_join(
param_combos_df
, by = dplyr::join_by(max_ht_m,min_area_m2,max_area_m2,convexity_pct)
, relationship = "many-to-many"
) %>%
# join unique geoms
dplyr::inner_join(
unique_geometries %>%
sf::st_drop_geometry() %>%
dplyr::select(max_ht_m,pred_id,record_id)
, by = dplyr::join_by(max_ht_m,pred_id)
) %>%
# join circle fits
dplyr::left_join(
param_combos_detect_cf_iou %>% dplyr::select(record_id,iou)
, by = "record_id"
) %>%
dplyr::filter(dplyr::coalesce(iou,0)>=circle_fit_iou_pct)
############################################################
# get unique sets of pred_id by height so we only need to smooth the raster for these sets
############################################################
rn_geom_lookup <- param_combos_circle_fit %>%
sf::st_drop_geometry() %>%
dplyr::arrange(rn,max_ht_m,pred_id) %>%
dplyr::group_by(rn,max_ht_m) %>%
dplyr::summarise(preds = paste(sort(pred_id), collapse = "_"),n=dplyr::n()) %>%
dplyr::ungroup() %>%
dplyr::arrange(max_ht_m,desc(n))
# now just get the unique sets by max_ht_m
rn_geom_lookup <- rn_geom_lookup %>%
dplyr::inner_join(
rn_geom_lookup %>%
dplyr::distinct(max_ht_m,preds) %>%
dplyr::mutate(set_id = dplyr::row_number())
, by = dplyr::join_by(max_ht_m,preds)
) %>%
dplyr::select(set_id,rn,max_ht_m)
# rn_geom_lookup %>% dplyr::glimpse()
# now just get the unique max_ht_m and actual geoms
geom_sets <-
param_combos_circle_fit %>%
dplyr::select(rn,max_ht_m,pred_id) %>%
dplyr::inner_join(
rn_geom_lookup %>%
dplyr::group_by(set_id) %>%
dplyr::filter(dplyr::row_number()==1) %>%
dplyr::ungroup() %>%
dplyr::select(rn,set_id)
, by = dplyr::join_by(rn)
)
# dplyr::relocate(set_id) %>%
# dplyr::arrange(set_id,pred_id) %>%
# dplyr::glimpse()
########
########################################################################################
## 6) raster smooth
########################################################################################
geom_sets_smooth <-
geom_sets$set_id %>%
unique() %>%
purrr::map(\(x)
raster_smooth_smoother(
chm_watershed_seg_fn_ans = chm_watershed_seg_ans
, max_ht_m = (geom_sets %>% sf::st_drop_geometry() %>% dplyr::filter(set_id==x) %>% dplyr::slice(1) %>% dplyr::pull(max_ht_m))
, watershed_ans_poly = geom_sets %>% dplyr::filter(set_id==x)
, min_area_m2 = min(param_combos_df$min_area_m2,na.rm=T) ##### this fixes the parameter
# .......... so this won't work if param_combos_df contains multiple values # oh well...just don't test min_area_m2
) %>%
dplyr::mutate(
set_id = x
, max_ht_m = (geom_sets %>% sf::st_drop_geometry() %>% dplyr::filter(set_id==x) %>% dplyr::slice(1) %>% dplyr::pull(max_ht_m))
)
, .progress = "smooth smoothing it"
) %>%
dplyr::bind_rows()
# geom_sets_smooth %>% dplyr::glimpse()
# ggplot() +
# geom_sf(
# data = geom_sets_smooth %>% dplyr::filter(set_id == geom_sets_smooth$set_id[1])
# , fill = "navy"
# , alpha = 0.8
# ) +
# geom_sf(
# data = geom_sets %>% dplyr::filter(set_id == geom_sets_smooth$set_id[1])
# , fill = NA
# , color = "gold"
# ) +
# theme_void()
########################################################################################
## 6.3) second irregular filter
########################################################################################
# get set, convexity combinations to map over st_irregular_remove
set_convexity_combo <-
rn_geom_lookup %>%
# add in convexity pct to get unique set_id, convexity
dplyr::inner_join(
param_combos_df %>%
dplyr::select(rn,convexity_pct)
, by = "rn"
) %>%
dplyr::distinct(set_id,convexity_pct)
# filter for convexity
geom_sets_convexity <-
1:nrow(set_convexity_combo) %>%
purrr::map(\(x)
st_irregular_remove(
sf_data = geom_sets_smooth %>% dplyr::filter(set_id==set_convexity_combo$set_id[x])
, pct_chull_overlap = set_convexity_combo$convexity_pct[x]
) %>%
dplyr::mutate(
set_id = set_convexity_combo$set_id[x]
, convexity_pct = set_convexity_combo$convexity_pct[x]
)
, .progress = "second irregularity filter"
) %>%
dplyr::bind_rows()
# row is unique by set_id,convexity_pct,pred_id
# geom_sets_convexity %>% dplyr::glimpse()
# geom_sets_convexity %>% sf::st_drop_geometry() %>% dplyr::count(set_id,convexity_pct)
########################################################################################
## 6.6) second area filter
########################################################################################
# filter for area
message("expanding to final param combos.....")
filtered_final <-
geom_sets_convexity %>%
dplyr::select(-c(max_ht_m)) %>%
# expand to full param_combos_df
dplyr::inner_join(
param_combos_df %>%
dplyr::inner_join(
rn_geom_lookup %>% dplyr::select(rn,set_id)
, by = "rn"
)
, by = dplyr::join_by(set_id,convexity_pct)
, relationship = "many-to-many"
) %>%
# filter for area
dplyr::filter(
dplyr::coalesce(area_m2,0) >= min_area_m2
& dplyr::coalesce(area_m2,0) <= max_area_m2
) %>%
dplyr::select(-c(set_id)) %>%
dplyr::relocate(names(param_combos_df))
# filtered_final %>% dplyr::glimpse()
########################################################################################
## 7) shape refinement & overlap removal
########################################################################################
# use the convex hull shapes of our remaining segments.
# This helps to smooth out the often 'blocky' edges of raster-based segments
# , which can look like they were generated in Minecraft.
# Additionally, by removing any segments with overlapping convex hull shapes,
# we can likely reduce false detections that are actually groups of small trees or shrubs,
# ensuring our results represent singular slash piles.
if(smooth_segs){
# smooth unique polygons
message("shape refinement and diametering.....")
##### just work with unique polygons
##### geom_sets_smooth row is unique by: set_id,pred_id
geom_sets_final <- geom_sets_smooth %>%
dplyr::select( -dplyr::any_of("max_ht_m") ) %>%
# join to filter geoms
dplyr::inner_join(
rn_geom_lookup %>%
dplyr::inner_join(
filtered_final %>%
sf::st_drop_geometry() %>%
dplyr::distinct(rn,pred_id)
, by = dplyr::join_by(rn)
, relationship = "one-to-many"
) %>%
# just get the unique geoms now
dplyr::distinct(set_id,pred_id)
, by = dplyr::join_by(set_id,pred_id)
, relationship = "one-to-one"
) %>%
# smooth
sf::st_convex_hull() %>%
sf::st_simplify() %>%
sf::st_make_valid() %>%
dplyr::filter(sf::st_is_valid(.)) %>%
# diameter
dplyr::ungroup() %>%
st_calculate_diameter() %>%
# now we need to re-do the volume and area calculations
dplyr::mutate(
area_m2 = sf::st_area(.) %>% as.numeric()
, volume_m3 = area_m2*volume_per_area
)
# ##### geom_sets_final row is unique by: set_id,pred_id
# geom_sets_final %>% dplyr::glimpse()
# ggplot() +
# geom_sf(
# # post smooth (should be slightly larger)
# data = geom_sets_final %>% dplyr::filter(set_id == geom_sets_final$set_id[1])
# , fill = "navy"
# , alpha = 0.8
# , color = "gray"
# , lwd = 0.2
# ) +
# theme_void()
# ggplot() +
# geom_sf(
# # post smooth (should be slightly larger)
# data = geom_sets_final %>% dplyr::filter(set_id == geom_sets_final$set_id[1])
# , fill = "navy"
# , alpha = 0.8
# , color = "gray"
# , lwd = 0.2
# ) +
# geom_sf(
# # pre-smooth
# data = geom_sets %>% dplyr::filter(set_id == geom_sets_final$set_id[1])
# , fill = NA
# , color = "gold"
# ) +
# theme_void()
# sf::st_read("c:/data/usfs/manitou_slash_piles/data/param_combos_piles_chm0.45m.gpkg") %>%
# dplyr::mutate(xxxarea_m2 = sf::st_area(.) %>% as.numeric()) %>%
# sf::st_drop_geometry() %>%
# dplyr::slice_sample(n=33333) %>%
# ggplot2::ggplot(mapping = ggplot2::aes(x = xxxarea_m2, y =area_m2)) +
# ggplot2::geom_abline(lwd = 1) +
# ggplot2::geom_point() +
# ggplot2::geom_smooth(method = "lm", se=F, color = "tomato", linetype = "dashed") +
# ggplot2::scale_color_viridis_c(option = "mako", direction = -1, alpha = 0.8) +
# ggplot2::theme_light()
# ##### geom_sets_final row is unique by: set_id,pred_id
# overlapping convex hull shapes are then removed to prevent false positives from clustered small trees or shrubs
# have to map over this to do overlaps only on one set at a time
geom_sets_final_olaps <-
geom_sets_final$set_id %>%
unique() %>%
purrr::map(\(x)
geom_sets_final %>%
dplyr::filter(set_id==x) %>%
st_remove_overlaps()
) %>%
dplyr::bind_rows()
# ##### geom_sets_final_olaps row is unique by: set_id,pred_id
# geom_sets_final %>% dplyr::glimpse()
# geom_sets_final_olaps %>% dplyr::glimpse()
# ggplot() +
# geom_sf(
# # post smooth (should be slightly larger)
# data = geom_sets_final_olaps %>% dplyr::filter(set_id == geom_sets_final_olaps$set_id[1])
# , fill = "navy"
# , alpha = 0.8
# , color = "gray"
# , lwd = 0.2
# ) +
# geom_sf(
# # pre-smooth
# data = geom_sets_final %>% dplyr::filter(set_id == geom_sets_final_olaps$set_id[1])
# , fill = NA
# , color = "gold"
# , lwd = 1
# ) +
# theme_void()
# join back to param_combos list
final_ans <- geom_sets_final_olaps %>%
# join to lookup
# expands row to unique by set_id,rn,pred_id
dplyr::inner_join(
rn_geom_lookup %>%
dplyr::select(rn,set_id)
, by = dplyr::join_by(set_id)
, relationship = "many-to-many"
) %>%
# join with filtered_final which have pred_id
dplyr::inner_join(
filtered_final %>%
sf::st_drop_geometry() %>%
dplyr::distinct(rn,pred_id)
, by = dplyr::join_by(rn,pred_id)
, relationship = "one-to-one"
) %>%
# add param_combos_df data
dplyr::inner_join(
param_combos_df
, by = "rn"
, relationship = "many-to-one"
)
# nrow(filtered_final)
# nrow(final_ans) # should be less
}else{
message("diametering.....")
##### just work with unique polygons
##### geom_sets_smooth row is unique by: set_id,pred_id
geom_sets_final <- geom_sets_smooth %>%
dplyr::select( -dplyr::any_of("max_ht_m") ) %>%
# join to filter geoms
dplyr::inner_join(
rn_geom_lookup %>%
dplyr::inner_join(
filtered_final %>%
sf::st_drop_geometry() %>%
dplyr::distinct(rn,pred_id)
, by = dplyr::join_by(rn)
, relationship = "one-to-many"
) %>%
# just get the unique geoms now
dplyr::distinct(set_id,pred_id)
, by = dplyr::join_by(set_id,pred_id)
, relationship = "one-to-one"
) %>%
# diameter
dplyr::ungroup() %>%
st_calculate_diameter()
# join back to param_combos list
final_ans <- geom_sets_final %>%
# join to lookup
# expands row to unique by set_id,rn,pred_id
dplyr::inner_join(
rn_geom_lookup %>%
dplyr::select(rn,set_id)
, by = dplyr::join_by(set_id)
, relationship = "many-to-many"
) %>%
# join with filtered_final which have pred_id
dplyr::inner_join(
filtered_final %>%
sf::st_drop_geometry() %>%
dplyr::distinct(rn,pred_id)
, by = dplyr::join_by(rn,pred_id)
, relationship = "one-to-one"
) %>%
# add param_combos_df data
dplyr::inner_join(
param_combos_df
, by = "rn"
, relationship = "many-to-one"
)
}
if(stringr::str_length(ofile)>1){
final_ans %>% sf::st_write(ofile, append = F, quiet = T)
}
# return
return(final_ans)
}
# param_combos_piles <-
# param_combos_df$rn %>%
# # sample(3) %>% ## !!!!!!!!!!!!!!!!!!!! remove after testing
# purrr::map(\(x)
# slash_pile_detect_watershed(
# chm_rast = cloud2raster_ans$chm_rast
# , max_ht_m = param_combos_df$max_ht_m[x]
# , min_area_m2 = param_combos_df$min_area_m2[x]
# , max_area_m2 = param_combos_df$max_area_m2[x]
# , convexity_pct = param_combos_df$convexity_pct[x]
# , circle_fit_iou_pct = param_combos_df$circle_fit_iou_pct[x]
# ) %>%
# dplyr::mutate(
# rn = param_combos_df$rn[x]
# , max_ht_m = param_combos_df$max_ht_m[x]
# , min_area_m2 = param_combos_df$min_area_m2[x]
# , max_area_m2 = param_combos_df$max_area_m2[x]
# , convexity_pct = param_combos_df$convexity_pct[x]
# , circle_fit_iou_pct = param_combos_df$circle_fit_iou_pct[x]
# )
# ) %>%
# dplyr::bind_rows()
# param_combos_piles %>%
# sf::st_write("../data/param_combos_piles.gpkg", append = F)
# # param_combos_piles <- sf::st_read("../data/param_combos_piles.gpkg")
7.1.1 Test Workflow
we’ll test our function by applying it to a sample from the data frame of parameter combinations
f_temp <- "../data/testtest_param_combos_piles.gpkg"
if(!file.exists(f_temp)){
param_combos_piles <- param_combos_piles_detect_fn(
chm = cloud2raster_ans$chm_rast
, param_combos_df = param_combos_df %>% dplyr::slice_sample(n=17)
, smooth_segs = T
, ofile = f_temp
)
# param_combos_piles %>% dplyr::glimpse()
# save it
sf::st_write(param_combos_piles, f_temp, append = F, quiet = T)
}else{
param_combos_piles <- sf::st_read(f_temp, quiet = T)
}
what did we get?
param_combos_piles %>% dplyr::glimpse()
## Rows: 8,441
## Columns: 14
## $ pred_id <dbl> 418, 991, 1745, 2416, 2624, 2640, 3111, 3160, 3482,…
## $ area_m2 <dbl> 2.56, 3.76, 3.36, 2.52, 4.86, 7.92, 3.90, 3.12, 2.4…
## $ volume_m3 <dbl> 9.913314, 12.710141, 12.714443, 4.286529, 13.488759…
## $ max_height_m <dbl> 4.990000, 4.970000, 4.933000, 4.889250, 4.873667, 4…
## $ volume_per_area <dbl> 3.872388, 3.380357, 3.784060, 1.701004, 2.775465, 3…
## $ set_id <int> 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,…
## $ diameter_m <dbl> 2.163331, 2.690725, 2.720294, 2.000000, 3.440930, 4…
## $ rn <int> 473, 473, 473, 473, 473, 473, 473, 473, 473, 473, 4…
## $ max_ht_m <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
## $ min_area_m2 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ max_area_m2 <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
## $ convexity_pct <dbl> 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0…
## $ circle_fit_iou_pct <dbl> 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0…
## $ geom <MULTIPOLYGON [m]> MULTIPOLYGON (((499555 4317..., MULTIP…
# param_combos_piles %>% sf::st_drop_geometry() %>% dplyr::count(rn) %>% dplyr::slice_head(n=6)
# terra::plot(cloud2raster_ans$chm_rast, col = viridis::plasma(100), axes = F)
# terra::plot(param_combos_piles %>% dplyr::filter(rn==param_combos_piles$rn[2222]) %>% terra::vect(),add = T, border = "gray44", col = NA, lwd = 3)
we should have predicted piles for each parameter combination tested, so a row is unique by the combination of all of the variables in the param_combos_df data frame plus the individual pile detected. we expect the number of piles detected to vary across these different parameter settings, and the number of distinct parameter settings should match the number of samples we selected.
param_combos_piles %>% sf::st_drop_geometry() %>% dplyr::count(rn)
## rn n
## 1 1 597
## 2 13 331
## 3 94 141
## 4 173 149
## 5 199 133
## 6 246 190
## 7 276 796
## 8 313 575
## 9 338 601
## 10 361 984
## 11 408 619
## 12 423 203
## 13 473 192
## 14 508 733
## 15 526 1169
## 16 559 287
## 17 588 741
which is equivalent to
param_combos_piles %>%
sf::st_drop_geometry() %>%
dplyr::count(max_ht_m,min_area_m2,max_area_m2,convexity_pct,circle_fit_iou_pct)
## max_ht_m min_area_m2 max_area_m2 convexity_pct circle_fit_iou_pct n
## 1 2 2 10 0.1 0.1 597
## 2 2 2 10 0.5 0.5 331
## 3 2 2 40 0.7 0.7 141
## 4 3 2 10 0.9 0.5 149
## 5 3 2 20 0.9 0.7 133
## 6 3 2 40 0.9 0.1 190
## 7 3 2 60 0.1 0.1 796
## 8 4 2 10 0.5 0.5 575
## 9 4 2 20 0.5 0.5 601
## 10 4 2 30 0.5 0.1 984
## 11 4 2 50 0.3 0.5 619
## 12 4 2 50 0.9 0.5 203
## 13 5 2 10 0.9 0.5 192
## 14 5 2 30 0.3 0.5 733
## 15 5 2 40 0.1 0.1 1169
## 16 5 2 50 0.3 0.7 287
## 17 5 2 60 0.5 0.5 741
7.1.2 Test Instance Matching
let’s see how our predictions compare to the ground truth data using the instance matching process we outlined in an earlier section. here we validate the segments from these parameter combinations against the ground truth data by mapping over the ground_truth_prediction_match() function to get true positive, false positive (commission), and false negative (omission) classifications for the predicted and ground truth piles
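as a refresher, the matching is driven by the intersection-over-union (IoU) of a predicted and ground truth polygon pair; the IoU calculation is roughly the minimal sketch below (the matching functions defined earlier are the authoritative versions)
# minimal IoU sketch for a single predicted (a) vs. ground truth (b) polygon pair (both sf objects)
iou_fn <- function(a, b) {
int_area <- suppressWarnings(sf::st_intersection(a, b)) %>%
sf::st_area() %>% sum() %>% as.numeric()
union_area <- sum(as.numeric(sf::st_area(a))) + sum(as.numeric(sf::st_area(b))) - int_area
int_area / union_area
}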
f_temp <- "../data/testtest_param_combos_gt.csv"
if(!file.exists(f_temp)){
param_combos_gt <-
unique(param_combos_piles$rn) %>%
purrr::map(\(x)
ground_truth_prediction_match(
ground_truth = slash_piles_polys %>%
dplyr::filter(is_in_stand) %>%
dplyr::arrange(desc(field_diameter_m)) %>%
sf::st_transform(sf::st_crs(param_combos_piles))
, gt_id = "pile_id"
, predictions = param_combos_piles %>%
dplyr::filter(rn == x) %>%
dplyr::filter(
pred_id %in% (param_combos_piles %>%
dplyr::filter(rn == x) %>%
sf::st_intersection(
stand_boundary %>%
sf::st_transform(sf::st_crs(param_combos_piles))
) %>%
sf::st_drop_geometry() %>%
dplyr::pull(pred_id))
)
, pred_id = "pred_id"
, min_iou_pct = 0.05
) %>%
dplyr::mutate(rn=x)
) %>%
dplyr::bind_rows()
param_combos_gt %>% readr::write_csv(f_temp, append = F, progress = F)
}else{
param_combos_gt <- readr::read_csv(f_temp, progress = F, show_col_types = F)
}
# huh?
param_combos_gt %>% dplyr::glimpse()
# param_combos_gt %>% dplyr::filter(pile_id==120)
7.1.3 Test Accuracy
earlier, we defined the agg_ground_truth_match() function to prepare our results for analysis by aggregating the single-pile-level data into a single record for each parameter combination. this function calculates detection performance metrics such as F-score, precision, and recall (using the confusion_matrix_scores_fn() we defined above), as well as quantification accuracy metrics including Root Mean Squared Error (RMSE), Mean Error (ME), and Mean Absolute Percentage Error (MAPE) to assess the accuracy of our pile form measurements. this could be a valuable function for any future analysis comparing predictions to ground truth data.
the agg_ground_truth_match() function looks for columns with the prefix “diff_” to calculate the mean error (ME) and the RMSE while columns with the prefix “pct_diff_” enable the function to calculate the mean absolute percent error (MAPE)…let’s make those columns
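a minimal sketch of how those prefixes might drive the aggregation, using toy data (the agg_ground_truth_match() function defined earlier is the authoritative version)
# toy data following the column naming convention agg_ground_truth_match() expects
toy_df <- dplyr::tibble(
diff_height_m = c(0.2, -0.4, 0.1) # pred minus ground truth
, pct_diff_height_m = c(0.05, -0.11, 0.03) # (ground truth minus pred)/ground truth
)
toy_df %>%
dplyr::summarise(
dplyr::across(
tidyselect::starts_with("diff_")
, list(rmse = ~sqrt(mean(.x^2, na.rm = T)), mean = ~mean(.x, na.rm = T))
, .names = "{.col}_{.fn}"
)
, dplyr::across(
tidyselect::starts_with("pct_diff_")
, list(mape = ~mean(abs(.x), na.rm = T))
, .names = "{.col}_{.fn}"
)
)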
# let's attach a flag to only work with piles that intersect with the stand boundary
# and calculate the "diameter" of the piles
# add in/out to piles data
param_combos_piles <-
param_combos_piles %>%
dplyr::left_join(
param_combos_piles %>%
sf::st_intersection(
stand_boundary %>%
sf::st_transform(sf::st_crs(param_combos_piles))
) %>%
sf::st_drop_geometry() %>%
dplyr::select(rn,pred_id) %>%
dplyr::mutate(is_in_stand = T)
, by = dplyr::join_by(rn,pred_id)
) %>%
dplyr::mutate(
is_in_stand = dplyr::coalesce(is_in_stand,F)
)
# param_combos_piles %>% dplyr::glimpse()
# add it to validation
param_combos_gt <-
param_combos_gt %>%
dplyr::mutate(pile_id = as.numeric(pile_id)) %>%
# add area of gt
dplyr::left_join(
slash_piles_polys %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id,image_gt_area_m2,height_m,field_diameter_m) %>%
dplyr::rename(
gt_height_m = height_m
, gt_diameter_m = field_diameter_m
, gt_area_m2 = image_gt_area_m2
) %>%
dplyr::mutate(pile_id=as.numeric(pile_id))
, by = "pile_id"
) %>%
# add info from predictions
dplyr::left_join(
param_combos_piles %>%
sf::st_drop_geometry() %>%
dplyr::select(
rn,pred_id
,is_in_stand
, area_m2, volume_m3, max_height_m, diameter_m
) %>%
dplyr::rename(
pred_area_m2 = area_m2
, pred_height_m = max_height_m
, pred_diameter_m = diameter_m
)
, by = dplyr::join_by(rn,pred_id)
) %>%
dplyr::mutate(
is_in_stand = dplyr::case_when(
is_in_stand == T ~ T
, is_in_stand == F ~ F
, match_grp == "omission" ~ T
, T ~ F
)
### calculate these based on the formulas below...agg_ground_truth_match() depends on those formulas
# ht diffs
, diff_height_m = pred_height_m-gt_height_m
, pct_diff_height_m = (gt_height_m-pred_height_m)/gt_height_m
# diameter
, diff_diameter_m = pred_diameter_m-gt_diameter_m
, pct_diff_diameter_m = (gt_diameter_m-pred_diameter_m)/gt_diameter_m
# area diffs
, diff_area_m2 = pred_area_m2-gt_area_m2
, pct_diff_area_m2 = (gt_area_m2-pred_area_m2)/gt_area_m2
)
# what?
# param_combos_gt %>% dplyr::glimpse()
now let’s apply the agg_ground_truth_match() function at the parameter combination level
param_combos_gt_agg <- unique(param_combos_gt$rn) %>%
purrr::map(\(x)
agg_ground_truth_match(
param_combos_gt %>%
dplyr::filter(
is_in_stand
& rn == x
)
) %>%
dplyr::mutate(rn = x)
) %>%
dplyr::bind_rows() %>%
# add in info on all parameter combinations
dplyr::inner_join(
param_combos_piles %>%
sf::st_drop_geometry() %>%
dplyr::distinct(
rn,max_ht_m,min_area_m2,max_area_m2,convexity_pct,circle_fit_iou_pct
)
, by = "rn"
, relationship = "one-to-one"
)
# what is this?
param_combos_gt_agg %>% dplyr::glimpse()
## Rows: 17
## Columns: 23
## $ tp_n <dbl> 83, 66, 116, 114, 78, 113, 94, 103, 95, 80, 7…
## $ fp_n <dbl> 33, 10, 156, 329, 53, 258, 138, 140, 14, 12, …
## $ fn_n <dbl> 38, 55, 5, 7, 43, 8, 27, 18, 26, 41, 45, 10, …
## $ omission_rate <dbl> 0.31404959, 0.45454545, 0.04132231, 0.0578512…
## $ commission_rate <dbl> 0.28448276, 0.13157895, 0.57352941, 0.7426636…
## $ precision <dbl> 0.7155172, 0.8684211, 0.4264706, 0.2573363, 0…
## $ recall <dbl> 0.6859504, 0.5454545, 0.9586777, 0.9421488, 0…
## $ f_score <dbl> 0.7004219, 0.6700508, 0.5903308, 0.4042553, 0…
## $ diff_area_m2_rmse <dbl> 0.9177856, 1.0318250, 2.7307986, 2.0816764, 1…
## $ diff_diameter_m_rmse <dbl> 0.5356816, 0.5909564, 0.8223519, 0.8535298, 0…
## $ diff_height_m_rmse <dbl> 0.4736971, 0.4941902, 0.7271534, 0.6431814, 0…
## $ diff_area_m2_mean <dbl> 0.4754762, 0.6257658, 0.6015038, 0.7216610, 0…
## $ diff_diameter_m_mean <dbl> 0.3976082, 0.4047071, 0.5242729, 0.5515058, 0…
## $ diff_height_m_mean <dbl> 0.026860245, -0.274133330, -0.116632762, 0.00…
## $ pct_diff_area_m2_mape <dbl> 0.1116297, 0.1209130, 0.1491131, 0.1503765, 0…
## $ pct_diff_diameter_m_mape <dbl> 0.1512180, 0.1615222, 0.1834563, 0.1948854, 0…
## $ pct_diff_height_m_mape <dbl> 0.1904217, 0.1385338, 0.2001751, 0.2095256, 0…
## $ rn <dbl> 473, 94, 276, 526, 13, 361, 313, 338, 246, 17…
## $ max_ht_m <dbl> 5, 2, 3, 5, 2, 4, 4, 4, 3, 3, 3, 5, 4, 5, 5, …
## $ min_area_m2 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ max_area_m2 <dbl> 10, 40, 60, 40, 10, 30, 10, 20, 40, 10, 20, 6…
## $ convexity_pct <dbl> 0.9, 0.7, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.9, …
## $ circle_fit_iou_pct <dbl> 0.5, 0.7, 0.1, 0.1, 0.5, 0.1, 0.5, 0.5, 0.1, …
# param_combos_gt_agg %>% readr::write_csv("c:/Users/georg/Downloads/param_combos_gt_agg.csv")
# param_combos_gt_agg %>% dplyr::select(f_score,precision,recall,tidyselect::ends_with("_n")) %>% dplyr::mutate(gt_n=tp_n+fn_n)
7.1.3.1 Test Accuracy Insights
let’s make some plotting functions to glean some insight into the distribution of the accuracies obtained from our full set of sensitivity testing point estimates
first, we’ll make a function to plot the distribution of F-score, Recall, and Precision values obtained in our sensitivity testing
# this is a lot of work, so we're going to make it a function
plt_detection_dist <- function(
df
, my_subtitle = ""
, show_rug = T
) {
pal_eval_metric <- c(
RColorBrewer::brewer.pal(3,"Oranges")[3]
, RColorBrewer::brewer.pal(3,"Greys")[3]
, RColorBrewer::brewer.pal(3,"Purples")[3]
)
df_temp <- df %>%
tidyr::pivot_longer(
cols = c(precision,recall,f_score)
, names_to = "metric"
, values_to = "value"
) %>%
dplyr::mutate(
metric = dplyr::case_when(
metric == "f_score" ~ 1
, metric == "recall" ~ 2
, metric == "precision" ~ 3
) %>%
factor(
ordered = T
, levels = 1:3
, labels = c(
"F-score"
, "Recall"
, "Precision"
)
)
)
# plot
# if(nrow(df)<=15 && (df_temp %>% dplyr::count(metric,value) %>% dplyr::pull(n) %>% max())>1 ){
if(nrow(df)<=15){
# round
df_temp <- df_temp %>%
dplyr::mutate(
value = round(value,2)
)
# agg for median plotting
xxxdf_temp <- df_temp %>%
dplyr::group_by(metric) %>%
dplyr::summarise(value = median(value,na.rm=T)) %>%
dplyr::ungroup() %>%
dplyr::mutate(
value_lab = paste0(
"median:\n"
, scales::percent(value,accuracy=1)
)
)
# plot
plt <- df_temp %>%
dplyr::count(metric,value) %>%
ggplot2::ggplot(
mapping = ggplot2::aes(x = value, fill = metric, color = metric)
) +
ggplot2::geom_vline(
data = xxxdf_temp
, mapping = ggplot2::aes(xintercept = value)
, color = "gray44", linetype = "dashed"
) +
# ggplot2::geom_jitter(mapping = ggplot2::aes(y=-0.2), width = 0, height = 0.1) +
# ggplot2::geom_boxplot(width = 0.1, color = "black", fill = NA, outliers = F) +
ggplot2::geom_segment(
mapping = ggplot2::aes(y=n,yend=0)
, lwd = 2, alpha = 0.8
) +
ggplot2::geom_point(
mapping = ggplot2::aes(y=n)
, alpha = 1
, shape = 21, color = "gray44", size = 5
) +
ggplot2::geom_text(
mapping = ggplot2::aes(y=n,label=n)
, size = 2.5, color = "white"
# , vjust = -0.01
) +
ggplot2::geom_text(
data = xxxdf_temp
, mapping = ggplot2::aes(
x = -Inf, y = Inf # always in upper left?
# x = value, y = 0
, label = value_lab
)
, hjust = -0.1, vjust = 1 # always in upper left?
# , hjust = -0.1, vjust = -5
, size = 2.5, color = "black"
) +
# ggplot2::geom_rug() +
ggplot2::scale_fill_manual(values = pal_eval_metric) +
ggplot2::scale_color_manual(values = pal_eval_metric) +
ggplot2::scale_x_continuous(
labels = scales::percent_format(accuracy = 1)
# , limits = c(0,1)
) +
ggplot2::scale_y_continuous(expand = ggplot2::expansion(mult = c(0, .1))) +
ggplot2::facet_grid(cols = dplyr::vars(metric), scales = "free") +
ggplot2::labs(
x = "", y = ""
, subtitle = my_subtitle
) +
ggplot2::theme_light() +
ggplot2::theme(
legend.position = "none"
, strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
, axis.text.x = ggplot2::element_text(size = 7)
, axis.text.y = ggplot2::element_blank()
, axis.ticks.y = ggplot2::element_blank()
, panel.grid.major.y = ggplot2::element_blank()
, panel.grid.minor.y = ggplot2::element_blank()
, plot.subtitle = ggplot2::element_text(size = 8)
)
}else{
# agg for median plotting
xxxdf_temp <- df_temp %>%
dplyr::group_by(metric) %>%
dplyr::summarise(value = median(value,na.rm=T)) %>%
dplyr::ungroup() %>%
dplyr::mutate(
value_lab = paste0(
"median:\n"
, scales::percent(value,accuracy=1)
)
)
plt <- df_temp %>%
ggplot2::ggplot(
mapping = ggplot2::aes(x = value, fill = metric, color = metric)
) +
ggplot2::geom_density(color = NA, alpha = 0.9) +
# ggplot2::geom_rug(
# # # setting these makes the plotting more computationally intensive
# # alpha = 0.5
# # , length = ggplot2::unit(0.01, "npc")
# ) +
ggplot2::geom_vline(
data = xxxdf_temp
, mapping = ggplot2::aes(xintercept = value)
, color = "gray44", linetype = "dashed"
) +
ggplot2::geom_text(
data = xxxdf_temp
, mapping = ggplot2::aes(
x = -Inf, y = Inf # always in upper left?
# x = value, y = 0
, label = value_lab
)
, hjust = -0.1, vjust = 1 # always in upper left?
# , hjust = -0.1, vjust = -5
, size = 2.5, color = "black"
) +
ggplot2::scale_fill_manual(values = pal_eval_metric) +
ggplot2::scale_color_manual(values = pal_eval_metric) +
ggplot2::scale_x_continuous(
labels = scales::percent_format(accuracy = 1)
# , limits = c(0,1)
) +
ggplot2::facet_grid(cols = dplyr::vars(metric), scales = "free") +
ggplot2::labs(
x = "", y = ""
, subtitle = my_subtitle
) +
ggplot2::theme_light() +
ggplot2::theme(
legend.position = "none"
, strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
, axis.text.x = ggplot2::element_text(size = 7)
, axis.text.y = ggplot2::element_blank()
, axis.ticks.y = ggplot2::element_blank()
, panel.grid.major.y = ggplot2::element_blank()
, panel.grid.minor.y = ggplot2::element_blank()
, plot.subtitle = ggplot2::element_text(size = 8)
)
if(show_rug){
plt <- plt +
ggplot2::geom_rug(
# # setting these makes the plotting more computationally intensive
# alpha = 0.5
# , length = ggplot2::unit(0.01, "npc")
)
}
}
return(plt)
}
that is a lot of plotting; try isolating the ggplot2::ggplot() call from the function to see what each piece does
now plot the distribution of F-score, Recall, and Precision values obtained in our test sensitivity testing
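a call along these lines generates the figure below (a sketch; param_combos_gt_agg as input and the subtitle text are our choices)
plt_detection_dist(
df = param_combos_gt_agg
, my_subtitle = "this is only a test"
)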
this is only a test

we put in a special plot format if we only have a few observations (i.e. <=15)
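to trigger that format we can pass a subset with 15 or fewer rows, e.g. (sampling rows purely for illustration)
plt_detection_dist(
df = param_combos_gt_agg %>% dplyr::slice_sample(n = 10)
, my_subtitle = "this is only a test"
)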
this is only a test

neat.
now, we’ll make a function to plot the distribution of MAPE, RMSE, and ME values obtained in our sensitivity testing. this one is a little more complex than the detection accuracy plotting because the values are on different scales (i.e. MAPE is a percentage value, RMSE is positive in the units of the measurement, and ME is positive or negative in the units of the measurement)
# this is a lot of work, so we're going to make it a function
plt_form_quantification_dist <- function(
df
, my_subtitle = ""
, show_rug = T
) {
# reshape data to go long by evaluation metric
df_temp <-
df %>%
dplyr::ungroup() %>%
dplyr::mutate(
# label combining params
lab = stringr::str_c(max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct, sep = ":") %>%
forcats::fct_reorder(f_score) # %>% forcats::fct_rev()
) %>%
dplyr::select(
max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct
, lab
# quantification
, tidyselect::ends_with("_rmse")
, tidyselect::ends_with("_rrmse")
, tidyselect::ends_with("_mean")
, tidyselect::ends_with("_mape")
) %>%
tidyr::pivot_longer(
cols = c(
tidyselect::ends_with("_rmse")
, tidyselect::ends_with("_rrmse")
, tidyselect::ends_with("_mean")
, tidyselect::ends_with("_mape")
)
, names_to = "metric"
, values_to = "value"
) %>%
dplyr::mutate(
eval_metric = stringr::str_extract(metric, "(_rmse|_rrmse|_mean|_mape)$") %>%
stringr::str_remove_all("_") %>%
stringr::str_replace_all("mean","me") %>%
toupper() %>%
factor(
ordered = T
, levels = c("ME","RMSE","RRMSE","MAPE")
)
, pile_metric = metric %>%
stringr::str_remove("(_rmse|_rrmse|_mean|_mape)$") %>%
stringr::str_extract("(paraboloid_volume|volume|area|height|diameter)") %>%
factor(
ordered = T
, levels = c(
"volume"
, "paraboloid_volume"
, "area"
, "height"
, "diameter"
)
, labels = c(
"Volume"
, "Volume paraboloid"
, "Area"
, "Height"
, "Diameter"
)
)
)
# plot
# !!! we map over the ggplot function to allow for different formatting of the x axis
# !!! and to allow for different axis ranges for each individual panel which is not possible when faceting
plt_list_temp <-
# get factors in order
df_temp %>%
dplyr::filter(eval_metric!="RRMSE") %>%
dplyr::count(eval_metric) %>%
dplyr::pull(eval_metric) %>%
purrr::map(function(x, my_show_rug=show_rug){
if(nrow(df)<=15){
df_temp <- df_temp %>%
dplyr::filter(eval_metric==x) %>%
dplyr::mutate(
value = dplyr::case_when(
eval_metric %in% c("RRMSE", "MAPE") ~ round(value,2)
, T ~ round(value,1)
)
)
# aggregate
xxxdf_temp <- df_temp %>%
dplyr::group_by(pile_metric,eval_metric) %>%
dplyr::summarise(value = median(value,na.rm=T)) %>%
dplyr::ungroup() %>%
dplyr::mutate(
value_lab = paste0(
"median:\n"
, dplyr::case_when(
eval_metric %in% c("RRMSE", "MAPE") ~ scales::percent(value,accuracy=1)
, eval_metric == "ME" ~ scales::comma(value,accuracy=0.1)
, T ~ scales::comma(value,accuracy=0.1)
)
)
)
# plot
p <- df_temp %>%
dplyr::count(eval_metric,pile_metric,value) %>%
ggplot2::ggplot(
mapping = ggplot2::aes(x = value, fill = pile_metric, color = pile_metric)
) +
ggplot2::geom_vline(
data = xxxdf_temp
, mapping = ggplot2::aes(xintercept = value)
, color = "gray44", linetype = "dashed"
) +
# ggplot2::geom_jitter(mapping = ggplot2::aes(y=-0.2), width = 0, height = 0.1) +
# ggplot2::geom_boxplot(width = 0.1, color = "black", fill = NA, outliers = F) +
ggplot2::geom_segment(
mapping = ggplot2::aes(y=n,yend=0)
, lwd = 2, alpha = 0.8
) +
ggplot2::geom_point(
mapping = ggplot2::aes(y=n)
, alpha = 1
, shape = 21, color = "gray44", size = 5
) +
ggplot2::geom_text(
mapping = ggplot2::aes(y=n,label=n)
, size = 2.5, color = "white"
# , vjust = -0.01
) +
ggplot2::geom_text(
data = xxxdf_temp
, mapping = ggplot2::aes(
x = -Inf, y = Inf # always in upper left?
# x = value, y = 0
, label = value_lab
)
, hjust = -0.1, vjust = 1 # always in upper left?
# , hjust = -0.1, vjust = -5
, size = 2.5, color = "black"
) +
# ggplot2::geom_rug() +
harrypotter::scale_fill_hp_d(option = "hermionegranger") +
harrypotter::scale_color_hp_d(option = "hermionegranger") +
ggplot2::scale_y_continuous(expand = ggplot2::expansion(mult = c(0, .33))) +
ggplot2::facet_grid(
cols = dplyr::vars(pile_metric), rows = dplyr::vars(eval_metric)
, scales = "free"
, switch = "y" # moves y facet label to left
) +
ggplot2::labs(x = "", y = "") +
ggplot2::theme_light() +
ggplot2::theme(
legend.position = "none"
, strip.placement = "outside"
, strip.text.y = ggplot2::element_text(size = 11, color = "black", face = "bold")
, strip.text.y.left = ggplot2::element_text(angle = 0)
, strip.background.y = ggplot2::element_rect(color = NA, fill = NA)
, strip.text.x = ggplot2::element_text(size = 8, color = "black", face = "bold")
, axis.text.x = ggplot2::element_text(size = 6.5)
, axis.text.y = ggplot2::element_blank()
, axis.ticks.y = ggplot2::element_blank()
, panel.grid.major.y = ggplot2::element_blank()
, panel.grid.minor.y = ggplot2::element_blank()
)
}else{
# aggregate
xxxdf_temp <- df_temp %>%
dplyr::group_by(pile_metric,eval_metric) %>%
dplyr::summarise(value = median(value,na.rm=T)) %>%
dplyr::ungroup() %>%
dplyr::mutate(
value_lab = paste0(
"median:\n"
, dplyr::case_when(
eval_metric %in% c("RRMSE", "MAPE") ~ scales::percent(value,accuracy=1)
, eval_metric == "ME" ~ scales::comma(value,accuracy=0.1)
, T ~ scales::comma(value,accuracy=0.1)
)
)
)
p <- df_temp %>%
dplyr::filter(eval_metric==x) %>%
ggplot2::ggplot() +
# ggplot2::geom_vline(xintercept = 0, color = "gray") +
ggplot2::geom_density(mapping = ggplot2::aes(x = value, fill = pile_metric), color = NA, alpha = 0.9) +
# ggplot2::geom_rug(mapping = ggplot2::aes(x = value, color = pile_metric)) +
harrypotter::scale_fill_hp_d(option = "hermionegranger") +
harrypotter::scale_color_hp_d(option = "hermionegranger") +
ggplot2::geom_vline(
data = xxxdf_temp %>% dplyr::filter(eval_metric==x)
, mapping = ggplot2::aes(xintercept = value)
, color = "gray44", linetype = "dashed"
) +
ggplot2::geom_text(
data = xxxdf_temp %>% dplyr::filter(eval_metric==x)
, mapping = ggplot2::aes(
x = -Inf, y = Inf # always in upper left?
# x = value, y = 0
, label = value_lab
)
, hjust = -0.1, vjust = 1 # always in upper left?
# , hjust = -0.1, vjust = -5
, size = 2.5
) +
ggplot2::facet_grid(
cols = dplyr::vars(pile_metric), rows = dplyr::vars(eval_metric)
, scales = "free"
, switch = "y" # moves y facet label to left
) +
ggplot2::labs(x = "", y = "") +
ggplot2::theme_light() +
ggplot2::theme(
legend.position = "none"
, strip.placement = "outside"
, strip.text.y = ggplot2::element_text(size = 11, color = "black", face = "bold")
, strip.text.y.left = ggplot2::element_text(angle = 0)
, strip.background.y = ggplot2::element_rect(color = NA, fill = NA)
, strip.text.x = ggplot2::element_text(size = 8, color = "black", face = "bold")
, axis.text.x = ggplot2::element_text(size = 6.5)
, axis.text.y = ggplot2::element_blank()
, axis.ticks.y = ggplot2::element_blank()
, panel.grid.major.y = ggplot2::element_blank()
, panel.grid.minor.y = ggplot2::element_blank()
)
if(my_show_rug){
p <- p + ggplot2::geom_rug(mapping = ggplot2::aes(x = value, color = pile_metric))
}
}
if(x %in% c("RRMSE", "MAPE")){
return(p+ggplot2::scale_x_continuous(labels = scales::percent_format(accuracy = 1)))
}else{
return(p)
}
})
# combine
# plt_list_temp
patchwork::wrap_plots(
plt_list_temp
, ncol = 1
) + patchwork::plot_annotation(
# title = paste0(aoi_sf$study_site_lab, " - ALS")
subtitle = my_subtitle
# paste0(
# "distribution of pile form quantification accuracy metrics for all"
# # , scales::percent(1-pct_rank_th_top,accuracy=1)
# , " (n="
# , scales::comma(nrow(param_combos_gt_agg), accuracy = 1)
# , ") "
# , "parameter combinations tested"
# )
# , caption = "hey"
, theme = ggplot2::theme(
plot.subtitle = ggplot2::element_text(size = 8)
)
)
}
now plot the distribution of ME, RMSE, and MAPE values obtained in our test sensitivity testing
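a call along these lines generates the figure below (a sketch; param_combos_gt_agg as input and the subtitle text are our choices)
plt_form_quantification_dist(
df = param_combos_gt_agg
, my_subtitle = "this is only a test"
)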
this is only a test

we put in a special plot format if we only have a few observations (i.e. <=15)
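again, passing a subset with 15 or fewer rows triggers that format, e.g. (sampling rows purely for illustration)
plt_form_quantification_dist(
df = param_combos_gt_agg %>% dplyr::slice_sample(n = 10)
, my_subtitle = "this is only a test"
)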
this is only a test

ok.