Section 5 Raster-based: Watershed Segmentation

We’ll first attempt to detect slash piles using raster-based methods with the DTM and DSM. These raster-based approaches are simple and efficient but can be limited in complex forest structures where piles might be occluded by overstory trees and rasterization simplifies/removes some of the rich 3D information in the point cloud. For slash piles, which are often irregular and may not have a distinct “treetop” equivalent, common CHM-based tree detection methods might be less directly applicable. We’ll see…

When used for individual tree detection (ITD), this technique treats the CHM as a topographic surface, where local maxima represent tree tops and valleys represent crown boundaries. A “water source” is conceptually placed at each local lowest point, and the surface is “flooded.” Barriers are generated where different “water sources” meet, forming watershed lines that delineate individual tree crowns.

two possible approaches for segmenting piles are to: 1) segment individual trees using a top-down approach and then use the canopy cover as a mask to then identify slash piles; 2) use a bottoms-up approach to perform slash pile segmentation on a lower “slice” of the CHM based on an expected maximum height of a pile

we’ll first try the bottoms-up approach using a CHM slice with a maximum height of 4 m based on the highest point in the point cloud (3.65 m) of the slash pile which had the tallest measurement in the field (6.4 m). The 6.4 m (21 ft) field measured height seems unlikely even for a machine pile, so we’ll use the point cloud measured height which will align with the CHM data.

check out the lower slice of the CHM

# set the max expected pile height
max_ht_m <- 4
# lower CHM slice
cloud2raster_ans$chm_rast %>% 
  terra::clamp(upper = max_ht_m, lower = 0, values = F) %>% 
  terra::plot(col = viridis::plasma(100), axes = F)

already, it looks like the piles should be distinguishable objects from this data

5.1 Method Demonstration

let’s run watershed segmentation using lidR::watershed() which is based on the bioconductor package EBIimage

watershed_ans <- lidR::watershed(
    chm = cloud2raster_ans$chm_rast %>% 
      terra::clamp(upper = max_ht_m, lower = 0, values = F)
    , th_tree = 0.5
  )()
# this is a raster
watershed_ans

## class       : SpatRaster 
## size        : 2740, 3473, 1  (nrow, ncol, nlyr)
## resolution  : 0.2, 0.2  (x, y)
## extent      : 499264.2, 499958.8, 4317599, 4318147  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 13N (EPSG:32613) 
## source(s)   : memory
## varname     : chm_0.2m 
## name        : focal_mean 
## min value   :          1 
## max value   :      10007

each value should be a unique “segment” which we can refine based on rules of expected size and shape of piles

terra::freq(watershed_ans) %>% 
  dplyr::slice_sample(n = 10)

##    layer value count
## 1      1  6846    11
## 2      1  8003    16
## 3      1  2435     4
## 4      1  6953     4
## 5      1  1685     6
## 6      1  7978     9
## 7      1  6768    12
## 8      1  1403    54
## 9      1   511     5
## 10     1  3321     6

where the “value” is the segment identifier and the count is the number of raster cells assigned to that segment

let’s plot the raster return from the watershed segmentation

watershed_ans %>% 
  terra::plot(
    col = c(
      viridis::turbo(n = floor(terra::minmax(watershed_ans)[2]/3))
      , viridis::viridis(n = floor(terra::minmax(watershed_ans)[2]/3))
      , viridis::cividis(n = floor(terra::minmax(watershed_ans)[2]/3))
    ) %>% sample()
    , legend = F
    , axes = F
  )

we can quickly calculate the area of the segments which we could use as a filter

min_area_m2 <- 2
# Two standard US parking spaces, typically measuring 9 feet by 18 feet, 
# are roughly equivalent to 30.25 square meters. Each space is approximately 15.125 square meters.
# 15.125*3
max_area_m2 <- 50
# list of segments that meet size threshold
keep_segs_temp <- watershed_ans %>% 
  terra::freq() %>% 
  dplyr::mutate(area_m2 = count * (terra::res(watershed_ans) %>% .[1:2] %>% prod()) ) %>% 
  dplyr::filter(
    area_m2 >= min_area_m2 & 
    area_m2 <= max_area_m2
  ) %>%
  dplyr::pull(value)
# filter the raster and plot it
watershed_ans <- watershed_ans %>% 
  terra::subst(from = keep_segs_temp, to = keep_segs_temp, others = NA)

plot it with the watershed segmented piles (orange) and the actual piles (white)

# plot it 
terra::plotRGB(ortho_rast, stretch = "lin", colNA = "transparent")
terra::plot(
  watershed_ans %>% 
    terra::as.polygons(round = F, aggregate = T, values = T, extent = F, na.rm = T)
  , add = T, border = "orangered", col = NA, lwd = 1.2
)
terra::plot(
  slash_piles_polys %>% 
    sf::st_transform(terra::crs(ortho_rast)) %>% terra::vect()
  , add = T, border = "white", col = NA, lwd = 1
)

nice…we are getting so close.

5.1.1 Geometric filtering of watershed segments

now let’s try to filter based on the geometric properties of the watershed-detected segments.

ideally, we want objects that: i) meet the height threshold over the entire surface of the segment (no doughnuts); ii) are not irregularly shaped (relatively few inward angles); and iii) are circular in shape

we’ll make a convex hull of the polygons generated from a raster to smooth out the square edges and any inward curves or indentations, resulting in a boundary that’s always convex (no inward angles). using a convex hull we will be able to filter out:

watershed detected segments that were actually lower branches of a tree. these will be shaped like a doughnut with circular shape but a hole in the center
watershed detected segments that are irregularly shaped like coarse woody debris that was not organized into piles by humans

let’s convert the watershed-detected segments from raster to vector data and create a convex hull of the shapes for comparison

# vectors of segments
watershed_ans_poly <-
  watershed_ans %>% 
  terra::as.polygons(round = F, aggregate = T, values = T, extent = F, na.rm = T) %>% 
  setNames("pred_id") %>%
  sf::st_as_sf() %>% 
  sf::st_simplify() %>% 
  sf::st_make_valid() %>% 
  dplyr::filter(sf::st_is_valid(.)) %>% 
  dplyr::mutate(treeID=pred_id) %>% 
  cloud2trees::simplify_multipolygon_crowns() %>% 
  dplyr::select(-treeID)
# convex hulls of segments
watershed_ans_poly_chull <-
  watershed_ans_poly %>% 
  sf::st_convex_hull() %>% 
  sf::st_simplify() %>% 
  sf::st_make_valid() %>% 
  dplyr::filter(sf::st_is_valid(.))

lets make an example area that we’ll use to demonstrate the filtering process of the watershed detected segments

aoi_temp <-
  watershed_ans_poly %>% 
  # dplyr::filter(pred_id==241) %>%
  # dplyr::filter(pred_id==11916) %>%
  dplyr::slice_sample(n = 1) %>%
  sf::st_bbox() %>% 
  sf::st_as_sfc() %>% 
  sf::st_buffer(55)
# list of examples
pred_id_temp <- watershed_ans_poly %>% 
  sf::st_intersection(aoi_temp) %>% 
  dplyr::pull(pred_id)
# plot it
ortho_plt_temp <- 
  ortho_plt_fn(
    stand = aoi_temp
    , buffer = 5
  )

plot our example watershed detected segments as vectors (pink) and convex hull of the segments (orange) compared with the ground truth piles (blue)

ortho_plt_temp +
  ggplot2::geom_sf(
    data = watershed_ans_poly %>% 
      dplyr::filter(pred_id %in% pred_id_temp) %>% 
      dplyr::mutate(pred_id = as.factor(pred_id))
    , fill = "hotpink"
    , alpha = 0.6
    , lwd = 0
    , color = NA
  ) +
  ggplot2::geom_sf(
    data = watershed_ans_poly_chull %>% 
      dplyr::filter(pred_id %in% pred_id_temp)
    , color = "orangered"
    , fill = NA
  ) +
  ggplot2::geom_sf(
    data = slash_piles_polys %>% 
      sf::st_transform(sf::st_crs(watershed_ans_poly)) %>% 
      sf::st_intersection(aoi_temp)
    , fill = NA, color = "blue", lwd = 0.3
  ) +
  ggplot2::theme_void() +
  ggplot2::theme(legend.position = "none")

let’s first filter out segments that have holes in them or are very irregularly shaped by comparing the area of the polygon and convex hull

# min required overlap between the predicted pile and the convex hull of the predicted pile
pct_chull_overlap <- 0.7
# compare areas
watershed_keep_overlaps_chull_pred_id <- watershed_ans_poly %>% 
  dplyr::mutate(poly_area_m2 = sf::st_area(.) %>% as.numeric()) %>% 
  dplyr::inner_join(
    watershed_ans_poly_chull %>% 
      dplyr::mutate(chull_area_m2 = sf::st_area(.) %>% as.numeric()) %>%
      sf::st_drop_geometry()
    , by = "pred_id"
  ) %>% 
  dplyr::mutate(
    pct_chull = poly_area_m2/chull_area_m2
  ) %>% 
  dplyr::filter(
    pct_chull >= pct_chull_overlap
  ) %>% 
  dplyr::pull(pred_id)

which piles meet the minimum overlap threshold (black outline) between the segmented polygon and the convex hull?

ortho_plt_temp +
  # ggplot2::ggplot() +
  ggplot2::geom_sf(
    data = watershed_ans_poly %>% 
      dplyr::filter(pred_id %in% pred_id_temp) %>% 
      dplyr::mutate(pred_id = as.factor(pred_id))
    , fill = "hotpink"
    , alpha = 0.6
    , lwd = 0
    , color = NA
  ) +
  ggplot2::geom_sf(
    data = watershed_ans_poly_chull %>% 
      dplyr::filter(pred_id %in% pred_id_temp) %>% 
      dplyr::mutate(
        meets_overlap = pred_id %in% watershed_keep_overlaps_chull_pred_id
      )
    , ggplot2::aes(color = meets_overlap)
    , fill = NA
  ) +
  ggplot2::geom_sf(
    data = slash_piles_polys %>% 
      sf::st_transform(sf::st_crs(watershed_ans_poly)) %>% 
      sf::st_intersection(aoi_temp)
    , fill = NA, color = "blue", lwd = 0.3
  ) +
  ggplot2::scale_color_manual(values = c("red","black")) +
  ggplot2::theme_void() +
  ggplot2::theme(legend.position = "top")

that looks like it did what we wanted it to do (filter out segments with holes and irregular shapes) let’s look at the entire area again after applying this filter plotting the remaining watershed segmented piles (orange) and the actual piles (white)

# plot it 
terra::plotRGB(ortho_rast, stretch = "lin", colNA = "transparent")
terra::plot(
  watershed_ans_poly %>% 
    dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id) %>% 
    terra::vect()
  , add = T, border = "orangered", col = NA, lwd = 1.2
)
terra::plot(
  slash_piles_polys %>% 
    sf::st_transform(terra::crs(ortho_rast)) %>% terra::vect()
  , add = T, border = "white", col = NA, lwd = 1
)

finally, let’s apply a circle-fitting algorithm to remove non-circular segments from the remaining segments.

Least squares circle fitting is a method to find the circle that best approximates a set of data points by minimizing the sum of the squared distances between the points and the circle. The lidR::fit_circle() function finds the best-fitting flat, horizontal circle for a group of 3D points, even if some of those points are messy or don’t quite fit. It determines the circle’s center and size, and also provides an “angular range” to show how much of a complete circle the points actually form, which is a more reliable measure than a simple error value (e.g. RMSE).

The “angular range” tells you how much of a complete circle the points in the watershed-detected segment actually cover. Imagine drawing a circle, and then only having points along a part of its edge, here’s how to interpret it:

360 degrees suggests the points form a full, unbroken circle, like the base of a perfectly round slash pile.
180 degrees would mean the points only form a half-circle or a semi-circle.

A smaller range (e.g., 90 degrees) indicates just a partial arc or a small curve. This can help us determine if a group of points truly represents a circular shape, which is useful for identifying objects like slash piles that are expected to have a round or conical base.

we’ll define a function to pass our sf polygon data of watershed detected segments and return a sf data of the fitted circles

##########################
# 1)
# function to return sf circle from xy center and radius in given crs
# to handle return from common circle fitting algorithms
##########################
point_xy_radius_to_circle_sf <- function(
  center_x
  , center_y
  , radius
  , crs = NULL
) {
  if(is.null(crs)){stop("need a crs, guy")}
  # create a point geometry object
  center_point <- sf::st_point(c(center_x, center_y))
  # create an sf object from the point
  center_sf <- sf::st_sf(
    data.frame(
      center_x = center_x
      , center_y = center_y
      , radius = radius
    )
    , geometry = sf::st_sfc(center_point)
    , crs = crs
  )
  # create the circle geometry by buffering the point
  circle_sf <- sf::st_buffer(center_sf, dist = radius)
  return(circle_sf)
}
##########################
# 2)
# function to generate 2d xy points from polygon feature
# to pass to common circle fitting algorithms
# !!! only works with a singular polygon at a time
##########################
poly_to_points <- function(
  sf_data
  , as_spatial = F # if set to F, returns xy dataframe; if T returns sf data
  , simplify_multipolygons = F # if set to T, multipolygons are simplified by keeping the largest segment
) {
  if(!inherits(sf_data, "sf")){stop("must pass `sf` data object")}
  # just work with the first
  if(nrow(sf_data)>1){stop("this function only works with a single record at a time")}
  # simplify_multipolygons
  if(simplify_multipolygons){
    sf_data <- sf_data %>% 
      dplyr::mutate(treeID=1) %>% 
      cloud2trees::simplify_multipolygon_crowns() %>% 
      dplyr::select(-treeID)
  }
  # get point coordinates
  xy_temp <- 
    sf_data %>% 
    sf::st_coordinates() %>% 
    dplyr::as_tibble() %>% 
    dplyr::rename_with(tolower) %>% 
    dplyr::select(x,y) %>% 
    dplyr::mutate(z=0)
  # as_spatial
  if(as_spatial){
    xy_temp <- xy_temp %>% 
      sf::st_as_sf(coords = c("x", "y"), crs = sf::st_crs(sf_data), remove = F)
  }
  return(xy_temp)
}
# watershed_ans_poly %>% 
#   dplyr::filter(pred_id %in% c(7717)) %>%
#   # poly_to_points(as_spatial = T) %>% 
#   poly_to_points(as_spatial = F) %>% 
#   ggplot() + 
#     # geom_sf()
#     geom_point(aes(x=x,y=y))

##########################
# 3)
# function to combine poly_to_points, lidR::fit_circle, and point_xy_radius_to_circle_sf
# !!! only works with a singular polygon at a time
##########################
poly_circle_fit <- function(
  poly
  # if set to T, multipolygons are simplified by keeping the largest segment
  , simplify_multipolygons = F
  # number of iterations for the RANSAC fitting algorithm
  , num_iterations = 100
  # threshold value; points are considered inliers if their residuals are below this value
  , inlier_threshold = 0.01
) {
  # poly_to_points
  poly_to_points_ans <- poly_to_points(poly, as_spatial = F, simplify_multipolygons = simplify_multipolygons)
  # fit_circle
  fit_circle_ans <- lidR::fit_circle(
    points = poly_to_points_ans %>% as.matrix()
    # number of iterations for the RANSAC fitting algorithm
    , num_iterations = num_iterations
    # threshold value; points are considered inliers if their residuals are below this value
    , inlier_threshold = inlier_threshold
  )
  # point_xy_radius_to_circle_sf
  ans <- point_xy_radius_to_circle_sf(
    center_x = fit_circle_ans$center_x
    , center_y = fit_circle_ans$center_y
    , radius = fit_circle_ans$radius
    , crs = sf::st_crs(poly)
  )
  # add other vars
  ans <- ans %>% 
    dplyr::mutate(
      covered_arc_degree = fit_circle_ans$covered_arc_degree
      , percentage_inlier = fit_circle_ans$percentage_inlier
      , percentage_inside = fit_circle_ans$percentage_inside
      # , inliers = fit_circle_ans$inliers
    )
  # return
  return(ans)
}
# watershed_ans_poly %>%
#   dplyr::filter(pred_id == 7717) %>%
#   poly_circle_fit() %>%
#   ggplot() + geom_sf() + 
#   geom_sf(
#     data = filtered_watershed_ans_poly %>% dplyr::filter(pred_id == 7717) %>% poly_to_points(as_spatial = T)
#   )

# watershed_ans_poly %>%
#   dplyr::filter(pred_id == 7717) %>%
#   poly_circle_fit() %>% 
#   dplyr::glimpse()

##########################
# 4)
# function to combine poly_to_points, lidR::fit_circle, and point_xy_radius_to_circle_sf
# !!! only works with a singular polygon at a time
##########################
sf_data_circle_fit <- function(sf_data) {
  if(!inherits(sf_data, "sf")){stop("must pass `sf` data object")}
  # apply poly_circle_fit() to each row to get fitted circle sf data
  cf <- sf_data %>% 
    dplyr::ungroup() %>% 
    dplyr::mutate(id_xxx = dplyr::row_number()) %>% 
    dplyr::nest_by(id_xxx) %>% 
    dplyr::mutate(
      circle_fit = poly_circle_fit(poly = data)
    ) %>% 
    dplyr::pull(circle_fit)
  # combine with original data but drop original geom
  df <- sf_data %>% 
    sf::st_drop_geometry() %>% 
    dplyr::bind_cols(cf) %>% 
    sf::st_as_sf(crs = sf::st_crs(sf_data))
  # return
  return(df)
}

let’s apply the sf_data_circle_fit() function we just defined fits the best circle using lidR::fit_circle() to each watershed detected segment to get a spatial data frame with the best fitting circle for each segment

# apply the sf_data_circle_fit() which takes each segment polygon, transforms it to points, and the fits the best circle
watershed_ans_poly_circle_fit <- watershed_ans_poly %>% 
  dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id) %>% 
  sf_data_circle_fit()
# what is this?
watershed_ans_poly_circle_fit %>% dplyr::glimpse()

## Rows: 997
## Columns: 8
## $ pred_id            <dbl> 8, 12, 20, 24, 32, 38, 50, 52, 53, 54, 55, 57, 76, …
## $ center_x           <dbl> 499945.1, 499740.5, 499719.8, 499778.4, 499466.5, 4…
## $ center_y           <dbl> 4317699, 4317897, 4317852, 4318049, 4317615, 431806…
## $ radius             <dbl> 8.600242e-01, 1.117770e+00, 1.250685e+00, 1.001747e…
## $ covered_arc_degree <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ percentage_inlier  <dbl> 0.17142857, 0.17142857, 0.11111111, 0.16666667, 0.1…
## $ percentage_inside  <dbl> 0.40000000, 0.42857143, 0.49206349, 0.35416667, 0.1…
## $ geometry           <POLYGON [m]> POLYGON ((499946 4317699, 4..., POLYGON ((4…

let’s check out the distribution of the metrics that quantify the fit of the circle

watershed_ans_poly_circle_fit %>% 
  sf::st_drop_geometry() %>% 
  dplyr::select(-c(pred_id,center_x,center_y,radius)) %>% 
  tidyr::pivot_longer(dplyr::everything()) %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(x = value, fill = name)) +
    ggplot2::geom_density(color = NA) +
    ggplot2::facet_wrap(facets = ggplot2::vars(name), scales = "free") +
    ggplot2::scale_fill_viridis_d(option = "rocket", begin = 0.2, end = 0.8, alpha = 0.8) +
    ggplot2::theme_light() +
    ggplot2::theme(
      axis.text.y = ggplot2::element_blank()
      , axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5)
      , axis.title.x = ggplot2::element_blank()
      , legend.position = "none"
      , strip.text = ggplot2::element_text(color = "black", size = 10)
    )

let’s look at the best fitting circles using the remaining piles from our example above

example of watershed detected segments as vectors (pink) and best fitting circle of the segments (orange) compared with the ground truth piles (blue)

ortho_plt_temp +
  # ggplot2::ggplot() +
  ggplot2::geom_sf(
    data = watershed_ans_poly %>% 
      dplyr::filter(pred_id %in% pred_id_temp) %>% 
      dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id) %>% 
      dplyr::mutate(pred_id = as.factor(pred_id))
    , fill = "hotpink"
    , alpha = 0.6
    , lwd = 0
    , color = NA
  ) +
  ggplot2::geom_sf(
    data = watershed_ans_poly_circle_fit %>% 
      dplyr::filter(pred_id %in% pred_id_temp) %>% 
      dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id) %>% 
      sf::st_intersection(aoi_temp)
    , color = "orangered"
    , fill = NA
  ) +
  ggplot2::geom_sf(
    data = slash_piles_polys %>% 
      sf::st_transform(sf::st_crs(watershed_ans_poly)) %>% 
      sf::st_intersection(aoi_temp)
    , fill = NA, color = "blue", lwd = 0.3
  ) +
  ggplot2::theme_void()

the best fitting circles on the linear watershed detected segements are not very well fitting, we can filter using the intersection over union (IoU) between the circle and the predicted segment. we’ll use the IoU function we defined in this earlier section.

watershed_circle_fit_iou <- 
  watershed_keep_overlaps_chull_pred_id %>% 
  purrr::map(\(x)
    ground_truth_single_match(
      gt_inst = watershed_ans_poly %>% 
        dplyr::filter(pred_id == x)
      , gt_id = "pred_id"
      , predictions = watershed_ans_poly_circle_fit %>% 
        dplyr::filter(pred_id == x) %>% 
        dplyr::select(pred_id) %>% # keeping other columns causes error?
        dplyr::rename(circ_pred_id = pred_id)
      , pred_id = "circ_pred_id"
      , min_iou_pct = 0
    )    
  ) %>% 
  dplyr::bind_rows()
# what did we get?
watershed_circle_fit_iou %>% dplyr::glimpse()

## Rows: 992
## Columns: 5
## $ pred_id      <dbl> 8, 12, 20, 24, 32, 38, 50, 52, 53, 54, 55, 57, 76, 89, 10…
## $ i_area       <dbl> 1.3555093, 2.0641867, 3.3892187, 2.2842953, 0.1490024, 3.…
## $ u_area       <dbl> 3.407082e+00, 4.499157e+00, 6.762657e+00, 5.346846e+00, 5…
## $ iou          <dbl> 3.978505e-01, 4.587941e-01, 5.011667e-01, 4.272229e-01, 2…
## $ circ_pred_id <dbl> 8, 12, 20, 24, 32, 38, 50, 52, 53, 54, 55, 57, 76, 89, 10…

what is the distribution of IoU of the watershed segments and the best fit circle of those segments?

watershed_circle_fit_iou %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(x = iou)) +
  ggplot2::geom_density(color = NA, fill = "navy", alpha = 0.8) +
  ggplot2::labs(
    x = "IoU of the watershed segments and the best fit circle"
  ) +
  ggplot2::scale_x_continuous(labels = scales::percent) +
  ggplot2::theme_light() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank()
    , legend.position = "none"
    , strip.text = ggplot2::element_text(color = "black", size = 10)
  )

let’s color our predicted polygons by the IoU

ortho_plt_temp +
  # ggplot2::ggplot() +
  ggplot2::geom_sf(
    data = watershed_ans_poly %>% 
      dplyr::filter(pred_id %in% pred_id_temp) %>% 
      dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id) %>% 
      dplyr::left_join(watershed_circle_fit_iou, by = "pred_id")
    , mapping = ggplot2::aes(fill = iou)
    , alpha = 0.8
    , lwd = 0
    , color = NA
  ) +
  ggplot2::geom_sf(
    data = watershed_ans_poly_circle_fit %>% 
      dplyr::filter(pred_id %in% pred_id_temp) %>% 
      sf::st_intersection(aoi_temp)
    , color = "orangered"
    , fill = NA
  ) +
  ggplot2::geom_sf(
    data = slash_piles_polys %>% 
      sf::st_transform(sf::st_crs(watershed_ans_poly)) %>% 
      sf::st_intersection(aoi_temp)
    , fill = NA, color = "blue", lwd = 0.3
  ) +
  ggplot2::scale_fill_fermenter(
    n.breaks = 10 # 10 use 10 if can go full range 0-1
    , palette = "PuOr" # "BrBG"
    , direction = 1
    , limits = c(0,1) # use c(0,1) if can go full range 0-1
    , labels = scales::percent
    , na.value = "sienna4"
  ) +
  ggplot2::labs(fill="IoU") +
  ggplot2::theme_void() +
  ggplot2::theme(
    legend.position = "top"
    , legend.text = ggplot2::element_text(size = 6, angle = 90, vjust = 0.5)
  )

we’ll set a threshold for the minimum IoU to further filter for segments that are approximately round, this filter should remove linear objects from the watershed detections

# min required IoU between the predicted pile and the best fit circle of the predicted pile
pct_iou_circle_fit <- 0.5
# compare iou
watershed_keep_circle_fit_pred_id <- watershed_circle_fit_iou %>% 
  dplyr::filter(iou>=pct_iou_circle_fit) %>% 
  dplyr::pull(pred_id)

let’s check out the remaining watershed detected piles after filtering out the irregularly shaped segments (filtered using the convex hull) and filtering out the non-circular segments (filtered using circle fitting)

example of filtered watershed detected segments as vectors (pink) compared with the ground truth piles (blue)

ortho_plt_temp +
  ggplot2::geom_sf(
    data = watershed_ans_poly %>% 
      dplyr::filter(pred_id %in% pred_id_temp) %>% 
      dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id) %>% 
      dplyr::filter(pred_id %in% watershed_keep_circle_fit_pred_id) %>% 
      dplyr::mutate(pred_id = as.factor(pred_id))
    , fill = "hotpink"
    , alpha = 0.6
    , lwd = 0
    , color = NA
  ) +
  ggplot2::geom_sf(
    data = slash_piles_polys %>% 
      sf::st_transform(sf::st_crs(watershed_ans_poly)) %>% 
      sf::st_intersection(aoi_temp)
    , fill = NA, color = "blue", lwd = 0.3
  ) +
  ggplot2::theme_void()

As a final step, we’ll use the convex hull shapes of our remaining segments. This helps to smooth out the often “blocky” edges of raster-based segments, which can look like they were generated in Minecraft. Additionally, by removing any segments with overlapping convex hull shapes, we can likely reduce false detections that are actually groups of small trees or shrubs, ensuring our results represent singular slash piles.

# make a function to remove overlapping polygons from a sf data frame
st_remove_overlaps <- function(sf_data) {
  if(!inherits(sf_data, "sf")){stop("must pass `sf` data object")}
  # if not polygons
  if( !(sf::st_is(sf_data, type = c("POLYGON", "MULTIPOLYGON")) %>% all()) ){
    stop(paste0(
      "`sf_data` data must be an `sf` class object with POLYGON geometry (see [sf::st_geometry_type()])"
    ))
  }
  if(nrow(sf_data)<=1){return(sf_data)}
  # combine all touching polygons and keep the ones that overlap multiple from the original polygons
  comb_temp <- sf_data %>% 
    dplyr::ungroup() %>% 
    sf::st_union(by_feature = F) %>% 
    sf::st_cast("POLYGON") %>% 
    sf::st_as_sf() %>% 
    sf::st_set_geometry("geometry") %>% 
    sf::st_set_crs(sf::st_crs(sf_data)) %>% 
    dplyr::mutate(new_id = dplyr::row_number()) %>% 
    dplyr::select(new_id) 
  # identify overlaps
  overlap_temp <- comb_temp %>% 
    sf::st_intersection(sf_data) %>% 
    sf::st_drop_geometry() %>% 
    dplyr::group_by(new_id) %>% 
    dplyr::summarise(n_orig = dplyr::n()) %>% 
    dplyr::ungroup() %>% 
    dplyr::filter(n_orig>=2) %>% 
    dplyr::pull(new_id)
  if(length(overlap_temp)==0){return(sf_data)}
  # just get the overlaps
  comb_temp <- comb_temp %>% 
    dplyr::filter(new_id %in% overlap_temp) %>% 
    sf::st_union()
  # remove from the original data  
  return(sf::st_difference(sf_data,comb_temp))
}

save this filtered data as our predictions

# save this filtered data as our predictions
predicted_watershed_piles_sf <- watershed_ans_poly_chull %>% 
  dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id) %>% 
  dplyr::filter(pred_id %in% watershed_keep_circle_fit_pred_id) %>% 
  st_remove_overlaps()

that looks like it did what we wanted it to do (filter out segments that are non-circular)

let’s look at the entire area again after applying this filter plotting the remaining watershed segmented piles (orange) and the actual piles (white)

# plot it 
terra::plotRGB(ortho_rast, stretch = "lin", colNA = "transparent")
terra::plot(
  predicted_watershed_piles_sf %>% 
    terra::vect()
  , add = T, border = "orangered", col = NA, lwd = 1.2
)
terra::plot(
  slash_piles_polys %>% 
    sf::st_transform(terra::crs(ortho_rast)) %>% terra::vect()
  , add = T, border = "white", col = NA, lwd = 1
)

nice! let’s save these data

predicted_watershed_piles_sf %>% 
  sf::st_write("../data/predicted_watershed_piles_sf.gpkg", append = F)
watershed_ans_poly %>% 
  sf::st_write("../data/watershed_ans_poly.gpkg", append = F)

5.1.2 Confusion matrix evaluation

We didn’t “train” any model here, just developed a rules-based method for detecting piles from aerial point cloud data. As such, we can evaluate the methods performance on the “full” set of ground truth pile data.

let’s see how we did given the list of predictions compared to the ground truth data using the confusion matrix matching process we outlined in this earlier section.

ground_truth_prediction_match_ans <- ground_truth_prediction_match(
  ground_truth = slash_piles_polys %>% 
    dplyr::arrange(desc(diameter)) %>% 
    sf::st_transform(sf::st_crs(predicted_watershed_piles_sf))
  , gt_id = "pile_id"
  , predictions = predicted_watershed_piles_sf
  , pred_id = "pred_id"
  , min_iou_pct = 0.05
)

how did our predictions do using this methodology?

# what did we get?
ground_truth_prediction_match_ans %>% 
  dplyr::count(match_grp) %>% 
  dplyr::mutate(pct = (n/sum(n)) %>% scales::percent(accuracy=0.1))

## # A tibble: 3 × 3
##   match_grp         n pct  
##   <ord>         <int> <chr>
## 1 omission         28 5.5% 
## 2 commission      328 64.2%
## 3 true positive   155 30.3%

let’s look at that spatially

pal_match_grp <- c(
  "omission"=viridis::cividis(3)[1]
  , "commission"= "gray88" #viridis::cividis(3)[2]
  , "true positive"=viridis::cividis(3)[3]
)
# plot it
ortho_plt_fn(stand = sf::st_bbox(predicted_watershed_piles_sf) %>% sf::st_as_sfc(), buffer = 10) +
# ggplot2::ggplot() +
  ggplot2::geom_sf(
    data = 
      slash_piles_polys %>% 
      sf::st_transform(sf::st_crs(watershed_ans_poly_chull)) %>% 
        dplyr::left_join(
          ground_truth_prediction_match_ans %>% 
            dplyr::select(pile_id,match_grp)
          , by = "pile_id"
        )
    , mapping = ggplot2::aes(fill = match_grp)
    , color = NA ,alpha=0.6
  ) + 
  ggplot2::geom_sf(
    data =
      predicted_watershed_piles_sf %>% 
        dplyr::left_join(
          ground_truth_prediction_match_ans %>%
            dplyr::select(pred_id,match_grp)
          , by = "pred_id"
        )
    , mapping = ggplot2::aes(fill = match_grp, color = match_grp)
    , alpha = 0
    , lwd = 0.3
  ) +
  ggplot2::scale_fill_manual(values = pal_match_grp, name = "") +
  ggplot2::scale_color_manual(values = pal_match_grp, name = "") +
  ggplot2::theme(legend.position = "top") +
  ggplot2::guides(
    fill = ggplot2::guide_legend(override.aes = list(color = c(NA,NA,pal_match_grp["commission"])))
    , color = "none"
  )

ggplot2::ggsave("../data/watershed_pred_match.jpg", height = 8, width = 10.5)

it looks like we did a really good job correctly predicting the location of actual piles (yellow) but that we incorrectly predicted pile locations at a relatively high rate. Our false positive predictions (i.e. commmissions) were frequently located in areas with quaking aspen (Populus tremuloides) which has many more short trees than the treated conifer areas. It also looks like some edge effects are influencing our results and these will likely improve when we filter for instances located within the treated stand.

let’s quickly look at the IoU values on the true positives

ground_truth_prediction_match_ans %>% dplyr::select(iou) %>% summary()

##       iou        
##  Min.   :0.1541  
##  1st Qu.:0.4485  
##  Median :0.5371  
##  Mean   :0.5381  
##  3rd Qu.:0.6410  
##  Max.   :0.8647  
##  NA's   :356

finally, let’s check our confusion matrix

confusion_matrix_temp <- agg_ground_truth_match(ground_truth_prediction_match_ans)
confusion_matrix_scores_temp <- confusion_matrix_scores_fn(confusion_matrix_temp)
# plot
confusion_matrix_temp %>% 
  dplyr::select(tidyselect::ends_with("_n")) %>% 
  tidyr::pivot_longer(dplyr::everything()) %>% 
  dplyr::mutate(
    presence = ifelse(name %in% c("tp_n", "fn_n"),1,0)
    , estimate = ifelse(name %in% c("tp_n", "fp_n"),1,0)
  ) %>% 
  dplyr::mutate(
    is_false = as.factor(ifelse(presence!=estimate,1,0))
    , presence_fact = factor(presence,levels = 0:1,labels = c("Observed Absent", "Observed Present"))
    , estimate_fact = factor(estimate,levels = 0:1,labels = c("Predicted Absent", "Predicted Present"))
    , pct = value/sum(value)
  ) %>% 
  ggplot(mapping = aes(y = estimate_fact, x = presence_fact)) +
  geom_tile(aes(fill = is_false), color = "white",alpha=0.8) +
  geom_text(aes(label = scales::comma(value,accuracy=1)), vjust = 1,size = 8) +
  geom_text(aes(label = scales::percent(pct,accuracy=0.1)), vjust = 3.5, size=5) +
  scale_fill_manual(values= c("turquoise","tomato2")) +
  scale_x_discrete(position = "top") +
  labs(
    y = "Predicted"
    , x = "Observed"
    , subtitle = paste0(
      "True positive rate (recall) = "
        , confusion_matrix_scores_temp$recall %>% 
          scales::percent(accuracy = 0.1)
      , "\nPrecision (PPV) = "
        , confusion_matrix_scores_temp$precision %>% 
          scales::percent(accuracy = 0.1)
      , "\nF1-score = "
        , confusion_matrix_scores_temp$f_score %>% 
          scales::percent(accuracy = 0.1)
    )
  ) +
  theme_light() + 
  theme(
    legend.position = "none"
    , panel.grid = element_blank()
    , plot.title = element_text(size = 9)
    , plot.subtitle = element_text(size = 9)
  )

compared to our OBIA approach using the spectral data alone, we improved the recall rate to 84.7% using only the point cloud data without any spectral information. This means that we correctly identified actual piles more often. Our F1-Score of 46.5% also indicates better balance between correctly identifying positive cases (high recall) and minimizing incorrect positive predictions (high precision).

5.1.3 Pile area RMSE

let’s calculate the root mean square error (RMSE) for the pile area in meters squared for the true positive predictions. RMSE captures the ability of the approach to properly represent the 2D spatial coverage of the slash pile.

pile area RMSE is calculated as

\[ \textrm{RMSE} = \sqrt{ \frac{ \sum_{i=1}^{N} (y_{i} - \hat{y_{i}})^{2}}{N}} \]

Where \(N\) is equal to the total number of correctly matched piles, \(y_i\) is the image annotated pile area and \(\hat{y_i}\) is the predicted pile area of \(i\) in square meters.

area_rmse_temp <- ground_truth_prediction_match_ans %>% 
  # add area of gt
  dplyr::inner_join(
    slash_piles_polys %>% 
      sf::st_transform(sf::st_crs(predicted_watershed_piles_sf)) %>%
      dplyr::mutate(gt_area = sf::st_area(.) %>% as.numeric()) %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(pile_id,gt_area)
  ) %>% 
  # add area of pred
  dplyr::inner_join(
    predicted_watershed_piles_sf %>% 
      dplyr::mutate(pred_area = sf::st_area(.) %>% as.numeric()) %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(pred_id,pred_area)
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    area_diff = pred_area-gt_area
    , pct_area_diff = area_diff/gt_area
  )

first, we’ll look at the percent difference in area for each pile spatially

# look at this spatially
ggplot2::ggplot() +
  ggplot2::geom_sf(
    data = predicted_watershed_piles_sf %>% 
      dplyr::inner_join(
        area_rmse_temp %>% dplyr::select(pred_id,area_diff,pct_area_diff)
      )
    , mapping = ggplot2::aes(fill = pct_area_diff)
    , color = NA
  ) +
  ggplot2::geom_sf(
    data = slash_piles_polys %>% 
      dplyr::inner_join(
        area_rmse_temp %>% dplyr::select(pile_id)
      )
    , fill = NA, color = "blue"
  ) +
  ggplot2::scale_fill_stepsn(
    n.breaks = 7
    , colors = scales::pal_div_gradient()(seq(1, 0, length.out = 7))
    , limits = c(-max(abs(fivenum(area_rmse_temp$pct_area_diff))),max(abs(fivenum(area_rmse_temp$pct_area_diff))))
    , labels = scales::percent_format(accuracy = 1)
    , show.limits = T
  ) +
  ggplot2::labs(fill = "% difference area") +
  ggplot2::theme_void()

# and get a summary of the percent error
summary(area_rmse_temp$pct_area_diff)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.8459 -0.5512 -0.4430 -0.4287 -0.3221  0.3020

and let’s look at the distribution of the difference in area (m2) calculated as the predicted area minus the image-annotated area so that negative difference values mean our predictions were smaller and positive values mean our predictions were larger

# plot the difference of the area
area_rmse_temp %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = area_diff)
  ) +
  ggplot2::geom_density(fill = "gray", color = NA) +
  ggplot2::geom_vline(xintercept = median(area_rmse_temp$area_diff)) +
  ggplot2::annotate(
    "text", x = median(area_rmse_temp$area_diff), y = 0
    , label = paste("median:",scales::comma(median(area_rmse_temp$area_diff),accuracy=0.1),"m2")
    , hjust = 1.01, vjust = 1
  ) +
  ggplot2::labs(y="density",x="area difference (m2)", subtitle = "Difference in area between predicted and image-annotated slash piles (m2)") +
  ggplot2::theme_light() +
  ggplot2::theme(axis.text.y = ggplot2::element_blank())

  # ggplot2::geom_boxplot(outliers = F)

finally, let’s calculate area RMSE (m2)

area_rmse_temp %>% 
  dplyr::summarise(
    area_diff_sq = sum(area_diff^2, na.rm = T)
    , n = sum(ifelse(!is.na(area_diff),1,0))
  ) %>% 
  dplyr::mutate(
    rmse = dplyr::case_when(
      dplyr::coalesce(n,0)==0 ~ as.numeric(NA)
      , T ~ sqrt(area_diff_sq/n)
    )
  ) %>% 
  dplyr::select(rmse)

## # A tibble: 1 × 1
##    rmse
##   <dbl>
## 1  15.4

that’s not very good, we’ll have to check out if those image-annotated areas are accurate

5.2 Segmentation Sensitivity Testing

Now we’ll perform sensitivity testing of our rules-based slash pile detection methodology which uses a CHM input raster generated directly from the aerial point cloud data. This method does not require any training data for model building and can potentially be used broadly across different domains.

The rule-based method for slash pile detection using CHM raster data generally follows this outline:

CHM Generation: A Canopy Height Model (CHM) is generated, effectively representing a Digital Surface Model (DSM) with the ground removed, ensuring all values are heights above bare earth.
Height Filtering: A maximum height filter is applied to the CHM to retaining only raster cells below a maximum expected slash pile height (e.g. 4 m).
Candidate Segmentation: Watershed segmentation is then used on this filtered CHM to delineate candidate slash pile locations. Irregularity Filtering: Candidate pile locations are initially filtered to remove highly irregular shapes by assessing their overlap with their convex hull (e.g. >70% overlap). This step helps exclude lower tree branches (which can appear donut-shaped in the lower CHM slice) and unorganized coarse woody debris.
Circularity Filtering: Least squares circle fitting is applied to the remaining candidate segments, and segments are filtered based on a high Intersection over Union (IoU) with the best-fit circle (e.g., >50%). This removes non-circular features such as boulders and downed tree stems. Shape Refinement & Overlap Removal: Lastly, segments are smoothed using their convex hull to remove the “blocky” raster edges (like they were made in Minecraft). Overlapping convex hull shapes are then removed to prevent false positives from clustered small trees or shrubs, ensuring singular pile detections.

5.2.1 Watershed Pile Detection Function

Let’s package all of the steps we used into a single function which can possibly be integrated into the cloud2trees package.

The parameters are defined as follows:

max_ht_m : numeric. The maximum height (in meters) a slash pile is expected to be. This value helps us focus on a specific “slice” of the data, ignoring anything taller than a typical pile.
min_area_m2 : numeric. The smallest 2D area (in square meters) a detected pile must cover to be considered valid.
max_area_m2 : numeric. The largest 2D area (in square meters) a detected pile can cover to be considered valid.
convexity_pct : numeric. A value between 0 and 1 that controls how strict the filtering is for regularly shaped piles. A value of 1 means only piles that are perfectly smooth and rounded, with no dents or inward curves are kept. A value of 0 allows for both perfectly regular and very irregular shapes. This filter works alongside circle_fit_iou_pct to refine the pile’s overall shape.
circle_fit_iou_pct : numeric. A value between 0 and 1 that controls how the filtering is for circular pile shapes. Setting it to 1 means only piles that are perfectly circular are kept. A value of 0 allows for a wide range of shapes, including very circular and non-circular ones (like long, straight lines).
smooth_segs : logical. Setting this option to TRUE will: 1) smooth out the “blocky” edges of detected piles (which can look like they were made in Minecraft) by using their overall shape; and 2) remove any detected piles that overlap significantly with other smoothed piles to help ensure each detection is a single slash pile, not a cluster of small trees or shrubs.

slash_pile_detect_watershed <- function(
  chm_rast
  #### height and area thresholds for the detected piles
  # these should be based on data from the literature or expectations based on the prescription
  , max_ht_m = 4 # set the max expected pile height (could also do a minimum??)
  , min_area_m2 = 2 # set the min expected pile area
  , max_area_m2 = 50 # set the max expected pile area
  #### irregularity filtering
  # 1 = perfectly convex (no inward angles); 0 = so many inward angles
  # values closer to 1 remove more irregular segments; 
    # values closer to 0 keep more irregular segments (and also regular segments)
  # these will all be further filtered for their circularity and later smoothed to remove blocky edges
  # and most inward angles by applying a convex hull to the original detected segment
  , convexity_pct = 0.7 # min required overlap between the predicted pile and the convex hull of the predicted pile
  #### circularity filtering
  # 1 = perfectly circular; 0 = not circular (e.g. linear) but also circular
  # min required IoU between the predicted pile and the best fit circle of the predicted pile
  , circle_fit_iou_pct = 0.5
  #### shape refinement & overlap removal
  ## smooth_segs = T ... convex hulls of raster detected segments are returned, any that overlap are removed
  ## smooth_segs = F ... raster detected segments are returned (blocky) if they meet all prior rules
  , smooth_segs = T
) {
  # checks
  if(!inherits(chm_rast,"SpatRaster")){stop("`chm_rast` must be raster data with the class `SpatRaster` ")}
  # just get the first layer and "slice" the raster based on the height threshold
  chm_rast <- chm_rast %>% 
    terra::subset(subset = 1) %>% 
    terra::clamp(upper = max_ht_m, lower = 0, values = F)
  
  ########################################################################################
  ## 1) watershed segmentation
  ########################################################################################
    # let's run watershed segmentation using `lidR::watershed()` which is based on the bioconductor package `EBIimage`
    # return is a raster with the first layer representing the identified watershed segments
    watershed_ans <- lidR::watershed(
        chm = chm_rast
        , th_tree = 0.5 # this would be the minimum expected pile height
      )()
    names(watershed_ans) <- "pred_id"
    
  ########################################################################################
  ## 1.5) segmentation filtering using raster-based area and volume 
  ########################################################################################
    # first, calculate the area of each cell
    area_rast <- terra::cellSize(chm_rast)
    names(area_rast) <- "area_m2"
    # then, multiply area by the CHM (elevation) for each cell to get a raster with cell volumes
    vol_rast <- area_rast*chm_rast
    names(vol_rast) <- "volume_m3"
    # sum area within each segment to get the total area
    area_df <- terra::zonal(x = area_rast, z = watershed_ans, fun = "sum", na.rm = T)
    # sum volume within each segment to get the total volume
    vol_df <- terra::zonal(x = vol_rast, z = watershed_ans, fun = "sum", na.rm = T)
    # max ht within each segment to get the max ht
    ht_df <- terra::zonal(x = chm_rast, z = watershed_ans, fun = "max", na.rm = T) %>% 
      dplyr::rename(max_height_m=2)
    
    # let's convert the watershed-detected segments from raster to vector data 
    # vectors of segments
    watershed_ans_poly <-
      watershed_ans %>% 
      terra::as.polygons(round = F, aggregate = T, values = T, extent = F, na.rm = T) %>% 
      sf::st_as_sf() %>% 
      sf::st_simplify() %>% 
      sf::st_make_valid() %>% 
      dplyr::filter(sf::st_is_valid(.)) %>% 
      dplyr::mutate(treeID=pred_id) %>% 
      cloud2trees::simplify_multipolygon_crowns() %>% 
      dplyr::select(-treeID)
    
    # add area and volume to our vector data  
    # we'll do this with a slick trick to perform multiple joins succinctly using purrr::reduce
    watershed_ans_poly <- 
      purrr::reduce(
        list(watershed_ans_poly, area_df, vol_df, ht_df)
        , dplyr::left_join
        , by = 'pred_id'
      ) %>% 
      dplyr::mutate(
        volume_per_area = volume_m3/area_m2
      ) %>% 
      # filter out the segments that don't meet the size thresholds
      dplyr::filter(
        dplyr::coalesce(area_m2,0) >= min_area_m2
        & dplyr::coalesce(area_m2,0) <= max_area_m2
      )
    
    if(dplyr::coalesce(nrow(watershed_ans_poly),0)<1){
      stop(paste0(
        "no segments detected using the given CHM and expected size thresholds"
        , "\n     try adjusting `max_ht_m`, `min_area_m2`, `max_area_m2` "
      ))
    }
    
    # and create a convex hull of the shapes for irregularity filtering and smoothing
    # convex hulls of segments
    watershed_ans_poly_chull <-
      watershed_ans_poly %>% 
      sf::st_convex_hull() %>% 
      sf::st_simplify() %>% 
      sf::st_make_valid() %>% 
      dplyr::filter(sf::st_is_valid(.))
    
  ########################################################################################
  ## 2) irregularity filtering
  ########################################################################################
    # let's first filter out segments that have holes in them 
    # or are very irregularly shaped by comparing the area of the polygon and convex hull

    # convexity_pct = min required overlap between the predicted pile and the convex hull of the predicted pile
    if(convexity_pct==0){
      watershed_keep_overlaps_chull_pred_id <- watershed_ans_poly$pred_id
    }else{
      # compare areas
      watershed_keep_overlaps_chull_pred_id <- watershed_ans_poly %>% 
        dplyr::mutate(poly_area_m2 = sf::st_area(.) %>% as.numeric()) %>% 
        dplyr::inner_join(
          watershed_ans_poly_chull %>% 
            dplyr::mutate(chull_area_m2 = sf::st_area(.) %>% as.numeric()) %>%
            sf::st_drop_geometry()
          , by = "pred_id"
        ) %>% 
        dplyr::mutate(
          pct_chull = poly_area_m2/chull_area_m2
        ) %>% 
        dplyr::filter(
          pct_chull >= convexity_pct
        ) %>% 
        dplyr::pull(pred_id)
    }
    
    if(
      identical(watershed_keep_overlaps_chull_pred_id, numeric(0))
      || any(is.null(watershed_keep_overlaps_chull_pred_id))
      || any(is.na(watershed_keep_overlaps_chull_pred_id))
      || length(watershed_keep_overlaps_chull_pred_id)<1
    ){
      stop(paste0(
        "no segments detected using the given CHM and irregularity expectations"
        , "\n     try adjusting `convexity_pct` "
      ))
    }
  
  ########################################################################################
  ## 3) circularity filtering
  ########################################################################################
    # let's apply a circle-fitting algorithm to remove non-circular segments from the remaining segments
    # let's apply the `sf_data_circle_fit()` function that
    # fits the best circle using `lidR::fit_circle()` to each watershed detected segment 
    # to get a spatial data frame with the best fitting circle for each segment

    # apply the sf_data_circle_fit() which takes each segment polygon, transforms it to points, and the fits the best circle
    watershed_ans_poly_circle_fit <- watershed_ans_poly %>% 
      dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id) %>% 
      sf_data_circle_fit()
    
    # filter using the intersection over union (IoU) between the circle and the predicted segment. 
    # we'll use the IoU function we defined 
    # we map over this to only compare the segment to it's own best circle fit...not all
    # we should consider doing this in bulk.....another day
    watershed_circle_fit_iou <- 
      watershed_keep_overlaps_chull_pred_id %>% 
      purrr::map(\(x)
        ground_truth_single_match(
          gt_inst = watershed_ans_poly %>% 
            dplyr::filter(pred_id == x)
          , gt_id = "pred_id"
          , predictions = watershed_ans_poly_circle_fit %>% 
            dplyr::filter(pred_id == x) %>% 
            dplyr::select(pred_id) %>% # keeping other columns causes error?
            dplyr::rename(circ_pred_id = pred_id)
          , pred_id = "circ_pred_id"
          , min_iou_pct = 0 # set to 0 just to return pct
        )    
      ) %>% 
      dplyr::bind_rows()
    
    # threshold for the minimum IoU to further filter for segments that are approximately round, 
    # this filter should remove linear objects from the watershed detections
      # compare iou
      if(circle_fit_iou_pct==0){
        watershed_keep_circle_fit_pred_id <- watershed_keep_overlaps_chull_pred_id
      }else{
        watershed_keep_circle_fit_pred_id <- watershed_circle_fit_iou %>% 
          dplyr::filter(iou>=circle_fit_iou_pct) %>% 
          dplyr::pull(pred_id) 
      }
      
    if(
      identical(watershed_keep_circle_fit_pred_id, numeric(0))
      || any(is.null(watershed_keep_circle_fit_pred_id))
      || any(is.na(watershed_keep_circle_fit_pred_id))
      || length(watershed_keep_circle_fit_pred_id)<1
    ){
      stop(paste0(
        "no segments detected using the given CHM and circularity expectations"
        , "\n     try adjusting `circle_fit_iou_pct` "
      ))
    }
    
  ########################################################################################
  ## 4) shape refinement & overlap removal
  ########################################################################################
    # use the convex hull shapes of our remaining segments. 
    # This helps to smooth out the often 'blocky' edges of raster-based segments
    # , which can look like they were generated in Minecraft. 
    # Additionally, by removing any segments with overlapping convex hull shapes, 
    # we can likely reduce false detections that are actually groups of small trees or shrubs, 
    # ensuring our results represent singular slash piles.
    
    if(smooth_segs){
      return_dta <- watershed_ans_poly_chull %>% 
        dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id) %>% 
        dplyr::filter(pred_id %in% watershed_keep_circle_fit_pred_id) %>% 
        st_remove_overlaps() %>% 
        # now we need to re-do the volume and area calculations
        dplyr::mutate(
          area_m2 = sf::st_area(.) %>% as.numeric()
          , volume_m3 = area_m2*volume_per_area
        )
    }else{
      return_dta <- watershed_ans_poly %>% 
        dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id) %>% 
        dplyr::filter(pred_id %in% watershed_keep_circle_fit_pred_id)
    }
      
  # return
    return(return_dta)
}

let’s test this real quick

chm_temp <- 
  cloud2raster_ans$chm_rast %>% 
  terra::crop(
    slash_piles_points %>% 
      sf::st_zm() %>% 
      dplyr::slice_sample(n=1) %>% 
      sf::st_buffer(100) %>% 
      sf::st_transform(terra::crs(cloud2raster_ans$chm_rast)) %>% 
      terra::vect()
  )
# terra::plot(chm_temp)
slash_pile_detect_watershed_ans_temp <- slash_pile_detect_watershed(chm_temp)
# what did we get?
slash_pile_detect_watershed_ans_temp %>% dplyr::glimpse()

## Rows: 69
## Columns: 6
## $ pred_id         <dbl> 11, 23, 43, 47, 100, 147, 202, 206, 216, 224, 231, 241…
## $ area_m2         <dbl> 3.12, 7.06, 3.06, 4.34, 5.20, 2.74, 3.48, 39.40, 2.86,…
## $ volume_m3       <dbl> 9.757329, 18.830532, 8.023260, 10.729498, 12.671159, 7…
## $ max_height_m    <dbl> 3.987000, 3.978000, 3.961000, 3.950000, 3.868167, 3.72…
## $ geometry        <POLYGON [m]> POLYGON ((499614.4 4317833,..., POLYGON ((4994…
## $ volume_per_area <dbl> 3.1273490, 2.6672142, 2.6219804, 2.4722346, 2.4367613,…

how does it look overlaid on the CHM?

terra::plot(chm_temp, col = viridis::plasma(100), axes = F)
terra::plot(slash_pile_detect_watershed_ans_temp %>% terra::vect(),add = T, border = "gray44", col = NA, lwd = 3)

how do the area and volume metrics look?

p1_temp <- slash_pile_detect_watershed_ans_temp %>% 
  ggplot2::ggplot() +
  ggplot2::geom_sf(mapping = ggplot2::aes(fill = area_m2)) +
  ggplot2::scale_fill_distiller(palette = "Blues", direction = 1) +
  ggplot2::labs(x="",y="") +
  ggplot2::theme_light() +
  ggplot2::theme(legend.position = "top", axis.text = ggplot2::element_blank())
p2_temp <- slash_pile_detect_watershed_ans_temp %>% 
  ggplot2::ggplot() +
  ggplot2::geom_sf(mapping = ggplot2::aes(fill = volume_m3)) +
  ggplot2::scale_fill_distiller(palette = "BuGn", direction = 1) +
  ggplot2::labs(x="",y="") +
  ggplot2::theme_light() +
  ggplot2::theme(legend.position = "top", axis.text = ggplot2::element_blank())
p3_temp <- slash_pile_detect_watershed_ans_temp %>% 
  ggplot2::ggplot() +
  ggplot2::geom_sf(mapping = ggplot2::aes(fill = max_height_m)) +
  ggplot2::scale_fill_distiller(palette = "YlOrBr", direction = 1) +
  ggplot2::labs(x="",y="") +
  ggplot2::theme_light() +
  ggplot2::theme(legend.position = "top", axis.text = ggplot2::element_blank())
p4_temp <- slash_pile_detect_watershed_ans_temp %>% 
  ggplot2::ggplot() +
  ggplot2::geom_sf(mapping = ggplot2::aes(fill = volume_per_area)) +
  ggplot2::scale_fill_distiller(palette = "PuRd", direction = 1) +
  ggplot2::labs(x="",y="") +
  ggplot2::theme_light() +
  ggplot2::theme(legend.position = "top", axis.text = ggplot2::element_blank())
(p1_temp + p2_temp) / (p3_temp + p4_temp)

the volume per area ratio (volume_per_area) quantifies the “effective” height or depth of that volume relative to the area it occupies; this ratio may not be very useful for anything other than scaling estimates to relate a three-dimensional quantity (volume) to a two-dimensional quantity (area)

5.2.2 Parameter sensitivity testing

Parameter sensitivity testing is a systematic process of evaluating how changes to the specific thresholds and settings within the detection methodology impact the final results. Since the method doesn’t use training data, its performance is highly dependent on these manually defined parameters. The objective of this testing is to understand the robustness of the method and identify the optimal combination of settings that yield the best detection performance, balancing factors like detection rate (recall) and accuracy of positive predictions (precision).

Here are the general steps to accomplish this testing:

Define Parameter Ranges and Increments: For each identified parameter, determine a reasonable range of values to test and the step size for incrementing through that range. For example, if a threshold is currently 0.5, you might test from 0.3 to 0.7 in increments of 0.05.
Automate the Detection Workflow: Create a script or automated process that can run your entire rules-based slash pile detection method using different combinations of these parameter values.
Execute Tests: Run the automated workflow for each defined parameter combination.
Collect Performance Metrics: For every run, calculate the key performance metrics against your image-annotated validation data (e.g. recall, precision, F-score)
Analyze and Visualize Results: Plot the performance metrics against the varying parameter values or combinations. This will help you visualize trends, identify sweet spots where performance is maximized, and understand trade-offs (e.g., increasing recall might decrease precision).
Select Optimal Parameters: Based on the analysis and the specific goals of the project (e.g., minimizing false negatives for safety, or maximizing overall accuracy), select the parameter set that provides the most desirable performance. This might involve choosing a balance between precision and recall, or prioritizing one over the other.

param_combos_df <-
  tidyr::crossing(
    max_ht_m = seq(from = 4, to =  6, by = 1) # set the max expected pile height (could also do a minimum??)
    , min_area_m2 = c(2) # seq(from = 1, to =  2, by = 1) # set the min expected pile area
    , max_area_m2 = seq(from = 40, to =  130, by = 10) # set the max expected pile area
    , convexity_pct = seq(from = 0.2, to =  0.8, by = 0.1) # min required overlap between the predicted pile and the convex hull of the predicted pile
    , circle_fit_iou_pct = seq(from = 0.3, to = 0.8, by = 0.1) 
  ) %>% 
  dplyr::mutate(rn = dplyr::row_number()) %>% 
  dplyr::relocate(rn)
# huh?
param_combos_df %>% dplyr::glimpse()

## Rows: 1,260
## Columns: 6
## $ rn                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ max_ht_m           <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
## $ min_area_m2        <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ max_area_m2        <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,…
## $ convexity_pct      <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.3, 0…
## $ circle_fit_iou_pct <dbl> 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.3, 0.4, 0.5, 0.6, 0…

that might be too many combinations but we’ll give it a shot

5.2.2.1 Workflow over parameter combinations

to automate the detection workflow we can simply map these combinations over our slash_pile_detect_watershed() function. however, that would result in us running the same watershed segmentation process repeatedly with the same settings. as such, let’s make a function to efficiently perform the detection workflow using the same watershed segmentation and circle fitting for use with the different filtering combinations.

#########################################################################
# 1)
# function to apply watershed seg over a list of max_ht_m
# function to apply watershed segmentation over a list of different maximum height threshold (`max_ht_m`) which determines the "slice" of the CHM to use
#########################################################################
chm_watershed_seg_fn <- function(chm_rast,max_ht_m) {
  # get unique hts
  max_ht_m <- unique(as.numeric(max_ht_m))
  max_ht_m <- max_ht_m[!is.na(max_ht_m)]
  if(
    dplyr::coalesce(length(max_ht_m),0)<1
  ){stop("could not detect `max_ht_m` parameter setting which should be numeric list")}
  # checks
  if(!inherits(chm_rast,"SpatRaster")){stop("`chm_rast` must be raster data with the class `SpatRaster` ")}
  # just get the first layer
  chm_rast <- chm_rast %>% terra::subset(subset = 1)
  # map over the watershed
  # let's run watershed segmentation using `lidR::watershed()` which is based on the bioconductor package `EBIimage`
  # return is a raster with the first layer representing the identified watershed segments
  ret_rast <- max_ht_m %>% 
    purrr::map(\(x)
      lidR::watershed(
        chm = chm_rast %>% 
          terra::clamp(upper = x, lower = 0, values = F)
        , th_tree = 0.5 # this would be the minimum expected pile height
      )()
    ) %>% 
    c()
  # name
  names(ret_rast) <- as.character(max_ht_m)
  return(terra::rast(ret_rast)) # returns a raster stack
}

#########################################################################
# 2)
# now we need to define a function to apply the filters to the resulting watershed segmented rasters 
# based on a data frame of difference combinations of filters for irregularity using the comparison to the convex hull
#########################################################################
param_combos_detect_convexity_fn <- function(
  watershed_rast_stack
  #### height and area thresholds for the detected piles
  # these should be based on data from the literature or expectations based on the prescription
  , max_ht_m = 4 # set the max expected pile height (could also do a minimum??)
  , min_area_m2 = 2 # set the min expected pile area
  , max_area_m2 = 50 # set the max expected pile area
  #### irregularity filtering
  # 1 = perfectly convex (no inward angles); 0 = so many inward angles
  # values closer to 1 remove more irregular segments; 
    # values closer to 0 keep more irregular segments (and also regular segments)
  # these will all be further filtered for their circularity and later smoothed to remove blocky edges
  # and most inward angles by applying a convex hull to the original detected segment
  , convexity_pct = 0.7 # min required overlap between the predicted pile and the convex hull of the predicted pile
) {
  # subset the raster
  watershed_ans <- watershed_rast_stack %>% 
    terra::subset( as.character(unique(max_ht_m)[1]) )
  
  # checks
  if(!inherits(watershed_ans,"SpatRaster")){stop("`watershed_ans` must be raster data with the class `SpatRaster` ")}
  # just get the first layer
  watershed_ans <- watershed_ans %>% terra::subset(subset = 1)
  
  ########################################################################################
  ## 1) watershed segmentation
  ########################################################################################
    # we can quickly calculate the area of the segments which we could use as a filter
    # list of segments that meet size threshold
    keep_segs_temp <- watershed_ans %>% 
      terra::freq() %>% 
      dplyr::mutate(area_m2 = count * (terra::res(watershed_ans) %>% .[1:2] %>% prod()) ) %>% 
      dplyr::filter(
        area_m2 >= min_area_m2 & 
        area_m2 <= max_area_m2
      ) %>%
      dplyr::pull(value)
    # filter the raster (segments that don't meet the size thresholds are set to NA)
    watershed_ans <- watershed_ans %>% 
      terra::subst(from = keep_segs_temp, to = keep_segs_temp, others = NA)
    
    # let's convert the watershed-detected segments from raster to vector data 
    # and create a convex hull of the shapes for comparison
    # vectors of segments
    watershed_ans_poly <-
      watershed_ans %>% 
      terra::as.polygons(round = F, aggregate = T, values = T, extent = F, na.rm = T) %>% 
      setNames("pred_id") %>%
      sf::st_as_sf() %>% 
      sf::st_simplify() %>% 
      sf::st_make_valid() %>% 
      dplyr::filter(sf::st_is_valid(.)) %>% 
      dplyr::mutate(treeID=pred_id) %>% 
      cloud2trees::simplify_multipolygon_crowns() %>% 
      dplyr::select(-treeID)
    
    if(dplyr::coalesce(nrow(watershed_ans_poly),0)<1){
      return(NULL)
    }
    
    # convex hulls of segments
    watershed_ans_poly_chull <-
      watershed_ans_poly %>% 
      sf::st_convex_hull() %>% 
      sf::st_simplify() %>% 
      sf::st_make_valid() %>% 
      dplyr::filter(sf::st_is_valid(.))
    
  ########################################################################################
  ## 2) irregularity filtering
  ########################################################################################
    # let's first filter out segments that have holes in them 
    # or are very irregularly shaped by comparing the area of the polygon and convex hull

    # convexity_pct = min required overlap between the predicted pile and the convex hull of the predicted pile
    if(convexity_pct==0){
      watershed_keep_overlaps_chull_pred_id <- watershed_ans_poly$pred_id
    }else{
      # compare areas
      watershed_keep_overlaps_chull_pred_id <- watershed_ans_poly %>% 
        dplyr::mutate(poly_area_m2 = sf::st_area(.) %>% as.numeric()) %>% 
        dplyr::inner_join(
          watershed_ans_poly_chull %>% 
            dplyr::mutate(chull_area_m2 = sf::st_area(.) %>% as.numeric()) %>%
            sf::st_drop_geometry()
          , by = "pred_id"
        ) %>% 
        dplyr::mutate(
          pct_chull = poly_area_m2/chull_area_m2
        ) %>% 
        dplyr::filter(
          pct_chull >= convexity_pct
        ) %>% 
        dplyr::pull(pred_id)
    }
    
    if(
      identical(watershed_keep_overlaps_chull_pred_id, numeric(0))
      || any(is.null(watershed_keep_overlaps_chull_pred_id))
      || any(is.na(watershed_keep_overlaps_chull_pred_id))
      || length(watershed_keep_overlaps_chull_pred_id)<1
    ){
      return(NULL)
    }
  #########################################
  # return the filtered segment polygons
  return(
    watershed_ans_poly %>% 
      dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id)
  )
}

# function to do the whole thing
param_combos_piles_detect_fn <- function(
  chm
  , param_combos_df
  , smooth_segs = T
  , ofile = "../data/param_combos_piles.gpkg"
) {
  # apply chm_watershed_seg_fn
  ### takes ~37 mins
  chm_watershed_seg_ans <- chm_watershed_seg_fn(chm, max_ht_m = unique(param_combos_df$max_ht_m))
  # # what did we get?
  # chm_watershed_seg_ans %>% 
  #   terra::subset( as.character(unique(param_combos_df$max_ht_m)[3]) ) %>% 
  #   terra::plot()
  
  # just get the unique combinations needed for this param_combos_detect_convexity_fn function
  param_combos_convexity_df <-
    param_combos_df %>% 
    dplyr::distinct(
      max_ht_m
      , min_area_m2
      , max_area_m2
      , convexity_pct
    )
  # apply this using our data frame to map over the combinations
  ### takes ~40 mins
  param_combos_detect_convexity_ans <- 
    1:nrow(param_combos_convexity_df) %>% 
    purrr::map(\(x)
      param_combos_detect_convexity_fn(
          watershed_rast_stack = chm_watershed_seg_ans
          , max_ht_m = param_combos_convexity_df$max_ht_m[x]
          , min_area_m2 = param_combos_convexity_df$min_area_m2[x]
          , max_area_m2 = param_combos_convexity_df$max_area_m2[x]
          , convexity_pct = param_combos_convexity_df$convexity_pct[x]
        ) %>% 
        dplyr::mutate(
          max_ht_m = param_combos_convexity_df$max_ht_m[x]
          , min_area_m2 = param_combos_convexity_df$min_area_m2[x]
          , max_area_m2 = param_combos_convexity_df$max_area_m2[x]
          , convexity_pct = param_combos_convexity_df$convexity_pct[x]
        )
      , .progress = T
    ) %>% 
    dplyr::bind_rows()
  # generate a record ID
  param_combos_detect_convexity_ans <- param_combos_detect_convexity_ans %>% dplyr::mutate(record_id = dplyr::row_number())
  # param_combos_detect_convexity_ans <- sf::st_read("../data/param_combos_detect_convexity_ans.gpkg") #!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!FULL DATA
  
  # #!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Filter for testing
  # param_combos_detect_convexity_ans <-
  #   param_combos_detect_convexity_ans %>% 
  #   dplyr::inner_join(
  #     param_combos_convexity_df %>% 
  #       dplyr::slice_sample(n = 5)
  #     , by = dplyr::join_by(max_ht_m,min_area_m2,max_area_m2,convexity_pct)
  #   )
  
  # param_combos_detect_convexity_ans %>% dplyr::glimpse()
  # and we'll apply the circle fitting to this spatial data
  # apply the sf_data_circle_fit() which takes each segment polygon, transforms it to points, and the fits the best circle
  param_combos_detect_cf_ans <- sf_data_circle_fit(param_combos_detect_convexity_ans)
  # param_combos_detect_cf_ans %>% sf::st_write("../data/param_combos_detect_cf_ans.gpkg")
  # param_combos_detect_cf_ans %>% dplyr::glimpse()
  # nrow(param_combos_detect_cf_ans)
  # nrow(param_combos_detect_convexity_ans)
  
  # we'll use the IoU function we defined 
    # we map over this to only compare the segment to it's own best circle fit...not all
    # we should consider doing this in bulk.....another day
    param_combos_detect_cf_iou <- 
      param_combos_detect_convexity_ans$record_id %>% 
      purrr::map(\(x)
        ground_truth_single_match(
          gt_inst = param_combos_detect_convexity_ans %>% 
            dplyr::filter(record_id == x)
          , gt_id = "record_id"
          , predictions = param_combos_detect_cf_ans %>% 
            dplyr::filter(record_id == x) %>% 
            dplyr::select(record_id) %>% # keeping other columns causes error?
            dplyr::rename(circ_record_id = record_id)
          , pred_id = "circ_record_id"
          , min_iou_pct = 0 # set to 0 just to return pct
        )    
      ) %>% 
      dplyr::bind_rows()
    
    # param_combos_detect_cf_iou %>% dplyr::glimpse()
  
  ### now we need a function to map over all the different filters for circularity by combination of the other settings
    ### this is going to be highly custom to this particular task :\
  final_ans <- 
    param_combos_detect_convexity_ans %>% 
    dplyr::inner_join(
      param_combos_df
      , by = dplyr::join_by(max_ht_m,min_area_m2,max_area_m2,convexity_pct)
      , relationship = "many-to-many"
    ) %>% 
    dplyr::left_join(
      param_combos_detect_cf_iou %>% dplyr::select(record_id,iou)
      , by = "record_id"
    ) %>% 
    dplyr::filter(dplyr::coalesce(iou,0)>=circle_fit_iou_pct)
  
    ########################################################################################
    ## 4) shape refinement & overlap removal
    ########################################################################################
      # use the convex hull shapes of our remaining segments. 
      # This helps to smooth out the often 'blocky' edges of raster-based segments
      # , which can look like they were generated in Minecraft. 
      # Additionally, by removing any segments with overlapping convex hull shapes, 
      # we can likely reduce false detections that are actually groups of small trees or shrubs, 
      # ensuring our results represent singular slash piles.
      if(smooth_segs){
        final_ans <- unique(final_ans$rn) %>% 
          # have to map over this to do overlaps only on one set at a time
          purrr::map(\(x)
            final_ans %>% 
              dplyr::filter(rn==x) %>% 
              sf::st_convex_hull() %>% 
              sf::st_simplify() %>% 
              sf::st_make_valid() %>% 
              dplyr::filter(sf::st_is_valid(.)) %>% 
              st_remove_overlaps()
          ) %>% 
          dplyr::bind_rows()
      }
      if(stringr::str_length(ofile)>1){
        final_ans %>% sf::st_write(ofile, append = F, quiet = T)
      }
      
    # return
      return(final_ans)
}


# param_combos_piles <- 
#   param_combos_df$rn %>% 
#   # sample(3) %>% ## !!!!!!!!!!!!!!!!!!!! remove after testing
#   purrr::map(\(x)
#     slash_pile_detect_watershed(
#         chm_rast = cloud2raster_ans$chm_rast
#         , max_ht_m = param_combos_df$max_ht_m[x]
#         , min_area_m2 = param_combos_df$min_area_m2[x]
#         , max_area_m2 = param_combos_df$max_area_m2[x]
#         , convexity_pct = param_combos_df$convexity_pct[x]
#         , circle_fit_iou_pct = param_combos_df$circle_fit_iou_pct[x]
#       ) %>% 
#       dplyr::mutate(
#         rn = param_combos_df$rn[x]
#         , max_ht_m = param_combos_df$max_ht_m[x]
#         , min_area_m2 = param_combos_df$min_area_m2[x]
#         , max_area_m2 = param_combos_df$max_area_m2[x]
#         , convexity_pct = param_combos_df$convexity_pct[x]
#         , circle_fit_iou_pct = param_combos_df$circle_fit_iou_pct[x]
#       )
#   ) %>% 
#   dplyr::bind_rows()

# param_combos_piles %>% 
#   sf::st_write("../data/param_combos_piles.gpkg", append = F)
# # param_combos_piles <- sf::st_read("../data/param_combos_piles.gpkg")

apply our function using the data frame of the parameter combinations

f_temp <- "../data/param_combos_piles.gpkg"
  # "f:/PFDP_Data/sens_test_param_combos/param_combos_piles.gpkg"
if(!file.exists(f_temp)){
  param_combos_piles <- param_combos_piles_detect_fn(
    chm = cloud2raster_ans$chm_rast
    , param_combos_df = param_combos_df
    , smooth_segs = T
    , ofile = f_temp
  )
  # when this was originally written, the pile area, height, and volume metrics were left out...let's add those
    #    ;\
    fix_area_vol_fn_temp <- function(
      chm_rast
      , max_ht_m
      , pile_polys
    ) {
      # just get the first layer and "slice" the raster based on the height threshold
      chm_rast <- chm_rast %>% 
        terra::subset(subset = 1) %>% 
        terra::clamp(upper = max_ht_m, lower = 0, values = F)
      # filterrrrrr
      pile_polys <- pile_polys %>% 
        dplyr::filter(max_ht_m==max_ht_m)
      # stuff
      area_rast_temp <- terra::cellSize(chm_rast) %>% 
        terra::mask(chm_rast)
      names(area_rast_temp) <- "area_m2"
      # then, multiply area by the CHM (elevation) for each cell to get a raster with cell volumes
      vol_rast_temp <- area_rast_temp*chm_rast
      names(vol_rast_temp) <- "volume_m3"
      # sum area within each segment to get the total area
      area_df_temp <- terra::zonal(
        x = area_rast_temp
        , z = pile_polys %>% 
          dplyr::select(pred_id) %>% 
          terra::vect()
        , fun = "sum", na.rm = T
      )
      # area_df_temp %>% dplyr::glimpse()
      # sum volume within each segment to get the total volume
      vol_df_temp <- terra::zonal(
        x = vol_rast_temp
        , z = pile_polys %>% 
          dplyr::select(pred_id) %>% 
          terra::vect()
        , fun = "sum", na.rm = T
      )
      # max ht within each segment to get the max ht
      ht_df_temp <- terra::zonal(
          x = chm_rast
          , z = pile_polys %>% 
            dplyr::select(pred_id) %>% 
            terra::vect()
          , fun = "max", na.rm = T
        ) %>% 
        dplyr::rename(max_height_m=1)
      # vectors of segments
      pile_polys <-
        pile_polys %>% 
        dplyr::bind_cols(
          area_df_temp, vol_df_temp, ht_df_temp
        ) %>% 
        dplyr::mutate(
          volume_per_area = volume_m3/area_m2
        ) %>% 
        # now we need to re-do the volume and area calculations
        dplyr::mutate(
          area_m2 = sf::st_area(.) %>% as.numeric()
          , volume_m3 = area_m2*volume_per_area
        )
      return(pile_polys)
    }
    # map it
    param_combos_piles <-
      unique(param_combos_piles$max_ht_m) %>% 
      purrr::map(\(x)
        fix_area_vol_fn_temp(
          chm_rast = cloud2raster_ans$chm_rast
          , max_ht_m = x
          , pile_polys = param_combos_piles %>% dplyr::filter(max_ht_m==x)
        )
      ) %>% 
      dplyr::bind_rows()
    # param_combos_piles %>% dplyr::glimpse()
    # save it
    sf::st_write(param_combos_piles, f_temp, append = F, quiet = T)
}else{
  param_combos_piles <- sf::st_read(f_temp, quiet = T)
}

what did we get?

param_combos_piles %>% dplyr::glimpse()

## Rows: 628,683
## Columns: 14
## $ pred_id            <dbl> 12, 16, 20, 24, 38, 43, 52, 53, 54, 55, 57, 76, 89,…
## $ max_ht_m           <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
## $ min_area_m2        <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ max_area_m2        <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,…
## $ convexity_pct      <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0…
## $ record_id          <int> 3, 4, 5, 6, 8, 9, 11, 12, 13, 14, 15, 16, 17, 20, 2…
## $ rn                 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ circle_fit_iou_pct <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0…
## $ iou                <dbl> 0.3627501, 0.4436463, 0.4199539, 0.3680601, 0.71855…
## $ area_m2            <dbl> 5.820766e-11, 3.680000e+00, 0.000000e+00, 5.600000e…
## $ volume_m3          <dbl> 1.325388e-10, 1.265935e+01, NA, 1.420662e+01, 1.867…
## $ max_height_m       <dbl> 2.277000, 4.000000, NA, 4.000000, 3.999429, NA, 3.9…
## $ volume_per_area    <dbl> 2.2770000, 3.4400403, NA, 2.5368959, 2.0166748, NA,…
## $ geom               <MULTIPOLYGON [m]> MULTIPOLYGON (((499740.7 43..., MULTIP…

# param_combos_piles %>% sf::st_drop_geometry() %>% dplyr::count(rn) %>% dplyr::slice_head(n=6)
# terra::plot(cloud2raster_ans$chm_rast, col = viridis::plasma(100), axes = F)
# terra::plot(param_combos_piles %>% dplyr::filter(rn==param_combos_piles$rn[2222]) %>% terra::vect(),add = T, border = "gray44", col = NA, lwd = 3)

5.2.2.2 Validation over parameter combinations

now we need to validate these combinations against the ground truth data and we can map over the ground_truth_prediction_match() function

f_temp <- "../data/param_combos_gt.csv"
if(!file.exists(f_temp)){
  param_combos_gt <- 
    unique(param_combos_df$rn) %>% 
    purrr::map(\(x)
      ground_truth_prediction_match(
        ground_truth = slash_piles_polys %>% 
          dplyr::filter(is_in_stand) %>% 
          dplyr::arrange(desc(diameter)) %>% 
          sf::st_transform(sf::st_crs(param_combos_piles))
        , gt_id = "pile_id"
        , predictions = param_combos_piles %>% 
          dplyr::filter(rn == x) %>% 
          dplyr::filter(
            pred_id %in% (param_combos_piles %>% 
              dplyr::filter(rn == x) %>% 
              sf::st_intersection(
                stand_boundary %>% 
                  sf::st_transform(sf::st_crs(param_combos_piles))
              ) %>% 
              sf::st_drop_geometry() %>% 
              dplyr::pull(pred_id))
          )
        , pred_id = "pred_id"
        , min_iou_pct = 0.05
      ) %>% 
      dplyr::mutate(rn=x)
    ) %>% 
    dplyr::bind_rows()
  param_combos_gt %>% readr::write_csv(f_temp, append = F, progress = F)
}else{
  param_combos_gt <- readr::read_csv(f_temp, progress = F, show_col_types = F)
}
# huh?
param_combos_gt %>% dplyr::glimpse()
# param_combos_gt %>% dplyr::filter(pile_id==120)

let’s attach a flag to only work with piles that intersect with the stand boundary and add the area of the ground truth and predicted piles

# let's attach a flag to only work with piles that intersect with the stand boundary
# add in/out to piles data
param_combos_piles <- param_combos_piles %>% 
  dplyr::left_join(
    param_combos_piles %>% 
      sf::st_intersection(
        stand_boundary %>% 
          sf::st_transform(sf::st_crs(param_combos_piles))
      ) %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(rn,pred_id) %>% 
      dplyr::mutate(is_in_stand = T)
    , by = dplyr::join_by(rn,pred_id)
  ) %>% 
  dplyr::mutate(
    is_in_stand = dplyr::coalesce(is_in_stand,F)
  ) %>% 
  dplyr::filter(area_m2>=min_area_m2 & area_m2<=max_area_m2)

# param_combos_piles %>% dplyr::glimpse()
# param_combos_gt %>% dplyr::glimpse()
# param_combos_gt %>% dplyr::slice_sample(prop = 0.05) %>% dplyr::count(match_grp)
# add it to validation
param_combos_gt <-
  param_combos_gt %>% 
  # add area of gt
  dplyr::left_join(
    slash_piles_polys %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(pile_id,image_gt_area_m2,field_gt_area_m2,image_gt_volume_m3,field_gt_volume_m3,height_m) %>% 
      dplyr::rename(gt_height_m = height_m) %>% 
      dplyr::mutate(pile_id=as.numeric(pile_id))
    , by = "pile_id"
  ) %>% 
  # add info from predictions
  dplyr::left_join(
    param_combos_piles %>%
      sf::st_drop_geometry() %>%
      dplyr::select(
        rn,pred_id
        ,is_in_stand
        , area_m2, volume_m3, max_height_m
      ) %>% 
      dplyr::rename(
        pred_area_m2 = area_m2, pred_volume_m3 = volume_m3, pred_height_m = max_height_m
      ) %>% 
      # add volume assuming cone or conical shape
      dplyr::mutate(
        pred_cone_volume_m3 = (1/3)*pi*(sqrt(pred_area_m2/pi)^2)*pred_height_m # sqrt(pred_area_m2/pi) = radius assuming of circle covering same area
      )
    , by = dplyr::join_by(rn,pred_id)
  ) %>%
  dplyr::mutate(
    is_in_stand = dplyr::case_when(
      is_in_stand == T ~ T
      , is_in_stand == F ~ F
      , match_grp == "omission" ~ T
      , T ~ F
    )
    ### calculate these based on the formulas below...agg_ground_truth_match() depends on those formulas
    # area diffs
    , height_diff = pred_height_m-gt_height_m
    , pct_diff_height = (gt_height_m-pred_height_m)/gt_height_m
    # area diffs
    , area_diff_image = pred_area_m2-image_gt_area_m2
    , pct_diff_area_image = (image_gt_area_m2-pred_area_m2)/image_gt_area_m2
    , area_diff_field = pred_area_m2-field_gt_area_m2
    , pct_diff_area_field = (field_gt_area_m2-pred_area_m2)/field_gt_area_m2
    # volume diffs
    , volume_diff_image = pred_volume_m3-image_gt_volume_m3
    , pct_diff_volume_image = (image_gt_volume_m3-pred_volume_m3)/image_gt_volume_m3
    , volume_diff_field = pred_volume_m3-field_gt_volume_m3
    , pct_diff_volume_field = (field_gt_volume_m3-pred_volume_m3)/field_gt_volume_m3
    # volume diffs cone
    , cone_volume_diff_image = pred_cone_volume_m3-image_gt_volume_m3
    , cone_volume_diff_field = pred_cone_volume_m3-field_gt_volume_m3
  )
# what?
param_combos_gt %>% dplyr::glimpse()

let’s look at one combination spatially

# plot it
ggplot2::ggplot() +
  ggplot2::geom_sf(
    data = stand_boundary %>% 
      sf::st_transform(sf::st_crs(param_combos_piles))
    , color = "black", fill = NA
  ) +
  ggplot2::geom_sf(
    data = 
      slash_piles_polys %>% 
      dplyr::filter(is_in_stand) %>% 
      sf::st_transform(sf::st_crs(param_combos_piles)) %>% 
      dplyr::mutate(pile_id=as.numeric(pile_id)) %>% 
        dplyr::left_join(
          param_combos_gt %>% 
            dplyr::filter(rn == param_combos_gt$rn[1]) %>% 
            dplyr::select(pile_id,match_grp)
          , by = "pile_id"
        )
    , mapping = ggplot2::aes(fill = match_grp)
    , color = NA ,alpha=0.6
  ) + 
  ggplot2::geom_sf(
    data =
      param_combos_piles %>% 
        dplyr::filter(rn == param_combos_gt$rn[1] & is_in_stand) %>% 
        dplyr::left_join(
          param_combos_gt %>%
            dplyr::filter(rn == param_combos_gt$rn[1] & is_in_stand) %>% 
            dplyr::select(pred_id,match_grp)
          , by = "pred_id"
        )
    , mapping = ggplot2::aes(fill = match_grp, color = match_grp)
    , alpha = 0
    , lwd = 0.7
  ) +
  ggplot2::scale_fill_manual(values = pal_match_grp, name = "", na.value = "red") +
  ggplot2::scale_color_manual(values = pal_match_grp, name = "", na.value = "red") +
  ggplot2::theme_void() +
  ggplot2::theme(
    legend.position = "top"
    , panel.background = ggplot2::element_rect(fill = "gray66")
  ) +
  ggplot2::guides(
    fill = ggplot2::guide_legend(override.aes = list(color = c(pal_match_grp["commission"],NA,NA)))
    , color = "none"
  )

5.2.2.3 Aggregate assessment metrics

now we need to aggregate these single tree level results to a single line (for each parameter combination in this instance). let’s make a function to do this that can also calculate RMSE for any column name that contains “diff_” but doesn’t start with “pct_”…this could be a handy function for any analysis comparing predictions to ground truth data.

in addition to RMSE, we’ll calculate the mean error (ME) and the mean absolute percentage error (MAPE) for the respective values:

\[ \textrm{ME} = \frac{ \sum_{i=1}^{N} (\hat{y_{i}} - y_{i})}{N} \] \[ \textrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_{i} - \hat{y_{i}}}{y_{i}} \right| \]

Where \(N\) is equal to the total number of correctly matched piles, \(y_i\) is the ground truth measured value and \(\hat{y_i}\) is the predicted value of \(i\)

we could also calculate Relative RMSE (RRMSE)

\[ \textrm{RRMSE} = \frac{\text{RMSE}}{\bar{y}} \times 100\% \]

where, \(\bar{y}\) represents the mean of the ground truth values. the interpretations of RMSE and RRMSE are:

RMSE: Measures the average magnitude of the differences between predicted and the actual observed values, expressed in the same units as the metric.
RRMSE: Expresses RMSE as a percentage of the mean of the observed values, providing a scale-independent measure to compare model accuracy across different datasets or models.

for this analysis, we’ll show how to calculate RRMSE but we’ll only investigate MAPE:

Use MAPE when: You need an easily understandable metric for comparing prediction accuracy across different series or models with varying scales, particularly when zeros or near-zero actual values are not present in your data.
Use RRMSE when: You need a metric that is more robust to small or zero actual values and you want to penalize larger errors more heavily due to the squaring of errors in its calculation.

# first function takes df with cols tp_n, fp_n, and fn_n to calculate rate
confusion_matrix_scores_fn <- function(df) {
  df %>% 
  dplyr::mutate(
    omission_rate = dplyr::case_when(
      (tp_n+fn_n) == 0  ~ as.numeric(NA)
      , T ~ fn_n/(tp_n+fn_n)
    ) # False Negative Rate or Miss Rate
    , commission_rate = dplyr::case_when(
      (tp_n+fp_n) == 0  ~ as.numeric(NA)
      , T ~ fp_n/(tp_n+fp_n)
    ) # False Positive Rate
    , precision = dplyr::case_when(
      (tp_n+fp_n) == 0  ~ as.numeric(NA)
      , T ~ tp_n/(tp_n+fp_n)
    )
    , recall = dplyr::case_when(
      (tp_n+fn_n) == 0  ~ as.numeric(NA)
      , T ~ tp_n/(tp_n+fn_n)
    )
    , f_score = dplyr::case_when(
      is.na(precision) | is.na(recall) ~ as.numeric(NA)
      , (precision+recall) == 0 ~ 0
      , T ~ 2 * ( (precision*recall)/(precision+recall) )
    )    
  ) 
}
# aggregate results from ground_truth_prediction_match()
agg_ground_truth_match <- function(ground_truth_prediction_match_ans) {
  if(nrow(ground_truth_prediction_match_ans)==0){return(NULL)}
  if( !(names(ground_truth_prediction_match_ans) %>% stringr::str_equal("match_grp") %>% any()) ){stop("ground_truth_prediction_match_ans must contain `match_grp` column")}
  
  # check for difference columns (contains "_diff") and calc rmse for only those to return a single line df with colums for each diff_rmse
    if(
      (ground_truth_prediction_match_ans %>% 
        dplyr::select(tidyselect::contains("_diff") & !tidyselect::starts_with("pct_")) %>% 
        ncol() )>0
    ){
      # get rmse and mean difference/error for all columns with "_diff" but not "pct_diff"
      # get mape for all columns with "pct_diff" but not "diff_"
      rmse_df <- ground_truth_prediction_match_ans %>% 
        dplyr::select(
          (tidyselect::contains("_diff") & !tidyselect::starts_with("pct_"))
          | tidyselect::starts_with("pct_diff")
        ) %>%
        tidyr::pivot_longer(dplyr::everything(), values_drop_na = T) %>% 
        dplyr::group_by(name) %>% 
        dplyr::summarise(
          sq = sum(value^2, na.rm = T)
          , mean = mean(value, na.rm = T)
          , sumabs = sum(abs(value), na.rm = T)
          , nomiss = sum(!is.na(value))
        ) %>% 
        dplyr::ungroup() %>% 
        dplyr::mutate(
          rmse = dplyr::case_when(
            dplyr::coalesce(nomiss,0)==0 ~ as.numeric(NA)
            , T ~ sqrt(sq/nomiss)
          )
          , mape = dplyr::case_when(
            dplyr::coalesce(nomiss,0)==0 ~ as.numeric(NA)
            , T ~ sumabs/nomiss
          )
        ) %>% 
        # NA nonsense values
        dplyr::mutate(
          mape = dplyr::case_when(
            stringr::str_starts(name, "pct_diff") ~ mape
            , T ~ as.numeric(NA)
          )
          , rmse = dplyr::case_when(
            stringr::str_starts(name, "pct_diff") ~ as.numeric(NA)
            , T ~ rmse
          )
          , mean = dplyr::case_when(
            stringr::str_starts(name, "pct_diff") ~ as.numeric(NA)
            , T ~ mean
          )
        ) %>% 
        dplyr::select(name,rmse,mean,mape) %>% 
        tidyr::pivot_wider(
          names_from = name
          , values_from = c(rmse,mean,mape)
          , names_glue = "{name}_{.value}"
        ) %>% 
        # remove columns with NA in all rows
        dplyr::select( dplyr::where( ~!all(is.na(.x)) ) )
      
      if(
        dplyr::coalesce(nrow(rmse_df),0)==0
        || dplyr::coalesce(ncol(rmse_df),0)==0
      ){
        # empty df
        rmse_df <- dplyr::tibble()
      }
    }else{
      # empty df
      rmse_df <- dplyr::tibble()
    }
  
  # count by match group
    agg <-
      ground_truth_prediction_match_ans %>% 
      dplyr::count(match_grp) %>% 
      dplyr::mutate(
        match_grp = dplyr::case_match(
          match_grp
          , "true positive"~"tp"
          , "commission"~"fp"
          , "omission"~"fn"
        )
      )
    # true positive, false positive, false negative rates
    return_df <- dplyr::tibble(match_grp = c("tp","fp","fn")) %>% 
      dplyr::left_join(agg, by = "match_grp") %>% 
      dplyr::mutate(dplyr::across(.cols = c(n), .fn = ~dplyr::coalesce(.x,0))) %>% 
      tidyr::pivot_wider(
        names_from = match_grp
        , values_from = c(n)
        , names_glue = "{match_grp}_{.value}"
      )
    # rates, precision, recall, f-score
    return_df <- return_df %>% 
      confusion_matrix_scores_fn() %>% 
      # add rmse
      dplyr::bind_cols(rmse_df)
  # return
  return(return_df)
}

now let’s apply it at the parameter combination level

param_combos_gt_agg <- unique(param_combos_gt$rn) %>% 
  purrr::map(\(x)
    agg_ground_truth_match(
      param_combos_gt %>% 
        dplyr::filter(
          is_in_stand
          & rn == x
        )
    ) %>% 
    dplyr::mutate(rn = x)
  ) %>% 
  dplyr::bind_rows() %>% 
  # add in info on all parameter combinations
  dplyr::inner_join(
    param_combos_piles %>% 
    sf::st_drop_geometry() %>% 
    dplyr::distinct(
      rn,max_ht_m,min_area_m2,max_area_m2,convexity_pct,circle_fit_iou_pct
    )
    , by = "rn"
    , relationship = "one-to-one"
  )

# we can also add the relative rmse (rrmse) by comparing the rmse to the mean value of the ground truth data
param_combos_gt_agg <- param_combos_gt_agg %>%
  dplyr::bind_cols(
    # add means of gt
    slash_piles_polys %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(image_gt_area_m2,field_gt_area_m2,image_gt_volume_m3,field_gt_volume_m3,height_m) %>% 
      dplyr::ungroup() %>% 
      dplyr::summarise(dplyr::across(dplyr::everything(), ~mean(.x,na.rm=T))) %>% 
      dplyr::rename_with(~ paste0("gt_", .x, recycle0 = TRUE))
  ) %>% 
  dplyr::mutate(
    area_diff_field_rrmse = area_diff_field_rmse/gt_field_gt_area_m2
    , area_diff_image_rrmse = area_diff_image_rmse/gt_image_gt_area_m2
    , volume_diff_field_rrmse = volume_diff_field_rmse/gt_field_gt_volume_m3
    , volume_diff_image_rrmse = volume_diff_image_rmse/gt_image_gt_volume_m3
    , cone_volume_diff_field_rrmse = cone_volume_diff_field_rmse/gt_field_gt_volume_m3
    , cone_volume_diff_image_rrmse = cone_volume_diff_image_rmse/gt_image_gt_volume_m3
    , height_diff_rrmse = height_diff_rmse/gt_height_m
  ) %>% 
  dplyr::select(!tidyselect::starts_with("gt_"))
# what is this?
param_combos_gt_agg %>% dplyr::glimpse()

## Rows: 1,260
## Columns: 40
## $ tp_n                         <dbl> 107, 106, 96, 88, 65, 11, 108, 107, 100, …
## $ fp_n                         <dbl> 160, 134, 107, 61, 27, 7, 171, 139, 103, …
## $ fn_n                         <dbl> 12, 13, 23, 31, 54, 110, 11, 12, 19, 27, …
## $ omission_rate                <dbl> 0.10084034, 0.10924370, 0.19327731, 0.260…
## $ commission_rate              <dbl> 0.5992509, 0.5583333, 0.5270936, 0.409396…
## $ precision                    <dbl> 0.4007491, 0.4416667, 0.4729064, 0.590604…
## $ recall                       <dbl> 0.89915966, 0.89075630, 0.80672269, 0.739…
## $ f_score                      <dbl> 0.5544041, 0.5905292, 0.5962733, 0.656716…
## $ area_diff_field_rmse         <dbl> 3.868051, 3.885180, 3.807512, 3.890205, 4…
## $ area_diff_image_rmse         <dbl> 10.46601, 10.50315, 10.91475, 11.25964, 1…
## $ cone_volume_diff_field_rmse  <dbl> 7.490940, 7.526448, 7.790422, 8.120722, 9…
## $ cone_volume_diff_image_rmse  <dbl> 15.78185, 15.85451, 16.58836, 17.31307, 1…
## $ height_diff_rmse             <dbl> 0.5886118, 0.5907639, 0.5865752, 0.596960…
## $ volume_diff_field_rmse       <dbl> 6.266046, 6.296208, 6.356979, 6.635252, 7…
## $ volume_diff_image_rmse       <dbl> 13.49247, 13.55815, 14.16271, 14.79545, 1…
## $ area_diff_field_mean         <dbl> -1.5767572, -1.5800470, -1.8332476, -1.78…
## $ area_diff_image_mean         <dbl> -6.887313, -6.903269, -7.212212, -7.34892…
## $ cone_volume_diff_field_mean  <dbl> -1.855219, -1.858713, -2.225155, -2.25765…
## $ cone_volume_diff_image_mean  <dbl> -6.415861, -6.438672, -6.977413, -7.26086…
## $ height_diff_mean             <dbl> -0.024606325, -0.021877675, -0.067167884,…
## $ volume_diff_field_mean       <dbl> 0.597212113, 0.593251731, 0.283383782, 0.…
## $ volume_diff_image_mean       <dbl> -3.963430, -3.986707, -4.468874, -4.64514…
## $ pct_diff_area_field_mape     <dbl> 0.2355877, 0.2363886, 0.2284273, 0.219623…
## $ pct_diff_area_image_mape     <dbl> 0.4368490, 0.4368840, 0.4379644, 0.429786…
## $ pct_diff_height_mape         <dbl> 0.1891672, 0.1895399, 0.1818905, 0.179652…
## $ pct_diff_volume_field_mape   <dbl> 0.4384068, 0.4408890, 0.4029316, 0.420493…
## $ pct_diff_volume_image_mape   <dbl> 0.3388673, 0.3402073, 0.3430604, 0.338788…
## $ rn                           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13…
## $ max_ht_m                     <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ min_area_m2                  <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ max_area_m2                  <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 4…
## $ convexity_pct                <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0…
## $ circle_fit_iou_pct           <dbl> 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.3, 0.4, 0…
## $ area_diff_field_rrmse        <dbl> 0.3646426, 0.3662574, 0.3589356, 0.366731…
## $ area_diff_image_rrmse        <dbl> 0.5484007, 0.5503465, 0.5719135, 0.589985…
## $ volume_diff_field_rrmse      <dbl> 0.6149686, 0.6179288, 0.6238930, 0.651203…
## $ volume_diff_image_rrmse      <dbl> 0.8009531, 0.8048521, 0.8407405, 0.878301…
## $ cone_volume_diff_field_rrmse <dbl> 0.7351834, 0.7386683, 0.7645755, 0.796992…
## $ cone_volume_diff_image_rrmse <dbl> 0.9368571, 0.9411703, 0.9847339, 1.027754…
## $ height_diff_rrmse            <dbl> 0.2697319, 0.2707181, 0.2687987, 0.273557…

5.2.3 Parameter Sensitivity Test Results

let’s get a quick summary of the evaluation metrics across all parameter combinations

# pal
pal_eval_metric <- c(
  RColorBrewer::brewer.pal(3,"Oranges")[3]
  , RColorBrewer::brewer.pal(3,"Greys")[3]
  , RColorBrewer::brewer.pal(3,"Purples")[3]
)
# summary
param_combos_gt_agg %>% 
  dplyr::select(f_score,recall,precision,tidyselect::ends_with("_mape")) %>% 
  summary()

##     f_score           recall          precision      pct_diff_area_field_mape
##  Min.   :0.1259   Min.   :0.07438   Min.   :0.2437   Min.   :0.1112          
##  1st Qu.:0.4448   1st Qu.:0.56198   1st Qu.:0.3691   1st Qu.:0.2143          
##  Median :0.5368   Median :0.80165   Median :0.4688   Median :0.2267          
##  Mean   :0.4955   Mean   :0.68651   Mean   :0.4798   Mean   :0.2208          
##  3rd Qu.:0.5963   3rd Qu.:0.89256   3rd Qu.:0.5876   3rd Qu.:0.2341          
##  Max.   :0.7174   Max.   :0.95868   Max.   :0.8750   Max.   :0.2656          
##  pct_diff_area_image_mape pct_diff_height_mape pct_diff_volume_field_mape
##  Min.   :0.3195           Min.   :0.1352       Min.   :0.2820            
##  1st Qu.:0.4250           1st Qu.:0.1901       1st Qu.:0.4370            
##  Median :0.4327           Median :0.1975       Median :0.4533            
##  Mean   :0.4263           Mean   :0.1956       Mean   :0.4560            
##  3rd Qu.:0.4370           3rd Qu.:0.2031       3rd Qu.:0.4697            
##  Max.   :0.4683           Max.   :0.2567       Max.   :0.6678            
##  pct_diff_volume_image_mape
##  Min.   :0.2063            
##  1st Qu.:0.3303            
##  Median :0.3428            
##  Mean   :0.3361            
##  3rd Qu.:0.3499            
##  Max.   :0.3848

5.2.3.1 Overall: instance segmentation

let’s check out the distribution of the instance segmentation (i.e. pile detection) results

param_combos_gt_agg %>% 
  tidyr::pivot_longer(
    cols = c(precision,recall,f_score)
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  dplyr::mutate(
    metric = dplyr::case_when(
        metric == "f_score" ~ 1
        , metric == "recall" ~ 2
        , metric == "precision" ~ 3
      ) %>% 
      factor(
        ordered = T
        , levels = 1:3
        , labels = c(
          "F-score"
          , "Recall"
          , "Precision"
        )
      )
  ) %>%
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = value, fill = metric)
  ) +
  ggplot2::geom_density(color = NA, alpha = 0.8) +
  ggplot2::scale_fill_manual(values = pal_eval_metric) +
  ggplot2::scale_x_continuous(
    labels = scales::percent_format(accuracy = 1)
    , limits = c(0,1)
  ) +
  ggplot2::facet_grid(cols = dplyr::vars(metric)) +
  ggplot2::labs(
    x = "", y = ""
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7)
    , axis.text.y = ggplot2::element_blank()
  )

ggplot2::ggsave("../data/best_pile_detect_dist.jpeg", height = 8, width = 8)

5.2.3.2 Overall: form quantification

let’s check out the distribution of the ability of the method to properly extract the form of the piles by looking at the RMSE values

# this function gets pairs of colors that are spectrally different and cb friendly
pal_get_pairs_fn <- function(n) {
  if(n%%2==1){n=n+1}
  viridis::turbo(n = n*4) %>%
  .[2:((n*4)-1)] %>% # removes extremes
  .[c(T,T,F,F)] %>% # keeps 2 of every 4
  .[c(F,F,T,T)] # keeps 2 of every 4
  # .[1:n]
}
# pal_get_pairs_fn(6) %>% scales::show_col()

param_combos_gt_agg %>% 
  dplyr::select(!tidyselect::starts_with("cone_volume_diff")) %>% 
  tidyr::pivot_longer(
    cols = c(tidyselect::ends_with("_rmse"))
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  dplyr::mutate(
    metric = dplyr::case_when(
        metric == "f_score" ~ 1
        , metric == "recall" ~ 2
        , metric == "precision" ~ 3
        # rmse
        , metric == "volume_diff_field_rmse" ~ 4
        , metric == "volume_diff_image_rmse" ~ 5
        , metric == "area_diff_field_rmse" ~ 6
        , metric == "area_diff_image_rmse" ~ 7
        , metric == "height_diff_rmse" ~ 8
      ) %>% 
      factor(
        ordered = T
        , levels = 1:8
        , labels = c(
          "F-score"
          , "Recall"
          , "Precision"
          , "RMSE: Volume field (m3)"
          , "RMSE: Volume image (m3)"
          , "RMSE: Area field (m2)"
          , "RMSE: Area image (m2)"
          , "RMSE: Height (m)"
        )
      )
  ) %>%
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = value, fill = metric)
  ) +
  ggplot2::geom_density(color = NA, alpha = 0.8) +
  ggplot2::scale_fill_manual(
    values = pal_get_pairs_fn(6)[1:5]
  ) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(metric)
    , ncol = 2
    , scales = "free"
  ) +
  ggplot2::labs(
    x = "", y = ""
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7)
    , axis.text.y = ggplot2::element_blank()
  )

ggplot2::ggsave("../data/best_pile_detect_dist_rmse.jpeg", height = 8, width = 8)

let’s check out the distribution of the ability of the method to properly extract the form of the piles by looking at the MAPE values

# pal_get_pairs_fn(6) %>% scales::show_col()
param_combos_gt_agg %>% 
  dplyr::select(!tidyselect::starts_with("cone_volume_diff")) %>% 
  tidyr::pivot_longer(
    cols = c(tidyselect::ends_with("_mape"))
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  dplyr::mutate(
    metric = dplyr::case_when(
        metric == "f_score" ~ 1
        , metric == "recall" ~ 2
        , metric == "precision" ~ 3
        # rmse
        , metric == "pct_diff_volume_field_mape" ~ 4
        , metric == "pct_diff_volume_image_mape" ~ 5
        , metric == "pct_diff_area_field_mape" ~ 6
        , metric == "pct_diff_area_image_mape" ~ 7
        , metric == "pct_diff_height_mape" ~ 8
      ) %>% 
      factor(
        ordered = T
        , levels = 1:8
        , labels = c(
          "F-score"
          , "Recall"
          , "Precision"
          , "MAPE: Volume field (m3)"
          , "MAPE: Volume image (m3)"
          , "MAPE: Area field (m2)"
          , "MAPE: Area image (m2)"
          , "MAPE: Height (m)"
        )
      )
  ) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = value, fill = metric)
  ) +
  ggplot2::geom_density(color = NA, alpha = 0.8) +
  ggplot2::scale_fill_manual(
    values = pal_get_pairs_fn(6)[1:5]
  ) +
  ggplot2::scale_x_continuous(
    labels = scales::percent_format(accuracy = 1)
    , limits = c(0,1)
    , breaks = scales::breaks_extended(n=8)
  ) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(metric)
    , ncol = 2
    , scales = "free"
  ) +
  ggplot2::labs(
    x = "", y = ""
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7)
    , axis.text.y = ggplot2::element_blank()
  )

ggplot2::ggsave("../data/best_pile_detect_dist_mape.jpeg", height = 8, width = 8)

let’s check out the distribution of the ability of the method to properly extract the form of the piles by looking at the ME values

# pal_get_pairs_fn(6) %>% scales::show_col()
param_combos_gt_agg %>% 
  dplyr::select(!tidyselect::starts_with("cone_volume_diff")) %>% 
  tidyr::pivot_longer(
    cols = c(tidyselect::ends_with("_mean"))
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  dplyr::mutate(
    metric = dplyr::case_when(
        metric == "f_score" ~ 1
        , metric == "recall" ~ 2
        , metric == "precision" ~ 3
        # rmse
        , metric == "volume_diff_field_mean" ~ 4
        , metric == "volume_diff_image_mean" ~ 5
        , metric == "area_diff_field_mean" ~ 6
        , metric == "area_diff_image_mean" ~ 7
        , metric == "height_diff_mean" ~ 8
      ) %>% 
      factor(
        ordered = T
        , levels = 1:8
        , labels = c(
          "F-score"
          , "Recall"
          , "Precision"
          , "ME: Volume field (m3)"
          , "ME: Volume image (m3)"
          , "ME: Area field (m2)"
          , "ME: Area image (m2)"
          , "ME: Height (m)"
        )
      )
  ) %>%
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = value, fill = metric)
  ) +
  ggplot2::geom_vline(xintercept = 0) +
  ggplot2::geom_density(color = NA, alpha = 0.8) +
  ggplot2::scale_fill_manual(
    values = pal_get_pairs_fn(6)[1:5]
  ) +
  ggplot2::scale_x_continuous(
    breaks = scales::breaks_extended(n=8)
  ) +
  ggplot2::facet_wrap(
    facets = dplyr::vars(metric)
    , ncol = 2
    , scales = "free"
  ) +
  ggplot2::labs(
    x = "", y = ""
    , subtitle = "Mean Error < 0 => Actual>Predicted\nMean Error > 0 => Actual<Predicted"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text.x = ggplot2::element_text(size = 7)
    , axis.text.y = ggplot2::element_blank()
  )

ggplot2::ggsave("../data/best_pile_detect_dist_me.jpeg", height = 8, width = 8)

5.2.3.3 Main Effects: instance segmentation

let’s average across all other factors to look at the main effect by parameter for instance segmentation performance (i.e. pile detection)

param_combos_gt_agg %>% 
  tidyr::pivot_longer(
    cols = c(precision,recall,f_score)
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  tidyr::pivot_longer(
    cols = c(max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct)
    , names_to = "param"
    , values_to = "param_value"
  ) %>% 
  dplyr::group_by(param, param_value, metric) %>%
  dplyr::summarise(
    median = median(value,na.rm=T)
    , q25 = stats::quantile(value,na.rm=T,probs = 0.25)
    , q75 = stats::quantile(value,na.rm=T,probs = 0.75)
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    param = dplyr::case_when(
        param == "max_ht_m" ~ 1
        , param == "min_area_m2" ~ 2
        , param == "max_area_m2" ~ 3
        , param == "convexity_pct" ~ 4
        , param == "circle_fit_iou_pct" ~ 5
      ) %>% 
      factor(
        ordered = T
        , levels = 1:5
        , labels = c(
          "max_ht_m"
          , "min_area_m2"
          , "max_area_m2"
          , "convexity_pct"
          , "circle_fit_iou_pct"
        )
      )
    , metric = dplyr::case_when(
        metric == "f_score" ~ 1
        , metric == "recall" ~ 2
        , metric == "precision" ~ 3
      ) %>% 
      factor(
        ordered = T
        , levels = 1:3
        , labels = c(
          "F-score"
          , "Recall"
          , "Precision"
        )
      )
  ) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(y = median, x = param_value, color = metric, fill = metric, group = metric, shape = metric)
  ) +
  ggplot2::geom_ribbon(
    mapping = ggplot2::aes(ymin = q25, ymax = q75)
    , alpha = 0.2, color = NA
  ) +
  ggplot2::geom_line(lwd = 1.5, alpha = 0.8) +
  ggplot2::geom_point(size = 2) +
  ggplot2::facet_wrap(facets = dplyr::vars(param), scales = "free_x") +
  # ggplot2::scale_color_viridis_d(begin = 0.2, end = 0.8) +
  ggplot2::scale_fill_manual(values = pal_eval_metric) +
  ggplot2::scale_color_manual(values = pal_eval_metric) +
  ggplot2::scale_y_continuous(limits = c(0,1), labels = scales::percent, breaks = scales::breaks_extended(10)) +
  ggplot2::labs(x = "", y = "median value", color = "", fill = "") +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, linetype = 0, size = 5, alpha = 1))
    , shape = "none"
  )

ggplot2::ggsave("../data/best_pile_detect_test.jpeg", height = 8, width = 8)

based on these main effect aggregated results:

increasing the max_ht_m (determines the “slice” of the CHM to use) reduces precision and F-score with minimal impact on recall
increasing the max_area_m2 (determines the maximum pile area) has minimal impact on all three evaluation metrics
increasing convexity_pct toward 1 to favor regular shapes had minimal impact on metrics until parameter values exceeded 0.7, at which point recall decreased, but precision and F-score slightly improved
increasing circle_fit_iou_pct toward 1 to favor circular shapes minimally affected recall while improving precision and F-score up to parameter values of 0.5. Beyond this, recall significantly dropped, with only modest F-score improvements, and overall accuracy measured by F-score crashed past 0.7

5.2.3.4 Main Effects: form quantification

let’s average across all other factors to look at the main effect by parameter for the MAPE metrics quantifying the pile form accuracy

param_combos_gt_agg %>% 
  dplyr::select(!tidyselect::starts_with("volume_diff")) %>% 
  tidyr::pivot_longer(
    cols = c(tidyselect::ends_with("_mape"))
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  tidyr::pivot_longer(
    cols = c(max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct)
    , names_to = "param"
    , values_to = "param_value"
  ) %>% 
  dplyr::group_by(param, param_value, metric) %>%
  dplyr::summarise(
    median = median(value,na.rm=T)
    , q25 = stats::quantile(value,na.rm=T,probs = 0.25)
    , q75 = stats::quantile(value,na.rm=T,probs = 0.75)
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    param = dplyr::case_when(
        param == "max_ht_m" ~ 1
        , param == "min_area_m2" ~ 2
        , param == "max_area_m2" ~ 3
        , param == "convexity_pct" ~ 4
        , param == "circle_fit_iou_pct" ~ 5
      ) %>% 
      factor(
        ordered = T
        , levels = 1:5
        , labels = c(
          "max_ht_m"
          , "min_area_m2"
          , "max_area_m2"
          , "convexity_pct"
          , "circle_fit_iou_pct"
        )
      )
    , metric = dplyr::case_when(
        metric == "f_score" ~ 1
        , metric == "recall" ~ 2
        , metric == "precision" ~ 3
        # rmse
        , metric == "pct_diff_volume_field_mape" ~ 4
        , metric == "pct_diff_volume_image_mape" ~ 5
        , metric == "pct_diff_area_field_mape" ~ 6
        , metric == "pct_diff_area_image_mape" ~ 7
        , metric == "pct_diff_height_mape" ~ 8
      ) %>% 
      factor(
        ordered = T
        , levels = 1:8
        , labels = c(
          "F-score"
          , "Recall"
          , "Precision"
          , "MAPE: Volume field (%)"
          , "MAPE: Volume image (%)"
          , "MAPE: Area field (%)"
          , "MAPE: Area image (%)"
          , "MAPE: Height (%)"
        )
      )
  ) %>% 
  ggplot2::ggplot(
    mapping = ggplot2::aes(y = median, x = param_value, color = metric, fill = metric, group = metric, shape = metric)
  ) +
  # ggplot2::geom_ribbon(
  #   mapping = ggplot2::aes(ymin = q25, ymax = q75)
  #   , alpha = 0.2, color = NA
  # ) +
  ggplot2::geom_line(lwd = 1.5, alpha = 0.8) +
  ggplot2::geom_point(size = 2) +
  ggplot2::facet_wrap(facets = dplyr::vars(param), scales = "free_x") +
  ggplot2::scale_fill_manual(values = pal_get_pairs_fn(6)[1:5]) +
  ggplot2::scale_color_manual(values = pal_get_pairs_fn(6)[1:5]) +
  ggplot2::scale_shape_manual(values = c(15,16,17,18,0)) +
  ggplot2::scale_y_continuous(limits = c(0,1), labels = scales::percent, breaks = scales::breaks_extended(10)) +
  ggplot2::labs(x = "", y = "median value", color = "", fill = "") +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "top"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
  ) +
  ggplot2::guides(
    color = ggplot2::guide_legend(override.aes = list(shape = 15, linetype = 0, size = 5, alpha = 1))
    , shape = "none"
  )

ggplot2::ggsave("../data/best_pile_detect_test_mape.jpeg", height = 8, width = 9)

these results indicate that the method’s accuracy in outlining slash pile form is robust to parameter changes, as the relevant shape accuracy metrics remain consistent across various settings

5.2.3.5 Best results: instance segmentation

we can cut to the chase and just look at the parameter combinations that achieved the best results based on the evaluation metrics

pct_rank_th_temp <- 0.99
df_temp <- param_combos_gt_agg %>% 
  dplyr::arrange(desc(f_score), desc(recall), desc(precision)) %>% 
  dplyr::mutate(dplyr::across(
    .cols = c(f_score, recall, precision)
    , .fn = dplyr::percent_rank
    , .names = "pct_rank_{.col}"
  )) %>% 
  # now get the max of these pct ranks by row
  dplyr::rowwise() %>% 
  dplyr::mutate(
    pct_rank_max = max(
      dplyr::c_across(
        tidyselect::starts_with("pct_rank")
      )
      , na.rm = T
    )
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    rank_lab = dplyr::case_when(
      pct_rank_f_score>=pct_rank_th_temp ~ 1
      , pct_rank_recall>=pct_rank_th_temp ~ 2
      , pct_rank_precision>=pct_rank_th_temp ~ 3
      , (pct_rank_f_score<=(1-pct_rank_th_temp) & dplyr::coalesce(f_score,0)>0) ~ 4
    ) %>% 
      factor(
        ordered = T
        , levels = 1:4
        , labels = c(
          paste0("top ", scales::percent(1-pct_rank_th_temp, accuracy = 1), " F-score")
          , paste0("top ", scales::percent(1-pct_rank_th_temp, accuracy = 1), " Recall")
          , paste0("top ", scales::percent(1-pct_rank_th_temp, accuracy = 1), " Precision")
          , paste0("bottom ", scales::percent(1-pct_rank_th_temp, accuracy = 1), " F-score")
        )
      )
  ) %>% 
  dplyr::filter(
    !is.na(rank_lab) # pct_rank_max>=0.99
  ) %>%
  dplyr::mutate(
    lab = stringr::str_c(max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct, sep = ":")
  ) 
df_temp %>% 
  tidyr::pivot_longer(
    cols = c(precision,recall,f_score)
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  dplyr::mutate(
    metric = dplyr::case_when(
        metric == "f_score" ~ 1
        , metric == "recall" ~ 2
        , metric == "precision" ~ 3
      ) %>% 
      factor(
        ordered = T
        , levels = 1:3
        , labels = c(
          "F-score"
          , "Recall"
          , "Precision"
        )
      )
    , lab = forcats::fct_reorder(lab, pct_rank_f_score)
    , val_lab = scales::percent(value, accuracy = 0.1)
  ) %>%
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = value, y = lab, fill = metric, label = val_lab)
  ) +
  ggplot2::geom_col(width = 0.6) +
  ggplot2::geom_text(color = "black", size = 2, hjust = -0.2) +
  ggplot2::scale_fill_manual(values = pal_eval_metric) +
  ggplot2::scale_x_continuous(
    labels = scales::percent_format(accuracy = 1)
    , limits = c(0,1.05)
    # , expand = expansion(mult = c(0, .08))
  ) +
  ggplot2::facet_grid(cols = dplyr::vars(metric), rows = dplyr::vars(rank_lab), scales = "free_y") +
  ggplot2::labs(
    x = "", y = "max_ht_m : max_area_m2 : convexity_pct : circle_fit_iou_pct"
    , fill = ""
    , subtitle = "top (and bottom) parameter combinations by evaluation metrics"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text = ggplot2::element_text(size = 6)
  )

ggplot2::ggsave("../data/best_pile_detect_ever_bars.jpeg", height = 10, width = 8)

let’s make a table of these results

df_temp %>% 
  dplyr::filter(pct_rank_max>=pct_rank_th_temp) %>% 
  dplyr::select(
    rank_lab
    , max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct
    , f_score, recall, precision
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::arrange(rank_lab,desc(f_score), desc(recall), desc(precision)) %>% 
  dplyr::mutate(dplyr::across(
    .cols = c(f_score, recall, precision)
    , .fn = ~ scales::percent(.x, accuracy = 1)
  )) %>% 
  dplyr::mutate(blank= "   " ) %>% 
  dplyr::relocate(blank, .before = f_score) %>% 
  kableExtra::kbl(
    caption = "parameter combinations that achieved the best slash pile detection results"
    , col.names = c(
      "."
      ,"max_ht_m","max_area_m2","convexity_pct","circle_fit_iou_pct"
      , "   "
      , "F-score", "Recall", "Precision"
    )
    , escape = F
  ) %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>% 
  kableExtra::add_header_above(c(" "=6, "Evaluation Metric" = 3))

Table 5.1: parameter combinations that achieved the best slash pile detection results
					Evaluation Metric
.	max_ht_m	max_area_m2	convexity_pct	circle_fit_iou_pct	F-score	Recall	Precision
top 1% F-score	4	60	0.7	0.6	72%	82%	64%
	4	80	0.2	0.6	72%	83%	63%
	4	60	0.3	0.6	71%	81%	63%
	4	110	0.4	0.6	71%	80%	63%
	4	70	0.3	0.6	71%	79%	64%
	4	90	0.3	0.6	71%	80%	63%
	4	50	0.3	0.6	70%	83%	61%
	4	110	0.7	0.6	70%	80%	62%
	4	120	0.7	0.6	70%	81%	61%
	4	120	0.8	0.6	70%	78%	63%
	4	80	0.5	0.6	70%	79%	62%
	4	100	0.6	0.7	69%	64%	76%
	4	40	0.6	0.6	69%	80%	61%
top 1% Recall	4	110	0.5	0.3	58%	95%	42%
	5	90	0.7	0.4	54%	95%	38%
	5	110	0.5	0.4	53%	95%	37%
	5	50	0.3	0.4	51%	95%	35%
	5	90	0.7	0.3	50%	95%	34%
	5	110	0.5	0.3	49%	95%	33%
	5	130	0.5	0.3	49%	95%	33%
	5	50	0.3	0.3	49%	96%	32%
	5	50	0.2	0.3	48%	95%	32%
	6	100	0.7	0.3	43%	95%	28%
	6	110	0.6	0.3	42%	95%	27%
	6	60	0.4	0.3	42%	96%	27%
	6	50	0.4	0.3	41%	95%	26%
top 1% Precision	4	70	0.7	0.8	30%	18%	81%
	4	80	0.5	0.8	28%	17%	87%
	4	130	0.2	0.8	27%	17%	80%
	4	40	0.3	0.8	25%	15%	82%
	4	80	0.6	0.8	24%	14%	81%
	4	110	0.4	0.8	24%	14%	81%
	4	110	0.5	0.8	24%	14%	81%
	4	120	0.3	0.8	24%	14%	81%
	4	90	0.5	0.8	21%	12%	79%
	4	130	0.7	0.8	21%	12%	79%
	4	50	0.8	0.8	20%	12%	88%
	4	90	0.2	0.8	18%	10%	86%

  # kableExtra::scroll_box(height = "8in")

let’s check out maps of the pile detection results for the parameter combinations with the highest F-score

df_temp <- df_temp %>% 
  dplyr::filter(pct_rank_f_score>=pct_rank_th_temp) %>% 
  dplyr::arrange(desc(f_score), desc(recall), desc(precision)) %>% 
  dplyr::select(
    rn, rank_lab
    , max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct
    , f_score, recall, precision
  ) %>% 
  dplyr::slice_head(n=4)
# ortho
ortho_plt_temp <- ortho_plt_fn(
    stand = sf::st_bbox(stand_boundary) %>%
      sf::st_as_sfc() %>%
      sf::st_transform(sf::st_crs(param_combos_piles))
    , buffer = 10
  )
# plot it
plt_list_temp <- 1:nrow(df_temp) %>%
  purrr::map(\(x)
    ortho_plt_temp +
    # ggplot2::ggplot() +
      ggplot2::geom_sf(
        data = stand_boundary %>% 
          sf::st_transform(sf::st_crs(param_combos_piles))
        , color = "black", fill = NA
      ) +
      ggplot2::geom_sf(
        data = 
          slash_piles_polys %>% 
          dplyr::filter(is_in_stand) %>% 
          sf::st_transform(sf::st_crs(param_combos_piles)) %>% 
          dplyr::mutate(pile_id=as.numeric(pile_id)) %>% 
            dplyr::left_join(
              param_combos_gt %>% 
                dplyr::filter(rn == df_temp$rn[x]) %>% 
                dplyr::select(pile_id,match_grp)
              , by = "pile_id"
            )
        , mapping = ggplot2::aes(fill = match_grp)
        , color = NA ,alpha=0.6
      ) + 
      ggplot2::geom_sf(
        data =
          param_combos_piles %>% 
            dplyr::filter(rn == df_temp$rn[x] & is_in_stand) %>% 
            dplyr::left_join(
              param_combos_gt %>%
                dplyr::filter(rn == df_temp$rn[x] & is_in_stand) %>% 
                dplyr::select(pred_id,match_grp)
              , by = "pred_id"
            )
        , mapping = ggplot2::aes(fill = match_grp, color = match_grp)
        , alpha = 0
        , lwd = 0.5
      ) +
      ggplot2::scale_fill_manual(values = pal_match_grp, name = "", na.value = "red") +
      ggplot2::scale_color_manual(values = pal_match_grp, name = "", na.value = "red") +
      ggplot2::labs(
        title = paste0(
          " F-score = ", scales::percent(df_temp$f_score[x], accuracy = 1)
          ," | Recall = ", scales::percent(df_temp$recall[x], accuracy = 1)
          ," | Precision = ", scales::percent(df_temp$precision[x], accuracy = 1)
        )
        , subtitle = paste0(
          " max_ht_m = ", (df_temp$max_ht_m[x])
          ," : max_area_m2 = ", (df_temp$max_area_m2[x])
          ," : convexity_pct = ", (df_temp$convexity_pct[x])
          ," : circle_fit_iou_pct = ", (df_temp$circle_fit_iou_pct[x])
          , "\n"
        )
      ) +
      ggplot2::theme_void() +
      ggplot2::theme(
        legend.position = "top"
        # , panel.background = ggplot2::element_rect(fill = "gray66")
        , plot.title = ggplot2::element_text(size = 8)
        , plot.subtitle = ggplot2::element_text(size = 6)
      ) +
      ggplot2::guides(
        fill = ggplot2::guide_legend(override.aes = list(color = c(pal_match_grp["commission"],NA,NA)))
        , color = "none"
      )
  )

patchwork::wrap_plots(
    plt_list_temp
    , ncol = 2
    , guides = "collect"
  ) &
  ggplot2::theme(legend.position = "bottom")

ggplot2::ggsave("../data/best_pile_detect_ever.jpeg", height = 9.5, width = 8.5)

5.2.3.6 Best results: form quantification

we can cut to the chase and just look at the parameter combinations that achieved the best results based on the form quantification

pct_rank_th_temp <- 0.99
df_temp <-
  param_combos_gt_agg %>% 
  dplyr::arrange(pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape) %>% 
  dplyr::mutate(dplyr::across(
    .cols = c(pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape)
    , .fn = dplyr::percent_rank
    , .names = "pct_rank_{.col}"
  )) %>% 
  # now get the max of these pct ranks by row
  dplyr::rowwise() %>% 
  dplyr::mutate(
    pct_rank_max = max(
      dplyr::c_across(
        tidyselect::starts_with("pct_rank")
      )
      , na.rm = T
    )
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(
    rank_lab = dplyr::case_when(
      pct_rank_pct_diff_volume_field_mape<=(1-pct_rank_th_temp) ~ 1
      , pct_rank_pct_diff_area_field_mape<=(1-pct_rank_th_temp) ~ 2
      , pct_rank_pct_diff_height_mape<=(1-pct_rank_th_temp) ~ 3
    ) %>% 
      factor(
        ordered = T
        , levels = 1:3
        , labels = c(
          paste0("top ", scales::percent(1-pct_rank_th_temp, accuracy = 1), " MAPE Volume field")
          , paste0("top ", scales::percent(1-pct_rank_th_temp, accuracy = 1), " MAPE Area field")
          , paste0("top ", scales::percent(1-pct_rank_th_temp, accuracy = 1), " MAPE Height")
        )
      )
  ) %>% 
  dplyr::filter(
    !is.na(rank_lab) # pct_rank_max>=0.99
  ) %>%
  dplyr::mutate(
    lab = stringr::str_c(max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct, sep = ":")
  ) 
df_temp %>% 
  tidyr::pivot_longer(
    cols = c(pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape)
    , names_to = "metric"
    , values_to = "value"
  ) %>% 
  dplyr::mutate(
    metric = dplyr::case_when(
        metric == "pct_diff_volume_field_mape" ~ 1
        , metric == "pct_diff_area_field_mape" ~ 2
        , metric == "pct_diff_height_mape" ~ 3
      ) %>% 
      factor(
        ordered = T
        , levels = 1:3
        , labels = c(
          "MAPE Volume field"
          , "MAPE Area field"
          , "MAPE Height"
        )
      )
    , lab = forcats::fct_reorder(lab, pct_rank_pct_diff_volume_field_mape) %>% forcats::fct_rev()
    , val_lab = scales::percent(value, accuracy = 0.1)
  ) %>%
  ggplot2::ggplot(
    mapping = ggplot2::aes(x = value, y = lab, fill = metric, label = val_lab)
  ) +
  ggplot2::geom_col(width = 0.6) +
  ggplot2::geom_text(color = "black", size = 2, hjust = -0.2) +
  ggplot2::scale_fill_manual(values = pal_get_pairs_fn(6)[1:5][c(1,3,5)]) +
  ggplot2::scale_x_continuous(
    labels = scales::percent_format(accuracy = 1)
    , limits = c(0,1.05)
    # , expand = expansion(mult = c(0, .08))
  ) +
  ggplot2::facet_grid(cols = dplyr::vars(metric), rows = dplyr::vars(rank_lab), scales = "free_y") +
  ggplot2::labs(
    x = "", y = "max_ht_m : max_area_m2 : convexity_pct : circle_fit_iou_pct"
    , fill = ""
    , subtitle = "top (and bottom) parameter combinations by evaluation metrics"
  ) +
  ggplot2::theme_light() +
  ggplot2::theme(
    legend.position = "none"
    , strip.text = ggplot2::element_text(size = 11, color = "black", face = "bold")
    , axis.text = ggplot2::element_text(size = 6)
  )

let’s make a table of these results

df_temp %>% 
  dplyr::select(
    rank_lab
    , max_ht_m,max_area_m2,convexity_pct,circle_fit_iou_pct
    , pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::arrange(rank_lab,pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape) %>% 
  dplyr::mutate(dplyr::across(
    .cols = c(pct_diff_volume_field_mape, pct_diff_area_field_mape, pct_diff_height_mape)
    , .fn = ~ scales::percent(.x, accuracy = 1)
  )) %>% 
  dplyr::mutate(blank= "   " ) %>% 
  dplyr::relocate(blank, .before = pct_diff_volume_field_mape) %>% 
  kableExtra::kbl(
    caption = "parameter combinations that achieved the best slash pile form quantification results"
    , col.names = c(
      "."
      ,"max_ht_m","max_area_m2","convexity_pct","circle_fit_iou_pct"
      , "   "
      , "MAPE Volume field"
      , "MAPE Area field"
      , "MAPE Height"
    )
    , escape = F
  ) %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>% 
  kableExtra::add_header_above(c(" "=6, "Evaluation Metric" = 3))

Table 5.2: parameter combinations that achieved the best slash pile form quantification results
					Evaluation Metric
.	max_ht_m	max_area_m2	convexity_pct	circle_fit_iou_pct	MAPE Volume field	MAPE Area field	MAPE Height
top 1% MAPE Volume field	4	100	0.5	0.8	28%	16%	16%
	6	60	0.4	0.8	30%	21%	21%
	6	130	0.3	0.8	31%	19%	17%
	5	110	0.5	0.8	32%	20%	16%
	6	80	0.5	0.8	33%	23%	17%
	6	80	0.6	0.8	35%	16%	14%
	4	50	0.6	0.8	36%	21%	21%
	5	130	0.4	0.8	36%	15%	16%
	5	80	0.2	0.8	36%	16%	19%
	6	110	0.3	0.8	37%	24%	18%
	5	50	0.2	0.7	37%	21%	17%
	5	120	0.6	0.7	38%	21%	18%
	5	100	0.4	0.8	39%	18%	21%
top 1% MAPE Area field	6	70	0.3	0.8	39%	13%	16%
	6	120	0.4	0.8	46%	15%	19%
	4	100	0.8	0.8	47%	14%	25%
	4	120	0.7	0.8	47%	15%	20%
	4	60	0.2	0.8	47%	14%	21%
	6	70	0.7	0.8	47%	13%	21%
	4	100	0.2	0.8	48%	13%	17%
	6	100	0.2	0.8	48%	12%	20%
	4	90	0.4	0.8	53%	11%	23%
	6	110	0.2	0.8	53%	15%	16%
	5	120	0.3	0.8	54%	15%	25%
	5	110	0.3	0.8	57%	15%	17%
	4	130	0.8	0.8	59%	14%	17%
top 1% MAPE Height	4	90	0.6	0.8	39%	22%	14%
	6	40	0.5	0.8	44%	20%	15%
	6	120	0.7	0.8	44%	19%	15%
	5	80	0.4	0.8	46%	15%	15%
	5	120	0.4	0.8	49%	18%	14%
	4	60	0.5	0.8	49%	17%	15%
	4	70	0.2	0.8	52%	16%	15%
	4	100	0.4	0.8	54%	16%	16%
	6	40	0.3	0.8	57%	19%	15%
	4	110	0.8	0.8	58%	22%	15%
	6	70	0.6	0.8	67%	19%	14%

  # kableExtra::scroll_box(height = "8in")

5.2.3.7 Parameter combination drill-downs

The objective of these combination drill-down plots is to convey the pile detection evaluation results for a given set of the convexity_pct, circle_fit_iou_pct, and max_area_m2 parameter settings while holding the max_ht_m and min_area_m2 parameters constant since these should be defined based on the expected values of the treatment area. Based on our validation data, it was clear that the optimal max_ht_m value was 4.0 and that setting the value above this level resulted in understory trees being incorrectly detected as slash piles.

5.2.3.7.1 F-score

param_combos_gt_agg %>% 
  dplyr::filter(
    max_ht_m == best_max_ht_m_temp
    & min_area_m2 == best_min_area_m2_temp
  ) %>% 
  dplyr::mutate(
    flab = stringr::str_c("max_area_m2",max_area_m2,sep = " = ") %>% 
      forcats::fct_reorder(max_area_m2)
  ) %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(
      y = as.factor(convexity_pct)
      , x = as.factor(circle_fit_iou_pct)
      , fill = f_score
      , label = scales::percent(f_score,accuracy = 1)
    )) +
    ggplot2::geom_tile(color = "white") +
    ggplot2::geom_text(color = "black", size = 3) +
    ggplot2::facet_wrap(facets = dplyr::vars(flab)) + 
    ggplot2::scale_fill_distiller(
      palette = "Oranges"
      , direction = 1
      , labels = scales::percent_format(accuracy = 1)
      , limits = c(0,1)
    ) +
    labs(
      x = "circle_fit_iou_pct"
      , y = "convexity_pct"
      , subtitle = "F-score by parameter setting combination levels"
    ) +
    theme_light() + 
    theme(
      legend.position = "none"
      , axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
      , panel.background = element_blank()
      , panel.grid = element_blank()
      # , plot.title = element_text(hjust = 0.5)
      # , plot.subtitle = element_text(hjust = 0.5)
      , strip.text = element_text(color = "black", face = "bold")
    )

5.2.3.7.2 Recall

param_combos_gt_agg %>% 
  dplyr::filter(
    max_ht_m == best_max_ht_m_temp
    & min_area_m2 == best_min_area_m2_temp
  ) %>% 
  dplyr::mutate(
    flab = stringr::str_c("max_area_m2",max_area_m2,sep = " = ") %>% 
      forcats::fct_reorder(max_area_m2)
  ) %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(
      y = as.factor(convexity_pct)
      , x = as.factor(circle_fit_iou_pct)
      , fill = recall
      , label = scales::percent(recall,accuracy = 1)
    )) +
    ggplot2::geom_tile(color = "white") +
    ggplot2::geom_text(color = "black", size = 3) +
    ggplot2::facet_wrap(facets = dplyr::vars(flab)) + 
    ggplot2::scale_fill_gradient(
      low = "gray97", high = "gray27"
      , labels = scales::percent_format(accuracy = 1)
      , limits = c(0,1)
    ) +
    labs(
      x = "circle_fit_iou_pct"
      , y = "convexity_pct"
      , subtitle = "Recall by parameter setting combination levels"
    ) +
    theme_light() + 
    theme(
      legend.position = "none"
      , axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
      , panel.background = element_blank()
      , panel.grid = element_blank()
      # , plot.title = element_text(hjust = 0.5)
      # , plot.subtitle = element_text(hjust = 0.5)
      , strip.text = element_text(color = "black", face = "bold")
    )

5.2.3.7.3 Precision

param_combos_gt_agg %>% 
  dplyr::filter(
    max_ht_m == best_max_ht_m_temp
    & min_area_m2 == best_min_area_m2_temp
  ) %>% 
  dplyr::mutate(
    flab = stringr::str_c("max_area_m2",max_area_m2,sep = " = ") %>% 
      forcats::fct_reorder(max_area_m2)
  ) %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(
      y = as.factor(convexity_pct)
      , x = as.factor(circle_fit_iou_pct)
      , fill = precision
      , label = scales::percent(precision,accuracy = 1)
    )) +
    ggplot2::geom_tile(color = "white") +
    ggplot2::geom_text(color = "black", size = 3) +
    ggplot2::facet_wrap(facets = dplyr::vars(flab)) + 
    ggplot2::scale_fill_distiller(
      palette = "Purples"
      , direction = 1
      , labels = scales::percent_format(accuracy = 1)
      , limits = c(0,1)
    ) +
    labs(
      x = "circle_fit_iou_pct"
      , y = "convexity_pct"
      , subtitle = "Precision by parameter setting combination levels"
    ) +
    theme_light() + 
    theme(
      legend.position = "none"
      , axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
      , panel.background = element_blank()
      , panel.grid = element_blank()
      # , plot.title = element_text(hjust = 0.5)
      # , plot.subtitle = element_text(hjust = 0.5)
      , strip.text = element_text(color = "black", face = "bold")
    )

5.2.3.7.4 MAPE Volume field

param_combos_gt_agg %>% 
  dplyr::filter(
    max_ht_m == best_max_ht_m_temp
    & min_area_m2 == best_min_area_m2_temp
  ) %>% 
  dplyr::mutate(
    flab = stringr::str_c("max_area_m2",max_area_m2,sep = " = ") %>% 
      forcats::fct_reorder(max_area_m2)
  ) %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(
      y = as.factor(convexity_pct)
      , x = as.factor(circle_fit_iou_pct)
      , fill = pct_diff_volume_field_mape
      , label = scales::percent(pct_diff_volume_field_mape,accuracy = 1)
    )) +
    ggplot2::geom_tile(color = "white") +
    ggplot2::geom_text(color = "black", size = 3) +
    ggplot2::facet_wrap(facets = dplyr::vars(flab)) + 
    ggplot2::scale_fill_distiller(
      palette = "Blues"
      , direction = 1
      , labels = scales::percent_format(accuracy = 1)
      , limits = c(0,1)
    ) +
    labs(
      x = "circle_fit_iou_pct"
      , y = "convexity_pct"
      , subtitle = "MAPE field-collected volume by parameter setting combination levels"
    ) +
    theme_light() + 
    theme(
      legend.position = "none"
      , axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
      , panel.background = element_blank()
      , panel.grid = element_blank()
      # , plot.title = element_text(hjust = 0.5)
      # , plot.subtitle = element_text(hjust = 0.5)
      , strip.text = element_text(color = "black", face = "bold")
    )

5.2.3.7.5 MAPE Area field

param_combos_gt_agg %>% 
  dplyr::filter(
    max_ht_m == best_max_ht_m_temp
    & min_area_m2 == best_min_area_m2_temp
  ) %>% 
  dplyr::mutate(
    flab = stringr::str_c("max_area_m2",max_area_m2,sep = " = ") %>% 
      forcats::fct_reorder(max_area_m2)
  ) %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(
      y = as.factor(convexity_pct)
      , x = as.factor(circle_fit_iou_pct)
      , fill = pct_diff_area_field_mape
      , label = scales::percent(pct_diff_area_field_mape,accuracy = 1)
    )) +
    ggplot2::geom_tile(color = "white") +
    ggplot2::geom_text(color = "black", size = 3) +
    ggplot2::facet_wrap(facets = dplyr::vars(flab)) + 
    ggplot2::scale_fill_distiller(
      palette = "YlGn"
      , direction = 1
      , labels = scales::percent_format(accuracy = 1)
      , limits = c(0,1)
    ) +
    labs(
      x = "circle_fit_iou_pct"
      , y = "convexity_pct"
      , subtitle = "MAPE field-collected area by parameter setting combination levels"
    ) +
    theme_light() + 
    theme(
      legend.position = "none"
      , axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
      , panel.background = element_blank()
      , panel.grid = element_blank()
      # , plot.title = element_text(hjust = 0.5)
      # , plot.subtitle = element_text(hjust = 0.5)
      , strip.text = element_text(color = "black", face = "bold")
    )

5.2.3.7.6 MAPE Height

param_combos_gt_agg %>% 
  dplyr::filter(
    max_ht_m == best_max_ht_m_temp
    & min_area_m2 == best_min_area_m2_temp
  ) %>% 
  dplyr::mutate(
    flab = stringr::str_c("max_area_m2",max_area_m2,sep = " = ") %>% 
      forcats::fct_reorder(max_area_m2)
  ) %>% 
  ggplot2::ggplot(mapping = ggplot2::aes(
      y = as.factor(convexity_pct)
      , x = as.factor(circle_fit_iou_pct)
      , fill = pct_diff_height_mape
      , label = scales::percent(pct_diff_height_mape,accuracy = 1)
    )) +
    ggplot2::geom_tile(color = "white") +
    ggplot2::geom_text(color = "black", size = 3) +
    ggplot2::facet_wrap(facets = dplyr::vars(flab)) + 
    ggplot2::scale_fill_distiller(
      palette = "Reds"
      , direction = 1
      , labels = scales::percent_format(accuracy = 1)
      , limits = c(0,1)
    ) +
    labs(
      x = "circle_fit_iou_pct"
      , y = "convexity_pct"
      , subtitle = "MAPE field-collected height by parameter setting combination levels"
    ) +
    theme_light() + 
    theme(
      legend.position = "none"
      , axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
      , panel.background = element_blank()
      , panel.grid = element_blank()
      # , plot.title = element_text(hjust = 0.5)
      # , plot.subtitle = element_text(hjust = 0.5)
      , strip.text = element_text(color = "black", face = "bold")
    )