Section 5 Raster-based: Watershed Segmentation
We’ll first attempt to detect slash piles using raster-based methods with the DTM and CHM. These raster-based approaches are simple and efficient but can be limited in complex forest structures where piles might be occluded by overstory trees and rasterization simplifies/removes some of the rich 3D information in the point cloud. For slash piles, which are often irregular and may not have a distinct “treetop” equivalent, common CHM-based tree detection methods might be less directly applicable. We’ll see…
When used for individual tree detection (ITD), this technique treats the CHM as a topographic surface, where local maxima represent tree tops and valleys represent crown boundaries. A “water source” is conceptually placed at each local lowest point, and the surface is “flooded.” Barriers are generated where different “water sources” meet, forming watershed lines that delineate individual tree crowns.
two possible approaches for segmenting piles are to: 1) segment individual trees using a top-down approach and then use the canopy cover as a mask to then identify slash piles; 2) use a bottoms-up approach to perform slash pile segmentation on a lower “slice” of the CHM based on an expected maximum height of a pile
we’ll first try the bottoms-up approach using a CHM slice with a maximum height of 4 m based on the highest point in the point cloud (3.65 m) of the slash pile which had the tallest measurement in the field (6.4 m). The 6.4 m (21 ft) field measured height seems unlikely even for a machine pile, so we’ll use the point cloud measured height which will align with the CHM data.
check out the lower slice of the CHM
# set the max expected pile height
max_ht_m <- 4
# lower CHM slice
cloud2raster_ans$chm_rast %>%
terra::clamp(upper = max_ht_m, lower = 0, values = F) %>%
terra::plot(col = viridis::plasma(100), axes = F)
already, it looks like the piles should be distinguishable objects from this data
5.1 Method Demonstration
let’s run watershed segmentation using lidR::watershed()
which is based on the bioconductor package EBIimage
watershed_ans <- lidR::watershed(
chm = cloud2raster_ans$chm_rast %>%
terra::clamp(upper = max_ht_m, lower = 0, values = F)
, th_tree = 0.5
)()
# this is a raster
watershed_ans
## class : SpatRaster
## size : 2740, 3473, 1 (nrow, ncol, nlyr)
## resolution : 0.2, 0.2 (x, y)
## extent : 499264.2, 499958.8, 4317599, 4318147 (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 13N (EPSG:32613)
## source(s) : memory
## varname : chm_0.2m
## name : focal_mean
## min value : 1
## max value : 10007
each value should be a unique “segment” which we can refine based on rules of expected size and shape of piles
## layer value count
## 1 1 8355 13
## 2 1 2492 23
## 3 1 2938 43
## 4 1 7252 1
## 5 1 3954 4
## 6 1 8463 15
## 7 1 3218 9
## 8 1 6997 9
## 9 1 4270 1
## 10 1 8673 2
where the “value” is the segment identifier and the count is the number of raster cells assigned to that segment
let’s plot the raster return from the watershed segmentation
watershed_ans %>%
terra::plot(
col = c(
viridis::turbo(n = floor(terra::minmax(watershed_ans)[2]/3))
, viridis::viridis(n = floor(terra::minmax(watershed_ans)[2]/3))
, viridis::cividis(n = floor(terra::minmax(watershed_ans)[2]/3))
) %>% sample()
, legend = F
, axes = F
)
plot it with the watershed segmented piles (pink) and the actual piles (blue)
# plot it
terra::plotRGB(ortho_rast, stretch = "lin", colNA = "transparent")
terra::plot(
watershed_ans %>%
terra::as.polygons(round = F, aggregate = T, values = T, extent = F, na.rm = T)
, add = T, border = "hotpink", col = NA, lwd = 1.2
)
terra::plot(
slash_piles_polys %>%
sf::st_transform(terra::crs(ortho_rast)) %>% terra::vect()
, add = T, border = "blue", col = NA, lwd = 1
)
nice…we are getting close.
ideally, we want objects that: i) meet the height threshold over the entire surface of the segment (no doughnuts); ii) are not irregularly shaped (relatively few inward angles); iii) are circular in shape; and iv) meet an expected pile area threshold (minimum/maximum expected area)
5.1.1 Geometric filtering: Irregularity Filtering
now let’s try to filter based on the geometric properties of the watershed-detected segments.
we’ll make a convex hull of the polygons generated from a raster to smooth out the square edges and any inward curves or indentations, resulting in a boundary that’s always convex (no inward angles). using a convex hull we will be able to filter out:
- watershed detected segments that were actually lower branches of a tree. these will be shaped like a doughnut with circular shape but a hole in the center
- watershed detected segments that are irregularly shaped like coarse woody debris that was not organized into piles by humans
let’s convert the watershed-detected segments from raster to vector data and create a convex hull of the shapes for comparison
# vectors of segments
watershed_ans_poly <-
watershed_ans %>%
terra::as.polygons(round = F, aggregate = T, values = T, extent = F, na.rm = T) %>%
setNames("pred_id") %>%
sf::st_as_sf() %>%
sf::st_simplify() %>%
sf::st_make_valid() %>%
dplyr::filter(sf::st_is_valid(.)) %>%
dplyr::mutate(treeID=pred_id) %>%
cloud2trees::simplify_multipolygon_crowns() %>%
dplyr::select(-treeID)
# convex hulls of segments
watershed_ans_poly_chull <-
watershed_ans_poly %>%
sf::st_convex_hull() %>%
sf::st_simplify() %>%
sf::st_make_valid() %>%
dplyr::filter(sf::st_is_valid(.))
lets make an example area that we’ll use to demonstrate the filtering process of the watershed detected segments
aoi_temp <-
slash_piles_polys %>%
dplyr::filter(pile_id == 91) %>%
sf::st_as_sfc() %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>%
sf::st_buffer(55)
# watershed_ans_poly %>%
# # dplyr::filter(pred_id==241) %>%
# dplyr::filter(pred_id==11916) %>%
# # dplyr::slice_sample(n = 1) %>%
# sf::st_bbox() %>%
# sf::st_as_sfc() %>%
# sf::st_buffer(55)
# list of examples
pred_id_temp <- watershed_ans_poly %>%
sf::st_intersection(aoi_temp) %>%
dplyr::pull(pred_id)
# plot it
ortho_plt_temp <-
ortho_plt_fn(
stand = aoi_temp
, buffer = 5
)
plot our example watershed detected segments as vectors (pink) and convex hull of the segments (orange) compared with the ground truth piles (blue)
ortho_plt_temp +
# ggplot2::ggplot() +
ggplot2::geom_sf(
data = watershed_ans_poly %>%
dplyr::filter(pred_id %in% pred_id_temp) %>%
dplyr::mutate(pred_id = as.factor(pred_id))
, fill = "hotpink"
, alpha = 0.6
, lwd = 0
, color = NA
) +
ggplot2::geom_sf(
data = watershed_ans_poly_chull %>%
dplyr::filter(pred_id %in% pred_id_temp)
, color = "orangered"
, fill = NA
) +
ggplot2::geom_sf(
data = slash_piles_polys %>%
dplyr::inner_join(
slash_piles_polys %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id)
) %>%
sf::st_transform(sf::st_crs(watershed_ans_poly))
, fill = NA, color = "blue", lwd = 0.4
) +
ggplot2::theme_void() +
ggplot2::theme(legend.position = "none")
let’s first filter out segments that have holes in them or are very irregularly shaped by comparing the area of the polygon and convex hull
# min required overlap between the predicted pile and the convex hull of the predicted pile
pct_chull_overlap <- 0.7
# compare areas
watershed_keep_overlaps_chull_pred_id <- watershed_ans_poly %>%
dplyr::mutate(poly_area_m2 = sf::st_area(.) %>% as.numeric()) %>%
dplyr::inner_join(
watershed_ans_poly_chull %>%
dplyr::mutate(chull_area_m2 = sf::st_area(.) %>% as.numeric()) %>%
sf::st_drop_geometry()
, by = "pred_id"
) %>%
dplyr::mutate(
pct_chull = poly_area_m2/chull_area_m2
) %>%
dplyr::filter(
pct_chull >= pct_chull_overlap
) %>%
dplyr::pull(pred_id)
let’s make a function to ingest a spatial data frame and return polygons filtered for irregularity using this convex hull process
st_irregular_remove <- function(
sf_data
# min required overlap between the predicted pile and the convex hull of the predicted pile
, pct_chull_overlap = 0.7
) {
if(!inherits(sf_data, "sf")){stop("must pass `sf` data object")}
# if not polygons
if( !(sf::st_is(sf_data, type = c("POLYGON", "MULTIPOLYGON")) %>% all()) ){
stop(paste0(
"`sf_data` data must be an `sf` class object with POLYGON geometry (see [sf::st_geometry_type()])"
))
}
# convex hulls of segments
poly_chull <-
sf_data %>%
sf::st_convex_hull() %>%
sf::st_simplify() %>%
sf::st_make_valid()
# dplyr::filter(sf::st_is_valid(.))
# compare areas
if(nrow(poly_chull)!=nrow(sf_data)){
stop("could not make valid convex hulls from provided polygon data")
}else{
area_comp <- sf_data %>%
dplyr::mutate(area_xxxx = sf::st_area(.) %>% as.numeric()) %>%
dplyr::bind_cols(
poly_chull %>%
dplyr::mutate(chull_area_xxxx = sf::st_area(.) %>% as.numeric()) %>%
dplyr::select(chull_area_xxxx) %>%
sf::st_drop_geometry()
) %>%
dplyr::mutate(
pct_chull = area_xxxx/chull_area_xxxx
) %>%
dplyr::filter(
pct_chull >= pct_chull_overlap
) %>%
dplyr::select(-c(area_xxxx,chull_area_xxxx))
return(area_comp)
}
}
# length(watershed_keep_overlaps_chull_pred_id)
watershed_keep_overlaps_chull_pred_id <-
watershed_ans_poly %>%
st_irregular_remove(pct_chull_overlap = pct_chull_overlap) %>%
dplyr::pull(pred_id)
now, we’ll look at which piles meet the minimum overlap threshold (black outline) between the segmented polygon and the convex hull and will be kept compared to those that did not meet the irregularity threshold (red) and will be removed
ortho_plt_temp +
# ggplot2::ggplot() +
ggplot2::geom_sf(
data = watershed_ans_poly %>%
dplyr::filter(pred_id %in% pred_id_temp) %>%
dplyr::mutate(pred_id = as.factor(pred_id))
, fill = "hotpink"
, alpha = 0.6
, lwd = 0
, color = NA
) +
ggplot2::geom_sf(
data = watershed_ans_poly_chull %>%
dplyr::filter(pred_id %in% pred_id_temp) %>%
dplyr::mutate(
meets_overlap = pred_id %in% watershed_keep_overlaps_chull_pred_id
)
, ggplot2::aes(color = meets_overlap)
, fill = NA
) +
ggplot2::geom_sf(
data = slash_piles_polys %>%
dplyr::inner_join(
slash_piles_polys %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id)
) %>%
sf::st_transform(sf::st_crs(watershed_ans_poly))
, fill = NA, color = "blue", lwd = 0.4
) +
ggplot2::scale_color_manual(values = c("red","black")) +
ggplot2::theme_void() +
ggplot2::theme(legend.position = "top")
it looks like the filtering using convex hulls successfully removed most segments with irregular shapes and holes. following this, an area filter is applied based on the expected minimum and maximum pile areas, along with a circularity filter that uses least squares circle fitting to remove non-circular shapes. this expected area and geometric shape filtering is performed to ensure that only the most likely slash pile candidates proceed to the next stage. a key limitation of raster-based methods, however, is that they treat continuous surfaces as discrete cell heights, which can fragment larger objects into smaller segments. to mitigate this, the chm raster will be smoothed using only the area of the remaining candidate segments to generalize boundaries, aggregate disconnected segments, and fill in gaps created by the rasterization of the continuous data. finally, a second pass of irregularity filtering, using the same convex hull process, is applied to remove any new irregular shapes generated during the smoothing and aggregation.
5.1.2 Area Filtering
apply an area filter based on the minimum and maximum expected pile areas
##### area thresholds
min_area_m2 <- 2
# Two standard US parking spaces, typically measuring 9 feet by 18 feet,
# are roughly equivalent to 30.25 square meters. Each space is approximately 15.125 square meters.
# 15.125*3
max_area_m2 <- 50
# filter the remaining segments by area
watershed_ans_poly <-
watershed_ans_poly %>%
# dplyr::filter(pred_id %in% watershed_keep_overlaps_chull_pred_id) %>%
st_irregular_remove(pct_chull_overlap = pct_chull_overlap) %>%
dplyr::mutate(area_xxxx = sf::st_area(.) %>% as.numeric()) %>%
# filter out the segments that don't meet the size thresholds
dplyr::filter(
dplyr::coalesce(area_xxxx,0) >= min_area_m2
& dplyr::coalesce(area_xxxx,0) <= max_area_m2
) %>%
dplyr::select(-c(area_xxxx))
example of watershed detected segments as vectors (pink) filtered to remove irregular shapes and segments outside of the expected area thresholds; compared with the ground truth piles (blue)
ortho_plt_temp +
# ggplot2::ggplot() +
ggplot2::geom_sf(
data = watershed_ans_poly %>%
dplyr::inner_join(
watershed_ans_poly %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pred_id)
)
, fill = "hotpink"
, alpha = 0.6
, lwd = 0
, color = NA
) +
ggplot2::geom_sf(
data = slash_piles_polys %>%
dplyr::inner_join(
slash_piles_polys %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id)
) %>%
sf::st_transform(sf::st_crs(watershed_ans_poly))
, fill = NA, color = "blue", lwd = 0.4
) +
ggplot2::theme_void()
it looks like we are on the right track. now we need to remove the remaining candidate segments that do not meet our expectations for having a circular base
5.1.3 Geometric filtering: Circularity Filtering
now, let’s apply a circle-fitting algorithm to remove non-circular segments from the remaining segments
Least squares circle fitting is a method to find the circle that best approximates a set of data points by minimizing the sum of the squared distances between the points and the circle. The lidR::fit_circle()
function finds the best-fitting flat, horizontal circle for a group of 3D points, even if some of those points are messy or don’t quite fit. It determines the circle’s center and size, and also provides an “angular range” to show how much of a complete circle the points actually form, which is a more reliable measure than a simple error value (e.g. RMSE).
The “angular range” tells you how much of a complete circle the points in the watershed-detected segment actually cover. Imagine drawing a circle, and then only having points along a part of its edge, here’s how to interpret it:
- 360 degrees suggests the points form a full, unbroken circle, like the base of a perfectly round slash pile.
- 180 degrees would mean the points only form a half-circle or a semi-circle.
A smaller range (e.g., 90 degrees) indicates just a partial arc or a small curve. This can help us determine if a group of points truly represents a circular shape, which is useful for identifying objects like slash piles that are expected to have a round base (e.g. conical or half-sphere in shape).
we’ll define a function to pass our sf
polygon data of watershed detected segments and return a sf
data of the fitted circles
##########################
# 1)
# function to return sf circle from xy center and radius in given crs
# to handle return from common circle fitting algorithms
##########################
point_xy_radius_to_circle_sf <- function(
center_x
, center_y
, radius
, crs = NULL
) {
if(is.null(crs)){stop("need a crs, guy")}
# create a point geometry object
center_point <- sf::st_point(c(center_x, center_y))
# create an sf object from the point
center_sf <- sf::st_sf(
data.frame(
center_x = center_x
, center_y = center_y
, radius = radius
)
, geometry = sf::st_sfc(center_point)
, crs = crs
)
# create the circle geometry by buffering the point
circle_sf <- sf::st_buffer(center_sf, dist = radius)
return(circle_sf)
}
##########################
# 2)
# function to generate 2d xy points from polygon feature
# to pass to common circle fitting algorithms
# !!! only works with a singular polygon at a time
##########################
poly_to_points <- function(
sf_data
, as_spatial = F # if set to F, returns xy dataframe; if T returns sf data
, simplify_multipolygons = F # if set to T, multipolygons are simplified by keeping the largest segment
) {
if(!inherits(sf_data, "sf")){stop("must pass `sf` data object")}
# just work with the first
if(nrow(sf_data)>1){stop("this function only works with a single record at a time")}
# simplify_multipolygons
if(simplify_multipolygons){
sf_data <- sf_data %>%
dplyr::mutate(treeID=1) %>%
cloud2trees::simplify_multipolygon_crowns() %>%
dplyr::select(-treeID)
}
# get point coordinates
xy_temp <-
sf_data %>%
sf::st_coordinates() %>%
dplyr::as_tibble() %>%
dplyr::rename_with(tolower) %>%
dplyr::select(x,y) %>%
dplyr::mutate(z=0)
# as_spatial
if(as_spatial){
xy_temp <- xy_temp %>%
sf::st_as_sf(coords = c("x", "y"), crs = sf::st_crs(sf_data), remove = F)
}
return(xy_temp)
}
# watershed_ans_poly %>%
# dplyr::filter(pred_id %in% c(7717)) %>%
# # poly_to_points(as_spatial = T) %>%
# poly_to_points(as_spatial = F) %>%
# ggplot() +
# # geom_sf()
# geom_point(aes(x=x,y=y))
##########################
# 3)
# function to combine poly_to_points, lidR::fit_circle, and point_xy_radius_to_circle_sf
# !!! only works with a singular polygon at a time
##########################
poly_circle_fit <- function(
poly
# if set to T, multipolygons are simplified by keeping the largest segment
, simplify_multipolygons = F
# number of iterations for the RANSAC fitting algorithm
, num_iterations = 100
# threshold value; points are considered inliers if their residuals are below this value
, inlier_threshold = 0.01
) {
# poly_to_points
poly_to_points_ans <- poly_to_points(poly, as_spatial = F, simplify_multipolygons = simplify_multipolygons)
# fit_circle
fit_circle_ans <- lidR::fit_circle(
points = poly_to_points_ans %>% as.matrix()
# number of iterations for the RANSAC fitting algorithm
, num_iterations = num_iterations
# threshold value; points are considered inliers if their residuals are below this value
, inlier_threshold = inlier_threshold
)
# point_xy_radius_to_circle_sf
ans <- point_xy_radius_to_circle_sf(
center_x = fit_circle_ans$center_x
, center_y = fit_circle_ans$center_y
, radius = fit_circle_ans$radius
, crs = sf::st_crs(poly)
)
# add other vars
ans <- ans %>%
dplyr::mutate(
covered_arc_degree = fit_circle_ans$covered_arc_degree
, percentage_inlier = fit_circle_ans$percentage_inlier
, percentage_inside = fit_circle_ans$percentage_inside
# , inliers = fit_circle_ans$inliers
)
# return
return(ans)
}
# watershed_ans_poly %>%
# dplyr::filter(pred_id == 7717) %>%
# poly_circle_fit() %>%
# ggplot() + geom_sf() +
# geom_sf(
# data = filtered_watershed_ans_poly %>% dplyr::filter(pred_id == 7717) %>% poly_to_points(as_spatial = T)
# )
# watershed_ans_poly %>%
# dplyr::filter(pred_id == 7717) %>%
# poly_circle_fit() %>%
# dplyr::glimpse()
##########################
# 4)
# function to combine poly_to_points, lidR::fit_circle, and point_xy_radius_to_circle_sf
# !!! only works with a singular polygon at a time
##########################
sf_data_circle_fit <- function(sf_data, num_iterations = 100) {
if(!inherits(sf_data, "sf")){stop("must pass `sf` data object")}
# apply poly_circle_fit() to each row to get fitted circle sf data
cf <- sf_data %>%
dplyr::ungroup() %>%
dplyr::mutate(id_xxx = dplyr::row_number()) %>%
dplyr::nest_by(id_xxx) %>%
dplyr::mutate(
circle_fit = poly_circle_fit(poly = data, num_iterations=num_iterations)
) %>%
dplyr::pull(circle_fit)
# combine with original data but drop original geom
df <- sf_data %>%
sf::st_drop_geometry() %>%
dplyr::bind_cols(cf) %>%
sf::st_as_sf(crs = sf::st_crs(sf_data))
# return
return(df)
}
let’s apply the sf_data_circle_fit()
function we just defined fits the best circle using lidR::fit_circle()
to each watershed detected segment to get a spatial data frame with the best fitting circle for each segment
# apply the sf_data_circle_fit() which takes each segment polygon, transforms it to points, and the fits the best circle
watershed_ans_poly_circle_fit <- watershed_ans_poly %>%
sf_data_circle_fit()
# what is this?
watershed_ans_poly_circle_fit %>% dplyr::glimpse()
## Rows: 955
## Columns: 9
## $ pred_id <dbl> 8, 12, 20, 24, 32, 38, 50, 52, 53, 54, 55, 57, 76, …
## $ pct_chull <dbl> 0.7577640, 0.7764706, 0.7197802, 0.8000000, 0.70440…
## $ center_x <dbl> 499944.3, 499740.7, 499720.7, 499778.2, 499470.4, 4…
## $ center_y <dbl> 4317698, 4317898, 4317852, 4318049, 4317613, 431806…
## $ radius <dbl> 1.140667e+00, 7.078160e-01, 2.120892e+00, 1.035751e…
## $ covered_arc_degree <dbl> 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ percentage_inlier <dbl> 0.22857143, 0.17142857, 0.09523810, 0.14583333, 0.1…
## $ percentage_inside <dbl> 0.57142857, 0.14285714, 0.79365079, 0.45833333, 0.5…
## $ geometry <POLYGON [m]> POLYGON ((499945.4 4317698,..., POLYGON ((4…
let’s check out the distribution of the metrics that quantify the fit of the circle
watershed_ans_poly_circle_fit %>%
sf::st_drop_geometry() %>%
dplyr::select(covered_arc_degree,percentage_inlier,percentage_inside) %>%
tidyr::pivot_longer(dplyr::everything()) %>%
ggplot2::ggplot(mapping = ggplot2::aes(x = value, fill = name)) +
ggplot2::geom_density(color = NA) +
ggplot2::facet_wrap(facets = ggplot2::vars(name), scales = "free") +
ggplot2::scale_fill_viridis_d(option = "rocket", begin = 0.2, end = 0.8, alpha = 0.8) +
ggplot2::theme_light() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank()
, axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5)
, axis.title.x = ggplot2::element_blank()
, legend.position = "none"
, strip.text = ggplot2::element_text(color = "black", size = 10)
)
let’s look at the best fitting circles using the remaining piles from our example above
example of watershed detected segments as vectors (pink) filtered to remove irregular shapes and segments outside of the expected area thresholds; best fitting circle of the segments (orange); compared with the ground truth piles (blue)
ortho_plt_temp +
# ggplot2::ggplot() +
ggplot2::geom_sf(
data = watershed_ans_poly %>%
dplyr::inner_join(
watershed_ans_poly %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pred_id)
)
, fill = "hotpink"
, alpha = 0.6
, lwd = 0
, color = NA
) +
ggplot2::geom_sf(
data = watershed_ans_poly_circle_fit %>%
dplyr::inner_join(
watershed_ans_poly %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pred_id)
)
, color = "orangered"
, fill = NA
) +
ggplot2::geom_sf(
data = slash_piles_polys %>%
dplyr::inner_join(
slash_piles_polys %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id)
) %>%
sf::st_transform(sf::st_crs(watershed_ans_poly))
, fill = NA, color = "blue", lwd = 0.4
) +
ggplot2::theme_void()
the best fitting circles on the linear watershed detected segements are not very well fitting, we can filter using the intersection over union (IoU) between the circle and the predicted segment. we’ll use the IoU function we defined in this earlier section.
watershed_circle_fit_iou <-
watershed_ans_poly$pred_id %>%
unique() %>%
purrr::map(\(x)
ground_truth_single_match(
gt_inst = watershed_ans_poly %>%
dplyr::filter(pred_id == x)
, gt_id = "pred_id"
, predictions = watershed_ans_poly_circle_fit %>%
dplyr::filter(pred_id == x) %>%
dplyr::select(pred_id) %>% # keeping other columns causes error?
dplyr::rename(circ_pred_id = pred_id)
, pred_id = "circ_pred_id"
, min_iou_pct = 0
)
) %>%
dplyr::bind_rows()
# what did we get?
watershed_circle_fit_iou %>% dplyr::glimpse()
## Rows: 951
## Columns: 5
## $ pred_id <dbl> 8, 12, 20, 24, 32, 38, 50, 52, 53, 54, 55, 57, 76, 89, 10…
## $ i_area <dbl> 2.2841854, 1.3944221, 4.8777724, 2.1111733, 3.8128534, 6.…
## $ u_area <dbl> 4.241543e+00, 2.818808e+00, 1.448724e+01, 5.737528e+00, 6…
## $ iou <dbl> 5.385270e-01, 4.946851e-01, 3.366945e-01, 3.679587e-01, 5…
## $ circ_pred_id <dbl> 8, 12, 20, 24, 32, 38, 50, 52, 53, 54, 55, 57, 76, 89, 10…
what is the distribution of IoU of the watershed segments and the best fit circle of those segments?
watershed_circle_fit_iou %>%
ggplot2::ggplot(mapping = ggplot2::aes(x = iou)) +
ggplot2::geom_density(color = NA, fill = "navy", alpha = 0.8) +
ggplot2::labs(
x = "IoU of the watershed segments and the best fit circle"
) +
ggplot2::scale_x_continuous(labels = scales::percent) +
ggplot2::theme_light() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank()
, legend.position = "none"
, strip.text = ggplot2::element_text(color = "black", size = 10)
)
let’s color our predicted segments by the IoU with the best fitting circle
ortho_plt_temp +
# ggplot2::ggplot() +
ggplot2::geom_sf(
data = watershed_ans_poly %>%
dplyr::inner_join(
watershed_ans_poly %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pred_id)
) %>%
dplyr::left_join(watershed_circle_fit_iou, by = "pred_id")
, mapping = ggplot2::aes(fill = iou)
, alpha = 0.9
, lwd = 0
, color = NA
) +
ggplot2::geom_sf(
data = watershed_ans_poly_circle_fit %>%
dplyr::inner_join(
watershed_ans_poly %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pred_id)
)
, color = "orangered"
, fill = NA
) +
ggplot2::geom_sf(
data = slash_piles_polys %>%
dplyr::inner_join(
slash_piles_polys %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id)
) %>%
sf::st_transform(sf::st_crs(watershed_ans_poly))
, fill = NA, color = "blue", lwd = 0.4
) +
ggplot2::scale_fill_fermenter(
n.breaks = 10 # 10 use 10 if can go full range 0-1
, palette = "PuOr" # "BrBG"
, direction = 1
, limits = c(0,1) # use c(0,1) if can go full range 0-1
, labels = scales::percent
, na.value = "sienna4"
) +
ggplot2::labs(fill="IoU") +
ggplot2::theme_void() +
ggplot2::theme(
legend.position = "top"
, legend.text = ggplot2::element_text(size = 6, angle = 90, vjust = 0.5)
)
we’ll set a threshold for the minimum IoU to further filter for segments that are approximately round, this filter should remove linear objects from the watershed detections
# min required IoU between the predicted pile and the best fit circle of the predicted pile
pct_iou_circle_fit <- 0.5
# compare iou
watershed_keep_circle_fit_pred_id <- watershed_circle_fit_iou %>%
dplyr::filter(iou>=pct_iou_circle_fit) %>%
dplyr::pull(pred_id)
let’s check out the remaining watershed detected piles after: 1) filtering out the irregularly shaped segments (filtered using the convex hull), 2) smoothing and filtering out the non-circular segments (filtered using circle fitting)
example of watershed detected segments as vectors (pink) filtered to remove irregular shapes, segments outside of the expected area thresholds, and non-circular segments; compared with the ground truth piles (blue)
ortho_plt_temp +
# ggplot2::ggplot() +
ggplot2::geom_sf(
data = watershed_ans_poly %>%
dplyr::inner_join(
watershed_ans_poly %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pred_id)
) %>%
dplyr::filter(pred_id %in% watershed_keep_circle_fit_pred_id)
, fill = "hotpink"
, alpha = 0.6
, lwd = 0
, color = NA
) +
ggplot2::geom_sf(
data = slash_piles_polys %>%
dplyr::inner_join(
slash_piles_polys %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id)
) %>%
sf::st_transform(sf::st_crs(watershed_ans_poly))
, fill = NA, color = "blue", lwd = 0.4
) +
ggplot2::theme_void()
5.1.4 Smoothing
smooth the remaining piles to aggregate these disconnected segments and better represent the continuous, real-world form of objects
########################################
# use the remaining segments that meet the geometric and area filtering
# to smooth the watershed raster
########################################
ws_for_smooth <- 3 # needs to be the same as the CHM smooth below
smooth_watershed_ans <- watershed_ans %>%
terra::mask(
watershed_ans_poly %>% #these are irregularity and area filtered already
dplyr::filter(pred_id %in% watershed_keep_circle_fit_pred_id) %>%
terra::vect()
, updatevalue=NA
) %>%
# smooths the raster using the majority value
terra::focal(w = ws_for_smooth, fun = "modal", na.rm = T, na.policy = "only") # only fill NA cells
names(smooth_watershed_ans) <- "pred_id"
# smooth_watershed_ans %>% terra::crop(aoi_temp %>% terra::vect()) %>% terra::plot(legend = F, axes = F)
ok, we now have our smoothed watershed candidate piles that have been smoothed to help generalize the boundaries of candidate piles and aggregate disconnected portions of the CHM that were fragmented by the raster’s discrete cell value generation. the objecitive of this smoothing is to better represent the continuous shape of real-world objects.
let’s compare the original watershed segmented raster with this smoothed raster, compared with the ground truth piles (blue)
# original
watershed_ans %>%
terra::mask(
watershed_ans_poly %>% #these are irregularity and area filtered already
dplyr::filter(pred_id %in% watershed_keep_circle_fit_pred_id) %>%
terra::vect()
, updatevalue=NA
) %>%
terra::crop(aoi_temp %>% terra::vect()) %>%
terra::plot(col = viridis::turbo(222) %>% sample(), axes = F, legend = F, main = "pre-smooth segments")
terra::plot(
slash_piles_polys %>%
dplyr::inner_join(
slash_piles_polys %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id)
) %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>% terra::vect()
, add = T, border = "blue", col = NA, lwd = 1
)
# smoothed
smooth_watershed_ans %>%
terra::crop(aoi_temp %>% terra::vect()) %>%
terra::plot(col = viridis::turbo(222) %>% sample(), axes = F, legend = F, main = "post-smooth segments")
terra::plot(
slash_piles_polys %>%
dplyr::inner_join(
slash_piles_polys %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id)
) %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>% terra::vect()
, add = T, border = "blue", col = NA, lwd = 1
)
5.1.4.1 Area and Volume from CHM
we’ll smooth the CHM raster after masking with our remaining watershed segments and use the smoothed raster to calculate area, height, and volume for each candidate pile
# mask the chm rast to these remaining segments and smooth to match the smoothing for the segments
smooth_chm_rast <-
cloud2raster_ans$chm_rast %>%
terra::clamp(upper = max_ht_m, lower = 0, values = F) %>%
terra::mask(smooth_watershed_ans) %>%
# smooths the raster to match the smoothing in the watershed segments
terra::focal(w = ws_for_smooth, fun = "mean", na.rm = T, na.policy = "only") #only for cells that are NA
# smooth_chm_rast %>% terra::crop(aoi_temp %>% terra::vect()) %>% terra::plot()
# now mask the watershed_ans raster to only keep cells that are in the originating CHM
smooth_watershed_ans <- smooth_watershed_ans %>%
terra::mask(smooth_chm_rast)
# smooth_watershed_ans %>% terra::crop(aoi_temp %>% terra::vect()) %>% terra::plot(col = viridis::turbo(222) %>% sample(), axes = F, legend = F)
########################################################################################
## calculate raster-based area and volume
########################################################################################
# first, calculate the area of each cell
area_rast <- terra::cellSize(smooth_chm_rast)
names(area_rast) <- "area_m2"
# area_rast %>% terra::plot()
# then, multiply area by the CHM (elevation) for each cell to get a raster with cell volumes
vol_rast <- area_rast*smooth_chm_rast
names(vol_rast) <- "volume_m3"
# vol_rast %>% terra::plot()
# sum area within each segment to get the total area
area_df <- terra::zonal(x = area_rast, z = smooth_watershed_ans, fun = "sum", na.rm = T)
# sum volume within each segment to get the total volume
vol_df <- terra::zonal(x = vol_rast, z = smooth_watershed_ans, fun = "sum", na.rm = T)
# max ht within each segment to get the max ht
ht_df <- terra::zonal(x = smooth_chm_rast, z = smooth_watershed_ans, fun = "max", na.rm = T) %>%
dplyr::rename(max_height_m=2)
# let's convert the smoothed and filtered watershed-detected segments from raster to vector data
# vectors of segments
watershed_ans_poly <-
smooth_watershed_ans %>%
terra::as.polygons(round = F, aggregate = T, values = T, extent = F, na.rm = T) %>%
sf::st_as_sf() %>%
sf::st_simplify() %>%
sf::st_make_valid() %>%
dplyr::filter(sf::st_is_valid(.)) %>%
dplyr::mutate(treeID=pred_id) %>%
cloud2trees::simplify_multipolygon_crowns() %>%
dplyr::select(-treeID)
add area and volume to our vector data and do a second pass of the filtering for the expected area thresholds and irregularity
# add area and volume to our vector data
# we'll do this with a slick trick to perform multiple joins succinctly using purrr::reduce
watershed_ans_poly <-
purrr::reduce(
list(watershed_ans_poly, area_df, vol_df, ht_df)
, dplyr::left_join
, by = 'pred_id'
) %>%
dplyr::mutate(
volume_per_area = volume_m3/area_m2
) %>%
# filter out the segments that don't meet the size thresholds
dplyr::filter(
dplyr::coalesce(area_m2,0) >= min_area_m2
& dplyr::coalesce(area_m2,0) <= max_area_m2
) %>%
# do one more pass of the irregularity filtering
st_irregular_remove(pct_chull_overlap = pct_chull_overlap)
# huh?
watershed_ans_poly %>% dplyr::glimpse()
## Rows: 501
## Columns: 7
## $ pred_id <dbl> 8, 38, 55, 80, 104, 151, 175, 191, 214, 233, 300, 334,…
## $ area_m2 <dbl> 4.483586, 10.968773, 7.526019, 28.742989, 4.363490, 7.…
## $ volume_m3 <dbl> 10.053752, 21.828819, 22.057003, 83.911887, 7.180342, …
## $ max_height_m <dbl> 4.000000, 3.999429, 3.999000, 3.998000, 3.998000, 3.99…
## $ volume_per_area <dbl> 2.242346, 1.990088, 2.930766, 2.919386, 1.645550, 2.46…
## $ pct_chull <dbl> 0.8853755, 0.8968903, 0.9238329, 0.7977778, 0.9356223,…
## $ geometry <POLYGON [m]> POLYGON ((499944.4 4317700,..., POLYGON ((4993…
let’s look at the remaining piles which have now been: 1) filtered to remove irregular shapes, 2) filtered based on the expected area threshold, 3) filtered to remove non-circular shapes, 4) smoothed to aggregate remaining candidate segments, 5) filtered to remove new irregular shapes which may have been generated during the smoothing process
example of watershed detected segments as vectors (pink) filtered to remove irregular shapes, segments outside of the expected area thresholds, non-circular segments, and smoothed to better represent the continuous shape of real-world objects; compared with the ground truth piles (blue)
ortho_plt_temp +
# ggplot2::ggplot() +
ggplot2::geom_sf(
data = watershed_ans_poly %>%
dplyr::inner_join(
watershed_ans_poly %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pred_id)
)
, fill = "hotpink"
, alpha = 0.6
, lwd = 0
, color = NA
) +
ggplot2::geom_sf(
data = slash_piles_polys %>%
dplyr::inner_join(
slash_piles_polys %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id)
) %>%
sf::st_transform(sf::st_crs(watershed_ans_poly))
, fill = NA, color = "blue", lwd = 0.4
) +
ggplot2::theme_void()
let’s look at the entire area again after applying this filter plotting the remaining watershed segmented piles (pink) and the actual piles (blue)
# plot it
terra::plotRGB(ortho_rast, stretch = "lin", colNA = "transparent")
terra::plot(
watershed_ans_poly %>%
terra::vect()
, add = T, border = "hotpink", col = NA, lwd = 1.2
)
terra::plot(
slash_piles_polys %>%
sf::st_transform(terra::crs(ortho_rast)) %>% terra::vect()
, add = T, border = "blue", col = NA, lwd = 1
)
5.1.5 Shape Refinement
As a final step, we’ll use the convex hull shapes of our remaining segments. This helps to smooth out the often “blocky” edges of raster-based segments, which can look like they were generated in Minecraft. Additionally, by removing any segments with overlapping convex hull shapes, we can likely reduce false detections that are actually groups of small trees or shrubs, ensuring our results represent singular slash piles.
# make a function to remove overlapping polygons from a sf data frame
st_remove_overlaps <- function(sf_data) {
if(!inherits(sf_data, "sf")){stop("must pass `sf` data object")}
# if not polygons
if( !(sf::st_is(sf_data, type = c("POLYGON", "MULTIPOLYGON")) %>% all()) ){
stop(paste0(
"`sf_data` data must be an `sf` class object with POLYGON geometry (see [sf::st_geometry_type()])"
))
}
if(nrow(sf_data)<=1){return(sf_data)}
# combine all touching polygons and keep the ones that overlap multiple from the original polygons
comb_temp <- sf_data %>%
dplyr::ungroup() %>%
sf::st_union(by_feature = F) %>%
sf::st_cast("POLYGON") %>%
sf::st_as_sf() %>%
sf::st_set_geometry("geometry") %>%
sf::st_set_crs(sf::st_crs(sf_data)) %>%
dplyr::mutate(new_id = dplyr::row_number()) %>%
dplyr::select(new_id)
# identify overlaps
overlap_temp <- comb_temp %>%
sf::st_intersection(sf_data) %>%
sf::st_drop_geometry() %>%
dplyr::group_by(new_id) %>%
dplyr::summarise(n_orig = dplyr::n()) %>%
dplyr::ungroup() %>%
dplyr::filter(n_orig>=2) %>%
dplyr::pull(new_id)
if(length(overlap_temp)==0){return(sf_data)}
# just get the overlaps
comb_temp <- comb_temp %>%
dplyr::filter(new_id %in% overlap_temp) %>%
sf::st_union()
# remove from the original data
return(sf::st_difference(sf_data,comb_temp))
}
save this filtered data as our predictions
# save this filtered data as our predictions
predicted_watershed_piles_sf <-
watershed_ans_poly %>%
sf::st_convex_hull() %>%
sf::st_simplify() %>%
sf::st_make_valid() %>%
dplyr::filter(sf::st_is_valid(.)) %>%
st_remove_overlaps()
now let’s look at our final detected segments (pink) compared with the ground truth piles (blue) in the example area we have been looking at
ortho_plt_temp +
# ggplot2::ggplot() +
ggplot2::geom_sf(
data = predicted_watershed_piles_sf %>%
dplyr::inner_join(
predicted_watershed_piles_sf %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pred_id)
)
, fill = "hotpink"
, alpha = 0.6
, lwd = 0
, color = NA
) +
ggplot2::geom_sf(
data = slash_piles_polys %>%
dplyr::inner_join(
slash_piles_polys %>%
sf::st_transform(sf::st_crs(watershed_ans_poly)) %>%
sf::st_intersection(aoi_temp) %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id)
) %>%
sf::st_transform(sf::st_crs(watershed_ans_poly))
, fill = NA, color = "blue", lwd = 0.4
) +
ggplot2::theme_void()
that looks like it did what we wanted it to do, though note there is a false negative prediction (omission) though no false positive predictions (commissions) in this example area
let’s look at the entire area again after applying this filter, plotting the remaining watershed segmented piles (pink) and the actual piles (blue)
# plot it
terra::plotRGB(ortho_rast, stretch = "lin", colNA = "transparent")
terra::plot(
predicted_watershed_piles_sf %>%
terra::vect()
, add = T, border = "hotpink", col = NA, lwd = 1.2
)
terra::plot(
slash_piles_polys %>%
sf::st_transform(terra::crs(ortho_rast)) %>% terra::vect()
, add = T, border = "blue", col = NA, lwd = 1
)
nice! let’s save these data
5.2 Confusion matrix evaluation
We didn’t “train” any model here, just developed a rules-based method for detecting piles from aerial point cloud data. As such, we can evaluate the methods performance on the “full” set of ground truth pile data.
let’s see how we did given the list of predictions compared to the ground truth data using the confusion matrix matching process we outlined in this earlier section.
ground_truth_prediction_match_ans <- ground_truth_prediction_match(
ground_truth = slash_piles_polys %>%
dplyr::arrange(desc(diameter)) %>%
sf::st_transform(sf::st_crs(predicted_watershed_piles_sf))
, gt_id = "pile_id"
, predictions = predicted_watershed_piles_sf
, pred_id = "pred_id"
, min_iou_pct = 0.05
)
how did our predictions do using this methodology?
# what did we get?
ground_truth_prediction_match_ans %>%
dplyr::count(match_grp) %>%
dplyr::mutate(pct = (n/sum(n)) %>% scales::percent(accuracy=0.1))
## # A tibble: 3 × 3
## match_grp n pct
## <ord> <int> <chr>
## 1 omission 27 5.7%
## 2 commission 292 61.5%
## 3 true positive 156 32.8%
let’s look at that spatially
pal_match_grp <- c(
"omission"=viridis::cividis(3)[1]
, "commission"= "gray88" #viridis::cividis(3)[2]
, "true positive"=viridis::cividis(3)[3]
)
# plot it
ortho_plt_fn(stand = sf::st_bbox(predicted_watershed_piles_sf) %>% sf::st_as_sfc(), buffer = 10) +
# ggplot2::ggplot() +
ggplot2::geom_sf(
data =
slash_piles_polys %>%
sf::st_transform(sf::st_crs(watershed_ans_poly_chull)) %>%
dplyr::left_join(
ground_truth_prediction_match_ans %>%
dplyr::select(pile_id,match_grp)
, by = "pile_id"
)
, mapping = ggplot2::aes(fill = match_grp)
, color = NA ,alpha=0.6
) +
ggplot2::geom_sf(
data =
predicted_watershed_piles_sf %>%
dplyr::left_join(
ground_truth_prediction_match_ans %>%
dplyr::select(pred_id,match_grp)
, by = "pred_id"
)
, mapping = ggplot2::aes(fill = match_grp, color = match_grp)
, alpha = 0
, lwd = 0.3
) +
ggplot2::scale_fill_manual(values = pal_match_grp, name = "") +
ggplot2::scale_color_manual(values = pal_match_grp, name = "") +
ggplot2::theme(legend.position = "top") +
ggplot2::guides(
fill = ggplot2::guide_legend(override.aes = list(color = c(NA,NA,pal_match_grp["commission"])))
, color = "none"
)
it looks like we did a really good job correctly predicting the location of actual piles (yellow) but that we incorrectly predicted pile locations at a relatively high rate. Our false positive predictions (i.e. commmissions) were frequently located in areas with quaking aspen (Populus tremuloides) which has many more short trees than the treated conifer areas. It also looks like some edge effects are influencing our results and these will likely improve when we filter for instances located within the treated stand.
let’s quickly look at the IoU values on the true positives
## iou
## Min. :0.2344
## 1st Qu.:0.5924
## Median :0.6642
## Mean :0.6555
## 3rd Qu.:0.7340
## Max. :0.8820
## NA's :319
finally, let’s check our confusion matrix
confusion_matrix_temp <- agg_ground_truth_match(ground_truth_prediction_match_ans)
confusion_matrix_scores_temp <- confusion_matrix_scores_fn(confusion_matrix_temp)
# plot
confusion_matrix_temp %>%
dplyr::select(tidyselect::ends_with("_n")) %>%
tidyr::pivot_longer(dplyr::everything()) %>%
dplyr::mutate(
presence = ifelse(name %in% c("tp_n", "fn_n"),1,0)
, estimate = ifelse(name %in% c("tp_n", "fp_n"),1,0)
) %>%
dplyr::mutate(
is_false = as.factor(ifelse(presence!=estimate,1,0))
, presence_fact = factor(presence,levels = 0:1,labels = c("Observed Absent", "Observed Present"))
, estimate_fact = factor(estimate,levels = 0:1,labels = c("Predicted Absent", "Predicted Present"))
, pct = value/sum(value)
) %>%
ggplot(mapping = aes(y = estimate_fact, x = presence_fact)) +
geom_tile(aes(fill = is_false), color = "white",alpha=0.8) +
geom_text(aes(label = scales::comma(value,accuracy=1)), vjust = 1,size = 8) +
geom_text(aes(label = scales::percent(pct,accuracy=0.1)), vjust = 3.5, size=5) +
scale_fill_manual(values= c("turquoise","tomato2")) +
scale_x_discrete(position = "top") +
labs(
y = "Predicted"
, x = "Observed"
, subtitle = paste0(
"True positive rate (recall) = "
, confusion_matrix_scores_temp$recall %>%
scales::percent(accuracy = 0.1)
, "\nPrecision (PPV) = "
, confusion_matrix_scores_temp$precision %>%
scales::percent(accuracy = 0.1)
, "\nF1-score = "
, confusion_matrix_scores_temp$f_score %>%
scales::percent(accuracy = 0.1)
)
) +
theme_light() +
theme(
legend.position = "none"
, panel.grid = element_blank()
, plot.title = element_text(size = 9)
, plot.subtitle = element_text(size = 9)
)
compared to our OBIA approach using the spectral data alone, we improved the recall rate to 85.2% using only the point cloud data without any spectral information. This means that we correctly identified actual piles more often. Our F1-Score of 49.4% also indicates better balance between correctly identifying positive cases (high recall) and minimizing incorrect positive predictions (high precision).
5.3 Pile area RMSE
let’s calculate the root mean square error (RMSE) for the pile area in meters squared for the true positive predictions. RMSE captures the ability of the approach to properly represent the 2D spatial coverage of the slash pile.
pile area RMSE is calculated as
\[ \textrm{RMSE} = \sqrt{ \frac{ \sum_{i=1}^{N} (y_{i} - \hat{y_{i}})^{2}}{N}} \]
Where \(N\) is equal to the total number of correctly matched piles, \(y_i\) is the image annotated pile area and \(\hat{y_i}\) is the predicted pile area of \(i\) in square meters.
area_rmse_temp <- ground_truth_prediction_match_ans %>%
# add area of gt
dplyr::inner_join(
slash_piles_polys %>%
sf::st_transform(sf::st_crs(predicted_watershed_piles_sf)) %>%
dplyr::mutate(gt_area = sf::st_area(.) %>% as.numeric()) %>%
sf::st_drop_geometry() %>%
dplyr::select(pile_id,gt_area)
) %>%
# add area of pred
dplyr::inner_join(
predicted_watershed_piles_sf %>%
dplyr::mutate(pred_area = sf::st_area(.) %>% as.numeric()) %>%
sf::st_drop_geometry() %>%
dplyr::select(pred_id,pred_area)
) %>%
dplyr::ungroup() %>%
dplyr::mutate(
area_diff = pred_area-gt_area
, pct_area_diff = area_diff/gt_area
)
first, we’ll look at the percent difference in area for each pile spatially
# look at this spatially
ggplot2::ggplot() +
ggplot2::geom_sf(
data = predicted_watershed_piles_sf %>%
dplyr::inner_join(
area_rmse_temp %>% dplyr::select(pred_id,area_diff,pct_area_diff)
)
, mapping = ggplot2::aes(fill = pct_area_diff)
, color = NA
) +
ggplot2::geom_sf(
data = slash_piles_polys %>%
dplyr::inner_join(
area_rmse_temp %>% dplyr::select(pile_id)
)
, fill = NA, color = "blue"
) +
ggplot2::scale_fill_stepsn(
n.breaks = 7
, colors = scales::pal_div_gradient()(seq(1, 0, length.out = 7))
, limits = c(-max(abs(fivenum(area_rmse_temp$pct_area_diff))),max(abs(fivenum(area_rmse_temp$pct_area_diff))))
, labels = scales::percent_format(accuracy = 1)
, show.limits = T
) +
ggplot2::labs(fill = "% difference area") +
ggplot2::theme_void()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.76559 -0.39823 -0.23627 -0.22975 -0.08443 0.43320
and let’s look at the distribution of the difference in area (m2) calculated as the predicted area minus the image-annotated area so that negative difference values mean our predictions were smaller and positive values mean our predictions were larger
# plot the difference of the area
area_rmse_temp %>%
ggplot2::ggplot(
mapping = ggplot2::aes(x = area_diff)
) +
ggplot2::geom_density(fill = "gray", color = NA) +
ggplot2::geom_vline(xintercept = median(area_rmse_temp$area_diff)) +
ggplot2::annotate(
"text", x = median(area_rmse_temp$area_diff), y = 0
, label = paste("median:",scales::comma(median(area_rmse_temp$area_diff),accuracy=0.1),"m2")
, hjust = 1.01, vjust = 1
) +
ggplot2::labs(y="density",x="area difference (m2)", subtitle = "Difference in area between predicted and image-annotated slash piles (m2)") +
ggplot2::theme_light() +
ggplot2::theme(axis.text.y = ggplot2::element_blank())
finally, let’s calculate area RMSE (m2)
area_rmse_temp %>%
dplyr::summarise(
area_diff_sq = sum(area_diff^2, na.rm = T)
, n = sum(ifelse(!is.na(area_diff),1,0))
) %>%
dplyr::mutate(
rmse = dplyr::case_when(
dplyr::coalesce(n,0)==0 ~ as.numeric(NA)
, T ~ sqrt(area_diff_sq/n)
)
) %>%
dplyr::select(rmse)
## # A tibble: 1 × 1
## rmse
## <dbl>
## 1 12.8
that’s not very good, we’ll have to check out if those image-annotated areas are accurate