We chose three architectures: YOLOv5s, YOLOv5m, and YOLOv5l. Backbone adoptsthe Cross Stage Partial Network (CSPNet) [39]. Before entering the backbone network, theYOLOv5 algorithm adds the Focus module and performs downsampling by slicing thepicture. The neck is in the form of a Feature Pyramid Network (FPN) plus a Path AggregationNetwork (PAN) and combines three different scales of feature information [40,41].Then, it uses the Non-Maximum Suppression (NMS) method to remove redundant predictionbounding boxes (Figure