The network structure consists of three main parts: the backbone, the FPN, and the output network. The backbone networks used in this experiment were ResNet50 and ResNet101 [43], each of which can be divided into five parts. An FPN is added to the backbone for multi-scale feature extraction. The output network consists of heads, each of which contains a shared part and three branches. The classification branch predicts the confidence that a target exists at each sampling point on the feature map, the center-ness branch predicts how far the sampling point is from the center of the target, and the regression branch predicts the distances between the sampling point and the ground-truth box in the original image (Figure 3).
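To make the quantities handled by the regression and center-ness branches concrete, the following minimal sketch computes them for a single sampling point. It assumes the formulation commonly used in anchor-free detectors of this kind: four side distances (left, top, right, bottom) as regression targets and a normalized center-ness score derived from them. The text above does not state these formulas, so they are an assumption, and the function names `regression_targets` and `centerness` are hypothetical.

```python
import math


def regression_targets(px, py, box):
    """Distances from sampling point (px, py) to the four sides of a box.

    box is (x_min, y_min, x_max, y_max) in original-image coordinates.
    Returns (left, top, right, bottom). Assumed formulation, not taken
    from the text.
    """
    x_min, y_min, x_max, y_max = box
    left = px - x_min
    top = py - y_min
    right = x_max - px
    bottom = y_max - py
    return left, top, right, bottom


def centerness(left, top, right, bottom):
    """Center-ness score in [0, 1]: 1 at the box center, lower near the edges.

    Assumed definition: sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)).
    """
    if min(left, top, right, bottom) <= 0:
        return 0.0  # point lies on or outside the box
    lr_ratio = min(left, right) / max(left, right)
    tb_ratio = min(top, bottom) / max(top, bottom)
    return math.sqrt(lr_ratio * tb_ratio)


# Hypothetical ground-truth box and two sampling points mapped to image space.
box = (10.0, 20.0, 110.0, 80.0)
at_center = regression_targets(60.0, 50.0, box)   # point at the box center
off_center = regression_targets(35.0, 50.0, box)  # point shifted to the left

print(at_center, centerness(*at_center))
print(off_center, centerness(*off_center))
```

Under this assumed definition, a point at the box center scores 1 and the score decreases as the point moves toward a box edge, which is what allows predictions from off-center sampling points to be down-weighted.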