We trained the models with the MMDetection framework, which is built on PyTorch [42,46]. The optimizer was stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 0.0001. Training ran for a total of 30 epochs. The learning rate was 1 × 10⁻² and the batch size was 2; for joint training, the learning rate was also 1 × 10⁻² and the batch size was 4. The first 500 steps were used for warm-up. The learning rate was then reduced by a factor of 10 at epoch 16 and again at epoch 19. Experiments were run on an RTX 3090 GPU.
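
The training schedule described above can be sketched as follows. This is a minimal illustration, not the authors' actual configuration file. The dictionary layout follows the MMDetection 2.x config style, which is an assumption about the version used. The warm-up ratio `WARMUP_RATIO`, the linear shape of the warm-up, the zero-based epoch indexing, and the helper `learning_rate` are hypothetical choices made for illustration, since the text does not specify them.

```python
# Hyperparameters stated in the text.
BASE_LR = 1e-2            # learning rate (same for single and joint training)
MOMENTUM = 0.9
WEIGHT_DECAY = 1e-4
MAX_EPOCHS = 30
WARMUP_ITERS = 500        # warm-up steps
DECAY_EPOCHS = (16, 19)   # epochs at which the learning rate is reduced
DECAY_FACTOR = 10.0       # each reduction divides the learning rate by 10

# Assumption: the text does not give the warm-up starting ratio.
WARMUP_RATIO = 1e-3       # hypothetical fraction of BASE_LR at step 0

# Config fragment in the MMDetection 2.x style (assumed version and layout).
optimizer = dict(type="SGD", lr=BASE_LR, momentum=MOMENTUM,
                 weight_decay=WEIGHT_DECAY)
lr_config = dict(policy="step", warmup="linear", warmup_iters=WARMUP_ITERS,
                 warmup_ratio=WARMUP_RATIO, step=list(DECAY_EPOCHS))
runner = dict(type="EpochBasedRunner", max_epochs=MAX_EPOCHS)
data = dict(samples_per_gpu=2)  # batch size 2; the text uses 4 for joint training


def learning_rate(epoch: int, global_iter: int) -> float:
    """Return the learning rate for a zero-based epoch and global step.

    Hypothetical helper: applies a step decay at DECAY_EPOCHS and, during
    the first WARMUP_ITERS steps, a linear ramp starting at WARMUP_RATIO.
    """
    # Count how many decay milestones have been reached.
    drops = sum(1 for milestone in DECAY_EPOCHS if epoch >= milestone)
    lr = BASE_LR / (DECAY_FACTOR ** drops)

    # Linear warm-up from WARMUP_RATIO * lr up to lr.
    if global_iter < WARMUP_ITERS:
        progress = global_iter / WARMUP_ITERS
        lr *= WARMUP_RATIO + (1.0 - WARMUP_RATIO) * progress
    return lr
```

Calling `learning_rate(epoch, global_iter)` for each training step traces the full schedule: a ramp over the first 500 steps, a constant rate until the first milestone, and two successive tenfold reductions afterwards. Whether "epoch 16" counts from zero or one depends on the framework's convention, so the milestone indices may need to be shifted by one.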