Abstract
Driven by the growing need for Oriented Object Detection (OOD), learning from point annotations under a weakly-supervised framework has emerged as a promising alternative to costly and laborious manual labeling. In this paper, we discuss two deficiencies in existing point-supervised methods: inefficient utilization and poor quality of pseudo labels. To address them, we present Point2RBox-v3. At its core are two principles: 1) Progressive Label Assignment (PLA). It dynamically estimates instance sizes in a coarse yet intelligent manner at different stages of the training process, enabling the use of label assignment methods. 2) Prior-Guided Dynamic Mask Loss (PGDM-Loss). It is an enhancement of the Voronoi Watershed Loss from Point2RBox-v2 that overcomes both Watershed's poor performance in sparse scenes and SAM's poor performance in dense scenes. To our knowledge, Point2RBox-v3 is the first model to employ dynamic pseudo labels for label assignment, and it combines the complementary strengths of the SAM model and the watershed algorithm to achieve excellent performance in both sparse and dense scenes. Our solution gives competitive performance, especially in scenarios with large variations in object size or sparse object occurrences: 66.09%/56.86%/41.28%/46.40%/19.60%/45.96% on DOTA-v1.0/DOTA-v1.5/DOTA-v2.0/DIOR/STAR/RSAR.
Visual comparisons with the state-of-the-art method Point2RBox-v2. Radar plot comparing the performance of our method with 10 other state-of-the-art methods across 6 benchmark datasets.
The training pipeline of Point2RBox-v3. Progressive Label Assignment uses scale information from pseudo labels to dynamically assign ground-truth points. Prior-Guided Dynamic Mask provides enhanced mask supervision. L_others denotes the loss functions inherited from Point2RBox-v2.
The process of Progressive Label Assignment (PLA). Points of different colors represent those assigned to different feature pyramid levels P2, P3, P4, P5, P6 for label assignment. As training progresses, the label assignment strategy evolves. It begins by using fixed Watershed regions in the early stages and transitions to leveraging dynamic, network-generated dimensions in the middle-to-late phases. This evolution guides ground-truth points to be assigned to more suitable FPN levels over time.
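The stage-dependent assignment described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the per-level size ranges, the `switch_at` threshold, and the field names `watershed_size` and `predicted_size` are all hypothetical choices made for illustration. The idea shown is only that each point is routed to an FPN level by an estimated object size, and that the source of the estimate changes from a fixed Watershed region to a network prediction as training progresses.

```python
import math

# Hypothetical size ranges (in pixels) per FPN level; assumed for
# illustration, not taken from the paper.
LEVELS = [
    ("P2", 0.0, 32.0),
    ("P3", 32.0, 64.0),
    ("P4", 64.0, 128.0),
    ("P5", 128.0, 256.0),
    ("P6", 256.0, math.inf),
]


def level_for_size(size):
    """Return the FPN level whose half-open range [lo, hi) contains size."""
    for name, lo, hi in LEVELS:
        if lo <= size < hi:
            return name
    raise ValueError(f"size must be non-negative, got {size}")


def assign_points(points, progress, switch_at=0.5):
    """Assign each annotated point to an FPN level.

    points:    list of dicts with 'watershed_size' and 'predicted_size'
               (hypothetical field names).
    progress:  fraction of training completed, in [0, 1].
    switch_at: assumed point at which the size source switches from the
               fixed Watershed estimate to the network-generated estimate.
    """
    use_predicted = progress >= switch_at
    levels = []
    for p in points:
        size = p["predicted_size"] if use_predicted else p["watershed_size"]
        levels.append(level_for_size(size))
    return levels


points = [
    {"watershed_size": 40.0, "predicted_size": 90.0},
    {"watershed_size": 300.0, "predicted_size": 20.0},
]
early = assign_points(points, progress=0.1)  # uses Watershed sizes
late = assign_points(points, progress=0.8)   # uses predicted sizes
```

A hard switch is used here for simplicity; the actual schedule and size estimates in Point2RBox-v3 may differ.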
Comparison between watershed and SAM masks on DOTA-v1.0. The red patches with yellow edges represent the masks generated by the model. The processing result in the top-right corner shows significant over-segmentation by SAM, which causes the masks to visually merge into a large, incorrect region.
Detection performance for all categories and the mean AP50 on DOTA-v1.0.
AP50 comparisons on the DOTA-v1.0/1.5/2.0, DIOR, STAR, and RSAR datasets.
AP50 comparison on DOTA-v1.0/v1.5 under the partial weakly-supervised setting. "10%" means using only 10% point-labeled training data, with the rest unlabeled.
@article{zhang2025point2rbox,
title={Point2RBox-v3: Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization},
author={Zhang, Teng and Fan, Ziqian and Liu, Mingxin and Zhang, Xin and Lu, Xudong and Li, Wentong and Zhou, Yue and Yu, Yi and Li, Xiang and Yan, Junchi and others},
journal={arXiv preprint arXiv:2509.26281},
year={2025}
}