🧭 GPS: Geometric Primary Structure for Articulated Parts Perception in Robot Manipulation

Shanghai Jiao Tong University
We introduce a VR-based data collection system to annotate the Geometric Primary Structure (GPS) of articulated objects in the real world, balancing scalability and data quality.

Motivation

We are surrounded by objects with movable, articulated parts, e.g., boxes, handles, and doors. Accurate and generalizable perception of articulated parts is essential to enhance robotic manipulation capabilities. Recent efforts in articulated parts perception have followed two main directions. One line of work uses pose-based representations, which incur high manual cost. In parallel, affordance-based methods extract future object motion from point tracking without additional manual effort, but suffer from low-quality data. In this paper, we propose a new representation of articulated parts, Geometric Primary Structure (GPS), an abstraction of the part geometry structure that balances scalability and quality. For efficient and scalable data collection, GPS is integrated with a portable Virtual Reality (VR) device and requires only one minute to annotate one object sequence. This direct human annotation provides higher quality than estimated affordance.

Dataset & Efficiency

Dataset and efficiency figure
Efficient annotation: recording one annotated manipulation video for an object takes less than one minute. Using our portable and efficient VR-GPS system, we collect 41K frames for 234 objects across six part classes.

Performance

Performance figure
We train a generalizable GPS model with a single RGB-D object image as input. For object manipulation, we deploy a heuristic policy based on GPS prediction. Without any in-domain fine-tuning, our method achieves a 73% success rate, covering 270 initial states for 9 objects.
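To put the reported figures in concrete terms, the short sketch below converts the 73% success rate over 270 initial states into an approximate number of successful trials. It assumes that each initial state corresponds to exactly one trial and that the 270 states are split evenly across the 9 objects; neither assumption is stated above, and the function name `implied_counts` is purely illustrative.

```python
def implied_counts(success_rate, total_trials, num_objects):
    """Back-of-the-envelope counts implied by an aggregate success rate.

    Assumptions (not stated in the text): one trial per initial state,
    and an even split of initial states across objects.
    """
    # The reported rate is rounded, so the success count is approximate.
    successes = round(success_rate * total_trials)
    trials_per_object = total_trials / num_objects
    return successes, trials_per_object


successes, per_object = implied_counts(0.73, 270, 9)
print(f"about {successes} successful trials, {per_object:.0f} initial states per object")
```

Because the 73% figure is itself rounded, the success count should be read as an estimate rather than an exact tally.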

Robot Demo

For object manipulation, we deploy a heuristic policy based on GPS prediction.
Without any in-domain fine-tuning, our method achieves a 73% success rate, covering 270 initial states for 9 objects.

Citation

@InProceedings{Wu_2026_CVPR,
    author    = {Wu, Xiaoqian and Guo, Yejie and Chen, Xiaoyang and Yang, Lixin and Lu, Cewu and Li, Yong-Lu},
    title     = {Revisiting Articulated Parts Perception in Robot Manipulation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
    month     = {June},
    year      = {2026},
    pages     = {1368-1377}
}