Abstract

Superpoint Transformer

We introduce a novel superpoint-based transformer 🤖 architecture for efficient ⚡ semantic segmentation of large-scale 3D scenes. Our method incorporates a fast algorithm to partition point clouds into a hierarchical superpoint structure 🧩, which makes our preprocessing 7 times faster than existing superpoint-based approaches. Additionally, we leverage a self-attention mechanism to capture the relationships between superpoints at multiple scales, leading to state-of-the-art performance on three challenging benchmark datasets: S3DIS (76.0% mIoU 6-fold), KITTI-360 (63.5% on Val), and DALES (79.6%). With only 212k parameters 🦋, our approach is up to 200 times more compact than other state-of-the-art models while maintaining similar performance. Furthermore, our model can be trained on a single GPU in 3 hours ⚡ for a fold of the S3DIS dataset, which is 7× to 70× fewer GPU-hours than the best-performing methods. Our code and models are accessible at github.com/drprojects/superpoint_transformer.
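For intuition, the sketch below shows what a hierarchical superpoint partition means in practice: points are grouped into superpoints, whose centroids are in turn grouped into coarser superpoints. This is only an illustration; it uses plain k-means on point coordinates as a stand-in for the actual geometric partition algorithm used in SPT, and all names, cluster counts, and parameters are made up.

# Toy illustration of a hierarchical point cloud partition.
# NOTE: plain k-means on coordinates is used here as a stand-in for SPT's
# actual partition algorithm; names and cluster counts are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def hierarchical_partition(xyz, level_sizes=(512, 64, 8)):
    """Return, for each level, a per-point superpoint index.

    The first level groups raw points into superpoints; each further level
    groups the centroids of the previous level into coarser superpoints.
    """
    partitions = []
    elements = xyz                          # what gets clustered at this level
    point_to_element = np.arange(len(xyz))  # raw point -> current element
    for n_clusters in level_sizes:
        labels = KMeans(n_clusters=n_clusters, n_init=4,
                        random_state=0).fit_predict(elements)
        point_labels = labels[point_to_element]  # propagate to raw points
        partitions.append(point_labels)
        # Superpoint centroids become the elements of the next, coarser level
        elements = np.stack([elements[labels == c].mean(0)
                             for c in range(n_clusters)])
        point_to_element = point_labels
    return partitions

if __name__ == "__main__":
    cloud = np.random.rand(10000, 3).astype(np.float32)    # toy point cloud
    p1, p2, p3 = hierarchical_partition(cloud)
    print(p1.shape, int(p2.max()) + 1, int(p3.max()) + 1)  # (10000,) 64 8

In the actual pipeline, the partition is computed from the geometry of the point cloud with the fast algorithm mentioned above, but the nested structure of the resulting levels is the same.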

Motivation

This project aims to fuse the best of two worlds:

Transformer-based models 🤖
(Point Transformer, Stratified Transformer, ...)

✅ Expressivity
✅ Capture long-range interactions
❌ Compute effort guided by arbitrary point or voxel samplings
❌ Loads of parameters
❌ Long training

Superpoint-based models 🧩
(SPG, SSP+SPG, ...)

✅ Much smaller problem complexity
✅ Geometry-guided compute effort allocation
✅ Fast training
✅ Lightweight model
❌ Long preprocessing time
❌ GNNs' limited expressivity and lack of long-range interactions
❌ No hierarchical reasoning

To this end, we introduce Superpoint Transformer 🧩🤖:

✅ Much smaller problem complexity
✅ Geometry-guided compute effort allocation
✅ Fast training
✅ Lightweight model
❌ ➡ ✅ Fast parallelized preprocessing
❌ ➡ ✅ Transformer's expressivity and long-range interactions
❌ ➡ ✅ Multi-scale reasoning on a hierarchical partition 🧩

These changes allow SPT to match or surpass the performance of SOTA models with far fewer parameters and in a fraction of their training and inference time. Here are some SPT facts:

📊 SOTA on S3DIS 6-Fold (76.0 mIoU)
📊 SOTA on KITTI-360 Val (63.5 mIoU)
📊 Near SOTA on DALES (79.6 mIoU)
🦋 212k parameters (PointNeXt ÷ 200, Stratified Transformer ÷ 40)
⚡ S3DIS training in 3 GPU-hours (PointNeXt ÷ 7, Stratified Transformer ÷ 70)
⚡ Preprocessing 7× faster than SPG

Our model architecture replaces SPG's Graph Neural Networks with Transformer self-attention blocks, reasoning on a graph connecting adjacent superpoints.
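
To make this concrete, here is a minimal, self-contained sketch of single-head self-attention restricted to the edges of a superpoint adjacency graph, written in PyTorch. It is not our actual implementation: module and tensor names are made up, and it only illustrates the idea of attending over graph neighbors rather than over all pairs of points.

# Illustrative sketch only: single-head self-attention restricted to the
# edges of a superpoint adjacency graph (NOT the SPT implementation).
import torch
import torch.nn as nn

class GraphSelfAttention(nn.Module):
    """Each superpoint attends only to its neighbors in the adjacency graph."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, edge_index):
        # x: (N, dim) superpoint features
        # edge_index: (2, E) pairs (source, target), "target attends to source"
        src, tgt = edge_index
        q, k, v = self.q(x), self.k(x), self.v(x)

        # One unnormalized attention logit per graph edge
        logits = (q[tgt] * k[src]).sum(-1) * self.scale           # (E,)

        # Softmax over the incoming edges of each target superpoint
        num = (logits - logits.max()).exp()                       # stability
        den = torch.zeros(x.size(0), device=x.device).index_add_(0, tgt, num)
        alpha = num / den[tgt]                                    # (E,)

        # Attention-weighted aggregation of neighbor values
        out = torch.zeros_like(x)
        out.index_add_(0, tgt, alpha.unsqueeze(-1) * v[src])
        return out

if __name__ == "__main__":
    x = torch.randn(6, 32)                            # 6 superpoints, 32 features
    edges = torch.tensor([[0, 1, 2, 3, 4, 5, 1, 2],   # sources
                          [1, 0, 1, 2, 3, 4, 2, 3]])  # targets
    print(GraphSelfAttention(32)(x, edges).shape)     # torch.Size([6, 32])

In SPT, attention of this kind operates at every level of the superpoint hierarchy, which is what provides the multi-scale reasoning described above.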

SPT architecture

Visualizing the model size vs. performance of 3D semantic segmentation methods on S3DIS 6-Fold, we observe that small, tailored models can offer a more flexible and sustainable alternative to large, generic models for 3D learning.

Model size vs. performance

With training times of a few hours on a single GPU, SPT allows practitioners to easily customize the models to their specific needs, enhancing the overall usability and accessibility of 3D learning.

BibTeX

If you use all or part of this project, please cite the following paper:

@inproceedings{robert2023spt,
  title={Efficient 3D Semantic Segmentation with Superpoint Transformer},
  author={Robert, Damien and Raguet, Hugo and Landrieu, Loic},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023},
  url={https://github.com/drprojects/superpoint_transformer}
}

Acknowledgments 🙏

This work was funded by ENGIE Lab CRIGEN and carried out in the LASTIG research unit of Univ. Gustave Eiffel. It was supported by ANR project READY3D ANR-19-CE23-0007, and was granted access to the HPC resources of IDRIS under the allocation AD011013388R1 made by GENCI.

We thank Bruno Vallet, Romain Loiseau, and Ewelina Rupnik for inspiring discussions and valuable feedback.