We introduce a highly efficient method for panoptic segmentation of large 3D point clouds by redefining this task as a scalable graph clustering problem. This approach can be trained using only local auxiliary tasks, thereby eliminating the resource-intensive instance-matching step during training. Moreover, our formulation can easily be adapted to the superpoint paradigm 🧩, further increasing its efficiency. This allows our model to process scenes with millions of points and thousands of objects in a single inference on one GPU ⚡. Our method, called SuperCluster, achieves a new state-of-the-art panoptic segmentation performance for two indoor scanning datasets: 50.1 PQ (+7.8) for S3DIS Area 5, and 58.7 PQ (+25.2) for ScanNetV2. We also set the first state-of-the-art for two large-scale mobile mapping benchmarks: KITTI-360 and DALES. With only 209k parameters 📦, our model is over 30× smaller than the best competing method and trains up to 15× faster ⚡. Our code and pretrained models are available on GitHub.
We present SuperCluster, an efficient approach for large-scale 3D panoptic segmentation. Our approach is capable of processing 3D scenes of unprecedented scale at once on a single GPU.
Existing panoptic segmentation methods do not scale to large 3D scenes due to several limitations:
| ⚖️ Costly matching operation at each training step |
| 🔒 Fixed number of predictions |
| 🎭 Each prediction mask has the size of the scene |
| 🐘 Large backbone |
This project proposes a scalable approach to 3D panoptic segmentation. To this end, we formulate panoptic segmentation as the solution of a superpoint graph clustering problem.
Panoptic segmentation
Take the above superpoint partition and the desired panoptic segmentation. Instead of learning to classify and assign an instance to each individual point, we propose to learn to group superpoints together. Intuitively, we want to group adjacent superpoints together if they are spatially close, have the same class, and are not separated by a border. We translate these goals into the following (superpoint) graph optimization problem.
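As a rough illustration of what such an objective looks like (a simplified sketch, not the paper's exact formulation; all function and variable names here are hypothetical), the score of a candidate grouping can combine a per-superpoint semantic data term with a penalty for cutting edges predicted to lie within a single instance:

```python
from collections import defaultdict

def clustering_energy(assignment, node_probs, edges, edge_affinity, lam=1.0):
    """Score a candidate grouping of superpoints into instances; lower is better.

    assignment[i]    -- cluster id assigned to superpoint i
    node_probs[i]    -- predicted class probabilities for superpoint i
    edges            -- list of (i, j) pairs of adjacent superpoints
    edge_affinity[e] -- predicted probability that edge e stays within one instance
    lam              -- trade-off between the data term and the cut term
    """
    # Data term: each cluster adopts the class that best fits its members,
    # and pays for the probability mass its members assign elsewhere.
    clusters = defaultdict(list)
    for i, c in enumerate(assignment):
        clusters[c].append(i)
    data = 0.0
    for members in clusters.values():
        per_class = [sum(node_probs[i][k] for i in members)
                     for k in range(len(node_probs[0]))]
        data += len(members) - max(per_class)
    # Cut term: separating two superpoints joined by a high-affinity edge is costly.
    cut = sum(a for (i, j), a in zip(edges, edge_affinity)
              if assignment[i] != assignment[j])
    return data + lam * cut
```

With this kind of energy, a grouping that respects both the predicted semantics and the predicted edge affinities scores lower than one that merges everything or splits confident edges.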
Our idea is to train a model to predict the input parameters for
this optimization problem, without explicitly asking the model to solve
the panoptic segmentation task. If the model does its job, we should only
need to solve the graph clustering problem at inference time, circumventing
several limitations of existing panoptic segmentation methods.
Building on our previous Superpoint Transformer work, we already have the building blocks to construct a graph of adjacent superpoints and train a model to classify them.
In this work, we introduce a new head to Superpoint Transformer
that learns to predict an affinity
for each edge between two adjacent superpoints, indicating whether they belong
to the same instance.
Interestingly, this SuperCluster model is trained only with local per-node and per-edge objectives. As previously mentioned, we never need to explicitly compute the panoptic segmentation at training time. This bypasses the need for a matching step between predicted and target instances when computing losses and metrics.
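To make the "local objectives only" point concrete, here is a minimal sketch of what such a training signal could look like (hypothetical helper, not the actual training code): a per-superpoint semantic cross-entropy plus a per-edge binary cross-entropy on affinities, where the edge targets are read directly off the ground-truth partition, so no instance matching is ever needed:

```python
import math

def local_losses(node_logprobs, node_labels, edge_affinity, edge_targets):
    """Sum of a per-node semantic loss and a per-edge affinity loss.

    node_logprobs[i] -- predicted log-probabilities over classes for superpoint i
    node_labels[i]   -- ground-truth semantic class of superpoint i
    edge_affinity[e] -- predicted probability that edge e stays within one instance
    edge_targets[e]  -- 1.0 if both endpoints share a ground-truth instance, else 0.0
    """
    # Per-node cross-entropy: purely local, no notion of instances needed.
    node_loss = -sum(lp[y] for lp, y in zip(node_logprobs, node_labels))
    node_loss /= len(node_labels)
    # Per-edge binary cross-entropy: the target is known per edge,
    # so no Hungarian matching between predicted and target instances.
    eps = 1e-7
    edge_loss = -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                     for p, t in zip(edge_affinity, edge_targets))
    edge_loss /= len(edge_targets)
    return node_loss + edge_loss
```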
At inference time, we use a fast algorithm that finds an approximate solution to
the (small) graph optimization problem, yielding the final panoptic segmentation prediction.
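As a much simpler stand-in for that fast approximate solver (an illustrative sketch only; the actual algorithm is more sophisticated), one can greedily merge adjacent superpoints with union-find, visiting the most confident edges first and only merging endpoints that agree on their semantic class:

```python
def greedy_cluster(n_nodes, node_class, edges, edge_affinity, threshold=0.5):
    """Greedy union-find merge of adjacent superpoints into instances.

    Merges the endpoints of every edge whose predicted same-instance
    affinity exceeds `threshold`, provided they share a semantic class.
    Returns a contiguous cluster id per superpoint.
    """
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Visit the most confident edges first.
    order = sorted(range(len(edges)), key=lambda e: -edge_affinity[e])
    for e in order:
        if edge_affinity[e] < threshold:
            break  # remaining edges are all below threshold
        i, j = edges[e]
        if node_class[i] == node_class[j]:
            parent[find(i)] = find(j)
    # Relabel roots as contiguous cluster ids.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n_nodes)]
```

Note how a high-affinity edge between superpoints of different classes is still rejected, mirroring the intuition that instances never straddle a semantic border.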
SuperCluster achieves state-of-the-art results for 3D panoptic segmentation on large-scale indoor datasets such as S3DIS and ScanNetV2, and sets a first state-of-the-art on large-scale outdoor datasets such as DALES and KITTI-360.
| 🏆 SOTA on S3DIS 6-Fold (55.9 PQ) |
| 🏆 SOTA on S3DIS Area 5 (50.1 PQ) |
| 🏆 SOTA on ScanNet Val (58.7 PQ) |
| 🏆 First on KITTI-360 Val (48.3 PQ) |
| 🏆 First on DALES (61.2 PQ) |
| 📦 212k parameters (PointGroup ÷ 37) |
| ⚡ S3DIS training in 4 GPU-hours |
| ⚡ 7.8 km² tile of 18M points in 10.1s on 1 GPU |
Below are some interactive examples of SuperCluster predictions on diverse datasets.
Sample scene from the DALES dataset.
Position RGB colors the points based on their 3D position.
Semantic and Semantic Pred. show
semantic segmentation labels and predictions.
Panoptic and Panoptic Pred. show
panoptic segmentation labels and predictions.
Level 1 and Level 2 show the
superpoint partitions.
Sample scene from the S3DIS dataset.
Sample scene from the KITTI-360 dataset.
Sample scene from the ScanNetV2 dataset.
@inproceedings{robert2024scalable,
title={{Scalable 3D Panoptic Segmentation as Superpoint Graph Clustering}},
author={Robert, Damien and Raguet, Hugo and Landrieu, Loic},
booktitle={Proceedings of the IEEE International Conference on 3D Vision},
year={2024},
}
This work was funded by ENGIE Lab CRIGEN and carried out in the LASTIG research unit of Univ. Gustave Eiffel. It was supported by ANR project READY3D ANR-19-CE23-0007, and was granted access to the HPC resources of IDRIS under the allocation AD011013388R1 made by GENCI.