We introduce a highly efficient method for panoptic segmentation of large 3D point clouds by redefining this task as a scalable graph clustering problem. This approach can be trained using only local auxiliary tasks, thereby eliminating the resource-intensive instance-matching step during training. Moreover, our formulation can easily be adapted to the superpoint paradigm 🧩, further increasing its efficiency. This allows our model to process scenes with millions of points and thousands of objects in a single inference on one GPU ⚡. Our method, called SuperCluster, achieves a new state-of-the-art panoptic segmentation performance for two indoor scanning datasets: 50.1 PQ (+7.8) for S3DIS Area 5, and 58.7 PQ (+25.2) for ScanNetV2. We also set the first state-of-the-art for two large-scale mobile mapping benchmarks: KITTI-360 and DALES. With only 209k parameters 📦, our model is over 30× smaller than the best competing method and trains up to 15× faster ⚡. Our code and pretrained models are available on GitHub.
We present SuperCluster, an efficient approach for large-scale 3D panoptic segmentation. Our approach is capable of processing 3D scenes of unprecedented scale at once on a single GPU.
Existing panoptic segmentation methods do not scale to large 3D scenes due to several limitations:
| ⚖️ Costly matching operation at each training step |
| 🔒 Fixed number of predictions |
| 🎭 Each prediction mask has the size of the scene |
| 🐘 Large backbone |
This project proposes a scalable approach to 3D panoptic segmentation. To this end, we formulate panoptic segmentation as the solution of a superpoint graph clustering problem.
Panoptic segmentation
Take the above superpoint partition and the desired panoptic segmentation. Instead of learning to classify and assign an instance to each individual point, we propose to learn to group superpoints together. Intuitively, we want to group adjacent superpoints together if they are spatially close, have the same class, and are not separated by a border. We translate these goals into the following (superpoint) graph optimization problem.
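As a rough illustration of what such an objective looks like (a simplified sketch, not the paper's exact formulation; all function and variable names here are hypothetical), the score of a candidate grouping can combine a per-superpoint semantic data term with a penalty for cutting edges predicted to lie within a single instance:

```python
from collections import defaultdict

def clustering_energy(assignment, node_probs, edges, edge_affinity, lam=1.0):
    """Score a candidate grouping of superpoints into instances; lower is better.

    assignment[i]    -- cluster id assigned to superpoint i
    node_probs[i]    -- predicted class probabilities for superpoint i
    edges            -- list of (i, j) pairs of adjacent superpoints
    edge_affinity[e] -- predicted probability that edge e stays within one instance
    lam              -- trade-off between the data term and the cut term
    """
    # Data term: each cluster adopts the class that best fits its members,
    # and pays for the probability mass its members assign elsewhere.
    clusters = defaultdict(list)
    for i, c in enumerate(assignment):
        clusters[c].append(i)
    data = 0.0
    for members in clusters.values():
        per_class = [sum(node_probs[i][k] for i in members)
                     for k in range(len(node_probs[0]))]
        data += len(members) - max(per_class)
    # Cut term: separating two superpoints joined by a high-affinity edge is costly.
    cut = sum(a for (i, j), a in zip(edges, edge_affinity)
              if assignment[i] != assignment[j])
    return data + lam * cut
```

With this kind of energy, a grouping that respects both the predicted semantics and the predicted edge affinities scores lower than one that merges everything or splits confident edges.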
Our idea is to train a model to predict the input parameters for
this optimization problem, without explicitly asking the model to solve
the panoptic segmentation task. If the model does its job, we should only
need to solve the graph clustering problem at inference time, circumventing
several limitations of existing panoptic segmentation methods.
Building on our previous Superpoint Transformer work, we already have the building blocks to construct a graph of adjacent superpoints and train a model to classify them.
In this work, we introduce a new head to Superpoint Transformer
that learns to predict an affinity
for each edge between two adjacent superpoints, indicating whether they belong
to the same instance.
Interestingly, this SuperCluster model is trained only with local per-node and per-edge objectives. As previously mentioned, we never need to explicitly compute the panoptic segmentation at training time. This bypasses the need for a matching step between predicted and target instances when computing losses and metrics.
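To make the "local objectives only" point concrete, here is a minimal sketch of what such a training signal could look like (hypothetical helper, not the actual training code): a per-superpoint semantic cross-entropy plus a per-edge binary cross-entropy on affinities, where the edge targets are read directly off the ground-truth partition, so no instance matching is ever needed:

```python
import math

def local_losses(node_logprobs, node_labels, edge_affinity, edge_targets):
    """Sum of a per-node semantic loss and a per-edge affinity loss.

    node_logprobs[i] -- predicted log-probabilities over classes for superpoint i
    node_labels[i]   -- ground-truth semantic class of superpoint i
    edge_affinity[e] -- predicted probability that edge e stays within one instance
    edge_targets[e]  -- 1.0 if both endpoints share a ground-truth instance, else 0.0
    """
    # Per-node cross-entropy: purely local, no notion of instances needed.
    node_loss = -sum(lp[y] for lp, y in zip(node_logprobs, node_labels))
    node_loss /= len(node_labels)
    # Per-edge binary cross-entropy: the target is known per edge,
    # so no Hungarian matching between predicted and target instances.
    eps = 1e-7
    edge_loss = -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                     for p, t in zip(edge_affinity, edge_targets))
    edge_loss /= len(edge_targets)
    return node_loss + edge_loss
```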
At inference time, we use a fast algorithm that finds an approximate solution to
the (small) graph optimization problem, yielding the final panoptic segmentation prediction.
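As a much simpler stand-in for that fast approximate solver (an illustrative sketch only; the actual algorithm is more sophisticated), one can greedily merge adjacent superpoints with union-find, visiting the most confident edges first and only merging endpoints that agree on their semantic class:

```python
def greedy_cluster(n_nodes, node_class, edges, edge_affinity, threshold=0.5):
    """Greedy union-find merge of adjacent superpoints into instances.

    Merges the endpoints of every edge whose predicted same-instance
    affinity exceeds `threshold`, provided they share a semantic class.
    Returns a contiguous cluster id per superpoint.
    """
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Visit the most confident edges first.
    order = sorted(range(len(edges)), key=lambda e: -edge_affinity[e])
    for e in order:
        if edge_affinity[e] < threshold:
            break  # remaining edges are all below threshold
        i, j = edges[e]
        if node_class[i] == node_class[j]:
            parent[find(i)] = find(j)
    # Relabel roots as contiguous cluster ids.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n_nodes)]
```

Note how a high-affinity edge between superpoints of different classes is still rejected, mirroring the intuition that instances never straddle a semantic border.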
SuperCluster achieves state-of-the-art results for 3D panoptic segmentation on large-scale indoor datasets such as S3DIS and ScanNetV2, and sets a first state-of-the-art on large-scale outdoor datasets such as DALES and KITTI-360.
| 🏆 SOTA on S3DIS 6-Fold (55.9 PQ) |
| 🏆 SOTA on S3DIS Area 5 (50.1 PQ) |
| 🏆 SOTA on ScanNet Val (58.7 PQ) |
| 🏆 First on KITTI-360 Val (48.3 PQ) |
| 🏆 First on DALES (61.2 PQ) |
| 📦 212k parameters (PointGroup ÷ 37) |
| ⚡ S3DIS training in 4 GPU-hours |
| ⚡ 7.8 km² tile of 18M points in 10.1s on 1 GPU |
Below are some interactive examples of SuperCluster predictions on diverse datasets.
Sample scene from the DALES dataset.
Position RGB colors the points based on their 3D position.
Semantic and Semantic Pred. show
semantic segmentation labels and predictions.
Panoptic and Panoptic Pred. show
panoptic segmentation labels and predictions.
Level 1 and Level 2 show the
superpoint partitions.
Sample scene from the S3DIS dataset.
Sample scene from the KITTI-360 dataset.
Sample scene from the ScanNetV2 dataset.
@inproceedings{robert2024scalable,
title={{Scalable 3D Panoptic Segmentation as Superpoint Graph Clustering}},
author={Robert, Damien and Raguet, Hugo and Landrieu, Loic},
booktitle={Proceedings of the IEEE International Conference on 3D Vision},
year={2024},
}
This work was funded by ENGIE Lab CRIGEN and carried out in the LASTIG research unit of Univ. Gustave Eiffel. It was supported by ANR project READY3D ANR-19-CE23-0007, and was granted access to the HPC resources of IDRIS under the allocation AD011013388R1 made by GENCI.