2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation
Metadata only
Date
2023-11-27
Type
Working Paper
ETH Bibliography
yes
Abstract
As 3D perception problems grow in popularity and the need for large-scale labeled datasets for LiDAR semantic segmentation increases, new methods arise that aim to reduce the necessity for dense annotations by employing weakly-supervised training. However, these methods continue to show weak boundary estimation and high false-negative rates for small objects and distant sparse regions. We argue that such weaknesses can be compensated for by using RGB images, which provide a denser representation of the scene. We propose an image-guidance network (IGNet) which builds upon the idea of distilling high-level feature information from a domain-adapted, synthetically trained 2D semantic segmentation network. We further utilize a one-way contrastive learning scheme alongside a novel mixing strategy called FOVMix to combat the horizontal field-of-view mismatch between the two sensors and enhance the effects of image guidance. IGNet achieves state-of-the-art results for weakly-supervised LiDAR semantic segmentation on ScribbleKITTI, boasting up to 98% relative performance to fully supervised training with only 8% labeled points, while introducing no additional annotation burden or computational/memory cost during inference. Furthermore, we show that our contributions also prove effective for semi-supervised training, where IGNet claims state-of-the-art results on both ScribbleKITTI and SemanticKITTI.
Publication status
published
Journal / series
arXiv
Pages / Article No.
Publisher
Cornell University
Edition / version
v1
Organisational unit
03514 - Van Gool, Luc / Van Gool, Luc
Related publications and datasets
Is referenced by: https://doi.org/10.3929/ethz-b-000668173