Self-supervised learning with multimodal remote sensed maps for seafloor visual class inference
Liang, Cailei ORCID: https://orcid.org/0000-0002-8691-836X; Cappelletto, Jose
ORCID: https://orcid.org/0000-0002-8891-6915; Massot-Campos, Miquel; Bodenmann, Adrian
ORCID: https://orcid.org/0000-0002-3195-0602; Huvenne, Veerle A.I.
ORCID: https://orcid.org/0000-0001-7135-6360; Wardell, Catherine; Bett, Brian J.; Newborough, Darryl; Thornton, Blair
2025. Self-supervised learning with multimodal remote sensed maps for seafloor visual class inference. The International Journal of Robotics Research. DOI: 10.1177/02783649251343640
© The Author(s) 2025. Published Version, available under a Creative Commons Attribution Non-Commercial 4.0 licence: liang-et-al-2025-self-supervised-learning-with-multimodal-remote-sensed-maps-for-seafloor-visual-class-inference.pdf (3 MB)
Abstract/Summary
Seafloor surveys often gather multiple modes of remote sensed mapping and sampling data to infer kilo- to mega-hectare scale seafloor habitat distributions. However, efforts to extract information from multimodal data are complicated by inconsistencies between measurement modes (e.g., resolution, positional offsets, geometric distortions) and by different acquisition periods in dynamically changing environments. In this study, we investigate the use of location information during multimodal feature learning and its impact on habitat classification. Experiments on multimodal datasets gathered from three Marine Protected Areas (MPAs) showed improved robustness and performance when using location-based regularisation terms compared to equivalent autoencoder-based and contrastive self-supervised feature learners. Location-guiding improved F1 scores by 7.7% for autoencoder-based and 28.8% for contrastive feature learners, averaged across 78 experiments on datasets spanning three distinct sites and 18 data modes. Location-guiding also enhanced performance when combining multimodal data, increasing F1 scores by an average of 8.8% and 37.8% over the best-performing individual mode in the combination for autoencoder-based and contrastive self-supervised models, respectively. Performance gains were maintained over a large range of location-guiding distance hyperparameters: improvements of 5.3% and 29.4% were achieved on average across an order-of-magnitude range of hyperparameters for the autoencoder and contrastive learners, respectively, both comparing favourably with optimally tuned conditions. Location-guiding also exhibited robustness to position inconsistencies between combined data modes, still achieving average performance increases of 3.0% and 30.4% over equivalent feature learners without location regularisation when position offsets of up to 10 m were artificially introduced to the remote sensed data.
Our results show that the classifier used to delineate the learned feature spaces has less impact on performance than the feature learner, with probabilistic classifiers averaging 3.4% higher F1 scores than non-probabilistic classifiers.
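The core idea of location-based regularisation can be sketched as a loss term that pulls the features of geographically nearby samples together. The following is a minimal illustrative sketch only, not the paper's implementation: the loss form, the exponential distance weighting, and the names `location_regulariser` and `d0` (the distance hyperparameter) are all assumptions.

```python
import numpy as np

def location_regulariser(features, positions, d0=10.0):
    """Penalise feature dissimilarity between geographically close samples.

    features  : (N, D) array of learned feature vectors
    positions : (N, 2) array of easting/northing coordinates in metres
    d0        : distance hyperparameter; pairs much closer than d0
                dominate the penalty, distant pairs contribute little
    """
    # Pairwise geographic distances between all samples, shape (N, N)
    geo_dist = np.linalg.norm(
        positions[:, None, :] - positions[None, :, :], axis=-1)

    # Pairwise feature-space distances, shape (N, N)
    feat_dist = np.linalg.norm(
        features[:, None, :] - features[None, :, :], axis=-1)

    # Weight each pair by geographic proximity; ignore self-pairs
    weights = np.exp(-geo_dist / d0)
    np.fill_diagonal(weights, 0.0)

    # Weighted mean squared feature distance: low when nearby
    # samples have similar features
    return float((weights * feat_dist**2).sum() / weights.sum())
```

In this sketch, a feature learner that assigns similar features to spatially adjacent samples incurs a small penalty, while one that scatters nearby samples across feature space incurs a large one; the term would be added to the self-supervised reconstruction or contrastive objective during training.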
Item Type: Publication - Article
Digital Object Identifier (DOI): 10.1177/02783649251343640
ISSN: 0278-3649
Additional Keywords: multimodal feature learning, location-based regularisation, self-supervision, seafloor mapping, habitat classification
Date made live: 06 Jul 2025 18:29 (UTC)
URI: https://nora.nerc.ac.uk/id/eprint/539791