Self-supervised learning with multimodal remote sensed maps for seafloor visual class inference
Liang, Cailei ORCID: https://orcid.org/0000-0002-8691-836X; Cappelletto, Jose
ORCID: https://orcid.org/0000-0002-8891-6915; Massot-Campos, Miquel; Bodenmann, Adrian
ORCID: https://orcid.org/0000-0002-3195-0602; Huvenne, Veerle A.I.
ORCID: https://orcid.org/0000-0001-7135-6360; Wardell, Catherine; Bett, Brian J.; Newborough, Darryl; Thornton, Blair
2025. Self-supervised learning with multimodal remote sensed maps for seafloor visual class inference. The International Journal of Robotics Research. DOI: 10.1177/02783649251343640
© The Author(s) 2025. Published Version, available under a Creative Commons Attribution Non-Commercial 4.0 licence: liang-et-al-2025-self-supervised-learning-with-multimodal-remote-sensed-maps-for-seafloor-visual-class-inference.pdf (3 MB)
Abstract/Summary
Seafloor surveys often gather multiple modes of remote sensed mapping and sampling data to infer kilo- to mega-hectare scale seafloor habitat distributions. However, efforts to extract information from multimodal data are complicated by inconsistencies between measurement modes (e.g., resolution, positional offsets, geometric distortions) and by different acquisition periods in dynamically changing environments. In this study, we investigate the use of location information during multimodal feature learning and its impact on habitat classification. Experiments on multimodal datasets gathered from three Marine Protected Areas (MPAs) showed improved robustness and performance when using location-based regularisation terms compared to equivalent autoencoder-based and contrastive self-supervised feature learners. Location-guiding improved F1 scores by 7.7% for autoencoder-based and 28.8% for contrastive feature learners, averaged across 78 experiments on datasets spanning three distinct sites and 18 data modes. Location-guiding also enhanced performance when combining multimodal data, increasing F1 scores by an average of 8.8% and 37.8% over the best-performing individual mode in the combination for autoencoder-based and contrastive self-supervised models, respectively. Performance gains were maintained over a large range of location-guiding distance hyperparameters: improvements of 5.3% and 29.4% were achieved on average across an order-of-magnitude range of hyperparameters for the autoencoder and contrastive learners, respectively, both comparing favourably with optimally tuned conditions. Location-guiding also exhibited robustness to position inconsistencies between combined data modes, still achieving average performance increases of 3.0% and 30.4% over equivalent feature learners without location regularisation when position offsets of up to 10 m were artificially introduced to the remote sensed data.
Our results show that the classifier used to delineate the learned feature spaces has less impact on performance than the feature learner, with probabilistic classifiers averaging 3.4% higher F1 scores than non-probabilistic classifiers.
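The core idea of location-based regularisation can be sketched as a loss term that pulls the features of geographically nearby samples together. The following is a minimal illustrative sketch only, not the paper's implementation: the loss form, the exponential distance weighting, and the names `location_regulariser` and `d0` (the distance hyperparameter) are all assumptions.

```python
import numpy as np

def location_regulariser(features, positions, d0=10.0):
    """Penalise feature dissimilarity between geographically close samples.

    features  : (N, D) array of learned feature vectors
    positions : (N, 2) array of easting/northing coordinates in metres
    d0        : distance hyperparameter; pairs much closer than d0
                dominate the penalty, distant pairs contribute little
    """
    # Pairwise geographic distances between all samples, shape (N, N)
    geo_dist = np.linalg.norm(
        positions[:, None, :] - positions[None, :, :], axis=-1)

    # Pairwise feature-space distances, shape (N, N)
    feat_dist = np.linalg.norm(
        features[:, None, :] - features[None, :, :], axis=-1)

    # Weight each pair by geographic proximity; ignore self-pairs
    weights = np.exp(-geo_dist / d0)
    np.fill_diagonal(weights, 0.0)

    # Weighted mean squared feature distance: low when nearby
    # samples have similar features
    return float((weights * feat_dist**2).sum() / weights.sum())
```

In this sketch, a feature learner that assigns similar features to spatially adjacent samples incurs a small penalty, while one that scatters nearby samples across feature space incurs a large one; the term would be added to the self-supervised reconstruction or contrastive objective during training.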
Item Type: Publication - Article
Digital Object Identifier (DOI): 10.1177/02783649251343640
ISSN: 0278-3649
Additional Keywords: multimodal feature learning, location-based regularisation, self-supervision, seafloor mapping, habitat classification
Date made live: 06 Jul 2025 18:29 (UTC)
URI: https://nora.nerc.ac.uk/id/eprint/539791