MaCon: a generic self-supervised framework for unsupervised multimodal change detection

Wang, Jian; Yan, Li; Yang, Jianbing; Xie, Hong; Yuan, Qiangqiang; Wei, Pengcheng; Gao, Zhao; Zhang, Ce; Atkinson, Peter M.

Wang, Jian ORCID: https://orcid.org/0000-0001-6695-1472; Yan, Li ORCID: https://orcid.org/0000-0002-5507-810X; Yang, Jianbing; Xie, Hong ORCID: https://orcid.org/0000-0002-0956-0421; Yuan, Qiangqiang ORCID: https://orcid.org/0000-0001-7140-2224; Wei, Pengcheng ORCID: https://orcid.org/0000-0003-4265-8698; Gao, Zhao; Zhang, Ce ORCID: https://orcid.org/0000-0001-5100-3584; Atkinson, Peter M. ORCID: https://orcid.org/0000-0002-5489-6880. 2025 MaCon: a generic self-supervised framework for unsupervised multimodal change detection. IEEE Transactions on Image Processing, 34. 1485-1500. 10.1109/TIP.2025.3542276

Full text not available from this repository.

Official URL: https://doi.org/10.1109/TIP.2025.3542276

Abstract/Summary

Change detection (CD) is important for Earth observation, emergency response and time-series understanding. Recently, data availability in various modalities has increased rapidly, and multimodal change detection (MCD) is gaining prominence. Given the scarcity of datasets and labels for MCD, unsupervised approaches are more practical. However, previous methods typically either merely reduce the gap between multimodal data through transformation or feed the original multimodal data directly into the discriminant network for difference extraction. The former struggle to extract precise difference features. The latter must contend with the pronounced intrinsic distinction between the original multimodal data: direct extraction and comparison of features usually introduces significant noise, thereby compromising the quality of the resulting difference image. In this article, we propose the MaCon framework to synergistically distill the common and discrepancy representations. The MaCon framework unifies the mask reconstruction (MR) and contrastive learning (CL) self-supervised paradigms, where MR serves the purpose of transformation while CL focuses on discrimination. Moreover, we present an optimal sampling strategy in the CL architecture, enabling the CL subnetwork to extract more distinguishable discrepancy representations. Furthermore, we develop an effective silent attention mechanism that not only enhances contrast in the output representations but also stabilizes training. Experimental results on both multimodal and monomodal datasets demonstrate that the MaCon framework effectively distills the intrinsic common representations between varied modalities and achieves state-of-the-art performance across both multimodal and monomodal CD. These findings imply that MaCon has the potential to serve as a unified framework in CD and related fields. Source code will be publicly available once the article is accepted.

Item Type:

Publication - Article

Digital Object Identifier (DOI):

10.1109/TIP.2025.3542276

UKCEH and CEH Sections/Science Areas:

UKCEH Fellows

ISSN:

1057-7149

Additional Keywords:

self-supervised learning, mask reconstruction, contrastive learning, multimodal data, change detection, unsupervised learning, remote sensing, Earth observation

NORA Subject Terms:

Earth Sciences
Electronics, Engineering and Technology
Data and Information

Date made live:

17 Mar 2025 11:59 +0 (UTC)

URI:

https://nora.nerc.ac.uk/id/eprint/539094