Benchmarking and inter-comparison of Sentinel-1 InSAR velocities and time series

Different InSAR algorithms and methods produce velocities and times series that are not identical, even using the same data for the same area. This inconsistency can cause confusion and be a barrier to uptake and widespread use of the data in the commercial sector. With the widespread availability of Sentinel-1 SAR data and a suite of new algorithms in the commercial and academic sectors, it is timely to develop a method for comparison of different results. In this study, we focus on developing and testing an independent and robust methodology for assessment of different InSAR processing results. Our proposed method is adapted from the Terrafirma Process Validation project; we compare geocoded line-of-sight velocities and time series, density and coverage, as well as some qualitative metrics. We use Sentinel-1 data from an area in Glasgow (UK) processed using 4 different approaches modified RapidSAR, SqueeSAR, GAMMA-IPTA and conventional StaMPS. The main areas of ground motion are detected using all approaches, with the average standard deviation of velocity differences for all intercomparison pairs in all polygons equal to 1.1 mm/yr. Sentinel-1 InSAR therefore provides comparable results that are independent of processing approaches. However, there are considerable differences in some aspects of the results, in particular in their density and coverage. We discuss the reasons for these differences and suggest a framework for validation that could be used in future national or pan-national ground motion services.


Introduction
Interferometric Synthetic Aperture Radar (InSAR) is an Earth Observation technique based on radar satellite imagery that can measure surface deformation with millimetre level precision (Bamler and Hartl, 1998;Gabriel et al., 1989;Hanssen, 2001). In order to improve the performance in extracting deformation signals from noisy InSAR data, many different InSAR time-series approaches have been developed (Osmanoglu et al., 2016;Pepe and Calo, 2017).
Persistent Scatterer InSAR (PSI) exploits strong, stable scatterers that display coherent scattering behaviour over time to overcome temporal decorrelation, which restricts the use of conventional InSAR (Ferretti et al., 2000(Ferretti et al., , 2001. By a combination of spatial and temporal filtering, the contribution of atmospheric errors can also be reduced significantly. The original PSI algorithms work where there are large number of strong scatterers (often man-made structures) with a deformation behaviour close to the assumed linear velocity model, although more sophisticated versions of the algorithm, capable of dealing with PS affected by nonlinear motion, have also been developed. The Stanford Method for Persistent Scatterers (StaMPS) focusses on improving the number of measurement points in rural areas, and on providing an open source algorithm (Hooper et al., 2007). The major difference between StaMPS and the traditional PS approach is that StaMPS uses the spatial correlation of phase for identifying PS pixels and does not use phase triangulation which forms a spatial network connecting all PS pixels (Hooper et al., 2007). All PSI algorithms use a single-master stack of differential interferograms to process PS pixels (Hooper et al., 2012). For satellites such as ERS-1/2 and Envisat, with a relatively large orbital tube and hence a large range of perpendicular baselines in individual interferograms, only point scatterers remain coherent in a single master stack.
Distributed Scatterers (DSs) can also be used for extracting velocities and times series from InSAR. These contain coherent information when temporal and orbital baselines are relatively short/small but can be incoherent in interferograms with relatively long time intervals and large perpendicular baselines (different viewing geometries). Small baseline (SB) approaches build time series by connecting interferograms with small temporal and perpendicular baselines (Berardino et al., 2002;Schmidt and Bürgmann, 2003). By combining PSI and SB approaches, hybrid approaches can increase the measurement density (Hooper, 2008;Lanari et al., 2004). However, there may still be useful interferometric measurements within the stack of SAR data that are excluded from a hybrid PS/SB analysis, particularly in rural areas where pixels may have intermittent coherence. The multi-interferogram method (Biggs et al., 2007) implemented in PiRate (Wang et al., 2009) and ISBAS (Intermittent Small Baseline Subsets) method (Sowter et al., 2013) are based on a modification of the SBAS method (Berardino et al., 2002) and exploit intermittent coherence in order to obtain average velocities for a greater number of DS. However, time series approaches that only use short-timespan, multi-looked interferograms suffer from potential biases (Ansari et al., 2020).
SqueeSAR forms all possible interferograms, selects neighbouring pixels with similar scattering mechanisms, known as statistically homogeneous pixels (SHP), and provides a synergistic analysis of PS and DS without the need for significant changes to the traditional PSI processing chain (Ferretti et al., 2011;Fornaro et al., 2015;Monti-Guarnieri and Tebaldini, 2008). It improves the density, coverage and quality of measurement points with respect to conventional PSI, over non-urban areas at the cost of a large increase in processing time. In contrast, RapidSAR (Rapid Time Series InSAR) was designed to allow fast ingestion of new images and limited computational load (Spaans and Hooper, 2016). This method identifies SHP pixels (named siblings) with a more computationally efficient algorithm than SqueeSAR and does not use phase triangulation. RapidSAR enables coherence in newly formed interferograms to be calculated quicklythe results can be used in a modified SBAS approach to produce time series and velocities (Spaans and Hooper, 2016).
Using all possible interferograms at full SAR resolution for Sentinel-1 or other wide-swath SAR missions is challenging due to the large data volume. The Sequential Estimator approach has therefore been proposed to form interferograms efficiently for long InSAR time series by processing the data in small batches and forming compressed artificial interferograms from each (Ansari et al., 2017). Alternatively, FRInGE (Full-Resolution InSAR time series using Generalised Eigenvectors) generates a full coherence matrix efficiently and selects both PS and DS pixels at full resolution.
Different InSAR time series methods use different strategies to extract information from SAR images. They are also different in terms of dealing with the contributions of various phenomena impacting the interferometric phase including long wavelength trends, atmospheric phase screens (APS) and nonlinear deformation. Moreover, different strategies can be applied to remove these terms, e.g. spatial and temporal filter size for removing APS, size of spatial scale to de-trend and/or type of nonlinear deformation model (i.e. periodic, exponential). Therefore, products of different InSAR algorithms are not identical and can be dissimilar in terms of quantitative and qualitative metrics. In order to assess the quality of InSAR data for a specific case study, there is a requirement to evaluate the consistency of available InSAR data produced by different time series approaches. Up to now, several different methods to compare InSAR products have been presented, all of which have limitations and none of which have used Sentinel-1 images, (see Section 2). Therefore, there is a need to present an inter-comparison method which addresses the limitations of previous studies including the application to Sentinel-1 InSAR data.
Sentinel-1 is a two-satellite imaging radar constellation, providing global C-band imagery designed to supply the data needs of Europe's Copernicus programme. Sentinel-1A & -1B offer a six-day revisit cycle and unprecedented coverage of Europe, with 12-day imagery acquired globally. Sentinel-1 uses the Terrain Observation by Progressive Scan (TOPS) mode, sweeping the beam in the fight direction, and is designed primarily for InSAR applications (De Zan and Monti Guarnieri, 2006). Raw data acquired by Sentinel-1 are freely available, addressing the limitation of cost and/or lack of data and providing research and commercial opportunities increasingly, Sentinel-1 data are being used to form nationwide/international ground motion maps. All Sentinel-1 imagery is acquired within a narrow orbital tube, maximizing interferometric coherence. To exploit Sentinel-1 data, a European Ground Motion Service (EU-GMS) is under development, by the European Environment Agency (https://land.copernicus.eu/user-corner/tech nical-library/european-ground-motion-service), to provide consistent, regular, standardised, harmonised and reliable information on ground motion over Europe and across national borders, with millimetre accuracy (Crosetto et al., 2020). The ground motion results will be derived from time series analyses of Sentinel-1 data, most likely using different PS and DS InSAR approaches. Several Copernicus Participating States including Germany, Italy, Norway, Spain, Denmark, and France have already or are in the process of implementing national ground motion services. These services will benefit from EU-GMS by standardising national service components and encouraging the use of deformation data by both public and commercial users. To make the outputs useful for operational applications, quality assessment of ground motion maps is a fundamental priority, and an important aspect of quality assessment is data consistency, particularly at borders or boundaries, where different methods may have been used. The nationwide/international ground motion map will be likely processed by multiple suppliers, therefore there is a need to assess and ensure consistency of InSAR results.
Our main goal in this research is to develop and test a fair and robust methodology to assess the similarities and differences between results from different InSAR processing chains, and to recommend a validation strategy for any nationwide/international (e.g. UK/EU) ground motion map. We review the history of InSAR comparison approaches with their characteristics and limitations in Section 2. In Section 3, we describe an approach we have developed. In section 4, we use the method to compare 4 processing algorithms for a test area in Glasgow. We present results in Section 5 and discuss the major differences and similarities between the InSAR results in Section 6, providing recommendations for future nationwide/international products and validation activities. Finally, we summarise the main conclusions in Section 7.

Review of previous InSAR comparison and validation approaches
Several previous projects have compared and validated InSAR velocities and time series. Following the 2003 Fringe meeting, the European Space Agency (ESA) initiated a blind InSAR validation project, PSIC4 (Persistent Scatterer Interferometry Codes Cross Comparison and Certification for long term differential interferometry), to assess the performance of PSI for land deformation monitoring using Envisat and ERS images (Crosetto et al., 2007b;Raucoules et al., 2009). The project analysed results for the same area provided by Altamira Information (Crosetto et al., 2008a), DLR (German Space Agency) (Adam et al., 2005), Gamma Remote Sensing (Werner et al., 2003), IREA-CNR (Institute for Electromagnetic Sensing of the Environment National Research Council of Italy) (Berardino et al., 2002), TRE (Tele-Rilevamento Europa) (Ferretti et al., 2007), TUDelft (Delft University of Technology) (Kampes, 2005), UPC (Catalonia Polytechnics University) (Mora et al., 2003) and Vexcel (Van der Kooij et al., 2005). Preprocessing of the data prior to inter-comparison comprised applying geolocation shifts and spatially referencing each data set to the same reference area. The most relevant indicators used to compare the results were the average deformation rate and the density and distribution of the selected PS points. The PSIC4 test area was a coal mining area in the South of France, which was undergoing rapid subsidence and did not include stable features. The reference area was a stable local area outside the mining work. The results showed that for the case under consideration, the main area of subsidence could not, or could only partly, be assessed by most of the InSAR teams due to the low density of PS in the area of interest. Moreover, the standard deviation of velocity differences between the data sets ranged between 0.6 and 1.9 mm/year which can be considered as an estimate of local uncertainties. One of the most important conclusions of PSIC4 concerned the characteristics of the coal mining test site in which none of the conditions to measure deformation with millimetric accuracy by PSI was fully realised. The severe characteristics of the PSIC4 test site were non-optimal for PSI due to i) abrupt nonlinear motion and ii) rapid velocities which were prone to aliasing with the 35-day revisit time of Envisat/ERS. The project recommended future SAR missions with more frequent acquisitions in order to improve the ability of PSI to detect rapid velocities (Raucoules et al., 2009). PSIC4 used "blind conditions" with no a priori information about the deformation or the goal of the PSI analysis. The teams used a standard PSI approach instead of tailoring the processing to a specific objective, which could partly explain the lack of PS in the mining area. PSIC4 demonstrated that, at that time, PSI performance was highly dependent on the application, and the limitations were real. A wider area inter-comparison, "Provence Inter-Comparison", was later presented using the same data as PSIC4 but covering a larger area, including both deforming and stable areas (Crosetto et al., 2007a). One difference between the Provence Inter-Comparison study and PSIC4, was that it compared the data set in the radar coordinate system, to avoid validation issues associated with geocoding errors. The Provence inter-Comparison showed a greater degree of consistency between the velocity maps and the time series from different providers (Crosetto et al., 2007a). It was largely based on data outside the mining area, where the results of the two projects were similar.
The Terrafirma project (Capes et al., 2009), part of the EU/ESA Global Monitoring for Environment and Security (GMES) programme, the precursor to Copernicus, also established a PSI process validation approach, known as the Terrafirma Validation Project, which built on the earlier studies. The Terrafirma validation project had two aims: result validation via comparison with ground truth levelling and intercomparison of the results of different InSAR providers. The intercomparison methodology initially compared four InSAR data sets from different providers (TRE, Altamira Information, Gamma Remote Sensing, and Fugro NPA) in radar coordinate systems to a reference processing result (GENESIS, DLR PSI processing), which was defined as the "truth" (Adam et al., 2009). Pre-processing steps included checking the global consistency of the data sets and the coregistration in radar space, referencing the data to the same reference in the time and space dimensions and removing potential tilts by de-trending. Velocities, time series, topographic corrections, detection capabilities and data densities were compared. The project produced a set of global statistics, which concerned large sets of PS pixels and provided information on the global inter-comparison behaviour of different metrics. The average standard deviation of the velocity differences and the mean standard deviation of the time series differences were 0.5-0.7 mm/yr and 1.5-5.6 mm, respectively. These values were used to derive error bars to indicate the quality of the estimate derived by PSI, which was key information for Terrafirma end users. Since deformation rates in the case studies in the Terrafirma Validation Project were moderately low, one should be careful in extending these statistics to areas involving higher deformation rates. Moreover, the results showed remarkable differences in PS density between the providers, which resulted from the use of different criteria during PS selection (Crosetto et al., 2008b).
As the Terrafirma PSI certification process was intended for local (20 km × 20 km) PSI analysis of deformation, the Wide Area Processing (WAP) Terrafirma project later expanded this methodology to validate PSI processing over a significantly greater area (one or more scenes of 100 km × 100 km) than that considered in the initial Terrafirma PSI certification (Adam et al., 2013;Brcic et al., 2014). The major differences between the processing chains in the wide area relate to atmospheric compensation and trend removal. Both steps were applied for TRE products, none of them implemented for DLR products and Altamira only removed the long wavelength trend. The results showed that the standard deviations of the deformation velocity differences for coherent pixels were below 1 mm/yr in most of the inter-comparison cases. This was one requirement of Terrafirma PSI certification. It also concluded that the most significant factors affecting compliance with this requirement were: (a) possible long wavelength trends affecting the interferograms (resulting from spurious atmospheric components and orbital fringes); (b) systematic phase components associated with the master scene used for the PS analysis, and (c) possible phase unwrapping errors, which were strongly dependent on the deformation signal and the presence of any data gaps in the interferograms.
Previous validation approaches have several limitations. Firstly, to be useful in real-world applications, InSAR data must be geocoded. By only comparing results from different methods/providers in the radar coordinate system, an important step of InSAR processing is excluded. Secondly, specifying a reference InSAR product as the "truth", as was done in the Terrafirma Validation Project, can also lead to an unfair comparison, as it excludes the possibility that the reference data set also has errors. Thirdly, validation projects to date have used data from Envisat and ERS; the improved spatial and temporal coverage of Sentinel-1 data, and its narrow orbital tube, opens up several new opportunities for InSAR processing, which were not feasible previously. Finally, previous approaches were only applied to validating PSI methods; a comparison method that can consider both PS and DS is now required.
Several recent studies have compared individual data sets or methods. A comparative study based on the results from DePSI (Delft PS-InSAR processing package) and StaMPS (Stanford Method for Persistent Scatterers), was applied using two data sets from ERS and Envisat and concluded that these methods are complementary (Sousa et al., 2011). The time-series InSAR results generated using ERS data with both a PS method and a SBAS algorithm were compared quantitatively and the calculated discrepancy was found to be consistent with those estimated by the PSIC4 study (Shanker et al., 2011).
In another study, the capability of three InSAR time-series techniques, PSI, SBAS and SqueeSAR, for evaluating landslide deformation, was investigated using TerraSAR-X images (Mirzaee et al., 2017). The estimated average velocity maps and coherence maps produced by the methods were compared and it was concluded that SqueeSAR was more efficient for evaluating landslide kinematics in the rural case study.
Finally, the performance of ISBAS and RapidSAR were compared using Sentinel-1 images to monitor shale-gas operations in Lancashire, outlined as part of the Environmental Baseline Monitoring programme conducted by the British Geological Survey (BGS). The results showed agreement between the approaches to estimate average annual velocity in the study area (Jordan et al., 2019).
With the Copernicus European Ground Motion Service now being commissioned, it is timely to formalise requirements for comparison of InSAR results. In this research, we present a methodology for intercomparison of geocoded InSAR products using Sentinel-1 images. We test this methodology with the InSAR products resulted from different InSAR time series algorithms over a case study where multiple InSAR data sets are available.

Methods
In this section, we introduce our new inter-comparison method. The outline of the proposed approach is shown in Fig. 1. We base the approach on the Terrafirma Validation Project, but tackle its limitations as follows: 1) As end-users require geocoded InSAR data, we compare all the data sets in geographic rather than radar coordinates. This allows us to consider any potential geocoding errors that can impact on the final product, especially in areas with very local deformation. 2) Because no InSAR processing chain produces perfect, noise-free results, we avoid assuming that any reference InSAR processing is the "truth". 3) We define several polygons with different land cover types and stability. This allows us to assess how the agreement differs between InSAR data with different signals and/or different ground conditions. 4) We do not limit the time series processing to PSI algorithms and are open to any other methodologies e.g. both PS and DS InSAR processing. 5) We work with Sentinel-1 imagery. Our approach can be split into pre-processing and inter-comparison stages. These are described in more detail below.

Pre-processing
Before comparing data sets, some pre-processing steps are required: (i) We assess the consistency of geocoded data sets from different InSAR methods. As the coordinate system of the points selected by different InSAR methods might be different, we convert all the InSAR data to an identical geographic coordinate system. Any geocoding errors are critical when the deforming area is very small and should be noted. Adjustments can be made if necessary to ensure the data are comparable. This pre-processing step was applied in the PSIC4 project. We assume that any translation of coordinates is constant for the whole data set; and assesse them by overlaying the data on an accurate base map and considering some control points. (ii) We select pairs of InSAR data sets for comparison, with each data set processed using a different method. For the comparison to be valid, both data sets in a pair must use data from the same ascending or descending Sentinel-1 pass. Therefore, the inputs of this step are individual InSAR data sets from different methods and the outputs are different pairs of data sets. (iii) The time range of InSAR data for each comparison pair may be different. To ensure consistency as much as possible, we reestimate the velocity using a common time range for each comparison pair, by fitting linear velocities to the time series for each pixel using only data from the common time range. The reestimated velocities for the InSAR data sets forming each InSAR pair are the outputs of this step.
(iv) We identify the common dates in the time series for each comparison pair and set the first common date as a reference time as follows: where for each selected point in an inter-comparison pair, d(t) Tem− new is the re-referenced deformation time series in temporal space (output), d(t) Tem− old is the original deformation time series (input) and d(t 0 ) Tem− old is the deformation of the first common date for the corresponding pair.
(v) We can optionally apply an identical low pass filter to each time series data set (input), in this case using a triangular filter covering 5 epochs. This helps to remove the effect of random noise in a similar manner from both time series. The output of this step is filtered time series of InSAR data for each pair. Ideally, we work with unfiltered time series before applying this filter, but this may not always be possible. In this study, only one of our data sets was filtered, and we used the same temporal filter for the other unfiltered data sets. (vi) We re-reference the deformation rate and deformation time series of all comparison pairs to an identical local reference area, which is outside the deforming areas and contains coherent pixels: where for each selected point in an inter-comparison pair, d(t) new is the re-referenced deformation time series in spatial and temporal spaces (output), d_ref(t) Tem− new is temporally re-referenced deformation time series for the reference area, V new is the re-referenced deformation velocity in spatial space (output), V old is the original deformation velocity and V_ref old is the deformation velocity of the reference area. Unlike the Terrafirma Validation Project, we do not have access to the coherence of selected points for all data sets. Therefore, we apply a noise analysis algorithm to identify highquality pixels in the reference area (Hooper et al., 2007;Sadeghi et al., 2018). First, selected pixels are connected to form a network using Delaunay triangulation. Then, for each arc connecting two pixels, a weighted average phase is calculated from the entire time series, and removed from the original phase of the arc, which is then low pass filtered in time. The resulting phase, with the weighted average phase added back in, provides an estimate for the smooth underlying signal. Phase noise is estimated by subtracting the smooth phase from the original phase of the arc. Finally, the phase noise of each measurement pixel is obtained from the phase noise of its corresponding arcs. The pixels with a noise level less than a threshold for all data sets are selected in the reference area.
(vii) We define an identical geographic grid with 40 m spacing in both easting and northing and for each of the InSAR data sets calculate the mean value of any measurement points located inside each grid cell. The outputs of this step are the time series and velocities for each defined grid cell. We specify no-data for grid cells which contain no measurement points. (viii) We define polygons covering areas with different scattering and deformation characteristics so that the algorithms can be tested in different conditions. In the case of our test site in Glasgow, we define urban, rural and deforming polygons (see section 4 and Fig. 2). Selected grid cells inside each defined polygon for all InSAR data are the outputs of this step.

Inter-comparison of data sets
After pre-processing, we compare InSAR results in terms of several metrics, including the estimated velocities, time series, density and coverage as follows: (i) We calculate the differences between the deformation velocities for the common grid pixels of each pair and estimate their mean (μ dV ) and standard deviation (σ dV ). We also calculate the correlation coefficient for estimated velocities (ρ V ) of all common grid pixels. (ii) In order to extract statistics from the differences between timeseries, (a) in the first step, we compute the differences between the time series for each common grid pixel of each pair and then extract their mean and standard deviation; (b) we calculate the mean of the parameters computed in the previous step for all common grid pixels of a given pair, mean of mean of time-series differences μμ dD and mean of standard deviation of time-series differences μσ dD . The mean values show any potential bias between the estimated deformation velocities/time series of each pair. Standard deviation values provide information on how the deformation velocity/time series differences are distributed. We also calculate the correlation coefficient for the estimated deformation time series (ρ D ) of each common grid pixel. This is a useful tool for measuring the degree of similarity of the deformation histories of the analysed time series. (iii) In order to compare the density and coverage of measurement pixels, we resample the InSAR data onto an identical 100 m × 100 m grid. The number of selected pixels in each cell gives the selected pixel density (D); we calculate the average density for each of the polygons with different scattering/deformation characteristics. We also calculate the coverage (C) of measurement pixels, which we define as the percentage of 100 m × 100 m grid pixels containing at least 1 InSAR measurement. We note that in order to make a fair comparison in terms of density and coverage, the noise analysis described in section 3.1-vi should be applied before comparison and noisy pixels should be removed from each data set using the same threshold for the phase noise standard deviation, in this case 1 rad in our case.

Data and case study
We use results from the Clyde Gateway of the Glasgow City Region to test our methods (Fig. 2). This is an area of particular interest to the Natural Environment Research Council (NERC) as it is the BGS geothermal energy research field test site of the UKGEOS project (https://www.ukgeos.ac.uk/about/project-details). The Glasgow site will help characterise whether water from abandoned mine workings can be used to generate a sustainable and efficient source of energy. Changes in underground water levels, pressure and temperature caused by mine water for geothermal energy production activities can lead to surface subsidence/uplift (Heimlich et al., 2015). Therefore, monitoring is required to assess surface-level impacts of geothermal abstraction and re-injection research activities . Although the area is largely urban, it also includes more rural areas, such as the Woodland park within the Cuningar loop.
We have access to several Sentinel-1 InSAR data products for this area, processed using four different approaches. Results for two of these approaches were provided by commercial companies: SatSense, using a modified RapidSAR algorithm (https://www.satsense.com) (Spaans and Hooper, 2016) and TRE-ALTAMIRA, using the SqueeSAR algorithm (htt ps://site.tre-altamira.com) (Ferretti et al., 2011). The results for the other two approaches come from our own processing using GAMMA-IPTA, processed using conventional PSI at BGS (https://www.gamma -rs.ch) and the PS-only option of StaMPS (Hooper et al., 2007). Analysis and interpretation of some of the GAMMA-IPTA results can be found in Bateson and Novellino (2019). We used the results of the four different approaches to test our InSAR inter-comparison activity. In all we have 5 data sets, 3 in an ascending geometry and 2 in descending.
Hereafter, we anonymise the algorithms and label them A-D, in no particular order. We have ascending and descending InSAR results for algorithm A, which we used for inter-comparison independently. For algorithm B, we have data for descending geometry only and for algorithm C and algorithm D we have data for ascending geometry only. Therefore, we formed 4 inter-comparison pairs: A-B (descending), A-C (ascending), A-D (ascending) and C-D (ascending). Table 1 compares the key characteristics of the data sets: the longest time span and the largest number of available images are related to A (descending) and B (descending) which used similar Sentinel-1 data sets, while C includes the shortest time range and smallest number of Sentinel-1 scenes. B and C used PSI algorithms which select only PS pixels and form single-master interferograms, but A and D took advantage of identifying both PS and DS pixels and made a multiple-master interferometric network. Apart from C, spatial de-trending was applied during processing to all of the InSAR data to remove any potential long wavelength trends. The different strategies applied by the InSAR algorithms for removing the effects of unwanted elements such as long wavelength trend and APS might have an impact on the level of agreement between the algorithms for example by introducing a bias in the average of deformation velocity differences.
In algorithm A, the level of noise for a point is assessed by calculating the difference between the smoothed time series and the APS filtered time series. After referencing to the average of its neighbours, this helps to give an idea of which points are inherently noisier since atmospheric effects have been reduced. The phase noise is estimated for algorithm B using method described in section 3.1-vi. For algorithm C, the point quality is measured by calculating the standard deviation of the misfit to a regression through the time series. The standard deviation of the phase misfit depends on the quality of the reference point, the target point and the pre-defined model (usually linear). For algorithm D, a linear model is fitted to the time series for each measurement point before compensating for possible atmospheric components and the standard deviation of the residual phases is calculated to estimate the uncertainty of the average velocity.
We define three polygons for the test region that broadly cover "deforming" (0.2 km 2 ), "urban" (0.9 km 2 ) and "rural" areas (1.2 km 2 ) (Fig. 2). The area west of Cuningar Loop (deforming polygon) was a site developed for the Commonwealth Games Athletes Village and suffers from small 6 mm/yr rates of linear subsidence due to loading of the superficial deposits .

results
In this section, we present the results of our inter-comparison methodology using the available data sets. Four InSAR comparison pairs can be made: A-B (descending), A-C (ascending), A-D (ascending) and C-D (ascending). Before showing the inter-comparison results, we show the averaged velocities on the defined grid and the common dates for each inter-comparison pair in Fig. 4 and Fig. 5, respectively. As can be seen in Fig. 4, the reference area is local, and therefore the estimated inter-comparison statistics represent the local uncertainty.

Inter-comparison of velocity
The velocity differences are calculated for the common grid pixels of each pair in the urban, rural and deforming polygons. The mean and standard deviation of deformation velocity differences and correlation coefficients of the estimated velocities are extracted and reported in Fig. 6.
The mean differences are 1.0 mm/yr at most (C-D, urban polygon), and most are less than 0.1 mm/yr, confirming that there are not any significant biases in estimated velocities. The mean velocity differences associated with comparison pairs A-B and A-D are closer to zero than those for A-C and C-D. The standard deviation of the velocity differences can be used to assess the level of agreement between the InSAR algorithms, but does not mathematically represent the uncertainty. The standard deviation related to A-B is well under 1 mm/yr and indicates a good overall agreement. The standard deviations of the other InSAR algorithm pairs are all better than 2 mm/yr and the average of the standard deviations is highest in the rural polygon and lowest in deforming polygon for all pairs. The average of the standard deviations of velocity differences for all polygons and all pairs is 1.1 mm/yr; we use this value to show confidence bounds for measurements in the scatterplots in Fig. 7.
All InSAR comparison pairs show low correlation where there is little deformation (in the urban and rural polygons), but have higher correlation in the deforming polygon. Correlation coefficients (ρ V ) are in the range 0.5 to 0.7 showing a good agreement between the different methods, where there is significant deformation. We illustrate the agreement by creating scatterplots of the estimated velocities in the deforming polygon (Fig. 7). The correlation is clearest in the comparison between A and B, where both data sets have relatively high density, but a good correlation is also seen in the deforming area for the other data sets, confirming that all algorithms are detecting similar deformation signature in this region.
In Fig. 8, we also show variograms of the estimated velocity differences for all common grid cells of each inter-comparison pair inside the purple outlined square in Fig. 4-a. This helps assess the spatial variability in the difference between the estimated velocities by the different methods. The variograms show that the noise level does not increase significantly with the spatial separation of the points on the 1-2 km length scales that we have analysed in this study. However, we would expect the noise level to increase with distance for longer length scales (Emardson et al., 2003).

Inter-comparison of time-series
As described in the Section 3, we calculate the differences between deformation time series in each comparison pair at each common grid pixels, and then extract mean and standard deviation of these differences at each pixel. Finally, we estimate the mean of the means and the mean of the standard deviations using all of the common grid pixels in each of the polygons (Fig. 9).
The mean of mean values (μμ dD ) is under ±2 mm for all pairs, indicating that there are noticeable systematic effects between the time series pairs. The mean of standard deviations (μσ dD ) ranges between 1 mm for the A-B pair and 2 mm for the A-D pair. Time series statistics associated with the A-D pair shows the poorest agreement with respect to the others for all polygons, except the urban polygon where mean of the mean is slightly lower than A-B pair.
We also calculated the correlation coefficient for the estimated deformation time series of each common grid pixel and plotted the percentage of common grid pixels with a correlation coefficient above 0.7 in Fig. 10. This figure confirms that the percentage above 0.7 is over 50% for all comparison pairs in the deforming polygon, which means more than half the common grid pixels in this polygon show a high level of similarity between the patterns of estimated deformation. The most similar pattern of deformation time series for all polygons is related to the A-B inter-comparison pair.
For illustration purposes, we also compare the time series for one typical subsiding grid cell in the deforming polygon ( Fig. 11-a)) for all pairs. We plot all the time series on the same time axis, although the common temporal interval between the algorithms differs for each InSAR pair. The time histories from the different algorithms and viewing geometries compare well when they are observing the same time periods. Fig. 11-b) and 11-c) show the deformation time series plot and deformation time series scatterplot for A and B which used the same Table 1 Key characteristics of the InSAR products: A-descending (Fig. 3a), A-ascending (Fig. 3b), B-descending (Fig. 3c), C-ascending (Fig. 3d), D-ascending (Fig. 3e). descending Sentinel-1 data sets. There is an excellent correlation coefficient between the deformation patterns estimated by the two algorithms and no significant bias between the estimated deformation time series can be seen. The reference of the deformation time series is the first common date. The deformation time series with the original reference in time selected by the InSAR algorithms are plotted in Fig. 11d) for A, C and D using descending Sentinel-1 images. Then the deformation time series for A-C, A-D and C -D (each comparison pair in ascending geometry) are plotted in the common temporal interval using the deformation of the first common date as a reference in time in Fig. 11-e), 11-g) and 11-i), respectively. The corresponding scatterplots of the estimated deformation time series for A-C, A-D and C-D pair are shown in the Fig. 11-f), 11-h) and 11-j), respectively and good agreement between the InSAR algorithms in detecting the deforming signal can be seen.

Inter-comparison of density and coverage
One major difference between the InSAR products is the density of pixels. We compare these for all InSAR algorithms in Fig. 12. All InSAR algorithms provide the highest and lowest density in urban and rural areas, respectively. We plot density maps for the different InSAR   Fig. 3. Estimated LOS deformation velocity in the case study by a) algorithm A using descending Sentinel-1 images, b) algorithm A using ascending Sentinel-1 images, c) algorithm B using descending Sentinel-1 images, d) algorithm C using ascending Sentinel-1 images and e) algorithm D using ascending Sentinel-1 images. The yellow outlined polygons are defined in a) as areas including urban, rural and deforming features. The black outlined ovals show localised subsidence signals. Fig. 4. Average LOS deformation velocity in the inter-comparison grid for the case study by a) algorithm A using descending Sentinel-1 images, b) algorithm A using ascending Sentinel-1 images, c) algorithm B using descending Sentinel-1 images, d) algorithm C using ascending Sentinel-1 images and e) algorithm D using ascending Sentinel-1 images. The yellow outlined polygons are defined as areas including urban, rural and deforming features. The magenta outlined rectangular shows location of reference area. The Purple outlined square in a) shows an area to estimate variogram in fig. 8. algorithms in Fig. 13. The results confirm that A is the most successful InSAR algorithm in terms of density of pixels for both ascending and descending geometries, and D identifies the lowest density.
We also compare the coverage for each InSAR comparison pair in Fig. 14, defined as the percentage of 100 × 100 m grid pixels containing at least one measurement. The coverage of different InSAR algorithms in the deforming polygons is very similar. Although D provided the lowest density for all polygons, it offers the highest coverage in rural and deforming polygons. Indeed, it was able to select pixels in some locations other InSAR algorithms were not, including inside the Cuningar loop (Fig. 3).

Discussion and recommendations
In this section, we discuss the major similarities and differences between InSAR results. We then recommend some requirements for a national/international ground motion map. The results are quite similar despite the very different algorithms used. The major similarity between all the InSAR data sets is that they all detect similar deformation signals in the deforming polygon. Moreover, all of the methods provide a good density of observations in the urban polygon, as bright scatterers are selected appropriately by all the methods. In addition to the motion in the deforming polygon, a number of other features of deformation are seen in all data sets. For example, all data methods show localised subsidence (up to 10 mm/yr) on the M74 motorway gantry highlighted with black outlined ovals in Fig. 3, which is likely related to instability in the embankment supporting the motorway at this location . There is particularly good agreement between the velocity and time series products, and density/coverage of measurement points, for algorithm A and algorithm B, even though the methods are dissimilar. The better agreement of the A-B pair with respect to the other pairs is not due to the geometry of the Sentinel-1 data (descending).
However, the results are not completely identical. One of the most striking differences between different InSAR methods is density and coverage of selected pixels. The ability to recover measurements at a high pixel density and with wide coverage is one of the most important requirements for monitoring many different sources of deformation in      different conditions. This can be critical where the deforming signal is very local and occurs in non-urban areas that lack man-made structures. The main reason for the difference in point density is the different methodologies used for processing and the criteria that are used for selecting the pixels (Table 1 and Fig. 12). In general, those methods that take advantage of both PS and DS, and benefit from making all possible interferograms (e.g. A and D) are more successful at extracting the maximum information (density and/or coverage) from the SAR stack. Note, however, that due to the short baseline of the Sentinel-1 interferograms, some DS pixels can remain coherent in a single-master interferogram network and would be identified as PS pixels in PSI processing methods such as StaMPS, where phase correlation rather than amplitude is used to identify PS (Hooper et al., 2007). Fully connected networks including interferograms with long temporal baselines may suffer from fewer selected pixels compared to the networks with only short baselines. Moreover, the temporal range of processed SAR images has an impact on the density of measurement points. Because scattering behaviour might vary over time, the probability of finding PS pixels with consistent scattering behaviour over longer periods of time reduces. However, in this case, the algorithms with the longest time series have the highest density so the time period cannot explain the differences we see. In addition, other factors that can have a major impact on the density of measurements, include the temporal sampling of signal, the temporal range of processed data, the configuration of the interferometric network, whether oversampling of the original images is applied, the approach for side-lobe cancelation, and the specific thresholds chosen for an acceptable signal-to-noise ratio (SNR).
The sampling rate for Sentinel-1 is such that the single-look pixel Fig. 11. a) Estimated LOS velocity by A(ascending) inside the deforming polygon on our grid used for comparison; data from measurement grid pixel "1" is shown in b-j; b) deformation time series plot of A(descending) and B (descending) using the first common date as a reference in time; c) scatterplot of A(descending) and B (descending; d) deformation time series plot of A(ascending), C(ascending) and D(ascending), the temporal reference of time series is the original reference selected by the InSAR algorithms; e) deformation time series plot of A(ascending) and C(ascending) in the common temporal interval using the first common date as a reference in time; f) scatterplot of A(ascending) and C(ascending); g) deformation time series plot of A (ascending) and D (ascending) in the common temporal interval using the first common date as a reference in time; f) scatterplot of A (ascending) and D (ascending); i) deformation time series plot of C (ascending) and D (ascending) in the common temporal interval using the first common date as a reference in time; j) scatterplot of C (ascending) and D (ascending).
spacing is finer than the resolution, and some scatterers can result in more than one PS pixel. In such cases the InSAR algorithm should ensure that any extraneous PSs from a single scatterer are pruned. It should also be noted that time series methods selecting DS pixels can introduce data redundancy when spatial filtering is used to select SHPs. Therefore, given a homogeneous area, the selected points may show identical scattering behaviours. In this case, InSAR algorithms should provide end-users the resolution of the DS pixels and inform them that those measurements do not correspond to that specific point. Techniques that use pixels with intermittent coherence (Biggs et al., 2007;Cigna and Sowter, 2017;Sowter et al., 2013) can be more successful in terms of spatial coverage and density, particularly in non-urban areas; however, these methods tend to use a high multi-looking factor to improve coherence, and hence the density of observations is often lower. Although a high density of measurement points is a desired outcome for an InSAR product, striking a balance between the quality and density of selected pixels is challenging. A higher density can be obtained by not rejecting pixels with higher noise values. The interferometric processing strategy (e.g. the use of a single master or multiple master images) and the methodology of time series filtering/smoothing also have an impact on the level of noise in the final results. Decisions may need to be made on a case-by-case basis, depending on the application and the expected magnitude of the deformation signals. Using methodologies that provide both high density and high quality observations of deformation is a key priority for any national/international ground motion map.
There might be some systematic effects in difference maps, which are mainly due to different approaches to dealing with long wavelength trends and atmospheric phase screens (APS). In order to evaluate impact of de-trending the products before inter-comparison, we also carried out a comparison with de-trended products from algorithm C. We repeated the calculation of statistics for the velocity and time series intercomparison described in the section 5-1 and 5-2 for pairs A-C and C -D. The results show that, in this case, de-trending has a negligible impact on the consistency of products from C with those A/D 7,9 and 10). This is likely due to the small size of our polygons (maximum 1.2 km 2 ) which are not significantly affected by the long wavelength trend. It would be appropriate to de-trend data in inter-comparison activities for large case studies.
Different geocoded coordinates for the common selected pixels is another discrepancy between the InSAR products. Overlaying InSAR   data on an accurate base map or ortho-rectified aerial photograph and/ or using corner reflectors can be solutions for correcting the geocoding shifts. Although a linearly varying shift would probably provide more accurate geocoding corrections, in general, a constant shift is assumed for inter-comparison purposes (Raucoules et al., 2009). Geocoding error correction improves the agreement between different data sets significantly.
InSAR products are different in terms of some qualitative indicators. Spatial resolution is one of the most relevant metrics and ranges from the original high resolution sampling of Sentinel-1 (14.1 m and 2.3 m in azimuth and slant range direction, respectively) to lower resolutions that depend on the associated multi-look factors selected during processing. Some methods can provide both high and low resolutions to be used for different applications (e.g. rural and urban environments); this may be useful for national/international ground motion services.
We also note that ground motion maps are dynamic products, with velocities changing over time. Consequently, the frequency of update and latency period (time delay between acquisition and the update) should be defined by the InSAR algorithms that deliver the product. An appropriate update and latency period should be defined as part of any commissioning process. Some applications, such as hazard monitoring, benefit from rapid updates.
Any future national or international ground motion service using Sentinel-1 InSAR will need to instigate a validation process to ensure data meet minimum standards and are consistent across borders. We propose that this is done in an open and transparent fashion. A single test region, or network of test sites for various applications, should be identified that includes a range of deformation and land cover types, and bidders should submit their analyses for this region as part of any commissioning process. The results should be open and accessible via an online repository so that InSAR algorithms benefit from understanding how their analyses differ from others and so that all can improve their offerings. One of our major challenges in this research, was that different Sentinel-1 images (ascending vs descending; different dates) were processed by the InSAR algorithms, limiting our ability to conduct a fair comparison between all approaches. We suggest that in the future a comparison exercise should be repeated periodically and that in each case the time period, acquisition dates and acquisition geometry should be explicitly specified.
In our analysis, the InSAR algorithms produced results over Glasgow to establish a baseline prior to the geothermal exploitation. The deforming areas in Glasgow were very local, and the scattering conditions in the deforming areas were not ideal. In addition, the "truth", estimated through an independent measurement method, was unknown and therefore validation of the InSAR products using external measurements was impossible. Test sites should be carefully selected and should cover a range of different deformation types. Independent data should be collected, for example, from dense permanent networks of GNSS and levelling measurements. Corner reflectors may be useful for testing geolocation and for providing measurement points with high SNR, and it may be appropriate to process data from a very-highresolution satellite system such as TerraSAR-X for additional validation of Sentinel-1 results.

Conclusions
In this research, we present an InSAR inter-comparison method, which 1) builds on the Terrafirma Validation Project and 2) addresses the limitations of previously proposed approaches up to now. We tested our method using 5 InSAR time series products including conventional PSI and advanced joint PS and DS InSAR, applied to Sentinel-1 images. We selected an inter-comparison site in Glasgow, for which we had access to multiple InSAR data, and defined three polygons covering urban, rural, and deforming features. It is clear from our results that different InSAR methods detect the same general deformation features, but they are not identical in terms of different metrics. We propose different indicators, which are divided into quantitative metrics e.g. density and coverage of measurement points and qualitative metrics e.g. spatial resolution. Based on our comparison results, we suggest some recommendations, which might be useful for any future nationwide/international InSAR product and validation activities.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.