Efficient sampling for geostatistical surveys

A geostatistical survey for soil requires rational choices regarding the sampling strategy. If the variogram of the property of interest is known then it is possible to optimize the sampling scheme such that an objective function related to the survey error is minimized. However, the variogram is rarely known prior to sampling. Instead it must be approximated by using either a variogram estimated from a reconnaissance survey or a variogram estimated for the same soil property in similar conditions. For this reason, spatial coverage schemes are often preferred, because they rely on the simple dispersion of sampling units as uniformly as possible, and are similar to those produced by minimizing the kriging variance. If extra sampling locations are added close to those in a spatial coverage scheme then the scheme might be broadly similar to one produced by minimizing the total error (i.e. kriging variance plus the prediction error due to uncertainty in the covariance parameters). We consider the relative merits of these different sampling approaches by comparing their mean total error for different specified random functions. Our results showed the considerable benefit of adding close‐pairs to a spatial coverage scheme, and that optimizing with respect to the total error generally gave a small further advantage. When we consider the example of sampling for geostatistical survey of clay content of the soil, an optimized scheme based on the average of previously reported clay variograms was fairly robust compared to the spatial coverage plus close‐pairs scheme. We conclude that the direct optimization of spatial surveys was only rarely worthwhile. For most cases, it is best to apply a spatial coverage scheme with a proportion of additional sampling locations to provide some closely spaced pairs. Furthermore, our results indicated that the number of observations required for an effective geostatistical survey depend on the variogram parameters.

A geostatistical survey for soil requires rational choices regarding the sampling strategy. If the variogram of the property of interest is known then it is possible to optimize the sampling scheme such that an objective function related to the survey error is minimized. However, the variogram is rarely known prior to sampling. Instead it must be approximated by using either a variogram estimated from a reconnaissance survey or a variogram estimated for the same soil property in similar conditions. For this reason, spatial coverage schemes are often preferred, because they rely on the simple dispersion of sampling units as uniformly as possible, and are similar to those produced by minimizing the kriging variance. If extra sampling locations are added close to those in a spatial coverage scheme then the scheme might be broadly similar to one produced by minimizing the total error (i.e. kriging variance plus the prediction error due to uncertainty in the covariance parameters). We consider the relative merits of these different sampling approaches by comparing their mean total error for different specified random functions. Our results showed the considerable benefit of adding close-pairs to a spatial coverage scheme, and that optimizing with respect to the total error generally gave a small further advantage. When we consider the example of sampling for geostatistical survey of clay content of the soil, an optimized scheme based on the average of previously reported clay variograms was fairly robust compared to the spatial coverage plus close-pairs scheme. We conclude that the direct optimization of spatial surveys was only rarely worthwhile. For most cases, it is best to apply a spatial coverage scheme with a proportion of additional sampling locations to provide some closely spaced pairs. Furthermore, our results indicated that the number of observations required for an effective geostatistical survey depend on the variogram parameters.

Highlights
• We compared spatial coverage and spatial coverage plus a subset of 10% from the total sample as close-pairs. • The objective function encompasses variogram uncertainty and prediction error variance. • Spatial coverage schemes always performed poorly because of the lack of information at short distances. • Using a scheme in which 10% of the sampling units are taken at short distances is a robust strategy.

K E Y W O R D S
geostatistics, ordinary kriging, pedometrics, sample size, spatial coverage scheme, uncertainty assessment When mapping a continuous soil variable, geostatistical predictions at unobserved locations are made from a limited set of sampling units, called a sample. The spatial locations of those units, i.e. the sampling scheme, has a key role in determining the cost of the survey and the quality of the predictions. Often, limited resources are available and one must adopt efficient strategies for the soil sample collection. Several solutions have been proposed to select additional sampling sites optimally using ordinary kriging, a basic technique in geostatistics. These often require prior knowledge about the correlation function (i.e. variogram) of the target property. For example, Van Groenigen, Siderius, and Stein (1999) proposed spatial simulated annealing (SSA) to optimize the sampling scheme so as to minimize the spatially averaged kriging variance as the objective function. This method leads to a space-filling distribution of observations, which are placed more or less evenly over the area of interest. A similar scheme can be obtained by the spatial coverage method described in Royle and Nychka (1998). They proposed a general geometric, space-filling criterion and published a point-swapping algorithm in S-plus to minimize this criterion. Brus, Spätjens, and De Gruijter (1999) proposed the mean of the squared shortest distance (MSSD) as a geometric minimization criterion, so that it can be minimized by the fast k-means algorithm. Later this was implemented in the R language by Walvoort, Brus, and De Gruijter (2010).
One advantage of coverage schemes is that they do not depend on the variogram of the soil property to be sampled. Coverage schemes are created by minimizing a criterion that is simply a function of the distance between sampling locations. Brus, De Gruijter, and Van Groenigen (2007) showed that using a spatial coverage scheme led to only marginally larger mean ordinary kriging variances (MKV) than schemes where this quantity was minimized directly. The authors endorsed early geostatistical practice in soil science where sampling units were located on a regular grid (Yfantis, Flatman, & Behar, 1987).
However, regularly-spaced sampling schemes are inadequate to model the short-range variation of the soil property, which is critical for geostatistical analyses (Starks, 1986). A practical solution, as suggested for instance by De Gruijter, Brus, Bierkens, & Knotters (2006, pp. 166-168), is to supplement the spatial coverage sample by a few additional units, located at short distances from the existing units. Recently, Lark and Marchant (2018) demonstrated that including such a short-distance subset markedly decreased the uncertainty of the kriging prediction for little additional effort in field data collection. Over a contrasting set of random variables, the authors proposed a simple rule that about 10% of the total sample size should be devoted to shortdistance units.
Using a more formal expression of the total error in a geostatistical survey, Marchant and Lark (2007) optimized a sampling scheme by minimization of the sum of error contributions from the kriging variance and the effects of uncertainty in the variogram estimate. We refer to this objective function as the total error. The authors showed that the configuration of the optimized scheme varied according to the variogram, which was unknown prior to sampling, and used a Bayesian framework to account for a set of plausible values of variogram parameters. A similar approach was applied by Zhu and Stein (2006) for redesigning an air monitoring network. The authors noted that estimates of the variogram parameters were uncertain. They approximated the error covariance matrix of the parameters by the inverse of the Fisher information matrix, and used a Taylor series approximation of its effect on the prediction variance to account for it in their sampling objective function. For both studies, the resulting optimized schemes closely resembled the spatial coverage scheme with a small number of closepairs of locations included, which are useful for estimating the spatial correlation over short distances. They showed that the number of close-pair locations depended largely on the variogram parameter values, and especially the variogram distance parameter.
However, the optimization procedure using a formal criterion for minimization of the total error is complex and time consuming. The formula for the total prediction error depends on the variogram and therefore it cannot be calculated exactly prior to sampling. Instead it must be approximated by using either a variogram estimated from a reconnaissance survey or a variogram estimated for the same soil property in similar conditions. Schemes based on approximate variograms are likely to be suboptimal. In such cases, spatial coverage sampling schemes (possibly with additional close-pairs) offer a viable and relatively simple alternative to plan a soil survey with little or no prior information.
Surveyors must also consider the number of sampling units that are required to produce effective geostatistical predictions. The sample must be sufficient to estimate an accurate variogram function. Kerry and Oliver (2007) noted that it is generally accepted that 100 units are required to produce a reliable method of moments estimate of the variogram. This advice stems from a study of simulated random functions conducted by Webster and Oliver (1992). Kerry and Oliver (2007) subsampled four field-scale surveys of clay content and determined that a reliable residual maximum likelihood (REML) estimate of the variogram could be attained with fewer than 50 sampling units.
In summary, the sampling scheme affects the uncertainty in the variogram parameters, which can have an impact on the prediction error variance. Supplementing a spatial coverage sample by a simple rule of thumb reduces the prediction error variance, but the overall distribution of sample points in a scheme can be optimized, although this is laborious and requires some prior information. Whether a practical sampling strategy is markedly better when based on optimization rather than the simple rule remains an open question, and past work has not compared the approaches directly. That is what we address here.
In this research we examined empirically the difference between spatial coverage sampling schemes (sc), spatial coverage schemes supplemented with close-pairs of points (sc + ) and schemes optimized to reduce the total error. We compared these schemes with respect to the sample size required to obtain comparable results. Our objective was to show whether formal optimization is generally worthwhile, given the computational demands and the challenges of specifying prior values of variance parameters, and whether spatial coverage sampling with supplementary points is a robust practical strategy.
In our first scenario we minimized this error for a known hypothetical variogram and a given sample size. Then we determined the size of a spatial coverage scheme that would be required to achieve the same total prediction error. Similarly, we considered the size of a spatial coverage scheme plus 10% close-pairs that would also achieve the same total prediction error.
In addition to the spatial arrangement of sampling units we also considered the minimum number of units that were required to produce useful geostatistical predictions. For sample sizes larger than this minimum sample size the ordinary kriging predictor outperformed the simple random sample mean as a predictor of the values at points. For sample sizes smaller than this minimum there was no benefit from a geostatistical approach for mapping. We assumed that a geostatistical survey should, as a basic minimum requirement, ensure that local spatial predictions have an average prediction error variance that is smaller than the prediction error variance of the regional mean, estimated by designbased sampling. Webster and Lark (2013) discussed how the design-based mean can be treated statistically as a point prediction. We assumed that this design-based survey was the same size as the geostatistical survey, that the sampling units were selected according to a simple random scheme and that the corresponding design-based estimate of the mean was used as the prediction at each location.
In our second scenario we considered a geostatistical survey of soil clay content and the effect of using the average variogram of a set presented by Paterson, McBratney, Minasny, and Pringle (2018) as a basis for a sampling scheme. We minimized the total prediction error variance given a sample size based on the average variogram and then repeated the tests conducted in the first scenario to find the size of the sc and sc + schemes that would be required to achieve the same total error as the optimized scheme for each of the clay variograms.

| Formulation of the objective function
Using the ordinary kriging formulation, we consider the situation in which the soil property (which is assumed to be a realization of a random function Z) has been measured at n locations s i i = 1,…, n;s i 2 A ð Þ . The measurements z(s i ) are treated as realizations of Z(s i ) and prediction is done for Z at unobserved locations s 0 , with a known covariance parameter vector θ. Stacking the z(s i ) in a vector z and changing to matrix notation yields the ordinary kriging prediction equation (Webster & Oliver, 2007): where λ > is the vector of kriging weights, obtained from the kriging equation: where ψ is the Lagrange multiplier introduced to allow minimization of the kriging variance subject to the constraint that the n weights λ 1 , λ 2 ,Á Á Á,λ n sum to one. The covariance between the ith and jth locations is denoted by C(s i − s j | θ). The term C(s i − s i ) is the sill variance (a priori variance). Note that while A needs to be derived (and inverted) once if all observations are used for prediction at every target site, d must be computed for every prediction location s 0 . From Equations (1) and (3), the expected squared error of the prediction is given by: , ð3Þ In addition to the squared error of the prediction, Marchant and Lark (2007) and Zhu and Stein (2006) considered the effect of uncertainty in the estimated spatial model (variogram) parameters by a Taylor series approximation: where cov(θ i , θ j ) is the covariance between the ith and jth parameters. This requires the variogram parameters θ i (i, j = 1,…, q) to be known so that Equation (5) can be approximated prior to sampling. The n-vector of partial derivatives of the kriging weights with respect to the ith variance parameter is denoted by ∂λ > ∂θ i and can be obtained by (Marchant & Lark, 2007): The covariance between the variogram parameters can be approximated using the inverse of the Fisher information matrix F (Kitanidis, 1987): where Tr[Á] denotes the trace of the matrix. The total error at locations s 0 , σ 2 P s 0 ð Þ is given by the sum of the squared prediction error σ 2 OK s 0 ð Þ and the spatial model parameter uncertainty E[τ 2 (s 0 )]: where subscript P stands for parameter. This can be aggregated to obtain a spatial average: In practice, the integral σ 2 P is numerically approximated by a discrete summation over a spatial grid.

| Optimization of the sampling schemes
We start with an initial random set of sampling locations of size N, lying within the boundaries of study area A. We assume that Z(s i ) is a stationary isotropic normally distributed random field, characterized by a constant mean and fitted correlation function ρ(h) (h is the spatial lag or separation distance). The aim is to find the optimal sampling scheme, which minimizes the objective function (Equation (9)), given the parameters of ρ(h). Many algorithms have been developed for solving optimization problems. We use simulated annealing (Kirkpatrick, Gelatt, and Vecchi (1983)), extended for spatial optimization by Van Groenigen et al. (1999) for generating sequences of new possible schemes. A new sampling scheme is created by randomly shifting a randomly selected unit within the study area. This generates a new candidate scheme for which the objective function can be evaluated with Equation (9), and compared with that of the previous scheme. The new candidate scheme is accepted if it has a smaller value of the objective function than the previous one. If the new scheme has a larger value of the objective function then it is accepted or rejected at random; the probability of acceptance is given by (Wadoux, Brus, Rico-Ramirez, & Heuvelink, 2017): where the control parameter α is a temperature parameter. The temperature is kept constant during a set of perturbations, called a chain, after which it is decreased to a value of β × α for β < 1. In this way, the risk of the optimizer becoming trapped in a local but not a global minimum is reduced. We used the implementation provided by the R package spsann (Samuel-Rosa, 2017) through the optimU-SER function. The initial temperature α was set to 3 with a cooling parameter β of 0.9. These were chosen so that P(accept) is close to 1 in the first chain and generally zero at the final chain. The maximum number of chains is set to 200, so that the total number of iterations is N × 200. The process stops if the determined number of iterations (N × 200) is reached or if the criterion remains constant for ten chains. The candidate locations are the centre of cells of a square grid.

| Scenario 1
The first scenario considers the case where the variogram is known. We characterize the spatial correlation ρ by the second parametrization of the isotropic Matérn model (Matérn, 1986) given by Stein (2006, p. 31): where h is the separation distance, K ν is the modified Bessel function of the second kind of order ν (see Abramowitz & Stegun, 1972, pp. 374-379) and Γ is the gamma function. The correlation function ρ(h) has parameters a and ν. Parameter a is the distance parameter, which indicates how fast the correlation decays with increasing h and ν is the smoothness parameter. Stein (2006) noted that ν is the critical parameter in the Matérn correlation model. The larger is ν, the smoother is Z. We chose a Matérn model for its flexibility in modelling the spatial covariance with a small number of parameters (Minasny & McBratney, 2005). For the first scenario, we generated a square area of 100 m × 100 m. Spatial coverage schemes of size N = 60, 61, … , 200 are derived by discretization of the area into N geographical strata using the stratify method from the R package spcosa (Walvoort et al., 2010). The spatial coverage units are taken in the centroid of the strata, which is equivalent to minimizing the mean squared shortest distance between a location in the region and the nearest sampling location. In addition, we also generated samples of size N = 60, 61, … , 200 in which the sampling locations were distributed according to a spatial coverage scheme, with a subset of 10% of units positioned at an arbitrary distance that was short relative to the spacing between neighbouring points in the basic spatial coverage survey. This arbitrary short distance was set to 2 m because of a mean spacing between neighbouring locations in the sc scheme of 6.6 m for N = 60 and 12.5 m for N = 200. These close-pair units were selected by simple random sampling without replacement in a randomly chosen direction from 0-360 degrees. We repeated the selection of close-pairs several times to determine the sampling variation in total variance. Since the latter was small, we did not pursue this any further because this confirmed the very tight confidence intervals in the Lark and Marchant (2018) study. We considered different sets of variogram parameter values, all of which had a total sill variance of one. Four values of ν were tested: ν = 0.5 (equivalent to the exponential variogram), ν = 0.2 (rougher than the exponential), ν = 1.1 and ν = 2 (smoother than the exponential). These four ν values were combined with each of three distance parameter: a = 10, 20 and 30, and three ratios of the nugget (c 0 ) to total sill variance (c 0 + c 1 = 1) for strong (c 0 = 0), moderate (c 0 = 1/3) and weak (c 0 = 2/3) spatial dependence. Note that we use the nugget to sill ratio to characterize the spatial dependence of a model with known parameters, but this should not be done when comparing empirical variograms because the magnitude of the nugget variance is likely to depend in part on the sampling scheme. Each of the 4 × 3 × 3 = 36 scenarios were optimized for a fixed sample size N = 90 in the way described in the previous section. To speed up computations the criterion was evaluated at 34 × 34 locations on a regular square grid of spacing 3 m.
In this scenario we compared for each variogram the size of the sc and sc + samples required to attain the same value of the objective function as the optimized scheme of 90 units.
We also compared the average total prediction variance that resulted from the geostatistical survey of each random function with the prediction variance that would result from using an estimate of the simple random sample of the field mean as a predictor of the value at points. If the design-  (Brus, De Gruijter, & Breeuwsma (1992, Eq .7)): where σ 2 is the dispersion variance (the variance of the variable within the study area) and the σ 2 /N term reflects the uncertainty in estimating the field-scale mean of the property of interest with simple random sampling (Brus & De Gruijter, 1993). Instead of the spatial variance (dispersion variance) for a single realization, we used the model expectation of the dispersion variance in Equation (12), so that the model expectation of the spatial mean of the design-based estimation error variance at points was also obtained. For each set of variogram parameters, we determined the smallest sample size of a geostatistical survey which led to the average total prediction variance being less than this design-based prediction variance. We determined the dispersion variance for each random function from the average variance of 1000 lower-upper (LU)simulations of the function at 2000 random locations across the study area.

| Scenario 2
The second scenario considered a survey of soil clay content where no field-specific information about the variogram was available. In such a circumstance, McBratney and Pringle (1999) suggested that the average of previously published soil clay variograms should provide useful information for assessing soil sampling schemes.
Here we used data from a published study on fieldscale variability of soil variograms. We used a compilation of soil clay variograms, provided by Paterson et al. (2018). They were gathered from the existing literature, based on untransformed data and physical measurements. We converted the exponential, spherical and linear clay variograms to a Matérn model (Equation 11) by re-estimating their parameters using a least squares approach. In this way, we compared surveys using variograms with the same number of estimated parameters. From the set of Matérn clay variograms, we derived an average experimental variogram as in McBratney and Pringle (1999). Each variogram for soil clay was evaluated at a set of closely-spaced lag intervals. Each value of semivariance was transformed to its fourth root. The average value of the fourth root of the variogram was computed at each lag interval over all the clay variograms and the resulting values were back-transformed to their fourth power. The fourth root is used to give a normally distributed variable even when the underlying variable includes extreme values (Cressie & Hawkins, 1980). Finally, a Matérn correlation function (Equation 11) was fitted by non-linear least squares to the average experimental variogram. The estimated Matérn correlation function was similar to an exponential variogram (ν = 0.5) with a nugget variance c 0 = 2.6, a partial sill c 1 = 8.0 and a distance parameter a = 44.1 m (effective range is about 85 m). We then optimized the distribution of 90 sample units within a 500 m × 500 m region, using the mean total prediction error variance, Equation (9), as the objective function specifying the parameters of the average variogram. The objective function was evaluated at a centred square grid of 25 × 25 points with a spacing of 20 m. We then found, for the random function with parameters estimated for each clay variogram, the value of the objective function achieved by optimizing sample schemes of size N = 60, 61, … , 200, and the corresponding number of observations in an sc and an sc + scheme required to match the value of the objective function achievable by optimization with the average clay variogram.

| Scenario 1
Figures 1-4 show 90-unit sampling schemes optimized to minimize the expected total error with different values of the nugget to sill ratio, different distance parameters a and smoothness parameters of 0.2, 0.5, 1.1 and 2, respectively. In all schemes, the sampling locations are generally evenly dispersed over the area with some close-pair units. When the nugget to sill ratio increases (larger c 0 ), the number of closepairs tends to increase substantially. The pattern for larger values of the distance parameter a is reversed. The larger is a, the smaller are the transects of close-pairs. When c 0 = 2/3, c 1 = 1/3 and a = 10 the sample size seems insufficient to cover the whole area. This might indicate that for this variogram and study area, 90 units were insufficient to both estimate the variogram and predict the soil property across the region. All values of ν tested had comparable patterns for the optimized schemes. Figures 5-8 show the values of the objective function for each variogram type for the sc or sc + schemes compared to the values of the objective function from the optimized 90-unit sample scheme. The values of objective function for the sc schemes show a rougher pattern than those of the objective function for the sc + schemes. The sc schemes performed poorly in most cases. The poor performance was less pronounced for large values of a when ν was 1.1 or 2. In such cases, sc schemes were only slightly worse than the optimized schemes. The sc + schemes always performed slightly worse than the optimized schemes. With increasing nugget to sill ratio, the sc + schemes needed an increasing number of additional units to reach the same value for the objective function as the optimized sample scheme. This was valid for all values of ν tested.
For each set of variogram parameters, Table 1 reports the number of additional samples necessary when using the sc + scheme to reach the objective function of the optimized 90-unit scheme. Overall, the sc + scheme needs at least 8 and a maximum of 59 additional units to achieve the objective function of the optimized 90-unit scheme. As mentioned previously, there is a clear trend associated with the nugget to sill ratio. The larger is the ratio, the larger is the number of additional units in the sc + scheme. This effect was slightly diminished for increasing values of a. Figure 9 shows the objective function for the sc and sc + schemes for the case where the smoothness parameter ν = 0.5 was either fixed (known) or estimated (with uncertainty) with parameters c 0 = 0, c 1 = 1 and a = 20. When the smoothness was estimated there was a marked difference between the total error variance for the sc and sc + schemes when there were fewer than about 200 sample points in total. With larger sample sizes (above 220) the difference became negligible. When the smoothness is known (equivalent to assuming an exponential variogram), there were still minor differences between the sc and sc + scheme objective functions but they rapidly converged to the same values (from about 120 units). Table 2 shows the minimum sample size required for the expected total variance to be smaller than the estimation variance of the target property that would result from a designbased survey of the same size. The sc + schemes needed on Value of objective function for sc + (black dots), sc (grey dots) and optimized (red triangle) schemes. The spacing between the two vertical lines indicates the number of extra units required for sc + to achieve an optimized objective function value for ν = 0.5 FIGURE 7 Value of objective function for sc + (black dots), sc (grey dots) and optimized (red triangle) schemes. The spacing between the two vertical lines indicates the number of extra units required for sc + to achieve an optimized objective function value for ν = 1.1 FIGURE 8 Value of objective function for sc + (black dots), sc (grey dots) and optimized (red triangle) schemes. The spacing between the two vertical lines indicates the number of extra units required for sc + to achieve an optimized objective function value for ν = 2 average fewer units than the sc schemes. There is a clear association between an increase in the required sample size, increase in the nugget to sill ratio and decrease in the smoothness and distance parameters. When compared to the effective range of the target property (i.e. the distance at which the spatially correlated portion of the variogram attains 95% of the sill), the minimum number of units increased with decreasing values of the effective range. The dispersion variance (denoted σ 2 in Table 2) increased with larger values of the nugget to sill ratio and larger values of the distance parameter.
3.2 | Scenario 2 Figure 10 shows an example of sc and sc + , as well as the optimized 90-unit scheme obtained by minimization of the expected total error using the average soil clay variogram. The optimized scheme had sampling units dispersed evenly over the area with a number of close-pair units. The number of close-pair units seems slightly larger than that of the sc + scheme. While the sc + and optimized scheme share some similarity in the pattern of sampling locations, the sc scheme is very different from the optimized scheme. This is confirmed by Figure 11 which shows values of the objective function for sc + , sc and optimized schemes using the average clay variogram. The sc scheme performed poorly until about 200 units. In contrast, the sc + had objective function values closer to that of the optimized scheme. Twenty-two additional locations were required for the sc + scheme to reach the objective function of the optimized scheme, which was achieved with a total of 11 close-pairs in the sc + scheme (out of 112). Figure 12 shows the standardized soil clay variograms and the average variogram. First, the average variogram was used to compute the optimized scheme. Second, we found the sample size for the sc + scheme for each separate clay variogram to achieve the total variance of the optimized scheme. Overall, the optimized scheme was fairly robust with contrasting standardized soil clay variograms because it gave about the same total variance for most of the individual variograms as for the average variogram. Figure 12 shows that a large number of additional units were needed (>100) when large sill values of the variogram were reached in a short distance. In addition, fewer units were needed (< −5) when the total sill was reached at large distances. For similar values of the distance parameter, more units were needed for larger values of the nugget variance, e.g. weaker spatial dependence at short distances (see for example the two clay variograms with similar distance parameters but different nugget values).

| DISCUSSION
For all optimized schemes, there was a number of close-pair units. This shows that sampling units at short distances had a critical effect on decreasing the total expected error (which encompasses uncertainty in the variogram parameters and kriging variance). The number of close-pair units increased according to the nugget to sill ratio and to a lesser extent relative to the distance parameter. This was an expected result, because a random variable with a small spatial correlation distance and large nugget to sill ratio had to be sampled at a large number of short-distance locations to ensure minimization of uncertainty in both variogram parameters and prediction error variances (Marchant & Lark, 2007). This explains why sc schemes performed poorly in all cases. The sc schemes lacked close-pair units to estimate the spatial correlation over short distances which have a large effect on total expected error. Sampling schemes containing a subset of 10% as close-pairs (suggested by Lark and Marchant (2018)) provide a robust strategy to ensure a reasonably small total expected error. In the case of a small distance parameter or large nugget to sill ratio, 10% of close-pairs does not provide sufficient information and it is better to either increase the ratio of units taken at short distances or to use an optimized scheme.
The test presented in Figure 9 suggests that the importance of close pairs is reduced if the smoothness parameter is assumed to be known. In practice, however, this was not the case. Assuming a particular smoothness value (e.g. 0.5 for the exponential variogram) for a regularly sampled soil property led to a substantial proportion of the uncertainty being disregarded. This choice was somewhat subjective because it was related to the decision of the modeller and the range of possibilities we allowed in our model. We point out that close pairs are not only important when the nugget to sill ratio is large (Table 1) and the range of spatial correlation (3 × a if ν = 0.5) is small relative to the size of the study area (Table 2), but also when one needs to estimate the additional Matérn model parameter ν (Figure 9).
Results from our second scenario showed that the optimized scheme based on the average variogram was fairly robust for contrasting soil clay variograms. For several variograms, an sc + scheme outperformed the optimized scheme. This was an unexpected result at first sight. The reason is that the sampling scheme was optimized for the average variogram, and can therefore be suboptimal for an individual variogram. The results of the second scenario suggested that databases of variogram parameters (e.g. the one of Paterson et al. (2018)) can be used to derive an average variogram, and that the latter can be used to guide sampling (McBratney & Pringle, 1999) or to predict a soil property from fewer units than usually required for estimating variogram parameters (Kerry & Oliver, 2004). An average variogram could also provide prior information for expert or Bayesian elicitation of the variogram (Cui, Stein, & Myers, 1995). In our two scenarios, close-pair units were taken at a fixed distance from one of the spatial coverage units. There might be room for further research on how these close-pair units should be selected. For example, in several optimized schemes, transects of several units can be seen. Further tests on our scenario 1 (not shown) suggested that selecting closepairs in a cluster might reduce substantially the number of additional units needed with an increasing nugget to sill ratio. Such a scheme would, however, rely heavily on the assumption of stationarity (i.e. that the short-scale variation in the cluster indicates the short-scale variation across the study area). Our results here were for ordinary kriging in which the local mean of the variable was assumed to be constant. Sampling to support universal kriging (to model a non-stationary mean) or to support kriging with external drift (to model dependence of the mean on covariates) introduces other considerations, specifically estimation of the fixed effects parameters in the model. This requires further work. We speculate that the supplemented spatial coverage schemes that we have shown to be efficient for ordinary kriging would also be efficient for universal kriging, in that the spatial coverage points would ensure reliable estimation of trend parameters, and the close-pairs would similarly ensure that the variance parameters are estimated precisely.
For the optimized schemes in scenario 1, derived from a variogram with a small distance parameter and large nugget to sill ratio, the sample size seems too small compared to the size of the study area. Table 2 shows that this is indeed the case for several variogram types, and especially if the units are based on a spatial coverage scheme. This can lead to Value of objective function for sc + scheme (black dots) and sc scheme (grey dots) for c 0 = 0, c 1 = 1, a = 20 and ν = 0.5. In (a) the smoothness parameter is to be estimated while in (b) it is assumed to be known situations where the expected total variance is larger than the total sill variance. In such circumstances adding close-paired observations might resolve the problem of parameter estimation, but the overall sampling scheme remains inadequate for the task of spatial mapping because the spacing between neighbouring observations in the spatial coverage scheme is not sufficiently small relative to the range of spatial dependence. If one kriges from a grid with spacing larger than the range, then the prediction error variance is equal to the sill variance plus the Lagrange parameter, which is equivalent to the second term for the prediction variance of the spatial mean as a point predictor in Equation (12). This points us to the fact that, in these circumstances, where we cannot afford a grid with spacing that is small relative to the range, spatial prediction by kriging is not an option. In these circumstances point prediction might, in the worst case, be the regional mean of the variable, estimated by design-based sampling and with a prediction error variance computed from Equation (12). It might be possible to do better by estimating mean values within subregions of the area of interest such as soil map units (Webster & Beckett, 1968), again by designbased sampling, or by undertaking design-based sampling to estimate parameters of a predictive relation between the soil property of interest and covariates such as data from remote sensors. We hope that this clarifies why we refer to design-based estimation in the paper. It is not the case that design-based sampling does not provide a basis for spatial prediction. Design-based simply refers to the sampling scheme (probability sampling) and the basis for estimation from the data. The resulting design-based mean (for a region or subregion) may then be treated as a spatial prediction, as discussed by Webster and Lark (2013). Table 2 also shows that for large value of smoothness (ν = 1.1 or 2) and small nugget to sill ratios, the minimum number of units needed to make geostatistical analysis more accurate than a design-based estimate, on average, is surprisingly small. This can be explained by the relatively large values of the effective range (r = 60 and r = 57) for the Values of objective function for sc + scheme (black dots) and sc scheme (grey dots) and optimized scheme (the red triangle). The spacing between the two vertical lines in (b) indicates the extra units required for sc + to achieve an objective function value from the optimized scheme for the soil clay average variogram case study (square of 100 m × 100 m); most sampling units were within the range of spatial correlation. However, for random functions with larger nugget to sill ratios, the design-based survey was more accurate even when the survey consisted of more than 200 units. Thus, the number of units required to estimate the variogram from a geostatistical survey depended on the degree of spatial correlation of the target property. We acknowledge that these total prediction variances are based upon a Taylor series approximation to the true variances.

| CONCLUSIONS
From the results and discussion we draw the following conclusions.
• The sc schemes performed poorly in almost all cases because of the lack of information at short distance to estimate the variogram parameters. • Uncertainty of the sc scheme was mainly characterized by uncertainty of the smoothness parameter. Performance of the sc scheme can therefore be greatly improved by assuming that the smoothness is known, for example with an exponential variogram. However, in practice we have no justification for making such an assumption. • The benefit of using an optimized scheme over an sc + scheme was clear but still generally modest. In addition, the optimization required the variogram parameters to be known. • The benefit of using an optimized scheme over an sc + scheme became more important with an increasing nugget to sill ratio (weaker spatial dependence). In this case, geostatistical survey was unlikely to be effective. • For a random variable with zero nugget and a large range of spatial correlation fewer than 15 observations were required to obtain average total prediction variances that were smaller than the prediction variance of the design-based estimate of the regional mean, treated as a point prediction at each location. However, 200 observations of a random variable with a substantial nugget effect were insufficient to meet the same criterion. • When the scale of spatial variation of the soil property was not known, using an average variogram for optimizing the sampling scheme is a robust strategy. • Overall, the tests conducted showed that there was little evidence of large benefits from optimizing sampling schemes. Therefore, it is better in most cases to use a spatial coverage scheme supplemented by a subset of close-pair units unless prior knowledge of the variogram is available (e.g. reconnaissance survey).