Value of open data: A geoscience perspective

We are living in a data-centric society, with governments and businesses increasingly looking at what they can do to gain insight from data and improve its flow. Encouraging the release of data as 'open data' is one measure that would remove barriers to access, increase use and facilitate downstream data innovation. Drawing first on an example from outside the geosciences and then on examples from the geoscience sector, this paper outlines three factors that can lead to a successful open data programme: (1) having a clear strategy with a well-articulated vision; (2) ensuring that data are not only free but also technically accessible and delivered under an open licence; and (3) continued investment in the programme to ensure its long-term success. However, not all data can or should be open, and organizations and governments must be careful that their interventions do not have unintended consequences that might reduce incentives to collect, maintain and share data. A primary concern is the financial sustainability of a dataset, but other risks can also prevent data from being widely shared, such as the inclusion of personal data or third-party intellectual property. In these cases, a data-sharing risk assessment framework and the FAIR principles (findable, accessible, interoperable and reusable) can be used to increase data sharing and maximize the benefits that can be realized from geoscience data.


| INTRODUCTION
Data-driven innovation is changing the way we operate as a society. With the exponential growth in data being collected and published, governments and businesses are increasingly interested in what they can do to improve the flow of data to users and gain new insights from those data. Central to this is the principle that data must be widely shared to optimize their use and reach a wider audience (Department for Digital, Culture, Media & Sport, 2020; Houghton, 2011).
There are three features of data sharing that, if recognized and embraced, can contribute to the efficient, competitive and innovative use of data (HM Treasury, 2018):
1. A single piece of data can be used in many algorithms and applications at the same time.
2. While data can reveal new insights if aggregated, linked and analysed, those insights are not always apparent or directly beneficial to the data creator or controller. This can result in the data being under-exploited or under-shared.
3. Combining two or more datasets may produce greater insight than is possible if they are kept separate.
There are several measures and steps that can be taken to accelerate and promote the sharing and wider use of data, which will in turn increase the benefits and insights gained from those data. Encouraging the release of data as 'open data' is one measure that removes many of the barriers that would otherwise limit or prevent the data from being shared and restrict their downstream innovative use.
Many governments and global institutions have become aware of the benefits that could result from 'open data' and, conversely, of the opportunities potentially missed with 'closed data'. In 2012, the UK government published its 'Open Data White Paper: Unleashing the Potential' (HM Government, 2012), in which it formally adopted a policy of 'Open by Default' for public sector data and encouraged the use of its Open Government Licence as the de facto licence for sharing data (The National Archives, 2014). In 2013, leaders of the Group of Eight (G8) countries signed the Open Data Charter (G8, 2013), which included the expectation that all government data will be published openly by default, alongside principles to increase the quality, quantity and reuse of the data that are released.
This paper explores the potential open data opportunities for geoscience data. The term geoscience data is broad and incorporates a range of both analogue and digital records, arguably including some of the oldest collections of samples and data on the planet, with rock and fossil sample collections dating back to the earliest years of geological science. This paper, however, focuses on digital data: geological survey data, geochemical analytical data, geophysical survey data streams, borehole/well data, subsurface properties data and Earth observation data. As geoscience research evolves to include artificial intelligence, machine learning and other data science capabilities, the data that underpin this research are also evolving, incorporating sensor data, semantics and other 'big data' sources.
Considering both the traditional and the novel digital geoscience data that are being collected and made available, this paper provides case studies demonstrating that continued and significant insights, innovations and economic benefits can be realized when geoscience data are shared.

| WHAT IS OPEN DATA?
Open data are often considered synonymous with free data, but the concept is much broader. The Open Knowledge Foundation describes it as 'Open data and content [that] can be freely used, modified and shared by anyone for any purpose' (Open Knowledge Foundation, 2015). This means that, beyond being delivered without fees, the data should also be shared under simple, non-restrictive terms of use, such as those of the Creative Commons framework, and be technically accessible, for example delivered in non-proprietary and/or machine-readable formats.
There is some overlap here with the concept of FAIR data, which uses the principles of findable, accessible, interoperable and reusable to support knowledge discovery and innovation (Wilkinson, 2016). Under FAIR, 'findable' relates primarily to good, searchable metadata, while 'interoperable' and 'reusable' are facilitated through the use of open data standards. FAIR, however, uses the term 'accessible' to mean that 'once the user finds the required data, they need to know how that data can be accessed, possibly including authentication and authorization' (GO FAIR International Support & Coordination Office, 2016). Data can therefore be considered FAIR when they are private, when they are accessible to a defined group of people, or when they are accessible to everyone (open data) (Mons et al., 2017). Conversely, while open data should be available for everyone to access, use and share without restrictions, they do not necessarily have to be findable or interoperable.
The Open Data Institute (ODI) developed a Data Spectrum to help creators and users of data understand the language used when sharing data (Open Data Institute, n.d.). It identifies two further types of data that exist alongside open data: shared data, which can be shared with others under additional licensing conditions, and closed data, which cannot be shared, are frequently for internal use only and are not publicly findable (Figure 1).
Given its many potential characteristics, open data should be seen not as an end point in itself but as a position on a scale. Acknowledging this, the ODI developed a certification scheme graded from 'bronze', at the minimum end of open data, to 'platinum', as detailed in Table 1 (Open Data Institute, 2013). It reasoned that while all open data should remove restrictions, increase accessibility and be legally reusable, further benefits can be gained by investing in additional legal, practical, technical and social measures.

| Example from the transport sector
Well-designed and well-delivered open data can have significant impacts. Since 2007, Transport for London (TfL) have developed an open data strategy that includes the release of timetables, service updates and disruption alerts delivered via a series of static data files, feeds and APIs (available in JSON and XML). With an average of 26.9 million journeys made across London daily (Transport for London, 2019), their ambition was to move people around the capital more smoothly and efficiently. An important shift from previous thinking was that they no longer expected to provide routing or travel information directly to passengers; instead, they hoped that professional and amateur developers would step in to produce new products and services, thereby extending the reach of TfL data, reducing the burden on their own technical development teams and driving economic growth. Within 10 years, 600 apps were delivering this information to 42% of Londoners (Deloitte, 2017). The benefits were both direct (for example, to passengers and road users) and indirect (for example, the shift from private to public transport, stimulated by improved journey planning, had obvious environmental benefits).
Transport for London also recognized that, to ensure the continued success of the data release, three key factors needed to be met. They must:
1. Invest in the developer community, encouraging use and identifying gaps and opportunities, for example by holding 'hackathons' and 'accelerators' (dedicated workshops and challenges), writing blogs and developing formal partnerships.
2. Use the data and analytics generated by passengers and journeys to gain insight into the transport network and improve overall customer satisfaction.
3. Improve the quality and coverage of open data, looking at further candidates for open release and identifying opportunities to link and merge with other data.
In 2017, it was estimated that this open data release created economic benefits and savings of £130 million per year for travellers, London and TfL themselves, saved Londoners time worth between £70m and £95m per year, and contributed to the creation of 700 jobs (Deloitte, 2017).

| GEOSCIENCES AND OPEN DATA EXAMPLES
Geoscience data support a range of economic activities and services, both directly and indirectly, across the public and private sectors, including:
• Extractive industries
• Natural hazards monitoring and management
• Groundwater resources
• Geoscience education and research
• Geotourism and geoheritage
An example of the direct use of geoscience data is when the infrastructure sector consults natural hazard data to evaluate sites for development and to design construction that is resilient to those hazards. In the UK, the infrastructure pipeline represents more than £600 billion of investment over the decade (Infrastructure and Projects Authority, 2017; HM Treasury, 2017).
Similarly, geoscience data contribute to our use of natural resources. Groundwater resources, for example, are important to the UK economy and have been valued at approximately £8 billion (Environment Agency, 2005). Using data to understand complex processes such as groundwater recharge can therefore have a direct impact on how society manages the sustainable use of a critical natural resource.
Much of the geoscience data used by society stems from research, whether university-led or carried out by national research centres. Numerous studies into the value of research data have reached the same conclusion: there are significant benefits to be gained from curating and openly sharing research data (Houghton & Gruen, 2014; Research Data Alliance, 2014). These benefits include creating jobs, spurring growth, boosting research productivity and creativity, and helping people and engaging citizens (Research Data Alliance, 2014), and they can be measured in economic terms by exploring the 'use value' of research data and by estimating the return on investment in data activities (Houghton & Gruen, 2014). Creating and maintaining good data has long been an essential pillar for many organizations. However, providing access to geoscience data is now seen as increasingly important to realizing a myriad of downstream benefits, and many research and geological institutions have taken positive steps in this direction.

FIGURE 1 ODI's Data Spectrum (Open Data Institute, n.d.)

| Mining in Western Australia
Mining is a key activity in Western Australia. It is the main driver of wealth creation and generates around 22% of commercial state revenue (ACIL Allen Consulting, 2015). To support this, the state government introduced the Exploration Incentive Scheme, which focused on mineral exploration in greenfield areas and was administered by the Geological Survey of Western Australia, which received close to AUD 150 million between 2009 and 2019 (Economics Consulting Services, 2019). One of the central tenets of the scheme was the open release of geoscience information. Opening up access to these data identified significant new areas for mineral exploration, which resulted in increased exploration activity and, ultimately, greater financial benefits. The released data included, for example, airborne survey data flown by the Geological Survey of Western Australia, which exploration companies used to identify areas likely to contain uranium deposits (Fogarty & Sagerer, 2016). The released information was 'non-rivalrous', meaning that multiple organizations could access it simultaneously without depleting it, which resulted in more activity and a larger private sector response.
The benefits of the programme were predominantly financial and occurred in both the private and public sectors. Every $1M invested in the Exploration Incentive Scheme was expected to result in $23.7M in increased national product, attributable to additional exploration activity, taxation and royalty revenue, construction activity and the net wealth generated by the development of new mines (ACIL Allen Consulting, 2015). The ACIL study also demonstrated that the timing of the intervention was critical. Because the financial returns associated with mining depend on resource and mineral prices, the open release of the information had to coincide with relatively high commodity prices in order to attract the required attention and stimulate subsequent exploration.

| Mining in Chile
The potential of increased access to data to stimulate mining activity has also been recognized in Chile. Despite the local and national economic importance of the Chilean mining industry, only 30% of the country was covered by modern, detailed geological maps in 2012 (Schwartz et al., 2012). This gap between the supply of and demand for geoscience information triggered the creation of a National Geological Programme. To justify continued investment in this programme, a study was initiated to determine the economic impact of the supply of public geoscience information. It concluded that for every dollar invested in public geoscience information in Chile over the period 1997–2017, the mining industry could generate 11.5 dollars of government tax revenue (Gildemeister et al., 2017). The study noted that providing open access to this geoscience information is key to realizing these benefits.

| Ireland and Northern Ireland
Increased mineral exploration and natural capital investment were among the drivers of TELLUS: an airborne geophysical and ground-based geochemical survey designed by the geological surveys of Britain, Northern Ireland and the Republic of Ireland. TELLUS aimed not only to provide new resource data to stimulate exploration investment and licensing in minerals and energy resources, but also to inform research, regulation and management on other issues such as sustainable land-use planning, environmental change measurement and agricultural management (Young, 2016). The maps and data from the geophysical and geochemical surveys were made openly available, which led to estimated investment commitments of over £32 million in new mineral exploration activity, with licensed blocks increasing from 15% to 70% of Northern Ireland's land area (Howard et al., 2014).
In the Republic of Ireland, maps and data from the same surveys support the extractive industry, which is worth over €1.65 billion to the Irish economy (including lead, zinc, peat and natural gas) (Indecon International Economic Consultants, 2017). Ireland also provides good examples of the links between geoscience and health: the economic cost of radon-induced lung cancer is estimated at €340.8 million (Indecon International Economic Consultants, 2017). The Geological Survey of Ireland used the TELLUS data to help develop a method to predict soil-gas radon concentrations and identify high-risk areas. These high-risk areas are communicated to planning authorities so that they can better manage radon risk and minimize the future economic impact of radon-induced lung cancer (Elío et al., 2017).

| Borehole data in Britain
Large infrastructure projects are complex and can incur cost overruns as a result of unforeseen ground conditions, such as the presence of groundwater or corrosive ground. To minimize this risk, geological desk studies and site investigations are carried out, often including drilling boreholes to determine the geological and engineering characteristics of the ground (Site Investigation Steering Group, 2011).
In 2009, the British Geological Survey (BGS) scanned its collection of legacy borehole records and released them as open data on its OpenGeoscience website. The ambition was to make it easier for the construction sector to access borehole information to help inform and plan site investigation work. The impact of the data release was immediate: the number of borehole records accessed increased from 2,000 to 20,000 per month. Approximately 130,000 borehole logs are now accessed every month by the construction sector, which uses the data to optimize (or reduce) its own borehole sampling strategies. Because boreholes are expensive and time-consuming to drill (c. £4,000 for a 20 m hole), the economic and time-saving benefits of this data release can be considered significant (Wildman, 2018).
BGS has continued to invest in its data release, pushing the borehole records out as widely as possible, including via its data partner network and through the BGS iGeology smartphone app. In 2014, BGS launched its online Data Deposit Portal, encouraging holders of construction-related data, including site investigation data, to upload and share their records in digital format (British Geological Survey, 2021).
Interestingly, the data release brought an additional benefit: the 'virtuous cycle' that it triggered. As more people accessed the borehole records, more clients and contractors saw the benefit that data sharing could bring to the wider industry and started to donate their own borehole and site investigation records to BGS (Wildman, 2018). More than 1 million 'open' borehole records are now shared by BGS, and over ten million borehole records have been accessed in total since the launch in 2009 (British Geological Survey, 2020).

| European satellite data
Copernicus, led by the European Commission (EC) in partnership with the European Space Agency (ESA), develops, builds, flies and operates the Sentinel family of satellites and missions. The programme's data policy provides full, open and free-of-charge access to the Earth observation data captured by the satellites. These data have applications in sectors including oil and gas, urban monitoring, natural disaster insurance and agriculture, and so have a reach well beyond the geosciences. Specific examples in the scientific community include the detection and measurement of algal blooms in southern Chile (Rodríguez-Benito et al., 2020) and the development of interferometric synthetic-aperture radar (InSAR) capability to create ground deformation models (Gee et al., 2019).
From 2008 to 2020, total investment in the Copernicus programme was forecast to reach EUR 8.2 billion. Over the same period, this investment was expected to generate economic benefits of between EUR 16.2 billion and EUR 21.3 billion, with additional social, environmental and strategic benefits (PwC, 2019).
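As a rough illustration using only the PwC forecast figures quoted above, the implied ratio of forecast economic benefits to investment can be worked out directly. This is a simple undiscounted ratio that ignores when costs and benefits fall within the period, and it is not a figure reported by PwC:

```latex
\text{benefit-to-investment ratio} = \frac{\text{forecast economic benefits}}{\text{total investment}}
\qquad
\frac{\text{EUR } 16.2\ \text{bn}}{\text{EUR } 8.2\ \text{bn}} \approx 1.98,
\qquad
\frac{\text{EUR } 21.3\ \text{bn}}{\text{EUR } 8.2\ \text{bn}} \approx 2.60
```

On this simple reading, each euro invested was expected to return roughly EUR 2.0 to 2.6 in economic benefits, before the additional social, environmental and strategic benefits are taken into account.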

| Making a success of open data
This paper has described a number of case studies demonstrating how an open data release can create downstream benefits. Returning to Transport for London's open data initiative, this paper argues that its success can be attributed to three key factors:

1. Having a clear strategy with a well-articulated vision: In Transport for London's case, their strategy was to move people around London more smoothly and efficiently.
2. Ensuring that the data were not only free but also technically accessible and delivered under an open licence: Transport for London delivered the data via a series of static data files, feeds and APIs (available in JSON and XML) and made them available under the UK's Open Government Licence.
3. Continued investment in the programme to ensure its long-term success: Transport for London invested in the developer community, used the data and analytics generated to gain further insight, and then improved the quality and coverage of its open data.
The factors listed above are not specific to Transport for London; they can be central features of any successful open data initiative. There have been similar open data successes in geoscience, and the extent of each success can arguably be attributed to how closely it aligned with the three key factors of (1) having a clear strategy, (2) ensuring that the data are technically and legally accessible and (3) continued investment in the data. Many of the geoscience open data use cases described in this paper demonstrate these factors:
• The Exploration Incentive Scheme in Western Australia demonstrated that, where there is a clear strategy with a well-articulated vision, releasing open data at the right time can accelerate and amplify the benefits across a much wider base than the original data publishers and controllers.
• The TELLUS programme had a clear strategy from the outset and an understanding of where the benefits might lie. In addition, careful consideration was given to how to deliver the data to those who needed it.
• By continuing to invest in its borehole records delivery initiative, BGS has grown the number of available records, which has encouraged further donations and created a 'virtuous cycle'.
• With such high levels of investment required to support the Copernicus infrastructure and associated services, a strong vision and an evidence-based business case were essential for both the initial intervention and ongoing support. This was further advanced by the full, free and open access to data, thus meeting all three factors for a successful open data programme.

| Assessing which data to release
Although data may be 'free', they are rarely free of costs; investment is always required to maintain data as an asset and ensure their delivery. The UK Government's National Data Strategy highlights that 'the true value of data can only be fully realized when it is fit for purpose, recorded in standardized formats on modern, future-proof systems and held in a condition that means it is findable, accessible, interoperable and reusable' (Department for Digital, Culture, Media & Sport, 2020). Good geoscience data are critical to society, but they require investment for continued collection, maintenance, management and distribution. Where the costs of data collection and management exceed the available funding, some organizations realize the value of their data assets through direct sales or licensing, and then reinvest the income generated in those datasets. Releasing data as 'free' removes a funding lever that may currently provide financial support to maintain those datasets in the long term. It is therefore not always possible for every dataset to be open unless a sustainable business model is in place to collect the data, with a plan to support and maintain data quality. Careful consideration must be given to which datasets to open up, based on the wider societal return on investment (ROI), and to ensuring that an appropriate portion of that ROI goes to the data publisher, so that it can continue to maintain and improve data quality. Governments and institutions need to be careful that their interventions do not have unintended or unwanted consequences that might reduce incentives to collect, maintain and share data. For example, removing an organization's ability to raise funds through licensing income may directly reduce the investment the organization can put into maintaining or supporting those data in the longer term, even where short-term funding is provided to fill the income gap.
Alongside concerns about how to support and maintain data financially in the long term are concerns about the risks associated with a particular data release. Many organizations hold data that could be valuable to others, and may be considering an open release, but are ultimately unable to publish because of concerns about risks that may or may not exist. These risks include the release of personal data and potential breaches of data protection legislation, the inclusion of third-party intellectual property, data that could pose a national security risk if released, and data that could infringe competition law if shared. Frequently, the risk or perceived risk does not fall neatly into an obvious category and requires legal expertise or advice to unpick. In these instances, rather than carry the risk or seek legal counsel, many organizations take the safe option of not releasing the data. Some, however, take a more positive approach by removing the risk element from their data, for example by redacting parts of a dataset or depersonalizing it. Increasingly, organizations are being encouraged to assess perceived risks using a data-sharing risk assessment framework or toolkit that helps to identify risks, as well as potential mitigations and suggestions for minimizing those risks (Geospatial Commission, 2020). These frameworks encourage a more proactive, yet responsible, approach to data sharing, which can help bring forward further candidates for open data that would otherwise be overlooked.

| CONCLUSION
While measuring and quantifying the benefits of geoscience data is multi-layered and complex, it is clear that their use and reuse play an important role in any society.
Geoscience data are critical to the advancement of any society; their impacts are both direct and indirect, and their importance continues to grow. Releasing data as 'open' is a highly effective way to increase access to those data, generating greater insight, which in turn leads to significant downstream benefits in both industry and academia.
When considering what data to open, it is important to take account of the three factors that the use cases in this paper have shown to lead to success:
1. Having a clear strategy with a well-articulated ambition. There is evidence that where benefits from geoscience data are identified, interventions can accelerate and amplify the impacts, but there needs to be a clear, well-defined strategy at the outset.
2. Ensuring that the data are not only free but also technically accessible and delivered under a simple, open licence. There is a spectrum of how data can be shared, from closed to open data, and further benefits can be gained by investing in additional legal, practical, technical and social measures to remove barriers to access.
3. Continued investment in the programme to ensure its long-term success. The data release must come with an ongoing commitment to support it. It is important to be transparent about the full costs involved in releasing and maintaining open data to ensure that the release is sustainable in the long term.
Like other forms of data, geoscience data continue to grow exponentially, fuelled in part by the prevalence of above- and below-ground sensors and the digitization of archive and legacy records. Alongside traditional data analytics, novel data science techniques are now used extensively in the geosciences to extract yet more knowledge and insight from the data. Sharing and providing access to geoscience data will accelerate the realization of benefits from this explosion of data. But not all data can or should be made open. In these cases, the FAIR principles of findable, accessible, interoperable and reusable can be followed to maximize the opportunities. This may result in a particular dataset falling short of the completely unrestricted intention of open data, but, applied well, the FAIR principles can remove a number of barriers to access and unlock benefits that would otherwise be unattainable.

ORCID
Geraldine Wildman https://orcid.org/0000-0002-7581-9323
Edward Lewis https://orcid.org/0000-0003-2685-383X