Big data in the geoscience: a portal to physical properties
Kingdon, Andrew ORCID: https://orcid.org/0000-0003-4979-588X; Fellgett, Mark; Nayembil, Martin. 2018 Big data in the geoscience: a portal to physical properties. [Lecture] In: Janet Watson Meeting 2018: a Data Explosion : the Impact of Big Data in Geoscience, London, UK, 27 Feb - 1 Mar 2018. (Unpublished)
Abstract/Summary
Geosciences were early adopters of both computing and digital data; the precursors of the SEG-D and SEG-Y geophysical formats date from as far back as 1967. Data standards for seismic (SEG-Y, SEG-D) or geophysical log (LAS, DLIS) data make interpretation and visualisation of data practicable, but their binary nature also makes applying analytical techniques unusually complex. Specialist software is often required to process and interpret different data types. Such problems are exacerbated by historically poor data management practices. Datasets are rarely collated at the end of projects or stored with sufficient metadata to describe them accurately, and many strategically useful datasets reach BGS incomplete, unusable or inaccessible. Whether this situation arose through a lack of foresight about the future value of data, poor practice or simply storage space restrictions, these problems pose huge challenges to today’s geoscientists. Consequently, there are major problems with applying big data analytics to geoscience. For example, many techniques do not sample geology directly but use proxies that need further interpretation. The use of analytical techniques has commonly been limited by the high proportion of noise in the datasets, with very significant interpretation skills required to identify the signal. Thus far, successful applications of “big data” analytics have been limited to closed systems or analyses of very common digital data types. Significant problems remain, including the lack of data that can be immediately interacted with, difficulties in bringing together multiple datasets about related phenomena, and the lack of adequate metadata to understand the context and scope of the data and how to apply and qualify results. Whilst geoscience datasets have all the attributes of big data (volume, veracity, velocity, value and variety), the last two are disproportionately significant. The first of these, value, determines the usefulness of the data, and the second, variety, is the biggest impediment to delivering on the promises that big data offers, especially in the Earth sciences.
To deliver a standardised platform of data from which individual geological attributes can be identified, BGS has invested in the creation of PropBase (Kingdon et al., 2016). This single portal facilitates the collation of datasets supplied in standardised formats. It allows all data from a single point feature (e.g. a borehole) or an area of interest to be extracted together in a common format, so that the data can be compared immediately. The PropBase portal allows a researcher to answer the question “What’s available at a location?” It has already been used in site characterisation for the UK GeoEnergy Observatories project. Initiatives such as this, which allow collation of high volumes of data in a single extractable format, are a critical step towards enabling big data analytics. Combined with the increasing availability and ever-lowering cost of high-power computing and analytical routines, the opportunities for big data analytics are ever growing. However, substantial challenges remain, and new and more numerous interactions with computer scientists are needed to deliver on this promise.
Item Type: | Publication - Conference Item (Lecture) |
---|---|
NORA Subject Terms: | Earth Sciences Data and Information |
Date made live: | 17 Apr 2018 12:13 +0 (UTC) |
URI: | https://nora.nerc.ac.uk/id/eprint/519840 |