Publishing Environmental Data APIs for use in AI workflows: recommendations and demonstrators of a standard approach within the NERC Environmental Data Service

Card, Chris; Heaven, Rachel ORCID: https://orcid.org/0000-0002-6172-4809; Kingdon, Andrew ORCID: https://orcid.org/0000-0003-4979-588X; Bell, Patrick; Baldwin, Alex; Carter, Jeremy; Coney, Jonathan; Cooper, Jonathan; Gonzalez Alvarez, Itahisa; Hollaway, Michael ORCID: https://orcid.org/0000-0003-0386-2696; McCormack, Matthew; Poulter, David; Stephens, Ag; Trembath, Philip. 2025 Publishing Environmental Data APIs for use in AI workflows: recommendations and demonstrators of a standard approach within the NERC Environmental Data Service. NERC, 38pp. (OR/25/045) (Unpublished)

Abstract/Summary

The advent of artificial intelligence as a scientific tool is driving new demand for multidisciplinary data analyses that cut across scientific domain boundaries. Developing interoperable solutions for data delivery opens new avenues for scientific investigation. This report summarises work on developing interoperability tools for environmental research. The NERC Environmental Data Service (EDS) consists of five domain-specific data centres supplying data to environmental scientists. The data is findable through data catalogues and web search engines, thanks to decades of collaborative effort implementing standardised discovery metadata. However, the access methods, formats and content of the delivered data vary, and users need to spend time navigating and understanding them. Data access through Application Programming Interfaces (APIs) is preferred over bulk data download because it allows programmatic querying and repeatable workflows, and it is recommended for data that is large, complex or continuously updated. The project's aim was greater standardisation of data access APIs across the EDS, with a particular focus on their use in AI and machine learning (ML) applications. This will reduce the effort needed by the EDS as data publisher and by environmental researchers as data consumers, saving development time and easing data integration. It also supports systematic AI analysis of multiple environmental data types to underpin the development of predictive environmental models and digital twins. Through co-design and Agile development processes, we identified and recommended the MLCommons Croissant specification as a common standard to help ML consumers interface with data APIs, and bulk downloads, of any design.
Croissant extends existing metadata standards, is understood by web search engines and AI agents (supporting findability), and is integrated into ML Python libraries and popular ML platforms (supporting usability). We created a number of Croissant descriptors for data from each of the data centres, a new data API, and extensions of metadata APIs to serve Croissant metadata. We created demonstrator ML workflow notebooks using the Croissant descriptors and data APIs, and ran these on different data science platforms to demonstrate portability. Croissant [26] is a relatively new standard and was not built primarily for data access by API or for multi-dimensional spatiotemporal data. We identified areas where Croissant and its implementing libraries could work better for these use cases, such as use of the emerging geo-croissant extension, integration with OpenAPI [38] specifications, and support for authenticated data access. At the API implementation level our recommendations were more flexible and in line with existing EDS practice: use API standards appropriate to the data type (e.g. OGC [37], STAC [43]) and describe APIs using the OpenAPI specification.
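
To give a feel for what a Croissant descriptor contains, the following is a minimal sketch in Python and is not taken from the report. It assembles a simplified Croissant-style JSON-LD document that points at a CSV response from a data API. The dataset name, the example.org URL, the column names and the helper `build_descriptor` are all hypothetical, and the `@context` is heavily abbreviated, so a validator would likely expect the fuller context given in the Croissant specification.

```python
import json


def build_descriptor(name, description, data_url, columns):
    """Return a simplified Croissant-style JSON-LD descriptor as a dict.

    `columns` maps a CSV column name to a data type string such as "sc:Float".
    This is an illustrative helper, not part of any Croissant library.
    """
    file_id = "data-file"
    # One cr:Field per column, each sourced from the single file object.
    fields = [
        {
            "@type": "cr:Field",
            "@id": f"records/{column}",
            "name": column,
            "dataType": data_type,
            "source": {
                "fileObject": {"@id": file_id},
                "extract": {"column": column},
            },
        }
        for column, data_type in columns.items()
    ]
    return {
        # Abbreviated context (assumption): the specification defines more terms.
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
        },
        "@type": "sc:Dataset",
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        "name": name,
        "description": description,
        # The "file" here is an API endpoint that returns CSV.
        "distribution": [
            {
                "@type": "cr:FileObject",
                "@id": file_id,
                "name": file_id,
                "contentUrl": data_url,
                "encodingFormat": "text/csv",
            }
        ],
        "recordSet": [
            {
                "@type": "cr:RecordSet",
                "@id": "records",
                "name": "records",
                "field": fields,
            }
        ],
    }


# Hypothetical dataset and endpoint, for illustration only.
descriptor = build_descriptor(
    name="example-river-levels",
    description="Hypothetical time series served by a data API.",
    data_url="https://example.org/api/river-levels?format=csv",
    columns={"date": "sc:Date", "level_m": "sc:Float"},
)
print(json.dumps(descriptor, indent=2))
```

A descriptor of this shape is what an ML library would read to discover where the data lives and how its columns map to typed fields. Pointing `contentUrl` at an API endpoint rather than a static file is one of the use cases the abstract notes Croissant was not primarily designed for.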

Item Type: Publication - Report
UKCEH and CEH Sections/Science Areas: National Capability and Digital Research (2025-)
Funders/Sponsors: British Geological Survey, National Oceanography Centre, UK Centre for Ecology & Hydrology
Additional Information: This item has been internally reviewed, but not externally peer-reviewed.
NORA Subject Terms: Computer Science; Data and Information
Date made live: 27 Jun 2025 10:49 +0 (UTC)
URI: https://nora.nerc.ac.uk/id/eprint/539708
