Comparing and evaluating LLMs for efficient and responsible data rescue

Ferguson, Shona; Cape, J. Neil; Crossley, Alan; Harvey, Frank; Fowler, David; Leaver, David; Braban, Christine F.

Ferguson, Shona ORCID: https://orcid.org/0009-0004-4311-1672; Cape, J. Neil ORCID: https://orcid.org/0000-0002-5538-588X; Crossley, Alan; Harvey, Frank; Fowler, David ORCID: https://orcid.org/0000-0002-2999-2627; Leaver, David ORCID: https://orcid.org/0009-0002-2150-4614; Braban, Christine F. ORCID: https://orcid.org/0000-0003-4275-0152. 2026 Comparing and evaluating LLMs for efficient and responsible data rescue. [Other] In: AI in production, Newcastle Upon Tyne, UK, 4th – 5th June 2026. UK Centre for Ecology & Hydrology. (Unpublished)


Abstract

Data are vitally important outputs from research: alongside answering research questions, they can provide opportunities for new research topics and hypotheses. Research institutions hold a wealth of historical data resources that are inaccessible because they are stored in proprietary and poorly documented formats. Preserving these datasets and making them accessible through data rescue is crucial to prevent loss of the data and to support future re-use. However, manual data rescue can be a laborious and time-consuming task. Recent advancements in large language models (LLMs) present an opportunity to increase efficiency in data rescue workflows, yet their suitability for handling scientific tabular data remains under-explored.
This work compares three commonly used LLMs and evaluates how well they accelerate a digital data rescue methodology applied to historical cloud and rain chemistry data. The LLMs are compared across a series of structured tasks, including prioritisation of data rescue, variable and unit identification, and metadata extraction. Their performance is assessed against a fully manual approach to identify the most significant efficiency gains and to examine risks related to hallucinations, inconsistencies, and loss of scientific context.
Although LLMs can substantially reduce manual effort in data rescue tasks, it is vital to maintain a level of human quality control to ensure accuracy, provenance and reproducibility of important scientific data. This talk provides practical lessons for applying LLMs to real-world scientific data processing, contributing to broader discussions on evaluation, trust, and reliability of foundation models beyond natural language tasks.

Documents