
Portable multi- and many-core performance for finite-difference or finite-element codes – application to the free-surface component of NEMO (NEMOLite2D 1.0)

Porter, Andrew R.; Appleyard, Jeremy; Ashworth, Mike; Ford, Rupert W.; Holt, Jason (ORCID: https://orcid.org/0000-0002-3298-8477); Liu, Hedong; Riley, Graham D. 2018. Portable multi- and many-core performance for finite-difference or finite-element codes – application to the free-surface component of NEMO (NEMOLite2D 1.0). Geoscientific Model Development, 11 (8), 3447-3464. https://doi.org/10.5194/gmd-11-3447-2018

gmd-11-3447-2018.pdf
Available under License Creative Commons Attribution 4.0.


Abstract/Summary

We present an approach which we call PSyKAl that is designed to achieve portable performance for parallel finite-difference, finite-volume, and finite-element earth-system models. In PSyKAl the code related to the underlying science is formally separated from code related to parallelization and single-core optimizations. This separation of concerns lets scientists code their science independently of the underlying hardware architecture and lets optimization specialists tailor the code for a particular machine independently of the science code. We have taken the free-surface part of the NEMO ocean model and created a new shallow-water model named NEMOLite2D. The result is a code of manageable size that still incorporates elements of full ocean models (input/output, boundary conditions, etc.). We have then manually constructed a PSyKAl version of this code and investigated the transformations that must be applied to the middle (PSy) layer in order to achieve good serial and parallel performance. We have produced versions of the PSy layer parallelized with both OpenMP and OpenACC; in both cases we were able to leave the natural-science parts of the code unchanged while achieving good performance on both multi-core CPUs and GPUs. In quantifying whether or not the obtained performance is good we also consider the limitations of the basic roofline model and improve on it by generating kernel-specific CPU ceilings.
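The separation of concerns described in the abstract can be illustrated with a small sketch. This is not NEMOLite2D code: the function names, the toy continuity formula, the grid sizes, and the use of Python threads as a stand-in for OpenMP are all assumptions made for illustration. The point is only that the kernel (science) and the algorithm (time stepping) stay unchanged while the middle PSy layer is swapped.

```python
"""Toy three-layer (Algorithm / PSy / Kernel) separation.

All names and formulas here are hypothetical, not taken from NEMOLite2D.
"""
from concurrent.futures import ThreadPoolExecutor


# Kernel layer: science for one grid point; no loops, no parallelism.
def continuity_kernel(i, j, ssh, flux_u, flux_v, dt):
    """Return the updated sea-surface height at (i, j) using a toy formula."""
    divergence = (flux_u[j][i + 1] - flux_u[j][i]) + (flux_v[j + 1][i] - flux_v[j][i])
    return ssh[j][i] - dt * divergence


# PSy layer, variant 1: plain serial loops over the grid.
def psy_update_serial(ssh, flux_u, flux_v, dt):
    ny, nx = len(ssh), len(ssh[0])
    return [
        [continuity_kernel(i, j, ssh, flux_u, flux_v, dt) for i in range(nx)]
        for j in range(ny)
    ]


# PSy layer, variant 2: rows distributed over threads.
# Threads are only a stand-in for a real OpenMP or OpenACC implementation.
def psy_update_threaded(ssh, flux_u, flux_v, dt, workers=4):
    ny, nx = len(ssh), len(ssh[0])

    def update_row(j):
        return [continuity_kernel(i, j, ssh, flux_u, flux_v, dt) for i in range(nx)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(update_row, range(ny)))


# Algorithm layer: the time-stepping logic, unaware of how the PSy layer iterates.
def run_model(psy_update, nsteps=3):
    nx, ny, dt = 4, 3, 0.1
    ssh = [[0.0] * nx for _ in range(ny)]
    flux_u = [[0.01 * i for i in range(nx + 1)] for _ in range(ny)]  # ny x (nx+1)
    flux_v = [[0.02 * j for _ in range(nx)] for j in range(ny + 1)]  # (ny+1) x nx
    for _ in range(nsteps):
        ssh = psy_update(ssh, flux_u, flux_v, dt)
    return ssh
```

Calling `run_model(psy_update_serial)` and `run_model(psy_update_threaded)` runs the same science through two different PSy layers. In this sketch the two give identical results because each grid point is computed by the same kernel call regardless of the iteration strategy.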

Item Type: Publication - Article
Digital Object Identifier (DOI): https://doi.org/10.5194/gmd-11-3447-2018
ISSN: 1991-9603
Date made live: 10 Oct 2018 14:35 UTC
URI: https://nora.nerc.ac.uk/id/eprint/521162
