GPU-Accelerated Spectral Methods in Python: A Case Study

Jamie Quinn

University College London

7th SeptembRSE 2021

Melvin.py

A pseudo-spectral, GPU-accelerated framework for numerically solving 2D problems in fluids.

github.com/jamiejquinn/melvin.py

Why Python?

  • Easy prototyping
  • Fast iteration cycle
  • Collective good practice
  • Access to accelerated libraries
  • Students want to learn & use Python

It’s a great language for prototyping research codes & teaching

Easy prototyping

  • spectral/FDM spatial discretisation
  • periodic/Dirichlet boundary conditions
  • 2nd/4th order spatial derivatives for nonlinear advection operator
  • 2nd/4th order Adams-Bashforth time integration
  • Implicit handling of diffusion operator

Designed to be extensible

Fast iteration cycle

  1. Jupyter notebook
  2. Collection of scripts
  3. Fully-fledged Python package

Permits refinement of methods and interfaces

Collective good practices

  • Version control with git & Github
  • Parameterisation with JSON
  • Community contributions through Github
  • Automated with Github Actions
    • unit testing with pytest
    • static analysis with Flake and Black

Acceleration: CuPy

#import numpy as np
import cupy as np

spectral_arr = np.fft(physical_arr)
  • GPU-accelerated Numpy
  • Access to multi-GPU
  • Easy integration with other libraries
  • Can profile using NVIDIA’s Nsight

Alternatives to CuPy

  • Jax - Google’s numpy replacement (differentiation!)
  • Legate - NVIDIA’s numpy replacement
  • CUDA Python - NVIDIA’s low-level
  • PyCUDA - ind low-level
  • Numba - JIT compiler (+ CUDA kernels)

Either interchangeable or interoperable

Results

Validation & Performance

Test: 1000 timesteps of Taylor-Green vortex

  • double precision
  • 4th order time (diffusion 1st order)
  • 4th order space

Config Wallclock (s) Error (%)
CPU (numpy) 143.34 0.1005
GPU (cupy, GTX 1060) 11.01 0.1005
time order 4 -> 2 10.32 0.1005
spatial order 4 -> 2 9.10 0.1005
double -> single 5.70 0.1007
All “speedups” 4.43 0.1007

Lessons learned

  • Non-compiled = unsafe -> static analysis!!
  • Non-compiled = slow -> offload as much as possible
  • Speed is “good enough”
  • Flexibility is worth it
  • Numpy -> CuPY -> Numba -> CUDA python

Accelerated C++?

  • OpenMP, OpenACC, std::par, CUDA
  • C++20 nearly as simple as Python?
  • Melvin.py is ~1.5x runtime of C++ CUDA version

Julia?

  • Offers impressive performance with Pythonic ease
  • Very modern ecosystem
  • Still young
  • New paradigm
  • Few opportunities beyond HPC/ML (for now!)

Python is rarely as fast as Fortran & C++

but its flexibility makes it an ideal language for

  • prototyping GPU-accelerated numerical codes
  • teaching numerical methods (particularly in HPC)

Thanks! Questions?