Concurrent Number Cruncher : An Efficient Sparse Linear Solver on the GPU.

Luc Buatois and Guillaume Caumon and Bruno Levy. ( 2007 )

in: Proc. 27th Gocad Meeting, Nancy

Abstract

A wide class of geometry processing and PDE resolution methods needs to solve a linear system, where the non-zero pattern of the matrix is dictated by the connectivity matrix of the mesh. The advent of GPUs with their ever-growing amount of parallel horsepower makes them a tempting resource for such numerical computations. This can be helped by new APIs (CTM from ATI and CUDA from NVIDIA) which give a direct access to the multithreaded computational resources and associated memory bandwidth of GPUs; CUDA even provides a BLAS implementation but only for dense matrices (CuBLAS). However, existing GPU linear solvers are restricted to specific types of matrices, or use non-optimal compressed row storage strategies. By combining recent GPU programming techniques with supercomputing strategies (namely block compressed row storage and register blocking), we implement a sparse generalpurpose linear solver based on the Conjugate Gradient algorithm which outperforms by up to a factor of 5.5x leading-edge CPU counterparts (MKL / ACML).

Download / Links

BibTeX Reference

@inproceedings{206_Buatois,
 abstract = { A wide class of geometry processing and PDE resolution methods needs to solve a linear system, where
the non-zero pattern of the matrix is dictated by the connectivity matrix of the mesh. The advent of GPUs
with their ever-growing amount of parallel horsepower makes them a tempting resource for such numerical
computations. This can be helped by new APIs (CTM from ATI and CUDA from NVIDIA) which give
a direct access to the multithreaded computational resources and associated memory bandwidth of GPUs;
CUDA even provides a BLAS implementation but only for dense matrices (CuBLAS).
However, existing GPU linear solvers are restricted to specific types of matrices, or use non-optimal compressed
row storage strategies. By combining recent GPU programming techniques with supercomputing
strategies (namely block compressed row storage and register blocking), we implement a sparse generalpurpose
linear solver based on the Conjugate Gradient algorithm which outperforms by up to a factor of
5.5x leading-edge CPU counterparts (MKL / ACML). },
 author = { Buatois, Luc AND Caumon, Guillaume AND Levy, Bruno },
 booktitle = { Proc. 27th Gocad Meeting, Nancy },
 title = { Concurrent Number Cruncher : An Efficient Sparse Linear Solver on the GPU. },
 year = { 2007 }
}

Concurrent Number Cruncher : An Efficient Sparse Linear Solver on the GPU.

Abstract

Download / Links

BibTeX Reference

QuickLinks for Sponsors