MulticoreBSP for C: A high-performance library for shared-memory parallel programming

A. N. Yzelman*, R. H. Bisseling, D. Roose, K. Meerbergen

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the present article, we further investigate this concept and introduce the new high-performance MulticoreBSP for C library. Among other features, this library supports nested BSP runs. We show that existing BSP software performs well regardless of whether it runs on distributed-memory or shared-memory architectures, and show that applications in MulticoreBSP can attain high-performance results. The paper details implementing the Fast Fourier Transform and the sparse matrix-vector multiplication in BSP, both of which outperform state-of-the-art implementations written in other shared-memory parallel programming interfaces. We furthermore study the applicability of BSP when working on highly non-uniform memory access architectures.

Original language: English
Pages (from-to): 619-642
Number of pages: 24
Journal: International Journal of Parallel Programming
Issue number: 4
Publication status: Published - 1 Jan 2014


  • Bulk synchronous parallel
  • Fast Fourier transform
  • High-performance computing
  • Shared-memory parallel programming
  • Software library
  • Sparse matrix-vector multiplication

