Abstract
We present three novel parallel scan algorithms for multi-core CPUs which do not need to fix the number of available cores at the start, and have zero overhead compared to sequential scans when executed on a single core. These two properties are in contrast with most existing parallel scan algorithms, which are asymptotically optimal, but have a constant factor overhead compared to sequential scans when executed on a single core. We achieve these properties by adapting the classic three-phase scan algorithms. The resulting algorithms also exhibit better performance than the original ones on multiple cores. Furthermore, we adapt the chained scan with decoupled look-back algorithm to also have these two properties. While this algorithm was originally designed for GPUs, we show it is also suitable for multi-core CPUs, outperforming the classic three-phase scans in our benchmarks, by better using the caches of the processor at the cost of more synchronisation. In general our adaptive chained scan is the fastest parallel scan, but in specific situations our assisted reduce-then-scan is better.
Original language | English |
---|---|
Pages | 52-61 |
Number of pages | 12 |
DOIs | |
Publication status | Published - 6 Mar 2024 |
Event | 15th International Workshop on Programming Models and Applications for Multicores and Manycores - Edinburgh, United Kingdom Duration: 3 Mar 2024 → 3 Mar 2024 |
Conference
Conference | 15th International Workshop on Programming Models and Applications for Multicores and Manycores |
---|---|
Abbreviated title | PMAM '24 |
Country/Territory | United Kingdom |
City | Edinburgh |
Period | 3/03/24 → 3/03/24 |