Benedict R. Gaster
Computer Sciences Technical Report 2014-02
A popular approach to programming manycore GPUs is the Single Instruction Multiple Thread (SIMT) abstraction. SIMT has the benefit of presenting a “single thread” view, which spares the developer from explicitly vectorizing the source code. However, because the underlying hardware is SIMD, it is often difficult to hide every aspect of it from the developer. One example of such a “leak” is OpenCL’s barrier, which requires all work-items (i.e., threads) in a work-group to reach and execute the “same” barrier.
But what does it mean to reach and execute the same barrier? OpenCL says very little about the underlying semantics. In this paper we describe a simple execution model for OpenCL 2.0 that precisely captures the semantics of operations such as barrier, as well as the more advanced subgroup features recently introduced to expose SIMD execution in a portable manner.