A Look at the OpenCL 2.0 Execution Model
Benedict Gaster, University of the West of England.
A popular approach to programming manycore GPUs is the Single Instruction Multiple Thread (SIMT) abstraction. SIMT has the benefit of presenting a ‘single thread’ view, which relieves the developer of explicitly vectorizing the source code. However, because the underlying hardware is SIMD, it is often difficult to hide every aspect of it from the developer. One example of such a ‘leak’ is OpenCL’s barrier, which requires all workitems (i.e. threads) in a workgroup to reach and execute the ‘same’ barrier. Using a set of examples, we show, sometimes surprisingly, that common transformations routinely performed by traditional scalar compilers are not, in general, valid when applied to OpenCL code containing workgroup (or subgroup) collective operations. We also introduce a mathematical notion of workgroup and subgroup uniformity and outline an execution model for OpenCL 2.0 that enables these traditional compiler transformations to be applied, even in the presence of collective operations, for all valid OpenCL programs. The model states precisely when such a transformation is valid and when it is not.
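The ‘same barrier’ requirement can be illustrated with a small host-side simulation in plain C. This is a sketch under stated assumptions and is not taken from the talk: `barrier_at`, the numeric site IDs, `WG_SIZE`, `uniform_arg` and the three kernel functions are hypothetical stand-ins, and tail duplication is used only as one plausible example of a transformation that is harmless for scalar code. Each simulated workitem records which textual barrier it reaches, and a kernel is treated as well-formed only if every workitem records the same sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WG_SIZE 8       /* hypothetical workgroup size */
#define MAX_BARRIERS 4  /* max barrier sites traced per workitem */

/* Sequence of textual barrier sites that one workitem passed through. */
typedef struct {
    int sites[MAX_BARRIERS];
    size_t count;
} trace_t;

/* Hypothetical stand-in for OpenCL's barrier(): it only records the site. */
static void barrier_at(trace_t *t, int site) {
    assert(t->count < MAX_BARRIERS);
    t->sites[t->count++] = site;
}

/* Original shape: both branches rejoin before a single barrier (site 0). */
static void kernel_original(size_t lid, trace_t *t) {
    int x;
    if (lid % 2 == 0) { x = 1; } else { x = 2; }
    (void)x;
    barrier_at(t, 0);
}

/* After tail duplication: the barrier is copied into each branch.
 * The condition depends on the local id, so it is not workgroup-uniform. */
static void kernel_duplicated(size_t lid, trace_t *t) {
    int x;
    if (lid % 2 == 0) { x = 1; barrier_at(t, 0); }
    else              { x = 2; barrier_at(t, 1); }
    (void)x;
}

/* Same duplication, but the condition is identical for every workitem
 * (modelled here as a hypothetical kernel argument). */
static const int uniform_arg = 16;
static void kernel_duplicated_uniform(size_t lid, trace_t *t) {
    int x;
    (void)lid;
    if (uniform_arg > 8) { x = 1; barrier_at(t, 0); }
    else                 { x = 2; barrier_at(t, 1); }
    (void)x;
}

typedef void (*kernel_fn)(size_t lid, trace_t *t);

/* Returns true if every workitem reached the same barrier sites in the
 * same order as workitem 0. */
static bool barriers_uniform(kernel_fn k) {
    trace_t ref = { {0}, 0 };
    k(0, &ref);
    for (size_t lid = 1; lid < WG_SIZE; ++lid) {
        trace_t t = { {0}, 0 };
        k(lid, &t);
        if (t.count != ref.count) return false;
        for (size_t i = 0; i < t.count; ++i)
            if (t.sites[i] != ref.sites[i]) return false;
    }
    return true;
}
```

In this simulation `barriers_uniform(kernel_original)` holds, `barriers_uniform(kernel_duplicated)` does not, and `barriers_uniform(kernel_duplicated_uniform)` holds again, which mirrors the idea that such a transformation is safe only when the branch condition is uniform across the workgroup. Real OpenCL implementations do not report this condition; a divergent barrier simply has undefined behaviour.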