Derek R. Hower, Bradford M. Beckmann, Benedict R. Gaster, Blake A. Hechtman, Mark D. Hill, Steven K. Reinhardt, and David A. Wood
ASPLOS '14: The International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2014.
Commodity heterogeneous systems (e.g., integrated CPUs and GPUs) now support a unified, shared memory address space for all components. Because the latency of global communication in a heterogeneous system can be prohibitively high, these systems (unlike homogeneous CPU systems) provide synchronization mechanisms that only guarantee ordering among a subset of threads, which we call a scope. Unfortunately, the consequences and semantics of these scoped operations are not yet well understood. Without a formal and approachable model to reason about the behavior of these operations, we risk an array of portability and performance issues.
In this paper, we embrace scoped synchronization with a new class of memory consistency models that add scoped synchronization to data-race-free models like those of C++ and Java. Called Sequential Consistency for Heterogeneous-Race-Free (SC for HRF), the new models guarantee SC for programs with "sufficient" synchronization (no data races) of "sufficient" scope. We discuss two such models. The first, HRF-direct, works well for programs with highly regular parallelism. The second, HRF-indirect, builds on HRF-direct by allowing synchronization using different scopes in some cases involving transitive communication. We quantitatively show that HRF-indirect encourages forward-looking programs with irregular parallelism, demonstrating up to a 10% performance increase in a task runtime for GPUs.