The many-core revolution can be characterized by increasing thread counts, decreasing memory per thread, and a diversity of continually evolving many-core architectures. High-performance computing (HPC) applications and libraries must exploit increasingly fine levels of parallelism within their codes to sustain scalability on these devices. A major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models (e.g., OpenMP, OpenACC, OpenCL) address many-core parallelism but fail to address memory access patterns.
The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse many-core architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this tutorial, we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. The Kokkos library is under active research and development to incorporate capabilities from new generations of many-core architectures and to address a growing list of applications and domain libraries.