Experimental Fault-Tolerant Synchronization for Reliable Computation on Graphics Processors
Abstract
Graphics processors (GPUs) are emerging as a promising platform for highly parallel, compute-intensive, general-purpose computations, which usually need support for inter-process synchronization. Using traditional lock-based
synchronization (e.g. mutual exclusion) makes the computation vulnerable to faults caused by both scientists’ inexperience and transient hardware errors. It is notoriously difficult for scientists to deal with deadlocks when their computation needs to lock many objects concurrently. Transient hardware errors may
cause a process that is holding a lock to stop progressing (or crash). While such transient hardware errors are a non-issue for graphics processors used for graphics computation (e.g. an error in a single pixel may not be noticeable), this no longer holds for graphics processors used for scientific computation. Such scientific
computation requires a fault-tolerant synchronization mechanism. However, most of the powerful GPUs aimed at high-performance computing (e.g. NVIDIA Tesla series) do not support any strong synchronization primitives like test-and-set
and compare-and-swap, which are usually used to construct fault-tolerant synchronization
mechanisms. This paper presents an experimental study of fault-tolerant synchronization mechanisms
for NVIDIA’s Compute Unified Device Architecture (CUDA) that do not require strong synchronization primitives in hardware. We implement a lock-free
synchronization mechanism that eliminates lock-related problems such as deadlock and, moreover, can tolerate process crash failures. We address the experimental issues that arise in the implementation of the mechanism and evaluate its
performance on commodity NVIDIA GeForce 8800 graphics cards.
Publisher
University of Tromsø (Universitetet i Tromsø)
Citation
IFI-UITØ Technical Report (2012), no. 71, 10 pp.