Experimental Fault-Tolerant Synchronization for Reliable Computation on Graphics Processors

Hagen, Tor-Magne Stien; Ha, Hoai Phuong; Anshus, Otto

Permanent link

https://hdl.handle.net/10037/4953

Date

2012

Type

Research report

Authors

Hagen, Tor-Magne Stien; Ha, Hoai Phuong; Anshus, Otto

Abstract

Graphics processors (GPUs) are emerging as a promising platform for highly parallel, compute-intensive, general-purpose computations, which usually need support for inter-process synchronization. Traditional lock-based synchronization (e.g. mutual exclusion) makes the computation vulnerable to faults caused by both scientists’ inexperience and transient hardware errors. It is notoriously difficult for scientists to deal with deadlocks when their computation needs to lock many objects concurrently. Transient hardware errors can cause a process that holds a lock to stop making progress (or crash). While such errors are a non-issue for graphics processors used for graphics computation (e.g. an error in a single pixel may not be noticeable), this no longer holds for graphics processors used for scientific computation. Such scientific computation requires a fault-tolerant synchronization mechanism. However, most of the powerful GPUs aimed at high-performance computing (e.g. the NVIDIA Tesla series) do not support strong synchronization primitives such as test-and-set and compare-and-swap, which are usually used to construct fault-tolerant synchronization mechanisms. This paper presents an experimental study of fault-tolerant synchronization mechanisms for NVIDIA’s Compute Unified Device Architecture (CUDA) that do not require strong synchronization primitives in hardware. We implement a lock-free synchronization mechanism that eliminates lock-related problems such as deadlock and, moreover, can tolerate process crash failures. We address the experimental issues that arise in the implementation of the mechanism and evaluate its performance on commodity NVIDIA GeForce 8800 graphics cards.

Publisher

University of Tromsø

Citation

IFI-UITØ Technical Report (2012), no. 71, 10 pp.
