NB-FEB : an easy-to-use and scalable universal synchronization primitive for parallel programming
(Research report; Forskningsrapport, 2008-10)
This paper addresses the problem of universal synchronization primitives that can support scalable thread synchronization for large-scale many-core architectures. The universal synchronization primitives that have been deployed widely in conventional architectures are the compare-and-swap (CAS) and load-linked/store-conditional (LL/SC) primitives. However, such synchronization primitives are ...
Experimental Fault-Tolerant Synchronization for Reliable Computation on Graphics Processors
(Research report; Forskningsrapport, 2012)
Graphics processors (GPUs) are emerging as a promising platform for highly parallel, compute-intensive, general-purpose computations, which usually need support for inter-process synchronization. Using traditional lock-based synchronization (e.g., mutual exclusion) makes the computation vulnerable to faults caused by both scientists’ inexperience and transient hardware errors. It is notoriously ...
DeltaTree: A Practical Locality-aware Concurrent Search Tree
(Research report; Forskningsrapport, 2013)
Like other fundamental programming abstractions in energy-efficient computing, search trees are expected to support both high parallelism and data locality. However, existing highly-concurrent search trees such as red-black trees and AVL trees do not consider data locality, while existing locality-aware search trees such as those based on the van Emde Boas layout (vEB-based trees) poorly support ...
DeltaTree: A Locality-aware Concurrent Search Tree
(Journal article; Tidsskriftartikkel; Peer reviewed, 2015-06-15)
Like other fundamental abstractions for high-performance computing, search trees need to support both high concurrency and data locality. However, existing locality-aware search trees based on the van Emde Boas layout (vEB-based trees) poorly support concurrent (update) operations. We present DeltaTree, a practical locality-aware concurrent search tree that integrates both locality-optimization ...
Evaluation of the power efficiency of UPC, OpenMP and MPI
(Research report; Forskningsrapport, 2015)
In this study we compare the performance and power efficiency of Unified Parallel C (UPC), MPI and OpenMP by running a set of kernels from the NAS Benchmark. One of the goals of this study is to focus on the Partitioned Global Address Space (PGAS) model, in order to describe it and compare it to MPI and OpenMP. In particular we consider the power efficiency expressed in millions of operations ...
GreenBST: Energy-efficient concurrent search tree
(Conference object; Konferansebidrag, 2016-08-09)
Like other fundamental abstractions for energy-efficient computing, search trees need to support both high concurrency and fine-grained data locality. However, existing locality-aware search trees such as ones based on the van Emde Boas layout (vEB-based trees) poorly support concurrent (update) operations, while existing highly-concurrent search trees such as the non-blocking binary search ...
Implementing and optimizing a Sparse Matrix-Vector Multiplication with UPC
(Research report; Forskningsrapport, 2016)
Programmability and performance-per-watt are the major challenges of the race to Exascale. In this study we focus on Partitioned Global Address Space (PGAS) languages, using UPC as a particular example. This category of parallel languages provides ease of programming as a strong advantage over the classic Message Passing Interface (MPI). PGAS also has advantages compared to classic shared memory ...
On the performance and energy efficiency of the PGAS programming model on multicore architectures
(Journal article; Tidsskriftartikkel; Peer reviewed, 2016-09-15)
Access control protocol with node privacy in wireless sensor networks
(Journal article; Tidsskriftartikkel; Peer reviewed, 2016-11-15)
To prevent malicious nodes from joining wireless sensor networks (WSNs), an access control mechanism is necessary for trustworthy cooperation between the nodes. In addition to access control, privacy has recently become an important topic, specifically how to achieve privacy without disclosing the real identity of communicating entities in the WSNs. Based on elliptic curve cryptography, in this paper, ...
Performance optimization and modeling of fine-grained irregular communication in UPC
(Journal article; Tidsskriftartikkel; Peer reviewed, 2019-03-03)
The Unified Parallel C (UPC) programming language offers parallelism via logically partitioned shared memory, which typically spans physically disjoint memory subsystems. One convenient feature of UPC is its ability to automatically execute between-thread data movement, such that the entire content of a shared data array appears to be freely accessible by all the threads. The programmer friendliness, ...