• Evaluating the performance of the allreduce collective operation on clusters. Approach and results 

      Bongo, Lars Ailo; Anshus, Otto J.; Bjørndalen, John Markus (Research report, 2004)
      The performance of the collective operations provided by a communication library is important for many applications run on clusters. The communication structure of collective operations can be organized as a tree. Performance can be improved by configuring and mapping the tree to the clusters in use. We describe and demonstrate an approach for evaluating the performance of different configurations ...
    • NB-FEB: an easy-to-use and scalable universal synchronization primitive for parallel programming 

      Ha, Hoai Phuong; Tsigas, Philippas; Anshus, Otto J. (Research report, 2008-10)
      This paper addresses the problem of universal synchronization primitives that can support scalable thread synchronization for large-scale many-core architectures. The universal synchronization primitives that have been deployed widely in conventional architectures are the compare-and-swap (CAS) and load-linked/store-conditional (LL/SC) primitives. However, such synchronization primitives are ...
    • Using a virtual event space to understand parallel application communication behavior 

      Bongo, Lars Ailo; Anshus, Otto J.; Bjørndalen, John Markus (Research report, 2003)
      We have developed EventSpace, a configurable data collecting, management and observation system for monitoring low-level synchronization and communication events with the purpose of understanding the behavior of parallel applications on clusters and multi-clusters. Applications are instrumented by adding data collecting code in the form of event collectors to an application's communication paths. ...