dc.contributor.author: Bongo, Lars Ailo
dc.contributor.author: Anshus, Otto J.
dc.contributor.author: Bjørndalen, John Markus
dc.date.accessioned: 2006-11-28T07:48:30Z
dc.date.available: 2006-11-28T07:48:30Z
dc.date.issued: 2004
dc.description.abstract: The performance of the collective operations provided by a communication library is important for many applications run on clusters. The communication structure of collective operations can be organized as a tree. Performance can be improved by configuring and mapping the tree to the clusters in use. We describe and demonstrate an approach for evaluating the performance of different configurations and mappings of allreduce run on clusters of different sizes, consisting of single-CPU hosts and SMPs with different numbers of CPUs. A breakdown of the cost of allreduce using the best configuration on different clusters is provided. On all clusters, the broadcast part is more expensive than the reduce part. Inter-host communication contributes more to the time per allreduce than the synchronization in the allreduce components. For the small message sizes used (4 and 256 bytes), the time spent computing the partial reductions is insignificant. Reconfiguring hierarchy-aware trees improved performance by up to a factor of 1.49, by avoiding scalability problems of the components on SMPs, and by finding the right balance between available concurrency, load on 'root' hosts and the number of network links in a tree. Extending a tree by adding more threads, or by combining two trees, does not have a negative influence on the performance of a configuration, but increasing message size does. [en]
dc.format.extent: 152346 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10037/372
dc.identifier.urn: URN:NBN:no-uit_munin_217
dc.language.iso: eng [en]
dc.publisher: Universitetet i Tromsø [en]
dc.publisher: University of Tromsø [en]
dc.relation.ispartofseries: Tekniske rapporter / Institutt for informatikk 48(2004) [en]
dc.rights.accessRights: openAccess
dc.subject: VDP::Matematikk og Naturvitenskap: 400::Informasjons- og kommunikasjonsvitenskap: 420 [en]
dc.subject: VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550 [en]
dc.title: Evaluating the performance of the allreduce collective operation on clusters. Approach and results [en]
dc.type: Research report [en]
dc.type: Forskningsrapport [en]

