
dc.contributor.advisor  Ha, Phuong Hoai
dc.contributor.advisor  Bongo, Lars Ailo
dc.contributor.advisor  Anshus, Otto
dc.contributor.author   Henriksen, Torje Starbo
dc.date.accessioned     2009-01-26T11:42:02Z
dc.date.available       2009-01-26T11:42:02Z
dc.date.issued          2008-10-15
dc.description.abstract  The microprocessor industry has reached the limits of sequential processing power due to power-efficiency and heat problems. As integrated-circuit technology moves forward, chip multithreading has become the trend, increasing parallel processing power instead. This shift of focus has resulted in the vast majority of supercomputers being built from chip multiprocessors. While the high-performance computing community has long written parallel applications using libraries such as MPI, the performance characteristics have changed from the traditional single-core cluster to the current generation of multi-core clusters: more communication takes place between processes on the same node, and processes run on cores that share hardware resources such as cache and the memory bus. We explore the possibilities of optimizing a widely used MPI implementation, Open MPI, to minimize the overhead of communication between processes running on a single node. We take three approaches to optimization. First, we measure the message-passing latency between the different cores, and reduce the latency for large messages by keeping the sender and receiver synchronized. Second, we increase scalability with two new queue designs that reduce the number of communication queues that must be polled to receive messages. Third, we experiment with mapping a parallel application to different cores, using only a single node; the mapping is done dynamically at runtime, with no prior knowledge of the application's communication pattern. Our results show that for large messages sent between cores sharing a cache, message-passing latency can be reduced significantly. Results from running the NAS Parallel Benchmarks with the new queue designs show that Open MPI scales better when running more than 64 processes on a single node. Our dynamic mapper performs close to our manual mapping, but rarely increases performance. The experimental results show that the three techniques improve performance in different scenarios. Combining techniques like these with other optimizations may be a key to unlocking parallel performance for a broader range of parallel applications. [en]
dc.format.extent        1830164 bytes
dc.format.extent        2072 bytes
dc.format.mimetype      application/pdf
dc.format.mimetype      text/plain
dc.identifier.uri       https://hdl.handle.net/10037/1735
dc.identifier.urn       URN:NBN:no-uit_munin_1504
dc.language.iso         eng [en]
dc.publisher            Universitetet i Tromsø [en]
dc.publisher            University of Tromsø [en]
dc.rights.accessRights  openAccess
dc.rights.holder        Copyright 2008 The Author(s)
dc.subject.courseID     INF-3981 [nor]
dc.subject              VDP::Mathematics and natural science: 400::Information and communication science: 420::Theoretical computer science, programming languages and programming theory: 421 [en]
dc.subject              MPI [en]
dc.title                Efficient intra-node Communication for Chip Multiprocessors [en]
dc.type                 Master thesis [en]
dc.type                 Mastergradsoppgave [en]
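
The message-passing latency measurement described in the abstract can be illustrated with a minimal MPI ping-pong benchmark between two ranks. This is a sketch under assumptions of my own (1 MiB messages, 1000 iterations, the file name pingpong.c); it is not the benchmark code from the thesis.

/* pingpong.c: measure average one-way message-passing latency
 * between MPI ranks 0 and 1. Message size and iteration count
 * are illustrative assumptions, not values from the thesis. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    const int iters = 1000;
    const int size = 1 << 20;      /* 1 MiB: a "large" message */
    int rank;
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(size);

    MPI_Barrier(MPI_COMM_WORLD);   /* start both ranks together */
    t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)                 /* each round trip is two messages */
        printf("avg one-way latency: %.2f us\n",
               (t1 - t0) / (2.0 * iters) * 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}

Running the two ranks once on cores that share a cache and once on cores that do not (the process-binding flags vary between Open MPI versions) exposes the kind of latency differences the abstract refers to.

The dynamic core mapping can be built on the Linux primitive sched_setaffinity(2); the thesis's mapper itself is not reproduced here, so this only shows how a process migrates itself to a chosen core. The helper name pin_to_core is hypothetical.

/* Pin the calling process to a single core (Linux-specific sketch;
 * pin_to_core is a hypothetical helper, not code from the thesis). */
#define _GNU_SOURCE
#include <sched.h>

int pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* pid 0 = calling process; returns 0 on success, -1 on error */
    return sched_setaffinity(0, sizeof(cpu_set_t), &set);
}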

