Cogset : A High-Performance MapReduce Engine
Permanent link
https://hdl.handle.net/10037/3817
Date
2012-01-30
Type
Doctoral thesis
Author
Viken Valvåg, Steffen
Abstract
MapReduce has become a widely employed programming model for large-scale data-intensive computations. Traditional MapReduce engines employ dynamic routing of data as a core mechanism for fault tolerance and load balancing. An alternative mechanism is static routing, which reduces the need to store temporary copies of intermediate data, but requires a tighter coupling between the components for storage and processing. The initial intuition motivating our work is that reading and writing less temporary data could improve performance, while the tight coupling of storage and processing could be leveraged to improve data locality.
We therefore conjecture that a high-performance MapReduce engine can be based on static routing, while preserving the non-functional properties associated with traditional engines. To investigate this thesis, we design, implement, and experiment with Cogset, a distributed MapReduce engine that deviates considerably from the traditional design.
We evaluate the performance of Cogset by comparing it to a widely used traditional MapReduce engine using a previously established benchmark. The results confirm our thesis that a high-performance MapReduce engine can be based on static routing, although analysis indicates that the reasons for Cogset's performance improvements are more subtle than expected. Through our work we develop a better understanding of static routing, its benefits and limitations, and its ramifications for a MapReduce engine.
A secondary goal of our work is to explore how higher-level abstractions that are commonly built on top of MapReduce will interact with an execution engine based on static routing. Cogset is therefore designed with a generic, low-level core interface, upon which MapReduce is implemented as a relatively thin layer, as one of several supported programming interfaces.
At its core, Cogset provides a few fundamental mechanisms for reliable and distributed storage of data, and parallel processing of statically partitioned data. While this dissertation mainly focuses on how these capabilities are leveraged to implement a distributed MapReduce engine, we also demonstrate how two other higher-level abstractions were built on top of Cogset. These may serve as alternative access points for data-intensive applications, and illustrate how some of the lessons learned from Cogset can be applicable in a broader context.
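The general idea of processing statically partitioned data can be illustrated with a minimal, single-process sketch. This is not Cogset's interface or implementation, which this record does not describe; the names `NUM_PARTITIONS`, `partition_of`, `map_words`, and `run_word_count` are hypothetical, and the choice of a CRC32 hash is an assumption made only for the example. The sketch shows one possible reading of static routing: each key is assigned to a partition by a deterministic function fixed before the job starts, so every map output record has a single known destination.

```python
import zlib
from collections import defaultdict

# Hypothetical partition count, fixed before the job starts (assumption).
NUM_PARTITIONS = 4


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a key to a partition index.

    CRC32 is used because Python's built-in hash() of strings is salted
    per process and would not give a stable assignment across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def map_words(line: str):
    """Illustrative map function: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield word.lower(), 1


def run_word_count(lines, num_partitions: int = NUM_PARTITIONS) -> dict:
    """Word count in which every map output goes to its fixed partition."""
    # One bucket per partition; in a distributed setting each bucket would
    # live on a node chosen in advance (assumption for illustration).
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for line in lines:
        for key, value in map_words(line):
            partitions[partition_of(key, num_partitions)][key].append(value)

    # Each partition is reduced independently of the others.
    result = {}
    for part in partitions:
        for key, values in part.items():
            result[key] = sum(values)
    return result
```

Calling `run_word_count(["a b a", "b c"])` counts each word once per occurrence. Because `partition_of` depends only on the key and the partition count, all values for a given key land in the same partition regardless of which input line produced them.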
Description
The papers of this thesis are not available in Munin:
1. Steffen Viken Valvåg and Dag Johansen: 'Oivos : simple and efficient distributed data processing' (2008). In Proceedings of the 2008 Tenth IEEE International Conference on High Performance Computing and Communications (HPCC 2008), pages 113–122. IEEE Computer Society. Available at http://dx.doi.org/10.1109/HPCC.2008.105
2. Steffen Viken Valvåg and Dag Johansen: 'Update Maps : a new abstraction for High-Throughput Batch processing' (2009). In Proceedings of the 2009 IEEE International Conference on Networking, Architecture, and Storage (NAS 2009), pages 431–438. IEEE Computer Society. Available at http://dx.doi.org/10.1109/NAS.2009.73
3. Steffen Viken Valvåg and Dag Johansen: 'Cogset : a unified engine for reliable storage and parallel processing' (2009). In Proceedings of the 2009 Sixth IFIP International Conference on Network and Parallel Computing (NPC 2009), pages 174–181. IEEE Computer Society. Available at http://dx.doi.org/10.1109/NPC.2009.23
4. Steffen Viken Valvåg, Dag Johansen, and Åge Kvalnes: 'Cogset vs. Hadoop : measurements and analysis' (2010). In Proceedings of the 2010 Second IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2010), pages 768–775. IEEE Computer Society. Available at http://dx.doi.org/10.1109/CloudCom.2010.103
Publisher
Universitetet i Tromsø (University of Tromsø)
Copyright 2012 The Author(s)