dc.contributor.advisor | Anshus, Otto | |
dc.contributor.advisor | Stødle, Daniel | |
dc.contributor.advisor | Stien Hagen, Tor-Magne | |
dc.contributor.author | Nilsen, Arild | |
dc.date.accessioned | 2012-03-19T12:36:14Z | |
dc.date.available | 2012-03-19T12:36:14Z | |
dc.date.issued | 2011-11 | |
dc.description.abstract | To achieve low overhead, traditional cluster monitoring systems sample data at low frequencies and with coarse granularity. However, interactive monitoring requires frequent (up to 60 Hz) sampling of fine-grained data, along with visualization tools that can explore and display the data in near real-time. This makes traditional cluster monitoring systems unsuited for interactive monitoring of distributed cluster applications: they fail to capture short-duration events, which makes it difficult to understand the performance relationships between processes on the same or different nodes. To address this issue, WallMon was developed as a tool for interactive visual exploration of performance behavior in distributed systems. For data gathering, WallMon is centered around an abstraction of collectors and handlers: collectors gather data of interest, such as CPU and memory usage, and forward it to handlers in a push-based fashion, while handlers take action upon the data. WallMon captures and visualizes data for every process on every node, as well as overall node statistics. Data is visualized using a technique inspired by the concept of information flocking. WallMon's design is based on the client-server model, and it is extensible through a module system that encapsulates functionality specific to monitoring (collectors) and visualization (handlers). A set of experiments has been carried out on a cluster of 29 nodes with 180 processes per node. Performance results show 7% CPU usage (out of 100%) at a 64 Hz sampling rate when performing process-level monitoring with WallMon. Using WallMon's interactive visualization, we have observed interesting patterns in different parallel and distributed systems, such as an unexpected ratio of user-level to kernel-level execution among processes in a particular distributed system. | en |
dc.identifier.uri | https://hdl.handle.net/10037/3991 | |
dc.identifier.urn | URN:NBN:no-uit_munin_3713 | |
dc.language.iso | eng | en |
dc.publisher | Universitetet i Tromsø | en |
dc.publisher | University of Tromsø | en |
dc.rights.accessRights | openAccess | |
dc.rights.holder | Copyright 2011 The Author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/3.0 | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) | en_US |
dc.subject.courseID | INF-3990 | en |
dc.subject | VDP::Technology: 500::Information and communication technology: 550 | en |
dc.subject | VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551 | en |
dc.title | WallMon : Interactive distributed monitoring of process-level resource usage on display and compute clusters | en |
dc.type | Master thesis | en |
dc.type | Mastergradsoppgave | en |