Practical Fault-Tolerance for Mobile Agents
Permanent link
https://hdl.handle.net/10037/3661
Date
2011-10-18
Type
Doctoral thesis
Author
Jacobsen, Kjetil
Abstract
The amount of computational resources available on the Internet is increasing, but using these resources effectively for distributed computations is challenging. An infrastructure known as computational grids provides tools for structuring and deploying large-scale distributed computations on the Internet. One of the key problems in computational grids is managing the available computational resources, and tools based on mobile agents are being advocated to solve this problem. To be widely adopted, however, such tools must be robust to failures in the grid environment, and they therefore require effective mechanisms for mobile agent fault tolerance.
To gain insight into how grid applications perform on the Internet, this dissertation investigates two master-worker algorithms, one based on group communication and one based on message flooding. Both algorithms are executed in simulations driven by Internet communication traces. The results from running and evaluating the algorithms are used to infer requirements for our approach to mobile agent fault tolerance. The dissertation then derives a fault-tolerant mobile agent protocol. The protocol is rooted in the primary-backup approach, in which a set of backups monitors the progress of the mobile agent during the computation. The protocol allows the set of backups to be changed during the computation to adapt to the current network topology. The dissertation then describes an implementation of the protocol on top of a mobile agent platform and evaluates its performance. The results show that explicit management of backups can improve performance, and that the protocol is applicable beyond mobile agent computations.
The amount of computational resources available on the Internet keeps growing. How to manage these resources as efficiently as possible remains a challenge. An infrastructure called computational grids offers tools for structuring and maintaining large-scale applications on the Internet. One of the key problems in grids is how to manage the available computational resources, and mobile agents are a technology with the potential to solve this problem. The purpose of mobile agents is to move programs close to the data they access during execution. To run effectively in grids, a solution based on mobile agents requires mechanisms for fault tolerance.
To gain insight into how computations on grids behave, this dissertation investigates two fundamental distributed algorithms. The first algorithm is based on group communication, while the second uses message broadcasting as its communication mechanism. The results of these analyses form the basis for deriving a protocol for fault-tolerant mobile agents. This protocol monitors the progress of a mobile agent and allows the monitoring to be explicitly adapted to optimize performance and robustness against execution failures and communication failures. The dissertation derives and evaluates an implementation of the protocol in a mobile agent system. The results show that explicitly adapting fault tolerance to the execution environment can improve the performance of the computation. The protocol is also applicable in environments that are not based on mobile agents.
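The abstract describes a primary-backup scheme in which backups track the progress of a mobile agent and the backup set can be changed during the computation. The following is a minimal, single-process simulation of that general idea, not the thesis's protocol: every name (`Checkpoint`, `Backup`, `run_itinerary`, the `reconfigure` hook) is hypothetical, there is no real network or failure detector, and it assumes a crashed step can simply be retried from the latest checkpoint held by a live backup.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Checkpoint:
    step: int   # index of the next host in the itinerary
    state: int  # agent state before executing that step


@dataclass
class Backup:
    name: str
    alive: bool = True
    last: Optional[Checkpoint] = None

    def record(self, cp: Checkpoint) -> None:
        # A live backup keeps only the most recent checkpoint of the agent.
        if self.alive:
            self.last = cp


def run_itinerary(
    itinerary: List[str],
    work: Callable[[str, int], int],
    backups: List[Backup],
    crashes: Set[str],
    reconfigure: Callable[[int, List[Backup]], List[Backup]],
) -> Tuple[int, List[str]]:
    """Hypothetical primary-backup sketch for an agent visiting hosts in order."""
    state, step = 0, 0
    pending_crashes = set(crashes)
    log: List[str] = []
    while step < len(itinerary):
        # The backup set may change between steps (e.g. to follow the agent);
        # every current backup is then brought up to date with a checkpoint.
        backups = reconfigure(step, backups)
        for b in backups:
            b.record(Checkpoint(step, state))
        host = itinerary[step]
        if host in pending_crashes:
            pending_crashes.discard(host)  # transient crash: the step is retried once
            survivor = next((b for b in backups if b.alive and b.last is not None), None)
            if survivor is None:
                raise RuntimeError("agent lost: no live backup holds a checkpoint")
            step, state = survivor.last.step, survivor.last.state
            log.append(f"{host} crashed; {survivor.name} restarts step {step}")
            continue
        state = work(host, state)
        log.append(f"{host} done")
        step += 1
    return state, log


def add_name_length(host: str, state: int) -> int:
    return state + len(host)  # stand-in for real per-host work


def swap_b1_for_b3(step: int, backups: List[Backup]) -> List[Backup]:
    # Example policy: at step 2, replace backup b1 with a new backup b3.
    if step == 2 and any(b.name == "b1" for b in backups):
        return [b for b in backups if b.name != "b1"] + [Backup("b3")]
    return backups


final, log = run_itinerary(
    ["h1", "hh2", "hhh3"],
    add_name_length,
    [Backup("b1"), Backup("b2", alive=False)],
    {"hhh3"},
    swap_b1_for_b3,
)
print(final)  # 9
print(log)
```

In this sketch the agent is checkpointed to every current backup before each step, so a crash costs at most one re-executed step, and the `reconfigure` hook stands in for adapting the backup set to the network. A real protocol would also need failure detection and agreement on which backup takes over, both of which the simulation omits.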
Description
The papers of this thesis are not available in Munin due to publishers' restrictions:
1. Kjetil Jacobsen and Dag Johansen: 'Ubiquitous devices united: enabling distributed computing through mobile code', In Proceedings of the 1999 ACM Symposium on Applied Computing (ACM SAC 1999), pages 399–404. Available at http://dx.doi.org/10.1145/298151.298395
2. Dag Johansen, Keith Marzullo, Fred B. Schneider, Kjetil Jacobsen and Dmitrii Zagorodnov: 'NAP: Practical Fault-Tolerance for Itinerant Computations', In Proceedings of the 19th IEEE International Conference on Distributed Computing Systems (ICDCS 1999), pages 180–189. Available at http://dx.doi.org/10.1109/ICDCS.1999.776519
3. Kjetil Jacobsen, Xianan Zhang and Keith Marzullo: 'Group membership and wide-area master-worker computations', In Proceedings of the 23rd IEEE International Conference on Distributed Computing Systems (ICDCS 2003), pages 570–579. Available at http://dx.doi.org/10.1109/ICDCS.2003.1203508
Publisher
Universitetet i Tromsø (University of Tromsø)
Copyright 2011 The Author(s)