Fault-Tolerant Distributed Declarative Programs
Permanent link
https://hdl.handle.net/10037/34244
Date
2024-06-02
Type
Master thesis
Author
Jörg, Moritz
Abstract
In our increasingly interconnected digital landscape, the constant generation and consumption of data on various computing devices make it challenging to keep that data accessible at all times, particularly when network connectivity is intermittent. The emerging focus on distributed systems aims not only to manage substantial data volumes but also to guarantee storage on devices for low latency and high availability. A paradigm known as local-first software prioritizes the storage of data on end-user devices as opposed to relying solely on centralized cloud services.
The intersection of Conflict-free Replicated Data Types (CRDTs) and Datalog, exemplified by Consistency as Logical Monotonicity (CALM), establishes that monotonic logic programs guarantee eventual consistency without the need for coordination. This synergy enables robust reasoning about data consistency and parallelism, paving the way for the Partitioned and Replicated Asynchronous Datalog (PRAD) runtime. Transforming sequential Datalog programs into distributed ones, PRAD ensures that the distributed program meets the specified availability, parallelism, and fault-tolerance requirements. To achieve this, PRAD augments Datalog programs with semiring data provenance and equips the provenance expressions with a CRDT.
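The CALM intuition can be illustrated with a minimal Python sketch. It is not PRAD code: the `closure` and `incremental` helpers and the example edges are hypothetical, and the naive fixpoint loop merely stands in for a Datalog engine. It evaluates the monotonic transitive-closure program `path(X,Y) :- edge(X,Y). path(X,Z) :- path(X,Y), path(Y,Z).` and checks that every arrival order of the edge facts yields the same result.

```python
from itertools import permutations

def closure(edges):
    """Naive fixpoint for transitive closure (a monotonic program)."""
    path = set(edges)
    while True:
        # Join path with itself on the shared middle node.
        new = {(x, z) for (x, y) in path for (y2, z) in path if y == y2} - path
        if not new:
            return path
        path |= new

def incremental(batches):
    """Re-derive facts as batches of edges arrive; facts are only ever added."""
    known = set()
    for batch in batches:
        known = closure(known | set(batch))
    return known

# Hypothetical input facts, delivered one at a time in every possible order.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
results = {
    frozenset(incremental([[e] for e in order]))
    for order in permutations(edges)
}
```

Because derived facts are never retracted, `results` contains a single set regardless of delivery order, which is the property that lets such programs run without coordination.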
One limitation of the current PRAD runtime is that it lacks a recovery mechanism. If a site goes offline or crashes, the data on that site is lost and the system's fault tolerance is compromised. To address this issue, a repair mechanism can be implemented to restore or replicate the lost site. This comes with the added benefit of increasing the system's fault tolerance and availability.
The main contribution of this thesis is the development of a novel approach to repairing a PRAD program at runtime. The repair mechanism is designed to restore an offline site by leveraging our Lightweight Commits (LWCs) with the help of the Causal Length Set (Cl-Set) CRDT. It draws inspiration from the Git and Pijul distributed version control systems and applies their principles to the PRAD runtime. The repair mechanism is designed to be lightweight and efficient, ensuring that the system can repair failures without compromising performance. Our approach differs from previous work in that we do not rely on vector clocks or sequence numbers. Instead, we utilize the Cl-Set CRDT to accommodate messages that may be delivered out of order or are duplicated.
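To make the role of the Cl-Set concrete, here is a minimal Python sketch of a causal-length set as it is commonly described in the CRDT literature: each element carries an integer causal length, an odd length means the element is present, and merging takes the per-element maximum. This is an illustration only and not the thesis's implementation; the `CLSet` class, its method names, and the `"fact"` element are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class CLSet:
    # Maps each element to its causal length (0 or missing = never added).
    lengths: dict = field(default_factory=dict)

    def contains(self, x):
        # Odd causal length means the element is currently in the set.
        return self.lengths.get(x, 0) % 2 == 1

    def add(self, x):
        if not self.contains(x):
            self.lengths[x] = self.lengths.get(x, 0) + 1

    def remove(self, x):
        if self.contains(x):
            self.lengths[x] += 1

    def merge(self, other):
        # Per-element maximum: commutative, associative, and idempotent.
        for x, n in other.lengths.items():
            self.lengths[x] = max(self.lengths.get(x, 0), n)

    def elements(self):
        return {x for x, n in self.lengths.items() if n % 2 == 1}

# Hypothetical scenario: site a adds a fact, site b later removes it.
a, b = CLSet(), CLSet()
a.add("fact")
old_a = CLSet(dict(a.lengths))  # snapshot of a's earlier state
b.merge(a)
b.remove("fact")

c = CLSet()
c.merge(b)      # the newer state arrives first
c.merge(old_a)  # the older state arrives late
c.merge(b)      # the newer state is delivered a second time
```

Because `merge` only takes maxima, site `c` ends with the fact removed even though the states arrived out of order and one was duplicated, and no vector clock or sequence number is needed.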
The approach is evaluated through a series of experiments, in which the performance of the repair mechanism is compared to that of the existing PRAD runtime and multiple alternative approaches. The results demonstrate that the repair mechanism is both efficient and lightweight, and that it can restore a site in a reasonable amount of time, thereby confirming the viability of our approach for edge devices.
Publisher
UiT Norges arktiske universitet / UiT The Arctic University of Norway
Copyright 2024 The Author(s)