δ-Fault-tolerant Publish/Subscribe Systems

Reza Sherafat and Hans-Arno Jacobsen.

CSRG-570, Middleware Systems Research Group, University of T, November 2007.

Abstract

In this paper, we study reliable distributed publish/subscribe (P/S) systems that can “tolerate” multiple simultaneous node crash failures. We formally define a routing consistency property, and propose scalable algorithms that establish and maintain consistency in order to guarantee reliable, in-order, and duplicate-free delivery of messages. Furthermore, we introduce a system configuration parameter, δ, that corresponds to the maximum number of simultaneous node failures that do not compromise P/S reliability guarantees or prevent system operations. This is achieved via replication of routing information in a fully decentralized manner compatible with the multicast nature of distributed P/S systems. Moreover, we assume node failures are transient and devise algorithms that allow failed nodes to “recover” by synchronizing with other nodes. A recovered node acts as if it had never failed, and can fully participate in message forwarding.
