
A broker or link failure may stall publication flows and leave parts of the network near the failed region in an inconsistent state, such as an incorrect perception of the topology, or invalid entries in the routing tables. In such cases, the PADRES system recovers from the failure back to a correct operational state. Once a broker or link failure is detected, the triggered recovery procedure performs the following actions:
Cycles in the overlay are also exploited to speed up the recovery of publication flows. Cyclic networks include redundant paths and can improve the network's resiliency to failures, since a failure does not necessarily result in a network partition. The recovery algorithm uses local information gathered as part of normal system operation, and seamlessly routes publications around failures. The cycle detection component is also used by the PADRES load balancer to distribute load among brokers.
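To illustrate why redundant paths help, the following minimal sketch finds a path around a failed broker in a cyclic overlay. It is not the PADRES recovery algorithm: PADRES relies on local information, whereas this sketch assumes a global view of the overlay for simplicity. The function name `route_around_failure`, the adjacency-map representation, and the four-broker overlay are all hypothetical.

```python
from collections import deque


def route_around_failure(overlay, source, target, failed):
    """Return a shortest broker path from source to target that avoids
    every broker in `failed`, or None if no such path exists.

    Assumption: `overlay` is a global adjacency map (broker -> neighbors).
    A real broker would only have local knowledge of the topology.
    """
    if source in failed or target in failed:
        return None
    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        broker = queue.popleft()
        if broker == target:
            # Walk the parent links back to the source to rebuild the path.
            path = []
            while broker is not None:
                path.append(broker)
                broker = parent[broker]
            return path[::-1]
        for neighbor in overlay.get(broker, ()):
            if neighbor not in failed and neighbor not in parent:
                parent[neighbor] = broker
                queue.append(neighbor)
    return None


# Hypothetical cyclic overlay: A-B-D-C-A forms a cycle.
cyclic_overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

# Hypothetical acyclic overlay: the only A-to-D path goes through B.
acyclic_overlay = {
    "A": ["B"],
    "B": ["A", "D"],
    "D": ["B"],
}

path_with_cycle = route_around_failure(cyclic_overlay, "A", "D", failed={"B"})
path_without_cycle = route_around_failure(acyclic_overlay, "A", "D", failed={"B"})
```

In the cyclic overlay the failure of broker B leaves the path through C available, while in the acyclic overlay the same failure partitions A from D and the function returns `None`.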
Reliable publication routing ensures that once a publisher/subscriber routing path is constructed, no publications are lost. The reliable routing algorithm uses the services provided by the regular content-based routing protocols and the failure recovery algorithms to maintain an operational routing path between publishers and subscribers. It tolerates message loss (due to unreliable links or faulty brokers) in order to provide publication delivery guarantees.
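One common way to tolerate message loss is acknowledgment-based retransmission, sketched below under stated assumptions. This is a generic illustration, not the PADRES reliable routing algorithm, whose details are not given here. The classes `LossyLink` and `Receiver`, the function `publish_reliably`, and the deterministic drop pattern are hypothetical.

```python
class LossyLink:
    """Hypothetical link that drops the transmissions whose 0-based
    index appears in `drop_indices` and forwards all others."""

    def __init__(self, drop_indices):
        self.drop_indices = set(drop_indices)
        self.sent = 0

    def transmit(self, message):
        index = self.sent
        self.sent += 1
        return None if index in self.drop_indices else message


class Receiver:
    """Delivers each publication once, even if it arrives repeatedly."""

    def __init__(self):
        self.delivered = []
        self.seen = set()

    def receive(self, seq, payload):
        if seq not in self.seen:  # suppress duplicates from retransmission
            self.seen.add(seq)
            self.delivered.append(payload)
        return seq  # acknowledgment for this sequence number


def publish_reliably(publications, data_link, ack_link, receiver, max_rounds=5):
    """Retransmit every unacknowledged publication each round.
    Returns True once all publications are acknowledged, or False if
    `max_rounds` is exhausted first."""
    unacked = dict(enumerate(publications))
    for _ in range(max_rounds):
        if not unacked:
            break
        for seq in sorted(unacked):  # sorted() copies the keys, so deletion is safe
            arrived = data_link.transmit((seq, unacked[seq]))
            if arrived is None:
                continue  # publication lost; retry next round
            ack = ack_link.transmit(receiver.receive(*arrived))
            if ack is not None:
                del unacked[ack]  # acknowledged; stop retransmitting
    return not unacked


receiver = Receiver()
data_link = LossyLink(drop_indices={1})  # loses the first copy of "p1"
ack_link = LossyLink(drop_indices={0})   # loses the first ack for "p0"
all_acked = publish_reliably(["p0", "p1", "p2"], data_link, ack_link, receiver)
```

The sender keeps each publication until it is acknowledged, and the receiver filters duplicates, so a lost publication or a lost acknowledgment costs only an extra transmission. This sketch does not preserve publication order; a real system would need additional sequencing for that.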