The design and correctness of a communication facility for a distributed computer system
are reported on. The facilty provides support for fault-tolerant process groups in the
form of a family of reliable multicast protocols that can be used in both local- and
wide-area networks. These protocols attain high levels of concurrency, while respecting
application-specific delivery ordering constraints, and have varying cost and performance
that depend on the degree of ordering desired. In particular, a protocol that enforces
causal delivery orderings is introduced and shown to be a valuable alternative to
conventional asynchronous communication protocols. The facilty also ensures that the
processes belonging to a fault-tolerant process group will observe consistent orderings
of events affecting the group as a whole, including process failures, recoveries,
migration, and dynamic changes to group properties like member rankings. A review of
several uses for the protocol in the ISIS system, which supports fault-tolerant resilient
objects and bulletin boards, illustrates the significant simplication of higher level
algorithms made possible by our approach.