COSC6117, Fall 2003
COSC6117
Theory of Distributed Computing
Fall 2003
Instructor: Eric Ruppert
Office: Computer Science Building, room 3042
Telephone: (416) 736-2100 ext. 33979
Facsimile: (416) 736-5872
Lectures: MW 14:30-16:00 in room 126 of Winters College
Office hours: send me email to make an appointment, or just drop by
my office whenever I'm in.
Announcements
- Nov 10: The deadline for Exercise 5 was extended to Nov 17.
- Nov 5: The deadline for Exercise 4 was extended to Nov 10.
- Oct 7: For Exercise 1, you can assume
processes (and shared variables) have
unique names.
- Sept 18: I put some information about the presentation below.
- Sept 18: I will be away the week of Sept 29, so there will be no
lecture on Sept 29 or Oct 1. Instead we will
have extra meetings on Wed Dec 3 and Mon Dec 8
for student presentations. Also note that Oct 6 and Oct 13 are
holidays, so there will be no classes then.
- Sept 17: Starting Sept 22, classes will be in room 126 of
Winters College.
- Sept 8: Because of all the last-minute timetable changes, today's
(shortened) class will be 15:00-16:00 in S501 Ross. This is to make
sure that nobody misses the class because they
looked at an older version of the timetable.
Course Description
Can a given problem be solved in a distributed system? If so, how
efficiently can it be solved? We investigate how the answers to these
questions depend on aspects of the underlying distributed system
including synchrony, fault-tolerance and the means of communication
between processes.
A tentative list of topics:
- shared-memory and message-passing models of distributed systems,
- I/O automata,
- mutual exclusion,
- agreement problems (consensus, leader-election, Byzantine agreement, approximate agreement),
- broadcast and multicast algorithms,
- impossibility results and lower bounds,
- the consensus hierarchy,
- implementing shared data structures,
- randomization in distributed computing,
- self-stabilization, and
- applications of topology to distributed computing.
Marking scheme
| Homework exercises | 80% |
| Class presentation | 20% |
The presentation will be a 25-minute talk summarizing the results of a
research paper on the theory of distributed computing that you find in
the literature. Before you start working on this, you should check
with me that the paper you choose is appropriate. You will also have
to hand in a short (~2 pages) written
summary of what you are presenting. Good places to look for
a paper include
PODC
or
DISC
conference proceedings or the journal
Distributed Computing.
This survey paper has a bibliography containing lots of papers that would be suitable to choose.
If you have a topic in mind, I might be able to help you find a
good paper if you come talk to me.
Lectures
The references below are intended for students who want to read more about the topics discussed in class. Sometimes the readings might be helpful for the assignments. Sometimes they will extend the ideas covered in lectures.
- Sep 8: Introduction. Impossibility of Two Generals Problem.
- Sep 10: Dijkstra's mutual exclusion algorithm. Overview of
models of distributed computing. Reading: [Dij65] and perhaps [LL90]
or [Lyn96].
- Sep 15: Complexity measures. I/O automata.
Flooding in a MP system to do broadcasts and to set up spanning tree.
References: [Lyn96] for I/O automata; [AW98], Sections 2.1-2.3 for
broadcasting.
- Sep 17: Broadcasting, continued. Minimum Spanning Tree
algorithm. Reference [Lyn96], Section 4.4.
- Sep 22: Finishing up the MST algorithm. Leader Election in a ring.
Reference [AW98], Sections 3.1-3.3.
- Sep 24: Leader Election lower bound. Reference [AW98], Section 3.3.3.
Anonymous randomized leader election: expected O(n^2) algorithm
when n known; impossibility result when n unknown. Reference [AW98], Section 14.1; [Ang80].
- Oct 8: Consensus with Byzantine failures. References for
impossibility [AW98, Sec 5.2.3; Lyn96, Sec 6.4, 6.5] and for algorithm [BG93 or AW98, Sec
5.2.5].
- Oct 15: Consensus with Byzantine failures, continued. Algorithm for
synchronous consensus tolerating halting failures. [AW98], Sections
5.1.1-5.1.3
- Oct 20: Lower bound on time complexity of synchronous consensus
tolerating halting failures. [AW98], Section 5.1.4, or [Lyn96], Section
6.7. (The original proof of the lower bound was in [DM90].)
- Oct 22: Asynchronous consensus: impossibility results. References
[FLP85] or [AW98], Section 5.3, or [Lyn96], Chapter 12.
- Oct 27: Impossibility of consensus using registers for 1 halting failure.
[FLP85] or [Lyn96], Chapter 12. More general shared-memory systems, linearizability, implementations. [HW90], [AW98] Chapter 9, or [Lyn96] Chapters 9 and 13.1.
- Oct 29: The consensus hierarchy [Her91].
- Nov 3: Implementing counters and snapshots. [AADGMS93] or [Lyn96], Sections 13.3.1, 13.3.2.
- Nov 5: Snapshots, continued. A universal construction. [Her91].
- Nov 10: Universal construction, continued. Swmr register from swsr register [IL93], [AW98] chapter 10.2.2.
- Nov 12: Continuing discussion of building registers.
- Nov 17: Clock synchronization. [AW98] Chapter 6.3.
- Nov 19: Clock synchronization. References [HMM85] for "radius" algorithm, [BW01] for diameter/2 lower bound.
- Nov 24: Clock synchronization with Byzantine failures [LM85].
- Nov 26: Renaming. [AW98], Chapter 17.3 and [MA94].
- Dec 1: Order-preserving renaming [ABDPR90]. Conclusions.
Then 1 presentation: Sergey Kulikov (Obstruction-free Synchronization: Double-ended Queues as an Example).
- Dec 2: 3 presentations. Irena Yasnitsky (On the Generalized Dining Philosophers Problem), Anton Belov (Computing on an Anonymous Ring), Frank Xu (Fast randomized consensus using shared memory).
- Dec 3: 3 presentations. Baoqing Guan (Lock-Free Linked Lists Using Compare-and-Swap), Xing Xiong (Leader Election in Uniform Rings), Frank Tan (How to play ANY mental game).
- Dec 8: 3 presentations. Kenn Baker (Clustering Algorithms for Wireless Ad Hoc Networks), Kien Kim Huynh (Leader election in mobile ad hoc networks), Wei Ning (Process Synchronization: Design and Evaluation of Distributed Algorithms).
References
There is no required textbook for the course. However, I shall sometimes recommend readings from books or papers. These references will be listed here, and the list will grow during the term. Accessing some of the links below may require you to be logged into a machine at York, so that you can access the ACM Digital Library, etc.
Books (on reserve in Science Library)
Papers
- [AADGMS93] Yehuda Afek, Hagit Attiya, Danny Dolev, Eli Gafni, Michael Merritt and Nir Shavit. Atomic snapshots of shared memory. J. of the ACM, 40(4), pages 873-890, September 1993.
- [Ang80] D. Angluin. Local and global properties in networks of processors. In Proc. 12th ACM Symposium on Theory of Computing, pages 82-93, 1980.
- [ABDPR90] Hagit Attiya, Amotz Bar-Noy, Danny Dolev, David Peleg and R\uuml;diger Reischuk. Renaming in an asynchronous environment. J. of the ACM, 37(3), pages 524-548, July 1990.
- [AGPV90] Baruch Awerbuch, Oded Goldreich, David Peleg and Ronen Vainish. A trade-off between information and communication in broadcast protocols. J. of the ACM, 37(2), pages 238-256, April 1990.
- [BG93] P. Berman and J. Garay. Cloture votes: n/4-resilient distributed consensus in t+1 rounds. Mathematical Systems Theory 26(1), pages 3-19, 1993.
- [BW01] Saâd Biaz and Jennifer L. Welch. Closed form bounds for clock synchronization under simple uncertainty assumptions. Information Processing Letters, 80, pages 151-157, 2001.
- [Dij65] E. W. Dijkstra. Solution of a problem in concurrent programming control . Communications of the ACM, 8(9), page 569, September, 1965. A very early paper on mutual exclusion.
- [DM90] Cynthia Dwork and Yoram Moses. Knowledge and common knowledge in a Byzantine environment: Crash failures. Information and Computation, 88(2), pages 156-186, October 1990.
- [FLP85] Michael J. Fischer, Nancy A. Lynch and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2), pages 374-382, April 1985.
- [HMM85] Joseph Y. Halpern, Nimrod Megiddo and Ashfaq A. Munshi. Optimal precision in the presence of uncertainty. Journal of Complexity, 1, pages 170-196, 1985.
- [Her91] Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1), pages 124-149, January 1991.
- [HW90] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3), pages 463-492, July 1990.
- [IL93] Amos Israeli and Ming Li. Bounded time-stamps. Distributed Computing, 6(4), pages 205-209, July, 1993.
- [LL90] Leslie Lamport and Nancy Lynch. Distributed Computing: Models and methods. In Handbook of Theoretical Computer Science, Volume B, Chapter 18, Elsevier, 1990. (On reserve in Steacie Library.) A good, brief introduction to the area. It describes aspects of distributed models and surveys some important early results.
- [LM85] Leslie Lamport and P. M. Melliar-Smith. Synchronizing clocks in the presence of faults. J. of the ACM, 32(1), pages 52-78, 1985.
- [MA94] Mark Moir and James H. Anderson. Fast, Long-Lived Renaming. In Proc. of the 8th International Workshop on Distributed Algorithms, pages 141-155, 1994. (There have been several better versions of adaptive renaming algorithms published afterwards, but I just want to cover the most basic version of the splitter algorithm. See Mark Moir's page for some of the improvements.)
Web Pages
This course had a web page the last time it was taught.
Exercises

Updated November 27, 2003