COSC6117, Fall 2004
COSC6117
Theory of Distributed Computing
Fall 2004
Instructor: Eric Ruppert
Office: Computer Science Building, room 3042
Telephone: (416) 736-2100 ext. 33979
Facsimile: (416) 736-5872
Lectures: MW 16:00-17:30 in room 125 of Winters College
Office hours: send me email to make an appointment, or just drop by
my office whenever I'm in.
Announcements
- Dec 7: The class on Dec 9 will be held in CSEB3033 from 1:30 to 3:30.
- Dec 2: It's a busy time of year, so you can hand in exercise 8/9 on Dec 13 if you like.
- Dec 1: I forgot to bring the copies of the optional exercise to class with me, so I put them in as many of your mailboxes as I could find. It's also posted below. You don't have to do the exercise, but it will be worth bonus points if you do. I'm hoping that you'll choose different parts to do so that I'll be left with a variety of models. (Don't all choose the same one.)
- Nov 22: Course evaluations will be done on Nov 24 in class.
- Nov 19: By popular demand, the deadline for exercise 6 was extended to Mon, November 22.
- Nov 19: I have switched the last day of presentations from Dec 8 to Dec 9 so that you can attend the department's christmas party.
- Nov 10: Time is running out: if you haven't told me which paper you are presenting, you should do so very soon.
- Nov 10: For exercise 6, assume processes know n, the number of processes in the system.
- Nov 10: The deadline for the exercise due this week has been postponed until Friday.
- Nov 1: There was a little bug in Exercise 5. The if statement should test whether k_1 - k_2 is bigger than 2f, not f. The version below has been corrected. Sorry about this.
- Oct 19: Exercise 4 has been posted below.
- Oct 12: Just to clarify: For exercise 3, the diameter refers to the maximum (over all pairs of vertices) of the minimum number of hops along a path between the pair. (The diameter does not refer to distances with respect to the edge weights: think of the weights as the amount of money you have to pay to use the edge, whereas the time to send a message across any edge is 1, since the system is synchronous.)
- Sept 30: Rachid Guerraoui will come to York on Oct 18 to give a seminar on distributed computing. You are strongly encouraged to attend. See this page for details.
- Sept 28: I will be out of town Oct 1 to Oct 10, and may not be checking email.
- Sept 27: Clarification for Ex 2: Each process has its own local memory that only it can access. You can use as much local memory as you like. To simplify the problem, you can assume that a process's steps that access its own local memory are never skipped; only steps that access shared objects might be skipped.
- Sept 26: Hint for Ex 2: Use the n registers as single-writer registers. I.e. only process number i will write to register number i, but every process will read from every register.
- Sept 26: I forgot to post exercise 3 last week. It is posted now.
- There will be no class on October 4 or 6. Instead, I have scheduled two extra classes at the end the term for you to do presentations. Those classes are on December 7 and 8 (from 4:00-5:30).
- Graduate students in their first two years of grad school are eligible to compete in the ACM programming contest. The regional contest is in November, and we are getting teams trained. If you would like to participate, see this link or just come to the practice session on Wednesday, September 15 from 6:30 to 9:00 in CSE 1004.
Course Description
Can a given problem be solved in a distributed system? If so, how
efficiently can it be solved? We investigate how the answers to these
questions depend on aspects of the underlying distributed system
including synchrony, fault-tolerance and the means of communication
between processes.
A tentative list of topics:
- shared-memory and message-passing models of distributed systems,
- I/O automata,
- mutual exclusion,
- agreement problems (consensus, leader-election, Byzantine agreement, approximate agreement),
- broadcast and multicast algorithms,
- impossibility results and lower bounds,
- the consensus hierarchy,
- implementing shared data structures,
- randomization in distributed computing,
- self-stabilization, and
- applications of topology to distributed computing.
Marking scheme
| Homework exercises | 80% |
| Class presentation | 20% |
The presentation will be a 25-minute talk summarizing the results of a
research paper on the theory of distributed computing that you find in
the literature. Before you start working on this, you should check
with me that the paper you choose is appropriate. You will also have
to hand in a short (~2 pages) written
summary of what you are presenting. Good places to look for
a paper include
PODC
or
DISC
conference proceedings or the journal
Distributed Computing.
This survey paper has a bibliography containing lots of papers that would be suitable to choose.
If you have a topic in mind, I might be able to help you find a
good paper if you come talk to me.
Lectures
The references below are intended for students who want to read more about the topics discussed in class. Sometimes the readings might be helpful for the assignments. Sometimes they will extend the ideas covered in lectures.
- Sep 8: Introduction. Why study theory of distributed computing? Two Generals. Building a 2-component snapshot object from read/write registers. Supplementary reading: [Lyn96] Section 5.1.
- Sep 13: Snapshot object continued. Models of distributed computing. Supplementary reading [LL90].
- Sep 15: Simulations between models. Broadcast. [AW04], Sections 2.1-2.3.
- Sep 20: Minimum Spanning Tree algorithm by Gallagher, Humblet and Spira (synchronous version only). Reading: [Lyn96] Section 4.4.
- Sep 22: Leader Election in a ring. Reading: [Ang80]; [AW04] Chapter 3 or [Lyn96] Chapter 3.
- Sep 27: Leader Election in a ring: complexity lower bound, randomized algorithm. Reading [AW04] Chapter 3, 14.1.
- Sep 29: Using I/O automata to model distributed systems formally. Linearizability. Reading [Lyn96] chapter 8, [HW90].
- October 13: Linearizability. Wait-free consensus is impossible using registers. [Her91] or [Lyn96], chapter 12.
- October 18: 1-resilient consensus is impossible using registers or message passing. Consensus using queues, compare&swap. [Her91] or [Lyn96], chapter 12.
- October 20: Consensus numbers [Her91]. Consensus with Byzantine failures [AW04, Sec 5.2] or [Lyn96, Sec 6.4,6.5].
- October 25: Byzantine consensus in non-complete network graphs. [AW04] Section 5.1.
- October 27: Synchronous consensus with halting failures: algorithm and lower bound. [AW04] Section 5.1 or [Lyn96], Section 6.7.
- November 1: Herlihy's universal construction ([Her91] or [AW04], Sec 15.2). Implementing snapshots from registers ([AADGMS93] or [AW04], chapter 10).
- November 3: Snapshots from registers, counters from snapshots (and from registers), mw register from sw snapshot.
- November 8: Register constructions (mwmr from sw snapshot, swmr from swsr)([AW04], chapter 10). Self-stabilization ([Dij74]).
- November 10: Mutual exclusion: self-stabilizing algorithm and the bakery algorithm [Lam74] or [AW04], Sec 4.4 (See also [Tau04]).
- November 15: Clock synchronization. [AW04], chapter 6.3, [HMM85] for radius algorithm.
- November 17: Clock synchronization. [BW01] for diameter/2 lower bound, [LM85] for Byzantine failures.
- November 22: Finishing up clock synchronization. The renaming problem.
- November 24: Renaming, continued. [AW04], Chapter 17.3 and [MA94], [ABDPR90].
- November 29: A brief intro to the topology of distributed computing. [HS99, HR95].
- December 1: Presentations. Ksenia Shubina (The Wakeup Problem; SIAM J. Computing, vol 25, number 6), Andrew Vorozcovs (Infinitely Many Processes), Chris Klochek (Group Key Exchange).
- December 6: Presentations. Canming Huang (Distributed MST for constant diameter graphs), Mikhail Sizintsev (Randomized consensus), Roman Glistvain (Self-stabilizing clock synchronization).
- December 7: Presentations. Dawei Xia (Optimal distributed algorithms for MST, etc.), Shakil Khan (Knowledge and Common Knowledge), Jin Xu (Group mutual exclusion).
- December 9: Presentations. Mark Obsniuk (Chord), Masoomeh Rudafshani (Assigning labels in anonymous networks), Andrew German (How fast can a distributed read be?), Victor Petrovykh (Detecting Global Predicates).
References
There is no required textbook for the course. However, I shall sometimes recommend readings from books or papers. These references will be listed here, and the list will grow during the term. Accessing some of the links below may require you to be logged into a machine at York, so that you can access the ACM Digital Library, etc.
Books (on reserve in Science Library)
Papers
This list is from last year's version of the course, but it gives you an idea of the kinds of topics covered. I'll add and subtract items from the list during the term.
- [AADGMS93] Yehuda Afek, Hagit Attiya, Danny Dolev, Eli Gafni, Michael Merritt and Nir Shavit. Atomic snapshots of shared memory. J. of the ACM, 40(4), pages 873-890, September 1993.
- [Ang80] D. Angluin. Local and global properties in networks of processors. In Proc. 12th ACM Symposium on Theory of Computing, pages 82-93, 1980.
- [ABDPR90] Hagit Attiya, Amotz Bar-Noy, Danny Dolev, David Peleg and Rüdiger Reischuk. Renaming in an asynchronous environment. J. of the ACM, 37(3), pages 524-548, July 1990.
- [AGPV90] Baruch Awerbuch, Oded Goldreich, David Peleg and Ronen Vainish. A trade-off between information and communication in broadcast protocols. J. of the ACM, 37(2), pages 238-256, April 1990.
- [BG93] P. Berman and J. Garay. Cloture votes: n/4-resilient distributed consensus in t+1 rounds. Mathematical Systems Theory 26(1), pages 3-19, 1993.
- [BW01] Saâd Biaz and Jennifer L. Welch. Closed form bounds for clock synchronization under simple uncertainty assumptions. Information Processing Letters, 80, pages 151-157, 2001.
- [Dij65] E. W. Dijkstra. Solution of a problem in concurrent programming control . Communications of the ACM, 8(9), page 569, September, 1965. A very early paper on mutual exclusion.
- [Dij74] Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 17(4), pages 643-644, November 1974.
- [DM90] Cynthia Dwork and Yoram Moses. Knowledge and common knowledge in a Byzantine environment: Crash failures. Information and Computation, 88(2), pages 156-186, October 1990.
- [FR03] Faith Fich and Eric Ruppert. Hundreds of impossibility results for distributed computing. In Distributed Computing, 16(2-3), pages 121-163, 2003.
- [FLP85] Michael J. Fischer, Nancy A. Lynch and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2), pages 374-382, April 1985.
- [HMM85] Joseph Y. Halpern, Nimrod Megiddo and Ashfaq A. Munshi. Optimal precision in the presence of uncertainty. Journal of Complexity, 1, pages 170-196, 1985.
- [Her91] Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1), pages 124-149, January 1991.
- [HR95] Maurice Herlihy and Sergio Rajsbaum. Algebraic topology and distributed computing: A primer. In Computer Science Today: Recent Trends and Developments, pages 203-217. Springer, 1995.
- [HS99] Maurice Herlihy and Nir Shavit. The topological structure of asynchronous computability. Journal of the ACM, 46(6), pages 858-923, November 1999.
- [HW90] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3), pages 463-492, July 1990.
- [Lam04] Leslie Lamport. A new solution of Dijkstra's concurrent programming problem. Communications of the ACM, 18(8), pages 453-455, August 1974.
- [LL90] Leslie Lamport and Nancy Lynch. Distributed Computing: Models and methods. In Handbook of Theoretical Computer Science, Volume B, Chapter 18, Elsevier, 1990. (On reserve in Steacie Library.) A good, brief introduction to the area. It describes aspects of distributed models and surveys some important early results.
- [LM85] Leslie Lamport and P. M. Melliar-Smith. Synchronizing clocks in the presence of faults. J. of the ACM, 32(1), pages 52-78, 1985.
- [MA94] Mark Moir and James H. Anderson. Fast, Long-Lived Renaming. In Proc. of the 8th International Workshop on Distributed Algorithms, pages 141-155, 1994. (There have been several better versions of adaptive renaming algorithms published afterwards, but I just want to cover the most basic version of the splitter algorithm. See Mark Moir's page for some of the improvements.)
- [Tau04] Gadi Taubenfeld. The black-white bakery algorithm and related bounded-space, adaptive, local-spinning and FIFO algorithms. In Distributed Computing, 18th International Conference, pages 56-70, 2004.
Web Pages
Previous versions of this course: Winter 2002, Fall 2003.
Exercises

Updated December 2, 2004