← Back to blog

Lower bound on the number of rounds in crash stop synchronous consensus

concurrency

Any consensus algorithm with f+2nf + 2 \leq n processes that tolerates ff crashes requires more than ff rounds.

We suppose binary consensus in a synchronous model where each round consists of a message sending and message delivery phase. Each non crashed process must agree on a value in {0,1}\lbrace 0, 1 \rbrace .

Here is an informal reasoning and description of the 'chain argument'. The full proof is in Impossibility Results for Distributed Computing.

Structure

First, we reason that there in an initial state with two executions α\alpha and γ\gamma such that the consensus leads to a different result. In α\alpha exactly one process crashes at the beginning of the first round. In γ\gamma no processes crash.

Then, we will argue for any ff round execution with f+2nf + 2 \leq n processes, that any execution is equivalent to one without crashes. This result combined with the first gives us the desired lower bound.

We'll use 'chain argument's. A chain argument constructs a sequence (chain) of executions α0,α1,...,αk\alpha_0, \alpha_1, ..., \alpha_k such that every adjacent pair is indistinguishable by at least one correct process. This means that each execution must return the same result.

a) The same configuration can lead to different results

Statement 1: There in an initial state with two executions α\alpha and γ\gamma such that the consensus leads to a different result. In α\alpha exactly one process crashes at the beginning of the first round. In γ\gamma no processes crash. Why?

Define CiC_i to be the initial configuration (state) where processes p0,...,pip_0, ..., p_i have input 11 and the others have input 00. Now, a valid consensus, for executions α0,...,αn\alpha_0, ... , \alpha_n starting from C0,...,CnC_0, ..., C_n, where no processes fail, will return 00 starting from C0C_0 and 11 starting from CnC_n.

There must be some jj where Cj,Cj+1C_j, C_{j+1} decide different results.

Intuitively, an execution αj+1\alpha_{j+1}' starting from Cj+1C_{j+1} where pjp_j crashes at the start is not distinguishable from αj\alpha_j following CjC_j. αj\alpha_j and αj+1\alpha_{j+1}' must decide the same results which means that αj+1\alpha_{j+1} and αj+1\alpha_{j+1}' must decide different results even though they both start from CjC_j.

b) Any ff round execution is equivalent to one without crashes

Statement 2: any ff round execution with f+2nf + 2 \leq n processes is equivalent to one without crashes.

The trick is as follows: in a round where a process pip_i crashes, the crash may occur before pip_i has had a chance to send all their outbound messages. If we suppose that pip_i fails to send a message to pjp_j, then there must be a live process pkp_k who cannot distinguish this execution from one where pip_i did send a message to pjp_j. We know pkp_k must exist because 2nf2 \leq n-f. We can apply this to say that the execution where pip_i crashes mid-round is indistinguishable from one where it crashes at the start of the next round.

This gives us an induction. We can move crashes in the ffth round to the start of non-existent (f+1)(f+1)th round. Then, we can move crashes in the rrth round to the (r+1)(r+1)th round and by induction eventually to the start of non-existent (f+1)(f+1)th round.

Links