Monday 5 June 2023

Better visualisation of consensus protocols

Almost every BFT paper includes an "arrow diagram", showing who sends how many messages to whom and when. Despite their universality, these diagrams are difficult to interpret, leaving out many important details and including unimportant ones. In this blog post, we describe a new way to visualise BFT protocols that is easier to draw, easier to read, and easier to interpret.

Arrow diagrams in BFT

Earlier I presented a new Byzantine fault tolerance (BFT) protocol, SACZyzzyva, at SRDS 2019. Afterwards, I received a number of positive comments about this talk, and in particular about the fact that it did not include a single "arrow diagram".

If you don't know what I mean by an arrow diagram (or even if you do), then this article by James Mickens in ;login: provides an excellent introduction to the BFT literature, having this to say on arrow diagrams:

In a paper about Byzantine fault tolerance, the related work section will frequently say, "Compare the protocol diagram of our system to that of the best prior work. Our protocol is clearly better." The paper will present two graphs that look like Figure 2. Trying to determine which one of these hateful diagrams is better is like gazing at two unfathomable seaweed bundles that washed up on the beach and trying to determine which one is marginally less alienating.

Here is one that we included in our paper on SACZyzzyva:

Each arrow represents a message from one machine to another, and the difference in colour (grey vs black) shows how SACZyzzyva transactions can complete with two fewer rounds of communication than Zyzzyva transactions. This isn't the clearest diagram—it is difficult for the brain to pick out which lines it should be counting, so a quick skim of this figure doesn't do much except show whether each phase has O(1), O(n), or O(n2) messages with respect to the number of replicas.

But this diagram also hides some important features of the two protocols that can make a big difference to performance. First, Zyzzyva doesn't always need the extra rounds of communication—when all replicas respond, then it is just as fast as SACZyzzyva. If these extra rounds are needed rarely enough, then using SACZyzzyva might not be worthwhile. Secondly, it doesn't say anything about the size of the messages; the size of each of Zyzzyva's prepare messages actually increases O(n) with the number of replicas, giving it a bit-complexity of O(n2) whenever there are actually faults to tolerate. This suggests that SACZyzzyva may have better scalability than Zyzzyva in normal operation, but you can't see this in the arrow diagram.

Ball diagrams

If not arrow diagrams, then what? The key observation is that when comparing these diagrams, the reader doesn't pay much attention to the sender or recipient of the message, but only the number of messages. This lets us simplify the diagram dramatically:

In this diagram, it is much easier to see how many messages are sent in each round. We also show that the second phase is triggered by a timeout—meaning that there is a long wait between the two bracketed parts—but that it shouldn't happen very often—indicating that the first phase might be more important from a performance perspective.

Ball diagrams are also much less busy, making it easier to fit other information into the diagram:


Figure 3: The SACZyzzyva normal-case protocol, annotated with the changes relative to Zyzzyva's.

Showing message size

However, these diagrams don't show the relative sizes of the messages: the diagram shows the message complexity, but not the bit complexity. We can visualise this by making the area of each ball proportional to the size of its message. In the case of Zyzzyva, the request and order-request messages may be quite large relative to the spec-response messages, since they need to contain request data, and the size of the commit messages will depend on the number of replicas, since they contains 2f+1 signatures from other replicas:

As the number of replicas increases, both the number of messages and their relative sizes change, showing that as the number of replicas increases, scalability will ultimately be limited by the number and size of the order-request messages may be quite large relative to the spec-response messages, since they need to contain relevant data, and the size of the commit messages:

This fact is invisible on a regular arrow diagram.

Conclusion

This post introduced ball diagrams, new type of diagram for visualisation of BFT protocols. These diagrams are easier to read than the usual arrow diagrams that appear in most BFT papers and presentations, so if you are less interested in the source/destination information than the raw number of bits or messages, then consider giving these a try in your next presentation.

No comments:

Post a Comment

Note: only a member of this blog may post a comment.

ML Security at SSG: Past, Present and Future

  “Security is a process not a product” – Bruce Schneier This year, we wrap up an eight-year-long, machine learning (ML) focused research e...