As described thus far, the repeated application of a consensus algorithm to create a replicated log will lead
to an ever growing log. This has two problems: it requires unbounded amounts of disk space; and perhaps
worse, it may result in unbounded recovery time since a recovering replica has to replay a potentially long
log before it has fully caught up with other replicas. Since the log is typically a sequence of operations to
be applied to some data structure, and thus implicitly (through replay) represents a persistent form of that
data structure, the problem is to find an alternative persistent representation for the data structure at hand.
An obvious mechanism is to persist – or snapshot – the data structure directly, at which point the log of
operations leading to the current state of the data structure is no longer needed. For example, if the data
structure is held in memory, we take a snapshot by serializing it on disk. If the data structure is kept on
disk, a snapshot may just be an on-disk copy of it.