The Gossip architecture is a weak-consistency replication technique in which clients send requests to the nearest replica, and the replicas then propagate the information to each other periodically, much like ordinary gossip. It allows for high availability and tolerance to network partitions (A+P).
The algorithm for this architecture can be adapted to the needs of each implementation; it does not need to follow the description below rigorously. From now on, we will assume there are $n$ replicas.
There is the concept of a timestamp: a vector with $n$ entries, where the $i$th entry represents the known state of the $i$th replica. Let's define the merge operation:
    merge(tsA, tsB):
        for each entry i in tsA:
            if tsA[i] > tsB[i]:
                tsB[i] = tsA[i]
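A minimal Python sketch of this merge, together with the entry-wise `<=` comparison used throughout the rest of the description (function names are illustrative):

```python
def merge(ts_a, ts_b):
    """Merge ts_a into ts_b: keep the entry-wise maximum in ts_b."""
    for i in range(len(ts_a)):
        if ts_a[i] > ts_b[i]:
            ts_b[i] = ts_a[i]
    return ts_b

def leq(ts_a, ts_b):
    """ts_a <= ts_b: every entry of ts_a is at most the matching entry of ts_b."""
    return all(a <= b for a, b in zip(ts_a, ts_b))
```

For example, `merge([1, 0, 2], [0, 3, 1])` yields `[1, 3, 2]`.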
On each request, the client sends the `prev` timestamp (from the last request). When a replica replies to the client, it returns the data as well as its own timestamp, `new`. The client then executes `merge(new, prev)` to bring its own timestamp up to date.
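As an illustration, the client side might look like this (a hypothetical `Client` class, not prescribed by the architecture):

```python
class Client:
    def __init__(self, n):
        # prev: the timestamp from this client's last request
        self.prev = [0] * n

    def on_reply(self, new):
        """Merge the replica's timestamp `new` into our own view."""
        for i, v in enumerate(new):
            if v > self.prev[i]:
                self.prev[i] = v
```

On every reply, `prev` accumulates everything the client has ever observed, regardless of which replica answered.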
Each replica contains the following state:
- Value: the piece of data, stored on every replica, that we want to keep in sync.
- Value Timestamp: a timestamp representing the operations executed to achieve the current value.
- Log: all the update operations the replica has accepted so far. It may contain operations that have already been executed locally but still need to be propagated to other replicas.
- Replica Timestamp: represents all the operations the current replica has accepted, i.e., that were placed in the log.
- Executed Operations: a list with the unique IDs of the operations executed on the current replica.
- Timestamp Table: a list with the known replica timestamps of the other replicas.
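One possible shape for this state, sketched with Python dataclasses (field names are assumptions, not mandated by the architecture):

```python
from dataclasses import dataclass, field

@dataclass
class LogRecord:
    ts: list        # the operation's unique timestamp
    instance: int   # instance number of the replica that accepted it
    op_id: str      # client-generated UUID of the request
    data: object    # the update payload

@dataclass
class ReplicaState:
    n: int                      # number of replicas
    instance: int               # this replica's instance number
    value: object = None                               # Value
    value_ts: list = field(default_factory=list)       # Value Timestamp
    log: list = field(default_factory=list)            # Log
    replica_ts: list = field(default_factory=list)     # Replica Timestamp
    executed: set = field(default_factory=set)         # Executed Operations
    ts_table: dict = field(default_factory=dict)       # Timestamp Table

    def __post_init__(self):
        # All timestamps start at the zero vector.
        self.value_ts = [0] * self.n
        self.replica_ts = [0] * self.n
```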
When a client makes a read request, it sends the `prev` timestamp as well as the request information. Then, the server proceeds as follows:
- Compares the `prev` timestamp with the `valueTimestamp` to see if we can safely read the value while ensuring consistency, i.e., if `prev <= valueTimestamp` (every entry of `prev` is at most the corresponding entry of `valueTimestamp`).
- If the previous condition is met, the server replies with the required information.
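A sketch of that read check in Python (the tuple-return convention is an assumption):

```python
def handle_read(prev, value, value_ts):
    """Reply only once valueTimestamp covers everything the client has seen."""
    if all(p <= v for p, v in zip(prev, value_ts)):
        return value, list(value_ts)
    return None  # not safe yet: the server may wait, retry, or redirect
```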
When a client wants to make an update, it generates a UUID `id` to uniquely identify the request. Then, it sends the `prev` timestamp along with `id` and the update data. When the request `req` arrives at the server, the replica decides whether or not to discard the request. A request is discarded if:
- `id` is present in the executed operations list; or
- `id` is present in any log record.
If the request is accepted, then we follow the algorithm:
- Updates the replica timestamp by incrementing the $i$th entry, where `i` is the current instance number, starting at 0.
- Creates a unique `timestamp` to represent the operation from now on. This timestamp is created by duplicating `prev` and replacing the $i$th entry with the previously calculated value.
- Creates a new log record with the new `timestamp`, the current instance number, the `id`, and the data to add.
- Returns the new timestamp to the client.
- Checks if the operation can be executed immediately, i.e., if `prev <= valueTimestamp`. If so, executes it, merges the new `timestamp` into `valueTimestamp`, and records the `id` in the executed operations list.
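The steps above can be sketched as follows; `UpdateState` is a hypothetical minimal state object, and the value is modeled as an append-only list purely for illustration:

```python
class UpdateState:
    """Minimal replica state for updates (illustrative names)."""
    def __init__(self, n, instance):
        self.instance = instance
        self.replica_ts = [0] * n
        self.value_ts = [0] * n
        self.log = []
        self.executed = set()
        self.value = []          # example value: an append-only list

    def apply(self, data):
        self.value.append(data)


def handle_update(state, prev, op_id, data):
    # Discard duplicates: already executed or already logged.
    if op_id in state.executed or any(r["id"] == op_id for r in state.log):
        return None
    i = state.instance
    state.replica_ts[i] += 1                   # 1. bump own replica timestamp
    ts = list(prev)
    ts[i] = state.replica_ts[i]                # 2. unique operation timestamp
    state.log.append({"ts": ts, "prev": list(prev), "instance": i,
                      "id": op_id, "data": data})  # 3. new log record
    # 5. execute immediately if the client's dependencies are already applied
    if all(p <= v for p, v in zip(prev, state.value_ts)):
        state.apply(data)
        for k, t in enumerate(ts):             # merge ts into valueTimestamp
            if t > state.value_ts[k]:
                state.value_ts[k] = t
        state.executed.add(op_id)
    return ts                                  # 4. returned to the client
```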
Note: if the system's data has no causal dependencies, there is no need to wait for up-to-date data before applying changes. In that specific case, most of the logic above can be simplified so that it only avoids duplicates, and updates can be applied immediately.
This architecture requires the replicas to gossip with each other every $t$ units of time in order to keep consistency. On a gossip request, replica $i$ sends to replica $j$:
- The instance number $i$;
- `sourceTimestamp`, which is the replica timestamp of replica $i$;
- The log records we estimate replica $j$ does not have, based on the timestamp table.
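For example, the estimate can compare each record's timestamp against the table entry for $j$ (a sketch; sending a record $j$ already has is harmless, because duplicates are discarded on receipt):

```python
def records_to_send(log, known_ts_of_j):
    """Pick log records replica j probably lacks.

    known_ts_of_j is our (possibly stale) copy of j's replica timestamp,
    taken from the timestamp table.
    """
    return [r for r in log
            if not all(t <= k for t, k in zip(r["ts"], known_ts_of_j))]
```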
When replica $j$ receives the gossip message from replica $i$, it proceeds as follows:
- Updates entry $i$ in the timestamp table with the received `sourceTimestamp`.
- For each log record `r` received:
  - Checks if `r.id` is in the executed operations list. If so, discards the record.
  - Checks if the record is new, i.e., if `r.timestamp <= replicaTimestamp` does not hold. If it is not new, discards it.
  - If neither check discards it, adds the record to its own log.
- Merges `sourceTimestamp` into its own `replicaTimestamp`, by executing `merge(sourceTimestamp, replicaTimestamp)`.
- Goes through the log and executes all stable operations.
- Finally, cleans up the log, discarding records that the timestamp table shows every replica has already received.
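Putting the receiving side together, a rough sketch (names and the record layout are assumptions; log cleanup is omitted for brevity):

```python
class GossipState:
    """Minimal receiver state, mirroring the fields listed earlier."""
    def __init__(self, n):
        self.replica_ts = [0] * n
        self.value_ts = [0] * n
        self.log = []
        self.executed = set()
        self.ts_table = {}
        self.value = []          # example value: an append-only list

    def apply(self, data):
        self.value.append(data)


def receive_gossip(state, src_instance, source_ts, records):
    # Record how far the sending replica has progressed.
    state.ts_table[src_instance] = list(source_ts)
    # Merge unseen records into the local log.
    for r in records:
        if r["id"] in state.executed:
            continue                          # already executed here
        if any(r["id"] == own["id"] for own in state.log):
            continue                          # already logged here
        state.log.append(r)
    # Merge the sender's replica timestamp into our own.
    for k, v in enumerate(source_ts):
        if v > state.replica_ts[k]:
            state.replica_ts[k] = v
    # Execute operations that became stable (prev <= valueTimestamp),
    # repeating until no further progress is possible.
    progress = True
    while progress:
        progress = False
        for r in state.log:
            if r["id"] in state.executed:
                continue
            if all(p <= v for p, v in zip(r["prev"], state.value_ts)):
                state.apply(r["data"])
                for k, t in enumerate(r["ts"]):
                    if t > state.value_ts[k]:
                        state.value_ts[k] = t
                state.executed.add(r["id"])
                progress = True
```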
Notes and Advice
- This is an architecture, not a protocol.
- Each use-case can be implemented with its specifics.
- There are ways to ensure the client does not get inconsistent results if they switch replicas.