CAP Theorem
In this section, let us take a look CAP Theorem, which is a very important concept in distributed architectures.
From the WIKI, CAP Theorem (proposed by Eric Brewer), states, that, “It is impossible for a distributed computer system to simultaneously provide more than two out of three of the following guarantees: Consistency, Availability, Partition tolerance”.
Following definitions are from WIKI:
1. Consistency:
Every read receives the most recent write or an error. The data is always consistent, regardless of situation.
2. Availability:
Every request receives a (non-error) response – without guarantee that it contains the most recent write.
3. Partition tolerance:
The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes. A system is said to be Partition Tolerant, if the system behaves normally, when there is a network failure between nodes so that they don’t communicate with each and there is a partition between the nodes.
Revised CAP Theorem:
Nearly a decade after proposing the CAP Theorem, Eric Brewer proposed a revised CAP Theorem, according to which, network partitions cannot be avoided, and the distributed system should tolerate the partition. In the absence of partition, the system could support Strict Consistency and Availability. In the case of a network partition, the system should handle the partition and should recover from that. During a network partition, we only have two choices, either be Available or be Consistent.
What is the relevance of CAP Theorem?
Before we start looking at CAP Theorem in more detail, let us take a look at why did CAP theorem arise in first place.
Traditionally, databases were ACID in nature. They were quite happy being strictly consistent and serving loads which were moderate (relative to today’s context). However, the past few years saw the emergence of social media, increased internet usage and cloud based applications. Thus, the systems started experiencing huge workloads and systems are expected to be available 365X24X7. This requirement of being ‘Highly Available’, gave rise to new architectures. So, being consistent alone, ceased to be a criteria for modern databases. If we need systems to be highly available, we cannot achieve this with one machine (or even a couple of machines). We will need a distributed architecture, with data being replicated as well as distributed (Partitioned), across multiple machines.
Use of multiple machines (or nodes) to store data, and use of commodity hardware, means increased chances of failures. It also gave rise to increased complexity of failover mechanisms. Questions like ‘What if one node goes down?, Will the system function normally? How are the data operations like read and write affected’ began surfacing quite often.
Naturally, in such scenarios, we need to trade over one factor for the other. CAP Theorem proposes one such tradeoff for distributed data architecture.
CAP Theorem Scenarios
Since CAP Theorem states that any two of the three (Consistency, Availability, Partition tolerance, referred to as CAP) factors could be guaranteed, we can have three possible scenarios.
- AP (Availability, Partition tolerance)
- CA (Consistency, Availability)
- CP (Consistency, Partition tolerance)
Let us consider a distributed architecture, which has multiple nodes which has data replicated and distributed across multiple nodes. Let us consider two nodes which have data distributed across them. Consider a network failure between these two, which prevents communication between these two nodes.
1. AP (Availability, Partition tolerance)
In this combination, the system is not strictly consistent, when there is a network partition. Now, if we allow the system to update the data on either of the node, the other node will not get updated. Thus, the system is still allowing data operations, thus being Available. However, the data is not consistent. i.e., one node has latest data, whereas the other node has stale data, leading to consistency issues.
2. CP (Consistency, Partition tolerance)
In this combination, the system is not available, when there is a network partition. In the above scenario, if the system doesn’t allow data operations and the nodes return an error, when there is a network partition, the system will be still be Consistent. But this comes at the cost of Availability.
Also, when there is a network partitioning, the ACID properties could be guaranteed within each partition. For instance, the operations could be Atomic, Consistent within a partition, but not necessarily across the entire system which is distributed.