Cassandra for Multiple Data Centers and Cloud Support
MANAGING AVAILABILITY AND MULTIPLE DATA CENTERS
Another key part of your job as a DBA is keeping the databases you manage available to the applications that use them. Compared with an RDBMS, Cassandra makes continuous uptime much easier to achieve: there is no need for specialized, add-on log shipping software such as Oracle Data Guard.
Further, distributing data across multiple geographies and cloud providers is much simpler with Cassandra than with a typical RDBMS.
How to Ensure Constant Availability
As previously discussed, Cassandra has a masterless architecture in which all nodes are the same, and it has been built from the ground up on the understanding that outages and hardware failures will occur. To withstand those and similar issues, Cassandra provides redundancy of both data and function across the database cluster.
Where data operations are concerned, any node in a cluster may be the target for both reads and writes. Should a particular node go down, the cluster keeps serving requests: writes can be sent to any other node, and reads are served from the other nodes holding copies of the downed node’s data.
To ensure constant access to data, you should configure Cassandra’s replication to keep multiple copies of data on the nodes that make up a database cluster. The number of copies (the replication factor) is entirely up to you, with three being the most commonly used in production Cassandra environments.
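To make the idea of multiple copies concrete, here is a minimal sketch of how replicas could be placed on a ring of nodes. It is a toy model, not Cassandra's implementation: the node names, the `token_for` helper, and the use of MD5 as the hash are all assumptions made for illustration. The placement rule (first replica on the node that owns the key's position, further replicas on the next nodes around the ring) mirrors the simplest single-data-center strategy.

```python
import hashlib
from bisect import bisect_left


def token_for(value):
    """Map a string to a ring position (MD5 is a stand-in hash for this toy)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key, nodes, replication_factor=3):
    """Return the nodes that would hold copies of `key` in this toy ring."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the node count")
    # Place every node on the ring at a position derived from its name.
    ring = sorted((token_for(node), node) for node in nodes)
    tokens = [token for token, _ in ring]
    # The first replica is the first node at or after the key's position,
    # wrapping around to the start of the ring if necessary.
    start = bisect_left(tokens, token_for(key)) % len(ring)
    # Remaining replicas are the next nodes clockwise around the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


# Hypothetical five-node cluster with three copies of each row.
cluster = ["node1", "node2", "node3", "node4", "node5"]
owners = replicas_for("user:42", cluster, replication_factor=3)
print(owners)
```

With three copies, losing any single node still leaves two nodes that hold the row, which is why a replication factor of three is a common starting point.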
Should a node go down, new or updated information is simply written to the other nodes that keep copies of that data. When the downed node is brought back online, it automatically re-syncs with the other nodes holding its data, so it is brought back up to date transparently.
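The re-sync behavior described above can be sketched with a toy model in which the node coordinating a write remembers what a downed replica missed and replays it later. Cassandra's real mechanisms for this (hinted handoff and repair) are considerably more involved; the `Node` and `Coordinator` classes below are hypothetical names used only to show the flow.

```python
class Node:
    """A replica that is either up or down and stores key/value pairs."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    """Toy coordinator that keeps hints for replicas that are down."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = {}  # node name -> list of (key, value) writes it missed

    def write(self, key, value):
        """Write to every live replica; record a hint for each downed one."""
        acks = 0
        for node in self.replicas:
            if node.up:
                node.data[key] = value
                acks += 1
            else:
                self.hints.setdefault(node.name, []).append((key, value))
        return acks

    def replay_hints(self, node):
        """Deliver the writes a node missed, once it is back up."""
        if not node.up:
            return 0
        missed = self.hints.pop(node.name, [])
        for key, value in missed:
            node.data[key] = value
        return len(missed)


a, b, c = Node("a"), Node("b"), Node("c")
coordinator = Coordinator([a, b, c])

c.up = False                                   # one replica goes down
acks = coordinator.write("user:42", "Ada")     # still acknowledged by a and b
c.up = True                                    # the replica comes back
replayed = coordinator.replay_hints(c)         # it catches up on what it missed
print(acks, replayed, c.data)
```

The point of the sketch is that the write succeeds while one replica is down, and the returning node ends up with the same data as its peers without any action from the application.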
Multi-Data Center and Cloud Options
Cassandra has strong support for multi-data center and cloud deployments. Many production Cassandra systems consist of a database cluster that spans multiple physical data centers, cloud availability zones, or a combination of both. Should a large outage occur in a particular geographical region, the database cluster continues to operate, with the other data centers assuming the operations previously directed at the now downed data center or cloud zone. Once the downed data center comes back online, it syncs with the other data centers and makes itself current.
Multi Data Center – The ring of a Cassandra cluster can be spread across multiple data centers (DCs). Cassandra supports both virtual and physical DCs. For example, suppose we have two data centers in two different geographical locations: Douglas County (US) and Rotterdam (EU). The data is stored in both DCs, and every node in the ring can answer any query, while the distinction between local DC and remote DC behavior is preserved. Queries can be run against the local DC only or across all DCs in the ring.
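As an illustration of storing data in both DCs, the sketch below builds a CQL `CREATE KEYSPACE` statement that uses `NetworkTopologyStrategy` with a replica count per data center. The helper `create_keyspace_cql`, the keyspace name `shop`, and the data center names `us_dc` and `eu_dc` are assumptions for this example; in a real cluster the data center names must match the names the cluster itself reports.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with one replica count per DC."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, replica_count in replicas_per_dc.items():
        options.append("'" + dc_name + "': " + str(int(replica_count)))
    body = ", ".join(options)
    return (
        "CREATE KEYSPACE IF NOT EXISTS " + keyspace
        + " WITH replication = {" + body + "};"
    )


# Hypothetical two-DC layout: three copies in the US, three in the EU.
statement = create_keyspace_cql("shop", {"us_dc": 3, "eu_dc": 3})
print(statement)
```

The printed statement can be run from a CQL shell or passed to a driver. Each data center then keeps its own full set of replicas, which is what allows queries to be answered locally.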
A single Cassandra database cluster can span multiple data centers and the cloud.
An additional benefit of having a single cluster that spans multiple data centers and geographies is that data can be read and written locally in each location, which keeps latency low and performance high for the customers served there.
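How local a request stays depends on the consistency level chosen for it. The sketch below works through the arithmetic for a few of Cassandra's consistency levels (`ONE`, `LOCAL_QUORUM`, `QUORUM`, `ALL`), where a quorum is a majority of the relevant replicas. The function names and the two-DC layout are assumptions for illustration, and the model treats a data center as either fully up or fully down.

```python
def quorum(replica_count):
    """A quorum is a strict majority of the replicas."""
    return replica_count // 2 + 1


def required_replicas(level, rf_by_dc, local_dc):
    """How many replicas must respond for the given consistency level."""
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])      # majority in the local DC only
    if level == "QUORUM":
        return quorum(sum(rf_by_dc.values()))  # majority across all DCs
    if level == "ALL":
        return sum(rf_by_dc.values())
    raise ValueError("unsupported consistency level: " + level)


def can_satisfy(level, rf_by_dc, local_dc, down_dcs=()):
    """Check whether a request can succeed while some DCs are unreachable."""
    live = {dc: rf for dc, rf in rf_by_dc.items() if dc not in down_dcs}
    if level == "LOCAL_QUORUM":
        available = live.get(local_dc, 0)      # only local replicas count
    else:
        available = sum(live.values())
    return available >= required_replicas(level, rf_by_dc, local_dc)


layout = {"us_dc": 3, "eu_dc": 3}  # hypothetical: three replicas per DC

print(required_replicas("LOCAL_QUORUM", layout, "us_dc"))                # 2
print(required_replicas("QUORUM", layout, "us_dc"))                      # 4
print(can_satisfy("LOCAL_QUORUM", layout, "us_dc", down_dcs=["eu_dc"]))  # True
print(can_satisfy("QUORUM", layout, "us_dc", down_dcs=["eu_dc"]))        # False
```

In this layout, a `LOCAL_QUORUM` request needs only two replicas in the caller's own data center, so it avoids cross-region round trips and keeps working if the other data center is unreachable. A cluster-wide `QUORUM` needs four of the six replicas and therefore cannot be met by a single surviving data center.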