A Brief Introduction to Apache Cassandra NoSQL
Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a type of NoSQL database.
A NoSQL database (sometimes read as "Not Only SQL") provides a mechanism to store and retrieve data that is modeled by means other than the tabular relations used in relational databases. These databases are typically schema-free, support easy replication, expose a simple API, are often eventually consistent, and can handle huge amounts of data.
The primary objectives of a NoSQL database are
- simplicity of design,
- horizontal scaling, and
- finer control over availability.
NoSQL databases use different data structures from relational databases, which makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve.
Types of NoSQL Databases
There are four general types of NoSQL databases, each with its own specific attributes:
- Graph database – Based on graph theory, these databases are designed for data whose relations are well represented as a graph: interconnected elements with an undetermined number of relations between them. Examples include: Neo4j and Titan.
- Key-Value store – Among the least complex NoSQL options, these databases are designed for storing data in a schema-less way. All of the data in a key-value store consists of an indexed key and a value, hence the name. Examples include: Cassandra (often also classified as a wide-column store), DynamoDB, Azure Table Storage (ATS), Riak, and BerkeleyDB.
- Column store – (also known as wide-column stores) These databases are designed for storing data tables as sections of columns of data, rather than as rows of data. While this simple description sounds like the inverse of a standard database, wide-column stores offer very high performance and a highly scalable architecture. Examples include: HBase, BigTable and HyperTable.
- Document database – Expands on the basic idea of key-value stores: the values are “documents” that contain more complex data, and each document is assigned a unique key, which is used to retrieve it. These databases are designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Examples include: MongoDB and CouchDB.
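To make the four models more concrete, here is a minimal sketch of how one small record might be shaped in each of them, using plain Python data structures. The keys, names, and the `followers_of` helper are hypothetical and chosen only for illustration; real products expose their own APIs and storage formats.

```python
import json

# One hypothetical "user" record expressed in each of the four NoSQL models.
# Every key and value below is an illustrative assumption.

# Key-value store: an opaque value looked up by a single indexed key.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Wide-column store: row key -> column name -> value.
# Rows in the same table may carry different sets of columns.
wide_column = {
    "user:42": {"name": "Ada", "city": "London"},
    "user:43": {"name": "Linus"},  # this row has no "city" column
}

# Document database: key -> nested, semi-structured document.
document = {
    "user:42": {"name": "Ada", "address": {"city": "London"}, "tags": ["math"]},
}

# Graph database: nodes plus explicit edges that describe relations.
graph_nodes = {"user:42": {"name": "Ada"}, "user:43": {"name": "Linus"}}
graph_edges = [("user:42", "FOLLOWS", "user:43")]


def followers_of(node_id):
    """Return the ids of nodes with a FOLLOWS edge pointing at node_id."""
    return [src for src, rel, dst in graph_edges
            if rel == "FOLLOWS" and dst == node_id]
```

In the key-value case the store cannot look inside the value, so the application must decode it; in the other three models the database itself understands columns, nested fields, or edges.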
The following table lays out some of the key attributes that should be considered when evaluating NoSQL databases.
|Data model|Performance|Scalability|Flexibility|Complexity|Functionality|
|---|---|---|---|---|---|
|Key-value store|High|High|High|None|Variable (None)|
|Document store|High|Variable (High)|High|Low|Variable (Low)|
|Graph database|Variable|Variable|High|High|Graph theory|
A NoSQL Example – Apache Cassandra
Apache Cassandra™ is a massively scalable open source NoSQL database that delivers continuous availability, linear scale performance, operational simplicity, and easy data distribution across multiple data centers and cloud availability zones. Cassandra was originally developed at Facebook, and its design combines capabilities from Amazon’s Dynamo and Google’s Bigtable architectures; it was open sourced in 2008.
What Makes Cassandra Ideal for Modern Online Applications
Modern applications that succeed in today’s digital economy are those that interact intelligently with the end customer in tailored and personalized ways, benefiting both the customer and the underlying business. Cassandra provides a number of key features and benefits to facilitate the development and management of these types of modern online applications:
• Massively scalable architecture – Cassandra has a masterless design where all nodes are the same, providing operational simplicity and easy scale out capabilities.
• Active everywhere design – all Cassandra nodes may be written to and read from no matter where they are located.
• Linear scale performance – online node additions produce predictable increases in performance. For example, if two nodes produce 200K transactions/sec, four nodes will deliver 400K transactions/sec, and eight nodes 800K transactions/sec.
• Continuous availability – Cassandra offers redundancy of both data and function, which eliminates single points of failure and supports constant uptime.
• Transparent fault detection and recovery – nodes that fail can easily be restored or replaced.
• Flexible and dynamic data model – supports modern data types with fast writes and reads.
• Strong data protection – a commit log design protects incoming writes against data loss. Built-in security and easy backup/restore also keep data protected.
• Transaction support with tunable data consistency – Cassandra supports lightweight transactions and batches, with consistency that can be tuned per operation from eventual to strong across a widely distributed cluster.
• Multi-data center replication – Cassandra provides cross-data-center (in multiple geographies) and multi-cloud availability zone support for writes and reads.
• Data compression – data can be compressed by up to 80% without significant performance overhead, which helps save on storage costs.
• CQL (Cassandra Query Language) – a SQL-like language that makes moving from an RDBMS very easy.
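The tunable consistency mentioned above is often explained with a simple overlap rule: if the number of replicas that acknowledge a write plus the number consulted on a read exceeds the replication factor, every read must touch at least one replica that holds the latest write. The sketch below works through that arithmetic for the `ONE`, `QUORUM`, and `ALL` consistency levels. The replication factor of 3 and the function names are assumptions made for illustration; this is not driver code.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when any read set must overlap any acknowledged write set."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # assumed replication factor for this illustration
LEVELS = {"ONE": 1, "QUORUM": quorum(RF), "ALL": RF}

if __name__ == "__main__":
    # Print which write/read combinations give overlapping replica sets.
    for write_name, w in LEVELS.items():
        for read_name, r in LEVELS.items():
            kind = "strong" if reads_see_latest_write(r, w, RF) else "eventual"
            print(f"write={write_name:<6} read={read_name:<6} -> {kind}")
```

With three replicas, writing and reading at `QUORUM` touches 2 + 2 = 4 replicas, which is more than 3, so the sets overlap. Writing and reading at `ONE` touches only 1 + 1 = 2, so a read may land on a replica that has not yet received the write.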
Top Use Cases
While Cassandra is a general-purpose NoSQL database used for a variety of applications in all industries, there are a number of use cases where the database excels over most other options. These include:
• Internet of Things (IoT) applications – Cassandra is well suited to consuming and analyzing large volumes of fast-incoming data from devices, sensors and similar mechanisms that exist in many different locations.
• Product catalogs and retail apps – For retailers that need durable shopping cart protection, fast product catalog input and lookups, and similar retail application support, Cassandra is the database of choice.
• User activity tracking and monitoring – Media, gaming and entertainment companies use Cassandra to track and monitor users’ interactions with their movies, music, games, websites and online applications.
• Messaging – Cassandra serves as the database backbone for numerous mobile phone, telecommunication, cable/wireless, and messaging providers’ applications.
• Social media analytics and recommendation engines – Online companies, websites, and social media providers use Cassandra to ingest and analyze data and to provide recommendations to their customers.
• Other time series based applications – Because of Cassandra’s fast write capabilities, wide-row design, and ability to read only those columns needed to satisfy certain queries, it is well suited for most time series based applications.
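The wide-row idea behind the time series use case can be sketched in a few lines of Python: readings that share a partition key are kept together, sorted by timestamp, so a time-range query becomes a contiguous slice. This is an in-memory illustration under stated assumptions, not Cassandra's storage engine. The `TimeSeriesTable` class, the (sensor, day) partition key, integer timestamps, and float values are all hypothetical choices.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class TimeSeriesTable:
    """Toy wide-row table: one sorted row list per (sensor_id, day) partition."""

    def __init__(self):
        # partition key -> list of (timestamp, value), kept sorted by timestamp
        self._partitions = defaultdict(list)

    def write(self, sensor_id, day, timestamp, value):
        """Insert a reading at its sorted position within its partition."""
        insort(self._partitions[(sensor_id, day)], (timestamp, value))

    def read_range(self, sensor_id, day, start, end):
        """Return readings with start <= timestamp <= end (integer timestamps)."""
        rows = self._partitions.get((sensor_id, day), [])
        lo = bisect_left(rows, (start,))    # first row with timestamp >= start
        hi = bisect_left(rows, (end + 1,))  # first row with timestamp > end
        return rows[lo:hi]


if __name__ == "__main__":
    table = TimeSeriesTable()
    table.write("sensor-1", "2024-01-01", 30, 1.5)
    table.write("sensor-1", "2024-01-01", 10, 0.5)
    table.write("sensor-1", "2024-01-01", 20, 1.0)
    print(table.read_range("sensor-1", "2024-01-01", 10, 20))
```

Because writes land in sorted order inside a single partition, a range read touches only that partition and only the requested slice, which is the property that makes this layout attractive for time series data.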