Cassandra – What is NoSQL and Why NoSQL?
What is NoSQL?
NoSQL (Not Only SQL) Database
A NoSQL database environment is, simply put, a non-relational and largely distributed database system that enables rapid, ad-hoc organization and analysis of extremely high-volume, disparate data types. NoSQL databases are sometimes referred to as cloud databases, non-relational databases, Big Data databases, and a myriad of other terms. They were developed in response to the sheer volume of data being generated, stored, and analyzed by modern users (user-generated data) and their applications (machine-generated data).
In general, NoSQL databases have become the first alternative to relational databases, with scalability, availability, and fault tolerance being key deciding factors. They go well beyond the more widely understood legacy relational databases (such as Oracle, SQL Server, and DB2) in satisfying the needs of modern business applications. The technology is typically characterized by a very flexible, schema-less data model, horizontal scalability, distributed architectures, and the use of languages and interfaces that are “not only” SQL.
From a business standpoint, adopting a NoSQL or ‘Big Data’ environment can provide a clear competitive advantage in numerous industries. In the ‘age of data’, this is compelling. As one saying about the importance of data puts it, “if your data isn’t growing then neither is your business”.
Advantages of NoSQL Databases
The reasons for businesses to adopt a NoSQL database environment have almost everything to do with the following market drivers and technical requirements.
The Growth of Big Data
Big Data is one of the key forces driving the growth and popularity of NoSQL for business. The almost limitless array of data collection technologies, ranging from simple online actions to point-of-sale systems, GPS tools, smartphones, tablets, sophisticated sensors, and many more, acts as a force multiplier for data growth.
In fact, one of the first reasons to use NoSQL is that you have a Big Data project to tackle. A Big Data project is normally typified by:
- High data velocity – lots of data coming in very quickly, possibly from different locations.
- Data variety – storage of data that is structured, semi-structured and unstructured.
- Data volume – data sets that are many terabytes or petabytes in size.
- Data complexity – data that is stored and managed in different locations or data centers.
Continuous Data Availability
Hardware failures can and will occur. Fortunately, NoSQL database environments are built with a distributed architecture, so there are no single points of failure and there is built-in redundancy of both function and data. If one or more database servers, or ‘nodes’, go down, the other nodes in the system are able to continue operating without data loss, which is what fault tolerance means in practice. In this way, NoSQL database environments are able to provide continuous availability whether in single locations, across data centers, or in the cloud. When deployed appropriately, NoSQL databases can deliver high performance at massive scale with little or no downtime. This is immensely beneficial, as system updates or modifications can be made without having to take the database offline. This fact alone draws the attention of businesses whose customers expect applications to be available and for whom downtime equates to real dollars lost.
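The redundancy described above can be illustrated with a small sketch. This is a toy model, not the implementation of any particular NoSQL product: the `Node` and `Cluster` classes, the node names, and the replica-selection rule are all assumptions made for illustration. Each key is stored on several nodes, so a read can still be served after some of them fail.

```python
import zlib


class Node:
    """One database server holding a local copy of some keys (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class Cluster:
    """Toy cluster that stores every key on several nodes (hypothetical API)."""

    def __init__(self, node_names, replication_factor=3):
        self.nodes = [Node(name) for name in node_names]
        self.replication_factor = min(replication_factor, len(self.nodes))

    def replicas_for(self, key):
        # Pick a starting node from a stable hash of the key, then take the
        # next nodes in order, so a key always maps to the same replica set.
        start = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return [
            self.nodes[(start + i) % len(self.nodes)]
            for i in range(self.replication_factor)
        ]

    def write(self, key, value):
        live = [node for node in self.replicas_for(key) if node.up]
        if not live:
            raise RuntimeError(f"no live replica for {key!r}")
        for node in live:
            node.data[key] = value
        return len(live)  # number of copies written

    def read(self, key):
        # Any live replica that holds the key can answer.
        for node in self.replicas_for(key):
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


cluster = Cluster(["node-a", "node-b", "node-c", "node-d"])
cluster.write("user:1", "Ada")

# Simulate two of the three replicas failing; the read still succeeds.
first, second, _third = cluster.replicas_for("user:1")
first.up = False
second.up = False
print(cluster.read("user:1"))
```

The sketch leaves out what happens when a failed node comes back: it would have missed any writes made while it was down, and a real system needs a repair mechanism to bring it up to date.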
Real Location Independence
The term “location independence” means the ability to read from and write to a database regardless of where that I/O operation physically occurs, and to have any write propagated out from that location so that it’s available to users and machines at other sites. Such functionality is very difficult to architect for relational databases. Techniques such as master/slave architectures and database sharding can sometimes meet the need for location-independent read operations, but writing data everywhere is a different matter, especially when data volumes are high. Location independence is an advantage in many other scenarios as well, such as serving customers in many different geographies while keeping the data local to those sites for fast access.
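As a rough sketch of writing at any site and propagating the write outward, consider the toy model below. The `Site` class, the site names, and the integer timestamps are hypothetical. Conflicts are resolved with a simple "newest version wins" rule, which is one possible choice rather than the only one.

```python
class Site:
    """A data center that accepts local reads and writes (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.store = {}   # key -> ((timestamp, site_name), value)
        self.outbox = []  # local writes not yet shipped to peers
        self.peers = []

    def write(self, key, value, timestamp):
        # Accept the write locally, then queue it for the other sites.
        version = (timestamp, self.name)
        self._apply(key, version, value)
        self.outbox.append((key, version, value))

    def _apply(self, key, version, value):
        # Keep the entry with the highest version. The site name breaks ties
        # so that every site settles on the same winner.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def replicate(self):
        # Ship queued writes to every peer; a real system would do this
        # continuously in the background.
        for key, version, value in self.outbox:
            for peer in self.peers:
                peer._apply(key, version, value)
        self.outbox.clear()


london, tokyo = Site("london"), Site("tokyo")
london.peers, tokyo.peers = [tokyo], [london]

london.write("cart:7", ["book"], timestamp=1)  # served locally in London
london.replicate()                             # now visible in Tokyo too
tokyo.write("cart:7", ["book", "pen"], timestamp=2)
tokyo.replicate()
print(london.read("cart:7"))
```

Until `replicate()` runs, the other site still returns its older copy. That short lag is the price of accepting writes everywhere.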
Modern Transactional Capabilities
The concept of transactions appears to be changing in the Internet age, and it has been demonstrated that ACID transactions are no longer always a requirement in database-driven systems. At first blush, this assertion sounds extreme, as transactional integrity is a characteristic of almost every data system, especially those with information requirements that demand accuracy and safety. However, what this refers to is not the jeopardizing of data, but rather the new way modern applications ensure transactional consistency across widely distributed systems. The “C” in ACID refers to data Consistency in relational database management systems, which is enforced via foreign keys and referential integrity constraints. This type of consistency is not used in progressive data management systems such as NoSQL databases, because there are no JOIN operations, which would require more rigid enforcement of consistency. Instead, the “Consistency” that concerns NoSQL databases is the one found in the CAP theorem, which refers to the immediate or eventual consistency of data across all nodes that participate in a distributed database. The data is still safe and meets the AID portion of the RDBMS ACID definition, but its consistency is maintained differently given the nature and architecture of the system.
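The difference between immediate and eventual consistency across nodes can be sketched with a well-known quorum rule: with N replicas, if every write reaches W of them and every read consults R of them, then R + W > N guarantees that each read overlaps the latest write. The `QuorumStore` class below is a hypothetical single-value toy, and its default choice of replicas deliberately picks the worst case, with writes going to the first replicas and reads to the last.

```python
class Replica:
    """Holds one versioned value (hypothetical single-key store)."""

    def __init__(self):
        self.version = 0
        self.value = None


class QuorumStore:
    def __init__(self, n, w, r):
        self.replicas = [Replica() for _ in range(n)]
        self.w = w  # replicas that must acknowledge a write
        self.r = r  # replicas consulted on a read
        self.next_version = 1

    def write(self, value):
        # Worst case: only the first W replicas receive the write.
        version = self.next_version
        self.next_version += 1
        for replica in self.replicas[: self.w]:
            replica.version = version
            replica.value = value

    def read(self):
        # Worst case: consult the last R replicas, return the newest value.
        consulted = self.replicas[len(self.replicas) - self.r:]
        newest = max(consulted, key=lambda replica: replica.version)
        return newest.value


# R + W > N: the read set always overlaps the write set.
strong = QuorumStore(n=3, w=2, r=2)
strong.write("v1")
strong.write("v2")
print(strong.read())

# R + W <= N: a read may miss the latest write until replicas catch up.
weak = QuorumStore(n=3, w=1, r=1)
weak.write("v1")
print(weak.read())
```

In the second configuration the read returns a stale result because no background synchronization is modeled here. In an eventually consistent system, the lagging replicas would receive the write later.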
Flexible Data Models
One of the major reasons businesses move to a NoSQL database system from a relational database management system (RDBMS) is the more flexible data model that’s found in most NoSQL databases. The relational data model is based on defined relationships between tables, which themselves are defined by a determined column structure, all of which are explicitly organized in a database schema – all very strict and uniform. Problems begin to arise with the relational model around scalability and performance when trying to manage the large data volumes that are becoming a fact of life in a modern IT and business environment. A NoSQL data model – often referred to as schema-less – can support many of these use cases and others that don’t fit well into an RDBMS. A NoSQL database is able to accept all types of data – structured, semi-structured, and unstructured – much more easily than a relational database, which relies on a predefined schema. This characteristic of a relational database can be a hindrance to flexibility, because a predefined schema rigidly determines how the database and its data are organized. Many of today’s business applications are able to enforce rules on data usage themselves, making a schema-less database platform a viable option.
Finally, performance factors come into play with an RDBMS data model, especially where “wide rows” are involved and updates are frequent. A NoSQL data model is designed to handle such situations and can deliver very fast performance for both read and write operations.
Another reason to use a NoSQL database is that you need a more suitable architecture for a particular application. It’s critical that organizations adopt a NoSQL platform that allows them to keep their very high-volume data in the context of their applications. Some, but not all, NoSQL solutions provide modern architectures that can tackle the type of applications that require high degrees of scale, data distribution, and continuous availability. Data center support and, as is increasingly common, multiple data center support should be use cases that a NoSQL environment handles. Decisions should be based not just on what your big data needs look like today, but also on what they will look like over longer time horizons.
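To make the contrast concrete, here is a minimal sketch that compares a fixed-schema table with a schema-less document collection. Both classes, the field names, and the `require_id` rule are hypothetical. The optional validator stands in for the application-side rules mentioned above.

```python
class FixedSchemaTable:
    """Rejects any row whose fields differ from the declared columns."""

    def __init__(self, columns):
        self.columns = tuple(columns)
        self.rows = []

    def insert(self, row):
        if set(row) != set(self.columns):
            raise ValueError(f"row must have exactly {self.columns}")
        self.rows.append(tuple(row[column] for column in self.columns))


class DocumentCollection:
    """Accepts documents of any shape; rules live in the application."""

    def __init__(self, validator=None):
        self.docs = []
        self.validator = validator

    def insert(self, doc):
        if self.validator is not None:
            self.validator(doc)  # application-enforced rule, not a schema
        self.docs.append(dict(doc))

    def find(self, **criteria):
        return [
            doc for doc in self.docs
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


def require_id(doc):
    # Example of a rule the application chooses to enforce itself.
    if "id" not in doc:
        raise ValueError("every document needs an 'id'")


table = FixedSchemaTable(["id", "name"])
table.insert({"id": 1, "name": "Ada"})
try:
    table.insert({"id": 2, "name": "Bob", "tags": ["vip"]})  # extra field
except ValueError as error:
    print("table rejected row:", error)

users = DocumentCollection(validator=require_id)
users.insert({"id": 1, "name": "Ada"})
users.insert({"id": 2, "name": "Bob", "tags": ["vip"]})          # extra field
users.insert({"id": 3, "device": {"os": "android", "ver": 14}})  # nested data
print(users.find(name="Bob"))
```

In the relational case, accepting the extra `tags` field would require a schema change first. In the document case, the new shape is accepted right away, and any remaining constraints are the application's responsibility.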
Analytics and Business Intelligence
A key strategic driver for implementing a NoSQL database environment is the ability to mine the data being collected and derive insights that put your business at a competitive advantage. Extracting meaningful business intelligence from very high volumes of data is very difficult to achieve with traditional relational database systems. Modern NoSQL database systems not only provide storage and management of business application data, but can also offer integrated data analytics that give rapid insight into complex data sets and facilitate flexible decision-making.