Apache Cassandra is a leading NoSQL database platform for online applications. By offering continuous availability, high scalability and performance, strong security, and operational simplicity, while lowering the overall cost of ownership, Cassandra has become a proven choice for both technical and business stakeholders. Compared with other database platforms such as HBase, MongoDB, Redis, and MySQL, Cassandra is often credited with higher performance under heavy workloads.
Monitoring, troubleshooting, and tuning databases are a top priority for you as a DBA. This section details how you can carry out your performance management tasks on a NoSQL database like Cassandra.
A number of command-line utilities let you check the status of your database clusters and view general metrics for the network, objects, and I/O operations, at both a high level and a low level (such as an individual table). For example, the Cassandra nodetool utility lets you quickly determine the up/down status and current data distribution of a cluster:
Checking a cluster’s status with the nodetool utility.
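As a rough illustration of working with that status report, the sketch below picks the down nodes out of text shaped like the output of `nodetool status`. The sample text, addresses, and host IDs are made up for the example, and the helper names `parse_nodetool_status` and `down_nodes` are hypothetical; the real column layout can vary between Cassandra versions.

```python
import re

# Illustrative text in the general shape printed by `nodetool status`.
# Addresses, loads, and host IDs are invented for this example.
SAMPLE_STATUS = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load     Tokens  Owns    Host ID                               Rack
UN  10.0.0.1   1.2 GB   256     33.4%   11111111-1111-1111-1111-111111111111  rack1
UN  10.0.0.2   1.1 GB   256     33.1%   22222222-2222-2222-2222-222222222222  rack1
DN  10.0.0.3   1.3 GB   256     33.5%   33333333-3333-3333-3333-333333333333  rack2
"""

# A node line starts with a two-letter code: U/D (up or down),
# then N/L/J/M (normal, leaving, joining, moving), then the address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def parse_nodetool_status(text):
    """Return a list of (address, is_up, state_code) tuples."""
    nodes = []
    for line in text.splitlines():
        match = NODE_LINE.match(line)
        if match:
            up_down, state, address = match.groups()
            nodes.append((address, up_down == "U", state))
    return nodes


def down_nodes(text):
    """Return the addresses of nodes reported as down."""
    return [addr for addr, is_up, _ in parse_nodetool_status(text) if not is_up]


print(down_nodes(SAMPLE_STATUS))  # prints ['10.0.0.3']
```

In practice the text would come from running `nodetool status` on a node rather than from a hard-coded string.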
From a performance metrics standpoint, Cassandra delivers many different statistics that can be accessed in various ways. If you are coming from an RDBMS like Oracle or Microsoft SQL Server and are used to performance data dictionaries like Oracle’s V$ views or SQL Server’s dynamic management views, the most familiar interface for you is the one supplied by DataStax Enterprise’s Performance Service.
The Performance Service collects, organizes, and maintains an in-depth diagnostic data dictionary for each cluster. It consists of various tables that can be accessed via any CQL utility (such as the CQL shell or DataStax DevCenter) and gives you both high-level and detailed performance views of how well a cluster is running.
The Performance Service maintains performance information at several levels.
You can configure the service to collect nothing, all, or selected performance metrics for each of these categories. Once the service has been configured and is running, the statistics are populated in their associated tables and stored in a special keyspace (dse_perf). You can then query the various performance tables to get statistics such as the I/O metrics for certain objects:
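A minimal sketch of such a query from application code follows. It assumes a table named `dse_perf.object_io` with the columns `keyspace_name`, `table_name`, `total_reads`, and `total_writes`; those names are recalled from DataStax Enterprise documentation and should be checked against the schema of your DSE version. The `session` argument is expected to behave like a DataStax Python driver session, and `StubSession` is a hypothetical stand-in so the example runs without a cluster.

```python
from collections import namedtuple

# Assumed table and column names for the Performance Service object I/O data;
# confirm them against the dse_perf keyspace in your DSE version.
OBJECT_IO_QUERY = (
    "SELECT keyspace_name, table_name, total_reads, total_writes "
    "FROM dse_perf.object_io"
)


def busiest_tables(session, top_n=5):
    """Return (keyspace, table, total_ops) for the most accessed tables."""
    totals = {}
    for row in session.execute(OBJECT_IO_QUERY):
        key = (row.keyspace_name, row.table_name)
        # The table is assumed to hold one row per node and table,
        # so reads and writes are summed across nodes.
        totals[key] = totals.get(key, 0) + row.total_reads + row.total_writes
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(ks, table, ops) for (ks, table), ops in ranked[:top_n]]


# Hypothetical stand-in for a driver session, with invented rows.
Row = namedtuple("Row", "keyspace_name table_name total_reads total_writes")


class StubSession:
    def execute(self, query):
        return [
            Row("shop", "orders", 900, 300),
            Row("shop", "orders", 700, 100),
            Row("shop", "users", 400, 50),
        ]


print(busiest_tables(StubSession(), top_n=2))
# prints [('shop', 'orders', 2000), ('shop', 'users', 450)]
```

The ranking is done on the client because the sketch makes no assumption about how the table is clustered or indexed.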
In addition to monitoring your database clusters from the command line, you can check the health of all the clusters you manage visually (just as you probably do with your chosen RDBMS performance monitors) by using DataStax OpsCenter. OpsCenter gives you global, at-a-glance dashboards that show how all clusters under your control are doing, as well as drill-down capabilities into each cluster and its individual nodes.
A global dashboard helps you understand how well all clusters are running and if there are any alerts or issues for one or more clusters that need your attention:
Checking OpsCenter’s global cluster dashboard.
From the global dashboard, you can drill down into each individual cluster and create customized monitoring dashboards for the performance metrics you care about the most:
Examining performance metrics for a single database cluster.
You can also create proactive alerts that notify you far in advance of a problem actually occurring in one of your clusters:
Creating an alert in OpsCenter.
In addition, you can use built-in expert services like the Best Practice service, which scans your clusters and provides expert advice on how to configure and tune them for better uptime and performance.
As a DBA, you’re sometimes called upon to locate a database’s worst-running queries that slow the performance of the system as a whole. You’ll find this isn’t hard to do with Cassandra.
First, you can use the DataStax Enterprise Performance Service to automatically capture long-running queries (based on response time thresholds you specify) and then query a performance table that holds those statements:
In addition, there is a background query tracing utility that you can use on an ad hoc basis. You can choose to trace all statements coming into a database cluster or only a percentage of them, and then look at the results. The trace information is stored in the system_traces keyspace, which holds two tables: sessions and events. These can be queried to answer questions such as which query has been the most time-consuming since a trace was started, and much more.
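The sketch below shows one way to find the most time-consuming traced requests. It assumes tracing has already been switched on for a sample of requests (for example with `nodetool settraceprobability`), and it reads the `session_id`, `duration`, `request`, and `started_at` columns of `system_traces.sessions`. The `session` argument is expected to behave like a DataStax Python driver session; `StubSession` and its rows are hypothetical stand-ins so the example runs without a cluster.

```python
from collections import namedtuple

SESSIONS_QUERY = (
    "SELECT session_id, duration, request, started_at "
    "FROM system_traces.sessions"
)


def slowest_traces(session, top_n=3):
    """Return the traced sessions with the longest duration (microseconds)."""
    # CQL cannot order by duration here, so sort on the client.
    # Sessions still in progress may have no duration yet, so skip them.
    rows = [r for r in session.execute(SESSIONS_QUERY) if r.duration is not None]
    rows.sort(key=lambda r: r.duration, reverse=True)
    return rows[:top_n]


# Hypothetical stand-in for a driver session, with invented trace rows.
TraceRow = namedtuple("TraceRow", "session_id duration request started_at")


class StubSession:
    def execute(self, query):
        return [
            TraceRow("s1", 1200, "Execute CQL3 query", "2024-01-01 10:00:00"),
            TraceRow("s2", None, "Execute CQL3 query", "2024-01-01 10:00:01"),
            TraceRow("s3", 98000, "Execute CQL3 query", "2024-01-01 10:00:02"),
        ]


for row in slowest_traces(StubSession(), top_n=2):
    print(row.session_id, row.duration)
# prints:
# s3 98000
# s1 1200
```

This reads the whole sessions table, which is reasonable for ad hoc investigation of a short tracing window but not for continuous polling.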
You can also use the tracing utility much in the same way you do an EXPLAIN PLAN on an RDBMS query. For example, to understand how a Cassandra cluster will satisfy a single CQL INSERT statement, you would enable the trace utility from the CQL command shell, issue your query, and review the diagnostic information provided:
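In the CQL shell, tracing is switched on with `TRACING ON` before the statement is issued. The same idea from application code might look like the sketch below, which assumes the DataStax Python driver's `trace=True` option on `execute` and its `get_query_trace()` result method. The `demo.users` table, the `StubSession` class, and the trace events it returns are hypothetical and exist only so the example runs without a cluster.

```python
from datetime import timedelta
from types import SimpleNamespace


def trace_statement(session, cql):
    """Run one statement with tracing on and list each traced step.

    Returns (elapsed_microseconds, source_node, activity) tuples.
    """
    result = session.execute(cql, trace=True)
    trace = result.get_query_trace()
    return [
        (event.source_elapsed // timedelta(microseconds=1),
         event.source,
         event.description)
        for event in trace.events
    ]


# Hypothetical stand-in for a driver session, with invented trace events.
class StubSession:
    def execute(self, cql, trace=False):
        events = [
            SimpleNamespace(description="Parsing statement",
                            source="10.0.0.1",
                            source_elapsed=timedelta(microseconds=45)),
            SimpleNamespace(description="Appending to commitlog",
                            source="10.0.0.2",
                            source_elapsed=timedelta(microseconds=310)),
        ]
        return SimpleNamespace(
            get_query_trace=lambda: SimpleNamespace(events=events))


steps = trace_statement(
    StubSession(),
    "INSERT INTO demo.users (id, name) VALUES (1, 'alice')",
)
for elapsed_us, source, activity in steps:
    print(elapsed_us, source, activity)
# prints:
# 45 10.0.0.1 Parsing statement
# 310 10.0.0.2 Appending to commitlog
```

Reading the steps in order shows which node handled each part of the write and where the time went.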
With Cassandra’s tracing capabilities, OpsCenter’s visual monitoring, DataStax Enterprise’s Performance Service, and the general command-line monitoring tools, you have most, if not all, of the performance tools with Cassandra that you rely on today with your favorite RDBMS.
Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.