Apache Cassandra NoSQL Performance Management / Benchmarks


Apache Cassandra is a leading NoSQL database platform for online applications. It offers continuous availability, high scalability and performance, strong security, and operational simplicity while lowering the overall cost of ownership, which has made it a proven choice for both technical and business stakeholders. Compared with other database platforms such as HBase, MongoDB, Redis, and MySQL, Cassandra is frequently reported to sustain higher throughput under heavy workloads, although results depend on the workload and configuration.

Cassandra NoSQL Performance Management - Table of Contents

Performance Management

Monitoring Basics

Advanced Command Line Performance Monitoring Tools

Visual Database Monitoring

Finding and Troubleshooting Problem Queries

Performance Management

Monitoring, troubleshooting, and tuning databases are top priorities for you as a DBA. This section details how you can carry out these performance management tasks on a NoSQL database like Cassandra.

Monitoring Basics

A number of command line utilities let you check the status of your database clusters and collect general metrics for the network, objects, and I/O operations, both at a high level and at a low level (e.g. per table). For example, the Cassandra nodetool utility lets you quickly determine the up/down status and current data distribution of a cluster:

Checking a cluster’s status with the nodetool utility.
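As a rough illustration of how this status check could be automated, the sketch below parses text in the general shape printed by nodetool status and lists the nodes that are down. The sample output, addresses, and host IDs are made up for illustration, and the column layout can differ between Cassandra versions, so treat the parsing rules as assumptions rather than a fixed format.

```python
# Sample text in the general shape printed by `nodetool status`.
# Addresses, loads, and host IDs are invented for this example.
SAMPLE_STATUS = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID    Rack
UN  10.0.0.1   103.5 KiB   256     33.4%             aaaa-1111  rack1
UN  10.0.0.2   98.2 KiB    256     33.1%             bbbb-2222  rack1
DN  10.0.0.3   101.7 KiB   256     33.5%             cccc-3333  rack2
"""

# First letter of the two-letter code is the status, second is the state.
STATUS = {"U": "up", "D": "down"}
STATE = {"N": "normal", "L": "leaving", "J": "joining", "M": "moving"}


def parse_status(text):
    """Return one dict per node line: datacenter, address, status, state."""
    nodes = []
    datacenter = None
    for line in text.splitlines():
        if line.startswith("Datacenter:"):
            datacenter = line.split(":", 1)[1].strip()
            continue
        fields = line.split()
        # Node lines start with a two-letter code such as UN or DN;
        # header and legend lines do not match this pattern.
        if (len(fields) >= 2 and len(fields[0]) == 2
                and fields[0][0] in STATUS and fields[0][1] in STATE):
            nodes.append({
                "datacenter": datacenter,
                "address": fields[1],
                "status": STATUS[fields[0][0]],
                "state": STATE[fields[0][1]],
            })
    return nodes


nodes = parse_status(SAMPLE_STATUS)
down = [n["address"] for n in nodes if n["status"] == "down"]
print(f"{len(nodes)} nodes, down: {down}")
```

To run this against a live cluster, the sample string could be replaced with the captured output of the nodetool status command, for example via Python's subprocess module.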

Advanced Command Line Performance Monitoring Tools

From a performance metrics standpoint, Cassandra delivers many different statistics that can be accessed in various ways. If you are coming from an RDBMS like Oracle or Microsoft SQL Server and are used to performance data dictionaries like Oracle’s V$ views or SQL Server’s dynamic management views, the most familiar interface for you is the one supplied by DataStax Enterprise’s Performance Service.

The Performance Service collects, organizes, and maintains an in-depth diagnostic data dictionary for each cluster. It consists of various tables that can be accessed via any CQL utility (e.g. the CQL shell utility or DataStax DevCenter) and gives you both high-level and detailed performance views of how well a cluster is running.

The Performance Service maintains the following levels of performance information:

  • System level – supplies general memory, network, and thread pool statistics.
  • Cluster level – provides metrics at the cluster, data center, and node level.
  • Database level – provides drill-down metrics at the keyspace, table, and table-per-node level.
  • Table histogram level – delivers histogram metrics for the tables being accessed.
  • Object I/O level – supplies metrics concerning ‘hot objects’, that is, data on which objects are being accessed the most.
  • User level – provides metrics concerning user activity, ‘top users’ (those consuming the most resources on the cluster), and more.
  • Statement level – captures queries that exceed a certain response time threshold along with all their relevant metrics.

You can configure the service to collect nothing, all, or selected performance metrics for the above categories. Once the service has been configured and is running, the statistics are populated in their associated tables and stored in a special keyspace (dse_perf). You can then query the various performance tables to get statistics such as the I/O metrics for certain objects:

Querying I/O metrics from a performance table.
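A minimal sketch of what working with these I/O metrics might look like is shown below. It assumes a table named object_io in the dse_perf keyspace with keyspace_name, table_name, total_reads, and total_writes columns; those names are assumptions that should be checked against your DataStax Enterprise version, and the rows are invented sample data standing in for a real query result.

```python
# CQL a client might send. The table and column names are assumptions
# about the dse_perf schema, not something confirmed by this article.
OBJECT_IO_QUERY = "SELECT * FROM dse_perf.object_io"

# Invented rows standing in for the result of the query above.
sample_rows = [
    {"keyspace_name": "shop", "table_name": "orders",
     "total_reads": 1200, "total_writes": 300},
    {"keyspace_name": "shop", "table_name": "customers",
     "total_reads": 150, "total_writes": 20},
    {"keyspace_name": "audit", "table_name": "events",
     "total_reads": 10, "total_writes": 4000},
]


def hottest_objects(rows, limit=3):
    """Rank tables by combined read and write count, busiest first."""
    ranked = sorted(
        rows,
        key=lambda r: r["total_reads"] + r["total_writes"],
        reverse=True,
    )
    return [
        (f'{r["keyspace_name"]}.{r["table_name"]}',
         r["total_reads"] + r["total_writes"])
        for r in ranked[:limit]
    ]


for name, operations in hottest_objects(sample_rows, limit=2):
    print(f"{name}: {operations} operations")
```

Ranking on the client keeps the query itself simple, which suits a small diagnostic table; for larger result sets a narrower query per keyspace may be a better fit.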

Visual Database Monitoring

In addition to monitoring your database clusters from the command line, you can check the health of all the clusters you manage visually (just as you probably do with your chosen RDBMS performance monitors) by using DataStax OpsCenter. OpsCenter gives you global, at-a-glance dashboards that show how all clusters under your control are doing, as well as drill-down capabilities into each cluster and its individual nodes.

A global dashboard helps you understand how well all clusters are running and if there are any alerts or issues for one or more clusters that need your attention:

Checking OpsCenter’s global cluster dashboard.

From the global dashboard, you can drill down into each individual cluster and create customized monitoring dashboards for the performance metrics you care about the most:

Examining performance metrics for a single database cluster.

You can also create proactive alerts that notify you before a developing problem affects one of your clusters:

Creating an alert in OpsCenter.

In addition, you can use built-in expert services such as the Best Practice service, which scans your clusters and provides expert advice on how to configure and tune them for better uptime and performance.

Finding and Troubleshooting Problem Queries

As a DBA, you’re sometimes called upon to locate a database’s worst-running queries that slow the performance of the system as a whole. You’ll find this isn’t hard to do with Cassandra.

First, you can use the DataStax Enterprise Performance Service to automatically capture long-running queries (based on response time thresholds you specify) and then query a performance table that holds those statements:

Capture long-running queries

In addition, there is a background query tracing utility that you can use on an ad-hoc basis. You can choose to trace all statements coming into a database cluster or only a percentage of them, and then look at the results. The trace information is stored in the system_traces keyspace, which holds two tables, sessions and events. These can be queried to answer questions such as which query has been the most time-consuming since a trace was started.
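To illustrate the kind of question the trace tables can answer, here is a small sketch that picks the slowest traced requests from rows shaped like the sessions table in the system_traces keyspace. The rows are made-up sample data standing in for a real query result, and the helper name is hypothetical; the duration column is assumed to be in microseconds and may be absent while a traced request is still in progress.

```python
# Query a client could send to fetch traced sessions.
SESSIONS_QUERY = (
    "SELECT session_id, duration, request FROM system_traces.sessions"
)

# Invented rows standing in for the query result. Durations are assumed
# to be microseconds; None marks a session that has not finished yet.
sample_sessions = [
    {"session_id": "s1", "duration": 1850, "request": "Execute CQL3 query"},
    {"session_id": "s2", "duration": 92400, "request": "Execute CQL3 query"},
    {"session_id": "s3", "duration": None, "request": "Execute CQL3 query"},
    {"session_id": "s4", "duration": 15300,
     "request": "Execute CQL3 prepared query"},
]


def slowest_sessions(rows, limit=5):
    """Return (session_id, milliseconds) for the slowest finished sessions."""
    finished = [r for r in rows if r["duration"] is not None]
    finished.sort(key=lambda r: r["duration"], reverse=True)
    return [(r["session_id"], r["duration"] / 1000.0)
            for r in finished[:limit]]


for session_id, millis in slowest_sessions(sample_sessions, limit=2):
    print(f"{session_id}: {millis:.1f} ms")
```

The session IDs returned here could then be used to look up the matching rows in the events table for a step-by-step view of each slow request.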

You can also use the tracing utility in much the same way you would use EXPLAIN PLAN on an RDBMS query. For example, to understand how a Cassandra cluster will satisfy a single CQL INSERT statement, you would enable tracing from the CQL command shell, issue your statement, and review the diagnostic information provided:

Tracing a statement from the CQL command shell.
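The diagnostic output of a trace is a list of events, each with an activity description, the node that reported it, and the time elapsed on that node. As a hedged sketch of how such output might be read, the code below computes how long each step took and reports the slowest one. The events are invented sample data modeled on a traced INSERT, the helper name is hypothetical, and elapsed times are assumed to be microseconds counted separately on each source node.

```python
from collections import defaultdict

# Invented trace events for one INSERT, listed in the order reported.
# source_elapsed is assumed to be microseconds since the request
# started on that particular node.
sample_events = [
    {"source": "10.0.0.1", "source_elapsed": 120,
     "activity": "Parsing INSERT statement"},
    {"source": "10.0.0.1", "source_elapsed": 310,
     "activity": "Preparing statement"},
    {"source": "10.0.0.1", "source_elapsed": 520,
     "activity": "Determining replicas for mutation"},
    {"source": "10.0.0.2", "source_elapsed": 90,
     "activity": "Appending to commitlog"},
    {"source": "10.0.0.1", "source_elapsed": 780,
     "activity": "Appending to commitlog"},
    {"source": "10.0.0.2", "source_elapsed": 400,
     "activity": "Adding to memtable"},
    {"source": "10.0.0.1", "source_elapsed": 1450,
     "activity": "Adding to memtable"},
]


def step_durations(events):
    """Return (source, activity, microseconds since that node's previous event)."""
    last_elapsed = defaultdict(int)  # per-node clock, starting at 0
    steps = []
    for event in events:
        source = event["source"]
        delta = event["source_elapsed"] - last_elapsed[source]
        last_elapsed[source] = event["source_elapsed"]
        steps.append((source, event["activity"], delta))
    return steps


steps = step_durations(sample_events)
slowest = max(steps, key=lambda step: step[2])
print(f"slowest step: {slowest[1]} on {slowest[0]} ({slowest[2]} us)")
```

Tracking the previous timestamp per node matters because each node reports its own elapsed time, so subtracting values across nodes would give misleading gaps.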

With Cassandra’s tracing capabilities, OpsCenter’s visual monitoring, DataStax Enterprise’s Performance Service, and the general command line monitoring tools, you have most, if not all, of the performance tools you rely on today with your favorite RDBMS.

 

Last updated: 04 Apr 2023
About Author

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
