Data volumes have grown without interruption since the earliest days of computing, and storing, supporting, and maintaining that information has long been one of the biggest challenges. Cloud computing has added a new dimension to information storage, the “pay-as-you-go” model, which makes more efficient use of computing resources. Even mature relational database products struggle to scale applications with incoming traffic at a reasonable cost. The demand for handling huge data volumes and scaling elastically at the desired performance has led to the concept of NoSQL databases. NoSQL is one of the hottest topics in database technology today, and moving data from existing data stores to NoSQL is a likely area of interest for customers.
Moving data from an RDBMS or other database to Cassandra is generally quite easy. The following options exist for migrating data to Cassandra:
1. COPY command – cqlsh, the CQL shell, provides a COPY command (very similar to the one in PostgreSQL) that can load data from an operating system file into a Cassandra table. Note that this is not recommended for very large files.
2. Bulk loader – this utility is designed to load a delimited file (e.g. comma- or tab-separated) into a Cassandra table more quickly.
3. Sqoop – Sqoop is a utility used in Hadoop to load data from RDBMSs into a Hadoop cluster. DataStax supports pipelining data directly from an RDBMS table into a Cassandra table.
4. ETL tools – there are a variety of ETL tools (e.g. Informatica) that support Cassandra as both a source and target data platform. Many of these tools not only extract and load data but also provide transformation routines that can manipulate the incoming data in many ways. A number of these tools are also free to use (e.g. Pentaho, Jaspersoft, Talend).
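To make the COPY option from the list above more concrete, here is a minimal sketch in Python. It assumes the source data sits in a relational table reachable through a DB-API connection (SQLite stands in for the RDBMS here), and the names `export_table_to_csv`, `build_copy_statement`, `demo`, and `users` are hypothetical, chosen only for illustration. The sketch exports the table to a CSV file with a header row and builds the matching `COPY ... FROM` statement that you would then paste into cqlsh. It does not connect to Cassandra itself.

```python
import csv
import os
import sqlite3
import tempfile


def export_table_to_csv(conn, table, columns, csv_path):
    """Write the given columns of a relational table to a CSV file.

    Returns the number of data rows written. The table and column names
    are interpolated directly, so they must come from trusted input.
    """
    cursor = conn.execute(f"SELECT {', '.join(columns)} FROM {table}")
    rows_written = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)  # header row, matched by HEADER = TRUE below
        for row in cursor:
            writer.writerow(row)
            rows_written += 1
    return rows_written


def build_copy_statement(keyspace, table, columns, csv_path):
    """Return a cqlsh COPY ... FROM statement that would load the CSV file."""
    column_list = ", ".join(columns)
    return (
        f"COPY {keyspace}.{table} ({column_list}) "
        f"FROM '{csv_path}' WITH HEADER = TRUE;"
    )


if __name__ == "__main__":
    # SQLite is only a stand-in for the source RDBMS (an assumption).
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])

    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "users.csv")
        export_table_to_csv(source, "users", ["id", "name"], path)
        print(build_copy_statement("demo", "users", ["id", "name"], path))
```

The target Cassandra table has to exist before the COPY statement is run, and, as noted above, this route is better suited to modest files than to very large ones.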
It’s clear that the database is critical to successfully managing the explosion of data. What’s less clear is how to transition from legacy RDBMS to modern NoSQL databases. Successfully migrating from a relational world to a NoSQL world requires careful planning.
In fact, one of the biggest criticisms of NoSQL databases like MongoDB or Neo4j is that they’re so easy to work with that developers jump in headfirst without properly designing their data model, which causes problems later. While NoSQL databases do provide significantly more developer agility and flexibility, they still shouldn’t be used willy-nilly.
This is particularly true for those coming from an RDBMS background, as NoSQL differs markedly from the relational model. In the RDBMS world, an engineer designs the data schema at the outset, and SQL queries are then run against the database. If business or application changes later require changes to the database, a DBA must get involved. It’s not an easy process, as the DBA must navigate complex joins (i.e., inter-table relationships). NoSQL databases are a better fit for modern application development and provide significant database performance and developer agility benefits, albeit at the expense of some functionality.
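As a rough illustration of what planning the data model can mean when leaving joins behind, the sketch below flattens a relational join into denormalized records grouped by the key an application would query on. This is one common approach, not something prescribed here. The `customers` and `orders` tables, their columns, and the function `denormalize_orders` are all hypothetical, and SQLite again stands in for the source RDBMS.

```python
import sqlite3
from collections import defaultdict

# The join a relational application would run at query time.
JOIN_QUERY = """
    SELECT c.id, c.name, o.id, o.total_cents
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
"""


def denormalize_orders(conn):
    """Run the join once and group the rows by customer id.

    Each customer id maps to a list of self-contained order records, so
    the customer name is deliberately duplicated on every order. A later
    lookup by customer id then needs no join.
    """
    partitions = defaultdict(list)
    for customer_id, name, order_id, total_cents in conn.execute(JOIN_QUERY):
        partitions[customer_id].append(
            {"customer_name": name, "order_id": order_id, "total_cents": total_cents}
        )
    return dict(partitions)
```

The trade-off is the one described above: reads by customer become a single lookup, but the duplicated customer name must be kept consistent by the application, which is part of the functionality given up when moving away from joins.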
Ravindra Savaram is a Content Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms, including Machine Learning, DevOps, Data Science, Artificial Intelligence, RPA, and Deep Learning.