Detecting and replacing failed hard drives – OpenStack

OpenStack Object Storage isn't of much use if it can't access the hard drives where our data is stored, so being able to detect and replace failed hard drives is essential. OpenStack Object Storage can be configured to detect hard drive failures with the swift-drive-audit command. Detecting failures promptly lets us replace the failed drive, which is essential to system health and performance.

Getting started

Log in to an OpenStack Object Storage node as well as the Proxy Server.

How to accomplish it…

To detect a failing hard drive, carry out the following:

Storage node

  • We first need to configure a cron job that monitors /var/log/kern.log for failed disk errors on our storage nodes. To do this, we create a configuration file named /etc/swift/swift-drive-audit.conf, as follows:

[Image: contents of /etc/swift/swift-drive-audit.conf (not reproduced here)]
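The original listing is not available here, so the following is only a minimal sketch of what such a file might contain. The helper name write_drive_audit_conf is hypothetical, and the values shown (device_dir of /srv/node, a 60-minute window, an error limit of 1) are assumptions to adjust for your storage nodes.

```shell
# Sketch: write a minimal drive-audit configuration to the given path.
# With no argument it targets the path named in the text (needs root).
write_drive_audit_conf() {
    conf_file="${1:-/etc/swift/swift-drive-audit.conf}"
    cat > "$conf_file" <<'EOF'
[drive-audit]
# Where the storage devices are mounted (assumed; match your node)
device_dir = /srv/node
log_facility = LOG_LOCAL0
log_level = INFO
# How many minutes of kernel log to look back over
minutes = 60
# Number of errors in that window before a drive is unmounted
error_limit = 1
EOF
}
```

Running write_drive_audit_conf as root with no argument writes the file in place; passing a scratch path first lets you inspect the result before installing it.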

  • We then add a cron job that executes swift-drive-audit hourly, or as often as needed for your environment, as follows:

[Image: cron entry that runs swift-drive-audit hourly (not reproduced here)]
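The original cron listing is not available either. One possible way to schedule the audit, sketched here under assumptions, is a file in /etc/cron.d. The helper name install_drive_audit_cron, the cron file name, and the /usr/bin/swift-drive-audit install path are all assumptions; check where the command lives on your system.

```shell
# Sketch: write an hourly cron entry for swift-drive-audit.
# With no argument it targets /etc/cron.d/swift-drive-audit (needs root).
install_drive_audit_cron() {
    cron_file="${1:-/etc/cron.d/swift-drive-audit}"
    cat > "$cron_file" <<'EOF'
# minute hour day-of-month month day-of-week user command
0 * * * * root /usr/bin/swift-drive-audit /etc/swift/swift-drive-audit.conf
EOF
}
```

The leading "0 * * * *" runs the audit at the top of every hour; change the schedule fields if your environment needs more or less frequent checks.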

  • With this in place, when a drive is detected as faulty, the script unmounts it so that OpenStack Object Storage can work around the issue. Once a disk has been marked as faulty and taken offline, you can replace it.

Tip

Without swift-drive-audit unmounting the drive automatically, you need to act manually to ensure that the disk has been unmounted and removed from the ring.

  • Once the disk has been physically replaced, we can follow instructions as described in the Managing Swift Cluster Capacity recipe, to add our node or device back into our cluster.

How it works…

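Failed hard drives can be detected automatically by the swift-drive-audit tool, which we set up as a cron job to run hourly. It scans the kernel log for disk errors and unmounts any drive it finds faulty, so that the drive cannot be used and OpenStack Object Storage can work around it. Note that swift-drive-audit does not edit the ring itself; removing the device from the rings, so that data is no longer stored on or replicated to it, is done with swift-ring-builder.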
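To give a feel for the log scan, here is a rough shell approximation. It is not how swift-drive-audit is implemented (the real tool is a Python script with configurable patterns); the function name count_drive_errors and the simple "sdX" device pattern are illustrative assumptions, and the -o and -w options assume GNU or BSD grep.

```shell
# Sketch: count kernel-log error lines per sd* device name.
# Defaults to the log path named in the text.
count_drive_errors() {
    log_file="${1:-/var/log/kern.log}"
    # 1) keep lines that mention "error" (case-insensitive)
    # 2) extract whole-word device names such as sdb or sdb1
    # 3) count occurrences per device, highest count first
    grep -i 'error' "$log_file" \
        | grep -owE 'sd[a-z]{1,2}[0-9]?' \
        | sort | uniq -c | sort -rn
}
```

A device whose count reaches the configured error limit within the look-back window is the kind of drive the audit would unmount.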
Once the drive has been removed from the rings, we can run maintenance on that device and replace the drive.
With a new drive in place, we can then put the device back in service on the storage node by adding it back into the rings. We can then rebalance the rings by running the swift-ring-builder commands.
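The ring steps might look roughly like the sketch below, which only prints the swift-ring-builder commands so they can be reviewed before being run. The zone, IP address, ports, device name, and weight are hypothetical placeholders, and the helper name ring_commands is an assumption for illustration.

```shell
# Hypothetical device details; substitute your own values.
ZONE=1
NODE_IP="10.0.0.1"
DEVICE="sdb1"
WEIGHT=100

# Print (do not run) the ring commands for one phase: "remove" or "add".
ring_commands() {
    phase="$1"
    # ring:port pairs; the ports are assumed and must match your ring definitions
    for pair in account:6002 container:6001 object:6000; do
        ring="${pair%%:*}"
        port="${pair##*:}"
        dev="z${ZONE}-${NODE_IP}:${port}/${DEVICE}"
        if [ "$phase" = "add" ]; then
            echo "swift-ring-builder ${ring}.builder add ${dev} ${WEIGHT}"
        else
            echo "swift-ring-builder ${ring}.builder remove ${dev}"
        fi
        # every change to a builder file is followed by a rebalance
        echo "swift-ring-builder ${ring}.builder rebalance"
    done
}
```

Call ring_commands remove before swapping the drive and ring_commands add once the new drive is mounted, then run the printed commands from the directory that holds the builder files and distribute the regenerated ring files to the nodes.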
https://docs.openstack.org/openstack-ops/content/maintenance.html



Ravindra Savaram
About The Author

Ravindra Savaram is a Content Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.

