
Detecting and replacing failed hard drives – OpenStack

by Ravindra Savaram
Last modified: April 10th 2021

OpenStack Object Storage isn't of much use if it can't access the hard drives where our data is stored, so being able to detect and replace failed hard drives is essential. OpenStack Object Storage can be configured to detect hard drive failures with the swift-drive-audit command. Detecting failures promptly lets us replace the failed hard drive, which is essential to the system's health and performance.


How to Detect and Replace Failed Hard Drives

Log in to an OpenStack Object Storage node as well as the Proxy Server.

To detect a failing hard drive, carry out the following:

Storage node

  • We first need to configure swift-drive-audit, which is run from cron and checks /var/log/kern.log for disk errors on our storage nodes. To do this, we create a configuration file named /etc/swift/swift-drive-audit.conf, as follows:

[Image: configuration file]
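The original listing is not reproduced here, so what follows is only a minimal sketch of what such a file might contain. It assumes the storage devices are mounted under /srv/node, and the values for minutes and error_limit are illustrative choices rather than the author's settings. The file is staged in a scratch directory (a hypothetical convenience) so it can be reviewed before being installed as root.

```shell
# Stage the file in a scratch directory so the sketch can be tried without root.
stage_dir="$(mktemp -d)"
conf="$stage_dir/swift-drive-audit.conf"

cat > "$conf" <<'EOF'
[drive-audit]
# Directory under which the storage devices are mounted (assumed: /srv/node).
device_dir = /srv/node
# Syslog facility and level for the tool's own messages.
log_facility = LOG_LOCAL0
log_level = INFO
# Number of minutes to look back in the kernel log.
minutes = 60
# Number of errors to find before a device is unmounted.
error_limit = 1
EOF

# Show the staged file for review.
cat "$conf"

# After review, install it as root, for example:
#   sudo install -m 0644 "$conf" /etc/swift/swift-drive-audit.conf
```

Keeping minutes aligned with the cron interval means each run looks at roughly the log window since the previous run.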

  • We then add a cron job that executes swift-drive-audit hourly, or as often as needed for your environment, as follows:

[Image: cron job that executes swift-drive-audit]
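As a rough sketch, an hourly entry in the /etc/cron.d format could look like the one below. The install path /usr/bin/swift-drive-audit and the file name under /etc/cron.d are assumptions; check where the tool lives on your distribution. As above, the file is staged in a scratch directory first.

```shell
# Stage the cron file in a scratch directory so the sketch can be tried without root.
stage_dir="$(mktemp -d)"
cron_file="$stage_dir/swift-drive-audit"

# /etc/cron.d format: minute hour day-of-month month day-of-week user command
# Assumption: the tool is installed as /usr/bin/swift-drive-audit.
cat > "$cron_file" <<'EOF'
0 * * * * root /usr/bin/swift-drive-audit /etc/swift/swift-drive-audit.conf
EOF

# Show the staged entry for review.
cat "$cron_file"

# After review, install it as root, for example:
#   sudo install -m 0644 "$cron_file" /etc/cron.d/swift-drive-audit
```

Changing the first field (for example to */15) would run the audit more often than hourly.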

  • With this in place, when a drive is detected as faulty, the script unmounts it so that OpenStack Object Storage can work around the issue. Once a disk has been marked as faulty and taken offline, you can replace it.
  • Without swift-drive-audit taking care of this automatically, you would need to act manually to ensure that the disk is unmounted and removed from the ring.
  • Once the disk has been physically replaced, we can follow instructions as described in the Managing Swift Cluster Capacity recipe, to add our node or device back into our cluster.
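If the manual route is needed, the unmount and ring removal mentioned in the list above might be sketched as follows. Every value here is a placeholder: the device name, zone, node IP, per-ring ports, and the builder directory are assumptions to replace with your own, and the run helper with its DRY_RUN switch is a hypothetical convenience that only prints the commands unless DRY_RUN=0 is set.

```shell
# Placeholder values (assumptions); substitute your own.
DEVICE="${DEVICE:-sdb1}"
ZONE="${ZONE:-1}"
NODE_IP="${NODE_IP:-172.16.0.212}"
BUILDER_DIR="${BUILDER_DIR:-/etc/swift}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

remove_failed_device() {
    # On the storage node: make sure the failed drive is unmounted.
    run umount "/srv/node/$DEVICE"

    # Where the builder files live: remove the device from each ring,
    # then rebalance. The ring:port pairs are assumed port numbers.
    for entry in account:6002 container:6001 object:6000; do
        ring="${entry%%:*}"
        port="${entry##*:}"
        run swift-ring-builder "$BUILDER_DIR/$ring.builder" remove \
            "z${ZONE}-${NODE_IP}:${port}/${DEVICE}"
        run swift-ring-builder "$BUILDER_DIR/$ring.builder" rebalance
    done
}

remove_failed_device
```

Including the zone, IP, and port in the search value keeps the removal from matching a device with the same name on another node.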


Failed hard drives can be detected automatically by the swift-drive-audit tool, which we set up as a cron job to run hourly. When it finds a failure, it unmounts the drive so that it cannot be used, and OpenStack Object Storage works around the missing device so that data isn't being stored on or replicated to it.

Once the drive has been removed from the rings, we can run maintenance on that device and replace the drive.
With a new drive in place, we can then put the device back in service on the storage node by adding it back into the rings. We can then rebalance the rings by running the swift-ring-builder commands.
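A minimal sketch of putting the replaced device back into the rings is shown below. The zone, node IP, ports, device name, weight, and builder directory are all placeholder assumptions, and the run helper with DRY_RUN is a hypothetical convenience that prints the commands by default instead of executing them. It also assumes the new drive has already been formatted and mounted.

```shell
# Placeholder values (assumptions); substitute your own.
DEVICE="${DEVICE:-sdb1}"
ZONE="${ZONE:-1}"
NODE_IP="${NODE_IP:-172.16.0.212}"
WEIGHT="${WEIGHT:-100}"
BUILDER_DIR="${BUILDER_DIR:-/etc/swift}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

readd_device() {
    # Add the device to each ring with the chosen weight, then rebalance.
    # The ring:port pairs are assumed port numbers.
    for entry in account:6002 container:6001 object:6000; do
        ring="${entry%%:*}"
        port="${entry##*:}"
        run swift-ring-builder "$BUILDER_DIR/$ring.builder" add \
            "z${ZONE}-${NODE_IP}:${port}/${DEVICE}" "$WEIGHT"
        run swift-ring-builder "$BUILDER_DIR/$ring.builder" rebalance
    done
}

readd_device
```

After a rebalance, the regenerated .ring.gz files still need to be copied to every proxy and storage node for the change to take effect.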




About Author

Ravindra Savaram is a Content Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms, including machine learning, DevOps, data science, artificial intelligence, RPA, and deep learning. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.