Detecting and replacing failed hard drives – OpenStack

Last Updated June 02, 2017

OpenStack Object Storage won’t be of much use if it can’t access the hard drives where our data is stored, so being able to detect and replace failed hard drives is essential. OpenStack Object Storage can be configured to detect hard drive failures with the swift-drive-audit command. Detecting failures early lets us replace a failed drive promptly, which is essential to the health and performance of the system.

Getting started

Log in to an OpenStack Object Storage node as well as the Proxy Server.

How to accomplish it…

To detect a failing hard drive, carry out the following:

Storage node

  • We first need to configure swift-drive-audit to monitor /var/log/kern.log for failed disk errors on our storage nodes. To do this, we create a configuration file named /etc/swift/swift-drive-audit.conf.

  • We then add a cron job that executes swift-drive-audit hourly, or as often as needed for your environment.

  • With this in place, when a drive is detected as faulty, the script unmounts it so that OpenStack Object Storage can work around the issue. Once a disk has been marked as faulty and taken offline, you can replace it.
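The configuration file and cron job from the first two steps might look something like the minimal sketch below. It is not the recipe's original listing: the option names (device_dir, minutes, error_limit, log_facility, log_level) come from Swift's drive-audit sample configuration, but the values, the /usr/bin location of swift-drive-audit, and the use of /etc/cron.hourly are assumptions to adjust for your installation.

```shell
# Run on each storage node as root.
# Option names follow Swift's drive-audit sample config; the values below
# are illustrative assumptions, not required settings.
cat > /etc/swift/swift-drive-audit.conf <<'EOF'
[drive-audit]
# Directory under which Swift's data devices are mounted
device_dir = /srv/node
# Only consider kernel log entries from the last N minutes
minutes = 60
# Number of errors a device may log before it is unmounted
error_limit = 1
log_facility = LOG_LOCAL0
log_level = INFO
EOF

# Run the audit once an hour via cron's hourly directory.
# Assumption: swift-drive-audit is installed in /usr/bin (pip installs
# often use /usr/local/bin instead).
cat > /etc/cron.hourly/swift-drive-audit <<'EOF'
#!/bin/sh
/usr/bin/swift-drive-audit /etc/swift/swift-drive-audit.conf
EOF
chmod 755 /etc/cron.hourly/swift-drive-audit
```

To run the audit more or less often than hourly, an /etc/crontab entry with your own schedule can replace the cron.hourly script.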

 

Tip

Without swift-drive-audit taking care of this automatically, you need to unmount the failed disk yourself. Either way, removing the device from the ring is a manual step carried out with swift-ring-builder.
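Removing the device from the rings happens wherever your ring builder files are kept (often the proxy server). The helper below is a hypothetical sketch that only prints the swift-ring-builder commands so you can review them before running anything. It assumes the builder files live in /etc/swift and that the failed device appears in all three rings; the IP address 172.16.0.2 and device name sdb1 used in the examples are placeholders.

```shell
# Hypothetical helper: print, but do not run, the commands that remove a
# failed device from the object, container and account rings.
# Assumption: builder files live in /etc/swift.
ring_remove_cmds() {
    node_ip=$1   # storage node IP, e.g. 172.16.0.2 (placeholder)
    device=$2    # device name under /srv/node, e.g. sdb1 (placeholder)
    for ring in object container account; do
        # Search value "<ip>/<device>" matches that device on any port or zone
        echo "swift-ring-builder /etc/swift/$ring.builder remove $node_ip/$device"
        # The removal only takes effect in the ring file after a rebalance
        echo "swift-ring-builder /etc/swift/$ring.builder rebalance"
    done
}
```

For example, `ring_remove_cmds 172.16.0.2 sdb1` prints six commands; once you have checked them, `ring_remove_cmds 172.16.0.2 sdb1 | sudo sh` runs them. The regenerated *.ring.gz files then need to be copied to every storage and proxy node. If swift-drive-audit has not already done so, unmount the failed drive on the storage node first, for example with `umount /srv/node/sdb1`.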

  • Once the disk has been physically replaced, we can follow the instructions described in the Managing Swift Cluster Capacity recipe to add our node or device back into our cluster.
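Adding the device back mirrors the removal. Before doing so, the new drive is typically formatted (XFS is the usual choice for Swift) and mounted at the same /srv/node/<device> mount point, owned by the swift user. The hypothetical helper below again only prints the swift-ring-builder commands. It assumes builder files in /etc/swift, a single region (r1), and object, container and account servers listening on ports 6200, 6201 and 6202; older releases and many existing deployments use 6000 to 6002 instead, so check your own ring with `swift-ring-builder /etc/swift/object.builder` first.

```shell
# Hypothetical helper: print, but do not run, the commands that add a
# replaced device back into all three rings and rebalance them.
# Assumptions: builder files in /etc/swift, single region r1, and
# ports 6200/6201/6202 for the object/container/account servers.
ring_add_cmds() {
    zone=$1      # zone of the storage node, e.g. 1
    node_ip=$2   # storage node IP, e.g. 172.16.0.2 (placeholder)
    device=$3    # device name under /srv/node, e.g. sdb1 (placeholder)
    weight=$4    # ring weight, e.g. 100
    for entry in object:6200 container:6201 account:6202; do
        ring=${entry%%:*}   # part before the colon, e.g. "object"
        port=${entry##*:}   # part after the colon, e.g. "6200"
        echo "swift-ring-builder /etc/swift/$ring.builder add r1z$zone-$node_ip:$port/$device $weight"
        echo "swift-ring-builder /etc/swift/$ring.builder rebalance"
    done
}
```

For example, `ring_add_cmds 1 172.16.0.2 sdb1 100 | sudo sh` adds the device to zone 1 of each ring with weight 100 and rebalances; as with removal, the updated *.ring.gz files must then be distributed to all nodes.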

How it works…

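Detection of failed hard drives is handled automatically by the swift-drive-audit tool, which we set up as a cron job to run hourly. When it finds disk errors in the kernel log, it unmounts the affected drive so it cannot be used, and Swift routes requests and replication around the missing device. The tool does not modify the rings; removing the device from them, so that data isn't stored on or replicated to it, is a separate step carried out with swift-ring-builder.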
Once the drive has been removed from the rings, we can run maintenance on that device and replace the drive.
With a new drive in place, we can then put the device back in service on the storage node by adding it back into the rings. We can then rebalance the rings by running the swift-ring-builder commands.
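The detection idea can be approximated with standard text tools. The function below is a simplified, hypothetical sketch, not Swift's implementation: it counts kernel log lines that contain the word "error" and name an sdX-style device (an assumption; NVMe or device-mapper names would need a different pattern), then prints every device whose count reaches the limit, which is the role error_limit plays. Unlike swift-drive-audit, it ignores the `minutes` time window and does not map devices to mount points or unmount anything. It assumes GNU grep, which supports `\b` in extended regular expressions.

```shell
# Hypothetical sketch of the detection idea behind swift-drive-audit.
# Usage: count_disk_errors LOG_FILE [ERROR_LIMIT]
count_disk_errors() {
    log_file=$1
    error_limit=${2:-1}
    # 1. keep log lines that contain the whole word "error"
    grep -w 'error' "$log_file" |
        # 2. extract sd-style device names such as sdb or sdb1
        grep -Eo '\bsd[a-z]{1,2}[0-9]?\b' |
        # 3. count how often each device was named
        sort | uniq -c |
        # 4. report devices whose count is at or above the limit
        awk -v limit="$error_limit" '$1 >= limit { print $2 }'
}
```

For example, `count_disk_errors /var/log/kern.log 1` lists every device that logged at least one error, which is the set swift-drive-audit would consider unmounting with error_limit = 1.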
https://docs.openstack.org/openstack-ops/content/maintenance.html

 
