Deploying Highly Available OpenStack

OpenStack has been designed for highly scalable environments where it is possible to avoid single points of failure (SPOFs), but you must build this into your own environment. For example, Keystone is a central service underpinning your entire OpenStack environment, so you would build multiple instances of it into your environment. Glance is another service that is key to the running of your OpenStack environment.

Corosync and Pacemaker are popular high-availability utilities that allow you to configure services to fail over automatically. By setting up multiple instances running the Keystone and Glance services, controlled with Pacemaker and Corosync, we gain resilience to failure of the nodes running these services.

Getting started

We must first create two servers configured appropriately for use with OpenStack. As these two servers will just be running Keystone and Glance, each requires only a single network interface and an address on the network over which our OpenStack services communicate. This interface can be bonded for added resilience.
The first, controller1, will have a host management address of
The second, controller2, will have a host management address of


How to achieve it…

To install Pacemaker and Corosync on these two servers that will be running OpenStack services such as Keystone and Glance, carry out the following:

First node (controller1)

  • Once Ubuntu has been installed with an address in our OpenStack environment that our other OpenStack services can use to communicate, we can proceed to install Pacemaker and Corosync, as follows:
sudo apt-get update
sudo apt-get -y install pacemaker corosync
  • It’s important that our two nodes know each other by address and hostname, so enter their details in /etc/hosts to avoid DNS lookups, as follows: controller1 controller2
  • Edit the /etc/corosync/corosync.conf file, so that the interface section matches the following:
interface {
# The following values need to be set based on your environment
# (the bindnetaddr and mcastaddr values are not shown here; set them for your network)
ringnumber: 0
mcastport: 5405
}

Corosync uses multicast. Ensure that the values don’t conflict with any other multicast-enabled services on your network.

  • By default, the corosync service isn’t set to start. To ensure it starts, edit the /etc/default/corosync file and set START=yes, as follows:
sudo sed -i 's/^START=no/START=yes/g' /etc/default/corosync
  • We now need to generate an authorization key to secure the communication between our two hosts:
sudo corosync-keygen
  • You will be asked to generate some random entropy by typing on the keyboard.


If you are using an SSH session, rather than a console connection, you won’t be able to generate the entropy using a keyboard. To do this remotely, launch a new SSH session, and in that new session, while the corosync-keygen command is waiting for entropy, run the following:

while /bin/true; do dd if=/dev/urandom of=/tmp/100 bs=1024 count=100000; for i in {1..10}; do cp /tmp/100 /tmp/tmp_${i}_$RANDOM; done; rm -f /tmp/tmp_* /tmp/100; done
  • When the corosync-keygen command has finished running and an authkey file has been generated, simply press Ctrl + C to stop this random entropy creation loop.


Second node (controller2)

  • We now need to install Pacemaker and Corosync on our second host, controller2. We do this as follows:
sudo apt-get update
sudo apt-get -y install pacemaker corosync
  • We also ensure that our /etc/hosts file has the same entries for our other host, as before: controller1 controller2
  • By default, the corosync service isn’t set to start. To ensure that it starts, edit the /etc/default/corosync file and set START=yes:
sudo sed -i 's/^START=no/START=yes/g' /etc/default/corosync
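
The sed command above only changes an existing START=no line. As a minimal sketch, the same step could be wrapped so that it is safe to re-run and also covers a defaults file that has no START line at all. The function name enable_corosync_start and its optional file argument are illustrative assumptions, not part of Corosync.

```shell
# Hypothetical helper: make sure START=yes is set in a corosync defaults file.
# The path defaults to /etc/default/corosync; pass another path to try it on a copy.
# The body runs in a subshell, so its variables do not leak into the caller.
enable_corosync_start() (
    file="${1:-/etc/default/corosync}"
    if grep -q '^START=' "$file"; then
        # A START line exists: rewrite whatever value it holds to yes.
        sed -i 's/^START=.*/START=yes/' "$file"
    else
        # No START line at all: append one.
        echo 'START=yes' >> "$file"
    fi
)
```

To try it without touching the real file, copy /etc/default/corosync to a scratch location and pass that path as the argument. Against the real file it needs to run from a root shell.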

First node (controller1)

With the /etc/corosync/corosync.conf file modified and the /etc/corosync/authkey file generated, we copy both files to the other node (or nodes) in our cluster, as follows:

scp /etc/corosync/corosync.conf /etc/corosync/authkey openstack@

Second node (controller2)

We can now put the same corosync.conf file as used by our first node and the generated authkey file into /etc/corosync:

sudo mv corosync.conf authkey /etc/corosync
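
The authkey file is normally readable only by root, and it is worth keeping it that way after the copy. The following is a sketch under stated assumptions rather than a required step: the function name install_corosync_files and the idea of a staging directory are hypothetical, and it uses the standard install utility instead of mv so that the file modes are set explicitly.

```shell
# Hypothetical helper: place corosync.conf and authkey into a target directory
# with explicit permissions. The first argument is the directory holding the
# copied files; the second is the destination (defaults to /etc/corosync).
# Run it from a root shell when the destination is the real /etc/corosync.
install_corosync_files() (
    src="$1"
    dest="${2:-/etc/corosync}"
    # Stop early if either file is missing from the staging directory.
    for f in corosync.conf authkey; do
        [ -f "$src/$f" ] || { echo "missing $src/$f" >&2; exit 1; }
    done
    # The config may be world-readable; the key stays read-only for its owner.
    install -m 0644 "$src/corosync.conf" "$dest/corosync.conf" &&
    install -m 0400 "$src/authkey" "$dest/authkey"
)
```

For example, if both files were copied into the current directory on controller2, calling install_corosync_files . from a root shell places them in /etc/corosync like the mv command does, with the modes made explicit. Unlike mv, it leaves the staged copies in place, so remove the staged authkey afterwards.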

Starting the Pacemaker and Corosync services

  • We are now ready to start the services. On both nodes, issue the following commands:
sudo service corosync start
sudo service pacemaker start
  • To check that our services have started fine and our cluster is working, we can use the crm_mon command to query the cluster status, as follows:
sudo crm_mon -1
  • This will return output similar to the following where the important information includes the number of nodes configured, the expected number of nodes, and a list of our two nodes that are online:
Last updated: Sat Aug 24 21:07:05 2013
Last change: Sat Aug 24 21:06:10 2013 via crmd on controller1
Stack: openais
Current DC: controller1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ controller1 controller2 ]

First node (controller1)

  • We can validate the configuration using the crm_verify command, as follows:
sudo crm_verify -L
  • This will bring back an error mentioning STONITH (Shoot The Other Node In The Head). STONITH is a fencing mechanism: it forcibly isolates a misbehaving node so that it cannot interfere with resources the rest of the cluster has taken over. No fencing device is configured in this simple 2-node cluster, so we disable STONITH.
sudo crm configure property stonith-enabled=false
  • Verifying the cluster using crm_verify again will now show no errors:
sudo crm_verify -L
  • Again, as this is only a 2-node cluster, we also disable any notion of quorum, using the following command:
sudo crm configure property no-quorum-policy=ignore
  • On the first node, we can now configure our services and set up a floating address that will be shared between the two servers. In the following command, the ip parameter holds the floating IP address and the monitoring interval is 5 seconds. We use the crm command again to configure this floating IP address, which will be called FloatingIP.
sudo crm configure primitive FloatingIP \
  ocf:heartbeat:IPaddr2 params ip= \
  cidr_netmask=32 op monitor interval=5s
  • On viewing the status of our cluster, using crm_mon, we can now see that the FloatingIP address has been assigned to our controller1 host:
sudo crm_mon -1

This outputs something similar to the following example, which now says we have 1 resource configured for this setup (our FloatingIP):

Last updated: Sat Aug 24 21:23:07 2013
Last change: Sat Aug 24 21:06:10 2013 via crmd on controller1
Stack: openais
Current DC: controller1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ controller1 controller2 ]
FloatingIP (ocf::heartbeat:IPaddr2): Started controller1
  • We can now use this address to connect to our first node, and when we power that node off, the address will move to our second node after 5 seconds of no response from the first node.
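
Reading the crm_mon output by eye works for a two-node cluster, but the same checks can be scripted. The sketch below assumes the output layout shown above (an "Online: [ ... ]" line and a "Started <node>" suffix on resource lines), which may differ between Pacemaker versions. The function names nodes_online and resource_node are hypothetical.

```shell
# Hypothetical helpers that read `crm_mon -1` output on standard input.

# Succeeds only if every node name given as an argument appears in the
# "Online: [ ... ]" list of the output.
nodes_online() (
    online=$(sed -n 's/.*Online: \[\(.*\)\].*/\1/p')
    for node in "$@"; do
        case " $online " in
            *" $node "*) ;;
            *) echo "$node is not online" >&2; exit 1 ;;
        esac
    done
)

# Prints the node on which the named resource is reported as Started.
# Prints nothing if the resource is not listed or not started.
resource_node() (
    awk -v res="$1" '$1 == res && $(NF-1) == "Started" { print $NF }'
)

# Example usage on a cluster node:
#   sudo crm_mon -1 | nodes_online controller1 controller2 && echo "both nodes online"
#   sudo crm_mon -1 | resource_node FloatingIP
```

Running the second example before and after powering off controller1 is one way to confirm that the FloatingIP resource has moved to controller2.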

How it works…

Making OpenStack services highly available is a complex subject, and there are a number of ways to achieve this. Using Pacemaker and Corosync is a very good solution to this problem. It allows us to configure a floating IP address assigned to the cluster that will attach itself to the appropriate node (using Corosync), as well as control services using agents, so that the cluster manager can start and stop services as required, to provide a highly available experience to the end user.
By installing both Keystone and Glance onto two nodes (each configured appropriately with a remote database backend such as MySQL and Galera) and making the images available through a shared filesystem or cloud storage solution, we can configure Pacemaker to monitor these services. If they become unavailable on the active node, Pacemaker can start them on the passive node.

