HA Cluster Stack Session at UDS-O

On Thursday the 12th at noon we will be holding the HA Cluster Stack session. In the session we will cover the following:

  • Discuss the adoption of new upstream releases of the HA Cluster Stack to include in Oneiric in preparation for the next Ubuntu LTS release.
  • Finish up work items from previous sessions (mainly documentation).
  • Gather feature requests and discuss the creation of meta-packages.
  • And, if time allows, I’d like to follow up on HA for OpenStack, as they had a session about it at their Design Summit.

If you are interested in the future of HA Clustering in Ubuntu, you are more than welcome to join this session. For more information, the blueprint can be found HERE.

UPDATED: Cluster Stack and PowerNap sessions at UDS-N

At UDS-N (Natty) I’ll be leading these two sessions:

  • Cluster Stack for Natty
    The Cluster Stack session will be divided into two main parts. In the first part we will discuss the current status of the Cluster Stack in Ubuntu, what has and hasn’t been achieved so far, and the features we would like to see in the future. The second part of the session will concentrate on the integration of the Cluster Stack with the Ubuntu Enterprise Cloud (UEC).

    The outcome of the discussion is:

    • Merge library split changes for cluster-glue, pacemaker from debian packages.
    • Complete MIR requests to finally get packages into Main.
    • Improve documentation, and add it to the Ubuntu Server Guide.
      • Docs: HA Apache2, HA MySQL, CLVM, Recommend a Cluster FS – OCFS2, Fencing, etc.
    • Automated Deployment (Look into deploying with puppet.).
      • Simple: Join a simple cluster/Virtual IP.
      • Advanced: CLVM, DRBD, Filesystems.
    • Meta-packages / Tasksel to install and join a Cluster.
    • HA for UEC.
      • Continue with the research on HA for CLC, Walrus, CC, SC
      • Eventually, write OCF RA’s for above components.
    • Investigate on providing HA *inside* the Cloud.

  • PowerNap Improvements
    PowerNap is a power management tool, created by Dustin Kirkland, that has been integrated with the Ubuntu Enterprise Cloud. In this session, however, we will discuss how to extend PowerNap’s functionality to make it available for other kinds of environments, as well as how to provide alternative power-saving methods for servers.

    The outcome of the discussion is:

    • Investigate how PowerNap could tap into Upstart to monitor processes in an event driven manner rather than polling /proc.
    • Use pm-powersave for PowerNap new power save mode.
    • Contribute any new actions to pm-utils (rather than keeping them in PowerNap)
    • Use event based monitoring for input polling (limited to keyboard and mouse)
    • Get network monitor matching the MAC in the WoL.
    • Provide a powerwaked to track machines registered and be able to schedule poweroff’s/updates.

If you would like to know more and you are not attending UDS in person, you can still participate remotely. Or you can just show up at the session. I hope to see anyone who’s interested there.

UPDATE: High Availability for the Ubuntu Enterprise Cloud (UEC) – Cloud Controller (CLC)

So I finally had the time to write the OCF Resource Agent for the Cloud Controller, as promised. It is an early Resource Agent and has currently been tested ONLY for CLC’s running on Ubuntu (UEC).

But first, what is an OCF Resource Agent? An OCF RA is an executable script used to manage a resource within a cluster. In this case, the RA is a script that manages the resource (the Cloud Controller) in a 2-node Pacemaker-based HA cluster. The RA starts, stops, and monitors the service (the Cloud Controller) when the Cluster Resource Manager (Pacemaker) tells it to. This means that upstart will NOT start the CLC.
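To make the idea more concrete, here is a minimal sketch of the shape such a script takes. This is NOT the eucaclc agent: the function names, the pid file location, and the background sleep standing in for a service are all made up for illustration. Only the exit codes come from the OCF resource agent API.

```shell
#!/bin/sh
# Minimal OCF-style resource agent sketch (NOT the eucaclc agent).
# It manages a hypothetical background "sleep" process through a pid file.

# Exit codes defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Hypothetical pid file location; a real agent would track the real service.
PIDFILE="${PIDFILE:-${TMPDIR:-/tmp}/demo-ra.$$.pid}"

demo_monitor() {
    # Running only if the pid file exists and the process is alive.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null \
        && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

demo_start() {
    demo_monitor && return $OCF_SUCCESS   # start must be idempotent
    sleep 300 >/dev/null 2>&1 &           # stand-in for the real service
    echo $! > "$PIDFILE"
    demo_monitor || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

demo_stop() {
    if demo_monitor; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
    fi
    rm -f "$PIDFILE"
    return $OCF_SUCCESS                   # stopping a stopped resource succeeds
}

# The cluster manager calls the script with the action as its first argument.
demo_dispatch() {
    case "$1" in
        start)   demo_start ;;
        stop)    demo_stop ;;
        monitor) demo_monitor ;;
        *)       return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

A real agent would end with something like demo_dispatch "$1" and would also answer the meta-data action, which is left out here to keep the sketch short.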

Now that we know what an OCF RA is, let’s test it. First, download the RA from HERE and move it into place:

wget -c http://people.ubuntu.com/~andreserl/eucaclc
sudo mkdir -p /usr/lib/ocf/resource.d/ubuntu
sudo mv eucaclc /usr/lib/ocf/resource.d/ubuntu/eucaclc
sudo chmod 755 /usr/lib/ocf/resource.d/ubuntu/eucaclc

Then, change the cluster configuration (sudo crm configure edit) for res_uec resource as follows:

primitive res_uec ocf:ubuntu:eucaclc op monitor interval="20s"

And the new RA should start the Cloud Controller automatically and keep monitoring it.

NOTE: Please note that this Resource Agent is an initial draft and might be buggy. If you find any bugs or things don’t work as expected, please don’t hesitate to contact me.

At UDS-M, I raised the concern about the lack of High Availability for the Ubuntu Enterprise Cloud (UEC). The effort to bring HA to UEC was defined as part of the Cluster Stack Blueprint; however, it was barely discussed due to lack of time, and the work on HA for the UEC has been deferred to Natty. In preparation for the next release cycle, though, I’ve been able to set up a two-node HA Cluster (Master/Slave) for the Cloud Controller (CLC).

NOTE: This tutorial is an early draft and might contain typos/errors that I have not noticed. It also might not work for you, which is why I recommend first having a UEC up and running with one CLC, and then adding the second CLC. If you need help or guidance, you know where to find me :). Also note that this is for testing purposes only! I’ll be moving this HowTo to an Ubuntu Wiki page soon, since the formatting here is somewhat annoying :).

1. Installation Considerations
I’ll show you how to configure two UEC (eucalyptus) Cloud Controllers in High Availability (Active/Passive), using the HA Clustering tools (Pacemaker, Heartbeat) and DRBD for replication between CLC’s. This is shown in the following image.

The setup I used is a 4-node setup (1 CLC, 1 Walrus, 1 CC/SC, 1 NC), as detailed in the UEC Advanced Installation Doc; however, I installed the packages from the Ubuntu Server Installer. As per the UEC Advanced Installation Doc, it is assumed that there is only one network interface (eth0) in the Cloud Controller, connected to a “public network” that links it to both the outside world and the other components in the Cloud. However, to be able to provide HA we need the following:

  • First, we need a Virtual IP (VIP) to allow both, the clients and the other Controllers to access either one of the CLC’s using that single IP. In this case, we are assuming that the “public network” is, and that the VIP is This VIP will also be used to generate the new certificates.
  • Second, we need to add a second network interface to the CLC’s to use it as a replication link between DRBD. This second interface is eth1 and will have address ranged in

2. Install Second Cloud Controller (CLC2)
Once you finish setting up the UEC and everything is working as expected, please install a second cloud controller.
Once installed, it is desirable to not start the services just yet. However, you will need to exchange the CLC ssh keys with both the CC and the Walrus as it is specified in SSH Key Authentication Setup, under STEP4 of the UEC Advanced Installation doc. Please note that this second CLC will also have two interfaces, eth0 and eth1. Leave eth1 unconfigured, but configure eth0 with an IP address in the same network as the other controllers.

3. Configure Second Network Interface
Once the two CLC’s are installed (CLC1 and CLC2), we need to configure eth1. This interface will be used as a direct link between CLC1 and CLC2 and will be used by DRBD as the replication link. In this example, we’ll be using On your /etc/network/interfaces.

On CLC1:

auto eth1
iface eth1 inet static

On CLC2:

auto eth1
iface eth1 inet static

NOTE: Do *NOT* add a gateway, because this is a direct link between CLC’s. If we add one, it will create a default route and the configuration of the resources will fail further along the way.
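A quick way to double-check this is to look for a default route on the replication interface. The helper below is only a sketch (the function name is mine); it reads routing table text in the format printed by ip route show and reports whether a default route leaves through the given interface.

```shell
#!/bin/sh
# Succeeds (exit 0) if the routing table read from stdin contains a default
# route that goes out through the interface named in $1.
default_route_on() {
    awk -v ifc="$1" '
        $1 == "default" {
            for (i = 2; i < NF; i++)
                if ($i == "dev" && $(i + 1) == ifc) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Typical use on a CLC (commented out so the sketch has no side effects):
#   if ip route show | default_route_on eth1; then
#       echo "eth1 carries a default route; remove the gateway line" >&2
#   fi
```

If the check succeeds for eth1, the gateway line most likely slipped into the eth1 stanza and should be removed before going on.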

4. Setting up DRBD

Once CLC2 is installed and configured, we need to set up DRBD for replication between CLC’s.

4.1. Create Partitions (CLC1/CLC2)
For this, we need either a new disk or a disk partition. In my case, I’ll be using /dev/vdb1. Please note that the partitions need to be exactly the same size on both nodes. You can create them whichever way you prefer.
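If you want to verify the sizes before going further, something along these lines should do. It is only a sketch: same_size is a made-up helper, and it assumes you collect the byte counts yourself by running blockdev --getsize64 on each node.

```shell
#!/bin/sh
# Compare two sizes in bytes, e.g. as printed by `blockdev --getsize64`.
same_size() {
    if [ "$1" -eq "$2" ]; then
        echo "sizes match: $1 bytes"
    else
        echo "size mismatch: $1 vs $2 bytes" >&2
        return 1
    fi
}

# Usage (run the first command on each node and note its output):
#   sudo blockdev --getsize64 /dev/vdb1
#   same_size <bytes-on-clc1> <bytes-on-clc2>
```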

4.2. Install DRBD and load module (CLC1/CLC2)
Now we need to install DRBD Utils.

sudo apt-get install drbd8-utils

Once it is installed, we need to load the kernel module and add it to /etc/modules. Please note that the DRBD kernel module is now included in the mainline kernel.

sudo modprobe drbd
echo drbd | sudo tee -a /etc/modules

4.3. Configuring the DRBD resource (CLC1/CLC2)
Add a new resource for DRBD by editing the following file:

sudo vim /etc/drbd.d/uec-clc.res

The configuration looks similar to the following:

resource uec-clc {
device /dev/drbd0;
disk /dev/vdb1;
meta-disk internal;
on clc1 {
on clc2 {
syncer {
rate 10M;
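For reference, this is roughly the overall shape a complete resource file takes, written out by a small helper so it can be generated in a scratch location first. Treat it as a sketch under assumptions: the helper name is made up, and the addresses (192.0.2.x) and port 7788 are placeholders of mine, not the values from my setup. Use the eth1 address of each CLC instead.

```shell
#!/bin/sh
# Writes an example DRBD resource file to the path given in $1.
# The addresses (192.0.2.x) and port 7788 are placeholders, not the
# values from the original setup; substitute the eth1 address of each CLC.
write_drbd_res() {
    cat > "$1" <<'EOF'
resource uec-clc {
  device    /dev/drbd0;
  disk      /dev/vdb1;
  meta-disk internal;
  on clc1 {
    address 192.0.2.1:7788;   # placeholder
  }
  on clc2 {
    address 192.0.2.2:7788;   # placeholder
  }
  syncer {
    rate 10M;
  }
}
EOF
}

# Example: write_drbd_res /tmp/uec-clc.res
```

Note that the names after "on" have to match the hostname of each node. Once the file is in /etc/drbd.d/, running sudo drbdadm dump uec-clc is a handy way to see whether DRBD can parse it.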

4.4. Creating the resource (CLC1/CLC2)
Now we need to do the following on CLC1 and CLC2:

sudo drbdadm create-md uec-clc
sudo drbdadm up uec-clc

4.5. Establishing initial communication (CLC1)
Now, we need to do the following:

sudo drbdadm -- --clear-bitmap new-current-uuid uec-clc
sudo drbdadm primary uec-clc
sudo mkfs -t ext4 /dev/drbd0
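Before moving any data, it is worth confirming that both nodes report their disks as UpToDate. The helper below is a small sketch (the function name is mine) that looks for the ds:UpToDate/UpToDate marker in /proc/drbd. It assumes a single DRBD resource, as in this setup.

```shell
#!/bin/sh
# Succeeds if the DRBD status text in file $1 (default: /proc/drbd) shows
# both the local and the peer disk as UpToDate.
# Assumes a single DRBD resource; with several, any one match succeeds.
drbd_in_sync() {
    grep -q 'ds:UpToDate/UpToDate' "${1:-/proc/drbd}"
}

# Usage on a node, polling every 5 seconds until the disks are in sync:
#   until drbd_in_sync; do sleep 5; done
```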

4.6. Copying the Cloud Controller Data for DRBD Replication (CLC1)
Once the DRBD nodes are in sync, we need to have the data replicated between CLC1 and CLC2 and make the necessary changes so that either of them can access the data when it becomes the active node. To do this, do the following on CLC1:

sudo mkdir /mnt/uecdata
sudo mount -t ext4 /dev/drbd0 /mnt/uecdata
sudo mv /var/lib/eucalyptus/ /mnt/uecdata
sudo mv /var/lib/image-store-proxy/ /mnt/uecdata
sudo ln -s /mnt/uecdata/eucalyptus/ /var/lib/eucalyptus
sudo ln -s /mnt/uecdata/image-store-proxy/ /var/lib/image-store-proxy
sudo umount /mnt/uecdata

What we did here is move the Cloud Controller data to the DRBD mount point so that it gets replicated to the second CLC, and then create symlinks from the original data folders to their new location on the mount point.
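If the move-and-symlink pattern is new to you, here it is on throwaway directories, so you can try it without touching /var/lib/eucalyptus. This is only an illustration: relocate_with_symlink is a made-up helper and the directory names are arbitrary.

```shell
#!/bin/sh
# Move directory $1 under directory $2 and leave a symlink at the old path.
relocate_with_symlink() {
    src=${1%/}                      # strip a trailing slash, if any
    dest_dir=${2%/}
    name=$(basename "$src")
    mv "$src" "$dest_dir/" || return 1
    ln -s "$dest_dir/$name" "$src"
}

# Demonstration on throwaway directories instead of the real data folders.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/var/lib/service" "$demo_root/mnt/data"
echo "state" > "$demo_root/var/lib/service/db"
relocate_with_symlink "$demo_root/var/lib/service" "$demo_root/mnt/data"
cat "$demo_root/var/lib/service/db"    # reads the file through the symlink
```

The service keeps using its usual path, while the data actually lives on the replicated device whenever that device is mounted.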

4.7. Preparing the second Cloud Controller (CLC2)
Once the data is prepared on CLC1, we can discard the data on CLC2 and create the same symlinks as we did on CLC1. We do this as follows:

sudo mkdir /mnt/uecdata
sudo rm -fr /var/lib/eucalyptus
sudo rm -fr /var/lib/image-store-proxy
sudo ln -s /mnt/uecdata/eucalyptus/ /var/lib/eucalyptus
sudo ln -s /mnt/uecdata/image-store-proxy/ /var/lib/image-store-proxy

After this, the data will be replicated via DRBD. Whenever CLC1.

5. Setup the Cluster

5.1. Install the Cluster Tools
First we need to install the clustering tools:

sudo apt-get install heartbeat pacemaker

5.2. Configure Heartbeat
Then we need to configure Heartbeat. First, create /etc/ha.d/ha.cf and add the following:

autojoin none
mcast eth0 649 1 0
warntime 5
deadtime 15
initdead 60
keepalive 2
node clc1
node clc2
crm respawn
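A note on the timing values: keepalive is the interval between heartbeats, warntime is how long before a late heartbeat gets logged, deadtime is how long before a node is declared dead, and initdead is the (longer) deadtime used right after startup. As a rough sanity check, the sketch below verifies that they are ordered sensibly. The helper names are mine, and it assumes plain integer seconds as in the file above.

```shell
#!/bin/sh
# Print the value of directive $2 from the ha.cf-style file $1.
ha_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Succeeds if keepalive < warntime < deadtime <= initdead.
# Assumes plain integer seconds, as in the configuration above.
check_ha_timings() {
    k=$(ha_value "$1" keepalive)
    w=$(ha_value "$1" warntime)
    d=$(ha_value "$1" deadtime)
    i=$(ha_value "$1" initdead)
    [ "$k" -lt "$w" ] && [ "$w" -lt "$d" ] && [ "$d" -le "$i" ]
}

# Usage: check_ha_timings /etc/ha.d/ha.cf && echo "timings look sane"
```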

Then create the authentication file (/etc/ha.d/authkeys), and add the following:

auth 1
1 md5 password

and change the permissions:

sudo chmod 600 /etc/ha.d/authkeys
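For anything beyond a quick test you will probably want a random secret rather than a literal "password". Heartbeat’s authkeys file pairs an "auth N" line with a matching "N method secret" line, and the sketch below generates such a file with a random key. The helper name is made up, and md5sum is used here only to turn random bytes into a hex string.

```shell
#!/bin/sh
# Write a heartbeat authkeys file with a random secret to the path in $1.
make_authkeys() {
    # 32 random bytes squeezed into a 32-character hex string.
    secret=$(head -c 32 /dev/urandom | md5sum | cut -d' ' -f1)
    (
        umask 077                   # new file is readable by its owner only
        printf 'auth 1\n1 md5 %s\n' "$secret" > "$1"
    )
}

# Usage: generate the file in a scratch location, then install the same
# copy as /etc/ha.d/authkeys on BOTH nodes:
#   make_authkeys /tmp/authkeys
```

Both CLC’s need exactly the same file, so generate it once and copy it over rather than running the helper on each node.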

5.3. Removing Startup of services at boot up
We need to let the Cluster manage the resources, instead of starting them at bootup.

sudo update-rc.d -f eucalyptus remove
sudo update-rc.d -f eucalyptus-cloud remove
sudo update-rc.d -f eucalyptus-network remove
sudo update-rc.d -f image-store-proxy remove

And we also need to change the “start on” to “stop on” in the upstart configuration scripts at /etc/init/* for:


5.4. Configuring the resources
Then, we need to configure the cluster resources. For this do the following:

sudo crm configure

and paste the following:

primitive res_fs_clc ocf:heartbeat:Filesystem params device=/dev/drbd/by-res/uec-clc directory=/mnt/uecdata fstype=ext4 options=noatime
primitive res_ip_clc ocf:heartbeat:IPaddr2 params ip= cidr_netmask=24 nic=eth0
primitive res_ip_clc_src ocf:heartbeat:IPsrcaddr params ipaddress=""
primitive res_uec upstart:eucalyptus  op start timeout=120s op stop timeout=120s op monitor interval=30s
primitive res_uec_image_store_proxy lsb:image-store-proxy
group rg_uec res_fs_clc res_ip_clc res_ip_clc_src res_uec res_uec_image_store_proxy
primitive res_drbd_uec-clc ocf:linbit:drbd params drbd_resource=uec-clc
ms ms_drbd_uec res_drbd_uec-clc meta notify=true
order o_drbd_before_uec inf: ms_drbd_uec:promote rg_uec:start
colocation c_uec_on_drbd inf: rg_uec ms_drbd_uec:Master
property stonith-enabled=False
property no-quorum-policy=ignore
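After committing the configuration, it is a good idea to validate it and look at the cluster state. The wrapper below is just a convenience sketch (check_cluster is a name I made up); it relies on crm_verify and crm_mon, which ship with Pacemaker, and bails out early if they are not installed.

```shell
#!/bin/sh
# Validate the live cluster configuration and print a one-shot status.
check_cluster() {
    if ! command -v crm_verify >/dev/null 2>&1; then
        echo "pacemaker command-line tools not found" >&2
        return 2
    fi
    crm_verify -L -V || return 1    # check the live CIB, verbosely
    crm_mon -1                      # one-shot view of nodes and resources
}

# Usage (as root on one of the CLC's): check_cluster
```

If everything went well, the status output should show the resource group running on one node and the DRBD master/slave set promoted on that same node.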

6. Specify the Cloud IP for the CC, NC, and in the CLC.
Once you finish the configuration above, one of the CLC’s will be the active one and the other will be the passive one. The Cluster Resource Manager decides which one becomes the primary; however, it is expected that CLC1 will become the primary.

Now, as specified in the UEC Advanced Installation Doc, we need to specify the Cloud Controller VIP in the CC. However it is also important to do it in the NC. This is done in /etc/eucalyptus/eucalyptus.conf by adding:


Then, log into the Web Front end (, and change the Cloud Configuration to have the VIP as the Cloud Host.

By doing this you will have the new certificates generated with the VIP, which will allow you to connect to the cloud even if the primary Cloud Controller has failed and the second one took control of the service.

Finally, restart the Walrus, CC/SC, and NC and enjoy.

7. Final Thoughts
The cluster resource manager is using the upstart script to manage the Cloud Controller. However, this is not optimal, and it is used for testing purposes. The creation of an OCF Resource Agent will be required to adequately start/stop and monitor eucalyptus. The OCF RA will be developed soon, and this will be discussed at Ubuntu Developer Summit – Natty.

My Thesis…

For all of you who wanted to take a look at my thesis (Design of a model to implement High Availability Web Servers, an overview here)… you can download it from here: http://roaksoax.files.wordpress.com/2009/03/thesis.pdf

Btw… it is in Spanish. :)

Cluster Synchronization Tool (CSync2)

As you may know, there are many tools for file synchronization between servers that can suit your needs, but Csync2 (Website and Paper) was specifically designed for Cluster File Synchronization, which makes it a great tool for synchronizing config files and folders.

Now, I’ll show you a simple way of configuring it, by having a master server (where we can make changes to the config files) and one or multiple slave servers, where the files will be synchronized. First of all, we have to install it along with other packages:

sudo apt-get install csync2 sqlite3 openssl xinetd

After having everything installed, we have to create the certificates that will allow Csync2 to authenticate between servers so that the files can be synchronized. To do that we do this:
