Lance Albertson Musings of a UNIX SysAdmin, jazz lover, and wine/beer snob

21 May 2010

Installing Ganeti on Gentoo

Installing Ganeti on Gentoo is a relatively simple process, and this post goes over the basics of getting it running. It's based primarily on a wiki page at the OSUOSL, so check that out for more detailed instructions. I also recommend reading the upstream Ganeti docs before installing it on your own. They cover a lot more topics in detail; this post is intended just as a diff from those docs.

I should note that I have only installed Ganeti with KVM and have not tested it with Xen on Gentoo. I'd appreciate feedback if you have installed and used Xen with Ganeti on Gentoo. I'm also the current package maintainer for Ganeti and its related packages in Gentoo.

The first step is to install a base Gentoo system using the standard profile. You can use a hardened profile, but if you intend to use ganeti-htools, it requires Haskell, which seems to have issues on hardened.

Configuring DNS

Ganeti requires the following names to resolve before you can set it up.

  • A master name for the cluster; the IP it resolves to must be available (ganeti.example.org)
  • A name for each node or Dom0 (node1.example.org)
  • A name for each instance or virtual machine (instance1.example.org)
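
Before going further it's worth confirming that each of these names actually resolves. Here's a minimal sketch, assuming getent is available (it ships with glibc). The helper names resolve_name and check_names are mine, not part of Ganeti.

```shell
# resolve_name: succeed if the name resolves through the system resolver
# (/etc/hosts, DNS, ...). Assumes getent from glibc is installed.
resolve_name() {
    getent hosts "$1" >/dev/null 2>&1
}

# check_names: report each name as ok or MISSING; fail if any is missing.
check_names() {
    missing=0
    for name in "$@"; do
        if resolve_name "$name"; then
            echo "ok: $name"
        else
            echo "MISSING: $name"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example, using the names from the list above:
# check_names ganeti.example.org node1.example.org instance1.example.org
```

Keeping the lookup in its own function makes it easy to swap in host or dig if you'd rather query DNS directly.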

Kernel

DRBD is optional in Ganeti, so you can skip this step if you're not planning on using it. DRBD was recently included in the mainline kernel as of 2.6.33, but Gentoo's DRBD packages do not currently reflect that. I hope to get that changed soon, but for now you have two options.

  1. Install gentoo-sources, drbd, and drbd-kernel
  2. Install gentoo-sources with DRBD enabled in the kernel, then install drbd without its dependencies

For simplicity, I'll describe option #2 here. Check out the wiki page for option #1.

DRBD requires the following kernel option to be enabled. Make sure you've rebooted into a kernel with this option before you continue.

Device Drivers --->
    <*> Connector - unified userspace <-> kernelspace linker
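
If you'd rather check the kernel config than dig through menuconfig, something like the following should work. It's a sketch under a couple of assumptions: the Connector option corresponds to CONFIG_CONNECTOR, the in-kernel DRBD driver to CONFIG_BLK_DEV_DRBD, and your config lives at /usr/src/linux/.config. kernel_option_enabled is a name I made up.

```shell
# kernel_option_enabled: succeed if OPTION is built in (=y) or modular (=m)
# in the given kernel config file.
kernel_option_enabled() {
    option=$1
    config=$2
    grep -Eq "^${option}=[ym]\$" "$config"
}

# Example (the config path is an assumption; adjust for your system):
# for opt in CONFIG_CONNECTOR CONFIG_BLK_DEV_DRBD; do
#     kernel_option_enabled "$opt" /usr/src/linux/.config || echo "$opt is not enabled"
# done
```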

We recommend that you keyword both sys-cluster/drbd and sys-cluster/drbd-kernel so that you pull in the latest 8.3.x version.

echo "sys-cluster/drbd" >> /etc/portage/package.keywords
echo "sys-cluster/drbd-kernel" >> /etc/portage/package.keywords
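
Since >> appends blindly, running those lines a second time leaves duplicate entries. A small helper that only appends when the exact line is missing might look like this (append_once is a hypothetical name, not a Portage tool):

```shell
# append_once: add LINE to FILE unless an identical line is already there.
append_once() {
    line=$1
    file=$2
    # -x matches whole lines, -F treats the line as a fixed string;
    # a missing file simply counts as "not found".
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example:
# append_once "sys-cluster/drbd" /etc/portage/package.keywords
# append_once "sys-cluster/drbd-kernel" /etc/portage/package.keywords
```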

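Install DRBD. Since option #2 gets the module from the kernel itself, use --nodeps so Portage skips dependencies such as drbd-kernel.

emerge --nodeps drbd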

Ganeti uses DRBD in a unique way and requires the module to be loaded with specific settings. Add the autoload settings and load the module.

echo "drbd minor_count=255 usermode_helper=/bin/true" >> /etc/modules.autoload.d/kernel-2.6
modprobe drbd

If you forget this step, you will get an error similar to the one mentioned in this email thread.
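
To double check that the module picked up those settings, you can read them back from sysfs. This is a sketch that assumes your DRBD version exposes minor_count and usermode_helper under /sys/module/drbd/parameters. The function names are mine, and the optional sysfs-root argument is only there so the check can be pointed at a fake tree.

```shell
# module_param: print a loaded module's parameter value as exposed in sysfs.
module_param() {
    module=$1
    param=$2
    sysfs=${3:-/sys}
    cat "$sysfs/module/$module/parameters/$param" 2>/dev/null
}

# drbd_params_ok: succeed only if drbd was loaded with the settings above.
drbd_params_ok() {
    sysfs=${1:-/sys}
    [ "$(module_param drbd minor_count "$sysfs")" = "255" ] &&
        [ "$(module_param drbd usermode_helper "$sysfs")" = "/bin/true" ]
}

# Example:
# drbd_params_ok || echo "drbd is not loaded with the settings Ganeti needs" >&2
```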

Install Ganeti

Set the appropriate USE flags. In this case we will be using kvm with drbd.

echo "app-emulation/ganeti kvm drbd" >> /etc/portage/package.use

Install Ganeti (you might need to keyword other dependencies).

emerge ganeti

Configure Networking

There are currently two methods for setting up networking: bridged or routed. I picked the bridged method mainly because I'm familiar with the setup and it seemed to be the simplest.

Ideally you should have a public network that will be used for communicating with the nodes and instances from the outside, and a backend private network that will be used by ganeti for DRBD, migrations, etc. Assuming your public IP (which node1.example.org should resolve to) is 10.1.0.11 and your backend IP is 192.168.1.11, you should edit /etc/conf.d/net to look something like this:

bridge_br0="eth0"
config_eth0=( "null" )

config_br0=( "10.1.0.11 netmask 255.255.254.0" )
routes_br0=( "default gw 10.1.0.1" )

# make sure eth0 is up before configuring br0
depend_br0() {
        need net.eth0
}

config_eth1=( "192.168.1.11 netmask 255.255.255.0" )

You can have a more complicated networking setup using VLAN tagging and bridging but I'll go over that in another blog post.
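
Once the interfaces are up, you can confirm that eth0 really was attached to br0 by looking in sysfs, where the kernel lists each bridge port under /sys/class/net/<bridge>/brif/. A minimal sketch: bridge_has_port is a hypothetical helper, and its optional third argument exists only so it can be tested against a fake tree.

```shell
# bridge_has_port: succeed if PORT is attached to BRIDGE according to sysfs.
bridge_has_port() {
    bridge=$1
    port=$2
    sysfs=${3:-/sys}
    [ -e "$sysfs/class/net/$bridge/brif/$port" ]
}

# Example, matching the config above:
# bridge_has_port br0 eth0 || echo "eth0 is not a port of br0" >&2
```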

Set the Hostname

Ganeti is picky about hostnames, and requires that the output of hostname be fully qualified. So make sure /etc/conf.d/hostname uses the FQDN and looks like this:

HOSTNAME="node1.example.org"

NOT like this:

HOSTNAME="node1"
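
A quick sanity check for this is easy to script. The sketch below only looks at the shape of the name (at least one dot, no empty labels); it doesn't verify that the name resolves. is_fqdn is a name I picked for illustration.

```shell
# is_fqdn: succeed if the argument looks like a fully qualified name,
# i.e. it contains a dot and has no leading, trailing, or doubled dots.
is_fqdn() {
    case $1 in
        .*|*.|*..*) return 1 ;;   # empty label somewhere
        *.*)        return 0 ;;   # at least two labels
        *)          return 1 ;;   # bare short name
    esac
}

# Example:
# is_fqdn "$(hostname)" || echo "hostname is not fully qualified" >&2
```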

Configure LVM

It is recommended that you edit this line in /etc/lvm/lvm.conf:

filter = [ "r|/dev/nbd.*|", "r|/dev/drbd[0-9]+|", "a/.*/" ]

The important part is the

r|/dev/drbd[0-9]+|

entry, which will prevent LVM from scanning drbd devices. It has to come before the catch-all a/.*/ entry because LVM stops at the first pattern that matches a device.
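
To see how pattern order affects a given device, here is a rough simulation of LVM's first-match rule. It's only an approximation for illustration: it uses grep -E for the regex matching and ignores details such as device aliases, and lvm_filter_decision is a made-up name rather than an LVM tool.

```shell
# lvm_filter_decision DEVICE PATTERN...
# Walk the patterns in order and print "accept" or "reject" for the first
# one whose regex matches DEVICE. Each pattern is an action letter (a or r)
# followed by a regex wrapped in a delimiter character, e.g. 'r|/dev/nbd.*|'.
lvm_filter_decision() {
    dev=$1
    shift
    for pat in "$@"; do
        action=${pat%"${pat#?}"}   # first character: a or r
        body=${pat#??}             # drop the action and opening delimiter
        body=${body%?}             # drop the closing delimiter
        if printf '%s\n' "$dev" | grep -Eq -- "$body"; then
            case $action in
                a) echo accept ;;
                r) echo reject ;;
                *) echo "bad pattern: $pat" >&2; return 2 ;;
            esac
            return 0
        fi
    done
    echo accept   # devices that match no pattern are accepted
}
```

With the reject entries listed ahead of the catch-all, lvm_filter_decision /dev/drbd0 'r|/dev/nbd.*|' 'r|/dev/drbd[0-9]+|' 'a/.*/' prints reject, while /dev/sda3 against the same patterns prints accept. A catch-all accept placed earlier in the list would win instead.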

Now, go ahead and create an LVM volume group with the disks you plan to use for instance storage. The default name that Ganeti prefers is xenvg but we recommend you choose something more useful for your infrastructure (we use ganeti).

pvcreate /dev/sda3
vgcreate ganeti /dev/sda3

Initialize the Cluster

Now we can initialize the cluster on the first node. The command below will do the following:

  • Set br0 as the primary interface for Ganeti communication
  • Set 192.168.1.11 as the DRBD ip for the node
  • Enable KVM
  • Set the default bridged interface for instances to br0
  • Set the default KVM settings to 2 vcpus & 512M RAM
  • Set the default kernel path to /boot/guest/vmlinuz-x86_64
  • Set the master DNS name to ganeti.example.org

gnt-cluster init --master-netdev=br0 \
  -g ganeti \
  -s 192.168.1.11 \
  --enabled-hypervisors=kvm \
  -N link=br0 \
  -B vcpus=2,memory=512M \
  -H kvm:kernel_path=/boot/guest/vmlinuz-x86_64 \
  ganeti.example.org

Now you have a ganeti cluster! Let's verify everything is set up correctly.

$ gnt-cluster verify
Sun May 16 22:43:00 2010 * Verifying global settings
Sun May 16 22:43:00 2010 * Gathering data (1 nodes)
Sun May 16 22:43:02 2010 * Verifying node status
Sun May 16 22:43:02 2010 * Verifying instance status
Sun May 16 22:43:02 2010 * Verifying orphan volumes
Sun May 16 22:43:02 2010 * Verifying remaining instances
Sun May 16 22:43:02 2010 * Verifying N+1 Memory redundancy
Sun May 16 22:43:02 2010 * Other Notes
Sun May 16 22:43:02 2010 * Hooks Results

Yay!

SSH Keys

Ganeti uses ssh to run some, but not all, of its tasks. During initialization it generates a new ssh key for the root user and installs it in /root/.ssh/authorized_keys. In our case we manage that file with cfengine, so to work around it we copy the key to /root/.ssh/authorized_keys2, which ssh picks up automatically.
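
If you need the same workaround, copying the key can be scripted so that it's safe to run repeatedly. This is a minimal sketch: install_pubkey is a hypothetical helper, and the public key path in the example is an assumption, so check which key file Ganeti actually generated on your node.

```shell
# install_pubkey: append the public key in PUBKEY to AUTHFILE unless that
# exact key line is already present, and keep AUTHFILE private.
install_pubkey() {
    pubkey=$1
    authfile=$2
    key=$(cat "$pubkey") || return 1
    touch "$authfile" || return 1
    chmod 600 "$authfile" || return 1
    grep -qxF -- "$key" "$authfile" || printf '%s\n' "$key" >> "$authfile"
}

# Example (the key file name is an assumption):
# install_pubkey /root/.ssh/id_dsa.pub /root/.ssh/authorized_keys2
```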

Adding another node

To add an additional node, repeat the setup steps above on it but skip initializing the cluster. Instead, run the following command:

gnt-node add -s <node drbd_ip> <node hostname>
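
If you have several nodes to add, it can help to keep the hostname and DRBD IP pairs in a small file and generate the commands from it. The sketch below only prints the commands so you can review them before running anything. node_add_commands and the example node names and IPs are hypothetical.

```shell
# node_add_commands: read "hostname drbd_ip" pairs from stdin and print one
# gnt-node add command per node. Blank lines and # comments are skipped.
node_add_commands() {
    while read -r host ip; do
        case $host in
            ''|'#'*) continue ;;
        esac
        printf 'gnt-node add -s %s %s\n' "$ip" "$host"
    done
}

# Example with made-up nodes:
# printf '%s\n' 'node2.example.org 192.168.1.12' 'node3.example.org 192.168.1.13' | node_add_commands
```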

Next steps...

The next step is actually deploying new virtual machines using Ganeti. I wrote a new instance creation script called ganeti-instance-image which uses disk images for deployment. I'm currently working on a new project website with detailed documentation and a blog post about it as well. We're able to deploy new virtual machines (such as Ubuntu, CentOS, or Gentoo) in under 30 seconds using this method!

20 May 2010

Power Outage: A true test for Ganeti

Nothing like a power outage gone wrong to test a new virtualization cluster. Last night we lost power in most of Corvallis, and our UPS & generator functioned properly in the machine room. However, an unfortunate sequence of issues caused some of our machines to go down hard, including all four of our ganeti nodes hosting 62 virtual machines. If this had happened with our old xen cluster with iSCSI, it would have taken us over an hour to get the infrastructure back to a normal state by manually restarting each VM.

But when I checked the ganeti cluster shortly after the outage, I noticed that all four nodes rebooted without any issues and the master node was already rebooting virtual machines automatically and fixing all of the DRBD block devices. Ganeti has a nice app called ganeti-watcher which is run every five minutes via cron. It has two primary functions currently (taken from ganeti-watcher(8)):

  1. Keep running all instances as marked (i.e. if they were running, restart them)
  2. Repair DRBD links by reactivating the block devices of instances which have secondaries on nodes that have rebooted.

The watcher app took around 30 minutes to bring all 62 VMs back online. The load on most of the nodes didn't go over 4 during the recovery, which is quite impressive considering how much I/O they're doing while VMs are booting. Normally the nodes have loads between 0.3 and 0.5. Only 3 VMs didn't boot cleanly, because of incorrect fstab entries or incorrect kernel path settings in ganeti, which were easy to fix. I was surprised we didn't have more issues like that.

While ganeti is bringing instances back online you can tail watcher.log which is generally at /var/log/ganeti/watcher.log and will show output similar to this:

2010-05-20 04:06:25,077:  pid=10202 INFO Restarting busybox.osuosl.org (Attempt #1)
2010-05-20 04:07:16,311:  pid=10202 INFO Restarting driverdev.osuosl.org (Attempt #1)
2010-05-20 04:07:18,346:  pid=10202 INFO Restarting pcc.osuosl.org (Attempt #1)

Once it's finished, it will show output like this:

2010-05-20 04:35:04,066:  pid=22741 INFO Restart of busybox.osuosl.org succeeded
2010-05-20 04:35:04,066:  pid=22741 INFO Restart of driverdev.osuosl.org succeeded
2010-05-20 04:35:04,066:  pid=22741 INFO Restart of pcc.osuosl.org succeeded
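
If you want a quick tally instead of scrolling through the log, counting the two kinds of lines gets you most of the way. This is a rough sketch that assumes the message wording shown in the excerpts above. watcher_summary is a name I made up.

```shell
# watcher_summary: count restart attempts and successful restarts in a
# ganeti-watcher log, based on the INFO messages shown above.
watcher_summary() {
    log=$1
    # grep -c prints 0 but exits non-zero when nothing matches, hence || true
    attempted=$(grep -c 'INFO Restarting ' "$log" || true)
    succeeded=$(grep -c 'INFO Restart of .* succeeded' "$log" || true)
    echo "restarts attempted: $attempted, succeeded: $succeeded"
}

# Example:
# watcher_summary /var/log/ganeti/watcher.log
```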

It was great watching this system recover everything automatically, quickly, and with few issues. Needless to say, outages are a bad thing and it's our fault that our cluster went down like this, but it was great seeing this system work nearly flawlessly. We'll soon fix the power situation for our cluster so this shouldn't happen again.

Take that ESX ;-)