Lance Albertson Musings of a UNIX SysAdmin, jazz lover, and wine/beer snob

21 May 2010

Installing Ganeti on Gentoo

Installing Ganeti on Gentoo is a relatively simple process, and this post goes over the basics of getting it running. It's based primarily on a wiki page at the OSUOSL, so check that out for more detailed instructions. I also recommend that you read the upstream Ganeti docs before installing it on your own. They cover many more topics in detail; this post is intended only as a diff from those docs.

I should note that I have only installed Ganeti with KVM and have not tested it with Xen on Gentoo. I'd appreciate feedback if you have installed and used Xen with Ganeti on Gentoo. I'm also the current package maintainer for Ganeti and its related packages in Gentoo.

The first step is to install a base Gentoo system using the standard profile. You can use a hardened profile; however, if you intend to use ganeti-htools, note that it requires Haskell, which seems to have issues on hardened.

Configuring DNS

Ganeti requires the following names to resolve before you can set it up.

  • A master name for the cluster (ganeti.example.org); its IP must be available
  • A name for each node or Dom0 (node1.example.org)
  • A name for each instance or virtual machine (instance1.example.org)
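
As a quick pre-flight check, something like the following can report which names do not resolve yet. This is only a sketch: the function name unresolved_names is made up for illustration, and it assumes getent is available (it consults /etc/hosts and DNS according to nsswitch.conf).

```shell
# Print every name given as an argument that does not currently resolve.
# unresolved_names is a hypothetical helper, not part of Ganeti.
unresolved_names() {
    for name in "$@"; do
        # getent exits non-zero when the lookup fails
        getent hosts "$name" >/dev/null 2>&1 || echo "$name"
    done
}

# Usage, with the example names from the list above:
#   unresolved_names ganeti.example.org node1.example.org instance1.example.org
```

No output means every name resolved; any name that is printed still needs a DNS or /etc/hosts entry.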

Kernel

DRBD is optional in Ganeti, so you can skip this step if you're not planning on using it. DRBD was recently merged into the mainline kernel in 2.6.33; however, Gentoo's DRBD packages do not currently reflect that. I hope to get that changed soon, but for now you have two options.

  1. Install gentoo-sources, drbd, and drbd-kernel
  2. Install gentoo-sources with DRBD enabled in the kernel, then install drbd without deps

For simplicity, I'll describe option #2 here. Check out the wiki page for #1.

DRBD requires the following kernel option to be enabled. Make sure you've rebooted into a kernel with this option set before you continue.

Device Drivers --->
    <*> Connector - unified userspace <-> kernelspace linker

We recommend that you keyword both sys-cluster/drbd and sys-cluster/drbd-kernel so that you pull in the latest 8.3.x version.

echo "sys-cluster/drbd" >> /etc/portage/package.keywords
echo "sys-cluster/drbd-kernel" >> /etc/portage/package.keywords

Install DRBD without its dependencies, as option #2 calls for.

emerge --nodeps drbd

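Ganeti uses DRBD in a unique way and requires the module to be loaded with specific settings. Add the autoload settings so they apply at boot, then load the module with the same settings now (modprobe does not read the autoload file).

echo "drbd minor_count=255 usermode_helper=/bin/true" >> /etc/modules.autoload.d/kernel-2.6
modprobe drbd minor_count=255 usermode_helper=/bin/true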

If you forget this step, you will get an error similar to the one mentioned in this email thread.
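
To confirm that the module really was loaded with those settings, the values can be read back from sysfs. The sketch below assumes the drbd module exposes its parameters under /sys/module/drbd/parameters; check_drbd_params is a hypothetical helper name, and the directory can be overridden for testing.

```shell
# Compare the loaded drbd module's parameters with the values used in this post.
# $1: parameters directory (assumed default: the sysfs path for the drbd module).
check_drbd_params() {
    dir=${1:-/sys/module/drbd/parameters}
    status=0
    for pair in minor_count=255 usermode_helper=/bin/true; do
        name=${pair%%=*}    # text before the first "="
        want=${pair#*=}     # text after the first "="
        have=$(cat "$dir/$name" 2>/dev/null)
        if [ "$have" != "$want" ]; then
            echo "$name: expected $want, found ${have:-nothing}"
            status=1
        fi
    done
    return $status
}

# Usage: check_drbd_params || echo "reload drbd with the right settings" >&2
```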

Install Ganeti

Set the appropriate USE flags. In this case we will be using kvm with drbd.

echo "app-emulation/ganeti kvm drbd" >> /etc/portage/package.use

Install Ganeti (you might need to keyword other dependencies)

emerge ganeti

Configure Networking

There are currently two methods for setting up networking: bridged or routed. I picked the bridged method mainly because I'm familiar with the setup and it seemed to be the simplest.

Ideally you should have a public network that will be used for communicating with the nodes and instances from the outside, and a backend private network that will be used by ganeti for DRBD, migrations, etc. Assuming your public IP (which node1.example.org should resolve to) is 10.1.0.11 and your backend IP is 192.168.1.11, you should edit /etc/conf.d/net to look something like this:

bridge_br0="eth0"
config_eth0=( "null" )

config_br0=( "10.1.0.11 netmask 255.255.254.0" )
routes_br0=( "default gw 10.1.0.1" )

# make sure eth0 is up before configuring br0
depend_br0() {
        need net.eth0
}

config_eth1=( "192.168.1.11 netmask 255.255.255.0" )

You can have a more complicated networking setup using VLAN tagging and bridging, but I'll go over that in another blog post.

Set the Hostname

Ganeti is picky about hostnames, and requires that the output of hostname be fully qualified. So make sure /etc/conf.d/hostname uses the FQDN and looks like this:

HOSTNAME="node1.example.org"

NOT like this:

HOSTNAME="node1"
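
A small check along these lines can catch a short hostname before Ganeti complains about it. It is only a sketch: is_fqdn is a made-up helper that tests the shape of the name (at least one dot, no empty labels) and does not verify that the name resolves.

```shell
# Succeed only if the given name looks fully qualified: it contains a dot
# and has no empty labels. is_fqdn is a hypothetical helper name.
is_fqdn() {
    case $1 in
        *.*) ;;            # at least one dot is required
        *) return 1 ;;
    esac
    case $1 in
        .*|*.|*..*) return 1 ;;   # leading dot, trailing dot, or empty label
    esac
    return 0
}

# Usage: is_fqdn "$(hostname)" || echo "hostname is not fully qualified" >&2
```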

Configure LVM

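It is recommended that you edit the filter line in /etc/lvm/lvm.conf so that it looks like this. LVM uses the first pattern that matches a device, so the reject entries must come before the catch-all accept.

filter = [ "r|/dev/nbd.*|", "r|/dev/drbd[0-9]+|", "a/.*/" ]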

The important part is the

r|/dev/drbd[0-9]+|

entry, which will prevent LVM from scanning drbd devices.
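
LVM evaluates filter patterns in order and the first match decides, so a reject only takes effect if it appears before any catch-all accept. The sketch below emulates that rule in shell to make the ordering visible. The filter_decision helper and its "a:REGEX" / "r:REGEX" rule syntax are invented for this illustration and are not lvm.conf syntax.

```shell
# Emulate LVM's first-match-wins filter evaluation for a single device path.
# $1: device path; remaining arguments: rules written as a:REGEX or r:REGEX.
# Prints "a" (accept) or "r" (reject).
filter_decision() {
    dev=$1
    shift
    for rule in "$@"; do
        action=${rule%%:*}   # "a" or "r"
        regex=${rule#*:}     # unanchored extended regex
        if printf '%s\n' "$dev" | grep -Eq -- "$regex"; then
            echo "$action"
            return 0
        fi
    done
    echo a   # devices that match no pattern are accepted
}

# Reject listed first: the drbd device is rejected.
#   filter_decision /dev/drbd0 'r:/dev/nbd.*' 'r:/dev/drbd[0-9]+' 'a:.*'
# Catch-all accept listed first: the same device is accepted.
#   filter_decision /dev/drbd0 'r:/dev/nbd.*' 'a:.*' 'r:/dev/drbd[0-9]+'
```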

Now, go ahead and create an LVM volume group with the disks you plan to use for instance storage. The default name that Ganeti prefers is xenvg but we recommend you choose something more useful for your infrastructure (we use ganeti).

pvcreate /dev/sda3
vgcreate ganeti /dev/sda3

Initialize the Cluster

Now we can initialize the cluster on the first node. The command below will do the following:

  • Set br0 as the primary interface for Ganeti communication
  • Set 192.168.1.11 as the DRBD ip for the node
  • Enable KVM
  • Set the default bridged interface for instances to br0
  • Set the default KVM settings to 2 vcpus & 512M RAM
  • Set the default kernel path to /boot/guest/vmlinuz-x86_64
  • Use ganeti as the LVM volume group
  • Set the master DNS name to ganeti.example.org

gnt-cluster init --master-netdev=br0 \
  -g ganeti \
  -s 192.168.1.11 \
  --enabled-hypervisors=kvm \
  -N link=br0 \
  -B vcpus=2,memory=512M \
  -H kvm:kernel_path=/boot/guest/vmlinuz-x86_64 \
  ganeti.example.org

Now you have a ganeti cluster! Let's verify everything is set up correctly.

$ gnt-cluster verify
Sun May 16 22:43:00 2010 * Verifying global settings
Sun May 16 22:43:00 2010 * Gathering data (1 nodes)
Sun May 16 22:43:02 2010 * Verifying node status
Sun May 16 22:43:02 2010 * Verifying instance status
Sun May 16 22:43:02 2010 * Verifying orphan volumes
Sun May 16 22:43:02 2010 * Verifying remaining instances
Sun May 16 22:43:02 2010 * Verifying N+1 Memory redundancy
Sun May 16 22:43:02 2010 * Other Notes
Sun May 16 22:43:02 2010 * Hooks Results

Yay!

SSH Keys

Ganeti uses ssh to run some, but not all, of its tasks. During initialization, it generates a new ssh key for the root user and installs it in /root/.ssh/authorized_keys. In our case, we manage that file with cfengine, so to work around it we copy the key to /root/.ssh/authorized_keys2, which ssh will automatically pick up.
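
One way to script that copy so it can be re-run safely is sketched below. install_pubkey is a hypothetical helper, and the public key path in the usage line is an assumption; check which key Ganeti actually generated on your node.

```shell
# Append a public key to an authorized-keys file unless it is already present.
# $1: public key file, $2: authorized-keys file to update.
install_pubkey() {
    pubkey=$1
    target=$2
    # Make sure the target exists with restrictive permissions.
    touch "$target" && chmod 600 "$target" || return 1
    # -F fixed strings, -x whole-line match, -q quiet, -f patterns from file
    if ! grep -qxFf "$pubkey" "$target"; then
        cat "$pubkey" >> "$target"
    fi
}

# Usage (the id_dsa.pub path is an assumption, adjust to your setup):
#   install_pubkey /root/.ssh/id_dsa.pub /root/.ssh/authorized_keys2
```

Running it twice leaves a single copy of the key in the target file.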

Adding another node

To add an additional node, repeat the setup steps above but skip initializing the cluster. Instead, run the following command:

gnt-node add -s <node drbd_ip> <node hostname>
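
When several nodes are being added, it can help to generate the commands from a list and review them first. The sketch below only prints the commands and does not run them; print_node_add_commands and the "hostname drbd_ip" input format are made up for this example.

```shell
# Print (not run) one "gnt-node add" command per "hostname drbd_ip" line on stdin.
print_node_add_commands() {
    while read -r host drbd_ip; do
        # skip blank lines and comment lines
        case $host in
            ''|'#'*) continue ;;
        esac
        echo "gnt-node add -s $drbd_ip $host"
    done
}

# Usage (node2 and its IP are hypothetical):
#   printf 'node2.example.org 192.168.1.12\n' | print_node_add_commands
```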

Next steps...

The next step is actually deploying new virtual machines using Ganeti. I wrote a new instance creation script called ganeti-instance-image which uses disk images for deployment. I'm currently working on a new project website with detailed documentation and a blog post about it as well. We're able to deploy new virtual machines (such as Ubuntu, CentOS, or Gentoo) in under 30 seconds using this method!

20 May 2010

Power Outage: A true test for Ganeti

Nothing like a power outage gone wrong to test a new virtualization cluster. Last night we lost power in most of Corvallis, and our UPS and generator functioned properly in the machine room. However, we had an unfortunate sequence of issues that caused some of our machines to go down, including all four of our ganeti nodes hosting 62 virtual machines, which went down hard. If this had happened with our old Xen cluster with iSCSI, it would have taken us over an hour to get the infrastructure back to a normal state by manually restarting each VM.

But when I checked the ganeti cluster shortly after the outage, I noticed that all four nodes rebooted without any issues and the master node was already rebooting virtual machines automatically and fixing all of the DRBD block devices. Ganeti has a nice app called ganeti-watcher which is run every five minutes via cron. It has two primary functions currently (taken from ganeti-watcher(8)):

  1. Keep running all instances as marked (i.e. if they were running, restart them)
  2. Repair DRBD links by reactivating the block devices of instances which have secondaries on nodes that have rebooted.

The watcher app took around 30 minutes to bring all 62 VMs back online. The load on most of the nodes didn't go over 4 during the recovery, which is quite impressive considering how much I/O it's doing while VMs are booting. Normally the nodes have loads between 0.3 and 0.5. Only 3 VMs didn't boot cleanly, because of incorrect fstab entries or incorrect kernel path settings in ganeti, and those were easy to fix. I was surprised we didn't have more issues like that.

While ganeti is bringing instances back online, you can tail watcher.log, which is generally at /var/log/ganeti/watcher.log and will show output similar to this:

2010-05-20 04:06:25,077:  pid=10202 INFO Restarting busybox.osuosl.org (Attempt #1)
2010-05-20 04:07:16,311:  pid=10202 INFO Restarting driverdev.osuosl.org (Attempt #1)
2010-05-20 04:07:18,346:  pid=10202 INFO Restarting pcc.osuosl.org (Attempt #1)

And once it's finished, it will show output like this:

2010-05-20 04:35:04,066:  pid=22741 INFO Restart of busybox.osuosl.org succeeded
2010-05-20 04:35:04,066:  pid=22741 INFO Restart of driverdev.osuosl.org succeeded
2010-05-20 04:35:04,066:  pid=22741 INFO Restart of pcc.osuosl.org succeeded
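
To see at a glance which instances are still waiting on a restart, the two kinds of log lines above can be paired up. This is a rough sketch that assumes the log format shown in these excerpts; pending_restarts is a hypothetical helper name.

```shell
# List instances that have a "Restarting" line but no later
# "Restart of ... succeeded" line in the given watcher log.
pending_restarts() {
    awk '
        $4 == "INFO" && $5 == "Restarting"                   { pending[$6] = 1 }
        $4 == "INFO" && $5 == "Restart" && $8 == "succeeded" { delete pending[$7] }
        END { for (name in pending) print name }
    ' "$1" | sort
}

# Usage: pending_restarts /var/log/ganeti/watcher.log
```

An empty result means every restart the watcher attempted has been logged as succeeded.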

It was great watching this system recover everything automatically, quickly, and with few issues. Needless to say, outages are a bad thing and it's our fault that our cluster went down like this, but it was great seeing this system work nearly flawlessly. We'll soon fix the power situation for our cluster so this shouldn't happen again.

Take that ESX ;-)

15 May 2010

Creating a scalable virtualization cluster with Ganeti

Scalable, cheap, and easy to manage are words that usually don't describe a virtualization cluster in the same sentence. Generally, such a cluster involves a complex set of tools tied together and expensive storage, and it is difficult to scale. While I think that the suite of tools that use libvirt is great and headed in the right direction, they're still not quite the right tool for the job in some situations. There are also commercial solutions such as VMware and XenServer that are great, but both cost money (especially if you want cluster features). If you're looking for a completely open source solution, then you may have found it.

Enter Ganeti, an open source virtualization management platform created by Google engineers. I had never heard of it until one of the students who works for me at the OSUOSL mentioned it while he was an intern at Google. The design goal of Ganeti is to create a virtualization cluster that is stable, easy to use, and doesn't require expensive hardware.

So what makes it so awesome?

  • A master node controls all instances (virtual machines)
  • Built-in support for DRBD backed storage on all instances
  • Automated instance (virtual machine) deployment
  • Simple management tools all written in easy-to-read Python
  • Responsive and helpful developer community
  • Works with both Xen and KVM

DRBD

The key feature that got me interested was the built-in DRBD support, which enables us to have a "poor man's" SAN using local server storage. DRBD is essentially like having RAID1 over the network between two servers. It duplicates data between two block devices and keeps them in sync. Until recently, DRBD had to be built as an external kernel module, but it was added to the mainline kernel in 2.6.33. Ganeti has seamless DRBD integration and requires you to have little knowledge of the specific details of setting it up.

Centralized Instance Management

Before Ganeti, we had to look up which node an instance was located on, and it was difficult to see the state of the cluster as a whole. During a crisis we would lose valuable time trying to locate a virtual machine, especially if it had been moved because of a hardware failure. Ganeti sets one node as a master and controls the other nodes via remote ssh commands and a RESTful API. You can switch which node is the master with one simple command and also recover a master node if it went offline. All ganeti commands must run on the master node.

Ganeti currently uses command line based interactions for all management tasks. However, it would not be difficult to create a web frontend to manage it. The OSUOSL actually has a working prototype of a Django based web frontend that we'll eventually release once it's out of alpha testing.

Automated Deployment

Ganeti uses a set of bash scripts to create an instance on the fly. Each set of scripts is considered an OS definition, and a debootstrap definition is included by default. Since we use several different distributions, I decided to write my own OS definition using file system dumps instead of direct OS install scripts. This reduced the deployment time considerably, to the point where we can deploy a new virtual machine in 30 seconds (not counting DRBD sync time). You can optionally use scripts to set up grub, serial, and networking during the deployment.

Developer Community

The developer community surrounding Ganeti is still quite small, but they are very helpful and responsive. I've sent in several feature and bug requests on their tracker and usually have a response within 24 hours and even a committed patch within 48 hours. The end users on the mailing lists are quite helpful and usually respond quickly as well. Nothing is more important to me in a project than the health and responsiveness of the community.

OSUOSL use of Ganeti

We recently migrated all of our virtual machines from Xen to Ganeti using KVM. We went from using 14 blade servers and 3 disk nodes to 4 1U servers with faster processors, disks, and RAM. We instantly noticed a 2 to 3 times performance boost in I/O and CPU. Part of the boost came from the change in the backend storage, and another part from KVM.

We currently host around 60 virtual machines total (~15 per node) and can host up to 90 VMs with our current hardware configuration. Adding an additional node is a simple task and takes only minutes once all the software is installed. The new server doesn't need to have the exact same specs; however, I would recommend using at least similar types and speeds of disks and CPUs.

Summary

Ganeti is still young but has matured very quickly over the last year or so. It may not be the best solution for everyone but it seems to fit quite well at the OSUOSL. I'll be writing several posts that cover the basics of installing and using Ganeti. Additionally I'll cover some of the specific steps we took to deploy our cluster.

10 May 2010

Upcoming Talks

I'm going to be speaking at several open source conferences this summer. Most of my talks will be centered around a virtualization management tool called Ganeti. I plan to write several blog posts in the coming weeks going over Ganeti in preparation for my talk at Open Source Bridge.

Here's the list of conferences I will be speaking at:

When | Conference | Where | Talk Title | Slides
June 1-4, 2010 | Open Source Bridge | Portland, OR | Creating a low-cost clustered virtualization environment using Ganeti | ganeti-osb10
July 19-23, 2010 | OSCON | Portland, OR | Scaling your Open-Source Project Infrastructure on a Shoestring (panel) |
August 10-12, 2010 | LinuxCon | Boston, MA | Creating a low-cost clustered virtualization environment using Ganeti |

Let me know if you will be attending any of these conferences so we can hang out!