Nothing like a power outage gone wrong to test a new virtualization cluster. Last night we lost power in most of Corvallis, and the UPS and generator in our machine room functioned properly. However, an unfortunate sequence of issues caused some of our machines to go down, including all four of our Ganeti nodes, which host 62 virtual machines and went down hard. If this had happened with our old Xen cluster with iSCSI, it would have taken us over an hour to get the infrastructure back to a normal state by manually restarting each VM.

But when I checked the Ganeti cluster shortly after the outage, I noticed that all four nodes had rebooted without any issues and the master node was already restarting virtual machines automatically and fixing all of the DRBD block devices. Ganeti has a nice app called ganeti-watcher, which is run every five minutes via cron. It currently has two primary functions (taken from ganeti-watcher(8)):

  1. Keep running all instances as marked (i.e. if they were running, restart them)
  2. Repair DRBD links by reactivating the block devices of instances which have secondaries on nodes that have rebooted.

The watcher app took around 30 minutes to bring all 62 VMs back online. The load on most of the nodes didn’t go over 4 during the recovery, which is quite impressive considering how much I/O it’s doing while VMs are booting. Normally the nodes have loads between 0.3 and 0.5. Only 3 VMs didn’t boot cleanly, because of incorrect fstab entries or incorrect kernel path settings in Ganeti, and those were easy to fix. I was surprised we didn’t have more issues like that.

While Ganeti is bringing instances back online, you can tail watcher.log, which is generally at /var/log/ganeti/watcher.log and shows output similar to this:

2010-05-20 04:06:25,077:  pid=10202 INFO Restarting busybox.osuosl.org (Attempt #1)
2010-05-20 04:07:16,311:  pid=10202 INFO Restarting driverdev.osuosl.org (Attempt #1)
2010-05-20 04:07:18,346:  pid=10202 INFO Restarting pcc.osuosl.org (Attempt #1)

Once it’s finished, it shows output like this:

2010-05-20 04:35:04,066:  pid=22741 INFO Restart of busybox.osuosl.org succeeded
2010-05-20 04:35:04,066:  pid=22741 INFO Restart of driverdev.osuosl.org succeeded
2010-05-20 04:35:04,066:  pid=22741 INFO Restart of pcc.osuosl.org succeeded
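
If you want a rough per-instance recovery time out of that log, a few lines of Python will do it. This is only a sketch: it assumes the exact line format shown in the excerpts above, and the function and variable names are my own, not part of Ganeti. Keep in mind the "succeeded" timestamp is when the watcher logged the result, so the numbers are an upper bound rather than the real boot time.

```python
import re
from datetime import datetime

# Assumed line format, taken only from the excerpts above:
#   2010-05-20 04:06:25,077:  pid=10202 INFO Restarting busybox.osuosl.org (Attempt #1)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+:\s+pid=\d+\s+INFO\s+(?P<msg>.*)$"
)
START_RE = re.compile(r"^Restarting (?P<name>\S+) \(Attempt #\d+\)$")
DONE_RE = re.compile(r"^Restart of (?P<name>\S+) succeeded$")


def summarize(lines):
    """Return {instance: seconds from first restart attempt to logged success}."""
    started, durations = {}, {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a watcher INFO line
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        msg = m.group("msg")
        start = START_RE.match(msg)
        done = DONE_RE.match(msg)
        if start:
            # Remember only the first attempt for each instance.
            started.setdefault(start.group("name"), ts)
        elif done and done.group("name") in started:
            name = done.group("name")
            durations[name] = (ts - started[name]).total_seconds()
    return durations


SAMPLE = """\
2010-05-20 04:06:25,077:  pid=10202 INFO Restarting busybox.osuosl.org (Attempt #1)
2010-05-20 04:07:16,311:  pid=10202 INFO Restarting driverdev.osuosl.org (Attempt #1)
2010-05-20 04:07:18,346:  pid=10202 INFO Restarting pcc.osuosl.org (Attempt #1)
2010-05-20 04:35:04,066:  pid=22741 INFO Restart of busybox.osuosl.org succeeded
2010-05-20 04:35:04,066:  pid=22741 INFO Restart of driverdev.osuosl.org succeeded
2010-05-20 04:35:04,066:  pid=22741 INFO Restart of pcc.osuosl.org succeeded
"""

if __name__ == "__main__":
    for name, seconds in sorted(summarize(SAMPLE.splitlines()).items()):
        print(f"{name}: {seconds / 60:.1f} minutes")
```

To run it against a real log, pass `open("/var/log/ganeti/watcher.log")` to `summarize()` instead of the sample lines.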

It was great watching this system recover everything automatically, quickly, and with few issues. Needless to say, outages are a bad thing, and it’s our fault that our cluster went down like this, but it was great seeing this system work nearly flawlessly. We’ll soon fix the power situation for our cluster so this shouldn’t happen again.

Take that ESX ;-)

“Scalable,” “cheap,” and “easy to manage” rarely show up in the same sentence when you’re building a virtualization cluster. Generally it involves a complex set of tools tied together and expensive storage, and the result is difficult to scale. While I think that the suite of tools that use libvirt are great and are headed in the right direction, they’re still not quite the right tool for the job in some situations. There are also commercial solutions such as VMware and XenServer that are great, but both cost money (especially if you want cluster features). If you’re looking for a completely open source solution, then you may have found it.

Enter Ganeti, an open source virtualization management platform created by Google engineers. I had never heard of it until one of the students who works for me at the OSUOSL mentioned it while he was interning at Google. The design goal of Ganeti is to create a virtualization cluster that is stable, easy to use, and doesn’t require expensive hardware.

So what makes it so awesome?

  • A master node controls all instances (virtual machines)
  • Built-in support for DRBD backed storage on all instances
  • Automated instance (virtual machine) deployment
  • Simple management tools, all written in easy-to-read Python
  • Responsive and helpful developer community
  • Works with both Xen and KVM

DRBD

The key feature that got me interested was the built-in DRBD support, which enables us to have a “poor man’s” SAN using local server storage. DRBD is essentially like having RAID1 over the network between two servers. It duplicates data between two block devices and keeps them in sync. DRBD used to have to be built as an external kernel module, but it was recently added to the mainline kernel in 2.6.33. Ganeti has seamless DRBD integration and requires little knowledge of the specific details of setting it up.

Centralized Instance Management

Before Ganeti, we had to look up which node an instance was located on, and it was difficult to see the state of the cluster as a whole. During a crisis we would lose valuable time trying to locate a virtual machine, especially if it had been moved because of a hardware failure. Ganeti sets one node as the master, which controls the other nodes via remote ssh commands and a RESTful API. You can switch which node is the master with one simple command and also recover a master node if it goes offline. All Ganeti commands must be run on the master node.

Ganeti currently uses command line based interactions for all management tasks. However, it would not be difficult to create a web frontend to manage it. The OSUOSL actually has a working prototype of a Django based web frontend that we’ll eventually release once it’s out of alpha testing.

Automated Deployment

Ganeti uses a set of bash scripts to create an instance on the fly. Each set of scripts is considered an OS definition, and a debootstrap-based definition is included by default. Since we use several different distributions, I decided to write my own OS definition using file system dumps instead of direct OS install scripts. This reduced the deployment time considerably, to the point where we can deploy a new virtual machine in 30 seconds (not counting DRBD sync time). You can optionally use scripts to set up grub, serial, and networking during the deployment.

Developer Community

The developer community surrounding Ganeti is still quite small, but they are very helpful and responsive. I’ve sent in several feature and bug requests on their tracker and usually have a response within 24 hours and even a committed patch within 48 hours. The end users on the mailing lists are quite helpful and usually respond quickly as well. Nothing is more important to me in a project than the health and responsiveness of the community.

OSUOSL use of Ganeti

We recently migrated all of our virtual machines from Xen to Ganeti using KVM. We went from using 14 blade servers and 3 disk nodes to four 1U servers with faster processors, disks, and RAM. We instantly noticed a 2 to 3 times performance boost in I/O and CPU. Part of the boost came from the change in backend storage, and part from KVM.

We currently host around 60 virtual machines total (~15 per node) and can host up to 90 VMs with our current hardware configuration. Adding an additional node is a simple task and takes only minutes once all the software is installed. The new server doesn’t need to have the exact same specs; however, I would recommend at least using similar types and speeds of disks and CPUs.

Summary

Ganeti is still young but has matured very quickly over the last year or so. It may not be the best solution for everyone but it seems to fit quite well at the OSUOSL. I’ll be writing several posts that cover the basics of installing and using Ganeti. Additionally I’ll cover some of the specific steps we took to deploy our cluster.

I’m going to be speaking at several open source conferences this summer. Most of my talks will be centered around a virtualization management tool called Ganeti. I plan to write several blog posts in the coming weeks going over Ganeti in preparation for my talk at Open Source Bridge.

Here’s the list of conferences I will be speaking at:

When                 Conference                Where          Talk Title
June 26 - 29, 2012   Open Source Bridge 2012   Portland, OR   Comparing Open Source Private Cloud Platforms
June 26 - 29, 2012   Open Source Bridge 2012   Portland, OR   Put the "Ops" in "Dev": what developers need to know about DevOps
July 26 - 27, 2012   OSCON 2012                Portland, OR   Comparing Open Source Private Cloud Platforms

Let me know if you will be attending any of these conferences so we can hang out!

I recently ran into an issue where I wanted to move several KVM based virtual machines from one server to another. There are several ways you can accomplish this depending on what you want to do. In my case I was using LVM for the disk backend, so simply copying the disk image files wasn’t an option. It boiled down to two basic options.

  • Put system in single-user mode, rsync the contents over, and reinstall grub
  • Use dd and copy the whole LVM volume over piped through ssh

The advantage of the rsync method is that you’re only copying over the files you need, so less data is transferred. But then you need to re-run grub (which generally isn’t a problem). In addition, if you’re using LVM within the LVM volume for the VM and the volume group is named the same, you run into some interesting issues. The advantage of dd is that you get a literal copy of the disk image and can just start the VM back up without any other steps. Of course, this will only work if the volumes are the same size on both ends.

So I decided to go with dd, but ran into the problem of seeing the progress of a 15G volume copy. I did some digging around and found a blog post that mentioned using a command line application called ‘bar’, so I decided to give it a shot! It’s a fairly simple application that just creates a basic progress bar based on the data being piped into it. If you’re running Gentoo, the package is called app-admin/bar.

Here’s the command I ended up running:

$ dd if=/dev/lvm/cholula-disk | bar -s 15g | \
    ssh -c arcfour $host "dd of=/dev/lvm/cholula-disk"

When run, it gives you output similar to:

6.0GB at   17.9MB/s  eta:   0:08:32   40 [=========================                 ]

The downside is that you need to specify the block device size beforehand, but for something simple like this it’s quite nice. Of course I could just use one of the many dd forks out there which include progress bars, but this is quick, dirty, and simple!

I used the arcfour cipher mainly to reduce the CPU overhead and increase the throughput, but you should probably never use this cipher on an untrusted network as it does have weaknesses. I didn’t try doing throughput tests on other ciphers, but it would be interesting. It took me approximately 10-12 minutes to copy a 15G volume over a gigabit network which isn’t too bad.
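
For the curious, the numbers above are easy to sanity-check. Here’s a small sketch of the arithmetic, assuming “G” and “GB” mean GiB and “MB/s” means MiB/s (bar’s exact units are an assumption on my part); the helper names are made up for illustration.

```python
def throughput_mib_s(size_gib, minutes):
    """Average throughput in MiB/s for copying size_gib GiB in `minutes`."""
    return size_gib * 1024 / (minutes * 60)


def eta_seconds(total_gib, done_gib, rate_mib_s):
    """Seconds left to copy the remainder of the volume at the current rate."""
    return (total_gib - done_gib) * 1024 / rate_mib_s


if __name__ == "__main__":
    # 15G volume copied in roughly 10 to 12 minutes.
    print(f"10 min: {throughput_mib_s(15, 10):.1f} MiB/s")
    print(f"12 min: {throughput_mib_s(15, 12):.1f} MiB/s")

    # The sample bar line: 6.0GB done out of 15g at 17.9MB/s.
    remaining = eta_seconds(15, 6.0, 17.9)
    print(f"eta: {int(remaining // 60)}m {int(remaining % 60)}s")
```

That works out to somewhere in the low-to-mid 20s of MiB/s on average, and an ETA within a few seconds of what bar reported. A gigabit link can carry roughly 125 MB/s in theory, so the network itself probably wasn’t the bottleneck here.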

Another trick is to use the LVM snapshot feature: create a snapshot of the running volume and copy the snapshot instead. Any data that changes on the volume after the snapshot is taken won’t be copied over, obviously, but it will at least let you do a cold “live” migration of sorts.
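
I didn’t script the snapshot variant, but here is roughly how I’d sketch it. This small Python helper only builds the command strings (it doesn’t run anything), reusing the dd pipeline from earlier. The snapshot name, the 2G snapshot size, and the host name are placeholders I made up; pick a snapshot size large enough to hold whatever changes on the origin volume during the copy.

```python
import shlex


def snapshot_copy_commands(vg, lv, host, size="15g", snap_size="2G"):
    """Build (but do not run) shell commands to copy an LVM snapshot over ssh.

    Assumes simple volume names without spaces or shell metacharacters.
    """
    origin = f"/dev/{vg}/{lv}"
    snap_name = f"{lv}-snap"  # hypothetical snapshot name
    snap = f"/dev/{vg}/{snap_name}"

    # 1. Snapshot the running volume.
    create = ["lvcreate", "--snapshot", "--size", snap_size,
              "--name", snap_name, origin]
    # 2. Copy the frozen snapshot instead of the live volume.
    remote = f"dd of={origin}"
    copy = (
        f"dd if={shlex.quote(snap)} | bar -s {shlex.quote(size)} | "
        f"ssh -c arcfour {shlex.quote(host)} {shlex.quote(remote)}"
    )
    # 3. Drop the snapshot once the copy is done.
    remove = ["lvremove", "-f", snap]

    return [shlex.join(create), copy, shlex.join(remove)]


if __name__ == "__main__":
    # "newhost.example.org" is a placeholder destination.
    for cmd in snapshot_copy_commands("lvm", "cholula-disk", "newhost.example.org"):
        print(cmd)
```

Printing the commands first makes it easy to eyeball them before pasting them into a root shell on the source host.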

Beaver BarCamp is tomorrow in Corvallis and you should go to it! Why?

  • It’s completely free and open to EVERYONE
  • Free food and drinks!
  • You get a free T-Shirt (if you register)
  • Cool and interesting topics & ideas will be discussed!
  • Meet other cool people from the area

But what is a BarCamp exactly?

BarCamp is an ad-hoc gathering born from the desire for people to share and learn in an open environment. It is an intense event with discussions, demos and interaction from participants who are the main actors of the event. — barcamp.org

This is the third incarnation of Beaver BarCamp in Corvallis and is bound to be the largest thus far. Many people I have talked to have a hard time understanding what happens at a BarCamp, and whether it’s only for technical people. In the past, our BarCamp has been tech focused, but that’s primarily because the outreach for the event has been mostly directed at OSU EECS students. BarCamps are designed so that the people who attend are also the presenters. The more people who attend, the more variety you’ll have at the event. You don’t need to have a full-fledged presentation prepared, just an idea, a room (which we’ll provide), and people to talk to!

How do you give a session? Make sure you get to Kelly Engineering at 10AM, put your presentation and name on a sticky note, pick a room and time, and stick it on the wall. We’ll adjust rooms based on the popularity of each session to ensure there is enough room. We have essentially all of the meeting rooms and classrooms in Kelly available, so we shouldn’t have any problems finding a room! Breakfast, lunch, and dinner will be provided for free thanks to the sponsorship of the OSEL. We’ll also have a snack break in the afternoon!

Some fun non-technical examples you could see at a BarCamp:

  • How to kayak
  • How to brew beer
  • How to raise chickens in your backyard

Don’t feel like giving a session yourself? That’s OK! We won’t pressure you at all. You can come and attend just one session to see what it’s like; you don’t need to spend all day there.

I plan on giving sessions about the following tomorrow:

  • What is Calagator and why Corvallis needs it
  • What’s the next step for the Corvallis Social Tech scene?
  • Ignite Corvallis brainstorm / roundtable discussion

I hope to see you there tomorrow!