Lance Albertson

Musings of a geek, jazz performer, and an OSUOSL sysadmin

Archive for the ‘ganeti’ Category

Rebalancing Ganeti Clusters

One of the best features of Ganeti is its ability to grow linearly by adding new servers easily. We recently purchased a new server to expand our ever-growing production cluster and needed to rebalance the cluster. Adding the node and expanding the cluster consisted of the following steps:

  1. Install the base OS on the new node
  2. Add the node to your configuration management of choice and/or install Ganeti
  3. Add the node to the cluster with gnt-node add
  4. Check Ganeti with gnt-cluster verify
  5. Use htools to rebalance the cluster

For simplicity’s sake, I’ll cover only the last three steps.

Adding the node

Assuming you’re using a secondary network, this is how you would add your node:

gnt-node add -s <secondary ip> newnode

Now let’s check and make sure Ganeti is happy:

gnt-cluster verify

If all is well, continue on; otherwise, try to resolve any issues that Ganeti is complaining about.
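
If you script this step, the check can act as a gate. Here is a minimal sketch, assuming gnt-cluster verify exits non-zero when it reports errors. The VERIFY_CMD variable and the function names are made up for this example and are not part of Ganeti.

```shell
#!/bin/sh
# VERIFY_CMD is an assumption of this sketch so the check can be swapped out
# for testing; by default it is the verification command shown above.
VERIFY_CMD=${VERIFY_CMD:-gnt-cluster verify}

# Return 0 only when the verification command exits successfully.
cluster_ok() {
    # Left unquoted on purpose so "gnt-cluster verify" splits into words.
    $VERIFY_CMD >/dev/null 2>&1
}

# Run the given command only if the cluster verifies cleanly.
run_if_verified() {
    if cluster_ok; then
        "$@"
    else
        echo "cluster verification failed; fix the reported issues first" >&2
        return 1
    fi
}
```

For example, run_if_verified hbal -m ganeti.example.org would only ask hbal for a plan once the cluster is clean.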

Using htools

Make sure you install ganeti-htools on all your nodes before continuing. It requires Haskell, so just be aware of that requirement. Let’s see what htools wants to do first:

hbal -m ganeti.example.org
Loaded 5 nodes, 73 instances
Group size 5 nodes, 73 instances
Selected node group: default
Initial check done: 0 bad nodes, 0 bad instances.
Initial score: 41.00076094
Trying to minimize the CV...
 1. openmrs.osuosl.org             g1.osuosl.bak:g2.osuosl.bak => g5.osuosl.bak:g1.osuosl.bak 38.85990831 a=r:g5.osuosl.bak f
 2. stagingvm.drupal.org           g3.osuosl.bak:g1.osuosl.bak => g5.osuosl.bak:g3.osuosl.bak 36.69303985 a=r:g5.osuosl.bak f
 3. scratchvm.drupal.org           g2.osuosl.bak:g4.osuosl.bak => g5.osuosl.bak:g2.osuosl.bak 34.61266967 a=r:g5.osuosl.bak f

<snip>

 28. crisiscommons1.osuosl.org      g3.osuosl.bak:g1.osuosl.bak => g3.osuosl.bak:g5.osuosl.bak 4.93089388 a=r:g5.osuosl.bak
 29. crisiscommons-web.osuosl.org   g2.osuosl.bak:g1.osuosl.bak => g1.osuosl.bak:g5.osuosl.bak 4.57788814 a=f r:g5.osuosl.bak
 30. aqsis2.osuosl.org              g1.osuosl.bak:g3.osuosl.bak => g1.osuosl.bak:g5.osuosl.bak 4.57312216 a=r:g5.osuosl.bak
Cluster score improved from 41.00076094 to 4.57312216
Solution length=30

I’ve shortened the actual output for the sake of this blog post. Htools automatically calculates which virtual machines to move, and how, using as few operations as it can. In most of these moves, the VMs may simply be migrated; migrated and have their secondary storage replaced; or migrated, have their secondary storage replaced, and then migrated again. In our environment we needed to move 30 of the 73 VMs hosted on the cluster.
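
If you save the hbal output to a file, a short awk filter can condense it into one line per move. This is only a sketch based on the output format shown above: it assumes every move line has "=>" as its fourth field and that each whitespace-separated token after the score is one operation. The function name hbal_moves is made up for this example.

```shell
# Read hbal output on stdin and print one line per planned move
# (instance, old primary:secondary, new primary:secondary, operation count),
# followed by the total number of moves.
hbal_moves() {
    awk '$4 == "=>" {
             # Fields 1-6 are index, instance, old pair, "=>", new pair, score;
             # anything after that is assumed to be one operation per token.
             printf "%s %s -> %s ops=%d\n", $2, $3, $5, NF - 6
             n++
         }
         END { printf "total moves: %d\n", n }'
}
```

Something like hbal_moves < hbal-output.txt then gives a quick overview before you look at the actual commands.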

Now let’s see what commands we would actually need to run:

hbal -C -m ganeti.example.org
Commands to run to reach the above solution:

 echo jobset 1, 1 jobs
 echo job 1/1
 gnt-instance replace-disks -n g5.osuosl.bak openmrs.osuosl.org
 gnt-instance migrate -f openmrs.osuosl.org

 echo jobset 2, 1 jobs
 echo job 2/1
 gnt-instance replace-disks -n g5.osuosl.bak stagingvm.drupal.org
 gnt-instance migrate -f stagingvm.drupal.org

 echo jobset 3, 1 jobs
 echo job 3/1
 gnt-instance replace-disks -n g5.osuosl.bak scratchvm.drupal.org
 gnt-instance migrate -f scratchvm.drupal.org

<snip>

 echo jobset 28, 1 jobs
 echo job 28/1
 gnt-instance replace-disks -n g5.osuosl.bak crisiscommons1.osuosl.org

 echo jobset 29, 1 jobs
 echo job 29/1
 gnt-instance migrate -f crisiscommons-web.osuosl.org
 gnt-instance replace-disks -n g5.osuosl.bak crisiscommons-web.osuosl.org

 echo jobset 30, 1 jobs
 echo job 30/1
 gnt-instance replace-disks -n g5.osuosl.bak aqsis2.osuosl.org

Here you can see the commands it wants you to execute. You can either put these all in a script and run them, split them up, or just run them one by one. In our case I ran them one by one just to be sure we didn’t run into any issues. I had a couple of VMs that did not migrate properly, but those were easily fixed. I split this up into a three-day migration, running ten jobs a day.
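
Splitting the list into daily batches can be scripted. The following is a rough sketch, assuming the command list has been saved to a file and that every jobset starts with an "echo jobset N," line as in the output above. The function name jobset_range is made up for this example.

```shell
# Print only the lines belonging to jobsets FIRST..LAST (inclusive)
# from a saved "hbal -C" command list read on stdin.
jobset_range() {
    awk -v first="$1" -v last="$2" '
        $1 == "echo" && $2 == "jobset" {
            cur = $3
            sub(/,$/, "", cur)   # "1," -> "1"
            cur += 0             # force a numeric comparison below
        }
        # Skip blank lines and anything before the first jobset header.
        cur >= first && cur <= last && NF > 0 { print }
    '
}
```

With ten jobs a day, jobset_range 1 10 < commands.txt would give the first day's batch, jobset_range 11 20 the second, and so on.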

The length of time that it takes to move each VM depends on the following factors:

  1. How fast your secondary network is
  2. How busy the nodes are
  3. How fast your disks are

Most of our VMs ranged from 10G to 40G in size, and each move took around 10-15 minutes on average. Additionally, make sure you read the man page for hbal to see all the various features and options you can tweak. For example, you could tell hbal to just run all the commands for you, which might be handy for automated rebalancing.
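
Those per-move times make it easy to budget a batch. This is simple arithmetic rather than anything Ganeti provides, and the function name batch_estimate is made up for this example.

```shell
# Print a rough duration range for a batch of moves.
# Arguments: number of moves, minimum minutes per move, maximum minutes per move.
batch_estimate() {
    echo "$(( $1 * $2 ))-$(( $1 * $3 )) minutes"
}

# Ten moves a day at 10-15 minutes each.
batch_estimate 10 10 15   # prints "100-150 minutes"
```

The estimate assumes the moves run one after another, as they did in our case.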

Conclusion

Overall, rebalancing our cluster went smoothly apart from a few minor issues. Ganeti made it really easy to expand our cluster with minimal to zero downtime for our hosted projects.

Written by lance

May 2nd, 2011 at 10:55 pm

Networking with Ganeti

I’ve been asked quite a bit about how I do our network setup with Ganeti. I admit that it did take me a bit to figure out a sane way to do it in Gentoo. Unfortunately (at least in baselayout-1.x) bringing up VLANs with bridge interfaces in Gentoo is rather a pain. What I’m about to describe is basically a hack and there’s probably a better way to do this. I hope it gets improved in baselayout-2.x but I haven’t had a chance to take a look. Please feel free to add comments on what you feel will work better.

The key problem I ran into was dealing with starting up the vlan interfaces first, then starting up the bridged interfaces in the correct order. Here’s a peek at the network config on one of our Ganeti hosts on Gentoo:

# bring up bridge interfaces manually after eth0 is up
postup() {
    local vlans="42 113"
    if [ "${IFACE}" = "eth0" ] ; then
        for vlan in $vlans ; do
            /etc/init.d/net.br${vlan} start
            if [ "${vlan}" = "113" ] ; then
                # make sure the bridges get going first
                sleep 10
            fi
        done
    fi
}
# bring down bridge interfaces first
predown() {
    local vlans="42 113"
    if [ "${IFACE}" = "eth0" ] ; then
        for vlan in $vlans ; do
            /etc/init.d/net.br${vlan} stop
        done
    fi
}

# Setup trunked VLANs
vlans_eth0="42 113"
config_eth0=( "null" )
vconfig_eth0=( "set_name_type VLAN_PLUS_VID_NO_PAD" )
config_vlan42=( "null" )
config_vlan113=( "null" )

# Bring up primary IP on eth0 via the bridged interface
bridge_br42="vlan42"
config_br42=( "10.18.0.150 netmask 255.255.254.0" )
routes_br42=( "default gw 10.18.0.1" )

# Setup bridged VLAN interfaces
bridge_br113="vlan113"
config_br113=( "null" )

# Backend drbd network
config_eth1=( "192.168.19.136 netmask 255.255.255.0" )

The latter portion of the config is fairly normal. I set eth0 to null, set the VLANs to null, then add settings to the bridge interfaces. In our case we have the IP for the node itself on br42. The rest of the VLANs are just set to null. Finally, we set up the backend secondary IP.

The first part of the config is the “fun stuff”. For this to work, you need to add only net.eth0 and net.eth1 to the default runlevel. The postup() function iterates through the list of VLANs/bridges and starts each bridge interface after eth0 has started. Since I’m using the bridge interface as the primary host connection, I added a simple sleep at the end to let it see the traffic first.
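
The runlevel side of this could be set up roughly as follows. This sketch only prints the commands so they can be reviewed first. It assumes the bridge init scripts are symlinks to net.lo, which is the usual Gentoo convention but is not something the config above shows. The function name net_setup_commands is made up for this example.

```shell
# Print (rather than run) the setup commands, so they can be reviewed first.
# The interface and VLAN lists mirror the config above.
net_setup_commands() {
    # Assumption: bridge init scripts are symlinks to net.lo, as is usual on Gentoo.
    for vlan in 42 113; do
        echo "ln -s net.lo /etc/init.d/net.br${vlan}"
    done
    # Only the physical interfaces go into the default runlevel;
    # postup() starts the bridges once eth0 is up.
    for iface in eth0 eth1; do
        echo "rc-update add net.${iface} default"
    done
}

net_setup_commands
```

Keeping the bridges out of the runlevel is the point here: if they were added as well, they could start before eth0 and its VLAN interfaces are ready.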

That’s it! A fun hack that seems to work. I would love to hear feedback on this :)

Written by lance

March 12th, 2011 at 6:07 pm

Speaking at SCALE 9x

I’m going to be speaking at SCALE 9x this year and giving a session on Scalable Virtualization with Ganeti on Saturday, February 26th at 6pm. I will be going over the basics of what Ganeti is and how you use it. This session will be very similar to the ones I gave last year at Open Source Bridge and LinuxCon Boston.

If you want to meet me in person and talk about what’s going on at the Open Source Lab, Supercell, Ganeti, Gentoo, or just other random stuff, feel free to! I’ll be the only person coming from the OSUOSL, but I’ll be sure to represent us the best that I can.

See you at SCALE9x in a few weeks!

Written by lance

February 6th, 2011 at 10:56 am

Handling HDD failures with Ganeti

Recently I had one of the nodes in a Ganeti cluster go down because of a faulty hard drive. Normally we would have RAID on machines in our Ganeti clusters, but this particular machine didn’t. Having a machine go offline like that would usually be a big deal, but with Ganeti and DRBD it usually isn’t.

After I triaged the situation and concluded that the HDD on node3 was a lost cause, I checked how Ganeti saw the situation. Below is what I found:

# gnt-cluster verify
* Verifying global settings
* Gathering data (3 nodes)
* Verifying node status
  - ERROR: node node1.osuosl.bak: ssh communication with node 'node3.osuosl.bak': ssh problem: exited with exit code 255 (no output)
  - ERROR: node node1.osuosl.bak: tcp communication with node 'node3.osuosl.bak': failure using the primary and secondary interface(s)
  - ERROR: node node2.osuosl.bak: ssh communication with node 'node3.osuosl.bak': ssh problem: exited with exit code 255 (no output)
  - ERROR: node node2.osuosl.bak: tcp communication with node 'node3.osuosl.bak': failure using the primary and secondary interface(s)
  - ERROR: node node3.osuosl.bak: while contacting node: Error 7: Failed connect to 10.1.0.179:1811; Success
* Verifying instance status
  - ERROR: node node3.osuosl.bak: instance vm1.osuosl.org, connection to secondary node failed
  - ERROR: node node3.osuosl.bak: instance vm2.osuosl.org, connection to secondary node failed
  - ERROR: node node3.osuosl.bak: instance vm3.osuosl.org, connection to secondary node failed
  - ERROR: instance vm4.osuosl.org: instance not running on its primary node node3.osuosl.bak
  - ERROR: node node3.osuosl.bak: instance vm4.osuosl.org, connection to primary node failed
  - ERROR: instance vm5.osuosl.org: instance not running on its primary node node3.osuosl.bak
  - ERROR: node node3.osuosl.bak: instance vm5.osuosl.org, connection to primary node failed
* Verifying orphan volumes
* Verifying orphan instances
* Verifying N+1 Memory redundancy
  - ERROR: node node3.osuosl.bak: not enough memory on to accommodate failovers should peer node node1.osuosl.bak fail
  - ERROR: node node3.osuosl.bak: not enough memory on to accommodate failovers should peer node node2.osuosl.bak fail
* Other Notes
 - WARNING: Communication failure to node node3.osuosl.bak: Error 7: Failed connect to 10.1.0.179:1811; Success
* Hooks Results
  - ERROR: node node3.osuosl.bak: Communication failure in hooks execution: Error 7: Failed connect to 10.1.0.179:1811; Success

That’s a lot of information to just say one of the nodes is offline. To summarize, this is what Ganeti is saying:

  • node1 & node2 can’t talk to node3
  • node3 isn’t responding to the master node
  • the secondary DRBD connections for vm1, vm2, and vm3 failed
  • vm4 & vm5 are not running
  • node3 doesn’t have enough memory to deal with failovers (probably because Ganeti can’t see its resources)
  • connections to node3 fail
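
When the output is this long, counting the errors per node is a quick way to see where the trouble is. This is a small sketch that works on saved gnt-cluster verify output. The function name errors_mentioning is made up for this example.

```shell
# Count the ERROR lines in "gnt-cluster verify" output (stdin) that mention HOST.
errors_mentioning() {
    # grep -c still prints 0 when nothing matches; "|| true" keeps the exit
    # status clean in that case.
    grep 'ERROR:' | grep -c -F -- "$1" || true
}
```

Running it once per node against the output above, node3.osuosl.bak shows up in far more ERROR lines than node1 or node2.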

Needless to say, node3 is down. Now let’s mark node3 offline and see what Ganeti shows.

# gnt-node modify -O yes node3
 - WARNING: Communication failure to node node3.osuosl.bak: Error 7: Failed connect to 10.1.0.179:1811; Success
# gnt-cluster verify
* Verifying node status
* Verifying instance status
  - ERROR: instance vm1.osuosl.org: instance lives on offline node(s) node3.osuosl.bak
  - ERROR: instance vm2.osuosl.org: instance lives on offline node(s) node3.osuosl.bak
  - ERROR: instance vm3.osuosl.org: instance lives on offline node(s) node3.osuosl.bak
  - ERROR: instance vm4.osuosl.org: instance lives on offline node(s) node3.osuosl.bak
  - ERROR: instance vm5.osuosl.org: instance lives on offline node(s) node3.osuosl.bak
* Verifying orphan volumes
* Verifying orphan instances
* Verifying N+1 Memory redundancy
  - ERROR: node node3.osuosl.bak: not enough memory on to accommodate failovers should peer node node1.osuosl.bak fail
  - ERROR: node node3.osuosl.bak: not enough memory on to accommodate failovers should peer node node2.osuosl.bak fail
* Other Notes
  - NOTICE: 1 offline node(s) found.
* Hooks Results

That’s much easier to read and handle. At this point I’m ready to fail over the instances that are offline.

# gnt-instance failover --ignore-consistency vm4
* checking disk consistency between source and target
* shutting down instance on source node
 - WARNING: Could not shutdown instance vm4.osuosl.org on node node3.osuosl.bak. Proceeding anyway. Please make sure node node3.osuosl.bak is down. Error details: Node is marked offline
* deactivating the instance's disks on source node
 - WARNING: Could not shutdown block device disk/0 on node node3.osuosl.bak: Node is marked offline
* activating the instance's disks on target node
 - WARNING: Could not prepare block device disk/0 on node node3.osuosl.bak (is_primary=False, pass=1): Node is marked offline
* starting the instance on the target node
# gnt-instance failover --ignore-consistency vm5
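
With more than a couple of affected instances, the failover commands could be generated from the first gnt-cluster verify output, the one taken before the node was marked offline. This sketch relies on the "not running on its primary node" wording shown there and only prints the commands for review. The function name failover_commands is made up for this example.

```shell
# From "gnt-cluster verify" output on stdin, find instances reported as
# "not running on its primary node" and print a failover command for each.
failover_commands() {
    sed -n 's/.*ERROR: instance \([^:]*\): instance not running on its primary node.*/gnt-instance failover --ignore-consistency \1/p'
}
```

For the output above this prints one command each for vm4.osuosl.org and vm5.osuosl.org.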

Now let’s fix the secondary storage for the other instances.

# gnt-instance replace-disks -n node2 vm1
 - INFO: Old secondary node3.osuosl.bak is offline, automatically enabling early-release mode
Replacing disk(s) 0 for vm1.osuosl.org
STEP 1/6 Check device existence
 - INFO: Checking disk/0 on node1.osuosl.bak
 - INFO: Checking volume groups
STEP 2/6 Check peer consistency
 - INFO: Checking disk/0 consistency on node node1.osuosl.bak
STEP 3/6 Allocate new storage
 - INFO: Adding new local storage on node2.osuosl.bak for disk/0
STEP 4/6 Changing drbd configuration
 - INFO: activating a new drbd on node2.osuosl.bak for disk/0
 - INFO: Shutting down drbd for disk/0 on old node
 - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
      Hint: Please cleanup this device manually as soon as possible
 - INFO: Detaching primary drbds from the network (=> standalone)
 - INFO: Updating instance configuration
 - INFO: Attaching primary drbds to new secondary (standalone => connected)
STEP 5/6 Removing old storage
 - INFO: Remove logical volumes for 0
 - WARNING: Can't remove old LV: Node is marked offline
      Hint: remove unused LVs manually
 - WARNING: Can't remove old LV: Node is marked offline
      Hint: remove unused LVs manually
STEP 6/6 Sync devices
 - INFO: Waiting for instance vm1.osuosl.org to sync disks.
 - INFO: - device disk/0:  0.00% done, no time estimate
 - INFO: - device disk/0: 25.00% done, 2h 23m 24s remaining (estimated)
 - INFO: - device disk/0: 50.40% done, 47m 38s remaining (estimated)
 - INFO: - device disk/0: 76.40% done, 26m 46s remaining (estimated)
 - INFO: - device disk/0: 92.20% done, 7m 49s remaining (estimated)
 - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
 - INFO: Instance vm1.osuosl.org's disks are in sync.

By using --submit you can send the job to the background instead of waiting on its output. You can then view the output in real time by running gnt-job watch <job id>. I went ahead and told Ganeti to replace the secondary disks on the other two machines at the same time. Be careful about running too many replace-disks operations at once, as you may run into disk I/O issues on the nodes.
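
Submitting the remaining replacements could look something like the sketch below. The GNT variable is an assumption of this example: leave it empty to run the real command, or set GNT=echo for a dry run. The function name submit_replace_disks is made up as well.

```shell
# GNT is an assumption of this sketch: set GNT=echo to print the commands
# instead of running them.
GNT=${GNT:-}

# Submit a background replace-disks job for each instance,
# moving its secondary storage to NEW_NODE.
# Arguments: NEW_NODE, then one or more instance names.
submit_replace_disks() {
    new_node=$1
    shift
    for vm in "$@"; do
        $GNT gnt-instance replace-disks --submit -n "$new_node" "$vm"
    done
}
```

For the two remaining instances that would be submit_replace_disks node2 vm2 vm3, followed by gnt-job watch on each job to follow the sync.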

There is another way I could have fixed this that would have required fewer steps: gnt-node evacuate. This command allows you to move all the secondary storage from a single node to another node quickly instead of doing it VM by VM. The command probably would have looked something like this:

# gnt-node evacuate --force -n node2 node3

Instead of specifying which node to migrate storage to, you can also use an IAllocator plugin to automatically pick which node to use. So the command above would have been:

# gnt-node evacuate --force -I hail node3

After a few minutes I had brought redundancy back into my cluster and the instances back online, with no data loss.

Ganeti rocks!

Written by lance

February 5th, 2011 at 12:40 pm

Ganeti Web Manager 0.5 Released

After nearly a month and a half (42 days) of development since 0.4 was released, the OSUOSL has released Ganeti Web Manager 0.5 today. This second release has some very nice new features included in it.

Read the full ChangeLog for more details.

noVNC Console

My favorite new feature by far is the inclusion of noVNC by default for VNC console access. This removes the Java requirement for your browsers and makes it much easier to use. It works best in Chrome/Chromium, but you can also use Firefox.

New Overview Page

I’m also excited about the new overview pages for users and admins. It makes it much easier to see the usage of your cluster(s) quickly. For users it will show some basic resource/quota usage.

Upgrading

If you’re upgrading from 0.4 be sure to read the upgrading wiki page and go over the installation page again. We’ve added a few new requirements such as South for database migrations and Twisted for the new VNC Auth Proxy.

Be sure to check out Peter’s blog post about the 0.5 release as well!

Written by lance

February 3rd, 2011 at 11:54 am