Quite a few of the projects hosted at the OSU Open Source Lab are using Xen virtual machines. If you are associated with one of those projects you may be interested to know what exactly our current setup is and what my future plans for it are. If you are not currently hosted by us maybe you will be some day. :-)
Since last fall I have been running a Xen cluster at the OSL, which is slowly replacing our original independent Xen hosts. We currently host a total of 41 Xen virtual machines, which include projects like Busybox, Inkscape, an OFTC IRC node, the Freenode website, the OLPC user support forums, and many others. Currently 17 of those are on the new cluster, split between 3 of the 6 available host nodes. The other 24 virtual machines are still on two of our older independent Xen hosts.
The older Xen hosts are just boxes loaded with lots of disk and RAM, with the virtual machines running off of the local disk space. The problem with this setup is that Xen and Linux kernel upgrades are incredibly difficult. Since the virtual machines cannot easily move to another host, upgrading Xen requires taking an outage for all 8 to 12 virtual machines running on that host. To complicate matters, Xen can be a bit troublesome to install or upgrade, so it is not uncommon for such an upgrade to take much longer than expected. To improve this situation I built out our Xen cluster.
The cluster currently consists of 6 Xen hosts which are part of a 14-blade IBM BladeCenter that was donated to us by Intel. The 6 hosts each have 4GB of RAM and dual Pentium 4 processors and can typically run between 6 and 8 virtual machines, depending on RAM and CPU needs. The remaining 8 blades will eventually be built out as more hosts but are currently waiting on RAM. (Anyone have a pile of 1GB PC2100 sticks lying around?) All of the disk space is hosted via iSCSI on a separate disk node. The current disk node is a Dell 2650 with 260GB of disk for virtual machines, and it serves up that space with ietd since we don't have a hardware-based iSCSI target card.
The good thing about this new setup is that I can migrate virtual machines between host nodes on demand while they are running, so I can easily upgrade the host nodes as needed. Maybe some day I will get better monitoring set up so I can move virtual machines around to balance CPU load, but that's not planned for the near future. The bad thing is that the single disk node is still a single point of failure. The disk node also doesn't have very much disk space and we have nearly filled it up, which is why the cluster is only running 17 virtual machines. So the setup is not perfect, but it's a pretty good start using hardware that was either donated or that we already had.
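To make the "migrate, then upgrade" routine a bit more concrete, here is a minimal sketch of how draining a host before an upgrade could be scripted. It only builds the `xm migrate --live` command lines and spreads the guests round-robin over the remaining hosts. The function name and the guest and host names are made up for illustration, and this is not the tooling actually in use at the OSL.

```python
from itertools import cycle


def drain_commands(guests, targets):
    """Build one `xm migrate --live` command per guest.

    Guests are spread round-robin over the target hosts. The names
    passed in are hypothetical; nothing is executed here.
    """
    if not targets:
        raise ValueError("need at least one target host")
    return [
        ["xm", "migrate", "--live", guest, target]
        for guest, target in zip(guests, cycle(targets))
    ]


if __name__ == "__main__":
    # Hypothetical guests on the host about to be upgraded.
    for cmd in drain_commands(["guest-a", "guest-b", "guest-c"], ["xen2", "xen3"]):
        print(" ".join(cmd))
```

Each command would be run on the host being drained, for example with `subprocess.run(cmd, check=True)`. A real script would also want to check that each target has enough free RAM before moving a guest there.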
Down the road I want to replace the current disk node with two boxes replicating the data using DRBD and set up graceful fail-over between the two using heartbeat. The current plan is to upgrade the disk space on our mirror servers and use some of the old disk arrays for the Xen cluster. This will give us about 3TB total, or 1.5TB of redundant disk space, between the two disk nodes. That will give us enough space to move all of our existing virtual machines over to the cluster with room for 30-40 more for a total of 54-64. That won't quite fill up the Xen host nodes, which can probably host 80-90 virtual machines while keeping one host node as a hot spare. It will be enough room for about a year and a half worth of growth and should enable us to provide great uptime for the hosted projects. :-) Unfortunately, with this plan the Xen upgrade is waiting on the mirror upgrade, which is waiting on money to buy the new disks, and I have no idea when that is going to happen. Hopefully something will pull through soon; the mirrors have been needing this upgrade for nearly a year now.
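For anyone who wants to check the capacity numbers, here is a rough back-of-the-envelope sketch. It assumes that DRBD mirroring halves the raw disk space and that all 14 blades eventually get built out like the current 6, each running 6 to 8 virtual machines, with one blade held back as a hot spare. The function names are just for illustration.

```python
def usable_mirrored_tb(raw_tb):
    """Usable space when two disk nodes hold full copies of the same data."""
    return raw_tb / 2


def vm_capacity(total_hosts, hot_spares, vms_per_host):
    """Return (low, high) VM counts for the hosts that are not hot spares."""
    low, high = vms_per_host
    active = total_hosts - hot_spares
    return active * low, active * high


if __name__ == "__main__":
    print(usable_mirrored_tb(3.0))      # redundant TB across the two disk nodes
    print(vm_capacity(14, 1, (6, 8)))   # VM range with one hot spare
```

With those assumptions the 13 active blades come out somewhere between 78 and 104 virtual machines, which brackets my rough 80-90 estimate. Where a given blade actually lands depends on how much RAM and CPU each guest needs.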
And how does Gentoo fit into all of this? All of the Xen and disk hosts run Gentoo and are managed by our central cfengine system. I have been maintaining the Xen packages for Gentoo to keep them in working order for use at the OSL and the whole setup seems to work pretty well now. Hopefully later today I'll have a chance to start rolling packages for Xen 3.1.3 and 3.2.0.