This document describes some details of the process of taking an existing Linux firewall, and placing it into a virtual machine. Before we go on, though, it’s important to understand why we’d want to do this. There are benefits and drawbacks to a virtualized setup:
Benefits
- Improved reliability: migration to a new host computer reduces the number of active hardware components that can fail (and can help eliminate old hardware support costs).
- Reduced costs/consolidation: less redundant hardware also means reduced fan noise, power consumption, etc.
- Improved security: a VM isolates the firewall from the native hardware and host kernel more thoroughly than a chroot jail, which leaves both easily accessible.
- Improved use of remaining hardware: it’s less likely for the machine to be completely idle.
Drawbacks
- Performance is reduced for guests (disk and network I/O mainly; CPU as well if the processor lacks hardware virtualization support).
- Reduced hardware redundancy. While there are fewer parts to fail, the remaining hardware now has more roles to fill.
- Initial setup can be complicated.
I decided it made sense to virtualize the server in my case. After reading about Qemu/KVM and Xen, I selected KVM since I had previous experience with it, and because it wouldn’t require changes to the host environment (Xen is a separate hypervisor environment, which would have added complexity to the setup steps).
Requirements analysis
The goal was to take an existing firewall (2 GiB RAM, dual core, two NICs, RAID1 storage) and host it on another server (4 GiB RAM, dual core, quad NICs, multiple RAID5 arrays — a file server, primarily). The new host computer had more powerful hardware than the firewall, but it was not obvious how to allocate the resources between the host machine and the firewall VM. I sat down and examined the status of the machines to determine the appropriate settings.
The host’s CPU had hardware virtualization as a feature, which meant the CPU performance overhead would be minimal. Since the firewall was regularly idle and the host was a fileserver with regular concurrent use by clients, the firewall was allocated a single core.
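One way to confirm that a CPU advertises hardware virtualization is to look for the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo. The following is a minimal sketch; the function name has_hw_virt is made up for illustration, and it only checks the CPU flags, not whether the feature is enabled in the BIOS.

```shell
# Succeeds if a cpuinfo-style file lists the vmx (Intel) or svm (AMD) flag.
# The file argument defaults to /proc/cpuinfo; has_hw_virt is a hypothetical name.
has_hw_virt() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -Eqw 'vmx|svm' "$cpuinfo" 2>/dev/null
}

if has_hw_virt; then
    echo "hardware virtualization flag present"
else
    echo "no vmx/svm flag found"
fi
```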
Next, I checked over the memory load on both machines. To ensure I had an accurate understanding, this was done well after system bootup (months in my case). free and top indicated that the firewall was using most of its RAM as a disk cache. Of the processes in top (sorted by memory use via shift-M), the majority of the RAM was being used by the Squid web proxy and Apache instances. The host machine was in a similar situation, but here it was desirable to maximize the amount of RAM used for disk cache since it was also a file server. I decided to migrate the Squid installation to the host operating system and allocate 512 MiB RAM for the firewall.
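The number that matters for sizing the VM is the memory held by processes once buffers and disk cache are excluded. As a rough sketch of that calculation, the function below (app_mem_mib, a name invented here) reads the classic fields from /proc/meminfo; it assumes a Linux system and ignores finer points such as shared memory.

```shell
# Print the MiB of RAM in use by applications, excluding buffers and page cache.
# Reads a meminfo-style file (values in kB); defaults to /proc/meminfo.
app_mem_mib() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^MemTotal:/ { total = $2 }
        /^MemFree:/  { free = $2 }
        /^Buffers:/  { buffers = $2 }
        /^Cached:/   { cached = $2 }
        END { printf "%d\n", (total - free - buffers - cached) / 1024 }
    ' "$meminfo"
}

if [ -r /proc/meminfo ]; then
    echo "application memory in use: $(app_mem_mib) MiB"
fi
```

A result comfortably below the planned allocation suggests the guest will still have room left over for its own disk cache.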
With the processor and RAM information, the only remaining issues were storage and networking. With Squid migrated to the host, the only disk space and throughput requirements in the firewall were for booting and some minor web services. I settled on a 32 GiB VM disk (in KVM’s qcow2 format, which supports snapshots and compression) placed on the existing RAID filesystems of the host. The alternative was using a host partition (disk or LVM), but this didn’t seem warranted since the firewall didn’t do much disk I/O.
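A qcow2 image of this kind can be created with qemu-img. The wrapper below is only a sketch (create_vm_disk is a hypothetical helper, and it assumes qemu-img is installed); the image is sparse, so a 32 GiB virtual disk starts out as a small file and grows as the guest writes to it.

```shell
# Create a sparse qcow2 disk image; the file grows only as the guest writes data.
# create_vm_disk is a hypothetical helper name.
create_vm_disk() {
    img="$1"    # path of the image file to create
    size="$2"   # virtual disk size, e.g. 32G
    qemu-img create -f qcow2 "$img" "$size"
}

# Example, using the image path from the kvm command line in this article:
# create_vm_disk /vm-store/firewall.img 32G
```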
The network performance requirements of the firewall were limited by the speed of the Internet connection, which was well below 100 Mbps. Initially, I decided to give two of the host’s network cards to the guest using a method called PCI device assignment, hoping to minimize the latency that would be introduced by having a software network stack and host involvement in network operations. This would later turn out to be a disaster.
Initial KVM configuration
These were my initial firewall VM settings — 1 CPU, 512 MiB RAM, 32 GiB disk image, and two direct-access network cards.
To allow direct access to the network cards, I needed to tell the host kernel to ignore them using the pci-stub driver. This required loading the kernel module, giving it the PCI vendor/device ID of the cards, and then rebinding each card (by PCI address) from its host driver to pci-stub before starting the VM with both devices assigned:
modprobe pci-stub
echo "2ea2 2e71" > /sys/bus/pci/drivers/pci-stub/new_id
echo 0000:00:08.0 > /sys/bus/pci/devices/0000:00:08.0/driver/unbind
echo 0000:00:08.0 > /sys/bus/pci/drivers/pci-stub/bind
echo 0000:00:09.0 > /sys/bus/pci/devices/0000:00:09.0/driver/unbind
echo 0000:00:09.0 > /sys/bus/pci/drivers/pci-stub/bind
kvm -vnc :1 -m 512 -boot c -net none -hda /vm-store/firewall.img -cpu host \
    -pcidevice host=00:08.0 -pcidevice host=00:09.0 -daemonize
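The vendor/device IDs and PCI addresses used in commands like these can be read from lspci. As an illustration, the sketch below filters the numeric output of lspci for Ethernet controllers (PCI class 0200); list_nic_ids is a name made up here, and the guarded call at the end assumes the pciutils package is installed.

```shell
# Read `lspci -n` style output on stdin and print "<address> <vendor> <device>"
# for every Ethernet controller (PCI class 0200). list_nic_ids is a hypothetical name.
list_nic_ids() {
    awk '$2 == "0200:" { split($3, id, ":"); print $1, id[1], id[2] }'
}

# -D prints the full address including the PCI domain (e.g. 0000:00:08.0),
# which is the form the sysfs bind/unbind files expect.
if command -v lspci >/dev/null 2>&1; then
    lspci -Dn | list_nic_ids
fi
```

The second and third columns together form the string written to new_id, and the first column is the address written to the unbind and bind files.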
Migration of the data was a relatively straightforward process. I powered down the firewall, removed one of the drives in the RAID, and connected it to the new host. The firewall VM was brought up and had its data copied over. Once the data was fully copied, the VM was powered down and its disk image duplicated for backup.
Troubles with the VM
The first bit of trouble raised its head here — Ubuntu (the Linux distribution I use currently) uses UUIDs to identify the filesystems, but the UUID does not get transferred when you copy data from one partition to another. I had forgotten to apt-get install the appropriate UUID tools as well, so I couldn’t simply update the partition with the correct UUID. I solved this by updating grub’s /boot/grub/menu.lst to reflect the new boot device, stripping out all UUID references.
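Stripping the UUID references by hand is error-prone when the file has several kernel entries. As a sketch of how the same edit could be scripted, the function below rewrites every root=UUID=... argument in a GRUB legacy config to a plain device path and keeps a .bak copy of the original. The name strip_root_uuid and the device /dev/sda1 are assumptions for illustration; it does not touch any other kind of UUID reference, and /etc/fstab may need a similar change.

```shell
# Replace each root=UUID=<uuid> argument in a config file with root=<device>.
# A backup of the original is kept alongside it with a .bak suffix.
# strip_root_uuid is a hypothetical helper name.
strip_root_uuid() {
    config="$1"   # e.g. /boot/grub/menu.lst
    device="$2"   # e.g. /dev/sda1 (assumed device name inside the VM)
    sed -i.bak -E "s|root=UUID=[0-9A-Fa-f-]+|root=${device}|g" "$config"
}

# Example (run as root inside the VM):
# strip_root_uuid /boot/grub/menu.lst /dev/sda1
```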
Then I ran into my next snag: after starting and shutting down the firewall VM a few times to ensure it was working and passing the checklist of activated services, the host computer locked up. No pings, no response on the console — dead! The reason was determined to be the passing of the PCI devices to the VM. For some reason, the firewall’s kernel shutdown procedure would end up putting the host hardware into an inconsistent state, triggering a hard lock :(
Network bridging with TUN/TAP
Since I could not directly assign hardware to the VM, I had to set up a stack that would bridge packets from the host’s network cards to the VM’s software network cards. KVM’s software network cards are backed by TUN/TAP interfaces on the host (the tunctl tool used below ships with the User-Mode Linux utilities). To get these associated with a physical device, the taps would have to be bridged.
There are some details here that can trip up those who aren’t careful: computer networking is based around a stack of layers, each of which has specific responsibilities. Bridging occurs at layer 2 in the stack, while addressing and routing occurs at layer 3. For our bridge to work, there must not be routing information/addresses associated with the network devices used to construct it!
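Before building a bridge, it is worth checking that the interfaces going into it carry no layer-3 address. The sketch below does this for IPv4 by parsing the one-line output of the iproute2 ip tool; has_ipv4_addr is a made-up name, and the presence of ip on the host is an assumption (eth1 and eth2 are the NICs used in this article).

```shell
# Read `ip -o -4 addr show` output on stdin; succeed if interface $1 has an
# IPv4 address assigned. has_ipv4_addr is a hypothetical helper name.
has_ipv4_addr() {
    awk -v dev="$1" '$2 == dev && $3 == "inet" { found = 1 } END { exit !found }'
}

for dev in eth1 eth2; do
    if command -v ip >/dev/null 2>&1 &&
       ip -o -4 addr show 2>/dev/null | has_ipv4_addr "$dev"; then
        echo "$dev still has an IPv4 address; clear it with: ip addr flush dev $dev"
    fi
done
```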
With two hardware NICs and two software NICs, I needed to create two bridges: one for the WAN connection, and one for the LAN. Here’s how I set up the LAN side:
tunctl -b -t lan-tap
brctl addbr br-lan
This creates a TUN/TAP interface named lan-tap and a bridge called br-lan. Next we add the lan-tap and eth1 interfaces to the bridge:
brctl addif br-lan eth1
brctl addif br-lan lan-tap
Since this is a layer-2 bridge, we assign null addresses to both interfaces and set them to promiscuous mode:
ifconfig eth1 up 0.0.0.0 promisc
ifconfig lan-tap up 0.0.0.0 promisc
With everything else done, we can activate the bridge using ifconfig:
ifconfig br-lan up
Once this was verified to work, it was a simple matter to duplicate the entire thing for the WAN side:
tunctl -b -t wan-tap
brctl addbr br-wan
brctl addif br-wan eth2
brctl addif br-wan wan-tap
ifconfig eth2 up 0.0.0.0 promisc
ifconfig wan-tap up 0.0.0.0 promisc
ifconfig br-wan up
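Since the LAN and WAN sequences differ only in their names, they could be folded into one function. The following is a sketch rather than the setup actually used here: make_bridge and the RUN variable are invented for illustration. With RUN=echo the commands are only printed; setting RUN to an empty string would execute them, which requires root plus the tunctl and brctl tools.

```shell
# Build one bridge from a physical NIC and a newly created tap interface.
# Every command is prefixed with $RUN: RUN=echo prints the commands (dry run),
# RUN= (empty) executes them. make_bridge and RUN are hypothetical names.
make_bridge() {
    bridge="$1"; nic="$2"; tap="$3"
    $RUN tunctl -b -t "$tap"                  # create the tap interface
    $RUN brctl addbr "$bridge"                # create the bridge
    $RUN brctl addif "$bridge" "$nic"         # attach the physical NIC
    $RUN brctl addif "$bridge" "$tap"         # attach the tap
    $RUN ifconfig "$nic" up 0.0.0.0 promisc   # no layer-3 address on members
    $RUN ifconfig "$tap" up 0.0.0.0 promisc
    $RUN ifconfig "$bridge" up                # activate the bridge
}

RUN=echo
make_bridge br-lan eth1 lan-tap
make_bridge br-wan eth2 wan-tap
```

Running it in dry-run mode first makes it easy to compare the generated commands against the manual steps before touching a live network.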
The final change was to adjust the KVM command line so that it would attach the VM’s network cards to the taps and not attempt to run its own if-up and if-down scripts:
kvm -vnc :1 -m 512 -boot c -hda /vm-store/zephyr-8.10.img -cpu host \
    -net nic -net tap,ifname=lan-tap,script=no,downscript=no \
    -net nic -net tap,ifname=wan-tap,script=no,downscript=no --daemonize
Conclusion
While there were troubles in implementation, the goals of the virtualization were achieved. From the perspective of the users, there was no change to the network beyond some downtime during the migration of the data.