Using rsync on sparse Virtual Machine disk images.
Update: see VirtSync for a better solution.
Background
I have a co-located server for hosting Virtual Machines (KVM and VMWare). These virtual machines use sparse files as their disk images. This saves a huge amount of space, without incurring too much of a slowdown - it also takes away the sysadmin headaches of having to add more disk images when a VM outgrows its initial allocation of space.
Task
I needed to back up these VMs (using LVM2 to get as consistent an image as possible) to a server at home - efficiently.
Fruitless Investigations
- http://serverfault.com/questions/66338/how-do-you-synchronise-huge-sparse-files-vm-disk-images-between-machines
- http://groups.google.com/group/mailing.unix.rsync/browse_thread/thread/94f39271980513d3
- http://www.finalcog.com/synchronise-block-devices
Problems
Using rsync --sparse works, but causes a huge amount of unnecessary disk writes. Changing 10 bytes in a file that is 50GB long (1GB used) should cause only one or two blocks to be written; instead, it causes the full 1GB to be written. This is slow, and possibly not good for the disks' longevity.
Using rsync --inplace works, but creates non-sparse files.
You cannot use --sparse and --inplace at the same time :-( This is disallowed by rsync:
rsync: --sparse cannot be used with --inplace
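To see the gap between a file's length and the space it actually occupies (the "50GB long, 1GB used" case above), something like the following Python sketch can help. The function name sparse_usage is made up for illustration, and the code assumes a Linux-style stat where st_blocks counts 512-byte units; other platforms may differ.

```python
import os


def sparse_usage(path):
    """Return (apparent_bytes, allocated_bytes) for the file at path.

    apparent_bytes is the file's length; allocated_bytes is the space
    actually backed by disk blocks. A sparse file has allocated < apparent.
    """
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units, as on Linux.
    return st.st_size, st.st_blocks * 512
```

Running this on the target before and after a sync is one rough way to check whether an image has stayed sparse.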
Solution
If you use --inplace to update a pre-existing sparse file, the file will remain sparse and only have a small number of blocks written. It's only when rsync --inplace creates a file that it makes it non-sparse.
So the solution is to create a corresponding empty sparse file of the correct length on the target machine for every file on the source machine - if the file isn't yet present on the target machine.
Then rsync --inplace will work as intended, leaving sparse files sparse, and only writing the changed blocks to disk.
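The placeholder step could be sketched roughly as below. This is not my packaged script, just a minimal Python illustration: ensure_sparse_placeholder is a hypothetical name, it is assumed to run on the target machine, and it assumes you already know the source file's length (for example from a stat on the source). It also assumes the target filesystem supports sparse files.

```python
import os


def ensure_sparse_placeholder(path, size):
    """Create an empty sparse file of `size` bytes at path if none exists.

    Returns True if a placeholder was created, False if the file was
    already present (in which case it is left untouched).
    """
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    try:
        # "x" mode fails instead of truncating if the file already exists.
        with open(path, "xb") as f:
            # Extending an empty file with truncate() writes no data, so on
            # filesystems with sparse-file support no blocks are allocated.
            f.truncate(size)
    except FileExistsError:
        return False
    return True
```

Once the placeholder exists, running rsync --inplace from the source to that path updates it like any other pre-existing sparse file.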
If there's some interest on http://www.reddit.com/r/programming/comments/9rb98/using_rsync_on_sparse_virtual_machine_disk_images/ I'll package up my script nicely...
Posted on 06 October 2009.