VMware snapshotting is a wonderful and powerful technology that affords IT and developer staff great flexibility and recovery options with virtual machines (VMs), options that were either far less flexible or flat out did not exist with physical machines. With this technology comes the responsibility of using it properly and knowing its limitations. Snapshots have a shelf life that varies somewhere between the moment the snapshot was created and infinity. Boy, that was real helpful, wasn’t it?
Let me see if I can explain a little better. When a snapshot is created, a delta file is created on the VMFS volume, in the folder where the VM resides. The initial size of the delta file is 16MB. The purpose of the delta file is to hold the changes made to the virtual disks since the snapshot was taken, which means any disk write I/O activity inside the guest OS.
Disk write I/O inside a guest VM may be seldom or it may be very active. It depends on the role of the VM and, more specifically, the software and features installed inside the VM. When the initial 16MB delta file fills to capacity with the changes it holds, it dynamically increases its size by another 16MB. Once again, if and when the delta file fills to capacity, it grows by another 16MB. For those who excel in math, our delta file is now 48MB in size. Do you see the pattern? The delta file will continue to grow in 16MB increments (in some cases very rapidly!) up to a maximum of the size of the parent file, unless one of a few conditions is met:
- Someone closes the snapshot
- Someone creates an additional child snapshot (perpetuating a potential problem)
- The snapshot file somehow becomes corrupted before or during closing of the snapshot (bad news)
- The VMFS volume where the VM and delta file are stored runs out of available storage space (update your resume: all other VMs on the same VMFS volume, snapshotted or not, as well as VMkernel swap and VM logs, are now also out of write space)
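The growth pattern described above can be sketched in a few lines of Python. This is a simplified model, not VMware's actual allocation logic: it assumes the delta file starts at 16MB, grows in whole 16MB steps as changed data accumulates, and stops at the size of the parent file. The function and parameter names are made up for illustration.

```python
import math

INCREMENT_MB = 16  # growth step described in the text


def delta_file_size_mb(changed_mb, parent_size_mb):
    """Approximate delta file size after `changed_mb` of changed data.

    Simplified model: the file always occupies at least one 16MB
    increment and never exceeds the size of the parent disk.
    """
    steps = max(1, math.ceil(changed_mb / INCREMENT_MB))
    return min(steps * INCREMENT_MB, parent_size_mb)


# 40MB of changes against a 10GB parent disk needs three increments.
print(delta_file_size_mb(40, 10 * 1024))  # 48
```

The cap in the last line of the function is what makes a long-lived snapshot on a busy VM so expensive: in the worst case the delta file ends up as large as the disk it tracks.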
Let’s connect the dots. The amount of time a snapshot should be left open is going to vary with the factors identified above and a few more: the amount of available VMFS storage, the rate at which the delta file has been growing since the VM was snapped, the number of VMDKs snapped, decaying VM disk performance as the delta file becomes fragmented across non-contiguous spots on disk, the time to recovery if the snapshot is lost and the VM has to be restored, your personal comfort level, etc. To compound the anxiety, there are likely other VI administrators in your shop, or automated backups, regularly creating snapshots and leaving them open without your knowledge. The urgency to have all open snapshots on your radar has just increased.
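To put rough numbers on the storage factor, here is a back-of-the-envelope sketch. It assumes you have measured (or guessed) how fast each open snapshot's delta file is growing, and it treats that growth as constant, which real workloads rarely are. The function name and the sample figures are hypothetical.

```python
def hours_until_volume_full(free_mb, growth_rates_mb_per_hour):
    """Estimate the hours left before open snapshots fill the VMFS volume.

    free_mb: free space currently available on the volume, in MB.
    growth_rates_mb_per_hour: one growth rate per open delta file.
    """
    total_rate = sum(growth_rates_mb_per_hour)
    if total_rate <= 0:
        return float("inf")  # nothing is growing, so the volume never fills
    return free_mb / total_rate


# 50GB free, two open snapshots growing at 512MB/hour and 1536MB/hour.
print(hours_until_volume_full(50 * 1024, [512, 1536]))  # 25.0
```

A day of headroom sounds comfortable until someone else opens a third snapshot on the same volume, which is exactly why visibility matters.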
Unfortunately, in the current builds, VMware doesn’t give us real good (or automated) visibility of open snapshots. I liken it to handing a loaded gun to a child: it’s only a matter of time before an accident happens. That analogy is quite extreme, but it gets my point across on the importance of preventing such an accident from happening. What we have right now from the Virtual Infrastructure Client console (as well as a few of the hosted product consoles) is called the Snapshot Manager. Snapshot Manager displays open snapshots and their hierarchy, but only when we open Snapshot Manager, and that’s on a VM-by-VM basis. Very tedious.
So how do we gain better visibility of snapshots without tying up a bunch of our valuable time? Fortunately, there are some good third-party solutions available for free to help us out. A few that I like are Xtravirt’s Snaphunter, RVTools, and Hyper9.
Snaphunter is a simple piece of code that you install on an ESX host; you then schedule scans and emailed reports via cron. I get two Snaphunter reports emailed to me daily at noon (one for PROD storage, one for DEV storage).
RVTools is a .NET Windows application that you can run from your desktop to get visibility of all VMs managed by a VirtualCenter instance. In addition to snapshots, RVTools shows a bunch of other cool stuff. This utility is worth checking out.
Hyper9 is an up-and-coming, enterprise-architected product (currently in beta, with GA expected in early 2009) which will report on open snapshots as well as many other facets of the virtual infrastructure.
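Independently of these tools, the basic idea of hunting for open snapshots can be sketched as a directory scan. This is a minimal illustration, not how Snaphunter, RVTools, or Hyper9 actually work. It assumes the datastores are reachable as an ordinary directory tree (the `/vmfs/volumes` path is an assumption) and that snapshot delta files can be recognized by a name ending in `-delta.vmdk`. The function name is hypothetical.

```python
import os


def find_delta_files(root):
    """Return (path, size_mb) for each *-delta.vmdk under root, largest first."""
    found = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            # Assumption: snapshot delta files end in "-delta.vmdk".
            if name.endswith("-delta.vmdk"):
                path = os.path.join(folder, name)
                size_mb = os.path.getsize(path) / (1024 * 1024)
                found.append((path, size_mb))
    # Biggest delta files first, since those are the most urgent.
    found.sort(key=lambda item: item[1], reverse=True)
    return found


if __name__ == "__main__":
    # Assumed mount point; adjust to wherever the datastores are visible.
    for path, size_mb in find_delta_files("/vmfs/volumes"):
        print(f"{size_mb:10.0f} MB  {path}")
```

Sorting by size puts the snapshots most likely to cause trouble at the top of the report, which is the part you want to read first when it lands in your inbox.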
Snapshots are so easy to create, and therein lies their Achilles heel: snapshot sprawl and a lack of native tools from VMware to keep them under control and keep us safe from danger.
Go forth and virtualize – but let’s be safe out there.