Using CloudBolt at CloudBolt – Snapshot Cleanup


VMware snapshots are an extremely useful feature for saving the state of a VM and rolling back to it later. They are nearly instantaneous to create, and reverting to a snapshot is also very fast. They do have one huge drawback, though: when they exist for more than a few days, they take up more and more space and degrade the performance of your infrastructure. This performance hit is incurred not just by the VM with the snapshots but by other VMs as well, since the presence of snapshots increases the number of disk reads and writes necessary to work with the filesystem, and also increases CPU load on the host as it calculates the deltas between data.

This wouldn’t be a problem if people who took snapshots deleted them shortly after creating them, but snapshots have a tendency to be forgotten. In a large IT environment, with multiple datacenters, vCenters, clusters, and many VMs, it can be hard to figure out how many snapshots are out there, how old they are, and how much they are affecting performance. It is just as hard to enforce a policy of periodic expiration and automatic deletion.

We had this exact problem in our labs at CloudBolt (where we have every version of vCenter & ESX since 4.1 running), so we decided to automate a solution for this with a CloudBolt rule.

This rule condition looks at all VMs known to CloudBolt, searches for snapshots on them that were created more than the threshold number of days ago, and reports on them. If the Dry Run flag is set to False, the rule will also initiate a deletion of these snapshots.
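The logic described above can be sketched in a few lines of Python. This is only an illustration of the threshold and Dry Run behavior, not the code of the rule that ships with CloudBolt: the Snapshot class, the find_expired and run_rule functions, and the delete callback are all hypothetical names invented for this example, and no CloudBolt or VMware API is called.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional


@dataclass
class Snapshot:
    """Hypothetical stand-in for a snapshot record (not a CloudBolt or VMware type)."""
    vm_name: str
    name: str
    created: datetime


def find_expired(snapshots: Iterable[Snapshot], threshold_days: int,
                 now: datetime) -> List[Snapshot]:
    """Return the snapshots created more than threshold_days before now."""
    cutoff = now - timedelta(days=threshold_days)
    return [s for s in snapshots if s.created < cutoff]


def run_rule(snapshots: Iterable[Snapshot], threshold_days: int,
             dry_run: bool = True,
             delete: Optional[Callable[[Snapshot], None]] = None,
             now: Optional[datetime] = None) -> List[Snapshot]:
    """Report expired snapshots; delete them only when dry_run is False."""
    now = now or datetime.now()
    expired = find_expired(snapshots, threshold_days, now)
    for snap in expired:
        age_days = (now - snap.created).days
        print(f"{snap.vm_name}: snapshot '{snap.name}' is {age_days} days old")
        # The delete callback is an assumption: the caller supplies whatever
        # actually removes a snapshot in their environment.
        if not dry_run and delete is not None:
            delete(snap)
    return expired
```

Calling run_rule(snapshots, threshold_days=14) with the default dry_run=True only reports what would be removed. Passing dry_run=False together with a delete callback performs the removal, which mirrors the report-first, delete-second behavior of the rule.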

We ran this rule a week ago against our labs, starting with a large threshold to delete the oldest of our snapshots. We thought we had been doing a good job of cleaning snapshots up as we went, but, as it turned out, there were a lot of snapshots, and some very old ones. Over the course of a day, we decreased this threshold, reran the rule, and repeated.

We immediately noticed a performance improvement in our labs. Some of our automated CIT tests that had been failing with timeouts began succeeding, and all our developers, SEs, and other users of our infrastructure became happier.

After all the old snapshots were cleaned up, we set the rule to run nightly and delete any snapshot older than 14 days, ensuring that this problem does not affect us again.

Today, CloudBolt is releasing 6.1-alpha4, which comes with this rule built in. To upgrade to this alpha release, navigate in your CloudBolt UI to Admin > Version & Upgrade Info (or download the upgrader from our support site). After upgrading, go to Admin > Rules, where you can change the inputs to this rule and execute it. The rule starts with Dry Run enabled, so it will not delete any snapshots until you turn that flag off.
