A Layered Approach to Backups




What do you get when you combine datacenter equipment, a failed air conditioner, and a weekend of 150-degree temperatures? You get a week of no sleep while you restore data from your backups. This case, however rare, shows that even a fully redundant infrastructure can still suffer downtime from other factors, such as environmental issues. In this instance, multiple hard drives failed, which caused complete data loss on our client’s storage device.

A Layered Approach

If you’ve ever had a hard drive fail, you know that sending the drive to a data recovery service is not cheap. This time the quote was $11,000 with no guarantee of recovery. Luckily, this client had implemented our recommendations and had a multi-layered backup policy, which included snapshots or backups starting at the operating-system level, continuing on to the storage layer, then local backups, and finally an off-site backup at a disaster recovery site.

The easiest place to do a file-level recovery is the operating-system level. Windows Server (and the desktop version of Windows) includes the Volume Shadow Copy Service (VSS), which allows you to right-click a folder and choose “Restore previous versions.” While this is not enabled by default, it is an easy way to get back a folder or file that was deleted.

When an operating system fails or becomes corrupted, the first place we go is the storage-layer snapshots. They offer the shortest recovery time, so they are the best fit for a tight Recovery Time Objective (RTO), the target for how quickly you need to be back up and running. A proper snapshot schedule lets you balance the amount of storage used against the number of recovery points you can roll back to. Since snapshots normally live on expensive storage, using them as the primary backup target is not ideal.

To meet a tight Recovery Point Objective (RPO), the target for how much recent data you can afford to lose, we look at local backups. Local backups allow us to retain many more versions to roll back to or recover from. We always recommend software that backs up an entire virtual container. This allows us to recover an entire server and its configuration, rather than just the files within the server. We can then recover systems on any hardware much more quickly than by reinstalling and reconfiguring the server and all the programs installed on it.
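To make the snapshot trade-off above more concrete, here is a minimal Python sketch that estimates how much storage a snapshot schedule might consume and how far back it lets you roll. Everything in it is an assumption for illustration: the `SnapshotPolicy` class, the `estimate` function, the 1,000 GB volume, and the 5% daily change rate are hypothetical, and the model assumes each snapshot stores only the blocks changed since the previous one, with changes spread evenly over the day. Real storage arrays will behave differently.

```python
from dataclasses import dataclass


@dataclass
class SnapshotPolicy:
    """Hypothetical snapshot schedule; all fields are illustrative."""
    interval_hours: float  # time between snapshots
    retained: int          # number of snapshots kept before the oldest is dropped


def estimate(policy: SnapshotPolicy, volume_gb: float, daily_change_rate: float) -> dict:
    """Rough estimate under a simplified model: each snapshot holds only the
    data changed since the previous one, and changes accrue evenly over time."""
    changed_per_snapshot_gb = volume_gb * daily_change_rate * (policy.interval_hours / 24)
    return {
        # Extra space consumed by all retained snapshots together.
        "snapshot_storage_gb": changed_per_snapshot_gb * policy.retained,
        # How far back in time the oldest retained snapshot reaches.
        "rollback_window_hours": policy.interval_hours * policy.retained,
        # Data written since the most recent snapshot is lost on rollback.
        "worst_case_data_loss_hours": policy.interval_hours,
    }


if __name__ == "__main__":
    schedules = {
        "hourly, keep 24": SnapshotPolicy(interval_hours=1, retained=24),
        "daily, keep 7": SnapshotPolicy(interval_hours=24, retained=7),
    }
    for name, policy in schedules.items():
        print(name, estimate(policy, volume_gb=1000, daily_change_rate=0.05))
```

Under these assumptions, the hourly schedule loses at most an hour of changes but only reaches back one day, while the daily schedule reaches back a week at the cost of more snapshot storage and a larger potential loss window. That is the balance between storage used and recovery points that a snapshot schedule has to strike.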

Off-Site Disaster Recovery to the Rescue

The last layer of backup is off-site replication. Most companies maintain an off-site file-level backup, which ensures that their data is available in the event of a disaster. However, this does not mean that a company will be able to use the data immediately. Unless you are replicating the entire virtual container, you will need to rebuild the servers to get back in business. Replicating the data to a disaster recovery site ensures that you can bring your mission-critical services back up within minutes rather than hours or days. In the case of our client’s disaster, the operating-system and storage-layer backups could not be used, but we were able to restore from local backups to get their business back up and running. Maintaining and verifying every level of backup is key to ensuring that your data and your company can get back up and running as quickly as possible, with as little downtime as possible, in the event of a disaster.
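Verifying a backup can be as simple as restoring it to a scratch location during a drill and checking that the files match the originals. The following Python sketch shows one way this might be done with SHA-256 checksums from the standard library. The function names `sha256_of` and `verify_restore` and the directory layout are hypothetical, and this is not the tooling used for the client described here; commercial backup software usually ships its own verification features.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> List[str]:
    """Return a list of files that are missing or differ in the restored copy.

    An empty list means every file under source_dir was found in
    restored_dir with identical contents.
    """
    problems = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(restored_file):
            problems.append(f"mismatch: {relative.as_posix()}")
    return problems
```

A test restore would call `verify_restore(Path("/data/live"), Path("/scratch/restore-test"))` (both paths are placeholders) and treat any non-empty result as a failed verification. This comparison only works while the source is still available, so it suits scheduled restore drills rather than recovery after an actual disaster.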

This post was originally authored by James Balandrán and published in April 2014.