Rock IT

4 tips for a rock solid disaster recovery plan

First things first: Disaster Recovery (DR) is not backup.

What is backup?

Backing up is simply taking a copy of your data and storing it. There are a number of backup rules that you can follow, with an old favourite being the 3-2-1 rule:

Have 3 copies of your data, on 2 different types of media and have at least 1 copy of your data offsite and completely isolated.

image describing the 3-2-1 backup rule

3 data copies

This includes your production or “live” copy of your data, plus two separate backup copies. This means that if something happens to your primary production data (ranging from accidental deletion to a ransomware attack), you have two further copies to rely on. Why two copies? Because backup systems can fail. And no doubt the day your backup fails will be the day you need to rely on it!

2 media types

To protect against media failure, it’s a good idea to keep your data on two different media types. Your first copy could be on some form of disk in your primary server, with the second on some other type of media, such as tapes or detached storage devices like Network Attached Storage (NAS). These can be located close to your head office or wherever your primary data is held.

1 offsite copy

This is critical and covers some key, often under-appreciated risks. An offsite copy can help protect you against:

  • physical damage to your office (fire, flood, burglary or other)
  • ransomware and viruses which can spread to attached backup systems
  • malicious hackers who intend to do damage to your systems

What you are trying to achieve is to remove a single point of failure to improve your data resilience.
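
If you like to see rules written down as logic, here is a minimal Python sketch of a 3-2-1 check. Everything in it is hypothetical and for illustration only: the `DataCopy` fields, the `check_3_2_1` function and the example copies are assumptions, not part of any backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    name: str       # hypothetical label, e.g. "office NAS"
    media: str      # e.g. "disk", "tape", "nas", "cloud"
    offsite: bool   # stored away from the primary site
    isolated: bool  # not reachable from the production network


def check_3_2_1(copies):
    """Return a list of 3-2-1 rule violations (an empty list means compliant)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    media_types = {c.media for c in copies}
    if len(media_types) < 2:
        problems.append(f"only {len(media_types)} media type(s), need at least 2")
    if not any(c.offsite and c.isolated for c in copies):
        problems.append("no copy is both offsite and isolated")
    return problems


# Example inventory (made up): live data plus two backups.
copies = [
    DataCopy("production server", "disk", offsite=False, isolated=False),
    DataCopy("office NAS", "nas", offsite=False, isolated=False),
    DataCopy("cloud backup", "cloud", offsite=True, isolated=True),
]
print(check_3_2_1(copies))  # prints []
```

Remove the cloud backup from the list and the check reports two problems: too few copies, and nothing offsite.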

Is the Cloud backed up?

Not in the traditional sense.

When you used to have on-premise servers the perimeter of control was very clear. You could go and look at the blinking lights on your servers, plug in a USB drive and copy your data off. Now with the majority of services being delivered from the cloud, who is responsible for backing up your data?

If you read the terms and conditions of your Cloud Services Agreement, it quickly becomes apparent that your data is not backed up in the manner that you were expecting.

Cloud providers talk an awful lot about Uptime and being Highly Resilient. However, they are referring to their hardware infrastructure (and possibly their software), not to the resilience of your data.

What happens if your data becomes corrupt? Too bad. If you haven’t signed up for the cloud provider’s backup system (if they offer one) then that’s it: your data is gone.

This poses a huge challenge for businesses that are legally obligated to provide a certain level of assurance around their data integrity.

Uptime and hardware resilience give you a level of comfort that if the underlying hardware running the vendor’s cloud were to fail, they have sufficient redundancy built in to manage that failure. However, if your data is lost or corrupted during that failure then you could be on your own.

Whilst it’s highly likely that the vendor will have backup sets of your data to avoid further damage to their reputation, these are not backup sets that you can access in the event of accidental deletion. You have zero control over those backups (if they do exist) and even if they do exist, you’ll be stuck in a queue with every other client waiting to have their data restored.

Backing up the Cloud

Part of your backup and DR strategy must include backing up the cloud wherever possible. There are plenty of reputable cloud backup providers available – if you’re not sure, ask your IT provider or team.

What is Disaster Recovery?

Disaster Recovery (DR) is an overarching strategy that includes both a technical element and a strategic one.

DR needs to factor in a lot more than backup, such as:

1. Where will you restore your data to?

You could have the greatest backup system in the world, following the 3-2-1 method… but if all of your systems fail, where are you going to restore your data to?

If you’re predominantly in the cloud, what alternative service provider will you move to?

If your office burns down, where will you work from?

2. How will you restore your data if your equipment is not accessible?

If your equipment has gone up in flames, what will you do next? Understanding this requires a lot of planning and on-going discussions internally.

Who will manage the staff that you’ll have to continue to pay? Who will manage your clients who will still expect to be served?

It’s amazing how many businesses fail to even have that conversation because “everything’s been going well”, so it’s out of sight and out of mind.

3. How much data loss in the event of a disaster is acceptable?

The hardest of all to quantify: how much is your data worth? Working out the cost of a car is easy – it can be appraised and everyone can agree that a car is worth what many other identical cars are worth.

But what about your client database? It’s worth a lot to you but perhaps not much to someone else. Maybe it’s worth even more to a competitor than it is to you?

So how much data loss can you afford in a disaster?

This is typically defined in terms of a Recovery Point Objective and a Recovery Time Objective.

The Recovery Point Objective (RPO) is the amount of data loss in terms of time that you’re willing to tolerate. One hour of lost data? Four hours of lost data? One day of lost data?

Think of it as: how far back in time would you be willing to go?

The Recovery Time Objective (RTO) is the amount of time it should take you to recover from a disaster. Like RPO, it is quantified in terms of time.

Some businesses opt for an RPO of 1 hour and an RTO of 4 hours, meaning that they will tolerate losing up to one hour of data and expect to be operational again within 4 hours. Think about that in terms of the infrastructure required to pull it off. It’s not to be taken lightly, nor is it easy to achieve.
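
To make the arithmetic concrete, here is a small Python sketch that checks a single incident against an RPO of 1 hour and an RTO of 4 hours. The function name `meets_objectives` and all of the timestamps are made up for illustration; in practice the times would come from your backup logs and incident records.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup, disaster, restored, rpo, rto):
    """Compare one incident against the RPO and RTO targets."""
    data_loss = disaster - last_backup  # work done since the last good backup
    downtime = restored - disaster      # time taken to get running again
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: backup at 09:30, disaster at 10:15, running again at 13:45.
result = meets_objectives(
    last_backup=datetime(2024, 3, 1, 9, 30),
    disaster=datetime(2024, 3, 1, 10, 15),
    restored=datetime(2024, 3, 1, 13, 45),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
)
print(result["rpo_met"], result["rto_met"])  # prints True True
```

Here 45 minutes of data is lost and the business is down for three and a half hours, so both targets are met. If backups only ran nightly, the same disaster would blow the RPO by many hours.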

Diagram outlining recovery point objective and recovery time objective

 

Our top tips for starting a Disaster Recovery Plan

Tip 1 – make a plan!
1. Define what’s important to keep the business running – such as email and applications. This is best kept in a living document that is reviewed on a regular basis. Make it a top commitment and if you’re unlucky enough to need to use the plan, you’ll be very glad you did.
2. Understand where you can restore your data should your primary equipment be inaccessible. Do you have an IT service provider who can loan you equipment to run off in the short term? Do you have a spare fleet of laptops and servers available to cater for all of your staff?
3. Determine what an acceptable timeframe for recovery is. You need to work out your RPO and RTO and review them constantly. They might change as your business grows, along with your clients’ dependency on you.

Tip 2 – test the plan
1. Nominate staff to be DR Champions, who test every element of the plan. This is an often overlooked component of Disaster Recovery. You simply MUST test the plan so you can be sure that it works! This means getting people to commit to regular testing of your Disaster Recovery plan and environments. The more frequently you test, the more comfort you should have in the resilience of your business.

Tip 3 – backup resiliency
1. Test your backups regularly. This can’t be overstated. Unfortunately too many businesses believe the email notifications that come to them stating that their backups succeeded. This may not be the case and the only way to know for sure is to perform regular backup restore tests. These tests need to be holistic to ensure that every part of your backup system is working properly, because backups will occasionally fail. If you’re not looking at backup notifications then now is the time to start!
2. Store your backups in a different location to your production data. Doing this creates a physical separation between your core network and your backups. In the event that your production environment is infected with a virus, or breached by an attacker who is intent on doing damage, they will be unable to reach your backups. If your backup frequency is high (which is quite easy to achieve with cloud-based backups) then you might not suffer a great deal of data loss in such an event.
3. You can never have too many backups. The 3-2-1 backup strategy is a good base level. Nobody ever got fired for having too many backups, but please make sure that the backups are secure!
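
As a rough illustration of what a restore test can look like, the Python sketch below restores nothing itself; it only compares a folder you have already restored from backup against the original, file by file, using SHA-256 checksums. The function names and folder layout are assumptions for the example, and it presumes the original files have not changed since the backup was taken (otherwise a mismatch does not necessarily mean the backup is bad).

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return relative paths that are missing or different after a restore."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    failures = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            failures.append(rel.as_posix())
    return failures
```

Calling `verify_restore("/data/clients", "/tmp/restore-test/clients")` (both paths are placeholders) returns an empty list when every file came back intact. Anything else is a list of files to investigate before you need the backup for real.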

Tip 4 – get serious about security.