Are you in charge of backups? I have been in IT for most of my working life, and I have heard lots of ideas and thoughts about backups and recovery. After a while it starts to sound like a bunch of noise that you just tune out. What do you really know about your backups? I will try to streamline the conversation here, but note that a good backup strategy can take weeks or even months to vet and test. A solid backup strategy isn’t something you complete once and never reevaluate; it should be a living, breathing thing. After all, aren’t your company, your network, and your data ever changing? Your backup strategy should be too.
1. Most people set their backups running, never give them another thought, and expect some tornado siren to go off and tell them when something is wrong. My car dings at me incessantly if I don’t have a seatbelt on 2 seconds after I start the engine; why wouldn’t my backup system(s)?
2. We give the job away. Backups are typically handed to the “new guy” or the most junior administrator, because we hate dealing with them: they are boring, meticulous, and time consuming, and we want to do cool/more “important” stuff. Unless that person has a truly vested interest in the company, you need to keep a close eye on this process.
3. We don’t think it’s that big of a deal. Would you get fired if you couldn’t recover data? Maybe not, but according to a Gartner report from a few years back, only 6% of companies survive after a “major loss of computer records.” I think that is the extreme case, but many of the other sources I reviewed didn’t give much better odds: less than 20%. So even if you keep your job after a major catastrophe, will the company that employs you survive? Yeah, backups/restores are that important!
4. The Tools. I have seen it all, from a separate backup tool for each “kind of data” to one massively complicated tool that takes a week’s worth of training to learn. If you have too many tools, you have added layers of complication that will need to be explained and that may delay your restoration. One massive tool helps, because you only have one place to look, but it may be too complicated for anyone other than you to use. Either option can work, but what if you were not around? Would someone else be able to easily step in and restore the data? Keep it as simple as possible!
1. How important is your data?
2. When was the last time your data was fully and successfully backed up?
3. When was the last time you tried to restore your data?
4. Was the restore successful?
5. Were you happy with the time it took to restore?
6. If you weren’t there, would someone else be able to restore the data?
Right now some of you are saying, “Well, I am in the cloud, so I have nothing to worry about.” Think again. Just last week I had an AWS instance fail on me. I couldn’t get into the machine and the data was corrupt. I got a cute little email from Amazon that said “We are sorry.” I was lucky, because it was a test machine whose data I didn’t care about losing, but what if it had been data I did care about? The cloud is many things, but it is not perfect.
A Starting Point:
If you can’t quickly answer the questions above, let’s try to correct that. Keep it simple and start with the basics. Don’t get bogged down in the “what if” just yet. You need to work through the things you can control first. I have included a base outline for you to work from:
Let’s walk through the columns.
What is the system you are backing up? I would put lots of info in here: DNS name, IP address, the types of data (general files, SQL, Exchange, etc.), and where it resides (your office, the cloud, a data center, etc.). You need to know what kind of data is there so that you can tell whether your backup objectives are being met.
RPO (Recovery Point Objective):
How much data can you lose and still be “okay”? Everyone is going to say, “Well, zero.” Understand that you can get very close to zero, if not achieve it completely, but there is a significant cost associated with that. You need to know how much data you can lose and still survive (one, five, or twenty minutes, one day, one month).
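To make the RPO concrete, here is a minimal sketch of the kind of check that could sound the "tornado siren" for you. It assumes you can get the timestamp of the last successful backup from your backup tool; the function name `check_rpo` and the example dates are made up for illustration, not taken from any particular product.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def check_rpo(last_success: datetime, rpo: timedelta,
              now: Optional[datetime] = None) -> Tuple[timedelta, bool]:
    """Return (age of the newest good backup, whether that age is within the RPO)."""
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - last_success
    return age, age <= rpo


# Hypothetical example: a file server with a one-day RPO whose last
# good backup finished the evening of January 8.
now = datetime(2024, 1, 10, 8, 0, tzinfo=timezone.utc)
last_good = datetime(2024, 1, 8, 23, 0, tzinfo=timezone.utc)

age, ok = check_rpo(last_good, timedelta(days=1), now)
if not ok:
    # In practice this would send an email or page someone.
    print(f"ALERT: newest good backup is {age} old, RPO is 1 day")
```

The age of the newest good backup is exactly the amount of data you would lose if the system died right now, which is why it is the number to compare against the RPO.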
RTO (Recovery Time Objective):
Simply put, what is an acceptable time frame to come back from a disaster? How does this data impact the company? The RTO for an online shopping cart may be very different from the RTO for a file server that is inaccessible or an email server that went down.
Order of Restore:
Use this column once you have determined all the others. It is there to help you when it’s hard to make a decision (like at 2am when you get the call that a system(s) just took a nose dive off a very short pier). It sets the priority so you know where to start.
Primary and Secondary Locations:
These columns help to lay out where your data resides. Maybe your primary is onsite and your secondary is offsite. This will also help you if you need to keep some compliance initiative in check. For example, some of BizStream’s data needs to be backed up at least 50 miles from our primary location.
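If you would rather keep the outline somewhere a script can read it, the columns above could be sketched as a small record type. The field names and every value below are placeholders I made up for illustration; use whatever matches your own outline.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BackupPlanRow:
    system: str              # DNS name, IP, data types, where it resides
    rpo: timedelta           # how much data you can afford to lose
    rto: timedelta           # how long you can afford to be down
    restore_order: int       # 1 = restore this system first
    primary_location: str
    secondary_location: str


# Placeholder rows: hostnames, objectives, and locations are all invented.
plan = [
    BackupPlanRow("files01.example.com (general files, office)",
                  timedelta(days=1), timedelta(hours=8), 3,
                  "onsite NAS", "offsite data center"),
    BackupPlanRow("shop01.example.com (SQL, cloud)",
                  timedelta(minutes=5), timedelta(hours=1), 1,
                  "cloud region A", "cloud region B"),
    BackupPlanRow("mail01.example.com (Exchange, office)",
                  timedelta(hours=1), timedelta(hours=4), 2,
                  "onsite NAS", "offsite data center"),
]


def restore_sequence(rows):
    """Return the rows sorted so the first system to restore comes first."""
    return sorted(rows, key=lambda row: row.restore_order)


# The 2am checklist: work down this list from the top.
for row in restore_sequence(plan):
    print(row.restore_order, row.system, "RPO", row.rpo, "RTO", row.rto)
```

A spreadsheet works just as well. The point is only that each system gets one row and that the order of restore is written down before you need it.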
Fill this data out as best you can. Now for the “fun” part. You need to sit down with the people in charge: your boss, the owner, someone who can make the final call on what’s important and what the top priority should be. I can’t emphasize this enough: what you feel is the top priority and what management/ownership feels is the top priority may be vastly different. This conversation is a must! Additionally, you need to be honest and clear with them about what it will take to meet their expectations. If they say system X needs to be restored within an hour of an outage, your current backup system/strategy may not meet that. You need to help them understand what your current systems are capable of, and from there determine whether they are in a position, and willing, to make their objectives a reality.
Make it happen:
Now that you have the info, do it! The point of the exercise and the conversations you just had is to help you build or modify your plan to meet the objectives. Keep it as simple as possible while still meeting the RPO/RTO. If you feel it’s too complicated, take a step back and make adjustments as needed.
Test, Test, Test, Test:
You can have all the backups in the world, but if you can’t restore them, and in something close to the time you said you could, what good are they? After you get your backups laid out and they are running successfully, you need to test a restore. Schedule that test regularly. Make a mock disaster scenario. If you fail, at least you will find out now instead of when it really matters, and you will know where to start fixing the issue(s). Make sure your management/ownership knows how these tests come out. It will help them sleep better at night and should give them confidence that you are in control and aware of what the issues (if any) are.
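A restore test really answers two questions: did the data come back intact, and did it come back within the RTO? The sketch below shows one way to record both. It assumes you saved a checksum of the data when the backup was taken, and it uses a plain file copy as a stand-in for your real restore procedure; `run_restore_test` and the file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
import time
from datetime import timedelta
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_restore_test(restore, expected_sha256: str, target: Path,
                     rto: timedelta) -> dict:
    """Run restore(target), time it, and compare the result to a known checksum."""
    started = time.monotonic()
    restore(target)
    elapsed = timedelta(seconds=time.monotonic() - started)
    return {
        "elapsed": elapsed,
        "within_rto": elapsed <= rto,
        "data_matches": target.exists() and sha256_of(target) == expected_sha256,
    }


# Stand-in for a real restore: copy a "backup" file to the restore target.
with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp) / "backup.dat"
    backup.write_bytes(b"payroll records")
    known_good = sha256_of(backup)  # would be recorded at backup time

    result = run_restore_test(
        restore=lambda target: shutil.copyfile(backup, target),
        expected_sha256=known_good,
        target=Path(tmp) / "restored.dat",
        rto=timedelta(hours=1),
    )
    print(result)
```

Keeping the result of every test run gives you something concrete to show management/ownership, and a trend to watch as your data grows.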
Documentation:
Yes, that dreaded “D” word. We all know we should do it way more than we do. For backup/disaster recovery, I suggest that you not only have good documentation but that you keep it in several places. If you keep it on the file server and the file server dies, where are you going to get it from? I suggest you copy it to a secure location, like a company Dropbox or Google Drive, that you can access from anywhere. Make your management/ownership aware that it’s there, so that if you are unable to respond in a disaster, at least they also know where it is.
Backups are important! Know the process inside and out. Know that if you get woken out of a dead sleep, you can restore your data without having to do “research” to make it happen. Know that if you couldn’t be there to fix it, someone else could. In a word, know, don’t think, just know. Did I mention you need to know?