The Ultimate Guide to Backup and Disaster Recovery
By Scott Jack, Content Contributor, E-N Computers. 7+ years of experience in healthcare IT and tech support.
An essential — but often overlooked — aspect of building a successful business is preparing for potential disruptions by building a backup and disaster recovery plan. Recent reports show how businesses are adversely affected by power outages, ransomware attacks, natural disasters, and pandemics.
Proactively developing plans for business continuity as well as backup and disaster recovery is an investment that can protect your business’ revenue and reputation.
What is Backup and Disaster Recovery?
Backup and Disaster Recovery (BDR) is the process of planning for and recovering from incidents that affect IT systems and data. BDR involves creating, verifying, and storing backups, as well as ensuring that key systems can be accessed in the event of a physical or virtual disaster. BDR planning is one part of a business continuity plan.
Backup and disaster recovery is a broad term that describes the processes and systems that protect data and services against loss and enable their recovery after an adverse event. This includes making backups, monitoring and verifying backups, and testing restore and recovery features to make sure they will work as expected. It can also include keeping spare hardware on hand, or even systems that allow for seamless, real-time failover from one system to another.
As we’ll discuss, the level of sophistication – and the expense – involved in backup and disaster recovery depends on the specific needs of your business.
Part of a Business Continuity Plan
A business continuity plan describes how the business will operate during and immediately after a disaster. It begins with an analysis that identifies key business functions, then prioritizes their impact on operations and finances. The plan documents resources needed to maintain these key functions, such as alternate work locations, vital records, equipment, inventory, and utilities. Because of the integral role IT systems have in business operations, a backup and disaster recovery plan often plays a complementary role; it focuses on fully restoring data and services after a major disruption.
Understanding Services and Data
Data and services must be considered equally important when formulating a backup and disaster recovery plan. Data is the information you are working with. A service is the combination of hardware, software, configuration, and other details that make data available to a user or another service.
Consider two adverse events: malicious data manipulation and network switch failure. If an attacker manipulates your data to render it unreliable, it does not matter that your users can still access it. Conversely, if you have reliable data stored on a server, but a network switch stops working, your users will not be able to do anything with it. These two examples illustrate that data and services are interdependent; both must remain available for your business to operate.
What’s the Difference Between Backups and Archives?
To make it more likely that data can be fully restored, copies of important data are made. Backups are copies of current working data taken at regular intervals and stored in a way that makes them accessible relatively quickly. How they are made and where they are stored will depend on the system and data.
Archives are copies of data meant for long-term storage; this can be for compliance and regulatory purposes or to recover from a failure of both the primary system and backups. Restoring from an archive may take longer because of its size or format.
What is Disaster Recovery?
Meanwhile, disaster recovery aims for full restoration of services, usually within a target recovery time. In addition to having working backups, disaster recovery involves having the resources needed to keep your services running. For example, it may include having a spare server or network switch on hand, a backup power source, or a secondary internet connection.
In addition, some infrastructure techniques can increase reliability and reduce recovery times. Two such techniques are failover and fault tolerance. A failover, or high-availability, system shares resources so that another component can take over after a failure, keeping downtime within an acceptable limit. Fault-tolerant systems, on the other hand, use redundant components and are designed so that the failure of any single component causes no downtime.
Importantly, these techniques are primarily intended to maintain service availability. They do not necessarily protect against data loss, nor do they protect against all possible failures. For example, a fire or flood can destroy all the shared resources or redundant systems located in a single physical location. Or, failure of a key component could inadvertently overload other parts of the system, causing a larger cascading effect.
Cloud Service Considerations
Though cloud storage and services often provide high availability, they are not immune to disasters and should be considered in your backup and disaster recovery plan. If your cloud provider experiences an extended outage or reliability issues, or your data becomes corrupted or deleted, you will want a backup of your data either locally or with another online provider.
While it is often relatively fast and easy to upload your data to a cloud provider, retrieving it can take a long time and be expensive. Additionally, a cloud provider may close down or terminate your service at any time; without a backup and recovery plan, you can be left scrambling.
With these various factors in mind, the rest of this article will focus on how to develop a backup and disaster recovery plan.
Get the Ultimate BDR Plan Templates
Backup and disaster recovery is complicated — but we’ve made it easy to get started with our free BDR planning guide and template. Click below to get your copy.
First, fill out the Planning Guide to identify the threats facing your business and the IT systems that you need to protect. Then, fill out one BDR Plan Template for each service or system you identified.
How to Develop a Backup and Disaster Recovery Plan
Building a backup and disaster recovery plan can be divided into seven steps: 1) identify key systems, 2) set recovery time objectives, 3) identify where and how your data is stored, 4) set recovery point objectives, 5) determine archival requirements, 6) identify failure modes and recovery paths, and 7) test your backups and recovery procedures. Let’s briefly review what is involved in each.
Identify Key Systems and Services
The first step in building your backup and disaster recovery (BDR) plan is identifying your key systems and services and their dependencies. Maybe information passes through multiple applications as you work on it, or applications update one another with changes. Perhaps you have machinery that is controlled by a workstation with a specific combination of hardware and software. Even logging onto your workstations is likely dependent on being connected to a domain controller server. Documenting these dependencies helps avoid longer recovery times that result from overlooking essential pieces of your services.
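One way to make documented dependencies actionable is to record them as a simple graph and derive a restore order from it. The sketch below uses Python's standard-library `graphlib` module (Python 3.9 or later). The service names are hypothetical placeholders, not a recommended inventory.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each service maps to the services it depends on.
dependencies = {
    "network-switch": set(),
    "domain-controller": {"network-switch"},
    "file-server": {"network-switch", "domain-controller"},
    "accounting-app": {"file-server", "domain-controller"},
    "workstation-login": {"domain-controller"},
}


def recovery_order(deps):
    """Return services in an order where every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(" -> ".join(order))
```

If two services depend on each other, `TopologicalSorter` raises `graphlib.CycleError`, which is itself a useful signal that the documented dependencies need a closer look.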
Determine Recovery Time Objectives
Second, you will need to determine your recovery time objective (RTO). This is a target for how quickly key services should be fully operational following a disaster. It will be influenced by the cost of downtime for each system as well as what you can budget for disaster recovery. Fault-tolerant and failover systems are the most expensive BDR options. A less expensive and fairly standard target recovery timeframe is 1 business day; this is the default RTO included in ENC’s backup and disaster recovery plan. And in the middle range for cost, you have recovery times of a few hours.
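The trade-off between downtime cost and BDR budget can be roughed out with simple arithmetic. The sketch below is only an illustration: the option names, annual costs, RTO values, and the one-incident-per-year assumption are all made up for the example and are not ENC pricing.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    rto_hours: float    # target time to restore the service
    annual_cost: float  # hypothetical yearly cost of the option


def expected_annual_cost(option, downtime_cost_per_hour, incidents_per_year):
    """Yearly cost of the option plus expected downtime losses at its RTO."""
    downtime_loss = option.rto_hours * downtime_cost_per_hour * incidents_per_year
    return option.annual_cost + downtime_loss


def cheapest(options, downtime_cost_per_hour, incidents_per_year):
    """Pick the option with the lowest expected yearly cost."""
    return min(
        options,
        key=lambda o: expected_annual_cost(o, downtime_cost_per_hour, incidents_per_year),
    )


# Illustrative figures only.
options = [
    RecoveryOption("next business day", rto_hours=8, annual_cost=8_000),
    RecoveryOption("few hours", rto_hours=3, annual_cost=40_000),
    RecoveryOption("failover", rto_hours=0.25, annual_cost=150_000),
]

print(cheapest(options, downtime_cost_per_hour=1_000, incidents_per_year=1).name)
print(cheapest(options, downtime_cost_per_hour=50_000, incidents_per_year=1).name)
```

With these made-up numbers, a business losing $1,000 per hour is best served by the next-business-day target, while one losing $50,000 per hour can justify failover.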
Identifying Data Locations & Types
Third, it is important to identify where and how data is stored. For each service, document whether its data is stored locally, such as on a workstation or your own server, or online, such as with a hosted web application or cloud storage provider. You should also know whether it is in the form of a database, a proprietary file type, or an open, plain-text file type such as comma-separated values (CSV). Once you have this information, it is easier to determine the best backup and restore solution.
Selecting the right backup and restore solution is a balancing act. Backing up your data on-site requires additional storage capacity on your network. It may require an upfront investment in storage devices and spare hard disks. You can estimate how much storage is needed by multiplying the size of each backup by the number of backups per day and by the retention period in days. For example, a 50 GB full backup taken once daily and kept for 30 days requires about 1.5 TB, so provisioning 2 TB of backup storage leaves some headroom.
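The storage arithmetic above can be sketched as a small helper. It assumes every backup is a full copy of the same size with no compression, deduplication, or incremental backups, so it is a worst-case estimate. The function name and the 30-day month are illustrative assumptions.

```python
def backup_storage_gb(full_backup_gb, backups_per_day, retention_days):
    """Worst-case storage for full backups kept for the whole retention period."""
    return full_backup_gb * backups_per_day * retention_days


needed_gb = backup_storage_gb(full_backup_gb=50, backups_per_day=1, retention_days=30)
print(f"{needed_gb} GB (about {needed_gb / 1000:.1f} TB)")
```

For the 50 GB daily example this gives 1,500 GB, which fits within a 2 TB allocation. Backing up four times a day instead would multiply the requirement by four.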
Backing up offsite incurs a subscription fee for online storage based on the amount of storage you use. It also uses internet bandwidth; depending on the size and frequency of your backups, a dedicated internet connection may be necessary. In both cases, backups travel over your internal network. This may require additional investment so that network performance is not impacted.
Recovery Point Objectives
Fourth, set the recovery point objective (RPO) for each service. The RPO describes how much data loss is acceptable, which determines the frequency of your backups. If absolutely no data loss is acceptable, all changes have to be replicated immediately; this requires very expensive solutions that are usually only seen at the enterprise level. Intraday backups, taken every hour or every few hours, are likewise quite expensive. For most small businesses, E-N Computers recommends daily backups retained for 30 days for a balance of affordability and limited data loss.
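Because the RPO is a limit on how much recent work can be lost, it can be checked against the timestamps of backups that actually completed. The following is a minimal sketch; the function names and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times, now):
    """Largest gap between consecutive backups, including time since the last one."""
    if not backup_times:
        raise ValueError("at least one completed backup is required")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(backup_times, now, rpo):
    """True if no gap between backups exceeds the recovery point objective."""
    return worst_case_data_loss(backup_times, now) <= rpo


# Hypothetical history: nightly backups at 01:00, with the 3 March run missed.
history = [
    datetime(2024, 3, 1, 1, 0),
    datetime(2024, 3, 2, 1, 0),
    datetime(2024, 3, 4, 1, 0),
]
now = datetime(2024, 3, 4, 12, 0)

print(worst_case_data_loss(history, now))
print(meets_rpo(history, now, rpo=timedelta(hours=24)))
```

In this sample the missed run leaves a 48-hour gap, so a 24-hour RPO is not met even though the most recent backup is only 11 hours old.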
Archival Requirements
Fifth, determine your archival requirements. For some businesses, having the last 30 days of working files is enough; this is included in ENC’s standard plan. We can also include periodic archival of this data, such as archiving last year’s accounting data at the start of a new year. However, for regulatory and compliance reasons, archives going further back may be necessary. The cost of these archives will increase with how frequently they are created (e.g., each week, month, quarter, or year) and how far back they go.
Creating the Backup Plan
The details discussed above should all be recorded in one place. Consider the following example, then adapt the specifics (the data, backup frequency, retention period, and storage location) to your own needs.
Accounting data needs to be backed up daily. These backups should be available for the past 30 days.
Archives of this data should be created quarterly and should be available for seven years.
Backups should be stored on-site, and archives should be stored off-site.
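The example plan above can also be written down as structured data, which makes it easier to review and to apply consistently. The sketch below is one possible layout; the class, field, and function names are assumptions for illustration, and the pruning rule shown covers only the 30-day backup window, not the archives.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackupPolicy:
    dataset: str
    backup_every_days: int
    backup_retention_days: int
    archive_every_months: int
    archive_retention_years: int
    backup_location: str
    archive_location: str


# The accounting example from the text, expressed as a policy record.
accounting = BackupPolicy(
    dataset="accounting data",
    backup_every_days=1,
    backup_retention_days=30,
    archive_every_months=3,
    archive_retention_years=7,
    backup_location="on-site",
    archive_location="off-site",
)


def backups_to_prune(policy, backup_dates, today):
    """Return backup dates that fall outside the policy's retention window."""
    cutoff = today - timedelta(days=policy.backup_retention_days)
    return [d for d in backup_dates if d < cutoff]


# Hypothetical history: one backup per day for the last 40 days.
today = date(2024, 3, 31)
backup_dates = [today - timedelta(days=n) for n in range(40)]
expired = backups_to_prune(accounting, backup_dates, today)
print(f"{len(expired)} backups are older than {accounting.backup_retention_days} days")
```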
When deciding on the frequency of your backups, keep in mind that more frequent intervals require additional resources. These include processing power, disk space, disk speed, internal network bandwidth, and internet bandwidth in the case of off-site backups. This may require more advanced engineering, introduce complexity, and significantly increase both the initial and ongoing costs of your BDR solution.
Failure Modes and Recovery Paths
The sixth step in developing your BDR plan is identifying failure modes and recovery paths. During this stage, the goal is to document various adverse events and how to go about recovering from them.
Failure modes can be grouped into several categories: equipment, service, environmental, cybersecurity, and internal. For example:
An equipment failure could include a server’s power supply dying or a network switch becoming non-functional.
Service failures run the gamut from an Internet outage to a web application shutting down or a cloud provider terminating your account.
Environmental failures include natural disasters, flood, fire, and burglary.
Cybersecurity failures might be a ransomware attack, someone gaining unauthorized access to your systems, or one of your cloud service providers being hacked.
Finally, there are internal threats to consider. Data may be deleted either maliciously or accidentally; simple mistakes or misconfigurations could also result in data loss.
When considering potential scenarios, also estimate the likelihood of each compared to the potential impact. For example, equipment failures are relatively common, but modern redundant hardware can minimize the impact of many types of failures. A fire or flood may be rare, but its impact on your business could be catastrophic.
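Weighing likelihood against impact can be done with a simple scoring table. The sketch below multiplies the two scores to rank scenarios and separately flags anything with the maximum impact score, since a plain product tends to push rare but catastrophic events to the bottom. The 1 to 5 scale and every score shown are illustrative assumptions, not measured data.

```python
from typing import NamedTuple


class Scenario(NamedTuple):
    name: str
    likelihood: int  # 1 (rare) to 5 (frequent), illustrative scale
    impact: int      # 1 (minor) to 5 (catastrophic), illustrative scale


def rank_by_risk(scenarios):
    """Sort scenarios by likelihood x impact, highest first."""
    return sorted(scenarios, key=lambda s: s.likelihood * s.impact, reverse=True)


def catastrophic(scenarios, max_impact=5):
    """Scenarios that deserve a recovery path regardless of how rare they are."""
    return [s for s in scenarios if s.impact >= max_impact]


# Hypothetical scores for a few of the failure modes discussed above.
scenarios = [
    Scenario("disk failure", likelihood=4, impact=2),
    Scenario("ransomware attack", likelihood=3, impact=5),
    Scenario("fire or flood", likelihood=1, impact=5),
    Scenario("accidental deletion", likelihood=4, impact=3),
]

for s in rank_by_risk(scenarios):
    print(f"{s.name}: {s.likelihood * s.impact}")
print("Always plan for:", [s.name for s in catastrophic(scenarios)])
```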
After coming up with a list of scenarios and their likelihood, you are ready to plan how you will recover from each. Keeping spares of critical equipment on site can help reduce recovery time. Though this can apply to other equipment, let’s use storage as an example. In a network attached storage (NAS) device, drives can be grouped and configured to improve reliability. The NAS may also have room for a hot spare; when one of the active drives begins to degrade, the NAS automatically begins copying its data to this spare so that the eventual drive failure is less likely to cause a business interruption. In other designs, it can be more practical to keep a cold spare on hand; it is installed manually after a failure and may require a reboot for proper configuration. Without spares, you may be left waiting weeks or longer for critical equipment, or you may discover that a part is no longer available. Spares can be a more affordable form of redundancy.
Geo-redundancy is also available, but it is complex and expensive. With geo-redundancy, you have a duplicate system ready to go that is hosted in another region of the country or world. This is particularly useful if environmental failures like fire, flood, extended power outages, or natural disasters are a prominent concern. If your primary system fails, the duplicate can be manually set up to take over the workload. A more advanced setup can detect when your primary system goes down and automatically reroute to the redundant system.
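The automatic detection described above usually comes down to a rule such as "switch over after several health checks in a row have failed", so that a single dropped check does not trigger a failover. The sketch below shows only that decision rule. The function name and the threshold of three are assumptions, and a real setup would also need the health checks themselves and a mechanism for rerouting traffic.

```python
def choose_target(health_history, failure_threshold=3):
    """Return 'secondary' once the latest `failure_threshold` checks all failed.

    `health_history` is a list of booleans, oldest first, where True means
    the primary system answered its health check.
    """
    recent = health_history[-failure_threshold:]
    if len(recent) == failure_threshold and not any(recent):
        return "secondary"
    return "primary"


print(choose_target([True, True, False, True]))          # one blip: stay put
print(choose_target([True, False, False, False]))        # sustained failure
```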
Testing Backups and Disaster Recovery
The seventh and final stage, after all your backup and disaster recovery plans have been implemented, is testing. The only thing worse than no backup is a bad backup.
In most cases, a partial restore from your backups is sufficient to verify that they will work when you need them; this is what ENC includes in our managed services. A full restoration and disaster recovery drill is considerably more expensive, but it is necessary for complex recovery scenarios that depend on automated failover systems working correctly.
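A partial restore test can be as simple as restoring a sample of files to a scratch folder and comparing their checksums with values recorded when the backup was made. The sketch below assumes such a manifest exists as a mapping of relative paths to SHA-256 digests. The manifest format and function names are hypothetical and do not describe any particular backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest, restore_dir):
    """Return relative paths that are missing or differ from the manifest."""
    failures = []
    for relative_path, expected in manifest.items():
        restored = Path(restore_dir) / relative_path
        if not restored.is_file() or sha256_of(restored) != expected:
            failures.append(relative_path)
    return failures
```

An empty list from `verify_restore` means every sampled file was restored intact; any entries it returns are worth investigating before a real incident forces the issue.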
Compiling Your BDR Plan
To make compiling all the information you need for your BDR plan easier, we have prepared two templates for you. The Backup and Disaster Recovery Planning Guide will help you evaluate the overall risks and recovery paths that your business needs to plan for. Then, complete one BDR Plan Template for each service or system that needs to be backed up. It will walk you through determining the best backup, archive, and recovery options for each critical part of your IT infrastructure. Your plan should be reviewed and updated annually so that it continues to meet business needs.
Example Backup and Disaster Recovery Plans
Realistically, the most advanced and immediate forms of backup and disaster recovery are not viable for small businesses. They are employed by large enterprises that lose thousands of dollars per minute when systems fail. To help put your BDR plan in perspective, consider these example tiers.
Who it’s for: Large enterprises with over $1 billion in annual revenue
Downtime costs: $2,000 or more per minute
What it includes: Fully redundant infrastructure in high-end datacenters in multiple regions; real-time or hourly data replication; recovery points going back months or years
Costs: If you have to ask, you don’t want to know.
Who it’s for: Medium enterprises with between $50 million and $1 billion in annual revenue
Downtime costs: $5,000 to $10,000 per hour
What it includes: Fully redundant infrastructure, but not geo-redundant; real-time or hourly on-site backups; daily off-site backups; long-term off-site archiving
Costs: $100,000 to $1 million annually in expenses and capital expenditures
Who it’s for: Small enterprises with between $10 million and $50 million in annual revenue
Downtime costs: $30,000 to $100,000 per day
What it includes: Mostly redundant infrastructure, but not geo-redundant; hourly or twice daily on-site backups; archiving of key data
Costs: $10,000 to $100,000 annually in expenses and capital expenditures
Who it’s for: Small businesses with less than $10 million in annual revenue
Downtime costs: $3,000 to $30,000 per day
What it includes: Spares of basic server hardware like disks and power supplies; daily on-site and off-site backups going back 30 days; most recovery requires manual intervention
Costs: less than $10,000 per year (included with ENC managed services)
Next Steps: Backups and Disaster Recovery Planning
Building a backup and disaster recovery plan from nothing can seem daunting, but it doesn’t have to be. Experts at E-N Computers can help you set up and manage it along with other aspects of your business IT so that you can focus on your core operations. Unexpected system failures are not completely avoidable, but by proactively developing a BDR plan, you strengthen your business position and minimize potential disruption. To learn about how disruptions and downtime can affect your bottom line, read the article What is the Cost of Downtime for Small Businesses in 2021?
If you are ready to create a plan, download the linked templates and get started. If you have questions about E-N Computers managed IT services, which include basic backup and disaster recovery, please contact us. We look forward to talking with you.
Take the IT Maturity Assessment
Is your business ready to overcome a natural disaster or cybersecurity incident? Take our free IT Maturity Assessment to find out. It will help you evaluate your IT partnerships, strategy, and systems to see how they’re meeting your business goals and objectives.
You’ll get personalized action items that you can use to make improvements right away. Plus, you’ll have the opportunity to book a no-obligation IT strategy session to get even more insights into your IT needs.