Recovery Strategies
Three Numbers That Drive Every Decision
Strip away the jargon and recovery planning comes down to three questions: How long can we be down? How much data can we afford to lose? How much will each answer cost us? Every recovery strategy — every backup design, every alternate site selection, every replication architecture — is an attempt to meet the numbers that answer those questions within a budget management will approve.
Recovery strategies are not technical decisions. They are business decisions with technical implementations. The business defines the requirements; the technical team designs the solution.
This module covers CISSP exam objective 7.10. ISC2 heavily tests recovery time objectives, recovery point objectives, and maximum tolerable downtime because these metrics are the foundation for every recovery decision on the exam. If you understand how these three numbers interact, you can reason through any recovery scenario.
The Recovery Metrics
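Four metrics form the vocabulary of recovery planning: the three numbers that drive the exam (MTD, RTO, and RPO) plus a fourth, minimum operating requirements (MOR), that governs restoration order. The exam expects you to know them precisely and apply them correctly in scenarios.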
Maximum Tolerable Downtime (MTD)
The MTD is the absolute longest a business function can be unavailable before the organization suffers irreversible harm — regulatory penalties, permanent loss of customers, contractual breach, or existential threat. The MTD is set by business leadership, not IT.
Think of MTD as the outer boundary. If you exceed it, recovery may no longer matter because the damage is permanent. Every other recovery metric must fit inside the MTD.
Recovery Time Objective (RTO)
The RTO is the target time to restore a business function after a disruption. It must be shorter than the MTD — short enough that the organization recovers before irreversible damage occurs.
RTO includes everything from the moment the disruption is declared to the moment the function is operational again: notification time, decision-making time, travel to an alternate site, system restoration, data loading, and validation.
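Because the RTO clock covers every phase, it helps to add the phases up before trusting a recovery plan. The following is a minimal sketch; the phase durations and the function names are hypothetical illustrations, not values from any standard.

```python
from datetime import timedelta

# Hypothetical phase estimates; real values come from recovery tests and exercises.
phases = {
    "notification": timedelta(minutes=30),
    "decision_making": timedelta(hours=1),
    "travel_to_alternate_site": timedelta(hours=2),
    "system_restoration": timedelta(hours=6),
    "data_loading": timedelta(hours=4),
    "validation": timedelta(hours=1, minutes=30),
}


def total_recovery_time(phase_estimates):
    """Sum every phase, since the RTO covers all of them end to end."""
    return sum(phase_estimates.values(), timedelta())


def fits_rto(phase_estimates, rto):
    """Return True when the estimated end-to-end recovery time is within the RTO."""
    return total_recovery_time(phase_estimates) <= rto


estimate = total_recovery_time(phases)
print("Estimated recovery time:", estimate)
print("Fits a 24-hour RTO:", fits_rto(phases, timedelta(hours=24)))
print("Fits a 12-hour RTO:", fits_rto(phases, timedelta(hours=12)))
```

The point of the sketch is that system restoration alone is only part of the total; a plan that restores servers in six hours can still miss a 12-hour RTO once notification, travel, data loading, and validation are counted.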
Recovery Point Objective (RPO)
The RPO defines the maximum acceptable data loss measured in time. An RPO of four hours means the organization can tolerate losing up to four hours of data. An RPO of zero means no data loss is acceptable.
RPO directly determines backup frequency and replication strategy:
- RPO of 24 hours — Daily backups are sufficient
- RPO of 4 hours — Backups or transaction log shipping every four hours
- RPO of 1 hour — Hourly backups or near-continuous replication
- RPO of zero — Synchronous replication, where data is written to both primary and secondary storage simultaneously
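The mapping above can be expressed as a simple check: a backup schedule meets an RPO only if the gap between recovery points never exceeds it, and no schedule at all can meet an RPO of zero. This is a sketch under that assumption; the function names are hypothetical.

```python
from datetime import timedelta


def schedule_meets_rpo(backup_interval, rpo):
    """Return True when backups taken every `backup_interval` can satisfy `rpo`.

    An RPO of zero can never be met by periodic backups, because some data
    is always written between two backup runs.
    """
    if rpo < timedelta(0) or backup_interval <= timedelta(0):
        raise ValueError("RPO must be non-negative and the interval positive")
    return rpo > timedelta(0) and backup_interval <= rpo


def protection_for_rpo(rpo):
    """Describe the kind of protection an RPO calls for (illustrative wording)."""
    if rpo == timedelta(0):
        return "synchronous replication"
    return f"recovery points at least every {rpo}"


print(schedule_meets_rpo(timedelta(hours=24), timedelta(hours=24)))  # daily backups, 24-hour RPO
print(schedule_meets_rpo(timedelta(hours=24), timedelta(hours=1)))   # daily backups, 1-hour RPO
print(protection_for_rpo(timedelta(0)))
```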
Minimum Operating Requirements (MOR)
The MOR defines the minimum level of service needed to keep the business function operating at an acceptable level during recovery. Not every feature or capability needs to be restored immediately. The MOR identifies which functions, systems, and data are required for basic operations.
A retail company’s MOR might include: process in-store transactions and accept returns. Functions like loyalty program tracking and inventory analytics can wait. This distinction shapes what gets restored first and what hardware is needed at the alternate site.
How the Metrics Relate
The relationship between these metrics is tested frequently:
- MTD sets the outer boundary. Everything must happen within this window or the organization faces irreversible harm.
- RTO must be less than MTD. If your MTD is 48 hours, your RTO should be significantly less — perhaps 24 hours — to provide a margin of safety.
- RPO determines data protection strategy. A tighter RPO demands more frequent backups or real-time replication, which costs more.
- MOR determines what gets restored first. Not everything comes back at once. The most critical functions, as defined by the business impact analysis, are prioritized.
Every recovery strategy question on the exam can be resolved by asking: does this option meet the RTO and RPO within the MTD?
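That question can be written down as a small validity check. The sketch below is illustrative only: the class names, field names, and the two candidate strategies are hypothetical, and the MTD of 48 hours with an RTO of 24 hours simply reuses the example figures from the list above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Requirements:
    mtd: timedelta  # outer boundary set by business leadership
    rto: timedelta  # target time to restore the function
    rpo: timedelta  # maximum acceptable data loss, measured in time


@dataclass(frozen=True)
class Strategy:
    name: str
    recovery_time: timedelta     # estimated end-to-end time to restore
    data_loss_window: timedelta  # worst-case data lost at failure


def requirements_are_consistent(req):
    """The RTO must sit inside the MTD."""
    return req.rto < req.mtd


def strategy_is_valid(strategy, req):
    """A strategy is valid only if it meets the RTO and the RPO within the MTD."""
    return (
        requirements_are_consistent(req)
        and strategy.recovery_time <= req.rto
        and strategy.data_loss_window <= req.rpo
    )


req = Requirements(mtd=timedelta(hours=48), rto=timedelta(hours=24), rpo=timedelta(hours=4))
nightly = Strategy("warm site, nightly backups", timedelta(hours=36), timedelta(hours=24))
log_shipping = Strategy("warm site, hourly log shipping", timedelta(hours=20), timedelta(hours=1))

for candidate in (nightly, log_shipping):
    print(candidate.name, "->", strategy_is_valid(candidate, req))
```

Note that cost does not appear in the check at all. Cost is used to choose among the strategies that pass, not to rescue one that fails.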
Business Impact Analysis for Recovery
The Business Impact Analysis (BIA) is the process that produces the RTO, RPO, and MTD values. It is a business exercise, not a technical one: analysts interview process owners, analyze financial impact, and quantify how the cost of downtime grows over time.
The BIA asks the following questions about each business function:
- What is the financial impact per hour of being unavailable?
- What is the reputational impact?
- What are the regulatory or contractual obligations for uptime?
- What dependencies exist on other internal systems or external services?
- At what point does the impact become unrecoverable?
The output of the BIA is a prioritized list of business functions with their recovery requirements. This drives every downstream decision about backup, replication, and alternate site strategies.
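As a rough illustration of what that output might look like in practice, the sketch below sorts a handful of business functions so that the shortest MTD comes first. The functions, their figures, and the tie-breaking rule (higher hourly cost first) are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessFunction:
    name: str
    mtd_hours: float
    rto_hours: float
    rpo_hours: float
    cost_per_hour: float  # hypothetical financial impact of downtime


def prioritize(functions):
    """Order functions for recovery: shortest MTD first, then highest hourly cost."""
    return sorted(functions, key=lambda f: (f.mtd_hours, -f.cost_per_hour))


bia_results = [
    BusinessFunction("inventory analytics", 168, 72, 24, 500),
    BusinessFunction("order processing", 8, 4, 1, 40_000),
    BusinessFunction("production scheduling", 72, 24, 4, 10_000),
    BusinessFunction("payment authorization", 8, 2, 0, 90_000),
]

for rank, function in enumerate(prioritize(bia_results), start=1):
    print(rank, function.name, f"MTD={function.mtd_hours}h", f"RTO={function.rto_hours}h")
```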
Backup Strategies
Backups are the most fundamental recovery mechanism. The strategy must align with the RPO: daily backups cannot meet an RPO of one hour, because up to a day of data could be lost between runs.
Full Backup
A complete copy of all data. Slow to create, fast to restore. Every full backup is self-contained, so restoration requires only one backup set.
Incremental Backup
Backs up only data that has changed since the last backup of any type (full or incremental). Fast to create, slow to restore. Restoration requires the last full backup plus every incremental backup since then, applied in sequence.
Differential Backup
Backs up all data that has changed since the last full backup. Faster to create than a full backup, faster to restore than incrementals. Restoration requires the last full backup plus only the most recent differential.
Synthetic Full Backup
Combines a previous full backup with subsequent incrementals to create a new full backup without reading the production system. This reduces the performance impact on production during backup windows.
The exam expects you to know the tradeoffs between these types — particularly the restore time differences. When a scenario specifies a tight RTO, the backup strategy that minimizes restore time (full or differential) is usually preferred over incrementals, even though incrementals are faster to create.
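The restore-time difference comes from how many backup sets a restore needs. The sketch below works that out from a chronological backup history, following the definitions above. The function name and the labels are hypothetical, and it assumes the history contains at least one full backup.

```python
def restore_chain(backups):
    """Return the labels of the backup sets needed to restore, in apply order.

    `backups` is a chronological list of (label, kind) pairs, where kind is
    "full", "incremental", or "differential".
    """
    # Start from the most recent full backup; raises ValueError if there is none.
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    chain = [backups[last_full]]
    for entry in backups[last_full + 1:]:
        kind = entry[1]
        if kind == "differential":
            # A differential holds every change since the full backup,
            # so it supersedes anything applied after the full so far.
            chain = [backups[last_full], entry]
        elif kind == "incremental":
            # An incremental holds only changes since the previous backup,
            # so every one of them must be applied in sequence.
            chain.append(entry)
    return [label for label, _ in chain]


incremental_week = [
    ("Sun full", "full"),
    ("Mon", "incremental"),
    ("Tue", "incremental"),
    ("Wed", "incremental"),
]
differential_week = [
    ("Sun full", "full"),
    ("Mon", "differential"),
    ("Tue", "differential"),
    ("Wed", "differential"),
]

print(restore_chain(incremental_week))
print(restore_chain(differential_week))
```

With incrementals the chain grows by one set per day, and a single missing set breaks the restore. With differentials the chain is always two sets, at the price of each differential being larger than the one before.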
Backup Storage Locations
Where backups are stored determines whether they survive the same event that takes down the primary systems.
- Onsite storage — Fast restoration, but vulnerable to the same physical threats (fire, flood, power failure) as the primary systems. Useful for operational recovery of individual files or systems, but not for site-level disasters.
- Offsite storage — Physical media transported to a secure location away from the primary site. Provides protection against site-level disasters. The distance should be sufficient that a single event cannot affect both locations. Restoration is slower due to retrieval time.
- Cloud backup — Offsite by default, with on-demand retrieval. Considerations include bandwidth for restoration, encryption in transit and at rest, vendor lock-in, and data sovereignty (which jurisdiction stores the backup data).
The 3-2-1 backup rule is a widely accepted guideline: maintain at least 3 copies of data, on 2 different storage media, with 1 copy offsite. This provides resilience against hardware failure, media corruption, and site-level events.
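The rule is easy to check mechanically. The following sketch assumes a hypothetical inventory of copies, each tagged with its media type and whether it is stored offsite; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str
    media: str     # for example "disk", "tape", "object storage"
    offsite: bool


def satisfies_3_2_1(copies):
    """At least 3 copies, on at least 2 media types, with at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({copy.media for copy in copies}) >= 2
        and any(copy.offsite for copy in copies)
    )


compliant = [
    BackupCopy("primary data center", "disk", offsite=False),
    BackupCopy("primary data center", "tape", offsite=False),
    BackupCopy("cloud region", "object storage", offsite=True),
]
all_onsite = [
    BackupCopy("primary data center", "disk", offsite=False),
    BackupCopy("primary data center", "disk", offsite=False),
    BackupCopy("primary data center", "tape", offsite=False),
]

print(satisfies_3_2_1(compliant))
print(satisfies_3_2_1(all_onsite))
```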
Redundancy Strategies: Alternate Sites
When the primary site is unavailable, operations must move to an alternate location. The type of alternate site is driven directly by the RTO.
Hot Site
A fully equipped facility with hardware, software, network connectivity, and current data. Ready to take over operations within hours. Hot sites provide the shortest RTO but are the most expensive because you are maintaining a duplicate environment that sits idle most of the time.
Warm Site
A facility with hardware and network infrastructure but without current data. Data must be loaded from backups before operations can resume. RTOs are typically measured in days. Warm sites balance cost against recovery speed.
Cold Site
A facility with basic infrastructure (power, cooling, connectivity) but no hardware or data. Equipment must be procured, installed, configured, and loaded with data before operations can resume. RTOs are measured in weeks. Cold sites are inexpensive but only appropriate for functions with very long MTDs.
Reciprocal Agreements
An arrangement between two organizations to provide alternate processing capability for each other. These are low-cost but risky: the partner may not have spare capacity when you need it, configurations may not match, and the agreement depends entirely on good faith.
Cloud-Based DR
Cloud infrastructure (IaaS) can serve as an alternate site that scales on demand. Virtual machines and storage can be pre-configured but remain dormant until activated, reducing the ongoing cost compared to a physical hot site. Recovery time depends on the pre-staging — a well-designed cloud DR architecture can achieve RTOs comparable to a warm site at lower cost.
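The hot, warm, and cold tiers described above can be sketched as a lookup on the RTO. The cut-offs used here (under one day for hot, under one week for warm) are assumptions chosen to mirror the "hours, days, weeks" rule of thumb; they are not official thresholds.

```python
from datetime import timedelta

# Assumed cut-offs that mirror "hours = hot, days = warm, weeks = cold".
HOT_SITE_LIMIT = timedelta(days=1)
WARM_SITE_LIMIT = timedelta(weeks=1)


def site_type_for_rto(rto):
    """Return the least expensive site tier that can plausibly meet the RTO."""
    if rto <= timedelta(0):
        raise ValueError("RTO must be positive")
    if rto < HOT_SITE_LIMIT:
        return "hot"
    if rto < WARM_SITE_LIMIT:
        return "warm"
    return "cold"


for rto in (timedelta(hours=2), timedelta(days=3), timedelta(weeks=3)):
    print(rto, "->", site_type_for_rto(rto))
```

The function returns the cheapest tier that fits, which reflects the idea that a faster site than the RTO requires is money spent on capacity the business did not ask for.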
Data Replication
Replication keeps data synchronized between the primary site and the recovery site. The choice between synchronous and asynchronous replication is driven by the RPO.
Synchronous Replication
Every write to the primary is simultaneously written to the secondary before the write is acknowledged as complete. This achieves an RPO of zero — no data loss — but introduces latency because every transaction must wait for confirmation from both sites. Synchronous replication requires low-latency, high-bandwidth connections and is practical only over short distances (typically less than 100 miles).
Asynchronous Replication
Writes are committed to the primary first, then replicated to the secondary with a short delay (seconds to minutes). This allows the primary to operate at full speed but means some recent data may be lost if the primary fails before replication completes. The worst-case data loss equals the replication lag at the moment of failure, so the lag must stay within the RPO.
The exam tests whether you understand the tradeoff: synchronous replication eliminates data loss but adds latency and requires proximity. Asynchronous replication works over any distance but accepts some data loss.
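A back-of-the-envelope calculation shows why distance matters. The sketch below assumes light travels through optical fiber at roughly 5 microseconds per kilometer one way, and it ignores switching, routing, and storage overhead, so real latencies will be higher. The function names and the 0.5 ms local write time are hypothetical.

```python
FIBER_US_PER_KM = 5.0    # approximate one-way propagation delay in optical fiber
KM_PER_MILE = 1.609344


def round_trip_ms(distance_miles):
    """Estimate the propagation-only round trip between two sites, in milliseconds."""
    return 2 * distance_miles * KM_PER_MILE * FIBER_US_PER_KM / 1000.0


def sync_write_latency_ms(local_write_ms, distance_miles):
    """A synchronous write waits for the remote acknowledgment before completing."""
    return local_write_ms + round_trip_ms(distance_miles)


def worst_case_loss_seconds(mode, replication_lag_seconds=0.0):
    """Synchronous replication loses nothing; asynchronous can lose up to the lag."""
    if mode == "synchronous":
        return 0.0
    if mode == "asynchronous":
        return replication_lag_seconds
    raise ValueError(f"unknown replication mode: {mode}")


for miles in (10, 100, 1000):
    print(miles, "miles:", round(sync_write_latency_ms(0.5, miles), 2), "ms per write")

print(worst_case_loss_seconds("asynchronous", replication_lag_seconds=900))
```

Every synchronous write pays the round trip, so the penalty scales with both distance and write volume. Asynchronous replication avoids the penalty but converts the lag directly into potential data loss.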
High Availability Architectures
High availability (HA) is designed to prevent component failures from becoming outages in the first place. Rather than recovering after failure, HA architectures eliminate single points of failure so operations continue through component failures.
- Server clustering — Multiple servers running the same workload, so if one fails, others take over automatically
- Load balancing — Distributing traffic across multiple servers, with automatic rerouting when a server fails
- RAID storage — Disk redundancy that allows continued operation through individual disk failures
- Redundant network paths — Multiple network links and switches, so a single network failure does not isolate systems
- Geographic distribution — Running active workloads across multiple data centers or cloud regions simultaneously
HA and DR are complementary, not interchangeable. HA prevents individual component failures from causing outages. DR recovers from events that HA cannot prevent — site-level disasters, widespread corruption, or events that affect the entire primary environment.
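To make the load-balancing item above concrete, here is a minimal sketch of round-robin distribution that skips servers marked as failed. The class and method names are hypothetical, and a real load balancer would detect failures with health checks rather than manual calls.

```python
from itertools import cycle


class LoadBalancer:
    """Round-robin balancer that routes around servers marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._ring = cycle(self._servers)

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self):
        """Return the next healthy server, skipping any that are down."""
        if not self._healthy:
            # HA has been exhausted; this is the point where DR takes over.
            raise RuntimeError("no healthy servers available")
        for _ in range(len(self._servers)):
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


lb = LoadBalancer(["app-1", "app-2", "app-3"])
before = [lb.next_server() for _ in range(3)]
lb.mark_down("app-2")  # simulate a component failure
after = [lb.next_server() for _ in range(4)]
print(before)
print(after)
```

Traffic keeps flowing after the single failure, which is HA doing its job. If every server is marked down, the balancer has nothing left to route to, and that is the kind of event a DR plan exists for.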
Pattern Recognition
Recovery strategy questions on the CISSP follow these structures:
- “Which site type meets this RTO?” — Match the RTO to the site: hours = hot site, days = warm site, weeks = cold site.
- “Which backup type meets this RPO?” — Match the RPO to the backup frequency. Zero RPO = synchronous replication, not backups.
- “What determines the recovery strategy?” — The BIA. Not IT preference, not budget alone, not vendor recommendations. The BIA produces the RTO, RPO, and MTD that drive strategy selection.
- “Why did recovery fail?” — The strategy did not match the actual requirements. The RTO was exceeded, the RPO was not met, or the MOR was not considered.
Trap Patterns
Watch for these incorrect answers:
- “A hot site eliminates the need for backups” — A hot site with replicated data still needs backups to protect against data corruption, accidental deletion, or ransomware that replicates to both sites.
- “RTO and MTD are the same thing” — RTO is the target for recovery. MTD is the absolute limit beyond which damage is irreversible. RTO must always be less than MTD.
- “The cheapest option is always the best recovery strategy” — The best strategy is the one that meets the RTO and RPO defined by the BIA. A cold site that costs less but cannot meet the RTO is not a valid strategy.
- “Synchronous replication works over any distance” — Synchronous replication requires low latency, which limits distance. Beyond roughly 100 miles, the latency makes synchronous replication impractical for most applications.
- “Incremental backups are always best because they are fastest” — Fastest to create, yes. But slowest to restore. When RTO is tight, the restore time matters more than the backup time.
Scenario Practice
Question 1
A financial services company determines through its BIA that its online trading platform has an MTD of 4 hours, an RTO of 2 hours, and an RPO of zero. The platform processes $50 million in transactions daily.
Which recovery strategy BEST meets these requirements?
A. A warm site with daily full backups stored offsite
B. A hot site with synchronous data replication
C. A cold site with hourly incremental backups
D. A cloud-based DR site with asynchronous replication on a 15-minute lag
Answer & reasoning
Correct: B
An RPO of zero requires synchronous replication — no data loss is acceptable. An RTO of 2 hours requires a hot site that can be activated quickly. A warm site (A) takes days to activate. A cold site (C) takes weeks. Cloud DR with asynchronous replication (D) would lose up to 15 minutes of data, violating the zero RPO. Only a hot site with synchronous replication satisfies both the 2-hour RTO and zero RPO.
Question 2
An organization performs weekly full backups on Sunday nights and daily incremental backups Monday through Saturday. A server failure occurs on Thursday afternoon. The last successful incremental backup completed early Thursday morning.
To restore the server, what backup sets are needed?
A. Only Thursday morning’s incremental backup
B. Sunday’s full backup plus Thursday morning’s incremental backup
C. Sunday’s full backup plus Monday, Tuesday, Wednesday, and Thursday morning’s incremental backups applied in sequence
D. Sunday’s full backup only
Answer & reasoning
Correct: C
Incremental backups capture only changes since the last backup of any type. Each incremental depends on the previous one. To restore, you need the last full backup (Sunday) plus every incremental in sequence: Monday, Tuesday, Wednesday, and Thursday morning. If any incremental in the chain is missing or corrupted, the restore fails. This is the key downside of incremental backups — fast to create, but restoration requires the complete chain.
Question 3
A manufacturing company has two critical systems. System A (production scheduling) has an MTD of 72 hours. System B (order processing) has an MTD of 8 hours. The company has budget for one hot site and one warm site.
How should these sites be allocated?
A. Hot site for System A, warm site for System B
B. Hot site for System B, warm site for System A
C. Both systems at the hot site to maximize protection
D. Both systems at the warm site to save on unused hot site capacity
Answer & reasoning
Correct: B
Recovery resources should be allocated based on the business impact analysis. System B has an 8-hour MTD, requiring rapid recovery that only a hot site can provide. System A has a 72-hour MTD, which gives enough time for warm site activation (typically measured in days). Placing both at the hot site (C) wastes resources on System A. Placing both at the warm site (D) risks exceeding System B’s MTD. Match the investment to the recovery requirement.
Key Takeaway
Every recovery strategy decision on the CISSP exam traces back to three numbers: RTO, RPO, and MTD. The BIA produces these numbers. The technical team designs solutions to meet them. When a question asks which strategy is “best” or “most appropriate,” the answer is always the option that meets the stated RTO and RPO within the MTD — not the most expensive option, not the most technically advanced option, and not the cheapest option. Match the strategy to the requirement. That is the entire decision framework.