Disaster Recovery & Business Continuity: What to Expect

For clients on managed services or hosting arrangements, this article explains our approach to disaster recovery (DR) and business continuity planning (BCP) — including what happens in worst-case scenarios.

What Are RTO and RPO?

Two key metrics define our DR commitments:

  • Recovery Time Objective (RTO): The maximum time within which we aim to restore your service after a failure. For example, an RTO of 4 hours means we target full restoration within 4 hours of a confirmed failure.
  • Recovery Point Objective (RPO): The maximum amount of data loss, measured in time, that is acceptable. For example, an RPO of 24 hours means backups are taken at least daily, so in the worst case up to 24 hours of data could be lost.
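
To make the two metrics concrete, the following is a minimal Python sketch of how a single incident could be compared against RTO and RPO targets. The function name assess_recovery, the RecoveryResult fields, and all timestamps are illustrative assumptions, not part of our tooling or your SLA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryResult:
    downtime: timedelta    # time from confirmed failure to restoration
    data_loss: timedelta   # gap between the last good backup and the failure
    rto_met: bool
    rpo_met: bool


def assess_recovery(failure_at, restored_at, last_backup_at, rto, rpo):
    """Compare one incident against hypothetical RTO/RPO targets."""
    downtime = restored_at - failure_at
    data_loss = failure_at - last_backup_at
    return RecoveryResult(downtime, data_loss, downtime <= rto, data_loss <= rpo)


# Illustrative values only: an RTO of 4 hours and an RPO of 24 hours.
result = assess_recovery(
    failure_at=datetime(2024, 3, 1, 14, 0),
    restored_at=datetime(2024, 3, 1, 17, 30),
    last_backup_at=datetime(2024, 3, 1, 2, 0),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=24),
)
print(result.downtime, result.rto_met)   # 3:30:00 True
print(result.data_loss, result.rpo_met)  # 12:00:00 True
```

In this example the service was down for 3.5 hours (inside the 4-hour RTO) and 12 hours of data sat between the last backup and the failure (inside the 24-hour RPO).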

Your specific RTO and RPO targets are defined in your Service Level Agreement (SLA). If you do not have an SLA, contact your Account Manager to discuss your requirements.

Our Standard DR Measures

  • Backups: Regular automated backups (daily at minimum; hourly for critical systems). Backups are stored in locations geographically separate from your primary hosting.
  • Redundancy: Key infrastructure is deployed with redundancy (e.g. multiple availability zones for cloud services) to reduce single points of failure.
  • Monitoring: 24/7 automated monitoring with alerting on service failure, performance degradation, and security events.
  • Runbooks: Pre-written recovery procedures for common failure scenarios, ensuring rapid and consistent response.
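
As an illustration of how the backup schedule and monitoring above relate, here is a small Python sketch that flags systems whose most recent backup is older than the expected interval. The tier names, the system names, and the stale_backups function are hypothetical; the intervals are assumed from the "daily at minimum; hourly for critical systems" wording rather than taken from any real configuration.

```python
from datetime import datetime, timedelta

# Assumed intervals: daily for standard systems, hourly for critical ones.
BACKUP_INTERVAL = {
    "standard": timedelta(days=1),
    "critical": timedelta(hours=1),
}


def stale_backups(latest_backups, now):
    """Return the names of systems whose latest backup is older than its interval.

    latest_backups maps a system name to (tier, time of last successful backup).
    """
    stale = []
    for name, (tier, last_ok) in latest_backups.items():
        if now - last_ok > BACKUP_INTERVAL[tier]:
            stale.append(name)
    return sorted(stale)


now = datetime(2024, 3, 1, 12, 0)
latest = {
    "billing-db": ("critical", datetime(2024, 3, 1, 11, 30)),  # 30 minutes old
    "orders-db": ("critical", datetime(2024, 3, 1, 9, 0)),     # 3 hours old
    "wiki": ("standard", datetime(2024, 2, 29, 20, 0)),        # 16 hours old
}
print(stale_backups(latest, now))  # ['orders-db']
```

A check of this kind matters because a backup that silently stops running widens the real data-loss window beyond the agreed RPO.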

When a Major Incident Occurs

  1. Detection: Our monitoring systems alert the on-call engineer as soon as a failure is detected.
  2. Initial notification: You are contacted within 30 minutes of a confirmed Priority 1 (P1) incident (see the Incident Management article).
  3. Recovery: The on-call team begins recovery procedures and provides regular status updates (at least every 30 minutes for active P1s).
  4. Restoration: Service is restored and verified. You are notified once it is confirmed stable.
  5. Post-Incident Review: Within 48 hours, we provide a written Root Cause Analysis (RCA) report detailing what happened, why, and what we are doing to prevent recurrence.
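
The communication timings in the steps above can be sketched as a simple schedule. This Python example is illustrative only: the p1_deadlines function is hypothetical, and it assumes that status updates are counted from the initial notification deadline and that the 48-hour RCA window starts at restoration, neither of which this article specifies.

```python
from datetime import datetime, timedelta

FIRST_NOTICE = timedelta(minutes=30)  # initial notification window for a confirmed P1
UPDATE_EVERY = timedelta(minutes=30)  # minimum status-update cadence for active P1s
RCA_DUE = timedelta(hours=48)         # written RCA window (assumed to start at restoration)


def p1_deadlines(confirmed_at, restored_at):
    """Return the latest times by which each communication would be due."""
    first_notice_by = confirmed_at + FIRST_NOTICE
    updates_by = []
    next_update = first_notice_by + UPDATE_EVERY
    # Updates are only due while the incident is still active.
    while next_update < restored_at:
        updates_by.append(next_update)
        next_update += UPDATE_EVERY
    return {
        "first_notice_by": first_notice_by,
        "updates_by": updates_by,
        "rca_by": restored_at + RCA_DUE,
    }


plan = p1_deadlines(
    confirmed_at=datetime(2024, 3, 1, 9, 0),
    restored_at=datetime(2024, 3, 1, 11, 10),
)
print(plan["first_notice_by"])  # 2024-03-01 09:30:00
print(len(plan["updates_by"]))  # 3
print(plan["rca_by"])           # 2024-03-03 11:10:00
```

For an incident confirmed at 09:00 and restored at 11:10, this gives a first notification by 09:30, status updates by 10:00, 10:30 and 11:00, and an RCA report by 11:10 two days later.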

Your Responsibilities in a DR Scenario

To support rapid recovery, we ask that you:

  • Maintain an up-to-date list of key contacts and their escalation preferences in this portal
  • Communicate any critical business periods when disruption would be most impactful (so we can schedule high-risk changes away from these times)
  • Have an internal communication plan for how you'll notify your own customers or teams during any downtime

Testing DR Plans

We conduct periodic DR drills for managed services clients, typically annually or on a schedule agreed with you. If you would like to arrange a DR test or review your current RTO/RPO targets, contact your Account Manager.
