DR & BCP Policies
1. Introduction
1.1 e-Learn Design (ELD) provides all hosting clients with a Warm Standby VM solution as standard, where a prepared server is kept available as a destination to recover client data from off-site backups. This secondary location will not be within the UK (if UK primary hosting) or EU (if EU primary hosting) and will only be used in the case of a severe outage in the primary location.
1.2 ELD offers an optional Hot Standby VM solution to dedicated server clients where the live server is mirrored to a redundant VM in a separate data centre location in real-time using Zerto. This secondary location will not be within the UK (if UK primary hosting) or EU (if EU primary hosting) and will only be used in the case of a severe outage in the primary location.
1.3 In the event of a disaster, all network traffic will be internal to ELD’s data centre private networks (with respective UK or EU endpoints), so all locations will still be considered to be in the UK or EU for the purposes of cross-border data transfer laws.
2. Disaster Recovery Triggers
2.1 Disaster Recovery is initiated under the following circumstances:
- All connectivity to the primary location Data Centre is lost
- The primary location ESX Cluster suffers a catastrophic failure
- Corruption of a client server’s underlying disks
- Hostile takeover of the server OS (e.g. for ransomware purposes)
- Other catastrophic failure of the server OS, which requires a point-in-time recovery
2.2 In the case of a catastrophic event, the DR protocols specific to the Client’s Standby VM solution will be undertaken.
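As a minimal sketch of how 2.1 and 2.2 fit together, the triggers can be modelled as an enumeration and the protocol chosen from the Client's Standby VM solution. The names `Trigger` and `select_protocol` are hypothetical and are not part of any ELD tooling; the sketch assumes that every listed trigger initiates DR and that the only deciding factor is whether a Hot Standby VM was agreed.

```python
from enum import Enum, auto


class Trigger(Enum):
    """DR triggers from section 2.1 (member names are illustrative)."""
    CONNECTIVITY_LOST = auto()   # all connectivity to the primary Data Centre lost
    CLUSTER_FAILURE = auto()     # catastrophic failure of the primary ESX Cluster
    DISK_CORRUPTION = auto()     # corruption of a client server's underlying disks
    HOSTILE_TAKEOVER = auto()    # e.g. ransomware
    OS_FAILURE = auto()          # other failure needing point-in-time recovery


def select_protocol(trigger: Trigger, has_hot_standby: bool) -> str:
    """Pick the section 3 option for a client (assumption: solution type alone decides)."""
    if has_hot_standby:
        option = "Option 1 (Hot Standby failover)"
    else:
        option = "Option 2 (Warm Standby restore)"
    return f"{trigger.name}: {option}"


print(select_protocol(Trigger.DISK_CORRUPTION, has_hot_standby=False))
```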
3. Disaster Recovery Protocols
3.1 In the case of a catastrophic event, steps specific to the following options will be taken.
3.2 Option 1 – Hot Standby VM has been agreed as part of the solution:
Step 1: ELD will bring the Hot Standby server VM online, removing the synchronisation from the original live server VM.
Risk – the sync process may not have copied small unflushed filesystem writes to the Hot Standby server. Latency between the two sites is very low, so the replication gap would be less than 1 second.
Impact – there is the potential of less than a second of data loss for files which were in the process of being uploaded or changed on the webserver.
Step 2: ELD will switch off, where possible, the original server to ensure that there is no erroneous access to this system.
Step 3: ELD will inform the Client’s staff of any changes they need to make to the DNS entries to ensure that their staff and students are connecting to the correct server.
Risk – the time to live within the DNS entries defines how long this takes to propagate. This setting is under the control of a Client’s staff.
Impact – a Client’s users would be unable to access the site until the DNS changes had propagated.
Step 4: ELD will subsequently update all backups and monitoring to point to the new server.
Step 5: ELD will set up synchronisation back to the original data centre once it becomes available.
Risk – Hot Standby DR functionality is unavailable until the original data centre is restored or an alternative destination is provisioned.
Impact – the only available method for DR/BCP is restoring from off-site backups until Hot Standby can be re-enabled.
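The DNS risk noted under Step 3 can be illustrated with a short calculation. This is a sketch only: it assumes resolvers honour the record's time to live exactly, which is not guaranteed in practice, and the function name `latest_dns_cutover` and the example TTL values are hypothetical.

```python
from datetime import datetime, timedelta


def latest_dns_cutover(changed_at: datetime, ttl_seconds: int) -> datetime:
    """Latest time a resolver that cached the old record just before
    the change would be expected to keep serving it."""
    return changed_at + timedelta(seconds=ttl_seconds)


# Illustrative values, not taken from any Client configuration.
changed = datetime(2025, 1, 15, 10, 0, 0)
print(latest_dns_cutover(changed, 3600))  # 2025-01-15 11:00:00
print(latest_dns_cutover(changed, 300))   # 2025-01-15 10:05:00
```

Under these assumptions, a shorter TTL on the Client's DNS entries shortens the period in which users may still be directed to the original server.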
3.3 Option 2 – No Hot Standby VM has been included as part of the solution:
Step 1: ELD will have a server ready for client site recovery (Warm Standby).
Risk – Client-specific server configurations may be missing from the server.
Impact – server configuration may require remediation, and timescales would depend on the level of complexity.
Step 2: ELD will copy back the last filesystem backup to this new server. The copy-back timescale will depend on data size.
Risk – this copy is taken at the backup time and could result in up to 24 hours of data loss.
Impact – filesystem data written between the previous backup and the start of the DR process will be lost.
Step 3: ELD will inform the Client’s staff of any changes they need to make to the DNS entries to ensure that their staff and students are connecting to the correct server.
Risk – the time to live within the DNS entries defines how long this takes to propagate. This setting is under the control of a Client’s staff.
Impact – a Client’s users would be unable to access the site until the DNS changes had propagated.
Step 4: ELD will subsequently update all backups and monitoring to point to the new server.
Risk – Warm Standby DR functionality is unavailable until the original data centre is restored or an alternative destination is provisioned.
Impact – the only available method for DR/BCP is restoring from off-site backups until Warm Standby can be re-enabled.
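The data loss described under Step 2 of Option 2 is the time between the last filesystem backup and the start of the DR process. A minimal sketch follows, assuming one backup per day (inferred from the "up to 24 hours" figure); the name `data_loss_window` and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: filesystem backups run once every 24 hours.
BACKUP_INTERVAL = timedelta(hours=24)


def data_loss_window(last_backup: datetime, dr_started: datetime) -> timedelta:
    """Return how much filesystem data (measured in time) would be lost."""
    if dr_started < last_backup:
        raise ValueError("DR cannot start before the last backup")
    return dr_started - last_backup


# Illustrative timestamps only.
window = data_loss_window(datetime(2025, 1, 15, 2, 0), datetime(2025, 1, 15, 17, 30))
print(window)                     # 15:30:00
print(window <= BACKUP_INTERVAL)  # True
```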
4. Business Continuity Plans
4.1 ELD-specific BCP is as follows.
4.1.1 All ELD internal systems run on infrastructure that allows ELD staff to work from anywhere: for example, a cloud-based helpdesk system, a distributed, cloud-based solution for documentation, and VoIP and/or mobile phones for telephony. As long as there is either an internet or 5G connection, ELD staff can work as usual.
4.1.2 ELD uses email, social media and telephony to relay information to Clients. Should there be an issue with one of these, the remaining channels would be used as backup.
4.1.3 ELD uses several cloud-based online meeting solutions, so multiple alternatives are available should the preferred option be offline.
4.1.4 Access to ELD’s helpdesk and support is available through email or a web portal. The helpdesk uses a separate email solution from ELD internal provisioning.
4.2 Client-specific BCP is as follows.
4.2.1 There are multiple redundant systems built into the servers for a Client and the data centres housing them. If these systems are no longer available, ELD will enact the DR procedures to bring the systems online within a different data centre.
4.2.2 Server backups are held in discrete off-site locations. These form the last-resort method for DR and allow a Client’s servers to be rebuilt with any other compute resource provider, such as Amazon, Rackspace or Azure.
4.2.3 Support: As detailed in 4.1 (ELD-specific BCP), support services are already cloud-based for the helpdesk. If the helpdesk is unavailable, support will be provided by alternative methods such as direct email, social media and telephone.
4.2.4 ELD will ensure that Clients are made aware of the best channels for support in the case of contingency.
5. Testing & Review
5.1 DR testing is performed on a rolling basis for all ELD servers every 6 months.
5.2 BCP reviews (both ELD-specific and client-specific) are conducted every 12 months.
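The intervals in 5.1 and 5.2 can be turned into due dates with simple month arithmetic. This is a sketch only: `add_months` is a hypothetical helper, and using January 2025 (the document's last-reviewed date) as the starting point for both cycles is an assumption made for illustration.

```python
DR_TEST_INTERVAL_MONTHS = 6      # from 5.1
BCP_REVIEW_INTERVAL_MONTHS = 12  # from 5.2


def add_months(year: int, month: int, months: int) -> tuple[int, int]:
    """Return the (year, month) that falls `months` after the given one."""
    index = year * 12 + (month - 1) + months
    return index // 12, index % 12 + 1


# Assumed starting point: January 2025.
print(add_months(2025, 1, DR_TEST_INTERVAL_MONTHS))     # (2025, 7)
print(add_months(2025, 1, BCP_REVIEW_INTERVAL_MONTHS))  # (2026, 1)
```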
Last reviewed: January 2025