Disaster Recovery in Azure: Architecture and Best Practices


What Is Disaster Recovery in Azure?

Microsoft Azure is Microsoft’s public cloud platform, widely ranked as the world’s second-largest. It provides cloud services such as compute, storage, networking, DevOps tooling, data analytics, and artificial intelligence (AI).

Azure provides a secure and scalable end-to-end backup and disaster recovery solution that you can integrate with on-premises data protection solutions. It allows organizations to automatically restore services after accidental deletion, a malicious attack, or another disaster. Azure’s backup and disaster recovery service is cloud-based and highly available.


This article is part of a series on Disaster Recovery.

What Is Azure Site Recovery?

Azure Site Recovery (ASR) is a Disaster Recovery as a Service (DRaaS) offering that organizations can use as part of their overall business continuity and disaster recovery (BCDR) program. ASR orchestrates and automates replication across cloud and hybrid environments. The service can replicate Azure virtual machines (VMs) between regions, on-premises machines to a secondary datacenter, and on-premises physical servers and VMs to Azure. It can also replicate Azure Stack VMs.

Related content: Read our guide to disaster recovery in the cloud

Azure Disaster Recovery: Two Solution Architectures

SMB Disaster Recovery in Azure

Small businesses can implement disaster recovery in the cloud at low cost using a partner solution such as Double Take DR, which builds on Azure Traffic Manager, Azure Virtual Network, and Site Recovery. These services run in a patched, supported, high-availability environment. The solution architecture is illustrated below.

Image Source: Azure

Here is how the solution works:

  • Traffic Manager routes DNS traffic and can move it from one site to another based on policies that you define.
  • Azure Site Recovery orchestrates machine replication and manages the configuration of your failback procedures.
  • Virtual Network is where the failover site is created when a disaster occurs.
  • Blob storage allows you to store replica images of all your Site Recovery-protected machines.
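
The routing role in the first bullet can be pictured as priority-based endpoint selection: traffic goes to the preferred site while it is healthy and moves to the failover site when it is not. The sketch below is a plain-Python model of that idea, not a call into the Traffic Manager API. The `Endpoint` class, the `pick_endpoint` function, and the site names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str       # hypothetical site label
    priority: int   # lower number = preferred site
    healthy: bool   # result of the most recent health probe


def pick_endpoint(endpoints):
    """Return the healthy endpoint with the lowest priority number, or None."""
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.priority)


sites = [
    Endpoint("on-premises-primary", priority=1, healthy=True),
    Endpoint("azure-failover-site", priority=2, healthy=True),
]
normal = pick_endpoint(sites)          # primary is healthy, so it is chosen

sites[0].healthy = False               # simulate the primary site going down
during_outage = pick_endpoint(sites)   # traffic now moves to the failover site
```

In a real deployment the health flag would come from the routing service's own endpoint monitoring rather than being set by hand.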


Enterprise-scale Disaster Recovery in Azure

Large organizations might need to build disaster recovery capabilities for systems such as SharePoint, Dynamics CRM, and Linux web servers hosted in an on-premises datacenter. Azure provides a solution that enables failover of a complex environment to Azure infrastructure.

The solution rests on Traffic Manager, Azure Active Directory, Site Recovery, Virtual Network, and VPN Gateway. These services can run in high-availability environments that are supported and patched by Azure. The solution architecture is illustrated below.

Image Source: Azure

  • Traffic Manager routes DNS traffic, moving traffic between sites based on policies your organization defines.
  • Azure Site Recovery orchestrates machine replication and handles the configuration of your disaster recovery process.
  • Blob storage contains replica images of every machine protected by Site Recovery.
  • Azure Active Directory holds a replica of your on-premises Active Directory service, so your company can authenticate and authorize cloud applications.
  • VPN Gateway connects the on-premises and cloud networks while keeping the traffic between them secure and private.
  • Virtual Network is where a failover site is created in the event of a disaster.

Best Practices for Azure Disaster Recovery

Azure Disaster Recovery Plan

The first step is to build a disaster recovery plan, test it fully to verify its effectiveness, and then implement it. Remember to include all relevant people, technologies, and processes required to restore functionality within your service-level agreement (SLA).

Here are tips to help you create and test your disaster recovery plan:

  • Evaluation: before creating a plan, evaluate the business impact of an application failure and build your recovery plan around the most critical applications and data. Assign a specific role to own the disaster recovery plan, someone who can oversee all aspects, including testing and automation.
  • Support: clearly define and document the process for contacting your support services, along with instructions for escalating issues. This document can help prevent prolonged downtime that occurs simply because the team tries to work out a recovery process on the fly. Use cross-region recovery for your mission-critical applications.
  • Automate: your plan must include a backup strategy covering all transactional and reference data. Test the backup restoration processes regularly. Document all processes, including manual steps, and automate as many tasks as possible.
  • Monitor: configure alerts for all Azure services consumed by the application. Train the relevant staff to execute the plan, and run regular disaster recovery simulations to verify and improve it.
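
One way to make the evaluation step concrete is to keep a small inventory of applications, rank them by estimated business impact, and flag the gaps the tips above warn about. The sketch below is only an illustration. The `App` fields, the cost figures, and the application names are assumptions, not values from any Azure service.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    downtime_cost_per_hour: float  # assumed business-impact estimate
    has_tested_backup: bool        # has a restore been tested recently?
    owner: str = ""                # role that owns recovery for this app


def recovery_order(apps):
    """Sort applications so the most costly outage is recovered first."""
    return sorted(apps, key=lambda a: a.downtime_cost_per_hour, reverse=True)


def plan_gaps(apps):
    """List the missing owners and untested backups found in the inventory."""
    gaps = []
    for a in apps:
        if not a.owner:
            gaps.append(f"{a.name}: no plan owner")
        if not a.has_tested_backup:
            gaps.append(f"{a.name}: backup restore not tested")
    return gaps


# Hypothetical inventory used only to exercise the functions.
apps = [
    App("internal-wiki", 50.0, True, "it-ops"),
    App("order-database", 4000.0, False, "data-team"),
    App("web-storefront", 2500.0, True),
]
order = [a.name for a in recovery_order(apps)]
gaps = plan_gaps(apps)
```

Ranking by a single cost figure is a simplification. A real business impact analysis would usually weigh several factors, such as regulatory exposure and dependencies between applications.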

Related content: Read our guide to disaster recovery plan

Operational Readiness Testing

Testing your disaster recovery plan prior to implementation can help you verify its effectiveness. You should perform operational readiness tests for the following:

  • Failover to a secondary region
  • Failback to the primary region

Failover and failback tests can help you verify that an application’s dependent services remain synchronized when restored during the disaster recovery process. It is often hard to predict how changes to operations and systems will affect failover and failback, so test both functions regularly instead of discovering problems during a real incident.

Many Azure services support manual failover, and some offer test failovers for disaster recovery drills. You can also simulate an outage by removing or shutting down Azure services. Additionally, you can automate tests of your operational responses to confirm that they work as intended.
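
A drill's synchronization check can be sketched without touching any cloud service. The example below assumes each region reports, per dependent service, the sequence number of the last change it has applied. This is a hypothetical model chosen for illustration, and `lagging_services` is not an Azure function.

```python
def lagging_services(source, target, max_lag=0):
    """Return services whose copy in `target` is missing or behind `source`.

    `source` and `target` map a service name to the sequence number of the
    last change applied in that region (an assumed, simplified model).
    """
    lagging = []
    for name, seq in source.items():
        if name not in target or seq - target[name] > max_lag:
            lagging.append(name)
    return sorted(lagging)


# Failover drill: is the secondary region caught up with the primary?
primary = {"web": 120, "database": 120, "queue": 118}
secondary = {"web": 120, "database": 117, "queue": 118}
failover_issues = lagging_services(primary, secondary)

# Failback drill: after running in the secondary region for a while,
# has the primary received everything written there?
secondary = {"web": 130, "database": 130, "queue": 125}
primary = {"web": 130, "database": 130, "queue": 125}
failback_issues = lagging_services(secondary, primary)
```

The `max_lag` parameter lets a drill tolerate a small replication delay. Setting it is a judgment call that depends on how much data loss the application can accept.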

Dependent Service Outage

You must assess the implications of a disruption in each service and how the application might respond to it. Evaluate each service independently, and where a service includes features that support availability and resiliency, use them to strengthen the overall disaster recovery plan. Azure Event Hubs, for example, supports failover to a secondary namespace.

Network Outage

A disaster recovery plan must define processes for network outage events. If parts of the network become unreachable, you might lose access to applications or data. Most applications can respond by running with reduced functionality. However, if you cannot reduce functionality, try failing over to another region to avoid application downtime.
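
The decision described above (keep running with reduced functionality when only optional parts are unreachable, otherwise fail over) can be written down as a small rule. The sketch below is an assumption-laden illustration. The component names, the required/optional flags, and the returned labels are all hypothetical.

```python
def respond_to_outage(components, reachable, secondary_available):
    """Choose a response to a partial network outage.

    components: dict mapping component name -> True if the app cannot run
                without it, False if it is optional.
    reachable:  set of component names that can currently be reached.
    """
    missing = {name for name in components if name not in reachable}
    if not missing:
        return "normal"
    if all(not components[name] for name in missing):
        return "degraded"  # only optional components were lost
    return "failover" if secondary_available else "down"


# Hypothetical application with two required parts and one optional part.
app = {"checkout": True, "catalog": True, "recommendations": False}

optional_lost = respond_to_outage(app, {"checkout", "catalog"}, True)
required_lost = respond_to_outage(app, {"catalog"}, True)
no_secondary = respond_to_outage(app, {"catalog"}, False)
```

Deciding in advance which components are optional is the important part. The code only records that decision so it is not made under pressure during an outage.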

Plan For Regional Failures

Azure is divided logically and physically into units called regions. Each region includes one or more datacenters located close to one another. Many regions also support availability zones (AZs) that offer more resiliency during outages. You can use regions with AZs to improve your applications’ availability.

Your plan should also account for disasters that cause an entire region or AZ to become inaccessible. For example, natural disasters can cause entire facilities to be lost, and network failures can cause shutdowns. You can mitigate this issue by distributing applications across regions and AZs. If disaster strikes in one region, you still have data and software in another active region.
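
To check whether a deployment is actually distributed in the way described above, you can list where each instance runs and look for any single region or zone whose loss would leave nothing running. The sketch below uses placeholder region and zone labels and a hypothetical `single_points_of_failure` helper. It does not query Azure.

```python
def single_points_of_failure(placements):
    """Find failure domains whose loss would leave no running instance.

    placements: list of (region, zone) tuples, one per application instance.
    Returns a list of ("region", name) or ("zone", "region/zone") entries.
    """
    weak = []
    # A region is a single point of failure if every instance lives in it.
    for region in sorted({r for r, _ in placements}):
        if all(r == region for r, _ in placements):
            weak.append(("region", region))
    # The same check, one level down, for each zone within a region.
    for region, zone in sorted(set(placements)):
        if all(p == (region, zone) for p in placements):
            weak.append(("zone", f"{region}/{zone}"))
    return weak


one_zone = single_points_of_failure([("region-a", "1")])
two_zones = single_points_of_failure([("region-a", "1"), ("region-a", "2")])
two_regions = single_points_of_failure([("region-a", "1"), ("region-b", "1")])
```

Spreading instances across zones removes the zone-level risk but still leaves the region as a single point of failure, which is why the two-region placement is the only one that returns an empty list.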

Built-In Data Protection for Disaster Recovery with Cloudian

Do you need to back up data to on-premises storage as part of your disaster recovery setup? Cloudian offers a low-cost disk-based storage technology that lets you back up data locally with a capacity of up to 1.5 petabytes. You can also set up a Cloudian appliance in a remote site and use our integrated data management tools to save data there.

Another deployment option is a hybrid cloud configuration. You can back up data to a local Cloudian appliance, then replicate it to the cloud for DR purposes. This combines the low latency of local storage with the resilience of the cloud.

Learn more about Cloudian’s data protection solution


