What Happens When Your Cloud Provider Goes Down?

By: Derek Barka | 9/27/23

[Screenshot: Azure connectivity issue announcement]

 

Infrastructure Connectivity Issues

This past weekend, Microsoft’s Azure cloud infrastructure experienced a database connectivity issue in its East US region. Customers using Azure SQL databases in that region experienced outages lasting up to 14 hours, through Saturday afternoon. As a result, many websites hosted in that region went down when they suddenly lost connectivity to their databases. Even proactive customers with database geo-replication enabled experienced 8 hours of downtime, because underlying network infrastructure outages delayed the automatic failover. Vendors with a robust disaster recovery system in place were able to perform successful manual failovers within the same time frame.

Cloud computing brings tremendous advantages when it comes to building out complex infrastructure and scaling to meet unexpected demand. However, it faces the same challenge as on-premises infrastructure: keeping your data and web services available and resilient against failures.

One such challenge for cloud-hosted websites is being prepared for unexpected disasters, such as hardware failures, data corruption, malicious attacks, or even natural disasters that affect data centers.

Disaster recovery planning and execution are critical to every business for several reasons:

Business Continuity: Ensure that your website remains accessible or can be quickly restored to minimize disruptions to customer service or your day-to-day business.

Data Protection: Safeguard data from corruption or loss.

Regulatory Compliance: Some industries, such as finance, utilities, and healthcare, may mandate disaster recovery capabilities for compliance reasons and require annual audits.

Azure, Microsoft's cloud computing platform, provides many options for disaster recovery (DR) planning and mitigation. Microsoft has data centers in dozens of regions around the world, and a true DR strategy should use multiple regions or data centers so that you remain protected if one region goes down.

1. Geo-Redundant Storage (GRS)
For asset-heavy websites, Azure offers Geo-Redundant Storage, which replicates your data to a secondary region hundreds of miles away from the primary location. GRS keeps six copies of your data across two Azure regions, three in the primary region and three in the secondary region, so a disaster at one data center does not destroy every copy. This works well for backing up digital assets, documents, and other files stored on your website.
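To make the six-copy arithmetic concrete, here is a minimal Python sketch. It is a toy model, not an Azure SDK call: it places three copies in a primary region and three in a secondary region, then counts what survives a regional outage. The region names `eastus` and `westus` (East US is paired with West US) are used only as an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    region: str
    index: int


def grs_copies(primary: str, secondary: str, copies_per_region: int = 3) -> list[Copy]:
    """Toy model of GRS placement: three copies in the primary region plus three in the secondary."""
    return [Copy(region, i) for region in (primary, secondary) for i in range(copies_per_region)]


def surviving(copies: list[Copy], failed_regions: set[str]) -> list[Copy]:
    """Return the copies still reachable when every region in failed_regions is down."""
    return [c for c in copies if c.region not in failed_regions]


copies = grs_copies("eastus", "westus")
print(len(copies))                                   # 6 copies in total
print(len(surviving(copies, {"eastus"})))            # 3 copies left in westus
print(len(surviving(copies, {"eastus", "westus"})))  # 0: both regions lost
```

One caveat the model leaves out: the secondary region is updated asynchronously, so the most recent writes may not have reached it yet when the primary region fails.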

2. Azure SQL Zone Redundancy
In premium tiers of Azure SQL, Microsoft offers zone redundancy, which keeps multiple copies of your database spread across availability zones (physically separate data centers) within a single Azure region. This protects you if a particular database server or zone fails, or while Azure is performing maintenance on a server. Unfortunately, it does not protect you if the entire region has problems.
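The gap between surviving a zone failure and surviving a regional failure can be shown with a small Python sketch. It is a toy model under a loud simplifying assumption: one replica in each of three availability zones of a single region, with the database treated as available while any replica survives. Azure's real replica and quorum handling is more involved.

```python
# Toy model: one replica in each of three availability zones of a single region.
# Simplifying assumption: the database stays available while any replica survives.
REPLICAS = {("eastus", 1), ("eastus", 2), ("eastus", 3)}


def is_available(replicas, failed_zones=frozenset(), failed_regions=frozenset()):
    """True if at least one replica sits in a zone and region that are both still up."""
    return any(
        region not in failed_regions and (region, zone) not in failed_zones
        for region, zone in replicas
    )


print(is_available(REPLICAS, failed_zones={("eastus", 1)}))  # True: two zones remain
print(is_available(REPLICAS, failed_regions={"eastus"}))     # False: the whole region is down
```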

3. Auto-Failover SQL Database Groups
Azure SQL auto-failover groups let you manage the replication and failover of a group of databases from one region to another. When enabled, Azure continuously and asynchronously replicates changes from each primary database to its secondary database in the other region.
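The decision an automatic failover policy makes can be sketched in a few lines of Python. The `FailoverGroupModel` class is hypothetical and not part of any Azure SDK. It assumes a grace period (one hour here, an illustrative value) that must elapse before the group fails over, and it counts the transactions that could be lost because replication is asynchronous. Applications typically connect through the failover group's read-write listener endpoint, which follows whichever region is currently primary, so the connection string does not change after a failover.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FailoverGroupModel:
    """Hypothetical, simplified model of an auto-failover group (not the Azure API)."""
    primary: str
    secondary: str
    grace_period: timedelta = timedelta(hours=1)  # illustrative value

    def read_write_region(self, primary_outage: timedelta) -> str:
        """Region that should accept writes after the primary has been down this long."""
        if primary_outage >= self.grace_period:
            return self.secondary  # grace period exhausted: fail over
        return self.primary        # still waiting for the primary to recover

    @staticmethod
    def transactions_at_risk(committed_on_primary: int, applied_on_secondary: int) -> int:
        """Writes the secondary had not yet received; these can be lost on failover."""
        return max(0, committed_on_primary - applied_on_secondary)


group = FailoverGroupModel(primary="eastus", secondary="westus")
print(group.read_write_region(timedelta(minutes=20)))           # eastus
print(group.read_write_region(timedelta(hours=2)))              # westus
print(FailoverGroupModel.transactions_at_risk(10_500, 10_480))  # 20
```

A shorter grace period reduces downtime but raises the chance of failing over with data loss, which is one reason some teams prefer to trigger failover manually.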

4. Azure Traffic Manager and Azure Front Door
To maintain high availability, global traffic-routing services such as Azure Traffic Manager (DNS-based) and Azure Front Door direct user traffic to the primary region or, if it is unavailable, to the failover region. This keeps your website reachable even if one data center experiences issues. Azure Front Door also includes features such as a Web Application Firewall and content delivery network (CDN) caching to help protect against malicious traffic and unexpected traffic surges.
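Both services support priority-based routing: traffic goes to the highest-priority endpoint whose health probes are passing, where a lower priority number means more preferred. The Python sketch below models only that rule. The `Origin` type and the `app-eastus`/`app-westus` names are hypothetical, and real Front Door routing can also balance by latency and weight among origins of equal priority.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Origin:
    name: str
    priority: int   # lower number = preferred, as in priority routing
    healthy: bool   # result of the latest health probe


def pick_origin(origins: Sequence[Origin]) -> Optional[Origin]:
    """Return the healthy origin with the lowest priority number, or None if all are down."""
    healthy = [o for o in origins if o.healthy]
    return min(healthy, key=lambda o: o.priority, default=None)


normal = [Origin("app-eastus", 1, True), Origin("app-westus", 2, True)]
outage = [Origin("app-eastus", 1, False), Origin("app-westus", 2, True)]
print(pick_origin(normal).name)  # app-eastus
print(pick_origin(outage).name)  # app-westus
```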

None of these services is a one-stop disaster recovery solution; each is one tool in your arsenal for building a true disaster recovery strategy. Each also comes with its own benefits and costs. Your final solution will depend on your website platform, organizational needs, risk tolerance, and, of course, budget.

SilverTech hosts over 500 websites in our various cloud data centers and has fine-tuned our approach to disaster recovery for the various Digital Experience Platforms (DXP) we work with.

A typical solution would involve:

  • Application services replicated to at least two Azure regions (for example, East US and West US).
  • Azure SQL auto-failover groups spanning at least two regions.
  • Geo-redundant storage and backups.
  • Azure Front Door configured to direct traffic to the healthy region.
  • A Web Application Firewall configured for the site's DXP.
  • A Content Delivery Network configured for the site's DXP.
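One way to keep a setup like this honest is to express the checklist as code. The Python sketch below is hypothetical: the `Deployment` fields and the two-region minimum simply mirror the list above and are not tied to any Azure API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Deployment:
    """Hypothetical description of a hosting setup; field names are illustrative only."""
    app_regions: Set[str] = field(default_factory=set)
    sql_failover_regions: Set[str] = field(default_factory=set)
    geo_redundant_storage: bool = False
    front_door: bool = False
    waf: bool = False
    cdn: bool = False


def dr_gaps(d: Deployment, min_regions: int = 2) -> List[str]:
    """List every checklist item the deployment does not yet satisfy."""
    gaps = []
    if len(d.app_regions) < min_regions:
        gaps.append(f"application services run in fewer than {min_regions} regions")
    if len(d.sql_failover_regions) < min_regions:
        gaps.append(f"SQL failover group spans fewer than {min_regions} regions")
    if not d.geo_redundant_storage:
        gaps.append("storage and backups are not geo-redundant")
    if not d.front_door:
        gaps.append("no Front Door routing to the healthy region")
    if not d.waf:
        gaps.append("no Web Application Firewall")
    if not d.cdn:
        gaps.append("no Content Delivery Network")
    return gaps


single_region = Deployment(app_regions={"eastus"}, sql_failover_regions={"eastus"})
print(len(dr_gaps(single_region)))  # 6
```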

This combination of services allows us to offer a secure, resilient hosting environment with automatic DR failover at an affordable price to many organizations.

[Flow chart: Azure Front Door]

In addition to planning for DR and configuring your infrastructure to support it, ongoing maintenance is essential.

Regular Testing: Simulate disaster scenarios and test your recovery processes to ensure they work as expected.

Update & Review: Regularly update your DR plan, especially as your website evolves or Azure introduces new services/features.

Documentation: Have a clear, documented process that everyone can follow during disaster recovery.

Keep Security Front and Center: Your data must stay as secure during DR as it is day to day, so failover environments need the same security controls as your primary environment.
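For regular testing, it helps to time each drill so you can see whether recovery is getting faster or slower. The Python sketch below assumes you supply two hypothetical callables: one that triggers a failover in a test environment and one that reports whether the site is serving traffic again. It returns the measured recovery time in seconds, or None if the timeout expires first.

```python
import time
from typing import Callable, Optional


def run_drill(
    trigger_failover: Callable[[], None],
    is_serving: Callable[[], bool],
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
) -> Optional[float]:
    """Trigger a failover, then poll until the site serves again; return elapsed seconds or None."""
    start = time.monotonic()
    trigger_failover()
    while time.monotonic() - start < timeout_s:
        if is_serving():
            return time.monotonic() - start
        time.sleep(poll_s)
    return None


# Simulated drill: the "site" comes back on the third health check.
checks = iter([False, False, True])
elapsed = run_drill(lambda: None, lambda: next(checks), timeout_s=5, poll_s=0.01)
print(elapsed is not None)  # True
```

Recording the elapsed time from every drill gives you a baseline to compare against the downtime your business can tolerate.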

While disasters are unpredictable, their impact on your website doesn't have to be. With the right DR strategy, built on Azure's wide array of tools and services, you can keep your website available, resilient, and secure no matter what challenges arise. Always remember: it's not about preventing disasters, but about being prepared for when they do occur. Learn about our hosting and managed services and contact us today for a consultation.


Meet the Author: Derek Barka
