Archive

Archive for the ‘HADR’ Category

When SQL Server AlwaysOn for HADR is not always suitable

November 22, 2013 Comments off

Contrary to popular perception on how smooth SQL AlwaysOn can be for HADR scenarios, my experiences with it has been not so great when it comes to DR situation. AlwaysOn will need meticulous design, planning and DR procedures (TEST, TEST, and TEST) to ensure smooth failover/failback.

I embarked on designing and implementing SQL 2012 HADR solution across two sites based on white paper FCI+AG. My implementation was 2 node SQL VM FCI at each site and had additional complexities –

  • HyperV Virtualization
  • Only Guest Failover Clustering
  • SQL Disks presented directly to the guest
  • Mount Points to circumvent limited drive letters

You might be asking me WHY FOR HEAVENS SAKE? Yes I did as well. It was a strict customer requirement to virtualize everything. Virtualizing also helped reduce SQL Licensing cost (for core counts). FCI will require Guest clustering hence the disks need to be direct attached to the guest from a shared SAN with multipath IO. Redundancy was built in at each physical and logical layers sufficiently (rather excessively).

Figure 1: High Level SQL Solution

Note: Storage replication was strictly to transport backups between site

Although it was complex I successfully implemented it and tested the failover and failback within the stipulated RPO/RTO. But soon after we started experiencing intermittent disruptions at host level resulting in unexpected power on/off of SQL nodes. This resulted in some databases on DR appear in RECOVERY PENDING status. Resolution for which is to remove database from AG or recreate the AG. Important fact is the server instance on which the secondary replica is hosted will be unavailable in such case. Your upstream applications are using AG Listener and any changes or reverting to instance name will have further rework, testing and downtime.

Lesson learnt –

  • You may want a second local server instance to host a secondary replica. This should help you restore and recreate AG Databases without having to entirely rely on second site. But beware of the licensing implications and cost of doubling compute, storage and network resources at each site
  • Alternatively completely forgo FCI
  • Look at further abstraction from AG such as SQL ALIAS for upstream applications
  • Have a strong SQL DBA monitor and oversee SQL environment and SQL DR procedures instead of your platform guys

Question is though does it really pay to have AlwaysOn HADR? Using Storage Replication technologies for syncing backups/Tlogs and manual DR restore maybe smoother in some cases. What are your thoughts?