SQL Server AlwaysON Group Failover in Cluster causes Node to lose network connectivity


Last week, I worked on an issue which might sound weird or strange when you hear it for the first time.

“Whenever we fail over the SQL Server AlwaysON group in the cluster, Node A loses its network connectivity.” (Sounds weird, right?)

Here is the email I received from my customer:

We’re able to re-create the issue on demand now, as every time we try to fail the SQL cluster group over from A to B, as soon as the SQL network name resource goes offline on the A server, the default gateway on the public NIC gets deleted and the box drops off the network. This only happens when failing over from A to B (B to A does not cause any issues), and also only explicitly occurs when the SQL Network resource is taken offline on the A server when being failed over to B (the core cluster name resource and group can fail over between nodes without issue).

The first obvious thought that comes to mind is that the SQL Server application can't be causing this, since it sits on top of the Windows platform and the failover mechanism has nothing to do with the default gateway setting on the NICs. But as Sherlock Holmes says:

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”

After spending a few days on troubleshooting and research, we found that the root cause of the issue was incorrect IP routing on the cluster nodes.

More Specifically,

The issue was caused by the SQL resource IP being registered in a persistent route for the Public NIC on Node A. Because this incorrect persistent route is not bound to the interface, both the active route and the persistent route go down when the SQL IP resource goes offline or is failed over.
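As a rough way to spot this condition, the sketch below scans the text of `route print` for a default route whose Interface column is the clustered SQL IP rather than the node's own address. It is only an illustration: the function name, the sample output, and all addresses are made up, and it assumes the usual five-column IPv4 "Active Routes" layout (destination, netmask, gateway, interface, metric).

```python
def default_routes_bound_to(route_print_text, resource_ip):
    """Return default-route rows whose Interface column equals resource_ip.

    Assumes the five-column IPv4 'Active Routes' layout of `route print`:
    destination, netmask, gateway, interface, metric.
    """
    hits = []
    for line in route_print_text.splitlines():
        fields = line.split()
        # Persistent-route rows have four columns, so they are skipped here.
        if len(fields) != 5:
            continue
        destination, netmask, gateway, interface, metric = fields
        if destination == "0.0.0.0" and netmask == "0.0.0.0" and interface == resource_ip:
            hits.append(fields)
    return hits


# Hypothetical output: 10.0.0.11 is the node's own address,
# 10.0.0.50 stands in for the clustered SQL IP resource.
SAMPLE = """\
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0         10.0.0.1       10.0.0.50     10
         10.0.0.0    255.255.255.0          On-link       10.0.0.11    266
Persistent Routes:
  Network Address          Netmask  Gateway Address  Metric
          0.0.0.0          0.0.0.0         10.0.0.1  Default
"""

hits = default_routes_bound_to(SAMPLE, "10.0.0.50")
for row in hits:
    print("default route bound to clustered IP:", " ".join(row))
```

A hit means the default route would disappear along with the IP resource, which matches the symptom in the customer's email. An empty result does not prove the routing table is healthy.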

More details on this issue can be found in the following blog post and KB article:

http://blogs.technet.com/b/networking/archive/2009/05/21/active-route-gets-removed-on-windows-server-2008-offline-cluster-ip-address.aspx

http://support.microsoft.com/kb/2161341

To resolve the issue, the Windows Networking team was involved to rectify the persistent route setting & IP routing table.

Hope this helps anyone who hears of or faces this strange issue with a SQL AlwaysON cluster on Windows Server 2008 or later.

Parikshit Savjani
Premier Field Engineer
