Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery

LeRoy Tuttle, Jr.

Contributors: Lindsey Allen, Justin Erickson, Min He, Cephas Lin, Sanjay Mishra

Reviewers: Kevin Farlee, Shahryar G. Hashemi (Motricity), Allan Hirt (SQLHA), Alexei Khalyako, Wolfgang Kutschera (Bwin Party), Charles Matthews, Ayad Shammout (Caregroup), David P. Smith (ServiceU), Juergen Thomas, Benjamin Wright-Jones

Summary: This white paper discusses how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.

A key goal of this paper is to establish a common context for related discussions between business stakeholders, technical decision makers, system architects, infrastructure engineers, and database administrators.

Category: Quick Guide
Applies to: SQL Server 2012
Source: White paper (link to source content)
E-book publication date: May 2012
32 pages

Copyright 2012 by Microsoft Corporation


All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means
without the written permission of the publisher.


Microsoft and the trademarks listed at
http://www.microsoft.com/about/legal/en/us/IntellectualProperty/Trademarks/EN-US.aspx are trademarks of the
Microsoft group of companies. All other marks are property of their respective owners.

The example companies, organizations, products, domain names, email addresses, logos, people, places, and events
depicted herein are fictitious. No association with any real company, organization, product, domain name, email address,
logo, person, place, or event is intended or should be inferred.

This book expresses the author's views and opinions. The information contained in this book is provided without any
express, statutory, or implied warranties. Neither the authors, Microsoft Corporation, nor its resellers, or distributors will
be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.

Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery

Contents
High Availability and Disaster Recovery Concepts
Describing High Availability
Planned vs. Unplanned Downtime
Degraded Availability
Quantifying Downtime
Recovery Objectives
Justifying ROI or Opportunity Cost
Monitoring Availability Health
Planning for Disaster Recovery
Overview: High Availability with Microsoft SQL Server 2012
SQL Server AlwaysOn
Significantly Reduce Planned Downtime
Eliminate Idle Hardware and Improve Cost Efficiency and Performance
Easy Deployment and Management
Contrasting RPO and RTO Capabilities
SQL Server AlwaysOn Layers of Protection
Infrastructure Availability
Windows Operating System
Windows Server Failover Clustering
WSFC Cluster Validation Wizard
WSFC Quorum Modes and Voting Configuration
WSFC Disaster Recovery through Forced Quorum
SQL Server Instance-Level Protection
Availability Improvements: SQL Server Instances
AlwaysOn Failover Cluster Instances
Database Availability
AlwaysOn Availability Groups
Availability Group Failover
Availability Group Listener
Availability Improvements: Databases
Client Connectivity Recommendations
Conclusion

High Availability and Disaster Recovery Concepts
You can make the best selection of a database technology for a high availability and disaster recovery solution when all stakeholders have a shared understanding of the related business drivers and challenges, and of how recovery time objectives (RTO) and recovery point objectives (RPO) are planned, managed, and measured.
Readers who are familiar with these concepts can move ahead to the Overview: High Availability with Microsoft SQL Server 2012 section of this paper.
Describing High Availability
For a given software application or service, high availability is ultimately measured in terms of the end user's experience and expectations. The tangible and perceived business impact of downtime may be expressed in terms of information loss, property damage, decreased productivity, opportunity costs, contractual damages, or the loss of goodwill.
The principal goal of a high availability solution is to minimize or mitigate the impact of downtime. A sound strategy for this optimally balances business processes and Service Level Agreements (SLAs) with technical capabilities and infrastructure costs.
A platform is considered highly available per the agreement and expectations of customers and stakeholders. The availability of a system can be expressed as this calculation:
$$\text{Availability} = \frac{\text{Actual uptime}}{\text{Expected uptime}} \times 100\%$$
The industry often expresses the resulting value as the number of 9s that the solution provides, which conveys an annual number of minutes of possible uptime or, conversely, minutes of downtime.
Number of 9s    Availability Percentage    Total Annual Downtime
2               99%                        3 days, 15 hours
3               99.9%                      8 hours, 45 minutes
4               99.99%                     52 minutes, 34 seconds
5               99.999%                    5 minutes, 15 seconds
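As a rough check of the availability formula and the table of 9s, the following Python sketch computes allowed annual downtime. It assumes a 365-day year (525,600 minutes), which is the convention that reproduces the table's values; the function names are hypothetical and used only for illustration.

```python
# Sketch: availability percentage and annual downtime for a given number of 9s.
# Assumes a 365-day year; leap years and partial periods are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability_pct(actual_uptime: float, expected_uptime: float) -> float:
    """Availability = actual uptime / expected uptime x 100%."""
    return actual_uptime / expected_uptime * 100.0


def annual_downtime_minutes(nines: int) -> float:
    """Allowed downtime per year when availability has `nines` leading 9s."""
    unavailable_fraction = 10.0 ** -nines  # e.g. 3 nines -> 0.001 (99.9%)
    return MINUTES_PER_YEAR * unavailable_fraction


def breakdown(minutes: float) -> tuple[int, int, int, int]:
    """Split a duration in minutes into (days, hours, minutes, seconds)."""
    total_seconds = round(minutes * 60)
    days, rem = divmod(total_seconds, 24 * 3600)
    hours, rem = divmod(rem, 3600)
    mins, secs = divmod(rem, 60)
    return days, hours, mins, secs


if __name__ == "__main__":
    for n in range(2, 6):
        d, h, m, s = breakdown(annual_downtime_minutes(n))
        print(f"{n} nines: {d}d {h}h {m}m {s}s")
```

For two 9s the sketch yields 3 days, 15 hours, and 36 minutes; the table rounds this down to 3 days, 15 hours.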

Planned vs. Unplanned Downtime
System outages are either anticipated and planned for, or they are the result of an unplanned failure. Downtime need not be considered negatively if it is appropriately managed. There are two key types of foreseeable downtime:
Planned maintenance. A time window is preannounced and coordinated for planned maintenance tasks such as software patching, hardware upgrades, password updates, offline reindexing, data loading, or the rehearsal of disaster recovery procedures. Deliberate, well-managed operational procedures should minimize downtime and prevent any data loss. Planned maintenance activities can be seen as investments needed to prevent or mitigate other, potentially more severe, unplanned outage scenarios.
Unplanned outage. System-level, infrastructure, or process failures may occur that are unplanned or uncontrollable, or that are foreseeable but considered either too unlikely to occur or to have an acceptable impact. A robust high availability solution detects these types of failures, automatically recovers from the outage, and then reestablishes fault tolerance.
When establishing SLAs for high availability, you should calculate separate key performance indicators (KPIs) for planned maintenance activities and unplanned downtime. This approach allows you to contrast your investment in planned maintenance activities against the benefit of avoiding unplanned downtime.
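Keeping planned and unplanned downtime as separate KPIs can be as simple as tagging each outage record and aggregating the two groups independently. This Python sketch assumes a hypothetical `Outage` record and a reporting period expressed in minutes; it is one possible way to compute the figures, not a prescribed method.

```python
# Sketch: separate downtime KPIs for planned maintenance and unplanned outages.
# The Outage record and the reporting period are hypothetical illustrations.
from dataclasses import dataclass


@dataclass
class Outage:
    planned: bool    # True for a preannounced maintenance window
    minutes: float   # duration of the outage


def downtime_kpis(outages: list[Outage], period_minutes: float) -> dict[str, float]:
    planned = sum(o.minutes for o in outages if o.planned)
    unplanned = sum(o.minutes for o in outages if not o.planned)

    def pct_available(down: float) -> float:
        return (1.0 - down / period_minutes) * 100.0

    return {
        "planned_downtime_min": planned,
        "unplanned_downtime_min": unplanned,
        "planned_availability_pct": pct_available(planned),
        "unplanned_availability_pct": pct_available(unplanned),
        "overall_availability_pct": pct_available(planned + unplanned),
    }


if __name__ == "__main__":
    month = 30 * 24 * 60  # 43,200 minutes in a 30-day month
    history = [Outage(True, 120), Outage(False, 43.2)]
    print(downtime_kpis(history, month))
```

Reporting the two availability figures side by side makes it visible when extra planned maintenance is buying a reduction in unplanned downtime.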
Degraded Availability
High availability should not be considered an all-or-nothing proposition. As an alternative to a complete outage, it is often acceptable to the end user for a system to be partially available, or to have limited functionality or degraded performance. These varying degrees of availability include:
Read-only and deferred operations. During a maintenance window, or during a phased disaster recovery, data retrieval is still possible, but new workflows and background processing may be temporarily halted or queued.
Data latency and application responsiveness. Due to a heavy workload, a processing backlog, or a partial platform failure, limited hardware resources may be overcommitted or undersized. User experience may suffer, but work may still get done in a less productive manner.
Partial, transient, or impending failures. Robustness in the application logic or hardware stack allows the system to retry or self-correct upon encountering an error. These types of issues may appear to the end user as data latency or poor application responsiveness.
Partial end-to-end failure. Planned or unplanned outages may occur gracefully within vertical layers of the solution stack (infrastructure, platform, and application), or horizontally between different functional components. Users may experience partial success or degradation, depending upon the features or components that are affected.
The acceptability of these suboptimal scenarios should be considered as part of a spectrum of degraded availability leading up to a complete outage, and as intermediate steps in a phased disaster recovery.
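Application logic that retries on transient failures is one way to turn a brief fault into mere latency for the end user. The following Python sketch shows retry with exponential backoff; `TransientError` and the wrapped operation are hypothetical, and a real client would decide which specific driver errors count as transient.

```python
# Sketch: retry an operation that may hit transient errors, with exponential backoff.
# TransientError and the operation are hypothetical; a real client would map
# specific driver error codes to "transient".
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Raised by an operation for an error that is expected to clear on its own."""


def retry(operation: Callable[[], T], attempts: int = 4, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return operation()
        except TransientError:
            sleep(base_delay * 2 ** attempt)  # waits 0.5s, 1s, 2s, ...
    return operation()  # final attempt: any error now propagates to the caller


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("connection reset")
        return "ok"

    print(retry(flaky, sleep=lambda s: None))  # succeeds on the third call
```

Injecting `sleep` keeps the sketch testable; capping the number of attempts ensures that a persistent failure is eventually surfaced instead of hidden.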
Quantifying Downtime
When downtime does occur, either planned or unplanned, the primary business goal is to bring the system back online and minimize data loss. Every minute of downtime has direct and indirect costs. With unplanned downtime, you must balance the time and effort needed to determine why the outage occurred, what the current system state is, and what steps are needed to recover from the outage.
At a predetermined point in any outage, you should make or seek the business decision to stop investigating the outage or performing maintenance tasks, recover from the outage by bringing the system back online, and if needed, reestablish fault tolerance.
Recovery Objectives
Data redundancy is a key component of a high availability database solution. Transactional activity on your primary SQL Server instance is synchronously or asynchronously applied to one or more secondary instances. When an outage occurs, transactions that were in flight may be rolled back, or they may be lost on the secondary instances due to delays in data propagation.
You can both measure the impact and set recovery goals in terms of how long it takes to get back in business and how much time latency there is in the last transaction recovered:
Recovery Time Objective (RTO). This is the duration of the outage. The initial goal is to get the system back online in at least a read-only capacity to facilitate investigation of the failure. However, the primary goal is to restore full service to the point that new transactions can take place.
Recovery Point Objective (RPO). This is often referred to as a measure of acceptable data loss. It is the time gap or latency between the last committed data transaction before the failure and the most recent data recovered after the failure. The actual data loss can vary depending upon the workload on the system at the time of the failure, the type of failure, and the type of high availability solution used.
You should use RTO and RPO values as goals that indicate business tolerance for downtime and acceptable data loss, and as metrics for monitoring availability health.
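The RTO and RPO definitions translate directly into two time differences that can be compared against goals after an incident. This Python sketch uses hypothetical function names and timestamps; it only restates the definitions above as arithmetic.

```python
# Sketch: measure the achieved RTO and RPO of an incident and compare with goals.
# Timestamps and goal values are hypothetical inputs for illustration.
from datetime import datetime, timedelta


def achieved_rto(failure_at: datetime, service_restored_at: datetime) -> timedelta:
    """RTO: duration of the outage, until full service (new transactions) resumes."""
    return service_restored_at - failure_at


def achieved_rpo(last_commit_before_failure: datetime,
                 last_commit_recovered: datetime) -> timedelta:
    """RPO: gap between the last committed transaction and the latest one recovered."""
    return last_commit_before_failure - last_commit_recovered


def meets_goals(rto: timedelta, rpo: timedelta,
                rto_goal: timedelta, rpo_goal: timedelta) -> bool:
    return rto <= rto_goal and rpo <= rpo_goal


if __name__ == "__main__":
    failure = datetime(2012, 5, 1, 10, 0, 0)
    rto = achieved_rto(failure, datetime(2012, 5, 1, 10, 4, 30))
    rpo = achieved_rpo(datetime(2012, 5, 1, 9, 59, 58), datetime(2012, 5, 1, 9, 59, 55))
    print(rto, rpo, meets_goals(rto, rpo, timedelta(minutes=5), timedelta(seconds=5)))
```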
Justifying ROI or Opportunity Cost
The business costs of downtime may be either financial or in the form of customer goodwill. These costs may accrue with time, or they may be incurred at a certain point in the outage window. In addition to projecting the cost of incurring an outage with a given recovery time and data recovery point, you can also calculate the business process and infrastructure investments needed to attain your RTO and RPO goals or to avoid the outage altogether. These investment themes should include:
Avoiding downtime. Outage recovery costs are avoided altogether if an outage doesn't occur in the first place. Investments include the cost of fault-tolerant and redundant hardware or infrastructure, distributing workloads across isolated points of failure, and planned downtime for preventive maintenance.
Automating recovery. If a system failure occurs, you can greatly mitigate the impact of downtime on the customer experience through automatic and transparent recovery.
Resource utilization. Secondary or standby infrastructure can sit idle, awaiting an outage. It can also be leveraged for read-only workloads, or to improve overall system performance by distributing workloads across all available hardware.
For given RTO and RPO goals, the needed availability and recovery investments, combined with the projected costs of downtime, can be expressed and justified as a function of time. During an actual outage, this allows you to make cost-based decisions based on the elapsed downtime.
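A minimal way to express downtime cost as a function of time is to combine costs that accrue per minute with one-time costs that are incurred once the outage passes a threshold. This Python sketch assumes a hypothetical per-minute rate and a hypothetical SLA penalty; the figures are not from this paper.

```python
# Sketch: projected business cost of an outage as a function of elapsed downtime.
# The per-minute rate and step penalties (for example, an SLA penalty triggered at
# a threshold) are hypothetical figures.
def outage_cost(elapsed_minutes: float, cost_per_minute: float,
                step_costs: list[tuple[float, float]]) -> float:
    """Cost accrued over time plus one-time costs whose threshold has been reached."""
    accrued = elapsed_minutes * cost_per_minute
    one_time = sum(amount for threshold, amount in step_costs
                   if elapsed_minutes >= threshold)
    return accrued + one_time


if __name__ == "__main__":
    # hypothetical: 100 per minute of lost business, plus a 5,000 SLA penalty at 60 minutes
    penalties = [(60, 5000)]
    for minutes in (15, 60, 120):
        print(minutes, outage_cost(minutes, 100, penalties))
```

Comparing this curve with the cost of the redundancy or automation that would shorten the outage gives a time-based justification for the investment.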
Monitoring Availability Health
From an operational point of view, during an actual outage, you should not attempt to consider all relevant variables and calculate ROI or opportunity costs in real time. Instead, you should monitor data latency on your standby instances as a proxy for expected RPO.
In the event of an outage, you should also limit the time initially spent investigating the root cause. Instead, focus on validating the health of your recovery environment, and then rely upon detailed system logs and secondary copies of data for subsequent forensic analysis.
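Monitoring standby latency as an RPO proxy amounts to comparing each replica's most recent commit with the primary's and flagging replicas whose lag exceeds the goal. In this Python sketch the replica names and timestamps are hypothetical inputs; in practice such values might be collected from availability group DMVs (for example, sys.dm_hadr_database_replica_states) or from your monitoring system.

```python
# Sketch: use data latency on standby replicas as a proxy for expected RPO.
# Replica names and commit timestamps are hypothetical inputs.
from datetime import datetime, timedelta


def replica_latency(primary_last_commit: datetime,
                    replica_last_commit: dict[str, datetime]) -> dict[str, timedelta]:
    """Lag of each replica behind the primary's most recent commit."""
    return {name: primary_last_commit - ts for name, ts in replica_last_commit.items()}


def replicas_over_rpo(primary_last_commit: datetime,
                      replica_last_commit: dict[str, datetime],
                      rpo_goal: timedelta) -> list[str]:
    """Replicas whose current lag would exceed the RPO goal if they became primary."""
    lag = replica_latency(primary_last_commit, replica_last_commit)
    return sorted(name for name, delay in lag.items() if delay > rpo_goal)


if __name__ == "__main__":
    primary = datetime(2012, 5, 1, 12, 0, 10)
    replicas = {"SQL2": datetime(2012, 5, 1, 12, 0, 9),
                "SQL3": datetime(2012, 5, 1, 11, 59, 40)}
    print(replicas_over_rpo(primary, replicas, timedelta(seconds=5)))
```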
Planning for Disaster Recovery
While high availability efforts entail what you do to prevent an outage, disaster recovery efforts address what is done to reestablish high availability after the outage.
As much as possible, disaster recovery procedures and responsibilities should be formulated before an actual outage occurs. Based upon active monitoring and alerts, the decision to initiate an automated or manual failover and recovery plan should be tied to preestablished RTO and RPO thresholds. The scope of a sound disaster recovery plan should include:
Granularity of failure and recovery. Depending upon the location and type of failure, you can take corrective action at different levels: data center, infrastructure, platform, application, or workload.
Investigative source material. Baseline and recent monitoring history, system alerts, event logs, and diagnostic queries should all be readily accessible to the appropriate parties.
Coordination of dependencies. Within the application stack, and across stakeholders, what are the system and business dependencies?
Decision tree. A predetermined, repeatable, validated decision tree that includes role responsibilities, fault triage, failover criteria in terms of RPO and RTO goals, and prescribed recovery steps.
Validation. After taking steps to recover from the outage, what must be done to verify that the system has returned to normal operations?
Documentation. Capture all of the above items in a set of documentation, with sufficient detail and clarity that a third-party team can execute the recovery plan with minimal assistance. This type of documentation is commonly referred to as a runbook or a cookbook.
Recovery rehearsals. Regularly exercise the disaster recovery plan to establish baseline expectations for RTO goals, and consider regularly rotating the hosting of primary production between the primary site and each of the disaster recovery sites.
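One branch of a disaster recovery decision tree, tying the failover decision to preestablished RTO and RPO thresholds, might look like the following Python sketch. The function name, the investigation time budget, and the three outcomes are hypothetical; a real runbook would encode your own roles and criteria.

```python
# Sketch of one branch of a disaster recovery decision tree: keep investigating,
# fail over, or escalate. Thresholds are hypothetical and should come from your own
# preestablished RTO and RPO goals.
from datetime import timedelta


def failover_decision(elapsed: timedelta, investigation_budget: timedelta,
                      projected_data_loss: timedelta, rpo_goal: timedelta) -> str:
    if elapsed < investigation_budget:
        return "investigate"   # still within the time allotted to triage
    if projected_data_loss <= rpo_goal:
        return "fail over"     # recovery stays within the agreed data loss
    return "escalate"          # failing over now would exceed RPO: seek a business decision


if __name__ == "__main__":
    print(failover_decision(timedelta(minutes=12), timedelta(minutes=10),
                            timedelta(seconds=2), timedelta(seconds=5)))
```

Keeping the investigation budget well below the RTO goal leaves time for the recovery steps themselves.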

Overview: High Availability with Microsoft SQL Server 2012
Achieving the required RPO and RTO goals involves ensuring continuous uptime of critical applications and protecting critical data from unplanned and planned downtime. SQL Server provides a set of features and capabilities that can help achieve those goals while keeping cost and complexity low.
Readers who have a high-level familiarity with the new AlwaysOn capabilities can move ahead to the deeper coverage in the SQL Server AlwaysOn Layers of Protection section of this paper.
SQL Server AlwaysOn
AlwaysOn is a new integrated, flexible, cost-efficient high availability and disaster recovery solution. It can provide data and hardware redundancy within and across data centers, and it improves application failover time to increase the availability of your mission-critical applications. AlwaysOn provides flexibility in configuration and enables reuse of existing hardware investments.
An AlwaysOn solution can leverage two major SQL Server 2012 features for configuring availability at both the database and the instance level:
AlwaysOn Availability Groups, new in SQL Server 2012, greatly enhance the capabilities of database mirroring and help ensure the availability of application databases. They enable zero data loss through log-based data movement for data protection without shared disks. Availability groups provide an integrated set of options, including automatic and manual failover of a logical group of databases, support for up to four secondary replicas, fast application failover, and automatic page repair.
AlwaysOn Failover Cluster Instances (FCIs) enhance the SQL Server failover clustering feature and support multisite clustering across subnets, which enables cross-data-center failover of SQL Server instances. Faster and more predictable instance failover is another key benefit that enables faster application recovery.
Significantly Reduce Planned Downtime
The key reason for application downtime in any organization is planned downtime caused by operating system patching, hardware maintenance, and similar activities, which can account for almost 80 percent of the outages in an IT environment.
SQL Server 2012 helps reduce planned downtime significantly by reducing patching requirements and enabling more online maintenance operations:
Windows Server Core. SQL Server 2012 supports deployments on Windows Server Core, a minimal, streamlined deployment option for Windows Server 2008 and Windows Server 2008 R2. This operating system configuration can reduce planned downtime by cutting operating system patching requirements by as much as 60 percent.
Online Operations. Enhanced support for online operations, such as LOB reindexing and adding columns with default values, helps reduce downtime during database maintenance operations.
Rolling Upgrade and Patching. AlwaysOn features facilitate rolling upgrades and patching of instances, which significantly reduces application downtime.
SQL Server on Hyper-V. SQL Server instances hosted in the Hyper-V environment receive the additional benefit of Live Migration, which enables you to migrate virtual machines between hosts with zero downtime. Administrators can perform maintenance operations on the host without impacting applications.
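A rolling patch keeps downtime to a single failover by patching secondaries first. The following Python sketch only generates an ordered plan; the replica names are hypothetical, the first secondary is assumed to be a synchronous-commit replica suitable as the failover target, and the sequence shown is a commonly used pattern rather than a complete procedure.

```python
# Sketch: order of operations for a rolling patch of an availability group.
# Replica names are hypothetical; the first secondary is assumed to be a
# synchronous-commit replica that can take over without data loss.
def rolling_patch_plan(primary: str, secondaries: list[str]) -> list[str]:
    if not secondaries:
        raise ValueError("a rolling patch needs at least one secondary replica")
    steps = [f"patch {s}" for s in secondaries]          # primary keeps serving
    target = secondaries[0]
    steps.append(f"fail over from {primary} to {target}")  # the only brief outage
    steps.append(f"patch {primary}")                       # former primary is now a secondary
    return steps


if __name__ == "__main__":
    for step in rolling_patch_plan("SQL1", ["SQL2", "SQL3"]):
        print(step)
```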
Eliminate Idle Hardware and Improve Cost Efficiency and Performance
Typical high availability solutions involve deployment of costly, redundant, passive servers. AlwaysOn Availability Groups enable you to utilize secondary database replicas on otherwise passive or idle servers for read-only workloads such as SQL Server Reporting Services report queries or backup operations. The ability to simultaneously utilize both the primary and secondary database replicas helps improve performance of all workloads due to better resource balancing across your server hardware investments.
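The idea of offloading read-only work can be illustrated with a toy router that sends read-only requests to readable secondaries in rotation and everything else to the primary. This Python sketch is a conceptual model only, with hypothetical class and replica names; in SQL Server 2012, clients signal read intent with the ApplicationIntent=ReadOnly connection keyword, and routing is configured on the availability group rather than in application code.

```python
# Sketch: route read-only requests to readable secondaries (round robin) and
# read-write requests to the primary. Conceptual model; names are hypothetical.
from itertools import cycle


class ReplicaRouter:
    def __init__(self, primary: str, readable_secondaries: list[str]):
        self.primary = primary
        # fall back to the primary if no secondary is currently readable
        self._readers = cycle(readable_secondaries or [primary])

    def route(self, read_only: bool) -> str:
        return next(self._readers) if read_only else self.primary


if __name__ == "__main__":
    router = ReplicaRouter("SQL1", ["SQL2", "SQL3"])
    print([router.route(read_only=True) for _ in range(3)], router.route(read_only=False))
```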
Easy Deployment and Management
Features such as the Configuration Wizard, support for the Windows PowerShell command-line interface, dashboards, dynamic management views (DMVs), policy-based management, and System Center integration help simplify deployment and management of availability groups.
Contrasting RPO and RTO Capabilities
The business goals for Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be key drivers in selecting a SQL Server technology for your high availability and disaster recovery solution. This table offers a rough comparison of the type of results that the different solutions may achieve:
High Availability and Disaster Recovery SQL Server Solution | Potential Data Loss (RPO) | Potential Recovery Time (RTO) | Automatic Failover | Readable Secondaries (1)
AlwaysOn Availability Group, synchronous commit | Zero | Seconds | Yes (4) | 0-2
AlwaysOn Availability Group, asynchronous commit | Seconds | Minutes | No | 0-4
AlwaysOn Failover Cluster Instance | NA (5) | Seconds to minutes | Yes | NA
Database Mirroring (2), high-safety (sync + witness) | Zero | Seconds | Yes | NA
Database Mirroring (2), high-performance (async) | Seconds (6) | Minutes (6) | No | NA
Log Shipping | Minutes (6) | Minutes to hours (6) | No | Not during a restore
Backup, Copy, Restore (3) | Hours (6) | Hours to days (6) | No | Not during a restore
(1) An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
(2) This feature will be removed in a future version of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
(3) Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.
(4) Automatic failover of an availability group is not supported to or from a failover cluster instance.
(5) The FCI itself doesn't provide data protection; data loss is dependent upon the storage system implementation.
(6) Highly dependent upon the workload, data volume, and failover procedures.
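The comparison table can also be used as a first-pass filter when RPO and automatic failover requirements are known. This Python sketch restates the table's rough categories as data; the ranking scale and function name are illustrative assumptions, the FCI row is excluded from RPO filtering because its data loss depends on storage (note 5), and Database Mirroring remains listed even though it is slated for removal (note 2).

```python
# Sketch: shortlist solutions from the comparison table by RPO and
# automatic-failover requirements. Ranking scale and names are illustrative.
RPO_RANK = {"Zero": 0, "Seconds": 1, "Minutes": 2, "Hours": 3}

# (solution, potential data loss, automatic failover)
SOLUTIONS = [
    ("AlwaysOn Availability Group, synchronous commit", "Zero", True),
    ("AlwaysOn Availability Group, asynchronous commit", "Seconds", False),
    ("AlwaysOn Failover Cluster Instance", None, True),  # data loss depends on storage
    ("Database Mirroring, high-safety", "Zero", True),
    ("Database Mirroring, high-performance", "Seconds", False),
    ("Log Shipping", "Minutes", False),
    ("Backup, Copy, Restore", "Hours", False),
]


def shortlist(max_data_loss: str, need_automatic_failover: bool) -> list[str]:
    limit = RPO_RANK[max_data_loss]
    return [
        name for name, rpo, auto in SOLUTIONS
        if rpo is not None and RPO_RANK[rpo] <= limit
        and (auto or not need_automatic_failover)
    ]


if __name__ == "__main__":
    print(shortlist("Zero", need_automatic_failover=True))
```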

SQL Server AlwaysOn Layers of Protection
SQL Server AlwaysOn solutions help provide fault tolerance and disaster recovery across several logical and physical layers of infrastructure and application components. Historically, it has been common practice to separate duties and responsibilities among the various audiences and roles involved, such that each was predominantly concerned with only a portion of those solution layers.
This section of the paper walks through a deeper description of each of those layers, and offers rationale and guidance for your design discussions and implementation decisions.
A successful SQL Server AlwaysOn solution requires understanding and collaboration across these layers:
Infrastructure level. Server-level fault tolerance and inter-node network communication leverage Windows Server Failover Clustering (WSFC) features for health monitoring and failover coordination.
SQL Server instance level. A SQL Server AlwaysOn Failover Cluster Instance (FCI) is a SQL Server instance that is installed across, and can fail over to, server nodes in a WSFC cluster. The nodes that host the FCI are attached to robust symmetric shared storage (SAN or SMB).
Database level. An availability group is a set of user databases that fail over together. An availability group consists of a primary replica and one to four secondary replicas. Each replica is hosted by an instance of SQL Server (FCI or non-FCI) on a different node of the WSFC cluster.
Client connectivity. Database client applications can connect directly to a SQL Server instance network name, or they may connect to a virtual network name (VNN) that is bound to an availability group listener. The VNN abstracts the WSFC cluster and availability group topology, logically redirecting connection requests to the appropriate SQL Server instance and database replica.
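The database-level rules in the list above (one primary, one to four secondaries, each replica on a different WSFC node) can be captured as a small validation check. In this Python sketch, the `Replica` and `AvailabilityGroup` classes and their fields are hypothetical models, not a SQL Server API.

```python
# Sketch: check an availability group layout against the rules described above.
# Class and field names are hypothetical models.
from dataclasses import dataclass, field


@dataclass
class Replica:
    instance: str   # SQL Server instance (FCI or non-FCI) hosting the replica
    node: str       # WSFC cluster node that hosts the instance


@dataclass
class AvailabilityGroup:
    name: str
    databases: list[str]
    primary: Replica
    secondaries: list[Replica] = field(default_factory=list)

    def validate(self) -> None:
        if not 1 <= len(self.secondaries) <= 4:
            raise ValueError("an availability group needs one to four secondary replicas")
        nodes = [r.node for r in [self.primary, *self.secondaries]]
        if len(set(nodes)) != len(nodes):
            raise ValueError("each replica must be hosted on a different WSFC node")


if __name__ == "__main__":
    ag = AvailabilityGroup("AG1", ["Sales", "Inventory"],
                           Replica("SQL1", "NODE1"), [Replica("SQL2", "NODE2")])
    ag.validate()
    print("AG1 layout is valid")
```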
The logical topology of a representative AlwaysOn solution is illustrated in this diagram:

Infrastructure Availability
Both AlwaysOn Availability Groups and AlwaysOn Failover Cluster Instances leverage the Windows Server operating system and WSFC as a platform technology. More than ever before, successful Microsoft SQL Server database administrators will rely upon a solid understanding of these technologies.
Windows Operating System
SQL Server relies upon the Windows platform to provide foundational infrastructure and services for networking, storage, security, patching, and monitoring.
The different editions of SQL Server 2012 progressively build upon the increasing capabilities and capacity of the corresponding editions of the Windows Server 2008 R2 operating system: Windows Server 2008 R2 Standard, Windows Server 2008 R2 Enterprise, and Windows Server 2008 R2 Datacenter.
For more information, see: Hardware and Software Requirements for Installing SQL Server 2012 (http://msdn.microsoft.com/en-us/library/ms143506(SQL.110).aspx).
Windows Server Core Installation Option
As a key high availability feature, SQL Server 2012 supports deployment on the Server Core installation option in Windows Server 2008 or later. The Server Core installation option provides a minimal environment for running specific server roles with limited functionality and very limited GUI application support. By default, only necessary services and a command prompt environment are enabled.
This mode of operation reduces the operating system attack surface and system overhead, and it can significantly reduce ongoing maintenance, servicing, and patching requirements.
A key consideration for deploying SQL Server 2012 on Windows Server Core is that all deployment, configuration, administration, and maintenance of SQL Server and of the operating system must be done using a scripting environment such as Windows PowerShell, or through the use of command-line or remote tools.
Optimizing SQL Server for Private Cloud
High availability and disaster recovery scenarios are increasingly critical in the Private Cloud environment. Deploying SQL Server to your Private Cloud helps ensure that your compute, network, and storage resources are used efficiently, reducing both physical footprint and capital and operational expenses. It helps you consolidate deployments, scale your resources efficiently, and deploy resources on demand without compromising control.
In addition to Windows Server Failover Clustering support for both Hyper-V host and guest systems, SQL Server also supports Live Migration, which is the ability to move virtual machines between hosts with no discernible downtime. Live Migration also works in conjunction with guest clustering.
For more information, see Private Cloud Computing: Optimizing SQL Server for Private Cloud (http://www.microsoft.com/SqlServerPrivateCloud).

Windows Server Failover Clustering
Windows Server Failover Clustering (WSFC) provides infrastructure features that support the high availability and disaster recovery scenarios of hosted server applications such as Microsoft SQL Server. If a WSFC cluster node or service fails, the services or resources that were hosted on that node can be automatically or manually transferred to another available node in a process known as failover. With AlwaysOn solutions, this process applies to both FCIs and availability groups.
The nodes in the WSFC cluster work together to collectively provide these types of capabilities:
Distributed metadata and notifications. WSFC service and hosted application metadata is maintained on each node in the cluster. This metadata includes WSFC configuration and status in addition to hosted application settings. Changes to the metadata or status on one node are automatically propagated to the other nodes in the cluster.
Resource management. Individual nodes in the cluster may provide physical resources such as direct-attached storage (DAS), network interfaces, and access to shared disk storage. Hosted applications, such as SQL Server, register themselves as a cluster resource, and they can configure startup and health dependencies upon other resources.
Health monitoring. Inter-node and primary node health detection is accomplished through a combination of heartbeat-style network communications and resource monitoring. The overall health of the cluster is determined by the votes of a quorum of nodes in the cluster.
Failover coordination. Each resource is configured to be hosted on a primary node, and each can be automatically or manually transferred to one or more secondary nodes. A health-based failover policy controls automatic transfer of resource ownership between nodes. Nodes and hosted applications are notified when failover occurs so that they can react appropriately.
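The idea that cluster health is decided by the votes of a quorum of nodes can be illustrated with a plain majority check. This Python sketch assumes one vote per node and a hypothetical input format; actual WSFC quorum modes (for example, configurations that add a witness vote) are more nuanced than this model.

```python
# Sketch: simple node-majority check in the spirit of WSFC quorum voting.
# Assumes one vote per node; real quorum modes can add witness votes.
def has_quorum(node_up: dict[str, bool]) -> bool:
    total_votes = len(node_up)
    votes_online = sum(1 for up in node_up.values() if up)
    return votes_online > total_votes // 2  # strictly more than half


if __name__ == "__main__":
    print(has_quorum({"NODE1": True, "NODE2": True, "NODE3": False}))
```

With an even number of votes, losing exactly half the nodes loses quorum under this model, which is one reason odd vote counts are preferred.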
For more information, see Windows Server | Failover Clustering and Node Balancing (http://www.microsoft.com/windowsserver2008/en/us/failover-clustering-main.aspx).
Note: It is now critically important that database administrators understand the inner workings of WSFC clusters and quorum management. AlwaysOn health monitoring, management, and failure recovery steps are all intrinsically tied to your WSFC configuration.
WSFC Storage Configurations
Windows Server Failover Clustering relies upon each node in the cluster to manage its connected storage devices, disk volumes, and file system. WSFC assumes that the storage subsystem is extremely robust; therefore, if the storage device attached to a node is unavailable, the cluster node is considered to be at fault.
For write-based operations, a disk volume is logically attached to a single cluster node at a time using a SCSI-3 persistent reservation. Depending upon storage subsystem capabilities and configuration, if a node fails, logical ownership of the disk volume can be transferred to another node in the cluster.
SQL Server AlwaysOn solutions both leverage and are restricted to certain WSFC storage configuration combinations, including:
Direct-attached vs. remote. Storage devices are either directly and physically attached to the server, or presented by a remote device through a network or host bus adapter (HBA). Remote storage technologies include Storage Area Network (SAN) based solutions such as iSCSI or Fibre Channel, as well as Server Message Block (SMB) file share based solutions.
Symmetric vs. asymmetric. Storage devices are considered symmetric if exactly the same logical disk volume configuration and file paths are presented to each node in the cluster. The physical implementation and capacity of the underlying disk volumes can vary.
Dedicated vs. shared. Dedicated storage is reserved for use by, and assigned to, a single node in the cluster. Shared storage is accessible to multiple nodes in the cluster. Control and ownership of compliant shared storage devices can be transferred from one node to another using SCSI-3 protocols. WSFC supports the concurrent multi-node hosting of cluster shared volumes for file sharing purposes. However, SQL Server does not support concurrent multi-node access to a shared volume.
Note: SQL Server FCIs still require symmetric shared storage that is accessible by all possible node owners of the instance. However, with the introduction of AlwaysOn Availability Groups, you can now deploy different non-FCI instances of SQL Server in a WSFC cluster, each with its own unique, dedicated, local or remote storage.
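The symmetric-storage definition above (identical logical volumes and file paths on every node, with capacity free to vary) lends itself to a simple comparison. This Python sketch assumes a hypothetical input mapping each node to the volume paths it sees and their capacities; it is an illustration of the definition, not a replacement for the cluster validation wizard.

```python
# Sketch: decide whether storage is symmetric in the sense described above.
# Input format (node -> {volume path: capacity in GB}) is hypothetical.
def is_symmetric(presented: dict[str, dict[str, int]]) -> bool:
    # Only the set of paths matters; capacities are allowed to differ per node.
    layouts = {frozenset(volumes) for volumes in presented.values()}
    return len(layouts) <= 1


if __name__ == "__main__":
    nodes = {
        "NODE1": {"E:\\Data": 500, "F:\\Log": 100},
        "NODE2": {"E:\\Data": 1000, "F:\\Log": 100},  # larger disk, same layout
    }
    print(is_symmetric(nodes))
```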
WSFC Resource Health Detection and Failover
Each resource in a WSFC cluster node can report its status and health, periodically or on demand. A variety of circumstances may indicate a cluster resource failure, including power failure, disk or memory errors, network communication errors, misconfiguration, or nonresponsive services.
You can make WSFC cluster resources such as networks, storage, or services dependent upon one another. The cumulative health of a resource is determined by successive rollup of its own health with the health of each of its resource dependencies.
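The rollup rule means that a resource counts as healthy only if it and every resource it depends on, directly or transitively, are healthy. This Python sketch models that rule with hypothetical resource names and assumes the dependency map is acyclic; it does not reproduce WSFC's internal implementation.

```python
# Sketch: cumulative health as a rollup of a resource's own health with the
# health of each dependency. Resource names are hypothetical; the dependency
# map is assumed to be acyclic.
def cumulative_health(resource: str, own_health: dict[str, bool],
                      depends_on: dict[str, list[str]]) -> bool:
    memo: dict[str, bool] = {}

    def healthy(name: str) -> bool:
        if name not in memo:
            memo[name] = own_health[name] and all(
                healthy(dep) for dep in depends_on.get(name, []))
        return memo[name]

    return healthy(resource)


if __name__ == "__main__":
    own = {"SQL Server": True, "Network Name": True, "IP Address": False, "Disk": True}
    deps = {"SQL Server": ["Network Name", "Disk"], "Network Name": ["IP Address"]}
    print(cumulative_health("SQL Server", own, deps))  # IP failure propagates upward
```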
ForAlwaysOnAvailabilityGroups,theavailabilitygroupandtheavailabilitygrouplistenerareregistered
asWSFCclusterresources.ForAlwaysOnFailoverClusterInstances,theSQLServerserviceandtheSQL
ServerAgentserviceareregisteredasWSFCclusterresources,andbotharemadedependentuponthe
instancesvirtualnetworknameresource.
If a WSFC cluster resource experiences a set number of errors or failures over a period of time, the configured failover policy causes the cluster service to do one of the following:
Restart the resource on the current node.
Set the resource offline.
Initiate an automatic failover of the resource and its dependencies to another node.
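A failover policy of the kind described above can be sketched as a sliding window of failure timestamps. This is a hypothetical model, not WSFC's actual implementation: `max_failures`, `period_s`, and `allow_failover` are illustrative parameters, and the sketch assumes the policy restarts locally while within its allowance and escalates to failover (or to taking the resource offline) once the allowance is exceeded.

```python
from __future__ import annotations

from collections import deque
from enum import Enum


class Action(Enum):
    RESTART = "restart the resource on the current node"
    OFFLINE = "set the resource offline"
    FAILOVER = "fail over the resource and its dependencies to another node"


class FailurePolicy:
    """Hypothetical sliding-window policy; the thresholds are illustrative only."""

    def __init__(self, max_failures: int, period_s: float, allow_failover: bool = True):
        self.max_failures = max_failures      # failures tolerated per window
        self.period_s = period_s              # window length in seconds
        self.allow_failover = allow_failover  # False: take the resource offline instead
        self._failures: deque[float] = deque()

    def on_failure(self, now_s: float) -> Action:
        # Record this failure, then forget failures older than the window.
        self._failures.append(now_s)
        while now_s - self._failures[0] > self.period_s:
            self._failures.popleft()
        if len(self._failures) <= self.max_failures:
            return Action.RESTART  # still within the allowance: restart locally
        return Action.FAILOVER if self.allow_failover else Action.OFFLINE
```

For example, with `max_failures=2` and `period_s=60`, failures at 0 s and 10 s each trigger a restart, a third failure at 20 s triggers failover, and a lone failure at 200 s is back within the allowance.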

Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery 11

Note: WSFC cluster resource health detection has no direct impact on the individual nodes' health or the overall health of the cluster.
WSFC Cluster Validation Wizard
The cluster validation wizard is a feature that is integrated into failover clustering in Windows Server 2008 and Windows Server 2008 R2. It is a key tool that database administrators can use to help ensure that a clean, healthy, stable WSFC environment exists before they deploy a SQL Server AlwaysOn solution.
With the cluster validation wizard, you can run a set of focused tests on either a collection of servers that you intend to use as nodes in a cluster, or on an existing cluster. This process tests the underlying hardware and software directly, and individually, to obtain an accurate assessment of how well a WSFC cluster would be supported on a given configuration.
This validation process consists of a series of tests and data collection on each node in these categories:
Inventory. Information on BIOS versions, environment levels, host bus adapters, RAM, operating system versions, devices, services, drivers, and so on.
Network. Information on NIC binding order, network communications, IP configuration, and firewall configuration. Validates inter-node communications on all NICs.
Storage. Information on disks, drive capacity, access latency, file systems, and so on. Validates SCSI commands, disk failover functionality, and symmetric or asymmetric storage configuration.
System configuration. Validates Active Directory configuration, driver signing, memory dump settings, required operating system features and services, compatible processor architecture, and service pack and Windows Software Update levels.
The results of these validation tests give you the information needed to fine-tune a cluster configuration, track the configuration, and identify potential cluster configuration issues before they cause downtime. You can save a report of the test results as an HTML document for later reference.
You should run these tests before and after you make any changes to the WSFC configuration, before you install SQL Server, and as a part of any disaster recovery process. A cluster validation report is required by Microsoft Customer Support Services (CSS) as a condition of Microsoft supporting a given WSFC cluster configuration.
For more information, see Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster (http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx).
Note: If your cluster configuration has asymmetric storage, as is the case with hardware-based geo-clustering storage solutions, or as may be the case with AlwaysOn Availability Groups, you may need to apply a number of hotfixes to prevent the cluster validation wizard from failing the storage validation steps.
For more information, see Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff878487(SQL.110).aspx#SystemReqsForAOAG).
WSFC Quorum Modes and Voting Configuration
WSFC uses a quorum-based approach to monitoring overall cluster health and maximizing node-level fault tolerance. A fundamental understanding of WSFC quorum modes and node voting configuration is very important to designing, operating, and troubleshooting your AlwaysOn high availability and disaster recovery solution.
Cluster Health Detection by Quorum
Each node in a WSFC cluster participates in periodic heartbeat communication to share the node's health status with the other nodes. Unresponsive nodes are considered to be in a failed state.
A quorum node set is a majority of the voting nodes and witnesses in the WSFC cluster. The overall health and status of a WSFC cluster is determined by a periodic quorum vote. The presence of a quorum means that the cluster is healthy enough to provide node-level fault tolerance.
The absence of a quorum indicates that the cluster is not healthy. Overall WSFC cluster health must be maintained in order to ensure that healthy secondary nodes are available for primary nodes to fail over to. If the quorum vote fails, the entire WSFC cluster is set offline as a precautionary measure. This also causes all SQL Server instances registered with the cluster to be stopped.
Note: If a WSFC cluster is set offline because of quorum failure, manual intervention is required to bring it back online. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.
Quorum Modes
A quorum mode is configured at the WSFC cluster level to specify the methodology used for quorum voting. The Failover Cluster Manager utility recommends a quorum mode based on the number of nodes in the cluster.
One of the following quorum modes determines what constitutes a quorum of votes:
Node Majority. More than one-half of the voting nodes in the cluster must vote affirmatively for the cluster to be healthy.
Node and File Share Majority. Similar to Node Majority quorum mode, except that a remote file share is also configured as a voting witness, and connectivity from any node to that share is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy.
As a best practice, the witness file share should not reside on any node in the cluster, and it should be visible to all nodes in the cluster.
Node and Disk Majority. Similar to Node Majority quorum mode, except that a shared disk cluster resource is also designated as a voting witness, and connectivity from any node to that shared disk is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy.
Disk Only. A shared disk cluster resource is designated as a witness, and connectivity by any node to that shared disk is counted as an affirmative vote.
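The "more than half of the possible votes" rule in the quorum modes above can be sketched as a small function. This is a simplified model under stated assumptions: each voting node and each witness carries exactly one vote, `witness_reachable` stands for "at least one running node can reach the witness", and the function names are hypothetical rather than part of any WSFC interface.

```python
from enum import Enum


class QuorumMode(Enum):
    NODE_MAJORITY = "Node Majority"
    NODE_AND_FILE_SHARE_MAJORITY = "Node and File Share Majority"
    NODE_AND_DISK_MAJORITY = "Node and Disk Majority"
    DISK_ONLY = "Disk Only"


def has_quorum(mode: QuorumMode, voting_nodes: int, voting_nodes_up: int,
               witness_reachable: bool = False) -> bool:
    """Return True if the affirmative votes satisfy the given quorum mode.

    Simplified model: every voting node and the witness (file share or disk)
    carries exactly one vote. `witness_reachable` means at least one running
    node can reach the witness.
    """
    if mode is QuorumMode.DISK_ONLY:
        # Only connectivity to the shared witness disk is counted.
        return witness_reachable
    has_witness = mode is not QuorumMode.NODE_MAJORITY
    possible = voting_nodes + (1 if has_witness else 0)
    affirmative = voting_nodes_up + (1 if has_witness and witness_reachable else 0)
    # "More than half" of the possible votes, written without division.
    return 2 * affirmative > possible
```

The sketch shows why an even number of voting nodes is fragile: with four voting nodes split two and two, Node Majority loses quorum on both sides, while Node and File Share Majority keeps quorum on whichever side can still reach the witness share.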
For more information, see Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Cluster (http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx).
Note: Unless each node in the cluster is configured to use the same shared storage quorum witness disk, you should generally use the Node Majority quorum mode if you have an odd number of voting nodes, or the Node and File Share Majority quorum mode if you have an even number of voting nodes.
Voting and Non-Voting Nodes
By default, each node in the WSFC cluster is included as a member of the cluster quorum; each node, file share witness, and disk witness has a single vote in determining the overall cluster health. The quorum discussion to this point in this paper has carefully qualified the set of WSFC cluster nodes that vote on cluster health as voting nodes. In some circumstances, you may not want every node to have a vote.
Each node in a WSFC cluster continuously attempts to establish a quorum. No individual node in the cluster can definitively determine that the cluster as a whole is healthy or unhealthy. At any given moment, from the perspective of each node, some of the other nodes may appear to be offline, appear to be in the process of failover, or appear unresponsive due to a network communication failure. A key function of the quorum vote is to determine whether the apparent state of each node in the WSFC cluster is indeed the actual state of that node.
For all of the quorum modes except Disk Only, the effectiveness of a quorum vote depends on reliable communications among all of the voting nodes in the cluster. You should trust the quorum vote when all nodes are on the same physical subnet.
However, if a node on another subnet is seen as nonresponsive in a quorum vote, but it is actually online and otherwise healthy, that is most likely due to a network communications failure between subnets. Depending upon the cluster topology, quorum mode, and failover policy configuration, that network communications failure may effectively create more than one set (or subset) of voting nodes.
If more than one subset of voting nodes is able to establish a quorum on its own, that is known as a split-brain scenario. In such a scenario, the nodes in the separate quorums may behave differently, and in conflict with one another.
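The reason the majority rule alone cannot produce a split brain is arithmetic: disjoint subsets of voting nodes share one vote total, so at most one of them can hold more than half of it. The hypothetical helper below (not a WSFC API) makes that concrete and shows that only a manual override, modeled here as the `forced` set of partition indexes, lets a second partition come online.

```python
from __future__ import annotations


def partitions_with_quorum(total_votes: int, partition_votes: list[int],
                           forced: frozenset[int] = frozenset()) -> list[int]:
    """Return the indexes of partitions that would bring the cluster online.

    A partition qualifies if it holds more than half of *all* configured
    votes, or if an administrator forced quorum on it (indexes in `forced`).
    """
    if sum(partition_votes) > total_votes:
        raise ValueError("partitions cannot hold more votes than exist")
    return [i for i, votes in enumerate(partition_votes)
            if 2 * votes > total_votes or i in forced]
```

With five votes split three and two, only the first partition qualifies; a three-way split of two, two, and one leaves no partition with quorum; forcing quorum on the two-vote partition yields two running partitions, which is the split-brain case.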
Note: The split-brain scenario is possible only if a system administrator manually performs a forced quorum operation, or in very rare circumstances, a forced manual failover, explicitly subdividing the quorum node set. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.
To simplify your quorum configuration and increase uptime, you may want to adjust a node's NodeWeight setting (a value of 0 or 1) so that the node's vote is not counted toward the quorum.
Recommended Adjustments to Quorum Voting
To determine the recommended quorum voting configuration for the cluster, apply these guidelines, in sequential order:
1. No vote by default. Assume that each node should not vote without explicit justification.
2. Include all primary nodes. Each node that hosts an AlwaysOn Availability Group primary replica or is the preferred owner of the AlwaysOn Failover Cluster Instance should have a vote.
3. Include possible automatic failover owners. Each node that could host a primary replica or FCI, as the result of an automatic failover, should have a vote.
4. Exclude secondary site nodes. In general, do not give votes to nodes that reside at a secondary disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to take the cluster offline when there is nothing wrong with the primary site.
5. Odd number of votes. If necessary, add a witness file share, a witness node (with or without a SQL Server instance), or a witness disk to the cluster and adjust the quorum mode to prevent possible ties in the quorum vote.
6. Reassess vote assignments post-failover. You do not want to fail over into a cluster configuration that does not support a healthy quorum.
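Guidelines 1 through 5 can be sketched as a function that proposes a NodeWeight per node. This is a hypothetical planning aid, not a SQL Server or WSFC tool: the `Node` fields are assumptions, later guidelines are assumed to override earlier ones (so guideline 4 removes a vote that guideline 2 or 3 granted), and guideline 5 is reduced to a flag that says whether a witness vote is needed to make the total odd.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    hosts_primary: bool = False              # primary replica or preferred FCI owner
    automatic_failover_target: bool = False  # could become primary automatically
    secondary_site: bool = False             # located at the DR site


def recommend_votes(nodes: list[Node]) -> tuple[dict[str, int], bool]:
    """Apply guidelines 1-5 in order; return (NodeWeight per node, add_witness)."""
    weights: dict[str, int] = {}
    for node in nodes:
        weight = 0                                           # 1. no vote by default
        if node.hosts_primary or node.automatic_failover_target:
            weight = 1                                       # 2. and 3.
        if node.secondary_site:
            weight = 0                                       # 4. exclude DR-site nodes
        weights[node.name] = weight
    # 5. An even vote total can tie, so suggest adding a witness vote.
    add_witness = sum(weights.values()) % 2 == 0
    return weights, add_witness
```

Guideline 6 fits this sketch naturally: after a failover, update the node flags to reflect the new topology and run the function again before accepting the configuration.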
For more information on adjusting node votes, see Configure Cluster Quorum NodeWeight Settings (http://msdn.microsoft.com/en-us/library/hh270281(SQL.110).aspx).
You cannot adjust the vote of a file share witness. Instead, you must select a different quorum mode to include or exclude its vote.
Note: SQL Server exposes several system dynamic management views (DMVs) that can help you administer settings related to WSFC cluster configuration and node quorum voting.
For more information, see Monitor Availability Groups (http://msdn.microsoft.com/en-us/library/ff878305(SQL.110).aspx).
WSFC Disaster Recovery through Forced Quorum
Quorum failure is usually caused by a systemic disaster or a persistent communications failure involving several nodes in the WSFC cluster. Remember that quorum failure causes all clustered services, SQL Server instances, and availability groups in the WSFC cluster to be set offline, because the cluster cannot ensure node-level fault tolerance. A quorum failure means that healthy voting nodes in the WSFC cluster no longer satisfy the quorum model. Some nodes may have failed completely, and some may have just shut down the WSFC service and are otherwise healthy, except for the loss of the ability to communicate with a quorum.
To bring the WSFC cluster back online, you must correct the root cause of the quorum failure on at least one node under the existing configuration. In a disaster scenario, you may need to reconfigure or identify alternative hardware to use. You may also want to reconfigure the remaining nodes in the WSFC cluster to reflect the surviving cluster topology.
You can use the forced quorum procedure on a WSFC cluster node to override the safety controls that took the cluster offline. This effectively tells the cluster to suspend the quorum voting checks, and lets you bring the WSFC cluster resources and SQL Server back online on any of the nodes in the cluster.
This type of disaster recovery process should include the following steps:
1) Determine the scope of the failure. Identify which availability groups or SQL Server instances are nonresponsive and which cluster nodes are online and available for post-disaster use, and then examine the Windows event logs and the SQL Server system logs. Where practical, you should preserve forensic data and system logs for later analysis.
2) Start the WSFC cluster by using forced quorum on a single node. On an otherwise healthy node, manually force the cluster to come online using the forced quorum procedure. To minimize potential data loss, select a node that was last hosting an availability group primary replica.
For more information, see Force a WSFC Cluster to Start Without a Quorum (http://msdn.microsoft.com/en-us/library/hh270275(v=SQL.110).aspx).
Note: If you use the forced quorum setting, quorum checks are blocked cluster-wide until the WSFC cluster achieves a majority of votes and automatically transitions to a regular quorum mode of operation.
3) Start the WSFC service normally on each otherwise healthy node, one at a time. You do not have to specify the forced quorum option when you start the cluster service on the other nodes.
As the WSFC service on each node comes back online, it negotiates with the other healthy nodes to synchronize the new cluster configuration state. Remember to do this one node at a time to prevent potential race conditions in resolving the last known state of the cluster.
Note: Ensure that each node that you start can communicate with the other newly online nodes, or you run the risk of creating more than one quorum node set; that is a split-brain scenario. If your findings in step 1 are accurate, this should not occur.
4) Apply new quorum mode and node vote configuration. If you successfully restarted all nodes in the cluster using the forced quorum procedure, and if you corrected the root cause of the quorum failure, you do not need to make changes to the original quorum mode and node vote configuration.
Otherwise, you should evaluate the newly recovered cluster node and availability replica topology, and change the quorum mode and vote assignments for each node as appropriate. Set the WSFC cluster service on unrecovered nodes offline, or set their node votes to zero.
Note: At this point, the nodes and SQL Server instances in the cluster may appear to be restored to regular operation. However, a healthy quorum may still not exist. Using Failover Cluster Manager, the AlwaysOn Dashboard within SQL Server Management Studio, or the appropriate DMVs, verify that a healthy quorum has been restored.
5) Recover availability group database replicas as needed. Some databases may recover and come back online on their own as part of the regular SQL Server startup process. The recovery of other databases may require additional manual steps.
You can minimize potential data loss and recovery time for the availability group replicas by bringing them back online in this sequence, if possible: primary replica, synchronous secondary replicas, asynchronous secondary replicas.
6) Repair or replace failed components and re-validate the cluster. Now that you have recovered from the initial disaster and quorum failure, you should repair or replace the failed nodes and adjust related WSFC and AlwaysOn configurations accordingly. This can include dropping availability group replicas, evicting nodes from the cluster, or flattening and reinstalling software on a node.
Note: You must repair or remove all failed availability replicas. SQL Server 2012 does not truncate the transaction log past the last known point of the farthest-behind availability replica. If a failed replica is not repaired or removed from the availability group, the transaction logs will grow and you will run the risk of running out of transaction log space on the other replicas.
7) Repeat step 4 as needed. The goal is to re-establish the appropriate level of fault tolerance and high availability for healthy operations.
8) Conduct RPO/RTO analysis. You should analyze SQL Server system logs, database timestamps, and Windows event logs to determine the root cause of the failure, and to document actual Recovery Point and Recovery Time experiences.
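Two rules from steps 5 and 6 above lend themselves to a short sketch: the preferred order for bringing replicas back online, and the fact that the log cannot be truncated past the farthest-behind replica. This is a hypothetical model, not SQL Server's implementation; `Replica` and its fields are assumptions, and real LSNs are far more structured than the plain integers used here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    role: str           # "primary" or "secondary"
    synchronous: bool   # synchronous-commit vs asynchronous-commit mode
    hardened_lsn: int   # simplified stand-in for the last hardened LSN


def recovery_order(replicas: list[Replica]) -> list[str]:
    """Primary first, then synchronous secondaries, then asynchronous ones."""
    def rank(r: Replica) -> int:
        if r.role == "primary":
            return 0
        return 1 if r.synchronous else 2
    return [r.name for r in sorted(replicas, key=rank)]  # stable within a rank


def log_truncation_limit(replicas: list[Replica]) -> int:
    """The log cannot be truncated past the farthest-behind replica."""
    return min(r.hardened_lsn for r in replicas)
```

In this model a failed asynchronous replica stuck at LSN 300 pins the truncation limit at 300 no matter how far the other replicas advance; removing it from the list moves the limit up to the next-slowest replica, which mirrors why the note above requires failed replicas to be repaired or removed.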
SQL Server Instance-Level Protection
The next layer of protection in an AlwaysOn solution is the data platform itself; these are the capabilities and features offered by Microsoft SQL Server 2012 and its integration with Windows Server infrastructure components.
Availability Improvements: SQL Server Instances
These are new SQL Server 2012 instance-level features that enhance availability both for AlwaysOn Failover Cluster Instances and for standalone instances that host AlwaysOn Availability Groups. These improvements represent enhancements for managing and troubleshooting failover scenarios:
Flexible Failover Policy. The output of the new system stored procedure used for robust failure detection, sp_server_diagnostics, uses the FailureConditionLevel property to convey the severity of a failure affecting the SQL Server instance. A WSFC failover policy governs how this value impacts the SQL Server instance, ranging from relative tolerance of errors to sensitivity to any SQL Server internal component error.
You can configure failover to be triggered by any one of a range of error levels, including: server down, server unresponsive, critical error, moderate error, or any qualified error. The FailureConditionLevel property can be used for FCI or availability group failover policies.
Prior to SQL Server 2012, there was no granularity of error conditions to govern failover; any service-level failure caused failover.
For more information, see Failover Policy for Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ff878664(SQL.110).aspx).
Enhanced instrumentation and logging. There are a number of AlwaysOn-specific system configuration views, DMVs, performance counters, and an extended event health session that capture and dump information needed to troubleshoot, tune, and monitor your AlwaysOn deployment. Many of these are exposed via new SQL Server Policy Management facets and policies.
For more information, see AlwaysOn Availability Groups Dynamic Management Views and Functions (http://msdn.microsoft.com/en-us/library/ff877943(SQL.110).aspx), and sys.dm_os_cluster_nodes (http://msdn.microsoft.com/en-us/library/ms187341(SQL.110).aspx).
SMB file share support. You can place database files on a Windows Server 2008 or later remote file share for both standalone and failover cluster instances, negating the need for a separate drive letter per FCI. This is a good option for storage consolidation or for hosting database file storage on a physical server for a virtual machine guest operating system. With the right configuration, I/O performance can very nearly approximate that of direct attached storage.
For more information, see SQL Databases on File Shares - It's time to reconsider the scenario (http://blogs.msdn.com/b/sqlserverstorageengine/archive/2011/10/18/sqldatabasesonfilesharesitstimetoreconsiderthescenario.aspx).
Note: In a WSFC cluster, you cannot add an SMB file share resource dependency to the SQL Server resource group; you must take separate measures to ensure the availability of the file share. If the file share becomes unavailable, SQL Server throws an I/O exception and goes offline.
WSFC interoperability with DNS. The virtual network name (VNN) for an FCI or availability group listener is registered with DNS only during VNN creation or during configuration changes. All virtual IP addresses, regardless of online or offline state, are registered with DNS under the same virtual network name. Client calls to resolve the virtual network name in DNS return all of the registered IP addresses in a varying round-robin sequence.
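The cumulative error levels listed under Flexible Failover Policy above can be sketched as an ordered scale. The numbering follows the SQL Server documentation for FailureConditionLevel, where values run from 0 to 5, level 0 disables automatic restart and failover, and level 3 is the default; the function and enum names are hypothetical, and the sketch deliberately ignores how sp_server_diagnostics actually classifies a given error.

```python
from enum import IntEnum


class Condition(IntEnum):
    """Failure conditions in the order of the FailureConditionLevel scale."""
    SERVER_DOWN = 1
    SERVER_UNRESPONSIVE = 2
    CRITICAL_ERROR = 3
    MODERATE_ERROR = 4
    ANY_QUALIFIED_ERROR = 5


def policy_reacts(failure_condition_level: int, condition: Condition) -> bool:
    """True if a policy at this level reacts (restart or failover) to `condition`.

    Levels are cumulative: level N covers every condition numbered N or lower,
    and level 0 disables automatic restart and failover entirely.
    """
    if not 0 <= failure_condition_level <= 5:
        raise ValueError("FailureConditionLevel must be between 0 and 5")
    return condition <= failure_condition_level
```

At the default level of 3, for example, a critical error triggers the policy but a moderate error does not; raising the level to 5 makes the instance sensitive to any qualified failure condition.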
AlwaysOn Failover Cluster Instances
The primary purpose of an AlwaysOn SQL Server Failover Cluster Instance (FCI) is to enhance availability of a SQL Server instance hosted on local server and storage hardware within a single data center.
An FCI is a single logical SQL Server instance that is installed across nodes in a Windows Server Failover Clustering (WSFC) cluster, but is active on only one node at a time. Client applications connect to a virtual network name and virtual IP address that are owned by the active cluster node.
Each installed node has an identical configuration and set of SQL Server binaries. The WSFC cluster service also replicates relevant changes from the active instance's entries in the Windows registry to each installed node. Each node that the FCI is installed on is designated as a possible owner of the instance and its resources, within a preferred failover sequence.
Database files are stored on shared symmetrical storage volumes that are registered as a resource with the WSFC cluster and are owned by the node that currently hosts the FCI.
For more information, see AlwaysOn Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ms189134(SQL.110).aspx).
FCI Failover Process
If a dependent cluster resource fails, an AlwaysOn Failover Cluster Instance interacts with the WSFC cluster service using this high-level process to do a failover:
1) A restart is indicated. A periodic check of the WSFC or SQL Server Failover Policy configuration indicates a failed state. By default, a service restart is attempted before a failover to another node is initiated. A timeout in the restart attempt indicates a resource failure.
2) A failover is indicated. A Failover Policy check indicates the need for a node failover.
3) The SQL Server service is stopped. If currently running, an orderly shutdown of the SQL Server service is attempted.
4) The WSFC cluster resource is transferred. Ownership of the SQL Server cluster resource group and its dependent network and shared storage resources is transferred to the next preferred node owner of the FCI.
5) SQL Server is started on the new node. The SQL Server instance goes through its normal startup procedures. If it does not come back online within a pending timeout period, the cluster service puts the resource on this new node in a failed state.
6) User databases are recovered on the new node. Each user database is placed in recovery mode while transaction log redo operations are applied and uncommitted transactions are rolled back.
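The six steps above can be traced with a small simulation. This is a hypothetical sketch, not how the WSFC cluster service is implemented: the outcomes of the restart attempt and of startup on the new node are passed in as booleans, and the "next preferred owner" is assumed to be the next entry in a list of possible owners, wrapping around at the end.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    owner: str
    online: bool
    steps: list[str]


def fci_failover(preferred_owners: list[str], current: str,
                 restart_ok: bool, start_ok_on_new_node: bool) -> Outcome:
    """Trace the high-level FCI failover steps for one resource failure."""
    steps = [f"restart attempted on {current}"]                 # step 1
    if restart_ok:
        return Outcome(current, True, steps)
    steps += ["failover indicated by policy",                   # step 2
              f"SQL Server stopped on {current}"]               # step 3
    # Next preferred owner after the current one, wrapping around the list.
    i = preferred_owners.index(current)
    target = preferred_owners[(i + 1) % len(preferred_owners)]
    steps.append(f"resource group transferred to {target}")     # step 4
    if not start_ok_on_new_node:
        steps.append(f"startup timed out; resource failed on {target}")
        return Outcome(target, False, steps)
    steps += [f"SQL Server started on {target}",                # step 5
              f"user databases recovered on {target}"]          # step 6
    return Outcome(target, True, steps)
```

A successful restart short-circuits the process on the current node, which matches the default behavior of attempting a restart before any failover is initiated.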
FCI Improvements
Previous versions of SQL Server have offered an FCI installation option; however, several feature enhancements in SQL Server 2012 improve availability robustness and serviceability:
Multi-subnet clustering. SQL Server 2012 supports WSFC cluster nodes that reside in more than one subnet. A given SQL Server instance that resides on a WSFC cluster node can start if any network interface is available; this is known as an OR cluster resource dependency.
Prior versions of SQL Server required that all network interfaces be functional for the SQL Server service to start or fail over, and that they all exist on the same subnet or VLAN.
Note: Storage-level replication between cluster nodes is not implicitly enabled with multi-subnet clustering. Your multi-subnet FCI solution must leverage a third-party SAN-based solution to replicate data and coordinate storage failover between cluster nodes.
For more information, see SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance (http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sqlserver2012alwayson_3a00_multisitefailoverclusterinstance.aspx).
Robust failure detection. The WSFC cluster service maintains a dedicated administrative connection to each SQL Server 2012 FCI on the node. On this connection, a periodic call to a special system stored procedure, sp_server_diagnostics, returns a rich array of system health diagnostic information.
Prior to SQL Server 2012, the primary health detection mechanism for an FCI was implemented as a simple one-way polling process. In this process, the WSFC cluster service periodically created a new SQL client connection to the instance, queried the server name, and then disconnected. A failure to connect, or a query timeout, for whatever reason, triggered a failover with very little available diagnostic information.
For more information, see sp_server_diagnostics (http://msdn.microsoft.com/en-us/library/ff878233(SQL.110).aspx).
There is now broader support for FCI storage scenarios:
Better mount point support. SQL Server setup now recognizes cluster disk mount point settings. The specified cluster disks and all disks mounted to them are automatically added to the SQL Server resource dependency during setup.
tempdb on local storage. FCIs now support placement of tempdb on local non-shared storage, such as a local solid-state drive, potentially offloading a significant amount of I/O from a shared SAN.
Prior to SQL Server 2012, FCIs required tempdb to be located on a symmetrical shared storage volume that failed over with other system databases.
Note: The location of tempdb is stored in the master database, which moves between nodes during failover. The tempdb location must therefore be a valid symmetrical file path (drive, folders, and permissions) on all potential node owners, or else the SQL Server service will not start on some nodes.
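The difference between the SQL Server 2012 OR dependency and the all-interfaces requirement of prior versions, both described under Multi-subnet clustering above, comes down to `any` versus `all`. The function below is a hypothetical illustration, not a WSFC API, and the IP addresses in the usage note are made-up examples of one address per subnet.

```python
def network_dependency_met(ip_online: dict[str, bool], mode: str = "OR") -> bool:
    """Return True if the instance's network dependency is satisfied.

    "OR":  at least one IP address resource is online (multi-subnet, SQL Server 2012).
    "AND": every IP address resource is online (the prior-version requirement).
    """
    if mode == "OR":
        return any(ip_online.values())
    if mode == "AND":
        return all(ip_online.values())
    raise ValueError("mode must be 'OR' or 'AND'")
```

With hypothetical addresses `{"10.0.1.50": False, "10.0.2.50": True}`, where the first subnet's address is offline after a site failure, the OR dependency is satisfied and the instance can start, while the AND rule would keep it offline.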
Database Availability
The high availability capabilities offered by the infrastructure and SQL Server instance-level components work together to implicitly protect hosted databases. An AlwaysOn solution offers an additional set of options for explicitly protecting database data and data-tier applications.
AlwaysOn Availability Groups
An availability group is a set of user databases that fail over together from one SQL Server instance to another within the same WSFC cluster. Client applications can connect to the availability group's databases through a WSFC virtual network name, known as an availability group listener, which abstracts the underlying SQL Server instances.
AlwaysOn Availability Groups rely upon Windows Server Failover Clustering for health monitoring, failover coordination, and server connectivity. You must enable AlwaysOn support on a SQL Server instance that resides on a WSFC cluster node. However, that instance does not have to be an FCI, and it does not require the use of symmetrical shared storage.
For more information, see Overview of AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff877884(SQL.110).aspx).
Availability Replicas and Roles
Each SQL Server instance in the availability group hosts an availability replica that contains a copy of the user databases in the availability group. A SQL Server instance can host only one availability replica from a given availability group, but multiple availability groups may reside on the same instance. The SQL Server instance must have dedicated (non-shared) storage volumes.
One of the availability replicas serves in the role of primary replica. It is designated as the master copy of the availability group databases and is enabled for read/write operations.
An availability group can contain from one to four additional read-only availability replicas that each separately serve in the role of a secondary replica.
Availability Replica Synchronization
The contents of each database in an availability group are synchronized from the primary replica to each of the secondary replicas through a mechanism of SQL Server log-based data movement. For this reason, all databases in the availability group must be set to the full recovery model.
Secondary replicas are initialized with a full backup and restore of the primary replica's databases and transaction logs. As new transactions are committed on the primary replica, the corresponding portion of the transaction log is cached, queued, and then sent over the network to a database mirroring endpoint on each of the secondary replica nodes.
In this manner, new entries in the primary replica transaction log are appended onto each of the secondary replicas' transaction logs. Each secondary replica periodically communicates a log sequence number (LSN) back to the primary replica to indicate a watermark of how much of its transaction log has been hardened and flushed to the remote disk.
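The LSN watermark exchange described above can be sketched from the primary's point of view. This is a hypothetical model: `PrimaryLogView` is not a SQL Server API, LSNs are simplified to integers that grow as log records are written, and the sketch assumes reported watermarks never move backward and never exceed what the primary has actually written.

```python
class PrimaryLogView:
    """Hypothetical model of how a primary tracks each secondary's watermark.

    LSNs are simplified to plain integers that grow as log records are written.
    """

    def __init__(self, secondaries: list[str]):
        self.end_of_log = 0                                # primary's newest LSN
        self.hardened = {name: 0 for name in secondaries}  # last LSN each secondary reported

    def write_log(self, records: int) -> int:
        """Append records to the primary log; this tail is what gets shipped."""
        self.end_of_log += records
        return self.end_of_log

    def report(self, secondary: str, lsn: int) -> None:
        """Record a secondary's hardened-LSN report; watermarks never move backward."""
        self.hardened[secondary] = max(self.hardened[secondary], min(lsn, self.end_of_log))

    def lag(self, secondary: str) -> int:
        """How many LSNs of log the secondary has not yet hardened."""
        return self.end_of_log - self.hardened[secondary]
```

For example, after the primary writes up to LSN 100 and two secondaries report 100 and 60, their lags are 0 and 40; a late, stale report of 50 from the second secondary leaves its watermark at 60.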
Note: Each availability replica has its own set of independent transaction log redo threads that are not part of the availability replica synchronization process. You may perceive delays in the log redo process on the secondary replicas as data latency.
In addition to having a role of primary or secondary, each availability replica also has an availability mode, which governs the coordination of hardening the transaction logs during a COMMIT TRAN statement:
Synchronous-commit mode. The primary replica commits a given transaction only after all synchronous-commit secondary replicas acknowledge that they have finished hardening their respective transaction logs past that transaction's LSN. An availability group can have up to 2 synchronous-commit secondary replicas.
Synchronous-commit mode introduces transaction latency on the primary replica databases, but it ensures that there is no data loss on the secondary replicas for committed transactions.
Asynchronous-commit mode. The primary replica commits transactions after hardening the local transaction log, but it does not wait for acknowledgement that an asynchronous-commit secondary replica has hardened its transaction log. An availability group can have up to 4 asynchronous-commit secondary replicas, but no more than a total of 4 secondary replicas of any type.
Asynchronous-commit mode minimizes transaction latency on the primary replica databases but allows the secondary replica transaction logs to lag behind, making some data loss possible.
For more information, see Availability Modes (http://msdn.microsoft.com/en-us/library/ff877931(SQL.110).aspx).
The overall health of the data flow between the availability replicas is indicated by the synchronization state of each replica. You will most likely experience data loss if you fail over to a secondary replica with a synchronization state of anything other than Synchronized or Synchronizing.
Each secondary replica's synchronization stream has a session timeout property. When a secondary replica configured for synchronous-commit availability mode fails with a session timeout, it is temporarily marked internally as asynchronous. This is done so that the secondary replica failure does not impact hardening of the transaction log on the primary replica. After that secondary replica is healthy and caught back up with the primary replica, it automatically reverts to normal synchronous-commit mode operations.
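The commit rules and replica limits described above can be sketched as two small functions. This is a simplified, hypothetical model rather than SQL Server internals: `Secondary`, `validate`, and `commit_can_complete` are illustrative names, LSNs are integers, and a synchronous-commit secondary whose session has timed out is modeled by `connected=False`, which causes the primary to stop waiting for it, as the text describes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Secondary:
    name: str
    synchronous: bool        # configured availability mode
    connected: bool = True   # False after a session timeout
    hardened_lsn: int = 0    # simplified integer LSN


def validate(secondaries: list[Secondary]) -> None:
    """Check the SQL Server 2012 replica limits described above."""
    if len(secondaries) > 4:
        raise ValueError("at most 4 secondary replicas of any type")
    if sum(s.synchronous for s in secondaries) > 2:
        raise ValueError("at most 2 synchronous-commit secondary replicas")


def commit_can_complete(commit_lsn: int, primary_hardened_lsn: int,
                        secondaries: list[Secondary]) -> bool:
    """Can the primary acknowledge a COMMIT whose log record ends at commit_lsn?"""
    if primary_hardened_lsn < commit_lsn:
        return False  # the local log is always hardened first
    # Only connected synchronous-commit secondaries are waited on; a timed-out
    # one is treated as asynchronous until it reconnects and catches up.
    waited_on = [s for s in secondaries if s.synchronous and s.connected]
    return all(s.hardened_lsn >= commit_lsn for s in waited_on)
```

The asynchronous secondary never appears in `waited_on`, which is the source of both the lower commit latency and the possible data loss described for asynchronous-commit mode.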
Availability Group Failover
The availability group and a corresponding virtual network name are registered as resources in the WSFC cluster. An availability group fails over at the level of an availability replica, based upon the health and failover policy of the primary replica.
An availability group failover policy uses the FailureConditionLevel property to indicate the severity tolerance level for a failure affecting the availability group, in conjunction with the sp_server_diagnostics system stored procedure. This same mechanism is used for FCI failover policies.
In the event of a failover, instead of transferring ownership of shared physical resources to another node, WSFC is leveraged to reconfigure a secondary replica on another SQL Server instance to take over the role of primary replica. The availability group's virtual network name resource is then transferred to that instance. All client connections to the involved availability replicas are reset.
Based upon the current health, synchronization state, and availability mode of the replicas, each replica has a composite failover readiness state that indicates the potential for data loss. This replica health information is viewable in the AlwaysOn Dashboard, or in the sys.dm_hadr_availability_replica_states system view.
Each availability replica also has a configured failover mode, which governs replica behavior when failover is indicated.
Automatic failover (without data loss). This allows for the fastest failover time of any AlwaysOn configuration because the secondary replica transaction log is already hardened and synchronized. Open transactions on the primary replica are rolled back, and the primary replica role is transferred to a secondary replica without any user intervention.
The primary and secondary replicas must be set to automatic failover mode, and both must be set to synchronous-commit availability mode. The synchronization state between the replicas must be Synchronized. Additionally, the WSFC cluster must have a healthy quorum.
Automatic failover is not supported if the primary or secondary replica resides on an FCI. This is blocked to prevent a potential race condition between availability group and FCI failovers.
Manual failover. This allows the administrator to assess the state of the primary replica and decide whether to deliberately fail over to a secondary replica.
Depending upon the availability mode and synchronization state, you have these choices:
o Planned manual failover (without data loss). You can perform this type of failover only if both the primary and secondary replicas are healthy and in a Synchronized state. This is functionally equivalent to an automatic failover.
o Forced manual failover (allowing potential data loss). This is the only form of failover that is possible if the target secondary replica is in asynchronous-commit availability mode, or if it is not synchronized with the primary replica.
Warning: You should use this failover option in a disaster recovery situation only. If the primary replica is healthy and available, you should change the availability mode of the involved replicas to synchronous-commit and then perform a planned manual failover.
For more information, see Perform a Forced Manual Failover of an Availability Group (http://msdn.microsoft.com/en-us/library/ff877957(SQL.110).aspx).
You must perform a manual failover if any of the following conditions are true about either the primary replica or the secondary replica that you want to fail over to:
Failover mode is set to manual.
Availability mode is set to asynchronous commit.
Replica resides on an FCI.
For more information, see Failover Modes (AlwaysOn Availability Groups) (http://msdn.microsoft.com/en-us/library/hh213151(SQL.110).aspx).
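The failover rules above can be condensed into one hypothetical decision function that lists which failover types a given primary/target pair is eligible for. It is a sketch under stated assumptions, not SQL Server logic: `ReplicaConfig` and the boolean inputs are illustrative, "forced manual" is always listed because it is the fallback of last resort (the text reserves it for disaster recovery), and health and quorum checks are reduced to simple flags.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReplicaConfig:
    automatic_failover: bool   # failover mode: automatic (True) or manual (False)
    synchronous: bool          # availability mode: synchronous-commit (True)
    on_fci: bool = False       # replica hosted by a failover cluster instance


def eligible_failover_types(primary: ReplicaConfig, target: ReplicaConfig,
                            synchronized: bool, both_healthy: bool,
                            healthy_quorum: bool) -> list[str]:
    pair = (primary, target)
    both_sync = all(r.synchronous for r in pair)
    types = []
    # Automatic: both automatic + synchronous-commit, Synchronized, no FCI, quorum.
    if (all(r.automatic_failover for r in pair) and both_sync and synchronized
            and not any(r.on_fci for r in pair) and healthy_quorum):
        types.append("automatic")
    # Planned manual: both healthy and Synchronized (so both synchronous-commit).
    if both_sync and synchronized and both_healthy:
        types.append("planned manual")
    # Always available as a last resort, but it allows data loss.
    types.append("forced manual")
    return types
```

Hosting the target on an FCI, for example, removes "automatic" from the list but leaves "planned manual", which matches the rule that replicas on an FCI require manual failover.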
Note: After a failover, if the new primary replica is not set to the synchronous-commit mode, the secondary replicas will indicate a Suspended synchronization state. No data will flow to the secondary replicas until the primary replica is set to synchronous-commit mode.
Availability Group Listener
An availability group listener is a WSFC virtual network name (VNN) that clients can use to access a database in the availability group. The VNN cluster resource is owned by the SQL Server instance on which the primary replica resides.
The virtual network name is registered with DNS only during availability group listener creation or during configuration changes. All virtual IP addresses that are defined in the availability group listener are registered with DNS under the same virtual network name.
To use the availability group listener, a client connection request must specify the virtual network name as the server, and a database name that is in the availability group. By default, this should result in a connection to the SQL Server instance that is hosting the primary replica.
At run time, the client uses its local DNS resolver to get a list of IP addresses and TCP ports that map to the virtual network name. The client then attempts to connect to each of the IP addresses, until it is successful, or until it reaches the connection timeout. The client will attempt to make these connections in parallel if the MultiSubnetFailover parameter is set to true, enabling much faster client failovers.
In the event of a failover, client connections are reset on the server, ownership of the availability group listener moves with the primary replica role to a new SQL Server instance, and the VNN endpoint is bound to the new instance's virtual IP addresses and TCP ports.
For more information, see Client Connectivity and Application Failover (http://msdn.microsoft.com/en-us/library/hh213417(SQL.110).aspx).
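To see why parallel attempts matter when the listener has IP addresses in several subnets, the following simplified timing model compares one-at-a-time attempts with MultiSubnetFailover-style parallel attempts. It is an illustration, not client library code: an offline address is assumed to cost a fixed TCP timeout (21 seconds here), the IP addresses are hypothetical, and retries and DNS caching are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ListenerIp:
    ip: str
    online: bool             # True only for the IP in the subnet currently hosting the primary
    connect_seconds: float   # handshake time when the IP is online


def sequential_connect(ips: List[ListenerIp], tcp_timeout: float,
                       connect_timeout: float) -> Tuple[Optional[str], float]:
    """Try each registered IP in turn, as clients without parallel attempts do."""
    elapsed = 0.0
    for addr in ips:
        cost = addr.connect_seconds if addr.online else tcp_timeout
        if elapsed + cost > connect_timeout:
            return None, connect_timeout      # the connection timeout expires first
        elapsed += cost
        if addr.online:
            return addr.ip, elapsed
    return None, elapsed                       # every IP failed


def parallel_connect(ips: List[ListenerIp],
                     connect_timeout: float) -> Tuple[Optional[str], float]:
    """Try all IPs at once (MultiSubnetFailover=True); the fastest online IP wins."""
    online = [a for a in ips if a.online]
    if not online:
        return None, connect_timeout
    best = min(online, key=lambda a: a.connect_seconds)
    if best.connect_seconds > connect_timeout:
        return None, connect_timeout
    return best.ip, best.connect_seconds
```

With the offline address listed first and a 15-second connection timeout, the sequential model gives up before it reaches the online address, while the parallel model connects almost immediately.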

Application Intent Filtering
While connecting through the availability group listener, the application can specify whether its intent is to both read and write data or whether it will exclusively perform read-only operations. If not specified, the default application intent for the client is read-write.
For the primary role and secondary role of each availability replica, you can also specify a connection access property that is used as a connection-level filter on the client's application intent. By default, invalid application intent and connection access combinations result in a refused connection. SQL Server filters client connection requests using the following rules.
While the availability replica is in the primary role, and connection access is equal to:
Allow any application intent. Do not filter any client connections for application intent.
Allow only explicit read/write intent. If the client specifies read-only, reject the connection.
While the availability replica is in the secondary role, and connection access is equal to:
No connections allowed. Refuse all connections; the replica is used only for disaster recovery.
Allow any application intent. Do not filter any client connections for application intent.
Read-only application intent. If the client does not specify read-only, reject the connection.
For more information, see Configure Connection Access on an Availability Replica (http://msdn.microsoft.com/en-us/library/hh213002(SQL.110).aspx).
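The filtering rules above form a small decision table. This sketch encodes them; the connection access values mirror the T-SQL `ALLOW_CONNECTIONS` options (`ALL`/`READ_WRITE` for the primary role, `NO`/`READ_ONLY`/`ALL` for the secondary role), while the function name `accepts_connection` is hypothetical and only illustrates the rules, not the server's implementation.

```python
from typing import Optional

# Connection access values allowed for each role, named after the
# T-SQL ALLOW_CONNECTIONS options.
PRIMARY_ACCESS = {"ALL", "READ_WRITE"}
SECONDARY_ACCESS = {"NO", "ALL", "READ_ONLY"}


def accepts_connection(role: str, allow_connections: str,
                       application_intent: Optional[str] = None) -> bool:
    """Return True if a replica in `role` accepts a client with this intent."""
    intent = application_intent or "ReadWrite"   # unspecified intent defaults to read-write
    if role == "PRIMARY" and allow_connections in PRIMARY_ACCESS:
        if allow_connections == "ALL":
            return True                           # no filtering on intent
        return intent != "ReadOnly"               # READ_WRITE: reject explicit read-only
    if role == "SECONDARY" and allow_connections in SECONDARY_ACCESS:
        if allow_connections == "NO":
            return False                          # disaster recovery only
        if allow_connections == "ALL":
            return True                           # no filtering on intent
        return intent == "ReadOnly"               # READ_ONLY: require read-only intent
    raise ValueError(f"invalid combination: {role}/{allow_connections}")
```

For example, a secondary configured for read-only intent refuses a client that omits ApplicationIntent, because that client defaults to read-write.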
Application Intent Read-Only Routing
A key value proposition for AlwaysOn Availability Groups is the ability to leverage your standby hardware infrastructure for purposes other than disaster recovery. By configuring one or more of your secondary replicas for read-only access, you can offload significant workloads from your primary replicas.
Workloads that can be readily adapted to run on a read-only secondary replica include reporting, database backups, database consistency checks, index fragmentation analysis, data pipeline extraction, operational support, and ad hoc queries.
For each availability replica, you can optionally configure a sequential read-only routing list of SQL Server instance endpoints to be applied while that replica is in the primary role. If present, this list is used to redirect client connection requests that specify read-only application intent to the first available secondary replica in the list that satisfies the application intent filters noted earlier.
Note: The read-only routing redirection is performed by the availability group listener, which is bound to the primary replica. If the primary replica is offline, client redirection will not function.
For more information, see Configure Read-Only Routing on an Availability Group (SQL Server) (http://msdn.microsoft.com/en-us/library/hh653924(SQL.110).aspx).
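The routing behavior described above can be sketched as a walk over the ordered routing list. This is a simplified model with hypothetical names (`ReplicaState`, `route_connection`, replicas `SQL1` to `SQL3`). It assumes that a secondary is eligible when it is online and its secondary-role connection access admits read-only intent. It returns `None` when the primary is offline or no listed secondary qualifies, and does not attempt to reproduce what SQL Server itself does in those cases.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReplicaState:
    name: str
    online: bool
    role: str                 # "PRIMARY" or "SECONDARY"
    secondary_access: str     # secondary-role connection access: "NO", "READ_ONLY", or "ALL"


def route_connection(primary: ReplicaState, routing_list: List[str],
                     replicas: Dict[str, ReplicaState],
                     application_intent: str = "ReadWrite") -> Optional[str]:
    if not primary.online:
        return None                    # the listener is bound to the primary: no redirection
    if application_intent != "ReadOnly":
        return primary.name            # read-write requests stay on the primary
    for name in routing_list:          # walk the list in order; the first eligible replica wins
        r = replicas.get(name)
        if (r is not None and r.online and r.role == "SECONDARY"
                and r.secondary_access in ("READ_ONLY", "ALL")):
            return name
    return None                        # no eligible secondary in this simplified model
```

With a routing list of `SQL2` then `SQL3` and `SQL2` offline, a read-only request is routed to `SQL3`, while a request without read-only intent stays on the primary.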

Availability Improvements – Databases
SQL Server 2012 has a number of feature enhancements that are specific to database configuration and capabilities.
The following improvement reduces recovery time:
Predictable Recovery Time. You can set a target recovery time interval per database, which is used to control the scheduling of a background CHECKPOINT command. This indirect checkpoint occurs periodically, based on the estimated time needed to recover the transaction log in the event of a restart or failover. This has the effect of smoothing I/O out to roughly equal proportions for each checkpoint and making recovery time (RTO) more predictable.
Prior to SQL Server 2012, background CHECKPOINT commands were issued on a fixed interval, irrespective of transaction volume or load, which could lead to unpredictable recovery times.
For more information, see Database Checkpoints (http://msdn.microsoft.com/en-us/library/ms189573(SQL.110).aspx).
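The per-database target recovery time is set with the T-SQL `ALTER DATABASE ... SET TARGET_RECOVERY_TIME` option. The sketch below only builds that statement. The helper name `target_recovery_time_statement` and the database name `Sales` are hypothetical, and name quoting follows the `QUOTENAME()` convention of doubling any closing bracket.

```python
def target_recovery_time_statement(database: str, seconds: int) -> str:
    """Build the T-SQL that sets a per-database target recovery time.

    A value of 0 keeps the database on automatic (non-indirect) checkpoints.
    """
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds < 0:
        raise ValueError("seconds must be a non-negative integer")
    # Quote the name like QUOTENAME(): wrap in brackets and double any ']'.
    quoted = "[" + database.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET TARGET_RECOVERY_TIME = {seconds} SECONDS;"
```

Because ALTER DATABASE cannot run inside a user transaction, a driver such as pyodbc would need autocommit enabled (for example, `pyodbc.connect(conn_str, autocommit=True)`) before executing the generated statement.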

These improvements mitigate common scenarios that can drive planned downtime:
Online index operations for LOB columns. Indexes that contain columns with varbinary(max), varchar(max), nvarchar(max), or XML data types can now be rebuilt or reorganized online.
Online schema modification for new NOT NULL columns. If a new NOT NULL column with a default value is added to a SQL Server 2012 database table, only a schema lock is required to update system metadata; existing rows do not have to be populated during the ALTER TABLE statement. SQL Server will physically persist the default column value only if a row is actually modified or re-indexed. Queries return the default value from metadata unless an actual column value exists.
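The metadata-only column addition can be illustrated with a toy model. This is not SQL Server's storage engine: `MetadataDefaultTable` is a hypothetical class in which existing rows are plain dictionaries. Adding the column only records the default. Modifying a row persists the inherited default. Reads fall back to metadata when no stored value exists.

```python
from typing import Any, Dict


class MetadataDefaultTable:
    """Toy model of a table whose newly added NOT NULL column lives in metadata."""

    def __init__(self, rows: Dict[int, Dict[str, Any]]):
        self.rows = rows                          # physically stored row values
        self.metadata_defaults: Dict[str, Any] = {}

    def add_not_null_column(self, column: str, default: Any) -> None:
        # Metadata-only change: no existing row is rewritten.
        self.metadata_defaults[column] = default

    def update(self, row_id: int, changes: Dict[str, Any]) -> None:
        row = self.rows[row_id]
        # Modifying a row persists any defaults it was still inheriting.
        for column, default in self.metadata_defaults.items():
            row.setdefault(column, default)
        row.update(changes)

    def read(self, row_id: int, column: str) -> Any:
        row = self.rows[row_id]
        # A stored value wins; otherwise fall back to the metadata default.
        return row[column] if column in row else self.metadata_defaults[column]
```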

The following improvement broadens support for storage scenarios:
Automatic Page Repair. Certain types of storage subsystem errors can corrupt a data page, making it unreadable. AlwaysOn Availability Groups can detect and automatically recover from these types of errors by asynchronously requesting and applying a fresh copy of the affected data pages from a different availability replica.
Similar functionality existed prior to SQL Server 2012 for database mirroring, but it is now enhanced to support multiple replicas.
For more information, see Automatic Page Repair (http://msdn.microsoft.com/en-us/library/bb677167(SQL.110).aspx).
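The detect-then-repair-asynchronously flow can be modeled in a few lines. This is a conceptual sketch only: `ReplicaDataFile` and `CorruptPageError` are hypothetical, `zlib.crc32` stands in for SQL Server's own page checksum, and the "partner" copies are plain dictionaries rather than other availability replicas. As in the description above, the read that detects the corruption still fails, and a separate background step applies a verified copy.

```python
import zlib
from typing import Dict, List, Set


class CorruptPageError(Exception):
    pass


class ReplicaDataFile:
    """Toy model: pages keyed by page id, each with a recorded checksum."""

    def __init__(self, pages: Dict[int, bytes]):
        self.pages = dict(pages)
        self.checksums = {pid: zlib.crc32(data) for pid, data in pages.items()}
        self.pending_repairs: Set[int] = set()

    def read(self, page_id: int) -> bytes:
        data = self.pages[page_id]
        if zlib.crc32(data) != self.checksums[page_id]:
            # Queue the page for repair; the read that hit the error still fails.
            self.pending_repairs.add(page_id)
            raise CorruptPageError(page_id)
        return data

    def repair_pending(self, partner_copies: List[Dict[int, bytes]]) -> None:
        """Background step: apply a verified copy of each queued page from a partner."""
        for page_id in sorted(self.pending_repairs):
            for copies in partner_copies:
                copy = copies.get(page_id)
                if copy is not None and zlib.crc32(copy) == self.checksums[page_id]:
                    self.pages[page_id] = copy
                    self.pending_repairs.discard(page_id)
                    break
```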

Client Connectivity Recommendations
Follow these guidelines to enable client applications to take full advantage of Microsoft SQL Server 2012 AlwaysOn technologies:
AlwaysOn-aware client library. Use a client library that supports the tabular data stream (TDS) protocol version 7.4 or newer. This should provide the desired client-side functionality for AlwaysOn features. Example client libraries include the Data Provider for SQL Server in .NET Framework 4.02 and SQL Native Client 11.0.

Connection provider property: MultiSubnetFailover=True. Use this keyword in your connection strings to enable client libraries to attempt to connect in parallel to all IP addresses that are registered for the availability group listener or for an FCI that has IP addresses in multiple subnets.

Connection provider property: ApplicationIntent=ReadOnly. Where practical, offload read-only workloads from your primary replica onto the secondary replicas.

Legacy client connection timeout. Legacy client database libraries do not implement parallel connection attempts, so when multiple IP addresses are present, they try to connect to each of them sequentially until they encounter a TCP timeout or make a successful connection. You should adjust your connection timeout on legacy clients to accommodate the potential sequential timeouts and retries when multiple IP addresses are present, to a value that is at least 15 seconds + 21 seconds for every secondary replica.
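The recommendations above can be captured in two small helpers: one builds a SqlClient-style connection string that sets `MultiSubnetFailover=True` and, optionally, `ApplicationIntent=ReadOnly`, and one applies the legacy timeout formula. The function names, the listener name `AGListener`, the database `Sales`, port 1433, and the use of Windows authentication (`Integrated Security=SSPI`) are assumptions chosen for illustration; adjust them for your environment and client library.

```python
def legacy_connect_timeout(secondary_replicas: int) -> int:
    """Minimum connection timeout (seconds) for clients that try IPs one at a time."""
    if secondary_replicas < 0:
        raise ValueError("secondary_replicas must be >= 0")
    return 15 + 21 * secondary_replicas


def alwayson_connection_string(listener: str, database: str,
                               read_only: bool = False, port: int = 1433) -> str:
    """SqlClient-style connection string that targets an availability group listener."""
    parts = [
        f"Server=tcp:{listener},{port}",
        f"Database={database}",              # must be a database in the availability group
        "Integrated Security=SSPI",
        "MultiSubnetFailover=True",          # parallel attempts to all listener IPs
    ]
    if read_only:
        parts.append("ApplicationIntent=ReadOnly")  # eligible for read-only routing
    return ";".join(parts) + ";"
```

For an availability group with two secondary replicas, the legacy formula gives 15 + 21 × 2 = 57 seconds.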

Conclusion
This white paper has established the baseline context for how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.
Many of the business drivers and challenges of planning, managing, and measuring a highly available database environment can be quantified and expressed as Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).
SQL Server 2012 AlwaysOn provides capabilities at the infrastructure, data platform, and database level that can help your organization address common high availability and disaster recovery scenarios in a manner that can be well justified using RPO and RTO goals.

For more information:
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

Did this paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to 5
(excellent), how would you rate this paper and why have you given it this rating? For example:
Are you rating it high due to having good examples, excellent screen shots, clear writing,
or another reason?
Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback will help us improve the quality of white papers we release.
Send feedback.



Version 1.1, 21 February 2012.
