Summary of pool failover and failback for Skype for Business

The following summary lists the steps and cmdlets that Skype for Business and Lync provide to manage failover and failback of services in a disaster recovery (DR) situation.

Summary of steps required for a pool failover

  1. If the failed pool has an Edge server associated, first move the association
    • Set-CsEdgeServer -Identity EdgeServer:<Edge pool FQDN> -Registrar Registrar:<NextHopPoolFQDN>
  2. Determine which pool hosts the Central Management Store (CMS)
    • Invoke-CsManagementServerFailover -WhatIf
  3. Determine the backup pool relationship
    • Get-CsPoolBackupRelationship -PoolFQDN <Pool FQDN>
  4. Check the CMS status. If ActiveMasterFQDN and ActiveFileTransferAgents are empty, the CMS needs to be failed over
    • Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus
  5. If SQL mirroring is used, check which SQL server is currently the principal and use that server in the failover command in step 6
    • Get-CsDatabaseMirrorState -DatabaseType Centralmgmt -PoolFqdn <Backup_Pool Fqdn>
  6. Fail over the CMS
    • Invoke-CsManagementServerFailover -BackupSqlServerFqdn <BackEnd SQL Server FQDN> -BackupSqlInstanceName <Backend SQL Instance Name>
  7. Verify that the failover was successful. ActiveMasterFQDN and ActiveFileTransferAgents should now be populated
    • Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus
  8. Verify that CMS replication is working and that every replica reports UpToDate as True
    • Get-CsManagementStoreReplicationStatus
  9. Fail over users
    • Invoke-CsPoolFailover -PoolFQDN <Failed Pool FQDN> -DisasterMode -Verbose
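
Steps 4, 7 and 8 above come down to two checks on the output of Get-CsManagementStoreReplicationStatus. The following is a minimal sketch of those checks, not a tested DR script. The helper names Test-CmsFailoverNeeded and Test-CmsReplicasUpToDate are hypothetical, and the sketch assumes the status objects expose ActiveMasterFqdn, ActiveFileTransferAgents and UpToDate properties. The helpers take the objects as parameters, so they can be tried with stand-in data before being pointed at a real topology.

```powershell
# Hypothetical helpers; the property names are assumptions about the cmdlet output.

function Test-CmsFailoverNeeded {
    param([Parameter(Mandatory)] $CmsStatus)
    # Step 4: the CMS needs to be failed over when neither an active master
    # nor any file transfer agents are reported.
    $noMaster = [string]::IsNullOrWhiteSpace([string]$CmsStatus.ActiveMasterFqdn)
    $agents   = @($CmsStatus.ActiveFileTransferAgents | Where-Object { $_ })
    return ($noMaster -and ($agents.Count -eq 0))
}

function Test-CmsReplicasUpToDate {
    param([Parameter(Mandatory)] [object[]] $Replicas)
    # Step 8: every replica must report UpToDate = True.
    $stale = @($Replicas | Where-Object { -not $_.UpToDate })
    return ($stale.Count -eq 0)
}

# Possible usage in a Skype for Business Management Shell (commented out,
# because it needs a live topology):
# $status = Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus
# if (Test-CmsFailoverNeeded -CmsStatus $status) { 'CMS failover required' }
# Test-CmsReplicasUpToDate -Replicas (Get-CsManagementStoreReplicationStatus)
```

Replication can take a few minutes to settle after the CMS failover, so a False result from the second check straight after step 6 is not necessarily a failure; rerun it before moving on to step 9.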

Summary of steps required for a pool failback

  1. Fail back users
    • Invoke-CsPoolFailback -PoolFQDN <Pool1 FQDN> -Verbose

Note: The CMS does not need to be failed back; it can continue to run on the pool it was failed over to.

More information can be found here – https://technet.microsoft.com/en-us/library/jj204678(v=ocs.15).aspx

Summary of steps required for a Response Group failover

  1. Pool failover should be completed first
  2. A prerequisite for this process is that regular backups of the RGS configuration have been taken. Backups are taken using the following cmdlet:
    • Export-CsRgsConfiguration -Source "service:ApplicationServer:<primary pool FQDN>" -FileName "<backup path and file name>"
  3. During an outage, after failover to the backup pool, import the response groups to the backup pool (use the -ReplaceExistingSettings switch if you want to replace application-level settings on the backup pool):
    • Import-CsRgsConfiguration -Destination "service:ApplicationServer:<backup pool FQDN>" -FileName "<backup path and file name>"
  4. Check that the response groups owned by the failed pool are now active on the backup pool:
    • Get-CsRgsWorkflow -Identity "service:ApplicationServer:<backup pool FQDN>" -ShowAll
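
Because step 2 depends on a recent export being available, it can help to write each backup to a timestamped file instead of overwriting a single one. The following is a minimal sketch of that idea. The helper name New-RgsBackupFileName, the file naming pattern and the folder in the usage comment are all hypothetical, and the .zip extension is an assumption about the export format.

```powershell
# Hypothetical helper: builds a timestamped file name for an RGS export.
function New-RgsBackupFileName {
    param(
        [Parameter(Mandatory)] [string] $BackupFolder,
        [Parameter(Mandatory)] [string] $PoolFqdn,
        [datetime] $Timestamp = (Get-Date)
    )
    # A sortable timestamp keeps the backups in chronological order by name.
    $stamp = $Timestamp.ToString('yyyyMMdd-HHmmss', [cultureinfo]::InvariantCulture)
    return (Join-Path -Path $BackupFolder -ChildPath "RgsConfig_${PoolFqdn}_$stamp.zip")
}

# Possible usage from a scheduled task on a server with the management shell
# installed (commented out, because it needs a live topology):
# $primary = '<primary pool FQDN>'
# $file    = New-RgsBackupFileName -BackupFolder '\\fileserver\RgsBackups' -PoolFqdn $primary
# Export-CsRgsConfiguration -Source "service:ApplicationServer:$primary" -FileName $file
```

The backup folder should sit outside the primary site, otherwise the export may be unreachable in exactly the outage it is meant for.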

Summary of steps required for a Response Group failback

  1. Pool failback should be completed first
  2. In case of any changes while failed over, export the RGS configuration from the backup pool:
    • Export-CsRgsConfiguration -Source ApplicationServer:<backup pool FQDN> -Owner ApplicationServer:<primary pool FQDN> -FileName "<backup path and file name>"
  3. Import the response groups back to the primary pool (use the -ReplaceExistingSettings switch if you want to replace the application-level settings on the primary pool with those from the backup pool):
    • Import-CsRgsConfiguration -Destination "service:ApplicationServer:<primary pool FQDN>" -OverwriteOwner -FileName "<exported path and file name>"
  4. Check that the response groups owned by the primary pool are now active on the recovered primary pool:
    • Get-CsRgsWorkflow -Identity "service:ApplicationServer:<primary pool FQDN>" -ShowAll
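
The verification in step 4 is easier to read as a per-owner summary than as a raw workflow list. The following is a minimal sketch of such a summary. The helper name Get-RgsWorkflowSummary is hypothetical, and the sketch assumes each workflow object returned by Get-CsRgsWorkflow exposes OwnerPool and Active properties.

```powershell
# Hypothetical helper: counts total and active workflows per owning pool.
# Assumes each workflow object has OwnerPool and Active properties.
function Get-RgsWorkflowSummary {
    param([Parameter(Mandatory)] [object[]] $Workflows)
    $Workflows | Group-Object -Property OwnerPool | ForEach-Object {
        [pscustomobject]@{
            OwnerPool = $_.Name
            Total     = $_.Count
            Active    = @($_.Group | Where-Object { $_.Active }).Count
        }
    }
}

# Possible usage (commented out, because it needs a live topology):
# $workflows = Get-CsRgsWorkflow -Identity 'service:ApplicationServer:<primary pool FQDN>' -ShowAll
# Get-RgsWorkflowSummary -Workflows $workflows | Format-Table
```

A row where Active is lower than Total points at workflows that were imported but are not yet taking calls, which is worth checking before declaring the failback complete.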

More information can be found here – https://technet.microsoft.com/en-us/library/jj205186(v=ocs.15).aspx

3 COMMENTS

  1. Thanks for the to-the-point article. I am facing an issue with failing back my pool: I keep getting an error for the backup service on one of the front end servers in the pool.

  2. Too bad, Andrew, that these things are never as easy and straightforward in real life as you explained in this post. For example, let's imagine the highly likely nightmare where a CMS failover dies in the middle of the process (so it doesn't finish 100%), and the CMS is then active neither in the backup pool nor in the source pool. In plain language: you are screwed. These things happen in the real world quite frequently, shame on Microsoft for this. I have been trying to solve this for a couple of hours, and it's so frustrating that all articles consider just the ideal dream world where these failovers work flawlessly, and nobody dedicates a couple of posts to the disaster scenarios where things actually do break.
