Summary of pool failover and failback for Skype for Business

The following summary provides details about the process and commands provided by Skype for Business and Lync to manage failover and failback of services in a DR situation.

Summary of steps required for a pool failover

  1. If the failed pool has an Edge server associated, first move the association
    • Set-CsEdgeServer -Identity EdgeServer:<Edge pool FQDN> -Registrar Registrar:<NextHopPoolFQDN>
  2. Determine which pool hosts the Central Management Store (CMS)
    • Invoke-CsManagementServerFailover -Whatif
  3. Determine the backup pool relationship
    • Get-CsPoolBackupRelationship -PoolFQDN <Pool FQDN>
  4. Check the CMS status – if ActiveMasterFQDN and ActiveFileTransferAgents are empty then you will need to fail over the CMS
    • Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus
  5. If a mirror is used, check which SQL server is the principal and use that server in the following failover command
    • Get-CsDatabaseMirrorState -DatabaseType Centralmgmt -PoolFqdn <Backup_Pool Fqdn>
  6. Fail over the CMS
    • Invoke-CsManagementServerFailover -BackupSqlServerFqdn <BackEnd SQL Server FQDN> -BackupSqlInstanceName <Backend SQL Instance Name>
  7. Verify failover is successful – ActiveMasterFQDN and ActiveFileTransferAgents should be populated
    • Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus
  8. Verify that CMS replication is working and that all replicas have an UpToDate value of True
    • Get-CsManagementStoreReplicationStatus
  9. Fail over users
    • Invoke-CsPoolFailover -PoolFQDN <Failed Pool FQDN> -DisasterMode -Verbose
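
Steps 4 to 8 above can be strung together in a short script. The following is a minimal sketch rather than a tested runbook: the two variables are hypothetical placeholders, and it assumes the status object exposes the ActiveMasterFqdn and ActiveFileTransferAgents properties named in step 4 and that each replica object exposes ReplicaFqdn and UpToDate.

```powershell
# Hypothetical placeholders; replace with the back-end details of your backup pool.
$backupSqlFqdn     = "<BackEnd SQL Server FQDN>"
$backupSqlInstance = "<Backend SQL Instance Name>"

# Step 4: read the current CMS status.
$cms = Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus

# Both values empty means there is no active CMS master, so a CMS failover is needed.
if (-not $cms.ActiveMasterFqdn -and -not $cms.ActiveFileTransferAgents) {
    # Step 6: fail over the CMS to the backup pool's back end.
    Invoke-CsManagementServerFailover -BackupSqlServerFqdn $backupSqlFqdn -BackupSqlInstanceName $backupSqlInstance

    # Step 7: both values should now be populated.
    Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus
}

# Step 8: show only the replicas that are not yet up to date.
# No output here means every replica reports True.
Get-CsManagementStoreReplicationStatus |
    Where-Object { -not $_.UpToDate } |
    Select-Object ReplicaFqdn, UpToDate
```

Replication may take a little while to catch up after the CMS moves, so the last command may need to be run more than once before it returns nothing. Only then continue with step 9.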

 

Summary of steps required for a pool failback

  1. Fail back users
    • Invoke-CsPoolFailback -PoolFQDN <Pool1 FQDN> -Verbose

 

Note: The CMS does not need to be failed back as it can run on any server in the topology.

More information can be found here – https://technet.microsoft.com/en-us/library/jj204678(v=ocs.15).aspx

 

Summary of steps required for a Response Group failover

  1. Pool failover should be completed first
  2. A prerequisite to this process is that regular backups of the RGS configuration have been taken. Backups are taken using the following cmdlet:
    • Export-CsRgsConfiguration -Source "service:ApplicationServer:<primary pool FQDN>" -FileName "<backup path and file name>"
  3. During an outage, after failover to the backup pool, import the response groups to the backup pool (add the -ReplaceExistingRgsSettings switch if you want to replace the application-level settings on the backup pool):
    • Import-CsRgsConfiguration -Destination "service:ApplicationServer:<backup pool FQDN>" -FileName "<backup path and file name>"
  4. Check that the response groups owned by the failed pool are now active on the backup pool:
    • Get-CsRgsWorkflow -Identity "service:ApplicationServer:<backup pool FQDN>" -ShowAll
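
The Response Group steps above can be sketched as two separate pieces, one run routinely while the primary pool is healthy and one run during the outage. This is an illustrative sketch only: the variable names and the C:\RgsBackups folder are hypothetical, and it assumes the workflow objects returned by Get-CsRgsWorkflow expose Name, Active and OwnerPool properties.

```powershell
# Hypothetical placeholders; replace with your own pool FQDNs.
$primaryPool = "service:ApplicationServer:<primary pool FQDN>"
$backupPool  = "service:ApplicationServer:<backup pool FQDN>"

# --- Run regularly while the primary pool is healthy (step 2) ---
# A dated file name keeps earlier backups instead of overwriting them.
# The folder is an assumption and must exist and be reachable during an outage.
$backupFile = "C:\RgsBackups\rgs-$(Get-Date -Format 'yyyyMMdd').zip"
Export-CsRgsConfiguration -Source $primaryPool -FileName $backupFile

# --- Run during the outage, after the pool failover (steps 3 and 4) ---
# $backupFile here should point at the most recent good export.
Import-CsRgsConfiguration -Destination $backupPool -FileName $backupFile

# List everything now hosted on the backup pool, including imported workflows.
Get-CsRgsWorkflow -Identity $backupPool -ShowAll |
    Select-Object Name, Active, OwnerPool
```

Storing the export somewhere other than the primary site matters more than the exact schedule, since the file is only useful if it can still be read once that site is down.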

 

Summary of steps required for a Response Group failback

  1. Pool failback should be completed first
  2. In case any changes were made while failed over, export the RGS configuration from the backup pool:
    • Export-CsRgsConfiguration -Source "service:ApplicationServer:<backup pool FQDN>" -Owner "service:ApplicationServer:<primary pool FQDN>" -FileName "<backup path and file name>"
  3. Import the response groups back to the primary pool (add the -ReplaceExistingRgsSettings switch if you want to replace the application-level settings on the primary pool):
    • Import-CsRgsConfiguration -Destination "service:ApplicationServer:<primary pool FQDN>" -OverwriteOwner -FileName "<exported path and file name>"
  4. Check that the response groups owned by the primary pool are now active on the recovered primary pool:
    • Get-CsRgsWorkflow -Identity "service:ApplicationServer:<primary pool FQDN>" -ShowAll
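
The failback steps above can be sketched the same way. Again this is a hedged illustration, not a definitive procedure: the variables are hypothetical placeholders, the identities use the service: prefixed form seen in the failover commands, and the final check queries the primary pool on the assumption that the goal is to confirm the workflows are running there again.

```powershell
# Hypothetical placeholders; replace with your own pool FQDNs and file path.
$primaryPool = "service:ApplicationServer:<primary pool FQDN>"
$backupPool  = "service:ApplicationServer:<backup pool FQDN>"
$exportFile  = "<exported path and file name>"

# Step 2: export from the backup pool only the response groups
# that belong to the primary pool, capturing changes made during the outage.
Export-CsRgsConfiguration -Source $backupPool -Owner $primaryPool -FileName $exportFile

# Step 3: import them to the recovered primary pool and hand ownership back to it.
Import-CsRgsConfiguration -Destination $primaryPool -OverwriteOwner -FileName $exportFile

# Step 4: confirm the workflows are present on the primary pool.
Get-CsRgsWorkflow -Identity $primaryPool -ShowAll
```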

 

More information can be found here – https://technet.microsoft.com/en-us/library/jj205186(v=ocs.15).aspx

 

 

Andrew Morpeth (https://ucgeek.co/author/amorpeth/)
Andrew is a Modern Workplace Consultant specialising in Microsoft technologies based in Auckland, New Zealand; Andrew is a Director and Professional Services Manager at Lucidity Cloud Services and a Microsoft MVP.

11 COMMENTS

  1. Thanks for the to-the-point article. I am facing an issue with failing back my pool: I keep getting an error for the backup service on one of the front end servers in the pool.

  2. Too bad, Andrew, that these things are never as easy and straightforward in real life as you explained in this post. For example, let's imagine the highly likely nightmare where a CMS failover dies in the middle of the process (so it doesn't finish 100%), and then the CMS is active neither in the backup pool nor in the source pool any more. In plain language: you are screwed. These things happen in the real world quite frequently, shame on Microsoft for this. I have been trying to solve this for a couple of hours, and it's so frustrating that all articles consider just the ideal dream world where these failovers work flawlessly, and nobody dedicates a couple of posts to the disaster scenarios where things actually break.

  3. Hi Andrew,

    Could you advise please
    If the primary site is completely lost (DR mode), do we need to fail over the CMS ourselves, or will it be performed automatically?

    Thanks.

  4. Hi,

    This is a very useful blog. The steps being distilled and put all together like this is very helpful.

    In terms of response groups, if we wanted to simply move all the workflows etc. to the backup pool permanently, would we just run the import on the backup pool with the -OverwriteOwner flag? Or would we also need to delete them from the original pool after import? I guess this would also be the same process if you were building a replacement pool and decommissioning an old one?

    I am facing a scenario where the network infrastructure at our primary site needs to undergo sustained maintenance (many days). Because our RGs need constant updating due to the way our teams work, it seems like it will be more sensible to move things rather than fail over.

    Thanks,

    Richard

  5. Is step 1 100% required? For instance, you’re in a COOP drill and you are manually failing over to your paired pool as part of a planned exercise (thus there is no down Front End pool). Do you still need to execute step 1, or will the Invoke-Cs command do all the work?

    • No, you don’t need to do this. If an FE was in a failed state and it had an Edge associated with it, then the association would need to be moved so that the Edge stays in service. For your testing purposes it’s not a requirement, although for an end-to-end test I’d probably shut down an FE to simulate failure and run through all the steps.
