Summary of pool failover and failback for Skype for Business

June 7, 2016

The following summary covers the process and the cmdlets that Skype for Business and Lync provide to manage failover and failback of services in a disaster recovery (DR) situation.

Summary of steps required for a pool failover

If the failed pool has an Edge server associated, first move the association
- Set-CsEdgeServer -Identity EdgeServer:<Edge pool FQDN> -Registrar Registrar:<NextHopPoolFQDN>
Determine which pool hosts the Central Management Store (CMS)
- Invoke-CsManagementServerFailover -Whatif
Determine the backup pool relationship
- Get-CsPoolBackupRelationship -PoolFQDN <Pool FQDN>
Check the CMS status – if ActiveMasterFQDN and ActiveFileTransferAgents are empty then you will need to fail over the CMS
- Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus
If a mirror is used, check which SQL server is the principal and use that server in the following failover command
- Get-CsDatabaseMirrorState -DatabaseType Centralmgmt -PoolFqdn <Backup_Pool Fqdn>
Fail over the CMS
- Invoke-CsManagementServerFailover -BackupSQLServerFqdn <BackEnd SQL Server FQDN> -BackupSQLInstanceName <Backend SQL Instance Name>
Verify failover is successful – ActiveMasterFQDN and ActiveFileTransferAgents should be populated
- Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus
Verify that CMS replication is working again and that every replica reports UpToDate as True
- Get-CsManagementStoreReplicationStatus
Fail over users
- Invoke-CsPoolFailover -PoolFQDN <Failed Pool FQDN> -DisasterMode -Verbose
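
The two status checks in the steps above can be wrapped in small helpers so the decision is not made by eye during an outage. This is a minimal sketch, not part of the Skype for Business module: the function names Test-CmsFailoverNeeded and Test-AllReplicasUpToDate are hypothetical, and the code assumes the status objects expose ActiveMasterFqdn, ActiveFileTransferAgents and UpToDate as properties.

```powershell
# Sketch only: both function names are hypothetical helpers, not product cmdlets.

function Test-CmsFailoverNeeded {
    # $Status is assumed to be the object returned by:
    #   Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus
    param([Parameter(Mandatory)] $Status)

    # No active master recorded for the CMS.
    $noMaster = [string]::IsNullOrEmpty($Status.ActiveMasterFqdn)

    # No file transfer agents listed (handles both $null and an empty list).
    $agents   = @($Status.ActiveFileTransferAgents | Where-Object { $_ })
    $noAgents = $agents.Count -eq 0

    # Per the step above, both being empty means the CMS must be failed over.
    return ($noMaster -and $noAgents)
}

function Test-AllReplicasUpToDate {
    # $Replicas is assumed to be the output of Get-CsManagementStoreReplicationStatus.
    param([Parameter(Mandatory)] [object[]] $Replicas)

    # Collect every replica that does not report UpToDate = True.
    $stale = @($Replicas | Where-Object { -not $_.UpToDate })
    return ($stale.Count -eq 0)
}

# Possible usage in the management shell:
#   $status = Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus
#   if (Test-CmsFailoverNeeded -Status $status) { 'CMS failover required' }
#   Test-AllReplicasUpToDate -Replicas (Get-CsManagementStoreReplicationStatus)
```

Keeping the checks as pure functions over the returned objects means they can be tried against sample objects before they are needed for real.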

Summary of steps required for a pool failback

Fail back users
- Invoke-CsPoolFailback -PoolFQDN <Pool1 FQDN> -Verbose

Note: The CMS does not need to be failed back as it can run on any server in the topology.

More information can be found here – https://technet.microsoft.com/en-us/library/jj204678(v=ocs.15).aspx

Summary of steps required for a Response Group failover

Pool failover should be completed first
A prerequisite for this process is that regular backups of the RGS configuration have been taken. Backups are taken using the following cmdlet:
- Export-CsRgsConfiguration -Source "service:ApplicationServer:<primary pool FQDN>" -FileName "<backup path and file name>"
During an outage, after failover to the backup pool, import the response groups to the backup pool (use the -ReplaceExistingRgsSettings switch if you want to replace application-level settings on the backup pool):
- Import-CsRgsConfiguration -Destination "service:ApplicationServer:<backup pool FQDN>" -FileName "<backup path and file name>"
Check that the response groups owned by the failed pool are now active on the backup pool:
- Get-CsRgsWorkflow -Identity "service:ApplicationServer:<backup pool FQDN>" -ShowAll
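
Because the failover import depends on a recent export existing, it can help to give each scheduled backup a timestamped file name so older copies are not overwritten. The following is a rough sketch under assumptions: New-RgsBackupFileName is a hypothetical helper, and the folder and naming pattern are illustrative only.

```powershell
# Sketch only: New-RgsBackupFileName is a hypothetical helper, not a product cmdlet.

function New-RgsBackupFileName {
    param(
        [Parameter(Mandatory)] [string] $PoolFqdn,
        [Parameter(Mandatory)] [string] $BackupFolder,
        # Defaults to the current time; can be passed in explicitly for testing.
        [datetime] $Timestamp = (Get-Date)
    )

    # Sortable timestamp, e.g. 20160607-013000.
    $stamp = $Timestamp.ToString('yyyyMMdd-HHmmss')

    # Example pattern (an assumption): Rgs_<pool FQDN>_<timestamp>.zip
    $name = "Rgs_${PoolFqdn}_$stamp.zip"
    return [System.IO.Path]::Combine($BackupFolder, $name)
}

# Possible usage from a scheduled task on a server with the management shell:
#   $primary = '<primary pool FQDN>'
#   $file    = New-RgsBackupFileName -PoolFqdn $primary -BackupFolder '<backup path>'
#   Export-CsRgsConfiguration -Source "service:ApplicationServer:$primary" -FileName $file
```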

Summary of steps required for a Response Group failback

Pool failback should be completed first
In case any changes were made while failed over, export the RGS configuration from the backup pool:
- Export-CsRgsConfiguration -Source "service:ApplicationServer:<backup pool FQDN>" -Owner "service:ApplicationServer:<primary pool FQDN>" -FileName "<backup path and file name>"
Import the response groups back to the primary pool (use the -ReplaceExistingRgsSettings switch if you want to replace application-level settings on the primary pool):
- Import-CsRgsConfiguration -Destination "service:ApplicationServer:<primary pool FQDN>" -OverwriteOwner -FileName "<exported path and file name>"
Check that the response groups owned by the primary pool are now active on the recovered primary pool:
- Get-CsRgsWorkflow -Identity "service:ApplicationServer:<primary pool FQDN>" -ShowAll
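
The export and import in the failback take the same pool identities in different roles, which is easy to mix up by hand. One way to reduce that risk is to build the parameter sets once and splat them. This is a sketch under assumptions: Get-RgsFailbackParameters is a hypothetical helper, and it only assembles the parameters already shown in the steps above.

```powershell
# Sketch only: Get-RgsFailbackParameters is a hypothetical helper, not a product cmdlet.

function Get-RgsFailbackParameters {
    param(
        [Parameter(Mandatory)] [string] $PrimaryPoolFqdn,
        [Parameter(Mandatory)] [string] $BackupPoolFqdn,
        [Parameter(Mandatory)] [string] $FileName
    )

    # Service identities in the form used by the RGS cmdlets.
    $primary = "service:ApplicationServer:$PrimaryPoolFqdn"
    $backup  = "service:ApplicationServer:$BackupPoolFqdn"

    return @{
        # Export from the backup pool the response groups owned by the primary pool.
        Export = @{ Source = $backup; Owner = $primary; FileName = $FileName }
        # Import into the primary pool and hand ownership back to it.
        Import = @{ Destination = $primary; OverwriteOwner = $true; FileName = $FileName }
    }
}

# Possible usage (splatting needs a plain variable name):
#   $p = Get-RgsFailbackParameters -PrimaryPoolFqdn '<primary pool FQDN>' `
#        -BackupPoolFqdn '<backup pool FQDN>' -FileName '<backup path and file name>'
#   $export = $p.Export; Export-CsRgsConfiguration @export
#   $import = $p.Import; Import-CsRgsConfiguration @import
```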

More information can be found here – https://technet.microsoft.com/en-us/library/jj205186(v=ocs.15).aspx

11 COMMENTS

Dewalt kotze July 21, 2016 At 3:51 AM

Thanks for the to-the-point article. I am facing an issue with failing back my pool: I keep getting an error for the backup service on one of the Front End servers in the pool.

- Andrew Morpeth July 21, 2016 At 7:17 AM
  
  Please explain a little further and I will see if I can help. Any errors?
  
soder September 30, 2016 At 3:20 AM

Too bad, Andrew, that these things are never as easy and straightforward in real life as you explain in this post. For example, let's imagine the highly likely nightmare where a CMS failover dies in the middle of the process (so it doesn't finish 100%), and then the CMS is no longer active in either the backup pool or the source pool. Which means, in plain language: you are screwed. These things happen in the real world quite frequently, shame on Microsoft for this. I have been trying to solve this for a couple of hours, and it's so frustrating that all articles consider just the ideal dream world where these failovers work flawlessly, with no need to dedicate a couple of posts to the disaster scenarios when things actually do break.

Pavel May 30, 2017 At 3:51 PM

Hi Andrew,

Could you advise please
If the primary site is completely lost (DR mode), do we need to fail over the CMS ourselves, or will it be performed automatically?

Thanks.

- Andrew Morpeth May 30, 2017 At 7:04 PM
  
  Hey Pavel,
  
  If the CMS is located on the failed site's pool, then you will need to fail over the CMS.
  
Richard Black June 15, 2017 At 9:07 PM

Hi,

This is a very useful blog. The steps being distilled and put all together like this is very helpful.

In terms of response groups, if we wanted to simply move all the workflows etc. to the backup pool permanently, would we just run the import on the backup pool with the -OverwriteOwner flag? Or would we also need to delete them from the original pool after import? I guess this would also be the same process if you were building a replacement pool and decommissioning an old one?

I am facing a scenario where the network infrastructure at our primary site needs to undergo sustained maintenance (many days). Because our RGs need constant updating due to the way our teams work, it seems more sensible to move things rather than fail over.

Thanks,

Richard

- Andrew Morpeth June 16, 2017 At 10:27 AM
  
  Yes, you are correct. It's optional to remove them from the original pool, but it's probably worthwhile to keep things tidy. You can use the -RemoveExportedConfiguration parameter in the Export-CsRgsConfiguration command. When importing you may also want to use the -ReplaceExistingRgsSettings switch to copy application-level settings across. This article explains things pretty well – https://technet.microsoft.com/en-us/library/jj205298(v=ocs.15).aspx
  
  Hope that helps!
  
  - Richard Black June 21, 2017 At 10:52 PM
    
    Thanks Andrew, that's really helpful. Am I also right in thinking it doesn't matter which pool the agents are homed on, and I could move the RGs either before or after the users?
    
    Richard
    
    - Andrew Morpeth June 26, 2017 At 8:05 AM
      
      Yes, correct. I normally do the users, then the Response Groups, during failover testing.
      
Ryan Bess November 15, 2017 At 8:35 AM

Is step 1 100% required? For instance, you're in a COOP drill and you are manually failing over to your paired pool as part of a planned exercise (so there is no down Front End pool). Do you still need to execute step 1, or will the Invoke-cs command do all the work?

- Andrew Morpeth November 17, 2017 At 7:18 AM
  
  No, you don't need to do this. If an FE was in a failed state and it had an Edge associated with it, then the association would need to be moved so that the Edge stays in service. For your testing purposes it's not a requirement, although for an end-to-end test I'd probably shut down an FE to simulate failure and run through all the steps.
  