Understanding how Lync establishes audio/video paths using ICE

The real-time aspects of Lync require a different approach from SIP signalling to ensure quality of service. SIP signalling can be delayed without causing too many issues, but for audio/video timely delivery is much more important, so Lync uses Interactive Connectivity Establishment (ICE). ICE is the overall process that discovers and exchanges candidates to find the most optimal media path.

Definitions

  • SIP signalling – allows clients to send invites to other parties.
  • Interactive Connectivity Establishment (ICE) – Process used to discover and exchange
    candidates in order to find the most optimal media path.
  • Candidates – A list of possible IP addresses that could be used to establish a media path.
  • Reflective/Session Traversal Utilities for NAT (STUN) – STUN reflects, or returns, the public NAT address to the Lync client, e.g. a home-based user sends a packet to the Edge server, which discovers the public IP address (a candidate) and returns it to the client.
  • Relay/Traversal Using Relays around NAT (TURN) – TURN allows the media traffic to be relayed/proxied by the Edge server to the client by providing the client with a relay address to send media to.
  • ICE endpoints – An ICE endpoint is anything that is involved in media, e.g. Lync clients, Lync Web
    App, Lync Phone, FE Server (App Sharing MCU, RGS, Call Park, A/V Conf etc.),
    Mediation Server, SBA, Exch UM. Session border controllers and the Director
    role would not be considered ICE endpoints. The Edge server performs STUN and
    TURN but is not an ICE endpoint; it is more an ICE server.

The 5 Phases of Media Path Establishment

1. TURN provisioning and credentials (MRAS)

The Lync client does an SRV lookup to find an Edge server to register against and then performs a SIP REGISTER. The server responds with a 200 OK that includes in-band provisioning details, including MRAS (Media Relay Authentication Service), which tells the client that an Edge service is deployed. With this the client sends a SIP SERVICE request to the Front End, which includes the client’s location (internal or external). Because the Edge is not on the domain it can’t authenticate the client directly, so the Front End server requests the credentials on behalf of the client. The A/V Edge service creates credentials using the A/V Edge certificate and returns them to the Front End, which sends a 200 OK back to the client with the Edge server it should connect to, the ports, and a username and password. Credentials are valid for 8 hours, and for this period the client can go straight to the Edge server. In a conferencing scenario the same thing happens; however, because participants can join anonymously, the Front End checks that the meeting exists and then gets and passes the credentials to the meeting participant.

Tip – Always make sure you use the same external certificate for all Edge servers. The certificate is used to create credentials for the client to connect. If an Edge server goes down and the client tries to connect to another Edge server that uses a different certificate, that server will not be able to validate the credentials and authentication will fail. Search for “MRAS” in Snooper to find authentication messages; there should be 3 messages per request. MRAS uses port 5062.

2. Address Discovery (Allocation)

Address discovery is the process the client goes through to determine what IP addresses it might be reached on. These IP addresses are the client’s candidates.
Audio/Video
  • Discover local UDP candidate for every network card (peer to peer so UDP is best)
  • Connect to media relay (Edge server) to discover reflexive address (the address the Edge server sees the client connect from) and allocate a candidate on the media relay for UDP then TCP
File Transfer and Desktop Sharing (RDP over RTP) – Both require TCP
  • Discovers local TCP candidate
  • Media Relay TCP only
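The allocation rules above can be sketched in a few lines of Python. This is an illustrative model only, not how the Lync client is implemented: the `Candidate` type, the `gather_candidates` function and the modality names are all hypothetical, the addresses come from documentation ranges, and the model assumes the reflexive address is only recorded as a UDP candidate for audio/video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kind: str       # "host", "reflexive" or "relay"
    transport: str  # "UDP" or "TCP"
    address: str


def gather_candidates(modality, local_ips, reflexive_ip=None, relay_ip=None):
    """Build a candidate list for one modality (simplified model).

    modality: "audio_video" (UDP first, TCP relay as a fallback) or
              "app_sharing" (file transfer / desktop sharing, TCP only).
    reflexive_ip and relay_ip stay None when no Edge server is reachable,
    in which case only local candidates are returned.
    """
    if modality == "audio_video":
        local_transport = "UDP"
        relay_transports = ["UDP", "TCP"]  # allocate UDP first, then TCP
    elif modality == "app_sharing":
        local_transport = "TCP"
        relay_transports = ["TCP"]
    else:
        raise ValueError(f"unknown modality: {modality}")

    # One local candidate for every network card.
    candidates = [Candidate("host", local_transport, ip) for ip in local_ips]

    # Assumption: the reflexive address is modelled for audio/video only.
    if reflexive_ip is not None and modality == "audio_video":
        candidates.append(Candidate("reflexive", "UDP", reflexive_ip))

    if relay_ip is not None:
        candidates.extend(Candidate("relay", t, relay_ip) for t in relay_transports)
    return candidates
```

Calling `gather_candidates("app_sharing", ["192.0.2.10"], relay_ip="198.51.100.7")` yields one local TCP candidate and one TCP relay candidate, which mirrors the two bullets for file transfer and desktop sharing.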

3. Address Exchange (SIP Invite/200OK)
 

Address exchange is the process of sharing candidates with other endpoints that will be part of the call (peers). This is achieved by sending a SIP invite to the peer, who in turn will discover their own candidates, and send them back as part of a SIP 183 Session Progress.


4. Connectivity Checks

This is the process of taking the provided candidates and determining a possible media path. The Lync client validates the list of candidates by opening connections to all entries in the list simultaneously. The first to respond is used to establish the “Early Media” connection; however, the media path may change during the call using a process called candidate promotion. When the called party picks up it will again send its candidates to the caller, but this time as part of a 200 OK. The connection attempts are:
  1. Connect directly (peer to peer)
  2. Connect to reflective address
  3. Connect via media relay by connecting to the Edge and asking it to contact a candidate and establish a connection on its behalf.
**If there is no Edge server it only does local candidates.
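The "try everything at once, use the first to respond" behaviour can be illustrated with a small simulation. This sketch uses Python's asyncio with made-up delays in place of real network probes; `probe` and `first_responder` are hypothetical names and nothing here reflects the actual client code.

```python
import asyncio


async def probe(name, delay, reachable):
    """Simulated connectivity check: wait `delay` seconds, then succeed or fail."""
    await asyncio.sleep(delay)
    if not reachable:
        raise ConnectionError(name)
    return name


async def first_responder(checks):
    """Start every check at once and return the first path that answers.

    checks: iterable of (name, delay_seconds, reachable) tuples.
    Returns the name of the first successful check, or None if all fail.
    """
    tasks = [asyncio.create_task(probe(*check)) for check in checks]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except ConnectionError:
                continue  # this path failed; wait for the next one to finish
        return None
    finally:
        # Stop the checks that are still running once a winner is known.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

For example, `asyncio.run(first_responder([("direct", 0.01, False), ("reflexive", 0.05, True), ("relay", 0.1, True)]))` picks the reflexive path, because the direct attempt fails and the relay answers later.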

5. Candidate Promotion
 

This is the process of determining the best possible candidate for the session. If a better path is found then the media path can change during the call. In order of preference:
  • Host/Local Candidate (UDP) – The most preferred candidate is always a local candidate and is the reason that peer media sessions between clients on the same network will never use the Edge server.
  • Reflexive/STUN Candidate (UDP) – The next preferred option is to use the server reflexive candidate which is provided by the Edge Server using STUN. This scenario involves attempting to connect to the reflexive IP addresses for each externally connected user. The reflexive IP address is the public IP address of the external user e.g. a home router.
  • Relay/TURN Candidate (UDP) – In the event that STUN fails then the final option is to utilise the Edge Server as a media relay. The calling client will establish a media session directly with the A/V Edge Server as will the receiving client. This connectivity is relayed through the public IP address of the Audio/Video Edge service.
  • Relay/TURN Candidate (TCP) – Used when connectivity is not available on UDP. TCP relay is a last resort.
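The preference order above amounts to a simple ranked lookup. The following is a minimal sketch, assuming the set of paths that passed connectivity checks is already known; `PREFERENCE` and `promote` are hypothetical names used for illustration.

```python
# Most preferred first, following the order described above.
PREFERENCE = [
    ("host", "UDP"),
    ("reflexive", "UDP"),
    ("relay", "UDP"),
    ("relay", "TCP"),
]


def promote(working_paths):
    """Return the most preferred (type, transport) pair that works, or None."""
    for option in PREFERENCE:
        if option in working_paths:
            return option
    return None
```

With `promote({("relay", "TCP"), ("reflexive", "UDP")})` the reflexive UDP path wins, and TCP relay is only chosen when nothing above it is available.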

SIP Messages in Media Path Establishment

  • Out INVITE (SDP session description protocol – tells other party what I can do e.g. codecs).
    The first set of candidates is ICE v6 (ms-proxy-2007fallback) and the second set is ICE v19. OCS 2007 R2 and later use v19 but include both sets for backward compatibility. Candidates come in pairs, one for RTP and one for RTCP:
      a=candidate 1 1 Protocol Priority IPAddress Port Typ (RTP)
      a=candidate 1 2 Protocol Priority IPAddress Port Typ (RTCP)
    Protocol is UDP, TCP Passive (a candidate that waits for the peer to connect to it) or TCP Active (a candidate that initiates the connection). A higher priority is better. Typ is host, relay or server reflective.
  • In SIP 183 Peer sends its candidates. You may see multiple – one for each end point.
  • In SIP 200 OK Peer picks up the call. This still includes a full candidate set as the best have not been negotiated yet.
  • Out INVITE Re-invite which will include the 1 chosen candidate pair as decided in the earlier process.
  • In SIP 200 OK Includes other party’s final candidates.
NOTE: The Edge server is used in discovery process, but not necessarily once media path has established. This is why it can be important for internal clients to be able to access the internal NIC for edge. If the candidate list doesn’t include UDP and TCP reflective then it probably can’t talk to Edge server. If you see only UDP or only TCP then firewall might be blocking ports.
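When reading a trace, it can help to pull the candidate lines apart and list which type and protocol combinations are present. The sketch below assumes the field order described above and accepts either a colon or a space after `a=candidate`; the sample values in the usage note are invented, real traces may carry extra fields after the type, and lines that do not match this layout are skipped. `SdpCandidate`, `parse_candidate` and `summarise` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class SdpCandidate(NamedTuple):
    foundation: str
    component: int   # 1 = RTP, 2 = RTCP
    protocol: str    # e.g. UDP, TCP-PASS, TCP-ACT
    priority: int    # higher is better
    address: str
    port: int
    cand_type: str   # type token exactly as written in the trace


_PATTERN = re.compile(
    r"^a=candidate[: ](\S+) (\d+) (\S+) (\d+) (\S+) (\d+) typ (\S+)",
    re.IGNORECASE,
)


def parse_candidate(line: str) -> Optional[SdpCandidate]:
    """Parse one candidate line; return None if the line has another layout."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    foundation, component, protocol, priority, address, port, cand_type = match.groups()
    return SdpCandidate(
        foundation, int(component), protocol.upper(),
        int(priority), address, int(port), cand_type.lower(),
    )


def summarise(lines):
    """Return the set of (type, protocol) combinations found in the lines."""
    parsed = (parse_candidate(line) for line in lines)
    return {(c.cand_type, c.protocol) for c in parsed if c is not None}
```

Feeding it an invented line such as `a=candidate:1 1 UDP 2130706431 192.0.2.10 50020 typ host` gives a host UDP entry. If the summary for a client contains only host entries, that is the symptom described in the note: the client probably could not reach the Edge server.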

Call Scenarios and Connections Options

Inside <-> Inside
  • Peer to Peer
Inside <-> Outside
  • Peer to peer will not work
  • Outside connects to reflective candidate UDP or TCP
  • Outside connects to own edge server (relay) which hairpins traffic to internal user
Outside <-> Outside
  • Peer to peer might work if clients are on the same network
  • Reflective candidate UDP or TCP
  • Relay via Edge server
Federation OCS 2007
  • Edge servers connect to each other on the 50k port range directly and relay the call. Ports need to be open in both directions.
Federation 2007 R2 (tunnel mode introduced)
  • The Edge server sends a special packet to UDP port 3478 on the other Edge to find out if it is OCS 2007 R2 or above. If it is then tunnel mode can be used, and all UDP traffic can be sent on these ports. Candidate data still includes the 50k ports, but the Edge server just contacts the other Edge server to share this information and connect.
  • TCP is very similar, but because a connection to a source IP/port and destination IP/port can only be in use at one time, the Edge server allocates a port in 50k range as a source port, and then opens a connection to the other Edge server on port 443. This gets around having to have 50k ports open which is required for OCS 2007.
While the 50k port range is not required for OCS 2007 R2 and above, there are still benefits to opening it. In a situation where 2 Edge servers would normally be involved in relaying media, opening the range allows both clients to connect to the same Edge server. The initiating client connects to its home Edge server, gets candidates and passes those to the other party. The other party then attempts to connect to the 50k range directly on the initiator’s home Edge server. Without these ports open this would not work, and the client would need to involve its own Edge server and ask it to connect to the initiating Edge to relay on its behalf. This introduces a longer media path.

Troubleshooting Media Connectivity

  • Get client login from fresh sign in – is there MRAS? If no it can’t talk to edge.
  • Check if FE can telnet to FQDN on internal edge.
  • Check logs for STUN and TURN candidates. If none then there is an issue between client and edge.
  • Use port query to test UDP.
  • When edge sends candidates in NAT situation, edge uses external IP configured in topology and sends this to client – make sure it’s correct.
  • Search Snooper logs for MRAS for authentication.
  • Search a=candidate to see candidates.
  • Search a=remote-candidate to find final candidates that are chosen
  • After call pickup it can take several seconds before final candidates are chosen, and media path might change. Final re-invite will include this but the result may not be in logs for a few seconds after connection.
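The three searches above can also be run over an exported trace in one pass. This is a rough sketch that assumes the trace is available as plain text; `scan_trace` is a hypothetical helper, not a Snooper feature.

```python
def scan_trace(text: str) -> dict:
    """Collect the lines worth reading first when media fails.

    Returns a dict of lists of (line_number, line) tuples for MRAS messages,
    offered candidates and the final chosen (remote) candidates.
    """
    found = {"mras": [], "candidates": [], "remote_candidates": []}
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Check the longer prefix first so remote candidates are not
        # counted as ordinary candidates.
        if stripped.startswith("a=remote-candidate"):
            found["remote_candidates"].append((number, stripped))
        elif stripped.startswith("a=candidate"):
            found["candidates"].append((number, stripped))
        if "mras" in stripped.lower():
            found["mras"].append((number, stripped))
    return found
```

An empty `mras` list from a fresh sign-in trace points to the first check in the list, and an empty `remote_candidates` list may simply mean the final re-invite had not been logged yet.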
Thanks to Thomas Binder for this excellent deep dive as well as Jeff Schertz for his summary.

 

18 COMMENTS

  1. Good summary. I've been trying to get a handle on the entire process as we have deployed Lync 2013 in our environment (without enterprise voice), One question I can't seem to find an answer on is, what path does an external client that is using Lync Web App take when connecting to a conference for desktop sharing. The behavior I see in our environment is that when a guest is connected from external using the Lync Web App connected either to someone external using the full client or someone internal using the full client, we have a difficult time establishing a desktop sharing session or maintaining one. Looking at our firewall logs, I see dropped communication between our front end server ip's and the external edge nic ip's. I see the external edge ip trying to talk to the front end server ip on 3478 stun and occasionally other 50,000 or higher ports, as well as the front end server trying to talk to the external edge ip in the 50,000 range. This traffic will all drop because I was under the impression that communication with the front end servers or internal clients should go through the edge internal interface due to our persistent routes on edge.

    I started exploring the log files and candidate lists. With an internal full lync client user and an external lwa user scenario, I see the internal user candidate list look correct.. it sends an invitation and has its local ip of typ host on tcp-act and tcp-pass. I also see the relay ip of the external edge ip in the list. What seems odd is that, the internal client then receives a SIP 200 OK with a candidate list, but the list is the local ip of the front end server's nics (both the ip of the default nic for communication and the ip of the nic used to connect to some back end storage for the lync file share). It also shows the relay address of the edge's external ip.

    Looking at the logs on the lwa client, I see a candidate list that looks correct for its local ip information, but I also see candidate lists show up which list the information for our front end server as well.

  2. That's an interesting problem and one that I have not seen before. The Lync Web App is still an ICE client and will attempt to establish the media path in the same way. Signalling will occur via the FE web services. The FE server is also an ICE client and this may be why you are seeing its IP addresses in some logs; however, I would not expect this to be in the client invites. It may be that the FE forwards the request to the internal client, acting as an ICE proxy, but that's just me speculating.

    What are you using to publish web services externally? Make sure that the time-out is set at 200 seconds or more, I normally configure 3600 seconds.

  3. We are publishing web services externally using an F5 Big IP as a reverse proxy which is a supported device. I thought perhaps this was a routing issue on the Edge server but I have double checked the persistent routes.

  4. Shouldn't be a routing issue if your other client types are working externally. Did you check what your time out is set at?

    Is the issue only affecting content sharing? Does the audio and video also drop?

    Are you using 1 or 3 IP's on your edge server? Are you using NAT?

  5. Thanks Andrew

    I think single pool media flow has been documented well online in regards to Lync and media establishment.
    What I struggle to find is media flow in scenarios with two central sites each with their own edge pool.
    For example, how would a remote user in FEPool1 connect to an internal user in FEPool2?

    Would media flow to Edgepool1 > internally to FEPool1 then intercluster routing to FEPool2 and to the client

    OR

    Would media flow to Edgepool01 then proxied across to Edgepool02 and internally to FEPool02 and to the client

    Thanks Andrew

    • That’s a very good question! I would expect the media to be proxied via EdgePool1, then:
      1. If EdgePool1’s internal interface can route to the client this would take place internally (this is a requirement based on my experience)
      2. EdgePool1 could possibly proxy to EdgePool2 – This one I am not sure about, but this is how it would work for a federated deployment. Considering the ICE process this should be possible.

      • Thanks Andrew, just a question based on your reply.

        “If EdgePool1’s internal interface can route to the client this would take place internally (this is a requirement based on my experience)!”

        Are you suggesting that it should be a requirement for every client on the LAN despite their home pool should be able to access the internal interface on all edge servers in the topology?

        Thanks

        • I’ve seen issues where you have a user connected externally on Site1 who communicates with an internal user on Site2. If the internal user’s PC knows how to route to the Edge internal network in Site1, but traffic is blocked by a firewall or the Edge doesn’t know how to route back, media fails.

  6. we run private ip addressing internally, and have good sized bandwidth available to us (higher ed institution, and can test easily during “downtime”).
    We’re using Skype For Business in the cloud, (no on-prem), and point to point calls are totally fine. When we get into multiparty calls, we occasionally have issues.
    When we use the Skype/Lync FastTrack Network Analysis tool (http://em1-fasttrack.cloudapp.net/o365nwtest), it gives us poor results for Consistency and Quality of service.

    I’m not seeing it reflected in troubleshooting. Since we NAT our clients at our edge firewall, is it possible the NAT’ing is presenting issues? Possibly ALGs on that same fw? We’re at a loss at this point as to where to look next. Any advice is appreciated. Thanks.

    • Hey Matt,

      Are the point-to-point calls you mention with internal parties, or does this also include external and federated parties? One thing to note is that when you go multi-party, the O365 conferencing server comes into play, whereas with point-to-point calls the audio path can in some cases be direct. NATing your clients out to the internet shouldn’t be an issue as this is very common. If you could provide a bit more detail on the call flows I may be able to help further.

  7. If you have multiple endpoints active, let's say a Lync 2013 client on a PC and Lync on a mobile device, with no simultaneous ring set up, and a person with a Lync client calls you, which of the two endpoints will receive the call? Will they ring at the same time?

    • Yes, calls should ring on both devices. In my experience iPhone/Android are slower to notify of incoming calls. By comparison, Windows Phone uses push notifications and the phone often rings before the desktop client. In general SfB applies a preference system for users active on multiple devices. This is most apparent for incoming IMs: if you are active on desktop and mobile, SfB will prefer desktop for delivery. Hope that helps!

  8. Thanks for the Great Article,

    Quick Question:
    Calls to a remote client (Skype or another Company’s Lync) do not connect. The call goes through but won’t establish. The strange behavior is that on our external firewall I can see both the Lync Client and the Lync Edge server trying to talk to the LOCAL IP address of the remote (for example 192.168.1.6). So I tested with a Lync machine and skype machine inside of the same local subnet and it works fine. Ever heard of that?

    For some reason our lync client and edge server know the local RFC1918 IP of the remote user.

    • Thanks for your feedback! It sounds like you have a firewall between client networks and that the ports required for a point to point connection are not open. If this is the case, then the client will try and use the Edge server to relay. If this is not working then you likely have a firewall or routing issue between each client networks and the internal interface of the Edge server. Hope that points you in the right direction.

  9. Hi Andrew,

    If two internal client networks have overlapping IP Address spaces will the clients be able to use the internal Edge interface to relay media between the two internal networks/clients?

    • That’s a damn good question, not something I have come across before. I’m picking this would cause an issue from a routing perspective so would be problematic. How can the Edge determine how to route to the correct network when they are overlapping is the question that comes to mind.

  10. We are trying to do a Split Tunnel configuration for AV traffic, so when users are working from home and they VPN into work, their inside status will be TRUE, as all the traffic will come in directly to the Lync servers, except AV traffic. We would like VPN clients’ AV traffic to come via the Edge server. Can you please shed some light on what we would need to do to achieve this? I am thinking we would need to block the AV ports for VPN clients at the firewall, to prevent AV traffic coming in directly from VPN clients. We have a dedicated DNS server for VPN clients; do we need to create any DNS records for VPN clients, or is there something else I might be missing? Please advise.

    • Hey, is the VPN capable of managing this by FQDN? Which VPN do you use? This is often how we achieve this – all SfB FQDNs are bypassed for DNS and service connection and sent straight out to the internet. An example of what I mean with regard to DirectAccess is here: https://blogs.technet.microsoft.com/edgeaccessblog/2010/05/12/split-brain-dns-configuring-directaccess-for-office-communications-server-ocs/
