WebRTC flow

Let’s simplify how WebRTC works. There are three parts to WebRTC:

  1. Signalling
  2. Connecting
  3. Communicating

WebRTC is a peer-to-peer protocol, which removes the need to route traffic through a central server.

A server is used only to help set up the initial channels. It is called a signalling server because it carries the initial handshake.

When two users want to communicate over WebRTC, they need to know each other’s IP addresses. To find out how to reach each other, they first go to the central server.

This is where NAT, and symmetric NAT in particular, comes in. The common NAT behaviours are described below.

Full Cone – Static

The router holds a static mapping from an external-facing port to an internal IP:port, so any external computer can reach the internal computer by sending a message to the externally mapped port.

Restricted Cone – IP restriction

The router only allows an incoming packet on the mapped port if a packet was previously sent from inside the network to the IP address the traffic is coming from.

Port Restricted Cone – IP+port restriction

The router only allows an incoming packet on the mapped port if a packet was previously sent from inside the network to both the IP address and the port the traffic is coming from.
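The three cone behaviours differ only in which inbound packets the router lets through. Here is a minimal sketch of that difference, not a real router implementation. The function name `allowsInbound`, the NAT type strings and the addresses are all made up for illustration.

```javascript
// Hypothetical model: `sent` lists the external endpoints the internal host
// has already sent packets to through the mapped port.
function allowsInbound(natType, sent, from) {
  switch (natType) {
    case "full-cone":
      // Static mapping: anyone may use the mapped port.
      return true;
    case "restricted-cone":
      // The source IP must have been contacted before (any port).
      return sent.some((dest) => dest.ip === from.ip);
    case "port-restricted-cone":
      // The exact source IP and port must have been contacted before.
      return sent.some((dest) => dest.ip === from.ip && dest.port === from.port);
    default:
      throw new Error("Unknown NAT type: " + natType);
  }
}

// The internal host has only talked to 203.0.113.5:3478.
const sent = [{ ip: "203.0.113.5", port: 3478 }];
const sameIpOtherPort = { ip: "203.0.113.5", port: 9999 };
const stranger = { ip: "198.51.100.7", port: 3478 };

console.log(allowsInbound("full-cone", sent, stranger)); // true
console.log(allowsInbound("restricted-cone", sent, sameIpOtherPort)); // true
console.log(allowsInbound("restricted-cone", sent, stranger)); // false
console.log(allowsInbound("port-restricted-cone", sent, sameIpOtherPort)); // false
```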

Symmetric NAT – Hard NAT

In all of the above, the internal device’s port is mapped to a single external-facing port: 10.0.0.1:8000 has only one mapping, externalIP:1234.

In symmetric NAT, the router creates a new mapping for each external destination.

So 10.0.0.1:8000 is exposed as externalIP:1234 to Computer 1, but as externalIP:5678 to Computer 2. The port-restricted cone rule also applies here, so Computer 2 cannot send a packet to the internal computer via port 1234, since the internal computer never sent a packet to Computer 2 through that mapping.
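The mapping difference can be sketched as a small lookup table. This is only an illustration: `createNat` is a hypothetical helper, and the port numbers are arbitrary counters, so the second symmetric mapping comes out as 1235 rather than the 5678 used in the text.

```javascript
// Hypothetical model of a NAT mapping table. A cone NAT keys the mapping on
// the internal endpoint only; a symmetric NAT keys it on the internal
// endpoint plus the destination, so each destination sees a different port.
function createNat(symmetric) {
  const table = new Map();
  let nextPort = 1234; // arbitrary starting port for the illustration

  return function externalPortFor(internal, destination) {
    const key = symmetric ? internal + "->" + destination : internal;
    if (!table.has(key)) {
      table.set(key, nextPort);
      nextPort += 1;
    }
    return table.get(key);
  };
}

const cone = createNat(false);
const symmetric = createNat(true);

console.log(cone("10.0.0.1:8000", "computer1")); // 1234
console.log(cone("10.0.0.1:8000", "computer2")); // 1234 (same mapping reused)
console.log(symmetric("10.0.0.1:8000", "computer1")); // 1234
console.log(symmetric("10.0.0.1:8000", "computer2")); // 1235 (new mapping)
```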

This creates a challenge for WebRTC peer-to-peer communication. Assume External Computer 1 is the signalling server, and External Computer 2 is User 2, who wants to talk to the internal computer. Signalling happens over port 1234, so when External Computer 2 tries to reach the internal computer it has no usable path, because the mapping on port 5678 has not been created yet. It is a chicken-and-egg situation. To overcome it, traffic between the clients has to be routed through an intermediary server called a TURN server.
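In the browser, a TURN server is supplied to RTCPeerConnection as one more entry in the iceServers list. Below is a rough sketch of such a configuration. The helper `buildRtcConfig`, the TURN URL and the credentials are placeholders, not a real service.

```javascript
// Builds an RTCPeerConnection configuration with STUN plus an optional TURN
// relay. The TURN URL and credentials passed in are placeholders.
function buildRtcConfig(turn) {
  const iceServers = [{ urls: "stun:stun.l.google.com:19302" }];
  if (turn) {
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

const config = buildRtcConfig({
  url: "turn:turn.example.com:3478", // hypothetical TURN server
  username: "demo-user",
  credential: "demo-pass",
});
console.log(config.iceServers.length); // 2
// In a browser: const pc = new RTCPeerConnection(config);
```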

End to end flow

Let’s look at the communication flow between the two peers. The first part is just a message exchange over sockets. Once both users can see their own streams, they send each other an offer and an answer containing their stream information.

First we create a room where the host and the participant can join.

The client (User-1) does this by sending a message to the server over the socket:

socket.emit("create", roomName);

On the server side we listen for this message, create the room and respond with “created”:

socket.on("create", function (roomName) {
  let rooms = io.sockets.adapter.rooms;
  socket.join(roomName);
  socket.emit("created");
});

When User-1 receives the “created” message, it builds the stream and displays it to User-1:

socket.on("created", function () {
  navigator.mediaDevices
    .getUserMedia({
      audio: false,
      video: { width: 360, height: 360 },
    })
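    .then(function (stream) {
      // Keep a reference for addTrack later (userStream is assumed to be
      // declared at the top of the client script).
      userStream = stream;
      var hostVideo = document.getElementById("host-video");
      hostVideo.srcObject = stream;
      hostVideo.onloadedmetadata = function (e) {
        hostVideo.play();
      };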
    })
    .catch(function (err) {
      alert("Couldn't Access User Media");
    });
});

The room is ready as soon as the server responds with the “created” message, so User-2 can now join.

User-2 sends a join message

socket.emit("join", roomName);

On the server side the logic is the same: we check that the room exists, add this socket to the room, and respond with a “joined” message. (The snippet below omits the existence check.)

socket.on("join", function (roomName) {
  let rooms = io.sockets.adapter.rooms;
  socket.join(roomName);
  socket.emit("joined");
});
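The room check mentioned above could look something like the sketch below. It assumes Socket.IO v3 or later, where `io.sockets.adapter.rooms` is a Map from room name to a Set of socket ids. The helper names, the two-person limit and the "exists", "not-found" and "full" replies are assumptions added for illustration.

```javascript
// Assumes Socket.IO v3 or later, where io.sockets.adapter.rooms is a
// Map of room name -> Set of socket ids. A plain Map stands in for it here.
function roomSize(rooms, roomName) {
  const room = rooms.get(roomName);
  return room ? room.size : 0;
}

// Hypothetical helper: decides what the server should reply based on how
// many sockets are already in the room. The two-person limit is an assumption.
function decideRoomAction(event, size) {
  if (event === "create") {
    return size === 0 ? "created" : "exists";
  }
  if (event === "join") {
    if (size === 0) return "not-found";
    if (size >= 2) return "full";
    return "joined";
  }
  throw new Error("Unknown event: " + event);
}

const rooms = new Map([["room-1", new Set(["socket-a"])]]);
console.log(decideRoomAction("create", roomSize(rooms, "room-2"))); // created
console.log(decideRoomAction("join", roomSize(rooms, "room-1"))); // joined
console.log(decideRoomAction("join", roomSize(rooms, "room-2"))); // not-found
```

On the server, the result would decide whether to call socket.join(roomName) and which message to emit back.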

When User-2 gets the “joined” message, it builds the stream and displays it to User-2.

It also emits a “ready” message back to the server.

socket.on("joined", function () {
  navigator.mediaDevices
    .getUserMedia({
      audio: false,
      video: { width: 360, height: 360 },
    })
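    .then(function (stream) {
      // Keep a reference for addTrack later (userStream is assumed to be
      // declared at the top of the client script).
      userStream = stream;
      var hostVideo = document.getElementById("host-video");
      hostVideo.srcObject = stream;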
      hostVideo.onloadedmetadata = function (e) {
        hostVideo.play();
      };
      socket.emit("ready", roomName);
    })
    .catch(function (err) {
      alert("Couldn't Access User Media");
    });
});

Now the message exchange starts. The server forwards this “ready” message to User-1 by broadcasting it to the room:

socket.on("ready", function (roomName) {
  socket.broadcast.to(roomName).emit("ready");
});
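The key property of the broadcast is that the message reaches everyone in the room except the sender. The sketch below imitates that behaviour with an in-memory stand-in, so it can run without a Socket.IO server. `createRelay` and its methods are made-up names, not part of Socket.IO.

```javascript
// In-memory stand-in for a signalling server: relays an event to every
// member of the room except the sender.
function createRelay() {
  const rooms = new Map(); // roomName -> Map(clientId -> handler)

  return {
    join(roomName, clientId, handler) {
      if (!rooms.has(roomName)) rooms.set(roomName, new Map());
      rooms.get(roomName).set(clientId, handler);
    },
    broadcast(roomName, senderId, event, payload) {
      const members = rooms.get(roomName) || new Map();
      for (const [clientId, handler] of members) {
        if (clientId !== senderId) handler(event, payload);
      }
    },
  };
}

const relay = createRelay();
const received = [];
relay.join("room-1", "user-1", (event) => received.push("user-1 got " + event));
relay.join("room-1", "user-2", (event) => received.push("user-2 got " + event));

// User-2 announces it is ready; only User-1 should hear about it.
relay.broadcast("room-1", "user-2", "ready");
console.log(received); // [ 'user-1 got ready' ]
```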

When User-1 receives the “ready” message, it prepares the offer.

To prepare the offer, User-1

  • creates an RTCPeerConnection configured with the ICE servers,
  • adds the User-1 stream,
  • and provides the callback handlers.

var iceServers = {
  iceServers: [
    { urls: "stun:stun.services.mozilla.com" },
    { urls: "stun:stun1.l.google.com:19302" },
  ],
};
// RTCPeerConnection hands ICE candidates to the client through the onIceCandidate callback.
// The client uses these ICE candidates to identify a proper route to the other peer.

socket.on("ready", function () {
  rtcPeerConnection = new RTCPeerConnection(iceServers);
  rtcPeerConnection.onicecandidate = onIceCandidate;
  rtcPeerConnection.ontrack = onTrack;
  // Add every local track; with audio disabled above there is only a video track.
  userStream.getTracks().forEach(function (track) {
    rtcPeerConnection.addTrack(track, userStream);
  });
  rtcPeerConnection
    .createOffer()
    .then((offer) => {
      rtcPeerConnection.setLocalDescription(offer);
      socket.emit("offer", offer, roomName);
    })
    .catch((error) => {
      console.log(error);
    });
});
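The same offer step can also be written with async/await, which makes the order of operations easier to follow. The sketch below is one possible shape, not the code this walkthrough uses. `sendOffer` is a hypothetical helper, and the fake connection and stream objects exist only so the order of calls can be checked outside a browser.

```javascript
// Sketch of the offer step with async/await. `pc` is anything shaped like an
// RTCPeerConnection and `send` is a stand-in for socket.emit.
async function sendOffer(pc, stream, send, roomName) {
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer); // wait before signalling the offer
  send("offer", offer, roomName);
  return offer;
}

// Fake objects so the order of calls can be checked outside a browser.
const calls = [];
const fakePc = {
  addTrack: (track) => calls.push("addTrack:" + track.kind),
  createOffer: async () => ({ type: "offer", sdp: "fake-sdp" }),
  setLocalDescription: async (desc) => calls.push("setLocal:" + desc.type),
};
const fakeStream = { getTracks: () => [{ kind: "video" }] };

const offerDone = sendOffer(fakePc, fakeStream, (event) => calls.push("emit:" + event), "room-1");
offerDone.then(() => console.log(calls));
// [ 'addTrack:video', 'setLocal:offer', 'emit:offer' ]
```

In the browser, `pc` would be the real RTCPeerConnection and `send` would be `socket.emit.bind(socket)`.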

The callback handlers are defined on the client side. RTCPeerConnection invokes them as event handlers when the corresponding events fire.

function onIceCandidate(event) {
  if (event.candidate) {
    socket.emit("candidate", event.candidate, roomName);
  }
}

function onTrack(event) {
  var peerVideo = document.getElementById("peer-video");
  peerVideo.srcObject = event.streams[0];
  peerVideo.onloadedmetadata = function (e) {
    peerVideo.play();
  };
}

The ontrack callback is triggered when the peer’s stream arrives, so we display the peer stream in a separate video element.

The “offer” message created by User-1 above goes to the server, which broadcasts it to the peer:

socket.on("offer", function (offer, roomName) {
  socket.broadcast.to(roomName).emit("offer", offer);
});

The peer receives the offer and creates the answer in much the same way as the offer was created.

At this stage the peer knows User-1’s “description”, so it can set its remote description to the offer it received.

socket.on("offer", function (offer) {
  rtcPeerConnection = new RTCPeerConnection(iceServers);
  rtcPeerConnection.onicecandidate = onIceCandidate;
  rtcPeerConnection.ontrack = onTrack;
  // Add every local track; with audio disabled above there is only a video track.
  userStream.getTracks().forEach(function (track) {
    rtcPeerConnection.addTrack(track, userStream);
  });
  rtcPeerConnection.setRemoteDescription(offer);

  rtcPeerConnection
    .createAnswer()
    .then((answer) => {
      rtcPeerConnection.setLocalDescription(answer);
      socket.emit("answer", answer, roomName);
    })
    .catch((error) => {
      console.log(error);
    });
});

The “answer” is sent back to the server, which broadcasts it to the host:

socket.on("answer", function (answer, roomName) {
  socket.broadcast.to(roomName).emit("answer", answer);
});

The host uses the information in the “answer” message to set its remote description:

socket.on("answer", function (answer) {
  rtcPeerConnection.setRemoteDescription(answer);
});

While preparing the offer and the answer, we pass the iceServers information to RTCPeerConnection. During this exchange the onIceCandidate event handler is triggered. Behind the scenes, the “candidate” message goes to the server and is broadcast to the peer:

socket.on("candidate", function (candidate, roomName) {
  socket.broadcast.to(roomName).emit("candidate", candidate);
});

When the peer gets the user’s “candidate” message, it adds the candidate to its own list:

socket.on("candidate", function (candidate) {
  let icecandidate = new RTCIceCandidate(candidate);
  rtcPeerConnection.addIceCandidate(icecandidate);
});
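One thing to watch for: addIceCandidate fails if it is called before the remote description has been set, and candidates can arrive over the socket early. A common workaround is to hold early candidates in a buffer. The sketch below shows the idea with a hypothetical helper `createCandidateBuffer` and a fake connection object. It is not part of the code in this walkthrough.

```javascript
// Hypothetical helper: buffers candidates until the remote description is set.
function createCandidateBuffer(pc) {
  const pending = [];
  return {
    async add(candidate) {
      if (pc.remoteDescription) {
        await pc.addIceCandidate(candidate);
      } else {
        pending.push(candidate); // too early, keep it for later
      }
    },
    async flush() {
      // Call this right after setRemoteDescription has completed.
      while (pending.length > 0) {
        await pc.addIceCandidate(pending.shift());
      }
    },
  };
}

// Fake connection so the behaviour can be checked outside a browser.
const added = [];
const fakeConnection = {
  remoteDescription: null,
  addIceCandidate: async (c) => {
    added.push(c.candidate);
  },
};
const buffer = createCandidateBuffer(fakeConnection);

const run = (async () => {
  await buffer.add({ candidate: "early" }); // buffered, not added yet
  fakeConnection.remoteDescription = { type: "offer" };
  await buffer.flush(); // now "early" is added
  await buffer.add({ candidate: "late" }); // added immediately
  return added;
})();
run.then((result) => console.log(result)); // [ 'early', 'late' ]
```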

Now the peer has the ICE candidates of the host. When the peer prepares the answer, its own ICE candidates are transmitted in the same way and added to the host’s list.

While the offer and answer are exchanged, each peer’s stream information reaches the other user and is set as the RemoteDescription on RTCPeerConnection. This triggers the ontrack event handler, which signals that a remote track has been received and is ready to play. The onTrack function can then set the stream on the peer-video element and play the peer’s stream.

Cheers!! – Amit Tomar