TechTarget and Informa Tech’s Digital Business Combine.

The What and Why of Jitter

First, let us define jitter. To do that we need to back up and understand latency. Latency is the time it takes a packet to traverse the network from source to destination. As a rule of thumb, latency is half of the 'ping' time, or half of the round-trip delay, assuming the path is about equally fast in both directions. Jitter is the variation in latency. For example, suppose that the average latency from source to destination is 100 ms. If a specific packet, packet A, traverses the network in exactly 100 ms, it arrives just as expected and has a jitter of 0. If packet B traverses the network and is delayed slightly, arriving 130 ms after it left, it is 30 ms later than expected and has a jitter of 30 ms.
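The arithmetic in that example can be written out as a minimal sketch. The function names here are made up for illustration, and the ping-based estimate assumes a symmetric path, which real networks do not guarantee.

```python
def one_way_latency_from_ping_ms(round_trip_ms):
    """Rule-of-thumb estimate: half the round-trip time (assumes a symmetric path)."""
    return round_trip_ms / 2.0


def packet_jitter_ms(latency_ms, expected_latency_ms):
    """Jitter of a single packet: how far its latency is from the expected latency."""
    return latency_ms - expected_latency_ms


expected = one_way_latency_from_ping_ms(200.0)    # 100.0 ms one way

packet_a = packet_jitter_ms(100.0, expected)      # arrives exactly on time
packet_b = packet_jitter_ms(130.0, expected)      # arrives 30 ms late

print(packet_a)  # 0.0
print(packet_b)  # 30.0
```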

When we send real-time traffic across the network we are trying to encode and then decode a real-time event such as sound or a visual image. The sound and the visual image change constantly, so we have to continually take samples, encode them, send them across the network, decode them and reproduce the sound and images on the far end. The receiving end is expecting a continuous stream of data and needs that data to arrive at regular intervals so it can properly recreate the original audio or video. If a packet is late, the time slot in which that data was needed has gone by and the arriving packet is of no use.

Because we know that IP networks are asynchronous and can delay packets, we implement a jitter buffer on the receiving end. Let's consider a 40 ms jitter buffer. The jitter buffer predicts the expected time of arrival for each packet, but then delays the playing of those packets by 40 ms. So the real-time event at the far end is being recreated 40 ms later than it could otherwise have been recreated. The value of this is that if a packet arrives less than 40 ms late, it can be pushed ahead in the jitter buffer so that it is available for its play window even though it arrived late. This is like your colleague telling you the train leaves a half hour earlier than it really does because he knows you often arrive late. The train really leaves a half hour later than the schedule you were given, and that half hour is the jitter buffer.
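Here is a rough sketch of that play-window decision, assuming a fixed 40 ms buffer, a known expected latency of 100 ms, and packets sent every 20 ms. The names and the sample timings are invented for illustration; real jitter buffers are often adaptive and considerably more involved.

```python
def playout_time_ms(send_time_ms, expected_latency_ms, buffer_ms):
    """When the packet is scheduled to play: expected arrival plus the buffer delay."""
    return send_time_ms + expected_latency_ms + buffer_ms


def is_playable(send_time_ms, arrival_time_ms,
                expected_latency_ms=100.0, buffer_ms=40.0):
    """True if the packet arrives in time for its play window.

    Assumption: a packet that arrives exactly at its playout time still counts.
    """
    deadline = playout_time_ms(send_time_ms, expected_latency_ms, buffer_ms)
    return arrival_time_ms <= deadline


# (send time, arrival time) in ms; the latenesses are 0, 30, 45 and 0 ms.
packets = [(0.0, 100.0), (20.0, 150.0), (40.0, 185.0), (60.0, 160.0)]

results = [is_playable(send, arrival) for send, arrival in packets]
print(results)  # [True, True, False, True]
```

The third packet is 45 ms late, so it misses its window and would be thrown away; the second is 30 ms late and is rescued by the buffer.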

So a 40 ms jitter buffer will take care of network jitter up to 40 ms. If packets are later than that, then again their play window has gone by and they are discarded by the jitter buffer. So why don't we just make the jitter buffer arbitrarily long to allow for any amount of jitter in the network? Remember the delay the buffer adds. As we delay the recreation of the voice or video image, we reduce the ability of participants to interact easily. When there is a delay on the connection we find ourselves stepping on each other's speech. This effect is disconcerting. It can make you wonder if the other party is listening, and it can make a back-and-forth discussion very difficult. So we limit the size of the jitter buffer. This means we need to ensure that the network keeps packet jitter within the limit that the jitter buffer can handle.

Jitter is often measured using the method in RFC 1889 (the original RTP specification, since superseded by RFC 3550, which keeps the same calculation). It maintains a running average of the jitter between successive packets, with each new sample weighted by 1/16. This is the jitter that is reported in the RTCP packets that accompany each RTP stream of a voice or video call. Unfortunately, comparing the RFC 1889 jitter values to the size of the jitter buffer does not give us direct information about how jitter is affecting the quality of the voice or video stream. What we really want to know is how many packets were dropped by the jitter buffer. Voice and video quality is impacted when data is not available for reproducing the original event. So in the same way that voice and video are impacted by lost packets, they are impacted by packets whose jitter exceeds the jitter buffer and which are therefore dropped before reaching the codec. They arrived too late to be useful, so they have the same effect as if they had never arrived at all.
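For reference, the RFC's running estimate is J = J + (|D| - J)/16, where D is the change in transit time between one packet and the next. The sketch below applies that update to timestamps in milliseconds; the RFC itself works in RTP timestamp units, and the function name and sample values are made up for illustration.

```python
def interarrival_jitter(send_times, arrival_times):
    """Smoothed interarrival jitter using the RFC 1889 update rule.

    D is the difference between the spacing of two packets at the receiver
    and their spacing at the sender; each |D| moves the estimate by 1/16.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = ((arrival_times[i] - arrival_times[i - 1])
             - (send_times[i] - send_times[i - 1]))
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Packets sent every 20 ms; the second one is delayed by an extra 16 ms.
sends = [0.0, 20.0, 40.0]
arrivals = [100.0, 136.0, 140.0]

print(interarrival_jitter(sends[:2], arrivals[:2]))  # 1.0
print(interarrival_jitter(sends, arrivals))          # 1.9375
```

A single packet that is 16 ms late nudges the reported value to only about 1 to 2 ms, which illustrates why the smoothed figure says little about how many packets actually missed their play window.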

Some of the test tools I use will report max jitter. This is somewhat useful because it tells me that jitter reached the max value at least once during the test interval. If I can create short enough test intervals (e.g. 10 seconds) I can get some idea of how often the network is creating jitter problems.
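As a sketch of that idea, the snippet below takes per-packet jitter samples, groups them into 10-second intervals, and keeps the maximum for each. The sample data and the function name are invented; an actual test tool will report this in its own format.

```python
def max_jitter_per_interval(samples, interval_s=10.0):
    """Return {interval index: max jitter in ms} for (time_s, jitter_ms) samples."""
    maxima = {}
    for time_s, jitter_ms in samples:
        index = int(time_s // interval_s)
        # Keep the largest jitter value seen so far in this interval.
        maxima[index] = max(maxima.get(index, jitter_ms), jitter_ms)
    return maxima


# (seconds into the test, jitter in ms) -- illustrative values only
samples = [(1.0, 5.0), (9.5, 45.0), (12.0, 3.0), (19.9, 8.0), (25.0, 60.0)]

maxima = max_jitter_per_interval(samples)
print(maxima)  # {0: 45.0, 1: 8.0, 2: 60.0}

# Intervals in which jitter exceeded a 40 ms buffer at least once.
bad_intervals = [i for i, worst in maxima.items() if worst > 40.0]
print(bad_intervals)  # [0, 2]
```

This tells us that two of the three intervals had at least one packet beyond 40 ms, but not how many packets in each interval were affected.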

A better measure (IMHO) would be to know directly how many packets the receiving codec dropped due to jitter. When I study a sniffer trace I emulate a 40 ms jitter buffer and then generate a value for jitter loss: the percentage of packets dropped because their jitter exceeded my simulated jitter buffer. This value can then be evaluated the same way as packet loss, since its effect is the same.
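A minimal sketch of that kind of emulation follows. It is not the exact procedure I run on a trace: it assumes each packet comes with a sender timestamp and a capture time in milliseconds, and it treats the first packet's delay as the expected latency, since a trace usually gives only relative delay rather than true one-way latency. All names and values are illustrative.

```python
def jitter_loss_percent(packets, buffer_ms=40.0):
    """Percentage of packets a fixed jitter buffer would discard.

    packets: list of (send_ms, arrival_ms) pairs in capture order.
    Assumption: the first packet's delay is the baseline (expected) delay.
    """
    if not packets:
        return 0.0
    first_send, first_arrival = packets[0]
    baseline = first_arrival - first_send
    dropped = 0
    for send_ms, arrival_ms in packets:
        lateness = (arrival_ms - send_ms) - baseline
        if lateness > buffer_ms:
            dropped += 1  # arrived after its play window
    return 100.0 * dropped / len(packets)


# Latenesses relative to the first packet: 0, 30, 45, 0 and 50 ms.
trace = [(0.0, 100.0), (20.0, 150.0), (40.0, 185.0),
         (60.0, 160.0), (80.0, 230.0)]

print(jitter_loss_percent(trace))  # 40.0
```

Two of the five packets are more than 40 ms late, so the jitter loss is 40 percent, a number that can be read the same way as a packet loss figure.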

We'll have to ask Eric Krapf why he titled this site "No Jitter". It's a noble goal, but for me keeping jitter within the bounds of the jitter buffer will suffice.