The way a switch or router provides priority is by using more than one queue on an output port. Let's think about the simplest case where there is a best-effort queue and a high-priority queue. Packets are enqueued in either the best effort (low priority) queue or the high priority queue based on their DiffServ marking. The output port will always take packets from the high priority queue if there are any packets in it. If there are no packets in the high priority queue, then packets are taken from the low priority queue.
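The two-queue behavior described above can be sketched in a few lines. This is only an illustration, not how any particular switch implements it: the class name `StrictPriorityPort` is made up for this example, and I am assuming that DSCP 46 (Expedited Forwarding) is the marking that selects the high priority queue.

```python
from collections import deque


class StrictPriorityPort:
    """Hypothetical output port with one high priority and one best effort queue."""

    HIGH_PRIORITY_DSCP = 46  # assumption: Expedited Forwarding marks voice

    def __init__(self):
        self.high = deque()
        self.low = deque()

    def enqueue(self, packet, dscp):
        # Classify on the DiffServ marking carried by the packet.
        if dscp == self.HIGH_PRIORITY_DSCP:
            self.high.append(packet)
        else:
            self.low.append(packet)

    def dequeue(self):
        # The high priority queue is always served first; best effort
        # packets are sent only when the high priority queue is empty.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


port = StrictPriorityPort()
port.enqueue("data-1", dscp=0)
port.enqueue("voice-1", dscp=46)
port.enqueue("data-2", dscp=0)
port.enqueue("voice-2", dscp=46)

order = [port.dequeue() for _ in range(4)]
print(order)  # ['voice-1', 'voice-2', 'data-1', 'data-2']
```

Even though the data packets were enqueued first, both voice packets leave the port ahead of them.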
Now let's think about the timing a bit. If the output port is a 1-Gbps Ethernet port, packets are taken out of the queues very quickly. A 214-byte G.711 packet takes about 1.7 microseconds to send and a 1500-byte packet takes 12 microseconds. But if the output port is a T1, running at 1.544 Mbps, then each 214-byte packet takes 1.1 milliseconds to send, and a 1500-byte packet takes a full 7.8 milliseconds, about 650 times longer.
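These figures are just packet size in bits divided by link rate. Here is a small sketch of that arithmetic, assuming a T1 rate of 1.544 Mbps and ignoring any framing overhead on the link; the function and constant names are mine.

```python
def serialization_delay(size_bytes, rate_bps):
    """Seconds needed to clock a packet of size_bytes onto a link of rate_bps."""
    return size_bytes * 8 / rate_bps


GIGE_BPS = 1_000_000_000  # 1-Gbps Ethernet
T1_BPS = 1_544_000        # assumption: nominal T1 rate, no framing overhead

for size in (214, 1500):
    gige_us = serialization_delay(size, GIGE_BPS) * 1e6
    t1_ms = serialization_delay(size, T1_BPS) * 1e3
    print(f"{size} bytes: {gige_us:.1f} us at 1 Gbps, {t1_ms:.1f} ms on a T1")
```

The ratio between the two columns is simply the ratio of the link rates, which is where the factor of roughly 650 comes from.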
Queuing (and therefore QoS) is needed whenever the input rate to a switch or router can exceed the output rate. If packets arrive faster than they can be sent, they must be stored in queues for a short time until the output port can send them. In the core of the network, this situation occurs anywhere multiple switch inputs could be sending traffic to a single output. An even more critical location is the WAN access router, where traffic arrives from the LAN core at perhaps 1 Gbps and can only be sent out at the rate of the access link, perhaps 1.5 Mbps.
If the WAN router in the example above is directly connected to the T1 access link, then packets will be taken from the priority and best effort queues at the slower rates we calculated above. If multiple high priority packets arrive in a burst of traffic, all of those packets will be sent before any best effort traffic is sent.
Now consider the case where the WAN router is not directly connected to the access link but is instead connected to a DSL or satellite modem or an encryption device. The connection between the router and the modem may be 100 Mbps, but the access link is still only a T1. The router can now take packets out of its output queues at a rate of 100 Mbps. If a burst of traffic arrives at the router, the router will pass that burst on to the modem at 100 Mbps, and the modem will have to queue the traffic while its slower output (the T1) sends those packets at the lower rate.
In this scenario, the router will not do a proper job of prioritizing packets. Suppose a burst of packets arrives at the router with two high-priority voice packets spaced 1 millisecond apart. If the router were connected directly to the T1, the first voice packet would be sent as soon as it arrives in the queue. This packet takes 1.1 milliseconds to send according to our previous calculation, so the second voice packet will be available at the front of the priority queue by the time the first packet is finished, and so it will be sent next. Both priority packets are sent before any best effort traffic is sent.
However, if the router sees a 100 Mbps output port, it can forward the first voice packet in only 17 microseconds, and it will then start sending data packets from the best effort queue. Each 1500-byte packet takes 120 microseconds at 100 Mbps, so the router can start forwarding 9 full-sized data packets before the next voice packet arrives.
Now think about the modem, which does not have a priority queuing mechanism. In its single queue is a voice packet followed by 9 full-sized data packets and then another voice packet. The second voice packet will have to wait a long time (9 x 7.8 ms = 70 ms) before being sent. So this queue just introduced 70 ms of jitter to that packet. Ouch!
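To check that 70 ms figure, here is a rough sketch of the modem's single FIFO queue. It assumes the whole burst is already sitting in the modem when the T1 starts sending, which is close enough because the burst arrives over the 100 Mbps link in about a millisecond. The function name and the 1.544 Mbps T1 rate are my own choices for the example.

```python
T1_BPS = 1_544_000  # assumption: nominal T1 rate


def fifo_departures(sizes_bytes, rate_bps):
    """Finish time in seconds for each packet in a FIFO queue,
    assuming every packet is already queued at time 0."""
    now = 0.0
    finished = []
    for size in sizes_bytes:
        now += size * 8 / rate_bps
        finished.append(now)
    return finished


# Modem queue: voice, nine full-sized data packets, then the second voice packet.
modem_queue = [214] + [1500] * 9 + [214]
finished = fifo_departures(modem_queue, T1_BPS)

# Time the second voice packet spends waiting behind the nine data packets.
wait_behind_data = finished[9] - finished[0]
print(f"second voice packet waits about {wait_behind_data * 1e3:.0f} ms")
```

The wait is entirely the time to send the nine data packets on the T1, which is the delay the router's priority queue was supposed to prevent.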
The way to resolve this is to constrain the output rate of the queues on the router. Most routers let you limit the output rate of an interface with a configuration option (often called traffic shaping). If the output of the router in our example is set to 1.5 Mbps (the T1 rate), then the queuing will take place in the router queues and the forwarding order will be correct. Because the router is forwarding packets at a slower rate, the second high priority packet will have arrived in the high priority queue by the time the first one has been sent, and it will go next.
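The effect of the rate limit can be seen in a small simulation of the router's output port. This is a simplified sketch, not a model of any real router: it assumes strict priority, no preemption of a packet already being sent, and a burst of ten data packets arriving together with the first voice packet. The function `transmit_order` and the packet names are invented for the example.

```python
def transmit_order(arrivals, rate_bps):
    """Order in which a strict-priority port sends packets.

    arrivals: list of (time_s, name, size_bytes, is_high) tuples.
    rate_bps: rate at which the port drains its queues.
    """
    pending = sorted(arrivals)
    high, low = [], []
    order = []
    now = 0.0
    i = 0
    while i < len(pending) or high or low:
        # Move every packet that has arrived by 'now' into its queue.
        while i < len(pending) and pending[i][0] <= now:
            _, name, size, is_high = pending[i]
            (high if is_high else low).append((name, size))
            i += 1
        if not high and not low:
            now = pending[i][0]  # port is idle until the next arrival
            continue
        # Strict priority: serve the high queue whenever it is not empty.
        name, size = (high or low).pop(0)
        order.append(name)
        now += size * 8 / rate_bps  # the port is busy for the whole packet
    return order


arrivals = [(0.0, "voice-1", 214, True), (0.001, "voice-2", 214, True)]
arrivals += [(0.0, f"data-{n}", 1500, False) for n in range(10)]

unshaped = transmit_order(arrivals, 100_000_000)  # router sees 100 Mbps
shaped = transmit_order(arrivals, 1_544_000)      # output limited to the T1 rate

print(unshaped.index("voice-2"))  # 10: nine data packets go out first
print(shaped.index("voice-2"))    # 1: sent right after voice-1
```

With the port draining at 100 Mbps, the second voice packet ends up behind nine data packets. With the output limited to the T1 rate, it is already waiting in the high priority queue when the first voice packet finishes, so it goes next.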
There are a number of network situations where this kind of mismatch can occur, such as with ATM PVCs and with modems as described above. The QoS designer must recognize these situations and create the right combination of priority queuing and forwarding bandwidth to ensure low latency and low packet loss for the voice and video packets.