VoIP FAQ

1.RTP default maximum payload

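The RTP default maximum payload is 1376 bytes:

= 1500 - (12 + 16 * 4) - 40 - 8

= Maximum Ethernet data length - (fixed RTP header size + 16 CSRC entries * 4 bytes each) - IPv6 header size - UDP header size.

(The formula reserves room for 16 CSRC entries; the 4-bit CSRC count field in the RTP header allows at most 15.)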
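
The arithmetic above can be checked with a short sketch. The constant and function names are illustrative, not taken from any particular RTP stack, and the 16 reserved CSRC slots simply mirror the formula.

```python
# Sizes in bytes. The values mirror the formula above; the names are illustrative.
ETHERNET_MTU = 1500      # maximum Ethernet data length
RTP_FIXED_HEADER = 12    # fixed part of the RTP header
CSRC_ENTRY = 4           # one CSRC identifier is 32 bits
CSRC_SLOTS = 16          # number of CSRC entries reserved by the formula
IPV6_HEADER = 40
UDP_HEADER = 8


def max_rtp_payload(mtu=ETHERNET_MTU, csrc_slots=CSRC_SLOTS):
    """Return the payload bytes left after the RTP, IPv6 and UDP headers."""
    rtp_header = RTP_FIXED_HEADER + csrc_slots * CSRC_ENTRY
    return mtu - rtp_header - IPV6_HEADER - UDP_HEADER


print(max_rtp_payload())  # 1376
```

Passing `csrc_slots=0` shows how much room is available when no CSRC list is reserved.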


2.Minimal Version of RTP header

A minimal version of RTP would likely contain a sequence number (SN) and a payload type (PT), with a minimum combined size of two bytes. Unfortunately, such a choice would have a number of disadvantages:

  • Not suitable for mixers and translators, due to the absence of an SSRC.
  • The total reduction in overhead is modest: a G.723.1 packet with an audio payload of 20 bytes would shrink from a total of 60 bytes to 50 bytes, a saving of about 17% (10 bytes, or 20% of the shortened packet). This covers Internet traffic growth of at most two months…
  • Since SSRC is needed for multicast, this header would break compatibility even within the H.323 suite (namely, H.332).
  • H.245 and SDP would have to be extended to handle the negotiation of the additional RTP header format.
  • Unless every device supports both the full-length and the short version, gateways are needed to translate between the two.
  • Without a timestamp, cross-media synchronization becomes very difficult unless audio without silence suppression is used. (Silence suppression is a far more effective mechanism for saving bandwidth than header compression, with a typical bandwidth reduction of close to 50%.)
  • Much of the RTCP functionality would have to be revisited, since it relies on the presence of timestamps and longer sequence numbers for jitter computation, loss statistics and synchronization.
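
The packet sizes quoted in the list above can be reproduced with a small sketch. It assumes the 60-byte total consists of an IPv4 header without options (20 bytes), a UDP header (8 bytes), the fixed RTP header (12 bytes) and the 20-byte payload; that breakdown is an assumption, since the text only gives the totals.

```python
IPV4_HEADER = 20       # assumption: IPv4 header without options
UDP_HEADER = 8
FULL_RTP_HEADER = 12   # fixed RTP header, no CSRC entries
MINIMAL_HEADER = 2     # sequence number + payload type, as described above


def packet_size(payload, rtp_header):
    """Total packet size in bytes for a given payload and RTP header size."""
    return IPV4_HEADER + UDP_HEADER + rtp_header + payload


full = packet_size(20, FULL_RTP_HEADER)
short = packet_size(20, MINIMAL_HEADER)
saving = (full - short) / full
print(full, short, f"{saving:.1%}")  # 60 50 16.7%
```

Measured against the original 60 bytes the saving is about 16.7%; measured against the shortened 50-byte packet it is 20%.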




3.Are RTP timestamps synchronized across different media?

No, initial timestamp values are picked randomly and independently for each RTP stream. (This is more or less unavoidable if different media types are generated by independent applications, whether these applications reside on the same host or not.) Synchronization (such as lip sync) between different media is performed by receivers through the NTP timestamps in the RTCP sender reports. Each sender report pairs an NTP timestamp with the corresponding RTP timestamp of the stream, which provides a common time reference that associates a media-specific RTP timestamp with the “wallclock” time shared across media. RTP does not prescribe how end systems synchronize different media. A workable approach, however, is to have the applications periodically exchange messages indicating what delay each would impose on its stream (including any media decoding delays) if it were not synchronizing, and then have all applications choose the maximum of these delays.
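
As a rough illustration of how a receiver might use a sender report, the sketch below maps an RTP timestamp to wallclock time. The `SenderReport` class and `rtp_to_wallclock` function are hypothetical names, the NTP time is simplified to floating-point seconds, and the clock rates (8000 Hz audio, 90000 Hz video) are example values.

```python
from dataclasses import dataclass

RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit and wrap around


@dataclass
class SenderReport:
    ntp_seconds: float   # wallclock time carried in the RTCP sender report
    rtp_timestamp: int   # RTP timestamp corresponding to that wallclock time
    clock_rate: int      # RTP clock rate of the stream in Hz


def rtp_to_wallclock(sr, rtp_timestamp):
    """Map an RTP timestamp to wallclock seconds using a sender report."""
    # Use a signed 32-bit difference so timestamps shortly before or after
    # the report, including across a wraparound, map correctly.
    delta = (rtp_timestamp - sr.rtp_timestamp) % RTP_TS_MODULUS
    if delta >= RTP_TS_MODULUS // 2:
        delta -= RTP_TS_MODULUS
    return sr.ntp_seconds + delta / sr.clock_rate


audio_sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=160_000, clock_rate=8000)
video_sr = SenderReport(ntp_seconds=1000.5, rtp_timestamp=4_500_000, clock_rate=90000)

audio_time = rtp_to_wallclock(audio_sr, 168_000)    # 8000 ticks = 1 s after the report
video_time = rtp_to_wallclock(video_sr, 4_545_000)  # 45000 ticks = 0.5 s after the report
print(audio_time, video_time, audio_time - video_time)  # 1001.0 1001.0 0.0
```

Although the two streams start from unrelated RTP timestamps, both samples map to the same wallclock instant and can therefore be played out together.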



4.Measuring jitter

Jitter is calculated from the inter-arrival time of successive packets. Frequently, two numbers are given: the average inter-arrival time and its standard deviation. On a good network, the average inter-arrival time equals the interval at which the packets were emitted, and the standard deviation is low, indicating a consistent inter-arrival time. For correct jitter measurements on audio streams, it is important to take three phenomena into account: silence suppression, packet loss and out-of-sequence packets.
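
A minimal sketch of the two numbers mentioned above, computed from a list of arrival times in milliseconds. The function name is illustrative, and the population standard deviation is one reasonable choice among several.

```python
from statistics import mean, pstdev


def interarrival_stats(arrival_times_ms):
    """Return (mean, standard deviation) of the gaps between successive arrivals.

    Needs at least two arrival times.
    """
    gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    return mean(gaps), pstdev(gaps)


print(interarrival_stats([0, 20, 40, 60, 80]))  # steady stream: deviation is zero
print(interarrival_stats([0, 15, 45, 60, 80]))  # same average gap, but jittery
```

Both streams average 20 ms between packets; only the standard deviation reveals the jitter in the second one.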


Codecs take advantage of periods of silence in the conversation to reduce the number of packets being sent. Typically, up to 50% bandwidth savings can be realized in this way. The RTP packet immediately after a period of silence is marked with the silence suppression bit (the marker bit in the RTP header, which flags the first packet of a talkspurt). Jitter calculations should check this bit and disregard the long gap between the last packet before the silence and the first packet after it.
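
One way to apply this rule is sketched below: gaps that end at a marked packet are left out of the jitter calculation. The tuple layout and function name are assumptions made for the example.

```python
def gaps_ignoring_silence(packets):
    """packets: list of (arrival_ms, marker) tuples in arrival order.

    Returns inter-arrival gaps, skipping any gap that ends at a packet whose
    marker flag is set, i.e. the first packet after a silence period.
    """
    gaps = []
    for (prev_arrival, _), (arrival, marker) in zip(packets, packets[1:]):
        if marker:
            continue  # gap spans a silence period, not network jitter
        gaps.append(arrival - prev_arrival)
    return gaps


stream = [(0, False), (20, False), (40, False), (540, True), (560, False)]
print(gaps_ignoring_silence(stream))  # [20, 20, 20]
```

The 500 ms pause before the marked packet is dropped, so it does not inflate the measured jitter.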


In the event of packet loss, the inter-arrival time between two successive packets will also appear excessive. For instance, if three packets were sent at times 0, 20 and 40 ms and the second packet was lost in transit, the inter-arrival time would appear to be 40 ms even if the network induced no jitter. A correct jitter measurement detects these cases by looking at the packet sequence numbers and compensates for the loss in the jitter calculation.
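
A hedged sketch of one possible compensation: divide each gap by the sequence-number step, so that a gap spanning lost packets is expressed per packet interval. This is an illustration under that assumption, not a prescribed algorithm, and the function name is hypothetical.

```python
def loss_compensated_gaps(packets):
    """packets: list of (sequence_number, arrival_ms) tuples in arrival order.

    Divides each inter-arrival gap by the sequence-number step so that a gap
    spanning lost packets is expressed per packet interval.
    """
    gaps = []
    for (prev_seq, prev_arrival), (seq, arrival) in zip(packets, packets[1:]):
        step = (seq - prev_seq) % 65536  # RTP sequence numbers are 16-bit
        if step == 0 or step >= 32768:
            continue  # duplicate or out-of-order packet; not handled in this sketch
        gaps.append((arrival - prev_arrival) / step)
    return gaps


# Packets 1, 2 and 3 were sent at 0, 20 and 40 ms; packet 2 was lost in transit.
print(loss_compensated_gaps([(1, 0), (3, 40)]))  # [20.0]
```

The raw gap of 40 ms is attributed to two packet intervals of 20 ms each, matching the sending rate.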


Out-of-sequence packets may also skew jitter measurements when not taken into account. For instance, consider an example where packet 1 was sent at time 0 and arrived at time 100, packet 2 was sent at time 20 and arrived at time 140, and packet 3 was sent at time 40 and arrived at time 120. Packets arrived at the receiver at times 100, 120 and 140, so no jitter would be detected unless the analyzer also examined the sequence numbers. When it does, the jitter is calculated from a 40 ms inter-arrival time between packets 1 and 2 and a -20 ms inter-arrival time between packets 2 and 3.
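
The example above can be reproduced with a short sketch that compares gaps taken in arrival order with gaps taken in sequence-number order. Function names are illustrative, and 16-bit sequence-number wraparound is ignored for brevity.

```python
def gaps_in_arrival_order(packets):
    """packets: list of (sequence_number, arrival_ms) tuples in arrival order."""
    return [b[1] - a[1] for a, b in zip(packets, packets[1:])]


def gaps_in_sequence_order(packets):
    """Same input, but gaps are taken between consecutive sequence numbers."""
    ordered = sorted(packets)  # sorts by sequence number (ignores wraparound)
    return [b[1] - a[1] for a, b in zip(ordered, ordered[1:])]


# Packets 1, 2 and 3 arrive at 100, 140 and 120 ms, so packet 3 overtakes packet 2.
arrived = [(1, 100), (3, 120), (2, 140)]
print(gaps_in_arrival_order(arrived))   # [20, 20]
print(gaps_in_sequence_order(arrived))  # [40, -20]
```

Only the sequence-ordered view exposes the reordering, which the arrival-ordered view hides completely.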