
Sri Vidya College of Engineering & Technology MCC-NOTES

Voice over Internet Protocol

Basics of IP transport

Voice over IP (VoIP) is a family of technologies, methodologies,
communication protocols, and transmission techniques for the delivery of voice
communications and multimedia sessions over Internet Protocol (IP) networks, such as
the Internet. Other terms frequently encountered and often used synonymously with VoIP
are IP telephony, Internet telephony, voice over broadband (VoBB), broadband
telephony, and broadband phone.

Internet telephony refers to communications services (voice, fax, SMS, and/or
voice-messaging applications) that are transported via the Internet, rather than the public
switched telephone network (PSTN). The steps involved in originating a VoIP telephone
call are signaling and media channel setup, digitization of the analog voice signal,
encoding, packetization, and transmission as Internet Protocol (IP) packets over a packet-
switched network. On the receiving side, similar steps (usually in the reverse order) such
as reception of the IP packets, decoding of the packets and digital-to-analog conversion
reproduce the original voice stream.[1] Even though IP Telephony and VoIP are terms that
are used interchangeably, they are actually different; IP telephony has to do with digital
telephony systems that use IP protocols for voice communication while VoIP is actually a
subset of IP Telephony. VoIP is a technology used by IP telephony as a means of
transporting phone calls.[2]
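
The packetization step described above can be sketched as follows. This is a minimal illustration, not a complete sender: it assumes 8 kHz audio already encoded at one byte per sample (as with G.711), a 20 ms packet interval, and an arbitrary SSRC value. The function names are hypothetical; only the 12-byte fixed RTP header layout is taken from the RTP specification.

```python
import struct

SAMPLE_RATE_HZ = 8000   # assumed narrowband sampling rate
FRAME_MS = 20           # assumed packetization interval
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples


def rtp_header(seq, timestamp, ssrc, payload_type=0):
    """Build the 12-byte fixed RTP header (version 2, no padding, extension or CSRC)."""
    first_byte = 2 << 6                # V=2, P=0, X=0, CC=0
    second_byte = payload_type & 0x7F  # M=0; PT 0 is PCMU in the RTP audio/video profile
    return struct.pack("!BBHII", first_byte, second_byte,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)


def packetize(encoded, ssrc=0x1234ABCD):
    """Split encoded audio (1 byte per sample) into RTP packets of one frame each."""
    packets = []
    for seq, start in enumerate(range(0, len(encoded), SAMPLES_PER_FRAME)):
        payload = encoded[start:start + SAMPLES_PER_FRAME]
        # The RTP timestamp counts samples, so it equals the sample offset here.
        packets.append(rtp_header(seq, start, ssrc) + payload)
    return packets


packets = packetize(bytes(480))  # 60 ms of placeholder (all-zero) audio
```

Each resulting packet would then be handed to UDP/IP for transmission; the receiver reverses the steps.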

VoIP systems employ session control protocols to control the set-up and tear-down of
calls, as well as audio codecs that encode speech for transmission over an IP
network as a digital audio stream. Codec use varies between implementations of
VoIP (and often a range of codecs is used); some implementations
rely on narrowband and compressed speech, while others support high-fidelity stereo
codecs.

Three types of VoIP tools are commonly used: IP phones, software VoIP,
and mobile and integrated VoIP. IP phones are the most institutionally established
but still the least obvious of the VoIP tools. The use of software VoIP increased
during the global recession of 2008-2010, as many people looking for ways to cut costs
turned to these tools for free or inexpensive calling or video conferencing
applications. Software VoIP can be further broken down into three classes or
subcategories: web calling, voice and video instant messaging, and web conferencing.
Mobile and integrated VoIP is another example of the adaptability of VoIP. VoIP is
available on many smartphones and Internet devices, so even users of portable devices
that are not phones can make calls or send SMS text messages over 3G or Wi-Fi.[3]

EC2037-MULTIMEDIA COMPRESSION AND COMMUNICATION [Type text]Page 1



Protocols
Voice over IP has been implemented in various ways using both proprietary and open
protocols and standards. Examples of technologies used to implement Voice over IP
include:

H.323
IP Multimedia Subsystem (IMS)
Media Gateway Control Protocol (MGCP)
Session Initiation Protocol (SIP)
Real-time Transport Protocol (RTP)
Session Description Protocol (SDP)
Inter-Asterisk eXchange (IAX)

The H.323 protocol was one of the first VoIP protocols that found widespread
implementation for long-distance traffic, as well as local area network services. However,
since the development of newer, less complex protocols, such as MGCP and SIP, H.323
deployments are increasingly limited to carrying existing long-haul network traffic. In
particular, the Session Initiation Protocol (SIP) has gained widespread VoIP market
penetration.

A notable proprietary implementation is the Skype protocol, which is in part based on the
principles of Peer-to-Peer (P2P) networking.

Benefits
Operational cost

VoIP can be a benefit for reducing communication and infrastructure costs. Examples
include:

Routing phone calls over existing data networks to avoid the need for separate
voice and data networks.[12]
Conference calling, IVR, call forwarding, automatic redial, and caller ID, features
that traditional telecommunication companies (telcos) normally charge extra for, are
available free of charge from open source VoIP implementations.

Flexibility

VoIP can facilitate tasks and provide services that may be more difficult to implement
using the PSTN. Examples include:

The ability to transmit more than one telephone call over a single broadband
connection.

Secure calls using standardized protocols (such as Secure Real-time Transport
Protocol). Most of what makes a secure telephone connection difficult over
traditional phone lines, such as digitizing and digital transmission, is already in
place with VoIP. It is only necessary to encrypt and authenticate the existing data
stream.
Location independence. Only a sufficiently fast and stable Internet connection is
needed to get a connection from anywhere to a VoIP provider.
Integration with other services available over the Internet, including video
conversation, message or data file exchange during the conversation, audio
conferencing, managing address books, and passing information about whether other
people are available to interested parties.
Unified Communications, the integration of VoIP with other business systems
including E-mail, Customer Relationship Management (CRM), and Web systems.

VoIP Challenges

1. Quality of service

Communication on the IP network is inherently less reliable than on the
circuit-switched public telephone network, because IP provides no network-based
mechanism to ensure that data packets are not lost and are delivered in sequential
order. It is a best-effort network without fundamental Quality of Service (QoS)
guarantees. Therefore,
VoIP implementations may face problems mitigating latency and jitter.[13][14]

By default, network routers handle traffic on a first-come, first-served basis. Network
routers on high volume traffic links may introduce latency that exceeds permissible
thresholds for VoIP. Fixed delays cannot be controlled, as they are caused by the physical
distance the packets travel; however, latency can be minimized by marking voice packets
as being delay-sensitive with methods such as DiffServ.[13]
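
As a rough sketch of DiffServ marking from the sending application's side: the 6-bit DSCP sits in the upper bits of the former IPv4 TOS byte, and many operating systems let a program set that byte per socket through the `IP_TOS` option. The choice of the Expedited Forwarding code point (46) for voice is a common convention assumed here, not something the notes prescribe, and routers along the path are free to ignore or rewrite the marking. The helper names are hypothetical.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding code point, assumed here as the voice class


def dscp_to_tos(dscp):
    """Place the 6-bit DSCP in the upper six bits of the 8-bit TOS/DS byte."""
    return (dscp & 0x3F) << 2


def open_marked_udp_socket(dscp=DSCP_EF):
    """Create a UDP socket whose outgoing IPv4 packets carry the given DSCP.

    Relies on the IP_TOS socket option; some platforms accept but ignore it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```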

A VoIP packet usually has to wait for the current packet to finish transmission. It
is possible to preempt (abort) a less important packet in mid-transmission, but this
is not commonly done, especially on high-speed links where transmission times are short
even for maximum-sized packets.[15] An alternative to preemption on slower links, such
as dialup and DSL, is to reduce the maximum transmission time by reducing the
maximum transmission unit. But every packet must contain protocol headers, so this
increases relative header overhead on every link along the user's Internet paths, not just
the bottleneck (usually Internet access) link.[15]
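
The cost of a smaller MTU can be illustrated with a short calculation. The sketch assumes 40 bytes of headers per packet (a 20-byte IPv4 header plus a 20-byte TCP header without options) and a few example MTU values; both are assumptions chosen for illustration.

```python
HEADER_BYTES = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def header_overhead(mtu, header_bytes=HEADER_BYTES):
    """Fraction of each full-size packet that is spent on protocol headers."""
    return header_bytes / mtu


for mtu in (1500, 576, 296):
    print(f"MTU {mtu}: {header_overhead(mtu):.1%} of each packet is header")
```

The fraction grows as the MTU shrinks, and it is paid on every link of the path, which is the drawback noted above.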

ADSL modems provide Ethernet (or Ethernet over USB) connections to local equipment,
but inside they are actually Asynchronous Transfer Mode (ATM) modems. They use
ATM Adaptation Layer 5 (AAL5) to segment each Ethernet packet into a series of 53-
byte ATM cells for transmission and reassemble them back into Ethernet packets at the
receiver. A virtual circuit identifier (VCI) is part of the 5-byte header on every ATM cell,
so the transmitter can multiplex the active virtual circuits (VCs) in any arbitrary order.
Cells from the same VC are always sent sequentially.

However, the great majority of DSL providers use only one VC for each customer, even
those with bundled VoIP service. Every Ethernet packet must be completely transmitted
before another can begin. If a second PVC were established, given high priority and
reserved for VoIP, then a low priority data packet could be suspended in mid-
transmission and a VoIP packet sent right away on the high priority VC. Then the link
would pick up the low priority VC where it left off. Because ATM links are multiplexed
on a cell-by-cell basis, a high priority packet would have to wait at most 53 byte times to
begin transmission. There would be no need to reduce the interface MTU and accept the
resulting increase in higher layer protocol overhead, and no need to abort a low priority
packet and resend it later.

ATM has substantial header overhead: 5/53 = 9.4%, roughly twice the total header
overhead of a 1500 byte TCP/IP Ethernet packet (with TCP timestamps). This "ATM
tax" is incurred by every DSL user whether or not they take advantage of multiple
virtual circuits, and few can.[13]
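
The per-packet effect of the cell structure can be sketched as below. The calculation assumes plain AAL5 framing only: an 8-byte AAL5 trailer is appended and the result is padded up to a whole number of 48-byte cell payloads. Any further encapsulation used on a real DSL line (for example bridging or PPP headers) is ignored, so actual figures would be somewhat higher.

```python
import math

CELL_BYTES = 53          # 5-byte header + 48-byte payload
CELL_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8


def atm_cells(packet_bytes):
    """Number of ATM cells needed to carry one packet over AAL5."""
    return math.ceil((packet_bytes + AAL5_TRAILER_BYTES) / CELL_PAYLOAD_BYTES)


def atm_wire_bytes(packet_bytes):
    """Bytes actually sent on the ATM link for one packet."""
    return atm_cells(packet_bytes) * CELL_BYTES


def atm_overhead(packet_bytes):
    """Fraction of the transmitted bytes that is not packet data."""
    wire = atm_wire_bytes(packet_bytes)
    return (wire - packet_bytes) / wire
```

Because of padding, short packets such as voice frames can lose a larger share to overhead than the 5/53 header figure alone suggests.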

ATM's potential for latency reduction is greatest on slow links, because worst-case
latency decreases with increasing link speed. A full-size (1500 byte) Ethernet frame takes
94 ms to transmit at 128 kb/s but only 8 ms at 1.5 Mb/s. If this is the bottleneck link, this
latency is probably small enough to ensure good VoIP performance without MTU
reductions or multiple ATM PVCs. The latest generations of DSL, VDSL and VDSL2,
carry Ethernet without intermediate ATM/AAL5 layers, and they generally support IEEE
802.1p priority tagging so that VoIP can be queued ahead of less time-critical traffic.[13]
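
The transmission times quoted above follow directly from frame size and link rate. A minimal check, ignoring framing overhead below the Ethernet payload:

```python
def serialization_delay_ms(frame_bytes, link_bps):
    """Time to clock one frame onto a link, in milliseconds."""
    return frame_bytes * 8 / link_bps * 1000


slow = serialization_delay_ms(1500, 128_000)    # about 94 ms at 128 kb/s
fast = serialization_delay_ms(1500, 1_500_000)  # about 8 ms at 1.5 Mb/s
```

A voice packet queued behind one such frame waits at most this long, which is why the problem mostly disappears on faster links.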

Voice, and all other data, travels in packets over IP networks with fixed maximum
capacity. This system may be more prone to congestion and DoS attacks[16]
than traditional circuit switched systems; a circuit switched system of insufficient
capacity will refuse new connections while carrying the remainder without impairment,
while the quality of real-time data such as telephone conversations on packet-switched
networks degrades dramatically.[13]

Fixed delays cannot be controlled, as they are caused by the physical distance the packets
travel. They are especially problematic when satellite circuits are involved because of the
long distance to a geostationary satellite and back; delays of 400 to 600 ms are typical.
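
A back-of-the-envelope estimate shows where most of this delay comes from. The sketch assumes a ground station directly beneath the satellite and counts propagation only; slant paths, processing and queuing add to the result.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
GEO_ALTITUDE_KM = 35_786  # geostationary altitude above the equator


def satellite_hop_ms(altitude_km=GEO_ALTITUDE_KM):
    """Propagation time ground -> satellite -> ground, in milliseconds."""
    return 2 * altitude_km / SPEED_OF_LIGHT_KM_S * 1000


one_way = satellite_hop_ms()   # roughly 240 ms
round_trip = 2 * one_way       # roughly 480 ms before any other delay is added
```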

When the load on a link grows so quickly that its switches experience queue overflows,
congestion results and data packets are lost. This signals a transport protocol like TCP to
reduce its transmission rate to alleviate the congestion. But VoIP usually uses UDP not
TCP because recovering from congestion through retransmission usually entails too much
latency.[13] So QoS mechanisms can avoid the undesirable loss of VoIP packets by
immediately transmitting them ahead of any queued bulk traffic on the same link, even
when that bulk traffic queue is overflowing.
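
The queuing behaviour described above can be sketched with a strict-priority scheduler. This is an illustrative model, not any particular router's implementation: the class name, the two-queue layout and the tail-drop policy for bulk traffic are all assumptions.

```python
from collections import deque


class StrictPriorityLink:
    """Two queues on one link: voice is always sent before any queued bulk packet."""

    def __init__(self, bulk_capacity):
        self.voice = deque()
        self.bulk = deque()
        self.bulk_capacity = bulk_capacity
        self.dropped_bulk = 0

    def enqueue(self, packet, is_voice):
        if is_voice:
            self.voice.append(packet)
        elif len(self.bulk) < self.bulk_capacity:
            self.bulk.append(packet)
        else:
            self.dropped_bulk += 1  # bulk queue overflow: tail drop

    def dequeue(self):
        """Return the next packet to transmit, or None if both queues are empty."""
        if self.voice:
            return self.voice.popleft()
        if self.bulk:
            return self.bulk.popleft()
        return None
```

Even when the bulk queue is overflowing and dropping packets, a voice packet that arrives is the next one transmitted.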

The receiver must resequence IP packets that arrive out of order and recover gracefully
when packets arrive too late or not at all. Jitter results from the rapid and random (i.e.,
unpredictable) changes in queue lengths along a given Internet path due to competition
from other users for the same transmission links. VoIP receivers counter jitter by storing
incoming packets briefly in a "de-jitter" or "playout" buffer, deliberately increasing
latency to improve the chance that each packet will be on hand when it is time for the
voice engine to play it. The added delay is thus a compromise between excessive latency
and excessive dropout, i.e., momentary audio interruptions.
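
A de-jitter buffer of the kind described above might look like the following sketch. It assumes packets carry a monotonically increasing sequence number (wraparound is ignored), and that the voice engine asks for exactly one frame per playout interval. The class and method names are hypothetical.

```python
import heapq


class PlayoutBuffer:
    """Reorders packets by sequence number and reports gaps at playout time."""

    def __init__(self):
        self._heap = []
        self._next_seq = 0

    def push(self, seq, payload):
        """Store an arriving packet; packets whose playout time has passed are discarded."""
        if seq >= self._next_seq:
            heapq.heappush(self._heap, (seq, payload))

    def pop_for_playout(self):
        """Return the payload due now, or None if it is missing (a dropout)."""
        # Discard any duplicates of frames that were already played.
        while self._heap and self._heap[0][0] < self._next_seq:
            heapq.heappop(self._heap)
        expected = self._next_seq
        self._next_seq += 1
        if self._heap and self._heap[0][0] == expected:
            return heapq.heappop(self._heap)[1]
        return None
```

A `None` result is where a real voice engine would apply some form of loss concealment. Starting playout later (a deeper buffer) gives late packets more time to arrive, at the cost of latency.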

Although jitter is a random variable, it is the sum of several other random variables that
are at least somewhat independent: the individual queuing delays of the routers along the
Internet path in question. Thus according to the central limit theorem, we can model jitter
as a gaussian random variable. This suggests continually estimating the mean delay and
its standard deviation and setting the playout delay so that only packets delayed more
than several standard deviations above the mean will arrive too late to be useful. In
practice, however, the variance in latency of many Internet paths is dominated by a small
number (often one) of relatively slow and congested "bottleneck" links. Most Internet
backbone links are now so fast (e.g. 10 Gb/s) that their delays are dominated by the
transmission medium (e.g. optical fiber) and the routers driving them do not have enough
buffering for queuing delays to be significant.
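
The estimation idea above (track the mean delay and its standard deviation, then set the playout delay several deviations above the mean) can be sketched as follows. The exponentially weighted averages, the smoothing factor `alpha` and the multiplier `k` are assumptions chosen for illustration; as noted above, real paths may not be well described by a Gaussian model.

```python
import math


class PlayoutDelayEstimator:
    """Running estimate of mean delay and spread, using exponential weighting."""

    def __init__(self, alpha=0.05, k=3.0):
        self.alpha = alpha  # weight given to each new sample (assumed value)
        self.k = k          # number of standard deviations above the mean (assumed)
        self.mean = None
        self.var = 0.0

    def update(self, delay_ms):
        """Fold one measured packet delay into the running estimates."""
        if self.mean is None:
            self.mean = delay_ms
            return
        diff = delay_ms - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)

    def playout_delay_ms(self):
        """Playout delay such that only unusually late packets miss their slot."""
        if self.mean is None:
            raise ValueError("no delay samples yet")
        return self.mean + self.k * math.sqrt(self.var)
```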

It has been suggested to rely on the packetized nature of media in VoIP communications
and transmit the stream of packets from the source phone to the destination phone
simultaneously across different routes (multi-path routing).[17] In such a way, temporary
failures have less impact on the communication quality. In capillary routing it has been
suggested to use Fountain codes, or particularly raptor codes, at the packet level for
transmitting extra redundant packets, making the communication more reliable.

A number of protocols have been defined to support the reporting of QoS/QoE for VoIP
calls. These include RTCP Extended Report (RFC 3611), SIP RTCP Summary Reports,
H.460.9 Annex B (for H.323), H.248.30 and MGCP extensions. The RFC 3611 VoIP
Metrics block is generated by an IP phone or gateway during a live call and contains
information on packet loss rate, packet discard rate (because of jitter), packet loss/discard
burst metrics (burst length/density, gap length/density), network delay, end system delay,
signal / noise / echo level, Mean Opinion Scores (MOS) and R factors and configuration
information related to the jitter buffer.
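
The notes do not say how R factors and MOS values relate. One commonly used mapping is the conversion given with the ITU-T G.107 E-model, sketched below as an assumption about what an endpoint might compute; an actual RFC 3611 implementation may derive its scores differently.

```python
def r_factor_to_mos(r):
    """Map an E-model R factor (0..100) to an estimated MOS (1.0..4.5)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
```

Under this mapping an R factor in the low 90s corresponds to a MOS of about 4.4, and the score falls off quickly as R drops below about 70.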

RFC 3611 VoIP metrics reports are exchanged between IP endpoints on an occasional
basis during a call, and an end-of-call message is sent via SIP RTCP Summary Report or
one of the other signaling protocol extensions. RFC 3611 VoIP metrics reports are
intended to support real time feedback related to QoS problems, the exchange of
information between the endpoints for improved call quality calculation and a variety of
other applications.

Layer-2 quality of service


A number of protocols that deal with the data link layer and physical layer include
quality-of-service mechanisms that can be used to ensure that applications like VoIP
work well even in congested scenarios. Some examples include:

IEEE 802.11e is an approved amendment to the IEEE 802.11 standard that
defines a set of quality-of-service enhancements for wireless LAN applications
through modifications to the Media Access Control (MAC) layer. The standard is
considered of critical importance for delay-sensitive applications, such as Voice over
Wireless IP.
IEEE 802.1p defines 8 different classes of service (including one dedicated to
voice) for traffic on layer-2 wired Ethernet.
The ITU-T G.hn standard provides a way to create a high-speed (up to 1
gigabit per second) local area network using existing home wiring (power lines,
phone lines and coaxial cables). G.hn provides QoS by means of "Contention-Free
Transmission Opportunities" (CFTXOPs), which are allocated to flows (such as a
VoIP call) that require QoS and have negotiated a "contract" with the
network controller.
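
For the IEEE 802.1p item above, the eight classes correspond to a 3-bit priority field carried in the 16-bit tag control information of an 802.1Q VLAN tag, alongside a 1-bit drop-eligible flag and a 12-bit VLAN identifier. The sketch below packs those fields; the priority value 5 and VLAN 100 in the example are assumptions, since the class used for voice depends on the deployment.

```python
def vlan_tci(pcp, vlan_id, dei=0):
    """Pack priority (3 bits), drop-eligible flag (1 bit) and VLAN ID (12 bits)."""
    if not 0 <= pcp <= 7:
        raise ValueError("priority must fit in 3 bits (8 classes)")
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    return (pcp << 13) | ((dei & 1) << 12) | vlan_id


voice_tci = vlan_tci(pcp=5, vlan_id=100)  # example values only
```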

2. Susceptibility to power failure

Telephones for traditional residential analog service are usually connected directly to
telephone company phone lines which provide direct current to power most basic analog
handsets independently of locally available power.

IP phones and VoIP telephone adapters connect to routers or cable modems, which
typically depend on the availability of mains electricity or locally generated power.[18]
Some VoIP service providers use customer-premises equipment (e.g., cable modems) with
battery-backed power supplies to assure uninterrupted service for up to several hours in
case of local power failures. Such battery-backed devices typically are designed for use
with analog handsets.

Some VoIP service providers implement services to route calls to other telephone
services of the subscriber, such as a cellular phone, in the event that the customer's
network device is inaccessible to terminate the call.

The susceptibility of phone service to power failures is a common problem even with
traditional analog service in areas where many customers purchase modern telephone
units that operate with wireless handsets to a base station, or that have other modern
phone features, such as built-in voicemail or phone book features.

3. Emergency calls

The nature of IP makes it difficult to locate network users geographically. Emergency
calls, therefore, cannot easily be routed to a nearby call center. Sometimes, VoIP systems
may route emergency calls to a non-emergency phone line at the intended department; in
the United States, at least one major police department has strongly objected to this
practice as potentially endangering the public.[19][20]

A fixed line phone has a direct relationship between a telephone number and a physical
location. If an emergency call comes from that number, then the physical location is
known.

In the IP world, it is not so simple. A broadband provider may know the location where
the wires terminate, but this does not necessarily allow the mapping of an IP address to
that location. IP addresses are often dynamically assigned, so the ISP may
allocate an address for online access, or at the time a broadband router is engaged. The
ISP recognizes individual IP addresses, but does not necessarily know to which physical
location they correspond. The broadband service provider knows the physical
location, but is not necessarily tracking the IP addresses in use.[20]

There are more complications since IP allows a great deal of mobility. For example, a
broadband connection can be used to dial a virtual private network that is employer-
owned. When this is done, the IP address being used will belong to the range of the
employer, rather than the address of the ISP, so this could be many kilometres away or
even in another country. To provide another example: if mobile data is used, e.g., a 3G
mobile handset or USB wireless broadband adapter, then the IP address has no
relationship with any physical location, since a mobile user could be anywhere that there
is network coverage, even roaming via another cellular company.

In short, there is no relationship between IP address and physical location, so the address
itself reveals no useful information for the emergency services.

At the VoIP level, a phone or gateway may identify itself with a SIP registrar by using a
username and password. So in this case, the Internet Telephony Service Provider (ITSP)
knows that a particular user is online, and can relate a specific telephone number to the
user. However, it does not recognize how that IP traffic was engaged. Since the IP
address itself does not necessarily provide location information, today's "best
efforts" approach is to use an available database to find that user and the physical address
the user chose to associate with that telephone number, which is clearly an imperfect
solution.[20]

VoIP Enhanced 911 (E911) is a method by which VoIP providers in the United States
support emergency services. The VoIP E911 emergency-calling system associates a
physical address with the calling party's telephone number as required by the Wireless
Communications and Public Safety Act of 1999. All VoIP providers that provide access
to the public switched telephone network are required to implement E911,[20] a service for
which the subscriber may be charged. Customer participation in E911, however, is not
required, and customers may opt out of E911 service.[20]

One shortcoming of VoIP E911 is that the emergency system is based on a static table
lookup. Unlike in cellular phones, where the location of an E911 call can be traced using
Assisted GPS or other methods, the VoIP E911 information is only accurate so long as
subscribers are diligent in keeping their emergency address information up-to-date. In the
United States, the Wireless Communications and Public Safety Act of 1999 leaves the
burden of responsibility upon the subscribers and not the service providers to keep their
emergency information up to date.[20]

Lack of redundancy

With the current separation of the Internet and the PSTN, a certain amount of redundancy
is provided. An Internet outage does not necessarily mean that a voice communication
outage will occur simultaneously, allowing individuals to call for emergency services and
many businesses to continue to operate normally. In situations where telephone services
become completely reliant on the Internet infrastructure, a single-point failure can isolate
communities from all communication, including Enhanced 911 and equivalent services in
other locales. However, the Internet as designed by DARPA in the early
1980s was specifically designed to be fault tolerant under adverse conditions. Even
during the 9/11 attacks on the World Trade Center, the Internet routed data around the
failed nodes that were housed in or near the towers. So single-point failures, while
possible in some geographic areas, are not the norm for the Internet as a whole.

Number portability

Local number portability (LNP) and Mobile number portability (MNP) also impact VoIP
business. In November 2007, the Federal Communications Commission in the United
States released an order extending number portability obligations to interconnected VoIP
providers and carriers that support VoIP providers.[21] Number portability is a service that
allows a subscriber to select a new telephone carrier without requiring a new number to
be issued. Typically, it is the responsibility of the former carrier to "map" the old number
to the undisclosed number assigned by the new carrier. This is achieved by maintaining a
database of numbers. A dialed number is initially received by the original carrier and
quickly rerouted to the new carrier. Multiple porting references must be maintained even
if the subscriber returns to the original carrier. The FCC mandates carrier compliance
with these consumer-protection stipulations.

A voice call originating in the VoIP environment also faces challenges to reach its
destination if the number is routed to a mobile phone number on a traditional mobile
carrier. VoIP has been identified in the past as a Least Cost Routing (LCR) system, which
is based on checking the destination of each telephone call as it is made, and then sending
the call via the network that will cost the customer the least.[22] This rating is subject to
some debate given the complexity of call routing created by number portability. With
GSM number portability now in place, LCR providers can no longer rely on using the
network root prefix to determine how to route a call. Instead, they must now determine
the actual network of every number before routing the call.

Therefore, VoIP solutions also need to handle MNP when routing a voice call. In
countries without a central database, like the UK, it might be necessary to query the GSM
network about which home network a mobile phone number belongs to. As the popularity
of VoIP increases in the enterprise markets because of least cost routing options, it needs
to provide a certain level of reliability when handling calls.

MNP checks are important to assure that this quality of service is met. By handling MNP
lookups before routing a call and by assuring that the voice call will actually work, VoIP
service providers are able to offer business subscribers the level of reliability they
require.
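
The routing consequence of number portability can be sketched as follows. All numbers, network names, carriers and prices below are hypothetical illustration data, and the in-memory dictionary stands in for whatever MNP database or network query a provider would actually use.

```python
# Hypothetical illustration data throughout.
PREFIX_NETWORK = {"0771": "NetA", "0772": "NetB"}  # original prefix allocation
PORTED = {"07711234567": "NetB"}                   # kept its NetA prefix after porting
ROUTE_COST = {                                     # cost per minute, by terminating network
    "NetA": {"carrier_x": 0.04, "carrier_y": 0.06},
    "NetB": {"carrier_x": 0.07, "carrier_y": 0.05},
}


def terminating_network(number):
    """Resolve the actual network: MNP lookup first, prefix only as a fallback."""
    if number in PORTED:
        return PORTED[number]
    return PREFIX_NETWORK[number[:4]]


def least_cost_route(number):
    """Pick the cheapest carrier for the network the number really belongs to."""
    costs = ROUTE_COST[terminating_network(number)]
    return min(costs, key=costs.get)
```

Routing the ported number by prefix alone would select the carrier that is cheapest for NetA, which is the wrong choice once the number lives on NetB.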

PSTN integration

E.164 is a global numbering standard for both the PSTN and PLMN. Most VoIP
implementations support E.164 to allow calls to be routed to and from VoIP subscribers
and the PSTN/PLMN.[23] VoIP implementations can also allow other identification
techniques to be used. For example, Skype allows subscribers to choose "Skype
names"[24] (usernames) whereas SIP implementations can use URIs[25] similar to email
addresses. Often VoIP implementations employ methods of translating non-E.164
identifiers to E.164 numbers and vice-versa, such as the Skype-In service provided by
Skype[26] and the ENUM service in IMS and SIP.[27]
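
As an illustration of the ENUM translation mentioned above: ENUM maps an E.164 number to a DNS name by reversing its digits, separating them with dots, and appending `e164.arpa`; the records found under that name can then point to SIP URIs. The sketch below performs only the name construction, and the example number is a made-up one.

```python
def e164_to_enum_domain(number):
    """Convert an E.164 number such as '+44 1632 960083' to its ENUM domain name."""
    digits = [c for c in number if c in "0123456789"]
    if not digits or len(digits) > 15:  # E.164 numbers have at most 15 digits
        raise ValueError("not a valid E.164 number")
    return ".".join(reversed(digits)) + ".e164.arpa"


domain = e164_to_enum_domain("+44 1632 960083")
```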

Echo can also be an issue for PSTN integration.[28] Common causes of echo include
impedance mismatches in analog circuitry and acoustic coupling of the transmit and
receive signal at the receiving end.

Security

VoIP telephone systems are susceptible to attacks as are any internet-connected devices.
This means that hackers who know about these vulnerabilities (such as insecure
passwords) can institute denial-of-service attacks, harvest customer data, record
conversations and break into voice mailboxes.[29][30][31]

Another challenge is routing VoIP traffic through firewalls and network address
translators. Private Session Border Controllers are used along with firewalls to enable
VoIP calls to and from protected networks. For example, Skype uses a proprietary
protocol to route calls through other Skype peers on the network, allowing it to traverse
symmetric NATs and firewalls. Other methods to traverse NATs involve using protocols
such as STUN or Interactive Connectivity Establishment (ICE).

Many consumer VoIP solutions do not support encryption, although having a secure
phone is much easier to implement with VoIP than traditional phone lines. As a result, it
is relatively easy to eavesdrop on VoIP calls and even change their content.[32] An
attacker with a packet sniffer could intercept your VoIP calls if you are not on a secure
VLAN. However, physical security of the switches within an enterprise and the facility
security provided by ISPs make packet capture less of a problem than originally foreseen.
Further research has shown that tapping into a fiber optic network without detection is
difficult if not impossible. This means that once a voice packet is within the internet
backbone it is relatively safe from interception.

There are open source solutions, such as Wireshark, that facilitate sniffing of VoIP
conversations. A modicum of security is afforded by patented audio codecs in proprietary
implementations that are not easily available for open source applications;
however, such security through obscurity has not proven effective in other fields.
Some vendors also use compression, which may make eavesdropping more
difficult. However, real security requires encryption and cryptographic
authentication, which are not widely supported at a consumer level. The existing security
standard Secure Real-time Transport Protocol (SRTP) and the new ZRTP protocol are
available on Analog Telephone Adapters (ATAs) as well as various softphones. It is
possible to use IPsec to secure P2P VoIP by using opportunistic encryption. Skype does
not use SRTP, but uses encryption which is transparent to the Skype provider.
In 2005, Skype invited a researcher, Dr Tom Berson, to assess the security of the Skype
software, and his conclusions are available in a published report.[33]

Securing VoIP

To address the security concerns above, government and military organizations use
Voice over Secure IP (VoSIP), Secure Voice over IP (SVoIP), and Secure Voice over
Secure IP (SVoSIP) to protect confidential and classified VoIP communications.[34]
Secure Voice over IP is accomplished by encrypting VoIP with Type 1 encryption.
Secure Voice over Secure IP is accomplished by using Type 1 encryption on a classified
network, like SIPRNet.[35][36][37][38][39] Public Secure VoIP is also available with free GNU
programs.[40]

Caller ID

Caller ID support among VoIP providers varies, but is provided by the majority of VoIP
providers.

Many VoIP carriers allow callers to configure arbitrary Caller ID information, thus
permitting spoofing attacks.[41] Business-grade VoIP equipment and software often make
it easy to modify caller ID information, providing many businesses great flexibility.

The Truth in Caller ID Act has been in preparation in the US Congress since 2006, but as
of January 2009 still has not been enacted. This bill proposes to make it a crime in the
United States to "knowingly transmit misleading or inaccurate caller identification
information with the intent to defraud, cause harm, or wrongfully obtain anything of
value ..."[42]

Compatibility with traditional analog telephone sets

Some analog telephone adapters do not decode pulse dialing from older phones. They
may only work with push-button telephones using the touch-tone system. The VoIP user
may use a pulse-to-tone converter, if needed.[43]

Fax handling

Support for sending faxes over VoIP implementations is still limited. The existing voice
codecs are not designed for fax transmission; they are designed to digitize an analog
representation of a human voice efficiently. However, the inefficiency of digitizing an
analog representation (modem signal) of a digital representation (a document image) of
analog data (an original document) more than negates any bandwidth advantage of VoIP.
In other words, the fax "sounds" simply do not fit in the VoIP channel. An alternative IP-
based solution for delivering fax-over-IP called T.38 is available.

The T.38 protocol is designed to compensate for the differences between traditional
packet-less communications over analog lines and packet based transmissions which are
the basis for IP communications. The fax machine could be a traditional fax machine
connected to the PSTN, or an ATA box (or similar). It could be a fax machine with an
RJ-45 connector plugged straight into an IP network, or it could be a computer
pretending to be a fax machine.[44] Originally, T.38 was designed to use UDP and TCP
transmission methods across an IP network. TCP is better suited for use between two IP
devices. However, older fax machines, connected to an analog system, benefit from UDP
near real-time characteristics due to the "no recovery rule" when a UDP packet is lost or
an error occurs during transmission.[45] UDP transmissions are preferred because they do
not require testing for dropped packets. Because each T.38 packet transmission
includes a majority of the data sent in the prior packet, a T.38 termination point has a
higher degree of success in re-assembling the fax transmission back into its original form
for interpretation by the end device. This is an attempt to overcome the obstacles of
simulating real-time transmissions using a packet-based protocol.[46]

There have been updated versions of T.30 to resolve the fax over IP issues, which is the
core fax protocol. Some newer high end fax machines have T.38 built-in capabilities
which allow the user to plug right into the network and transmit/receive faxes in native
T.38 like the Ricoh 4410NF Fax Machine.[47] A unique feature of T.38 is that each packet
contains a portion of the main data sent in the previous packet. With T.38, two successive
lost packets are needed to actually lose any data. The data you lose will only be a small
piece, but with the right settings and error correction mode, there is an increased
likelihood that you will receive enough of the transmission to satisfy the requirements of
the fax machine for output of the sent document.
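
The redundancy idea described above can be illustrated with a toy simulation. This is a minimal sketch, assuming one level of redundancy (each packet carries its own chunk plus a copy of the previous chunk); the packet layout and the names `make_packets` and `reassemble` are hypothetical and do not reproduce the real T.38 encoding.

```python
def make_packets(chunks):
    """Build toy packets: (sequence number, primary chunk, copy of previous chunk)."""
    packets = []
    previous = None
    for seq, chunk in enumerate(chunks):
        packets.append((seq, chunk, previous))
        previous = chunk
    return packets


def reassemble(received, total):
    """Recover as many chunks as possible from the packets that arrived."""
    recovered = [None] * total
    for seq, primary, redundant in received:
        recovered[seq] = primary
        # The redundant copy repairs the preceding chunk if its packet was lost.
        if seq > 0 and recovered[seq - 1] is None:
            recovered[seq - 1] = redundant
    return recovered


chunks = [b"part-0", b"part-1", b"part-2", b"part-3", b"part-4"]
packets = make_packets(chunks)

# One lost packet (seq 2): the copy inside packet 3 fills the gap.
single_loss = [p for p in packets if p[0] != 2]
# Two successive lost packets (seq 1 and 2): chunk 1 cannot be recovered.
double_loss = [p for p in packets if p[0] not in (1, 2)]

print(reassemble(single_loss, len(chunks)))
print(reassemble(double_loss, len(chunks)))
```

With a single lost packet the full chunk list comes back; with two successive losses only the earlier of the two chunks is missing, which matches the "small piece" of loss described in the text.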

Support for other telephony devices

Another challenge for VoIP implementations is the proper handling of outgoing calls
from other telephony devices such as digital video recorders, satellite television receivers,
alarm systems, conventional modems and other similar devices that depend on access to a
PSTN telephone line for some or all of their functionality.

These types of calls sometimes complete without any problems, but in other cases they
fail. If VoIP and cellular substitution become very popular, some ancillary equipment
makers may be forced to redesign equipment, because it would no longer be possible to
assume a conventional PSTN telephone line would be available in consumers' homes.

H.323 is a recommendation from the ITU Telecommunication Standardization Sector
(ITU-T) that defines the protocols to provide audio-visual communication sessions on
any packet network. The H.323 standard addresses call signaling and control, multimedia
transport and control, and bandwidth control for point-to-point and multi-point
conferences.[1]

It is widely implemented by voice and videoconferencing equipment manufacturers, is
used within various Internet real-time applications such as GnuGK and NetMeeting and
is widely deployed worldwide by service providers and enterprises for both voice and
video services over IP networks.

It is a part of the ITU-T H.32x series of protocols, which also address multimedia
communications over ISDN, the PSTN or SS7, and 3G mobile networks.

H.323 call signaling is based on the ITU-T Recommendation Q.931 protocol and is suited
for transmitting calls across networks using a mixture of IP, PSTN, ISDN, and QSIG
over ISDN. A call model, similar to the ISDN call model, eases the introduction of IP
telephony into existing networks of ISDN-based PBX systems, including transitions to
IP-based PBXs.

Within the context of H.323, an IP-based PBX might be a gatekeeper or other call control
element which provides service to telephones or videophones. Such a device may provide
or facilitate both basic services and supplementary services, such as call transfer, park,
pick-up, and hold.

While H.323 excels at providing basic telephony functionality and interoperability,
H.323's strength lies in multimedia communication functionality designed specifically
for IP networks.

H.323 was the first VoIP standard to adopt the Internet Engineering Task Force (IETF)
standard Real-time Transport Protocol (RTP) to transport audio and video over IP
networks.[citation needed]

Protocols
H.323 is a system specification that describes the use of several ITU-T and IETF
protocols. The protocols that comprise the core of almost any H.323 system are:[6]

H.225.0 Registration, Admission and Status (RAS), which is used between an
H.323 endpoint and a Gatekeeper to provide address resolution and admission control
services.
H.225.0 Call Signaling, which is used between any two H.323 entities in order to
establish communication.
H.245 control protocol for multimedia communication, which describes the
messages and procedures used for capability exchange, opening and closing logical
channels for audio, video and data, control and indications.
Real-time Transport Protocol (RTP), which is used for sending or receiving
multimedia information (voice, video, or text) between any two entities.

Many H.323 systems also implement other protocols that are defined in various ITU-T
Recommendations to provide supplementary services support or deliver other
functionality to the user. Some of those Recommendations are:[citation needed]

H.235 series describes security within H.323, including security for both signaling
and media.
H.239 describes dual stream use in videoconferencing, usually one for live video,
the other for still images.
H.450 series describes various supplementary services.
H.460 series defines optional extensions that might be implemented by an
endpoint or a Gatekeeper, including ITU-T Recommendations H.460.17, H.460.18,
and H.460.19 for Network address translation (NAT) / Firewall (FW) traversal.

In addition to those ITU-T Recommendations, H.323 implements various IETF Request
for Comments (RFCs) for media transport and media packetization, including the
Real-time Transport Protocol (RTP).

Codecs
H.323 utilizes both ITU-defined codecs and codecs defined outside the ITU. Codecs that
are widely implemented by H.323 equipment include:

Audio codecs: G.711, G.729 (including G.729a), G.723.1, G.726, G.722, G.728,
Speex
Text codecs: T.140
Video codecs: H.261, H.263, H.264

All H.323 terminals providing video communications shall be capable of encoding and
decoding video according to H.261 QCIF. All H.323 terminals shall have an audio codec
and shall be capable of encoding and decoding speech according to ITU-T Rec. G.711.
All terminals shall be capable of transmitting and receiving A-law and μ-law. Support for
other audio and video codecs is optional.[5]
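
To give a feel for what μ-law companding does, here is a minimal sketch using the continuous μ-law formula with μ = 255. It is an illustration only: real G.711 uses a segmented, piecewise-linear approximation with its own 8-bit code layout, and the function names below are hypothetical.

```python
import math

MU = 255  # mu-law parameter commonly associated with 8-bit telephony companding


def mu_law_compress(x):
    """Map a sample in [-1.0, 1.0] onto the companded range [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Invert mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def quantize_8bit(y):
    """Round a value in [-1.0, 1.0] to one of 256 levels (codes 0..255)."""
    return round((y + 1) / 2 * 255)


def dequantize_8bit(code):
    """Map a code 0..255 back to a value in [-1.0, 1.0]."""
    return code / 255 * 2 - 1


quiet = 0.01  # a quiet speech sample, as a fraction of full scale
companded = mu_law_expand(dequantize_8bit(quantize_8bit(mu_law_compress(quiet))))
linear = dequantize_8bit(quantize_8bit(quiet))
print(abs(companded - quiet), abs(linear - quiet))
```

The companded path reproduces the quiet sample with a smaller error than plain 8-bit linear quantization, which is the reason companding is used for speech: more of the 256 levels are spent on low amplitudes.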

H.323 Architecture
The H.323 system defines several network elements that work together in order to deliver
rich multimedia communication capabilities. Those elements are Terminals, Multipoint
Control Units (MCUs), Gateways, Gatekeepers, and Border Elements. Collectively,
terminals, multipoint control units and gateways are often referred to as endpoints.

While not all elements are required, at least two terminals are required in order to enable
communication between two people. In most H.323 deployments, a gatekeeper is
employed in order to, among other things, facilitate address resolution.

H.323 Network Elements

Terminals

Figure 1 - A complete, sophisticated protocol stack

Terminals in an H.323 network are the most fundamental elements in any H.323 system,
as those are the devices that users would normally encounter. They might exist in the
form of a simple IP phone or a powerful high-definition videoconferencing system.

Inside an H.323 terminal is something referred to as a protocol stack, which implements
the functionality defined by the H.323 system. The protocol stack would include an
implementation of the basic protocols defined in ITU-T Recommendations H.225.0 and
H.245, as well as RTP or other protocols described above.

The diagram, figure 1, depicts a complete, sophisticated stack that provides support for
voice, video, and various forms of data communication. In reality, most H.323 systems
do not implement such a wide array of capabilities, but the logical arrangement is useful
in understanding the relationships.

Multipoint Control Units


A Multipoint Control Unit (MCU) is responsible for managing multipoint conferences
and is composed of two logical entities referred to as the Multipoint Controller (MC) and
the Multipoint Processor (MP). In more practical terms, an MCU is a conference bridge
not unlike the conference bridges used in the PSTN today. The most significant
difference, however, is that H.323 MCUs might be capable of mixing or switching video,
in addition to the normal audio mixing done by a traditional conference bridge. Some
MCUs also provide multipoint data collaboration capabilities. What this means to the end
user is that, by placing a video call into an H.323 MCU, the user might be able to see all
of the other participants in the conference, not only hear their voices.
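
The audio mixing performed by a conference bridge can be sketched in a few lines. This is a minimal sketch, assuming 16-bit PCM samples and a "mix-minus" policy in which each participant hears the sum of everyone else; that policy, the clipping range, and the name `mix_minus` are assumptions for illustration, not something H.323 prescribes.

```python
def mix_minus(frames, low=-32768, high=32767):
    """Return, for each participant, the clipped sum of everyone else's samples.

    frames maps a participant name to a list of PCM samples of equal length.
    """
    names = list(frames)
    length = len(next(iter(frames.values())))
    # Sum all participants once, then subtract each listener's own audio.
    total = [sum(frames[name][i] for name in names) for i in range(length)]
    mixed = {}
    for name in names:
        mixed[name] = [
            max(low, min(high, total[i] - frames[name][i])) for i in range(length)
        ]
    return mixed


frames = {"alice": [100, 200], "bob": [10, 20], "carol": [1, 2]}
print(mix_minus(frames))
```

Subtracting the listener's own samples avoids echoing a speaker's voice back to them, and the clipping step keeps the sum inside the 16-bit range.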

Gateways
Gateways are devices that enable communication between H.323 networks and other
networks, such as PSTN or ISDN networks. If one party in a conversation is utilizing a
terminal that is not an H.323 terminal, then the call must pass through a gateway in order
to enable both parties to communicate.

Gateways are widely used today in order to enable legacy PSTN phones to
interconnect with the large, international H.323 networks that are presently deployed by
service providers. Gateways are also used within the enterprise in order to enable
enterprise IP phones to communicate through the service provider to users on the PSTN.

Gateways are also used in order to enable videoconferencing devices based on H.320 and
H.324 to communicate with H.323 systems. Most of the third generation (3G) mobile
networks deployed today utilize the H.324 protocol and are able to communicate with
H.323-based terminals in corporate networks through such gateway devices.

Gatekeepers
A Gatekeeper is an optional component in the H.323 network that provides a number of
services to terminals, gateways, and MCU devices. Those services include endpoint
registration, address resolution, admission control, user authentication, and so forth. Of
the various functions performed by the gatekeeper, address resolution is the most
important as it enables two endpoints to contact each other without either endpoint
having to know the IP address of the other endpoint.

Gatekeepers may be designed to operate in one of two signaling modes, namely "direct
routed" and "gatekeeper routed" mode. Direct routed mode is the most efficient and most
widely deployed mode. In this mode, endpoints utilize the RAS protocol in order to learn
the IP address of the remote endpoint and a call is established directly with the remote
device. In the gatekeeper routed mode, call signaling always passes through the
gatekeeper. While the latter requires the gatekeeper to have more processing power, it
also gives the gatekeeper complete control over the call and the ability to provide
supplementary services on behalf of the endpoints.

H.323 endpoints use the RAS protocol to communicate with a gatekeeper. Likewise,
gatekeepers use RAS to communicate with other gatekeepers.

A collection of endpoints that are registered to a single Gatekeeper in H.323 is referred to
as a zone. This collection of devices does not necessarily have to have an associated
physical topology. Rather, a zone may be entirely logical and is arbitrarily defined by the
network administrator.

Gatekeepers have the ability to neighbor together so that call resolution can happen
between zones. Neighboring facilitates the use of dial plans such as the Global Dialing
Scheme. Dial plans facilitate inter-zone dialing so that two endpoints in separate zones
can still communicate with each other.
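
The ideas of registration, address resolution, zones, and neighboring can be tied together in a toy model. This is a minimal sketch under stated assumptions: the `Gatekeeper` class, its methods, the zone names, and the addresses are all hypothetical, and plain method calls stand in for the actual RAS message exchange.

```python
class Gatekeeper:
    """Toy gatekeeper: one zone of registered endpoints plus neighbor gatekeepers."""

    def __init__(self, zone_name):
        self.zone_name = zone_name
        self.registrations = {}  # alias -> IP address of the endpoint
        self.neighbors = []      # other Gatekeeper objects

    def register(self, alias, ip_address):
        """Record an endpoint in this zone (stands in for RAS registration)."""
        self.registrations[alias] = ip_address

    def add_neighbor(self, other):
        """Let this gatekeeper forward unresolved lookups to another zone."""
        self.neighbors.append(other)

    def resolve(self, alias, ask_neighbors=True):
        """Return (zone, ip) for an alias, or None if nobody knows it."""
        if alias in self.registrations:
            return self.zone_name, self.registrations[alias]
        if ask_neighbors:
            for neighbor in self.neighbors:
                # Neighbors answer for their own zone only, which avoids loops.
                found = neighbor.resolve(alias, ask_neighbors=False)
                if found is not None:
                    return found
        return None


zone_a = Gatekeeper("zone-a")
zone_b = Gatekeeper("zone-b")
zone_a.add_neighbor(zone_b)
zone_a.register("alice", "10.0.0.5")
zone_b.register("bob", "10.1.0.9")

print(zone_a.resolve("bob"))
```

In direct routed mode the caller would use the returned address to signal the remote endpoint itself; in gatekeeper routed mode the gatekeeper would keep relaying the call signaling instead of handing the address back.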

Border Elements and Peer Elements

Figure 2 - An illustration of an administrative domain with border elements, peer
elements, and gatekeepers

Border Elements and Peer Elements are optional entities similar to a Gatekeeper, but that
do not manage endpoints directly and provide some services that are not described in the
RAS protocol. The role of a border or peer element is understood via the definition of an
"administrative domain".

An administrative domain is the collection of all zones that are under the control of a
single person or organization, such as a service provider. Within a service provider
network there may be hundreds or thousands of gateway devices, telephones, video
terminals, or other H.323 network elements. The service provider might arrange devices
into "zones" that enable the service provider to best manage all of the devices under its
control, such as logical arrangement by city. Taken together, all of the zones within the
service provider network would appear to another service provider as an "administrative
domain".

The border element is a signaling entity that generally sits at the edge of the
administrative domain and communicates with another administrative domain. This
communication might include such things as access authorization information; call
pricing information; or other important data necessary to enable communication between
the two administrative domains.

Peer elements are entities within the administrative domain that, more or less, help to
propagate information learned from the border elements throughout the administrative
domain. Such architecture is intended to enable large-scale deployments within carrier
networks and to enable services such as clearing houses.

The diagram, figure 2, provides an illustration of an administrative domain with border
elements, peer elements, and gatekeepers.

H.323 and Voice over IP services

Voice over Internet Protocol (VoIP) describes the transmission of voice using the Internet
or other packet switched networks. ITU-T Recommendation H.323 is one of the
standards used in VoIP. VoIP requires a connection to the Internet or another packet
switched network, a subscription to a VoIP service provider and a client (an analogue
telephone adapter (ATA), VoIP Phone or "soft phone"). The service provider offers the
connection to other VoIP services or to the PSTN. Most service providers charge a
monthly fee, then additional costs when calls are made.[citation needed] Using VoIP between
two enterprise locations would not necessarily require a VoIP service provider, for
example. H.323 has been widely deployed by companies who wish to interconnect
remote locations over IP using a number of various wired and wireless technologies.

A codec is an algorithm (OK, let's be simple: sort of a program!), most of the time
installed as software on a server or embedded within a piece of hardware (ATA, IP
phone, etc.), that is used to convert voice signals (in the case of VoIP) into digital data
to be transmitted over the Internet or any network during a VoIP call.

The word codec is a contraction of coder-decoder or compressor-decompressor.
Codecs normally achieve the following three tasks (very few do the last one):

Encoding - decoding
Compression - decompression
Encryption - decryption

Encoding - decoding

When you talk over a normal PSTN phone, your voice is transported in an analog way over
the phone line. But with VoIP, your voice is converted into digital signals. This
conversion is technically called encoding, and is achieved by a codec. When the digitized
voice reaches its destination, it has to be decoded back to its original analog state so that
the other correspondent can hear and understand it.
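
The encode and decode steps can be illustrated with plain linear PCM. This is a minimal sketch, assuming an 8000 Hz sampling rate and uniform 8-bit quantization; a synthetic sine tone stands in for the analog voice, and the function names are hypothetical. Real VoIP codecs are more elaborate.

```python
import math

SAMPLE_RATE = 8000  # samples per second, a common narrowband telephony rate


def sample_tone(freq_hz, duration_ms):
    """Sample a sine tone, standing in for the analog voice signal."""
    count = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(count)]


def encode(samples, bits=8):
    """Quantize samples in [-1, 1] to signed integers of the given width."""
    peak = 2 ** (bits - 1) - 1
    return [round(s * peak) for s in samples]


def decode(codes, bits=8):
    """Turn the integer codes back into approximate sample values."""
    peak = 2 ** (bits - 1) - 1
    return [c / peak for c in codes]


analog = sample_tone(1000, 20)   # 20 ms of a 1 kHz tone
digital = encode(analog)         # what would travel over the network
restored = decode(digital)       # what the receiver plays back
print(len(digital), max(abs(a - b) for a, b in zip(analog, restored)))
```

The restored samples are close to, but not exactly equal to, the originals: the small difference is the quantization error introduced by representing each sample with only 8 bits.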

Compression - decompression

Bandwidth is a scarce commodity. Therefore, if the data to be sent is made lighter, you
can send more in a certain amount of time, and thus improve performance. To make the
digitized voice less bulky, it is compressed. Compression is a complex process whereby
the same data is stored using less space (fewer digital bits). During compression, the data
is confined to a structure (packet) proper to the compression algorithm. The compressed
data is sent over the network and, once it reaches its destination, it is decompressed
before being decoded. In most cases, however, it is not necessary to recover the original
data exactly, since the decompressed approximation is already in a consumable state.

Types of compression

When data is compressed, it becomes lighter and hence performance is improved.
However, the algorithms that compress the most tend to decrease the quality of the
compressed data. There are two types of compression: lossless and lossy. With lossless
compression, you lose nothing, but you can't compress that much. With lossy
compression, you achieve great downsizing, but you lose in quality. You normally can't
get the compressed data back to its original state with lossy compression, since the
quality has been sacrificed for size. But most of the time this is not necessary.

A good example of lossy compression is MP3 for audio. When you compress audio to MP3,
you can't get the original back, but MP3 audio is already very good to listen to and is
far smaller than huge uncompressed audio files.
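
The difference between the two types of compression can be shown with a small experiment. This is a minimal sketch: `zlib` from the Python standard library stands in for lossless compression, and dropping the low bits of each 8-bit sample stands in for lossy compression. Neither is what a VoIP codec or MP3 actually does; both function names are hypothetical.

```python
import zlib


def lossless_round_trip(data):
    """Compress and decompress with zlib; the result equals the input exactly."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(samples, keep_bits=4):
    """Drop the low bits of each 8-bit sample; the dropped detail is gone for good."""
    shift = 8 - keep_bits
    compact = [s >> shift for s in samples]  # what would be stored or sent
    return [c << shift for c in compact]     # best-effort reconstruction


original = bytes(range(256)) * 4
print(lossless_round_trip(original) == original)
print(lossy_round_trip([200, 37, 255, 16]))
```

The lossless path returns the input byte for byte, while the lossy path returns values that are only close to the originals, in exchange for needing half as many bits per sample.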

Encryption - decryption

Encryption is one of the best tools for achieving security. It is the process of changing
data into such a state that no one can understand it without the key. This way, even if
the encrypted data is intercepted by unauthorized people, the data still remains
confidential. Once the encrypted data reaches its destination, it is decrypted back to its
original form. Compressed data may look scrambled because it is altered from its original
state, but compression alone is not encryption: anyone who knows the codec can decode it.

There are many codecs for audio, video, fax and text. Below is a list of the most common
codecs for VoIP. As a user, you may think that you have little to do with what these are,
but it is always good to know a minimum about them, since you might one day have to make
decisions about codecs for VoIP in your business; or at least you might one day
understand some words in the Greek VoIP people speak! I won't drag you into all the
technicalities of codecs, but will just mention them.

If you are a techie and want to know more about each one of these codecs in detail, have
a look there.

Common VoIP Codecs

Codec     Bandwidth (kbps)   Comments
G.711     64                 Delivers precise speech transmission. Very low processor
                             requirements. Needs at least 128 kbps for two-way.
G.722     48/56/64           Adapts to varying compressions and bandwidth is conserved
                             with network congestion.
G.723.1   5.3/6.3            High compression with high quality audio. Can use with
                             dial-up. Lot of processor power.
G.726     16/24/32/40        An improved version of G.721 and G.723 (different from
                             G.723.1).
G.729     8                  Excellent bandwidth utilization. Error tolerant. License
                             required.
GSM       13                 High compression ratio. Free and available in many hardware
                             and software platforms. Same encoding is used in GSM
                             cellphones (improved versions are often used nowadays).
iLBC      15                 Robust to packet loss. Free.
Speex     2.15 to 44         Minimizes bandwidth usage by using variable bit rate.
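
The bandwidth figures in the table are codec bit rates; the bandwidth actually consumed on an IP network is higher because every packet also carries headers. The arithmetic can be sketched as follows, assuming 20 ms of speech per packet, IPv4, UDP and RTP headers of 20, 8 and 12 bytes, no header compression, and no link-layer overhead. These are assumptions chosen for illustration, and the function name is hypothetical.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no header compression


def ip_bandwidth_kbps(codec_kbps, packet_ms=20):
    """Estimate one-way IP bandwidth for a codec bit rate and packet interval."""
    payload_bytes = codec_kbps * 1000 * packet_ms / 1000 / 8
    packets_per_second = 1000 / packet_ms
    total_bits = (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second
    return total_bits / 1000


for name, rate in [("G.711", 64), ("G.729", 8)]:
    print(name, ip_bandwidth_kbps(rate))
```

Under these assumptions G.711 needs about 80 kbps in each direction and G.729 about 24 kbps, which shows that header overhead matters most for the low bit rate codecs.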

SIP ARCHITECTURE

Introduction
As the Internet became more popular in the 1990s, network programs that allowed
communication with other Internet users also became more common. Over the years, a
need was seen for a standard protocol that could allow participants in a chat,
videoconference, interactive gaming, or other media to initiate user sessions with one
another. In other words, a standard set of rules and services was needed that defined how
computers would connect to one another so that they could share media and
communicate. The Session Initiation Protocol (SIP) was developed to set up, maintain,
and tear down these sessions between computers.

By working in conjunction with a variety of other protocols and specialized
servers, SIP provides a number of important functions that are necessary in allowing
communications between participants. SIP provides methods of sharing the location and
availability of users and explains the capabilities of the software or device being used.
SIP then makes it possible to set up and manage the session between the parties. Without
these tasks being performed, communication over a large network like the Internet would
be impossible. It would be like a message in a bottle being thrown in the ocean; you
would have no way of knowing how to reach someone directly or whether the person
even could receive the message.

Beyond communicating with voice and video, SIP has also been extended to
support instant messaging and is becoming a popular choice that's incorporated in many
of the instant messaging applications being produced. This extension, called SIMPLE,
provides the means of setting up a session in much the same way as SIP. SIMPLE also
provides information on the status of users, showing whether they are online, busy, or in
some other state of presence.

Because SIP is being used in these various methods of communications, it has
become a widely used and important component of today's communications.

Understanding SIP

SIP was designed to initiate interactive sessions on an IP network. Programs that
provide real-time communication between participants can use SIP to set up, modify, and
terminate a connection between two or more computers, allowing them to interact and
exchange data. The programs that can use SIP include instant messaging, voice over IP
(VoIP), video teleconferencing, virtual reality, multiplayer games, and other applications
that employ single media or multimedia. SIP doesn't provide all the functions that enable
these programs to communicate, but it is an important component that facilitates
communication between two or more endpoints.

You could compare SIP to a telephone switchboard operator, who uses other
technology to connect you to another party, set up conference calls or other operations on
your behalf, and disconnect you when you're done. SIP is a type of signaling protocol
that is responsible for sending commands to start and stop transmissions or other
operations used by a program. The commands sent between computers are codes that do
such things as open a connection to make a phone call over the Internet or disconnect that
call later on. SIP supports additional functions, such as call waiting, call transfer, and
conference calling, by sending out the necessary signals to enable and disable these
functions. Just as the telephone operator isn't concerned with how communication
occurs, SIP works with a number of components and can run on top of several different
transport protocols to transfer media between the participants.

Overview of SIP

One of the major reasons that SIP is necessary is found in the nature of programs
that involve messaging, voice communication, and exchange of other media. The people
who use these programs may change locations and use different computers, have several
usernames or accounts, or communicate using a combination of voice, text, or other
media (requiring different protocols). This creates a situation that's similar to trying to
mail a letter to someone who has several aliases, speaks different languages, and could
change addresses at any particular moment.

SIP works with various network components to identify and locate these
endpoints. Information is passed through proxy servers, which are used to register and
route requests to the user's location, invite other users into a session, and make other
requests to connect these endpoints. Because there are a number of different protocols
available that may be used to transfer voice, text, or other media, SIP runs on top of other
protocols that transport data and perform other functions. By working with other
components of the network, data can be exchanged between these user agents regardless
of where they are at any given point.

It is the simplicity of SIP that makes it so versatile. SIP is an ASCII- or text-based
protocol, similar to HTTP or SMTP, which makes it more lightweight and flexible than
other signaling protocols (such as H.323). Like HTTP and SMTP, SIP is a
request-response protocol, meaning that it makes a request of a server, and awaits a response.
Once it has established a session, other protocols handle such tasks as negotiating the
type of media to be exchanged, and transporting it between the endpoints. The reusing of
existing protocols and their functions means that fewer resources are used, and minimizes
the complexity of SIP. By keeping the functionality of SIP simple, it allows SIP to work
with a wider variety of applications.
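
Because SIP messages are plain text, a request can be built and read with ordinary string handling. This is a minimal sketch, assuming the general layout of a start line, header lines, a blank line, and an optional body; the addresses are made-up examples, required headers such as Via and Max-Forwards are left out for brevity, and the helper names are hypothetical.

```python
def build_request(method, uri, headers, body=""):
    """Assemble a simplified SIP request as text."""
    lines = [f"{method} {uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines.append(f"Content-Length: {len(body.encode())}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_message(text):
    """Split a SIP message into its start line, a header dict, and the body."""
    head, _, body = text.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body


request = build_request(
    "INVITE",
    "sip:bob@example.com",
    {
        "To": "<sip:bob@example.com>",
        "From": "<sip:alice@example.com>;tag=1234",
        "Call-ID": "abc123@client.example.com",
        "CSeq": "1 INVITE",
    },
)
print(request)
print(parse_message("SIP/2.0 200 OK\r\nCSeq: 1 INVITE\r\n\r\n")[0])
```

The request line names the method and the target, and the response line carries a numeric status code with a reason phrase, much as in HTTP.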

The similarities to HTTP and SMTP are no accident. SIP was modeled after these
text-based protocols, which work in conjunction with other protocols to perform specific
tasks. As we'll see later in this chapter, SIP is also similar to these other protocols in that
it uses Uniform Resource Identifiers (URIs) for identifying users. A URI identifies
resources on the Internet, just as a Uniform Resource Locator (URL) is used to identify
Web sites. The URI used by SIP incorporates a phone number or name, such as
sip:user@syngress.com, which makes reading SIP addresses easier. Rather than reinventing
the wheel, the development of SIP incorporated familiar aspects of existing protocols that
have long been used on IP networks. The modular design allows SIP to be easily
incorporated into Internet and network applications, and its similarities to other protocols
make it easier to use.
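
The structure of a SIP address can be seen by splitting one into its parts. This is a minimal sketch that handles only the common `sip:user@host:port` shape; URI parameters are discarded, header fields and bracketed IPv6 hosts are not handled, and the function name is hypothetical.

```python
def parse_sip_uri(uri):
    """Split a simple SIP URI into scheme, user, host, and port."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    rest = rest.split(";", 1)[0]  # drop URI parameters, if any
    user, at, hostport = rest.rpartition("@")
    host, _, port = hostport.partition(":")
    return {
        "scheme": scheme.lower(),
        "user": user if at else None,
        "host": host,
        "port": int(port) if port else None,
    }


print(parse_sip_uri("sip:user@syngress.com"))
print(parse_sip_uri("sip:+15551234567@gateway.example.com:5060;user=phone"))
```

The part before the `@` can be a name or a phone number, which is what lets the same addressing scheme serve both computer users and telephone subscribers.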

RFC 2543/RFC 3261

The Session Initiation Protocol is a standard that was developed by the Internet
Engineering Task Force (IETF). The IETF is an open community of network designers,
operators, researchers, and vendors that develops Internet communication standards.
The standards they create are important
because they establish consistent methods and functionality. Unlike proprietary
technology, which may or may not work outside of a specific program, standardization
allows a protocol or other technology to function the same way in any application or
environment. In other words, because SIP is a standard, it can work on any system,
regardless of the communication program, operating system, or infrastructure of the IP
network.

The way that the IETF develops a standard is through recommendations for rules that
are made through Requests for Comments (RFCs). The RFC starts as a draft that is
examined by members of a Working Group, and during the review process, it is
developed into a finalized document. The first proposed standard for SIP was produced in
1999 as RFC 2543, but in 2002, the standard was further defined in RFC 3261, which
makes RFC 2543 obsolete. Additional documents outlining extensions and specific issues
related to the SIP standard have also been released, and these update RFC 3261. The
reason for these changes is that as technology changes, the development of SIP also
evolves. The IETF continues developing SIP and its extensions as new products are
introduced and its applications expand.

SIP and Mbone

Although RFC 2543 and RFC 3261 define SIP as a protocol for setting up,
managing, and tearing down sessions, the original version of SIP had no mechanism for
tearing down sessions and was designed for the Multicast Backbone (Mbone). Mbone
originated as a method of broadcasting audio and video over the Internet. The Mbone is a
broadcast channel that is overlaid on the Internet, and it provided a way of making
Internet broadcasts of things like IETF meetings, space shuttle launches, live concerts,
and other meetings, seminars, and events. The ability to communicate with several hosts
simultaneously needed a way of inviting users into sessions; the Session Invitation
Protocol (as it was originally called) was developed in 1996.

The Session Invitation Protocol was a precursor to SIP that was defined by the
IETF MMUSIC Working Group, and a primitive version of the Session Initiation Protocol
used today. However, as VoIP and other methods of communications became more
popular, SIP evolved into the Session Initiation Protocol. Even with added features like
the ability to tear down a session, it was still more lightweight than more complex
protocols like H.323. In 1999, the Session Initiation Protocol was defined as RFC 2543,
and it has become a vital part of multimedia applications used today.

OSI

In designing the SIP standard, the IETF mapped the protocol to the OSI (Open
Systems Interconnection) reference model. The OSI reference model is used to associate
protocols to different layers, showing their function in transferring and receiving data
across a network, and their relation to other existing protocols. A protocol at one layer
uses only the functions of the layer below it, while exporting the information it processes
to the layer above it. It is a conceptual model that originated to promote interoperability,
so that a protocol or element of a network developed by one vendor would work with
others.

As seen in Figure 8.1, the OSI model contains seven layers: Application,
Presentation, Session, Transport, Network, Data Link, and Physical. As seen in this
figure, network communication starts at the Application layer and works its way down
through the layers step by step to the Physical layer. The information then passes along
the cable to the receiving computer, which starts the information at the Physical layer.
From there it steps back up the OSI layers to the Application layer where the receiving
computer finalizes the processing and sends back an acknowledgement if needed. Then
the whole process starts over.

Figure 8.1 In the OSI Reference Model, Data is Transmitted down through the
Layers, across the Medium, and Back up through the Layers
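
The trip down and back up the layers can be mimicked with a toy model in which every layer adds a bracketed tag on the way down and removes it on the way up. This is a minimal sketch for intuition only: real protocols add binary headers, not every layer adds one, and the function names are hypothetical.

```python
LAYERS = [
    "Application", "Presentation", "Session", "Transport",
    "Network", "Data Link", "Physical",
]


def send_down(payload):
    """Wrap the payload with one toy header per layer, top to bottom."""
    for layer in LAYERS:
        payload = f"[{layer}]{payload}"
    return payload


def receive_up(frame):
    """Strip the toy headers in the opposite order, bottom to top."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"expected {layer} header")
        frame = frame[len(header):]
    return frame


on_the_wire = send_down("hello")
print(on_the_wire)
print(receive_up(on_the_wire))
```

The outermost tag belongs to the lowest layer, so the receiver has to remove tags in the reverse of the order in which the sender added them.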

The layers of the OSI reference model have different functions that are necessary
in transferring data across a network, and mapping protocols to these layers makes it
easier to understand how they interrelate to the network as a whole. Table 8.1 shows the
seven layers of the OSI model, and briefly explains their functions.

Table 8.1 Layers of the OSI Model

Layer             Description
7: Application    The Application layer is used to identify communication
                  partners, facilitate authentication (if necessary), and
                  allows a program to communicate with lower layer
                  protocols, so that in turn it can communicate across the
                  network. Protocols that map to this layer include SIP,
                  HTTP, and SMTP.
6: Presentation   The Presentation layer converts data from one format to
                  another, such as converting a stream of text into a popup
                  window, and handles encoding and encryption.
5: Session        The Session layer is responsible for coordinating sessions
                  and connections.
4: Transport      The Transport layer is used to transparently transfer
                  data between computers. Protocols that map to this
                  layer include TCP, UDP, and RTP.
3: Network        The Network layer is used to route and forward data so
                  that it goes to the proper destination. The most common
                  protocol that maps to this layer is IP.
2: Data Link      The Data Link layer is used to correct errors that may
                  occur at the physical level, and to provide physical
                  addressing through the use of MAC addresses that are
                  hard-coded into network cards.
1: Physical       The Physical layer defines electrical and physical
                  specifications of network devices, and provides the means
                  of allowing hardware to send and receive data on a
                  particular type of media. At this level, data is passed as a
                  bit stream across the network.

SIP and the Application Layer

Because SIP is the Session Initiation Protocol, and its purpose is to establish,
modify, and terminate sessions, it would seem at face value that this protocol maps to the
Session layer of the OSI reference model. However, it is important to remember that the
protocols at each layer interact only with the layers above and below them. Programs directly
access the functions and supported features available through SIP, disassociating it from
this layer. SIP is used to invite a user into an interactive session, and can also invite
additional participants into existing sessions, such as conference calls or chats. It allows
media to be added to or removed from a session, provides the ability to identify and
locate a user, and also supports name mapping, redirection, and other services. When
comparing these features to the OSI model, it becomes apparent that SIP is actually an
Application-layer protocol.

The Application layer is used to identify communication partners, facilitate
authentication (if necessary), and allow a program to communicate with lower layer
protocols, so that in turn it can communicate across the network. In the case of SIP, this
means setting up, maintaining, and ending interactive sessions, and providing a method of
locating and inviting participants into these sessions. The software being used
communicates through SIP, which passes the data down to lower layer protocols and
sends it across the network.

SIP Functions and Features

When SIP was developed, it was designed to support five specific elements of
setting up and tearing down communication sessions. These supported facets of the
protocol are:

• User location, where the endpoint of a session can be identified and found, so that a
session can be established
• User availability, where the participant that's being called has the opportunity and
ability to indicate whether he or she wishes to engage in the communication
• User capabilities, where the media that will be used in the communication is
established, and the parameters of that media are agreed upon
• Session setup, where the parameters of the session are negotiated and established
• Session management, where the parameters of the session are modified, data is
transferred, services are invoked, and the session is terminated

Although these are only a few of the issues involved in connecting parties so
they can communicate, they are important ones that SIP is designed to address. Beyond
these functions, SIP relies on other protocols to perform the remaining tasks that allow
participants to communicate with each other, which we'll discuss later in this chapter.

User Location

The ability to find the location of a user requires being able to translate a
participant's username to the current IP address of the computer being used. This is
important because the user may be using different computers, or (if DHCP is
used) the computer may be identified by different IP addresses on the network at different
times. The program can use SIP to register the user with a server, providing a username and IP
address to the server. Because the server now knows the current location of the user, other
users can find that user on the network. Requests are redirected through the proxy
server to the user's current location. By going through the server, other potential
participants in a communication can find users, and establish a session after acquiring
their IP addresses.
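
The registration and lookup steps described above can be sketched roughly as follows. This is only an illustration: the LocationTable class, its method names, and the sample addresses are hypothetical and are not part of SIP or of any SIP library. A real server would also authenticate users and expire stale registrations.

```python
class LocationTable:
    """Toy stand-in for a server that maps usernames to current IP addresses."""

    def __init__(self):
        self._bindings = {}  # username -> current IP address

    def register(self, username, ip_address):
        # A later registration replaces the earlier one, which models a user
        # who moves to another computer or receives a new DHCP address.
        self._bindings[username] = ip_address

    def locate(self, username):
        # Returns None when the user has not registered.
        return self._bindings.get(username)


table = LocationTable()
table.register("myaccount@madeupsip.com", "192.0.2.10")
table.register("myaccount@madeupsip.com", "192.0.2.77")  # address changed
current = table.locate("myaccount@madeupsip.com")
```

Other participants only need to know the username; the table supplies whichever address was registered most recently.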

User Availability

The user availability function of SIP allows a user to control whether he or she
can be contacted. The user can set themselves as being away or busy, or available for
certain types of communication. If available, other users can then invite the user to join in
a type of communication (e.g., voice or videoconference), depending on the capabilities
of the program being used.

User Capabilities

Determining the user's capabilities involves determining what features are
available on the programs being used by each of the parties, and then negotiating which
can be used during the session. Because SIP can be used with different programs on
different platforms, and can be used to establish a variety of single-media and multimedia
communications, the type of communication and its parameters need to be determined.
For example, if you were to call a particular user, your computer might support video
conferencing, but the person you're calling doesn't have a camera installed. Determining
the user capabilities allows the participants to agree on which features, media types, and
parameters will be used during a session.
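
As a rough sketch of this idea, the agreed feature set can be thought of as the media types that both sides support. The negotiate function and the media names below are assumptions made for illustration; they greatly simplify what real programs exchange when they negotiate a session.

```python
def negotiate(caller_media, callee_media):
    """Return the media types both sides support, in the caller's order."""
    return [media for media in caller_media if media in callee_media]


caller = ["audio", "video", "chat"]
callee = ["chat", "audio"]  # no camera installed, so no video
agreed = negotiate(caller, callee)
```

Here the call can proceed with audio and chat, while video is dropped because only one side supports it.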

Session Setup

Session setup is where the participants of the communication connect to one another.
The program of the user who is contacted to participate in a conversation will ring
or produce some other notification, and the user has the option of accepting or rejecting
the communication. If it is accepted, the parameters of the session are agreed upon and
established, and a session is started between the two endpoints, allowing them to
communicate.

Session Management

Session management is the final function of SIP, and is used for modifying the
session as it is in use. During the session, data will be transferred between the
participants, and the types of media used may change. For example, during a voice
conversation, the participants may decide to invoke other services available through the
program, and change to video conferencing. During communication, they may also
decide to add or drop other participants, place a call on hold, have the call transferred,
and finally terminate the session by ending their conversation. These are all aspects of
session management, which are performed through SIP.

SIP URIs

Because SIP was based on existing standards that had already been proven on the
Internet, it uses established methods for identifying and connecting endpoints.
This is particularly seen in the addressing scheme that it uses to identify different SIP
accounts. SIP uses addresses that are similar to e-mail addresses. The hierarchical URI
shows the domain where a user's account is located, and a username or phone number
that serves as the user's account. For example, sip:myaccount@madeupsip.com shows
that the account myaccount is located at the domain madeupsip.com. Using this method
makes it simple to connect someone to a particular phone number or username.
Because the addresses of those using SIP follow a username@domain name
format, the usernames created for accounts must be unique within the namespace.
Usernames and phone numbers must be unique because they identify which account belongs to
a specific person, and are used when someone attempts to send a message or place a call to
someone else. Because the usernames are stored on centralized servers, the server can
determine whether a particular username is available or not when a person initially sets
up an account.
URIs can also contain other information that is used to connect to a particular
user, such as a port number, password, or other parameters. In addition to this, although
SIP URIs will generally begin with sip:, others will begin with sips:, which indicates
that the information must be sent over a secure transmission. In such cases, the data and
messages transmitted are transported using the Transport Layer Security (TLS) protocol,
which we'll discuss later in this chapter.
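
To make the parts of such an address concrete, the following sketch splits a SIP URI into its scheme, user, host, port, and parameters. It is a simplified illustration only: the parse_sip_uri function and its regular expression are assumptions of this example, they ignore parts of the full URI syntax (such as header fields after a question mark), and the sample port and parameter are made up.

```python
import re

# Simplified pattern: scheme, optional user[:password]@, host, optional port,
# optional ;name=value parameters.
SIP_URI = re.compile(
    r"^(?P<scheme>sips?):"
    r"(?:(?P<user>[^:@;]+)(?::(?P<password>[^@;]+))?@)?"
    r"(?P<host>[^:;?]+)"
    r"(?::(?P<port>\d+))?"
    r"(?P<params>(?:;[^;?]+)*)$",
    re.IGNORECASE,
)


def parse_sip_uri(uri):
    match = SIP_URI.match(uri.strip())
    if match is None:
        raise ValueError(f"not a recognizable SIP URI: {uri!r}")
    params = {}
    for item in match.group("params").split(";"):
        if item:
            key, _, value = item.partition("=")
            params[key] = value
    return {
        "secure": match.group("scheme").lower() == "sips",
        "user": match.group("user"),
        "host": match.group("host"),
        "port": int(match.group("port")) if match.group("port") else None,
        "params": params,
    }


plain = parse_sip_uri("sip:myaccount@madeupsip.com")
secure = parse_sip_uri("sips:myaccount@madeupsip.com:5061;transport=tcp")
```

The "secure" flag simply records whether the address used the sips: scheme; it does not by itself set up any encryption.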

SIP Architecture
Though we've discussed a number of the elements of SIP, there are still a number
of essential components that make up SIP's architecture that we need to address. SIP
would not be able to function on a network without the use of various devices and
protocols. The essential devices are those that you and other participants would use in a
conversation, allowing you to communicate with one another, and various servers may
also be required to allow the participants to connect. In addition to this, there are
a number of protocols that carry your voice and other data between these computers and
devices. Together, they make up the overall architecture of SIP.

SIP Components

Although SIP works in conjunction with other technologies and protocols, there
are two fundamental components that are used by the Session Initiation Protocol:
• User agents, which are endpoints of a call (i.e., each of the participants in a call)
• SIP servers, which are computers on the network that service requests from clients, and
send back responses

User Agents

User agents are both the computer that is being used to make a call, and the target
computer that is being called. These make the two endpoints of the communication
session. There are two components to a user agent: a client and a server. When a user
agent makes a request (such as initiating a session), it is the User Agent Client (UAC),
and the user agent responding to the request is the User Agent Server (UAS). Because the
user agent will send a message, and then respond to another, it will switch back and forth
between these roles throughout a session.
Even though the other devices that we'll discuss are optional to various degrees, User
Agents must exist for a SIP session to be established. Without them, it would be like
trying to make a phone call without having another person to call. One UA will invite the
other into a session, and SIP can then be used to manage and tear down the session when
it is complete. During this time, the UAC will use SIP to send requests to the UAS, which
will acknowledge the request and respond to it. Just as a conversation between two
people on the phone consists of conveying a message or asking a question and then
waiting for a response, the UAC and UAS will exchange messages and swap roles in a
similar manner throughout the session. Without this interaction, communication couldn't
exist.
Although a user agent is often a software application installed on a computer, it
can also be a PDA, USB phone that connects to a computer, or a gateway that connects
the network to the Public Switched Telephone Network. In any of these situations
however, the user agent will continue to act as both a client and a server, as it sends and
responds to messages.
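
The way a user agent switches between the client and server roles can be illustrated with a small sketch. The UserAgent class and its methods are hypothetical names chosen for this example; the point is only that the same object acts as a UAC when it sends a request and as a UAS when it answers one.

```python
class UserAgent:
    def __init__(self, name):
        self.name = name
        self.roles = []  # roles this agent has played, in order

    def send_request(self, method, target):
        # Sending a request makes this agent the User Agent Client (UAC).
        self.roles.append("UAC")
        return target.handle_request(method, self)

    def handle_request(self, method, sender):
        # Answering a request makes this agent the User Agent Server (UAS).
        self.roles.append("UAS")
        return f"200 OK ({method} from {sender.name} handled by {self.name})"


alice = UserAgent("alice")
bob = UserAgent("bob")
first = alice.send_request("INVITE", bob)  # alice is the UAC, bob the UAS
second = bob.send_request("BYE", alice)    # the roles are now swapped
```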

SIP Server

The SIP server is used to resolve usernames to IP addresses, so that requests sent
from one user agent to another can be directed properly. A user agent registers with the
SIP server, providing it with its username and current IP address, thereby establishing
its current location on the network. This also verifies that the user is online, so that other
user agents can see whether they're available and invite them into a session. Because the
user agent probably wouldn't know the IP address of another user agent, a request is
made to the SIP server to invite another user into a session. The SIP server then identifies
whether the person is currently online, and if so, maps the username to the registered IP
address to determine their location. If the user isn't part of that domain, and thereby uses
a different SIP server, it will also pass on requests to other servers.
In performing these various tasks of serving client requests, the SIP server will act
in any of several different roles:

• Registrar server
• Proxy server
• Redirect server

Registrar Server

Registrar servers are used to register the location of a user agent who has logged
onto the network. The Registrar obtains the IP address of the user and associates it with
their username on the system. This creates a directory of all those who are currently logged
onto the network, and where they are located. When someone wishes to establish a
session with one of these users, the Registrar server's information is referred to, thereby
identifying the IP addresses of those involved in the session.

Proxy Server

Proxy servers are computers that are used to forward requests on behalf of other
computers. If a SIP server receives a request from a client, it can forward the request on to
another SIP server on the network. While functioning as a proxy server, the SIP server
can provide such functions as network access control, security, authentication, and
authorization.

Redirect Server

Redirect servers are used by SIP to redirect clients to the user agent they are
attempting to contact. If a user agent makes a request, the Redirect server can respond
with the IP address of the user agent being contacted. This is different from a Proxy
server, which forwards the request on your behalf, as the Redirect server essentially tells
you to contact them yourself. A call can also be forked, that is, split to several
locations. In the SIP standard, forking is carried out by a Proxy server, while a Redirect
server can return several addresses for the caller to try. If a call was made to a
particular user, it could be
split to a number of different locations, so that it rang at all of them at the same time. The
first of these locations to answer the call would receive it, and the other locations would
stop ringing.
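
The difference between redirecting and proxying can be sketched as two small functions. Everything here is a simplified assumption for illustration: the LOCATIONS table, the function names, and the dictionary-shaped responses are hypothetical, although the status codes used (302, 404, 200) are standard SIP response codes.

```python
LOCATIONS = {"sip:bob@madeupsip.com": "192.0.2.20"}  # hypothetical location data


def redirect_server(request_uri):
    # Tell the caller where to go; the caller must send a new request itself.
    contact = LOCATIONS.get(request_uri)
    if contact is None:
        return {"status": 404, "reason": "Not Found"}
    return {"status": 302, "reason": "Moved Temporarily", "contact": contact}


def proxy_server(request_uri, deliver):
    # Forward the request on the caller's behalf and relay the answer back.
    contact = LOCATIONS.get(request_uri)
    if contact is None:
        return {"status": 404, "reason": "Not Found"}
    return deliver(contact)


def fake_user_agent(ip_address):
    # Stand-in for the called user agent answering at ip_address.
    return {"status": 200, "reason": "OK", "answered_by": ip_address}


redirected = redirect_server("sip:bob@madeupsip.com")
proxied = proxy_server("sip:bob@madeupsip.com", fake_user_agent)
```

With the redirect, the caller receives an address and has to make the next request; with the proxy, the caller receives the final answer directly.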

Stateful versus Stateless

The servers used by SIP can run in one of two modes: stateful or stateless. When
a server runs in stateful mode, it will keep track of all requests and responses it sends and
receives. A server that operates in stateless mode won't remember this information, but
will instead forget about what it has done once it has processed a request. A server
running in stateful mode is generally found in the domain where the user agents reside,
whereas stateless servers are often found as part of the backbone, receiving so many
requests that it would be difficult to keep track of them.
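
A minimal sketch of the two modes, assuming hypothetical StatelessServer and StatefulServer classes: both handle a request in the same way, but only the stateful one records what it has seen.

```python
class StatelessServer:
    def process(self, call_id, method):
        # Handle the request and keep nothing afterwards.
        return f"forwarded {method}"


class StatefulServer:
    def __init__(self):
        self.history = {}  # call identifier -> list of methods seen

    def process(self, call_id, method):
        # Record the request before handling it, so that later messages
        # for the same call can be matched against earlier ones.
        self.history.setdefault(call_id, []).append(method)
        return f"forwarded {method}"


stateless = StatelessServer()
stateful = StatefulServer()
for server in (stateless, stateful):
    server.process("call-1", "INVITE")
    server.process("call-1", "BYE")
```

The memory used by the stateful server grows with the number of calls it tracks, which is one reason heavily loaded servers may prefer the stateless mode.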

Location Service

The location service is used to keep a database of those who have registered
through a SIP server, and where they are located. When a user agent registers with a
Registrar server, a REGISTER request is made (which we'll discuss in a later section).
If the Registrar accepts the request, it will obtain the SIP address and IP address of the
user agent, and add it to the location service for its domain. This database provides an up-
to-date catalog of everyone who is online, and where they are located, which Redirect
servers and Proxy servers can then use to acquire information about user agents. This
allows the servers to connect user agents together or forward requests to the proper
location.

Client/Server versus Peer-to-Peer Architecture

In looking at the components of SIP, you can see that requests are processed in
different ways. When user agents communicate with one another, they send requests and
responses to one another. In doing so, one acts as a User Agent Client, and the other,
which fulfills the request, acts as a User Agent Server. When dealing with SIP servers, however,
user agents simply send requests that are processed by a specific server. This reflects two
different types of architectures used in network communications:

• Client/Server
• Peer-to-peer

Client/Server

In a client/server architecture, the computers are separated into two roles:

• The client, which requests specific services or resources
• The server, which is dedicated to fulfilling requests by responding (or attempting to
respond) with requested services or resources

An easy-to-understand example of a client/server relationship is seen when using
the Internet. When using an Internet browser to access a Web site, the client would be the
computer running the browser software, which would request a Web page from a Web
server. The Web server receives this request and then responds to it by sending the Web
page to the client computer. In VoIP, this same relationship can be seen when a client
sends a request to register with a Registrar server, or makes a request to a Proxy Server or
Redirect Server that allows it to connect with another user agent. In all these cases, the
client's role is to request services and resources, and the server's role is to listen to the
network and await requests that it can process or pass on to other servers.
The servers that are used on a network acquire their ability to service requests from
the programs installed on them. Because a server may run a number of services or have
multiple server applications installed on it, a computer dedicated to the role of being a
server may provide several functions on a network. For example, a Web server might also
act as an e-mail server. In the same way, SIP servers also may provide different services.
A Registrar can register clients and also run the location service that allows clients and
other servers to locate other users who have registered on the network. In this way, a
single server may provide diverse functionality to a network that would otherwise
be unavailable.
Another important characteristic of the server is that, unlike clients that may be
disconnected from the Internet or shut down when the person using them is
done, a server is generally active and awaiting client requests. Problems and maintenance
aside, a dedicated server is up and running, so that it is accessible. The IP address of the
server generally doesn't change, meaning that clients can always find it on a network,
making it important for such functions as finding other computers on the network.

Peer to Peer

A peer-to-peer (P2P) architecture is different from the client/server model, as the
computers involved have similar capabilities, and can initiate sessions with one another
to make and service requests from one another. Each computer provides services and
resources, so if one becomes unavailable, another can be contacted to exchange messages
or access resources. In this way, the user agents act as both client and server, and are
considered peers.
Once a user agent is able to establish a communication session with another user
agent, a P2P architecture is established where each machine makes requests and responds
to the other. One machine acting as the User Agent client will make a request, while the
other acting as the User Agent server will respond to it. Each machine can then swap
roles, allowing them to interact as equals on the network. For example, if the applications
being used allowed file sharing, a UAC could request a specific file from the UAS and
download it. During this time, the peers could also be exchanging messages or talking
using VoIP, and once these activities are completed, one could send a request to
terminate the session to end the communications between them. As seen by this, the
computers act in the roles of both client and server, but are always peers by having the
same functionality of making and responding to requests.

SIP Requests and Responses

SIP is a text-based protocol like HTTP, and it sends information
between clients and servers, and User Agent clients and User Agent servers, as a series of
requests and responses. When requests are made, there are a number of possible signaling
commands that might be used:
• REGISTER: Used when a user agent first goes online and registers its SIP address
and IP address with a Registrar server.
• INVITE: Used to invite another user agent to communicate, and then establish a SIP
session between them.
• ACK: Used by the caller to confirm that it has received the final response to an INVITE,
which makes the message exchange that sets up a session reliable.
• OPTIONS: Used to obtain information on the capabilities of another user agent, so that
a session can be established between them. When this information is provided, a session
isn't automatically created as a result.
• SUBSCRIBE: Used to request updated presence information on another user agent's
status. This is used to acquire updated information on whether a user agent is online,
busy, offline, and so on.
• NOTIFY: Used to send updated information on a user agent's current status. This
sends presence information on whether a user agent is online, busy, offline, and so on.
• CANCEL: Used to cancel a pending request without terminating the session.
• BYE: Used to terminate the session. Either the user agent who initiated the session, or
the one being called can use the BYE command at any time to terminate the session.
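
Because SIP requests are plain text, a request such as INVITE can be sketched as a few formatted lines. The build_invite function below is a hypothetical helper written for this illustration. The header names follow the SIP specification (RFC 3261), but the values (hosts, branch, tag, and Call-ID) are placeholders, and the result is a simplified message that omits the session description a real INVITE would normally carry.

```python
def build_invite(target_uri, from_uri, contact_uri, call_id, cseq, via_host):
    """Format a simplified SIP INVITE request as text (no message body)."""
    lines = [
        f"INVITE {target_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host};branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag=example-tag",
        f"To: <{target_uri}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        f"Contact: <{contact_uri}>",
        "Content-Length: 0",
    ]
    # As in HTTP, each line ends with CRLF and an empty line ends the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


message = build_invite(
    target_uri="sip:bob@madeupsip.com",
    from_uri="sip:myaccount@madeupsip.com",
    contact_uri="sip:myaccount@192.0.2.10",
    call_id="example-call-id@192.0.2.10",
    cseq=1,
    via_host="192.0.2.10",
)
```

The first line names the command and the address being invited; the remaining lines identify the caller, the call, and the position of this request within it.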

When a request is made to a SIP server or another user agent, one of a number of
possible responses may be sent back. These responses are grouped into six different
categories, with a three-digit numerical response code that begins with a number relating
to one of these categories. The various categories and their response code prefixes are as
follows:

• Informational (1xx): The request has been received and is being processed.
• Success (2xx): The request was acknowledged and accepted.
• Redirection (3xx): The request can't be completed and additional steps are required
(such as redirecting the user agent to another IP address).
• Client error (4xx): The request contained errors, so the server can't process the
request.
• Server error (5xx): The request was received, but the server can't process it. Errors of
this type refer to the server itself, and don't indicate that another server won't be able to
process the request.
• Global failure (6xx): The request was received and the server is unable to process it.
Errors of this type refer to errors that would occur on any server, so the request wouldn't
be forwarded to another server for processing.
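
Since the category is determined by the first digit of the response code, mapping a code to its category is straightforward. The response_class function and the table name below are illustrative choices for this sketch.

```python
RESPONSE_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
    6: "Global failure",
}


def response_class(status_code):
    """Return the category of a three-digit SIP response code."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a valid response code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


ringing = response_class(180)
not_found = response_class(404)
```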

Protocols Used with SIP

Although SIP is a protocol in itself, it still needs to work with different protocols
at different stages of communication to pass data between servers, devices, and
participants. Without the use of these protocols, communication and the transport of
certain types of media would either be impossible or insecure. In the sections that follow,
we'll discuss a number of the common protocols that are used with SIP, and the functions
they provide during a session.

UDP

The User Datagram Protocol (UDP) is part of the TCP/IP suite of protocols, and
is used to transport units of data called datagrams over an IP network. It is similar to the
Transmission Control Protocol (TCP), except that it doesn't divide messages into packets
and reassemble them at the other end. Because datagrams don't support sequencing of the
packets as the data arrives at the endpoint, it is up to the application to ensure that the
data has arrived in the right order and has arrived completely. This may sound less
beneficial than using TCP for transporting data, but it makes UDP faster because there is
less processing of data. It is often used when messages with small amounts of data (which
require less reassembling) are being sent across the network, or with data that will be
unaffected overall by a few units of missing data.
Although an application may have features that ensure that datagrams haven't
gone missing or arrived out of order, many simply accept the potential of data loss,
duplication, or errors. In the case of Voice over IP, streaming video, or interactive games,
a minor loss of data or error will be a minor glitch that generally won't affect the overall
quality or performance. In these cases, it is more important that the data is passed quickly
from one endpoint to another. If reliability were a major issue, then using TCP as the
transport protocol would be a better choice than burdening the application with features
that check for the reliability of the data it receives.
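
As an illustration of what "up to the application" can mean, the sketch below assumes that the application attaches its own sequence number to every datagram it sends. The reassemble function is a hypothetical helper that puts received payloads back in order, ignores duplicates, and reports gaps. It is not part of UDP itself.

```python
def reassemble(datagrams):
    """Order payloads by sequence number and report missing numbers.

    datagrams: iterable of (sequence_number, payload) in arrival order.
    Returns (payloads in sequence order, list of missing sequence numbers).
    """
    received = {}
    for seq, payload in datagrams:
        received.setdefault(seq, payload)  # keep the first copy, drop duplicates
    if not received:
        return [], []
    first, last = min(received), max(received)
    ordered = [received[seq] for seq in sorted(received)]
    missing = [seq for seq in range(first, last + 1) if seq not in received]
    return ordered, missing


# Datagram 3 arrives before 2 and is duplicated; datagram 4 never arrives.
arrived = [(1, b"he"), (3, b"o!"), (2, b"ll"), (3, b"o!"), (5, b"??")]
ordered, missing = reassemble(arrived)
```

A voice application would typically just skip the missing datagram and keep playing, rather than wait for it to be sent again.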
Notes from the Underground
Transport Layer Security

Transport Layer Security (TLS) is a protocol that can be used with other protocols
to provide security between applications communicating over an IP network. TLS runs on
top of a reliable transport protocol such as TCP (a separate variant, Datagram TLS, exists
for UDP). TLS uses encryption to ensure privacy, so that other parties can't eavesdrop or
tamper with the messages being sent. Using TLS, a secure connection is established by
authenticating the client and server, or User Agent Client and User Agent Server, and
then encrypting the connection between them.
Transport Layer Security is a successor to Secure Sockets Layer (SSL), which
was developed by Netscape. TLS is based on SSL 3.0 and is designed to be its
replacement; TLS 1.0 is a standard that has been defined in RFC 2246. In this standard,
TLS is designed as a multilayer protocol that consists of:

• TLS Handshake Protocol
• TLS Record Protocol

The TLS Handshake Protocol is used to authenticate the participants of the
communication and negotiate an encryption algorithm. This allows the client and server
to agree upon an encryption method and prove who they are using cryptographic keys
before any data is sent between them. Once this has been done successfully, a secure
channel is established between them.
After the TLS Handshake Protocol is used, the TLS Record Protocol ensures that
the data exchanged between the parties isn't altered en route. This protocol can be used
with or without encryption, but the TLS Record Protocol provides enhanced security using
encryption methods like the Data Encryption Standard (DES). In doing so, it provides the
security of ensuring data isn't modified, and others can't access the data while in
transit.
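
As a small, hedged illustration of how an application might prepare for a TLS handshake, the sketch below uses Python's standard ssl module to build client-side settings. The choice of TLS 1.2 as the oldest accepted version is an assumption of this example, not something the text above prescribes.

```python
import ssl

# Client-side settings: the defaults require a valid server certificate
# and check that it matches the host name being contacted.
context = ssl.create_default_context()

# Refuse protocol versions older than TLS 1.2 (an assumption for this sketch).
context.minimum_version = ssl.TLSVersion.TLSv1_2
```

A program would then pass an open TCP socket to context.wrap_socket(), together with the server's host name, which is the point at which the handshake (authentication and negotiation of the encryption method) takes place.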
