
MULTIMEDIA

Multimedia refers to a number of different integrated media, such as text, images, audio, and video, that are generated, stored, and transmitted digitally and can be accessed interactively.
Compression plays a crucial role in multimedia communication by reducing the
large volume of data to be exchanged. Compression techniques fall into two categories: lossless and lossy.
With lossless compression, every single bit of data that was originally in the file
remains after the file is uncompressed. All of the information is completely
restored. Examples are run-length coding, dictionary coding, Huffman coding,
and arithmetic coding.
Lossy compression reduces a file by permanently eliminating certain
information, especially redundant information.
Multimedia consists of text, images, video and audio.

The Internet stores a large amount of text that can be downloaded and used.
One often refers to plaintext, as a linear form, and hypertext, as a nonlinear
form, of textual data.
Text stored in the Internet uses a character set, such as Unicode, to represent
symbols in the underlying language.
Unicode is a 16-bit code that can represent 65,536 (2^16) characters, which is
enough for the character set of any language.
Only lossless compression is used for text because we cannot afford to lose any
piece of information when we decompress.
In multimedia parlance, an image (or a still image as it is often called) is the
representation of a photograph, a fax page, or a frame in a moving picture.
Digitization of an image means to represent an image as a two-dimensional array
of dots, called pixels.
Each pixel then can be represented as a number of bits, referred to as the bit
depth.
In a black-and-white image, such as a fax page, the bit depth is 1: each pixel is represented
by a single bit, 0 (black) or 1 (white).
In a gray picture, one normally uses a bit depth of 8 with 256 levels.
In a color image, the image is normally divided into three channels, with each channel
representing one of the three primary colors of red, green, or blue (RGB). In this case, the bit
depth is 24 (8 bits for each color).
Moving from black-and-white to gray to color representation of images
tremendously increases the amount of information to transmit over the Internet,
which implies the need to compress images to save time.
Example image formats
Image: An image is a two-dimensional signal, defined by the mathematical
function f(x, y), where x and y are the horizontal and vertical coordinates.
Pixel: The value of f(x, y) at any point gives the pixel value at that point of the image.
Image Resolution: The number of pixels in an image.

The following shows the time required to transmit an image of 1280 × 720 pixels
using a transmission rate of 100 kbps.
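Assuming an uncompressed image and 100 kbps = 100,000 bits per second, the transmission times work out as follows:

    Black-and-white (1 bit/pixel):  1280 × 720 × 1  = 921,600 bits    → about 9.2 s
    Gray scale (8 bits/pixel):      1280 × 720 × 8  = 7,372,800 bits  → about 73.7 s
    Color (24 bits/pixel):          1280 × 720 × 24 = 22,118,400 bits → about 221 s (3.7 minutes)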
The simplest bitmap images use only black and white: each pixel is colored either black or white, and such images are called monochrome graphics.
A greyscale graphic is a bitmap that uses shades of gray, e.g., 256 shades of gray: each pixel can be white, black, or one of 254 shades of gray. Since 256 = 2^8, each pixel requires 8 bits of information.
Color images can use 16, 256, 65,536, or 16.7 million colors.
Greyscale files are bigger than monochrome files, and graphics files with full color can get big!
Compression is a way to reduce the number of bits in a frame while retaining its meaning.
Image compression can benefit users by having pictures load faster and web pages use up
less space on a Web host.
The Joint Photographic Experts Group (JPEG) standard provides lossy compression that is
used in most implementations and can be used for both color and gray images.
In JPEG, a grayscale picture is divided into blocks of 8 x 8 pixels. The compression and
decompression each go through three steps:
JPEG normally uses DCT (Discrete Cosine Transform) in the first step in compression and inverse
DCT in the last step in decompression. Transformation and inverse transformation are applied on
8 x 8 blocks.

The output of DCT transformation is a matrix of real numbers.


Because the precise encoding of these real numbers would require a lot of bits, JPEG uses a quantization
step that not only rounds the real values in the matrix but also changes some values to zeros. The zeros
can then be eliminated in the encoding step to achieve a high compression rate.
The only phase in the process that is not completely reversible is the quantizing phase.

After quantization, the values are reordered in a zigzag sequence before being input into the
encoder.
The zigzag reordering of the quantized values is done to let the values related to the lower
frequency feed into the encoder before the values related to the higher frequency.
Since most of the higher-frequency values are zeros, this means nonzero values are given to the
encoder before the zero values. The encoding is a lossless compression.
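To make the pipeline concrete, here is a minimal Python/NumPy sketch of the three steps on a single 8 x 8 block; it is only an illustration of the idea, not the JPEG standard itself, and the quantization matrix and sample block below are invented for the example.

    # Sketch: 2-D DCT, quantization, and zigzag reordering of one 8x8 gray block.
    import numpy as np

    N = 8

    def dct_matrix(n=N):
        # Orthonormal DCT-II basis matrix.
        c = np.array([np.sqrt(1.0 / n)] + [np.sqrt(2.0 / n)] * (n - 1))
        u = np.arange(n)[:, None]              # frequency index (rows)
        x = np.arange(n)[None, :]              # sample index (columns)
        return c[:, None] * np.cos(np.pi * (2 * x + 1) * u / (2 * n))

    def zigzag(block):
        # Visit the block along anti-diagonals, lowest frequencies first.
        order = sorted(((i, j) for i in range(N) for j in range(N)),
                       key=lambda p: (p[0] + p[1],
                                      p[0] if (p[0] + p[1]) % 2 else p[1]))
        return np.array([block[i, j] for i, j in order])

    Q = 16 + 4 * (np.arange(N)[:, None] + np.arange(N)[None, :])  # toy quantization matrix

    block = np.tile(np.linspace(0, 255, N), (N, 1))   # a gradient gray-scale block
    T = dct_matrix()
    coeffs = T @ (block - 128) @ T.T                  # step 1: 2-D DCT of the level-shifted block
    quantized = np.round(coeffs / Q).astype(int)      # step 2: lossy quantization (many zeros)
    print(zigzag(quantized))                          # step 3: zigzag order feeds the lossless encoder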
Case 1: uniform gray scale; Case 2: gradient gray scale
Steps of JPEG image compression based on DCT

Three different quantization matrices


The Graphics Interchange Format (GIF) is a bitmap image format that was developed by
the US-based software writer Steve Wilhite while he was working at the Internet service provider
CompuServe in 1987.
The format supports up to 8 bits per pixel for each image, allowing a single image to
reference its own palette of up to 256 different colors chosen from the 24-bit RGB color
space. It also supports animations and allows a separate palette of up to 256 colors for
each frame.
These palette limitations make the GIF format less suitable for reproducing color
photographs and other images with continuous color, but it is well-suited for simpler
images such as graphics or logos with solid areas of color.
For example, the color magenta in 24-bit JPEG format (2^24 colors) is represented as the integer
(FF00FF)₁₆, while the same color in GIF (2^8 palette entries) can be represented using the palette
index (E2)₁₆, thereby reducing the size of the image by a factor of 3 compared to JPEG.
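A rough back-of-the-envelope comparison (file headers and compression are ignored, and the 100 × 100 image size is arbitrary) shows why a palette helps:

    # Raw 24-bit RGB versus 8-bit indices into a 256-entry palette for a 100 x 100 image.
    width, height = 100, 100
    rgb_bytes = width * height * 3                 # one (R, G, B) triple per pixel
    gif_bytes = 256 * 3 + width * height * 1       # 768-byte palette + one index per pixel
    print(rgb_bytes, gif_bytes, round(rgb_bytes / gif_bytes, 1))   # 30000 10768 2.8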
Video is composed of multiple frames; each frame is one image, so a video file
requires a high transmission rate.
A video is composed of a series of pictures (frames), displayed at 30 frames per
second (FPS).
Let us show the transmission rate for some video standards:
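For example, assuming uncompressed 24-bit color at 30 FPS:

    640 × 480 (SD):   640 × 480 × 24 × 30  ≈ 221 Mbps
    1280 × 720 (HD):  1280 × 720 × 24 × 30 ≈ 664 Mbps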

Even if we have a high-end computer (processor, hard disk, etc.), the computational
demands placed by such a video are very high.
Motion Picture Experts Group (MPEG) is a method to compress video. In
principle, a motion picture is a rapid flow of a set of frames, where each
frame is an image.
A frame is a spatial combination of pixels, and a video is a temporal
combination of frames that are sent one after another.
Compressing video, then, means spatially compressing each frame and
temporally compressing a set of frames.
The spatial compression of each frame is done with JPEG. Each frame is a
picture that can be independently compressed.
Video clips have certain amount of redundancy that can be discarded to
achieve compression. Lossy compression schemes are used. In temporal
compression, redundant frames are removed.
To temporally compress data, the MPEG method first divides a set of frames into three
categories: I-frames, P-frames, and B-frames.

An intra coded frame (I-frame) is an independent frame that is not related to any
other frame. An I-frame must appear periodically to handle some sudden change in the frame
that the previous and following frames cannot show.
A predicted frame (P-frame) is related to the preceding I-frame or P-frame. P-
frames can be constructed only from previous I- or P-frames. P-frames carry much less
information than other frame types and carry even fewer bits after compression.
A bidirectional frame (B-frame) is related to the preceding and following I-frame
or P-frame. In other words, each B-frame is relative to the past and the future. A B-frame is
never related to another B-frame.
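As a simple illustration (the group-of-pictures pattern below is assumed for the example, not mandated by the standard), a decoder needs the I- and P-frame references before the B-frames that depend on them, so frames are typically transmitted in a different order than they are displayed:

    # Illustrative sketch: reorder a display-order frame sequence so that each
    # B-frame is transmitted after the reference (I- or P-) frame it depends on.
    display_order = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]

    def transmission_order(frames):
        out, held_b = [], []
        for f in frames:
            if f.startswith("B"):
                held_b.append(f)          # hold B-frames until their next reference arrives
            else:
                out.append(f)             # send the reference frame first
                out.extend(held_b)        # then the B-frames that depend on it
                held_b = []
        return out + held_b

    print(transmission_order(display_order))
    # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']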
Audio (sound) signals are analog signals that need a medium to travel; they cannot
travel through a vacuum.
The speed of the sound in the air is about 330 m/s (740 mph). The audible
frequency range for normal human hearing is from about 20Hz to 20kHz with
maximum audibility around 3300 Hz.
To be able to provide compression, audio analog signals are digitized using an
analog-to-digital converter. The analog-to-digital conversion consists of two
processes: sampling and quantizing.
Pulse code modulation (PCM) involves sampling an analog signal, quantizing the
samples, and coding the quantized values as streams of bits.
Voice signal is sampled at the rate of 8,000 samples per second with 8 bits per sample; the result
is a digital signal of 8,000 x 8 = 64 kbps.
Music is sampled at 44,100 samples per second with 16 bits per sample; the result is a digital
signal of 44,100 x 16 = 705.6 kbps
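A minimal sketch of the two steps in Python (the 1 kHz test tone and the use of NumPy are assumptions made only for the illustration):

    # Sample a 1 kHz tone at 8,000 samples/s and quantize each sample to 8 bits.
    import numpy as np

    fs, bits = 8000, 8
    t = np.arange(fs) / fs                           # one second of sampling instants
    analog = np.sin(2 * np.pi * 1000 * t)            # the "analog" signal being digitized
    levels = 2 ** bits
    samples = np.round((analog + 1) / 2 * (levels - 1)).astype(np.uint8)   # values 0..255
    print(len(samples) * bits, "bits per second")    # 64000, i.e. 64 kbps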
Both lossy and lossless compression algorithms are used in audio compression.
Compression techniques used for speech and music have different requirements.
Compression techniques used for speech must have low latency because significant delays degrade the
communication quality in telephony.
Compression algorithms used for music must be able to produce high quality sound with lower numbers of
bits.
Two categories of techniques are used in audio compressions: predictive coding and
perceptual coding.
Predictive coding techniques have low latency and therefore are popular in speech coding
for telephony where significant delays degrade the communication quality. Common
examples of these techniques are Delta Modulation (DM),Adaptive Delta Modulation
(ADM), Differential PCM (DPCM), Adaptive Differential PCM (ADPCM), and Linear
Predictive Coding (LPC).
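A toy differential-coding sketch (not any standardized DPCM variant; the sample values and step size are invented) shows the idea of transmitting quantized differences instead of full sample values:

    # Encode each sample as the quantized difference from the previously
    # reconstructed sample; the decoder rebuilds the signal from the differences.
    samples = [52, 60, 65, 63, 58, 55]
    step = 4
    prev, encoded, decoded = 0, [], []
    for s in samples:
        d = round((s - prev) / step)      # small integer actually transmitted
        encoded.append(d)
        prev = prev + d * step            # receiver-side reconstruction
        decoded.append(prev)
    print(encoded)   # [13, 2, 1, 0, -2, 0]
    print(decoded)   # [52, 60, 64, 64, 56, 56]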
The most common compression technique used to create CD-quality audio is
perceptual coding, which is based on the science of psychoacoustics.
Algorithms used in perceptual coding first transform the data from time domain
to frequency domain; the operations are then performed on the data in the
frequency domain. This technique, hence, is also called the frequency domain
method.
Psychoacoustics is the study of subjective human perception of sound.
Perceptual coding takes advantage of flaws in the human auditory system. The
lower limit of human audibility is 0 dB, but this holds only for sounds with frequencies
between about 2.5 and 5 kHz; outside this range the threshold is higher, and sounds
that fall below the hearing threshold need not be coded.
One standard that uses perceptual coding is MP3 (MPEG audio layer 3).
Multimedia In the Internet
Audio and Video services are classified into three broad categories:
streaming stored audio/video,
streaming live audio/video,
interactive audio/video.
Streaming means a user can listen (or watch) the file after the downloading has started.

In streaming stored audio/video category, the files are compressed and stored on a
server.
A client downloads the files through the Internet (referred to as on-demand
audio/video).
Examples of stored audio files are songs, symphonies, books on tape, and famous
lectures. Examples of stored video files are movies, TV shows, and music video clips.
The client (browser) can use the services of HTTP and send a GET message to
download the file. The Web server can send the compressed file to the browser. The
browser can then use a help application, normally called a media player, to play the
file.
This approach is very simple and does not involve streaming. In this approach, the file
needs to download completely before it can be played.
The media player is directly connected to the Web server for downloading the
audio/video file. The Web server stores two files: the actual audio/video file and a
metafile that holds information about the audio/video file.
1. The HTTP client accesses the Web server using the GET message.
2. The information about the metafile comes in the response.
3. The metafile is passed to the media player.
4. The media player uses the URL in the metafile to access the audio/video file.
5. The Web server responds.
The problem with the second approach is that the browser and the media player both use the
services of HTTP. HTTP is designed to run over TCP, which is suitable for retrieving the metafile but
not the audio/video file.
As retransmissions are not suited to streaming, UDP needs to be used. Hence a new server,
called a media server, is used.
1. The HTTP client accesses the Web server using a GET message.
2. The information about the metafile comes in the response.
3. The metafile is passed to the media player.
4. The media player uses the URL in the metafile to access the media server to download the file. Downloading can take place by any protocol that uses UDP.
5. The media server responds.
The Real-Time Streaming Protocol (RTSP) is a control protocol designed to add more
functionalities to the streaming process. Using RTSP, we can control the playing of audio/video.
RTSP is an out-of-band control protocol that is similar to the second connection in FTP.
1. The HTTP client accesses the Web server using a GET message.
2. The information about the metafile comes in the response.
3. The metafile is passed to the media player.
4. The media player sends a SETUP message to create a connection with the media server.
5. The media server responds.
6. The media player sends a PLAY message to start playing (downloading).
7. The audio/video file is downloaded using another protocol that runs over UDP.
8. The connection is broken using the TEARDOWN message.
9. The media server responds.
In the streaming live audio/video category, a user listens to broadcast audio and
video through the Internet. Good examples of this type of application are
Internet radio and Internet TV.
In the first category (streaming stored audio/video), the communication is unicast
and on-demand; in the second (streaming live audio/video), the communication is
multicast and live. Live streaming is better suited to the multicast services of IP
and the use of protocols such as UDP and RTP.
Examples are
Internet Radio
Internet Television (ITV)
IPTV
Real-Time Interactive Audio/Video
In the interactive audio/video category, people use the Internet to interactively
communicate with one another.
The Internet phone or voice over IP is an example of this type of application. Video
conferencing is another example that allows people to communicate visually and
orally.

Real-time data on a packet-switched network require the preservation of the time
relationship between the packets of a session.
A real-time video server creates live video images and sends them online. The video is
digitized and packetized. Three packets, each carrying 10 seconds of video, are sent.
Even with a 1-second delay, the time relationship between the packets is preserved.
If the packets arrive with different delays, the situation is called jitter.
For example, the first packet arrives at 00:00:01 (1-s delay), the second arrives at 00:00:15
(5-s delay), and the third arrives at 00:00:27 (7-s delay). If the receiver starts playing the
first packet at 00:00:01, it will finish at 00:00:11. However, the next packet has not yet
arrived; it arrives 4 s later.
There is a gap between the first and second packets, and between the second and the third,
as the video is viewed at the remote site.
One solution to jitter is the use of a timestamp. If each packet has a timestamp that
shows the time it was produced relative to the first (or previous) packet, then the
receiver can add this time to the time at which it starts the playback. In other words, the
receiver knows when each packet is to be played.

To prevent jitter, we can


timestamp the packets and
separate the arrival time
from the playback time.
To be able to separate the arrival time from the playback time, we need a buffer to store the data until
they are played back. The buffer is referred to as a playback buffer.
When a session begins (the first bit of the first packet arrives), the receiver delays playing the data
until a threshold is reached.
Data are stored in the buffer at a possibly variable rate, but they are extracted and played back at a
fixed rate.
The amount of data in the buffer shrinks or expands, but as long as the delay is less than the time to
play back the threshold amount of data, there is no jitter.
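A small sketch of timestamp-based playout, reusing the arrival times from the example above and assuming a 7-second playout delay:

    # Each packet is played at (arrival time of the first packet) + playout delay
    # + its timestamp, so variable network delay no longer causes gaps.
    playout_delay = 7                                # seconds buffered before playback starts
    packets = [(0, 1), (10, 15), (20, 27)]           # (timestamp, arrival time) in seconds
    first_arrival = packets[0][1]
    for ts, arrival in packets:
        play_at = first_arrival + playout_delay + ts
        print(f"ts={ts:2d}s arrives at {arrival:2d}s, plays at {play_at:2d}s,"
              f" in time: {arrival <= play_at}")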
In addition to the time relationship information and timestamps, real-time traffic requires a
sequence number for each packet so that lost or out-of-order packets can be detected.

Multimedia plays a primary role in audio and video conferencing. The traffic can be heavy, and the
data are distributed using multicasting methods. Conferencing requires two-way communication
between receivers and senders.

A translator is a computer that can change the format of a high-bandwidth video signal to a lower-
quality narrow-bandwidth signal.

If there is more than one source that can send data at the same time (as in a video or audio
conference), the traffic is made of multiple streams. To converge the traffic to one stream, data from
different sources can be mixed. A mixer mathematically adds signals coming from different sources
to create one single signal.
Real-Time Interactive Protocols
Each microphone or camera at the source site is called a contributor and is given a 32-bit
identifier called the contributing source (CSRC) identifier.
The mixer is also called the synchronizer and is given another identifier called the synchronizing
source (SSRC) identifier.

Schematic diagram of a real-time multimedia system


Transport-Layer Requirements for Interactive Real-Time Multimedia

Sender-Receiver Negotiation
Creation of Packet Stream
Source Synchronization
Error Control
Congestion Control
Jitter Removal
Identifying Sender.

Capability of UDP or TCP to handle real-time data


Real-time Transport Protocol (RTP)
The Real-time Transport Protocol (RTP) is a network protocol for delivering audio and
video over IP networks. RTP is used extensively in communication and entertainment
systems that involve streaming media, such as telephony, video teleconference applications,
television services, etc.

RTP is treated as the transport protocol (not a transport-layer protocol) that can be thought of as
located in the application layer.
The data from multimedia applications are encapsulated in RTP, which in turn passes them to the
transport layer. The socket interface is located between RTP and UDP.
RTP Header
The format of RTP Packet header is very simple and general enough to cover all
real-time applications.
- Version (VER). This 2-bit field defines the version number. The current version is 2.
- Padding (P). If this bit is set, the packet contains one or more additional padding bytes at the end which are not part of the payload. The last byte of the padding contains a count of how many padding bytes should be ignored.
- Extension (X). If set to 1, it indicates an extension header between the basic header and the data.
- Contributor count (CC). It indicates the number of contributing sources (CSRCs). In an audio or video conference, each active source is called a contributor.
- Marker (M). It is used by the application to indicate, for example, the end of its data. It is intended to allow significant events such as frame boundaries to be marked in the packet stream.
- Payload type (PT). It identifies the format of the RTP payload and determines its interpretation by the application.
- Sequence number. It is used to number the RTP packets. The sequence number of the first packet is chosen randomly; it is incremented by 1 for each subsequent packet. The sequence number is used by the receiver to detect lost or out-of-order packets.
- Timestamp. This indicates the time relationship between packets. The timestamp for the first packet is a random number. For each succeeding packet, the value is the preceding timestamp plus the time the first byte of the packet is produced (sampled).
- Synchronization source (SSRC) identifier. If there is only one source, this 32-bit field defines the source. However, if there are several sources, the mixer is the synchronization source and the other sources are contributors.
- Contributing source (CSRC) identifiers. Each of these 32-bit identifiers (a maximum of 15) defines a contributing source. When there is more than one source in a session, the mixer is the synchronization source and the remaining sources are the contributors.
Although RTP is itself a transport-layer protocol, the RTP packet is not encapsulated directly
in an IP datagram. It is encapsulated in a UDP user datagram. No well-known port is
assigned to RTP; it uses an even-numbered UDP port.
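As an illustration of the header layout described above, the following Python sketch packs the fixed 12-byte part of an RTP header (no CSRC list) using the standard struct module; the field values are arbitrary examples, not taken from a real session.

    import struct

    def build_rtp_header(payload_type, seq, timestamp, ssrc,
                         version=2, padding=0, extension=0, cc=0, marker=0):
        byte0 = (version << 6) | (padding << 5) | (extension << 4) | cc
        byte1 = (marker << 7) | payload_type
        # !BBHII = network byte order: two flag bytes, 16-bit sequence number,
        # 32-bit timestamp, 32-bit SSRC identifier
        return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

    header = build_rtp_header(payload_type=0, seq=0x1234, timestamp=160, ssrc=0xDEADBEEF)
    print(header.hex())   # 80001234000000a0deadbeef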
Real-time Transport Control Protocol (RTCP)
RTP allows only one type of message, one that carries data from the source to the
destination.
RTCP is the control protocol designed to work in conjunction with RTP. The RTP
data transport is augmented by a control protocol (RTCP), which provides the RTP
session participants feedback on the quality of the data distribution.
The main functions of the RTCP are:
RTCP informs the sender or senders of multimedia streams about the network performance, which can
be directly related to the congestion in the network.
Information carried in the RTCP packets can be used to synchronize different streams associated with
the same source. For this, RTCP provides one single identity, called a canonical name (CNAME) for
each source and uses the timestamp field in the RTP packet.
An RTCP packet can carry extra information about the sender that can be useful for the receiver, such
as the name of the sender (beyond canonical name) or captions for a video.
RTCP Packets
Each RTCP packet starts with a header similar to that of the RTP data packet.
The payload type field identifies the type of the packet. Five RTCP payload types (200-204)
are defined: sender report (200), receiver report (201), source description (202), bye (203),
and application-specific (204).

More than one RTCP packet can be packed as a single payload for UDP because
the RTCP packets are smaller than RTP packets.
The sender report packet is sent periodically by the active senders in a
session to report transmission and reception statistics for all RTP packets
sent during the interval. It includes the following:
The SSRC of the RTP stream.
The absolute timestamp, which is the number of seconds elapsed since midnight on
January 1, 1970. It allows the receiver to synchronize RTP messages.
The number of RTP packets and bytes sent from the beginning of the session.
The receiver report is issued by passive participants, those that do not send RTP
packets. The report informs the sender and other receivers about the quality of
service. The feedback information can be used for congestion control at the sender
site. A receiver report includes the following information:
The SSRC of the RTP stream for which the receiver report has been generated.
The fraction of packet loss.
The last sequence number.
The interarrival jitter.
The source periodically sends a source description packet to give additional
information about itself. The packet can include:
The SSRC.
The canonical name (CNAME) of the sender.
Other information such as the real name, the e-mail address, the telephone number.
The source description packet may also include extra data, such as captions used for
video.
A source sends a bye packet to shut down a stream. It allows the source to announce that it
is leaving the conference. Although other sources can detect the absence of a source, this
packet is a direct announcement. It is also very useful to a mixer.

The application-specific packet is intended for applications that want to use features not
defined in the standard. It allows the definition of a new packet type.
RTCP, like RTP, does not use a well-known UDP port. It uses a temporary port. The
UDP port chosen must be the number immediately following the UDP port selected
for RTP, which makes it an odd-numbered port.
Requirement Fulfillment
The combination of RTP and RTCP can respond to the requirements of an interactive real-
time multimedia application as follows:
A digital audio or video stream, a sequence of bits, is divided into chunks. Each chunk has a
predefined boundary that distinguishes the chunk from the previous chunk or the next one.
A chunk is encapsulated in an RTP packet, which defines a specific encoding (payload type),
a sequence number, a timestamp, a synchronization source (SSRC) identifier, and one or
more contributing source (CSRC) identifiers.

The first requirement, sender-receiver negotiation, cannot be satisfied by the combination of the
RTP/RTCP protocols.
The second requirement, creation of a stream of chunks, is provided by encapsulating each chunk
in an RTP packet and giving a sequence number to each chunk. The M field in an RTP packet also
defines whether there is a specific type of boundary between chunks.
The third requirement, synchronization of sources, is satisfied by identifying each
source by a 32-bit identifier and using the relative timestamping in the RTP packet
and the absolute timestamping in the RTCP packet.
The fourth requirement, error control, is provided by using the sequence number in
the RTP packet and letting the application regenerate the lost packet using FEC
methods.
The fifth requirement, congestion control, is met by the feedback from the receiver
using the receiver report packets (RTCP) that notify the sender about the number of
lost packets.
The sixth requirement, jitter removal, is achieved by the timestamping and
sequencing provided in each RTP packet to be used in buffered playback of the data.
The seventh requirement, identification of source, is provided by using the CNAME
included in the source description packets (RTCP) sent by the sender.
Session Initiation Protocol (SIP)
The Session Initiation Protocol (SIP) is a communications protocol for signaling, for the
purpose of controlling multimedia communication sessions. Internet telephony, business
IP telephone systems, service providers and carriers use SIP.
It is an application-layer protocol, similar to HTTP, that establishes, manages, and
terminates a multimedia session (call). It can be used to create two-party, multiparty, or
multicast sessions. SIP is designed to be independent of the underlying transport layer; it
can run on UDP, TCP, or SCTP, using port 5060.
SIP can provide the following services:
It establishes a call between users if they are connected to the Internet.
It finds the location of the users (their IP addresses) on the Internet, because the
users may be changing their IP addresses (think about mobile IP and DHCP).
It finds out if the users are able or willing to take part in the conference call.
It determines the users' capabilities in terms of the media to be used and the type of
encoding.
It establishes session setup by defining parameters such as the port numbers to be
used.
It provides session management functions such as call holding, call forwarding,
accepting new participants, and changing the session parameters.
The SIP protocol needs to find the location of the callee and at the same time negotiate the
capability of the devices the participants are using.
In SIP, an e-mail address, an IP address, a telephone number, and other types of addresses
can be used to identify the sender and receiver. However, the address needs to be in SIP
format (also called a scheme).

The SIP addresses are URLs that can be included in the web page of the potential callee.
Other addresses are also possible, such as those that use first name followed by last
name, but all addresses need to be in the form sip:user@address.

A SIP address is a unique identifier for each user on the network, just like a phone
number identifies each user on the global phone network, or an email address. It is
also known as a SIP URI (Uniform Resource Identifier).
SIP Messages
SIP is a text-based protocol like HTTP. SIP, like HTTP, uses messages.
Messages in SIP are divided into two broad categories: Requests and
responses.

The opening line of a request contains a method that defines the request,
and a Request-URI that defines where the request is to be sent.
Similarly, the opening line of a response contains a response code.
Request Messages
IETF originally defined six request messages, but some new request messages have been
proposed to extend the functionality of the SIP.
INVITE - The INVITE request message is used by a caller to initialize a session. Using this
request message, a caller invites one or more callees to participate in the conference.
ACK - The ACK message is sent by the caller to confirm that the session initialization has been
completed.
OPTIONS - The OPTIONS message queries a machine about its capabilities.
CANCEL - The CANCEL message cancels an already started initialization process, but does not
terminate the call. A new initialization may start after the CANCEL message.
REGISTER - The REGISTER message makes a connection when the callee is not available.
BYE - The BYE message is used to terminate the session. The BYE message, which can be
initiated from the caller or callee, terminates the whole session.
Response Messages
IETF has also defined six types of response messages that can be sent to request
messages. A response message can be sent to any request message.
Informational Responses - These responses are in the form SIP 1xx (the common ones are 100
trying, 180 ringing, 181 call forwarded, 182 queued, and 183 session progress).
Successful Responses - These responses are in the form SIP 2xx (the common one is 200 OK).
Redirection Responses - These responses are in the form SIP 3xx (the common ones are 301
moved permanently, 302 moved temporarily, 380 alternative service).
Client Failure Responses - These responses are in the form SIP 4xx (the common ones are 400
bad request, 401 unauthorized, 403 forbidden, 404 not found, 405 method not allowed, 406 not
acceptable, 415 unsupported media type, 420 bad extension, 486 busy here).
Server Failure Responses - These responses are in the form SIP 5xx (the common ones are 500
server internal error, 501 not implemented, 503 service unavailable, 504 timeout, 505 SIP version
not supported).
Global Failure Responses - These responses are in the form SIP 6xx (the common ones are 600
busy everywhere, 603 decline, 604 doesn't exist, and 606 not acceptable).
SIP communication is divided into three modules: establishing, communicating, and terminating.
Establishing a session in SIP requires a three-way handshake. Alice sends an INVITE request message, using UDP, TCP, or SCTP, to begin the communication. If Bob is willing to start the session, he sends a response (200 OK) message. To confirm that the reply code has been received, Alice sends an ACK request message, and the audio communication starts.
After the session has been established, Alice and Bob can communicate using the two temporary ports defined during session establishment. The even-numbered ports are used for RTP; RTCP can use the odd-numbered ports that follow.
The session can be terminated with a BYE message sent by either party.
A proxy server is a network server with UAC (user agent client) and UAS (user agent server)
components that functions as an intermediary entity for the purpose of performing requests on
behalf of other network elements.
A registrar is a SIP endpoint that provides a location service. It accepts REGISTER requests, recording
the address and other parameters from the user agent. The location service links one or more IP
addresses to the SIP URI of the registering agent.
At any moment a user is registered with at least one registrar server; this server knows the IP
address of the callee.
When Alice needs to communicate with Bob, she can use the e-mail address instead of the IP address in the INVITE message. The message goes to a proxy server.
The proxy server sends a lookup message (not part of SIP) to some registrar server that has registered Bob.
When the proxy server receives a reply message from the registrar server, the proxy server takes Alice's INVITE message and inserts the newly discovered IP address of Bob. This message is then sent to Bob.
SIP Message Format and SDP Protocol
SIP request and response messages are divided into four sections: start or status line, header, a
blank line, and the body.
Start Line: A single line that starts with the request message name, followed by the address of
the recipient and the SIP version.
Status Line: A single line that starts with the three-digit response code.
Header: A header, in a request or response message, can use several lines. Each line starts with
the header name followed by a colon and a space and then the value. Some typical header lines
are: Via, From, To, Call-ID, Content-Type, Content-Length, and Expires.
The Via header defines the SIP device through which the message passes, including the sender
The From header defines the sender and the To header defines the recipient.
The Call-ID header is a random number that defines the session.
The Content-Type header defines the type of the body of the message (SDP).
The Content-Length defines the length of the body of the message in bytes.
The Expires header is normally used in a REGISTER message to define the expiration of the information in the body.
Example of a header in an INVITE message.
SIP uses another protocol, called Session Description Protocol (SDP), to define the body.
Each line in the body is made of an SDP code followed by an equal sign, and followed by the
value.
The first part of the body is normally general information. The codes used in this section
are: v (for version of SDP), and o (for origin of the message).
The second part of the body normally gives information to the recipient for making a
decision to take part in the session. The codes used in this section are: s (subject), i
(information about subject), u (for session URL), and e (the e-mail address of the person
responsible for the session).
The third part of the body gives the technical details to make the session possible. The
codes used in this part are: c (the unicast or multicast IP address that the user needs to join
to be able to take part in the session), t (the start time and end time of the session, encoded
as integers), m (the information about media such as audio, video, the port number, the
protocol used).
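As an illustration, a minimal INVITE request carrying an SDP body might look like the following; all names, addresses, and numeric values are invented for the example (the Content-Length value assumes CRLF line endings in the body).

    INVITE sip:bob@example.com SIP/2.0
    Via: SIP/2.0/UDP alicepc.example.com:5060
    From: sip:alice@example.com
    To: sip:bob@example.com
    Call-ID: 3848276298220188511
    Content-Type: application/sdp
    Content-Length: 130

    v=0
    o=alice 2890844526 2890844526 IN IP4 alicepc.example.com
    s=Audio call
    c=IN IP4 192.0.2.10
    t=0 0
    m=audio 49170 RTP/AVP 0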
H.323
H.323 is a standard designed by ITU to allow telephones on the public telephone network
to talk to computers (called terminals in H.323) connected to the Internet.

A gateway connects the Internet to the telephone network.


The gateway transforms a telephone network message into an Internet message.
The gatekeeper server on the local area network plays the role of the registrar server, as
in SIP.
H.323 uses a number of protocols to establish and maintain voice (or video)
communication.

H.323 uses G.711 or G.723.1 for compression. It uses a protocol named H.245, which allows
the parties to negotiate the compression method. Protocol Q.931 is used for establishing
and terminating connections.
Another protocol, called H.225, or Registration/Administration/Status (RAS), is used for
registration with the gatekeeper.
H.323, unlike SIP, is a complete set of protocols that mandates the use of RTP and RTCP.
Operation of H.323:
1. The terminal sends a broadcast message to the gatekeeper. The gatekeeper responds with its IP address.
2. The terminal and gatekeeper communicate, using H.225 to negotiate bandwidth.
3. The terminal, the gatekeeper, the gateway, and the telephone communicate using Q.931 to set up a connection.
4. The terminal, the gatekeeper, the gateway, and the telephone communicate using H.245 to negotiate the compression method.
5. The terminal, the gateway, and the telephone exchange audio using RTP under the management of RTCP.
6. The terminal, the gatekeeper, the gateway, and the telephone communicate using Q.931 to terminate the communication.
