The Internet stores a large amount of text that can be downloaded and used.
One often refers to plaintext as the linear form, and hypertext as the nonlinear
form, of textual data.
Text stored in the Internet uses a character set, such as Unicode, to represent
symbols in the underlying language.
Unicode is a 16-bit code that can represent 65,536 (2^16) characters, which is
enough for the character set of any language.
Only lossless compression is used for text because we cannot afford to lose any
information during decompression.
In multimedia parlance, an image (or a still image as it is often called) is the
representation of a photograph, a fax page, or a frame in a moving picture.
Digitization of an image means to represent an image as a two-dimensional array
of dots, called pixels.
Each pixel then can be represented as a number of bits, referred to as the bit
depth.
In a black-and-white image, such as a fax page, the bit depth is 1; each pixel is
represented by a 0 (black) or a 1 (white).
In a grayscale image, one normally uses a bit depth of 8, giving 256 levels.
In a color image, the image is normally divided into three channels, with each channel
representing one of the three primary colors of red, green, or blue (RGB). In this case, the bit
depth is 24 (8 bits for each color).
Moving from black-and-white, to grayscale, to color representation of images
tremendously increases the amount of information to transmit over the Internet,
which implies the need for image compression to save time.
Example image formats
Image: An image is a two-dimensional signal, defined by a mathematical function
f(x, y), where x and y are the horizontal and vertical coordinates.
Pixel: The value of f(x, y) at any point gives the pixel value at that point of the image.
Image Resolution: The number of pixels in an image.
As an example, consider the time required to transmit an image of 1280 × 720 pixels
at a transmission rate of 100 kbps.
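The sizes and transmission times implied by the three bit depths can be sketched in a few lines of Python (a rough calculation, taking 1 kbps = 1000 bit/s and ignoring protocol overhead):

```python
# Transmission time for a 1280 x 720 image at 100 kbps,
# for the three bit depths discussed above.
PIXELS = 1280 * 720          # 921,600 pixels
RATE_BPS = 100_000           # 100 kbps (taking 1 kbps = 1000 bit/s)

def transmit_seconds(bit_depth: int) -> float:
    """Seconds needed to send the uncompressed image."""
    return PIXELS * bit_depth / RATE_BPS

for name, depth in [("monochrome", 1), ("grayscale", 8), ("color", 24)]:
    print(f"{name:10s} ({depth:2d} bits/pixel): {transmit_seconds(depth):8.3f} s")
```

At 24 bits per pixel the uncompressed image needs over 220 seconds at this rate, which is why image compression matters.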
The simplest bitmap images use only
black and white: each pixel is
colored black or white; such images are called
monochrome graphics.
A greyscale graphic is a bitmap
using shades of gray,
e.g. 256 shades of gray.
Each pixel can be white, black, or
one of 254 shades of gray.
256 is 2^8, so each pixel requires 8
bits of information.
Color images can use 16, 256,
65,536, or 16.7 million colors.
Greyscale files are bigger than
monochrome files. Graphics files
with full color can get big!
Compression is a way to reduce the number of bits in a frame while retaining its meaning.
Image compression can benefit users by having pictures load faster and web pages use up
less space on a Web host.
The Joint Photographic Experts Group (JPEG) standard provides lossy compression; it is
used in most implementations and can be applied to both color and grayscale images.
In JPEG, a grayscale picture is divided into blocks of 8 x 8 pixels. The compression and
decompression each go through three steps:
JPEG normally uses DCT (Discrete Cosine Transform) in the first step in compression and inverse
DCT in the last step in decompression. Transformation and inverse transformation are applied on
8 x 8 blocks.
After quantization, the values are reordered in a zigzag sequence before being input into the
encoder.
The zigzag reordering of the quantized values is done to let the values related to the lower
frequency feed into the encoder before the values related to the higher frequency.
Since most of the higher-frequency values are zeros, this means nonzero values are given to the
encoder before the zero values. The encoding is a lossless compression.
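The zigzag reordering described above can be sketched as follows (a minimal illustration of the traversal order, not a full JPEG encoder):

```python
def zigzag(block):
    """Read an N x N block in JPEG zigzag order: low-frequency
    coefficients (top-left) come out before high-frequency ones."""
    n = len(block)
    order = []
    # Group indices by anti-diagonal i + j; alternate direction per diagonal.
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()   # even diagonals run bottom-left to top-right
        order.extend(diag)
    return [block[i][j] for i, j in order]

# Small 3 x 3 example; JPEG itself applies this to 8 x 8 blocks.
print(zigzag([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

On a quantized 8 x 8 block, the trailing high-frequency zeros end up in one long run, which the lossless entropy encoder compresses very efficiently.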
[Figure: DCT applied to a uniform gray scale block and a gradient gray scale block]
[Figure: Steps of JPEG image compression based on DCT]
An intra coded frame (I-frame) is an independent frame that is not related to any
other frame. An I-frame must appear periodically to handle some sudden change in the frame
that the previous and following frames cannot show.
A predicted frame (P-frame) is related to the preceding I-frame or P-frame. P-
frames can be constructed only from previous I- or P-frames. P-frames carry much less
information than other frame types and carry even fewer bits after compression.
A bidirectional frame (B-frame) is related to the preceding and following I-frame
or P-frame. In other words, each B-frame is relative to the past and the future. A B-frame is
never related to another B-frame.
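These dependency rules determine the order in which frames must be decoded: since each B-frame needs the following I- or P-frame, reference frames are sent before the B-frames that use them. A minimal sketch, assuming a simple I/P/B pattern with no further complications:

```python
def decode_order(display):
    """Reorder a display-order frame sequence (e.g. 'IBBPBBP') into
    decode order: each reference (I or P) frame is moved ahead of the
    B-frames that depend on it."""
    out, pending_b = [], []
    for frame in display:
        if frame == "B":
            pending_b.append(frame)   # hold until the next reference arrives
        else:                         # I- and P-frames are reference frames
            out.append(frame)
            out.extend(pending_b)
            pending_b.clear()
    out.extend(pending_b)             # trailing B-frames with no later reference
    return out

print("".join(decode_order("IBBPBBP")))
```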
Audio (sound) signals are analog signals that need a medium to travel; they cannot
travel through a vacuum.
The speed of sound in air is about 330 m/s (740 mph). The audible
frequency range for normal human hearing is from about 20 Hz to 20 kHz, with
maximum sensitivity around 3300 Hz.
To be able to provide compression, audio analog signals are digitized using an
analog-to-digital converter. The analog-to-digital conversion consists of two
processes: sampling and quantizing.
Pulse code modulation (PCM) involves sampling an analog signal, quantizing each
sample, and coding the quantized values as streams of bits.
A voice signal is sampled at the rate of 8,000 samples per second with 8 bits per sample; the result
is a digital signal of 8,000 × 8 = 64 kbps.
Music is sampled at 44,100 samples per second with 16 bits per sample; the result is a digital
signal of 44,100 × 16 = 705.6 kbps.
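Both figures follow directly from multiplying the sampling rate by the bits per sample; a small Python check (the music figure is per channel, so stereo CD audio doubles it):

```python
def pcm_bit_rate(samples_per_second: int, bits_per_sample: int) -> int:
    """Bit rate (bit/s) of one channel of uncompressed PCM."""
    return samples_per_second * bits_per_sample

voice = pcm_bit_rate(8_000, 8)       # telephone-quality voice
music = pcm_bit_rate(44_100, 16)     # CD-quality music, one channel
print(voice, music)                  # 64000 705600
```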
Both lossy and lossless compression algorithms are used in audio compression.
Compression techniques used for speech and music have different requirements.
Compression techniques used for speech must have low latency because significant delays degrade the
communication quality in telephony.
Compression algorithms used for music must be able to produce high-quality sound with
fewer bits.
Two categories of techniques are used in audio compressions: predictive coding and
perceptual coding.
Predictive coding techniques have low latency and therefore are popular in speech coding
for telephony, where significant delays degrade the communication quality. Common
examples of these techniques are Delta Modulation (DM), Adaptive Delta Modulation
(ADM), Differential PCM (DPCM), Adaptive Differential PCM (ADPCM), and Linear
Predictive Coding (LPC).
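Delta Modulation, the simplest of these predictive coders, transmits one bit per sample: 1 if the signal rose above the running approximation, 0 if it fell below. A toy sketch with a fixed step size (ADM would adapt the step instead):

```python
def dm_encode(samples, delta=1.0):
    """Delta Modulation: one bit per sample, tracking the signal
    with a staircase approximation of fixed step size delta."""
    bits, approx = [], 0.0
    for s in samples:
        if s > approx:
            bits.append(1)
            approx += delta
        else:
            bits.append(0)
            approx -= delta
    return bits

def dm_decode(bits, delta=1.0):
    """Rebuild the staircase approximation from the bit stream."""
    out, approx = [], 0.0
    for b in bits:
        approx += delta if b else -delta
        out.append(approx)
    return out

bits = dm_encode([0.5, 1.5, 2.5])   # rising signal -> all 1-bits
print(bits, dm_decode(bits))
```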
The most common compression technique used to create CD-quality audio is
perceptual coding, which is based on the science of psychoacoustics.
Algorithms used in perceptual coding first transform the data from time domain
to frequency domain; the operations are then performed on the data in the
frequency domain. This technique, hence, is also called the frequency domain
method.
Psychoacoustics is the study of subjective human perception of sound.
Perceptual coding takes advantage of flaws in the human auditory system. The
lower limit of human audibility is 0 dB, but this holds only for sounds with
frequencies between about 2.5 and 5 kHz; sounds below the hearing threshold at a
given frequency cannot be heard and hence need not be coded.
One standard that uses perceptual coding is MP3 (MPEG audio layer 3).
Multimedia in the Internet
Audio and Video services are classified into three broad categories:
streaming stored audio/video,
streaming live audio/video,
interactive audio/video.
Streaming means a user can listen (or watch) the file after the downloading has started.
In the streaming stored audio/video category, the files are compressed and stored on a
server.
A client downloads the files through the Internet (referred to as on-demand
audio/video).
Examples of stored audio files are songs, symphonies, books on tape, and famous
lectures. Examples of stored video files are movies, TV shows, and music video clips.
The client (browser) can use the services of HTTP and send a GET message to
download the file. The Web server can send the compressed file to the browser. The
browser can then use a help application, normally called a media player, to play the
file.
This approach is very simple and does not involve streaming: the file must be
downloaded completely before it can be played.
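The browser's request in this first approach is an ordinary HTTP GET. A sketch of composing such a message (the host and file name are made up for illustration; a real browser would also handle the response and hand the file to the media player):

```python
def http_get_request(host: str, path: str) -> str:
    """Compose the GET message the browser sends for the media file."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Accept: audio/mpeg, video/mp4\r\n"
            "\r\n")          # blank line ends the header

# Hypothetical server and file name, for illustration only.
request = http_get_request("media.example.com", "/songs/song1.mp3")
print(request)
```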
The media player is directly connected to the Web server for downloading the
audio/video file. The Web server stores two files: the actual audio/video file and a
metafile that holds information about the audio/video file.
1. The HTTP client accesses the Web server using the GET message.
2. The information about the metafile comes in the response.
3. The metafile is passed to the media player.
4. The media player uses the URL in the metafile to access the audio/video file.
5. The Web server responds.
The problem with the second approach is that the browser and the media player both use the
services of HTTP. HTTP is designed to run over TCP, which is suitable for retrieving the metafile but
not the audio/video file.
As retransmissions are not suited for streaming, UDP needs to be used. Hence a new server,
called a media server, is used.
1. The HTTP client accesses the Web server using a GET message.
2. The information about the metafile comes in the response.
3. The metafile is passed to the media player.
4. The media player uses the URL in the metafile to access the media server to download the file. Downloading can take place by any protocol that uses UDP.
5. The media server responds.
The Real-Time Streaming Protocol (RTSP) is a control protocol designed to add more
functionalities to the streaming process. Using RTSP, we can control the playing of audio/video.
RTSP is an out-of-band control protocol that is similar to the second connection in FTP.
1. The HTTP client accesses the Web server using a GET message.
2. The information about the metafile comes in the response.
3. The metafile is passed to the media player.
4. The media player sends a SETUP message to create a connection with the media server.
5. The media server responds.
6. The media player sends a PLAY message to start playing (downloading).
7. The audio/video file is downloaded using another protocol that runs over UDP.
8. The connection is broken using the TEARDOWN message.
9. The media server responds.
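The RTSP exchange above is text-based, much like HTTP. A sketch of composing the SETUP, PLAY, and TEARDOWN requests (the URL, session ID, and transport parameters are hypothetical, and a real client would also parse the server's responses):

```python
def rtsp_request(method: str, url: str, cseq: int, extra: dict = None) -> str:
    """Compose a minimal RTSP request (SETUP, PLAY, TEARDOWN, ...)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (extra or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

url = "rtsp://media.example.com/movie"   # hypothetical media server URL
setup = rtsp_request("SETUP", url, 1,
                     {"Transport": "RTP/AVP;unicast;client_port=5004-5005"})
play = rtsp_request("PLAY", url, 2, {"Session": "12345678"})
teardown = rtsp_request("TEARDOWN", url, 3, {"Session": "12345678"})
print(setup)
```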
In the streaming live audio/video category, a user listens to broadcast audio and
video through the Internet. Good examples of this type of application are
Internet radio and Internet TV.
In the first application (Internet radio), the communication is unicast and
on-demand; in the second (Internet TV), it is multicast and live. Live streaming
is better suited to the multicast services of IP and the use of protocols such as
UDP and RTP.
Examples are
Internet Radio
Internet Television (ITV)
IPTV
Real-Time Interactive Audio/Video
In the interactive audio/video category, people use the Internet to interactively
communicate with one another.
The Internet phone or voice over IP is an example of this type of application. Video
conferencing is another example that allows people to communicate visually and
orally.
Multimedia plays a primary role in audio and video conferencing. The traffic can be heavy, and the
data are distributed using multicasting methods. Conferencing requires two-way communication
between receivers and senders.
A translator is a computer that can change the format of a high-bandwidth video signal to a lower-
quality narrow-bandwidth signal.
If there is more than one source that can send data at the same time (as in a video or audio
conference), the traffic is made of multiple streams. To converge the traffic to one stream, data from
different sources can be mixed. A mixer mathematically adds signals coming from different sources
to create one single signal.
Real-Time Interactive Protocols
Each microphone or camera at the source site is called a contributor and is given a 32-bit
identifier called the contributing source (CSRC) identifier.
The mixer is also called the synchronizer and is given another identifier called the synchronization
source (SSRC) identifier.
A real-time interactive multimedia application imposes the following requirements:
Sender-receiver negotiation
Creation of packet stream
Source synchronization
Error control
Congestion control
Jitter removal
Identifying the sender
More than one RTCP packet can be packed as a single payload for UDP because
the RTCP packets are smaller than RTP packets.
The sender report packet is sent periodically by the active senders in a
session to report transmission and reception statistics for all RTP packets
sent during the interval. It includes the following:
The SSRC of the RTP stream.
The absolute timestamp which is the number of seconds elapsed since midnight on
January 1, 1970. It allows the receiver to synchronize RTP messages.
The number of RTP packets and bytes sent from the beginning of the session.
The receiver report is issued by passive participants, those that do not send RTP
packets. The report informs the sender and other receivers about the quality of
service. The feedback information can be used for congestion control at the sender
site. A receiver report includes the following information:
The SSRC of the RTP stream for which the receiver report has been generated.
The fraction of packets lost.
The last sequence number received.
The interarrival jitter.
The source periodically sends a source description packet to give additional
information about itself. The packet can include:
The SSRC.
The canonical name (CNAME) of the sender.
Other information such as the real name, the e-mail address, the telephone number.
The source description packet may also include extra data, such as captions used for
video.
A source sends a bye packet to shut down a stream. It allows the source to announce that it
is leaving the conference. Although other sources can detect the absence of a source, this
packet is a direct announcement. It is also very useful to a mixer.
The application-specific packet is intended for applications that want to use packet types
not defined in the standard. It allows the definition of a new packet type.
RTCP, like RTP, does not use a well-known UDP port. It uses a temporary port. The
UDP port chosen must be the number immediately following the UDP port selected
for RTP, which makes it an odd-numbered port.
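The port rule can be sketched as follows (the port range here is an arbitrary choice for illustration):

```python
import random

def pick_rtp_rtcp_ports(low=16384, high=32766):
    """Pick a temporary even port for RTP; RTCP takes the next (odd) port."""
    rtp = random.randrange(low, high, 2)   # step 2 keeps the RTP port even
    return rtp, rtp + 1

rtp_port, rtcp_port = pick_rtp_rtcp_ports()
print(rtp_port, rtcp_port)
```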
Requirement Fulfillment
The combination of RTP and RTCP can respond to the requirements of an interactive real-
time multimedia application as follows:
A digital audio or video stream, a sequence of bits, is divided into chunks. Each chunk has a
predefined boundary that distinguishes the chunk from the previous chunk or the next one.
A chunk is encapsulated in an RTP packet, which defines a specific encoding (payload type),
a sequence number, a timestamp, a synchronization source (SSRC) identifier, and one or
more contributing source (CSRC) identifiers.
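The fixed part of this RTP encapsulation is a 12-byte header. A sketch of packing it with Python's struct module, following the RFC 3550 layout (payload type 0 is PCM audio; the SSRC value here is arbitrary):

```python
import struct

def rtp_header(payload_type, seq, timestamp, ssrc, marker=0, csrcs=()):
    """Pack the 12-byte fixed RTP header (RFC 3550) plus the CSRC list."""
    byte0 = (2 << 6) | len(csrcs)                 # V=2, P=0, X=0, CC
    byte1 = ((marker & 1) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    for csrc in csrcs:                            # one 32-bit word per contributor
        header += struct.pack("!I", csrc)
    return header

hdr = rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0xDEADBEEF)
print(len(hdr), hdr.hex())
```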
The first requirement, sender-receiver negotiation, cannot be satisfied by the combination of the
RTP/RTCP protocols.
The second requirement, creation of a stream of chunks, is provided by encapsulating each chunk
in an RTP packet and giving a sequence number to each chunk. The M field in an RTP packet also
defines whether there is a specific type of boundary between chunks.
The third requirement, synchronization of sources, is satisfied by identifying each
source by a 32-bit identifier and using the relative timestamping in the RTP packet
and the absolute timestamping in the RTCP packet.
The fourth requirement, error control, is provided by using the sequence number in
the RTP packet and letting the application regenerate the lost packet using FEC
methods.
The fifth requirement, congestion control, is met by the feedback from the receiver
using the receiver report packets (RTCP) that notify the sender about the number of
lost packets.
The sixth requirement, jitter removal, is achieved by the timestamping and
sequencing provided in each RTP packet to be used in buffered playback of the data.
The seventh requirement, identification of source, is provided by using the CNAME
included in the source description packets (RTCP) sent by the sender.
Session Initiation Protocol (SIP)
The Session Initiation Protocol (SIP) is a communications protocol for signaling, for the
purpose of controlling multimedia communication sessions. Internet telephony, business
IP telephone systems, service providers and carriers use SIP.
It is an application-layer protocol, similar to HTTP, that establishes, manages, and
terminates a multimedia session (call). It can be used to create two-party, multiparty, or
multicast sessions. SIP is designed to be independent of the underlying transport layer; it
can run over UDP, TCP, or SCTP, using port 5060.
SIP can provide the following services:
It establishes a call between users if they are connected to the Internet.
It finds the location of the users (their IP addresses) on the Internet, because the
users may be changing their IP addresses (think about mobile IP and DHCP).
It finds out if the users are able or willing to take part in the conference call.
It determines the users' capabilities in terms of media to be used and the type of
encoding.
It establishes session setup by defining parameters such as port numbers to be
used.
It provides session management functions such as call holding, call forwarding,
accepting new participants, and changing the session parameters.
The SIP protocol needs to find the location of the callee and at the same time negotiate the
capability of the devices the participants are using.
In SIP, an e-mail address, an IP address, a telephone number, and other types of addresses
can be used to identify the sender and receiver. However, the address needs to be in SIP
format (also called scheme).
The SIP addresses are URLs that can be included in the web page of the potential callee.
Other addresses are also possible, such as those that use first name followed by last
name, but all addresses need to be in the form sip:user@address.
A SIP address is a unique identifier for each user on the network, just like a phone
number identifies each user on the global phone network, or an email address. It is
also known as a SIP URI (Uniform Resource Identifier).
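A minimal parser for the sip:user@address form might look like this (a sketch that ignores optional URI parameters and port numbers):

```python
import re

# Matches the basic sip:user@address scheme described above.
SIP_URI = re.compile(r"^sip:(?P<user>[^@;]+)@(?P<address>[^;]+)$")

def parse_sip_uri(uri: str):
    """Split a SIP URI into its user and address parts."""
    match = SIP_URI.match(uri)
    if match is None:
        raise ValueError(f"not a SIP URI: {uri!r}")
    return match.group("user"), match.group("address")

print(parse_sip_uri("sip:alice@example.com"))
```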
SIP Messages
SIP, like HTTP, is a text-based protocol that uses messages.
Messages in SIP are divided into two broad categories: requests and
responses.
The opening line of a request contains a method that defines the request,
and a Request-URI that defines where the request is to be sent.
Similarly, the opening line of a response contains a response code.
Request Messages
IETF originally defined six request messages, but new request messages have since been
proposed to extend the functionality of SIP.
INVITE - The INVITE request message is used by a caller to initialize a session. Using this
request message, a caller invites one or more callees to participate in the conference.
ACK - The ACK message is sent by the caller to confirm that the session initialization has been
completed.
OPTIONS - The OPTIONS message queries a machine about its capabilities.
CANCEL - The CANCEL message cancels an already started initialization process, but does not
terminate the call. A new initialization may start after the CANCEL message.
REGISTER - The REGISTER message makes a connection when the callee is not available.
BYE - The BYE message is used to terminate the session. The BYE message, which can be
initiated from the caller or callee, terminates the whole session.
Response Messages
IETF has also defined six types of response messages that can be sent to request
messages. A response message can be sent to any request message.
Informational Responses - These responses are in the form SIP 1xx (the common ones are 100
trying, 180 ringing, 181 call forwarded, 182 queued, and 183 session progress).
Successful Responses - These responses are in the form SIP 2xx (the common one is 200 OK).
Redirection Responses - These responses are in the form SIP 3xx (the common ones are 301
moved permanently, 302 moved temporarily, 380 alternative service).
Client Failure Responses - These responses are in the form SIP 4xx (the common ones are 400
bad request, 401 unauthorized, 403 forbidden, 404 not found, 405 method not allowed, 406 not
acceptable, 415 unsupported media type, 420 bad extension, 486 busy here).
Server Failure Responses - These responses are in the form SIP 5xx (the common ones are 500
server internal error, 501 not implemented, 503 service unavailable, 504 timeout, 505 SIP version
not supported).
Global Failure Responses - These responses are in the form SIP 6xx (the common ones are 600
busy everywhere, 603 decline, 604 doesn't exist, and 606 not acceptable).
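Since the category of a response is determined entirely by its leading digit, mapping a code to its class is straightforward:

```python
def sip_response_class(code: int) -> str:
    """Map a SIP response code to its category by leading digit."""
    classes = {
        1: "Informational", 2: "Successful", 3: "Redirection",
        4: "Client Failure", 5: "Server Failure", 6: "Global Failure",
    }
    if not 100 <= code <= 699:
        raise ValueError(f"invalid SIP response code: {code}")
    return classes[code // 100]

print(sip_response_class(200), "|", sip_response_class(486))
```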
SIP communication is divided into three modules:
establishing, communicating, and terminating.
Establishing a session in SIP requires a three-way handshake. Alice sends an INVITE request
message, using UDP, TCP, or SCTP, to begin the communication. If Bob is willing to start the
session, he sends a response (200 OK) message. To confirm that a reply code has been
received, Alice sends an ACK request message to start the audio communication.
After the session has been established, Alice and Bob can communicate using two temporary
ports defined in the establishing session. The even-numbered ports are used for RTP; RTCP
can use the odd-numbered ports that follow.
The session can be terminated with a BYE message sent by either party.
A proxy server is a network server with UAC (user agent client) and UAS (user agent server)
components that functions as an intermediary entity for the purpose of performing requests on
behalf of other network elements.
A registrar is a SIP endpoint that provides a location service. It accepts REGISTER requests,
recording the address and other parameters from the user agent. The location service links one
or more IP addresses to the SIP URI of the registering agent.
At any moment a user is registered with at least one registrar server; this server knows the IP
address of the callee.
When Alice needs to communicate with Bob, she can use the e-mail address instead of the IP
address in the INVITE message.
The message goes to a proxy server. The proxy server sends a lookup message (not part of
SIP) to some registrar server that has registered Bob.
When the proxy server receives a reply message from the registrar server, it takes Alice's
INVITE message, inserts the newly discovered IP address of Bob, and sends the message to
Bob.
SIP Message Format and SDP Protocol
SIP request and response messages are divided into four sections: start or status line, header, a
blank line, and the body.
Start Line: A single line that starts with the message request name, followed by the address of
the recipient and the SIP version.
Status Line: A single line that starts with the three-digit response code.
Header: A header, in the request or response message, can use several lines. Each line starts with
the header name, followed by a colon, a space, and the value. Some typical header lines
are: Via, From, To, Call-ID, Content-Type, Content-Length, and Expires.
The Via header defines the SIP device through which the message passes, including the sender
The From header defines the sender and the To header defines the recipient.
The Call-ID header is a random number that defines the session.
The Content-Type header defines the type of the message body (SDP).
The Content-Length defines the length of the body of the message in bytes.
The Expires header is normally used in a REGISTER message to define the expiration of the information in the body.
Example of a header in an INVITE message.
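As a hedged illustration of such a header, the sketch below assembles an INVITE start line and typical header lines (all names, addresses, and the Call-ID value are made up):

```python
def invite_header(caller: str, callee: str, call_id: str, body_length: int) -> str:
    """Assemble the start line and typical headers of a SIP INVITE message."""
    lines = [
        f"INVITE {callee} SIP/2.0",              # start line: method, URI, version
        "Via: SIP/2.0/UDP proxy.example.com:5060",
        f"From: {caller}",
        f"To: {callee}",
        f"Call-ID: {call_id}",
        "Content-Type: application/sdp",
        f"Content-Length: {body_length}",
    ]
    # The trailing blank line separates the header from the (SDP) body.
    return "\r\n".join(lines) + "\r\n\r\n"

msg = invite_header("sip:alice@example.com", "sip:bob@example.com", "3125987", 0)
print(msg)
```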
SIP uses another protocol, called Session Description Protocol (SDP), to define the body.
Each line in the body is made of an SDP code followed by an equal sign, and followed by the
value.
The first part of the body is normally general information. The codes used in this section
are: v (for version of SDP), and o (for origin of the message).
The second part of the body normally gives information to the recipient for making a
decision to take part in the session. The codes used in this section are: s (subject), i
(information about subject), u (for session URL), and e (the e-mail address of the person
responsible for the session).
The third part of the body gives the technical details to make the session possible. The
codes used in this part are: c (the unicast or multicast IP address that the user needs to join
to be able to take part in the session), t (the start time and end time of the session, encoded
as integers), m (the information about media such as audio, video, the port number, the
protocol used).
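Since every SDP line is simply a one-letter code, an equals sign, and a value, a minimal parser is short (the session values below are invented for illustration):

```python
def parse_sdp(body: str) -> dict:
    """Parse 'code=value' SDP lines into a dict; repeated codes become lists."""
    fields = {}
    for line in body.strip().splitlines():
        code, _, value = line.partition("=")
        if code in fields:
            existing = fields[code]
            fields[code] = existing + [value] if isinstance(existing, list) else [existing, value]
        else:
            fields[code] = value
    return fields

# Hypothetical session description, for illustration only.
sample = (
    "v=0\n"
    "o=alice 2890844526 2890844526 IN IP4 host.example.com\n"
    "s=Weekly review\n"
    "c=IN IP4 224.2.17.12\n"
    "t=2873397496 2873404696\n"
    "m=audio 49170 RTP/AVP 0\n"
)
sdp = parse_sdp(sample)
print(sdp["c"], "|", sdp["m"])
```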
H.323
H.323 is a standard designed by ITU to allow telephones on the public telephone network
to talk to computers (called terminals in H.323) connected to the Internet.
H.323 uses G.711 or G.723.1 for compression. It uses a protocol named H.245, which allows
the parties to negotiate the compression method. Protocol Q.931 is used for establishing
and terminating connections.
Another protocol, called H.225, or Registration/Administration/Status (RAS), is used for
registration with the gatekeeper.
H.323, unlike SIP, is a complete set of protocols that mandates the use of RTP and RTCP.
Operation of H.323:
1. The terminal sends a broadcast message to the gatekeeper. The gatekeeper responds with its IP address.
2. The terminal and gatekeeper communicate, using H.225 to negotiate bandwidth.
3. The terminal, the gatekeeper, the gateway, and the telephone communicate using Q.931 to set up a connection.
4. The terminal, the gatekeeper, the gateway, and the telephone communicate using H.245 to negotiate the compression method.
5. The terminal, the gateway, and the telephone exchange audio using RTP under the management of RTCP.
6. The terminal, the gatekeeper, the gateway, and the telephone communicate using Q.931 to terminate the communication.