Transforming the input data set into a set of features is called feature extraction. Feature extraction
makes heavy use of processes and algorithms such as edge detection, curvature/corner detection, blob
detection, and ridge detection.
Then the next image frame is captured and the target object is identified. This process is repeated multiple
times.
If the moving object is not found, the image is captured again and the entire process is repeated.
Application 2: Vector Quantization
Image Compression
Image compression is the process of reducing the number of bits required to represent an image. Vector
quantization, the mapping of pixel intensity vectors into binary vectors indexing a limited number of
possible reproductions, is a popular image compression algorithm. Compression has traditionally been
done with little regard for image processing operations that may precede or follow the compression step.
Recent work has used vector quantization both to simplify image processing tasks -- such as
enhancement, classification, halftoning, and edge detection -- and to reduce the computational
complexity by performing them simultaneously with the compression. After briefly reviewing the
fundamental ideas of vector quantization, we present a survey of vector quantization algorithms that
perform image processing.
1 Introduction
Data compression is the mapping of a data set into a bit stream to decrease the number of bits required
to represent the data set.
Figure 2: The Encoder and decoder in a vector quantizer. Given an input vector, the closest codeword is found and the index
of the codeword is sent through the channel. The decoder receives the index of the codeword, and outputs the codeword.
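The encode/decode loop in Figure 2 can be sketched in a few lines of Python. The codebook below is hand-picked for illustration (real systems train it from data, e.g. with the LBG algorithm), and the 2-D intensity vectors are hypothetical:

```python
# Minimal vector-quantization encoder/decoder sketch (pure Python).

def encode(vector, codebook):
    """Return the index of the codeword closest (squared Euclidean
    distance) to the input vector; only this index is transmitted."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))

def decode(index, codebook):
    """The decoder simply looks the codeword up by its index."""
    return codebook[index]

# Example: 2-D pixel-intensity vectors, a 4-entry codebook (2-bit indices).
codebook = [(0, 0), (0, 255), (255, 0), (255, 255)]
idx = encode((10, 240), codebook)     # nearest codeword is (0, 255)
print(idx, decode(idx, codebook))
```

Since only the index crosses the channel, each 2-D vector here costs 2 bits instead of 16, which is the source of the compression.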
Application 3: COLOR
Introduction
Color image processing is motivated by two main factors:
1. Color is a powerful descriptor simplifying object recognition.
2. We can distinguish between thousands of color shades and intensities, compared to about 20-30
values of gray.
There are two major areas of color image processing:
1. Full-color processing: images are acquired with a full-color sensor (TV camera, color scanner);
2. Pseudocolor processing: colors are assigned to a particular monochrome intensity or intensity range.
Fundamentals of color
The particular colors of an object as humans perceive them are determined by the nature of the light
reflected from the object. A body reflecting light that is balanced in all visible wavelengths appears
white. A body reflecting more light in a particular range of wavelengths and absorbing light in other
bands appears colored.
If the light is achromatic (without color), its only attribute is its intensity (amount). Examples of
achromatic light: images produced by a b&w TV set, and monochrome pictures (not necessarily b&w).
Due to the absorption characteristics of the human eye, we see colors as variable combinations of the
so-called primary colors of light: red (R), green (G), and blue (B). In 1931 the following wavelengths
were designated for them: 700 nm, 546.1 nm, and 435.8 nm.
Adding primary colors of light produces secondary colors: magenta (red + blue), cyan (green + blue),
and yellow (green + red). Mixing the three primary colors of light (or a secondary with its opposite
primary color) in the right intensities produces white light. Mixing the three secondary colors of light
together produces black (no light).
Colors are usually distinguished from one another through three characteristics: brightness, hue,
and saturation. As mentioned before, brightness embodies achromatic intensity. Hue represents the
dominant color as perceived by an observer (red, yellow, blue). Saturation reflects the amount of
white mixed with a hue, i.e., the purity of the color.
For example, we need to specify saturation to characterize pink (red + white).
Color models
A color model (color space or color system) is a specification of a coordinate system and a subspace
within that system where each color is represented by a single point. Most contemporary color models
are oriented either toward hardware (color monitors and printers) or toward applications where color
manipulation is used (color graphics for animation).
The models most commonly used in image processing practice are RGB (monitors, most cameras),
CMY and CMYK (printers), and HSI (which closely corresponds to the human visual system).
Images represented in the RGB model consist of three component images (one for each primary color)
that are combined into a composite color image. The number of bits used to represent a pixel is called
the pixel depth. Assuming that each component image uses 8 bits, each RGB color pixel is said to have a
depth of 24 bits. Such RGB color images are frequently called full-color images.
In the CMY model, equal amounts of the three pigments should produce black. In practice this is not
the case and leads to a muddy-looking black. To produce a true black (the predominant color in
printing), a fourth color, black, is added to form the CMYK model.
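The relationship between RGB, CMY, and CMYK sketched above can be illustrated in Python. The conversion below uses the common CMY = 1 - RGB rule with black extracted as min(C, M, Y); the exact formula varies between implementations, so treat this as one plausible convention:

```python
# Hedged sketch of an RGB -> CMYK conversion (all channels in [0, 1]).

def rgb_to_cmyk(r, g, b):
    c, m, y = 1 - r, 1 - g, 1 - b     # CMY are the complements of RGB
    k = min(c, m, y)                  # amount of true black pigment
    if k == 1:                        # pure black: avoid division by zero
        return (0.0, 0.0, 0.0, 1.0)
    # Remove the black component from each channel and rescale.
    return tuple((x - k) / (1 - k) for x in (c, m, y)) + (k,)

print(rgb_to_cmyk(1.0, 0.0, 0.0))   # pure red -> (0.0, 1.0, 1.0, 0.0)
```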
Application 4: HUMAN EXPRESSION DETECTION
1. Introduction
Human facial expression recognition by a machine can be described as the interpretation of
human facial characteristics via mathematical algorithms. Gestures of the body are read by an
input sensing device such as a web-cam. It reads the movements of the human body and
communicates them to a computer that uses these gestures as input. These gestures are then
interpreted using algorithms based either on statistical analysis or on artificial intelligence
techniques. The primary goal of gesture recognition research is to create a system which can
identify specific human gestures and use them to convey information. By observing the face, one
can decide whether a person is serious, happy, thinking, sad, feeling pain, and so on. Recognizing
a person's expression can help in many areas; in the field of medical science, for example, a
doctor can be alerted when a patient is in severe pain, which helps in taking prompt action
at that time.
Figure 1: Simple architecture of Gesture Recognition
Each box shown in figure 1 is treated as one module. The first module captures the image using the web
cam. The second module is for face detection, which detects the human face in the captured image. A
set of modules bounded by a boundary line represents the pre-processing block. It consists of histogram
equalization, edge detection, thinning, and token generation modules. The next module is the training
module, which stores the token information that comes from the image pre-processing module. This
training has been done using a back-propagation neural network. The last module, token matching and
decision making, is called the recognition module and produces the final result. The following flow chart
represents how all the modules work.
2. Face detection
Face detection is a process that aims to locate a human face in an image. The process is applied to a
stored image or to images from a camera. Human faces vary from one person to another; this variation
could be due to race, gender, age, and other physical characteristics of an individual. Face detection is
therefore a challenging task in computer vision. It becomes even more challenging due to the
additional variations in scale, orientation, pose, facial expression, and lighting conditions. Many methods
have been proposed to detect faces, such as neural networks, skin locus, and color analysis. Since these
detected faces become the input to gesture recognition, it is important to get rid of non-facial
information in the image.
3. Image pre-processing
In this block, consisting of four different modules, a face image is taken as input and tokens are
produced as output. The first step in this block is to enhance the image quality; to do this, histogram
equalization is performed. It is then followed by the edge detection process. Since edge detection
plays an important role in finding the tokens, four well-known algorithms, i.e. Prewitt, Sobel, Prewitt
Diagonal, and Sobel Diagonal, are implemented for it. To find edges, an image is convolved with both
masks, producing two derivative images (dx and dy). The strength of the edge at any given image location
is then calculated by taking the square root of the sum of the squares of these two derivatives.
Basically these kernels respond to edges that run vertically and horizontally according to
the pixel grid.
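The edge-strength computation just described (convolve with both masks, then take the square root of the sum of squared derivatives) can be sketched in pure Python with the standard Sobel kernels; the 4x4 test image is illustrative:

```python
import math

# Sobel edge-magnitude sketch: dx and dy from the two masks,
# edge strength = sqrt(dx^2 + dy^2).

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # horizontal edges

def convolve_at(img, y, x, kernel):
    """Apply a 3x3 kernel centered at (y, x)."""
    return sum(kernel[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def edge_magnitude(img):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):                # border pixels are skipped
        for x in range(1, w - 1):
            dx = convolve_at(img, y, x, SOBEL_X)
            dy = convolve_at(img, y, x, SOBEL_Y)
            out[y][x] = math.sqrt(dx * dx + dy * dy)
    return out

# A vertical step edge: strong response along the middle columns.
img = [[0, 0, 255, 255]] * 4
print(edge_magnitude(img))
```

The Prewitt masks differ only in using weight 1 instead of 2 on the center row/column, so the same driver works for all four detectors mentioned above.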
Figure 2: Detection using Prewitt, Sobel, and their diagonal.
4. Recognition
Once the training is over, the network is ready to recognize the gesture presented at its input. For
recognizing the gesture of a face, two options are provided. If the user wants to recognize the gesture
of an existing image, it can be loaded from memory. As the user selects the image, the face recognition
method runs and returns the face part of the image.
5. Applications
1. Reading the gesture of a driver while he/she is driving, and alerting him/her when drowsy
2. In human-computer interaction, where the computer can interact with humans based on their
expressions
Application 5: Wavelet-Based Image Compression
The objective of image compression is to reduce irrelevance and redundancy of the image data
in order to be able to store or transmit data in an efficient form. In numerical analysis and
functional analysis, a discrete wavelet transform (DWT) is any wavelet transform for which the
wavelets are discretely sampled. As with other wavelet transforms, a key advantage it has over the
Fourier transform is temporal resolution: it captures both frequency and location information
(location in time).
Wavelet compression is a form of data compression well suited for image compression (sometimes
also video compression and audio compression). Notable implementations are JPEG 2000 and ECW
for still images, and REDCODE, the BBC's Dirac, and Ogg Tarkin for video. The goal is to store image
data in as little space as possible in a file.
Wavelet compression can be either lossless or lossy.
Wavelet compression methods are well suited to representing transients, such as percussion sounds
in audio, or high-frequency components in two-dimensional images, for example an image of stars in
the night sky. This means that the
transient elements of a data signal can be represented by a smaller amount of information than
would be the case if some other transform, such as the more widespread discrete cosine
transform, had been used.
Wavelet compression is not good for all kinds of data: transient signal characteristics mean good
wavelet compression, while smooth, periodic signals are better compressed by other methods,
particularly traditional harmonic compression (frequency domain, as by Fourier transforms and
related). Data statistically indistinguishable from random noise is not compressible by any
means.
Method
First a wavelet transform is applied. This produces as many coefficients as there are pixels in the
image (i.e. there is no compression yet, since it is only a transform). These coefficients can then
be compressed more easily because the information is statistically concentrated in just a few
coefficients. This principle is called transform coding. After that, the coefficients are quantized
and the quantized values are entropy encoded and/or run-length encoded. A few 1D and 2D
applications of wavelet compression use a technique called "wavelet footprints".
The low-pass filter performs an averaging operation on each adjacent pixel pair and can be expressed as

H = (1/√2) (1, 1)

and the high-pass filter performs a differencing operation and can be expressed as

G = (1/√2) (1, -1)

Applied to every adjacent pixel pair, the complete wavelet transform can be represented in matrix form by

T = W_N A W_N^T

where right-multiplying by W_N^T applies the 1D transformation to the rows of the image (first half)
and left-multiplying by W_N applies it to the columns (second half). A is the matrix representing the
2D image pixels, T is the Haar wavelet transformation of the image, and, in the case of a 4x4 pixel image,

W_4 = [ H ] = (1/√2) [ 1  1  0  0 ]
      [ G ]          [ 0  0  1  1 ]
                     [ 1 -1  0  0 ]
                     [ 0  0  1 -1 ]

The equivalent matrix can be expanded for larger images.
This transformation can be represented by an FIR filter approach for use with the DSP's MACP
operation. The result of the complete transformation, T, is composed of 4 new sub-images, which
correspond to the blurred image and the vertical, diagonal, and horizontal differences between the
original image and the blurred image. The blurred representation of the image removes the details
(high-frequency components), which are represented separately in the other three images, in a manner
that produces a sparser representation overall, making it easier to store and transmit. Below is an
example of a wavelet-transformed image portraying the four sub-images as explained above.
The inverse transformation can be applied to T, resulting in lossless compression. Lossy compression can
be implemented by manually setting to zero the elements below a certain threshold in T. Since W_N is
orthogonal, the equation of the inverse transformation is:

A = W_N^T T W_N
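A sketch of the forward and inverse Haar transforms for a 4x4 image in pure Python. The matrix W4 below follows the standard single-level Haar construction from the averaging and differencing filters; the sample image A is illustrative:

```python
import math

# Single-level 2-D Haar transform T = W A W^T and its inverse W^T T W.

s = 1 / math.sqrt(2)
W4 = [[s,  s, 0,  0],      # H applied to samples 0,1
      [0,  0, s,  s],      # H applied to samples 2,3
      [s, -s, 0,  0],      # G applied to samples 0,1
      [0,  0, s, -s]]      # G applied to samples 2,3

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

T = matmul(matmul(W4, A), transpose(W4))          # forward transform
A_back = matmul(matmul(transpose(W4), T), W4)     # inverse transform

# W4 is orthogonal, so the reconstruction is (numerically) exact.
print(all(abs(A[i][j] - A_back[i][j]) < 1e-9
          for i in range(4) for j in range(4)))   # True
```

The top-left quadrant of T holds the blurred image (here T[0][0] is twice the mean of the top-left 2x2 block of A); zeroing small entries in the other quadrants before inverting gives the lossy variant described above.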
The technology used for identification of a user based on a physical characteristic, such as a fingerprint,
iris, face, voice, or handwriting, is called biometrics.
Advances in technology have made it possible to build rugged and reliable biometric authentication
systems, and the costs of biometric authentication systems have been dropping as reliability
improves.
The key steps involved in a biometric authentication system are:
Fingerprint recognition
The first stage of fingerprint recognition is image acquisition. In this part of the process, a user places his or
her finger on a scanner. Numerous images of the fingerprint are then captured. It should be noted that
during this stage, the goal is to capture images of the center of the fingerprint, which contains many of
the unique features. All of the captured images are then converted into black and white images.
Face Recognition:
Face recognition technology has been refined over many years. It is now one of the best technologies
available for applications where a large database must be searched through an easy, user-friendly
process. The technology is designed to safely recognize persons independently of the variances that
appear in human faces: it handles pose, expression, and aging variance, as well as variances coming
from a new hair style, glasses, or temporary lighting changes.
Vein Recognition:
Each person has a unique pattern of palm veins; even twins have different patterns. This complicated
vascular pattern is very helpful in identifying a person, which is why it provides quite differentiating
features for identification. One of the great advantages of palm veins is that they do not change during
a person's life, because they lie under the skin. It is a very secure method of identification and
authentication.
IRIS Recognition:
Iris recognition is a biometric identification and authentication method that employs pattern
recognition on high-resolution images of the iris of a person's eye. Iris recognition is entirely different
from retina scan technology. It employs a camera with infrared illumination, which reduces reflection
from the convex cornea and yields detail-rich images of the complex structure of the iris. These images
are then converted to digital templates that provide a mathematical representation of the iris. The
technology provides unambiguous identification of an individual; a person with glasses or contact
lenses can also be identified. Because of its speed of comparison, iris recognition is particularly well
suited for one-to-many identification.
Ethical challenges
Ethical challenges include the deployment of biometric systems in the health care industry, law
enforcement, the financial sector, and security; database creation and management; theoretical
limitations; and the future issues of biometrics.
Applications of Biometrics
Application 7: Content-Based Image Retrieval
Content-based image retrieval (CBIR), also known as query by image content (QBIC) and content-based
visual information retrieval (CBVIR), is the application of computer vision techniques to the image
retrieval problem, that is, the problem of searching for digital images in large databases (see this
survey[1] for a recent scientific overview of the CBIR field). Content-based image retrieval stands in
contrast to concept-based approaches (see concept-based image indexing).
"Content-based" means that the search will analyze the actual contents of the image rather than the
metadata such as keywords, tags, and/or descriptions associated with the image. The term 'content' in
this context might refer to colors, shapes, textures, or any other information that can be derived from the
image itself. CBIR is desirable because most web based image search engines rely purely on metadata
and this produces a lot of garbage in the results. Also having humans manually enter keywords for images
in a large database can be inefficient, expensive and may not capture every keyword that describes the
image. Thus a system that can filter images based on their content would provide better indexing and
return more accurate results.
CBIR is about developing an image search engine that uses not only the text annotated to the image by
an end user (as traditional image search engines do), but also the visual content available in the
images themselves.
Initially, a CBIR system should have a database containing the images to be searched. It should then
derive the feature vectors of these images and store them in a data structure such as one of the tree
data structures (these structures improve searching efficiency).
A CBIR system gets a query from the user, either an image or the specification of the desired image. It
then searches the whole database to find the images most similar to the input or desired image.
The main issues in improving CBIR systems are:
1. Which features should be derived to better describe the images in the database
2. Which data structure should be used to store the feature vectors
3. Which learning algorithms should be used to make the CBIR system smarter
4. How to incorporate user feedback to improve the search results
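The query loop described above can be sketched in Python. The three-bin "color histogram" feature vectors and image names below are hypothetical, and a linear scan stands in for the tree index a real system would use:

```python
# Minimal CBIR-style search sketch: images reduced to feature vectors,
# query answered by a nearest-neighbour scan over the database.

def distance2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(query_vec, database, k=2):
    """Return the k database keys whose feature vectors are closest."""
    return sorted(database,
                  key=lambda name: distance2(query_vec, database[name]))[:k]

database = {                      # image name -> illustrative histogram
    "sunset.jpg": (0.7, 0.2, 0.1),
    "forest.jpg": (0.1, 0.8, 0.1),
    "ocean.jpg":  (0.1, 0.2, 0.7),
}
print(search((0.6, 0.3, 0.1), database))   # most similar first
```

Swapping the linear scan for a k-d tree or similar index, and the toy histograms for real texture/shape descriptors, addresses issues 1 and 2 in the list above.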
3. Dilation
Dilation is one of the two basic operators in the area of mathematical morphology, the other
being erosion. It is typically applied to binary images, but there are versions that work on
grayscale images. The basic effect of the operator on a binary image is to gradually enlarge the
boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus areas of foreground
pixels grow in size while holes within those regions become smaller.
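The operator can be sketched in pure Python with a 3x3 square structuring element (an assumption; other element shapes are common): an output pixel becomes foreground if any input pixel under the element is foreground.

```python
# Binary dilation sketch with a 3x3 square structuring element.

def dilate(img):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Foreground if any neighbour (including self) is foreground.
            out[y][x] = int(any(
                img[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if 0 <= y + dy < h and 0 <= x + dx < w))
    return out

img = [[0, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
for row in dilate(img):
    print(row)      # the single pixel grows into a 3x3 block
```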
4. Sub Images
The pixel subtraction operator takes two images as input and produces as output a third image
whose pixel values are simply those of the first image minus the corresponding pixel values from
the second image. It is also often possible to just use a single image as input and subtract a
constant value from all the pixels. Some versions of the operator will just output the absolute
difference between pixel values, rather than the straightforward signed output.
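Both variants of pixel subtraction described above can be sketched in a few lines of Python (the clamp-at-zero behaviour for the signed variant is an assumption of this sketch):

```python
# Pixel subtraction: signed (clamped to 0) and absolute-difference variants.

def subtract(img1, img2, absolute=False):
    op = (lambda a, b: abs(a - b)) if absolute else (lambda a, b: max(a - b, 0))
    return [[op(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(img1, img2)]

a = [[100, 50], [200, 0]]
b = [[40, 60], [50, 30]]
print(subtract(a, b))                 # [[60, 0], [150, 0]]
print(subtract(a, b, absolute=True))  # [[60, 10], [150, 30]]
```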
Application 9.
The process of smoothing or blurring an image suppresses noise and small fluctuations. In the frequency
domain, this process corresponds to the suppression of high frequencies.
A smoothing filter can be built in Matlab by using the function fspecial (special filters):
gaussianFilter = fspecial('gaussian', [7, 7], 5)
builds a Gaussian filter matrix of 7 rows and 7 columns, with a standard deviation of 5.
The process of sharpening is related to edge detection: changes in intensity are accentuated to create an
effect of sharper edges.
Using fspecial, we create a filter for sharpening an image. The special filter is ironically named 'unsharp':
sharpFilter = fspecial('unsharp');
subplot(2,2,1), image(pomegranate), title('Original Pomegranate Seeds');
sharp = imfilter(pomegranate, sharpFilter, 'replicate');
subplot(2,2,2), image(sharp), title('Sharpened Pomegranate');
sharpMore = imfilter(sharp, sharpFilter, 'replicate');
subplot(2,2,3), image(sharpMore), title('Excessive sharpening amplifies noise');
Figure: Watermark embedding scheme: the watermark, the host image, and a key are fed into the
embedding algorithm, which produces the watermarked image.
Types of Digital Watermarking: visible, invisible, public, fragile, private, and perceptual watermarks.
APPLICATIONS OF WATERMARKING
Copyright protection is probably the most common use of watermarks today. Copyright owner
information is embedded in the image in order to prevent others from alleging ownership of the image.
Fingerprinting embeds information about the legal recipient in the image. This involves embedding a
different watermark into each distributed copy and allows the owner to locate and monitor pirated
images that are illegally obtained.
Prevention of unauthorized copying is accomplished by embedding information about how often
an image can be legally copied. An ironic example in which the use of a watermark might have
prevented the wholesale pilfering of an image is the ubiquitous Lena image, which has been
used without the original owner's permission.
In an image authentication application, the intent is to detect modifications to the data.
Characteristics of the image, such as its edges, are embedded and compared with the current
image for differences.
Medical applications: the names of patients can be printed on X-ray reports and MRI scans
using visible watermarking techniques.
CHARACTERISTICS OF WATERMARKING
Invisibility: an embedded watermark is not visible.
Robustness: piracy attacks or image processing operations should not affect the embedded watermark.
RGB TO YCBCR
In the watermark embedding phase, the color space of the color host image is first converted from RGB
to YCbCr. The original image X is converted to the YCbCr color space using the transformation:

Y  =  0.299 R + 0.587 G + 0.114 B
Cb = -0.169 R - 0.331 G + 0.500 B
Cr =  0.500 R - 0.419 G - 0.081 B

The Y component represents the luminance; the Cb and Cr components represent the chrominance.
Several chroma subsampling schemes are available, such as 4:2:2 and 4:1:1 sampling.
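The conversion can be sketched in Python. The coefficients below are the standard ITU-R BT.601 values, and the +128 offset for storing 8-bit chrominance is an added convention of this sketch:

```python
# RGB -> YCbCr sketch (BT.601 coefficients, 8-bit chrominance offset).

def rgb_to_ycbcr(r, g, b):
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128
    cr =  0.500 * r - 0.419 * g - 0.081 * b + 128
    return y, cb, cr

# A neutral gray has zero chrominance (Cb = Cr = 128 after the offset).
y, cb, cr = rgb_to_ycbcr(128, 128, 128)
print(round(y), round(cb), round(cr))   # 128 128 128
```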
Digital image watermarking schemes mainly fall into two broad categories: Spatial-domain and
Frequency-domain techniques.
Spatial Domain Techniques
Some of the Spatial Techniques of watermarking are as follow.
Least-Significant Bit (LSB): The earliest digital image watermarking schemes embed watermarks in the
LSBs of the pixels. Given an image in which each pixel is represented by an 8-bit value, the watermark
is embedded in the last (i.e., least significant) bit of selected pixels of the image. This method is easy
to implement and does not introduce serious distortion to the image; however, it is not very robust
against attacks. For instance, an attacker could simply randomize all LSBs, which effectively destroys
the hidden information.
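A minimal sketch of LSB embedding and extraction on a list of 8-bit pixel values (the pixel data and watermark bits are illustrative):

```python
# LSB watermarking sketch: each watermark bit replaces the least
# significant bit of one pixel. As noted above, the scheme is fragile.

def embed(pixels, bits):
    """Overwrite the LSB of the first len(bits) pixels with the mark."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)] + pixels[len(bits):]

def extract(pixels, n):
    """Read the mark back out of the first n pixel LSBs."""
    return [p & 1 for p in pixels[:n]]

pixels = [200, 201, 77, 30, 255]      # toy 8-bit pixel values
mark = [1, 0, 1, 1]
stego = embed(pixels, mark)
print(stego)                          # [201, 200, 77, 31, 255]
print(extract(stego, 4) == mark)      # True
```

Note that each pixel changes by at most 1 intensity level, which is why the distortion is imperceptible, and also why flipping the LSBs erases the mark.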
SSM-Modulation-Based Technique: Spread-spectrum techniques are methods in which energy
generated at one or more discrete frequencies is deliberately spread or distributed in the time or
frequency domains. This is done for a variety of reasons, including the establishment of secure
communications, increasing resistance to natural interference and jamming, and preventing
detection. When applied to image watermarking, SSM-based watermarking algorithms embed
information by linearly combining the host image with a small pseudo-noise signal that is
modulated by the embedded watermark.
Frequency Domain Techniques
Compared to spatial-domain methods, frequency-domain methods are more widely applied. The aim is to
embed the watermarks in the spectral coefficients of the image. The most commonly used transforms are
the Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT), Discrete Wavelet Transform
(DWT), Discrete Laguerre Transform (DLT) and the Discrete Hadamard Transform (DHT).
The reason for watermarking in the frequency domain is that the characteristics of the human visual
system (HVS) are better captured by the spectral coefficients. For example, the HVS is more sensitive to
low-frequency coefficients, and less sensitive to high-frequency coefficients. In other words, low-frequency
coefficients are perceptually significant, which means alterations to those components might
cause severe distortion to the original image. On the other hand, high-frequency coefficients are
considered insignificant; thus, processing techniques such as compression tend to remove high-frequency
coefficients aggressively. To obtain a balance between imperceptibility and robustness, most
algorithms embed watermarks in the mid-range frequencies.
Discrete Cosine Transformation (DCT):
The DCT, like a Fourier transform, represents data in terms of frequency space rather than
amplitude space. This is useful because it corresponds more closely to the way humans perceive light,
so that the parts that are not perceived can be identified and thrown away.
DCT based watermarking techniques are robust compared to spatial domain techniques. Such
algorithms are robust against simple image processing operations like low pass filtering,
brightness and contrast adjustment, blurring etc. However, they are difficult to implement and
are computationally more expensive. At the same time they are weak against geometric attacks
like rotation, scaling, cropping etc. DCT domain watermarking can be classified into Global DCT
watermarking and Block based DCT watermarking. Embedding in the perceptually significant
portion of the image has its own advantages because most compression schemes remove the
perceptually insignificant portion of the image.
Discrete Wavelet Transformation (DWT): The Discrete Wavelet Transform (DWT) is currently
used in a wide variety of signal processing applications, such as audio and video compression,
removal of noise in audio, and the simulation of wireless antenna distribution. Wavelets have
their energy concentrated in time and are well suited for the analysis of transient, time-varying
signals. Since most real-life signals are time-varying in nature, the wavelet transform suits many
applications very well. We use the DWT to implement a simple watermarking scheme. The 2-D
DWT decomposes the image into sub-images: 3 details and 1 approximation. The approximation
looks just like the original, only at 1/4 the scale. The 2-D DWT is an application of the 1-D DWT in
both the horizontal and vertical directions. The DWT separates an image into a lower-resolution
approximation image (LL) as well as horizontal (HL), vertical (LH), and diagonal (HH) detail
components. The low-pass and high-pass filters of the wavelet transform naturally break a signal
into similar (low-pass) and discontinuous/rapidly-changing (high-pass) sub-signals. The slowly
changing aspects of a signal are preserved in the low-pass filter's channel and the quickly
changing parts are kept in the high-pass filter's channel. Therefore we can embed high-energy
watermarks in the regions to which human vision is less sensitive, such as the high-resolution
detail bands (LH, HL, and HH). Embedding watermarks in these regions allows us to increase the
robustness of the watermark at little to no additional impact on image quality. The fact that the
DWT is a multi-scale analysis can be used to the watermarking algorithm's benefit.
In computer vision, segmentation refers to the process of partitioning a digital image into multiple
segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more meaningful and easier to analyze.
Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.
More precisely, image segmentation is the process of assigning a label to every pixel in an image such
that pixels with the same label share certain visual characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of
contours extracted from the image. Each of the pixels in a region is similar with respect to some
characteristic or computed property, such as color, intensity, or texture, while adjacent regions are
significantly different with respect to the same characteristics.
When applied to a stack of images, as is typical in medical imaging (the technique and process used to
create images of the human body, or parts and functions thereof, for clinical purposes), the resulting
contours after image segmentation can be used to create 3D reconstructions with the help of
interpolation algorithms like marching cubes.
1. Medical Imaging
Color image segmentation is useful in many applications. From the segmentation results, it is possible to
identify regions of interest and objects in the scene, which is very beneficial to subsequent image
analysis or annotation.
A few segmentation results: column (a) original image, column (b) segmented image.
Basis for finding matches of a sub-image w(x, y) (size J x K) in an image f(x, y) (size M x N): the
correlation between f(x, y) and w(x, y) is

c(x, y) = Σ_s Σ_t f(s, t) · w(x + s, y + t)

for x = 0, 1, 2, ..., M-1 and y = 0, 1, 2, ..., N-1, where the summation is taken over the region in which
w and f overlap. The functions used are normalized.
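The correlation measure above can be sketched in Python; here the sub-image is slid only over positions where it fits entirely inside f, and the normalization step is omitted for brevity. The best match is where c(x, y) peaks:

```python
# Correlation-based template matching sketch: slide the J x K sub-image
# w across f and accumulate sum f * w over the overlap.

def correlate(f, w):
    M, N = len(f), len(f[0])
    J, K = len(w), len(w[0])
    c = [[0] * (N - K + 1) for _ in range(M - J + 1)]
    for x in range(M - J + 1):
        for y in range(N - K + 1):
            c[x][y] = sum(f[x + s][y + t] * w[s][t]
                          for s in range(J) for t in range(K))
    return c

f = [[0, 0, 0, 0],
     [0, 9, 9, 0],
     [0, 9, 9, 0],
     [0, 0, 0, 0]]
w = [[9, 9],
     [9, 9]]
c = correlate(f, w)
best = max((c[x][y], x, y) for x in range(len(c)) for y in range(len(c[0])))
print(best)   # peak at the embedded 2x2 block: (324, 1, 1)
```

Without the normalization mentioned above, bright regions bias the score, which is why practical systems use normalized cross-correlation instead of this raw sum.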
Compression Results

Method      Compression   Search
Huffman     Bad-OK        Bad-OK
RLE         Bad           OK
LZ family   Good          Bad
Huffman coding
A variable-length code whose codeword length is inversely proportional to the character's frequency. It
must satisfy the prefix property to be uniquely decodable. Huffman coding is a two-pass algorithm: the
first pass accumulates the character frequencies and generates the codebook; the second pass does the
compression with the codebook.
Huffman Algorithm:
Create codes by constructing a binary tree
1. consider all characters as free nodes
2. assign the two free nodes with the lowest frequencies to a parent node with weight equal to the sum
of their frequencies
3. remove the two free nodes and add the newly created parent node to the list of free nodes
4. repeat steps 2 and 3 until there is one free node left; it becomes the root of the tree
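The four steps above can be sketched in Python with a min-heap holding the free nodes. Ties are broken by insertion order, so the exact codewords may differ from those in the example that follows, but the code lengths remain optimal:

```python
import heapq
from itertools import count

# Huffman tree construction: repeatedly merge the two lightest free nodes.

def huffman_codes(freqs):
    tick = count()                       # tie-breaker so tuples compare
    heap = [(f, next(tick), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                 # steps 2-4: merge two lightest
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
    codes = {}
    def walk(node, prefix):              # read codes off the tree
        if isinstance(node, tuple):      # internal node: descend 0/1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Frequencies from the 64-pixel example that follows.
codes = huffman_codes({"R": 19, "K": 17, "G": 14, "B": 7,
                       "C": 4, "M": 2, "Y": 1})
print(sorted(len(codes[ch]) for ch in codes))   # [2, 2, 2, 3, 4, 5, 5]
```

The resulting code lengths match the table below (00, 01, 10, 110, 1110, 11110, 11111), for a total of 152 bits instead of the 192 a fixed 3-bit code would need for 64 symbols.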
Example (64 data values):

R K K K G B G G
K K K K G B G R
K K R B G B G R
K R R C M M G R
K R R C C Y G R
K K R C B B G G
K K G R R B G R
K K G R R R R R
Color   Frequency   Huffman code
=================================
R       19          00
K       17          01
G       14          10
B       7           110
C       4           1110
M       2           11110
Y       1           11111
Automatic number plate recognition (ANPR) is a mass surveillance method that uses optical character
recognition on images to read the license plates on vehicles. ANPR systems can use existing closed-circuit
television or road-rule enforcement cameras, or ones specifically designed for the task. They are used by
various police forces, as a method of electronic toll collection on pay-per-use roads, and for cataloging
the movements of traffic or individuals.
ANPR can be used to store the images captured by the cameras as well as the text from the license plate,
with some configurable to store a photograph of the driver. Systems commonly use infrared lighting to
allow the camera to take the picture at any time of the day. ANPR technology tends to be region-specific,
owing to plate variation from place to place.
There are six primary algorithms that the software requires for identifying a license plate:
1. Plate localization: responsible for finding and isolating the plate in the picture.
2. Plate orientation and sizing: compensates for the skew of the plate and adjusts the dimensions
to the required size.
3. Normalization: adjusts the brightness and contrast of the image.
4. Character segmentation: finds the individual characters on the plate.
5. Optical character recognition.
6. Syntactical/geometrical analysis: checks characters and positions against country-specific rules.
The complexity of each of these subsections of the program determines the accuracy of the system.
During the third phase (normalization), some systems use edge detection techniques to increase the
picture difference between the letters and the plate backing. A median filter may also be used to reduce
the visual noise on the image.
Procedure
1. The first step in the recognition process is obtaining a photo of the vehicle, usually by means of a
mounted CCTV camera. After this, some algorithm must be applied to transform the image into a
stream consisting of the license plate number.
2. Basic preprocessing is then required: image enhancement, plate area localization, and noise
reduction. An important characteristic of a license plate is its rectangular shape, which can also be
exploited for localization purposes. This has to be followed by image segmentation, where the
individual characters are identified based on their orientation. A simple way to localize these features
is to examine edge and variance information.
3. This can be done by applying a Sobel operator and obtaining the image gradient. A thresholding
algorithm can then be applied to obtain a binary edge image. A local variance image can be obtained by
sliding a window across the image and calculating the variance within each window. By combining the
edge and variance information, areas of high activity can be localized. Finally, character recognition can
be performed on the number plate.
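The local-variance step in part 3 can be sketched in pure Python; the window size and the toy images below are illustrative:

```python
# Local-variance sketch: slide a window across the image and compute the
# variance inside each window. Plate regions (busy character edges) show
# high variance; flat background shows low variance.

def local_variance(img, win=3):
    h, w, r = len(img), len(img[0]), win // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            vals = [img[y + dy][x + dx]
                    for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            mean = sum(vals) / len(vals)
            out[y][x] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return out

flat = [[10] * 5 for _ in range(5)]          # uniform background
busy = [[0, 255] * 3 for _ in range(5)]      # alternating "characters"
print(local_variance(flat)[2][2])            # 0.0
print(local_variance(busy)[2][2] > 0)        # True
```

Thresholding this variance image, and intersecting it with the binary edge image from the Sobel step, is one simple way to localize the high-activity plate region described above.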
Application 16: Character Recognition
On-line handwriting recognition involves the automatic conversion of text as it is written on a special
digitizer or PDA, where a sensor picks up the pen-tip movements as well as pen-up/pen-down switching.
That kind of data is known as digital ink and can be regarded as a dynamic representation of handwriting.
The obtained signal is converted into letter codes which are usable within computer and text-processing
applications.
The elements of an on-line handwriting recognition interface typically include:
a pen or stylus for the user to write with.
a touch sensitive surface, which may be integrated with, or adjacent to, an output display.
a software application which interprets the movements of the stylus across the writing surface,
translating the resulting strokes into digital text.
2. Printed Character Recognition:
The recognition system consists of two main processing units: a character separator and an isolated
character classifier.
Character separation (frequently called segmentation) can work in two modes:
fixed (constrained) spacing mode, where the character size is known in advance and therefore
segmentation can be very robust
variable (arbitrary) spacing mode, where no a priori information can be assumed
In this case, character spacing is fixed. Hence, segmentation is possible even when fields are
distorted.