
Breast Cancer Detection using


Digital Image Processing

Project Report submitted in partial fulfillment of the
requirements for the award of the degree of

Bachelor of Technology
in

ELECTRONICS & COMMUNICATION ENGINEERING
By
G. ISHITHA
Reg. No: 2210409112

Under the guidance of

Dr. K. Manjunathachari
Professor
and
Head of the Department




Department Of Electronics and Communication Engineering
GITAM School of Technology
GITAM University
Hyderabad - 502329
April 2013


DECLARATION


I hereby declare that the project report titled Breast Cancer Detection using
Digital Image Processing is submitted in partial fulfillment for the award of Bachelor
of Technology in the Department of Electronics and Communication Engineering,
GITAM School of Technology, GITAM University, Hyderabad campus under the
guidance of Dr. K. Manjunathachari, Professor, Department of Electronics and
Communication Engineering, GITAM University, Hyderabad Campus. The results
embodied in this report have not been submitted to any other University or Institute for
the award of any degree or diploma.









Place: Hyderabad G. Ishitha
Date: 30/04/2013 Reg. No: 2210409112






GITAM UNIVERSITY
Hyderabad - 502329, India

Dated: 30th April 2013

CERTIFICATE


This is to certify that the project entitled "Breast Cancer Detection Using Digital
Image Processing", being submitted by Ms. G. Ishitha in partial fulfillment of the
requirement for the award of the degree of Bachelor of Technology in Electronics and
Communication Engineering at GITAM School of Technology, GITAM University,
Hyderabad Campus, is a record of bonafide work carried out by her under my guidance
and supervision.

It is a faithful record of work carried out by her at the Department of Electronics
and Communication Engineering, GITAM University, Hyderabad Campus, under
my guidance and supervision.




(Dr. K. Manjunathachari) (Dr. K. Manjunathachari)
Professor Professor and HOD
Department of ECE Department of ECE


ACKNOWLEDGEMENT


I convey my gratitude to Dr. K. Manjunathachari, Professor and Head of the
Department, for encouraging me and extending due support in my project work.

I would like to thank Dr. P. Trinatha Rao, Associate Professor, Department of
Electronics and Communication Engineering for his guidance and encouragement.

I thank Mr. N. Shyam Sundar Sagar, Assistant Professor, Department of
Electronics and Communication Engineering, for helping me with the project schedule
and guiding me throughout the period of the project work.

I thank the management for giving me the opportunity and encouraging me in
all stages of my project. I would like to thank my fellow classmates for co-operating
with me.

Finally, I would like to thank my parents for always being by my side and
helping me at every stage of my life.







G. Ishitha
Reg. No: 2210409112

CONTENTS

S.No Title Page No.
List of Figures vi
Abbreviations vii
Abstract viii
1
INTRODUCTION

1.1 Breast Cancer 1
1.2 Methods Of Detection 2
1.3 Mammography 4
1.4 Limitations of mammography 6
1.5 Digital Image Processing 7
2
IMAGE ENHANCEMENT

2.1 Introduction 9
2.2 Histogram Equalization 11
2.3 Sharpening 14
2.3.1 First Derivative 14
2.3.2 Second Derivative 15
3
IMAGE SEGMENTATION

3.1 Introduction 21
3.2 Segmentation Techniques 22
3.2.1 Non Contextual Thresholding 24
3.3 Global Thresholding 24
3.4 Problems with Thresholding 26
3.5 Optimum Thresholding 27
3.6 Otsu optimum Thresholding 30
3.7 Advantages of Thresholding 34
3.8 Watershed Segmentation 34
4 FEATURE EXTRACTION
4.1 Introduction 39
4.2 Classifiers 40
4.3 Applications in Medical Field 41



4.4 Binarization 43
CONCLUSION AND FUTURE WORK 46
REFERENCES 47

LIST OF FIGURES

S.No. Figure No. Figure Name Page No.
1 1.1 Sample Mammogram 5
2 1.2 Mammogram with Tumor 6
3 2.1 Histogram Equalization Image-1 12
4 2.2 Histogram Equalization Image-2 13
5 2.3 Illustration of Derivatives 16
6 2.4 Laplacian Masks 18
7 2.5 Image Sharpening-1 19
8 2.6 Image Sharpening-2 20
9 3.1 Sharpened Image 23
10 3.2 Result Of Global Thresholding 26
11 3.3 Comparison of Thresholding 28
12 3.4(a) Result of Optimum Threshold-1 29
13 3.4(b) Result of Optimum Threshold-2 29
14 3.5 Otsu using Inbuilt Function 32
15 3.6(a) Input Image 33
16 3.6(b) Output Segmented image 34
17 3.7 Synthetically generated blobs 35
18 3.8 Catchment Basins 35
19 3.9 Watershed Result 38
20 3.10 Watershed Output 38
21 4.1(a) Input Mammogram 45
22 4.1(b) Image showing abnormality 45
23 4.2(a) Input Image 45
24 4.2(b) Output Showing Abnormality 45





ABBREVIATIONS




CTLM Computed Tomography Laser Mammography
FFDM Full Field Digital Mammography
MRI Magnetic Resonance Imaging
PET Positron Emission Tomography
MRE Magnetic Resonance Elastography
































ABSTRACT


Breast cancer is one of the major causes of death among women. The most
effective method for early detection and screening of breast cancers is X-ray
mammography. X-Ray Mammography is commonly used in clinical practice for
diagnostic and screening purposes. Reading mammograms is a demanding job for
radiologists and cannot always provide consistent results from one reading to the next. Hence several
computer-aided diagnosis (CAD) schemes have been developed to improve the
detection of primary signatures of this disease.

Digital images of mammography are displayed on a computer monitor and can
be enhanced, that is, either lightened or darkened, before they are printed on film. Image
processing techniques are widely used in several medical areas for image improvement
in earlier detection and treatment stages, where the time factor is very important to
discover the abnormality issues in target images, especially in various cancer tumors
such as breast cancer, lung cancer, etc.

In this project, the detection of breast cancer is done by using low-level
preprocessing techniques and image segmentation. In image segmentation, the
thresholding technique and the watershed algorithm are compared. Feature detection is
done using the binarization method.






CHAPTER 1
INTRODUCTION

1.1 Breast Cancer

Breast cancer is the most frequent cancer in women worldwide. The disease is
curable if detected early enough. Primary prevention seems impossible since the causes
of this disease still remain unknown. The development of breast carcinoma has
been associated with several well-recognized epidemiological risk factors such as early
menarche and late menopause, family history, and dietary, environmental and genetic
factors.

The cells with similar function grow side by side to form a common tissue, such
as brain tissue, muscle tissue or bone tissue. As these normal cells proliferate, they
begin to crowd and bump into each other; a phenomenon that researchers call cell
recognition occurs, and a message is sent back to the individual cells in the tissue to stop
proliferating. Cancer cells do not recognize this phenomenon; they continue to
grow and multiply, causing the tissue to expand into a larger mass called a tumor.

Small clusters of microcalcifications, appearing as collections of white spots on
mammograms, give an early warning of breast cancer. Microcalcifications are tiny bits
of calcium that may show up in clusters or in patterns (like circles) and are associated
with extra cell activity in breast tissue. Usually the extra cell growth is not cancerous,
but sometimes tight clusters of micro calcifications can indicate early breast cancer.
Scattered micro calcifications are usually a sign of benign breast tissue.



1.2 Methods of Detection
There are several imaging techniques for examination of the breast, including
magnetic resonance imaging, ultrasound imaging, and X-ray imaging.
i) Mammography/Thermography - Mammograms can detect many breast cancers,
but there is concern over false results and the hazards of radiation exposure that result
from the tests. There are two new forms of mammography: Computed Tomography
Laser Mammography (CTLM) and Full Field Digital Mammography (FFDM).

The CTLM (Computed Tomography Laser Mammography) system uses state-of-the-art
laser technology, a special array of detectors and proprietary computed algorithms.
The CTLM system does not expose the patient to ionizing radiation or require breast
compression. This approach is awaiting FDA approval.

Digital mammography still uses low-energy X-rays that pass through the breast exactly
like conventional mammograms, but they are recorded by means of an electronic digital
detector instead of film. This electronic image can be displayed on a video monitor
like a TV or printed onto film. The radiologist can manipulate the digital mammogram
electronically to magnify an area, change contrast, or alter the brightness.
ii) Ultrasound or sonogram can be used to determine whether a lump is a cyst
(containing fluid) or a solid mass and to precisely locate the position of a known tumor.
The test is safe and painless, and uses no radiation.
iii) Other Imaging Methods: A number of other imaging methods are now available
for detecting breast cancer. At present, they are used mainly in research studies, and
sometimes to get more information about a tumor found by another method. Each of
these new methods generates a computerized image that the doctor can analyze for the
presence of an abnormal breast lump. These include:
a) Scintigraphy [sin-TOG-ra-fee]: Also called scintimammography, this test uses a
special camera to show where a tracer (a radioactive chemical) has adhered to a tumor.
A scanner is then used to see if the breast lump has picked up more of the radioactive
material than the rest of the breast tissue. Dr. Fleming in Omaha has been using this
approach. There are also clinical trials for this approach.
b) MRI: A magnetic resonance imaging (MRI) machine uses a large magnet and radio
waves to measure the electromagnetic signals your body naturally gives off. It makes
precise images of the inside of the body, including tissue and fluids. MRI can also be
used to see if a silicone breast implant has leaked or ruptured.
c) PET scan: Cancer cells grow faster than other cells, so they use up energy faster,
too. To measure how fast glucose (the body's fuel) is being used, a tracer (radioactive
glucose) is injected into the body and scanned with a positron emission tomography
(PET) machine. The PET machine detects how fast the glucose is being used. If it is
being used up faster in certain places, it may indicate the presence of a cancerous tumor.
d) Mayo Clinic researchers are working on a new imaging test called magnetic
resonance (MR) elastography. This test uses a combination of sound waves and MRI
to evaluate the mechanical properties of tissues within the breast. "Conventional MRI
is very sensitive for detecting breast cancer, but unfortunately there are too many false
positives," Dr. Ehman says. "The goal of our research is to determine whether we can
use this new MR elastography technique to improve the accuracy of MRI for breast
cancer diagnosis, thereby reducing the need for biopsies." In addition, mammography
does not work as well for women with dense breasts, those who have had lumpectomies,
or premenopausal women. The combination of MRI and MR elastography could be
used as an additional screening tool.
1.3 Mammography
Breast cancer occurs mostly in women, but does occur rarely in men. It is one
of the major causes of death among women. An improvement of early diagnostic
techniques is critical for women's quality of life. Mammography is the main test used
for screening and early diagnosis. Mammography as a mass screening tool is
convenient and inexpensive, and has become the modality of choice for early detection of
breast cancers due to its sensitivity in recognizing breast masses. Masses appearing in
breast are three-dimensional lesions representing sign of breast cancer. Mammography
offers high-quality images at a low radiation dose, and is currently the only widely
accepted imaging method used for routine breast cancer screening.
Mammograms can depict most of the significant changes of breast disease. The
primary radiographic signs of cancer are masses (its density, site, shape, borders),
spicular lesions and calcification content. Calcium deposits can also indicate the
existence of a tumor. However, the deposits are often only a few tenths of a millimeter
in size and so deeply embedded in dense tissue that they are nearly undetectable in the
images. Experienced radiologists know where and how to look for such signs.



Figure.1.1 Sample Mammogram

As shown in the sample mammogram of figure 1.1, the dark areas are normal fatty
breast tissue and the lighter areas are denser tissue. The whiter spots are calcifications,
indicated by the red arrows.

The findings on this abnormal mammogram are not necessarily cancerous. Here
you can see breast calcifications through ductal patterns. Micro calcifications are tiny
deposits of calcium that appear as small bright spots in the mammogram. This patient
would have a follow-up mammogram in three months for a comparison.

On a mammogram, a lesion will usually appear as 'brighter' than the surrounding
tissue. This is because things that are denser than fat will stop more x-ray photons,
hence they appear brighter.



Figure 1.2 Mammogram with Tumor

(Courtesy of Pramodini Apollo Imaging and Diagnostics Center)


In figure 1.2, there is a tumor forming in the upper part of the breast, and it may
be invasive. Depending on its growth rate and size, follow-up diagnosis and treatment
are given. It is also possible to determine the size or dimensions of the tumor with the
help of image processing techniques.

1.4 Limitations of Mammography
In film mammography, the image is created directly on film. Screen-film
mammography has some limitations, which include:
1) Limited range of X-ray exposure
2) Image contrast cannot be altered after the image is obtained
3) The film acts as the detector, display, and archival medium
4) Film processing is slow and introduces artifacts.

One of the difficulties with mammography is that mammograms generally
have low contrast. This makes it difficult for radiologists to interpret the results.
Studies have shown that mammography is susceptible to a high rate of false positives
as well as false negatives, causing a high proportion of women without cancer to
undergo further clinical evaluation or breast biopsy, or miss the best time interval for
the treatment of cancer. Several solutions have been proposed to increase the
accuracy, specificity, and sensitivity of mammography and reduce unnecessary
biopsies.
1.5 Digital Image Processing

The most efficient method to overcome the difficulties with mammography is
the use of artificial intelligence; here, image processing is used. Digital image
processing can assist with the correct interpretation of the images. In order to increase
radiologists' diagnostic performance, several computer-aided diagnosis (CAD) schemes
have been developed to improve the detection of the primary signatures of this disease:
masses and microcalcifications.

Digital mammography, which takes an electronic image of the breast and stores
it directly on a computer, is used for this purpose. It overcomes these limitations and
also has some potential advantages over film mammography, such as:
1) Wider dynamic range and lower noise
2) Improved image contrast
3) Enhanced image quality
4) Lower X-ray dose.


In this project, sample digital images are used and image processing techniques
are applied to effectively detect the cancer-prone areas. The procedure has the following steps:

a) Image Enhancement: In this step, a mammogram is enhanced using histogram
equalization and sharpening techniques.
b) Image Segmentation: The enhanced images are segmented into cancer-prone and
normal areas using thresholding and watershed algorithm techniques.
c) Feature Detection: The segmented images are further processed using the binarization
technique, in which segmented images are overlapped with enhanced images to
detect the cancer-prone areas.


CHAPTER 2

IMAGE ENHANCEMENT

2.1 Introduction

Enhancement is the process of manipulating an image so that the result is
more suitable than the original for a specific application. Intensity Transformations
are principally used in image enhancement. Image enhancement is the improvement
of digital image quality (wanted e.g. for visual inspection or for machine analysis),
without knowledge about the source of degradation.

Image enhancement can be done in two domains
a) Spatial Domain
b) Frequency Domain
Spatial domain methods are based on direct manipulation of pixels in the image:
the pixel values are manipulated to achieve the desired enhancement. In frequency
domain methods, the image is first transformed into the frequency domain; that is,
the Fourier transform of the image is computed first. All the enhancement operations
are performed on the Fourier transform of the image, and then the inverse Fourier
transform is performed to get the resultant image.

These enhancement operations are performed in order to modify the image
brightness, contrast or the distribution of the grey levels. As a consequence the pixel
value (intensities) of the output image will be modified according to the transformation
function applied on the input values. Image enhancement is applied in every field where
images ought to be understood and analyzed.

For example, medical image analysis, analysis of images from satellites etc.
Image enhancement simply means transforming an image f into an image g using
a transformation T. The values of pixels in images f and g are denoted by r and s,
respectively.

As said, the pixel values r and s are related by the expression

s = T(r)  (2.1)

where T is a transformation that maps a pixel value r into a pixel value s. The results
of this transformation are mapped into the grey scale range, as we are dealing here only
with grey scale digital images. So, the results are mapped back into the range [0, L-1],
where L = 2^k, k being the number of bits in the image being considered.
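As an illustration of equation (2.1), the point transformation below computes an image negative, T(r) = (L-1) - r, on an 8-bit image. The report's implementations are in MATLAB; this Python/NumPy sketch with a synthetic 2x2 image is an assumed equivalent, not the project's code.

```python
import numpy as np

# Point transformation s = T(r) on an 8-bit grayscale image
# (L = 2^8 = 256). The image is a small synthetic array.
L = 256
f = np.array([[0, 64], [128, 255]], dtype=np.uint8)  # input values r

# Example transformation: the image negative, T(r) = (L - 1) - r.
g = (L - 1) - f.astype(np.int32)   # output values s
g = g.astype(np.uint8)             # map back into the range [0, L-1]

print(g)  # dark pixels become bright and vice versa
```

The same lookup-style mapping applies to any grey-scale transformation T, not only the negative.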
Many different, often elementary and heuristic methods are used to improve
images in some sense. The problem is not well defined, as there is no objective measure
for image quality. The image processing methods are very problem-oriented: a method
that works fine in one case may be completely inadequate for another problem.
Image enhancement in spatial domain is done using two methods
a) Histogram equalization
b) Image sharpening.
Histogram equalization is a basic technique and can be done using in-built
functions of the MATLAB Image Processing Toolbox. On the other hand, image
sharpening requires designing a suitable mask for the application.



2.2 Histogram Equalization
Histogram equalization is a common technique for enhancing the appearance of
images. Suppose we have an image which is predominantly dark. Then its histogram
would be skewed towards the lower end of the grey scale and all the image detail is
compressed into the dark end of the histogram. If we could `stretch out' the grey levels
at the dark end to produce a more uniformly distributed histogram then the image would
become much clearer.

The method is useful in images with backgrounds and foregrounds that are both
bright or both dark. In particular, the method can lead to better views of bone structure
in x-ray images, and to better detail in photographs that are over or under-exposed. A
key advantage of the method is that it is a fairly straightforward technique and
an invertible operator. So in theory, if the histogram equalization function is known,
then the original histogram can be recovered. The calculation is
not computationally intensive. A disadvantage of the method is that it is indiscriminate.
It may increase the contrast of background noise, while decreasing the usable signal.

It involves finding a grey scale transformation function that creates an output
image with a uniform histogram. The histogram of a digital image with intensity levels
in the range [0,L-1] is a discrete function h(rk) = nk, where rk is the kth intensity value
and nk is the number of pixels in the image with intensity rk.


The algorithm for enhancement of an image using histogram equalization is as shown
below.
Histogram Algorithm
1. Read the image using imread.
2. Convert the image to gray scale level.
3. Display the input image and its histogram (intensity value versus number of pixels)
using the imhist function.
4. Perform histogram equalization using the histeq function.
5. Display the output image and its histogram (intensity value versus number of pixels)
using the imhist function.
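The steps above rely on MATLAB's imhist and histeq. As an illustration of the discrete transform that equalization computes internally, here is a Python/NumPy sketch on a small synthetic 8-bit image (an assumed equivalent, not the project's code):

```python
import numpy as np

# Histogram equalization written out directly, mirroring the steps
# above. The 3x3 8-bit image is synthetic.
L = 256
img = np.array([[52, 55, 61], [59, 79, 61], [85, 61, 59]], dtype=np.uint8)

# Histogram: number of pixels n_k at each intensity r_k.
hist = np.bincount(img.ravel(), minlength=L)

# Cumulative distribution of the intensities.
cdf = hist.cumsum()

# Equalization transform: s_k = round((L-1) * cdf(r_k) / N).
N = img.size
T = np.round((L - 1) * cdf / N).astype(np.uint8)

# Apply the transform as a lookup table.
equalized = T[img]
print(equalized.min(), equalized.max())  # intensities spread toward [0, 255]
```

The highest intensity present always maps to L-1, which is how the equalized histogram stretches across the full grey-scale range.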
RESULTS



Figure 2.1 Histogram Equalization-Image1.


The results of the above method are shown in figures 2.1 and 2.2. The original
image has more pixels between the intensity values 100-200. After equalization, the
pixels are distributed across all the intensity values.

This technique is applied to many images to find the enhancement quality from
image to image. It is found that an image with varied or high differences from pixel to
pixel gives a good enhancement result.

Figure 2.2 Histogram Equalization-Image2

Histogram equalization often produces unrealistic effects in photographs;
however, it is very useful for scientific images like thermal, satellite or X-ray images,
often the same class of images that a user would apply false colour to. Histogram
equalization can also produce undesirable effects (like a visible image gradient) when applied
to images with low colour depth.

2.3 Sharpening

Image sharpening falls into a category of image processing called spatial
filtering. A spatial filter consists of a neighborhood and a pre-defined operation
performed on the image pixels within that neighborhood. The result of filtering is a new
pixel, with the coordinates of the neighborhood's center and a value defined by the
operation. If the operation is linear, the filter is said to be a linear spatial filter.

The principal objective of sharpening is to highlight transitions in intensity.
Uses of image sharpening vary and include applications ranging from electronic
printing and medical imaging to industrial inspection and autonomous guidance in
military systems. Because averaging is analogous to integration, it is logical to conclude
that sharpening can be accomplished by spatial differentiation. Fundamentally, the
strength of the response of a derivative operator is proportional to the degree of intensity
discontinuity of the image at the point at which the operator is applied.

2.3.1 First Derivative
The derivatives of a digital function are defined in terms of differences. There
are various ways to define these differences.

Definition of first derivative
1) Must be zero in areas of constant intensity.
2) Must be nonzero at the onset and end of an intensity step or ramp
3) Must be nonzero along the ramps.
A basic definition of the first-order derivative of a one-dimensional function f(x) is
the difference

∂f/∂x = f(x+1) − f(x)  (2.2)

Partial derivatives are used considering image functions f(x, y).

2.3.2 Second Derivative
The second derivative is defined similarly to the first derivative.
Definition of second derivative:
1) Must be zero in constant areas.
2) Must be nonzero at the onset and end of an intensity step or ramp.
3) Must be zero along ramps of constant slope.
The shortest distance over which change can occur is between adjacent pixels.
The second-order derivative is:

∂²f/∂x² = f(x+1) + f(x−1) − 2f(x)  (2.3)
Although this filter can effectively extract the edges contained in an image, the
effect that this filtering operation has on negative-slope edges is different from that obtained
for positive-slope edges. The shapes of the first- and second-order filters are shown in figure 2.3.
Since the filter output is proportional to the difference between the center pixel
and the pixels around the center, for negative-slope edges the center pixel has small
values, producing small values at the filter output. Moreover, the filter output is zero if
the smallest pixel around the center pixel and the center pixel have the same values.
This implies that negative-slope edges are not extracted in the same way as
positive-slope edges. To overcome this limitation, the basic image sharpening structure
must be modified such that positive-slope edges as well as negative-slope edges are
highlighted in the same proportion.


Figure 2.3 Illustration of first and second derivatives
Laplacian Operator

Isotropic filters are those whose response is independent of the direction of the
discontinuities in the image to which the filter is applied. The simplest isotropic
derivative operator is the Laplacian. The Laplacian operator is an example of a
second order or second derivative method of enhancement. It is particularly good at
finding the fine detail in an image. Any feature with a sharp discontinuity (like noise,
unfortunately) will be enhanced by a Laplacian operator. Thus, one application of a
Laplacian operator is to restore fine detail to an image which has been smoothed to
remove noise. (The median operator is often used to remove noise in an image). The
Laplacian operator is implemented in IDL as a convolution between an image and a
kernel.

In image convolution, the kernel is centered on each pixel in turn, and the pixel
value is replaced by the sum of the kernel multiplied by the image values. In the particular
kernel we are using here, we are counting the contributions of the diagonal pixels as
well as the orthogonal pixels in the filter operation. This is not always necessary or
desirable, although it works well.
Human perception is highly sensitive to edges and fine details of an image, and
since they are composed primarily by high frequency components, the visual quality of
an image can be enormously degraded if the high frequencies are attenuated or
completely removed. In contrast, enhancing the high-frequency components of an image
leads to an improvement in the visual quality. Image sharpening refers to any
enhancement technique that highlights edges and fine details in an image. Image
sharpening is widely used in printing and photographic industries for increasing the
local contrast and sharpening the images.
In principle, image sharpening consists of adding to the original image a signal
that is proportional to a high-pass filtered version of the original image. The key point
in the effective sharpening process lies in the choice of the high-pass filtering operation.
As the Laplacian is a derivative operator, its use highlights intensity discontinuities
in an image and deemphasizes regions with slowly varying intensity levels.

∇²f = ∂²f/∂x² + ∂²f/∂y²  (2.4)

∂²f/∂x² = f(x+1, y) + f(x−1, y) − 2f(x, y)  (2.5)

∂²f/∂y² = f(x, y+1) + f(x, y−1) − 2f(x, y)  (2.6)

∇²f(x, y) = f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4f(x, y)  (2.7)

Thus the basic way in which we use the Laplacian for image sharpening is

g(x, y) = f(x, y) + c[∇²f(x, y)]  (2.8)
0 1 0
1 -4 1
0 1 0


1 1 1
1 -8 1
1 1 1

Figure 2.4 Laplacian Masks
where f(x, y) and g(x, y) are the input and sharpened images respectively. The
constant c = −1 if the Laplacian masks in figure 2.4 are used with the values shown
(negative center term); c = 1 if the signs of the mask coefficients are reversed so that
the middle term is positive.
The algorithm for image enhancement using image sharpening is as shown below.
Sharpening Algorithm
1. Read the image using imread.
2. Convert the image to gray scale level.
3. Convert the image into double, which enables mathematical operations on the image.
4. Apply the 3×3 Laplacian mask on every pixel of the image to obtain the sharpened
image.
5. Display the sharpened image.
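Step 4 above can be sketched as follows in Python/NumPy (the report's implementation is in MATLAB; this assumed equivalent uses the first 3×3 mask of figure 2.4, with c = −1 in equation (2.8), on a synthetic image in place of the mammogram):

```python
import numpy as np

# Laplacian sharpening g = f + c * laplacian with c = -1, i.e. the
# Laplacian response is subtracted when the mask has a negative center.
f = np.array([[10, 10, 10, 10],
              [10, 50, 50, 10],
              [10, 50, 50, 10],
              [10, 10, 10, 10]], dtype=float)   # step 3: double precision

mask = np.array([[0,  1, 0],
                 [1, -4, 1],
                 [0,  1, 0]], dtype=float)      # figure 2.4, first mask

# Apply the mask to every interior pixel (borders left unchanged for
# simplicity; MATLAB's imfilter would handle the padding).
g = f.copy()
rows, cols = f.shape
for i in range(1, rows - 1):
    for j in range(1, cols - 1):
        lap = np.sum(mask * f[i-1:i+2, j-1:j+2])  # Laplacian at (i, j)
        g[i, j] = f[i, j] - lap                   # c = -1 in eq. (2.8)

print(g)  # edges of the bright square are amplified
```

Because the mask is symmetric, correlation and convolution give the same result here; the intensity transitions at the square's boundary become sharper while flat regions are unchanged.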
The results are shown for the same image as used in histogram equalization, and the
results are compared.
RESULTS:


Figure 2.5 Image Sharpening - 1


Figure 2.6 Image Sharpening - 2
As shown in figures 2.5 and 2.6, the original image has smoothened boundaries,
and after applying the sharpening mask the result has sharp boundaries. Having sharp
boundaries is especially important for this application of cancer detection.
Sharpening shows better results than histogram equalization. Histogram
equalization smoothens the edges, which is not suitable for the detection of
cancerous tumors. In order to detect the cancerous tissue accurately, sharpening is very
much needed, and thus the sharpened image is taken to the next level of processing.



CHAPTER 3

IMAGE SEGMENTATION
3.1 Introduction


Image segmentation is an essential process for most subsequent image analysis
tasks. Segmentation partitions an image into distinct regions, each containing pixels
with similar attributes. To be meaningful and useful for image analysis and
interpretation, the regions should strongly relate to depicted objects or features of
interest. Meaningful segmentation is the first step from low-level image processing
transforming a grayscale or colour image into one or more other images to high-level
image description in terms of features, objects, and scenes. The success of image
analysis depends on reliability of segmentation, but an accurate partitioning of an image
is generally a very challenging problem.

Segmentation divides the image into its constituent regions or objects.
Segmentation of medical images in 2D, slice by slice has many useful applications for
the medical professional such as: visualization and volume estimation of objects of
interest, detection of abnormalities (e.g. tumors, polyps, etc.), tissue quantification and
classification, and more.

The goal of segmentation is to simplify and/or change the representation of the
image into something that is more meaningful and easier to analyse. Image
segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in
images. More precisely, image segmentation is the process of assigning a label to every
pixel in an image such that pixels with the same label share certain visual
characteristics. The result of image segmentation is a set of segments that collectively
cover the entire image, or a set of contours extracted from the image (edge detection).

Segmentation has to satisfy the following properties:
1) R1 ∪ R2 ∪ … ∪ Rn = R, i.e. the regions collectively cover the entire image R.
2) Ri is a connected set, i = 1, 2, …, n.
3) Ri ∩ Rj = ∅ for all i and j, i ≠ j.
4) Q(Ri) = TRUE for i = 1, 2, …, n, where Q is a logical predicate defined over the
points of a region.
5) Q(Ri ∪ Rj) = FALSE for any adjacent regions Ri and Rj.

Segmentation algorithms are based on one of two basic properties of intensity
values: discontinuity and similarity. The first category is to partition the image based
on abrupt changes in intensity, such as edges in an image. The second category is based
on partitioning the image into regions that are similar according to a predefined
criterion. All pixels in a given region are similar with respect to some characteristic or
computed property, such as colour, intensity, or. Adjacent regions are significantly
different with respect to the same characteristic(s).

3.2 Segmentation Techniques
Segmentation techniques are either contextual or non-contextual. The latter take
no account of spatial relationships between features in an image and group pixels
together on the basis of some global attribute, e.g. grey level or colour. Contextual
techniques additionally exploit these relationships, e.g. group together pixels with
similar grey levels and close spatial locations.


Figure 3.1 Sharpened image
The image that is obtained after image sharpening as shown in figure 3.1 is used for
segmentation purposes.
3.2.1 Non-contextual thresholding
Thresholding is one of the most powerful tools for image segmentation.
Thresholding is the simplest non-contextual segmentation technique. With a single
threshold, it transforms a grayscale or colour image into a binary image considered as
a binary region map. The binary map contains two possibly disjoint regions, one of
them containing pixels with input data values smaller than a threshold and another
relating to the input values that are at or above the threshold. The former and latter
regions are usually labelled with zero (0) and non-zero (1) labels, respectively. The
segmentation depends on the image property being thresholded and on how the threshold
is chosen.

Generally, the non-contextual thresholding may involve two or more thresholds as well
as produce more than two types of regions such that ranges of input image signals
related to each region type are separated with thresholds.
3.2.2 Simple Thresholding
Thresholding is a non-linear operation that converts a gray-scale image into a
binary image, where the two levels are assigned to pixels that are below or above the
specified threshold value. The most common image property to threshold is pixel grey
level:
g(x,y) = 0 if f(x,y) < T (3.1)
and
g(x,y) = 1 if f(x,y) ≥ T, (3.2)
where T is the threshold.
Using two thresholds, T1 < T2, a range of grey levels related to region 1 can be defined:
g(x,y) = 0 if f(x,y) < T1 OR f(x,y) > T2 and g(x,y) = 1 if T1 ≤ f(x,y) ≤ T2.
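The simple and band thresholding rules above can be sketched in Python (the report itself works in MATLAB; here a tiny nested list stands in for an image matrix, and the values are illustrative):

```python
# Simple thresholding, Eqs. (3.1)/(3.2), and band thresholding with two
# thresholds T1 < T2 applied to a small grey-level "image".

def threshold(f, T):
    """g(x,y) = 1 if f(x,y) >= T, else 0."""
    return [[1 if p >= T else 0 for p in row] for row in f]

def band_threshold(f, T1, T2):
    """g(x,y) = 1 if T1 <= f(x,y) <= T2, else 0."""
    return [[1 if T1 <= p <= T2 else 0 for p in row] for row in f]

f = [[10, 120, 200],
     [90, 150, 30]]

print(threshold(f, 100))           # [[0, 1, 1], [0, 1, 0]]
print(band_threshold(f, 80, 160))  # [[0, 1, 0], [1, 1, 0]]
```

The band variant keeps only the grey levels inside [T1, T2], which is how a single intensity range of interest can be isolated.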
3.3 Global Thresholding

The thresholding is defined for global thresholding as follows
g(x,y) = 0 if f(x,y) < T (3.3)
and
g(x,y) = 1 if f(x,y) ≥ T, (3.4)
where T is the threshold.

f is the input image and g is the output image. T is a nonnegative threshold.
If T is a constant applied all over the image, the process is called global
thresholding. Global thresholding is widely used in image processing to generate
binary images, which are used by various pattern recognition systems. Typically,
many features that are present in the original gray-level image are lost in the resulting
binary image.
The main problems are whether it is possible and, if yes, how to choose an
adequate threshold or a number of thresholds to separate one or more desired objects
from their background. In many practical cases the simple thresholding is unable to
segment objects of interest.

The algorithm for image segmentation using global thresholding is as follows:

Global Thresholding Algorithm
1. Read the image using the imread function.
2. Convert the image to gray-scale.
3. Convert the image into double, which enables mathematical operations on the
image.
4. Compare each pixel value with the global threshold value, 0.6 (say), and assign the
pixels with intensity less than 0.6 as 0 and greater than or equal to 0.6 as 1.
5. The output image has two intensity values, 0 and 1, resulting in the segmented image.
6. Display the output image, that is, the segmented image.
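A Python sketch of these steps (the report uses MATLAB's imread/im2double; here an 8-bit list image is normalized to [0, 1], as im2double does, and compared against the global threshold 0.6):

```python
# Global thresholding: scale 0-255 grey levels to [0, 1] and compare every
# pixel against one global threshold applied over the whole image.

def global_threshold(img, T=0.6):
    out = []
    for row in img:
        out.append([1 if (p / 255.0) >= T else 0 for p in row])
    return out

img = [[200, 40], [160, 100]]
print(global_threshold(img))  # [[1, 0], [1, 0]]
```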



RESULTS:



Figure 3.2 Result of Image after Global Thresholding.
The input image is the sharpened image and the output image is the thresholded
two-intensity-level image.
3.4 Problems with Thresholding


The major problem with thresholding is that we consider only the intensity, not
any relationships between the pixels. There is no guarantee that the pixels identified by
the thresholding process are contiguous. We can easily include extraneous pixels that
aren't part of the desired region, and we can just as easily miss isolated pixels within
the region (especially near the boundaries of the region). These effects get worse as the
noise gets worse, simply because it's more likely that a pixel's intensity doesn't
represent the normal intensity in the region.


When we use thresholding, we typically have to play with the threshold, sometimes
losing too much of the region and sometimes getting too many extraneous background
pixels. (Shadows of objects in the image are also a real pain, not just where they fall
across another object but where they mistakenly get included as part of a dark object
on a light background.)

Another problem with global thresholding is that changes in illumination across
the scene may cause some parts to be brighter (in the light) and some parts darker (in
shadow) in ways that have nothing to do with the objects in the image.

3.5 Optimum Global Thresholding

A general approach to thresholding is based on assumption that images are
multimodal, that is, different objects of interest relate to distinct peaks (or modes) of
the 1D signal histogram. The thresholds have to optimally separate these peaks in spite
of typical overlaps between the signal ranges corresponding to individual peaks. A
threshold in the valley between two overlapping peaks separates their main bodies but
inevitably falsely detects or rejects some pixels with intermediate signals. The optimal
threshold that minimises the expected numbers of false detections and rejections may
not coincide with the lowest point in the valley between two overlapping peaks.
We can deal, at least in part, with such uneven illumination by determining
thresholds locally. That is, instead of having a single global threshold, we allow the
threshold itself to smoothly vary across the image.
A threshold is said to be globally optimal if the number of misclassified pixels is
minimum.

The following method obtains an optimum threshold by dividing the image into
two sub-groups and calculating an overall threshold from the thresholds obtained for
each class.

Figure 3.3 Comparison of thresholding techniques

The algorithm for image segmentation using optimum thresholding is as follows:
Optimum Thresholding Algorithm:
1. Read the image using the imread function.
2. Convert the image to gray-scale.
3. Convert the image into double, which enables mathematical operations on the
image.
4. Compare each pixel value with an initial threshold value, 0.99 (say), and divide the
image into two classes or regions.
5. Obtain a threshold value in each region by finding the average pixel value, giving t1
and t2.
6. The overall threshold T is obtained by averaging t1 and t2.
7. Compare each pixel value with the optimum threshold T and assign the pixels with
intensity less than T as 0 and greater than or equal to T as 1.
8. The output image has two intensity values, 0 and 1, resulting in the segmented image.
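The core of steps 4-6 above can be sketched as a small Python function: an initial threshold splits the pixels into two classes, the mean of each class is taken, and the final threshold is the average of the two class means (the pixel values below are illustrative):

```python
# Optimum (two-class mean) thresholding: split by an initial threshold,
# compute each class's mean intensity t1 and t2, then T = (t1 + t2) / 2.

def optimum_threshold(pixels, t_init):
    low = [p for p in pixels if p < t_init]
    high = [p for p in pixels if p >= t_init]
    t1 = sum(low) / len(low)    # average of the darker class
    t2 = sum(high) / len(high)  # average of the brighter class
    return (t1 + t2) / 2.0

pixels = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
print(optimum_threshold(pixels, 0.5))  # 0.55 (t1 = 0.2, t2 = 0.9)
```

In the classical iterative version this update is repeated until T stops changing; the report's single-pass variant is shown here.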
RESULTS:

Figure 3.4(a) Image after optimum Thresholding - 1

Figure 3.4 (b) Image after optimum Thresholding - 2

3.6 Otsu Optimum Threshold
Otsu's method is based on threshold selection by statistical criteria. Otsu
suggested minimizing the weighted sum of within-class variances of the object and
background pixels to establish an optimum threshold. Minimization of the within-class
variance is equivalent to maximization of the between-class variance. This method gives
satisfactory results for bimodal histogram images. Threshold values based on this
method lie between 0 and 1; after the threshold value is obtained, the image is
segmented based on it.
The solution is based on only two parameters:
1) The probability density function of the intensity values of each class
2) The probability that each class occurs in a given application.
The solution is optimum in the sense that it maximizes the between-class
variance. The basic idea is that well-thresholded classes should be distinct with respect
to the intensity values of their pixels and, conversely, that a threshold giving the best
separation between classes in terms of their intensity values would be the best
(optimum) threshold.
The normalized histogram has components pi, for i = 0, 1, 2, …, L-1, satisfying
p0 + p1 + … + p(L-1) = 1, pi ≥ 0.
The cumulative sum up to level k is
P1(k) = Σ_{i=0}^{k} pi (3.5)
The cumulative mean (average intensity) up to level k is given by
m(k) = Σ_{i=0}^{k} i·pi (3.6)
and the average intensity of the entire image, i.e., the global mean, is given by
mG = Σ_{i=0}^{L-1} i·pi (3.7)
The between-class variance is
σB²(k) = [mG·P1(k) − m(k)]² / ( P1(k)·[1 − P1(k)] ) (3.8)
Then the optimum threshold is the value, k*, that maximizes σB²(k):
σB²(k*) = max_{0 ≤ k ≤ L−1} σB²(k) (3.9)
Then, using the threshold value, the segmentation is done as follows:
g(x,y) = 1 if f(x,y) > k*; g(x,y) = 0 if f(x,y) ≤ k* (3.10)

Otsu's algorithm may be summarized as follows:
1. Compute the normalized histogram of the input image. Denote the components of
the histogram by pi, i = 0, 1, 2, …, L-1.
2. Compute the cumulative sums, P1(k), for k = 0, 1, 2, …, L-1.
3. Compute the cumulative means, m(k), for k = 0, 1, 2, …, L-1.
4. Compute the global intensity mean, mG.
5. Compute the between-class variance, σB²(k), for k = 0, 1, 2, …, L-1.
6. Obtain the Otsu threshold, k*, as the value of k for which σB²(k) is maximum. If
the maximum is not unique, obtain k* by averaging the values of k corresponding
to the various maxima detected.
7. Obtain the separability measure, η*, by evaluating Eq. (10.3-16) at k = k*.
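Steps 1-6 can be sketched from scratch in Python (a simplification: when the maximum is not unique, the first maximizing k is kept rather than averaging; the sample pixel values are illustrative):

```python
# From-scratch Otsu threshold: build the normalized histogram p_i, the
# cumulative sum P1(k) and cumulative mean m(k), the global mean mG, then
# pick the k that maximizes the between-class variance of Eq. (3.8).

def otsu_threshold(pixels, L=256):
    n = len(pixels)
    p = [0.0] * L
    for v in pixels:
        p[v] += 1.0 / n                      # normalized histogram p_i

    mG = sum(i * p[i] for i in range(L))     # global mean, Eq. (3.7)
    best_k, best_var = 0, -1.0
    P1 = m = 0.0
    for k in range(L):
        P1 += p[k]                           # cumulative sum, Eq. (3.5)
        m += k * p[k]                        # cumulative mean, Eq. (3.6)
        if 0.0 < P1 < 1.0:
            var_b = (mG * P1 - m) ** 2 / (P1 * (1.0 - P1))  # Eq. (3.8)
            if var_b > best_var:             # ties: first maximum kept
                best_var, best_k = var_b, k
    return best_k

# Bimodal sample: a dark cluster around 30 and a bright one around 220.
pixels = [28, 30, 32, 30, 29, 218, 220, 222, 221, 219]
print(otsu_threshold(pixels))  # 32
```

Any threshold between the two clusters separates them; the strict-maximum rule returns the first such level.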
Otsu Algorithm (Inbuilt)
1. Read the image using the imread function.
2. Convert the image to gray-scale.
3. Convert the image into double, which enables mathematical operations on the
image.
4. Obtain the threshold value using the inbuilt function graythresh as follows:
[level, EM] = graythresh(image)
5. level gives the threshold value.
6. The image is segmented using the threshold as follows:
b = im2bw(a, level)
7. The segmented image is displayed using the imshow(image) function.
RESULTS


Figure 3.5 Otsu Result Using Inbuilt Function

Otsu Algorithm
1. Compute the histogram and the probabilities of each intensity level.
2. Set up initial class probabilities ωi(0) and class means μi(0).
3. Step through all possible thresholds t = 1, …, maximum intensity:
a) Update ωi and μi.
b) Compute the between-class variance σb²(t).
4. The desired threshold corresponds to the maximum σb²(t).
5. You can compute two maxima (and two corresponding thresholds): σb1²(t) is the
greater maximum and σb2²(t) is the greater or equal maximum.
6. Desired threshold = (threshold1 + threshold2)/2.
RESULTS:

Figure 3.6 (a) Input Image.

Figure3.6 (b) Output segmented Image.

3.7 Advantages of Thresholding
The segmented image obtained from thresholding has the advantages of smaller
storage space, faster processing speed and ease of manipulation compared with a
gray-level image, which usually contains 256 levels. Therefore, thresholding techniques
have drawn a lot of attention during the past 20 years.
3.8 Watershed Segmentation

The term watershed refers to a ridge that divides areas drained by different river
systems. A catchment basin is the geographical area draining into a river or reservoir.

Understanding the watershed transform requires us to think of an image as a
surface. For example, consider the image below:

Figure 3.7 Synthetically generated image of two dark blobs.


Figure 3.8 Showing Catchment Basins
If we imagine that bright areas are "high" and dark areas are "low," then it might
look like the surface (left). With surfaces, it is natural to think in terms of catchment
basins and watershed lines. The Image Processing Toolbox function watershed can find
the catchment basins and watershed lines for any grayscale image. The key behind
using the watershed transform for segmentation is this: Change your image into another
image whose catchment basins are the objects you want to identify.
Image segmentation using the watershed transform works well if we can
identify or mark foreground objects and background locations. It finds catchment
basins and watershed ridge lines in an image by treating it as a surface where light
pixels are high and dark pixels are low.

The watershed transformation considers the gradient magnitude of an image as a
topographic surface. Pixels having the highest gradient magnitude intensities (GMIs)
correspond to watershed lines, which represent the region boundaries. Water placed on
any pixel enclosed by a common watershed line flows downhill to a common local
intensity minimum (LIM). Pixels draining to a common minimum form a catchment
basin, which represents a segment.

Watershed segmentation classifies pixels into regions using gradient descent on
image features and analysis of weak points along region boundaries. The image feature
space is treated, using a suitable mapping, as a topological surface where higher values
indicate the presence of boundaries in the original image data. It uses an analogy with
water gradually filling low-lying landscape basins. The size of the basins grows with
an increasing amount of water until they spill into one another. Small basins (regions)
gradually merge together into larger basins. Regions are formed by using local
geometric structure to associate the image domain features with local extrema of the
measurement.

Watershed techniques produce a hierarchy of segmentations, so the resulting
segmentation has to be selected using either some a priori knowledge or manually.
These methods are well suited for fusing different measurements and they are less
sensitive to user-defined thresholds. We implemented the watershed algorithm for
mammographic images.
Watershed Algorithm
The watershed transform finds "catchment basins" and "watershed ridge lines"
in an image by treating it as a surface where light pixels are high and dark pixels are
low. Segmentation using the watershed transform works better if you can identify, or
"mark," foreground objects and background locations.
Marker-controlled watershed segmentation follows this basic procedure:

1. Compute a segmentation function. This is an image whose dark regions are the
objects you are trying to segment.
2. Compute foreground markers. These are connected blobs of pixels within each of
the objects.
3. Compute background markers. These are pixels that are not part of any object.
4. Modify the segmentation function so that it only has minima at the foreground and
background marker locations.
5. Compute the watershed transform of the modified segmentation function.
NOTE: Separating touching objects in an image is one of the most difficult image
processing operations, and the watershed transform is often applied to such problems.
The marker-controlled watershed approach uses two types of markers: external,
associated with the background, and internal, associated with the objects of interest.
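The marker-controlled procedure above can be sketched as a simplified priority-flood in Python: labelled markers grow outward in order of increasing grey level, so region boundaries settle along the bright ridge lines. (The report uses MATLAB's watershed function; this stand-in omits explicit watershed-line pixels and the function name is illustrative.)

```python
# Simplified marker-controlled watershed via priority flooding: pixels are
# claimed by the neighbouring marker region reachable at the lowest grey
# level first, mimicking basins filling with water.
import heapq

def marker_watershed(img, markers):
    rows, cols = len(img), len(img[0])
    labels = [row[:] for row in markers]          # 0 = unlabelled
    heap = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                heapq.heappush(heap, (img[r][c], r, c))
    while heap:
        _, r, c = heapq.heappop(heap)             # lowest pixel floods first
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not labels[nr][nc]:
                labels[nr][nc] = labels[r][c]     # inherit basin label
                heapq.heappush(heap, (img[nr][nc], nr, nc))
    return labels

# Two dark basins (columns of 1s and 2s) separated by a bright ridge of 9s.
img = [[1, 9, 2],
       [1, 9, 2],
       [1, 9, 2]]
markers = [[1, 0, 0],
           [0, 0, 0],
           [0, 0, 2]]
print(marker_watershed(img, markers))  # [[1, 1, 2], [1, 1, 2], [1, 1, 2]]
```

Each dark column is claimed by its own marker; growth only crosses the bright ridge after both basins are full, which is where a true watershed line would be drawn.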
RESULTS

Figure 3.9 Watershed Result


Figure 3.10 Watershed output image
The watershed algorithm gives better results than the Otsu algorithm, but optimum
thresholding suits this application best. With optimum thresholding the threshold
factor can be decided per mammogram, whereas in the Otsu and watershed methods
the threshold is calculated from overall image intensity considerations.

CHAPTER 4

FEATURES EXTRACTION


4.1 Introduction
Feature extraction involves simplifying the amount of resources required to
describe a large set of data accurately. The task of the feature extraction and selection
methods is to obtain the most relevant information from the original data and represent
that information in a lower dimensionality space.
In pattern recognition and in image processing, feature extraction is a special
form of dimensionality reduction. When the input data to an algorithm is too large to
be processed and is suspected to be highly redundant (e.g. the same measurement
in both feet and meters), the input data is transformed into a reduced
representation set of features (also named a feature vector). Transforming the input data
into the set of features is called feature extraction. If the features extracted are carefully
chosen, it is expected that the feature set will extract the relevant information from the
input data in order to perform the desired task using this reduced representation instead
of the full-size input.
Analysis with a large number of variables generally requires a large amount of
memory and computation power, or a classification algorithm which overfits the training
sample and generalizes poorly to new samples. Feature extraction is a general term for
methods of constructing combinations of the variables to get around these problems
while still describing the data with sufficient accuracy.
It can be used in the area of image processing, which involves
using algorithms to detect and isolate various desired portions or shapes (features) of
a digitized image or video stream. It is particularly important in the area of optical
character recognition.
A feature is defined as a function of one or more measurements, each of which
specifies some quantifiable property of an object, and is computed such that it quantifies
some significant characteristics of the object.
We classify the various features currently employed as follows:
1) General features: Application-independent features such as color, texture, and
shape. According to the abstraction level, they can be further divided into:
a) Pixel-level features: Features calculated at each pixel, e.g. color, location.
b) Local features: Features calculated over the results of subdivision of the image band
on image segmentation or edge detection.
c) Global features: Features calculated over the entire image or a regular sub-area
of an image.
2) Domain-specific features: Application-dependent features such as human faces,
fingerprints, and conceptual features. These features are often a synthesis of low-level
features for a specific domain.
On the other hand, all features can be coarsely classified into low-level features
and high-level features. Low-level features can be extracted directly from the original
images, whereas high-level feature extraction must be based on low-level features.

4.2 Classifiers
After the features are extracted, a suitable classifier must be chosen. A number
of classifiers are used, and each classifier is found suitable for a particular kind of
feature vector depending upon its characteristics. The classifier used commonly
is the nearest neighbour classifier. The nearest neighbour classifier is used to compare
the feature vector of the prototype with the image feature vectors stored in the database.
The classification is obtained by finding the distance between the prototype image and
the database images.
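A minimal nearest-neighbour classifier can be sketched as follows: each stored feature vector carries a class label, and a query is assigned the label of the closest stored vector by Euclidean distance (the feature values and labels here are made up for illustration):

```python
# Nearest-neighbour classification: return the label of the database entry
# whose feature vector lies closest (Euclidean distance) to the query.
import math

def nearest_neighbor(query, database):
    """database: list of (feature_vector, label) pairs."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(database, key=lambda item: dist(query, item[0]))[1]

db = [([0.9, 0.8], "abnormal"), ([0.1, 0.2], "normal")]
print(nearest_neighbor([0.85, 0.75], db))  # abnormal
```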
4.3 Applications
The CBIR technology has been used in several applications such as fingerprint
identification, biodiversity information systems, crime prevention, medicine, among
others. Some of these applications are presented in this section.
In medical applications
Queries based on image content descriptors can help in the diagnostic process.
Visual features can be used to find images of interest and to retrieve relevant
information for a clinical case. One example is a content-based medical image retrieval
system that supports mammographic image retrieval. The main aim of the diagnostic
method in this case is to find the best features and achieve a high classification rate for
microcalcification and mass detection in mammograms.
The microcalcifications are grouped into clusters based on their proximity. A set of the
features was initially calculated for each cluster:
1) Number of calcifications in a cluster
2) Total calcification area / cluster area
3) Average of calcification areas
4) Standard deviation of calcification areas
5) Average of calcification compactness
6) Standard deviation of calcification compactness
7) Average of calcification mean grey level
8) Standard deviation of calcification mean grey level
9) Average of calcification standard deviation of grey level
10) Standard deviation of calcification standard deviation of grey level.

Mass detection in mammography is based on shape and texture based features.
The features are listed below:
1. Mass area: The mass area A = |R|, where R is the set of pixels inside the region
of mass, and |·| denotes set cardinality.
2. Mass perimeter length: The perimeter length P is the total length of the mass
edge. The mass perimeter length was computed by finding the boundary of the
mass, then counting the number of pixels around the boundary.
3. Compactness: The compactness C is a measure of contour complexity versus
enclosed area, defined as C = P²/(4πA), where P and A are the mass perimeter and
area respectively. A mass with a rough contour will have a higher compactness
than a mass with a smooth boundary.
4. Normalized radial length: The normalized radial length is the sum of the Euclidean
distances from the mass center to each of the boundary co-ordinates, normalized
by dividing by the maximum radial length.
5. Minimum and maximum axis: The minimum axis of a mass is the smallest
distance connecting one point along the border to another point on the border
going through the center of the mass. The maximum axis of the mass is the largest
distance connecting one point along the border to another point on the border
going through the center of the mass.
6. Average boundary roughness.
7. Mean and standard deviation of the normalized radial length.
8. Eccentricity: The eccentricity characterizes the lengthiness of a Region Of
Interest. An eccentricity close to 1 denotes a ROI like a circle, while values close
to zero mean more stretched ROIs.
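Three of the shape features above can be sketched on a binary mass mask: area as the pixel count of the region, perimeter as the count of region pixels with at least one 4-connected background neighbour (a pixel-count approximation), and compactness C = P²/(4πA). The toy mask is illustrative:

```python
# Shape features on a binary mask: area, (approximate) perimeter, and
# compactness C = P^2 / (4*pi*A) as defined in the feature list.
import math

def shape_features(mask):
    rows, cols = len(mask), len(mask[0])
    area = sum(sum(row) for row in mask)
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                # A mass pixel lies on the boundary if any 4-neighbour is
                # background or outside the image.
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols) or not mask[nr][nc]:
                        perimeter += 1
                        break
    compactness = perimeter ** 2 / (4 * math.pi * area)
    return area, perimeter, compactness

# 3x3 solid square "mass": 9 pixels, 8 of them on the boundary.
a, p, c = shape_features([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
print(a, p, round(c, 3))  # 9 8 0.566
```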

The image feature extraction stage is an important stage that uses algorithms and
techniques to detect and isolate various desired portions or shapes (features) of a given
image. To predict the probability of breast cancer presence, the following method is
used: binarization.
4.4 Binarization
The binarization approach depends on the fact that the number of black pixels is
much greater than the number of white pixels in abnormal mammogram images, so we
started by counting the black pixels for normal and abnormal images to get an average
that can be used later as a threshold. If the number of black pixels of a new image is
greater than the threshold, the image is abnormal; otherwise, if the number of black
pixels is less than the threshold, the image is normal.
Functions Used
1) IM2BW - Convert image to binary image by thresholding.
IM2BW produces binary images from indexed, intensity, or RGB images. To do this,
it converts the input image to grayscale format (if it is not already an intensity image),
and then converts this grayscale image to binary by thresholding. The output binary
image BW has values of 1 (white) for all pixels in the input image with luminance
greater than LEVEL and 0 (black) for all other pixels. (Note that you specify LEVEL
in the range [0,1], regardless of the class of the input image.)
BW = IM2BW(I, LEVEL) converts the intensity image I to black and white.
2) BWLABEL - Label connected components in a binary image.
L = BWLABEL(BW, N) returns a matrix L, of the same size as BW, containing labels
for the connected components in BW. N can have a value of either 4 or 8, where 4
specifies 4-connected objects and 8 specifies 8-connected objects; if the argument is
omitted, it defaults to 8. The elements of L are integer values greater than or equal to
zero. The pixels labeled 0 are the background. The pixels labeled 1 make up one object,
the pixels labeled 2 make up a second object, and so on.
Binarization Algorithm
1. Binarization is done after segmentation. Here the results of optimum thresholding
are used for binarization.
2. After segmentation, the numbers of high-intensity and low-intensity pixels are
calculated.
3. If there are more high-intensity pixels than low-intensity pixels, the segmented
image is made uniform, indicating that the image is free of tumors.
4. If there are fewer high-intensity pixels than low-intensity pixels, the segmented
image is represented in RGB colours with different colours for different connected
segments, so that the resulting image shows cancer-prone areas.
5. Display the output image.
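The pixel-count test in steps 2-4 can be sketched as follows (the "normal"/"abnormal" labels and the tiny binary image are illustrative; the report additionally recolours the connected segments in the abnormal case):

```python
# Binarization check: count high- (1) and low-intensity (0) pixels in the
# segmented binary image; dark-dominant images are flagged as abnormal.

def classify_binary(segmented):
    high = sum(p for row in segmented for p in row)   # white pixels
    total = sum(len(row) for row in segmented)
    low = total - high                                # black pixels
    return "abnormal" if high < low else "normal"

seg = [[0, 0, 0, 1],
       [0, 0, 1, 0]]
print(classify_binary(seg))  # abnormal (6 black pixels vs 2 white)
```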
RESULTS:


Figure 4.1 (a) Input Mammogram (b) Abnormality


Figure 4.2 (a) Input Image (b) output showing abnormality
If the input mammogram produces a uniform-colour output image, the mammogram is
free of cancerous tissue; otherwise the output shows the abnormality by representing
the cancerous tissue in cyan. The colour is a matter of choice and is specified in the
code. This is the most common method used for detection and is very simple and
efficient in execution.



CONCLUSION AND FUTURE WORK

Image processing is an indispensable tool for processing digital images, especially
mammograms, for detecting cancers. In image enhancement, comparing the results of
histogram equalization and sharpening, sharpening gave better results for this
application. In image segmentation, the Otsu algorithm shows over-segmentation
characteristics and the watershed algorithm gives moderate segmentation. Feature
extraction results show whether the mammogram is cancer-prone or cancer-free.
There are several other algorithms which are more sophisticated and have enhanced
features. This report presents the basic processing and can be used for initial diagnosis
purposes.
Technology has been developing exponentially for the last few decades, and
developments in the digital image processing field are likewise increasing, marching
towards motion detection and robots enabled with the capability of the human eye.
Embedding these types of processes and applications can make the analysis and
detection of not only breast cancer but also other cancers possible.
Further, this process can be implemented in the form of an application, so that
we can give it any mammogram image and it provides a pre-diagnosis result for
cancer detection.


REFERENCES

[1] Digital Image Processing by Rafael C. Gonzalez and Richard E. Woods [2006],
2nd ed., Pearson Education, Inc.

[2] Lung Cancer Detection Using Image Processing Techniques by Mokhled S. AL-
TARAWNEH, Computer Engineering Department, Faculty of Engineering, Mutah
University in Leonardo Electronic Journal of Practices and Technologies.

[3] Automatic Detection Of Breast Cancer Mass In Mammograms Using
Morphological Operators and Fuzzy C Means Clustering by S.Saheb Basha,
Dr.K.Satya Prasad, Madina Engineering College, Kadapa, and Jawarharlal Nehru
Technology University, (A.P) India.

[4] Analysis and Diagnosis of Breast Cancer by Poulami Das, Debnath
Bhattacharyya, Samir K. Bandyopadhyay and Tai-hoon Kim.

[5] Detection of Cancer Using Vector Quantization for Segmentation, International
Journal of Computer Applications (0975 8887) Volume 4 No.9, August, 2010.

[6] Detection and Localization of Early Lung Cancer by Imaging Techniques by
Branko Palcic, Ph.D.; Stephen Lam, and Calum MacAulay, Ph.D.

[7] The Ductal Carcinomas: classic presentations on mammography by Jennifer
Broder, HMS IV, Advanced Radiology Rotation, Beth Israel Deaconess Medical Center.
[8] Implementation of Fuzzy Logic for Detection of Suspicious Masses in
Mammograms using DSP TMS320C6711 by Devesh D. Nawgaje, Dr. Rajendra
D.Kanphade.


[9] Enhanced Watershed Image Processing Segmentation by Amir Shahzad,
Muhammad Sharif, Mudassar Raza, Khalid Hussain Journal of Information &
Communication Technology, Vol. 2, No. 1, (Spring 2008) 01-09.

[10] Feature Extraction and Image Processing by Mark S. Nixon and Alberto S.
Aguado.

[11] Feature extraction techniques by André Aichert, January 9, 2008, Camp
Medical Seminar WS0708.

[12] Image re-morphing, noise removal, and feature extraction with swarm algorithm
by Horia Mihail Teodorescu, IEEE Student Member, David Malan, Dept. of Computer
Science, Harvard University.

[13] Adaptive document image binarization by J. Sauvola, M. Pietikäinen,
Machine Vision and Media Processing Group, Infotech Oulu, University of Oulu,
Received 29 April 1998; accepted 21 January 1999.

[14] Stroke-Model-Based Character Extraction from Gray-Level Document Images
by Xiangyun Ye, Mohamed Cheriet, Senior Member, IEEE, and Ching Y. Suen,
Fellow, IEEE, IEEE Transactions On Image Processing, Vol. 10, No. 8, August 2001.

[15] Image Feature Extraction Techniques and Their Applications for CBIR and
Biometrics Systems by Ryszard S. Choras International Journal of Biology and
Biomedical Engineering.
