
ISSN: 2319-6890 (Online), 2347-5013 (Print)

About Publication House

Innovative Research Publications (IRP) is a fast growing international academic publisher that publishes international journals in the fields of Engineering, Science and Management. IRP is establishing a distinctive and independent profile in the international arena. Our publications are distinctive for their relevance to the target groups and for their stimulating contribution to R&D. Our journals are the product of a dynamic interchange between scientists, authors, publisher and designer.

Objectives:
· Publishing national and international journals, magazines, books and others, in online as well as print versions, to provide high quality and high standard publications in national and international journals
· Organizing technical events, i.e. seminars, workshops, conferences and symposia, to expose the knowledge of researchers
· Collaborating with educational and research organizations to expand awareness about R&D
· Helping financially weak researchers to promote their research at the world level

International Journal of Engineering Research (IJER)
Volume 4, Issue Special, May 19 & 20, 2015

Our Journals
1. International Journal of Scientific Engineering and Technology
ISSN: 2277-1581
Subject: Science, Engineering, Management and Agriculture Engineering
Last date for submitting papers: 10th of each month
Web: www.ijset.com, Email: editor@ijset.com

2. International Journal of Engineering Research
ISSN: 2319-6890
Subject: Engineering
Last date for submitting papers: 10th of each month
Web: www.ijer.in, Email: editor@ijer.in
Editor in Chief: Dr. R.K. Singh

2nd International Conference on Convergent Innovative Technologies (ICCIT-2015), May 19 & 20, 2015
Organized by: Cambridge Institute of Technology, K.R. Puram, Bangalore

National Conference on "Recent Advances in Chemical Engineering" (GreenChem-15), March 20, 2015
Organized by: Department of Chemical Engg., JDIET, Yavatmal (M.S.), India

Innovative Research Publications, Gulmohar, Bhopal, M.P., India. Contact No.: +91-9752135004
Web: www.irpindia.org, Email: info@irpindia.org

International Journal of Engineering Research
Web: www.ijer.in, Email: editor@ijer.in
Editorial Board

Editor in Chief
Dr. R. K. Singh, Professor and Head, Department of Electronics and Communication, KNIT Sultanpur, U.P., India

Managing Editor
Mr. J. K. Singh, Innovative Research Publications, Bhopal, M.P., India

Advisory Board
1. Dr. Asha Sharma, Jodhpur, Rajasthan, India
2. Dr. Subhash Chander Dubey, Jammu, India
3. Dr. Rajeev Jain, Jabalpur, M.P., India
4. Dr. C P Paul, Indore, M.P., India
5. Dr. S. Satyanarayana, Guntur, A.P., India
List of Contents
S.No. Manuscript Detail Page No.

Simulation and Analysis of an Energy efficient protocol Ad-LEACH for Smart Home
1. Wireless Sensor Network 336-339

R. Kavitha, Dr. Nasira G M


Troubleshooter: Solution Finder for log errors from Multiple Solution Sources
2. 340-342
Smita B Patil, Dr. D R Shashikumar
A Web Enabled Wireless Sensor Networks System For Precision Agriculture
3. Applications Using Internet Of Things 343-345

Sowmya L., Krishna Kumar P R.


A Survey on Developing an Efficient Optimization Technique for Cluster Head
4. Selection in Wireless Sensor Network 346-348
Sridhar R., Dr. N Guruprasad
Spatial and Location Based Rating Systems
5. 349-352
Vinay Kumar M., N. Rajesh
Energy Efficient Zone-based Proactive Source Routing to Minimize Overhead in
6. Mobile Ad-hoc Networks 353-357

Lakshmi K. M., Levina Tukaram


Homomorphic Encryption based Query Processing
7. 358-361
Aruna T.M., M.S. Satyanarayana, Madhurani M.S.
Intrusion Detection System against Bandwidth DDoS Attack
8. 362-364
Basavaraj Muragod, Sai Madhvi D.
Information Retrieval With Keywords Using Nearest Neighbor Search
9. 365-367
Prathibha
Mining the Frequent Item Set Using Pre Post Algorithm
10. 368-370
Manasa M. S., Shivakumar Dallali
Design and Implementation of Research Proposal Selection
11. 371-373
Neelambika Biradar, Prajna M., Dr. Antony P J
Light Weight Integrated Log Parsing Tool: LOG ANALYZER
12. 374-379
Priyanka Sigamani S.,Dr. D. R. Shashi Kumar
Towards Secure and Dependable for Reliable Data Fusion in Wireless Sensor Networks
13. under Byzantine Attacks 380-382
Valmeeki B.R. , Krishna Kumar. P.R., Shreemantha M.C.

Identity and Access Management To Encrypted Cloud Database


14. 383-386
Archana A., Dr. Suresh L., Dr. Chandrakanth Naikodi
An Analysis of Multipath Aomdv in Mobile Adhoc Networks
15. 387-389
S. Muthusamy, Dr. C. Poongodi
Light Weight SNMP Based Network Management and Control System for a
16. Homogeneous Network 390-391
Brunda Reddy H K, K Satyanarayan Reddy
Lagrange Based Quadratic Interpolation for Color Image Demosaicking
17. 392-395
Shilpa N.S., Shivakumar Dalali
Case Study: Leveraging Biometrics to Big Data
18. 396-398
Shivakumar Dalali, Dr. Suresh L., Dr. Chandrakant Naikodi
‘DHERO’-Mobile Location Based fast Services
19. 399-401
Bharath D., Anand S Uppar
‘Im@’- A Technique for Sharing Location
20. 402-405
Bhavya A., Balapradeep K. N., Dr. Antony P. J.
Self-Vanishing of Data in the Cloud Using Intelligent Storage
21. 406-408
Shruthi, Ramya N., Swathi S.M., Sreelatha P.K.
A Survey on Various Comparisons on Anonymous Routing Protocol in MANET’S
22. 409-411
Dr. Rajashree V. Biradar , K. Divya Bhavani
Stock Market Prediction: A Survey
23. 412-414
Guruprasad S., Rajshekhar Patil , Dr.Chandramouli H, Veena N
Identifying and Monitoring the Internet Traffic with Hadoop
24. 415-418
Ranganatha T.G., Narayana H.M
Authentication Prevention Online Technique in Auction Fraud Detection
25. 419-423
Anitha k., Priyanka M., Radha Shree B
Differential Query Services Using Efficient Information Retrieval Query Scheme In
26. Cost-Efficient Cloud Environment 424-427

Shwetha R., Kishor Kumar K., Dr. Antony P. J.


An Efficient and Effective Information Hiding Scheme using Symmetric Key
27. cryptography 428-431
Sushma U., Dr. D.R. Shashi Kumar
ATM Deployment using Rank Based Genetic Algorithm with convolution
28. 432-435
Kulkarni Manjusha M.
Data Oblivious Caching Framework for Hadoop using MapReduce in Big data
29. 436-439
Sindhuja.M, Hemalatha.S
PAPR Reduction For STBC MIMO-OFDM Using Modified PTS Technique Combined
30. With Interleaving and Pulse Shaping 440-444
Poonam, Sujatha S
Generation of Migration List of Media Streaming Applications for Resource Allocation
31. in Cloud Computing 445-447
Vinitha Pandiyan, Preethi, Manjunath S.
Child Tracking in School Bus Using GPS and RFID
32. 448-449
ShilpithaSwarna, Prithvi B. S., Veena N.
Risk Mitigation by Overcoming Time Latency during Maternity- An IoT Based
33. Approach 450-452
Sanjana Ghosh, R. Valliammai, Kiran Babu T.S., Manoj Challa

Predicting Future Resources for Dynamic Resource Management using Virtual


34. Machines in Cloud Environment 453-457
Vidya Myageri, Mrs. Preethi S.
Pixel Based Approach of Satellite Image Classification
35. 458-460
Rohith. K.M., Dr.D.R.Shashi Kumar, VenuGopal A.S.

Simulation and Analysis of an Energy Efficient Protocol Ad-LEACH for Smart Home Wireless Sensor Network

R. Kavitha1, Dr. Nasira G M2


1 Department of Computer Science, Christ University, Bangalore, Karnataka, India
2 Department of Computer Applications, Chikkanna Govt. College, Tirupur, Tamilnadu, India
kavitha.r@christuniversity.in, nasiragm99@yahoo.com

Abstract—Wireless Sensor Networks are one of the emerging technologies in the research field. A WSN is made up of many inexpensive, tiny sensors connected through a wireless network. Routing with minimal energy is the major challenge in such networks, and many cluster-based routing protocols have been developed to overcome it. In this paper we propose a new energy-efficient protocol, Ad-LEACH. The performance of this protocol is compared with other protocols using MATLAB. The simulation results show that the proposed protocol gives better performance in terms of energy efficiency. This protocol can also be used in a smart home wireless sensor network.

Keywords—Wireless Sensor Network, Cluster, Energy Efficient, Smart Home

I. INTRODUCTION
A wireless sensor network (WSN) is made up of a large number of small sensors with low-power transceivers. It helps to gather data in different environments. Each sensor collects data and sends it through the network to a single processing centre, the base station (BS). The collected data help to determine the features of the environment or to detect the state of an object in the network. As shown in Fig 1, in a smart home (SH) environment [6] each sensor-equipped device is considered a node in the WSN. The status of a device is sensed by its sensor, and the SH-WSN passes the sensed information to the BS. The collected information is used to know the current state of the smart home, and it plays a great role in controlling it.

[Fig 1. Wireless Sensor Network in a Smart Home]

Each node in the WSN spends energy to transmit collected data to the CH. Each CH spends energy to receive data from all the nodes in its cluster, to aggregate the collected data and to transmit it to the BS. The network protocol plays a vital role in this data communication. Since a WSN consumes energy for communication, identifying an energy-efficient protocol is a major and critical task. Many protocols have been proposed for WSNs; among these, the LEACH (Low Energy Adaptive Clustering Hierarchy) protocol helps to save energy in smaller WSNs like a smart home.

II. LEACH (LOW ENERGY ADAPTIVE CLUSTERING HIERARCHY)
The first hierarchical cluster-based routing protocol for wireless sensor networks is LEACH [1]. This protocol divides the nodes in the network into clusters. As shown in Fig 2, each cluster has a Cluster Head (CH). This dedicated CH node has extra privileges: it is responsible for creating and maintaining a TDMA (Time Division Multiple Access) schedule [7], and it sends the aggregated data from the nodes to the BS using CDMA (Code Division Multiple Access). Other than the CHs, the remaining nodes in the network are cluster members. The LEACH protocol is divided into rounds, and each round consists of two phases:
• Set-up Phase
  o Advertisement Phase
  o Cluster Set-up Phase
• Steady Phase
  o Schedule Creation
  o Data Transmission

A. Setup Phase
Each node independently decides whether it can become a CH or not, based on when the node last served as a CH. A node that hasn't been a CH for a long time has a greater chance of becoming CH than nodes that have served as CH recently. During the advertisement phase, an advertisement packet helps the CHs
inform their neighborhood that they have become CHs. Non-CH nodes receive this information from the CH with the strongest signal strength. By sending their IDs, the member nodes inform the CH that they have become members of that cluster. All the CHs communicate with their cluster members using TDMA. Now the CH knows the number of member nodes and their IDs in the cluster. Based on all the messages received within the cluster, the CH creates a TDMA schedule, picks a CDMA code randomly, and broadcasts the TDMA table to the cluster members. After that, the steady-state phase begins.

[Fig 2. LEACH Protocol]

B. Steady-State Phase
Actual data transmission begins in this phase. Nodes start sending their data to the CH during their allocated TDMA slots, and a minimal amount of energy is used in this transmission. Every other non-CH node's radio can be turned off until that node's allocated TDMA slot, which minimizes the energy dissipation in these nodes. The CH aggregates the received data and sends it to the BS.

DISADVANTAGE OF LEACH
The LEACH protocol selects the CH randomly, without considering energy consumption. In this protocol, a node with less energy has the same priority to become CH as a node with more energy. As a result, all nodes die soon, which quickly leads to network failure. Because of this drawback, much research has been done to make the protocol perform better [5].

III. LITERATURE REVIEW
Cluster Based Routing Protocol (CBRP) [2] is a distributed energy-efficient protocol for data gathering in wireless sensor networks. This protocol elects CHs based only on a node's own residual energy. After the CH selection, CBRP establishes a spanning tree over all of the CHs; only the root node of the spanning tree communicates with the sink node, by single-hop communication. The energy consumed by the nodes in the network for all communication is calculated with the free space model. CBRP was shown to save a great deal of energy and to extend the network lifetime.

Hierarchical Cluster-Based Routing (HCR) [3] generates energy-efficient clusters in a sensor network. It generates a head-set for each cluster, and members of the head-set are selected as CH using a round-robin technique. The formed clusters persist for a short period of time called a round. A round consists of two phases, an election phase and a data transfer phase. In the first phase, the sensor nodes generate the cluster with the head-set; in the second phase, the members of the head-set transmit the collected data to the BS in turn. The HCR protocol is more energy efficient than traditional cluster-based routing techniques for continuous monitoring applications.

Enhanced LEACH (En-LEACH) [4] is another version of the LEACH protocol. A probability method is used to select the cluster CH. The formula used for selecting the cluster head is

Cluster Head = Energy of the node / Energy of the Cluster

The objectives of En-LEACH are:
o To handle cluster-head failure
o To account for the non-uniform and dynamic residual energy of the nodes
After implementation, the results show that the first node death occurs almost two times later than in LEACH, and the last node death occurs much later than in LEACH.

Ad-LEACH is the new approach proposed here for wireless sensor networks. In this approach, the CH is elected based on two criteria: (i) the node's residual energy, and (ii) the node should not have served as a CH recently. The highest-priority node is elected as CH, and the next-priority node is designated CH-Rep (Cluster Head Representative). In the LEACH protocol, the CH loses more energy than a normal node because it spends energy to receive data from all the cluster nodes and to transmit to the base station, which is far away; it spends less energy transmitting, since it transmits to a single base station, than receiving, since it receives from many nodes. The CH-Rep helps the CH save this energy: it receives data from all the nodes in the cluster, aggregates it, and transmits it to the CH, and the CH then transmits it to the base station. In Ad-LEACH, the CH therefore spends less energy than in LEACH, since it receives data from only one node, the CH-Rep.

IV. IMPLEMENTATION
Research in the area of low-energy radio is currently a great challenge. There are distinct theories about the radio model and energy dissipation in the transmit and receive modes. Fig 3 shows the radio energy dissipation model [1] used in our work.
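As a concrete illustration of the Ad-LEACH role assignment described above, the following minimal Python sketch elects the CH and CH-Rep of one cluster. This is our own rendering of the two stated criteria (residual energy and CH recency), not the authors' MATLAB code; the recency window and all names are assumptions.

# Sketch (ours): Ad-LEACH role election for one cluster.
def elect_roles(nodes, current_round, recency_window=5):
    """Return (cluster_head, ch_rep) from a list of node dicts with
    'id', 'energy' (residual, J) and 'last_ch_round' (or None)."""
    def eligible(n):
        # Criterion (ii): the node should not have served as CH recently.
        return (n["last_ch_round"] is None
                or current_round - n["last_ch_round"] > recency_window)

    candidates = [n for n in nodes if eligible(n)] or list(nodes)
    # Criterion (i): rank by residual energy, highest first.
    ranked = sorted(candidates, key=lambda n: n["energy"], reverse=True)
    cluster_head = ranked[0]
    ch_rep = ranked[1] if len(ranked) > 1 else None
    cluster_head["last_ch_round"] = current_round
    return cluster_head, ch_rep

In one round, the members then transmit to the CH-Rep, which aggregates and forwards a single packet to the CH, and the CH transmits to the BS.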




[Fig 3. First Order Radio Model]

In this model, the radio dissipates E_elec = 50 nJ/bit to run the transmitter or receiver circuitry and ε_amp = 100 nJ/bit/m² for the transmit amplifier. With this radio model, to transmit a k-bit message over a distance d the radio spends

E_Tx(k, d) = E_elec * k + ε_amp * k * d^2

and to receive a k-bit message the radio spends

E_Rx(k) = E_elec * k

[Fig 4a. Sensors dead (dots) and alive (circles) after 50 rounds in the LEACH protocol]
[Fig 4b. Sensors dead (dots) and alive (circles) after 50 rounds in the Ad-LEACH protocol]
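These two expressions translate directly into code. The sketch below is ours, using the constants as stated in this section; note that for a 1000-bit packet sent 50 m it gives 5e-5 + 0.25 ≈ 0.25 J, and that the original radio model in [1] quotes the amplifier constant as 100 pJ/bit/m².

# First-order radio model, with the constants stated above.
E_ELEC = 50e-9     # 50 nJ/bit: transmitter/receiver electronics
EPS_AMP = 100e-9   # 100 nJ/bit/m^2, as stated in the text

def tx_energy(k_bits, d_m):
    """Energy (J) to transmit a k-bit message over distance d (m)."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * d_m ** 2

def rx_energy(k_bits):
    """Energy (J) to receive a k-bit message."""
    return E_ELEC * k_bits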


All the nodes in the network have energy E(n), except the base station. The characteristics of the test network are specified in Table I.

TABLE I. Parameters used in the simulation

Parameter       Value
No. of Nodes    100
Network Size    100 x 100
Node Position   Random
Data Size       1000 bits
No. of Rounds   50

V. RESULTS
The Ad-LEACH protocol is simulated using MATLAB. The sensor network consists of 100 nodes scattered randomly in a 100 x 100 square field. Table I shows all the parameters used to implement the Ad-LEACH protocol; in this simulation, every node begins with 20 J of energy. After 50 rounds of transmission, many sensor nodes are dead (dots) and very few are alive (circles) in the LEACH protocol, whereas in Ad-LEACH many sensor nodes remain alive (circles) after 50 rounds of transmission. These results are shown in Fig 4a and 4b respectively.

[Figure 5. Network Lifetime]
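The round-based experiment itself can be skeletonized as below. This is our Python paraphrase of the setup in Table I (100 nodes, a 100 x 100 field, 1000-bit packets, 50 rounds, 20 J initial energy), not the authors' MATLAB script; the per-round energy accounting is left as comments that would reuse tx_energy/rx_energy from the radio-model sketch above.

import random

NUM_NODES, FIELD, ROUNDS, INIT_ENERGY = 100, 100, 50, 20.0

nodes = [{"id": i,
          "x": random.uniform(0, FIELD), "y": random.uniform(0, FIELD),
          "energy": INIT_ENERGY, "last_ch_round": None}
         for i in range(NUM_NODES)]

alive_per_round = []
for rnd in range(1, ROUNDS + 1):
    alive = [n for n in nodes if n["energy"] > 0]
    # One Ad-LEACH round (see the role-election sketch in Section III):
    #  1) elect CH and CH-Rep among the alive nodes;
    #  2) each member transmits a 1000-bit packet to the CH-Rep;
    #  3) the CH-Rep aggregates and forwards one packet to the CH;
    #  4) the CH transmits the aggregate to the BS;
    # each step debits tx_energy()/rx_energy() from the nodes involved.
    alive_per_round.append(len(alive))

Plotting alive_per_round for LEACH and Ad-LEACH gives a comparison like the one in Figure 5.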




We tracked the lifetime of the nodes in the network. The summarized results are as follows:
• The first node dies in the 7th round of Ad-LEACH, whereas it dies in the 1st round itself in the LEACH protocol.
• At the end of the 50th round almost 85 nodes are dead in LEACH, but in Ad-LEACH only 55 nodes are dead, which shows that the Ad-LEACH protocol can help the network live longer.

After 50 rounds of transmission, a comparison of the network lifetime of the LEACH and Ad-LEACH protocols is shown in Figure 5. This simulation shows that Ad-LEACH performs better than LEACH in terms of the network lifetime of the sensor network. Many protocols have been proposed for energy-efficient WSNs; here the performance of the Ad-LEACH protocol is compared with three other main protocols, CBRP, HCR and En-LEACH, and the performance of all four is measured against the base protocol, LEACH. Table II shows this comparative analysis.

TABLE II. Comparative study of CBRP, HCR, En-LEACH and Ad-LEACH

Selection of CH method. CBRP: node with the highest residual energy. HCR: one node from the head-set, by round robin. En-LEACH: probability method. Ad-LEACH: the node's residual energy, and the node should not have served as a CH recently.

Network lifetime compared with LEACH. CBRP: 80% more. HCR: 30% more. En-LEACH: 60% more. Ad-LEACH: 50% more.

Data transfer mechanism between CH and BS. CBRP: minimal spanning tree with single-hop communication. HCR: decided by a Genetic Algorithm (GA). En-LEACH: CDMA. Ad-LEACH: TDMA.

Special feature. CBRP: only the spanning-tree root node communicates with the BS. HCR: head-set approach. En-LEACH: all cluster members are kept informed about the energy status of their cluster head. Ad-LEACH: a node which has not served as a CH recently will become a CH.

Drawback. CBRP: ties when the same PSV (Parent Selection Value) comes up for two nodes. HCR: the GA needs improvement while generating the hierarchical structure. En-LEACH: saves less energy than CBRP. Ad-LEACH: a node with more energy that served as CH recently will not be the next CH.

VI. CONCLUSIONS
Routing in wireless sensor networks for a smart home is an emerging area of research. Since sensor networks are designed for specific applications, designing efficient routing protocols for them is very important. In this paper we discussed Ad-LEACH, a cluster-based routing protocol which can be used in smaller network applications, like a smart home, to minimize the energy usage of the entire network. Ad-LEACH is better than LEACH in terms of CH selection, optimizing CH energy consumption and extending network lifetime.

REFERENCES
[1] Wendi Rabiner Heinzelman, Anantha Chandrakasan, and Hari Balakrishnan. 2000. Energy-Efficient Communication Protocol for Wireless Microsensor Networks. IEEE Proceedings of the 33rd Hawaii International Conference on System Sciences.
[2] Bager Zarei, Mohammad Zeynali and Vahid Majid Nezhad. 2010. Novel Cluster Based Routing Protocol in Wireless Sensor Networks. IJCSI International Journal of Computer Science.
[3] Sajid Hussain and Abdul W. Matin. Hierarchical Cluster-based Routing in Wireless Sensor Networks. IPSN, US.
[4] Suyog Pawar, Prabha Kasliwal. 2012. Design and Evaluation of En-LEACH Routing Protocol for Wireless Sensor Network. IEEE International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery.
[5] Debnath Bhattacharyya, Tai-hoon Kim, and Subhajit Pal. 2010. Comparative Study of Wireless Sensor Networks and Their Routing Protocols.
[6] R. Kavitha, Dr. G. M. Nasira, Dr. M. Nachamai. 2012. Smart home systems using wireless sensor network: a comparative analysis. International Journal of Computer Engineering & Technology (IJCET).
[7] Suyog Pawar, Prabha Kasliwal. Design and Evaluation of En-LEACH Routing Protocol for Wireless Sensor Network.




Troubleshooter: Solution Finder for Log Errors from Multiple Solution Sources
Smita B Patil, Dr. D R Shashikumar
M.Tech Student, Dept. of CSE, CiTech, Bangalore, Karnataka, India
smitapatil.july90@gmail.com

Abstract: Currently, one of the challenges for the escalation team is to have a unified tool through which all the knowledge bases can be searched together. With this tool, Troubleshooter, we facilitate the team to pull information from all the knowledge sources, such as the engineering KB, Documentation and JIRA, or even from external Google. Further, within the engineering knowledge base we have multiple sources, like the bug tracking system, the Forum and even the customer-searchable knowledge base. With this proposed tool, "Troubleshooter: Solution Finder for Log Errors", we integrate all the knowledge sources under one umbrella through which the team can search. This reduces the time taken to search all knowledge sources independently and helps the team correlate the data available with other knowledge sources. Additionally, the tool has the capacity to search based on the product selected, and thus helps in filtering the results that are essential for the selected product.

Index Terms: Search solutions in single solution sources, search solutions in multiple solution sources, history for search results, product-based search.

I. INTRODUCTION
The escalation team in industry examines log files manually for errors. Once an employee comes up with errors in the log file, his/her next target is to find solutions for the obtained error(s) with respect to the products.

With the proposed system, Troubleshooter, we facilitate the team to pull information from all the knowledge sources, such as the engineering KB, Documentation and JIRA, or even from external Google. Further, within the engineering knowledge base we have multiple sources, like the bug tracking system, the Forum and even the customer-searchable knowledge base. With this proposed tool, "Troubleshooter: Solution Finder for Log Errors", we integrate all the knowledge sources under one umbrella through which the team can search.

The tool provides a solution search across different solution sources with respect to products like SRM, NCM/UIM, ViPR, SMARTS and Watch4Net.

The Troubleshooter also provides a search history for the user; with the search history, one can go directly back to previous search operations. The GUI of the tool is very simple and user friendly. Troubleshooter is a standalone tool, so it needs to be installed on each user's personal computer. The tool reduces the time taken to search all knowledge sources independently and helps the team correlate the data available with other knowledge sources. The results from the different solution sources are displayed on different tabs of a tabbed pane, where each tab displays the name of a solution source, so it is easy for the user to know where the solutions were fetched from.

The tool is also provided with a list of standard Java errors and some frequently occurring errors for the different products: SRM, NCM/UIM, ViPR, SMARTS and Watch4Net. By providing this list of errors, Troubleshooter lets the user directly select an error from the list and perform the search operation instead of typing the error manually.

The tool is very useful for finding solutions from different solution sources at a time, as there is no other application that provides solutions from multiple solution sources at a time. Initially the tool authenticates the user and never asks again for authentication. Without this tool, one needs to authenticate with every solution source individually: the user must give his/her username and password manually multiple times, whenever a solution source needs authentication, which is an overhead for the user.

II. Existing System
There are many existing systems to find errors in log files, for example Xpolog, Log Analyzer, Event Log Analyzer, Piwik, nxlog and Octopussy. These log analyzing systems available in the market are used to analyze a log file and find the errors in it, but there is no existing system that finds solutions for the errors obtained after a log search. From this we can conclude that there is no existing system that provides an efficient way of finding solutions from different solution sources within the same framework. There are, however, knowledge sources from which the user can get solutions individually, irrespective of the other knowledge bases.

III. Solution search in multiple solution sources
The objective of the tool "Troubleshooter: Solution Finder for log errors from Multiple Solution Sources" is to provide an efficient way of finding solutions for log errors from single or multiple solution sources, and to display the solutions within the same framework. One can easily correlate the data available with all the knowledge sources and find the most relevant solution among all the results obtained from multiple solution sources for an error obtained by a log file search.
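The "one umbrella" search can be pictured as a thin dispatcher that fans a query out to each selected source and groups the results per source for the tabbed display. The sketch below is our own minimal Python illustration, with hypothetical source names and a made-up search interface, not the tool's actual code.

# Sketch (ours): fan-out search over multiple solution sources.
def search_all(query, product, sources):
    """sources maps a source name (e.g. "KB", "JIRA", "Forum") to a
    callable search_fn(query, product) -> list of result strings.
    Returns {source_name: results}, one entry per display tab."""
    results_by_tab = {}
    for name, search_fn in sources.items():
        try:
            results_by_tab[name] = search_fn(query, product)
        except Exception as exc:   # a failing source must not block the rest
            results_by_tab[name] = ["search failed: %s" % exc]
    return results_by_tab

# Example wiring with stub sources:
sources = {"KB":   lambda q, p: ["KB article on '%s' for %s" % (q, p)],
           "JIRA": lambda q, p: ["JIRA issue mentioning '%s'" % q]}
for tab, hits in search_all("NullPointerException", "ViPR", sources).items():
    print(tab, "->", hits)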




The tool lets the user perform a search in single or multiple solution sources: Knowledge Base, ClearQuest, JIRA, Forum and Documentation. We integrate all the knowledge sources under one umbrella through which the team can search. This reduces the time taken to search all knowledge sources independently and helps the team correlate the data available with other knowledge sources. Additionally, the tool has the capacity to search based on the product selected, and thus helps in filtering the results that are essential for the selected product: SRM, NCM, ViPR, SMARTS or Watch4Net.

The tool also provides a search history for the user: if the user faces the same errors again and again very frequently, the history lets him/her go back to previous search operations to get the relevant solution, instead of searching the multiple solution sources for the same error again. The GUI of the tool is very simple and user friendly, and tool tips are provided for all components of the tool so that one can easily operate it. The tool is standalone, so it needs to be installed on each user's personal computer. It reduces the user's workload by giving an option to search for solutions to an error in multiple solution sources. The results from the different solution sources are displayed on different tabs of the tabbed pane, where each tab displays the name of a solution source, so it is easy for the user to see where the solutions were fetched from; the tool also provides product-filtered search results, where the search is restricted to the particular product the user requires.

The tool is also provided with some standard Java errors and a set of errors for each of the products SRM, NCM/UIM, ViPR, SMARTS and Watch4Net. This helps the user directly select an error from the provided list and perform the search operation instead of typing the error manually.

The tool is very useful for finding solutions from different solution sources because it integrates all the knowledge sources under one umbrella through which the team can search. Initially the tool authenticates the user and never asks again for authentication. Without this tool, one needs to authenticate with all solution sources individually, irrespective of the other solution sources; the user must give his/her username and password manually multiple times, whenever a solution source needs authentication, which again is an overhead for the user.

[Fig 1. System architecture of the tool Troubleshooter: Solution Finder for log errors from Multiple Solution Sources]

The errors obtained from the log file search are given as input to the tool. The tool then asks the user to input the name of the solution source(s) and the name of the product, based on the user's requirements. After getting proper input from the user, the tool searches for the given error in the specified solution sources for the specified product. After the solution search, the tool displays the output from the different solution sources on different tabs of the display.

IV. Search History
A history entry contains information such as: Product selected, the type of product that the user has selected; Searched for, the query for which the user wants to find a solution from multiple solution sources with respect to a product; Solution Sources, the names of the solution source(s) where the user wants to search the query; and the date and time of the search.

V. Results

A. Initial Screen

[Fig 2. Initial screen of the tool Troubleshooter: Solution Finder for log errors from Multiple Solution Sources]

This snapshot shows how the initial page looks. The initial page contains the knowledge base search, where the user needs to select knowledge sources and products based on his/her requirements and post the query in the text field. The query can be entered manually by the user, selected from the standard Java errors list, or selected from the product-based errors that the tool provides.

The solution search takes place by searching for the query posted in the text field in the selected knowledge bases for the selected product. The query the user wants to search may be a standard error, a product-based error, an error found after a log search, or manually typed by the user. Results obtained from the multiple solution sources are displayed on the display window, on different tabs of the tabbed pane, where each tab displays the name of a solution source, so it is easy for the user to know where the solutions were fetched from. One can see the different tabs with the knowledge base names in the initial screen, Fig 2.
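The history entry described in Section IV maps naturally onto a small record, one per row of the history table shown next. The field names below are ours, following the four items listed above; this is an illustrative sketch, not the tool's data model.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class HistoryEntry:
    product: str                 # Product selected
    searched_for: str            # the query
    solution_sources: list       # names of the sources searched
    timestamp: datetime = field(default_factory=datetime.now)

history = []
history.append(HistoryEntry("ViPR", "OutOfMemoryError", ["KB", "JIRA"]))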




B. History Table
The history of the searched results is shown in the history tab of the display window, as shown in Fig 3. A history entry contains information such as: Product selected, the type of product that the user has selected; Searched for, the query for which the user wants to find a solution from multiple solution sources with respect to a product; Solution Sources, the names of the solution source(s) where the user wants to search the query; and the date and time of the search.

[Fig 3. History of the searched results]

C. Solutions from Solution Sources

[Fig 4. Solutions from the CQ knowledge source with respect to the ViPR product, for a query selected from the product-based errors]

Fig 4 shows how the tool displays the results from the CQ knowledge base for the product ViPR and for a query selected from the product-based errors. The solutions from the CQ knowledge base are displayed in the CQ tab of the display. The user can further view detailed information by clicking on the links displayed on the display window. Similarly, the tool searches for solutions in the other solution sources, and the obtained solutions are displayed on different tabs of the tabbed pane on the display window, where each tab displays the name of a solution source, so it is easy for the user to see where the solutions were fetched from.

VI. Conclusion
With the tool Troubleshooter: Solution Finder for log errors from Multiple Solution Sources, one can search single or multiple solution sources and get solutions from each of them. Using this proposed tool, we integrate all the knowledge sources under one umbrella through which the team can search. This reduces the time taken to search all knowledge sources independently and helps the team correlate the data available with other knowledge sources. Additionally, the tool has the capacity to search based on the product selected, and thus helps in filtering the results that are essential for the selected product. The tool also provides a history of searched results; with this history the user can go back to a previous search if he/she comes up with the same errors again.

REFERENCES
i. http://www.xpolog.com/
ii. https://eventloganalyzer.codeplex.com/
iii. http://en.wikipedia.org/wiki/Piwik
iv. https://toolbox.googleapps.com/apps/loganalyzer/
v. http://en.wikipedia.org/wiki/JIRA
vi. http://www.google.com/custom?q
vii. http://nxlog-ce.sourceforge.net/
viii. http://sourceforge.net/projects/syslog-analyzer/
ix. https://confluence.atlassian.com/display/JIRA/JIRA+Requirements




A Web Enabled Wireless Sensor Networks System for Precision Agriculture Applications Using Internet of Things
SOWMYA L1, KRISHNA KUMAR P R2
¹Student of M.Tech(CSE), IV Semester

² Associate Professor, Dept. of M.Tech(CSE), Cambridge Institute of Technology, Bengaluru-036

sowmyaklrsv@gmail.com, rana.krishnakumar@citech.edu.in

ABSTRACT: Environmental monitoring systems and sensor systems have increased in importance over the years. However, increases in measurement points mean increases in installation and maintenance cost; moreover, the measurement points, once built and installed, can be tedious to relocate in the future. Therefore, the purpose of this work is to present a project called "A web enabled wireless sensor network system for precision agriculture applications using Internet of Things", which is capable of intelligently monitoring agricultural conditions in a pre-programmed manner. The proposed system consists of three stations: sensor node, router and server. To allow for better monitoring of the climate in an agricultural environment such as a field or greenhouse, the sensor station is equipped with several sensor elements, such as temperature, humidity, pollution and soil moisture. The communication between the sensor node and the server is achieved via wireless ZigBee modules. The overall system architecture shows advantages in cost, size, flexibility and power. It is believed that the outcomes of the project allow for opportunities to perform further research and development of a ZigBee-based wireless sensor network as a portable and flexible type of sensing system for an agricultural environment.

Index Terms: Internet of Things, Precision Agriculture, ZigBee-based Wireless Sensor Network.

1. INTRODUCTION
Agriculture products are dependent upon environmental factors, and plant growth and development are largely affected by the conditions experienced. Similarly, diseases that occur due to environmental factors can significantly affect plant growth.

Agriculture environments such as fields and greenhouses allow growers to produce plants with an emphasis on agricultural yield and productivity. In addition, they provide the possibility to grow plants in environments previously not suited for the task. In particular, the use of a greenhouse provides plants with protection from harsh weather conditions and diseases, and with a controlled environment.

Agriculture environments are complex systems in which significant changes in one environmental factor can have an adverse effect on another. Environmental factors can affect survival and growth, in particular with regard to germination, sprouting, flowering and fruit development. They can also indicate increased risk of disease and be used for prediction of upcoming changes in the environment. It is therefore of particular interest to monitor these environmental factors, especially for any control and management systems that might be implemented. Temperature, humidity, pollution and soil moisture are the variables of interest to growers. Manual collection of data for the desired factors can be sporadic rather than continuous, and produces variations from incorrect measurement taking; this can make it difficult to control these important factors. Sensor networks have been deployed for a wide variety of applications, and awareness has increased with regard to implementing the technology in agricultural environments. Sensors are becoming the solution to many existing problems in industry, with their ability to operate in a wide range of environments.

Sensor nodes can reduce the time and effort required to monitor an environment. This method reduces the risk of information being lost or misplaced, and allows placement in critical locations without putting personnel at risk. Monitoring systems can permit quicker response times to adverse factors and conditions, better quality control of produce, and lower labour cost. This technology allows remote measurement of factors such as temperature, humidity, soil moisture and pollution.

In the agriculture field of study, sensing devices are mainly necessary for two intentions:
i. To sense and communicate with actuators
ii. To sense the parameters and send the information to a remote base station for expert analysis

In this paper an attempt has been made to develop KrishiSense, a web-enabled WSN system for agriculture applications using IoT, which integrates the Open Geospatial Consortium's Sensor Web Enablement standards on the sensing system, thereby enabling interoperability between different standardized sensing devices. KrishiSense is an interconnection
between multiple researchers, scientists, farmers and the extension community through multiple protocols and distributed web-connected platforms, thus facilitating human participatory sensing.

2. WORKING PRINCIPLE
In this project we develop a system based on precision agriculture. We use a temperature sensor, a humidity sensor, a soil moisture sensor and a pollution sensor to monitor the different parameters in the agriculture system, and a ZigBee module to transmit the readings wirelessly to the PC and update them on the server. From the server one can get updates of all the sensor information about the agriculture. The soil moisture and temperature sensors are analog sensors connected to the ADC; the humidity sensor and the carbon dioxide and carbon monoxide sensors are digital sensors connected directly to the controller. We monitor all these sensors, and the data they collect is updated on the PC and then on the server. The owner can remotely see any activity on his/her PDA or smartphone by going to the server, making this a very efficient way of monitoring the agriculture and increasing productivity.

3. SYSTEM ARCHITECTURE
The system architecture consists of three parts: i. the sensing part, ii. the server part, and iii. the client part (Fig 1). All three parts are interconnected to the internet, forming a WSN-based closed-loop Internet of Things platform. The architecture uses a low-powered system-on-chip single-board computer as a data collection and dissemination platform, which also acts as a Wi-Fi hotspot for other sensor nodes and an access point for the farmers/stakeholders.

[Figure 1. System Architecture]

3.1 Sensing Part
The sensing system comprises SoC hardware with a Linux operating system and software such as Apache. A number of sensors, such as temperature, humidity and soil temperature, are connected to the hardware, with a minimum sampling interval of 15 minutes. The sensing system transfers the information to the remote server through FTP.

3.2 Server Part
The server part acts as the data assimilation platform of the sensing system. The remote SOS server facilitates the FTP communication with the field level, which in turn simplifies the remote configuration of the field sensor nodes.

3.3 Client Part
The client part of the sensing system can communicate through multiple protocols and multi-modal communication platforms such as mobile and internet. The sensing system configuration tasks (information collection and transfer interval, changes in hardware, fault detection, etc.) are authenticated through a remote configuration module.

4. RESULTS
The web-enabled WSN system for agriculture using IoT has been designed to precisely monitor citrus crop resources, such as soil, water and weather requirements and their management, in Vidarbha, a semi-arid tropical region of Maharashtra, India. The guidelines for automatic weather station deployment are considered for sensor placement in the agriculture sensing system.

The main developments of this project over existing sensing systems are: i. real-time processing of the raw analog voltage information, ii. conversion of raw observations into real values, and iii. real-time update of observations into the SOS database. These facilitate the interoperability of the agriculture system with any other sensing system.

In the figure, the placement of the various sensors is as follows: 1) two sensors each for temperature and humidity are placed at a height of 5 m from the ground, 2) similarly, two sensors are placed at a height of 2.5 m from the ground, 3) at a radial distance of 5 m from an orange tree, two soil temperature sensors and one soil moisture sensor are placed at 15, 30 and 30 cm depth respectively, and 4) the remaining channel is closed using a pull-down resistor (Figure 2).
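The sensing-part data path of Section 3.1 (sample at a minimum 15-minute interval, then push the observations to the remote server over FTP) can be sketched as follows. The sensor read-out, server address and credentials are placeholders of ours, not the deployed KrishiSense code.

import csv, io, time
from datetime import datetime
from ftplib import FTP

SAMPLE_INTERVAL_S = 15 * 60   # minimum 15-minute setting from Section 3.1

def read_sensors():
    # Placeholder: on the real node these come from the ADC / digital bus.
    return {"air_temp_c": 0.0, "humidity_pct": 0.0,
            "soil_temp_c": 0.0, "soil_moisture": 0.0}

def push_observation(host="sos.example.org", user="node", password="secret"):
    """Write one timestamped CSV row and upload it over FTP."""
    ts = datetime.now().strftime("%Y%m%d%H%M%S")
    buf = io.StringIO()
    csv.writer(buf).writerow([ts] + list(read_sensors().values()))
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.storbinary("STOR obs_%s.csv" % ts,
                       io.BytesIO(buf.getvalue().encode()))

while True:
    push_observation()
    time.sleep(SAMPLE_INTERVAL_S)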



[Figure 2. Agricultural sensing system field layout]

5. CONCLUSION
The study showed that, with the selection of a proper data processing location, a real-time agro-meteorological sensing system can be designed. The web-enabled WSN system for agriculture using IoT helps plants grow in better conditions, and manpower can be reduced by applying this method. It also helps out-of-season plants to be grown, so that changes in weather need not affect plant growth. Finally, the main application is that the system can be used to find out which geographical areas are suitable for agriculture.

REFERENCES
i. S. Li, J. Cui, Z. Li, "Wireless Sensor Network for Precise Agriculture Monitoring", Fourth International Conference on Intelligent Computation Technology and Automation, Shenzhen, China, March 28-29, 2011.
ii. F. Soto, J. Suardiaz, et al., "Wireless Sensor Networks for Precision Horticulture in Southern Spain", Computers and Electronics in Agriculture, vol. 68, pp. 25-35, 2009.
iii. V.M. Patil, S.S. Durbha, J. Adinarayana, "Standards-Based Sensor Web for Agro-Informatics Applications", International Geoscience and Remote Sensing Symposium (IGARSS 2012), Munich, Germany, pp. 6681-6684, 22-27 July, 2012.
iv. A. Piedra, F.B. Capistros, F. Dominguez, A. Touhafi, "Wireless Sensor Networks for Environmental Research: A Survey on Limitations and Challenges", EuroCon 2013, Zagreb, Croatia, 1-4 July, 2013.
v. I. Mampentzidou, E. Karapistoli, A.A. Economide, "Basic Guidelines for Deploying Wireless Sensor Networks in Agriculture", Fourth International Workshop on Mobile Computing and Networking Technologies, pp. 864-869, 2012.
vi. M. Botts, A. Robin, J. Davidson, I. Simonis, "OpenGIS Sensor Web Enablement (SWE) Architecture Document", Open Geospatial Consortium, OGC 06-021r1, 2006, http://www.opengeospatial.org/standards, accessed on: 26 Nov., 2013.




A Survey on Developing an Efficient Optimization Technique for Cluster Head Selection in Wireless Sensor Network
Sridhar R.1, Dr. N Guruprasad2
1 Dept. of ISE, Global Academy of Technology, Bangalore
2 Dept. of CSE, Raja Rajeswari College of Engineering, Bangalore
srimln@yahoo.com, nguruprasad18@gmail.com
Abstract - This paper proposes an efficient optimization technique for the selection of the cluster head with distinctive features, which will produce a dynamic, energy-efficient wireless sensor network. It proposes an energy-efficient artificial bee colony (ABC) algorithm for cluster head selection in wireless sensor networks. The optimization is in the following sense: the wireless sensor network chooses the cluster head and then gives efficient sensor readings with a smaller number of nodes and reduced energy consumption. The cluster heads are planned to perform effective aggregation of the sensor readings. The sensor readings are added dynamically to the cluster head from the sensor nodes; the cluster head then aggregates the sensor readings and passes them to the base station (BS). The expected outcome of the paper is a remarkable reduction of energy consumption because of the dynamic and efficient optimization technique for the selection of the cluster head in wireless sensor networks.

Keywords - Wireless Sensor Networks, cluster head, optimization

1. Introduction
A wireless sensor network consists of tiny sensing devices, which normally run on battery power. Sensor nodes are densely deployed in the region of interest. Each device has sensing and wireless communication capabilities, which enable it to sense and gather information from the environment and then send the data and messages to other nodes in the sensor network or to the remote base station [16]. Wireless sensor networks have been envisioned to have a wide range of applications in both military and civilian domains [10]. Due to the limited energy of sensor nodes, researchers have designed many energy-efficient routing protocols to prolong the lifetime of sensor networks [21]. The energy source of sensor nodes in wireless sensor networks (WSN) is usually a battery, which is undesirable, even impossible, to recharge or replace. Therefore, improving the energy efficiency and maximizing the network lifetime are the major challenges in sensor networks [20]. Considering the limited energy capabilities of an individual sensor, a sensor node can sense only a very limited area, so a wireless sensor network has a large number of sensor nodes deployed at very high density (up to 20 nodes/m) [22, 5, 12], which causes severe problems such as scalability, redundancy, and radio channel contention [16].

Energy efficiency is one of the core challenges in wireless sensor networks because energy is scarce and valuable. In order to minimize energy expenditure and maximize network lifetime, numerous energy-aware protocols and algorithms for wireless sensor networks (e.g. S-MAC [11], SPIN [19]) have been proposed by researchers [14]. In order to reduce the amount of traffic in the network, we build clusters of sensor nodes as proposed in e.g. [15, 18, 7]. In wireless sensor networks data transmission is very expensive in terms of energy consumption, while data processing consumes significantly less [9]. Minimizing the number of communications by eliminating or aggregating redundant sensed data saves a large amount of energy [13]. Hierarchical or cluster-based routing techniques are well known, with special advantages related to scalability and efficient communication. As such, the concept of hierarchical routing is also utilized to perform energy-efficient routing in WSNs. In a hierarchical architecture, higher-energy nodes can be used to process and send the information, while low-energy nodes can be used to perform the sensing in the proximity of the target. Some of the routing protocols in this group are LEACH [14], PEGASIS [11], TEEN [19] and APTEEN [6]. Placing a few heterogeneous nodes in a wireless sensor network is an effective way to increase network lifetime and reliability.

However, the LEACH algorithm selects the cluster heads dynamically and frequently by a round mechanism, which makes the cluster heads broadcast messages to all the general nodes in the establishment stage, with additional energy consumption. Thus, modification of clustering algorithms becomes inevitable for energy-efficient wireless sensor networks [4].

The concept of cluster heads alleviates some problems of hierarchical clustering to an extent: some sensor nodes become cluster heads and collect all traffic from their respective clusters. The cluster head aggregates the collected data and then sends it to the base station [6]. The nodes in wireless sensor networks are often assumed to be organized into clusters. A typical scenario is that sensor readings are first collected in each cluster by a designated node, known as the cluster head, which aggregates them and sends only the result of the aggregation to the base station [1]. In a homogeneous network, the cluster head uses more energy than non-cluster-head nodes; as a result, network performance decreases, since the cluster head nodes go down before other nodes do. Besides, if the algorithm strives to balance the energy consumption of every node, cluster heads will be selected dynamically and frequently, which results in additional energy consumption for the cluster head set-up. At the same time, some residual energy of general nodes cannot be used effectively, because they frequently receive broadcast messages from the new cluster head. Thus, an energy-efficient cluster head selection algorithm is a very important issue in clustered WSNs. Hence, defining the optimization technique for
cluster head selection minimizes the consumption of energy in the WSN.

2. Literature Survey
Our work is motivated by a number of prior works related to clustering in wireless sensor networks. Some of them are analyzed here.

Ozlem Durmaz Incel et al. [17] have proposed a method for fast data collection in tree-based wireless sensor networks. In their work, they explored and evaluated a number of different techniques using realistic simulation models under the many-to-one communication paradigm known as convergecast. They first consider time scheduling on a single frequency channel, with the aim of minimizing the number of time slots required (the schedule length) to complete a convergecast. Next, they combined scheduling with transmission power control to mitigate the effects of interference, and show that while power control helps in reducing the schedule length under a single frequency, scheduling transmissions using multiple frequencies is more efficient. They gave lower bounds on the schedule length when interference is completely eliminated, and propose algorithms that achieve these bounds. They also evaluated the performance of various channel assignment methods and find empirically that for moderate-size networks of about 100 nodes, the use of multi-frequency scheduling can suffice to eliminate most of the interference. Then the data collection rate no longer remains limited by interference but by the topology of the routing tree. To this end, they constructed degree-constrained spanning trees and capacitated minimal spanning trees, and show significant improvement in scheduling performance over different deployment densities. Lastly, they evaluated the impact of different interference and channel models on the schedule length.

Guoliang Xing et al. [8] have proposed a rendezvous-based data collection approach in which a subset of nodes serve as rendezvous points that buffer and aggregate data originating from sources and transfer it to the base station when it arrives. This approach combines the advantages of controlled mobility and in-network data caching, and can achieve a desirable balance between network energy saving and data collection delay. They proposed efficient rendezvous design algorithms with provable performance bounds for mobile base stations with variable and fixed tracks, respectively. The effectiveness of their approach was validated through both theoretical analysis and extensive simulations.

Dan Wu et al. [3] have proposed a method that focuses on how to select a proper transmission scheme, with the goal of improving the energy efficiency, e.g., prolonging the network lifetime. In particular, they model the transmission scheme selection problem as a non-transferable coalition formation game, with the characteristic function based on the network lifetime. Then, a simple algorithm based on a merge-and-split rule and the Pareto order is proposed to form coalition groups among individual sensor nodes. The resulting coalitional structure is characterized through novel stability notions, and shows which transmission scheme is employed and which cluster nodes are chosen to collaborate with the cluster head. Extensive simulation results are provided to demonstrate the effectiveness of their proposed game model and algorithm.

Dali Wei et al. [2] have proposed a distributed clustering algorithm, Energy-efficient Clustering (EC), that determines suitable cluster sizes depending on the hop distance to the data sink, while achieving approximate equalization of node lifetimes and reduced energy consumption levels. They additionally proposed a simple energy-efficient multi-hop data collection protocol to evaluate the effectiveness of EC and calculate the end-to-end energy consumption of this protocol; yet EC is suitable for any data collection protocol that focuses on energy conservation. Performance results demonstrate that EC extends network lifetime and achieves energy equalization more effectively than two well-known clustering algorithms, HEED and UCR.

Otgonchimeg Buyanjargal and Youngmi Kwon [22] have proposed a modified algorithm of the Low Energy Adaptive Clustering Hierarchy (LEACH) protocol, which is a well-known energy-efficient clustering algorithm for WSNs. Their modified protocol, called "Adaptive and Energy Efficient Clustering Algorithm for Event-Driven Application in Wireless Sensor Networks (AEEC)", is aimed at prolonging the lifetime of a sensor network by balancing the energy usage of the nodes. AEEC gives the nodes with more residual energy more chances to be selected as cluster head. They also used elector nodes, which took the responsibility of collecting energy information from the nearest sensor nodes and selecting the cluster head. They compared the performance of their AEEC algorithm with the LEACH protocol using simulations.

Dilip Kumar et al. [4] have studied the impact of heterogeneity of nodes, in terms of their energy, in wireless sensor networks that are hierarchically clustered. They have assumed that a percentage of the population of sensor nodes is equipped with additional energy resources. They also assumed that the sensor nodes are randomly distributed and not mobile, and that the coordinates of the sink and the dimensions of the sensor field are known. Homogeneous clustering protocols assume that all the sensor nodes are equipped with the same amount of energy, and as a result they cannot take advantage of the presence of node heterogeneity. Adapting this approach, they introduced an energy-efficient heterogeneous clustered scheme for wireless sensor networks, based on weighted election probabilities of each node to become a cluster head according to the residual energy in each node. Finally, the simulation results demonstrated that their proposed heterogeneous clustering approach is more effective in prolonging the network lifetime compared with LEACH.

Xiang Min et al. [20] have presented a clustering algorithm that mainly takes into account reducing the total energy consumption with optimum parameters. By optimizing the one-hop distance and the clustering angle, all nodes are divided into static clusters
with different sizes, which was maintained the connectivity and References
reduce the energy consumption for inter-cluster communication. Buttyan, L. and Holczer, T., “Perfectly anonymous data aggregation in wireless
sensor networks Mobile Adhoc and Sensor Systems (MASS), IEEE 7th
Besides, with continuous working mechanisms for cluster head
International Conference on, pp. 513-528, 2010.
acting as the local control center, the frequency of cluster head i. Dali Wei, Yichao Jin, SerdarVural, Klaus Moessner, Rahim Tafazolli,
updating was reduced, and the energy consumption for the new "An Energy Efficient Clustering Solution for Wireless Sensor Networks", IEEE
cluster head set-up was reduced. With the clustering algorithm, Transactions on Wireless Communications, VOL. 10, NO. 11, 2011
ii. Dan Wu, YuemingCai, Jinlong Wang, " A Coalition Formation
the total energy consumption for inter-cluster and intra-cluster
Framework forTransmission Scheme Selection in Wireless Sensor Networks",
communications was reduced. The simulation results show that IEEE Transactions OnVehicular Technology, VOL. 60, No. 6 2011.
the system time is extended effectively. iii. Dilip Kumar, Trilok C. Aseri, R.B. Patel, “EEHC: Energy efficient
heterogeneous clustered scheme for wireless sensor networks”, Computer
3. Proposed Methodology

In 2005, Karaboga proposed the Artificial Bee Colony (ABC) algorithm, which is based on a particular intelligent behaviour of honeybee swarms. ABC was developed by inspecting the behaviour of real bees in finding nectar and sharing information about food sources with the bees in the hive. The agents in ABC are the Employed Bee, the Onlooker Bee and the Scout.

• The Employed bee: stays on a food source and keeps the neighbourhood of the source in its memory.
• The Onlooker bee: gets the information about food sources from the employed bees in the hive and selects one of the food sources to gather nectar from.
• The Scout: is responsible for finding new food sources and new nectar.

Procedure of ABC:
Initialize (move the scouts).
Move the onlookers.
Move the scouts only if the counters of the employed bees hit the limit.
Update the memory.
Check the termination condition.

Thus the ABC procedure optimizes the choice of cluster head, and then gives efficient sensor readings with a smaller number of nodes and reduced energy consumption.
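The procedure above can be made concrete with a small sketch. The following Python mock-up maps ABC onto cluster-head selection; the node model, the fitness function (residual energy discounted by distance to the base station) and all parameter values are illustrative assumptions, not part of the proposed scheme.

import random

# Illustrative node model: (residual_energy, distance_to_base_station).
nodes = [(random.uniform(0.1, 1.0), random.uniform(5.0, 100.0)) for _ in range(50)]

def fitness(i):
    energy, dist = nodes[i]
    return energy / (1.0 + 0.01 * dist)  # favour high energy, short haul

COLONY, LIMIT, CYCLES = 10, 5, 30
sources = [random.randrange(len(nodes)) for _ in range(COLONY)]  # candidate heads
trials = [0] * COLONY  # stagnation counters of the employed bees

def neighbour(i):
    # Local perturbation of a solution: try a nearby node index.
    return min(len(nodes) - 1, max(0, i + random.choice([-2, -1, 1, 2])))

for _ in range(CYCLES):
    for k in range(COLONY):  # employed bees: search around each source
        cand = neighbour(sources[k])
        if fitness(cand) > fitness(sources[k]):
            sources[k], trials[k] = cand, 0
        else:
            trials[k] += 1
    total = sum(fitness(s) for s in sources)
    for _ in range(COLONY):  # onlookers: pick sources proportionally to fitness
        r, acc = random.uniform(0.0, total), 0.0
        for k in range(COLONY):
            acc += fitness(sources[k])
            if acc >= r:
                cand = neighbour(sources[k])
                if fitness(cand) > fitness(sources[k]):
                    sources[k], trials[k] = cand, 0
                break
    for k in range(COLONY):  # scouts: abandon exhausted sources
        if trials[k] >= LIMIT:
            sources[k], trials[k] = random.randrange(len(nodes)), 0

print("selected cluster head:", max(sources, key=fitness))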
4. Objectives

• A study of the recent techniques for developing a cluster head selection technique for wireless sensor networks.
• Developing an optimization technique for the selection of the cluster head in a dynamic environment in a wireless sensor network, to achieve energy efficient aggregation of sensor readings from the cluster head to the base station (BS).

Analysis of the proposed technique using various simulation set-ups against different existing techniques.

5. Possible outcome

The expected outcome of the paper is a remarkable reduction of energy consumption because of the dynamic and efficient optimization technique for the selection of the cluster head in sensor networks.

References
Buttyan, L. and Holczer, T., "Perfectly anonymous data aggregation in wireless sensor networks", Mobile Adhoc and Sensor Systems (MASS), IEEE 7th International Conference on, pp. 513-528, 2010.
i. Dali Wei, Yichao Jin, Serdar Vural, Klaus Moessner, Rahim Tafazolli, "An Energy Efficient Clustering Solution for Wireless Sensor Networks", IEEE Transactions on Wireless Communications, vol. 10, no. 11, 2011.
ii. Dan Wu, Yueming Cai, Jinlong Wang, "A Coalition Formation Framework for Transmission Scheme Selection in Wireless Sensor Networks", IEEE Transactions on Vehicular Technology, vol. 60, no. 6, 2011.
iii. Dilip Kumar, Trilok C. Aseri, R.B. Patel, "EEHC: Energy efficient heterogeneous clustered scheme for wireless sensor networks", Computer Communications, vol. 32, pp. 662-667, 2009.
iv. D. Tian and N. D. Georganas, "A Node Scheduling Scheme for Energy Conservation in Large Wireless Sensor Networks", Thesis, Multimedia Communications Research Laboratory, School of Information Technology and Engineering, University of Ottawa, 2002.
v. Ewa Hansen, Jonas Neander, Mikael Nolin, Mats Björkman, "Efficient Cluster Formation for Sensor Networks", MRTC report ISSN 1404-3041 ISRN MDH-MRTC-199/2006-1-SE, Mälardalen Real-Time Research Centre, Mälardalen University, March 2006.
vi. G. Pei and C. Chien, "Low Power TDMA in Large Wireless Sensor Networks", Military Communications Conference, vol. 1, pp. 347-351, 2001.
vii. Guoliang Xing, Minming Li, Tian Wang, Weijia Jia and Jun Huang, "Efficient Rendezvous Algorithms for Mobility-Enabled Wireless Sensor Networks", IEEE Transactions on Mobile Computing, vol. 11, no. 1, 2012.
viii. Hnin Yu Shwe, Jiang Xiao-hong, Susumu Horiguchi, "Energy saving in wireless sensor networks", Journal of Communication and Computer, vol. 6, no. 5, 2009.
ix. I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, "A survey on sensor networks", IEEE Communications Magazine, pp. 102-114, 2002.
x. J. Kulik, W. Heinzelman, and H. Balakrishnan, "Negotiation-based protocols for disseminating information in wireless sensor networks", Wireless Networks, vol. 8, no. 2/3, pp. 169-185, 2002.
xi. J.M. McCune, "Adaptability in sensor networks", Undergraduate Thesis in Computer Engineering, University of Virginia, April 2003.
xii. K. Intae and R. Poovendran, "Maximizing static network lifetime of wireless broadcast ad hoc networks", in Proceedings of the IEEE International Conference on Communications, pp. 2256-2261, 2003.
xiii. Liyang Yu, Neng Wang, Wei Zhang and Chunlei Zheng, "GROUP: a Grid-clustering Routing Protocol for Wireless Sensor Networks", in Proceedings of Wireless Communications, Networking and Mobile Computing, pp. 1-5, 2006.
xiv. M. Gerla, T. Kwon, and G. Pei, "On Demand Routing in Large Ad Hoc Wireless Networks with Passive Clustering", in Proceedings of the IEEE Wireless Communications and Networking Conference, pp. 100-105.
xv. Mohammad Zeynali, Leili Mohammad Khanli and Amir Mollanejad, "TBRP: Novel Tree Based Routing Protocol in Wireless Sensor Network", International Journal of Grid and Distributed Computing, vol. 2, no. 4, 2009.
xvi. Ozlem Durmaz Incel, Amitabha Ghosh, Bhaskar Krishnamachari, and Krishnakant Chintalapudi, "Fast Data Collection in Tree-Based Wireless Sensor Networks", IEEE Transactions on Mobile Computing, vol. 11, no. 1, 2012.
xvii. W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "Energy-Efficient Communication Protocol for Wireless Microsensor Networks", in Proceedings of the 33rd International Conference on System Sciences, Maui, Hawaii, 2000.
xviii. W. Ye, J. Heidemann, and D. Estrin, "An Energy-Efficient MAC Protocol for Wireless Sensor Networks", in Proceedings of IEEE INFOCOM, pp. 1567-1576, 2002.
xix. Xiang Min, Shi Wei-ren, Jiang Chang-jiang and Zhang Ying, "An energy efficient clustering algorithm for maximizing lifetime of wireless sensor networks", AEU - International Journal of Electronics and Communications, vol. 64, no. 4, pp. 289-298, 2010.
xx. Xianghui Wang and Guoyin Zhang, "DECP: A Distributed Election Clustering Protocol for Heterogeneous Wireless Sensor Networks", Computational Science, vol. 4489/2007, pp. 105-108, 2007.
xxi. Ye, M., Li, C.F., Chen, G., Wu, J., "EECS: An Energy Efficient Clustering Scheme in Wireless Sensor Networks", in Proceedings of the IEEE International Performance Computing and Communications Conference, pp. 535-540, 2005.

Spatial and Location Based Rating Systems

Vinay Kumar M., N. Rajesh
Dept. of ISE, The National Institute of Engineering, Mysore.
Affiliated to Visvesvaraya Technological University, Belagavi, Karnataka
{vinaym884, nrajeshin}@gmail.com
Abstract: Recommender systems form an integral part of one's day-to-day activities in today's world of the Internet. We come across recommender systems in many fields in our day-to-day online transactions. But earlier systems are not well equipped with spatial ratings of the items based on location. In this paper we propose an efficient and scalable recommender system which uses location based ratings for rating the items.

Index Terms: spatial ratings, filtering, preference locality, collaborative filtering, recommender systems.

I. INTRODUCTION

With the advent of the Internet we come across different day-to-day transactions being done over networks. When dealing with such online transactions, viz. mobile Internet, online ticket booking, online shopping, online money transfer over a network, etc. (a popular example in today's Internet world is Paytm, a popular method of doing online money transfers, online recharges, mobile banking and so on), we also often come across large inventories with a large number of listed items. In such cases we often look for the ratings provided to different items by different users, so as to shortlist the items based on their recommendations. Often these types of recommendations will lead us to the items which are best suited to our specifications. Even though there are plenty of advantages in utilizing such ratings and recommendations provided by users, there are certain limitations in such systems. They often do not consider the spatial location of the users who have given the ratings. In this paper we utilize location based ratings for the set of items listed in the inventory. For instance, we have web based applications like MovieLens and Netflix where we find location based ratings for different items, provided by different community members. Currently, myriad applications can produce location based ratings that embed user and/or item locations. Basically, there are three classes of location based ratings, namely, spatial ratings for non-spatial items, non-spatial ratings for spatial items, and spatial ratings for spatial items. Certain location aware networks (e.g., Foursquare [4] and Facebook Places [5]) allow users to "check-in" at spatial destinations (e.g., restaurants) and rate their visit, and thus are capable of associating both user and item locations with ratings. Based on the spatial ratings provided by users, new users who are currently unaware of certain new items and their quality listings can learn many hidden things about the different products without actually experiencing them, which saves users' effort and improves efficiency. Hence location aware systems are becoming popular today, and the spatial ratings provided by such systems are better, if not the best, in nature when compared to earlier recommender systems.

II. MATERIAL AND METHODOLOGY

A. Location aware query model

Location based systems use a technique in which recommendations are provided based on the location aware ratings present in the given system. The model supports both continuous as well as snapshot queries.

B. Collaborative filtering technique for item filtering

This is a popular technique used by recommender systems. There are two senses of it, one narrow and the other more general. Collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. [2] Applications of collaborative filtering typically involve very large data sets. This is the general-case definition of collaborative filtering. In the newer, narrow sense, the collaborative filtering technique performs automatic filtering of items based on the preferences or tastes of the users. But the difficulties of this collaborative filtering technique include that it needs active user participation, it requires an easy way of representing the user's interests to the system, and it needs algorithms which can match people with similar kinds of interests.

C. Filtering technique based on content

This is another popular filtering technique used in designing recommender systems. It is based on the description of the given item as well as the preference of the users based on their profiles. A widely used algorithmic approach is vector space representation. The user profile is created based on a model of the user's preferences and the history of the user's interaction with the recommender system. Some novel machine learning techniques such as cluster analysis, decision trees, Bayesian classifiers, and artificial neural networks are used in order to identify the probability that the user is going to like an item. Direct feedback from a user, usually in the form of a like or dislike button, can be used to assign higher or lower weights to the importance of certain attributes (using Rocchio classification or other similar techniques). There are a number of content based recommender systems aimed at providing movie recommendations, such as the Internet Movie Database, Rotten Tomatoes, Jinni etc.

D. Hybrid recommender systems

It is a combination of collaborative filtering as well as content based filtering techniques, and such approaches have been found to be
useful in many scenarios. They can be implemented in several ways, one being to make content-based as well as collaborative-based predictions separately and then combine them. Netflix is a popular example of a hybrid system; it makes comparison-based recommendations from similar kinds of users with similar actions.

E. Certain popular recommender systems

1. Mobile recommender systems: this is a trending area of research in the field of recommender systems. With the advent of smart phones it is now possible to offer personalized, context-sensitive recommendations. This is a very difficult area of research, as mobile data is heterogeneous, noisy and more complex for recommender systems to deal with, and such systems also suffer from the transplantation problem: recommendations may not apply in all regions (for instance, it would be unwise to recommend a recipe in an area where all of the ingredients may not be available). One popular example of a mobile recommender system, with Internet access as a basic requirement, offers potentially profitable routes for taxi drivers in a city. This system takes input data in the form of GPS traces of the routes that taxi drivers took while working, which include location (latitude and longitude), time stamps, and operational status (with or without passengers). It then recommends a list of pickup points along a route that will lead to optimal occupancy times and profits. This type of system is obviously location-dependent, and as it must operate on a handheld or embedded device, the computation and energy requirements must remain low.

2. Risk aware recommender systems: these are intelligent recommender systems which take into account the variety of risks associated with the recommendations and the steps which need to be taken in order to counter them. The performance of the recommender system also depends on the risk factor; hence risk has to be taken into account before providing any recommendations.

F. Preference Locality

Preference locality is a popular methodology which suggests that users from a spatial region (e.g., a neighborhood) prefer items (e.g., movies, destinations) that differ entirely from the items preferred by users in other, even adjacent, areas. This technique is adopted widely in location based rating systems in order to provide recommendations.

Fig:- Similarity calculation based on items.

In order to compute the similarity we use the cosine similarity function, which is based on item similarity. The cosine similarity between items ip and iq is calculated using their co-rated dimensions. Preference locality is the technique used for spatial user ratings for non-spatial items. We require three things for such kinds of recommendations: locality, scalability and influence.

In location aware systems, recommendations on non-spatial items using spatial ratings, i.e., the tuple (user, ulocation, rating, item), are produced by employing a user partitioning technique that exploits preference locality. This technique uses an adaptive pyramid structure to partition ratings by their user location attribute into spatial regions of varying sizes at different hierarchies. For a querying user located in a region R, we apply an existing collaborative filtering technique that utilizes only the ratings located in R.

Travel penalty is the technique we utilize in order to provide recommendations for spatial items using non-spatial ratings, i.e., the tuple (user, rating, item, ilocation). Both techniques, travel penalty and user partitioning, are used together in order to produce recommendations for spatial items using spatial ratings, i.e., the tuple (user, ulocation, rating, item, ilocation).

Fig:- Pyramidal data structure technique for providing recommendations.

III. RESULTS AND TABLES

A. Model building and recommendation generation

Fig:- Item based collaborative model building.

Given a querying user u, recommendations are produced by computing u's predicted rating P(u,i) for each item i not rated by u [9]:

P(u,i) = ( Σ l∈L sim(i,l) · r(u,l) ) / ( Σ l∈L |sim(i,l)| )

Before this computation, we reduce each similarity list L to contain only items rated by user u.
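The computation above can be illustrated with a short sketch. This Python mock-up of item-based model building uses cosine similarity over co-rated users and the weighted-sum prediction; the toy ratings and the helper names (cosine_sim, predict) are assumptions for illustration, and the spatial (user partitioning) step is omitted.

import math

# Toy user-item ratings: ratings[user][item] = score (illustrative data).
ratings = {
    "u1": {"i1": 4, "i2": 5, "i3": 1},
    "u2": {"i1": 5, "i2": 4},
    "u3": {"i2": 2, "i3": 5},
}

def cosine_sim(ip, iq):
    # Cosine similarity of items ip and iq over their co-rated users.
    co = [u for u in ratings if ip in ratings[u] and iq in ratings[u]]
    if not co:
        return 0.0
    dot = sum(ratings[u][ip] * ratings[u][iq] for u in co)
    norm_p = math.sqrt(sum(ratings[u][ip] ** 2 for u in co))
    norm_q = math.sqrt(sum(ratings[u][iq] ** 2 for u in co))
    return dot / (norm_p * norm_q)

def predict(u, i):
    # P(u,i): weighted sum over the items L already rated by user u.
    L = [l for l in ratings[u] if l != i]
    num = sum(cosine_sim(i, l) * ratings[u][l] for l in L)
    den = sum(abs(cosine_sim(i, l)) for l in L)
    return num / den if den else 0.0

print(round(predict("u2", "i3"), 2))   # predicted rating of i3 for u2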

B. Our Contribution to the paper

Recommender systems are not a new topic of discussion; they have had an impact since the invention of the Internet. Even though there have been many improvements from the traditional recommender systems to today's well-equipped recommender systems, there is always scope for improvement. Even in today's location based rating systems there are no proper authentication methods employed to register the users who give their recommendations on the items. Here we propose to design the recommender system with a sophisticated encryption mechanism in order to make sure that the users who provide the ratings are genuine. We also propose to employ an automatic database alteration mechanism, wherein we maintain the database with the recommendations and ratings for an item, and after a certain period of time the database is automatically refreshed in order to eliminate the ratings of inactive users. Only the ratings provided by users active within a certain period of time are retained, because maintaining older recommendations may not be useful, and there is a chance that the quality of the listed items deteriorates over a period of time. We also prefer to maintain the databases in every renowned location and to make a provision for users to give ratings related to their experience by registering themselves on the local website; provision should also be made such that the ratings provided by the users can be monitored only by the central database administrator, not by the owner of the organization.
We adopt the RSA encryption algorithm because of its interoperability: one cannot encrypt or decrypt a message using a different algorithm than the one used to create it, and it is freely available for non-commercial use.

c = ENCRYPT(m) = m^e mod n    (1)
m = DECRYPT(c) = c^d mod n    (2)

The values of m, c, d, e and n are computed by following the standard steps of the RSA algorithm.

Fig:- Depicts the application of RSA in providing both confidentiality and authentication.
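A textbook-sized sketch of equations (1) and (2) follows; the tiny primes and the message are purely illustrative, and a real deployment would use a modulus of at least 2048 bits together with proper padding.

p, q = 61, 53            # toy primes (illustrative only)
n = p * q                # public modulus
phi = (p - 1) * (q - 1)
e = 17                   # public exponent, coprime to phi
d = pow(e, -1, phi)      # private exponent: e*d = 1 (mod phi)

m = 42                   # message, must be less than n
c = pow(m, e, n)         # equation (1): c = m^e mod n
assert pow(c, d, n) == m # equation (2): m = c^d mod n
print(c, pow(c, d, n))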
C. Some experimental results of location aware ratings system

Fig:- Quality experiments for varying locality.

Fig:- Quality experiments for varying answer sizes.

D. Experimental evaluation of location aware systems

The experimental evaluation is based on three data sets. (1) Foursquare: a real data set consisting of spatial user ratings for spatial items derived from Foursquare user histories. (2) MovieLens: a real data set consisting of spatial user ratings for non-spatial items taken from the popular MovieLens recommender system [7]. The Foursquare and MovieLens data are used to test recommendation quality. (3) Synthetic: a synthetically generated data set consisting of spatial user ratings for spatial items for venues in the state of Minnesota, USA; we use this data to test scalability and query efficiency. The quality of the recommendation results is depicted in the figures above.

IV. CONCLUSION

Our proposed spatial-aware recommender rating system tackles problems which were not solved by the earlier traditional recommender systems. Our system deals with three kinds of ratings: (a) spatial ratings for non-spatial items, (b) spatial ratings for spatial items, and (c) non-spatial ratings for spatial items. In addition to these methods, we also add a proper authentication mechanism for the users who give ratings for the items. We adopt travel penalty and user partitioning techniques in order to support spatial items and spatial ratings, respectively. Both these techniques can be

applied separately or in concert to provide location based ratings. The experimental results and the evaluation show that spatial recommender systems are more efficient and scalable than traditional recommender systems.

ACKNOWLEDGMENT

The successful publishing of this paper would be incomplete without mentioning the people who made it possible and whose constant guidance crowned my effort with success. I would like to thank the Principal of our college, Dr. G. L. Shekar, for his wholehearted support in facilitating the publication of this paper. I would like to thank my guide, Sri N. Rajesh, Assistant Professor, Department of ISE, NIE-Mysore, for his valuable inputs and guidance for the presentation of this paper. Finally, I would like to thank all the teaching and non-teaching staff for their co-operation.

REFERENCES
i. G. Linden et al., "Amazon.com Recommendations: Item-to-Item Collaborative Filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, 2003.
ii. "Netflix: http://www.netflix.com."
iii. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," in CSCW, 1994.
iv. "Foursquare: http://foursquare.com."
v. "The Facebook Blog, "Facebook Places": http://tinyurl.com/3aetfs3."
vi. G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," TKDE, vol. 17, no. 6, pp. 734-749, 2005.
vii. "MovieLens: http://www.movielens.org/."
viii. "New York Times - A Peek Into Netflix Queues: http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html."
ix. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-Based Collaborative Filtering Recommendation Algorithms," in WWW, 2001.
x. J. S. Breese, D. Heckerman, and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in UAI, 1998.
xi. W. G. Aref and H. Samet, "Efficient Processing of Window Queries in the Pyramid Data Structure," in PODS, 1990.


Energy Efficient Zone-based Proactive Source Routing to Minimize Overhead in Mobile Ad-hoc Networks

Lakshmi K. M., Levina Tukaram
Dept of CSE, Alpha College of Engineering, Bangalore, India
lakshmi.km682@gmail.com, levinajunias@gmail.com
Abstract: Opportunistic data forwarding has become a hot topic in multihop wireless networking. Opportunistic data forwarding is not used in mobile ad hoc networks (MANETs) due to the lack of an efficient, lightweight, proactive source routing scheme. Proactive Source Routing (PSR) uses Breadth First Spanning Trees (BFSTs) and maintains more network topology information to facilitate source routing. Its overhead is much smaller than that of traditional DV-based protocols, link-state (LS)-based routing protocols and reactive source routing protocols, but the computational and memory overhead involved in maintaining BFSTs to reach every node in denser networks is high. In this paper a Zone-based Proactive Source Routing protocol is proposed. The Zone Routing Protocol (ZRP) uses partition based routing: source routing inside the zone and on-demand routing outside the zone. The advantages of both proactive and zone based routing protocols are combined by this approach. The simulations show that Z-PSR, the zone based proactive source routing protocol, performs better than PSR.

Keywords: PSR, BFST, Link State, Source routing, Ad-hoc Network.

I. INTRODUCTION

A mobile ad-hoc network (MANET) is a self-organized and self-configurable wireless communication network. It represents a complex distributed system that contains various wireless mobile nodes which can freely move and dynamically self-organize into arbitrary and temporary ad-hoc network topologies. It allows people and devices to seamlessly internetwork in areas without pre-existing communication infrastructure, e.g., battlefield communications, emergency operations and disaster recovery environments. A great deal of research results have been published since its early days in the 1980s [i]. The salient research challenges in this area are link access control, security, end-to-end transfer and providing support for real-time multimedia streaming [ii]. In the research on MANETs, the network layer has received a considerable amount of attention; hence a large number of routing protocols with differing objectives for various specific needs have been proposed for this network [iii].

Fig 1: Mobile Ad-hoc Network

Figure 1 shows an example of a mobile ad-hoc network and its communication technology. As shown in the figure, an ad hoc network might consist of several home-computing devices, including laptops, cellular phones, etc. Each node can communicate directly with any other node that resides within its transmission range. To communicate with nodes that reside beyond this range, a node needs to use intermediate nodes to relay the messages hop by hop.

Opportunistic data forwarding utilizes the broadcast nature of wireless communication links [iv] when data packets are handled in a multihop wireless network. In traditional IP forwarding, the intermediate nodes look up a forwarding table to find a dedicated next hop, but opportunistic data forwarding broadcasts the data packet and allows potentially multiple downstream nodes to act on the packet. One of the initial works on opportunistic data forwarding is selective diversity forwarding by Larsson [v]. In that work the transmitter sends the packet to multiple receivers, selects the best forwarder among the receivers which successfully received the data, and requests the selected node to forward the data. The overhead in this approach is high and should be reduced before it can be implemented in practical networks. This issue was addressed in the seminal work on ExOR [vi], which outlines a solution at the link and network layers. In ExOR, all the nodes in the network are enabled to overhear all packets on the air; therefore, more nodes can potentially forward a packet, provided that all the nodes are included in the forwarder list carried by the packet. The contention feature of the medium-access-control (MAC) sublayer is effectively utilized, and hence the forwarder which is closest to the destination will access the medium. Therefore, the MAC sublayer can determine the actual next-hop forwarder to utilize the long-haul transmissions in a better way.

A lightweight proactive source routing (PSR) protocol has been proposed to facilitate opportunistic data forwarding in MANETs. In this protocol, each node maintains a breadth-first search spanning tree of the network rooted at itself. This routing information is periodically exchanged among neighbouring nodes for updated network topology information. Hence PSR allows a node to have full-path information to all other nodes in the network, while the communication cost is only linear in the number of nodes. Thus, it supports both source routing and conventional IP forwarding. But the computational and memory overhead involved in maintaining the BFSTs to reach every node in denser networks will be high.

In this paper, Z-PSR (Zone-based proactive source routing protocol) is proposed, which is lightweight, source routed, uses Breadth First Spanning Trees and is based on PSR [vii] and ZRP [viii].

The remainder of this paper is organized as follows. Section II reviews related work on routing protocols in MANETs. Section III describes the design and implementation details of our proposed Zone-based proactive source routing scheme. The computer simulation, related experimental results, and comparisons between PSR and Z-PSR are presented in Section IV. Section V concludes this paper with a discussion of future research.

II. RELATED WORK

Routing is the process of establishing a path and forwarding packets from a source node to a destination node. MANET routing protocols can be broadly classified into three major categories: proactive, reactive and hybrid, as shown in Figure 2.

Fig 2. Routing Protocols in MANETs (Proactive/Table Driven: DSDV, OLSR; Reactive/On-demand: DSR, AODV; Hybrid: ZRP, TORA)

A. Proactive Routing Protocols or Table-driven Routing Protocols

In a proactive routing protocol, each mobile node maintains a routing table. When a route to a destination is needed, the routing information can be obtained from the routing table immediately. Proactive protocols maintain the table and keep updating it as the topology of the network changes. When it is required to forward data or a packet to a particular node, the route can be easily and immediately obtained from the table, so no time is spent in a route discovery process and a shortest path can be found without any delay. However, in a denser network these protocols are not suitable due to high traffic. The disadvantages of such protocols are high latency in route finding, and excessive flooding, which can lead to network clogging. Examples: DSDV [ix] (Destination-Sequenced Distance-Vector) and OLSR [x] (Optimized Link State Routing).

B. Reactive Routing Protocols or On-demand Routing Protocols

Reactive routing protocols are also called on-demand routing protocols. These protocols are more efficient than proactive routing protocols. The idea behind this type of routing protocol is to find a route between a source and a destination whenever it is needed. By this, the routing overhead can be reduced, whereas the overhead is higher in proactive protocols, since the nodes maintain routes to all other nodes in the network without knowing their state of use. So in reactive protocols it is not necessary to maintain routes which are not currently being used. On-demand or reactive routing protocols avoid the cost of maintaining routes that are not being used. The Ad-hoc On Demand Distance Vector (AODV) [xi] and Dynamic Source Routing (DSR) [xii] protocols are examples of reactive or on-demand protocols.

C. Hybrid Routing Protocols

A hybrid routing protocol is a combination of both proactive and reactive routing protocols. It uses a table-driven approach within a given zone around the node, and a demand-driven approach is then applied outside of that zone. Examples: Zone Routing Protocol (ZRP), Temporary Ordered Routing Algorithm (TORA).

Proactive Source Routing (PSR)

PSR uses a table-driven approach and is the base for the newly proposed Zone-based proactive source routing protocol, so it is necessary to know the working of PSR to understand Z-PSR. A lightweight proactive source routing (PSR) protocol facilitates opportunistic data forwarding in mobile ad-hoc networks. To facilitate source routing, PSR maintains more network topology information than distance vector (DV) routing. PSR provides every node of the network with a breadth-first spanning tree (BFST) of the entire network rooted at itself. To facilitate this, nodes periodically broadcast the tree structures they have built to the entire network in each iteration. A node can expand and refresh its knowledge about the network topology by constructing a deeper and more recent BFST based on the information collected from neighbours during the most recent iteration. This routing information is then distributed to the neighbours in the next round of operation. Thus, each node has full-path information to all other nodes in the network using this routing scheme. The communication cost of PSR is only linear in the number of nodes in the network, and hence both source routing and conventional IP forwarding are supported by PSR.

The details of PSR are described in the following three sections. Before that, we review some graph-theoretic terms used here. The network is modelled as an undirected graph G = (V, E), where V is the set of nodes (or vertices) in the network, and E is the set of wireless links (or edges). The edge connecting two nodes u and v is denoted e = (u, v) ∈ E if they are close to each other and can directly communicate with a given reliability. Given a node v, N(v) is used to denote its open neighbourhood, i.e., {u ∈ V | (u, v) ∈ E}. Similarly, N[v] is used to denote its closed neighbourhood, i.e., N(v) ∪ {v}. (Refer to [14] for other graph-theoretic notions.)

A. Route Update

The update operation of PSR is iterative and distributed among all nodes in the network due to its proactive nature. At the beginning, node v is only aware of the existence of itself; therefore, there is only a single node in its BFST, the root node v. By exchanging BFSTs with its neighbours, it is able to construct a BFST within N[v], i.e., the star graph centered at v, denoted Sv. Nodes exchange their spanning trees with their neighbours in each subsequent iteration. Towards the end of each operation interval, from the perspective of node v, it has received a set of routing messages from its neighbours packaging their BFSTs. The most recent information from each
neighbour is incorporated by node v to update its own BFST. At the end of the period, it then broadcasts this tree structure to its neighbours. Formally, v has received the BFSTs from some of its neighbours. Node v has a BFST containing the updates received in recent previous iterations, denoted Tu and cached for each neighbour u ∈ N(v). The union graph constructed by node v is

Gv = Sv ∪ ⋃u∈N(v) (Tu − v).    (1)

Here, T − x denotes the operation of removing the subtree of T rooted at node x. Some special cases are: if x is not in T, then T − x = T, and if x is the root of T, then T − x = ∅. Then, node v calculates a BFST of Gv, denoted Tv, and places Tv in a routing packet to broadcast to its neighbours.
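A compact sketch of this update step follows, under simplifying assumptions: trees are parent maps, the network is a plain adjacency structure, and the helper names (bfst, minus_subtree, route_update) are illustrative rather than part of PSR itself.

from collections import deque

def bfst(graph, root):
    # Breadth-first spanning tree of 'graph' rooted at 'root',
    # returned as a parent map {node: parent}.
    parent, q = {root: None}, deque([root])
    while q:
        u = q.popleft()
        for w in graph.get(u, ()):
            if w not in parent:
                parent[w] = u
                q.append(w)
    return parent

def minus_subtree(tree, x):
    # T - x: drop x and all its descendants from the parent map 'tree'.
    drop, changed = {x}, True
    while changed:
        changed = False
        for node, par in tree.items():
            if par in drop and node not in drop:
                drop.add(node)
                changed = True
    return {n: p for n, p in tree.items() if n not in drop}

def route_update(v, star, neighbour_trees):
    # Equation (1): Gv = Sv united with (Tu - v) for each neighbour u,
    # followed by recomputing the BFST of Gv rooted at v.
    g = {a: set(b) for a, b in star.items()}
    for tu in neighbour_trees:
        for node, par in minus_subtree(tu, v).items():
            if par is not None:
                g.setdefault(node, set()).add(par)
                g.setdefault(par, set()).add(node)
    return bfst(g, v)

star = {"v": {"a", "b"}, "a": {"v"}, "b": {"v"}}   # Sv
t_a = {"a": None, "v": "a", "c": "a"}              # tree received from a
print(route_update("v", star, [t_a]))  # node c becomes reachable via a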
B. Neighbourhood Trimming

When a neighbour node is deemed lost, all its relevant information in the topology repository maintained by the detecting node, as well as its contribution to the network connectivity, should be removed. This process is called neighbourhood trimming. Consider node v. The neighbour trimming procedure is triggered at v about neighbour u in either of the following cases:
1) No routing update (or data packet) has been received from this neighbour for a given period of time.
2) A data transmission to node u has failed, as reported by the link layer.

C. Streamlined Differential Update

In PSR, the "full dump" routing messages are interleaved with "differential updates". The idea is to send the full update messages less frequently than the shorter messages containing the difference between the current and previous knowledge of a node's routing module. The routing update is further streamlined in two new ways. First, we use a compact tree representation in full-dump and differential update messages to halve the size of these messages. Second, as the network changes, every node attempts to maintain an updated BFST so that the differential update messages are even shorter.

III. ZONE-BASED PROACTIVE SOURCE ROUTING

Proactive routing uses more bandwidth to maintain routing information, while reactive routing involves long route request delays. Reactive routing protocols also flood the entire network inefficiently for route determination. The Zone Routing Protocol (ZRP) addresses these problems by combining the best properties of both approaches. ZRP can be called a hybrid reactive/proactive routing protocol.

It can be assumed that, in an ad-hoc network, the largest part of the traffic is directed to nearby nodes. Hence, ZRP reduces the proactive scope to a zone centered on each node. The maintenance of routing information is easier in a limited zone, and the amount of routing information which is never used is minimized. Nodes which are farther away can still be reached with reactive routing. Since all nodes proactively store local routing information, route requests can be performed more efficiently without querying all the nodes in the network.

ZRP has a flat view of the network despite the use of zones. In this way, the organizational overhead related to hierarchical protocols can be avoided. Since hierarchical routing protocols depend on the strategic assignment of gateways or landmarks, every node in the network must be able to access all levels, especially the top level. Nodes belonging to different subnets must send their communication to a subnet which is common to both nodes, and because of this, parts of the network may be congested. Since the zones overlap, ZRP can be categorized as a flat protocol; hence, network congestion can be reduced and optimal routes can be detected. Further, the behaviour of ZRP is adaptive: it depends on the behaviour of the users and the current configuration of the network.

The routing zone has a radius r expressed in hops. Thus, the zone includes the nodes whose distance from the node in question is at most r hops. An example routing zone is shown in Figure 3, where the routing zone of S includes the nodes A-I, but not K. The radius is marked as a circle around the node in question. It should be noted that the zone is defined in hops, not as a physical distance. The nodes of a zone are divided into peripheral nodes and interior nodes. The nodes whose minimum distance to the central node is exactly equal to the zone radius r are peripheral nodes, and the nodes whose minimum distance is less than r are interior nodes. In Figure 3, the nodes A-F are interior nodes, the nodes G-J are peripheral nodes, and the node K is outside the routing zone.

Fig 3. Routing zone of node S with zone radius r = 2

Here, note that node H can be reached by two paths, one of length 2 and one of length 3 hops. Since the shortest path is less than or equal to the zone radius, the node is within the zone. By adjusting the transmission power of the nodes, the number of nodes in the routing zone of Figure 3 can be regulated. Lowering the power reduces the number of nodes within direct reach, and vice versa. To provide adequate reachability and redundancy, the number of neighbouring nodes should be sufficient. On the other hand, too large a coverage results in many zone members, and the update traffic becomes excessive. Further, a large transmission coverage adds to the probability of local contention.

Protocol Design

The new routing protocol proposed in this paper is named Z-PSR; we are combining the advantages of both PSR and ZRP, hence the name. The basic problems of PSR are discussed below. In a denser network, the overhead involved in maintaining the BFST to reach every node in the network becomes high in the case of PSR. The time taken to search for a route
from the set of BFSTs is also high. Even though PSR reduces the overhead in terms of communication bytes, it fails to reduce the computational overhead and memory overhead incurred by each node in finding the route. This results in high energy consumption.

The objectives of the Zone-based proactive source routing protocol are as follows:
1. Develop a routing protocol which minimizes the computation overhead in searching for a route.
2. The protocol should reduce the memory occupied by each BFST.
3. The protocol should find a route to the destination with minimum delay.
4. The protocol should minimize energy consumption compared to the existing PSR protocol.

The following steps are taken in order to meet the above objectives (a sketch of the resulting forwarding decision is given after the list):
1. Each node will maintain a BFST of its one-hop or two-hop neighbours only, as opposed to PSR, where every node needs to maintain a BFST to reach every other node in the network.
2. Whether to maintain a one-hop or two-hop neighbours' BFST is decided based on the parameter radius. If radius = 1, a node maintains a BFST to reach its one-hop neighbours; if radius = 2, a BFST to reach its two-hop neighbours, and so on. The simulations in this paper have used radius 2.
3. When a node needs to send data to its one-hop or two-hop neighbours, it will use the BFSTs maintained at that node. When it needs to send data to other nodes (other than one/two-hop neighbours), it sends the data to one of its two-hop neighbours which has a BFST to reach the destination.
4. The challenge in this protocol is to determine which two-hop neighbour has the BFST to reach the destination. A node will be receiving BFSTs from all the nodes, but it need not store them. Update messages are also sent periodically by the neighbouring nodes. So when a node needs to transmit data to a node which is not its one/two-hop neighbour, it checks the updates from its neighbours to see whether one of them has a path to the destination. The BFST messages are transmitted as broadcast messages. Therefore, the protocol here actually follows the concept of the link state algorithm, which passes information about neighbours to all the nodes in the network.
5. Thus only when needed will a node accept and process the broadcast messages carrying the BFSTs of other nodes.
6. This reduces the computation overhead and memory overhead while maintaining the communication overhead at the same level as PSR.
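A few lines capture the forwarding decision in steps 3 and 4. The data shapes (a first-hop table for the node's own zone, and a set of destinations advertised in each neighbour's latest update) are assumptions made for illustration only.

def next_hop(dst, own_tree, neighbour_updates):
    # own_tree: {destination: first_hop} from this node's two-hop BFST.
    # neighbour_updates: {neighbour: destinations advertised by it}.
    if dst in own_tree:                 # step 3: destination inside the zone
        return own_tree[dst]
    for nbr, reachable in neighbour_updates.items():
        if dst in reachable:            # step 4: zone member with a path
            return nbr
    return None                         # no route known yet

own = {"b": "b", "c": "b"}              # zone of radius 2 around node "a"
updates = {"b": {"d", "e"}, "c": {"f"}}
print(next_hop("e", own, updates))      # -> "b"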
IV. PERFORMANCE EVALUATION

Fig.5 shows the graphical result of the length of the BFST to be stored at each node. It is clear from the graph that Z-PSR needs to store only shorter-length BFSTs compared to PSR.

Fig.5 Length of BFST at each node

Fig.6 below shows the packet delivery ratio for both PSR and Z-PSR; Z-PSR maintains a 99.9% delivery ratio.

Fig.6 Packet Delivery ratio vs. Simulation Time

V. CONCLUSION

Routing in an ad hoc network is always challenging. This paper proposed a routing protocol based on two existing protocols, ZRP and PSR. The simulation results show that the proposed protocol outperforms the existing PSR protocol, which acts as a base for the new protocol. The simulations can be extended to vary the number of nodes and node mobility.

REFERENCES
i. I. Chlamtac, M. Conti, and J.-N. Liu, "Mobile ad hoc networking: Imperatives and challenges," Ad Hoc Netw., vol. 1, no. 1, pp. 13-64, Jul. 2003.
ii. M. Al-Rabayah and R. Malaney, "A new scalable hybrid routing protocol for VANETs," IEEE Trans. Veh. Technol., vol. 61, no. 6, pp. 2625-2635, Jul. 2012.
iii. R. Rajaraman, "Topology control and routing in ad hoc networks: A survey," ACM SIGACT News, vol. 33, no. 2, pp. 60-73, Jun. 2002.
iv. Y. P. Chen, J. Zhang, and I. Marsic, "Link-layer-and-above diversity in multi-hop wireless networks," IEEE Commun. Mag., vol. 47, no. 2, pp. 118-124, Feb. 2009.
v. P. Larsson, "Selection diversity forwarding in a multihop packet radio network with fading channel and capture," ACM Mobile Comput. Commun. Rev., vol. 5, no. 4, pp. 47-54, Oct. 2001.
vi. S. Biswas and R. Morris, "ExOR: Opportunistic multi-hop routing for wireless networks," in Proc. ACM Conf. SIGCOMM, Philadelphia, PA, USA, Aug. 2005, pp. 133-144.
vii. Zehua Wang, Cheng Li, and Yuanzhu Chen, "PSR: A lightweight Proactive Source Routing protocol for Mobile Ad Hoc Networks," IEEE Transactions on Vehicular Technology, vol. 63, no. 2, February 2014.
viii. Zone Routing Protocol, available online: http://www.cs.mun.ca/~yzchen/papers/tvt2014.pdf.
ix. C. E. Perkins and P. Bhagwat, "Highly dynamic Destination-Sequenced Distance Vector routing (DSDV) for mobile computers," Comput. Commun. Rev., vol. 24, pp. 234-244, Oct. 1994.
x. T. Clausen and P. Jacquet, "Optimized Link State Routing Protocol (OLSR)," RFC 3626, Oct. 2003. [Online]. Available: http://www.ietf.org/rfc/rfc3626.txt
xi. C. E. Perkins and E. M. Royer, "Ad hoc On-Demand Distance Vector (AODV) routing," RFC 3561, Jul. 2003. [Online]. Available: http://www.ietf.org/rfc/rfc3561.txt
xii. D. B. Johnson, Y.-C. Hu, and D. A. Maltz, "On The Dynamic Source Routing Protocol (DSR) for mobile ad hoc networks for IPv4," RFC 4728, Feb. 2007. [Online]. Available: http://www.ietf.org/rfc/rfc4728.txt.


Homomorphic Encryption based Query Processing

Aruna T.M., M.S. Satyanarayana, Madhurani M.S.
Department of ISE, SVCE, Bangalore, Karnataka, India
aruna150888@gmail.com, satyanarayanams@outlook.com, ms.madhu2009@gmail.com
ABSTRACT: In a private database query system a client issues queries to a database server and obtains the results without learning anything else about the database and without the server learning the query. In this work we develop tools for implementing private database queries using homomorphic encryption (HE), that is, using an encryption system that supports only limited computations on encrypted data. We show that a polynomial encoding of the database enables an efficient implementation of several different query types using only low-degree computations on ciphertexts. Specifically, we study two separate settings that offer different interesting privacy/efficiency tradeoffs. In the basic client-server setting, we show that additive homomorphisms are sufficient to implement conjunction and threshold queries. We obtain further efficiency improvements using an additive system that also supports a single homomorphic multiplication on ciphertexts. This implementation hides all aspects of the client's query from the server, and reveals nothing to the client about non-matching records. To improve performance further we turn to the "Isolated-Box" architecture of De Cristofaro et al. In that architecture the role of the database server is split between two non-colluding parties. The server encrypts and pre-processes the n-record database and also prepares an encrypted inverted index. The server sends the encrypted database and inverted index to a proxy, but keeps the decryption keys to itself. The client interacts with both server and proxy for every query, and privacy holds as long as the server and proxy do not collude. We show that using a system that supports only log(n) multiplications on encrypted data it is possible to implement conjunctions and threshold queries efficiently. We implemented our protocols for the Isolated-Box architecture using the homomorphic encryption system of Brakerski, and compared it to a simpler implementation that only uses Paillier's additively homomorphic encryption system. The implementation using somewhat homomorphic encryption was able to handle a query with a few thousand matches out of a million-record database in just a few minutes, far outperforming the implementation using additively homomorphic encryption.

Keywords: Cipher Text, Homomorphic Encryption, Threshold, Non Colluding Parties.

1. INTRODUCTION

Enabling private database queries is an important (and hard) research problem arising in many real-world settings. The problem can be thought of as a generalization of symmetric private information retrieval (SPIR) [1, 2], where clients can retrieve records by specifying complex queries. For example, the client may ask for the records of all people of age 25 to 29 who also live in Bangalore, and the server should return these records without learning what the query was or even how many records match the query. The client learns nothing else about the database contents.

Unfortunately, being a generalization of SPIR, private database queries are subject to all the same inherent inefficiency constraints as SPIR, making the design of practical schemes for private database queries a challenging task. In this work we explore the use of somewhat homomorphic encryption (SWHE) [3] for the design of private database query protocols. In particular, we show that certain polynomial encodings of the database let us implement several query types using only homomorphic computations involving low-degree polynomials. There are now several encryption schemes that efficiently support the low-degree homomorphic computations on encrypted data that we need [4, 5].

In this work we consider two different settings. The first is the traditional, two-party, client-server setting. In this setting the server has the database, the client has a query, and we seek a protocol that gives the client all (and only) those records that match its query without the server learning what the query is. As mentioned above, in this setting the server must process the entire database for every query (or else it would learn that the unprocessed records do not match the query). Moreover, the server has to return to the client as much data as the number of records in the database, or else it would learn some information about the number of records that match the query.

To bypass these severe limitations, we also consider a different model in which the database server is split into two entities (called here "server" and "proxy"), and privacy holds only so long as these two entities do not collude. This approach was taken in particular by De Cristofaro et al. [6], who support private evaluation of a few simple query types and report performance very close to a non-private off-the-shelf MySQL system. However, the architecture of De Cristofaro et al. cannot handle conjunctions: the client can ask for all the records with age=25 OR name='Bob', but cannot ask for all the records with age=25 AND name='Bob'. In this work we show how to implement conjunctions, disjunctions, and threshold queries in a similar architecture.

1.1. Our Protocols

The protocols and tools we present in this work are aimed at revealing to the client the indexes of the records that match its query, leaving it to a standard follow-up protocol to fetch the records themselves. Also, we only consider honest-but-curious security in this work. Our protocols can be enhanced to handle malicious adversaries using generic tools such as [7]. It is an interesting open problem to design more efficient protocols in the malicious setting specific to the private database queries problem.

The approach
∑ that underlies all our protocols is to encode the SELECT ⋆FROM db WHERE a1 =v1 AND · · · AND at
database ast one or more polynomials, manipulating these =vtthe client (with oblivious help from the server) computes the
polynomials using the clients’ query so as to obtain a new tags tgi=Hash(“ai=vi ”) and sends them to the proxy. The
polynomial whose roots are the indexes of the matching records. proxy fetches all the encrypted polynomials Ai (x), chooses
This representation is well suited for conjunction and threshold queries, since it lets us use techniques similar to the Kissner-Song protocol for (multi-)set intersection [8] (building on prior work by Freedman et al. [9]). We sketch our protocols below.

1.1.1. The Two-Party Setting
In this setting, the server has a database and the client has a secret key for a SWHE scheme. The server encodes the database as a bivariate polynomial D(x, y), where for every record number r and for every column (or attribute) a, if record r has value v for attribute a then D(r, a) = v. (The space that it takes to specify this polynomial D is roughly the same as the size of the original database.)
Consider a conjunction query specified by the attribute-value pairs {(ai, vi) : 1 ≤ i ≤ t}, i.e., the SQL query SELECT * FROM db WHERE a1=v1 AND ··· AND at=vt. The client constructs a univariate query polynomial Q(y) such that Q(ai) = vi for i = 1, ..., t, and sends to the server the encrypted coefficients of the polynomial Q. For simplicity, we assume for now that the client also sends to the server all the attributes ai in the clear (but of course not the corresponding values vi).
Given the database polynomial D(x, y) and the encrypted query polynomial Q(y), the server uses the additive homomorphism of the cryptosystem to compute the encrypted polynomial A(x, y) = D(x, y) − Q(y). Note that for every record r in the database and every attribute ai in the query, we have A(r, ai) = 0 if and only if D(r, ai) = vi, namely, this record matches the condition ai=vi from the query. The server then masks A by choosing random univariate polynomials Ri(x) of appropriate degrees and homomorphically computing the encrypted univariate polynomial B(x) = Σi Ri(x)·A(x, ai); with high probability, the roots of B are exactly the indexes of the matching records.
The server returns to the client the encrypted polynomial B, and the client decrypts and factors B to find its roots, thus learning the indexes of the records that match its query. The client can use PIR or ORAM protocols to get the records themselves. In Section 6 we describe this protocol in more detail, and show how to adapt it to the harder case where the attributes ai must also be kept secret, and also how to modify it to handle disjunctions and threshold queries.
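To make the two-party mechanics concrete, here is a minimal plaintext sketch of the algebra (no encryption; in the real protocol the server performs the same arithmetic on encrypted coefficients). The field, the toy database, and the query values are all illustrative.

import random

P = 2_147_483_647                  # arithmetic over F_P; an illustrative prime

# Toy database D(r, a): record index -> {attribute: value in F_P}
db = {1: {"name": 10, "city": 20},
      2: {"name": 10, "city": 30},
      3: {"name": 40, "city": 20}}

query = {"name": 10, "city": 20}   # conjunction: name=10 AND city=20

# One random masking polynomial R_i per queried attribute
R = {a: [random.randrange(P) for _ in range(3)] for a in query}

def eval_poly(coeffs, x):
    return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P

# B(r) = sum_i R_i(r) * (D(r, a_i) - v_i); zero iff r matches all conditions (whp)
def B(r):
    return sum(eval_poly(R[a], r) * ((db[r][a] - v) % P)
               for a, v in query.items()) % P

print([r for r in db if B(r) == 0])   # -> [1]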
1.1.2. The Three-Party Setting
The three parties in this setting are a client with a query, a proxy that has an inverted index for the database (as well as an encrypted and permuted version of the database itself), and a server who prepared the inverted index during a pre-processing step and now keeps only the keys that were used to create this inverted index. (We stress that we do not make black-box use of the set-intersection protocol; in particular, we do not know whether other protocols for set intersection (e.g., [10, 11, 12]) can be used in our setting.) Specifically, the server keeps some "hashing keys" and the secret key for a SWHE scheme. For every attribute-value pair (a, v) in the database, the inverted index contains a record (tg, Enc(A(x))), where tg is a tag, computed as tg = Hash("a=v"), and A(x) is a polynomial whose roots are exactly the record indexes r that contain this attribute-value pair.
In the basic three-party protocol, given a query, the client obtains the tags of its attribute-value pairs (with oblivious help from the server, who holds the hashing keys) and sends them to the proxy; the proxy looks up the corresponding encrypted polynomials Enc(Ai(x)), chooses random polynomials Ri(x) of "appropriate degrees" and computes the encrypted polynomial B(x) = Σi Ri(x)Ai(x). The proxy returns the encrypted B to the client, who again uses oblivious help from the server to decrypt B, and then factors it to find its roots, which are the indexes of the matching records (whp).
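The heart of this step is the Kissner-Song observation that set intersection can be computed on root-polynomials. The plaintext sketch below (illustrative field and record indexes; the real protocol does the same arithmetic on encrypted coefficients) shows how masking and summing the per-tag polynomials Ai keeps only the common roots.

import random

P = 10007                          # illustrative prime field

def pmul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def padd(f, g):
    n = max(len(f), len(g))
    return [((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)) % P
            for i in range(n)]

def from_roots(roots):             # A(x) = prod (x - r) over F_P
    f = [1]
    for r in roots:
        f = pmul(f, [(-r) % P, 1])
    return f

def peval(f, x):
    return sum(c * pow(x, k, P) for k, c in enumerate(f)) % P

# Inverted-index entries: each tag's polynomial vanishes on its record indexes
A1 = from_roots([3, 7, 11])        # records containing tag 1
A2 = from_roots([7, 11, 42])       # records containing tag 2

# Proxy: B(x) = R1(x)*A1(x) + R2(x)*A2(x) for random R1, R2
B = padd(pmul([random.randrange(P) for _ in range(4)], A1),
         pmul([random.randrange(P) for _ in range(4)], A2))

print([r for r in range(100) if peval(B, r) == 0])   # -> [7, 11] (whp)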
This technique offers a space/degree tradeoff, where the proxy stores more information if we are to use a SWHE scheme supporting only lower-degree functions. In one extreme case, using fully homomorphic encryption, the proxy need not store any more information than in the basic protocol above. On the other extreme, we can use a quadratic homomorphic scheme by having the proxy store roughly m times as much as in the basic protocol (for an m-record database). In the middle, we can have the proxy storage grow by an O(log m) factor, and use an O(log m)-homomorphic scheme. We discuss in Section 3 some other optimizations to this three-party protocol. Also, in Section 4 we discuss an optimization that applies in both the 2-party and 3-party settings, where we use homomorphic batching (similar to [13, 14]) to speed up the computation.

2. HOMOMORPHIC ENCRYPTION

Fix a particular plaintext space P which is a ring. (For example, our plaintexts could be bits, P = F2, or binary polynomials modulo a cyclotomic polynomial, P = F2[X]/Φm(X), etc.) Let C be a class of arithmetic circuits over the plaintext space P. A somewhat homomorphic (public-key) encryption relative to C is specified by the usual procedures KeyGen, Enc, Dec (for key generation, encryption, and decryption, respectively) and the additional procedure Eval that takes a circuit from C and one ciphertext per input to that circuit, and returns one ciphertext per output of that circuit. The security requirement is the usual notion of semantic security [16], namely it should be hard to distinguish between the encryptions of any two messages, even if the public key is known to the attacker and even if the two messages are chosen by the attacker.
The functionality requirement from homomorphic schemes is that for every circuit π ∈ C and every set of inputs to π, if we choose the keys at random, then encrypt all the inputs, then run the Eval procedure on these ciphertexts and decrypt the result, we will get the same thing as evaluating π on this set of inputs (except perhaps with a negligible probability). See [3] for more details. In this work we use "low degree" somewhat homomorphic encryption, namely homomorphic encryption schemes relative to the class of low-degree polynomials. The degree of the polynomials that we need to evaluate varies between protocols. Some require only additive homomorphism, while others require that the scheme support polynomials of higher degree (as much as O(log m) for an m-record database).
Two important properties of SWHE schemes are compactness and circuit privacy. Compactness roughly means that the size of the evaluated ciphertext does not depend on the complexity of the circuit that was evaluated. Circuit privacy means that even the holder of the secret key cannot learn from the evaluated ciphertext anything about the circuit, beyond the output value.
3. HOMOMORPHIC ENCRYPTION SCHEMES

Paillier Cryptosystem
Recall that the Paillier cryptosystem works over Z*_{n^2} for an RSA modulus n of unknown factorization. The scheme has plaintext space P = Z_n and ciphertext space Z*_{n^2}. The scheme is additively homomorphic, with homomorphic addition implemented by multiplying the corresponding ciphertexts in Z*_{n^2}. Similarly, we can homomorphically multiply a ciphertext c ∈ Z*_{n^2} by a constant a ∈ Z_n by computing c^a mod n^2.
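A toy implementation of textbook Paillier makes the additive homomorphism concrete. The primes below are tiny and purely illustrative (a real deployment, as in the experiments later, uses an RSA modulus of at least 1024 bits), and we use the common simplification g = n + 1. Requires Python 3.9+ for math.lcm and the modular inverse via pow.

import math, random

p, q = 1009, 1013                  # toy primes; NOT secure
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)       # Carmichael function of n
mu = pow(lam, -1, n)               # valid decryption constant since g = n + 1

def enc(m):
    r = random.randrange(1, n)     # gcd(r, n) = 1 with overwhelming probability
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

c = (enc(37) * enc(5)) % n2        # homomorphic addition = ciphertext product
assert dec(c) == 42
print(dec(pow(enc(7), 6, n2)))     # multiply by the constant 6 via c^a -> prints 42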
Brakerski's Leveled Homomorphic Cryptosystem
For our homomorphic encryption system, we use a ring-LWE-based variant of Brakerski's scale-invariant homomorphic cryptosystem [5]. Specifically, our implementation operates over polynomial rings modulo a cyclotomic polynomial. Let Φm(x) denote the m-th cyclotomic polynomial. Then, we work over the ring R = Z[x]/Φm(x). Specifically, we take our plaintext space to be P = Rp = Zp[x]/Φm(x) and our ciphertext space to be Rq = Zq[x]/Φm(x). In this scheme, our secret keys and ciphertexts are vectors of elements in Rq. Now, if c1 and c2 are encryptions of messages m1 and m2 under a secret key s, then c1 + c2 is an encryption of m1 + m2. To homomorphically multiply a ciphertext c by a public scalar a ∈ Rp, we compute ac. Homomorphic multiplication of two ciphertexts is performed using a scaled tensor product. That is, c_prod = ⌊(p/q)·(c1 ⊗ c2)⌉ is an encryption of m1·m2 under the tensored secret key s ⊗ s. Here, ⌊x⌉ denotes rounding x to the nearest integer. Using a technique called key switching, the resulting product ciphertext c_prod can be transformed into a regular ciphertext c'_prod encrypted under s such that c'_prod is a valid encryption of m1·m2.
As noted in Section 4, one of the main advantages of using a ring-LWE-based homomorphic scheme is the fact that we can pack multiple plaintext messages into one ciphertext using a technique called batching. To use batching we partition a database with r records into ℓ separate databases, each containing approximately r/ℓ records. If we assume that the records associated with each tag are split uniformly across the databases, then the degrees of the underlying polynomials are correspondingly reduced by a factor of ℓ. In our implementation, ℓ ≥ 5000, so this translates to a substantial improvement in performance.
We now consider a choice for the plaintext modulus p for use in the Brakerski scheme. From Lemma 1, we have that the probability of a false positive (mistaking an element not in the intersection to be in the intersection) is given by |U| / |Fp|. If we tolerate a false positive rate of at most 0 < λ < 1, then we require that |Fp| ≥ (1/λ)·|U| = r/λ, where r is the number of records in the database. Additionally, to maximize the number of plaintext slots, we choose p such that p ≡ 1 (mod m). To summarize, we choose our plaintext modulus p such that p ≡ 1 (mod m) and p ≥ r/λ.
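The two conditions on p translate directly into a small search routine. The sketch below is illustrative (function name, argument values, and the naive primality test are all our own choices); it returns the smallest prime satisfying p ≡ 1 (mod m) and p ≥ r/λ.

def is_prime(n):
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def choose_plaintext_modulus(r, lam, m):
    lower = max(int(r / lam), 2)
    p = (lower // m + 1) * m + 1       # smallest candidate >= lower with p = 1 (mod m)
    while not is_prime(p):
        p += m                         # stepping by m preserves p = 1 (mod m)
    return p

print(choose_plaintext_modulus(r=10**6, lam=1e-3, m=8192))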
4. EXPERIMENTAL SETUP

We can implement the three-party protocol using both the Paillier and Brakerski cryptosystems as the underlying homomorphic encryption scheme. Our implementation was done in C++ using the NTL library over GMP. Our code was compiled using g++ 4.6.3 on Ubuntu 12.04. We ran all timing experiments on cluster machines with multicore AMD Opteron processors running at 2.1 GHz. The machines had 512 KB of cache and 96 GB of available memory. Note that because NTL is not thread-safe, all of our experiments were conducted in a single-threaded, single-processor environment. Memory usage during the computation generally stayed below 10 GB. In the Paillier-based scheme, we used a 1024-bit RSA modulus for all of our experiments. For the Brakerski system, we chose parameters m, p, q to obtain 128-bit security and a false positive rate of λ = 10^-3 according to the analysis presented in the Appendix. Since the Brakerski system supports both the batching and modular reduction optimizations described in Section 4 and Section 3.2, respectively, we considered three different experimental setups to assess the viability of these optimizations. Below, we describe each of our experiments. The parameters used in our FHE scheme for each setup are listed in Table 1.

NoMR:
Brakerski scheme without the modular reduction optimization. In the NoMR setup, we just used the batching capabilities of the Brakerski system. Since we were not performing the modular reduction optimization from Section 3.2, this setup only required homomorphic addition. Because we did not need homomorphic multiplication, we were able to use smaller parameters for the Brakerski system, and therefore reduce the cost of each homomorphic operation.

MR:
Brakerski scheme with the modular reduction optimization. In the MR setup, we considered the modular reduction optimization. Recall that in the final step of the three-party protocol, the proxy computes the polynomial B(x) = A1(x)R1(x) + A′(x)R′(x), where the degree of A1(x) is less than the degree of A′(x). When we perform modular reduction, we first compute A′(x) (mod A1(x)) and then compute B(x) (mod A1(x)). Observe that this optimization reduces both the degree of the polynomial B(x) that the proxy sends to the client and the cost of the computation of B(x). To perform this optimization, the FHE scheme must support at least one multiplication. Enabling support for homomorphic multiplication translated to larger parameters in the scheme, thus increasing the cost of each homomorphic operation. Since we are performing fewer operations overall, however, the modular reduction can yield substantial gains in the case where there is a significant difference in the number of records associated with the smallest and largest tag in the query. We assessed these tradeoffs in the MR experiment. Due to the significant cost of performing homomorphic multiplication, we focused on the case where we just needed a single multiply.
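The following plaintext sketch of the modular-reduction optimization (tag sets, field size, and masking degrees are illustrative) shows its two effects: B(x) comes out with much smaller degree, while the roots shared by both lists survive.

import random

P = 10007

def pmul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def padd(f, g):
    n = max(len(f), len(g))
    return [((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)) % P
            for i in range(n)]

def pmod(f, g):                    # remainder of f modulo g over F_P
    f = f[:]
    inv_lead = pow(g[-1], P - 2, P)
    while len(f) >= len(g):
        coef = f[-1] * inv_lead % P
        shift = len(f) - len(g)
        for i, c in enumerate(g):
            f[shift + i] = (f[shift + i] - coef * c) % P
        while f and f[-1] == 0:
            f.pop()
    return f or [0]

def from_roots(roots):
    f = [1]
    for r in roots:
        f = pmul(f, [(-r) % P, 1])
    return f

def peval(f, x):
    return sum(c * pow(x, k, P) for k, c in enumerate(f)) % P

A1 = from_roots([2, 5])                    # smallest tag: records {2, 5}
Ap = from_roots([2, 5, 9, 14, 21, 33])     # largest tag: six records
R1 = [random.randrange(1, P) for _ in range(3)]
Rp = [random.randrange(1, P) for _ in range(3)]

B = padd(pmul(R1, A1), pmul(Rp, pmod(Ap, A1)))    # B = A1*R1 + (A' mod A1)*R'
print(len(B) - 1)                                  # degree ~4 instead of ~8
print([r for r in range(40) if peval(B, r) == 0])  # contains 2 and 5 (whp nothing else)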
MRNoKS:
Brakerski scheme with the modular reduction optimization but without key switching. Recall that when we homomorphically multiply two ciphertexts in the Brakerski system, we obtain a tensored ciphertext (i.e., a higher-dimensional ciphertext) encrypted under a tensored secret key. Normally, we perform a key-switching operation that transforms the tensored ciphertext into a new ciphertext encrypted under the normal secret key. If left unchecked, the length of the ciphertexts grows exponentially with the number of successive multiplications. Thus, the key-switching procedure is important
for constraining the length of the ciphertexts. In our application, we perform a single multiplication, and so the key-switching procedure may be unnecessary. Since the key-switching operation has non-negligible cost, we can achieve improved performance at the expense of slightly longer ciphertexts (and thus, increased bandwidth) by not performing the key switch. We assessed this time/space tradeoff in the third setup, denoted MRNoKS.
Query type
In each of our experiments, we operated over a database with 10^6 records and performed queries consisting of five tags. As usual, let d1 ≤ d2 ≤ ··· ≤ d5 denote the number of elements associated with each tag tg1, ..., tg5. We profiled our system on two different sets of queries: balanced queries and unbalanced queries. In a balanced query, the number of elements associated with each tag was approximately the same: d1 ≈ d2 ≈ ··· ≈ d5. In an unbalanced query, the number of elements associated with each tag varies significantly; specifically, d1 is at most 5% of d5. There are many examples where a query would be unbalanced. For instance, consider a database of people living in Northern California and suppose we run a query for the records of people named Joe who live in San Francisco. In this case, the number of people living in San Francisco will be significantly greater than the number of people named Joe. Queries like these, where we compute the intersection of a large set with a much smaller set, are very common, and so it is important that we can perform such queries efficiently. Finally, for each query, we measured the computation time as well as the total network bandwidth required by each of our setups. Note that due to the poor scalability of the Paillier system, we were not able to perform the full set of experiments using the Paillier cryptosystem.
5. CONCLUSION AND FUTURE ENHANCEMENT

This paper presents new protocols and tools that can be used to construct a private database query system supporting a rich set of queries. We showed how a polynomial representation of the database enables private conjunction, range, and threshold queries. The basic scheme uses only an additively homomorphic system like Paillier, but we showed that significant performance improvements can be obtained using a stronger homomorphic system that supports both homomorphic additions and a few homomorphic multiplications on ciphertexts. Our experiments quantify this improvement, showing a real-world example where lattice-based homomorphic systems can outperform their factoring-based counterparts.

REFERENCES
i. B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan, "Private information retrieval," J. ACM, vol. 45, no. 6, pp. 965–981, 1998.
ii. Y. Gertner, Y. Ishai, E. Kushilevitz, and T. Malkin, "Protecting data privacy in private information retrieval schemes," in STOC '98, 1998, pp. 151–160.
iii. C. Gentry, "A fully homomorphic encryption scheme," Ph.D. dissertation, Stanford University, 2009, crypto.stanford.edu/craig.
iv. Z. Brakerski, C. Gentry, and V. Vaikuntanathan, "Fully homomorphic encryption without bootstrapping," in Innovations in Theoretical Computer Science (ITCS '12), 2012, available at http://eprint.iacr.org/2011/277.
v. Z. Brakerski, "Fully homomorphic encryption without modulus switching from classical GapSVP," in Advances in Cryptology - CRYPTO 2012, ser. Lecture Notes in Computer Science, vol. 7417. Springer, 2012, pp. 868–886.
vi. E. D. Cristofaro, Y. Lu, and G. Tsudik, "Efficient techniques for privacy-preserving sharing of sensitive information," in Trust and Trustworthy Computing - TRUST 2011, ser. Lecture Notes in Computer Science, J. M. McCune, B. Balacheff, A. Perrig, A.-R. Sadeghi, A. Sasse, and Y. Beres, Eds., vol. 6740. Springer, 2011, pp. 239–253.
vii. Y. Ishai, M. Prabhakaran, and A. Sahai, "Founding cryptography on oblivious transfer - efficiently," in CRYPTO, 2008, pp. 572–591.
viii. L. Kissner and D. X. Song, "Privacy-preserving set operations," in Advances in Cryptology - CRYPTO 2005, ser. Lecture Notes in Computer Science, V. Shoup, Ed., vol. 3621. Springer, 2005, pp. 241–257.
ix. M. J. Freedman, K. Nissim, and B. Pinkas, "Efficient private matching and set intersection," in Advances in Cryptology - EUROCRYPT 2004, ser. Lecture Notes in Computer Science, C. Cachin and J. Camenisch, Eds., vol. 3027. Springer, 2004, pp. 1–19.
x. S. Jarecki and X. Liu, "Fast secure computation of set intersection," in Security and Cryptography for Networks - SCN 2010, ser. Lecture Notes in Computer Science, J. A. Garay and R. D. Prisco, Eds., vol. 6280. Springer, 2010, pp. 418–435.
xi. E. D. Cristofaro, J. Kim, and G. Tsudik, "Linear-complexity private set intersection protocols secure in malicious model," in Advances in Cryptology - ASIACRYPT 2010, ser. Lecture Notes in Computer Science, M. Abe, Ed., vol. 6477. Springer, 2010, pp. 213–231.
xii. Y. Huang, D. Evans, and J. Katz, "Private set intersection: Are garbled circuits better than custom protocols?" in Proceedings of the Network and Distributed System Security Symposium - NDSS 2012. The Internet Society, 2012.
xiii. N. P. Smart and F. Vercauteren, "Fully homomorphic SIMD operations," Manuscript at http://eprint.iacr.org/2011/133, 2011.
xiv. C. Gentry, S. Halevi, and N. P. Smart, "Fully homomorphic encryption with polylog overhead," Eurocrypt 2012, to appear. Full version at http://eprint.iacr.org/2011/566, 2012.
xv. P. Paillier, "Public-key cryptosystems based on composite degree residuosity classes," in Proc. of EUROCRYPT '99, 1999, pp. 223–238.
xvi. D. Boneh, C. Gentry, S. Halevi, F. Wang, and D. Wu, "Private database queries using somewhat homomorphic encryption," Stanford University and IBM Research.
Intrusion Detection System against Bandwidth DDoS Attack

Basavaraj Muragod, SaiMadhvi. D
Computer Science & Engg Dept, RYMEC, Bellary, KA, India
Email: muragodbasu05@gmail.com, saidmadhavi@yahoo.co.in
ABSTRACT: Distributed denial of service (DDoS) is a rapidly growing problem. The traditional architecture of the internet is highly exposed to bandwidth distributed denial of service (BW-DDoS) attacks. These attacks disrupt network infrastructure operation by sending a huge number of packets that cause congestion and delayed responses, disrupting connectivity between client and server. According to Akamai's Prolexic Quarterly Global DDoS Attack Report, Q4 2014 saw a 39% increase in bandwidth-DDoS attacks compared to Q1 2014. In this paper, we survey the different types of BW-DDoS attacks on the internet, and we build an intrusion detection system to detect DDoS attacks.
Keywords — DDoS, Bandwidth-DDoS, Internet, Congestion.
I. Introduction
The internet is a group of two or more devices, nodes, or terminals connected by a large number of network devices. Denial of Service (DoS) attacks are very common in the world of the internet today. A distributed denial of service (DDoS) attack is a form of DoS which uses multiple machines to prevent the legitimate use of services. Internet services are especially exposed to bandwidth DDoS attacks, and the increase of these attacks has put servers and network devices at risk. BW-DDoS attacks aim to deny normal service to legitimate users by sending huge traffic to the target machines or networks to exhaust services, connections, and bandwidth. The BW-DDoS attacker uses different methods and attacking agents such as zombies. Zombies are computers connected to the internet that have been compromised by an attacker and can be used to perform malicious tasks on a victim. Fig. 1.1 shows an attacker using three zombies to generate a high volume of malicious traffic to the network over the Internet, leaving legitimate users unable to access the services.

Fig. 1.1 Illustration of BW-DDoS Attack scenarios

To solve these security issues we need an intrusion detection system (IDS); we use an intrusion detection method to detect the intruder in the network. IDS can be categorized into two models: signature-based intrusion detection and anomaly-based intrusion detection. Signature-based intrusion detection monitors packets on the network and compares them against a database of signatures or attributes of known malicious threats. Anomaly-based intrusion detection monitors network traffic and compares it against a base profile. The base profile identifies what is normal for that network: what bandwidth is typically used, which protocols and ports are used, and which devices are connected to each other. The system alerts the user when traffic is detected that is anomalous and significantly different from the base profile. In this paper, we use the anomaly intrusion detection method to identify the intruder in the network.
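A minimal sketch of this anomaly-detection idea: learn a base profile of per-interval packet rates from normal traffic, then flag intervals that deviate from the profile by more than k standard deviations. All rates and thresholds are illustrative.

from statistics import mean, stdev

def build_profile(training_rates):
    return mean(training_rates), stdev(training_rates)

def is_anomalous(rate, profile, k=3.0):
    mu, sigma = profile
    return abs(rate - mu) > k * sigma

normal = [980, 1010, 995, 1023, 990, 1002, 1017, 985]   # pkts/sec, training window
profile = build_profile(normal)

for rate in [1005, 9800, 450]:                          # live traffic samples
    print(rate, "ATTACK" if is_anomalous(rate, profile) else "ok")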
II. Material and Methodology
We observed information on DDoS attack statistics obtained in the first quarter of 2014 on networks of various sectors around the world, including financial-sector networks. The source of the data is the Prolexic Attack Report Q4 2014 [2], provided by Prolexic Technologies, the world's largest and most trusted DDoS attack mitigation provider. Ten of the world's largest banks and the leading e-commerce companies use the services of Prolexic to protect themselves from DDoS attacks. The data covers all DDoS attacks handled by Prolexic in different regions of the world. Some key information extracted from the report, comparing the first quarter of 2014 with the last quarter of 2014, is:
i) The total number of DDoS attacks increased by 25%.
ii) The total number of BW-DDoS attacks increased by 39%.
iii) 60 to 86.5 percent of BW-DDoS attacks targeted the network.
iv) A decline was observed in UDP flood attacks.

2.1. Motivation behind BW-DDoS Attacks
The motivation behind BW-DDoS attacks may be personal, social, or financial. An attacker may act out of personal revenge, for publicity, or for political motives. However, most BW-DDoS attacks are launched by organized groups targeting financial websites such as banks or stock exchanges.

2.2. Different Types of BW-DDoS Attacks

In this paper, we discuss the different types of BW-DDoS attacks on the internet.

2.2.1. Flood Attack

A flood attack is a direct attack in which zombies flood the victim system directly with a large amount of traffic. The large amount of traffic saturates the victim's network bandwidth so that other legitimate users are not able to access the service or experience server slowdown. Normally, the following packets are used in those attacks.
TCP floods: A stream of TCP packets with various flags set is sent to the victim's IP address. The SYN, ACK, and RST flags are commonly used.

ICMP/IGMP flood: The Internet Control Message Protocol (ICMP) is used for sending control messages in a TCP/IP network and for reporting possible errors in the communication environment. The Internet Group Management Protocol (IGMP) is a communication protocol used by hosts and routers on an IP network to establish multicast group memberships. The attacker launches a flood attack on the target with massive numbers of ICMP/IGMP messages to consume the bandwidth of the target.

UDP flood: UDP (User Datagram Protocol) is a connectionless transport protocol. In a UDP flood attack, the attacker sends a huge number of UDP packets to the target, which can cause congestion of the victim's bandwidth and degrade its services.

2.2.2. Reflection Attack
Reflection attacks fool legitimate hosts into sending unrequested responses to the victim. The reflection attack is also known as DRDoS (Distributed Reflection Denial of Service). It exploits the request-response behaviour of routers and servers (reflectors) to reflect the attack traffic as well as to hide the attack source.

Fig. 2.2 Reflection attack scenario

2.2.3. Amplification Attack
An amplification attack uses the zombies' bandwidth most effectively. Each packet sent by a compromised computer causes the transmission of larger packets to the victim by non-compromised machines. The response data must be larger than the request data in size; the larger the amplification, the more effective the bandwidth consumption.

DNS Amplification Attack
The Domain Name System (DNS) is a core service of the Internet, and DNS response packets are larger than the query packets. The attackers send queries with large-size UDP messages to open DNS resolvers, spoofing the source IP address as the target address. Upon receiving the query request, the DNS resolver sends the resolution back to the attack target. Flooded by large quantities of resolution responses, the target suffers network congestion, leading to bandwidth distributed denial of service.
2.3. Methods for Attack Detection and Mitigation
The proposed system is used to identify the attacker and the attack traffic in the network. The methods used include filtering, rate limiting, and detouring.

2.3.1. Filtering
Filtering can take place at various locations in the network: at the source end, at the core (routers), or at the victim end. To be effective against BW-DDoS, filtering must occur before the congested link.

2.3.2. Rate Limiting
The rate-limiting method is used to limit or block unwanted traffic flow in the network.
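A token bucket is one common way to realize the rate limiting just described; the sketch below admits packets only while tokens are available, refilling tokens at a fixed rate. The rate and burst values are illustrative.

import time

class TokenBucket:
    def __init__(self, rate_pps, burst):
        self.rate, self.capacity = rate_pps, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False               # over the limit: drop (or deprioritize) the packet

bucket = TokenBucket(rate_pps=100, burst=20)
print(sum(bucket.allow() for _ in range(1000)), "of 1000 back-to-back packets admitted")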
2.3.3. Detouring
The detouring method is used to bypass the default routing when an attack happens in the network. It reduces the impact of a bandwidth-DDoS attack when some routes are congested.

The above-mentioned methods can be applied by control centers that may be located at different points in the network: the source end, the core, the victim end, or distributed across several ends. In source-end detection, the source devices identify malicious packets in outgoing traffic and filter or rate-limit that traffic. In core detection, any router in the network can identify the malicious traffic and filter or rate-limit it; in the case of filtering, there is a possibility that legitimate packets will also be dropped, so the core is a better place to rate-limit all the traffic. In victim-end detection, the victim detects malicious incoming traffic and filters or rate-limits it. This is where legitimate and attack traffic can be most clearly distinguished; however, the attack traffic reaching the victim may already have caused effects such as denied services and bandwidth saturation. In our proposed system, we use distributed ends, because distributing the detection and mitigation methods across different ends is more advantageous: an attack can be identified at the victim end, an attack signature can be generated for it, and based on this signature the victim can request upstream routers to rate-limit the attack traffic. We use an intrusion detection system to detect attacks and a prevention system at the network level.

III. Results and Tables

The simulation is implemented on the Java platform. In our simulation, we used several parameters to establish the proposed system, to identify the intruder in the network, and to study how the performance of the network is affected by these attacks. The simulation parameters are provided in Table I. We implement router-based identification to send queries to the destination based on the bandwidth provided to the intermediate nodes. We observed that attackers such as zombies act as if they have higher bandwidth to transfer queries to the destination, which adds excess traffic to the network and delays the response from the network.

Table I: Simulation parameters
Number of nodes: 10
MAC: 802.11
Simulation time: 20 sec
Traffic source: CBR
Packet size: 512
Dimension of area: 800 × 600
The simulation is carried out with a normal case and an attack case. We simulated BW-DDoS attacks consisting of a client and a server with a number of zombies distributed between them. Each zombie has some bandwidth and sends its traffic to the server, which results in heavy traffic in the network; using the IDS manager we find the attacker, as shown in Fig. 3.1 (bandwidth vs. time).

Fig. 3.1 Bandwidth vs. time
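A toy version of this simulation (all numbers illustrative): zombies add attack traffic to a shared link, and the IDS manager raises a congestion alert whenever the offered load exceeds the link capacity.

LINK_CAPACITY = 1000             # packets per tick the link can carry
legit = [200] * 10               # legitimate load per tick
attack = [0] * 3 + [900] * 7     # zombies start flooding at tick 3

for t, (l, a) in enumerate(zip(legit, attack)):
    offered = l + a
    status = "CONGESTED -> IDS alert" if offered > LINK_CAPACITY else "ok"
    print(f"tick {t}: load={offered:4d}  {status}")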
This graph represents the heavy traffic in the network due to the attacking flow; congestion has occurred in the network, and the proposed scheme identifies this congestion using the IDS manager.

IV. Conclusion

The ultimate goal of bandwidth DDoS attacks is to target the server and use up the bandwidth of the network devices, that is, to consume the bandwidth of the server or of the network links. This blocks legitimate traffic, which becomes unable to access the target system. Our proposed mechanism helps to detect bandwidth DDoS attacks. We believe that this is an acceptable performance, given that the attack prevented has a large impact on the performance of the protocol. The proposed mechanism can also help in securing the network from other DDoS attacks by changing the security parameters in accordance with the nature of the attacks.

Acknowledgement
It is my immense pleasure to express my deep sense of gratitude and indebtedness to the highly respected and esteemed Miss SaiMadhavi D.; her invaluable guidance, inspiration, constant encouragement, sincere criticism, and sympathetic attitude made this paper possible.

References
i. A. Mitrokotsa and C. Douligeris, "Denial-of-Service Attacks," Network Security: Current Status and Future Directions (Chapter 8), Wiley Online Library, pp. 117–134, June 2006.
ii. Prolexic Technologies Inc., "Prolexic Attack Report, Q3 2014 – Q1 2013," http://www.prolexic.com/attackreports.
iii. L. Zhang, S. Yu, D. Wu, and P. Watters, "A Survey on Latest Botnet Attack and Defense," Proc. of 10th Int'l Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), IEEE, pp. 53–60, November 2011.
iv. A. Mishra, B. B. Gupta, and R. C. Joshi, "A Comparative Study of Distributed Denial of Service Attacks, Intrusion Tolerance and Mitigation Techniques," Proc. of European Intelligence and Security Informatics Conference (EISIC), IEEE, pp. 286–289, September 2011.
v. K. W. M. Ghazali and R. Hassan, "Flooding Distributed Denial of Service Attacks - A Review," Journal of Computer Science 7 (8), Science Publications, 2011, pp. 1218–1223.
vi. H. Beitollahi and G. Deconinck, "Denial of Service Attacks: A Tutorial," Electrical Engineering Department (ESAT), University of Leuven, Technical Report: 08-2011-0115, August 2011.
vii. N. Ahlawat and C. Sharma, "Classification and Prevention of Distributed Denial of Service Attacks," International Journal of Advanced Engineering Sciences and Technologies, vol. 3, issue 1, 2011, pp. 52–60.
viii. B. B. Gupta, P. K. Agrawal, R. C. Joshi, and M. Misra, "Estimating Strength of a DDoS Attack Using Multiple Regression Analysis," Communications in Computer and Information Science, Springer, 2011, vol. 133, part 3, pp. 280–289.
ix. B. B. Gupta, P. K. Agrawal, A. Mishra, and M. K. Pattanshetti, "On Estimating Strength of a DDoS Attack Using Polynomial Regression Model," Communications in Computer and Information Science, Springer, 2011, vol. 193, part 2, pp. 244–249.
x. S. Kent and R. Atkinson, "Security Architecture for the Internet Protocol," RFC 2401, November 1998.
Information Retrieval With Keywords Using Nearest Neighbor Search
Prathibha
Deptt of CSE, VTU, SJBIT Bangalore, Karnataka, India
pattidiggavi@gmail.com
Abstract: Many search engines are used to search for anything from anywhere; this system is used for fast nearest neighbor search using keywords. Existing systems work mainly on finding the top-k nearest neighbors, where each node has to match the whole set of querying keywords; they do not consider the density of objects in the spatial space and are inefficient for incremental queries. The existing system works on the IR2-tree, but it is not efficient enough; to overcome this, the triplet form of the spatial inverted index is introduced, which is accurate and provides efficient response time.
Keywords — query keywords, NN search, IR2-tree
I. INTRODUCTION

In the modern century technology plays an important role: it has an immense presence in every person's life and has reduced human work. Under certain circumstances, searching can still be tedious, particularly for moderately educated users. Our nearest neighbour search finds the nearest locations available to the user. We may guess that we can find a solution through the internet; this is possible, but it takes several cumbersome steps to locate a destination, returns a lot of inappropriate detail, and takes time. This may be acceptable when there is no emergency, but in most situations time plays a predominant role: people search instantaneously, so time acts as a crucial factor. With our method, we can easily track down the exact place. There are straightforward ways to support queries that combine spatial and text features. For example, for such a query we could first fetch all the restaurants whose menus contain the set of keywords and then, from the retrieved restaurants, find the nearest one. Similarly, one could also do it in reverse by targeting first the spatial conditions: browse all the restaurants in ascending order of their distances to the query point until encountering one whose menu has all the keywords. The major drawback of these straightforward approaches is that they fail to provide real-time answers on difficult inputs. A typical example is when the real nearest neighbor lies quite far away from the query point, while all the closer neighbors are missing at least one of the query keywords. The best method to date for nearest neighbor search with keywords is due to Felipe et al. [12]. They nicely integrate two well-known concepts: the R-tree [2], a popular spatial index, and the signature file [11], an effective method for keyword-based document retrieval. By doing so they develop a structure called the IR2-tree [12], which has the strengths of both R-trees and signature files. Like R-trees, the IR2-tree preserves objects' spatial proximity, which is the key to solving spatial queries efficiently. On the other hand, like signature files, the IR2-tree is able to filter out a considerable portion of the objects that do not contain all the query keywords, thus significantly reducing the number of objects to be examined. The IR2-tree, however, also inherits a drawback of signature files: false hits. That is, a signature file, due to its conservative nature, may still direct the search to some objects even though they do not have all the keywords. To overcome such problems we employ a spatial inverted index, which converts the data into tuples and finds the matching keywords through latitude and longitude using nearest neighbour search. A spatial keyword query typically takes a location and a set of keywords as input parameters and returns the matched objects according to certain spatial constraints and textual patterns.
time to solve the process. This can be done when there is no Drawbacks of the IR2-Tree
emergency, but in the most of situation time place a predominant IR2-Tree is first access method to answer nearest neighbour
role. People search instantaneously, so time act as a crucial queries. IR2-tree is popular technique for indexing data but it
factor. Here by using our method we can easily track down the having some drawbacks, which impacted on its efficiency. The
exact place. There are easy ways to support queries that disadvantage called as false hit affecting it seriously. The
combinespatial and text features. For example, for the number of false positive ratio is large when the aim of the final
abovequery, we could first fetch all the restaurants whose menus result is far away from the query point and also when the result
contain the set of keywords and then from the retrieved is simply empty. In these cases, the query algorithm will load the
restaurants, findthe nearest one. Similarly, one could also do it documents of many objects; as each loading necessitates a
reverselyby targeting first the spatial conditions – browse all the random access, it acquires costly overhead [1].
restaurants in ascending order of their distances to the query Keyword search on spatial databases This work, mainly focus
point until encountering one whose menu has all the keywords. on finding top-k Nearest Neighbors, in this method each node
The major drawback of these straightforward approaches is that has to match the whole querying keywords. As this method
they will fail to provide real time answers on difficult inputs. A match the whole query to each node, it does not consider the
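A minimal sketch of superimposed coding, the signature-file idea behind the IR2-tree: each keyword hashes to a bit pattern, an object's signature is the OR of its keywords' patterns, and a query keyword is only "possibly present" when all of its bits are set. The conservative OR is exactly what produces false hits. The signature width and hash choice below are illustrative.

import hashlib

SIG_BITS = 32

def keyword_signature(word):
    h = int(hashlib.md5(word.encode()).hexdigest(), 16)
    sig = 0
    for i in range(3):                       # set 3 bits per keyword
        sig |= 1 << ((h >> (8 * i)) % SIG_BITS)
    return sig

def object_signature(keywords):
    sig = 0
    for w in keywords:
        sig |= keyword_signature(w)          # superimpose (bitwise OR)
    return sig

obj = object_signature(["coffee", "wifi", "parking"])
for q in ["coffee", "tea"]:
    qs = keyword_signature(q)
    maybe = (obj & qs) == qs                 # are all query bits set?
    print(q, "possibly present" if maybe else "definitely absent")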
Keyword search on spatial databases: This work mainly focuses on finding the top-k nearest neighbors; in this method each node has to match the whole set of querying keywords. Because the whole query is matched against each node, the method does not consider the density of data objects in the spatial space, and when the number of queries increases it leads to lower efficiency and speed. The authors present an efficient method to answer top-k spatial keyword queries, with the following contributions: 1) the problem of top-k spatial keyword search is defined; 2) the IR2-tree is proposed as an efficient indexing structure to store spatial and textual information for a set of objects, with efficient algorithms to maintain the IR2-tree (that is, to insert and delete objects); 3) an efficient incremental algorithm is presented to answer top-k spatial keyword queries using the IR2-tree. Its performance is estimated and compared to the existing approaches; real datasets used in the experiments show a significant improvement in execution times. Disadvantages:
1. Each node has to be matched with the querying keywords, which affects performance, becomes time consuming, and maximizes the search space.
2. The IR2-tree has the drawbacks discussed above.

Processing Spatial-Keyword (SK) Queries in Geographic Information Retrieval (GIR) Systems: Location-based information is stored in GIS databases, and the information entities of such databases have both spatial and textual descriptions. This paper proposes a framework for GIR systems and focuses on indexing strategies that can process spatial keyword queries. The contributions of this paper are: 1) it gives a framework for query processing in Geographic Information Retrieval (GIR) systems; 2) it develops a novel indexing structure called the KR*-tree that captures the joint distribution of keywords in space and significantly improves performance over existing index structures; 3) it reports experiments on real GIS datasets showing the effectiveness of the techniques compared to the existing solutions. It introduces two index structures to store spatial and textual information.

A) Separate index for spatial and text attributes:

Advantages:
1. Ease of maintaining two separate indices.
2. The performance bottleneck lies in the number of candidate objects generated during the filtering stage.

Disadvantages:
1. If spatial filtering is done first, many objects may lie within the query's spatial extent, but very few of them are relevant to the query keywords. This increases the disk access cost by generating a large number of candidate objects, and the subsequent stage of keyword filtering becomes expensive.

B) Hybrid index

Advantages and limitations:
1. When a query contains keywords that closely correlate in space, this approach suffers from paying extra disk cost for accessing the R*-tree and high overhead in the subsequent merging process.

Hybrid Index Structures for Location-based Web Search: There is more and more research interest in location-based web search, i.e., searching web content whose topic is related to a particular place or region. This type of search involves location information, which should be indexed as well as the text information. A text search engine is set-oriented, whereas location information is two-dimensional and lives in Euclidean space. Previous work used separate indexes for the spatial and the text information, which creates a new problem: how to combine the two types of indexes. This paper uses hybrid index structures to handle textual and location-based queries with the help of inverted files and R*-trees. It considers three strategies to combine these indexes, namely: 1) inverted file and R*-tree double index; 2) first inverted file, then R*-tree; 3) first R*-tree, then inverted file. It implements a search engine to check the performance of the hybrid structure, containing the following parts: (1) an extractor, which detects the geographical scopes of web pages and represents them as multiple MBRs based on geographical coordinates; (2) an indexer, which builds hybrid index structures integrating text and location information; (3) a ranker, which ranks the results.

Disadvantages:
1. The indexer must build hybrid index structures to integrate the text and location information of web pages. To textually index web pages, inverted files are a good choice; to spatially index web pages, two-dimensional spatial indexes are used. The two involve different approaches, which degrades the indexer's performance.
2. In the ranking phase, geographical and non-geographical rankings are combined; the combination of the two rankings and the computation of geographical relevance may affect the performance of ranking.
III. PROPOSED SYSTEM
A spatial database manages multidimensional objects (such as points, rectangles, etc.) and provides quick access to those objects. The importance of spatial databases is that they represent entities of reality in a geometric manner. For example, locations of restaurants, hotels, and hospitals are described as points on a map, whereas larger extents such as parks, lakes, and landscapes are described as a mix of rectangles. In this paper we design a proposed system called the spatial inverted index (SI-index). The SI-index preserves the spatial locations of data points and builds an R-tree on every inverted list at little space overhead. Figure 1 shows the architecture diagram of the proposed system: the end user sends a spatial keyword query to the server; the server retrieves data from the database to process the query and sends the results back to the end user. The server frequently updates location information in the database.

Fig. 1 Architecture of the proposed system

Let us take an example, as shown in Figure 2. First we locate the leaf node(s) containing q. Next, imagine a circle centered at q being expanded from a starting radius of 0; we call this circle the search region. Each time the circle hits the boundary of a node region, the contents of that node are put on the queue, and each time the circle hits an object, we have found the object next nearest to q. Note that when the circle hits a node or an object, we are guaranteed that the node or object is already in the priority queue, since the node that contains it must already have been hit (this is guaranteed by the consistency condition). A minimal sketch of this best-first traversal follows.
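The sketch below uses a toy two-level grouping of points in place of a real R-tree (names and coordinates are illustrative). mindist to a bounding box lower-bounds the distances to its contents, which is the consistency condition the expanding-circle argument relies on.

import heapq, math

points = {1: (1, 1), 2: (4, 5), 3: (9, 2), 4: (8, 8)}
nodes = {                          # node -> (bounding box, member point ids)
    "N1": ((0, 0, 4, 5), [1, 2]),
    "N2": ((8, 2, 9, 8), [3, 4]),
}

def mindist(q, box):
    x1, y1, x2, y2 = box
    dx = max(x1 - q[0], 0, q[0] - x2)
    dy = max(y1 - q[1], 0, q[1] - y2)
    return math.hypot(dx, dy)

def incremental_nn(q):
    heap = [(mindist(q, box), ("node", nid)) for nid, (box, _) in nodes.items()]
    heapq.heapify(heap)
    while heap:
        d, (kind, ident) = heapq.heappop(heap)
        if kind == "node":                     # circle hits a node region:
            for pid in nodes[ident][1]:        # put its contents on the queue
                heapq.heappush(heap, (math.dist(q, points[pid]), ("pt", pid)))
        else:
            yield ident, d                     # circle hits an object: report it

for pid, d in incremental_nn((2, 2)):
    print(pid, round(d, 2))                    # objects in ascending distance order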
Fig. 2 The circle around query object q depicts the search region after reporting o as the next nearest object.

We design a variant of the inverted index that is optimized for multidimensional points, and is thus named the spatial inverted index (SI-index). This access method successfully incorporates point coordinates into a conventional inverted index with small extra space, owing to a delicate compact storage scheme. Meanwhile, an SI-index preserves the spatial locality of data points, and comes with an R-tree built on every inverted list at little space overhead. As a result, it offers two competing ways for query processing (a small sketch of the first option follows this list):
• We can (sequentially) merge multiple lists very much like merging traditional inverted lists by ids.
• Alternatively, we can also leverage the R-trees to browse the points of all relevant lists in ascending order of their distances to the query point.
NN Search Algorithm

Figure 3 shows the flow diagram of the NN search: first the latitude and longitude are given along with the keywords and a range; then the distances to the nearest objects are calculated and sorted in ascending order; the places are filtered, their keywords are matched against the query keywords, and finally the list of matched places is returned.

Fig. 3 Flow diagram of NN search (distance calculation, sorting, place filtering, and keyword matching)
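A minimal sketch of the flow in Fig. 3 (place names, coordinates, and the range value are illustrative): compute distances from the query point, sort ascending, filter by range, then keep only the places whose keywords contain all query keywords.

import math

places = [
    {"name": "cafe A", "loc": (12.97, 77.59), "kw": {"coffee", "wifi"}},
    {"name": "cafe B", "loc": (12.99, 77.61), "kw": {"coffee"}},
    {"name": "inn C",  "loc": (13.20, 77.80), "kw": {"coffee", "wifi"}},
]

def nn_keyword_search(lat, lng, query_kw, rng):
    scored = sorted(places, key=lambda p: math.dist((lat, lng), p["loc"]))
    in_range = [p for p in scored if math.dist((lat, lng), p["loc"]) <= rng]
    return [p["name"] for p in in_range if query_kw <= p["kw"]]

print(nn_keyword_search(12.97, 77.59, {"coffee", "wifi"}, rng=0.1))   # ['cafe A']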
IV. RESULT AND ANALYSIS
Comparing the existing and proposed systems, the primary set of experiments checks the performance of various combinations of fast nearest neighbour search and existing search methods. All methods are tested under two request patterns: information analysis and results. In particular, we are interested in the overall number of results and the search delay during a spatial data search, as well as the average interval of an information extraction, since these are the dominant factors affecting the service quality experienced by the users. The comparison figure shows the results for both the existing IR2-tree and the proposed NN search algorithm in milliseconds, and proves the efficiency of the NN search.

V. CONCLUSION
There are many applications calling for a search engine that is able to efficiently support novel forms of spatial queries that are integrated with keyword search. The existing solutions to such queries either incur prohibitive space consumption or are unable to give real-time answers. The proposed system remedies the situation by developing an access method referred to as the spatial inverted index (SI-index). Not only is the SI-index fairly space economical, but it also has the ability to perform keyword-augmented nearest neighbour search in time that is on the order of dozens of milliseconds. Moreover, because the SI-index is based on the standard technology of inverted indexes, it is readily incorporable in a commercial search engine that applies massive parallelism, implying its immediate industrial merits.

REFERENCES
i. D. Felipe, V. Hristidis, and N. Rishe, "Keyword search on spatial databases," in Proc. of International Conference on Data Engineering (ICDE), pages 656–665, 2008.
ii. X. Cao, L. Chen, G. Cong, C. S. Jensen, Q. Qu, A. Skovsgaard, D. Wu, and M. L. Yiu, "Spatial keyword querying," in ER, pages 16–29, 2012.
iii. G. Cong, C. S. Jensen, and D. Wu, "Efficient retrieval of the top-k most relevant spatial web objects," PVLDB, 2(1):337–348, 2009.
iv. R. Hariharan, B. Hore, C. Li, and S. Mehrotra, "Processing spatial keyword (SK) queries in geographic information retrieval (GIR) systems," in Proc. of Scientific and Statistical Database Management (SSDBM), 2007.
v. Yanwei Xu, Jihong Guan, Fengrong Li, and Shuigeng Zhou, "Scalable continual top-k keyword search in relational databases," Data and Knowledge Engineering 86 (2013) 206–223.
vi. V. Hristidis and Y. Papakonstantinou, "Discover: Keyword search in relational databases," in Proc. of Very Large Data Bases (VLDB), pages 670–681, 2002.
vii. I. Kamel and C. Faloutsos, "Hilbert R-tree: An improved R-tree using fractals," in Proc. of Very Large Data Bases (VLDB), pages 500–509, 1994.
viii. J. Lu, Y. Lu, and G. Cong, "Reverse spatial and textual k nearest neighbor search," in Proc. of ACM Management of Data (SIGMOD), pages 349–360, 2011.
ix. S. Stiassny, "Mathematical analysis of various superimposed coding methods," Am. Doc., 11(2):155–169, 1960.
x. D. Zhang, Y. M. Chee, A. Mondal, A. K. H. Tung, and M. Kitsuregawa, "Keyword search in spatial databases: Towards searching by document," in Proc. of International Conference on Data Engineering (ICDE), pages 688–699, 2009.
Mining the Frequent Item Set Using Pre Post Algorithm
Manasa M. S., Shivakumar Dallali
Deptt of CSE, CiTech, K R Puram, Bengaluru, Karnataka, India
Abstract: The fundamental techniques adopted in data mining to retrieve data from a database are the Apriori algorithm, the FP-tree, and the Eclat algorithm, using data mining rules such as association rules, classification, clustering, etc. The Apriori algorithm traverses the database many times to generate frequent items and takes more space, whereas the FP-tree is advantageous compared to the Apriori algorithm but does not use memory up to the mark. To overcome the drawbacks of these algorithms, in this paper we adopt a new technique called the PPC-tree. This tree is constructed based on pre-post traversal: the PrePost algorithm constructs the tree based on a vertical traverse of the database, scanning the database twice. The PPC-tree looks similar to the FP-tree, but the tree is constructed vertically, and the time and space utilized by this algorithm are less compared to other techniques. The experiments show the performance, stability, and scalability of the algorithm.

Keywords: Apriori algorithm, FP-tree, PPC-tree, PrePost algorithm
1 Introduction

Frequent itemset mining, a new technique for data mining, was proposed by three people: Agrawal, Imielinski, and Swami (1993). Since the frequent itemset is a technique, it is used together with the basic techniques of data mining such as classification, clustering, etc., and based on it new algorithms have been proposed for many applications, providing more efficiency, scalability, and optimal methods. Frequent itemset techniques can be classified into three groups.

1. Techniques based on the candidate generate-and-test strategy: this is the basic technique of data mining, where candidate sets are generated repeatedly until all the candidates are produced. First the one-item candidate set is generated; the first generated candidate set is then used to generate the next set of candidates, and that set is used to generate the next one. This method continues until all candidates are generated.

2. Techniques based on the divide-and-conquer strategy: here the data set is compressed using the divide-and-conquer method to construct a tree-like structure such as the FP-tree and its frequent items. These trees are used to reduce the space and to increase the time efficiency.

3. Techniques based on a hybrid approach: here both techniques, the candidate generate-and-test strategy and the divide-and-conquer method, are combined to generate the data set. The candidate generate-and-test method is used to compress the candidate set into a vertical data format, and the divide-and-conquer method is used to construct the tree-like structure and the frequent itemsets for the data stored in the vertical data format.

There are many techniques to find the frequent itemsets, but they have had several drawbacks: they take more time to construct the tree structure, they are becoming more complex to understand, and they need more space to store data. To overcome these drawbacks, in recent years Deng and Wang gave us a new technique called the PrePost code to generate the frequent itemsets; this method is based on the FP-tree structure, and the data is stored in a tree-like structure. The PrePost code (PPC-tree) has two steps of execution: first it constructs the tree-like structure by traversing the data set, and then, using the tree structure, it constructs the frequent itemsets using the Apriori algorithm.
be continued till the all candidate is generated. method and it will use the partition, divide and conquer method
to store the data. Advantages of fp tree is it will reduce the
2Technique based on divide and conquer strategy: this search space and will generate frequent time set without using
method of dataset is compressed using divide and conquer the candidate generation.
method to construct the tree like structure like fp tree and
frequent item etc…this fptree were used to understand the The ppc algorithm will work based by combining the
reduce the space and to increase the time efficiency advantages of apriori algorithm and fp growth.

3Technique based on hybrid approach: here the both 3 ppc tree


technique like candidate generate and test strategy and the divide
and conquer method is combined to generate the data set.The 3.1 problem statement of ppc tree
candidate generate and test method is used to compress the
candidate set is vertical data format and the divide conquer Let I = {i1, i2, . . . ,im} be the universal item set. Let DB = {T1,
T2, . . . ,Tn} be a transaction database, where each Tk(1 _ k _ n)
is a transaction which is a set of items such that Tk⊆ I. We also
call Aan itemset if A is a set of items. Let A be an itemset. A
transaction T is said to contain A if and only if A ⊆ T. Let SP_A be the support of itemset A, which is the number of transactions in DB that contain A. Let ξ be the predefined minimum support and |DB| be the number of transactions in DB. An itemset A is frequent if SP_A is no less than ξ × |DB|. Given a transaction database DB and a ξ, the problem of mining frequent itemsets is to discover the set of all itemsets that have support no less than ξ × |DB|, which is also called the minimum threshold.

3.2 overview of ppc tree
The PPC-tree is a tree-like structure in which each node carries five values: item name, count, child nodes, pre-order, and post-order. The order index of a node is calculated in two manners: the index obtained when traversing the tree in a pre-order manner, and the index obtained when traversing the tree in a post-order manner.

To understand the working of the PPC-tree, consider an example with a database D and a support value of 20%. First the algorithm removes all items whose frequency is less than the minimum support value and sorts the items that satisfy the minimum support value in decreasing order of frequency. It then inserts the transactions that satisfy the support threshold, and the tree is constructed by traversing in preorder and postorder, recording the N-list of each node. The advantages of the PrePost code algorithm are: (i) N-lists are much more compact than previously proposed vertical structures; (ii) the support of a candidate frequent itemset can be determined through N-list intersection operations, which have a time complexity of O(m + n + k), where m and n are the cardinalities of the two N-lists and k is the cardinality of the resulting N-list.
3.3 ppc tree construction and design

Definitions of ppc tree

Definition 1: The PPC-tree is a tree-like structure.
1. One node of the tree is named the root node; it has a null value, and the child nodes of the root node are called the subtrees of the tree.
2. Each node of a subtree consists of five fields, namely: item-name, count, childNode-list, pre-order, and post-order. item-name specifies the frequent item of that node. count gives the number of transactions represented by the portion of the path reaching this node. childNode-list holds the children of the node. pre-order is the preorder rank of the node, and post-order is the postorder rank of the node.

The differences between the FP-tree and the PPC-tree:
1. The FP-tree has two extra fields in each node: a node-link, and a header table structure that holds the connections between nodes carrying the same item. The PPC-tree does not have these; it uses the preorder and postorder method instead.
2. In the PPC-tree every node has a preorder field and a postorder field, which the FP-tree does not have. The preorder of a node is determined by traversing the tree in preorder: the root node N is traversed before its child nodes, from left to right, and the time at which N is visited is recorded as its preorder rank. In the postorder traversal, node N is visited after all of its child nodes have been traversed from left to right.
3. After an FP-tree is built, it is used for frequent pattern mining during the whole FP-growth algorithm, which is a recursive and complex process. However, the PPC-tree is only used for generating the Pre-Post code of each node. Later, we will find that after collecting the Pre-Post codes of the frequent items, the PPC-tree has finished its entire task and can be deleted.

Algorithm of PPC-tree Construction
Input: A transaction database DB and a minimum support threshold ξ.
Output: PPC-tree, F1 (the set of frequent 1-patterns)
Method: Construct-PPC-tree(DB, ξ) {
// Generate frequent 1-patterns
(1) Scan DB once to find the set of frequent 1-patterns (frequent items) F1 and their supports. Sort F1 in support-descending order as If, which is the list of ordered frequent items.
// Construct the PPC-tree
(2) Create the root of a PPC-tree, PPT, and label it as "null". Scan DB again. For each transaction T in DB, arrange its frequent items into the order of If and then insert it into the PPC-tree. (This process is the same as that of the FP-tree [2].)
// Generate the Pre-Post code of each node
(3) Scan the PPC-tree by preorder traversal to generate the pre-order of each node. Scan the PPC-tree again by postorder traversal to generate the post-order. }

The PPC-tree algorithm can be understood easily using an example.

Example: Let the transaction database be DB, given in the left two columns of Table 1, and let the support count be 40%. The PPC-tree storing the DB is shown in Figure 1. It should be noted that, based on Algorithm 1, the PPC-tree is constructed via the last column of Table 1.

Table 1: Transaction database
Id | Items          | Ordered frequent items
1  | a, c, g        | c, a
2  | e, a, c, b     | b, c, e, a
3  | f, e, c, b, i  | b, c, e, f
4  | b, f, h        | b, f
5  | b, f, e, c, d  | b, c, e, f

Obviously, the second column and the last column are equivalent for mining frequent patterns under the given minimum support threshold. In the last column of Table 1, all infrequent items are eliminated and the frequent items are listed in support-descending order. This ensures that the DB can be efficiently represented by a compressed tree structure. For Pre-Post code generation, we traverse the PPC-tree twice, by preorder and by postorder. After that, we get Figure 1. In this
figure, the node labelled (3, 7) means that its pre-order is 3, its post-order is 7, its item-name is b, and its count is 4.

Fig. 1: PPC-tree structure
iii. Savasere A, Omiecinski E, Navathe S. An efficient algorithm for
mining association rules in large databases. In: The 21th International
4 Experimental result Conference on Very Large Data Bases (VLDB'95), Zurich, 1995. 432- 443.
The three computers have same configuration, CPU is AMD iv. H. Mannila, H. Toivonen, and A. Verkamo.Efficient algorithm for
discovering association rules. AAAI Workshop on Knowledge Discovery in
Athlon dual-core processor, clocked at 2.11GHz, memory size
Databases, pp. 181-192, JuL 1994.
2G. T10I4D100K and Pumsb act as experimental data. We v. Shi Yue-mei. Hu Guo-hua. A Sampling Algorithm for Mining
compare the runtime of three algorithms Pre Post, FP tree and Association Rules in Distributed Database[C]. In:2009 First International
apriori algorithm when they are performed on the two Workshop on Database Technology and Applications, 2009, 431-434.
datasets.From experimental results shown above we know the vi. Han J, Pei J, Yin Y Mining frequent patterns without candidate
generation[C]/ACM SIGMOD Record. ACM, 2000, 29(2): 1-12.
runtime will become shorter when support increases. vii. Mobasher B, Dai H, Luo T, et aL Effective personalization based on
It'sevident.also reflects performance of the parallel algorithm is association rule discovery from web usage data[C]lProceedings of the 3rd
not as good as PrePost on small dataset.The reason is each node international workshop on Web information and data management. ACM,
needs to send message to others in clusters, but delay of network 20019-15.
bandwidth is unpredictable, so I/O operation occupies main
runtime, thus affecting the performance of the algorithm.
Contrarily, Pre Post has an advantage of data localization. But
when the dataset is large, PrePost at a lower support threshold
can not be performed due to memory overflow.

Graph1: Run time using the prepost ,fp tree, apriori algorithm

ICCIT15@CiTech, Bangalore Page 370


International Journal of Engineering Research ISSN:2319-6890(online),2347-5013(print)
Volume No.4, Issue Special 5 19 & 20 May 2015

Design and Implementation of Research Proposal Selection

Neelambika Biradar, Prajna M., Dr. Antony P J
Deptt. of CSE, KVGCE Sullia D.K
neelambika.biradar1@gmail.com, prajnamanu@gmail.com, antonypjohn@gmail.com
Abstract— As a large number of research centers and educational institutes open day by day, research project selection has become an important task for government and private research funding agencies. As a large number of research proposals are received, the next step is to group them according to their similarities in research discipline. By applying a text mining approach, the classification of project proposals can be done automatically, and the ranking of proposals can be done based on the value of a feature vector, which presents the effective proposals in sorted fashion. The outcome is an effective way of selecting the best research proposals among those submitted.

Keywords—Ontology based text mining, Classification, Clustering.

I. INTRODUCTION
In computer science, an ontology can be described as a set of concepts, that is, knowledge within a domain and the relations between pairs of concepts. Ontologies are used in various domains as a form of knowledge representation about the world. In this project, the ontology is a model for describing the world that gives the mapping between properties and relationship types, which establishes the close relationship between the ontology and the real world.

Research project selection is an important task in government as well as private funding agencies. It is a challenging multi-process task that begins with a call for proposals by the funding agencies. Earlier, classification was a manual method, but this method has been extended from manual to automatic, based on the feature vector value. After submission of the proposals, the next step is to apply preprocessing steps such as data cleaning to remove all the stop words from the proposals.

Web technology has defined many stop words. By applying a preprocessing step like data cleaning, we can remove all the stop words from the submitted proposals. The obtained clean-data words can be considered as tokens, each assigned a unique id. Then the number of times each token is repeated is calculated, which gives the frequency of the tokens. Next, the frequency-tokenized algorithm is applied to calculate the inverse document frequency of the text, which is based on the number of documents (proposals). By multiplying the frequency of the text by the obtained IDFT value, we get the feature vector value. Finally, the proposal whose feature vector value is highest appears at the top of the list, in descending order of this value.

II. LITERATURE SURVEY
Ontology patterns were introduced by Blomqvist and Sandkuhl in 2005 [1]. Later the same year, Gangemi (2005) presented his work on ontology design patterns [2]. Rainer Malik et al. (2006) used a combination of text mining algorithms to extract the keywords relevant for their study from various databases.

The selection of research proposals in the existing system is done manually. Proposals need to be submitted to a funding agency, and according to the names and keywords used in the proposals they are classified into groups. All of this is done manually by human beings.

But this is not suitable for large data, because the manual process might misplace proposals into wrong groups. Misplacement of proposals can happen for the following reasons. First, keywords might give an incomplete meaning of the whole proposal. Second, the keywords provided by applicants may carry misconceptions; we can also say that keywords give only a partial representation of a proposal. Third, manual grouping is done by an area expert.

III. BACKGROUND
This project uses the concepts of an ontology-based text mining approach, namely classification and clustering algorithms. The proposed system builds the research ontology, applies the decision tree algorithm to classify the data into the disciplines using the created ontology, and the result of classification then helps to make clusters of similar data.

A. Ontology
Ontology has several technical advantages, such as flexibility and easy accommodation of heterogeneous data. Nowadays, ontology has become prominent in research, especially in computer science. An ontology is a knowledge repository which defines terms and concepts, and also represents the relationships between the various concepts. It is a tree-like structure, as defined by Gangemi A (2005). The ontology in this paper is created from the submitted proposals containing the keywords, which are a representation of the overall project. Creating a list of keywords for a specific area is itself an area of ontology [2]. By creating it, it becomes easy to classify the proposals into their respective areas by checking the number of times its words appear in each paper.
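As a rough illustration of this keyword-counting idea (our sketch, with hypothetical area names; not the authors' implementation), a proposal can be assigned to the discipline whose ontology keywords occur most often in its text:

```java
// Illustrative sketch: assign a proposal to the research area whose
// ontology keywords appear most often in the proposal text.
import java.util.List;
import java.util.Map;

class AreaClassifier {
    // The keyword lists stand in for the research ontology of each area.
    static String classify(String proposalText, Map<String, List<String>> ontology) {
        String lower = proposalText.toLowerCase();
        String bestArea = null;
        int bestScore = -1;
        for (Map.Entry<String, List<String>> area : ontology.entrySet()) {
            int score = 0;
            for (String keyword : area.getValue()) {
                // Count every occurrence of this keyword in the proposal.
                int idx = 0;
                while ((idx = lower.indexOf(keyword.toLowerCase(), idx)) >= 0) {
                    score++;
                    idx += keyword.length();
                }
            }
            if (score > bestScore) {
                bestScore = score;
                bestArea = area.getKey();
            }
        }
        return bestArea;
    }
}
```

A call such as classify(text, Map.of("Data Mining", List.of("clustering", "frequent itemset"), "Networking", List.of("routing", "packet"))) would return the area with the highest keyword count; the area names and keywords here are invented for the example.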
B. Classification
In classification, input text data can be classified into a number of classes based on the data. Various text mining techniques are used for the classification of text data, such as Support Vector Machines, Bayesian classifiers, Decision Trees, Neural Networks, Latent Semantic Analysis, Genetic algorithms [8], etc.

The main steps involved in classification are:
1. Document preprocessing
2. Feature extraction/selection
3. Model selection
4. Training and testing the classifier

Pre-processing: Data pre-processing reduces the size of the input text documents significantly. It involves sentence boundary detection, natural-language stop-word elimination, and stemming. Stop words are functional words which occur frequently in the language of the text, so they are not useful for classification.

Feature extraction: The linked list which contains the pre-processed data is used for collecting the features of the document. This is done by comparing the linked list with the keywords of the ontologies of the different areas, so the refined vector acts as the feature vector for that proposal.

Model selection: The way a paper is categorized into a research area is by clustering the proposals. This can be done by many approaches, but this paper uses the k-means algorithm (a sketch is given at the end of this section). The created ontology is used for training the network; with both the created ontology and the feature vector, the trained model specifies the corresponding research area.

Training and Testing: The feature vectors of the created research projects are transferred as input training data to the network. This trained network is then tested with a different proposal's feature vector, so one can obtain the class to which the proposal belongs.

C. Clustering
A number of similar objects collected and grouped together is called a cluster. The following are a few definitions of a cluster:
1. A cluster is a set of entities which are alike, while entities from different clusters are not alike.
2. A cluster is an aggregation of points in the test space such that the distance between any two points in the cluster is less than the distance between any point in the cluster and any point outside it.
3. Clusters are connected regions of multi-dimensional space containing a high density of points, separated by regions of low density.
4. Clustering means grouping similar types of objects into one cluster.

Clustering is a technique used to group documents having similar features. Documents within a cluster contain similar objects, and dissimilar objects compared to any other cluster. Clustering algorithms create a vector of topics for each document and measure the weights of how well the document fits into each cluster [9].
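Since the model-selection step above names k-means, here is a compact, generic Java sketch of k-means over proposal feature vectors (our own naming and initialisation choices, not the authors' code):

```java
// Generic k-means over feature vectors (double[]), for illustration.
import java.util.Random;

class KMeans {
    static int[] cluster(double[][] points, int k, int iterations) {
        Random rnd = new Random(42);
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++)          // naive random initialisation
            centroids[c] = points[rnd.nextInt(points.length)].clone();
        int[] assignment = new int[points.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int f = 0; f < points[p].length; f++) {
                        double diff = points[p][f] - centroids[c][f];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                assignment[p] = best;
            }
            // Update step: move each centroid to the mean of its members.
            for (int c = 0; c < k; c++) {
                double[] sum = new double[points[0].length];
                int members = 0;
                for (int p = 0; p < points.length; p++) {
                    if (assignment[p] != c) continue;
                    members++;
                    for (int f = 0; f < sum.length; f++) sum[f] += points[p][f];
                }
                if (members > 0) {
                    for (int f = 0; f < sum.length; f++) sum[f] /= members;
                    centroids[c] = sum;
                }
            }
        }
        return assignment;   // cluster index per proposal
    }
}
```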
IV. PROPOSED SYSTEM
The proposed system is based on ontology-based text mining and includes four phases. Ontology-based text mining clusters the research proposals according to their domain: unstructured text is processed, and interesting information and knowledge are extracted by applying text mining. The inverse document frequency is obtained from the ratio of the number of documents to the frequency, and by multiplying this value with the frequency we get the feature vector value. Finally, we can rank the papers based on the feature vector value.

V. METHODOLOGY
In this paper, research projects are clustered into specific areas using the ontologies of the different areas. The following are the modules of the proposed system, which is also shown in Figure 2.

Module 1: In the first module, users have to submit their proposals; up to five proposals can be submitted at a time. Proposals are sent along with their abstracts, and these are stored in the ontology.

Module 2: By applying a preprocessing step such as data cleaning, we can remove all the stop words from the proposals. The obtained cleaned data is then given as input to the next module.
Module 3: The cleaned-data words are called tokens, and each is assigned a unique id. The number of times a token is repeated is its frequency; computing this is the frequency computation.

Module 4: The log of the ratio of the number of documents to the frequency of the text is called the inverse document frequency of the text (IDFT). Based on the feature vector values, the papers are then ranked.
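Modules 3 and 4 together amount to a TF-IDF computation. A minimal Java sketch of that calculation (our illustration; the names are ours) is:

```java
// Illustrative TF-IDF feature value for one token, following Modules 3 and 4:
// frequency = number of times the token occurs in the proposal,
// idft      = log(totalDocuments / documentsContainingToken),
// feature   = frequency * idft.
class FeatureVector {
    static double featureValue(int frequency, int totalDocuments,
                               int documentsContainingToken) {
        double idft = Math.log((double) totalDocuments / documentsContainingToken);
        return frequency * idft;
    }
}
```

A token that occurs in every document gets idft = log(1) = 0, so its feature value vanishes regardless of its frequency.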

Figure 2: System Architecture

VI. RESULT AND DISCUSSION
Here the results of the proposed system are discussed. A government or private funding agency calls for proposals. Users can then submit their research proposals, and by applying preprocessing steps such as data cleaning, all the stop words are removed. Each token has its own unique id; the number of times the token appears is then calculated, which gives its frequency. Using the appropriate formula we can calculate the IDFT and the feature vector, as shown in the screenshot. In Figure 3, a few of the feature vector values are zero; this means those tokens are unique and not repeated in any of the proposals, which leads the frequency value to 1. Hence, by applying Eq. (1), we get the results shown in the last two columns of the screenshot, using Equations 1 and 2 respectively. Finally, the papers are sorted based on the feature vector value.

Figure 3: Result Analysis

VII. CONCLUSION
This paper has presented ontology-based text mining for the grouping of proposals. A research ontology is constructed to categorize the concepts in the different discipline areas and also to form relationships among them. The text mining technique provides different methods, such as classification and clustering, for extracting important information from unstructured text documents. The feature vector value is calculated from the number of times a token is repeated in the proposals multiplied by the inverse document frequency value. Finally, the proposals are ranked based on the feature vector value; the proposal with the highest feature vector value is taken as the most effective research proposal. Hence we can reduce the time consumed compared to the manual approach.

REFERENCES
i. Blomqvist E and Sandkuhl K (2005), "Patterns in ontology engineering: Classification of ontology patterns", In Proceedings of the 7th International Conference on Enterprise Information Systems.
ii. Gangemi A (2005), "Ontology design patterns for semantic web content", In The Semantic Web - ISWC 2005, Springer.
iii. S. Bechhofer et al., OWL Web Ontology Language Reference, W3C Recommendation, vol. 10, 2004.
iv. Henriksen A D and Traynor A J (1999), "A practical R&D project selection scoring tool", IEEE Trans. Eng. Manag., Vol. 46.
v. Y. H. Sun, J. Ma, Z. P. Fan, and J. Wang, "A group decision support approach to evaluate experts for R&D project selection", IEEE Trans. Eng. Manag., vol. 55, no. 1, pp. 158-170, Feb. 2008.
vi. Jian Ma, Wei Xu, Yong-hong Sun, Efraim Turban, Shouyang Wang, "An Ontology-Based Text-Mining Method to Cluster Proposals for Research Project Selection", IEEE Trans. on Systems, Man, and Cybernetics, vol. 42, no. 3, May 2012.
vii. Jay Prakash Pandey et al., "Automatic ontology creation for research paper classification", vol. 2, no. 4, Nov. 2013.
viii. Jain, A. K., and Dubes, R. C., "Algorithms for Clustering Data", Prentice-Hall, 1988.
ix. Rainer Malik, Lude Franke and Arno Siebes (2006), "Combination of text-mining algorithms increases the performance", Bioinformatics.
x. A. Maedche and V. Zacharias, "Clustering Ontology-based Metadata in the Semantic Web", In Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02), Helsinki, Finland, pp. 342-360, 2002.
xi. W. D. Cook, B. Golany, M. Kress, M. Penn, and T. Raviv, "Optimal allocation of proposals to reviewers to facilitate effective ranking", Manage. Sci., vol. 51, no. 4, pp. 655-661, Apr. 2005.
Light Weight Integrated Log Parsing Tool: LOG ANALYZER

Priyanka Sigamani S., Dr. D. R. Shashi Kumar
Dept. Of CSE, Cambridge Institute of Technology, Bangalore, Karnataka
Email-ID: {priyanka.13scs22@citech.edu.in, hod.cse@citech.edu.in}
Abstract- The amount of content in log files is increasing drastically; the manual method of finding the errors in those files and fixing the issues is becoming very difficult, complex, and time consuming, and it is not an efficient method to follow. The Log Analyzer Tool overcomes the difficulties faced so far. It is highly automated, with advanced functionalities not provided by other tools. Log Analyzer helps analysts find bugs in log files with less effort. Where a few tools fail to open even one complete file to perform a search, this tool searches for errors not just in a single file but in multiple files and even an entire folder. It displays the results to the user with the necessary highlighters and other options, and it provides both simple and advanced search options.

Keywords- Log Analyzer, Simple Search, Keyword Search, Date and Time Range Search.

I. INTRODUCTION
Log analysis (or system and network log analysis) is an art and science that seeks to make sense of computer-generated records. The process of creating such records is called data logging. Logs are emitted by network devices, operating systems, applications, and all manner of intelligent or programmable devices. A stream of messages in time sequence often comprises a log. Logs may be directed to files and stored on disk, or directed as a network stream to a log collector. Log messages must usually be interpreted with respect to the internal state of their source (e.g., an application), and they announce security-relevant or operations-relevant events (e.g., a user login, or a system error).

Logs are often created by software developers to aid in debugging the operation of an application. The syntax and semantics of the data within log messages are usually application- or vendor-specific. Terminology may also vary; for example, the authentication of a user to an application may be described as a login, a logon, a user connection, or an authentication event. Hence, log analysis must interpret messages within the context of an application, vendor, system, or configuration in order to make useful comparisons to messages from different log sources. Log message format or content may not always be fully documented.

II. MOTIVATION
The task of the log analyst is to induce the system to emit the full range of messages in order to understand the complete domain from which the messages must be interpreted. Analysts also provide solutions to the bugs once they are examined.

Similarly, products like W4N, ViPR, NCM, ECS, SRM, SMARTS, etc. also generate log files, which are to be analyzed by the log analyst to fix them and ensure the products are bug free, keeping the product up and running without any interruption. If any bugs are found, they have to be fixed as soon as possible to avoid other malfunctions. LOG ANALYZER was developed to find the bugs in log files easily and fix them at the earliest. This tool helps the analyst find bugs within the log files in many ways; it provides 3 main functionalities to locate a bug in the log files.

III. SCOPE
This project ensures finding bugs within the log files by providing the features below:
• Standalone desktop-based tool.
• Easy, friendly user interface.
• Ability to perform search on single or multiple files.
• Support for multiple products: logs from multiple products like W4N, ViPR, and NCM etc. can be searched within the framework.
• Supports multiple file types.
• Search of an entire folder.
• Multiple search options.
• Multiple options to display errors.
• Highlighters for the errors found.
• Adjustable panels for readability.

IV. LITERATURE SURVEY
A literature survey is mainly carried out in order to analyze the background of the current project, which helps to find flaws in the existing system and guides which unsolved problems can be worked out. Log analyzer tools are available in the market; most of them are web-based applications with limited features.

1. What is a Search Log?
A search log is a file (i.e., log) of the communications (i.e., transactions) between a system and the users of that system. Rice and Borgman (1983) present transaction logs as a data collection method that automatically captures the type, content, or time of transactions made by a person from a terminal with that system. Peters (1993) views transaction logs as
electronically recorded interactions between on-line information retrieval systems and the persons who search for the information found in those systems. For Web searching, a search log is an electronic record of interactions that have occurred during a searching episode between a Web search engine and users searching for information on that Web search engine. A Web search engine may be a general-purpose search engine, a niche search engine, a searching application on a single Website, or a variation on these broad classifications. The users may be humans or computer programs acting on behalf of humans. Interactions are the communication exchanges that occur between users and the system. Either the user or the system may initiate elements of these exchanges.

2. How are These Interactions Collected?
The process of recording the data in the search log is relatively straightforward. Servers record and store the interactions between searchers (i.e., actually Web browsers on a particular computer) and search engines in a log file (i.e., the transaction log) on the server using a software application. Thus, most search logs are server-side recordings of interactions. Major Web search engines execute millions of these interactions per day.

The server software application can record various types of data and interactions depending on the file format that the server software supports.

3. Why Collect This Data?
Once the server collects and records the data in a file, one must analyze this data in order to obtain beneficial information.

A few tools are given below:
• PowerGREP: PowerGREP is a powerful Windows grep tool. Quickly search through large numbers of files on your PC or network, including text and binary files, compressed archives, MS Word documents, Excel spreadsheets, PDF files, OpenOffice files, etc. Find the information you want with powerful text patterns (regular expressions) specifying the form of what you want, instead of literal text. Search and replace with one or many regular expressions to comprehensively maintain web sites, source code, reports, etc. Extract statistics and knowledge from log files and large data sets.
• WebLog Expert: WebLog Expert is a fast and powerful access log analyzer. It will give you information about your site's visitors: activity statistics, accessed files, paths through the site, information about referring pages, search engines, browsers, operating systems, and more. The program produces easy-to-read reports that include both text information (tables) and charts. View the WebLog Expert sample report to get a general idea of the variety of information about your site's usage it can provide.
• Log Parser Lizard: Log Parser Lizard is a GUI for Microsoft Logparser, definitely the best one available on the market today. Log Parser is a very powerful and versatile query software tool that provides universal query access (using SQL) to text-based data, such as log files, XML files, and TSV/CSV text files, as well as key data sources on the Microsoft Windows operating system, such as the Windows Event Log, IIS log, the registry, the file system, the Active Directory services, and much more.

Piwik, Oracle Log Analyser, and Wget are a few other log analysis tools.

V. EXISTING SYSTEM
Presently there is no specific tool available to find the exact bugs in a log file. Different log analysts use different methods to find the bugs within the log file. One such tool which they currently use is Notepad++. Notepad++ searches a file for multiple keywords: you can specify a list of keywords to search for in the current file, and filter out lines that match any of the keywords in this list. It was developed mainly for analyzing log files where you are interested in more than one keyword and the order in which they appear.
• Matching lines are listed with their line numbers in a separate panel in the plug-in window.
• Double clicking a matched line in this panel will take you to the corresponding line in the original document.
• Options to copy the filtered lines to the clipboard and highlight matches in the original file.
• Supports case-sensitive search, whole-word matching and regular expressions. Regexp is enabled by default.

VI. PROBLEM STATEMENT
Notepad++ is not efficient in all ways:
• It only performs keyword search.
• It does not perform other advanced searches, like automatically grepping the exceptions from the log file.
• It fails to load a log file more than a few MBs in length.
• Using this tool is not that effective: the log analyst must go through the log files line by line in order to fix a bug.
• It is time consuming.
• If a larger log file has to be searched, it has to be first split into many chunks, and each chunk opened to find the errors in it manually.
• To split the files some other tool has to be used first, and then each split opened in Notepad++ every time.
• If there are many larger files, the time complexity increases accordingly.
• If the issue is critical it has to be escalated within a short amount of time; having to split the file and then find errors in each split affects the client's environment while the client waits for the issue to be fixed.

VII. PROPOSED SYSTEM
The proposed system has a lot of new features which help log analysts perform log analysis quickly and accurately. The following are the features of the Log Analyzer:
• Standalone desktop-based tool: The tool is a standalone desktop-based application which users can install on a laptop or personal computer. It is a lightweight tool (it uses minimum system resources).
• Easy, user-friendly interface: The tool has a very understandable GUI where tool tips are provided for all components to guide the user along the correct path, in order to use the tool effectively and efficiently.
• Ability to perform search on single and multiple files: Where other tools have problems opening even a single file to find errors, this tool has the ability to perform a search on multiple files in a single selection.
• Folder search option: Where searching for an error in a single large file is a challenge elsewhere, here an entire folder can be searched.
• Support for multiple products: Logs from multiple products like W4N, ViPR, and NCM etc. can be searched within the framework. This tool is not specific to a product.
• Supports multiple file types: The tool supports different file types; the following files are accepted as input: .txt, .log.
• Multiple search options:
o Simple and advanced search: In Simple Search the log analyst can select a file, multiple files, or a complete folder and search for both Warning and Severe entries, or restrict the search to either one.
o Manual keyword-based search: In Keyword Search the log analyst can select a file, multiple files, or a complete folder and search for a keyword of the analyst's choice. The analyst can also select the kind of pattern wanted, such as Match Case, Starts With, or Ends With (All Cases displays Match Case, Starts With, and Ends With together).
o Automatic search for standard Java errors in the log file: When Simple, Keyword, or Date and Time Range searches are performed on the log files, Java exceptions are automatically grepped and displayed to the user.
o Search based on a date and time range: This is a unique and very useful feature where a time range can be specified and everything within the time range is displayed to the user.
• Multiple options to display errors: Errors displayed to the user have different colours, for example SEVERE in red and WARNING in green. For the number of files selected, corresponding tabbed panes are generated, displaying the errors specific to each file name.
• Close options: The tabs created can be closed independently, or a right-click option is available to close all open tabs.
• Highlighters for the errors found: For all the searches performed (Simple, Keyword, and Date and Time Range), colour highlighters are provided for the log analyst to locate the errors in the file, with different colours indicating the severity level.
o Simple Search: SEVERE in red, WARNING in green.
o Keyword Search: the searched keyword gets a yellow highlight.
o Date and Time Range Search: Start Time in green, End Time in green.
• Adjustable panels for readability: This feature is mainly for the log analyst's readability.
• Display of search progress: When the user is performing a search, the tool notifies whether it is progressing and when the search is completed.
• Abort operation: If the user is no longer interested in performing the search, he/she can abort the search at any point of time.
• Open files: The tool also enables the log analyst to open the selected file from within the tool, using the supporting software required for that particular file. The files are listed in a table for each of the search options, stored along with their paths, so the log analyst can identify which log file is which while performing a search on multiple files or a folder.

VIII. SYSTEM ARCHITECTURE
This section presents the architecture of the Log Analyzer Tool.

Figure 1. Log Analyser Architecture

Input: Input to the tool is log file(s) or a folder.
Operation: Simple, Keyword, and Date and Time Range Search are the three modules implemented in this project.
Output: The parsed result the log analyst expects.

1. Detailed System Architecture
The detailed architecture explains how the flow of the Log Analyzer Tool works. Initially, when the Log Analyzer Tool is accessed, it loads the initial screen with a tool bar and a display screen. The tool bar contains all the components which facilitate the different operations the Log Analyzer provides, and the display screen displays the result the analyst expects.

2. Modules
This project mainly consists of 3 modules. They are:
• Simple Search
• Keyword Search
• Date and Time Range Search
Figure 2. Detailed Architecture of Log Analyzer

2.1 Simple Search
Initially the user is given the option to select a single log file, multiple log files, or an entire log folder, based on the user's choice, to perform the simple search operation. The user must then select the files on which log parsing is to be performed. After the user selects the files, the selected file names along with their full paths are displayed in a table, which lets the analyst know on what component the search is performed; the tool also allows the analyst to open the file within the framework to cross-check that the operation is being performed on the file of interest. It also provides options to delete a single file or the entire list of files from the table. The analyst then selects one of the error display options (Default, Severe, and Warning).

By selecting Default, the tool displays both Severe and Warning entries with highlighters so the user can easily locate them; on selecting Warning or Severe, it displays only those. Further, the analyst clicks the Start Search button, which performs the search on the files for the selected error display option, displaying the grepped result in individual tabs per file to avoid confusion about which result belongs to which file. Once the user has started the search, options are given to abort the operation. Every tab changes colour to indicate that the operation has completed for that file. A progress bar is provided to indicate that the operation is still in progress, and a message saying "Search Completed" pops up after the entire search has completed.

Figure 3. Block Diagram for Simple Search
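A minimal Java sketch of the Simple Search filtering step follows (our code, not the tool's source; the real tool feeds these hits into per-file tabs with highlighters):

```java
// Illustrative sketch of the Simple Search module: scan the selected log
// files and collect SEVERE/WARNING lines per file.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SimpleSearch {
    // displayOption is "DEFAULT", "SEVERE", or "WARNING".
    static Map<Path, List<String>> search(List<Path> logFiles, String displayOption)
            throws IOException {
        Map<Path, List<String>> hitsPerFile = new HashMap<>();
        for (Path file : logFiles) {
            List<String> hits = new ArrayList<>();
            for (String line : Files.readAllLines(file)) {
                boolean severe = line.contains("SEVERE");
                boolean warning = line.contains("WARNING");
                if (("DEFAULT".equals(displayOption) && (severe || warning))
                        || ("SEVERE".equals(displayOption) && severe)
                        || ("WARNING".equals(displayOption) && warning)) {
                    hits.add(line);    // one result tab per file in the real GUI
                }
            }
            hitsPerFile.put(file, hits);
        }
        return hitsPerFile;
    }
}
```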

2.2 Keyword Search

Figure 4. Block Diagram for Keyword Search

Keyword Search is used when the analyst can already guess what the reason for the failure of the product could be. This module allows the analyst to select the log files or an entire folder of log files; the analyst then enters the possible query for which they think the failure has occurred and selects the matching type (Match Case, Starts With, Ends With, All Cases). On starting the search, the grepped errors are displayed with a yellow highlighter. At any point of time the search can be aborted and resumed, providing flexibility for the analyst.
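The matching types can be sketched in Java as follows (our illustration; All Cases is interpreted, per the feature list above, as the union of the other three):

```java
// Illustrative sketch of the matching types in Keyword Search: decide whether
// a log line matches the analyst's keyword under the selected matching type.
class KeywordMatcher {
    enum MatchType { MATCH_CASE, STARTS_WITH, ENDS_WITH, ALL_CASES }

    static boolean matches(String line, String keyword, MatchType type) {
        switch (type) {
            case MATCH_CASE:
                return line.contains(keyword);           // case-sensitive containment
            case STARTS_WITH:
                return line.trim().startsWith(keyword);
            case ENDS_WITH:
                return line.trim().endsWith(keyword);
            case ALL_CASES:
            default:
                // "All Cases" displays Match Case, Starts With, and Ends With.
                return matches(line, keyword, MatchType.MATCH_CASE)
                        || matches(line, keyword, MatchType.STARTS_WITH)
                        || matches(line, keyword, MatchType.ENDS_WITH);
        }
    }
}
```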

2.3 Date and Time Range Search

Similar to the Simple and Keyword searches, the Date and Time Range Search greps the content between the start and end times and displays it to the user/analyst, so they can study the product using the information available in the log files. The analyst studies the log information around the failure that occurred in the product and provides a relevant solution to overcome the problem faced by the clients.

Figure 5. Block Diagram for Date and Time Range Search
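A minimal Java sketch of the range filter follows (our code; the timestamp format shown is an assumption, since the tool's exact log layout is not specified):

```java
// Illustrative sketch of Date and Time Range Search: keep log lines whose
// leading timestamp falls between the start and end times.
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

class DateTimeRangeSearch {
    // Assumed log line prefix, e.g. "2015-05-19 10:15:30 SEVERE ...".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static List<String> filter(List<String> lines,
                               LocalDateTime start, LocalDateTime end) {
        List<String> inRange = new ArrayList<>();
        for (String line : lines) {
            if (line.length() < 19) continue;          // too short for a timestamp
            try {
                LocalDateTime ts = LocalDateTime.parse(line.substring(0, 19), FORMAT);
                if (!ts.isBefore(start) && !ts.isAfter(end)) {
                    inRange.add(line);                 // within [start, end]
                }
            } catch (DateTimeParseException e) {
                // Continuation lines without timestamps are skipped here;
                // the real tool may attach them to the previous entry.
            }
        }
        return inRange;
    }
}
```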
IX. IMPLEMENTATION
The implementation phase of the project is where the detailed design is actually transformed into working code. The aim of the phase is to translate the design into the best possible solution in a suitable programming language. This section covers the implementation aspects of the project, giving details of the programming language and development environment used.

1. Language Used For Implementation
Java is chosen as the programming language. A few reasons for which Java is selected can be outlined as follows: it is platform independent, object-oriented, and distributed, it has a rich standard library, and mainly for the reason below.

Swing support: Swing was developed to provide a more sophisticated set of GUI components than the earlier Abstract Window Toolkit. Swing provides a native look and feel that emulates the look and feel of several platforms, and also supports a pluggable look and feel that allows applications to have a look and feel unrelated to the underlying platform. It possesses the traits of platform independence, extensibility, and look and feel.

2. Development Environment
A platform is a crucial element in software development. A platform might be simply defined as "a place to launch software". In this project, NetBeans IDE 8.0.1 is used for implementation purposes.

2.1 NetBeans IDE 8.0.1
NetBeans is a multi-language software development environment comprising an integrated development environment (IDE) and an extensible plug-in system. It is written primarily in Java and can be used to develop applications in Java and, by means of the various plug-ins, in other languages as well, including C, C++, COBOL, Python, Perl, PHP, and others. NetBeans employs plug-ins in order to provide all of its functionality on top of (and including) the runtime system, in contrast to some other applications where functionality is typically hard coded.

The NetBeans SDK includes the NetBeans Java development tools (JDT), offering an IDE with a built-in incremental Java compiler and a full model of the Java source files. This allows for advanced refactoring techniques and code analysis.

X. INTERPRETATION OF RESULTS
The following snapshots show the results or outputs obtained after step-by-step execution of all the modules of the system. Below are the snapshots for the modules Simple Search, Keyword Search, and Date and Time Range Search with their different functionalities.

Snapshot 1. Log Analyzer Build
Snapshot 2. Initial screen after the tool is launched
Snapshot 3. Simple Search
Snapshot 4. Keyword Search
Snapshot 5. Date and Time Range Search (Start Time, End Time)
XI. CONCLUSION
This tool reduces the time the analyst spends finding bugs within log files manually. It facilitates finding the bugs within log files, or a folder consisting of log files, in less time, providing operations like Simple Search, Keyword Search, and Date and Time Range Search within the same framework. The selection of the search type depends on the situation and the experience of the analyst. Where loading even a single log file into an application used to be a problem, Log Analyzer searches for errors/bugs in a log file, several log files, or a complete folder of log files, providing proper highlighters for each search type.

XII. FUTURE ENHANCEMENT
In the future we could add remote-host support within the framework, which would directly search the bugs for the product on the client machine rather than asking the clients to make a copy of the log files in a shared location which both the analyst and the client can access. Another enhancement could be to provide options to display errors as either all errors or distinct errors, where the all-errors option could be used when the analyst is interested in the count of the bugs within the log file, and the distinct-errors option could be used to see only one occurrence of the error which is the cause of the failure.

ACKNOWLEDGEMENT
This project is supported by EMC Software & Services India Pvt. Ltd., Bangalore. My sincere thanks to the Escalation Engineering Team members for all their support and guidance in carrying out this work.
Towards Secure and Dependable for Reliable Data Fusion in Wireless Sensor Networks under Byzantine Attacks

Valmeeki B.R., Krishna Kumar P.R., Shreemantha M.C.
Dept. of M.Tech (CSE), Cambridge Institute of Technology, B'lore-36
valmeeki1991@gmail.com, rana.krishnakumar@citech.edu.in, smchatrabana@gmail.com
Abstract - The data storage attack is a severe attack that can be easily launched by a pair of external attackers in Wireless Sensor Networks. In this attack, an attacker sniffs packets or data at one point in the network, injecting fake contents or a wrong waiting time for the corresponding nodes. This system proposes a novel attacker detection and positioning scheme based on a mobile Location Based Server (LBS), which can not only detect the existence of network node attacks but also accurately localize the attackers, so that the system can eliminate them from the network, while enhancing the digital signature value using the Secure Hash Algorithm 512 (SHA-512) for security reasons.

Index terms – Wireless Sensor Networks, Location Based Server, Digital Signature, Secure Hash SHA-512.

1. INTRODUCTION
Wireless Sensor Networks are spatially distributed autonomous sensors that monitor physical or environmental conditions such as temperature, pressure, and sound. In order to find the attacker details on the mobile phone, the phone consists of three logical parts which are involved in the data exchange. The hardware component is the insecure communication unit of the device, responsible for Bluetooth, the Location Based Server (LBS), or the mobile device communicating with the external machine. The mobile user can connect with the LBS server via a Bluetooth device to communicate with the mobile. The user finds the Bluetooth server name and then logs in on the mobile to view all current attackers in the storage node of the Wireless Sensor Network.

Wireless Sensor Networks are one of the areas in the field of wireless communication where delay is particularly high. They are a promising technology in vehicular, disaster-response, underwater, and satellite networks. Delay-tolerant networks are characterized by large end-to-end communication latency and the lack of an end-to-end path from a source to its destination, and they pose several challenges to the security of WSNs. In the network layer there are many attacks, so we consider the most common types of attacks on these networks; such attacks cause serious damage to the network in terms of latency and data availability. We use an entropy-based anomaly detection algorithm, motivated by the detection of external attackers, to prevent them from attacking data from the outside environment.

SHA512 is one member of a family of cryptographic hash functions that together are known as SHA-2. The basic computation for the algorithm takes as input a block of data that is 1024 bits (128 bytes) and a state vector that is 512 bits (64 bytes) in size, and it produces a modified state vector. It is a follow-on to the earlier hash algorithms MD5 and SHA-1, and it is becoming increasingly important for secure internet traffic and other authentication problems. As SHA512 processing involves a large amount of computation, it is critical that applications use the most efficient implementations available.

The algorithm operates on 64-bit QWORDs, so the state is viewed as 8 QWORDs (commonly called A…H) and the input data is viewed as 16 QWORDs. The standard for the SHA-2 algorithm specifies a procedure for adding padding to the input data to make it an integral number of blocks in length. This happens at a higher level than the code described in this document. This paper is only concerned with updating the hash state values for any integral number of blocks.

The SHA512 algorithm is very similar to SHA256, and most of the general optimization principles described in this system apply here as well. The main differences in the algorithm specification are that SHA512 uses blocks, digests, and data types of computation twice the size of SHA256. In addition, SHA512 is specified with a larger number of rounds of processing (80 rather than 64).
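As the scheme is implemented in Java (see Section 3), a minimal sketch of computing a SHA-512 digest with the standard JDK API follows; this relies on java.security.MessageDigest and is not the hand-optimized block-update code discussed above:

```java
// Minimal SHA-512 digest computation using the standard JDK API (sketch only;
// an optimized implementation would update the 8-QWORD state block by block).
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha512Demo {
    static String sha512Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        byte[] digest = md.digest(data);          // padding handled internally
        StringBuilder hex = new StringBuilder(128);
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();                    // 64 bytes -> 128 hex characters
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(sha512Hex("hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```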

2. IMPLEMENTATION

Fig 1: Finding the attacker in mobile devices using a Location Based Server.

Rapid growth in wireless technology and mobile devices has delivered new types of location-centric applications and services to users. Location Based Services, also known as LBS, are a general class of computer program-level services that use location data to control features. As such, LBS is an information system and has a number of uses in social networking today as an entertainment service, accessible with mobile devices through the mobile network and using information on the geographical position of the mobile device. This has become more and more important with the expansion of the smartphone and tablet markets as well.

The sender first browses the file which it wants to send to the destination, initially redistributing the SHA512 standards. The sender then sends the browsed file to the router before it delivers the file to the destination. On receiving the file from the sender, the router checks the details of the end users and the attacker details.

The attacking system may be of two types: injecting fake contents into particular nodes, and a wrong waiting time or time delay in delivering the file to the destination.

The type of attack can be viewed by the mobile user by connecting to the mobile LBS via a Bluetooth device. The LBS first checks the mobile user's login verification and sends the attacker details to the mobile user. The mobile user requests the attacker details from the Location Based Server, and the LBS sends the attacker details in response, so that the user can view them.

3. PERFORMANCE EVALUATION
The entropy-based anomaly detection scheme incorporates knowledge and behavior for detecting time-varying attacks in a wireless sensor network. Compared with existing mechanisms, the entropy scheme achieves a high filtering probability, high reliability, and optimal utilization of energy. This work has been implemented in the Java language; the results show effective data transmission in wireless sensor networks. Figure 2 illustrates the total energy of all sensor nodes in the data transmission, which also indicates the balance of energy consumption in the network, and Figure 3 shows the comparison of time in the data transmission. The results demonstrate that the entropy-based anomaly detection scheme achieves a high en-routing filtering probability, high reliability, and optimum utilization of energy.
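The paper does not spell out its entropy computation, so the following Java sketch is a generic illustration: the Shannon entropy of a node's report distribution, with a node flagged when its entropy deviates too far from the expected value (the threshold is a hypothetical parameter).

```java
// Generic Shannon entropy of a node's report distribution (illustrative only;
// not the paper's exact procedure).
class EntropyDetector {
    // counts[i] = how often the node reported value i in the current window.
    static double shannonEntropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double entropy = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            entropy -= p * Math.log(p) / Math.log(2);   // entropy in bits
        }
        return entropy;
    }

    // Flag a node whose entropy deviates from the expected value by more than
    // a chosen threshold (threshold selection is application-specific).
    static boolean isSuspicious(int[] counts, double expected, double threshold) {
        return Math.abs(shannonEntropy(counts) - expected) > threshold;
    }
}
```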

Fig 2: Comparison of time delay in data transmission

Fig 3: Comparisons of energy consumption in data transmission

4. CONCLUSION
We presented a malicious node detection scheme for adaptive data fusion under time-varying attacks. The detection procedure is analyzed using the entropy-defined trust model and has been shown to be optimal from the information theory point of view. It is observed that nodes launching dynamic attacks take longer and require more complex procedures to detect than those conducting static attacks, via hand-held mobile devices using LBS and with the digital signature scheme enhanced using the SHA512 standard. The adaptive fusion procedure has been shown to provide significant improvement in system performance under both static and dynamic attacks. Further research can be conducted on adaptive detection under Byzantine attacks with soft-decision reports.
Identity and Access Management To Encrypted Cloud Database

Archana A., Dr. Suresh L., Dr. Chandrakanth Naikodi
Deptt of CSE, Cambridge Institute of Technology, Bangalore, India
Abstract—Security is one of the top concerns about cloud computing and the on-demand business model. Worries over data privacy and financial exposure from data breaches may be the cloud service providers' greatest roadblocks to new business. As the cloud infrastructure grows, so does the presence of unsecured privileged identities that hold elevated permissions to access data, run programs, and change configuration settings. When data is placed in the cloud, the cloud provider should ensure the security and availability of the data. Encryption helps in securing the data but is still not complete. In this paper we propose an architecture that implements identity and access management in encrypted cloud databases. By enforcing access control and identity management, users are guaranteed the security of their data, and this approach minimizes the data leakage problem. The correctness and feasibility of the proposal are demonstrated through formal models, while the integration into a cloud-based architecture is left to future work.

Index Terms-Cloud Security, Confidentiality, Identity and Access Control Management

I. INTRODUCTION
In a cloud context, where confidential information is placed in the infrastructure of untrusted third parties, ensuring the confidentiality and security of data is of main importance [2][5]. In order to fulfill these requirements there are few data management choices. The original data should be accessible only by the trusted parties; if the data is accessed by any untrusted party, the data needs to be encrypted. Satisfying all these requirements has different levels of complexity depending on the type of cloud service. There are several solutions for ensuring confidentiality. Confidentiality is a major concern and can be ensured in several ways in storage as a service (SoS), but in the database-as-a-service paradigm, ensuring confidentiality is still an open research area. In this context a secure DBaaS is used, which does not expose unencrypted data to the cloud provider and preserves the DBaaS qualities, such as availability (readiness of data), efficiency of data (reliability), and elastic scalability [8].

The confidentiality of data stored in the cloud can be achieved through encryption, but it must be guaranteed that all decryption keys are managed by the tenant (client)/end-user and never by the cloud provider. We cannot adopt the transparent data encryption feature [7][1], because that approach makes it possible to build a trusted DBMS over untrusted storage: the DBMS is trusted and decrypts data before its use. Therefore this approach is not applicable to DBaaS, because we consider the cloud provider untrusted. Even the proposal of the main authors in [8] has some risk of information leakage, because the encryption of the cloud database information is based on one master key shared by all users.

The enforcement of access control policies through encryption schemes guarantees that data outsourced to public cloud databases are always managed in an encrypted way, thus guaranteeing confidentiality for data in use and at rest in the cloud. It minimizes information leakage in the case of user key loss or a compromised client machine, and even in the worst scenario, where a malicious but legitimate user colludes with cloud provider personnel by disclosing his decryption keys. In such a case a partial data leakage is inevitable, but it is limited to the data set accessible by that user; the additional information about other data remains inviolable through standard attack techniques.

Access control is only one subset of identity management (IM). Identity management covers a whole range of functions such as access control, user provisioning, directory services, account auditing, role and group management, single sign-on (SSO), and privileged account management.

Access control differs from identity management in that access control is strictly concerned with providing authentication credentials. In this approach the point is to provide user access, not prove identity. This narrow focus, according to identity management experts, leads to cases of mistaken identity: people who shouldn't have access to a system, like malicious users, masquerade as legitimate users to gain unauthorized access. In this way, identity management revolves around verifying users, ideally with multiple pieces of proof of their identity, before issuing credentials.

II. LITERATURE SURVEY
Security in the cloud is one of the major areas of research. The survey shows that researchers are focusing on various techniques to enhance data security in the cloud.

Ryan K L Ko et al. [4] studied the problems and challenges of the trusted cloud, where an unauthorized user can access the entire data without disturbing the actual user. An unauthorized person may do two things: access the data and put in duplicate data, because cloud storage provides geographical databases. It is not a trusted place to store the data of the users.

For this problem, Ryan K L Ko et al. proposed a TrustCloud framework, to achieve a trusted cloud for the user, providing a service by making use of detective controls in the cloud environment. The detection process has accountability access with the cloud. Here the user is a
responsible person for their data, hence user must tell the problem from the untrusted one. Hence this approach provides
accountability with the technical and policy based services. by privacy, security ,accountability and auditability.
providing the accountability through user it may solve the
Muhammad RizwanAsgharet.al[4] discusses the problems of Figure 1 evidence the considered scenario, where the cloud
enforcing security policies in cloud environment. With the high databases stores all the tenant data and the tenant manages the
growth of data in cloud they where problem arises due to following types of information: access control policies, users
untrusted person access of the data. To ensure the security is credentials, root credentials tenant data denote the information
immature, they didn’t ensure for the safe data in cloud that is stored in the cloud databases. They are accessed by the
environments. Security problem is great issue, here we enforce trusted database users, such as Alice and bob, through SQL
the security for the owners data. Providing high security they operations. Access control policies are the rules of the tenant that
may have high expensive for the usersL.ferreti et al studied the define which data can be accessed by each
problem of data leakage of the legitimate user in cloud user.Forexample,Aliceauthorized(tenant) data denotes all and
environment by the cloud provider; they didn’t give better only tenant data on which Alice has legitimate access as defined
security to the user for their personal data or internal data. Main by the access control policies. Users authorized data can be
problem arises because of no encrypted data were found, and accessed by one or multiple tenant users, such as Alice & bob
also it provide the security for the front end database only and authorized data. Alice credentials include all information that
not controlled the backend databases, so the malicious attackers she requires to access and execute SQL operations on all and
may gain the data access to the outsourced data. only her authorized data. The DBA is the only tenant subject that
has access on root credentials granting complete access on cloud
Luca Ferretti[7] proposed a novel scheme that integrates data database information and data.
encryption with user access control mechanisms. It can be used
to guarantee confidentiality of data with respect to a public cloud IV. DESIGNAND IMPLEMETATIONS
infrastructure and minimize the risks of internal data leakage
even in the worst case of a legitimate user colluding with some
cloud provider personnel.

III.ENCRYPTION AND IAM MODEL

We consider a typical scenario in which a tenant organization


requires a database service from a public cloud DBaas provider.
In the tenant, there is a database administrator role(DBA) and
multiple database users. The DBA is a trusted subject. He has
complete access on all database information, and is in charge of
enforcing the access control policies of the tenant. Each tenant
user has a different level of trust and a consequent authorization
to access a specified subset of the database information. this
database view is limited by the tenant access control policies that
are implemented through authorization mechanism.

IAM technology can be used to initiate, capture, record and


manage user identities and their related access permissions in an
automated fashion. This ensures thataccess privileges are
granted according to our interpretations of policy and all
individuals and services are properly authenticated, authorized
and audited.

In IAM implementations there are four steps

*Assert inventory

*Risk assessment

*Architecture review

*implementation
These steps should flow from the information security policy that the company has already drafted.

Fig. 1 depicts the considered scenario: the cloud databases store all the tenant data, and the tenant manages the following types of information: access control policies, user credentials and root credentials. Tenant data denotes the information that is stored in the cloud databases; it is accessed by trusted database users, such as Alice and Bob, through SQL operations. Access control policies are the rules of the tenant that define which data can be accessed by each user. For example, Alice's authorized data denotes all and only the tenant data to which Alice has legitimate access as defined by the access control policies. Users' authorized data can be accessed by one or multiple tenant users, such as Alice's and Bob's authorized data. Alice's credentials include all the information that she requires to access and execute SQL operations on all and only her authorized data. The DBA is the only tenant subject that has access to the root credentials, which grant complete access to the cloud database information and data.

Fig. 1: Reference model for multiple users accessing encrypted cloud databases.

Security should be provided to the entire organization, beginning with the hardware (servers, routers and workstations) and including the databases; beyond this there are also software, applications and crucial data such as customer and employee information and transaction records.

First, the inventory is divided based upon risk, for example into high-risk data and low-risk data.

Then the risk is determined by assessing the value of the data and determining how much loss or damage its compromise could cause. The main consideration is that the access control policy is based on the level of risk: high-risk assets require stronger controls. For example, a highly expensive two-factor authentication system is not warranted for access to publicly available information.

The third step is the architecture review, which considers what systems are running: Windows or Unix. For Windows, Active Directory might be the access management system of choice, since it is primarily designed for Windows architectures; for Unix- and Linux-based systems it might be LDAP.

Finally, the implementation depends on how many different applications need to be accessed. If there are multiple applications, each with its own user ID and password, then a Single Sign-On (SSO) system should be considered.

V. CONCLUSION

We propose a design in which IAM is integrated with encrypted cloud databases. The IAM mechanism helps an organization control access to critical business systems and data. This method also ensures role-based governance with regard to access management, and it limits the risk of information leakage due to internal users. The paper gives an overall idea and the formal models that demonstrate the correctness and feasibility of the proposed scheme.

REFERENCES
i. G. Cattaneo, L. Catuogno, A. Del Sorbo, and P. Persiano, "The Design and Implementation of a Transparent Cryptographic File System for Unix," Proc. FREENIX Track: 2001 USENIX Ann. Technical Conf., April 2001.
ii. M. Armbrust et al., "A View of Cloud Computing," Comm. of the ACM, vol. 53, no. 4, pp. 50-58, 2010.
iii. Ryan K. L. Ko, Peter Jagadpramana, Miranda Mowbray, Siani Pearson, Markus Kirchberg, Qianhui Liang, and Bu Sung Lee, "TrustCloud: A Framework for Accountability and Trust in Cloud Computing," IEEE, 2011.
iv. Muhammad Rizwan Asghar, Mihaela Ion, and Bruno Crispo, "ESPOON: Enforcing Encrypted Security Policies in Outsourced Environments," 2011 Sixth International Conference on Availability, Reliability and Security.
v. W. Jansen and T. Grance, "Guidelines on Security and Privacy in Public Cloud Computing," Technical Report Special Publication 800-144, NIST, 2011.
vi. Luca Ferretti, Michele Colajanni, and Mirco Marchetti, "Access Control Enforcement on Query-Aware Encrypted Cloud Databases," IEEE, 2013.
vii. "Oracle Advanced Security," Oracle Corporation, http://www.oracle.com/technetwork/database/options/advanced_security, April 2013.
viii. L. Ferretti, M. Colajanni, and M. Marchetti, "Distributed, Concurrent, and Independent Access to Encrypted Cloud Databases," IEEE Transactions on Parallel and Distributed Systems, 2014.


An Analysis of Multipath AOMDV in Mobile Ad Hoc Networks


S. Muthusamy, Dr. C. Poongodi
Department of IT, Bannari Amman Institute of Technology, Sathyamangalam
muthusamybecse@gmail.com, poongodic@bitsathy.ac.in

ABSTRACT: A Mobile Ad-hoc Network (MANET) is a dynamic wireless network that can be formed without the need for any pre-existing infrastructure. It is an autonomous system of mobile nodes connected by wireless links. Each node in a MANET operates as a router to forward packets and also as an end system. The nodes are free to move within the network in a self-organized manner and often change location. Proactive, Reactive and Hybrid are the three main classes of routing protocols. A Reactive (on-demand) routing strategy is a popular routing category for wireless ad hoc routing. The design of Reactive routing follows the idea that each node tries to reduce routing overhead by sending routing packets only when a communication is requested. This survey compares the performance of two on-demand reactive routing protocols for MANETs, namely Ad hoc On-demand Distance Vector (AODV) and Ad hoc On-demand Multipath Distance Vector (AOMDV) routing. AODV is a reactive gateway discovery algorithm in which a mobile device of a MANET connects to a gateway only when it is needed. AOMDV was designed to solve problems in highly dynamic ad hoc networks where link failures and route breaks occur commonly. AOMDV maintains routes for destinations and uses sequence numbers to determine the freshness of routing information, preventing routing loops in active communication. AOMDV is a timer-based protocol and provides a way for mobile nodes to respond to link breaks and topology changes. This survey finds that the performance of AOMDV is better than that of AODV in terms of Packet Delivery Ratio, Life Time of Network, Life Time of System and End-to-End Delay.

Key Words: MANET, AODV, DSR, AOMDV, Routing.

1. INTRODUCTION

Network
In information technology (IT), a network is a series of points or nodes interconnected by communication paths or links. Networks can be interconnected with other networks through routers, and networks may contain subnetworks. A group of interconnected computers and peripherals, wired (e.g., by cable) or wireless, is capable of sharing software and hardware resources between many users. The Internet is a global network of networks: through a system of routers, servers, switches and the links connecting them, it enables users of telephone or data communication lines to exchange information over long distances.

MANET
A mobile ad hoc network (MANET) is a collection of mobile nodes in a wireless architecture. A MANET dynamically establishes the network in the absence of fixed infrastructure. One of the typical features of a MANET is that each node must be able to act as a router to find the optimal path to forward a packet at low cost. As nodes may be moving continuously, entering and leaving the network, the topology of the network changes automatically. MANETs are an emerging technology for civilian and military applications. One of the important research areas in MANETs is establishing and maintaining the ad hoc network through the use of routing protocols.

Routing in MANET
Routing is based on the straight flow of data from source to destination in order to maximize the network performance. It places two fundamental requirements on the routing protocol: (i) the protocol should be distributed, and (ii) the protocol should be able to compute multiple loop-free routes while keeping the communication overhead to a minimum.

Attacks in MANET
Attacks in a MANET can be categorized into passive attacks and active attacks. Passive attack: this attack does not actually disrupt the operation of the network. Example: snooping, which is unauthorized access to another person's data. Active attack: this attack attempts to alter or destroy the data being exchanged in the network.

Challenges in MANET
One of the main challenges in ad hoc networking is the efficient delivery of data packets to the mobile nodes. Here the topology is not predetermined, because the network does not have a centralized control mechanism. Routing in ad hoc networks can be viewed as a challenge due to the frequently shifting topology.

The design of robust routing algorithms that adapt to the frequent and randomly changing network topology is another big challenge. This paper compares and evaluates the performance of two types of on-demand routing protocols, namely the Ad hoc On-demand Distance Vector (AODV) routing protocol, which is unipath, and the Ad hoc On-demand Multipath Distance Vector (AOMDV) routing protocol, which is multipath. AOMDV incurs more routing overhead and packet delay than AODV, but it has better efficiency when it comes to the number of packets dropped and packet delivery.

2. LITERATURE SURVEY
Information communication is a necessary practice in the Information Era and is done by forwarding information from one node to another. The information forwarding task is done with the help of routing. Routing is a challenging task, since there is no central coordinator, such as a base station, or fixed routers as in other wireless networks to manage routing decisions.

Each node acts as a router/base station to forward the information; hence a special form of routing protocol is necessary, and an ample number of routing protocols have been developed for MANETs. Routing protocols for mobile ad hoc networks are broadly classified into the following categories:
1. Proactive or table-driven routing protocols (DSDV)
2. Hybrid routing protocols (ZRP)
3. Reactive or on-demand routing protocols (DSR, AODV, AOMDV)

[6] A proactive (table-driven) routing protocol is an approach where each router builds its own routing table based on the information that each router or node can learn by exchanging information among the network's routers. This is achieved by exchanging update messages between routers on a regular basis to keep the routing table at each router up-to-date. Each router then consults its own routing table to route a packet from its source to its destination. When a source node or an intermediate node consults the routing table, the path information, which is up-to-date, is immediately available and can be used by the node. This is because each router or node in the network periodically updates its routes to all reachable nodes via broadcast messages received from the other nodes in the network. The advantage of these protocols is that they maintain routing information about the available paths in the network even if these paths are not currently used.

[7] Hybrid routing protocols aggregate sets of nodes into zones in the network topology: the network is partitioned into zones, and a proactive approach is used within each zone to maintain routing information, while a reactive approach is used to route packets between different zones. A route to a destination in the same zone is established without delay, while route discovery and route maintenance procedures are required for destinations in other zones of the network. Examples are the zone routing protocol (ZRP) and the zone-based hierarchical link state (ZHLS) routing protocol. The advantage of these protocols is a compromise on the scalability issue in relation to the frequency of end-to-end connections, the total number of nodes and the frequency of topology change.

[1] Ad hoc On-Demand Distance Vector routing (AODV) is an on-demand, single-path, loop-free distance vector protocol. It combines the on-demand route discovery mechanism of DSR with the concept of destination sequence numbers from DSDV. However, unlike DSR, which uses source routing, AODV takes a hop-by-hop routing approach. The AODV protocol enables dynamic, self-starting, multi-hop routing between participating mobile nodes wishing to establish and maintain an ad hoc network.

[8] The operation of the protocol has two phases: route discovery and route maintenance. In ad hoc routing, when a route is needed to some destination, the protocol starts route discovery. The source node sends a route request (RREQ) message to its neighbors; if those nodes do not have any information about the destination node, they send the message on to all their neighbors, and so on. If any neighbor node has information about the destination node, it sends a route reply message to the initiator of the route request. On the basis of this process a path is recorded in the intermediate nodes; this path identifies the route and is called the reverse path. Since each node forwards the route request message to all of its neighbors, more than one copy of the original route request message can arrive at a node. A unique id is assigned when a route request message is created; when a node receives an RREQ, it checks the id and the address of the initiator, and if it has already processed that request, it discards the message. A node that has information about the path to the destination sends a route reply message to the neighbor from which it received the route request message, and that neighbor does the same; the reverse path makes this possible. The route reply (RREP) message thus travels back along the reverse path. When a route reply message reaches the initiator, the route is ready and the initiator starts sending data packets. The AODV protocol maintains the invariant that destination sequence numbers monotonically increase along a valid route, thus preventing routing loops. Less memory space is required, as information about only the active routes is maintained, which increases performance.

[2] Dynamic Source Routing (DSR) is an on-demand source routing protocol that employs route discovery and route maintenance procedures. In DSR, each node maintains a route cache with entries that are continuously updated as the node learns new routes. A node wishing to send a packet will first inspect its route cache to see whether it already has a route to the destination. If there is no valid route in the cache, the sender initiates a route discovery procedure by broadcasting a route request packet, which contains the address of the destination, the address of the source, and a unique request ID. As this request propagates through the network, each node inserts its own address into the request packet before rebroadcasting it. As a consequence, a request packet records a route consisting of all the nodes it has visited. When a node receives a request packet and finds its own address recorded in the packet, it discards the packet and does not rebroadcast it further. A node keeps a cache of recently forwarded request packets, recording their sender addresses and request IDs, and discards any duplicate request packets.
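The discovery process just described can be illustrated with a small, self-contained sketch; the topology, node names and breadth-first flooding order are simplifying assumptions made for illustration, not an implementation of the protocol.

```python
from collections import deque

def route_discovery(neighbors, source, destination):
    """Toy AODV-style single-path discovery on a static graph.

    neighbors: dict mapping node -> iterable of neighbor nodes (an
    idealized snapshot of a MANET topology). An RREQ is flooded
    breadth-first while reverse-path pointers are recorded; the RREP
    then walks the reverse path back from destination to source.
    """
    reverse = {source: None}          # node -> neighbor towards the source
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:       # destination answers with an RREP
            path = [node]
            while reverse[node] is not None:
                node = reverse[node]
                path.append(node)
            return list(reversed(path))   # forward route source -> dest
        for nxt in neighbors[node]:
            if nxt not in reverse:    # duplicate RREQs are discarded
                reverse[nxt] = node
                queue.append(nxt)
    return None                       # no route found

topo = {"S": ["A", "B"], "A": ["S", "D"], "B": ["S"], "D": ["A"]}
print(route_discovery(topo, "S", "D"))  # -> ['S', 'A', 'D']
```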

3. PROBLEM STATEMENT

A. OVERVIEW
The Ad hoc On-demand Multipath Distance Vector routing (AOMDV) [9] protocol is an extension of the AODV protocol for computing multiple loop-free and link-disjoint paths [1]. The routing table entries for each destination contain a list of the next-hop addresses along with the corresponding hop counts. All the next hops carry the same sequence number, which helps in keeping track of a route.

B. HOP COUNT
For each destination, a node maintains an advertised hop count, defined as the maximum hop count over all of its paths to that destination; it is used when the node sends route advertisements for the destination. A duplicate route advertisement received at a node defines an alternate path to the destination. Loop freedom is guaranteed by accepting an alternate path only if its hop count is lower than the advertised hop count for that destination. Because the maximum hop count is used, the advertised hop count does not change for the same sequence number. The next-hop list and the advertised hop count are reinitialized when a route advertisement is received for the destination with a greater sequence number.

AOMDV allows intermediate nodes to reply to RREQs while still selecting disjoint paths. AOMDV is a better on-demand routing protocol than AODV, since it provides better statistics for packet delivery and the number of packets dropped.
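A minimal sketch of this advertised-hop-count bookkeeping, assuming a simplified routing-table entry with hypothetical field names:

```python
def accept_advertisement(route, adv_seq, adv_hops):
    """Sketch of the AOMDV route-update rule described above.

    route: dict with keys 'seq' (destination sequence number),
    'advertised_hops' and 'next_hops'. A fresher sequence number
    reinitializes the entry; for the same sequence number an
    alternate path is accepted only if its hop count is below the
    advertised hop count, which keeps all accepted paths loop-free.
    """
    if adv_seq > route["seq"]:                 # fresher route information
        route["seq"] = adv_seq
        route["advertised_hops"] = adv_hops
        route["next_hops"] = []
        return True
    if adv_seq == route["seq"] and adv_hops < route["advertised_hops"]:
        return True                            # loop-free alternate path
    return False                               # reject: possible loop
```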
AOMDV can be used to discover node-disjoint or link-disjoint routes. To find node-disjoint routes, each node does not immediately reject duplicate RREQs; each RREQ arriving via a different neighbor of the source defines a node-disjoint path. This is because nodes cannot broadcast duplicate RREQs, so any two RREQs arriving at an intermediate node via different neighbors of the source could not have traversed the same node. In an attempt to get multiple link-disjoint routes, the destination replies to duplicate RREQs, but only to RREQs arriving via unique neighbors. After the first hop, the RREPs follow the reverse paths, which are node-disjoint and thus link-disjoint. The trajectories of the RREPs may intersect at an intermediate node, but each takes a different reverse path to the source to ensure link disjointness.

4. ARCHITECTURE DESIGN

The architecture design shown in Figure 1 extends the AODV protocol to compute multiple disjoint loop-free paths in a route discovery. We assume that every node has a unique identifier (UID), e.g., an IP address, a typical assumption with ad hoc routing protocols. For simplicity, we also assume that all links are bidirectional; that is, a link exists from a node i to a node j if and only if there is a link from j to i. AOMDV can be applied even in the presence of unidirectional links, with additional techniques to help discover bidirectional paths in such scenarios.

Figure 1: Architecture Design of AOMDV Protocol

Comparison between AODV and AOMDV

S.No | Existing (AODV) | AOMDV
1 | The distance vector concept uses a hop-by-hop routing approach | Finds routes on demand using a route discovery procedure
2 | Propagation from the source towards the destination establishes multiple reverse paths, both at intermediate nodes and at the destination | Traverses these reverse paths back to form multiple forward paths to the destination at the source and intermediate nodes
3 | Multiple paths discovered are loop-free and disjoint | Multiple paths discovered are loop-free and disjoint, with a detailed description of the route update rules used at each node and of the multipath route discovery procedure
4 | Overhead is incurred in discovering multiple paths | Overhead is not incurred in discovering multiple paths
5 | Does not employ any special control packets | Employs special control packets

AOMDV gives better performance than AODV and DSR in terms of packet delivery fraction and throughput, but worse in terms of end-to-end delay.

5. CONCLUSION

This paper evaluated the performance of AODV, AOMDV and DSR using ns-2. The comparison was based on the packet delivery fraction, throughput and end-to-end delay. We concluded that in the static network (pause time 50 sec),

REFERENCES
i. W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "An Application-Specific Protocol Architecture for Wireless Microsensor Networks," IEEE Trans. Wireless Comm., vol. 1, no. 4, pp. 660-670, Oct. 2002.
ii. L. B. Oliveira et al., "SecLEACH - On the Security of Clustered Sensor Networks," Signal Processing, vol. 87, pp. 2882-2895, 2007.
iii. P. Banerjee, D. Jacobson, and S. Lahiri, "Security and Performance Analysis of a Secure Clustering Protocol for Sensor Networks," Proc. IEEE Sixth Int'l Symp. Network Computing and Applications (NCA), pp. 145-152, 2007.
iv. K. Zhang, C. Wang, and C. Wang, "A Secure Routing Protocol for Cluster-Based Wireless Sensor Networks Using Group Key Management," Proc. Fourth Int'l Conf. Wireless Comm., Networking and Mobile Computing (WiCOM), pp. 1-5, 2008.
v. A. Shamir, "Identity-Based Cryptosystems and Signature Schemes," Proc. Advances in Cryptology (CRYPTO), pp. 47-53, 1985.
vi. J. Liu et al., "Efficient Online/Offline Identity-Based Signature for Wireless Sensor Network," Int'l J. Information Security, vol. 9, no. 4, pp. 287-296, 2010.
concluded that in the static network (pause time 50 sec), 296, 2010.


Light Weight SNMP Based Network Management and Control System for a
Homogeneous Network
Brunda Reddy H K, K Satyanarayan Reddy
Dept. of CSE(M.Tech), Cambridge Institute of Technology, B’lore-36
brundha1991@gmail.com, satyanarayanreddy.cse@citech.edu.in

Abstract—Network information helps in dissecting the faults and errors in a network, and remedying such faults and errors is a major task of an organization's network management system. This paper introduces a mechanism that uses a lightweight Simple Network Management Protocol (SNMP) based solution that addresses discrete kinds of network devices and discovers the interface-to-interface connectivity among the devices along with basic information about those devices. The paper proposes algorithms to discover the network, the device types, and the interface-to-interface connectivity, and concentrates on a subnet of an organization's network.

Keywords—MIB, OID, SNMP, Topology, Subnetwork.

I. INTRODUCTION
Network topology is an illustration of the nodes and links in a network, in which the nodes are interconnected with each other. Network topology can be classified as physical network topology, which refers to the physical connectivity relationships that exist among the entities or nodes in a network. A physical network corresponds to many logical topologies, in which a network is divided into logical segments through subnets.

An organization consists of many departments, and an organization-level network consists of many subnetworks. Network topologies change constantly as nodes and links join the network and network capacity is increased to deal with added traffic. Keeping track of the network topology manually is a frustrating and often impossible job. An inexperienced network administrator joining an organization faces many problems due to the lack of a discovery tool; even for an experienced person, keeping track of devices and their connectivity details without a proper method of visually presenting them is a difficult task. In order to avoid these problems, accurate topology information is necessary for simulation, network management and so on.

Thus, there is a considerable need for automatic discovery of the network topology. This paper proposes a lightweight SNMP based solution. The solution using SNMP is simple and effective, and it is easy to use because even if a host or another device does not support SNMP, the system can still find the connections. The solution performs better than other systems, generates the least amount of traffic and uses little network bandwidth. This paper concentrates on subnetwork-level topological discovery within an organization.

Related work: Discovering the topology of the Internet is a problem that has attracted the attention of many networking researchers. Network connectivity discovery is a well known area, and there are many interesting mechanisms, such as ping, traceroute, DNS, the address resolution protocol (ARP) and SNMP, available to discover network elements and the connectivity among them.

R. Siamwalla et al. [ii] did good work and proposed mechanisms to discover topology by combining ping, traceroute, SNMP, DNS and ARP. These methods can discover only L3-level devices, and the paper did not propose any method to discover L2- or host-level devices, though it proved that SNMP performs better than all the other mechanisms. Yuri et al. [iii] proposed a mechanism that is heterogeneous, but it requires ICMP spoofing in order to obtain the complete forwarding table, which is not allowed in most of today's networks; though they did good work in explaining the connectivity algorithm, they failed to provide details of the SNMP MIBs required for collecting network topology information. Lowekamp et al. [iv] proposed a mechanism that does not require the complete forwarding information of bridges; their approach contradicted that of Yuri et al. [iii]. Suman Pandey et al. [i] extended the work of Lowekamp et al. [iv] and proposed a complete topological discovery mechanism to discover L2-L2, L3-L3, L2-L3, and L2 and L3 to end-host connectivity. This paper extends the work of Suman Pandey et al. [i] and discovers the details of each network device in the organization's subnetwork that supports SNMP; for the devices that do not support SNMP, it uses ICMP echo requests to check whether the device is alive and displays ping information and some basic information about the device.

The organization of this paper is as follows: the network topology discovery algorithms are explained in section 2, the implementation is explained in section 3, and our conclusion and future work are presented in section 4.

II. NETWORK DISCOVERY ALGORITHM
In this section, the discovery of the network nodes, the connections between them and the details of each discovered device are explained. Since the approach of this paper depends mainly on SNMP, it first analyzes the Management Information Base (MIB) objects required to discover the network and the devices in it. Those MIBs are then used to discover the network, the type of each device, the details of a particular device and the connectivity between the switch and the network devices.

A. MIBs used
The discovery mechanism used in this paper is completely based on SNMP. Table 1 lists all the SNMP MIB objects needed.

TABLE I
MIBS USED FOR NETWORK DISCOVERY
MIB-II, RFC 1213 [ix], RFC 1759 [vi], RFC 1514 [vii]:
sysServices, sysDescr, ifTable, ipAddrTable, ipRouteTable, ipNetToMediaTable, hrSystemUptime, hrSystemNumUsers, hrSystemProcesses, hrMemorySize, hrStorageTable, hrDeviceTable, prtInputTable, prtOutputTable.
BRIDGE-MIB for connectivity discovery [x]:
dot1dTpFdbAddress, dot1dTpFdbPort.

B. Overall Network Discovery Algorithm
Algorithm 1 shows the overall network connectivity discovery. To discover a network, the basic input needed by our system is the IP address of at least one switch used in a department or subnet of the organization, together with the IP address of the local host, the SNMP community string, the SNMP version and the port number. The network discovery process uses the ARP cache table and ICMP utilities to discover the devices in the network. For each discovered device, it checks whether the device is alive; if the device is alive, it checks whether SNMP is supported. If the device supports SNMP, it then discovers the device type, i.e., whether the device is an L2/L3 switch, a workstation or a printer.

Depending on the type of the device, the appropriate MIB information is fetched from the SNMP agents and stored in the database. The MIB information retrieved from the SNMP agents is used to find the connectivity among the devices in the network. In this way, connections are discovered between L2/L3 switches, workstations and all other devices in the network, such as printers.

TABLE 2
ALGORITHM 1: OVERALL ALGORITHM
1. Take a switch IP address as input
2. Network device discovery
   a) Device discovery using ipNetToMediaNetAddress
   b) Device discovery using ipNetToMediaPhysAddress
3. Device type discovery
4. Device details discovery based on device type
5. Connectivity discovery
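The alive check used before any SNMP query can be sketched as follows, assuming a Linux host where the standard ping utility is available (as in our test environment); this is an illustration in Python, not the system's actual Java implementation.

```python
import subprocess

def is_alive(ip: str) -> bool:
    """Send one ICMP echo request ('ping -c 1', 1 s timeout) and
    report whether the host answered; used for every discovered
    device before attempting an SNMP query."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```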
C. Recursive Network Device Discovery Algorithm
This paper makes use of the simple, workable architecture of the RFC 1213, RFC 1759 and RFC 1514 managed objects for managing TCP/IP- and UDP-based networks. In particular, it utilizes the minimal workable architecture of RFC 1213 to discover the network topology. This RFC defines managed objects that are standard and implemented by all vendors, and the information found in RFC 1213 is sufficient for discovering all the devices in the network.

Network devices are discovered by using the ipNetToMediaTable object. This paper utilizes the ipNetToMediaPhysAddress and ipNetToMediaNetAddress columns maintained by the ipNetToMediaTable: ipNetToMediaNetAddress contains all the IP addresses of the devices connected to a particular switch, and ipNetToMediaPhysAddress contains the MAC addresses of all the devices connected to the switch.

As soon as a node is discovered, all unique ipNetToMediaNetAddress entries are used to discover another set of new nodes. One particular device connected to a switch can thus help in discovering more devices, and the discovery process is recursive.

For the devices that do not support SNMP, ICMP echo requests are used to check whether a device is alive. If the device is alive, ping details and some basic information for that device are displayed. Depending on the number of subnetworks, a device can have multiple IP addresses; all the IP addresses of a device can be obtained using the ipAdEntAddr object of the ipAddrTable.

TABLE 3
ALGORITHM 2: RECURSIVE ALGORITHM FOR DEVICE DISCOVERY
1. Given a set of switch IP addresses, discover devices through each switch
2. ipNetToMediaNetAddress (IP address)
   a) for each switch, get all the IP addresses of devices from ipNetToMediaNetAddress
   b) if there is no ipNetToMediaNetAddress, then return
   c) call ipNetToMediaNetAddress recursively for all the switch IP addresses
3. ipNetToMediaPhysAddress (MAC address)
   a) for each switch, get all the MAC addresses from ipNetToMediaPhysAddress
   b) if there is no ipNetToMediaPhysAddress, then return
   c) call ipNetToMediaPhysAddress recursively for each switch

D. Device Type Discovery Algorithm
The type of a device is discovered by using the sysServices MIB object and converting the value of sysServices into a seven-bit string; each bit corresponds to one of the 7 layers of the OSI network model.

For the devices that are discovered through a switch, the type can be discovered via the printer MIB and the bridge MIB. If a device supports the bridge MIB, it is of type switch; if it supports the printer MIB, it is of type printer; and a device that supports neither MIB is of type workstation or end host. In this way, the type of the input switch and the types of the devices connected to it can be discovered; Table 4 below summarizes the procedure.
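As an illustration of this sysServices decoding, a short Python sketch (the function name is ours):

```python
def sys_services_layers(sys_services: int):
    """Decode the RFC 1213 sysServices value into OSI layers.

    Bit (L-1) of sysServices is set when the device offers services
    at OSI layer L (1 = physical, 2 = datalink/bridge, 3 = network/
    router, 4 = transport, 7 = application)."""
    return [layer for layer in range(1, 8)
            if sys_services & (1 << (layer - 1))]

# Example: a value of 6 (binary 0000110) indicates layer-2 and
# layer-3 services, i.e., an L2/L3 switch.
print(sys_services_layers(6))  # -> [2, 3]
```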

TABLE 4
ALGORITHM 3: DEVICE TYPE DISCOVERY ALGORITHM
I. Discovering the switch type
1. For each switch given as input
2. the sysServices object is used
3. convert sysServices into a seven-bit string
4. the type of the switch is obtained on the basis of the enabled bits of the seven-bit string of sysServices
5. repeat
II. Discovered-device type discovery
1. Check if the device supports the printer MIB; then return printer
2. Else if the device supports the bridge MIB, then return switch
3. Else return workstation
4. Repeat the discovered-device type discovery for all devices discovered
E. Device Details Discovery
As the type of each device is discovered, the details of that particular type of device are retrieved from its specified MIB. For switches, the interface details, route details and connected-device details are fetched. Interface details are maintained by the ifTable object; for the route details of each switch, the ipRouteTable MIB object is used. For end-host details discovery, this paper utilizes the MIBs supported by RFC 1514. Workstation details include uptime, number of users, current processes, memory size, storage details and the device description. For retrieving these details the hrSystemUptime, hrSystemNumUsers, hrSystemProcesses and hrMemorySize objects are used, and for the storage details the hrStorageTable MIB is used; for the description of each workstation, hrDeviceTable's hrDeviceDescr object is used. For printer details discovery, this paper utilizes the RFC 1759 MIB module: we retrieved the input and output details of each printer, using prtInputTable for the input details and prtOutputTable for the output details.

F. Connectivity Discovery
An organization network is made up of distinct types of devices, and finding the connections between the various types of devices and the switch is challenging work. In this section we find interface-to-interface connectivity; Algorithm 4 explains the connectivity between the devices of the network and the switch. After the type of a device is discovered, the process of discovering connectivity begins.

For discovering connectivity, only the Bridge MIB is used; within the bridge MIB of the switch, the dot1dTpFdbTable MIB object is used. In this object, the dot1dTpFdbAddress column contains the MAC addresses of the devices that are connected to the switch, and dot1dTpFdbPort contains the port numbers of the switch to which the devices are connected. Using these two MIB objects, we find the connectivity between the switch and the devices; Algorithm 4 explains the procedure.

Since dot1dTpFdbAddress contains the MAC addresses of the devices connected to the switch, we need to retrieve the IP addresses of those devices. To retrieve the IP addresses, we build one set of all the MAC addresses and IP addresses of the devices connected to the switch from ipNetToMediaPhysAddress and ipNetToMediaNetAddress respectively, and another set of all the MAC addresses and port numbers from dot1dTpFdbAddress and dot1dTpFdbPort respectively. By mapping the ipNetToMediaPhysAddress set and the dot1dTpFdbAddress set, the matched set of MAC addresses is retrieved; for those MAC addresses, the corresponding IP addresses and port numbers are retrieved from the ipNetToMediaNetAddress set and the dot1dTpFdbPort set respectively. One switch can be connected to another switch through some interface: that interface is called a learned port, while the ports to which devices are physically connected are called the physical ports of the switch. In this way we can also find which switch is connected to which port of another switch. This method is heuristic, but by verifying it manually we found that it generates correct results for some switches of the organization's subnetwork; the method has to be refined, and we will do this as future work.

TABLE 5
ALGORITHM 4: CONNECTIVITY DISCOVERY ALGORITHM
1. For each switch, to discover the interface to which the devices are connected:
   a) Get the set of MAC addresses of the switch from dot1dTpFdbAddress, i.e., {Mia ... Mir}
   b) Get the set of port numbers of the switch from dot1dTpFdbPort, i.e., {Pia ... Pir}
   c) Get the set of MAC addresses of the switch from ipNetToMediaPhysAddress, i.e., {Mja ... Mjr}
   d) Get the set of IP addresses of the switch from ipNetToMediaNetAddress, i.e., {Nja ... Njr}
   e) To get the IP address of the device, map set 1, i.e., {Mia ... Mir}, and set 3, i.e., {Mja ... Mjr}
   f) If any MAC address from both sets matches, then the corresponding IP address and port number are retrieved
   g) Repeat from 1

III. IMPLEMENTATION
We used Java 1.6, Apache Tomcat, the SNMP4J API [viii], the NetBeans 6.9.1 IDE with the JDK, and JFreeChart [v] for plotting graphs. We developed and tested our system on Red Hat Linux 5.4 with a 2.80-GHz Pentium 4 CPU and 512 MB RAM.

We applied our discovery system to a subnet of an organization network: a department of ISRO (ISAC). We found a few switches, printers, workstations, a server and a gateway. Among those switches, one is a central LAN switch used to connect two departments. We noticed that the number of switches and routers stayed the same, but the number of workstations varied. Some of the discovered devices may not be in use or alive; we used ICMP echo requests to check whether a device is alive. For each device that is alive, we check whether it supports SNMP; if SNMP is not supported, we display some basic information and ping details for that device to show that the device is alive but SNMP is not supported. We compared the time taken for device discovery through each switch; Figure 1 shows the time taken to discover the devices. Our system took 3 seconds to discover the 160 devices connected to the central LAN switch (switch 1), and 2 seconds each to discover the 33, 33, 33 and 32 devices connected to switch 2, switch 3, switch 4 and switch 5 respectively. Our system took 3 to 4 seconds to discover the device type and the details of each device.
ICCIT15@Citech, Bangalore Page 390


International Journal of Engineering Research ISSN:2319-6890(online),2347-5013(print)
Volume No.4, Issue Special 5 19 & 20 May 2015
Figure 1. Test Results of the ISAC Department. (Chart: number of devices, 0-180, versus time taken for discovery in seconds, 0-6, for Switch 1 through Switch 5.)

ACKNOWLEDGMENT
Many thanks to Dr. Suresh L, Principal, Cambridge Institute of Technology, Bangalore, for his continuous encouragement in every aspect during the course of the work and his guidance in the betterment of the same. I would also like to thank Shri. D K Mohan, Chairman, Cambridge Institute of Technology, for providing excellent infrastructure and a platform over the course of the work. I am thankful to the esteemed organization (ISRO) for giving me the opportunity to do this project, and I also thank my guide B. Prabakaran for his guidance in completing it. My special thanks to Nikhil Kumar, an administrator in a department of ISRO, for his support in testing this project.

IV. CONCLUSION AND FUTURE WORK
In this paper we focused on discovering the devices of a subnetwork of an organization; we also discovered the connectivity between the switch and the devices and some details of those devices. We discovered different types of devices, including switches, printers and end hosts, and enhanced the already existing technique of device type discovery. We utilized the SNMP mechanism, which is the most efficient mechanism and generates the least amount of traffic in comparison to the mechanisms used in other research.
Since our discovery system was applied to a subnetwork of an organization, our future goal is to discover the entire organization's network. For visualizing a network, we aim to represent the network in graphical form and include more link characteristics, such as link capacity and link failure, in the graphical representation. To notify the SNMP manager (client) about problems at an SNMP agent (server), such as a disk crash, we plan to use SNMP traps in the future.

REFERENCES
i. Suman Pandey, Mi-Jung Choi, Sung-Joo Lee, and James W. Hong, "IP Network Topology Discovery Using SNMP," POSTECH, Korea, 2013.
ii. R. Siamwalla, R. Sharma, and S. Keshav, "Discovering Internet Topology," Cornell Univ., Ithaca, NY, Technical Report.
iii. Y. Breitbart, M. Garofalakis, B. Jai, C. Martin, R. Rastogi, and A. Silberschatz, "Topology Discovery in Heterogeneous IP Networks: The NetInventory System," IEEE/ACM Transactions on Networking.
iv. B. Lowekamp, D. R. O'Hallaron, and T. R. Gross, "Topology Discovery for Large Ethernet Networks," ACM SIGCOMM, San Diego, CA, USA, pp. 237-248.
v. JFreeChart implementations, http://www.jfree.org.
vi. R. Smith, F. Wright, S. Zilles, and J. Gyllenskog, "Management Information Base for Printer," RFC 1759, IETF, March 1995.
vii. P. Grillo and S. Waldbusser, "Host Resources MIB," RFC 1514, September 1993.
viii. SNMP4J API, http://www.snmp4j.org.
ix. K. McCloghrie and M. Rose, "Management Information Base for Network Management of TCP/IP-based Internets: MIB-II," RFC 1213, IETF, March 1991.
x. E. Decker, P. Langille, A. Rijsinghani, and K. McCloghrie, "Bridge MIB," RFC 1493, July 1993.


Lagrange Based Quadratic Interpolation for Color Image Demosaicking


Shilpa N.S., Shivakumar Dalali
Dept. of Computer Science, Cambridge Institute of Technology, Bangalore.
shilpans11@gmail.com, shivakumar.dalali@gmail.com

Abstract— Digital image processing has become very popular over the past few decades, but increasing levels of noise affect image quality, so noise has to be removed to improve it. The Bayer color filter array (CFA) gives information about the intensity of light in the red, green and blue (RGB) wavelength regions. The CFA image captured by the image sensor is then demosaicked to get a full color (RGB) image. The present work presents a novel color image demosaicking algorithm using a Lagrange quadratic interpolation method and a directional interpolation method. By introducing the Lagrange interpolation, the interpolation direction of the center missing color component can be determined with minimum error. The center missing color component is also interpolated using the quadratic interpolation method by exploring the intra-channel correlation of the neighboring pixels. In addition, the present work contributes to strengthening the image quality and provides superior performance both objectively and subjectively.

Keywords— Color filter array (CFA) interpolation, demosaicking, Lagrange quadratic interpolation.

I. INTRODUCTION
Human eyes can perceive a few million colors. Most of these colors can be produced by mixing just the three primary colors, red, green and blue, in varying proportions. Image sensors are used to acquire the primary colors, and three separate sensors are required for a camera to acquire an image. To reduce cost and space, many cameras use a single sensor covered with a color filter array (CFA). In the CFA-based sensor configuration, 2×2 Bayer patterns are commonly used to acquire a color image, as shown in Fig. 1. A color image contains three RGB planes; the CFA image contains only some of the color pixels of each plane, and the remaining pixels are missing. Those missing color components are estimated from the acquired neighboring pixels contained in the CFA image. This process is called interpolation, and it is applied to each and every missing pixel to obtain a full color image. The color interpolation process is known as demosaicking. Although many different CFA patterns have been proposed, the most prevalent is the 2×2 'GRBG' Bayer pattern shown in Fig. 1. The color reproduction quality depends on the CFA templates and the demosaicking algorithms that are employed. Various demosaicking algorithms [1]-[8] based on the Bayer pattern have been proposed in the past few decades.

A Bayer filter array, or CFA, represents the arrangement of color filters such that each sensor in a single-sensor digital camera acquires only red, green or blue pixels. The pattern gives special importance to the number of green sensors, to mimic the human eye's greater sensitivity to green light. The demosaic method based on interpolation converts the two-dimensional Bayer-encoded image into the true color image, RGB, which is an M-by-N-by-3 array.

The sensor alignment is one of several text strings that specify the Bayer pattern. Each string represents the order of the red, green and blue sensors by describing the four pixels in the upper left corner of the image (left-to-right, top-to-bottom).

G1A   R2A   G3A   R4A
B5A   G6A   B7A   G8A
G9A   R10A  G11A  R12A
B13A  G14A  B15A  G16A

Fig. 1. Bayer CFA pattern.

The existing methods were proposed to obtain a full color image by utilizing the color differences between the RGB planes. Each method has its own advantages and disadvantages with respect to the interpolation. In a demosaicking technique there are many challenges in achieving the interpolation in an efficient and effective manner, so as to obtain a full 24-bit color image with little degradation.

The interpolation technique proposed in this paper is a simple approach to the demosaicking problem: treat the color planes separately and fill in the missing pixels in each plane using a Lagrange based quadratic interpolation. The advantage of this method is that it is more effective in smooth regions. Existing methods, however, lead to color artifacts and lower resolution in regions with texture and edge structures; to overcome these issues, the proposed method reduces color artifacts and gives good resolution at the edges. Here we introduce an interpolation in both the horizontal and vertical directions, applied separately and independently to all three RGB planes.

The rest of the paper is organized as follows. Section II describes the proposed Lagrange based quadratic interpolation algorithm. Section III presents experimental results, and Section IV the conclusions.

II. PROPOSED METHOD

A. Lagrange quadratic interpolation

The motivation for the proposed method comes from observing the traditional demosaicking methods [1]-[8]. Due to inaccurate edge information, the center missing color component cannot be interpolated accurately, because only inadequate information about the irregular edge and texture details exists. Here the edge directions of the neighboring pixels are estimated in order to exploit the main direction by Lagrange quadratic interpolation locally.
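For reference, the equations (1)-(4) referred to below, which are not reproduced in this extraction, are instances of the standard Lagrange interpolating form consistent with the symbol definitions given with each equation. For acquired samples $G_j$ located at node positions $X_j$, the Lagrange interpolating polynomial is

$$L(X) = \sum_{j=0}^{n} G_j \prod_{\substack{k=0 \\ k \ne j}}^{n} \frac{X - X_k}{X_j - X_k},$$

which for two nodes ($n = 1$) reduces to linear interpolation between the two acquired neighbors, and for three nodes ($n = 2$) gives the quadratic case.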

B. Interpolating the missing Green component

The missing green components can be interpolated by using the adjacent neighboring acquired pixels, as shown in Fig. 2. For example, the missing green component G2I is interpolated by using the acquired green components G1A and G3A. Similarly, the remaining missing green components can be obtained by using the neighboring acquired green pixels in the green plane.

Fig. 2. Demosaic Green plane

Horizontal Interpolation
The missing pixel G2I in the first row can be obtained using equation (1) and the third-row missing pixel G10I using equation (2).

First row,

where X is the position of the missing pixel, X1,0 is the position of the acquired pixel G1A, and X1,1 is the position of the acquired pixel G3A.

Third row,

where X is the position of the missing pixel, X3,0 is the position of the acquired pixel G9A, and X3,1 is the position of the acquired pixel G11A.

Vertical Interpolation
In the vertical interpolation only two pixels per column are interpolated, because the pattern pixel G6A is already acquired. The missing pixel G5I in the first column can be obtained using equation (3) and the third-column missing pixel G7I using equation (4).

First column,

where Y is the position of the missing pixel, Y1,0 is the position of the acquired pixel G1A, and Y1,1 is the position of the acquired pixel G9A.

Third column,

where Y is the position of the missing pixel, Y3,0 is the position of the acquired pixel G3A, and Y3,1 is the position of the acquired pixel G11A.
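As a concrete illustration of the two-node case used for the green plane, here is a minimal sketch; it is a simplified stand-in consistent with the symbol definitions above, not a reproduction of the paper's original equations.

```python
# Two-node Lagrange interpolation of a missing green sample from its
# two acquired neighbors (the positions and values are illustrative).
def lagrange2(x, x0, g0, x1, g1):
    """Evaluate the two-node Lagrange polynomial at position x."""
    return g0 * (x - x1) / (x0 - x1) + g1 * (x - x0) / (x1 - x0)

# Example: missing G2I at x = 1 between acquired G1A (x = 0) and
# G3A (x = 2); with equidistant nodes this is a simple average.
print(lagrange2(1, 0, 100.0, 2, 120.0))  # -> 110.0
```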

C. Interpolating the missing Red component

Fig. 3. Demosaic Red plane

Horizontal Interpolation

First row,

Third row,

Vertical Interpolation


First column,

Second column,

D. Interpolating the missing Blue Component

Horizontal Interpolation

Second row,

Fig. 4. Demosaic Blue plane

Third row,

Vertical Interpolation

First column,

Second column,

Third column,

III. EXPERIMENTAL RESULTS

After applying the interpolation, better results are obtained. Observing the marked regions in Fig. 5 and Fig. 7: in Fig. 5, with bilinear interpolation, the marked region does not give the detailed texture information, but in Fig. 7 the marked region gives better texture details.

Fig. 8 and Fig. 10 show the marked region in the original and demosaicked images and the difference in the smooth region.

Original image

Fig. 5. Original image of size 512×512.

CFA image

Fig. 6. CFA image of size 512×512.


Demosaicked image

Fig. 7. Demosaicked image using Lagrange interpolation.

Original image

Fig. 8. Original image.

CFA image

Fig. 9. CFA image.

Demosaicked image

Fig. 10. Demosaicked image.

IV. CONCLUSION

In this paper we proposed an efficient demosaicking algorithm that applies a Lagrange based quadratic interpolation method along the horizontal and vertical directions. The algorithm is efficient, very simple and consumes little time, and it gives better image quality not only in smooth regions but also in irregular edge and texture details. By using this method, the true color (RGB) image is obtained independently per plane with little degradation. In future work, this method will be applied to all the other directions around the missing pixels to obtain the true color image.

References
i. Xiangdong Chen, Gwanggil Jeon, and Jechang Jeong, "Voting-based directional interpolation method and its application to still color image demosaicking," vol. 24, no. 2, February 2014.
ii. I. Pekkucuksen and Y. Altunbasak, "Edge strength filter based color filter array interpolation," IEEE Trans. Image Process., vol. 21, no. 1, pp. 393-397, Jan. 2012.
iii. K. H. Chung and Y. H. Chan, IEEE Trans. Image Process., vol. 15, no. 10, pp. 2944-2945, Oct. 2006.
iv. N. X. Lian, L. Chang, Y. P. Tan, and V. Zagorodnov, "Adaptive filtering for color filter array demosaicking," IEEE Trans. Image Process., vol. 16, no. 10, pp. 2515-2525, Oct. 2007.
v. R. Lukac, K. N. Plataniotis, and D. Hatzinakos, "Color image zooming on the Bayer pattern," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 11, pp. 1457-1492, Nov. 2005.
vi. D. Paliy, V. Katkovnik, R. Bilcu, S. Alenius, and K. Egiazarian, "Spatially adaptive color filter array interpolation for noiseless and noisy data," Int. J. Imag. Syst. Technol., vol. 17, no. 3, pp. 105-122, 2007.
vii. L. Zhang, X. Wu, A. Buades, and X. Li, "Color demosaicking by local directional interpolation and non-local adaptive thresholding," J. Electron. Imaging, vol. 20, no. 2, p. 023016, 2011.
viii. A. Buades, B. Coll, J. M. Morel, and C. Sbert, "Self-similarity driven color demosaicking," IEEE Trans. Image Process., vol. 18, no. 6, pp. 1192-1202, Jun. 2009.
ix. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600-612, Apr. 2004.
x. R. Lukac and K. N. Plataniotis, "A normalized model for color-ratio based demosaicking schemes," in Int. Conf. on Image Process., 2004, vol. 3, pp. 1657-1660.


Case Study: Leveraging Biometrics to Big Data


Shivakumar Dalali (1), Dr. Suresh L. (2), Dr. Chandrakant Naikodi (3)
(1) VTU, Belgaum
(2, 3) Dept. of CSE, Cambridge Institute of Technology
shivakumar.dalali@gmail.com, suriakls@gmail.com, chandrakant.naikodi@yahoo.in

Abstract- In order to weave together the exponentially increasing quantity of biometric data, it must be dealt with from a big data perspective, using technologies capable of processing massive amounts of data efficiently and securely. The main challenge in the biometric industry is to overcome all the threats during the different phases of the biometric system development life cycle. Current biometric models emphasize the importance and significance of big data. This paper capitalizes on the most important challenges encountered and the critical criteria to be followed in biometric analysis, and proposes a general approach for big data biometric analysis.

I. INTRODUCTION
Most people on the internet use passwords for authentication. The biggest threat with password authentication approaches is the existence of too many password-account pairings for each user, which leads to forgotten credentials or to the same user name and password being used for multiple sites [1]. One possible solution to this problem is the use of biometric systems [2][6][14]. Biometric authentication techniques try to validate the identity of a user based on his/her physiological or behavioral traits, yet their use on the internet is still relatively modest. The main reason is the accessibility and scalability of existing biometric technology.

Similar issues are also encountered in other deployment domains of biometric technology, such as forensics and law enforcement. For example, according to [3], the biometric databases of the Federal Bureau of Investigation, the US State Department, the Department of Defense, the Department of Homeland Security, and the Aadhaar project in India are expected to grow significantly over the next few years to accommodate several hundred million (or even billions of) identities. Such expectations make it necessary to devise highly scalable biometric technology capable of operating on enormous amounts of data, which in turn induces the need for sufficient storage capacity and significant processing power.
A. Big Data Mining platform
In data mining systems, the mining algorithms require computationally intensive computing units for data analysis and comparisons. A computing platform needs two types of resources: data and computing processors. For small-scale data mining tasks, a single desktop containing a hard disk and CPU is sufficient; indeed, many data mining algorithms are designed for this type of problem [5][9].

Big data mining relies on cluster computers with a high-performance computing platform, where a data mining task is deployed by running parallel programming tools. The role of the software component is to make sure that a single data mining task, such as finding the best match for a query in a database with billions of records, is split into many small tasks, each of which runs on one or multiple computing nodes [11][13].
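As a concrete illustration of this task splitting, the following minimal Python sketch fans a single best-match query out over shards of a gallery and reduces the per-shard winners; the numeric "templates" and the distance function are hypothetical stand-ins for real biometric matchers.

```python
from concurrent.futures import ProcessPoolExecutor

def best_match(query, gallery):
    """Hypothetical scoring stub: smaller distance = better match."""
    return min(gallery, key=lambda t: abs(t - query))

def parallel_best_match(query, shards):
    """Split one matching task over shards of a large gallery, run
    the shards on separate worker processes, then reduce the
    per-shard winners to a single global best match."""
    with ProcessPoolExecutor() as pool:
        partial = pool.map(best_match, [query] * len(shards), shards)
        return best_match(query, list(partial))

if __name__ == "__main__":
    shards = [[1.0, 5.0], [2.5, 9.0], [3.1, 0.4]]   # toy gallery shards
    print(parallel_best_match(3.0, shards))          # -> 3.1
```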

New biometric modes and multimodality: Different types of biometric modes, namely new physical biometric modes, soft biometric modes and behavioral biometric modes, are gaining popularity. New physical biometric modes like palm print, hand geometry, voice, signature, DNA and hand veins, and soft biometric modes like knuckle image samples, conjunctive vasculature and tattoo images, need to be addressed properly, as do new behavioral biometric modes like blinking pattern, ECG, EEG, gait, game strategy, keystroke dynamics and text style. Even multimodal biometrics requires new techniques for fast analysis. Developing such an environment is a real challenge in big data biometric analysis [4][16].
New processing capabilities and algorithms: Big data, and the use of the cloud, make it necessary to change traditional processing. The real challenge here is to develop new types of algorithms and new processing ideas for biometric data.
Interoperable systems: Because diverse modalities can consume tens of petabytes of biometric material in data storage, in the near future these systems will not only be used to identify individuals in near real time; they will also need to share information between multiple organizations in order to successfully accomplish a wide range of identity management missions. This requires interoperable systems [15].
Very large biometric repositories: Traditional biometric systems have begun to reach the limits of their scalability due to a surge in enrollment and the increasing multi-modality of databases. To overcome these problems, large biometric repositories are needed.

B. Considerations for big data biometric analysis
Because of the characteristic volume, value, variety and velocity of big data, the design of big data biometrics involves the following kinds of considerations.
Matching/processing algorithms: Most existing biometric algorithms are proprietary, unique and not interoperable, and some are very expensive to implement and maintain. For these reasons, matching/processing algorithms need to be considered for refinement or redefinition.
Data fusion: Multimodal biometric systems are common nowadays, which makes modelling more difficult. While fusing the different data modes, one needs to consider filtering out bad data, because bad data can be worse than no data.
Analysis: Biometric data analysis algorithms need to consider complex relationships and eliminate duplicate/multiple identities.
Storage: Big data needs huge storage, which could be provided by the cloud. Data needs to be accessed from "the edge", which requires adequate throughput to meet performance, security, classification, privacy and protection requirements.

C. Trends in big data biometrics
Big data analysis has some trends which apply to big data biometrics as follows:
• Keeping all inputs/samples instead of only the best one, which may impact storage, processing, etc.
• Collecting biometric samples at the edge of the envelope creates new processing problems: eliminating duplications and improving quality, orientation and resolution.
• New biometric modes, e.g. voice, gait and scent, are continuously being added to existing biometric systems.
• Exploitation of soft biometrics like scars, marks and tattoos creates new requirements for biometric system designers.

III. WORKING STRATEGIES
Raising existing biometric systems to par with big data biometric systems needs some ideas, namely operating territories and focus areas.

A. Big data biometrics operating territory
In order to find the insight present in big data biometrics, we need some operating territories, which are as follows:
1) Expand and improve open source biometric algorithms.
2) Refine the system model and tune the existing biometric algorithms for big data biometrics.
3) Adapt big data analytics for biometrics.
4) Enhance visualization tools.

B. Focus areas
To leverage biometric systems into big data biometric systems, we need to focus on the following areas:
• Analytic stack.
• Pipelines: parallel computing and algorithms.
• Data visualization.
• Biometric-specific system modelling, biometric fusion environment, visualization for biometrics, and intelligence.

IV. GENERAL APPROACH FOR BIG DATA BIOMETRIC PROCESSING SYSTEM
Fig. 1 demonstrates the different components of big data biometric processing systems.
Diversified biometric data capture: For a single trait, data may be captured from different sources, and the captured data may be heterogeneous and diversified. For example, in face recognition, face images may be collected from different sources like mobile cameras, cameras, surveillance cameras and Facebook images. Pre-processing may be used to extract salient features from these diversified data.

Fig 1: General approach for big data biometric processing system

Parallel preprocessing algorithms: Preprocessing is required to clean the samples, which may be subject to various types of noise and interference, and to prepare the samples into an appropriate format for feature extraction or biometric analysis.

Preprocessing big data biometrics needs new parallel preprocessing algorithms which are able to remove noise and interference in biometric data.
Parallel big biometric algorithms: Big data biometric analysis needs parallel feature extraction and data analysis or data mining algorithms which are suitable to meet the considerations mentioned in section II B. The extracted biometric templates are stored in big databases/cloud storage as individual references.
Big data biometric recognition often makes use of a comparator module which can operate in two different modes, namely user verification and user identification. The former performs authentication in an "are you who you claim to be" mode; this mainly involves a straightforward 1-to-1 comparison, whereby the final verdict is a binary accept or reject decision. The latter performs an exhaustive one-to-many search over the entire user database to answer the "who are you" question; the main aim of user identification is to find the closest matching identity, if any exists.
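The two comparator modes read naturally as two small routines. The following is an illustrative Java sketch; the cosine-similarity score and the threshold are assumptions made for the example, since the concrete matcher depends on the modality.

import java.util.Comparator;
import java.util.Map;
import java.util.Optional;

// Sketch of the comparator module: 1:1 verification returns a binary
// accept/reject; 1:N identification exhaustively searches the gallery.
public class ComparatorModule {
    // Illustrative similarity score; real systems use a modality-specific matcher.
    static double score(double[] probe, double[] reference) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < probe.length; i++) {
            dot += probe[i] * reference[i];
            na  += probe[i] * probe[i];
            nb  += reference[i] * reference[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb)); // cosine similarity
    }

    // Verification: "are you who you claim to be", one 1:1 comparison.
    static boolean verify(double[] probe, double[] claimedTemplate, double threshold) {
        return score(probe, claimedTemplate) >= threshold;
    }

    // Identification: "who are you", exhaustive 1:N search for the closest
    // enrolled identity, empty if nothing clears the threshold.
    static Optional<String> identify(double[] probe,
                                     Map<String, double[]> gallery,
                                     double threshold) {
        return gallery.entrySet().stream()
                .max(Comparator.comparingDouble(e -> score(probe, e.getValue())))
                .filter(best -> score(probe, best.getValue()) >= threshold)
                .map(Map.Entry::getKey);
    }
}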

V. CONCLUSION
Leveraging existing biometrics to big data has an enormous potential market value and as such attracts the interest of research and development groups from all around the world. This paper highlights the challenges, considerations, trends, operating territories and focus areas that need to be considered when designing big data biometrics. A general approach for big data biometric processing systems is designed as an analysis stack.

BIBLIOGRAPHY

i. D. Balfanz, "The future of authentication," IEEE Security and Privacy, vol. 10, pp. 22-27, 2012.
ii. E. Kohlwey, A. Sussman, J. Trost, and A. Maurer (Booz Allen Hamilton), "Leveraging the Cloud for Big Data Biometrics," 2011 IEEE World Congress on Services.
iii. J. Bule and P. Peer, "Fingerprint Verification as a Service in KC CLASS," Proceedings of the 1st International CLoud Assisted ServiceS, Bled, 25 October 2012.
iv. Girish Rao Salanke N S, N. Maheswari, Andrews Samraj, and S. Sadhasivam, "Enhancement in the design of Biometric Identification System based on Photoplethysmography data," Proceedings of 2013 ICGHPC, March 2013.
v. X. Wu, X. Zhu, G.-Q. Wu, and W. Ding, "Data Mining with Big Data," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, January 2014.
vi. S. M. Syed Ahmad, B. Mohd Ali, and W. A. Wan Adnan, "Issues and challenges of biometric applications as access control tools of information security," IJICIC, vol. 8, 2012.
vii. D. Hagan, "Biometric Systems and Big Data," Big Data Conference, May 8-9, 2012.
viii. A. Machanavajjhala and J. P. Reiter, "Big Privacy: Protecting Confidentiality in Big Data," ACM Crossroads, vol. 19, no. 1, pp. 20-23, 2012.
ix. B. Huberman, "Sociology of Science: Big Data Deserve a Bigger Audience," Nature, vol. 482, p. 308, 2012.
x. A. Labrinidis and H. Jagadish, "Challenges and Opportunities with Big Data," Proc. VLDB Endowment, vol. 5, no. 12, pp. 2032-2033, 2012.
xi. D. Luo, C. Ding, and H. Huang, "Parallelization with Multiplicative Algorithms for Big Data Mining," Proc. IEEE 12th Int'l Conf. Data Mining, pp. 489-498, 2012.
xii. X. Wu, K. Yu, W. Ding, H. Wang, and X. Zhu, "Online Feature Selection with Streaming Features," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 35, no. 5, pp. 1178-1192, May 2013.
xiii. J. Rickert, "Big Data Analysis with Revolution R Enterprise," January 2011.
xiv. P. Peer and J. Bule, "Building Cloud-based Biometric Services," Informatica 37, pp. 115-122, 2013.
xv. J. C. Klontz, B. F. Klare, S. Klum, A. K. Jain, and M. J. Burge, "Open Source Biometric Recognition," IEEE BTAS, Sept 2013.
xvi. N. S. Girish Rao Salanke, N. Maheswari, and Andrews Samraj, "An Enhanced Intrinsic Biometric in Identifying People by Photoplethysmography Signal," Proceedings of the ICSIP 2012, © Springer India 2013.
xvii. S. N. S. Raghava, "Iris Recognition on Hadoop: A Biometrics System Implementation on Cloud Computing," in Proceedings of IEEE CCIS, 2011.
xviii. K. Ricanek Jr. and C. Boehnen, "Facial Analytics: From Big Data to Law Enforcement," IEEE Computer Society, 2012.
xix. R. E. Bryant, R. H. Katz, and E. D. Lazowska, "Big-Data Computing: Creating revolutionary breakthroughs in commerce."
xx. M. Baveja, H. Yuan, and L. M. Wein, "Asymptotic Biometric Analysis for Large Gallery Sizes," December 2010.
xxi. A. K. Jain, B. Klare, and U. Park, "Face Recognition: Some Challenges in Forensics."


‘DHERO’ - Mobile Location Based Fast Services


Bharath D., Anand S Uppar
Dept of CSE, SDIT Mangalore, DK
Bharathd363@gmail.com

ABSTRACT: Location based services are a part of mobile multimedia services, through which a user can find services and products. The services support people in navigating their daily errands. There are numerous application areas, such as mobile work, shopping and sports, tourism, delivery, community services, public transport and safety. Mobile location based services are built on standard technology like mobile devices, wireless networks and maps. In particular, mobile location based services use the current-position capabilities of a mobile device: using GPS technology, the position of the user is extracted, and based on the user's location, nearby services are determined. One of the major issues in location based services is providing privacy controls to users without vastly affecting the users' services. The main aspects of this application are providing service, sharing and safety for users.

Key Terms: LBS (Location based services)

I. INTRODUCTION

In this computerized era, people depend on technology for daily errands. Nowadays, online shopping sites, home deliveries and mobile services are gaining popularity because of the savings in time and price. Today, many service-providing applications and messenger applications are implemented on the mobile platform.
Mobile information society is developing rapidly as mobile telecommunications moves from third- to fourth-generation technology. In this computerized world, the Internet and its services are coming to wireless devices; the convergence of content and technology is deepening and the market is being reorganised, with different actors wanting to reserve their place through mobile applications. Today most people choose online sites and mobile applications to buy their required products, which directly drives growth in mobile and web technology. The key to success for online e-commerce services lies in two principal ways of marketing: first, a web user interface for users to buy products from online sites; second, a mobile application through which users can buy specific products. The mobile application is a key success factor for online business. Important factors in mobile technology and online shopping sites are quality of service and user interface standards.
In recent years, Android-enabled mobile phones have gained special attention because of their user-friendly environment and high-level operation support features. Android has many features, but one of the most interesting is GPS, which helps in getting driving directions and providing location information. Providing location based information through GPS is what is called LBS.
Location sharing applications provide a personal location sharing service with known persons, based on user permission. The location sharing service mainly concerns user safety, by directly sending the user's location to emergency stations like police stations, fire stations and hospitals; based on the user's location, emergency service providers will provide the safety service to that particular user. It also provides guidelines on the user's required route and transport; all these services are based on Google nearby search.

Fig 1: Location based service components.

II. LITERATURE SURVEY

This section briefly presents the related works on location based services and its applications.
Today, we are dealing with the era of mobile applications and online shopping sites, which replace real-time marketing with fast online marketing. At present, about 40% of users choose online sites to buy their required products and the remaining 60% choose mobile applications. For example, Flipkart, Myntra, Amazon and Snapdeal are professional online shopping e-commerce organizations providing services to the user's location. These organizations started with online sites, but nowadays they concentrate most on mobile applications because of the rapid growth in mobile users and technology; they developed mobile applications to market their vendors' products.
Nowadays the number of Android-based applications is increasing rapidly: a developer survey conducted in April 2013 found that 71% of mobile developers develop for Android, and there were 1 billion active Android users per month.

Mobile-based marketing is becoming increasingly popular because of the rapid growth in smartphone users, and mobile location based services are a considerably profitable opportunity for service providers and users. In MLBS, an application exploits knowledge of the geographical position to know the address at which to deliver the ordered product. As the survey above shows, applications such as OLX, Quikr and WhatsApp have gained popularity within short periods. These applications use the web service technique between the database and the application. Mobile location based services depend on the location of the user; getting the current location of the user requires many LBS components.
We have a huge number of applications for location based services, online services and nearby service provider searching; most of them work using Google nearby key place search. Android applications related to LBS mainly concentrate on the current location of the user, which is obtained using GPS or a network provider.
Mobile location based services are classified into many categories, of which the major types are:
Entertainment services, which include location sharing with friends and community, and location based gaming services.
Information based services, which cover nearby location checking, local weather reports, finding a taxi and entertainment information.
Navigation services, which provide navigation to a particular place, map services and traffic information.
E-commerce services, which include nearby store searching, sales information about products, and mobile transactions or billing.
Security and tracking services, which are based on finding nearby emergency service providers and tracking a particular person's activity.
Mobile LBS most commonly require GPS together with an internet service to locate the user. In most applications, a web service is the common technique used to provide the interface between the database and the mobile application.
Most applications are implemented on the e-commerce technique, which includes nearby shop searching and product searching based on the user's location, as in the OLX and Quikr mobile applications. These are entirely commerce-service applications, because they use the user's current location to search for the required products and display them to the user.
E-commerce services provide both mobile and web user interfaces for users, for example Flipkart and Amazon; this type of service increases the quality of the service provided to users. Service provider organizations mainly depend on e-commerce technology to market their vendors' products through online sites, and nowadays they back this with mobile-based marketing.
E-commerce companies consider only high-level vendors, or vendors providing bulk products. Because of this, middle-tier sellers cannot sell their products through online sites or mobile applications.
At present, local service providers like local home delivery services, local transport services and hotels are facing losses in business due to the high-technology support available to large-scale businesses, such as online shopping support and advertisement through applications. Medium and small scale business people presently face problems because of the lack of technology support, since small scale business people cannot spend much money on building web pages for individual services. To overcome this problem, we propose an application which plays an important role at the medium and small scale business level.

III. PROPOSED SYSTEM

Nowadays there exist many location based service applications, for example location sharing with friends and parents, emergency contacts based on user location, etc. But there is no application which provides a combination of services; here we propose a technique for a combination of services. Our proposed application is implemented for both service providers and users, to exchange information between them. In parallel, the application introduces a location sharing entertainment service, which provides an interface for user-to-user information exchange.

A. Requirements
This work is designed for service providers and users. In the proposed system there is two-way interaction for service providers and a single user interface for normal users. The first interface is an Android mobile application, which provides the primary interface for users and service providers. Second, a web user interface is provided for the service provider; using browser support, they can use the e-commerce service for their business.

B. Architecture
In this application, a main aspect is verifying service provider information to avoid fake registrations, because fake service providers cause failures in users' ordered services. To avoid this problem we propose a dashboard owner application.
There are two interfaces provided for the service providers. The Android application provides an interface for both service providers and normal users; it is developed in Android Studio [Java, XML] with the support of a web service [PHP] connected to the database [MySQL].

Fig 2: Architecture diagram for proposed system

The web application provides a web user interface for service providers to upload their information and view their respective business orders from customers.
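The location based filtering that this architecture relies on (keeping only the providers near the user's reported coordinates, as the conclusion below describes) can be sketched as follows. This is an illustrative Java fragment under assumed names; the actual service in this work is a PHP web service backed by MySQL, and the 5 km radius would be a tunable parameter.

import java.util.List;
import java.util.stream.Collectors;

// Illustrative server-side filter: keep only the service providers within
// a radius of the user's position, using the haversine great-circle formula.
public class NearbyFilter {
    record Provider(String name, double lat, double lon) {}

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 6371.0 * 2 * Math.asin(Math.sqrt(a)); // Earth radius ~6371 km
    }

    static List<Provider> nearby(double userLat, double userLon,
                                 List<Provider> all, double radiusKm) {
        return all.stream()
                .filter(p -> haversineKm(userLat, userLon, p.lat(), p.lon()) <= radiusKm)
                .collect(Collectors.toList());
    }
}

For example, nearby(12.97, 77.59, providers, 5.0) would return only the registered providers within roughly 5 km of the user.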

C. Security in application
A major aspect of mobile applications is maintaining the privacy of user data against unauthorized parties. The user gets a response back from the application based on his authentication information; in the mobile application, primary authentication is based on user name and password.
In this application, information is fetched through the web service; to maintain URL privacy, a secret key is passed along with the base URL. This provides safe information exchange between the application and the database through the web service.
The following statement describes the privacy scheme in the web service. Access to the universal resource link is secured by the following construction:

URL = I(info) + key(ss)

where
key(ss) => server-side secret key,
I(info) => the respective user information, which includes the authentication information plus the service information.
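The text states only that a server-side secret key is combined with the request information in the URL; one standard way to realise such a scheme (an assumption here, not necessarily the authors' exact construction) is to append an HMAC of the request data, which the web service can recompute and verify before answering:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Sketch: sign the user/service parameters with the server-side secret so
// the web service can reject tampered URLs. HMAC-SHA256 is one common
// choice; the paper does not name a specific construction.
public class SignedUrl {
    static String sign(String baseUrl, String info, byte[] serverSecret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(serverSecret, "HmacSHA256"));
        byte[] tag = mac.doFinal(info.getBytes(StandardCharsets.UTF_8));
        // URL = I(info) + key(ss): the request data plus a keyed token.
        return baseUrl + "?info=" + URLEncoder.encode(info, StandardCharsets.UTF_8)
                + "&sig=" + HexFormat.of().formatHex(tag);
    }
}

The PHP side would recompute the same HMAC over the received info parameter with its copy of the secret and compare it to sig.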

IV. CONCLUSION
This section briefly presents the results of the mobile location based fast services application. The application provides commerce services to users with fast delivery. The main advantage of the mobile location based service is that service providers are filtered based on the respective user's location: using the longitude and latitude coordinates of the mobile device, the user's location is determined, so delivery of the service to the user is comparatively fast. The application provides technology support for medium and small scale business people by giving service providers a mobile interface to add their items into the application, and it also provides a web user interface for service providers.



‘Im@’- A Technique for Sharing Location

Bhavya A.1, Balapradeep K. N.2, Dr. Antony P. J.3


1Dept of CSE, VTU Belgaum, KVGCE Sullia, DK
2,3Dept of CSE, KVGCE Sullia, DK
Bhavya2535@gmail.com

Abstract: The use of mobile phones today has become a part of our daily life. Recently, all mobile phones and smartphones come equipped with Global Positioning System (GPS) sensors to get information about the location. LBS (Location based services) are used to obtain knowledge about the geographical position. There exist many applications today which share one's location with others in terms of location co-ordinates (longitude and latitude) that can be viewed in the form of a Google map, also called map based location sharing. This paper provides a detailed description of sharing location with friends in the form of text (also called text based location sharing) instead of a map, since map based location sharing is time consuming compared to text based location sharing. The longitude and latitude (geographical coordinate) properties are used to obtain the location, which can be converted into text form and shared with friends. The application is also enriched with near-by services: it provides all the services near to the user's location. Here near-by services are the organizations that provide services to users near the organization's location.

Keywords: GPS (Global Positioning System), Location based services (LBS), longitude and latitude.

I. INTRODUCTION

In this computerized era of Facebook, WhatsApp, Twitter etc., sharing our day-to-day activities with friends and family has become a trend, and sharing location plays a very important role in it.
In olden days the telephone was used to exchange information; as the days passed, technology was updated, leading to the evolution of the mobile phone, where information can be exchanged in the form of text messages and calls. Later, mobile phones were equipped with many additional features such as Bluetooth, Wi-Fi, Internet, GPS [1] etc., which led to the era of smartphones. A smartphone is a mobile phone with the same features as a mobile phone but with an advanced operating system, and is more popular than the plain mobile phone. Typical smartphone features are a touch-screen interface, a digital camera, and the ability to run third-party applications, with inbuilt GPS, web browsing, mobile payment etc. The first smartphone was the "Ericsson R380", released by Ericsson Mobile Communications in the year 2000, and 2008 saw the first phone to use Android, the HTC Dream.
Android is an open source platform founded by Andy Rubin and owned by Google. Android [2] is software which runs on the smartphone as its operating system and also supports execution of all local and third-party applications. There exist many open source mobile platforms, but iOS from Apple, Android from Google, Symbian from the Symbian Foundation, and Windows from Microsoft are the most popular. Android provides a platform where any application can be downloaded; according to research done till now, more than 68,000 applications are available, and the number of applications downloaded by Android-enabled mobile phones has reached more than 1 billion.
In recent years, Android-enabled mobile phones have grabbed special attention because of their features; among the many features that Android has, one of the most interesting is GPS. GPS helps in getting driving directions and providing location information; the main purpose of using GPS is to provide location based information, also called location based services [3].
Location based services provide the location of a person/device, and the same location can be shared with others. Another technology with respect to location based services is tracking the location of another person/device; sharing one's own position instead of being tracked is also called "self-reporting position".
The location of a person/device can be obtained from the location co-ordinates (longitude and latitude). The GPS sensors inserted inside the device sense the accurate location, obtain the longitude and latitude of the position, and display the location in terms of a Google map.

Fig. 1 LBS Components

Fig 1 shows the LBS components. The LBS application represents the specific application, meaning the entire single application together with the smartphone, implemented with a sensor and a server to store data. The LBS middleware is the intermediary between the LBS application and the core LBS features. The core LBS features are location tracking, to track the location of the device; the GIS (Geographic Information System) provider, used to provide the Google maps; and the location collection service (LCS), used to collect the longitude and latitude information about the device.
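A minimal sketch of how an application obtains those coordinates on Android is shown below. The 30-second interval matches the periodic refresh used later in this work; the rest (class name, distance threshold) is illustrative, and the sketch assumes a recent Android SDK (API 30+), where the remaining LocationListener callbacks have default implementations.

import android.content.Context;
import android.location.Location;
import android.location.LocationListener;
import android.location.LocationManager;

// Minimal sketch: request periodic GPS fixes and read the coordinates.
// Runtime permission handling (ACCESS_FINE_LOCATION) is omitted for brevity.
public class GpsSource implements LocationListener {
    public void start(Context context) {
        LocationManager lm =
                (LocationManager) context.getSystemService(Context.LOCATION_SERVICE);
        // 30 000 ms minimum time, 10 m minimum distance between updates.
        lm.requestLocationUpdates(LocationManager.GPS_PROVIDER, 30_000L, 10f, this);
    }

    @Override
    public void onLocationChanged(Location location) {
        double lat = location.getLatitude();
        double lon = location.getLongitude();
        // hand (lat, lon) to the application, e.g. upload to the location server
    }
}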


Today all smartphones have the location sensing capability built in; a successful location based service is obtained by providing accurate location co-ordinates.

II. LITERATURE SURVEY

This section briefly presents the related works on location based services and its applications.
Today, we are dealing with the era of smartphones and iPhones, which are replacing the bulky desktop in all manners. We have a huge number of applications and usage scenarios where a person walking on the roadside needs relevant data that can be obtained through location based services. GPS is a positioning system that is becoming popular; it is easy these days to use a map by connecting the GPS receiver to a device. GPS chips inserted into the device obtain the accurate location of the user from satellite signals, and the location can be viewed in terms of a Google map.
The authors Chris Hulls and Alex Haro proposed the application Life360 ("Life 360: usability of mobile devices for time use surveys" [4]) in 2008. This is a family-network location based service application which allows family members to share their location and easily communicate with each other; adding family members into the application makes a family circle. The main features of this application are that a person can instantly see where other family members are located, a person can choose whether or not to share their particular location at a particular time, and family members can chat with each other within a circle, thereby providing family safety; it also gives an alert when a new person enters the circle or when a family member leaves it.
Another noteworthy application related to location sharing is Find My Friends [5], by Apple, in 2011. This application allows the user to track another person's location and to share his own location with people of his choice; if a person wants to track the location of another person, a notification can be sent to that person as a request. Location can be turned on or off at any time; the location of the person is obtained by GPS, so whenever GPS is turned off it is difficult to track or share location.
In the paper "GPS and SMS based child tracking system using smart phone" [6], A. Al-Mazloum, E. Omer and M. F. A. Abdullah presented an application based on tracking a child using the smartphone. The application is specially developed to provide child safety by tracking the location of the child via the smartphone: once the application is installed by child and parent, the parent is able to track the child's activities. The parent can obtain the child's location by sending a request message to the child, and the child can respond to the parent's request, whereby the parent can view the location of the child. Here GPS is used to obtain the location of the child, and the location can be viewed in terms of a Google map, thereby providing child safety.
In the paper titled "Android based mobile application development and its security" [7], Suhas Holla and Mahima M Katti provided a detailed description of how to achieve security for applications downloaded from Google, because Android is an open source platform for mobile operating systems. Anyone can upload an application to Google, so a few may take advantage and upload applications without any security measures, which leads to many computer crimes; such problems can be overcome by using a layered approach to develop Android applications. This includes an application sandbox to detect suspicious applications both statically and dynamically.
The application named Nearest Friends Notification Reminder [8] provides a notification when any of the friends in the user's friend list moves into the same location. The GPS tracker will track the location of a friend only when that friend gets into the same location as the user. The advantage of using this application is that it helps users meet a friend who is in the same area/place.
Google has a built-in feature, Search Nearby [9], to search nearby locations. It helps a person find a nearby location and, together with the location, it also provides the options of navigation and bookmarking: the navigate option provides directions to the location by showing the route, and bookmarking allows a place to be marked as an interest, thereby saving the location and retrieving directions for it.

III. PROPOSED SYSTEM

Location based services provide the location of the device using GPS; the location is obtained via the geographical coordinates (longitude and latitude). There exist many applications today to share the location of a person/device, but the purpose of the proposed system is to share the location with friends in text form; it also searches the nearby services and displays the contacts of the registered nearby service providers, so the user can interact with the registered service providers and get their work done.

A. Requirements
This work is designed both for users and for service providers. Both must have smartphones that support GPS, and before they use the application both must be registered with it. The application is used by users to view friends' locations and to view the nearby service providers, while service providers can only interact with users who wish to use the services provided by the near-by service provider.
This application is developed using Android Studio, a MySQL database, and a web service in PHP. Android OS has been used for the implementation; the solid reason for this choice is to target more users.

B. Application Architecture
We propose a solution to share the location and to access nearby services using GPS technology; the GPS feature exists in all advanced smartphones. The simple idea of this application is to share and track the location of a person, and to search near-by services.


Fig. 2. Architecture diagram of proposed system

1) GPS provider: The GPS provider provides the exact location of the device/person. Almost all location based services use GPS to obtain the location. GPS receivers are inserted inside the devices to get the location: GPS chips analyse the satellite signals and identify the user location, which is obtained as geographical coordinates.
2) Mobile client: Mobile clients are the people who use the mobile application. In this application both the user and the service provider are called mobile clients. A mobile client can obtain the location through GPS, and the same location is stored in the server database, so the mobile client can retrieve the location information stored in the server database at any time.
3) Server: The server is connected with the database to store the information. The server stores both location information and the users' personal information. When the location of a person is changed or updated, the change is stored in the database, so a user can track another person's position as saved in the database; a user can access only the current location information of a person that exists in the database.
Fig. 2 shows the architecture diagram of the proposed system. The mobile client can be a user or a service provider who uses the application. The user can share or track the location of friends in the friend list; the locations of users are obtained through the GPS provider, which takes the longitude and latitude of the person's location, converts them into text form, and stores the result in the server database. So when a user wishes to view the location of his/her friends, he/she can do so by refreshing the application; the location of each and every friend in the friend list is then obtained from the database and displayed in text form. Fig. 3 shows the friends list together with the name and location of each person; here the location is displayed in text form as the person's status. The user can also access the nearby services: when the user clicks on the nearby services, he/she obtains the registered nearby services close to the location, and can interact with the service providers of interest and get their work done. For example, suppose you are near a restaurant named Taj; if that restaurant is registered with the application, it is automatically displayed in the service provider list, and you can click on the restaurant name and order food by interacting with it. The restaurant service provider comes to know your location details once you order food, and delivers the order to your location. Fig. 4 shows the near-by services with respect to the user location.

Fig. 3. Friends list

Service providers can register in the application by providing their location and service details. After registration, if any user gets into the location where the service provider exists, that service is automatically displayed in the user's nearby service list. Once the nearby services are listed, the user can see the location of each service and at the same time interact with the service provider and get their work done. The service provider gets to know the location of the user only when the user shows interest; the only work of the service provider is to provide the service to the users.

Fig. 4. Near-by services

In order to use the application, users and service providers must register with it. Application users can register either as a user or as a service provider.
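The conversion from coordinates to the text-form status described above can be sketched with Android's Geocoder. The locality/admin-area formatting below is an illustrative choice, not necessarily the one used in this work, and a production app would handle the geocoder's failure modes more carefully.

import android.content.Context;
import android.location.Address;
import android.location.Geocoder;
import java.io.IOException;
import java.util.List;
import java.util.Locale;

// Sketch of text based location sharing: reverse-geocode the GPS fix into a
// short, human-readable address string that is stored on the server as the
// friend's "status".
public class TextLocation {
    static String toText(Context context, double lat, double lon) {
        try {
            Geocoder geocoder = new Geocoder(context, Locale.getDefault());
            List<Address> results = geocoder.getFromLocation(lat, lon, 1);
            if (results != null && !results.isEmpty()) {
                Address a = results.get(0);
                return a.getLocality() + ", " + a.getAdminArea();
            }
        } catch (IOException e) {
            // geocoder/network failure: fall back to raw coordinates
        }
        return lat + ", " + lon;
    }
}

Because only this short string travels to friends, the sharing works over low-speed networks, which is the advantage the conclusion below claims over map based sharing.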

4) User registration: A user can register in the application by providing personal details such as phone number, city, password etc. After registration, the user's personal data and location information are stored in the user database, and whatever changes take place regarding the user are updated in the database. Once registered, the user can log in to the application, add friends, share the location, view friends' locations and also search near-by service providers.
5) Service provider registration: A service provider can register in the application by providing name, place, service, phone number etc. After registration, the service provider becomes visible to the users in that place. The service provider can interact with a user if the user shows interest in his service; the service provider can see the user and the user's location until the provider fulfils the user's requirement, after which the service provider can no longer see the user.
The application interface provides an interface between the database and the mobile client. The location server stores the location information in the database, where it can be stored and retrieved at any time.

IV. CONCLUSION
Nowadays there exist many applications based on location based services. This application provides automatic updating of the location every 30 seconds and sharing of our location address in text form with friends. The basic applications that share location in map view require a high-speed network, but in the proposed system, sharing a location in text form can be achieved over a low-speed network as well. The system uses the longitude and latitude properties to share a location. An additional feature of this application is the nearby services option, which shows all services that exist near to the user's location, so the user can make use of a nearby service by interacting with the service of their interest. The application provides fully secured location sharing based on authorization, and privacy is achieved by providing the data only to subscribed users.

REFERENCES
i. Chandra, A., Jain, S., Qadeer, M.A., "GPS Locator: An Application for Location Tracking and Sharing Using GPS for Java Enabled Handhelds," 2011 International Conference on Computational Intelligence and Communication Networks (CICN), pp. 406-410, 7-9 Oct. 2011.
ii. http://en.wikipedia.org/wiki/android
iii. Sandeep Kumar, Mohammed Abdul Qadeer, Archana Gupta, "Location Based Services using Android," IEEE 2009.
iv. Jennie W. Lai, Lorelle Vanno, Michael W. Link, Jennie Pearson, Hala Makowska, Karen Benezra, and Mark Green, "Life 360: usability of mobile devices for time use surveys," AAPOR, May 14-17, 2009.
v. http://en.wikipedia.org/wiki/Find_My_Friends
vi. A. Al-Mazloum, E. Omer, M. F. A. Abdullah, "GPS and SMS-Based Child Tracking System Using Smart Phone," International Journal of Electrical, Computer, Electronics and Communication Engineering, Vol. 7, No. 2, 2013.
vii. Suhas Holla, Mahima M Katti, "Android based mobile application and development and its security," IEEE 2012.
viii. http://blogs.wsj.com/digits/.../facebook-to-notify-users-when-friends-are-nearby/
ix. http://en.wikipedia.org/wiki/Nearby


Self-Vanishing of Data in the Cloud Using Intelligent Storage

Shruthi, Ramya N, Swathi S.M, Sreelatha P.K
2nd sem M.Tech, 2nd sem M.Tech, 2nd sem M.Tech, Assistant Professor
Dept of CSE, SVIT, Bengaluru.

ABSTRACT

The cloud is meant for storing large amounts of data for long periods of time, with security. The user may store some of his confidential data in the cloud. To maintain good consistency, the cloud service provider replicates the data geographically without the permission of the authorized user. As the data is confidential, and as it is replicated and stored on different servers, the data can be misused and malicious activity can be performed by an unauthorized user or by the cloud service provider. In order to overcome these conflicts, SeDaS is proposed. Self-destruction is mainly used for protecting confidential data: since the data is confidential, the user specifies a time interval for each specific piece of data stored in the cloud. After the completion of the time interval, the data and its replicated copies are self-destructed without intimating the authorized user. This paper uses active storage and cryptographic techniques to solve the above challenges.

Keywords: Cloud computing, self-destructing data, active storage framework, data privacy.

I. INTRODUCTION

Cloud computing plays a major role for organizations and individuals storing large amounts of data. The cloud provides not only storage but also services like infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS) and software-as-a-service (SaaS). Because of these services, organizations and individuals are moving towards the cloud.

As the cloud is an internet-based technology, it also provides mobility, so people are increasingly interested in storing and retrieving personal data there. Personal data may contain passwords, passport numbers, account numbers and other important documents. Instead of maintaining individual files, all the files can be stored in a single directory in the cloud. The user specifies a time for each confidential file stored in the cloud; after the user-specified time expires, the files and all their replicas are self-destructed from the cloud.

Not every confidential file stored in the cloud is required by the user for a long period of time. To delete a file after some time, the Vanish methodology was proposed. In Vanish, the secret key is divided and stored in a distributed hash table (DHT), a characteristic component of P2P systems; the nodes in the DHT are refreshed every 8 hours, so the keys present in a node are deleted, and because of this the user may not get enough key shares to decrypt the file.

One disadvantage of the Vanish methodology is that the key cannot survive for a long period of time. To overcome this challenge, SeDaS is proposed, which depends on an Active Storage Framework. The SeDaS system mainly stipulates two modules: a self-destruct method object that is associated with each secret key, and, for each secret key, a survival-time parameter.

SeDaS offers:

1) A key distribution algorithm based on Shamir's algorithm, used as the core algorithm to store the clients' distributed keys in the object storage system.
2) An object-based storage interface used to store and manage the divided keys, based on the Active Storage Framework.
3) Support for securely deleting files and the random encryption keys stored in secondary storage.

II. RELATED WORK
Levy et al. (2009) proposed "Vanish: Increasing Data Privacy with Self-Destructing Data" [2].


Personal data are cached, copied, and archived by third parties, often without our knowledge or control. We wish to ensure that all copies of certain data become unreadable after a user-specified time, without any specific action on the part of the user, and even if an attacker obtains both a cached copy of that data and the user's cryptographic keys and passwords.
With the help of a novel integration of cryptographic techniques, Vanish overcomes the above challenges. The goal is to self-destruct the data automatically after it is no longer useful. The Vanish system leverages the services provided by decentralized, global-scale P2P infrastructures and, in particular, Distributed Hash Tables (DHTs). DHTs are designed to implement a robust index-value database on a collection of P2P nodes. Vanish encrypts a user's data locally with a random encryption key not known to the user, destroys the local copy of the key, and then sprinkles bits (Shamir secret shares) of the key across random indices (thus random nodes) in the DHT.

Vanish architecture

A data object D is taken by Vanish in order to encapsulate it into a VDO. To encapsulate the data D, Vanish picks a random data key, K, and encrypts D with K to obtain a ciphertext C.

Figure 1: Vanish System Architecture

The figure shows how the data key K is split into N pieces K1 … KN; Vanish uses threshold secret sharing. The application or the user can set the threshold, which is the parameter of the secret sharing. To reconstruct the original key, a threshold number of the N shares is required.
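The (k, n) threshold sharing that Vanish builds on can be written compactly. The following Java sketch is illustrative (in particular, a real deployment would use a fixed, shared public prime rather than one generated at class load, and the secret must be smaller than the prime): split() evaluates a random degree-(k-1) polynomial with constant term equal to the secret, and combine() rebuilds the secret from any k shares by Lagrange interpolation at zero.

import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// (k, n) Shamir secret sharing over GF(p). Shares are points (x, f(x)) of
// a random polynomial f with f(0) = secret; any k of them recover f(0).
public class Shamir {
    static final BigInteger P = BigInteger.probablePrime(256, new SecureRandom());

    static Map<BigInteger, BigInteger> split(BigInteger secret, int n, int k) {
        SecureRandom rnd = new SecureRandom();
        BigInteger[] coeff = new BigInteger[k];
        coeff[0] = secret; // constant term is the secret
        for (int i = 1; i < k; i++) coeff[i] = new BigInteger(P.bitLength() - 1, rnd);
        Map<BigInteger, BigInteger> shares = new HashMap<>();
        for (int x = 1; x <= n; x++) {
            BigInteger bx = BigInteger.valueOf(x), y = BigInteger.ZERO;
            // Horner evaluation of the polynomial at x, mod P
            for (int i = k - 1; i >= 0; i--) y = y.multiply(bx).add(coeff[i]).mod(P);
            shares.put(bx, y);
        }
        return shares;
    }

    // Pass exactly k shares; Lagrange interpolation at x = 0 yields f(0).
    static BigInteger combine(Map<BigInteger, BigInteger> kShares) {
        BigInteger secret = BigInteger.ZERO;
        for (var i : kShares.entrySet()) {
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (var j : kShares.entrySet()) {
                if (i.getKey().equals(j.getKey())) continue;
                num = num.multiply(j.getKey().negate()).mod(P);              // (0 - x_j)
                den = den.multiply(i.getKey().subtract(j.getKey())).mod(P);  // (x_i - x_j)
            }
            secret = secret.add(i.getValue().multiply(num).multiply(den.modInverse(P))).mod(P);
        }
        return secret;
    }
}

With fewer than k surviving shares (as happens when DHT nodes churn away), combine() cannot recover the key, which is exactly the self-destruction property Vanish relies on.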
Advantages

Vanish targets post-facto, retroactive attacks; that is, it defends the user against future attacks on old, forgotten, or unreachable copies of data. The attacker's job is very difficult, since he must develop an infrastructure capable of attacking all users at all times. The solution utilizes existing, popular, well-researched technology, in use since 2001. It does not require special security hardware or special operations on the part of the user, and it utilizes the inherent half-life (churn) of nodes in the DHT, so the data is definitely destroyed.

Disadvantages

The mechanism is not universally applicable to all users or data types; it focuses in particular on sensitive data that a user would prefer to see destroyed early rather than have fall into the wrong hands. Vanish applications may compose VDOs with traditional encryption systems like PGP and GPG; in this case, the user will naturally need to manipulate the PGP/GPG keys and passphrases. It does not defend against denial of service attacks that could prevent reading of the data during its lifetime.

Tang et al. (2010) proposed "FADE: A secure overlay cloud storage system with File Assured Deletion" [3].

Keeping data permanently is undesirable, as data may be unexpectedly disclosed in the future due to malicious attacks on the cloud or careless management by cloud operators. The challenge of achieving assured deletion is that we have to trust cloud storage providers to actually delete data, but they may be reluctant to do so. Also, cloud storage providers typically keep multiple backup copies of data for fault-tolerance reasons; it is uncertain, from cloud clients' perspectives, whether cloud providers reliably remove all backup copies upon deletion requests.
FADE is a secure overlay cloud storage system that provides fine-grained access control and assured deletion for outsourced data on the cloud, while working seamlessly atop today's cloud storage services. In FADE, active data files that remain on the cloud are associated with a set of user-defined file access policies (e.g., time expiration, read/write permissions of authorized users), such that data files are accessible only to users who satisfy the file access policies. In addition, FADE generalizes time-based file assured deletion (i.e., data files are assuredly deleted upon time expiration) into a more fine-grained approach called policy-based file assured deletion, in which data files are assuredly deleted when the associated file access policies are revoked and become obsolete.

The FADE system

The FADE system is composed of two main entities:
• FADE clients. A FADE client (or client for short) is an interface that bridges the data source (e.g., file system) and the cloud. It applies encryption (decryption) to the outsourced data files uploaded to (downloaded from) the cloud. It also interacts with the key managers to perform the necessary cryptographic key operations.
• Key managers. A minimal group of key managers together maintains the policy-based keys that FADE uses for assured deletion and access control; each key manager is in turn a standalone entity.
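The policy-based deletion idea can be pictured with a toy Java model, an assumption-laden sketch rather than FADE's actual implementation: each file's data key is wrapped under a policy key held by the key manager, and revoking the policy deletes that key, leaving every wrapped data key (and hence every file) under the policy permanently undecryptable.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.HashMap;
import java.util.Map;

// Toy model of policy-based assured deletion: deleting the policy key is
// the "assured delete", since no ciphertext under it can ever be unwrapped.
public class PolicyKeyManager {
    private final Map<String, SecretKey> policyKeys = new HashMap<>();

    SecretKey keyFor(String policy) throws Exception {
        SecretKey k = policyKeys.get(policy);
        if (k == null) {
            k = KeyGenerator.getInstance("AES").generateKey();
            policyKeys.put(policy, k);
        }
        return k;
    }

    // Wrap a file's data key under the policy key (AES key wrapping).
    byte[] wrapDataKey(String policy, SecretKey dataKey) throws Exception {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.WRAP_MODE, keyFor(policy));
        return c.wrap(dataKey);
    }

    // Revoking a policy discards its key: assured deletion for all files
    // whose data keys were wrapped under it.
    void revoke(String policy) {
        policyKeys.remove(policy);
    }
}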


Advantages

FADE decouples the management of encrypted data and encryption keys, such that encrypted data remains on third-party (untrusted) cloud storage providers, while encryption keys are independently maintained by a key manager service, whose trustworthiness can be enforced using a quorum scheme. FADE generalizes time-based file assured deletion into a more fine-grained approach called policy-based file assured deletion, in which files are associated with more flexible file access policies and are assuredly deleted when the associated file access policies are revoked and become obsolete.

Disadvantages
FADE does not support operations on a batch of files, and the block update operation is not supported. Along with the file, it has to transmit metadata.

III. SYSTEM ARCHITECTURE

The SeDaS architecture is as shown in Figure 2; it consists of three components that are based on the Active Storage Framework:
i. Metadata server.
ii. Application node.
iii. Storage node.
The metadata server consists of several management functions, such as user, session, server, key and file management. The application node is used by the user for services like storage in SeDaS. The storage node mainly consists of two subsystems: a <key, value> store subsystem and a runtime subsystem for Active Storage Objects. The store subsystem depends on the Object Storage Component, which performs operations like managing the objects stored in the storage node; each object is represented uniquely by an objectID, which is used as the key. The Active Storage Object runtime subsystem depends on the Active Storage Agent model.

Figure 2: Architecture of SeDaS System

Active Storage Object: The user object defines the active storage object, which has a time-to-live (TTL) property. The time-to-live value is specified by the user for his private files; it is the survival time of the file and specifies how long the file exists in the cloud. After the TTL value expires, the file is self-destructed.
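The time-to-live behaviour can likewise be pictured with a small sketch: a hypothetical object that wipes its key material once its survival time expires. In SeDaS this is enforced inside the storage node's active-storage runtime rather than by a client-side timer, so the scheduler below is purely illustrative.

import java.util.Arrays;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Toy active storage object with a time-to-live: once the user-specified
// survival time expires, the key material is overwritten and the object
// refuses further reads.
public class TtlObject {
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor();
    private byte[] keyShare;

    public TtlObject(byte[] keyShare, long ttlSeconds) {
        this.keyShare = keyShare.clone();
        TIMER.schedule(this::destroy, ttlSeconds, TimeUnit.SECONDS);
    }

    public synchronized byte[] read() {
        if (keyShare == null) throw new IllegalStateException("object self-destructed");
        return keyShare.clone();
    }

    private synchronized void destroy() {
        if (keyShare != null) {
            Arrays.fill(keyShare, (byte) 0); // overwrite before dropping the reference
            keyShare = null;
        }
    }
}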
IV. CONCLUSION
In cloud computing, data privacy and security are the main concerns. This paper surveys data privacy for sensitive files stored in the cloud, concentrating on the SeDaS methodology used for data privacy. The SeDaS architecture consists of three components: the application node, the metadata server and the storage node. By using cryptographic techniques and active storage, SeDaS provides security for confidential files.

V. REFERENCES
[1] Lingfang Zeng, Shibin Chen, Qingsong Wei, and Dan Feng, "SeDas: A Self-Destructing Data System Based on Active Storage Framework," IEEE Transactions on Magnetics, vol. 49, no. 6, June 2013.
[2] Levy et al., "Vanish: Increasing Data Privacy with Self-Destructing Data," 2009.
[3] Tang et al., "FADE: A secure overlay cloud storage system with File Assured Deletion," 2010.
[4] A. Shamir, "How to share a secret," Commun. ACM, vol. 22, no. 11, pp. 612-613, 1979.
[5] R. Geambasu, T. Kohno, A. Levy, and H. M. Levy, "Vanish: Increasing data privacy with self-destructing data," in Proc. USENIX Security Symp., Montreal, Canada, Aug. 2009, pp. 299-315.


A Survey on Various Comparisons of Anonymous Routing Protocols in MANETs
Dr. Rajashree V. Biradar1, K. Divya Bhavani2
Dept of CSE, Bellary, Karnataka, India
rajashreebiradar@yahoo.com, kdivya9209@gmail.com

Abstract: A MANET is an infrastructure-less wireless network consisting of a collection of mobile devices. Secure routing is a challenging task in MANETs; to overcome this problem, anonymous routing protocols have been developed. This paper focuses on a comparison of different existing anonymous routing protocols based on routing category, design, advantages and disadvantages.

Keywords: Mobile ad hoc network, Routing, Security, Anonymity

I. Introduction

MANET stands for "Mobile Ad Hoc Network"; it is a type of infrastructure-less ad hoc network. A MANET is an autonomous collection of mobile nodes sharing a wireless channel without any centralized control or established communication backbone. The mobile nodes communicate with each other via radio waves: nodes that are in radio range of each other can communicate directly, whereas nodes out of communication range use intermediate nodes to route their packets. Each node has a wireless interface to communicate with the others; hence these networks are also called multi-hop networks. A MANET is a self-configuring network of mobile routers and associated hosts connected by wireless links. The routers (mobile devices, nodes) are free to move randomly and organize themselves arbitrarily; thus the network's wireless topology may change rapidly and unpredictably, so the network has a dynamic topology, and each mobile node has limited resources such as battery, processing power and on-board memory.

Table 1.0 Characteristics of MANETs

S.No. | Characteristic | Description
1 | Distributed network | The control of the network is distributed among the nodes, i.e. there is no background network for the central control of the network operations.
2 | Multi-hop routing | Nodes out of radio range communicate with each other with the help of one or more intermediate nodes.
3 | Dynamic network topology | The nodes in a MANET are mobile, hence the network topology may change rapidly and randomly over time.
4 | Self-configuration | Computation is decentralized: nodes have independent computational, switching (or routing) and communication capabilities.
5 | Bandwidth constraint | Resource constraints: limited bandwidth is available between two intermediate nodes.
6 | Device access flexibility | Access to the channel cannot be restricted, and the communication medium is accessible to any entity.

Table 2.0 Applications of MANETs

Application | Services
Tactical networks | Military communication and operations
Sensor networks | Data tracking of environmental conditions, animal movements, chemical/biological detection
Home and enterprise networks | Home/office wireless networking, personal area networks (PAN), personal networks (PN)
Commercial and civilian environments | E-commerce: electronic payments anytime and anywhere; vehicular services: road or accident guidance, transmission of road and weather conditions, taxi cab networks, inter-vehicle networks
Emergency services | Policing and fire fighting; supporting doctors and nurses in hospitals; disaster recovery

Figure 1.0 Mobile ad hoc network

II. Routing protocols in MANETs

A routing protocol defines a set of rules used by routers to determine the most appropriate paths along which they should forward packets towards their intended destinations. In MANETs, routing protocols are classified into three types.


Figure 1.1 Types of routing protocols

Proactive routing protocols:
Proactive routing protocols are also called table-driven routing protocols. Each node in the network maintains a routing table that holds the routing information from that node to every other node in the network (a minimal sketch of this table-driven update appears after this classification).
Example: Destination-Sequenced Distance-Vector (DSDV), Cluster Gateway Switch Routing Protocol (CGSR) and Wireless Routing Protocol (WRP).

Reactive routing protocols:
In these protocols, routes are identified only when a node demands to send a packet from source to destination. Hence they are also known as source-initiated on-demand routing protocols.
Example: Ad Hoc On-Demand Distance Vector Routing (AODV), Dynamic Source Routing (DSR).

Hybrid routing protocols:
A hybrid routing protocol is a combination of proactive and reactive routing: it uses the route discovery mechanism of reactive protocols and the table maintenance mechanism of proactive protocols. For instance, table-driven routing can be used between networks and on-demand routing inside a network, or vice versa.
Example: Zone Routing Protocol (ZRP).
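As a concrete illustration of the table-driven idea used by proactive protocols such as DSDV, here is a minimal Python sketch of a routing-table update driven by destination sequence numbers. It is our own illustration with made-up node names and numbers, not code from any of the surveyed protocols.

```python
# DSDV-style route update: each entry maps a destination to
# (next_hop, hop_count, seq_no). An advertisement replaces the stored
# route only if it carries a newer sequence number, or the same sequence
# number with fewer hops; stale advertisements are ignored, which is how
# sequence numbers on route updates avoid counting-to-infinity.

routing_table = {}  # dest -> (next_hop, hops, seq_no)

def update_route(dest, next_hop, hops, seq_no):
    current = routing_table.get(dest)
    if current is None:
        routing_table[dest] = (next_hop, hops, seq_no)
        return True
    _, cur_hops, cur_seq = current
    if seq_no > cur_seq or (seq_no == cur_seq and hops < cur_hops):
        routing_table[dest] = (next_hop, hops, seq_no)
        return True
    return False  # stale or worse route: keep the existing entry

update_route("D", next_hop="A", hops=3, seq_no=10)
update_route("D", next_hop="B", hops=4, seq_no=12)  # accepted: newer seq_no
update_route("D", next_hop="C", hops=2, seq_no=10)  # rejected: stale seq_no
print(routing_table["D"])  # ('B', 4, 12)
```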
Significance of anonymous routing protocols

Anonymity is the quality or state of being unknown or unacknowledged, i.e. the state of not being identifiable within a set of subjects (the anonymity set). The concept of anonymity has recently attracted attention in mobile wireless security research. Proactive routing and global-knowledge-based routing schemes are used in infrastructure networks to provide anonymity protection, but they are not applicable in the case of mobile ad hoc networks. In a hostile environment, an adversary may attack the routing information in order to track messages sent from the source to the destination node. Malicious nodes and attackers can be countered by anonymous routing protocols, which protect both the routing process and the locations of nodes in the network.

III. Types of anonymous routing protocols

ANODR: Anonymous On-Demand Routing with Untraceable Routes for Mobile Ad Hoc Networks
ANODR is an anonymous on-demand routing protocol for mobile ad hoc networks deployed in hostile environments, addressing two closely related unlinkability problems: route anonymity and location privacy. The ANODR design is based on "broadcast with trapdoor information". The protocol avoids the counting-to-infinity problem of other distance-vector protocols by using sequence numbers on route updates.

AO2P: Ad Hoc On-Demand Position-Based Private Routing Protocol
AO2P is an important anonymous routing protocol: an ad hoc on-demand position-based private routing algorithm, proposed mainly for communication anonymity.

ZRP: Zone Routing Protocol
The Zone Routing Protocol localizes the nodes of a mobile ad hoc network into sub-networks (zones) and incorporates the merits of both on-demand and proactive routing protocols. Anonymity zones are used to protect both source and destination privacy, and local flooding is used in destination anonymity zones to guarantee data delivery.

ALARM: Anonymous Location-Aided Routing in MANETs
ALARM demonstrates the feasibility of obtaining both strong privacy and strong security properties at the same time. It uses nodes' current locations to construct a secure MANET map; based on the current map, each node can decide which other nodes it wants to communicate with.

ALERT: An Anonymous Location-Based Efficient Routing Protocol in MANETs
ALERT dynamically partitions a network field into zones and randomly chooses nodes in zones as intermediate relay nodes, which together form a non-traceable anonymous route. It randomly chooses a node in the next zone as the relay node and uses the GPSR algorithm to send the data to the relay node (a rough sketch of this zone-and-relay idea is given below).
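The zone-partition-plus-random-relay idea of ALERT can be sketched geometrically. The following toy Python code is only our illustration under simplifying assumptions: hypothetical node coordinates, and a plain coordinate split with direct hops standing in for real GPSR forwarding. Because the relay inside each zone is chosen at random, repeated transmissions between the same endpoints rarely share a route.

```python
import random

def alert_route(nodes, src, dest):
    """nodes: dict name -> (x, y). Returns the list of relay hops chosen."""
    route, current, zone, axis = [src], src, dict(nodes), 0
    while len(zone) > 2:
        coords = [p[axis] for p in zone.values()]
        mid = (min(coords) + max(coords)) / 2.0
        # keep the half of the current zone that contains the destination
        half = {n: p for n, p in zone.items()
                if (p[axis] <= mid) == (nodes[dest][axis] <= mid)}
        if len(half) == len(zone):          # zone no longer shrinks: stop
            break
        relays = [n for n in half if n not in (current, dest)]
        if not relays:
            break
        current = random.choice(relays)     # random relay inside the zone
        route.append(current)
        zone, axis = half, 1 - axis         # alternate the partition axis
    route.append(dest)                      # final zone: deliver to dest
    return route

nodes = {"S": (0, 0), "A": (2, 4), "B": (5, 1), "C": (7, 5), "D": (9, 2)}
print(alert_route(nodes, "S", "D"))  # e.g. ['S', 'B', 'D'] (varies per run)
```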
Table 3.0 Comparison of types of anonymous routing protocols

Anonymous routing protocol | Routing category | Protocol design based on
ANODR | Reactive | Broadcast and trapdoor information
AO2P | Reactive | Receiver contention; for R-AO2P, channel access
Zone Routing Protocol (ZRP) | Hybrid | Local broadcasting
ALARM | Proactive | Group signature and location-based forwarding
ALERT | Hybrid | Zone partition and random choice of relay nodes

Anonymous routing protocol | Pros | Cons
ANODR | Better trade-offs between routing performance and security protection are obtained. | 1. Routing performance varies significantly when different cryptosystems are utilized. 2. It does not focus on security. 3. It is less efficient.
AO2P | Gives identity and location anonymity for source and destination. | Less significant routing performance, hence a lower packet delivery ratio.
Zone Routing Protocol (ZRP) | Increases the scalability of MANETs and maintains the route. | 1. Maintaining a high level of topological information requires more node memory and power consumption. 2. It focuses only on destination anonymity.
ALARM | 1. Reliable data transmission. 2. Provides mutual authentication between mobile nodes. | Does not provide route anonymity.
ALERT | Location and routing anonymity with high performance at low cost. | Not completely bulletproof against all attacks.
Conclusion

This survey paper presented an overview of routing protocols in MANETs and the importance of anonymous routing protocols. It gives an idea of the different existing anonymous routing protocols in MANETs, with their merits and demerits. MANETs still face the challenge of defending against attacks while providing secure and efficient routes.
Stock Market Prediction: A Survey

Guruprasad S.1, Rajshekhar Patil2, Dr. Chandramouli H.3, Veena N.4
1,2 Dept. of CSE, BMSIT, Bengaluru
3,4 Dept. of CSE, EPCET, Bengaluru
guruprasad@bmsit.in, Hcmcool123@gmail.com, Veena_guruprasad@rediffmail.com
Abstract: The stock market is a widely used investment scheme promising high returns, but it carries risks, so an intelligent stock prediction model is necessary. Stock market prediction is a technique to forecast the future value of stock markets based on current as well as historical market data, and it is mainly based on technical analysis and fundamental analysis. In the literature it is observed that several techniques are available for predicting stock market values. This paper surveys the use of Neural Network (NN), Data Mining, Hidden Markov Model (HMM), Neuro-Fuzzy system, Rough Set data model and Support Vector Machine techniques for predicting stock market variation. A methodology is also proposed for forecasting with better accuracy than the traditional methods.

Keywords: Data Mining, Hidden Markov Model, Neural Network, Neuro Fuzzy system, Rough Set.

1. Introduction

The stock market plays a vital role in economic performance; it is commonly used to deduce the economic situation of a particular nation. However, information regarding the stock market is typically incomplete, uncertain and indefinite, making it challenging to predict future economic performance. More specifically, stock market variations are analyzed and predicted in order to gain knowledge that can guide investors on when to buy, when to sell, and when to hold a financial asset. In general, prediction means knowing about the future; so, for investment or trade in the market, prediction of market value is essential. Market movement changes frequently, is difficult to predict and is disorganized in nature [1]. Hence anticipating the stock market using only technical analysis methods, similar to time series analysis, is very difficult. Fundamental analysis typically works best over longer periods of time, whereas technical analysis is more appropriate for short-term trading. Researchers have made several attempts to predict the performance of financial markets, and many artificial intelligence techniques such as Neural Networks and Fuzzy Systems have been proposed [2]. Since it is difficult to interpret their results, they cannot clearly visualize the nature of the interactions between technical indicators and stock market variations. The difficulty with technical analysis is that it requires a complete pattern to make an accurate prediction of the stock movement; preferably, such a forecast should be made before the pattern is completed, to facilitate the prediction process. The vital idea of successful stock market prediction is to achieve the best results while minimizing inaccurate forecasts of prices.

Fig 1: Various prediction techniques

The survey of recent techniques such as NN, Data Mining, Neuro-Fuzzy systems, HMM and Rough Set data models offers useful tools for anticipating the noisy environment of the stock market. This article aims at providing intelligent techniques to anticipate market prices. A stock market index is a representation of the movement in the "average of several individual stocks". Resistant characteristics are not taken into consideration in the forecasting process; to overcome these drawbacks, researchers could develop models that forecast individual stock prices [3].

2. LITERATURE REVIEW

Phichhang Ou and Hengshan Wang applied ten different data mining techniques to anticipate price variation of the Hang Seng index of the Hong Kong stock market [4]. Among those ten methods, LS-SVM and SVM generate the highest predictive performance. SVM is mostly better than LS-SVM for in-sample prediction, while in terms of hit rate and error rate LS-SVM is better than SVM for out-of-sample forecasting (a small illustrative sketch follows).
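As a hedged illustration of this kind of SVM-based direction forecasting (our sketch with synthetic data and arbitrary parameters, not the setup of [4]), the following Python code trains a support vector machine on lagged returns and reports in-sample versus out-of-sample hit rates:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))  # synthetic index
returns = np.diff(prices) / prices[:-1]

LAGS = 5  # predict tomorrow's direction from the last 5 daily returns
X = np.array([returns[i - LAGS:i] for i in range(LAGS, len(returns))])
y = (returns[LAGS:] > 0).astype(int)        # 1 = up day, 0 = down day

split = int(0.8 * len(X))                   # in-sample / out-of-sample split
model = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X[:split], y[:split])
print("in-sample hit rate    :", model.score(X[:split], y[:split]))
print("out-of-sample hit rate:", model.score(X[split:], y[split:]))
```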
Suresh et al. use different data mining techniques that are able to discover hidden patterns and to forecast future trends and behaviors in the financial market [5]. Pattern-matching techniques are found to be descriptive in time-series analysis, and they used an algorithm to accommodate a flexible and dynamic pattern-matching task in time-series analysis. Apart from the segment size, the ratio of instance size to sub-time-series size affects the system performance; in their paper the ratio was set to 1 and then reduced to obtain better results.
Binoy et al. used a hybrid decision tree-neuro-fuzzy methodology for stock market forecasting: an automated stock market trend anticipation system using a decision tree adaptive neuro-fuzzy hybrid system [6]. They used techniques such as technical analysis and decision trees: first, technical analysis is used for feature extraction, and then a decision tree is used for feature selection. The reduced dataset obtained from these two steps is fed as input to train and test the adaptive neuro-fuzzy system for next-day stock prediction. They tested the proposed system on four major international stock market datasets, and their experimental results clearly showed that the proposed hybrid system produces much higher accuracy than a stand-alone decision-tree-based system and an Adaptive Neuro Fuzzy Inference System (ANFIS). The proposed neuro-fuzzy system is shown in Fig 2.

Fig 2: Block diagram of neuro-fuzzy system.

Aditya Gupta and Bhuvan Dhingra in [7] used a Hidden Markov Model (HMM) for predicting market prices. Using historical stock prices, they present a Maximum a Posteriori (MAP) HMM approach for anticipating stock values for the next day. For training the continuous HMM, they consider the intraday high and low values and the fractional change in stock price. Some existing methods based on HMMs and Artificial Neural Networks use the Mean Absolute Percentage Error (MAPE) to minimize the inaccuracy rate. They tested their approach on several markets and compared the performance. Finally, they present an HMM-based MAP estimator for market predictions. The model uses a latency of d days to predict the stock value for the next day; using an already-trained continuous HMM, a MAP decision is made over all possible values of the stock. They assume the underlying hidden states emit the visible observations, viz. the fractional change, fractional high and fractional low.

Md. Rafiul Hassan et al. in [8] deployed a fusion model combining a Hidden Markov Model (HMM), Artificial Neural Networks (NN) and Genetic Algorithms (GA) for financial market prediction. In the proposed fusion model, an NN is employed as a black box to introduce noise into the observation sequences so that they may be better fitted by the HMM; a GA is then applied to find the optimal initial parameters for the HMM, given the transformed observation sequences. Using this fusion model, a number of alternative data items whose behavior resembles the current day can be located in the historical data. The average of the price differences of the identified data items is calculated and added to the current day's price; the value obtained is the forecast value for the next day. The model consists of two phases:

Phase 1: Optimization of the HMM.
Phase 2: Using a weighted-average method to obtain the forecast.

The schematic representation of the fusion model is shown in Fig 3 (a simplified sketch of the Phase 2 step follows the figure).

Fig 3: Block diagram for fusion model.
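To make the Phase 2 step concrete, here is a rough Python sketch of the forecast. As a simplification, nearest neighbours over the (fractional change, fractional high, fractional low) observations stand in for the HMM-likelihood matching of the actual fusion model, and all data below are synthetic:

```python
import numpy as np

def fusion_forecast(history_obs, history_close, today_obs, today_close, k=5):
    """Find the k past days most similar to today (Euclidean distance here,
    standing in for HMM likelihood), average their next-day price
    differences, and add that average to today's closing price."""
    history_obs = np.asarray(history_obs, dtype=float)    # (n, 3) features
    history_close = np.asarray(history_close, dtype=float)
    dist = np.linalg.norm(history_obs - np.asarray(today_obs), axis=1)
    idx = np.argsort(dist[:-1])[:k]          # exclude today (no next day yet)
    next_day_diff = history_close[idx + 1] - history_close[idx]
    return today_close + next_day_diff.mean()

rng = np.random.default_rng(1)
close = 50 + np.cumsum(rng.normal(0, 0.5, 200))           # synthetic closes
obs = np.column_stack([np.diff(close, prepend=close[0]) / close,  # fracChange
                       rng.uniform(0, 0.01, 200),                 # fracHigh
                       rng.uniform(0, 0.01, 200)])                # fracLow
print("forecast:", fusion_forecast(obs, close, obs[-1], close[-1]))
```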
A. E. Hassanien et al. proposed a generic rough set predictive model using a data set consisting of daily variations of a stock traded by the Gulf Bank of Kuwait [9]. The objective was to modify the existing market predictive models based on the rough set approach and to construct a rough set data model that significantly reduces the total number of generated decision rules while keeping the degree of dependency intact. They created an information table consisting of several market indicators such as closing price, high price, low price, trade, value, average, momentum, disparity in 5 days, price oscillator, RSI (relative strength index) and ROC (rate of change). These indicators act as the conditional attributes of the decision table, which predicts price movement.
Fig 3.3: Rough set data model

The major steps involved in the creation of the rough set data model are as follows:

Step 1: Create a detailed information system table with a set of real-valued attributes.

Step 2: Discover the minimal subset of conditional attributes that discerns all the objects present in the decision class.

Step 3: Divide the relevant set of attributes into different variable sets; then compute the dependency degree (a small sketch of this computation follows the steps) and the classification quality. Calculate the discrimination factor for each combination, and add the combination with the highest discrimination factor to the final reduct set.

Step 4: For each generated reduct set and its corresponding objects, construct the decision rules.
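The dependency degree used in Step 3 can be computed directly from equivalence classes. The following self-contained Python sketch (our toy information table, not the Gulf Bank data) computes gamma(C, D) = |POS_C(D)| / |U| for condition attributes C and decision attribute D:

```python
from collections import defaultdict

def dependency_degree(table, cond_attrs, dec_attr):
    """gamma(C, D): the fraction of objects whose C-equivalence class
    falls entirely inside a single decision class (the positive region)."""
    classes = defaultdict(list)
    for row in table:
        key = tuple(row[a] for a in cond_attrs)   # C-equivalence class
        classes[key].append(row[dec_attr])
    positive = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return positive / len(table)

# Toy information table: two market indicators and a price-movement decision.
table = [
    {"momentum": "high", "rsi": "low",  "move": "up"},
    {"momentum": "high", "rsi": "low",  "move": "up"},
    {"momentum": "low",  "rsi": "low",  "move": "down"},
    {"momentum": "low",  "rsi": "high", "move": "down"},
    {"momentum": "high", "rsi": "high", "move": "down"},
    {"momentum": "high", "rsi": "high", "move": "up"},   # inconsistent class
]
print(dependency_degree(table, ["momentum", "rsi"], "move"))  # 4/6 = 0.667
```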
3. Conclusion

This paper surveyed the different methodologies for stock market prediction, such as Data Mining, Neural Networks, Neuro-Fuzzy systems, Hidden Markov Models and Rough Set data models. It also highlighted the fusion model that merges the Hidden Markov Model (HMM), Artificial Neural Networks (NN) and Genetic Algorithms (GA). NNs and HMMs have the ability to extract useful information from the data set, so they play a vital role in stock market prediction. These approaches are used to control and monitor the entire market price behavior as well as its fluctuations. Hidden Markov Models and Rough Set data models are used frequently in the anticipation of market prices.

4. Acknowledgement

We express our sincere thanks to all the authors whose papers in the area of stock market prediction are published in various conference proceedings and journals.

References
i. S. Arun Joe Babulo, B. Janaki, C. Jeeva, "Stock Market Indices Prediction with Various Neural Network Models", International Journal of Computer Science and Mobile Applications, Vol. 2, Issue 3, pp. 32-35, March 2014.
ii. http://www.learnartificialneuralnetworks.com/stockmarketprediction.html
iii. Kuo R. J., Lee L. C. and Lee C. F., "Integration of Artificial NN and Fuzzy Delphi for Stock Market Forecasting", IEEE International Conference on Systems, Man, and Cybernetics, Vol. 2, pp. 1073-1078, Jan 1996.
iv. Phichhang Ou and Hengshan Wang, "Prediction of Stock Market Index Movement by Ten Data Mining Techniques", Canadian Center of Science and Education, Vol. 3, No. 12, December 2009.
v. M. Suresh Babu, N. Geethanjali and B. Sathyanarayana, "Forecasting of Indian Stock Market Index Using Data Mining & Artificial Neural Network", International Journal of Advance Engineering & Application, Vol. 3, Issue 4, pp. 312-316, May 2011.
vi. Binoy B. Nair, N. Mohana Dharini, V. P. Mohandas, "A Stock Market Trend Prediction System Using a Hybrid Decision Tree-Neuro-Fuzzy System", International Conference on Advances in Recent Technologies in Communication and Computing, pp. 381-385, June 2010.
vii. Aditya Gupta and Bhuwan Dhingra, "Stock Market Prediction Using Hidden Markov Models", IEEE, December 2012.
viii. Md. Rafiul Hassan and Baikunth Nath, "Stock Market Forecasting Using Hidden Markov Model: A New Approach", Proceedings of the 5th International Conference on Intelligent Systems Design and Applications, IEEE, 2005.
ix. Hameed Al-Qaheri, A. E. Hassanien, and A. Abraham, "A Generic Scheme for Generating Prediction Rules Using Rough Set", Neural Network World, Vol. 18, No. 3, pp. 181-198, 2008.
x. Yang Kongyu, Min Wu, and Jihui Lin, "The Application of Fuzzy Neural Networks in Stock Price Forecasting Based on Genetic Algorithm Discovering Fuzzy Rules", Eighth International Conference on Natural Computation (ICNC), pp. 470-474, IEEE, 2012.
xi. Victor Devadoss and T. Anthony Alphonnse Ligori, "Stock Predictions Using Artificial Neural Networks", International Journal of Data Mining Techniques and Applications, Vol. 02, pp. 283-291, December 2013.
Identifying and Monitoring the Internet Traffic with Hadoop

Ranganatha T. G., Narayana H. M.
Dept. of CSE, M.S. Engineering College, Bangalore
tgranganatha@gmail.com
ABSTRACT: Handling internet traffic these days is not easy; with its explosive growth, it is hard to collect, store and analyze internet traffic on a single machine. Hadoop has become a popular framework for massive data analytics: it facilitates scalable data processing and storage services on a distributed computing system consisting of commodity hardware. In this paper, we present a Hadoop-based traffic analysis and control system which accepts input from Wireshark (a log file) and outputs a summary containing the full internet traffic details. We also implemented a congestion control algorithm to control online network traffic.

KEYWORDS: Single machine, Hadoop, Commodity Hardware, Wireshark.

1. INTRODUCTION

The Internet has made great progress and brought much convenience to many people's daily lives in recent years, yet the fact that it provides only a kind of best-effort service to applications has never changed since its invention.

Mininet is a network emulator, an instant virtual network on a laptop. It runs a collection of end-hosts, switches, routers and links on a single Linux kernel, using lightweight virtualization to make a single system look like a complete network, with system and code running in the same kernel.

OpenDaylight is a controller used to control the flows running in Mininet; Mininet connects to the controller and sets up an n-node tree topology.

Wireshark is a tool used to capture packets in the network. It is a free, open-source packet analyzer used for network troubleshooting, analysis, software and communication protocol development, and education.

Hadoop was originally designed for batch-oriented processing jobs, such as creating web page indices or analyzing log data. It is widely used by IBM, Yahoo!, Facebook, Twitter, etc. to develop and execute large-scale analytics and applications for huge data sets. Apache Hadoop is a platform that provides pragmatic, cost-effective, scalable infrastructure for building many of the types of applications described earlier. Made up of a distributed file system called the Hadoop Distributed File System (HDFS) and a computation layer that implements a processing paradigm called MapReduce, Hadoop is an open-source, batch data processing system for enormous amounts of data. We live in a flawed world, and Hadoop is designed to survive in it by not only tolerating hardware and software failures, but also treating them as first-class conditions that happen regularly. Hadoop uses a cluster of plain old commodity servers with no specialized hardware or network infrastructure to form a single, logical storage and compute platform, or cluster, that can be shared by multiple individuals or groups. Computation in Hadoop MapReduce is performed in parallel, automatically, with a simple abstraction for developers that obviates complex synchronization and network programming. Unlike many other distributed data processing systems, Hadoop runs the user-provided processing logic on the machine where the data lives rather than dragging the data across the network, a huge win for performance.

The main contribution of this work lies in designing and implementing the control of internet traffic through big-data analytics. First, we create a virtual network using the Mininet tool, which instantly builds a virtual network on a laptop containing switches, routers and hosts; it is controlled using the OpenDaylight controller. To capture the packet flows from the virtual network we use a Wireshark-like tool: we capture the packet log, save it in a text file, and give the log file as input to Hadoop to process its large volume of data. We then visualize a summary report that contains the flow analysis details, such as sender IP, destination IP and the number of bytes sent. Using that file, we control the traffic with the congestion control algorithm.

The main objectives of the work include:
- To design and implement a traffic flow identification system using Hadoop.
- The traffic flow identification system will be very useful for network administrators to monitor faults and also to plan for the future.

2. BACKGROUND WORK

Over the past few years, a lot of tools have been developed and widely used for monitoring internet traffic. Mininet is a tool widely used to set up a virtual network on a laptop, so that we can simulate and identify the flow of packets in the virtual network. Wireshark is a popular traffic analyzer that offers a user-friendly graphical interface. Tcpdump is also a popular tool for capturing and analyzing internet traffic. OpenDaylight is a controller used to control packets in the Mininet virtual network: it acts as a controller deciding from where to where packets need to be sent.

Most MapReduce applications on Hadoop are developed to analyze large text, log or web files. In our case, packet processing and analysis for Hadoop first analyzes the trace file in blocks; Hadoop processes the blocks of the file and gives the result in parallel in a distributed environment.
Methodology

For flow analysis we use MapReduce jobs to reassemble the flows in the network, and we use the k-means algorithm for efficient clustering of the packets within the network (a minimal MapReduce sketch is given below).

For flow control we use a congestion control algorithm to control internet traffic with Hadoop. With this, we are able to control the packet flow easily and effectively.
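A minimal sketch of such a flow-aggregation job for Hadoop Streaming in Python follows. It is our illustration: the column positions assumed for the source IP, destination IP and frame length in the exported text log are hypothetical and must be adjusted to the actual capture format.

```python
#!/usr/bin/env python
# job.py -- run as "job.py map" for the mapper, "job.py reduce" otherwise.
import sys

def mapper():
    """Emit "src,dst<TAB>bytes" for each packet line of the text export.
    Assumed layout: field 3 = source IP, field 5 = destination IP,
    last field = frame length (adjust for your capture format)."""
    for line in sys.stdin:
        fields = line.split()
        if len(fields) < 6:
            continue                     # skip malformed lines
        src, dst, length = fields[2], fields[4], fields[-1]
        if length.isdigit():
            print("%s,%s\t%s" % (src, dst, length))

def reducer():
    """Sum byte counts per flow key; Hadoop delivers input sorted by key."""
    current, total = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = key, 0
        total += int(value)
    if current is not None:
        print("%s\t%d" % (current, total))

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

It would be launched with the usual Hadoop Streaming options (-input, -output, -mapper, -reducer) over the stored log file.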
3. LITERATURE SURVEY

A lot of research has been done on measuring internet traffic performance using Hadoop. J. Shafer, S. Rixner and Alan L. Cox [2] discuss the performance of the distributed Hadoop file system. Hadoop is the most accepted framework for managing huge amounts of data in a distributed environment, and it makes use of a user-level file system in a distributed manner. The HDFS (Hadoop Distributed File System) is portable across both hardware and software platforms. In their paper a detailed performance analysis of HDFS was done, and it revealed several performance issues. The first was an architectural bottleneck in the Hadoop implementation which resulted in inefficient usage of HDFS; the second was a portability limitation that prevented the Java implementation from using features of the native platform. The paper tries to find solutions for the bottleneck and portability problems in HDFS.

T. Benson, A. Akella and D. A. Maltz [3] wrote a paper on "Network traffic characteristics of data centers in the wild". In this paper the researchers conduct an empirical study of the network in a few data centers belonging to different types of organizations, enterprises and universities. In spite of the great interest in designing networks for data centers, little is known about the characteristics of network-level traffic. They gather information about the SNMP topology and its statistics, as well as packet-level traces, and they examine the packet-level and flow-level transmission properties. They observe the influence of network traffic on network utilization, congestion, link utilization and packet drops.

A. W. Moore and K. Papagiannaki [4] give traffic classification on the basis of full packet payload. In their paper a comparison is made between port-based classification and content-based classification. The data used for comparison were full-payload packet traces collected from an internet site. The comparison showed the traffic classified based on the utilization of well-known ports, and the paper also proved that port-based classification can identify 70% of the overall traffic. L. Bernaille and R. Teixeira [5] tell that port-based classification is not a reliable method for analysis, and propose a technique that depends on the observation of the first five packets of a TCP connection to identify the application.

J. Erman, M. Arlitt and A. Mahanti [5], in "Traffic Classification Using Clustering Algorithms", evaluate three different clustering algorithms, namely K-Means, DBSCAN and AutoClass, for the network traffic classification problem. Their analysis is based on each algorithm's ability to produce clusters that have a high predictive power for a single traffic class, and on each algorithm's ability to generate a minimal number of clusters that contain the majority of the connections. The results showed that the AutoClass algorithm produces the best overall accuracy.

4. EXISTING SYSTEM

Today the number of internet users is growing very rapidly; each and every person is utilizing the internet in one way or another, so internet traffic also increases. It is not easy to handle very large internet traffic on a single machine, and storing and processing such large data is not possible on a single system.

The problem is that handling internet traffic using a single server is not scalable to bigger networks, and there may be a single point of failure.

5. PROPOSED SYSTEM

Figure 1: System Architecture

I. Overview

Handling internet traffic consists of three main components, namely Mininet (the network), Wireshark and the Hadoop cluster. Figure 1 shows the key components required for flow analysis. The functions of these three components are described below:

Mininet: Mininet is the tool used to set up the network. Mininet is a network emulator: it creates a realistic virtual network, running real kernel, switch and application code, on a single machine (VM, cloud or native), and it uses lightweight virtualization to make a single system look like a complete network.

Wireshark: Wireshark is the tool used to capture, filter and inspect packet flows in the network. A network analysis tool formerly known as Ethereal, it captures packets in real time and displays them in human-readable format. Wireshark includes filters, color-coding and other features that let you dig deep into network traffic and inspect individual packets.

Hadoop cluster: This consists of two parts:
1. Flow analysis holds the entire detail of the traffic analysis of the big data; it takes care of the MapReduce and clustering algorithms used to obtain the output.
2. Flow control takes care of how to control a large amount of traffic without collision and loss of packets.

II. Setting Up the Network Through Mininet

Mininet is the tool used to build the virtual network within a laptop, so that we can connect any number of switches between the sender and destination hosts.

Figure 2: Mininet setup

In the above figure, host1 and host2 are the source and destination of the virtual network within the laptop, and S1 and S2 are the switches present between the hosts. The corresponding paths are controlled using the OpenDaylight controller, through which the virtual network set up on the computer can be controlled; flows and operations on the network can be modified or changed via OpenDaylight.

III. Capturing Packet Flow Using Wireshark

Wireshark is the tool used to capture and inspect the packet flow within the network. After setting up the network through Mininet, the next step is to capture the packet flows from the source host to the destination host across the switches connecting the end hosts. The Wireshark tool captures the packet flow details in the form of log files; the collected log files are stored in a text file, which is then processed in the next step.

IV. Packet Flows Analyzed in Hadoop

For a traffic trace collected with Wireshark-like tools, there is no obvious flow information available, so the first step before analysis is to recover flows from the individual packets. Our system implements a set of MapReduce applications, including a flow builder, which can quickly and efficiently reassemble flows and conversations even if they are stored across several trace files. The second step, flow clustering, aims to extract groups of flows that share common or similar characteristics and patterns within the same clusters. In this paper we chose the k-means algorithm to identify different groups of flows according to their statistical data (a minimal sketch follows).
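As a rough illustration of this clustering step (our sketch, with made-up per-flow statistics rather than real trace data), k-means can group flows by features such as total bytes, packet count and mean packet size:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-flow statistics: [total_bytes, packets, mean_pkt_size]
flows = np.array([
    [1_200_000, 900, 1333],   # bulk transfer
    [980_000,   700, 1400],
    [4_500,      60,   75],   # interactive / chatty
    [6_100,      80,   76],
    [52_000,    400,  130],   # streaming-like
    [48_000,    350,  137],
])

X = StandardScaler().fit_transform(flows)      # k-means is scale-sensitive
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for stats, group in zip(flows, labels):
    print(group, stats)
```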
V. Congestion Control Algorithm to Control Flows in Hadoop

As seen above, the flows cannot be handled by a single system when a huge amount of traffic arrives, so we planned to control the flows within the network; through this, much congestion in the network can be avoided.

Figure 3: Flows controlled in Hadoop

Figure 3 shows how the flows in the entire network can be controlled. We implemented the congestion control algorithm to control the flows from the source host to the destination hosts. If the packet byte count exceeds some specified range, then the path from host-1 to host-2 is changed; otherwise, if the packet bytes do not exceed the range, the old path from host-1 to host-2 is used for packet transmission. The algorithm checks only the byte counts (a minimal sketch follows):

Case 1: If bytes >= the specified threshold, the path is changed accordingly.
Case 2: If bytes < the specified threshold, the same old path is kept.
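The two cases reduce to a simple threshold rule; the sketch below uses illustrative paths and an arbitrary byte threshold, not the exact controller code:

```python
# Threshold rule used for flow control: switch host-1 -> host-2 traffic
# to an alternate path once the observed byte count crosses the limit.
BYTE_THRESHOLD = 1_000_000               # illustrative limit
OLD_PATH = ["h1", "s1", "s2", "h2"]
ALT_PATH = ["h1", "s1", "s3", "s2", "h2"]

def choose_path(flow_bytes):
    if flow_bytes >= BYTE_THRESHOLD:     # Case 1: congested, change path
        return ALT_PATH
    return OLD_PATH                      # Case 2: keep the old path

print(choose_path(2_500_000))   # ['h1', 's1', 's3', 's2', 'h2']
print(choose_path(40_000))      # ['h1', 's1', 's2', 'h2']
```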
6. SCREENSHOTS

Figure-1: Setting up the network through Mininet.
Figure-2: Controlling the network through the OpenDaylight controller.

Figure-3: Setting up the content to be captured in Wireshark.

Figure-4: Capturing packet flow using Wireshark.

7. CONCLUSION

In this paper, we presented our work on identifying and monitoring internet traffic with Hadoop: setting up the network, obtaining the trace file, giving the trace file as input to Hadoop, and performing the flow analysis. We also implemented the congestion control algorithm to control internet traffic; flow control is done using the congestion control algorithm within the Hadoop cluster.

REFERENCES
i. M. Yu, L. Jose, and R. Miao, "Software defined traffic measurement with OpenSketch," in Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), vol. 13, 2013.
ii. J. Shafer, S. Rixner, and Alan L. Cox, "The Hadoop Distributed Filesystem: Balancing Portability and Performance," in Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2010.
iii. T. Benson, A. Akella, and D. A. Maltz, "Network traffic characteristics of data centers in the wild," in Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, ACM, 2010, pp. 267-280.
iv. A. W. Moore and K. Papagiannaki, "Toward the accurate identification of network applications," in Passive and Active Network Measurement, Springer, 2005, pp. 41-54.
v. J. Erman, M. Arlitt, and A. Mahanti, "Traffic classification using clustering algorithms," in Proceedings of the 2006 SIGCOMM Workshop on Mining Network Data, ACM, 2006, pp. 281-286.
vi. Apache Hadoop website: http://hadoop.apache.org/
vii. Yuanjun Cai, Bin Wu, Xinwei Zhang, Min Luo and Jinzhao Su, "Flow identification and characteristics mining from internet traffic with Hadoop," IEEE, 2014.
Authentication Prevention Online Technique in Auction Fraud Detection

Anitha K., Priyanka M., Radha Shree B.
Dept. of CSE, Raja Rajeswari College of Engineering, Kumbalgudu, Bangalore 560074
anithakrishna14@gmail.com, priyagowda444@gmail.com, radha13shree@gmail.com
Abstract— The e-business sector is rapidly evolving, and so are the needs for web marketplaces that anticipate the needs of customers and the trust placed in well-rated products. While most people benefit from online trading, culprits take advantage of it to conduct fraudulent activities against honest parties and obtain fake profits. Understanding user needs and providing trust, in order to improve the usability and user retention of a website, can therefore be addressed by personalization and by using a fraud detection system.

Keywords— online auction, fraud detection, fraud prevention, online authentication.

1. INTRODUCTION

Since the emergence of the World Wide Web (WWW), electronic commerce, commonly known as e-commerce, has become increasingly popular. Websites such as eBay and similar start-ups allow Internet users to buy and sell products and provide services online, which benefits everyone in terms of usefulness and profitability. The regular online shopping business model allows sellers to sell a product or service at a set price, where buyers can choose to purchase if they find it to be a good deal. Online auction, however, is a different business structure, in which items are sold through price bidding. There is often a starting price and an expiration time specified by the retailer. Once the auction starts, potential buyers bid against one another, and the winner gets the item with the highest winning bid. As with any format supporting economic transactions, online auctions attract criminals who indulge in fraudulent activities. The varying types of auction fraud are: products purchased by the buyer are not delivered by the retailer; the delivered products do not match the descriptions posted by the retailer; and malicious retailers may even post non-existent or fake items with fake descriptions to cheat buyers, requesting payments to be wired directly to them via bank-to-bank transfer. Furthermore, some culprits apply e-mail techniques to steal high-rated retailers' accounts, so that potential buyers can easily be cheated due to the high rating. People affected by fraudulent transactions usually lose their money, which in most cases is not refundable. As a result, the reputation of online auction services is hurt significantly by such crimes.

To provide some security against fraud, internet marketing sites often compensate fraud victims to cover their losses up to a certain amount. To reduce the amount of such compensation and improve their online reputation, internet marketing providers often adopt the following approaches to control and prevent fraud. The identities of registered users are validated through email, SMS, or phone verification. A rating system where buyers provide feedback is commonly used on internet marketing sites, so that fraudulent retailers can be caught immediately after the first wave of buyer complaints. In addition, proactive moderation systems are built to allow human experts to manually investigate suspicious retailers or buyers. Even though e-commerce sites spend a large budget fighting fraud with a moderation system, there are still many outstanding and challenging cases. Criminals and fraudulent sellers frequently change their accounts and IP addresses to avoid being caught. Also, it is usually infeasible for human experts to investigate every buyer and seller to determine whether they are committing fraud, especially when the e-commerce site attracts a lot of traffic. The patterns of fraudulent sellers change constantly to take advantage of temporal trends: fraudulent sellers tend to sell the "hottest" products at the time to attract more potential victims, and whenever they find a loophole in the fraud detection system, they immediately leverage the weakness.

In this paper, we consider the application of an authentication prevention technique for auction fraud detection on a major auction site, where hundreds to thousands of new auctions take place every day. It is therefore necessary to develop an automatic prevention system that directs only suspicious cases for expert inspection and passes the rest as clean cases. The moderation system for this site extracts rule-based features to make decisions. With years of experience, human experts have created many sets of rules to detect suspicious fraudulent culprits, and the resulting features are often binary; for instance, we can create a binary feature from the ratings, where the feature value is 1 if the rating of a seller is lower than a threshold and 0 otherwise. The final prevention decision is based on the fraud score of each case, which can be computed only by maintaining a database of retailers and buyers with all the basic details, so that investigation is done while keeping the experts' workload at a reasonable level (a small sketch of such rule-based features and scores is given below).
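The following Python sketch shows how such rule-based binary features and a linear fraud score might look; the rule names, thresholds and weights are hypothetical, not the site's confidential rule set:

```python
# Each expert rule yields a 0/1 feature; the fraud score is a weighted sum.
RATING_THRESHOLD = 3.0            # hypothetical
BLACKLIST = {"seller_666"}        # hypothetical

RULES = {
    "low_rating":  lambda c: 1 if c["seller_rating"] < RATING_THRESHOLD else 0,
    "blacklisted": lambda c: 1 if c["seller_id"] in BLACKLIST else 0,
    "new_account": lambda c: 1 if c["account_age_days"] < 7 else 0,
}
WEIGHTS = {"low_rating": 1.5, "blacklisted": 4.0, "new_account": 0.8}

def fraud_score(case):
    features = {name: rule(case) for name, rule in RULES.items()}
    return sum(WEIGHTS[name] * value for name, value in features.items())

case = {"seller_id": "seller_42", "seller_rating": 2.1, "account_age_days": 3}
print(fraud_score(case))  # 2.3: queued for expert review if above threshold
```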
Since fraudulent sellers change their patterns very fast, the model is required to evolve dynamically as well; however, for offline models it is often non-trivial to address such needs. Based on the reviews, if a case is determined to be fraudulent, all the cases from that retailer, along with his pending products, are removed immediately. Smart fraudulent sellers therefore tend to change their patterns immediately to avoid being caught. Also, since the training data comes from human labelling, the high cost makes it almost impossible to obtain a very large sample. Therefore, for such systems (i.e. a relatively small sample size with many features showing temporal patterns), online feature selection is often required to provide good performance. Human experts are also willing to see the results of the feature selection to monitor the effectiveness of the current set of features.

Our contribution: In this paper, we study the problem of building online models for the authentication prevention
technique system, which essentially evolves dynamically over time. We propose an online probit regression model framework in which, for the binary response, we apply a well-known technique from the statistical literature called stochastic search variable selection (SSVS). The paper is organised as follows: in Section 2 we summarize several specific features of the application and describe the authentication prevention framework with fitting details; in Section 3 we review the related work in the literature; in Section 4 we show experimental results comparing all the models proposed in this paper and several simple baselines; finally, we conclude and discuss future work in Section 5.

2. OUR METHODOLOGY

Our application detects online frauds for a major website where hundreds of thousands of new auction cases are posted every day. Every new case is sent to the authentication prevention system in advance to assess the risk of being fraud. The system is characterized by:

- Rule-based features: Human experts with years of experience have created many rules to detect whether a user is fraudulent. An example of such a rule is "blacklist", i.e. whether the user has previously been detected as, or complained about as, a culprit. Each rule can be regarded as a binary feature that indicates fraud likelihood.

- Selective labelling: If the fraud score is above a certain threshold, the case enters a queue for further investigation by human experts. Once it is evaluated, the final result is labelled as a Boolean feature, i.e. genuine or fraudulent. Cases with higher scores have higher priority in the queue. Cases whose fraud score is below the threshold are determined to be clean by the system without any human judgment.

- Fraud churn: Once a case is labelled as fraud by human experts, it is very likely that the retailer is not trustable and may also be selling other fraudulent products; hence all the items submitted by the same retailer are labelled as fraud too. The fraudulent retailer, along with his/her cases, is removed from the website immediately once detected.

- User feedback: Buyers can file complaints to claim losses if they have recently been cheated by fraudulent retailers. Similarly, retailers may also complain if their products have been mistakenly judged.

3. RELATED WORK

Online auction fraud has always been recognized as an important issue, and there are resources on websites that teach people how to prevent online auction fraud (e.g. [35, 14]). [10] categorizes auction fraud into several types and proposes strategies to fight them. Reputation systems are used extensively by websites to detect auction fraud, although many of them use naive approaches. [31] summarized several key properties of a good reputation system, along with the challenges modern reputation systems face in extracting user feedback. Other representative work connecting reputation systems with online auction fraud detection includes [32, 17, 28], where the last work [28] introduced a Markov random field model with a belief propagation algorithm for user reputation. Beyond reputation systems, machine-learned models have been applied to moderation systems for monitoring and detecting fraud: [7] proposed to train decision trees to select good sets of features and make predictions; [23] developed another approach that uses social network analysis and decision trees; and [38] proposed an offline regression modeling framework for the auction fraud detection moderation system which incorporates domain knowledge such as coefficient bounds and multiple instance learning.

In this paper we treat the fraud detection problem as a binary classification problem. The most frequently used models for binary classification include logistic regression [26], probit regression [3], support vector machines (SVM) [12] and decision trees [29]. Feature selection for regression models is often done by introducing a penalty on the coefficients; typical penalties include the ridge (L2) penalty [34] and the Lasso (L1) penalty [33]. Compared to ridge regression, Lasso shrinks unnecessary coefficients to zero instead of to small values, which provides both sparsity and good performance. Stochastic search variable selection (SSVS) [16] uses a "spike and slab" prior [19] so that the posterior of each coefficient has some probability of being 0. Another approach is to regard variable selection as model selection, i.e. put priors on models (e.g. a Bernoulli prior on each coefficient being 0) and compute the marginal posterior probability of the model given the data; one then either uses Markov Chain Monte Carlo to sample models from the model space and applies Bayesian model averaging [36], or performs a stochastic search in the model space to find the posterior mode [18]. Among non-linear models, tree models usually handle non-linearity and variable selection simultaneously; representative work includes decision trees [29], random forests [5], gradient boosting [15] and Bayesian additive regression trees (BART) [8].
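To illustrate the spike-and-slab mechanism behind SSVS in the simplest possible setting, a single normal mean with known variance (our toy example, not the full probit Gibbs sampler used by the models below), the posterior probability that the coefficient is nonzero has a closed form. The prior weight omega and slab variance tau2 here are arbitrary:

```python
import math

def normal_pdf(x, var):
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posterior_inclusion(y_bar, n, sigma2, tau2, omega):
    """Spike-and-slab prior: beta = 0 with prob (1 - omega), and
    beta ~ N(0, tau2) with prob omega. With y_bar ~ N(beta, sigma2/n),
    P(beta != 0 | data) compares the two marginal likelihoods of y_bar."""
    slab = omega * normal_pdf(y_bar, tau2 + sigma2 / n)    # beta != 0
    spike = (1.0 - omega) * normal_pdf(y_bar, sigma2 / n)  # beta == 0
    return slab / (slab + spike)

# A weak signal is mostly "spiked" to zero; a strong one keeps its slab.
print(posterior_inclusion(y_bar=0.05, n=50, sigma2=1.0, tau2=1.0, omega=0.5))
print(posterior_inclusion(y_bar=0.80, n=50, sigma2=1.0, tau2=1.0, omega=0.5))
```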
Online modeling considers the scenario in which the input arrives one batch at a time; upon receiving a batch of input, the model has to be updated according to the data and then make predictions and serve the next batch. The concept of online modeling has been applied to many areas, such as stock price forecasting (e.g. [22]), web content optimization [1] and web spam detection (e.g. [9]). Compared to offline models, online learning usually requires much lighter computation and memory loads, so it can be widely used in real-time systems with a continuous stream of inputs. For online feature selection, representative applied work includes [11] for the problem of object tracking in computer vision and [21] for content-based image retrieval. Both approaches are simple, whereas in this paper the embedding of SSVS into the online model is more principled. Multiple instance learning, which handles training data consisting of bags of instances labelled positive or negative, was originally proposed by [13]; many papers have been published in the application area of image classification, such as [25, 24]. The logistic regression framework for multiple instance learning is presented in [30], as is an SVM framework.

4. EXPERIMENTS

We conduct our experiments on a real online auction fraud detection data set collected from a major auction website. We consider the following online models:

- ON-PROB is the online probit regression model.
- ON-SSVSB is the online probit regression model with the "spike and slab" prior on the coefficients, where the coefficients for the binary rule features are bounded to be positive.

- ON-SSVSBMIL is the online probit regression model with multiple instance learning and the "spike and slab" prior on the coefficients; the coefficients for the binary rule features are also bounded to be positive.

For all the above online models we ran 10000 iterations plus 1000 burn-ins to guarantee the convergence of the Gibbs sampling.

We compare the online models with a set of offline models similar to [38]. For observation i, we denote the binary response as yi and the feature set as xi; for multiple instance learning purposes we assume retailer i has Ki cases and denote the feature set of each case l as xil. The offline models are:

- Expert has the human-tuned coefficients set by domain experts based on their knowledge and recent fraud-fighting experience.
- OF-LR is the offline logistic regression model that minimizes the loss function.
- OF-MIL is the offline logistic regression with multiple instance learning that optimizes the loss function.
- OF-BMIL is the bounded offline logistic regression with multiple instance learning that optimizes the loss function in (39) such that β ≥ T, where T is a pre-defined vector of lower bounds.

All the above offline models can be fitted via the standard L-BFGS algorithm [39]. This section is organized as follows: we first introduce the data and describe the general settings of the models; we then specify the evaluation metric for this experiment, the rate of missed customer complaints; finally, we show the performance of all the models.

4.1 The Data and Model Setting

Our application is a real authentication prevention and fraud detection system designed for a major online auction website that attracts hundreds of thousands of new auction postings every day. The data consist of around 2M expert-labelled auction cases, with ~20K of them labelled as fraud, during September and October 2010. Besides the labelled data, we also have unlabelled cases which passed the "pre-screening" of the moderation system (using the Expert model). The number of unlabelled cases in the data is about 6M-10M. For each observation there is a set of features indicating how "suspicious" it is; to prevent future fraudulent retailers from gaming our system, the exact number and themes of these features are highly confidential and cannot be released. Besides the expert-labelled binary response, the data also contain a list of customer complaints filed every day by the victims of the culprits; our data for October 2010 contain a sample of around 500 customer complaints.

Figure 1: Fraction of bags versus the number of cases per bag ("bag size") submitted by fraudulent and clean sellers respectively. A bag contains all the cases submitted by a seller in the same day.

Human experts often label cases in a "bagged" way: at any point in time they select the currently most "suspicious" retailer in the system and examine all of his/her cases posted that day. If any of these cases is fraudulent, all of that retailer's cases are labelled as fraud. Therefore we put all the cases submitted by a retailer on the same day into one bag. In Figure 1 we show the distribution of bag size posted by fraudulent and clean retailers respectively. From the figure, we see that some proportion of retailers sell more than one item in a day, and that the number of bags (retailers) decays exponentially as the bag size increases; this indicates that applying multiple instance learning can be useful for this data.

Figure 2: Boxplots of the rates of missed customer complaints on a daily basis for all the offline and online models, obtained at a 100% workload rate.

It is also interesting to see that fraudulent retailers tend to post more auction cases than clean retailers, since it
potentially leads to higher fake profit.

We conduct our experiments for the offline models OF-LR, OF-MIL and OF-BMIL as follows: we train the models using the data from September and then test them on the data from October. For the online models ON-PROB, ON-SSVSB and ON-SSVSBMIL, we create batches of various sizes (e.g. one day, half a day, etc.) starting from the beginning of September to the end of October, update the models for every batch, and test them on the next batch. To compare them fairly with the offline models, only the batches in October are used for evaluation.

4.2 Evaluation Metric

In this paper we adopt an evaluation metric introduced in [38] that directly reflects how many culprits a model can catch: the rate of missed complaints, i.e. the fraction of customer complaints that the model fails to capture as fraud. Note that in our application the labelled data were not created through random sampling but via a pre-screening moderation system using the expert-tuned coefficients (the data were created when only the Expert model was deployed). This in reality introduces biases into the evaluation for metrics which use only the labelled observations and ignore the unlabelled ones. The rate-of-missed-complaints metric, however, covers both labelled and unlabelled data, since buyers do not know which cases are labelled; hence it is unbiased for evaluating model performance.

Recall that our data were generated as follows: for each case, the moderation system uses a human-tuned linear scoring function to determine whether to send it for expert labelling. If so, experts review it and make a genuine-or-fraud judgment; otherwise it is determined to be clean and is not reviewed by anyone. Although for the unlabelled cases we do not immediately know from the system whether they are genuine or not, the real fraud cases still show up in the complaints filed by victims of the culprits. Therefore, if we want to prove that one machine-learned model is better than another, we have to make sure that, with the same or even less expert labelling workload, the former model catches more culprits (i.e. generates fewer customer complaints) than the latter.

Figure 3: The rates of missed customer complaints for workload rates equal to 25%, 50%, 75% and 100% for all the offline models and online models with daily batches.

For any test batch, we regard the number of labelled cases as the expected 100% workload N, and for any model we can re-rank all the cases (labelled and unlabelled) in the batch and select the first M cases with the highest scores. We call M/N the "workload rate" in the following. For a given workload rate such as 100%, we count the number of reported complaint cases m among the M selected cases. Denoting the total number of reported complaints in the test batch as C, we define the rate of missed complaints as 1 - m/C at workload rate M/N. Note that since the evaluation re-ranks all the cases, including both labelled and unlabelled data, different models with the same workload rate (even 100%) usually have different rates of missed complaints. We argue that model A is better than model B if, given the same workload rate, the rate of missed customer complaints for A is lower than that for B (a short sketch of this computation follows).

Figure 3: The rates of missed customer complaints for workload rates equal to 25%, 50%, 75% and 100% for all the offline models and online models with daily batches.

For any test group, we regard the number of labeled cases as the expected 100% workload N, and for any structure we could re-rank all the cases (labeled and unlabeled) in the group and select the first M cases with the highest scores. We call M/N the "workload rate" in the following. For a specific workload rate such as 100%, we can count the number of reported culprit feedback m among the M cases. Denoting the total number of reported culprit feedback in the test group as C, we define the rate of missed feedback as 1 − m/C given the workload rate M/N. Note that since in structure analysing we re-rank all the cases, including both labeled and unlabelled data, different structures with the same workload rate (even 100%) usually have different rates of missed customer feedback. We argue that structure A is better than structure B if, given the same workload rate, the rate of missed customer feedback for A is less than that for B.
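To make the metric concrete, the computation can be sketched as follows (our illustration, not the authors' code; for brevity the sketch treats the whole group as the 100% workload, whereas the paper defines N as the number of labeled cases):

import numpy as np

def rate_of_missed_feedback(scores, has_feedback, workload_rate):
    # scores: model scores for all cases in the test group (labeled and unlabeled)
    # has_feedback: boolean array, True where a customer complaint was filed
    # workload_rate: M/N, the fraction of cases sent for skilled review
    M = int(workload_rate * len(scores))     # M cases with the highest scores
    top = np.argsort(-scores)[:M]            # re-rank all cases, keep the top M
    m = has_feedback[top].sum()              # complaints caught among the top M
    C = has_feedback.sum()                   # total complaints in the test group
    return 1.0 - m / C                       # rate of missed feedback, 1 - m/C

Structure A is then preferred to structure B whenever this rate is lower for A at the same workload rate.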
Figure 4: For ON-SSVSBMIL with daily batches, delta = 0.7 and omega = 0.9, the posterior probability of Bjt = 0 (j is the feature index) over time for a selected set of features.

Figure 5: For ON-SSVSBMIL with daily batches, delta = 0.7 and omega = 0.9, the posterior mean of Bjt (j is the feature index) over time for a selected set of features.

Finally, the most interesting set of features are the ones that have a large variation of pjt day over day. One important reason to use authentication prevention feature selection in our
application is to capture the dynamics of those unstable features. In Figure 5 we show the posterior mean of a randomly selected set of features. It is obvious that while some feature coefficients are always close to 0 (unimportant features), there are also many features with huge variation of the coefficient values.
5. CONCLUSION AND FUTURE WORK

In this paper we build online structures for the authentication prevention auction fraud and investigating system designed for a major traditional online auction website, through empirical experiments on real-world online auction fraud investigating prevention data. We show that our proposed authentication probit structure framework, which combines online authentication feature selection, bounding coefficients from experienced knowledge, and multiple instance learning, can significantly improve over baselines and the human-tuned model. Note that this online authentication prevention structuring framework can easily be extended to many other applications, such as web spam detection, content optimization and so forth.

Regarding future work, one direction is to include the adjustment of the selection bias in the online authentication prevention structure training process. It has been proven to be very effective for offline structures in [38]. The main idea there is to assume all the unlabeled samples have response equal to 0 with a very small weight. Since the unlabeled samples are obtained from an effective prevention system, it is reasonable to assume that with high probabilities they are genuine. Another future work is to deploy the online authentication prevention structures described in this paper to the real production system, and also to other applications.
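A rough sketch of that bias adjustment (our illustration; logistic regression stands in here for the probit structure):

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_with_unlabeled(X_lab, y_lab, X_unlab, eps=0.01):
    # Treat every unlabeled case as genuine (response 0) with a very small weight.
    X = np.vstack([X_lab, X_unlab])
    y = np.concatenate([y_lab, np.zeros(len(X_unlab))])
    w = np.concatenate([np.ones(len(y_lab)), np.full(len(X_unlab), eps)])
    return LogisticRegression().fit(X, y, sample_weight=w)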
REFERENCES

i. D. Agarwal, B. Chen, and P. Elango. Spatio-temporal models for estimating click-through rate. In Proceedings of the 18th International Conference on World Wide Web, pages 21–30. ACM, 2009.
ii. S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. Advances in Neural Information Processing Systems, pages 577–584, 2003.
iii. C. Bliss. The calculation of the dosage-mortality curve. Annals of Applied Biology, 22(1):134–167, 1935.
iv. A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis, volume 53. Cambridge University Press, New York, 1998.
v. L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
vi. R. Brent. Algorithms for Minimization without Derivatives. Dover Publications, 2002.
vii. D. Chau and C. Faloutsos. Fraud detection in electronic auction. In European Web Mining Forum (EWMF 2005), page 87.
viii. H. Chipman, E. George, and R. McCulloch. BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1):266–298, 2010.
ix. W. Chu, M. Zinkevich, L. Li, A. Thomas, and B. Tseng. Unbiased online active learning in data streams. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 195–203. ACM, 2011.
x. C. Chua and J. Wareham. Fighting internet auction fraud: An assessment and proposal. Computer, 37(10):31–37, 2004.
xi. R. Collins, Y. Liu, and M. Leordeanu. Online selection of discriminative tracking features. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1631–1643, 2005.
xii. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2006.
xiii. T. Dietterich, R. Lathrop, and T. Lozano-Pérez. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71, 1997.
xiv. Federal Trade Commission. Internet auctions: A guide for buyers and sellers. http://www.ftc.gov/bcp/conline/pubs/online/auctions.htm, 2004.
xv. J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.
xvi. E. George and R. McCulloch. Stochastic search variable selection. Markov Chain Monte Carlo in Practice, 68:203–214, 1995.
xvii. D. Gregg and J. Scott. The role of reputation systems in reducing on-line auction fraud. International Journal of Electronic Commerce, 10(3):95–120, 2006.
xviii. C. Hans, A. Dobra, and M. West. Shotgun stochastic search for "large p" regression. Journal of the American Statistical Association, 102(478):507–516, 2007.
xix. H. Ishwaran and J. Rao. Spike and slab variable selection: frequentist and Bayesian strategies. The Annals of Statistics, 33(2):730–773, 2005.
xx. T. Jaakkola and M. Jordan. A variational approach to Bayesian logistic regression models and their extensions. In Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics. Citeseer, 1997.
xxi. W. Jiang, G. Er, Q. Dai, and J. Gu. Similarity-based online feature selection in content-based image retrieval. IEEE Transactions on Image Processing, 15(3):702–712, 2006.
xxii. K. Kim. Financial time series forecasting using support vector machines. Neurocomputing, 55(1-2):307–319, 2003.
xxiii. Y. Ku, Y. Chen, and C. Chiu. A proposed data mining approach for internet auction fraud detection. Intelligence and Security Informatics, pages 238–243, 2007.
xxiv. O. Maron and T. Lozano-Pérez. A framework for multiple-instance learning. In Advances in Neural Information Processing Systems, pages 570–576, 1998.
xxv. O. Maron and A. Ratan. Multiple-instance learning for natural scene classification. In The Fifteenth International Conference on Machine Learning, 1998.
xxvi. P. McCullagh and J. Nelder. Generalized Linear Models. Chapman & Hall/CRC, 1989.
xxvii. A. B. Owen. Infinitely imbalanced logistic regression. Journal of Machine Learning Research, 8:761–773, 2007.
xxviii. S. Pandit, D. Chau, S. Wang, and C. Faloutsos. Netprobe: a fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web, pages 201–210. ACM, 2007.
xxix. J. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
xxx. V. Raykar, B. Krishnapuram, J. Bi, M. Dundar, and R. Rao. Bayesian multiple instance learning: automatic feature selection and inductive transfer. In Proceedings of the 25th International Conference on Machine Learning, pages 808–815. ACM, 2008.
xxxi. P. Resnick, K. Kuwabara, R. Zeckhauser, and E. Friedman. Reputation systems. Communications of the ACM, 43(12):45–48, 2000.
xxxii. P. Resnick, R. Zeckhauser, J. Swanson, and K. Lockwood. The value of reputation on eBay: A controlled experiment. Experimental Economics, 9(2):79–101, 2006.
xxxiii. R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267–288, 1996.
xxxiv. A. Tikhonov. On the stability of inverse problems. In Dokl. Akad. Nauk SSSR, volume 39, pages 195–198, 1943.
xxxv. USA Today. How to avoid online auction fraud. http://www.usatoday.com/tech/columnist/2002/05/07/yaukey.htm, 2002.
xxxvi. L. Wasserman. Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44(1):92–107, 2000.
xxxvii. M. West and J. Harrison. Bayesian Forecasting and Dynamic Models. Springer Verlag, 1997.
xxxviii. L. Zhang, J. Yang, W. Chu, and B. Tseng. A machine-learned proactive moderation system for auction fraud detection. In 20th ACM Conference on Information and Knowledge Management (CIKM). ACM, 2011.

Differential Query Services Using Efficient Information Retrieval Query Scheme in Cost-Efficient Cloud Environment

Shwetha R., Kishor Kumar K., Dr. Antony P. J.
Department of Computer Science and Engineering, KVGCE, Sullia, DK
Shwethas183@gmail.com

Abstract—Cloud computing is a new technology where users tend to get services through the internet based on their demand. In this new technology users should receive the services without much delay, and the costs should also be reduced. The most important aspects in this environment are maintaining privacy and efficiency. The original keyword-based file retrieval scheme proposed by Ostrovsky allows users to retrieve the requested files without leaking any information, but it causes heavy querying overhead. In this paper we present an efficient information retrieval query (EIRQ) scheme to reduce the querying overhead in the cloud. In EIRQ, a user gives the query along with a rank, and then retrieves files based on the rank. The rank shows the percentage of files that will be returned to the user.

Keywords - Cloud computing, cost efficiency, differential query services, privacy.

I INTRODUCTION

Cloud computing is the delivery of computing resources over the Internet. It has been widely adopted in broad applications and is becoming more pervasive. The main reasons behind cloud computing's sharp growth are increases in computing power and data storage, the exponential growth of social network data, and modern data centres, some of which can suffer from high maintenance costs and low utilization. There are also challenges in the development of reliable and cost-effective cloud-based systems. Cloud computing presents a new way to supplement the current consumption and delivery model for IT services based on the Internet, by providing dynamically scalable and often virtualized resources as a service over the Internet. Cloud computing is the use of computing resources (hardware and software) which are available in a remote location and accessible over the network. Users are able to buy these computing resources as a utility, on demand. The name comes from the common use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams. Cloud computing entrusts remote services with a user's data, software and computation.

Cloud computing as an emerging technology is expected to reshape information technology processes in the near future [1]. Due to the overwhelming merits of cloud computing, e.g., cost-effectiveness, flexibility and scalability, more and more organizations choose to outsource their data for sharing in the cloud. As a typical cloud application, an organization subscribes to the cloud services and authorizes its staff to share files in the cloud. Each file is described by a set of keywords, and the staff, as authorized users, can retrieve files of interest by querying the cloud with certain keywords. In such an environment, how to protect user privacy from the cloud, which is a third party outside the security boundary of the organization, becomes a key problem.

User privacy can be classified into search privacy and access privacy [2]. Search privacy means that the cloud knows nothing about what the user is searching for, and access privacy means that the cloud knows nothing about which files are returned to the user. When the files are stored in clear form, a proper solution to protect user privacy is for the user to request all of the files from the cloud; this way, the cloud cannot know which files the user is really interested in. While this does provide the necessary privacy, the communication cost is high. Private searching was proposed by Ostrovsky et al. [3][4], which allows a user to retrieve files of interest from an untrusted server without leaking any information. However, the Ostrovsky scheme has a high computational cost, as it requires the cloud to process the query on every file in a collection; otherwise the cloud could infer that certain files, left unprocessed, are of no interest to the user. It quickly becomes a performance bottleneck when the cloud needs to process thousands of queries over a collection of hundreds of thousands of files.

To make private searching applicable in a cloud environment, previous work [7] designed a cooperative private searching protocol (COPS), where a proxy server, called the aggregation and distribution layer (ADL), is introduced between the users and the cloud. The ADL deployed inside an organization has two main functionalities: aggregating user queries and distributing search results. Under the ADL, the computation cost incurred on the cloud can be largely reduced, since the cloud only needs to execute a combined query once, no matter how many users are executing queries. Furthermore, the communication cost incurred on the cloud will also be reduced, since files shared by the users need to be returned only once. Motivated by this goal, a new scheme is designed, named Efficient Information Retrieval for Ranked Query (EIRQ), in which each user can provide his own percentage along with the query to determine the percentage of matched files to be returned. The basic idea of EIRQ is to construct a privacy-preserving mask matrix that allows the cloud to filter out a certain percentage of matched files before returning them to the ADL. This is not a trivial work, since the cloud needs to correctly filter out files according to the rank of queries without knowing anything about user privacy.
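As a toy illustration of the ADL's aggregation role (our own sketch; the real protocol operates on encrypted queries, never on plaintext keyword sets):

from collections import defaultdict

def aggregate_queries(user_queries):
    # user_queries: dict mapping user id -> set of query keywords.
    # Returns the combined query sent to the cloud once, plus a
    # keyword -> users map used later to distribute results.
    combined = set()
    interested = defaultdict(set)
    for user, keywords in user_queries.items():
        combined |= keywords
        for kw in keywords:
            interested[kw].add(user)
    return combined, interested

combined, interested = aggregate_queries({"alice": {"A", "B"}, "bob": {"A", "C"}})
# combined == {"A", "B", "C"}: one query for the cloud, however many users query.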

II. RELATED WORK

A number of methods have been proposed in recent years to provide user privacy and to support private searching schemes.

Private searching on streaming data (2005). In this paper, R. Ostrovsky and W. Skeith [1] considered the problem of private searching on streaming data. They showed that in this model we can efficiently implement searching for documents under a secret criterion (such as the presence or absence of a hidden combination of hidden keywords) under various cryptographic assumptions. The results can be viewed in a variety of ways: as a generalization of the notion of Private Information Retrieval; as positive results on privacy-preserving data mining; and as a delegation of hidden program computation.

Searchable symmetric encryption (2006) allows a party to outsource the storage of his data to another party in a private manner, while maintaining the ability to selectively search over it. This problem has been the focus of active research, and several security definitions and constructions have been proposed. In this paper the authors begin by reviewing existing notions of security and propose new and stronger security definitions. R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky [2] presented two constructions shown secure under the new definitions. Interestingly, in addition to satisfying stronger security guarantees, the new constructions are more efficient than all previous constructions. Further, prior work on SSE only considered the setting where only the owner of the data is capable of submitting search queries. They also consider the natural extension where an arbitrary group of parties other than the owner can submit search queries. SSE is formally defined in this multi-user setting, and an efficient construction is presented.

Private searching on stream data, Journal of Cryptology (2007). Private searching on streaming data is a process to dispatch to a public server a program which searches streaming sources of data without revealing the searching criteria and then sends back a buffer containing the findings. From an Abelian group homomorphic encryption, the searching criteria can be constructed by only simple combinations of keywords, for example disjunction of keywords. The recent breakthrough in fully homomorphic encryption has allowed us to construct arbitrary searching criteria theoretically. Here the authors consider a new private query, which searches for documents from streaming data on the basis of keyword frequency, such that the frequency of a keyword is required to be higher or lower than a given threshold. This form of query can help us in finding more relevant documents. Based on state-of-the-art fully homomorphic encryption techniques, they give disjunctive, conjunctive, and complement constructions for private threshold queries based on keyword frequency. Combining the basic constructions, they further presented a generic construction for arbitrary private threshold queries based on keyword frequency. The protocols are semantically secure as long as the underlying fully homomorphic encryption scheme is semantically secure.

Hierarchical attribute-based encryption and scalable user revocation for sharing data in cloud servers (2011). Access control is one of the most important security mechanisms in cloud computing. Attribute-based encryption provides an approach that allows data owners to integrate data access policies within the encrypted data. However, little work has been done to explore flexible authorization in specifying the data user's privileges and enforcing the data owner's policy in cloud-based environments. In this paper, G. Wang, Q. Liu, J. Wu, and M. Guo [4] propose a hierarchical attribute-based access control scheme by extending ciphertext-policy attribute-based encryption (CP-ABE) with a hierarchical structure of multiple authorities and exploiting attribute-based signature (ABS). The proposed scheme not only achieves scalability due to its hierarchical structure, but also inherits fine-grained access control with authentication in supporting write privilege on outsourced data in cloud computing. In addition, they showed how to decouple the task of policy management from security enforcement by using the extensible access control markup language (XACML) framework. Extensive analysis shows that this scheme is both efficient and scalable in dealing with access control for outsourced data in cloud computing.

Efficient information retrieval for ranked queries in cost-effective cloud environments (2012). Cloud computing as an emerging technology trend is expected to reshape the advances in information technology. This paper addresses two fundamental issues in a cloud environment: privacy and efficiency. It first reviews a private keyword-based file retrieval scheme proposed by Ostrovsky et al. [5]. Then, based on an aggregation and distribution layer (ADL), it presents a scheme, termed efficient information retrieval for ranked query (EIRQ), to further reduce querying costs incurred in the cloud. Queries are classified into multiple ranks, where a higher ranked query can retrieve a higher percentage of matched files. Extensive evaluations have been conducted on an analytical model to examine the effectiveness of this scheme.

New constructions and practical applications for private stream searching (2013). A system for private stream searching allows a client to retrieve documents matching some search criteria from a remote server while the server evaluating the request remains provably oblivious to the search criteria. In this extended abstract, the authors give a high-level outline of a new scheme for this problem and an experimental analysis of its scalability. The new scheme is highly efficient in practice. They demonstrate the practical applicability of the scheme by considering its performance in the demanding scenario of providing a privacy-preserving version of the Google News Alerts service.

III. PROPOSED SYSTEM

Here the new proposed scheme, called the Efficient Information Retrieval system, is introduced. This new system uses a flexible ranking mechanism which allows users to provide a rank and personally decide how many matched files the cloud will return. The basic idea is to construct a matrix that allows the cloud to filter out a certain percentage of matched files. The new scheme reduces the querying overhead and also the computational costs. The EIRQ system protects user privacy while allowing each user to retrieve matched files on demand. This is not an easy work, because the cloud needs to correctly filter out files according to the rank of queries without knowing anything about user privacy. This has two extensions: the first extension requires the least amount of modifications from the Ostrovsky scheme,
and the second extension provides privacy by leaking the least amount of information to the cloud.

Figure 1: Architecture of the proposed system

The proposed system has the following four modules.

Differential Query Services
The novel concept proposed here is a differential query service, added to COPS, where the users are allowed to personally decide how many matched files will be returned. This is motivated by the fact that in certain cases there are a lot of files matching a user's query, but the user is interested in only a certain percentage of the matched files. To illustrate, let us assume that Alice wants to retrieve 2% of the files that contain keywords "A, B", and Bob wants to retrieve 20% of the files that contain keywords "A, C". The cloud holds 1,000 files, where {F1, . . . , F500} and {F501, . . . , F1000} are described by keywords "A, B" and "A, C", respectively. In the Ostrovsky scheme, the cloud will have to return 2,000 files. In the COPS scheme, the cloud will have to return 1,000 files. In our scheme, the cloud only needs to return 20 files. Therefore, by allowing the users to retrieve matched files on demand, the bandwidth consumed in the cloud can be largely reduced.

Efficient Information Retrieval for Ranked Query
The new scheme proposed is termed Efficient Information Retrieval for Ranked Query (EIRQ), in which each user can choose the rank of his query to determine the percentage of matched files to be returned. The basic idea of EIRQ is to construct a privacy-preserving mask matrix that allows the cloud to filter out a certain percentage of matched files before returning them to the ADL. This is not a trivial work, since the cloud needs to correctly filter out files according to the rank of queries without knowing anything about user privacy. Focusing on different design goals, we provide two extensions: the first extension emphasizes simplicity by requiring the least amount of modifications from the Ostrovsky scheme, and the second extension emphasizes privacy by leaking the least amount of information to the cloud.

Aggregation and Distribution Layer
An ADL is deployed in an organization that authorizes its staff to share data in the cloud. The staff members, as the authorized users, send their queries to the ADL, which will aggregate user queries and send a combined query to the cloud. Then, the cloud processes the combined query on the file collection and returns a buffer that contains all of the matched files to the ADL, which will distribute the search results to each user. To aggregate sufficient queries, the organization may require the ADL to wait for a period of time before running our schemes, which may incur a certain querying delay.

Ranked Queries
To further reduce the communication cost, a differential query service is provided by allowing each user to retrieve matched files on demand. Specifically, a user selects a particular rank for his query to determine the percentage of matched files to be returned. This feature is useful when there are a lot of files that match a user's query, but the user only needs a small subset of them.

IV EXPERIMENTAL RESULTS & EVALUATION

In this section, we compare the three EIRQ schemes from the following aspects: file survival rate and computation/communication cost incurred on the cloud. Then, based on the simulation results, we deploy our program in Amazon Elastic Compute Cloud (EC2) to test the transfer-in and transfer-out time incurred on the cloud when executing private searches. In the previous scheme there was a large querying overhead and more bandwidth consumed. First, we test the transfer-in time in the real cloud, which is mainly incurred by receiving queries from the ADL. Then, we test the transfer-out time at the cloud, which is mainly incurred by returning files to the ADL. The results are shown below.

Therefore, EIRQ-Efficient is the most suitable to be deployed to a cloud environment. For example, the time to transfer a query from the ADL to the cloud consumes less than 100 seconds, and the time to transfer the buffer from the cloud to the ADL consumes less than 500 seconds, with fewer than 4 common keywords.
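As a plaintext illustration of the differential query service (our sketch; in EIRQ itself the equivalent filtering happens under the privacy-preserving mask matrix, so the cloud never sees keywords or matches in the clear):

import random

def filter_by_rank(matched_files, percentage, seed=0):
    # Keep only the requested percentage of the files matching a query.
    k = max(1, round(len(matched_files) * percentage / 100))
    return random.Random(seed).sample(sorted(matched_files), k)

alice_hits = [f"F{i}" for i in range(1, 501)]      # files described by "A, B"
bob_hits = [f"F{i}" for i in range(501, 1001)]     # files described by "A, C"
returned = filter_by_rank(alice_hits, 2) + filter_by_rank(bob_hits, 20)
# Only a rank-dependent fraction of each user's matches leaves the cloud.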


V CONCLUSION

In this paper, we proposed three EIRQ schemes based on an ADL to provide differential query services while protecting user privacy. By using our schemes, a user can retrieve different percentages of matched files by specifying queries of different ranks. By further reducing the communication cost incurred on the cloud, the EIRQ schemes make the private searching technique more applicable to a cost-efficient cloud environment. However, in the EIRQ schemes, we simply determine the rank of each file by the highest rank of the queries it matches. For our future work, we will try to design a flexible ranking mechanism for the EIRQ schemes.

REFERENCES
i. R. Ostrovsky and W. Skeith, "Private searching on streaming data," in Proc. of CRYPTO, 2005.
ii. R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, "Searchable symmetric encryption: improved definitions and efficient constructions," in Proc. of ACM CCS, 2006.
iii. R. Ostrovsky and W. Skeith, "Private searching on streaming data," Journal of Cryptology, 2007.
iv. G. Wang, Q. Liu, J. Wu, and M. Guo, "Hierarchical attribute-based encryption and scalable user revocation for sharing data in cloud servers," Computers & Security, 2011.
v. Q. Liu, C. C. Tan, J. Wu, and G. Wang, "Efficient information retrieval for ranked queries in cost-effective cloud environments," in Proc. of IEEE INFOCOM, 2012.
vi. J. Bethencourt, D. Song, and B. Waters, "New constructions and practical applications for private stream searching," in Proc. of IEEE S&P, 2013.


An Efficient and Effective Information Hiding Scheme using Symmetric Key Cryptography

Sushma U., Dr. D.R. Shashi Kumar
Dept. of CSE, Cambridge Institute of Technology, Bangalore - 36, India
ssushma999@gmail.com, shashikumar.cse@citech.edu.in

Abstract: Modern days are fully dependent on internet communication. Through the net we can transfer data from anywhere in the world to any place we want. The Internet was born out of academic efforts to share information; it never actually strove for a high-security process. It plays a key role in bringing people online; it is very easy and effective, but dangerous too in terms of data hacking and eavesdropping by hackers. It is necessary that data transmitted over the Internet be secured and kept private. Image encryption can be used to protect data during transmission and is a suitable process to protect image data. There are many cryptographic algorithms being used to secure multimedia data like images, but they have definite advantages and disadvantages, so there is a requirement to develop a strong image cryptography algorithm for securing images during transfer. In this paper, a new symmetric key cryptography algorithm is proposed for colour 3D images. In this algorithm a different type of key generation method is introduced; this technique is unique and is used for the first time for key generation. Here two public keys are used in the cryptography process. Key generation is very important in symmetric as well as asymmetric key cryptographic algorithms. We propose a work for developing a new symmetric key cryptography algorithm for image data to provide secure transmission during network communication. All the concepts related to this area are explained. This algorithm is totally lossless, such that image pixels are preserved during encryption and decryption.

Key Words — Encryption, Decryption, Image.

I. Introduction

With the rapid development of computer network communication, it is easy to acquire digital images through the network and further use, reproduce and distribute them. Digital technology brings us much convenience; however, it also gives attackers or unlawful users a chance to hack our own information. Generally, there are two major approaches used to protect images. One is information hiding, which includes anonymity, watermarking, steganography and covert channels. The other is encryption, which includes traditional cryptographic algorithms [1]. The field of encryption and security is becoming essential in the twenty-first century, when an enormous amount of data is transmitted over local networks as well as the Internet. Digital information and images account for more than two-thirds of the data transmitted over the Internet [2]. Thus, a very reliable and robust encryption algorithm is required when data is transmitted over unsecured channels. Data encryption and data embedding are the most important means that can be used to transmit the desired information with a high level of security and dependability while it passes through unsecured channels [3]. The difficulty with multimedia data such as digital images, documents, audio and video rests on two factors: the data size is often extremely large, and it needs to be processed in real time [4]. Encryption algorithms like DES, IDEA and RSA are not suitable for practical image encryption, especially under the conditions of online communication [5]. Militaries, governments and private businesses have long used encryption to facilitate secret communication. Conventional cryptographic systems have mainly been developed for securing alphanumeric data rather than image and audio signals, and the encryption of audio signals with conventional encryption requires an extensive amount of processing power and time. A fast, dependable and powerful algorithm is needed to encrypt both image and audio with less processing time and a high level of precision [6]. Technological advances in digital content processing, production and distribution have given rise to a range of recent signal processing applications in which security threats cannot be handled in the established style. These applications range from multimedia content creation and dissemination to advanced biometric signal processing for validation, biometric identification and access administration. In several of these cases, security and privacy threats may block the adoption of new image and video processing services. Therefore, the use of cryptographic techniques in image and video processing applications is becoming more common. The cryptographic methods used in these applications operate on two-dimensional (2D) matrices [7].

II. Literature Review

According to Dr. Mohammad V. Malakooti and Mojtaba Raeisi Nejad [6], they have proposed an algorithm for images based on a novel lossless digital encryption system for multimedia, using orthogonal transforms for the encryption of image data. The technique is based on block-cipher symmetric key cryptography. The authors place an emphasis on the development of a novel lossless digital encryption system for multimedia. They used the symmetric properties of the orthogonal transforms to calculate the inverses of the orthogonal matrices during the execution of the decryption procedure. They used several traditional image encryption approaches, for example the Discrete Cosine Transform (DCT), the Hadamard Transform (HT) and additionally the Malakooti Transform (MT) [6]. According to Sahar Mazloom and Amir-Masud Eftekhari-Moghadam, image encryption is
somehow different from text data encryption because of some inherent features of images. Images have a huge data capacity and high correlation between pixels, which are often hard to handle like a text message. The unusually interesting properties of chaotic maps, such as sensitivity to initial conditions and random-like behaviour, have attracted the attention of cryptographers to developing new encryption algorithms. The authors have proposed a new symmetric key cryptography for images. The algorithm widely uses a confusion-diffusion scheme which employs the idea of a chaotic two-dimensional (2D) standard map and a one-dimensional (1D) logistic map, and it uses a 128-bit secret key. The algorithm is specifically designed for colour images, which are 3D arrays of RGB data streams [11].

The authors of [9] placed an emphasis on valuable multimedia content, such as digital images, which is still vulnerable to unauthorized access while in storage and during transmission over a network. A stream of digital images also requires high network bandwidth for transmission. In this paper they present a novel scheme which combines the Discrete Wavelet Transform (DWT) for image compression and the block cipher Data Encryption Standard (DES) for image encryption. The simulation results show that the proposed technique upgrades the security of image transmission over the Internet as well as improving the transmission rate [9].

In this paper the authors note that images are totally different from text in several respects, such as their high redundancy and correlation, their local structure, and their amplitude-frequency characteristics. Therefore, the methods of ordinary cryptography cannot be applied directly to images. They improve the properties of confusion and diffusion in terms of specific exponential chaotic maps, and design a key scheme for resistance to grey-code attack, known-plaintext attack and differential attack [1].

N. K. Pareek, Vinod Patidar and K. K. Sud [8] have described chaos-based cryptographic algorithms, which are new and economical for building a secure image cryptography technique. They proposed a new approach for image cryptography based on chaotic maps, in such a way as to meet the requirements of secure image transfer. In the proposed image cryptography algorithm, an external secret key of 80-bit size and two chaotic logistic maps are used. The initial conditions for each logistic map are derived using the external secret key by giving entirely different weightage to each of its bits. The proposed cryptography algorithm uses eight different types of operations to encrypt the pixels of an image, and which one of them is used for a particular pixel is decided by the outcome of the logistic map. To make the cipher image more robust against any attack, the value of the key is changed after encrypting each block of sixteen pixels of the image [8].

III. The Proposed Work On Color Image Cryptography

There are a lot of cryptographic algorithms for text data, but multimedia cryptography algorithms are few compared with text data cryptography algorithms. RSA, AES, DES, MD5 and so on are not used for image cryptography because multimedia data like images, video and audio are typically large in size. At present many cryptography algorithms are used for image data, but some have weaknesses, so there is a need for a new symmetric key cryptography algorithm for image data. A highly secure transmission performance is required; when applied to high bit-rate multimedia data in the network, it requires high processing resources and fast computation. A new cryptographic algorithm has been developed to address multimedia content security in the network channel. The algorithm represents the proposed scheme for colour image encryption in the network using a 3D matrix. In this paper, we describe the method of implementation of the proposed algorithm. There is a need for security for multimedia data like images, video and audio during network communication. The algorithm is based on a new technique of key generation for images: a new key generation procedure is developed for encryption and decryption, and this key generation approach is unique. Two different public keys are used in the cryptography process. The key generation function code is kept separate from the encryption and decryption program in order to conceal the key values from the user and hacker. The following steps describe the proposed algorithm.

IV. Key Generation Algorithm

Step 1: Calculate two different values, a and b, with the help of p and q.
Step 2: Two public keys are generated with the help of the four values p, q, a and b.
Step 3: Create a matrix of size 2x2 or 4x4, i.e. the first public key.
Step 4: Create another matrix of size 16x16 or 8x8, i.e. the second public key.
Step 5: These two public keys are used for encryption and decryption.

V. Encryption Algorithm

The procedure for the image encryption algorithm is as follows:
Step 1: Enter two large numbers p and q.
Step 2: Import the required plain image and convert it into a matrix with the help of a MATLAB command.
Step 3: Divide the whole matrix into n 2x2 or 4x4 matrices.
Step 4: Multiply each of the n 2x2 or 4x4 matrices by the first public key.
Step 5: Combine all n matrices to form a single matrix of size equal to the plain image.
Step 6: Divide the whole matrix into n 8x8 or 16x16 matrices.
Step 7: Add the second public key to each of the n 8x8 or 16x16 matrices.
Step 8: Transpose the matrix.
Step 9: Combine all n matrices to form a single matrix of size equal to the plain image.
Step 10: The resultant matrix is the required cipher image.
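Our reading of these steps can be sketched in NumPy as follows (illustrative only: the paper does not state how a and b are derived from p and q, so that derivation and the key matrices below are stand-ins, and exact losslessness would additionally need integer or modular arithmetic rather than floating-point inversion):

import numpy as np

def generate_keys(p, q, block=4, big=8):
    a, b = (p + q) % 251, (p * q) % 251            # assumed derivation of a and b
    rng = np.random.default_rng(a * 1000 + b)      # keys reproducible from a, b
    k1 = rng.integers(1, 5, (block, block)).astype(float)
    k1 += 4 * block * np.eye(block)                # heavy diagonal keeps K1 invertible
    k2 = rng.integers(0, 50, (big, big)).astype(float)
    return k1, k2

def blockwise(img, n, fn):
    # Apply fn to every n x n block; assumes a square image whose size is a multiple of n.
    out = img.astype(float).copy()
    for i in range(0, out.shape[0], n):
        for j in range(0, out.shape[1], n):
            out[i:i + n, j:j + n] = fn(out[i:i + n, j:j + n])
    return out

def encrypt(img, k1, k2):
    x = blockwise(img, k1.shape[0], lambda b: k1 @ b)       # Steps 3-5: multiply by key 1
    return blockwise(x, k2.shape[0], lambda b: (b + k2).T)  # Steps 6-9: add key 2, transpose

def decrypt(cipher, k1, k2):
    k1_inv = np.linalg.inv(k1)
    x = blockwise(cipher, k2.shape[0], lambda b: b.T - k2)  # undo transpose, remove key 2
    return blockwise(x, k1.shape[0], lambda b: k1_inv @ b)  # "divide" by key 1

k1, k2 = generate_keys(61, 53)
img = np.arange(64 * 64).reshape(64, 64) % 256
assert np.allclose(decrypt(encrypt(img, k1, k2), k1, k2), img)  # lossless up to round-off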


VI. Decryption Algorithm

The procedure for the image decryption algorithm is as follows:
Step 1: Import the cipher image and convert it into a matrix with the help of a MATLAB command.
Step 2: Divide the whole matrix into n 8x8 or 16x16 matrices.
Step 3: Transpose the matrix.
Step 4: Subtract the second public key from each of the n 8x8 or 16x16 matrices.
Step 5: Combine all n matrices to form a single matrix of size equal to the cipher image.
Step 6: Divide the whole matrix into n 2x2 or 4x4 matrices.
Step 7: Divide each of the n 2x2 or 4x4 matrices by the first public key.
Step 8: Combine all n matrices to form a single matrix of size equal to the cipher image.
Step 9: The resultant matrix is the plain image.

The performance of the encrypted and decrypted images has been tested and the results analysed through the MATLAB simulator; the proposed algorithm is simple and lossless, yet it is difficult for intruders to obtain the key.
In symmetric key cryptography, the public key is used by both the sender and the receiver. We explained all the concepts related to this research area. This algorithm reduces the loss of image pixels during encryption and decryption. We proposed a lossless digital encryption model based on a new technique of key generation for images. The values of the secret keys were obtained from the new key generator technique; the results have been tested, and the performance of the encryption and decryption procedure investigated, using a parallel algorithm or some kind of smart algorithm that makes the method more secure and simple but the key value harder to find.

VII. Experimental Result Analysis

To validate our proposal, the test setup is implemented in the Java language with the use of the MATLAB simulator running on the Windows platform. We used RGB colour-scale images. Based on the proposed algorithm, we developed software for the encryption and decryption of images, and the outcome is shown in Figures 1 and 3. Two matrices are used for key generation: the first is our first public key of size 2x2 or 4x4, and the second is our second public key of size 8x8 or 16x16. The original Rose image is tested in MATLAB through the proposed algorithm for encryption and decryption. All of these modules were developed within the new cryptography algorithm. The key generation function code is separated from the encryption and decryption program in order to conceal the key values from the user and hacker. Test results for the Rose image are shown in Figures 1 to 8.

Figure 1 shows the original image, which is used for the cryptography purpose. The proposed algorithm for encryption and decryption is applied to this image (Figure 1, original image), and the algorithm works properly with this methodology. With this image, a check was made to examine whether data loss is prevented or not. Figure 2 represents the cipher image of the original image; this output was obtained after applying the newly generated keys to the original image and performing the encryption process. Figure 3 represents the decrypted or plain image after the decryption procedure; the decrypted image is obtained after applying the newly generated keys and the decryption algorithm in the reverse order of the encryption algorithm.

VIII. Comparison Between Encrypted and Decrypted Histograms

Figure 4 shows a histogram of the original image. The histogram is used for finding the intensity of pixels at different points. The imhist(I) command is used to display a histogram of the image I over a grey-scale colour bar. The number of bins in the histogram is determined by the image type: if I is a grayscale image, imhist uses a default value of 256 bins. We analysed the intensity of pixels at different points through the histograms shown in Figure 4 and Figure 5. An image histogram represents how pixels in an
image are distributed by graphing the number of pixels at each colour intensity level. The histogram of the original/shuffled image is similar to the histogram of the plain image. This implies that the corresponding statistical data in the shuffled image is equal to that of the plain image.

IX. Examination:

N = rows = columns;
n = N × N;
Mat = Σ [M1(i,j) − M2(i,j)]²;
Mean Square Error (MSE) = Mat / n;

We have compared our mean square error estimation with Table 2 [6]. Table 1 is our result, in which the MSE is zero at different parts. A square matrix is used for the comparison. Figure 6 shows a snapshot of the MATLAB output of the mean square error between the original output and the plain image.
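A direct transcription of this measure (our sketch) for an original image M1 and a decrypted image M2, each N x N:

import numpy as np

def mean_square_error(m1, m2):
    d = m1.astype(float) - m2.astype(float)
    return np.sum(d ** 2) / d.size          # n = N x N pixels

# A lossless scheme should give mean_square_error(original, decrypted) == 0.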
X. CONCLUSION

In this paper, we develop a new technique for the symmetric key encryption of image data. The technique is unique, simple for encryption and decryption, and hard to break. The mean square error of our result is also zero at every part of the matrix. We implement an approach which gives lossless image transmission. The research work gives a new algorithm for the image encryption and decryption process. The algorithm delivers an acceptable quality of service and the appropriate security level needed for image data. We successfully simulated the concept of key-based encryption and decryption of images through MATLAB and analysed the results. Several shortcomings were discovered and overcome effectively. The results were compared with other encryption techniques, which demonstrated the effectiveness of the new encryption technique. This paper provides a valuable approach in the developing area of cryptography.

References

i. Linhua Zhang, Xiaofeng Liao, Xuebing Wang, "An image encryption approach based on chaotic maps," Department of Computer Science and Engineering, Chongqing University, Chongqing, China, 2005, 759–765, www.elsevier.com/locate/chaos.
ii. S. Changxiang, Z. Huangguo, F. Dengguo, C. Zhenfu and H. Jiwu, "Survey of Information Security," Science in China Press, 2007.
iii. H. Cheng, X. Li, "Partial Encryption of Compressed Images and Videos," IEEE Transactions on Signal Processing, Vol. 48, No. 8, August 2000.
iv. R. Rudraraju, B.A., "Digital Data Security Using Encryption," Master's Paper, University of Texas at San Antonio, 2010.
v. Khan U.M., Kh M., "Classical and chaotic encryption techniques for the security of satellite images," in IEEE International Symposium on Biometrics and Security Technologies (ISBAST 2008), vol. 5, no. 23-24, 2008, p. 1-6.
vi. Dr. Mohammad V. Malakooti, Mojtaba Raeisi Nejad Dobuneh, "A Lossless Digital Encryption System for Multimedia Using Orthogonal Transforms," Islamic Azad University, Dubai, UAE, 978-1-4673-0734-5/12/2012 IEEE.
vii. W. Puech, Z. Erkin, M. Barni, S. Rane, and R. L. Lagendijk, "Emerging Cryptographic Challenges in Image and Video Processing," Mitsubishi Electric Research Laboratories, TR2012-067, September 2012.
viii. N. K. Pareek, Vinod Patidar, K. K. Sud, "Cryptography using multiple one-dimensional chaotic maps," Communications in Nonlinear Science and Numerical Simulation 10 (2005) 715–723.
ix. Philip P. Dang and Paul M. Chau, "Image Encryption for Secure Internet Multimedia Applications," IEEE, 2000; Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA, 92093, 0098 3063/2000.
x. Symmetric-key cryptography: definition and explanation. http://en.wikipedia.org/wiki/Symmetric-key_algorithm, accessed 2009.
xi. Sahar Mazloom, Amir-Masud Eftekhari-Moghadam, "Color Image Cryptosystem using Chaotic Maps," 2011 IEEE; Faculty of Electrical, Computer and IT Engineering, 978-1-4244-9915-1.
xii. William Stallings, Cryptography and Network Security, Third Edition, Feb. 2007.
xiii. Andrew S. Tanenbaum, Computer Networks, Third Edition, Chapter 2.


ATM Deployment using Rank Based Genetic Algorithm with convolution


Kulkarni Manjusha M.
Department of CSE, V.T.U., SJBIT, Bangalore, India
manjusha.shastry@gmail.com

Abstract— ATM is the most significant service provided by the banking sector to its customers. Optimally deploying ATMs is very complex. The effective deployment of ATMs depends upon various factors such as where the customers live, where they work, the roads they travel and the cost to reach an ATM. Genetic algorithms are used to solve such optimization problems using techniques such as inheritance, mutation, selection, and crossover. A bank's decision to deploy ATMs should be logical as well as profitable, providing greater convenience and covering a larger market area with the maximum number of customers. The objective is to minimize the total number of machines while covering all the customer demands in the selected area. This study proposes a Rank Based Genetic Algorithm using convolution for solving the Banking ATMs Location Problem (RGAC). RGAC is an ATM deployment strategy based on the rank concept which gives a highly feasible solution in reasonable time. RGAC gives a cost-efficient allocation of ATMs and computes the percentage coverage (PC, covering the whole area), which is high as it covers customer demands by maximizing the service utility of each machine.

Key–Words: Genetic Algorithms (GAs), Rank, Automated Teller Machines (ATM), Percentage Coverage (PC), Client Utility Matrix (CU), Service Utility Matrix (SU), Rank Based Genetic Algorithm using convolution (RGAC).

I. INTRODUCTION

An ATM is an electronic banking outlet which allows customers to complete basic transactions without the aid of a branch representative or teller. ATMs are scattered throughout cities, allowing customers easier access to their accounts. ATMs have become a competitive weapon for commercial banks, whose objective is to capture the maximum potential customers. The fact is that commercial banks compete not only on the dimension of price but also on the dimension of location. ATM deployment strategies offer the opportunity to provide greater convenience and to attract more customers by covering the money market with sufficient ATM facilities. These strategies also provide greater cost efficiency by finding the optimal number of ATMs to be installed, and greater profitability by increasing the ATM user base in order to earn much more in transaction and service fees as well as through the inflow of deposits from depositors.

The location depends on the transactions demanded by the customers of proprietary and non-proprietary ATMs. A bank's decision to deploy ATMs should be a rational economic decision using the best ATM deployment strategy that takes into account the high computational complexities. This paper proposes a new Rank Based Genetic Algorithm for solving the banking ATM location problem using Convolution (RGAC), which outperforms the Heuristic Algorithm based on Convolution (HAC), an algorithm that becomes inefficient as the market size increases. RGAC increases the search efficiency by improving the evolutionary process while meeting a feasible solution. Moreover, RGAC has proved to be a robust approach for solving the ATM deployment problem and is able to provide high-quality solutions in a reasonable time.

The rest of the paper is structured as follows: Section II indicates some important related work on RGAC. A detailed description of the problem encoding and specific operators is given in Section III. Section IV explains RGAC, and Section VI includes concluding remarks.

II. RELATED WORK

The study [1] investigated the placement of the minimum number of ATM machines covering all customer demands in a given geographical area. They developed a heuristic algorithm to efficiently solve the optimal location problem of ATM placement by formulating a mathematical model. In this study, the problem of finding the minimum number of ATMs and their locations given arbitrary demand patterns is considered. They considered one particular area and divided it into parts accordingly, such as areas with no demand, high demand, normal demand and so on, with a colour code. Using these variables the placement problem is modeled.

The study [2] presents the WiFiDP (WiFi Network Design Problem) grouping problem. A hybrid grouping genetic algorithm (HGGA) is proposed as a convenient method to solve such problems while providing a smaller and lower-cost connection service. The popularity of WiFi-enabled devices represents an enormous market potential for wireless networking services and mobile applications based on this technology. The deployment of citywide WiFi access networks is a location problem as well as a large assignment problem. In this case, the grouping genetic algorithm is combined with a repairing procedure, to ensure feasible solutions, and with a local search to improve its performance for the case of the WiFiDP. The grouping genetic algorithm (GGA) is a class of evolutionary algorithm specially modified to tackle grouping problems, i.e. scenarios in which a number of items must be assigned to a set of predefined groups. Thus, in the GGA, the encoding, crossover, and mutation operators of traditional GAs are modified, obtaining a compact algorithm with very good performance in grouping problems.

The study [3] investigates the ATM placement problem, ATMs being a significant service provided by banks to customers. Many banks utilize ATMs to make cash withdrawal available to their customers at all times. They have formulated the ATM
allocation problem as an optimization problem with a mathematical model by considering various factors for deployment, such as the price of buying or leasing an ATM, the cost of deployment, the cost of operation, and the characteristics of the ATMs to be deployed.

The study [4] determines cash management for an ATM network. They have proposed an approach based on an artificial neural network to forecast the daily cash demand for every ATM in the network, together with an optimization procedure to estimate the optimal cash load for every ATM. ANNs are used for tasks such as pattern recognition, classification and time series forecasting. The key to all forecasting applications is to capture and process the historical data so that they provide insight into the future. The primary objective of cash forecasting is to ensure that cash is used efficiently throughout the branch network. Cash forecasting is integral to the effective operation of an ATM/branch network optimization procedure.

The study [5] revolves around the grid computing environment, where the sharing of computational resources, software and data takes place at a large scale. However, the management of resources and computational tasks is a critical and complex undertaking, as these resources and tasks are geographically distributed and heterogeneous and dynamic in nature. They proposed a new Rank Based Genetic Scheduler for Grid Computing Systems (RGSGCS) for scheduling independent tasks in the grid environment by minimizing makespan and flowtime. The performance of the system can be evaluated by distributing a number of resources across the networked computers.

III. PROBLEM FORMULATION

The ATM placement problem is modeled and defined mathematically. The variables used in modeling the intended problem are shown in Table I. The optimization problem is organized in such a way as to realize market clearance. In other words, the difference between CU and SU should be minimized. This difference can be expressed mathematically in equation 1:

E = SU − CU ≥ ∅ (1)

where E is the difference matrix of size (I×J) after assigning the total number of machines, and SU is the service utility matrix.

A. Client Utility Matrix CU:
Any exercise to optimize the deployment of ATMs must start with a thorough understanding of the customer base and identification of the priority of the customers. The generation of CU is made by following these procedures:
• The first step is to categorize people based on where they live, where they work and where they may need money in order to make payments for shopping and other transactions. The science of grouping the people in a geographical area according to socioeconomic criteria is known as geodemography. Commercial geodemography has been used to target ATM services to the bank's clients based on their lifestyle and location. In this study the geo-demographic approach is used by conducting a survey of potential customers as well as geographical, demographic, economic, and traffic data. Other considerations include safety, cost, convenience, and visibility. Quite often, malls, supermarkets, gas stations, and other high-traffic shopping areas are prime locations for ATM sites. In this paper, the priorities for different potential ATM locations will be implemented based on an a priori analysis of all the applicable factors. Using the SPSS program, the related data are entered. The variables used are the customer's age, income, education and marital status, which constitute the demographic and economic factors. The traffic data are represented by a variable such as the location importance, which encompasses factors like the number of residents, the number of public institutes, the number of private institutes and the state of the street, whether it is a main street, by-street or crossroad. The procedure now is to compute the mean value of these variables for each customer; then we segment the customers according to their areas and compute the cumulative mean value for the customers belonging to each respective segment. Each cumulative mean value represents one element in the G(x×y) matrix. The elements of G(x×y) range from 0 to 10. When the element g(x×y) is high, it means there are more potential customers in that area; in contrast, when g(x×y) is small, there are fewer potential customers.
• Generate sub-matrices (cur); the matrix cur is presented in equation 2 and Figure 1:

cur = G(x×y) × U(m×n) × 10 (2)

Figure 1. Cur Matrix.

where r = 1, 2, …, (m×n)(I×J). U(m×n) is the degradation of Client Utility; assuming m = 3 and n = 3, U(m×n) is given in Figure 2:

Figure 2: Degradation of client utility matrix

The CU matrix can be obtained by replacing each element in G(x×y) with its corresponding matrix cur, as in Figure 3.

Figure 3. Client Utility Matrix CU.

The reason behind calculating cur is that cur will be strongest at the center of the areas, and it will degrade as one moves away from it.
geographical, demographic, economic, and traffic data. Other
considerations include safety, cost, convenience, and visibility.
B. Service Utility Matrix SU:
Quite often, malls, supermarkets, gas stations, and other
Once the deployment of ATMs is a one off project, hence it is
high-traffic shopping areas are prime locations for ATM sites. In
done once. It is essential to distribute the limited number of atms
this paper, the priorities for different potential ATM locations
in such a way as to maximize the utility of services. In order to
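Replacing every element of G(x×y) with its scaled copy of U(m×n), as equation (2) and Figure 3 describe, is exactly a Kronecker product. The following Python sketch illustrates the construction; the grid values and the 3×3 pattern U are illustrative assumptions, not values taken from the paper.

import numpy as np

# Illustrative geo-demographic grid G: each element in [0, 10] scores
# the concentration of potential customers in one area (assumed values).
G = np.array([[2.0, 7.5],
              [9.0, 4.0]])

# Assumed 3x3 degradation-of-client-utility pattern U(m x n):
# strongest at the centre of an area, weaker towards its edges.
U = np.array([[0.5, 0.7, 0.5],
              [0.7, 1.0, 0.7],
              [0.5, 0.7, 0.5]])

# Equation (2): cur = g(x, y) * U * 10, and CU replaces each element of G
# with its cur block, i.e. CU = kron(G, U) * 10.
CU = np.kron(G, U) * 10
print(CU.shape)   # (6, 6): every area expands into a 3x3 utility block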
B. Service Utility Matrix SU:
Since the deployment of ATMs is a one-off project, it is done once, and it is essential to distribute the limited number of ATMs in such a way as to maximize the utility of services. In order to find SU, this study assumes that the ATMs are homogeneous; accordingly, there exists only one matrix A. Matrix A (which represents the degradation of an ATM's utility as one moves away from its location) is predetermined and held constant for all machines. The rectilinear distance model is adopted, as shown in Figure 4.

Figure 4. Service Matrix A (the rectilinear distance model).

The matrix Ln indicates the location of the nth machine. If this location is denoted by the coordinates (un, vn), then all elements of Ln are equal to zero except for the coordinates (un, vn), where they are equal to one, as in equation (3) and Figure 5.

The matrix SU can be obtained by convolving the two matrices A and L, as in equation (4) and Figure 6. Notice that the objective of the convolution here is to surround the unique non-zero element in Ln with the service pattern matrix A; therefore, the convolution operation in this case can be performed very efficiently by simply centering the elements of the A matrix at (un, vn).

SU = A * L (4)

where the symbol * indicates the convolution product.

C. Percentage Coverage (PC):
In order to satisfy the client, his utility should be satisfied by covering his demand, and the service utility should be maximized through effective deployment of ATMs; this saves the cost of providing additional ATMs. PC is computed as the percentage of ψ (ψ is equal to one at all points of E where SU is greater than CU) divided by the number of elements in E; PC is given in equation (5), and ψ is given in equation (6).

In addition to PC, another important measure, the total client utility satisfied γ, is calculated; the formula for γ is given in equation (7). The algorithm returns both the γ and PC values with the solution, as will be shown in the simulations section. The PC and γ values are essential in measuring the goodness of an ATM deployment.
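A minimal numerical sketch of equations (1) and (4), with PC computed from its verbal definition above (the share of grid points at which SU covers CU); the pattern A, the grid size and the uniform CU are illustrative assumptions.

import numpy as np
from scipy.signal import convolve2d

def service_utility(L, A):
    # Equation (4): SU = A * L, with * the 2-D convolution product.
    # Convolving centres the service pattern A on every non-zero
    # element of L, surrounding each ATM with its utility footprint.
    return convolve2d(L, A, mode='same')

def percentage_coverage(SU, CU):
    # PC following its verbal definition above: the percentage of
    # points of E = SU - CU (equation 1) that are non-negative.
    E = SU - CU
    return 100.0 * np.count_nonzero(E >= 0) / E.size

# Toy example with an assumed rectilinear-decay pattern A (Figure 4)
# and a single ATM placed at the centre of a 5x5 grid.
A = np.array([[0.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 0.0]])
L = np.zeros((5, 5))
L[2, 2] = 1.0
SU = service_utility(L, A)
CU = np.full((5, 5), 0.4)           # assumed uniform client utility
print(percentage_coverage(SU, CU))  # share of covered grid points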
The value of γ ranges between [0, 1] and approaches one only when all elements of E are zero or positive, denoting the saturation level of client utility.

In order to deploy fewer ATMs without negatively affecting PC and γ, if the value of PC is equal to one hundred (100), then in the next step the number of ATMs is reduced by one, and the trial continues reducing N as long as PC remains within the acceptable limit (i.e., above the lower limit of 99). Otherwise, when the value of PC falls below the acceptable limit, the trial increases the number of ATMs until PC reaches the acceptable limit. These conditions are presented in equation (9), where k = 1, 2, ...

V. RANK BASED GENETIC ALGORITHM FOR SOLVING THE BANKING ATM'S LOCATION PROBLEM (RGAC)
A GA solves optimization problems by imitating the genetic process of biological organisms. A potential solution to a specific problem may be represented as a chromosome containing a series of genes, and a set of chromosomes makes up the population. By using Selection, Crossover and Mutation operators, the GA is able to evolve the population toward an optimal solution.

A. Chromosome Representation
The efficiency of a GA depends largely on the representation of a chromosome, which is composed of a series of genes. Here each gene represents an ATM location and equals one or zero depending on whether an ATM is bound to that location, as in equation (3); as a result, L represents the chromosome. The initial population is generated randomly.

B. Fitness Equation
A fitness equation must be devised to determine the quality of a given chromosome, and it always returns a single numerical value. In determining the fitness equation, it is necessary to maximize the percentage coverage PC of CU, so RGAC takes the PC value of equation (5) as the fitness of a given chromosome.

C. Evolutionary Process
The evolutionary process is accomplished by applying Rank-based Roulette Wheel Selection (RRWS), with Crossover and Mutation operating from one generation to the next. The Selection operator determines how many and which individuals are kept in the next generation; the Crossover operator controls how genes are exchanged between individuals; and the Mutation operator allows random gene alteration of an individual. Besides these standard genetic operators, an Elitism phase is used to preserve the best candidates. These stages are discussed in detail below. Firstly, in order to carry out the RRWS, the relative probability (shown in equation 14) and the cumulative proportion of each chromosome are calculated:

Pi = Rank(fitness) (14)
After that, the one-point Crossover and Mutation operators (Algorithms 1 and 2) are applied to the chromosomes from the selection phase. The Mutation operator runs through the genes in each chromosome and mutates each gene according to a mutation rate Pm. Finally, Elitism combines the parent population with the modified population (the candidates generated by the Crossover and Mutation operators) and takes the best chromosomes into the next generation. The purpose of this phase is to prevent the best chromosomes from being lost. After this phase, the algorithm continues to the next iteration. RGAC is presented in Algorithm 3.

D. Performance Analysis
RGAC needs to execute some hundreds of iterations to come up with an optimal solution. The shortcoming of HAC, however, is convergence to a local optimum. According to the simulation results, RGAC is effective in speeding up convergence while reaching a feasible result, and RGAC outperforms HAC in the PC and γ values obtained for the final schedule.

Algorithm 1: One-point Crossover
1: for i = 1 to popSize/2 do
2:   Select two chromosomes p1, p2 randomly
3:   Select crossover point Point randomly
4:   Save the coordinates of the ATM locations of the two chromosomes p1, p2 in row1, col1, row2, col2
5:   if random[0, 1] < probCrossover then
6:     for k = Point+1 to ChromosomeLength do
7:       Swap the coordinates of the ATM locations of the two chromosomes p1, p2
8:     end for
9:     Keep the newly produced chromosomes as candidates
10:  end if
11: end for

Algorithm 2: Mutation
1: for i = 1 to popSize do
2:   Select chromosome p randomly
3:   Save the coordinates of the ATM locations of chromosome p in row, col
4:   if random[0, 1] < probMutation then
5:     Select one ATM location of chromosome p randomly and set it to zero
6:     Generate one ATM location of chromosome p randomly and set it to one
7:     Add the newly produced chromosome as a candidate
8:   end if
9: end for

Algorithm 3: RGAC
1: Generate initial population P of size N1 randomly
2: Evaluate each chromosome using equations (1, 5 and 7)
3: for g = 1 to MaximumGenerations do
4:   Generate offspring population from P
5:   Rank-based Roulette Wheel Selection
6:   Crossover and Mutation (Algorithms 1, 2)
7:   Evaluate each chromosome resulting from the Crossover and Mutation stages using equations (1, 5 and 7)
8:   (Elitist) Select the members of the combined population, based on fitness, to form the population P of the next generation
9:   Evaluate each chromosome using equations (1, 5 and 7)
10:  if the stopping criterion has been reached then
11:    return the PC and γ values and the best ATM location matrix L found
12:    break
13:  end if
14: end for

RGAC is effective in speeding up convergence while reaching a feasible result. The fitness equation gives very good results and high-quality solutions even as the complexity of the problem increases.
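The selection and variation operators above translate directly into code. The sketch below, under the assumption that a chromosome is a flattened 0/1 list of ATM locations, mirrors the rank-based selection of equation (14) and Algorithms 1 and 2; the probability values are illustrative, not taken from the paper.

import random

def rank_based_roulette(population, fitness, k):
    # Rank-based Roulette Wheel Selection (equation 14): the selection
    # weight is a chromosome's rank by fitness rather than its raw PC
    # value, which keeps selection pressure stable when PC values cluster.
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    ranks = [0] * len(population)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank                     # worst gets rank 1
    return random.choices(population, weights=ranks, k=k)

def one_point_crossover(p1, p2, prob_crossover=0.8):
    # Algorithm 1: with probability probCrossover, swap the gene tails
    # of two parents past a random crossover point.
    if random.random() < prob_crossover:
        point = random.randrange(1, len(p1))
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1[:], p2[:]

def mutate(chromosome, prob_mutation=0.05):
    # Algorithm 2: move one ATM by clearing a random occupied location
    # and setting a random empty one, preserving the ATM count N.
    c = chromosome[:]
    if random.random() < prob_mutation:
        ones = [i for i, gene in enumerate(c) if gene == 1]
        zeros = [i for i, gene in enumerate(c) if gene == 0]
        if ones and zeros:
            c[random.choice(ones)] = 0
            c[random.choice(zeros)] = 1
    return c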
VI. CONCLUSIONS AND FUTURE WORK

This paper presents the RGAC technique for deploying ATMs in the banking world, which increases search efficiency by improving the evolutionary process while reaching a feasible result. Moreover, RGAC has proved to be a robust approach for solving the ATM deployment problem and is able to provide high-quality solutions in a reasonable time compared to HAC. The simulation results show that RGAC solves the optimization problem of ATM deployment by maximizing PC and minimizing N; thus RGAC matches the two objectives of banks, namely attaining the highest client utility as well as improving the cost efficiency of the banks. In the future, it would be appropriate to extend the goodness of the results by including other measures, such as the variance or standard deviation, in order to obtain less dispersion in matrix E.

REFERENCES

i. M. A. Aldajani and H. K. Alfares, "Location of banking automatic teller machines based on convolution," Comput. Ind. Eng., vol. 57, no. 4, pp. 1194-1201, 2009.
ii. Figueras and M. Solarski, "A hybrid grouping genetic algorithm for citywide ubiquitous wifi access deployment," in CEC'09: Proceedings of the Eleventh Conference on Congress on Evolutionary Computation, Piscataway, NJ, USA: IEEE Press, 2009, pp. 2172-2179.
iii. A. Qadrei and S. Habib, "Allocation of heterogeneous banks' automated teller machines," in INTENSIVE '09: Proceedings of the 2009 First International Conference on Intensive Applications and Services, Washington, DC, USA: IEEE Computer Society, 2009, pp. 16-21.
iv. R. Simutis, D. Dilijonas, et al., "Optimization of cash management for ATM network," in Information Technology and Control, 2007.
v. W. Abdulal, O. Al Jadaan, et al., "Rank based genetic scheduler for grid computing systems," in The International Conference on Computational Intelligence and Communication Networks (CICN 2010), IEEE, 2010.
vi. A. Alhaffa and W. Abdulal, "A Market-Based Study of Optimal ATMs Deployment Strategy," International Journal of Machine Learning and Computing, vol. 1, no. 1, April 2011.
vii. O. Al Jadaan, L. Rajamani, and C. R. Rao, "Improved selection operator for GA," Journal of Theoretical and Applied Information Technology, JATIT, 2005-2008.
viii. R. Kumar and Jyotishree, "Blending Roulette Wheel Selection & Rank Selection in Genetic Algorithms," International Journal of Machine Learning and Computing, vol. 2, no. 4, August 2012.
ix. D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. New York, NY: Addison-Wesley, 1989.
x. M. Wagner, "The optimal cash deployment strategy: modeling a network of automated teller machines," M.Sc. thesis in Accounting, Hanken Swedish School of Economics and Business Administration, 2007.
Data Oblivious Caching Framework for Hadoop using MapReduce in Big Data
Sindhuja M., Hemalatha S.
Assistant Professor, Information Technology; PG Scholar, Software Engineering
sindhuja.m@rajalakshmi.edu.in, hemzmohan12@gmail.com

Abstract— The invention of online social networks, smart phones, the fine tuning of ubiquitous computing and many other technological advancements have led to the generation of multiple petabytes of structured, unstructured and semi-structured data. These massive data sets have led to the birth of distributed data processing and storage technologies like Apache Hadoop and MongoDB. Data that is huge in volume takes more time to execute for a particular method and causes failures in distributed systems. To defeat this issue, the Hadoop framework was developed for big data processing and is being used in many large-scale organizations. It processes huge amounts of data in the least amount of time, even in large distributed systems. Its main advantage is the automatic fault-tolerance capacity to handle failures of systems during execution or processing using the MapReduce programming technique. However, execution time is still an issue when delivering large amounts of data and processing it repeatedly for a particular process, and the existing method has no mechanism to reduce recomputation time. The proposed approach is a Hadoop distributed oblivious caching system for big data processing which can handle both types of cache memory: the local cache and the distributed cache. This distributed cache reduces recomputation time and increases the cache hit ratio.

Index Terms— Big data, Distributed Cache, Hadoop, MapReduce.

I. INTRODUCTION
Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications. It requires new technologies and architectures so that it becomes possible to extract value by capturing the data and performing analysis on it. Its various properties, such as volume, velocity, variety, variability, value and complexity, lead to many challenges. Big data is an imminent technology in the market which can bring huge benefits to business organizations, but there are various challenges and issues associated with bringing about and adapting this implementation. Apache Hadoop [3] is an open-source implementation which is widely used in distributed systems; it follows a clustered approach and allows massive amounts of data to be stored. Essentially, it accomplishes two jobs: massive data storage and data processing. Hadoop follows a master/slave architecture which decouples the system metadata and the application data, and can be used to implement the MapReduce framework.

II. BACKGROUND DETAILS

A. Data storage in Hadoop
The Hadoop distributed file system (HDFS) is a distributed file system mainly designed to run on large clusters of commodity hardware; Hadoop can also be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets, including files that reach into the petabytes. HDFS is scalable and highly fault-tolerant: by facilitating the transfer of data between nodes, it enables the Hadoop system to continue running even if one of the nodes fails. This decreases the risk of catastrophic failure, even in the event of numerous node failures.

B. Data analytics using MapReduce

Figure 1: MapReduce architecture (input splits are processed by parallel Map tasks, sorted and merged, then passed to Reduce tasks that produce the final output)

MapReduce [10] has become the most popular framework for large-scale processing and analysis. MapReduce is specified as two phases. First, the Map phase, as specified by a map function (also called the Mapper), takes key/value pairs as input, possibly performs some computation on the input, and produces intermediate results in the form of key/value pairs; second, the Reduce phase (also called the Reducer) processes these results as specified by a reduce function. The data from the map phase are shuffled (i.e. exchanged and merge-sorted) and passed to the machines that perform the reduce function.
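To make the two phases concrete, here is a minimal in-process Python illustration of the map, shuffle and reduce steps for a word count; it sketches the data flow only and is not the Hadoop API, and the sample records are invented.

from itertools import groupby
from operator import itemgetter

def map_phase(record):
    # Mapper: emit intermediate key/value pairs, here (word, 1).
    for word in record.split():
        yield (word, 1)

def reduce_phase(key, values):
    # Reducer: aggregate all values that share one key.
    return (key, sum(values))

records = ["big data needs hadoop", "hadoop processes big data"]
intermediate = [kv for r in records for kv in map_phase(r)]
intermediate.sort(key=itemgetter(0))   # shuffle: merge-sort by key
result = [reduce_phase(key, (v for _, v in group))
          for key, group in groupby(intermediate, key=itemgetter(0))]
print(result)   # e.g. [('big', 2), ('data', 2), ('hadoop', 2), ...]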
The Mapper and the Reducer can be performed in parallel. This parallelism also offers some possibility of recovering from partial failures of servers or storage during the operation: if one Mapper or Reducer fails, the job can be rescheduled.

III. RELATED WORK
There has been an enormous amount of work on in-memory storage and caches, and data-oblivious caching was built on ideas from this prior work. It deals mainly with parallel jobs, leading to a coordinated system which improves job completion time and cluster efficiency. The issues and challenges in big data are discussed in [1], and a collaborative research program into methodologies for big data analysis and design has begun [12].

Dache [13] used a novel cache description scheme, and a cache request and response protocol was designed for processing the cached data. It is a distributed layered cache system built on top of the Hadoop distributed file system which provides multiple cache services. A distributed hash table is used to provide the cache service in a P2P style. Three replicas are maintained in three different cache services; the replication improves robustness and spreads the workload. The system maintains three layers, namely the in-memory cache, a snapshot of the local disk, and the actual disk view. The cache service can be accessed by integrating applications with the system's client library.

Memcached [4] is a distributed caching system designed as an object-accessing layer between the application and the underlying relational database. The cache manager of Dache can utilize Memcached to accelerate query responses, because the size of a cache item is generally small.

RAMCloud [9] and prior work on databases, such as MMDB [2], stored all data in RAM only. While this is suited to web servers, it is unlikely to work in data-intensive clusters for capacity reasons: Facebook has more storage on disk than aggregate memory. The proposed system therefore treats memory as a constrained cache. The study of speculative execution of tasks [15] described execution which potentially slows down the entire MapReduce job in order to accelerate it, and does not address the data-sharing problem identified here; this mechanism is orthogonal to the proposed work and could be integrated straightforwardly.

Distributed systems such as Zebra [5] and xFS, developed for the Sprite operating system, make use of client-side in-memory block caching, also suggesting using the cache only for small systems. However, these systems use relatively simple eviction policies and do not coordinate scheduling with locality, since they were designed for use by a network of workstations.

According to PACMan [14], when multiple jobs run in parallel, a job's running time can be decreased only when all the inputs related to running that job are cached; caching only part of the inputs does not improve performance. These massive distributed clustered systems have large memories, and job execution performance can be improved if these memories are utilized to the fullest. PACMan is a caching service that coordinates access to the distributed caches. The system aims at minimizing the total execution time of jobs by evicting those items whose inputs are not completely cached; for evicting those inputs, the LIFE sticky policy was developed.

The multiple intelligent cache mechanism [11] distributes the cache over Redis servers, and the Redis server (a single place to store all cached data) serves client requests. This mechanism helps to improve performance, lower access latency and increase throughput. In order to improve the performance of MapReduce, a new design of HDFS was proposed by introducing the multi-intelligent caching concept with a Redis server.

Collaborative caching [6] presented the design of a proactive fetching and caching mechanism based on Memcached and integrated it with Hadoop: Memcached servers store metadata about objects cached at the datanodes, and the blocks to be cached are decided on the basis of a two-level greedy approach.

To overcome the limitations of the Hadoop system, an in-memory cache scheme [7] was developed. Data locality and task parallelism can be improved in a multi-core environment by integrating the in-memory cache scheme with the cached data; the performance of Hadoop increased from 1.5x to 3.5x.

IV. DESIGN

Figure 2: The Docache infrastructure

Figure 2 shows the overall infrastructure of the system. Docache is a mechanism used to access cached data in less time and with fewer resources. All the local caches are coordinated by the distributed cache, called Docache. It uses the data-oblivious caching algorithm for processing the data, because this algorithm is much easier to analyze than a real cache's characteristics, such as replacement policies. The framework does not depend on hardware parameters such as the cache size or cache line length, makes efficient use of processor caches, and reduces memory bandwidth requirements.

For application data, a distributed cache keeps a copy of a subset of the data in the database, and it is temporary in nature.
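The lookup order implied by this design can be sketched as follows; the class and the fetch/read helpers are hypothetical names used only for illustration, assuming Docache stores block-to-node metadata rather than the blocks themselves.

class TwoTierCache:
    # Sketched lookup order: local cache first, then the remote node
    # recorded in the Docache metadata index, then HDFS on disk.
    # local: dict block_id -> bytes; index: dict block_id -> node;
    # node.fetch() and hdfs.read() are assumed interfaces, not real APIs.
    def __init__(self, local, index, hdfs):
        self.local = local
        self.index = index
        self.hdfs = hdfs

    def get(self, block_id):
        if block_id in self.local:          # local cache hit
            return self.local[block_id]
        node = self.index.get(block_id)
        if node is not None:                # remote (distributed) cache hit
            data = node.fetch(block_id)
        else:                               # complete miss: disk read
            data = self.hdfs.read(block_id)
        self.local[block_id] = data         # warm the local tier
        return data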
The system works by a divide-and-conquer approach, where the problem is divided into many sub-problems that are processed by the MapReduce framework. The centralized cache management in HDFS allows the user to specify the data to be cached by HDFS. The name node communicates with the data nodes that have the desired blocks on disk and instructs them to cache the blocks in their local caches as well as in the centralized cache, so a cached item can be fetched from both the local cache and the remote cache. The data from the user are processed by MapReduce: the input is first split into key/value pairs, and each key/value pair is assigned to a map. The intermediate result produced by a map task is cached both on the local node and in the distributed cache; the Docache itself does not contain the actual data. The reducer then performs the reduce task, and the number of reduce tasks is always lower than the number of map tasks.

The metadata are kept in the form of key/value pairs. Docache retrieves the searched data quickly and helps to access a cached item in a short time with few resources. The name node is the central coordinator which coordinates all the data nodes; it is the master of HDFS and directs the slave nodes to perform low-level I/O tasks. The name node updates the mapping of a cached block to its data node. Each data node reports its locally cached blocks to the name node and to Docache. Data nodes are responsible for data replication across multiple nodes.

V. METHODOLOGY

A. Hadoop cluster
A Hadoop cluster can be formed by a single node or by multiple nodes. The Hadoop cluster mainly consists of the name node, data nodes, a job tracker and task trackers. Each node in the cluster is either a master or a slave; the slave nodes are the data nodes and task trackers. HDFS is the primary storage in a Hadoop application; it exposes a file system namespace and allows the user to store data in it. An HDFS file consists of many blocks of 64 MB each, and each block is replicated three times so that the blocks can be processed quickly and easily. Data nodes are responsible for serving read and write requests, while the name node is the repository for all HDFS metadata. A 4 GB dataset was then uploaded into the file system.

B. Distributed cache
Accessing data from a cache is much faster than disk access. To improve the overall performance and efficiency of the cache, a distributed cache is used. It reduces the overall time taken to search for the required data in the cache; the time is reduced by almost 80%. The distributed cache allows us to query, search and analyze more cache entries in memory, with results for complex searches returned in less than a second. The time-consuming and expensive process of querying the database directly can be avoided by querying the cache and mapping query results to cache lookups.

C. Cache replacement policy
Keeping a cached item for a long time results in wasted resources and memory, so unused items should be removed from memory in order to make room for newly cached items. Well-known cache replacement policies like LRU and LFU [8] alone are not effective due to the mobility of the tasks. The Adaptive Replacement Cache (ARC) is an efficient cache replacement policy for cache utilization. This algorithm constantly balances LRU and LFU to improve the combined result. It splits the cache directory into two lists, T1 and T2, for recently referenced and frequently referenced entries (with L1 = T1 ∪ B1 and L2 = T2 ∪ B2, where B1 and B2 are ghost histories); any entry in L1 that is referenced once more gets another chance and enters L2. Entries entering the cache (T1, T2) cause the target marker to move, and if no free space exists in the cache, the marker also determines whether T1 or T2 will evict an entry.

ARC has low space complexity: a realistic implementation had a total space overhead of less than 0.75%. It also has low time complexity, virtually identical to LRU, and adapts to different workloads and cache sizes. In particular, it gives very little cache space to sequential workloads, thus avoiding a key limitation of LRU. For a huge real-life workload generated by a large commercial search engine with a cache greater than 3 GB, ARC's hit ratio was dramatically better than that of LRU.

Algorithm:
ARC(c): initialize T1 = B1 = T2 = B2 = ∅ and p = 0; on a request for page X:
Case I. X ∈ T1 ∪ T2 (a hit in ARC(c) and DBL(2c)): move X to the top of T2.
Case II. X ∈ B1 (a miss in ARC(c), a hit in DBL(2c)): adapt p = min{c, p + max{|B2|/|B1|, 1}}; REPLACE(p); move X to the top of T2 and place it in the cache.
Case III. X ∈ B2 (a miss in ARC(c), a hit in DBL(2c)): adapt p = max{0, p - max{|B1|/|B2|, 1}}; REPLACE(p); move X to the top of T2 and place it in the cache.
Case IV. X ∉ L1 ∪ L2 (a miss in both DBL(2c) and ARC(c)):
  Case (i) |L1| = c: if |T1| < c then delete the LRU page of B1 and REPLACE(p); else delete the LRU page of T1 and remove it from the cache.
  Case (ii) |L1| < c and |L1| + |L2| ≥ c: if |L1| + |L2| = 2c then delete the LRU page of B2; REPLACE(p).
  Put X at the top of T1 and place it in the cache.
Subroutine REPLACE(p): if |T1| ≥ 1 and ((X ∈ B2 and |T1| = p) or |T1| > p) then move the LRU page of T1 to the top of B1 and remove it from the cache; else move the LRU page of T2 to the top of B2 and remove it from the cache.
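A compact Python sketch of the ARC case analysis just listed, assuming pages are hashable keys; each OrderedDict runs from LRU (front) to MRU (back), and the value slots are unused placeholders.

from collections import OrderedDict

class ARC:
    # c: cache capacity. T1/T2 hold cached pages, B1/B2 their ghost
    # histories; p is the adaptation target for the size of T1.
    def __init__(self, c):
        self.c, self.p = c, 0.0
        self.t1, self.t2 = OrderedDict(), OrderedDict()
        self.b1, self.b2 = OrderedDict(), OrderedDict()

    def _replace(self, x):
        # Subroutine REPLACE(p): evict the LRU page of T1 or T2 into
        # the corresponding ghost list, steered by the marker p.
        if self.t1 and ((x in self.b2 and len(self.t1) == self.p)
                        or len(self.t1) > self.p):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = None
        else:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = None

    def access(self, x):
        if x in self.t1 or x in self.t2:           # Case I: hit
            self.t1.pop(x, None)
            self.t2.pop(x, None)
            self.t2[x] = None                       # move to MRU of T2
        elif x in self.b1:                          # Case II: ghost hit in B1
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(x)
            del self.b1[x]
            self.t2[x] = None
        elif x in self.b2:                          # Case III: ghost hit in B2
            self.p = max(0.0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(x)
            del self.b2[x]
            self.t2[x] = None
        else:                                       # Case IV: complete miss
            if len(self.t1) + len(self.b1) == self.c:        # case (i)
                if len(self.t1) < self.c:
                    self.b1.popitem(last=False)     # drop LRU ghost of B1
                    self._replace(x)
                else:
                    self.t1.popitem(last=False)     # cache is all T1
            else:                                             # case (ii)
                total = (len(self.t1) + len(self.b1)
                         + len(self.t2) + len(self.b2))
                if total >= self.c:
                    if total == 2 * self.c:
                        self.b2.popitem(last=False)
                    self._replace(x)
            self.t1[x] = None                       # insert at MRU of T1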
D. Performance evaluation
The calculated execution time is presented both numerically and graphically. The cache execution time is compared with normal execution (disk memory); the graph below represents the different run times of the same search.
VI. EXPERIMENTAL RESULTS

Hadoop is run in single-node (pseudo-distributed) mode. The number of mappers is set to 6 in these experiments, and the reducer count varies according to the application. The input of the application is cached in the local cache; the distributed cache is then used to access the data from the locally cached items. For this application, book publication data of about 4 GB in size was used; the time taken for each run is given in the table below.

Table 1: Experimental results of searching for a word, showing the time consumed per run.

RUN   TIME (ms)
1     13750
2     25
3     23
4     20

Figure 3: Comparison of different runs

VII. CONCLUSION
This paper has exposed the major data-analysis problems that need to be addressed in big data processing and storage. We have described data-oblivious caching, an in-memory coordinated caching system for data processing in Hadoop using the MapReduce framework. In the MapReduce framework, Mapper nodes process a given set of data and save the intermediate data in local files; the reducer nodes then copy these data from the Mapper nodes and later aggregate them to produce the final result. In the data-oblivious cache, data can be fetched from the local cache as well as from the remote cache. Centralized caching can improve overall cluster utilization, and replicated copies of cached data are maintained for high availability. Hence, the cache memories reduce recomputation time and increase the cache hit ratio. Future development will focus on enhancing the caching mechanism with advanced algorithms using Hadoop and MapReduce.

References
i. Avita Katal, Mohammad Wazid, R. H. Goudar, "Big Data: Issues, Challenges, Tools and Good Practices", IEEE, 2013.
ii. H. Garcia-Molina and K. Salem, "Main Memory Database Systems: An Overview", IEEE Transactions on Knowledge and Data Engineering, 1992.
iii. Hadoop, http://hadoop.apache.org/, 2013.
iv. Jing Zhang, Gongqing Wu, Xuegang Hu, Xindong Wu, "A Distributed Cache for Hadoop Distributed File System in Real-time Cloud Services", ACM/IEEE 13th International Conference, 2012.
v. John H. Hartman and John K. Ousterhout, "The Zebra Striped Network File System", ACM SOSP, 1993.
vi. Meenakshi Shrivastava, Dr. Hans-Peter Bischof, "Hadoop: Collaborative Caching in Real Time HDFS", 2013.
vii. Memcached: A distributed memory object caching system, http://memcached.org/, 2013.
viii. Nimrod Megiddo, Dharmendra S. Modha, "Outperforming LRU with an Adaptive Replacement Cache Algorithm", IEEE, 2004.
ix. J. Ousterhout et al., "The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM", SIGOPS Operating Systems Review, 2009.
x. Pietro Michiardi, "MapReduce: Theory and Practice of Data-Intensive Applications", Eurecom, 2011.
xi. K. Senthil Kumar, K. Satheesh Kumar, S. Chandrasekar, "Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop", IJIET, vol. 4, issue 1, June 2014.
xii. Stephen Kaisler, Frank Armour, J. Alberto Espinosa, William Money, "Big Data: Issues and Challenges Moving Forward", IEEE, 46th Hawaii International Conference on System Sciences, 2013.
xiii. Yaxiong Zhao, Jie Wu, "Dache: A Data-Aware Caching for Big-Data Applications Using the MapReduce Framework", vol. 19, no. 1, February 2014.
xiv. Zhiwei Xiao, Haibo Chen, Binyu Zang, "A Hierarchical Approach to Maximizing MapReduce Efficiency", international conference, 2011.
xv. M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, "Improving MapReduce performance in heterogeneous environments", in Proc. of OSDI'2008, Berkeley, CA, USA, 2008.
PAPR Reduction For STBC MIMO-OFDM Using Modified PTS Technique Combined With Interleaving and Pulse Shaping
Poonam, Sujatha S
Dept. of TCE, CMRIT-Bengaluru
pkkpoonam2@gmail.com, sujatha.s@cmrit.ac.in
Abstract: Multiple Input Multiple Output-Orthogonal Frequency Division Multiplexing (MIMO-OFDM) is a most attractive technology which has recently been proposed for wireless communication. It provides high-data-rate services and better system performance; it improves data throughput and delivers the highest capacity as well. However, MIMO-OFDM suffers from the drawback of a high Peak-to-Average Power Ratio (PAPR) for large numbers of subcarriers, which can affect the system output. Therefore, to overcome the PAPR problem, an effective technique, PTS (partial transmit sequence), is used. In this paper, a modified PTS technique combined with interleaving and a pulse-shaping method is presented to improve the performance of the MIMO-OFDM system in terms of PAPR reduction. The basic idea behind PTS is to analyze the influence of the number of detected peaks on the PAPR performance and the system complexity by combining signal subblocks and rotation factors. The simulation results, computed in MATLAB, show that PAPR performance improves considerably when modified PTS combined with interleaving and pulse shaping is used for STBC MIMO-OFDM.

Key-Words: MIMO-OFDM, PAPR, STBC, Partial Transmit Sequences, Interleaved Subblock Partition Scheme, Raised-Cosine pulse shape

1. Introduction
Orthogonal Frequency Division Multiplexing (OFDM) is a high-speed wireless communication technology which has a promising future in mobile communication systems. It provides high data rates and high-quality multimedia services to mobile users, delivers high data throughput, and gives an efficient wideband communication system. Due to all these advantages, OFDM has been playing an important role in various communication systems. Multiple antennas are used to increase the capacity of wireless links and so have attracted a great deal of interest in communication systems. Space-time codes with OFDM result in wideband communication. By using multiple antennas at the transmitter as well as at the receiver end, spatial diversity can be achieved without increasing the transmit power or the signal bandwidth. Therefore, many high-speed data transmission standards have been presented, such as WiMAX (IEEE 802.16), WLAN (IEEE 802.11a/g), digital video broadcasting, ADSL, etc.

MIMO-OFDM is the technology which combines multiple-input multiple-output (MIMO), which multiplies capacity by transmitting different signals over multiple antennas, with orthogonal frequency division multiplexing (OFDM). MIMO-OFDM has the advantages of high data throughput, robustness against multipath fading, high power spectral efficiency and better performance. At the same time, however, MIMO-OFDM suffers from the PAPR problem when implementing the system. PAPR is defined as the peak-to-average power ratio; a high PAPR increases the complexity of the analog-to-digital and digital-to-analog converters and, as a result, reduces the efficiency of the radio-frequency (RF) power amplifier.

There are several techniques used to reduce PAPR in MIMO-OFDM systems. The techniques are categorised into 3 types: distortion methods, distortionless methods, and other methods. These methods include clipping, companding, selective mapping (SLM), partial transmit sequence (PTS), active constellation extension (ACE), and tone reservation (TR). Clipping uses a predetermined threshold, which helps in reducing PAPR to the lowest value. Interleaving combined with PTS has also been introduced in MIMO-OFDM implementations to reduce PAPR; interleaving is basically the reordering of consecutive bytes of data over a large sequence of data before transmission to reduce the effect of burst errors. In this paper, PAPR is reduced by PTS combined with interleaving and a pulse-shaping method in MIMO-OFDM. SLM and PTS belong to the probabilistic class, because several different signals are obtained but only the minimum-PAPR signal is taken into consideration. In SLM, several signals contain the same information data and the one OFDM signal with the lowest PAPR is selected. SLM is a flexible technique, but it requires high computational complexity and has low bandwidth efficiency.

Therefore, an effective technique, PTS (Partial Transmit Sequence), is used in this paper, which helps to reduce PAPR to a minimum value. PTS is a distortionless and attractive technique used to improve the statistics of a multicarrier signal. In PTS, the input data information is divided into smaller disjoint subsequences, an IFFT is performed on each subsequence, and each subsequence is then multiplied by a rotating phase factor. The outputs combined with the rotating phase factors are then added to obtain the OFDM symbol for transmission. Each subsequence contributes to the PAPR reduction: the PAPR is computed for each resulting sequence, and the signal sequence with the minimum PAPR is selected and transmitted. The partitioning types for PAPR reduction can be categorised as interleaving partition, adjacent partition and pseudo-random partition. However, PTS in modified form is a better option compared to ordinary PTS, because in ordinary PTS all the phase factor combinations are considered, which results in complexity that increases with the number
of subsequences. Hence, the modified PTS is considered, to accomplish the PAPR reduction and to reduce the system complexity as well. The modified PTS is a successful technique in which the real and imaginary parts are separately multiplied by phase factors.

If the previous work related to the same concept using PTS is taken into consideration, PAPR reduction is jointly optimized in both the real and imaginary parts, separately multiplied by phase factors over the different subcarriers and subsequences. PTS combined with interleaving is very helpful for PAPR reduction in MIMO-OFDM systems.

2. MIMO-OFDM and PAPR in OFDM systems

A MIMO-OFDM system consists of a transmitter end and a receiver end. Transmission antennas are used at the transmitting end: an input data bit stream is supplied to the space-time coder, then modulated by OFDM and finally fed to the antennas for sending out (radiation). At the receiving end, incoming signals are fed into a signal detector and processed before the original signal is recovered.

The PAPR of OFDM is defined as the ratio between the maximum power and the average power; the PAPR of the OFDM signal x(t) is defined by equation (1):

PAPR = max |x(t)|^2 / E[|x(t)|^2] (1)

The Cumulative Distribution Function (CDF) is used to measure the efficiency of a PAPR technique. The Complementary CDF (CCDF) is used to measure the probability that the PAPR of a certain data block exceeds a given threshold; it is implemented using a Gaussian distribution with mean zero and variance 0.5. The probability that the PAPR of the OFDM signal with N subcarriers exceeds the threshold z is then given by the standard expression

P(PAPR > z) = 1 - (1 - e^(-z))^N.

The input to the high-power amplifier (HPA) must be a continuous-time signal. Therefore, oversampling is used so that the CCDF of the PAPR approximates that of the continuous OFDM signal, and the CCDF of the PAPR with oversampling is recalculated accordingly.
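As a numerical companion to equation (1) and the CCDF expression above, the following Python sketch estimates the PAPR of oversampled OFDM blocks and the empirical CCDF at one threshold; the block count, constellation and threshold are illustrative choices, not the paper's settings.

import numpy as np

def papr_db(symbols, oversample=4):
    # Equation (1): PAPR = max |x(t)|^2 / E[|x(t)|^2], in dB. Zero-padding
    # the IFFT by the factor L approximates the continuous-time signal
    # that reaches the high-power amplifier.
    n = len(symbols)
    X = np.zeros(n * oversample, dtype=complex)
    X[:n] = symbols
    x = np.fft.ifft(X)
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

# Empirical CCDF at one threshold for random 16-QAM OFDM blocks.
rng = np.random.default_rng(1)
levels = np.array([-3, -1, 1, 3])
blocks = rng.choice(levels, (1000, 256)) + 1j * rng.choice(levels, (1000, 256))
paprs = np.array([papr_db(row) for row in blocks])
print("CCDF(8 dB) =", np.mean(paprs > 8.0))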
3. PTS AND MODIFIED PTS SCHEME

The Partial Transmit Sequence (PTS) algorithm, first proposed by Muller S. H. and Huber J. B., is a technique for improving the statistics of a multicarrier signal. The basic idea of the PTS algorithm is to divide the original OFDM sequence into several subsequences and to multiply each subsequence by a different phase factor until an optimum value is chosen. First, consider the data block as a vector, X = [X1, X2, ..., XN-1]T. The data vector X is then partitioned into disjoint sets, represented by the vectors {Xm, m = 1, 2, ..., M}. Here, we assume that the data clusters consist of contiguous sets of subcarriers and are of equal size.

Fig. 1. Block diagram for Partial Transmit Sequences.

The objective is to combine the M clusters using equation (6) and to obtain an optimal solution, where {bm, m = 1, 2, ..., M} are the weighting phase factors, assumed to be rotated using different combinations, and Xm is the partially transmitted sequence. Increasing the number of phase factors decreases the PAPR of the OFDM signal but in turn increases the hardware complexity of the system. The partial transmit sequence scheme is an attractive solution for reducing PAPR in MIMO-OFDM systems without any distortion of the transmitted signals. In the PTS scheme, the input data block is partitioned into disjoint subblocks, and each subblock is multiplied by a phase weighting factor obtained with an optimization algorithm. If the subblocks are optimally phase shifted, they exhibit minimum PAPR and consequently reduce the PAPR of the merged signal. The number of subblocks (V) and the subblock partition scheme determine the PAPR reduction. The main drawback of PTS arises from the computation of multiple IFFTs, resulting in a high computational complexity that grows with the number of transmit antennas and the number of subblocks.

In general, subblock partitioning can be classified into 3 categories: interleaved partition, adjacent partition, and pseudo-random partition.
For the interleaved method, every subcarrier signal spaced a fixed distance apart is allocated to the same subblock; in the adjacent scheme, successive subcarriers are assigned to the same subblock sequentially; and in the pseudo-random scheme, each subcarrier signal is assigned to any one of the subblocks randomly. It can be noted that the computational complexity of the interleaved subblock partitioning scheme is reduced extensively compared to that of the adjacent and pseudo-random partition schemes. This subblock partitioning scheme considerably reduces the envelope fluctuations in the transmitted waveform.

A data symbol vector is encoded with a space-time encoder, which takes a single stream of binary input data and transforms it into two vectors S1 and S2. These are given to the IFFT blocks and fed to the transmitter antennas respectively. Symbols S1 and S2 represent the two continuous time-domain signals, which are combined into V clusters, where {bv, v = 1, 2, ..., V} are weighting factors assumed to be a perfect rotation.

Fig. 2. Block diagram of OFDM system using PTS technique.

4. INTERLEAVING
Interleaving reorders the data to be transmitted in such a way that successive bytes of data are scattered over a larger series of data to reduce the impact of burst errors. The use of interleaving greatly increases the capability of error-protection codes to correct burst errors: many error-protection coding processes can correct small numbers of errors but cannot correct errors that occur in groups.

Fig. 3. Interleaving operation.

5. PROPOSED SCHEME

Modified PTS technique
In the PTS technique, an exhaustive search algorithm is used to find the minimum-PAPR signal after multiplication by the phase factors, and the mathematical complexity of exhaustive search is high. In the modified PTS technique [10], a neighbourhood search algorithm is therefore used to find the minimum-PAPR signals.
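For reference, a plain exhaustive-search PTS with the interleaved partition described in section 3 can be sketched as below; the modified scheme would replace the full product search with a neighbourhood search. V and the phase set follow the simulation parameters; the rest is an illustrative assumption.

import numpy as np
from itertools import product

def pts_min_papr(X, V=4, phases=(1, -1, 1j, -1j)):
    # Interleaved partition into V subblocks, one IFFT per subblock,
    # then a search over phase factors b_v in {1, -1, j, -j} with the
    # first factor fixed to 1, keeping the combination of minimum PAPR.
    N = len(X)
    parts = []
    for v in range(V):
        Xv = np.zeros(N, dtype=complex)
        Xv[v::V] = X[v::V]              # interleaved subblock partition
        parts.append(np.fft.ifft(Xv))
    best_papr, best_signal = np.inf, None
    for b in product(phases, repeat=V - 1):
        x = parts[0] + sum(bv * pv for bv, pv in zip(b, parts[1:]))
        p = np.abs(x) ** 2
        papr = 10 * np.log10(p.max() / p.mean())
        if papr < best_papr:
            best_papr, best_signal = papr, x
    return best_signal, best_papr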
Fig. 4. Structure of the transmitter end using modified PTS combined with interleaving and the pulse shaping method.

6. RESULTS

The proposed work is computed in MATLAB simulation, and the results obtained are shown in the figures below. The analysis of the PAPR reduction performance of modified PTS combined with interleaving and pulse shaping was carried out using MATLAB simulation. The simulation parameters used in the analysis of PAPR reduction are presented in the following table.
Table: Simulation parameters

SIMULATION PARAMETERS          Values of parameters
Number of OFDM blocks          1000
Number of subcarriers (N)      64, 128, 256, 512, 1024
Number of subblocks (V)        2, 4, 8, 16
Oversampling factor (L)        4
Roll-off factor (α)            0.6 (range 0 to 1)
Subblock partitioning scheme   Interleaving and pulse shaping method
Number of antennas used (Mt)   2
Modulation schemes             16-QAM
Phase weighting factors (b)    1, -1, j, -j

The modified PTS technique represents the input information in the form of subblocks, which are modulated using 16-QAM and QPSK as modulation schemes. The phase rotating factors are transmitted directly to the receiver through the subblocks. The PAPR performance is evaluated using the CCDF, which describes the probability that the new PAPR value is smaller than the original PAPR.

The figures below represent the PAPR reduction output and the CCDF of MIMO-OFDM. Figure 5 shows the MIMO-OFDM PAPR reduction of the OFDM subsequences taken at different subcarriers N = 64, 128 and 256. For these subcarrier counts, the PAPR performance improved from 8.6 dB to 5.6 dB; the PAPR decreases as the number of subcarriers used in MIMO-OFDM decreases. This PAPR performance has been improved by using the modified PTS technique combined with interleaving and the pulse-shaping method.

Fig. 5. CCDF of PAPR for different subcarriers N = 64, 128, 256 when V=4, L=4, α=0.6 and Mt=2.

Fig. 6. CCDF of PAPR for different subcarriers N = 64, 128, 256 when V=8, L=4, α=0.6 and Mt=2.

Fig. 7. CCDF of PAPR for different oversampling factors L = 2, 4, 8, 16 when N=256, V=4, α=0.6 and Mt=2.

The results are obtained by using the modified PTS technique combined with interleaving and the pulse-shaping method. The waveform results for different subcarriers N = 64, 128, 256, 512 and 1024 are presented in the figures. As the number of subcarriers increases in MIMO-OFDM, the PAPR also increases.

7. CONCLUSION
The paper revolves around the idea behind the modified PTS technique, which helps to reduce the PAPR of MIMO-OFDM. The PTS technique has been used along with the interleaving and pulse-shaping method. The results show that the PAPR has been improved to a great extent, i.e., reduced from 9.5 dB to 5.2 dB. The modified PTS combined with interleaving and pulse shaping is an effective technique for the STBC MIMO-OFDM system, used to achieve a better trade-off between complexity and PAPR performance. It also provides high data rates and helps deliver data throughput in a much better way. MIMO-OFDM has several advantages and is very helpful in digital multimedia and wireless broadband mobile communication systems.
8. REFERENCES

i. S. B. Weinstein and Paul M. Ebert, "Data Transmission by Frequency-Division Multiplexing Using the Discrete Fourier Transform", IEEE Transactions on Communication Technology, vol. 19, no. 5, pp. 628-634, October 1971.
ii. Dae-Woon Lim, Seok-Joong Heo, and Jong-Seon No, "An Overview of Peak-to-Average Power Ratio Reduction Schemes for OFDM Signals", Journal of Communications and Networks, vol. 11, no. 3, pp. 229-239, June 2009.
iii. S. H. Muller and J. B. Huber, "OFDM with reduced peak-to-average power ratio by optimum combination of partial transmit sequences", IEEE Electronics Letters, vol. 33, no. 5, pp. 368-369, February 1997.
iv. Parneet Kaur, Ravinder Singh, "Complementary Cumulative Distribution Function for Performance Analysis of OFDM Signals", IOSR Journal of Electronics and Communication Engineering (IOSR-JECE), ISSN: 2278-2834, vol. 2, issue 5, pp. 05-07, Sep-Oct 2012.
v. Taewon Hwang, Chenyang Yang, Gang Wu, Shaoqian Li and Geoffrey Ye Li, "OFDM and its Wireless Applications: A Survey", IEEE Transactions on Vehicular Technology, vol. 58, no. 4, pp. 1673-1694, May 2009.
vi. S. Sujatha and P. Dananjayan, "PAPR reduction techniques in OFDM systems using DCT and IDCT", Journal of Theoretical and Applied Information Technology, vol. 64, no. 3, 30th June 2014.
vii. Zhongpeng Wang, "Combined DCT and Companding for PAPR Reduction in OFDM Signals", Journal of Signal and Information Processing, vol. 2, pp. 100-104, 2011.
viii. T. Jiang and Y. Imai, "An Overview: Peak-To-Average Power Ratio Reduction Techniques for OFDM Signals," IEEE Transactions on Broadcasting, vol. 54, no. 2, pp. 257-268, 2008.
ix. P. Mukunthan and P. Dananjayan, "PAPR reduction of an OFDM signal using modified PTS combined with interleaving and pulse shaping method", European Journal of Scientific Research, vol. 74, no. 4, May 2012, pp. 475-486.
Generation of Migration List of Media Streaming Applications for Resource Allocation in Cloud Computing
Vinitha Pandiyan, Preethi, Manjunath S.
Dept. of CSE, Cambridge Institute of Technology, Bangalore, India
Abstract - The recent trend toward, and requirement for, large storage in cloud computing has made migration and virtualization technology increasingly popular and valuable in the cloud computing environment, due to the benefits of server consolidation, live migration, and resource isolation. Live migration of virtual machines can be used to implement energy saving and load balancing in a cloud data centre. However, to our knowledge, most of the previous work concentrated on the implementation of the migration technology itself and did not consider the impact of the resource reservation strategy on migration efficiency. This paper focuses on the live migration strategy of multiple virtual machines with different resource reservation methods. We first describe the live migration framework of multiple virtual machines with resource reservation technology. As soon as the virtual machine size increases, the data in the migration list is transferred to the corresponding virtual machine.

Keywords: virtualization technology, Amazon, Google, Yahoo!, Microsoft, IBM and Sun

1. Introduction

Cloud computing has recently received considerable attention in both the academic and industrial communities as a new computing paradigm that provides dynamically scalable, virtualized resources as a service over the internet. Currently, several large companies, such as Amazon, Google, Yahoo!, Microsoft, IBM and Sun, are developing their own cloud platforms for consumers and enterprises to access cloud resources through services. Recently, with the rapid development of virtualization technology, more and more data centers use this technology to build new-generation data centers to support cloud computing, due to benefits such as server consolidation, live migration and resource isolation. Live migration of virtual machines means that, from the clients' perspective, the virtual machine seems to be responsive all the time during the migration. Compared with traditional suspend/resume migration, live migration holds many benefits, such as energy saving, load balancing, and online maintenance, and many live migration methods have been proposed to improve migration efficiency. As live migration technology is widely used in modern cloud computing data centers, the live migration of multiple virtual machines becomes more and more frequent. Different from single-virtual-machine migration, the live migration of multiple virtual machines faces many new problems, such as migration failures due to insufficient resources in the target machine, migration conflicts due to concurrent migrations, and migration thrashing due to dynamic changes of virtual machine workloads. All the above issues should be overcome to maximize migration efficiency in virtual cloud data center environments. In this paper, we study the live migration efficiency of multiple virtual machines from an experimental perspective and investigate different resource reservation methods and migration strategies in live migrations. We first describe the live migration framework of multiple virtual machines with resource reservation technology. Then we perform a series of experiments to investigate the impacts of different resource reservation methods on the performance of live migration in both the source machine and the target machine. Additionally, we analyze the efficiency of the parallel migration strategy and the workload-aware migration strategy. Metrics such as downtime, total migration time, and workload performance overheads are measured.

2. Related Work

Resource allocation is one of the major aspects of cloud computing. Dynamic resource allocation has its own challenges that have to be addressed while implementing it, and many techniques have come up to deal with them. The cloud comprises data center hardware and software [1]. The resource allocation concept has been analyzed in many computing areas, such as grid computing and operating systems. Prediction plays a very important role during the process of resource allocation, and the prediction of CPU utilization for upcoming demand has been studied in the literature. A prediction method based on a Radial Basis Function (RBF) network was proposed by Y. Lu et al. for the purpose of predicting user access demand; they also came up with the concept of multi-scaling. The statistical expected value can be obtained by the service provider using the methods of [13][14]. A Content Delivery Network (CDN) and the central resources of the data center should be used to represent and include multi-scaling for Video On Demand (VOD) with guaranteed Quality of Service (QoS) [1]. There is a need to consolidate the cost involved in the streaming process while optimizing the user's experience. There are different types of resource provisioning plans, provided by cloud providers, that can be chosen as per requirements [15]. The two common types are the on-demand plan and resource reservation; analyzing both types, resource reservation is said to be less expensive compared to the on-demand plan.

3. System Architecture

The components of the system architecture include:
• Service Provider
• Cloud server
• Android User

Service Provider: The login operation is the main operation carried out at the service provider module. The login into the
3. System Architecture

The components of the system architecture include:
 Service Provider
 Cloud server
 Android User

Service Provider: The login operation is the main operation carried out at the service provider module. The login into the cloud server is achieved by the cloud administrator, and it involves many crucial aspects of the entire process. On successful completion of the login process, the browsing of multimedia files can be achieved. The details regarding the sent file can be obtained along with the memory balance details.

Fig1: System Architecture

Cloud server: Here the major operation is viewing all the multimedia files along with the memory maintenance, which retrieves information on the memory that has been utilized and also the remaining space. The cloud server also contains the prediction details, which provide predictive information about the future space requirements of the files based on the previous statistical data. While the media files are viewed, the service provider/cloud administrator has to check the current amount of space that is available for further addition of a new file. If there exists a condition where the available space is less than the space required, then comes the concept of the migration list. The data which does not fit into the available space is placed on the migration list and can later be added to the cloud server; until then the file is stored in a different place. That is, all the data which is ready to be added, but temporarily cannot be added due to insufficient space, is parked there. The migration list can thus be defined as the temporary space where data is stored when the memory is not sufficient for the completion of the addition process. After the memory is expanded with sufficient space for the addition of the new file, the re-migration process takes place, i.e. the data that is placed on the migration list is added to the server.
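As a rough illustration of the migration-list behaviour just described, the following sketch models a server that parks files it cannot currently fit and re-migrates them once space is expanded. The class and method names are our own illustrative choices, not the paper's implementation.

class MigrationServer:
    def __init__(self, capacity):
        self.capacity = capacity   # space currently available on the server
        self.stored = []           # files already on the cloud server
        self.pending = []          # the migration list: (name, size) pairs

    def add_file(self, name, size):
        used = sum(s for _, s in self.stored)
        if used + size <= self.capacity:
            self.stored.append((name, size))
        else:
            self.pending.append((name, size))  # park it on the migration list

    def expand_and_remigrate(self, extra_space):
        # After the memory is expanded, re-migrate whatever now fits.
        self.capacity += extra_space
        waiting, self.pending = self.pending, []
        for name, size in waiting:
            self.add_file(name, size)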
Android user: The android user should first register with the cloud by providing the necessary details. Once the registration process is completed successfully, the user can log in with the corresponding username and password. The available files can be viewed, and other file operations can also be carried out. On selection of a particular file, the user can view the rank of that particular file along with the comments provided by other users. In this way the user can also provide comments and a rank for the particular multimedia file.

Fig2: Data Flow

4. Conclusion

Live migration of virtual machines is an efficient technology used to implement energy saving and load balancing in virtualized cloud computing data centers. In this paper, we study the live migration efficiency of multiple virtual machines from an experimental perspective and investigate different resource reservation methods in the live migration process, as well as other complex migration strategies such as parallel migration and workload-aware migration. Experimental results show that: (1) live migration of virtual machines brings some performance overheads; (2) the performance overheads of live migration are affected by memory size, CPU resource, and the workload types; (3) resource reservation in the target machine is necessary to avoid migration failures; (4) adequate system resources in the source machine allow a larger number of parallel migrations and obtain better migration efficiency; (5) the workload-aware migration strategy can efficiently improve the performance of the migrated workload. Based on the experimental discoveries, three optimization methods, namely optimization in the source machine, parallel migration of multiple virtual machines, and the workload-aware migration strategy, are proposed to improve the migration efficiency. Future work will include designing and implementing an intelligent live migration mechanism to improve the live migration efficiency in the multiple virtual machines scenario, and studying the migration strategies as an optimization problem using mathematical modelling methods.

References
i. M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.
ii. C. Waldspurger, "Memory resource management in VMware ESX server," ACM SIGOPS Operating Systems Review, vol. 36, no. SI, p. 194, 2002.
iii. P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in Proceedings of the nineteenth ACM symposium on Operating systems principles, 2003, p. 177.

iv. D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L.
Youseff, and D. Zagorodnov, “The eucalyptus open-source cloud-computing
system,” in Proceedings of the 2009 9th IEEE/ACM International Symposium on
Cluster Computing and the Grid-Volume 00, 2009, pp. 124–131.
v. K. Ye, X. Jiang, D. Ye, and D. Huang, “Two Optimization
Mechanisms to Improve the Isolation Property of Server Consolidation in
Virtualized Multi-core Server,” in Proceedings of 12th IEEE International
Conference on High Performance Computing and Communications, 2010, pp.
281–288.
vi. C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live migration of virtual machines," in Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2, 2005, p. 286.
vii. M. Nelson, B. Lim, and G. Hutchins, “Fast transparent migration for
virtual machines,” in Proceedings of the annual conference on USENIX Annual
Technical Conference, 2005, p. 25.
viii. D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L.
Youseff, and D. Zagorodnov, “The eucalyptus open-source cloud-computing
system,” in Proceedings of the 2009 9th IEEE/ACM International Symposium on
Cluster Computing and the Grid-Volume 00, 2009, pp. 124–131.

ix. M. Nelson, B. Lim, and G. Hutchins, "Fast transparent migration for virtual machines," in Proceedings of the annual conference on USENIX Annual Technical Conference, 2005, p. 25.
x. Y. Luo, B. Zhang, X. Wang, Z. Wang, Y. Sun, and H. Chen, “Live and
incremental whole system migration of virtual machines using block-bitmap,” in
Proceedings of the IEEE International Conference on Cluster Computing, 2008,
pp. 99–106.
xi. H. Liu, H. Jin, X. Liao, L. Hu, and C. Yu, “Live migration of virtual
machine based on full system trace and replay,” in Proceedings of the 18th ACM
international symposium on High performance distributed computing, 2009, pp.
101–110.
xii. M. Hines and K. Gopalan, “Post-copy based live virtual machine
migration using adaptive pre-paging and dynamic self-ballooning,” in
Proceedings of the 2009 ACM SIGPLAN/ SIGOPS international conference on
Virtual execution environments, 2009, pp. 51–60.
xiii. K. Ye, J. Che, X. Jiang, J. Chen, and X. Li, “vTestkit: A Performance
Benchmarking Framework for Virtualization Environments,” in Proceedings of
fifth ChinaGrid Annual Conference, 2010, pp. 130–136.
xiv. D. Niu, Z. Liu, B. Li, and S. Zhao, “Demand forecast and performance
prediction in peer-assisted on-demand streaming systems,” in Proc. of IEEE
Infocom Conference, pp. 421–425, 2011.
xv. D. Niu, H. Xu, B. Li, and S. Zhao, “Quality- Assured Cloud
Bandwidth Auto-Scaling for Video-on-Demand Applications,” in Proc. of IEEE
Infocom Conference, pp. 421–425, 2012.
xvi. E. White, M. O'Gara, P. Romanski, P. Whitney, "Cloud Pricing Models," in Cloud Expo: Article, whitepaper, 2012. http://java.syscon.com/node/2409759?page=0,1.


Child Tracking in School Bus Using GPS and RFID


Shilpitha Swarna, Prithvi B. S., Veena N.
Dept of CSE, SVIT, Bangalore

Abstract: Many researches on real-time vehicle tracking have been conducted; likewise, tracking a school bus is important, since sending a child to school by bus can be nerve-wracking for parents. It is important to know whether their child has boarded the right bus, is safe on it, and has reached the correct destination (i.e., school) on time. According to statistics from the World Health Organization (WHO), in India about 41% of children die due to lack of road transportation safety. This paper presents a reliable real-time tracking system using the global positioning system (GPS), global system for mobile communication (GSM) services and RFID or smartcard, which keeps real-time track of the child at all times. Parents can log in from their mobile or the web to track the bus, to know whether the bus is running late, and to minimize the time children wait at the bus stop, leaving less time for them to be exposed to criminal predators, bad weather or any other dangerous condition.

Key-Words: tracking, global positioning system, global system for mobile communication, RFID, Google map API.

2. Introduction

There is a huge demand for tracking devices, which are actually considered life saving devices. These devices keep track of children and update their parents with real-time tracking information. During disasters, these systems help the parents to track their children's location. According to Hind [1], tracking provides several services, such as recovering stolen assets and keeping track of employee behavior in the workplace environment.

Parents must know about child safety in the school bus; sending a child to school by bus is nerve-wracking for parents. The parents should know whether the child has boarded the bus, safely reached the school, and found the right bus to reach home on time. Keeping real-time track of children using the GPS installed in the school bus lets parents be a bit more relaxed about their safety while travelling in the school bus; installing such safety components makes bus and child tracking easier and safer, adds accountability, and increases convenience and savings.

Providing safety measures for their children while travelling in the school bus is an important concern for parents. By using GPS tracking, GSM services and RFID or smartcard for real-time tracking, parents can log in anywhere to find the location of the bus and child on their mobiles or the web.

3. System Implementation

The GPS receiver receives the location coordinates (longitude, latitude, speed, device data) from the satellite at a resolution (frequency) based on the user requirement, e.g., 10 readings/minute; the NMEA-format output contains raw data carrying a large amount of information.

The microcontroller processes the raw data according to the algorithm present. The location coordinates instruct the GSM modem to provide serial communication to the server (database).

Figure 1: Block diagram of School bus tracking

Figure 1 shows the overview of the tracking system installed in the school bus.

RFID is used to detect the child's login and logout from the bus. There are two types of RFID, namely active and passive. The active RFID reader is centralized in the bus; the passive RFID reader is installed at the door.

Figure 2: RFID integrated GPS tracking device.

The raw GPS data collected from the device installed in the school bus is sent to the NMEA server at the

reconfigured timings (i.e., 10 readings/minute).

Figure 3: Tracking Architecture

Figure 3 shows the tracking architecture, in which the GPS device installed in the school bus provides raw GPS data to the NMEA server; the server parses the input raw data, and the output is sent to the application server. This data is computed, and based on the SMS logic a message is triggered to the parents. The parents can log in with mobiles or the web to access this location information.

Figure 4: View of location in mobile

To access their child's location information, the parents are given an ID and password. If the driver exceeds the speed limit, an alert is generated to the administrator.

Figure 6: NMEA of GPS receiver.

The NMEA server gets the raw GPS data (in unreadable, encrypted form) as input and produces parsed output.
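As an illustration of the parsing step performed at the NMEA server, the sketch below extracts latitude, longitude and speed from a standard $GPRMC sentence; the field layout follows the NMEA 0183 convention, but the function names are our own, not the system's actual code.

def _to_decimal(value, hemisphere):
    # NMEA encodes angles as (d)ddmm.mmmm; convert to decimal degrees.
    raw = float(value)
    degrees = int(raw // 100)
    decimal = degrees + (raw - degrees * 100) / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal

def parse_gprmc(sentence):
    """Return (lat, lon, speed_kmh) from one $GPRMC sentence, or None."""
    fields = sentence.strip().split(",")
    if not fields[0].endswith("GPRMC") or fields[2] != "A":  # 'A' = valid fix
        return None
    lat = _to_decimal(fields[3], fields[4])
    lon = _to_decimal(fields[5], fields[6])
    speed_kmh = float(fields[7]) * 1.852   # knots -> km/h
    return lat, lon, speed_kmh

# e.g. parse_gprmc("$GPRMC,083559.00,A,1255.0023,N,07738.0037,E,12.5,054.7,181114,,,A*65")

A comparison of the extracted speed against the configured limit could then raise the administrator alert mentioned above.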

4. Conclusion and Future work

RFID and GPS tracking is designed and implemented in the school bus; parents can track their children's location and are provided with more reliable information about the boarding and departure of the child. The location coordinates (latitude, longitude) are converted using the Google Maps API.

This system's performance relies on GPS accuracy, which depends on weather conditions and satellite coverage; these factors can delay the tracking messages provided to the parents about their ward.

REFERENCES

i. Hind Abdalsalam Abdallah Dafalla, "Design and implementation of an accurate real time GPS tracking system".
ii. What is PHP, URL: http://www.techrepublic.com/, accessed on 3 April 2011.
iii. Michael Kofler, "The Definitive Guide to MySQL 5, third edition", New York, 2005, pages 3–7.
iv. Official Google Map API website, URL: http://code.google.com/apis/maps/faq.
v. OZEKI NG SMS gateway website, URL: http://www.ozekisms.com/index.php, February 2011.


Risk Mitigation by Overcoming Time Latency during Maternity - An IoT


Based Approach
Sanjana Ghosh, R. Valliammai, Kiran Babu T.S., Manoj Challa
sagh12cs@cmrit.ac.in, vamr12cs@cmrit.ac.in, kiran.ts@cmrit.ac.in, manoj.c@cmrit.ac.in

Abstract: Coming along with the recent development of IoT (Internet of Things), wireless devices have invaded medical science with a broad spectrum of possibilities. Along with improving the quality of life of patients, wireless technology enables patients to be monitored remotely during emergencies and provides them health information, reminders, and support, potentially extending the reach of health care by making it available anytime. The wireless sensor networks are inserted into the vaginal canal and can detect electrical signals associated with uterine contractions, sensing that labor has begun. These sensors detect signals directly from the specific points in the body where they originate and are responsible for sensing uterine contractions, even during preterm labor. These sensors are responsible for transmitting information to cellphones, which in turn alert the maternity centers so that the patient receives apt treatment.

Keywords: Internet of Things, wireless technology

I. INTRODUCTION

The Internet of Things refers to a wireless connectivity medium between objects. The Internet of Things is not only a global network for people to communicate with one another, but it is also a platform where devices communicate electronically with each other and the world around them. From anytime, anyplace connectivity for anyone, we will now have connectivity for anything.

Embedding short-range mobile transceivers into a wide array of additional gadgets, enabling new forms of communication between people and things and between things themselves, is the essence of IoT. The term "Internet of Things" has been formulated to describe a number of technologies and research disciplines that enable the Internet to reach out into the real world of physical objects.

In the current scenario it is being realized that the integration of small microcontrollers with sensors can result in the creation of extremely useful and powerful devices, which can be used as integral parts of sensor networks. These devices are termed sensor nodes. Nodes are able to communicate with each other over different protocols.

In a society where a joint family system is not prevailing, particularly in nuclear families, an emergency can arise at any time. A typical case is a working couple where the husband is away at work and the wife, alone at home, has labor pain and has to be put in contact with the hospital for an emergency.

For this purpose smart sensors, which combine a sensor and a microcontroller, make it possible to couple the power of the IoT for healthcare by accurately monitoring, measuring and analyzing a variety of health status indicators.

II. RELATED WORKS

1. The SureCALL Labor Monitor technology is used for calling the onset of labor. Uterine electromyography (EMG) labor monitoring detects uterine muscle contractions from abdominal recordings of electrical signals generated in the uterus; uterine EMG activity can be measured by abdominal surface electrodes.

Tocodynamometers are external pressure measurement devices which are used to measure the contractions of the uterus and are the primary type of external monitor. The patient wears the device on a tightly attached belt, which must maintain a constant pressure on a pressure-sensing device. As the uterus contracts, a strain gauge measures the pressure exerted by the abdomen on the device.

Unfortunately, the tocodynamometer is recognized to be lacking in accuracy and in comfort. Many other factors affect the pressure measurement, such as the amount of fat, instrument placement and uterine wall pressure; body movements, gastric activity and other non-labor-induced stresses on the device can be misinterpreted as labor contractions. As a result, the tocodynamometer crudely and indirectly measures uterine contractions and therefore cannot identify true labor contractions. The devices themselves are also uncomfortable and inconvenient for the patient. They are not suitable for ambulatory use and have not proven to be very effective for home uterine monitoring. Cost, instrument reliability and patient mobility are also issues.

The SureCALL Wireless Remote Model couples wirelessly to a cellular phone within range, where a customized application transmits patient recordings to a clinic or monitoring service.

2. This device is developed to detect a woman's likelihood of delivering a premature baby. The CervoCheck is a small ring-like structure embedded with sensors which pick up electrical signals associated with uterine contractions. The ring is designed such that it is easy to embed in a woman's vaginal canal at a physician's office or hospital.

Babies born before 37 weeks gestation are considered preterm, while the normal gestation period is 40 weeks. By detecting preterm contractions with greater accuracy, this system could allow doctors to take steps at an early stage to prevent preterm births.

This new system picks up early signs that a woman is going into labor too soon. The device has only been tested on animals at this point. It can prolong the pregnancy by almost six weeks.

The ring is made up of medical grade biocompatible silicone elastomer. Sensors are embedded within the ring and are designed to pick up electrical signals that are associated with uterine contractions. These sensors detect signals directly from the places in the body where they originate, and not through the abdominal wall.

III. EMBEDDED APPLICATION

The wireless sensor nodes containing the sensors for sensing the muscle contractions are inserted into the vaginal canal and can detect electrical signals associated with uterine wall contractions, a sign that labor has begun. Wireless sensor nodes are typically very small electronic devices, equipped with a limited power source.

An embedded system is a part of a product that has a specific task to perform or a dedicated function. These systems are not directly interacted with or controlled by end users. In any embedded application, the hardware given to a system is unique and the software has to be specially written to make optimum use of the available resources.

In this embedded application, contraction signals in the uterus are detected by the sensor node. These analog signals are converted into digital signals by the ADC converter and passed on to the microcontroller. The digital signal is checked for its validity within the microcontroller. Validity includes checking against the specific threshold for contractions.

Once the signal is validated, the contraction processing is done, and if this processing is successful, the microcontroller activates the GSM modem. The GSM modem is disabled until a confirmed contraction signal is processed; the GSM radio signals are kept disabled by design, so that harmful radio signals need not be kept ON, for the safety of the foetus.

The GPS module has the present position latitude and longitude. The present position address is taken from the Google map, which is stored in the external memory. The microcontroller selects the current address. This current address is processed by the microcontroller for the SMS constructor module. The default message and the location are combined and sent to the GSM modem.

The GSM modem is interfaced with the microcontroller for SMS communication with cellphones. The default message and the current location are combined and sent to the GSM modem. The numbers stored on the GSM SIM are fetched by the microcontroller. Once the emergency numbers are fetched, an SMS is sent to these specific emergency numbers on the cellphones.
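The validation-and-alert flow described in this section can be sketched as follows. The threshold value and the helper names are illustrative assumptions, not details taken from the actual firmware.

CONTRACTION_THRESHOLD = 0.6  # assumed normalized amplitude threshold

def is_valid_contraction(samples, threshold=CONTRACTION_THRESHOLD):
    # A digitized reading counts as a contraction only above the threshold.
    return max(samples) >= threshold

def process_reading(samples, gps_fix, emergency_numbers, send_sms):
    """Validate one reading; on success, alert every stored number."""
    if not is_valid_contraction(samples):
        return  # GSM modem stays disabled: no radio emission near the foetus
    lat, lon = gps_fix
    text = "Labor contractions detected. Location: %.5f,%.5f" % (lat, lon)
    for number in emergency_numbers:  # numbers fetched from the GSM SIM
        send_sms(number, text)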
IV. PROPOSED DESIGN

To find signs of preterm labor, physicians have so long relied on a tocodynamometer, but this device is not effective and accurate at detecting preterm labor very early in a pregnancy.

FUTURE ENHANCEMENT

The implanted wireless sensor networks can be enhanced to detect the heart rate of the foetus and the pulse rate of the mother. This information can be transmitted to the cellphones and the healthcare centers.

CONCLUSION

Sensor based networks coupled with IoT are the latest advancement in technology, being explored to its fullest extent in the current times. Medical applications based on the Internet of Things are still research projects with good potential for utilization. A great number of medical scenarios are covered by these applications, which opens a wide spectrum of benefits for medical practitioners. The solution proposed for this problem can be a game changer in achieving high success rates in timely and safe deliveries. This can play a vital role in mitigating risks and overcoming the time lapse during an emergency.

REFERENCES

i. CervoCheck, developed by Johns Hopkins graduates Karin Hwang, Chris Courville, Deepika Sagaram and Rose Huang.
ii. Hunhu Healthcare: Raising the bar in fall detection technology.
iii. M. Al-Jemeli and F. A. Hussin, "An Energy Efficient Cross-Layer Network Operation Model for IEEE 802.15.4-Based Mobile Wireless Sensor Networks".
iv. H. Furtado and R. Trobec, "Applications of wireless sensors in medicine," MIPRO, 2011 Proceedings of the 34th International Convention.
v. Stanislava Stanković, "Medical Applications Based on Wireless Sensor Networks," 2005.


Predicting Future Resources for Dynamic Resource Management using


Virtual Machines in Cloud Environment
Vidya Myageri1, Mrs. Preethi S.2
1Dept. of Computer Science & Engg., 2Dept. of Information Technology
Cambridge Institute of Technology, K.R. Puram, Bangalore, India
Email id: gmvidya_viddu@yahoo.com, preethi.srinivas2002@gmil.com

ABSTRACT: Cloud computing has become an optimal solution for business customers to maintain and promote their business needs to clients via the internet. Nowadays cloud computing allows the business customer to scale their resource usage up and down based on needs. In order to achieve resource multiplexing in cloud computing, recent research introduced dynamic resource allocation through virtual machines. Existing dynamic approaches follow unevenness procedures to allocate the available resources based on the current workload of systems. Unexpected demand for a huge amount of resources in the future may cause an allocation failure or system hang problem. In this paper we present a new systematic approach to predict the future resource demands of the cloud from past usage. This approach uses the resource prediction algorithm to estimate future needs, to avoid the allocation failure problem in cloud resource management, and the skewness algorithm to determine the unevenness in the multi-dimensional resource utilization of a server.

Keywords: Dynamic resource allocation, Cloud computing, Resource Prediction Algorithm, Virtual machine migration, Load balancing, Skewness, Green computing, Hotspot migration and Cold spot migration.

INTRODUCTION

Cloud computing is a fast growing technology that is currently being widely studied [1]. It has moved computing and data away from desktop and portable PCs into large data centers [2]. It has the capability to harness the power of the Internet and the wide area network (WAN) to use resources that are available remotely, thereby providing a cost effective solution to most real life requirements [3][4]. The majority of business customers are interested in cloud computing, and they have started migrating their applications to the cloud environment to promote their business operations to end clients with low investments and high availability. Due to this increased adoption, Resource Management in Cloud (RMC) has become an important research aspect in this area. Earlier approaches [5][6] used an evenness procedure in resource distribution to allocate the available resources among the running applications. This approach may lead to resource overflow, due to a higher amount of resource allocation than required, and resource underflow, due to a lower amount of resource allocation than required. The resource needs of a running application always change from time to time, depending on the number of live clients.

In order to overcome the resource overflow and underflow problems of evenness distribution, recent research introduced dynamic resource management [7, 8] with virtual systems. These systems consider the available resources at the server and allocate them to applications based on application workload requirements. To achieve this, dynamic resource management systems follow unevenness algorithms and on-demand resource allocation strategies. This approach manages the resources dynamically in an efficient manner with virtualization of cloud systems. Dynamic mapping of virtual requirements to physical resources will also help to avoid SLA violations [9] in the cloud environment. Sometimes an unexpected demand for a huge amount of resources in the future may cause an allocation failure or system hang problem.

In order to mitigate these problems, in this paper we present a new systematic approach to predict the future resource demands of the cloud from past usage. This approach analyzes the resource allocation logs of the virtual server and the SLA agreements, and follows the demand prediction algorithm to estimate future needs, to avoid the allocation failure problem in cloud resource management. Our approach uses the present and past statistics to predict the future requirements in an efficient manner. To do this, we propose two different methodologies in this paper: (i) hours-bounded and (ii) days-bounded resource prediction techniques. By integrating the results of these methodologies, our approach assesses the reliable resource requirements of the future. Experimental results support that our strategy is more scalable and reliable than existing approaches.

The rest of the paper is organized as follows: Section 2 discusses related work, followed by the proposed system design, which consists of the load balancing cloud architecture, the load prediction algorithm and the skewness algorithm, and finally results and future scope.

RELATED WORK

Cloud computing is an emerging computing technology that is rapidly consolidating itself as the next big step in the development and deployment of an increasing number of distributed applications. Cloud computing nowadays becomes quite popular among a community of cloud users by offering a variety of resources. Cloud computing platforms, such as those provided by Microsoft, Amazon, Google, IBM, and Hewlett-Packard, let developers deploy applications across computers hosted by a central organization. These applications can access a large network of computing resources that are deployed and managed by a cloud computing provider.

In cloud platforms, resource allocation (or load balancing) takes place at two levels. First, when an application is uploaded to the cloud, the load balancer assigns the requested instances to physical computers, attempting to balance the computational load of multiple applications across physical

computers. Second, when an application receives multiple incoming requests, these requests should each be assigned to a specific application instance to balance the computational load across a set of instances of the same application. For example, Amazon EC2 [15] uses elastic load balancing (ELB) to control how incoming requests are handled. Application designers can direct requests to instances in specific availability zones, to specific instances, or to instances demonstrating the shortest response times.

Elnozahy et al. [11] have investigated the problem of power efficient resource management in a single web-application environment with fixed SLAs (response time) and load balancing handled by the application. As in [13], two power saving techniques are applied: switching the power of computing nodes on/off and Dynamic Voltage and Frequency Scaling (DVFS). The main idea of the policy is to estimate the total CPU frequency required to provide the necessary response time and determine the optimal number of nodes. However, the transition time for switching the power of a node is not considered. Only a single application is assumed to run in the system and, like in [10], the load balancing is supposed to be handled by an external system. The algorithm is centralized, which creates a single point of failure (SPF) and reduces scalability. Despite the variable nature of the workload, unlike [11], the resource usage data are not approximated, which results in potentially inefficient decisions due to fluctuations. Nathuji and Schwan [14] have studied power management techniques in the context of virtualized data centers, which had not been done before.

Besides hardware scaling and VM consolidation, the authors have introduced and applied a new power management technique called "soft resource scaling". The idea is to emulate hardware scaling by providing less resource time for a VM using the Virtual Machine Monitor's (VMM) scheduling capability. The authors found that a combination of "hard" and "soft" scaling may provide higher power savings due to the limited number of hardware scaling states. The authors have proposed an architecture where the resource management is divided into local and global policies. At the local level the system leverages the guest OS's power management strategies. However, such management may turn out to be inefficient, as the guest OS may be legacy or power unaware.

The provision of resources may be made with various virtualization techniques. This may ensure a higher throughput and usage than existing cloud resource services. Future work is required to deal with the evolutionary techniques that will further result in better resource allocation, leading to improved resource utilization. Existing resource allocation strategies have the following limitations:
a) Since users rent resources from remote servers for their purpose, they don't have control over their resources.
b) A migration problem occurs when the users want to switch to some other provider for the better storage of their data. It's not easy to transfer huge data from one provider to the other.
c) More and deeper knowledge is required for allocating and managing resources in the cloud, since all knowledge about the working of the cloud mainly depends upon the cloud service provider.

Hence the existing systems have limitations such as migration of resources, overloading at the server, and migrating only the working set of an idle VM. To overcome these limitations, this paper presents the skewness algorithm, which uses green computing technologies, and the load prediction algorithm, which uses past resource usage to predict the resources for the present working environment.

1. SYSTEM DESIGN

A. System architecture

The proposed system presents the design and implementation of an automated resource management system that achieves a good balance. The proposed system makes the following three contributions:
a) Develops a resource allocation system that can avoid overload in the system effectively while minimizing the number of servers used.
b) Introduces the concept of "skewness" to measure the uneven utilization of a server. By minimizing skewness, we can improve the overall utilization of servers in the face of multi-dimensional resource constraints.
c) Designs a load prediction algorithm that can capture the future resource usages of applications accurately without looking inside the VMs. The algorithm can capture the rising trend of resource usage patterns and help reduce the placement churn significantly.

Fig 1: System architecture

Fig. 1 represents the architecture of the dynamic resource allocation for the cloud computing environment, which consists of N servers; each server hosts two virtual machines (VMs) connected to the VM scheduler, which is connected to the internet to distribute the resources dynamically to the clients, and the clients access resources through the internet. A virtual machine (VM) is a software implementation of a computing environment in which an operating system or program can be installed and run. The VM Scheduler is invoked periodically and receives the resource demand history of the VMs, the capacity and the load history of the servers, and the current layout of VMs on servers.

B. Skewness Algorithm

The paper introduces the concept of skewness to quantify the unevenness in the utilization of multiple resources on a server. By minimizing the skewness, we can combine different types of workloads nicely and improve the overall utilization of server resources.

Let n be the number of PMs and m be the number of VMs in the system, respectively. The number of resources such as CPU, memory, I/O, network, etc. that should be considered is usually a small constant. Thus the calculation of the skewness and the temperature metrics for a single server takes a constant amount of time. During load prediction, we need to apply the FUSD algorithm to each VM and each PM; the time complexity is O(n + m). We define the resource skewness of a server p as

    skewness(p) = sqrt( Σ_i ( r_i / r̄ − 1 )² )

where r_i is the utilization of the i-th resource and r̄ is the average utilization of all resources for server p.

In practice, we do not consider all the types of resources, because not all are performance critical; in the above calculation we only need to consider bottleneck resources. By doing this we can improve the overall resource utilization of the server, combining different types of workload while minimizing the skewness.
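A direct implementation of this metric is short; the sketch below assumes utilizations are normalized fractions and that the caller passes in only the bottleneck resources discussed above.

from math import sqrt

def skewness(utilizations):
    """skewness(p) = sqrt(sum_i (r_i / r_mean - 1)^2) for one server p."""
    r_mean = sum(utilizations) / len(utilizations)
    return sqrt(sum((r / r_mean - 1.0) ** 2 for r in utilizations))

# e.g. skewness([0.8, 0.2, 0.5]) ~ 0.85, while skewness([0.5, 0.5, 0.5]) == 0.0,
# so a balanced mix of workloads scores lower.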
C. Hot spot migration

Hot spot migration reduces the temperature of the hot spots to less than or equal to the warm threshold. The nodes in hot spots are sorted by quicksort in descending order. The VM with the highest temperature should be migrated away first. The destination is decided based on the least cold node. After every migration the status of each node is updated. This procedure continues until all hot spots are eliminated.

The VM removed from the identified hot spot is the one that can reduce the skewness of that server the most. For each VM in the list, if a destination server can be found to accommodate it, then that server must not become a hot spot after accepting this VM. Among all such servers, we select the one whose skewness can be reduced the most by accepting this VM. Note that this reduction can be negative, which means we select the server whose skewness increases the least. If a destination server is found, then the VM can be migrated to that server and the predicted load of the related servers is updated. Otherwise, we move on to the next VM in the list and try to find a destination server for it. As long as a destination server is found for any of its VMs, this run of the algorithm is considered a success and we move on to the next hot spot. Note that each run of the algorithm migrates away at most one VM from the overloaded server. This does not necessarily eliminate the hot spot, but at least reduces its temperature. If it remains a hot spot in the next decision run, the algorithm will repeat this process.

It is possible to design the algorithm so that it can migrate away multiple VMs during each run. But this can add more load on the related servers during a period when they are already overloaded. It was decided to use this more conservative approach and leave the system some time to react before initiating additional migrations. Two scenarios are considered in hot spot mitigation. In the first scenario, the VMs running in identified hot spots are migrated to warm-spot servers which will not become hot by accommodating the VMs. In the second scenario, if sufficient warm spots are not available to accommodate the VMs in the hot spot, some load is migrated to the nodes in the cold spot as well to mitigate the hot spots.
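The destination-selection rule just described can be sketched as follows: among the servers that would not become hot after accepting the VM, pick the one whose skewness drops the most (the reduction may be negative). The predicate, the helper function and the server attribute are assumed interfaces, not the paper's code.

def pick_destination(vm, servers, would_become_hot, skewness_after):
    best, best_reduction = None, None
    for server in servers:
        if would_become_hot(server, vm):
            continue                       # never create a new hot spot
        # server.skewness is the current value; skewness_after simulates
        # accepting the VM. Larger reduction (possibly negative) wins.
        reduction = server.skewness - skewness_after(server, vm)
        if best is None or reduction > best_reduction:
            best, best_reduction = server, reduction
    return best  # None means: try the next VM in the list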
D. Predicting future resource need

Recent cloud architectures face a resource management problem, due to unexpected requirements for huge resources (CPU time, memory, network, etc.) arriving in an asynchronous manner. Current dynamic resource allocation methodologies have the capability to map virtual system resources to physical systems dynamically, depending on the workload. This process adjusts the available resources with the help of hot spot and cold spot migration among the physical machines.

Dynamic resource allocation may fail under some circumstances, when suddenly there is an unexpected huge resource requirement for a physical machine. To address the above problem, in this paper we introduce the "Predicting Future Resource Requirement Framework (PFRRF)" to assess future resource needs. This framework is an extension to the dynamic resource allocation and management architecture. PFRRF considers the log files of the resource allocation system to analyze and estimate future requirements.

To achieve this, our framework uses the Resource Prediction Algorithm (RPA), which takes structured log file content as training data, and time-bounded methodologies for the decision making process in resource assessment. Semi-structured log file data is transformed into structured log files, which contain the periodic resource allocation charts (PRAC) for every physical machine running in the cloud environment. A PRAC contains the individual resource level mapping for each physical machine and the hot spot and cold spot thresholds. After that, the PRACs of every physical system are sorted on the date parameter and clustered individually at the physical system level. These individual clusters are given as input data to RPA to predict the future workload of every physical machine. After computing the physical machines' prediction results, resource migration is performed based on the resource overflow and underflow results. This migration helps to determine the additional resource requirement for the future, based on current and past log requirements. At the end, the sum of the additional requirements at every physical machine level gives the final requirement, which is to be added to the cloud resource pool to avoid future underflows and hanging problems.

E. RESOURCE PREDICTION ALGORITHM

Input: present & past resource usage chart
Output: future resource prediction chart (FRPC)
PM: physical machine
RUC: resource usage chart

foreach pm in cloud do
    RUC ← getUsageChart(pm)
    SRUC ← doUsageAnalysis(RUC)


    PRUC ← predictFutureRequirements(SRUC)
    if (PRUC <= THRESHOLD) setUnderFlowFlag()
    else setOverFlowFlag()
    wishList.addToWishlist(PRUC)
endforeach
FRPC = doMigration(wishList)
return FRPC
End

First, RPA considers every physical machine in the cloud and generates the resource usage chart (RUC) for every machine based on the cloud resource consumption log file. Based on the RUC data, RPA creates the SRUC to predict the future requirements, PRUC. If the requirements are less than the threshold, the resource underflow flag is set; otherwise the resource overflow flag is set, and the PRUC is added to the wish list. The migration method [12] takes the wish list as input and performs the mapping to return the FRPC.
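For clarity, the pseudocode and the description above can be rendered in Python as follows; the helper functions are left abstract and simply mirror the names used in the pseudocode, so this is a structural sketch rather than a working system.

def resource_prediction(cloud, threshold, get_usage_chart,
                        do_usage_analysis, predict_future, do_migration):
    wish_list = []
    for pm in cloud:                       # every physical machine
        ruc = get_usage_chart(pm)          # RUC from the consumption log
        sruc = do_usage_analysis(ruc)      # structured usage chart
        pruc = predict_future(sruc)        # predicted future requirement
        pm.underflow = pruc <= threshold   # below threshold: underflow flag
        pm.overflow = not pm.underflow     # otherwise: overflow flag
        wish_list.append(pruc)
    return do_migration(wish_list)         # FRPC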

2. SIMULATION AND RESULTS

In this section we discuss the performance of our framework and its comparison with other approaches to prove its efficiency. Our experiments mainly concentrate on the prediction of future resource requirements based on present and past usage details from server log files. In this case we observed three important resources: memory, processor and network.

In order to perform these experiments we selected a Unix-based private cloud hosting center which manages more than 20 applications of various technologies. On a pilot basis we took 8 applications to adopt our framework process separately. For the training data we observed the last 12 months of resource allocation charts from the log files. This cloud center already follows its own resource allocation technology for all physical machines running in the hosting center. Along with allocation accuracy checking, this testing also considers the SLA violations of every physical machine.

RPA evaluation: Every physical machine of the cloud hosting center has 8 GB of RAM, a Core i7 (2nd Gen) processor and 20 GB of hard disk. Apart from this, additional resources are available with the hosting center to allocate dynamically as per the requirements. We deploy 8 virtual machines mapped to the 8 physical machines which run the applications. Every five minutes the resource allocation is updated to adjust the resources among the physical machines, and the same is written to the log files.

We gave the last one year's resource allocation chart to the experimental cloud center. After analyzing this data, our framework predicted the next one month's resource requirements, hour-wise and day-wise, as applicable at the physical machine level. We compared the resources the framework predicted for the future with the resources consumed in the specific hours and days. These experimental results are shown in Table 1, and the accuracy of prediction is also represented in the graphs below.

Fig 2: Predicted vs. consumed memory utilization representation

Fig 3: Predicted vs. consumed CPU utilization representation

Fig 4: Predicted vs. consumed Network utilization representation

3. CONCLUSION

In this paper, we presented the Predicting Future Resource Requirement Framework (PFRRF) to assess future resource needs. This framework is an extension to the dynamic resource allocation and management architecture. The review shows that dynamic resource allocation is a growing need of cloud providers, serving more users with fewer systems. Based on the changing demand, the proposed system multiplexes virtual to physical resources. The system uses the skewness metric to mix VMs with different resources.

The proposed algorithm achieves overload avoidance by predicting future resource needs based on present and past allocation data from the resource log files, and by using green computing technology we can turn off the idle servers. The skewness algorithm supports load balancing as well as green computing.

We use the Resource Prediction Algorithm (RPA) to assess the future need effectively. Experiments demonstrate the prediction accuracy and the green computing of memory, CPU time and network resources in the physical machines of the cloud.

4. REFERENCES

i. Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia, "Above the clouds: A Berkeley view of cloud computing," UCB/EECS-2009-28.
ii. R. Buyya, R. Ranjan, and R. N. Calheiros, "Modeling and Simulation of Scalable Cloud Computing Environments and the CloudSim Toolkit: Challenges and Opportunities," Proc. of the 7th High Performance Computing and Simulation Conference (HPCS 09), IEEE Computer Society, June 2009.
iii. Nidhi Jain Kansal, "Cloud Load Balancing Techniques: A Step Towards Green Computing", IJCSI International Journal of Computer Science Issues, January 2012, Vol. 9, Issue 1, No 1, pp. 238–246, ISSN (Online): 1694-0814.
iv. R. P. Mahowald, Worldwide Software as a Service 2010–2014 Forecast: Software Will Never Be the Same, IDC, 2010.
v. R. Nathuji and K. Schwan, "VirtualPower: coordinated power management in virtualized enterprise systems," in Proc. of the ACM SIGOPS Symposium on Operating Systems Principles (SOSP'07), 2007.
vi. J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle, "Managing energy and server resources in hosting centers," ACM, New York, NY, USA, 2001, pp. 103–116.
vii. Atsuo Inomata, Taiki Morikawa, Minoru Ikebe, Sk. Md. Mizanur Rahman: Proposal and Evaluation of Dynamic Resource Allocation Method Based on the Load of VMs on IaaS (IEEE, 2010), 978-1-4244-8704-2/11.
viii. Fetahi Wuhib and Rolf Stadler: Distributed monitoring and resource management for large cloud environments (IEEE, 2011), pp. 970–975.
ix. Hadi Goudarzi and Massoud Pedram: Multi-dimensional SLA-based Resource Allocation for Multi-tier Cloud Computing Systems, IEEE 4th International Conference on Cloud Computing, 2011, pp. 324–331.
x. E. Pinheiro, R. Bianchini, E. V. Carrera, T. Heath, Load balancing and unbalancing for power and performance in cluster-based systems, in: Proceedings of the Workshop on Compilers and Operating Systems for Low Power, 2001, pp. 182–195.
xi. E. Elnozahy, M. Kistler, R. Rajamony, Energy-efficient server clusters, Power-Aware Computer Systems (2003), pp. 179–197.
xii. J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, R. P. Doyle, Managing energy and server resources in hosting centers, in: Proceedings of the 18th ACM Symposium on Operating Systems Principles, ACM, New York, NY, USA, 2001, pp. 103–116.
xiii. R. Nathuji, K. Schwan, VirtualPower: coordinated power management in virtualized enterprise systems, ACM SIGOPS Operating Systems Review 41 (6) (2007), pp. 265–278.
xiv. "Amazon elastic compute cloud (Amazon EC2)," http://aws.amazon.com/ec2/.


Pixel Based Approach of Satellite Image Classification


Rohith K. M., Dr. D. R. Shashi Kumar, Venu Gopal A. S.
Dept. of CSE, CiTech, Bengaluru -036
rohitgowda40@gmail.com, venudon02@gmail.com

Abstract--In this project a pixel-based approach for urban land cover classification from high resolution satellite images, using k-means clustering and ISODATA clustering, is presented. Pixel based image analysis comprises image segmentation, that is, clustering of pixels into homogeneous objects, and subsequent classification or labeling of the pixels; modeling based on the characteristics of the pixels is done using a MATLAB GUI model. When applied to a satellite image, the clustering approach involves two requirements. First, each group or cluster is homogeneous, i.e. examples that belong to the same group are similar to each other. Second, each group or cluster should be different from other clusters, i.e. examples that belong to one cluster should be different from the examples of other clusters. The algorithm was implemented in a MATLAB GUI model and was tested on remotely sensed images of different sensors, resolutions and complexity levels.

Keywords: pixel based approach, high resolution satellite image.

INTRODUCTION

Clustering algorithms for remote sensing images are usually divided into two categories: pixel-based and object-based approaches. Using pixel based clustering algorithms for high-resolution remote sensing images, one could often find the "pepper and salt" effect in the results because of the lack of spatial information among pixels. In contrast, object-based clustering algorithms are not based on spectral features of individual pixels but on image objects, i.e., segments. Consequently, in terms of semantics, the quality of image objects is heavily dependent on segmentation algorithms. In this letter, a novel clustering algorithm is proposed to detect geo-objects from high-spatial-resolution remote sensing images using both neighborhood spatial information and a probabilistic latent semantic analysis model (NSPLSA). The proposed algorithm is based neither on pixels nor on segments but on densely overlapped sub-images, i.e., rectangular regions, with prefixed size. The probabilistic latent semantic analysis model (PLSA), which is also called the aspect model, is employed to model all sub-images. Every pixel in each sub-image has been allocated a topic label. The cluster label of every pixel in the large satellite image is derived from the topic labels of the multiple sub-images which cover the pixel. Unsupervised clustering is a fundamental tool in image processing for geosciences and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. The problem of clustering points in multidimensional space can be posed formally as one of a number of well-known optimization problems, such as the Euclidean k-median problem, in which the objective is to minimize the sum of distances to the nearest center; the Euclidean k-center problem, in which the objective is to minimize the maximum distance; and the k-means problem, in which the objective is to minimize the sum of squared distances. Efficient solutions are known to exist only in special cases such as the planar 2-center problem. There are no efficient exact solutions known for any of these problems for general k, and some formulations are known to be NP-hard. Efficient approximation algorithms have been developed in some cases. These include constant factor approximations for the k-center problem, the k-median problem and the k-means problem. There are also approximation algorithms for the k-median and k-means problems, including improvements based on coresets. Work on the k-center algorithm for moving data points, as well as a linear time implementation of a 2-factor approximation of the k-center problem, has also been introduced. In spite of progress on theoretical bounds, approximation algorithms for these clustering problems are still not suitable for practical implementation in multidimensional spaces when k is not a small constant. This is due to very fast growing dependencies of the asymptotic running times on the dimension and/or on k. In practice, it is common to use heuristic approaches, which seek to find a reasonably good clustering but do not provide guarantees on the quality of the results. This includes randomized approaches, such as CLARA and CLARANS, and methods based on neural networks. One of the most popular and widely used clustering heuristics in remote sensing is ISODATA. A set of n data points in d-dimensional space is given, along with an integer k indicating the initial number of clusters and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specified optimization criterion, the algorithm is similar in spirit to the well-known k-means clustering method, in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant advantage of ISODATA over k-means is that the user need only provide an initial estimate of the number of clusters; based on various heuristics the algorithm may alter the number of clusters by either deleting small clusters, merging nearby clusters, or splitting large clusters. The algorithm will be described in the next section. As currently implemented, ISODATA can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. Our objective in this paper is not to provide a new or better clustering algorithm, but rather to show how computational geometry methods can be applied to produce a faster implementation of ISODATA clustering. There are a number of minor variations of ISODATA that appear in the literature. These

variations involve issues such as termination conditions, but they orthogonal dimensions are all attributes. The value of each
are equivalent in terms of their overall structure. We focus on a attribute of an example represents a distance of the example
widely used version, called isoclus, which will be presented in from the origin along the attribute axes. Of course, in order to
the next section. The running times of isodataand isoclustering use this geometry efficiently, the values in the data set must all
are dominated by the time needed to compute the nearest among be numeric (categorical data must be transformed into numeric
the k cluster centers to each of the n points. This can be reduced ones) and should be normalized in order to allow fair
to the problem of answering n nearest-neighbor queries over a set of size k, which naively would involve O(kn) time. To improve the running time, an obvious alternative would be to store the k centers in a spatial index such as a kd-tree. However, this is not the best approach, because k is typically much smaller than n, and the center points change constantly, which would require the tree to be constantly updated. Kanungo proposed a more efficient and practical approach: store the points, rather than the cluster centers, in a kd-tree. The tree is then used to solve the reverse nearest-neighbor problem, that is, for each center we compute the set of points for which this center is the closest. This method is called the filtering algorithm.
We show how to modify this approach for isoclustering. The modifications are not trivial. First, in order to perform the sort of aggregate processing that the filtering algorithm employs, it was necessary to modify the way in which the isoclustering algorithm computes the degree of dispersion within each cluster. In order to further improve execution times, we have also introduced an approximate version of the filtering algorithm. A user-supplied approximation error bound ε > 0 is provided to the algorithm, and each point is associated with a center whose distance from the point is no more than (1 + ε) times the distance to its true nearest neighbor. This result may be of independent interest, because it can be applied to k-means clustering as well.
The running time of the filtering algorithm is a subtle function of the structure of the clusters and centers, so rather than presenting a worst-case asymptotic analysis, we present an empirical analysis of its efficiency based on both synthetically generated data sets and actual data sets from a common application in remote sensing and geostatistics. As the experiments show, depending on the various input parameters (that is, dimension, data size, number of centers, etc.), the algorithm presented runs faster than a straightforward implementation of isoclustering by factors ranging from 1.3 to over 50. In particular, the improvements are very good for typical applications in geostatistics, where the data size n and the number of centers k are large and the dimension d is relatively small. Thus, we feel that this algorithm can play an important role in the analysis of geostatistical data and in other applications of data clustering.
The remainder of the paper is organized as follows. In the next section we describe a variant of ISODATA, called isoclustering, whose modification is the focus of this paper. In Section 3 we provide background on basic tools, such as the kd-tree data structure and the filtering algorithm, that will be needed in our efficient implementation of isoclustering.
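The heart of the filtering approach is a pruning test applied to each kd-tree cell: a candidate center z can be discarded for a cell when no point of the cell's bounding box can be closer to z than to the current best candidate z*. Because the squared-distance difference is linear in the query point, it suffices to test the single corner of the box most favorable to z. Below is a minimal NumPy sketch of this standard test; the function name and the demo values are illustrative, not taken from the paper.

import numpy as np

def can_prune(z_star, z, lo, hi):
    """Return True if candidate center z can be discarded for the kd-tree
    cell with axis-aligned bounding box [lo, hi]: every point of the box
    is at least as close to z_star as to z, so z cannot win in this cell.

    For any p in the box, dist^2(p, z) - dist^2(p, z_star) is minimized
    at the corner extremal in the direction u = z - z_star, so testing
    that one corner suffices."""
    u = z - z_star
    v = np.where(u > 0, hi, lo)  # corner of the box most favorable to z
    return np.sum((v - z) ** 2) >= np.sum((v - z_star) ** 2)

# Tiny check: inside the unit square, (0.5, 0.5) dominates (2, 2).
lo, hi = np.zeros(2), np.ones(2)
print(can_prune(np.array([0.5, 0.5]), np.array([2.0, 2.0]), lo, hi))  # True

Recursing over kd-tree cells and dropping pruned candidates until a single center remains for a cell is what gives the filtering algorithm its speed. For the approximate (1 + ε) variant described above, off-the-shelf libraries expose the same kind of guarantee: for example, SciPy's cKDTree.query accepts an eps argument and returns a neighbor whose distance is within a factor (1 + eps) of the true nearest distance.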
K-MEANS ALGORITHM
This algorithm takes as input a predefined number of clusters, the K of its name ("means" stands for an average: the average location of all the members of a particular cluster). When dealing with clustering techniques, one has to adopt the notion of a higher-dimensional space, i.e., a space in which the overall distances between records are computed over multiple attributes. The k-means algorithm is a simple, iterative procedure in which a crucial concept is that of the "centroid". A centroid is an artificial point in the space of records which represents the average location of a particular cluster; the coordinates of this point are the averages of the attribute values of all examples that belong to the cluster. The steps of the k-means algorithm are given below; a short code sketch follows.
(i) Randomly select k points (they can also be examples) to be the seeds for the centroids of the k clusters.
(ii) Assign each example to the centroid closest to it, forming in this way k exclusive clusters of examples.
(iii) Calculate new centroids of the clusters: for each cluster, average the attribute values of all examples belonging to it.
(iv) Check whether the cluster centroids have changed their coordinates. If yes, start again from step (ii); if not, cluster detection is finished and all examples have their cluster memberships defined.
Usually this iterative procedure of redefining centroids and reassigning the examples to clusters needs only a few iterations to converge. For a fuller discussion of cluster detection, see the references.
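As a concrete illustration of steps (i)-(iv), here is a minimal NumPy sketch; the function and variable names are illustrative, not from the paper, and no attempt is made at the kd-tree acceleration discussed above.

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means over an (n, d) array of examples X."""
    rng = np.random.default_rng(seed)
    # (i) pick k distinct examples at random as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # (ii) assign each example to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # (iii) recompute each centroid as the mean of its members
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # (iv) stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Usage: two well-separated blobs should be recovered as two clusters.
X = np.vstack([np.random.randn(50, 2) + 3, np.random.randn(50, 2) - 3])
labels, centroids = kmeans(X, k=2)

Note that the naive assignment step above costs O(kn) distance computations per iteration, which is exactly the cost the filtering algorithm is designed to reduce.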
To summarise, clustering techniques are used when there are natural groupings in a data set; the clusters should then represent groups of items that have much in common. Creating clusters prior to applying some other data mining technique can reduce the complexity of the problem by partitioning the data space: the partitions can be mined separately, and such a two-step procedure can give better results than mining the data without clustering.
RESULTS
Fig. 1. Classification results for the pixel-based image: the first panel shows the input image, the second the k-means clustering after iteration 1, the third the k-means clustering after iteration 2, and the last the classification result of ISODATA clustering.
k-means clustering, iterative self-organizing data analysis technique (ISODATA) clustering, and pixel-oriented classification (fuzzy based) are examined. On the one hand, the methods are evaluated qualitatively in terms of the semantics of the segmentation results; on the other hand, the results are also evaluated quantitatively in terms of purity and entropy.
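Purity and entropy are computed from the overlap between cluster assignments and reference classes. A minimal sketch follows, assuming ground-truth class labels are available for the evaluated pixels; the function name is illustrative, not from the paper.

import numpy as np

def purity_and_entropy(labels, truth):
    """Cluster purity and size-weighted average entropy, given predicted
    cluster labels and ground-truth class labels (two length-n arrays)."""
    n = len(labels)
    purity = 0.0
    entropy = 0.0
    for c in np.unique(labels):
        # class composition of cluster c
        _, counts = np.unique(truth[labels == c], return_counts=True)
        p = counts / counts.sum()
        purity += counts.max() / n                       # majority-class share
        entropy += (counts.sum() / n) * -(p * np.log2(p)).sum()
    return purity, entropy

Higher purity and lower entropy both indicate clusters that align more closely with the reference classes.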
CONCLUSION
k-means clustering, iterative self-organizing data analysis technique (ISODATA) clustering, and pixel-oriented classification (fuzzy based) were examined. The qualitative evaluation considered the semantics of the segmentation results, while the quantitative evaluation used purity and entropy.
REFERENCES
i. K. Xu, W. Yang, G. Liu, and H. Sun, (2013) "Unsupervised satellite image classification using Markov field topic model."
ii. A. Delong, A. Osokin, H. N. Isack, and Y. Boykov, (2010) "Fast approximate energy minimization with label costs," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 2173-2180.
iii. M. Liénou, H. Maître, and M. Datcu, (2010) "Semantic annotation of satellite images using latent Dirichlet allocation," IEEE Geosci. Remote Sens. Lett., vol. 7, no. 1, pp. 28-32.
iv. D. Larlus and F. Jurie, (Apr. 2009) "Latent mixture vocabularies for object categorization and segmentation," Image Vis. Comput., vol. 27, no. 5, pp. 523-534.
v. D. J. C. MacKay, (2003) Information Theory, Inference, and Learning Algorithms. Cambridge, U.K.: Cambridge Univ. Press, p. 12.
vi. N. Memarsadeghi, D. Mount, N. S. Netanyahu, J. Le Moigne, and M. de Berg, (2007) "A fast implementation of the ISODATA clustering algorithm," Int. J. Comput. Geom. Appl., vol. 17, no. 1, pp. 71-103.
vii. A. Rosenberg and J. Hirschberg, (2007) "V-measure: A conditional entropy based external cluster evaluation measure," in Proc. Joint Conf. EMNLP-CoNLL, pp. 410-420.
viii. W. Tang, H. Yi, and Y. Chen, (May 2011) "An object-oriented semantic clustering algorithm for high-resolution remote sensing images using the aspect model," IEEE Geosci. Remote Sens. Lett., vol. 8, no. 3, pp. 522-526.
ix. J. Verbeek and B. Triggs, (2007) "Region classification with Markov field aspect models," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 1-8.
x. W. Yang, D. Dai, J. Wu, and C. He, (2010) "Weakly supervised polarimetric SAR image classification with multi-modal Markov aspect model," in Proc. ISPRS TC VII Symp. (Part B), 100 Years ISPRS—Advancing Remote Sensing Science, Vienna, Austria, Jul. 5-7, pp. 669-673.