
IDL - International Digital Library of Technology & Research
Available at: www.dbpublications.org

A Survey: Hybrid Job-Driven Meta Data Scheduling for Data Storage with Internet Approach

Ms. BHANUPRIYA S V 1, Mrs. SHRUTHI G 2
Department of Computer Science and Engineering
1 M.Tech Student, DBIT, Bengaluru, India
2 Guide and Professor, DBIT, Bengaluru, India

1. ABSTRACT

Cloud computing is a promising computing model that enables convenient and on-demand network access to a shared pool of configurable computing resources. The first offered cloud service is moving data into the cloud: data owners let cloud service providers host their data on cloud servers, and data consumers can access the data from the cloud servers. This new paradigm of data storage service also introduces new security challenges, because data owners and data servers have different identities and different business interests, with map and reduce tasks in different jobs. Therefore, an independent auditing service is required to make sure that the data is correctly hosted in the Cloud. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations are further introduced to separately achieve a better map-data locality and a faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with current scheduling algorithms. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different MapReduce workload scenarios and provide the best job performance among all tested algorithms in cloud computing data storage.

Keywords: Cloud Computing, Communication System, IaaS, Scheduling Process, Auditing Process.

2. INTRODUCTION

Cloud computing is a promising computing model that enables convenient and on-demand network access to a shared pool of computing resources. Cloud computing offers a group of services, including Software as a Service, Platform as a Service, and Infrastructure as a Service. Cloud storage is an important service of cloud computing, which allows data owners to move data from their local computing systems to the Cloud. More and more data owners are choosing to host their data in the Cloud. By doing so, data owners can avoid the initial investment in expensive infrastructure setup, large equipment, and daily maintenance costs. The data owners only need to pay for the space they actually use, e.g., under a cost-per-gigabyte-stored model. Another reason is that data owners can rely on the Cloud to provide more reliable services, so that they can access data from anywhere and at any time. Individuals or small companies usually do not have the resources to keep their servers as reliable as the Cloud does.

IDL - International Digital Library Copyright@IDL-2017




However, hosting data in the Cloud introduces new security challenges.

Firstly, a user can be authorized to store data in a cloud according to job scheduling over the connected Internet.

Secondly, data owners may worry that their data could consume excessive storage in the cloud or be lost altogether. This is because data loss can happen in any infrastructure, no matter what highly reliable measures the cloud service providers take. Some recent data loss incidents are the Sidekick Cloud Disaster in 2009 and the breakdown of Amazon's Elastic Compute Cloud (EC2) in 2010. Sometimes, the cloud service providers may be dishonest: they may discard data which has not been accessed, or has rarely been accessed, to save storage space, or they may keep fewer replicas than promised. Moreover, the cloud service providers may choose to hide data loss and claim that the data are still correctly stored in the Cloud. As a result, data owners need to be convinced that their data are correctly stored in the Cloud. Checking on retrieval is a common method for checking data integrity: data owners check the integrity of their data when accessing it. This method has been used in peer-to-peer storage systems, network file systems, long-term archives, web-service object stores, and database systems. However, checking on retrieval is not sufficient to check the integrity of all the data stored in the Cloud. There is usually a large amount of data stored in the Cloud, but only a small percentage is frequently accessed, so there is no guarantee for the data that are rarely accessed. An improved method was proposed that generates virtual retrievals to check the integrity of rarely accessed data, but this causes heavy I/O overhead on the cloud servers and high communication cost due to the data retrieval operations.

Therefore, it is desirable to have a storage auditing service to assure data owners that their data are correctly stored in the Cloud. But data owners are not willing to perform such auditing themselves due to the heavy overhead and cost. In fact, it is not fair to let either side, the cloud service providers or the data owners, conduct the auditing, because neither of them can be guaranteed to provide unbiased and honest auditing results. Data storage auditing is also a very resource-demanding operation in terms of computational resources, memory space, and communication cost.

3. SURVEYS

3.1 "Data storage auditing service in cloud computing: challenges, methods and opportunities"

In this survey, the authors note that cloud computing is a promising computing model that enables convenient and on-demand network access to a shared pool of configurable computing resources. The first offered cloud service is moving data into the cloud: data owners let cloud service providers host their data on cloud servers, and data consumers can access the data from the cloud servers. This new paradigm of data storage service also introduces new security challenges, because data owners and data servers have different identities and different business interests. Therefore, an independent auditing service is required to make sure that the data is correctly hosted in the Cloud. The authors investigate this kind of problem and give an extensive survey of storage auditing methods in the literature. First, they give a set of requirements of the auditing protocol for data storage in cloud computing. Then, they introduce some existing auditing schemes and analyze them in terms of security and performance. Finally, they introduce some challenging issues in the design of efficient auditing protocols for data storage in cloud computing.

3.2 "Efficient Public Integrity Checking for Cloud Data Sharing with Multi-User Modification"

In past years, a body of data integrity checking techniques has been proposed for securing cloud data services. Most of these works assume that only the data owner can modify cloud-stored data. Recently, a few attempts started considering more realistic scenarios by allowing multiple cloud users to modify data with integrity assurance.
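The checking-on-retrieval method described above can be illustrated with a minimal sketch. This is only a toy model (the dictionaries standing in for cloud storage and the local digest table, and all function names, are hypothetical, not part of any surveyed scheme), but it makes clear why the method leaves rarely accessed data unprotected: verification happens only when an object is actually fetched.

```python
import hashlib

# Toy model: `cloud` stands in for the provider's storage, `digests` for the
# owner's locally kept digest table. All names here are illustrative.

def store(cloud: dict, digests: dict, name: str, data: bytes) -> None:
    """Upload an object and remember its SHA-256 digest locally."""
    cloud[name] = data
    digests[name] = hashlib.sha256(data).hexdigest()

def retrieve_and_check(cloud: dict, digests: dict, name: str) -> bytes:
    """Fetch an object and verify it against the owner's stored digest."""
    data = cloud[name]
    if hashlib.sha256(data).hexdigest() != digests[name]:
        raise ValueError(f"integrity check failed for {name!r}")
    return data

cloud, digests = {}, {}
store(cloud, digests, "report.txt", b"quarterly numbers")
assert retrieve_and_check(cloud, digests, "report.txt") == b"quarterly numbers"

# A misbehaving provider silently altering the object is caught, but only
# at the next retrieval; an object that is never fetched is never checked.
cloud["report.txt"] = b"tampered"
try:
    retrieve_and_check(cloud, digests, "report.txt")
except ValueError:
    print("tampering detected")
```

Note that the check requires downloading the whole object, which is exactly the I/O and communication overhead that motivates dedicated auditing protocols.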





However, these attempts are still far from practical due to the tremendous computational cost imposed on cloud users. Moreover, collusion between misbehaving cloud servers and revoked users is not considered. This paper proposes a novel data integrity checking scheme characterized by multi-user modification, collusion resistance, and a constant computational cost of integrity checking for cloud users, based on a novel design of polynomial-based authentication tags and proxy tag update techniques. The scheme also supports public checking and efficient user revocation, and is provably secure. Numerical analysis and extensive experimental results show the efficiency and scalability of the proposed scheme.

3.3 "Hybrid Job-Driven Meta Data Scheduling for BigData with MapReduce Clusters and Internet Approach"

It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, the authors propose a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective. JoSS provides not only job-level scheduling, but also map-task-level and reduce-task-level scheduling. JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve a better map-data locality and a faster task assignment. Extensive experiments evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different MapReduce workload scenarios and provide the best job performance among all tested algorithms.

3.4 "Hybrid Job-Driven Scheduling for Virtual MapReduce Clusters"

It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, this paper proposes a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective. JoSS provides not only job-level scheduling, but also map-task-level and reduce-task-level scheduling. JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve a better map-data locality and a faster task assignment. Extensive experiments evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different MapReduce workload scenarios and provide the best job performance among all tested algorithms.

3.5 "A new approach to Internet host mobility" (ACM Computer Communication Review)

This paper describes a new approach to Internet host mobility. The authors argue that by separating local and wide-area mobility, the performance of existing mobile host protocols (e.g., Mobile IP) can be significantly improved. They propose Cellular IP, a new lightweight and robust protocol that is optimized to support local mobility but efficiently interworks with Mobile IP to provide wide-area mobility support.
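As a purely illustrative reading of the JoSS classification step described in Sections 3.3 and 3.4, the sketch below groups jobs by scale and by map/reduce weight and routes each class to its own scheduling policy. The thresholds, class names, and policy strings are assumptions for illustration only; the paper's actual classification criteria are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    input_blocks: int   # number of input data blocks (proxy for job scale)
    map_cost: float     # estimated total map-task work
    reduce_cost: float  # estimated total reduce-task work

SMALL_JOB_BLOCKS = 8    # assumed scale threshold, not from the paper

def classify(job: Job) -> str:
    """Assign a job to one of four illustrative classes."""
    scale = "small" if job.input_blocks <= SMALL_JOB_BLOCKS else "large"
    kind = "map-heavy" if job.map_cost >= job.reduce_cost else "reduce-heavy"
    return f"{scale}/{kind}"

# Each class gets its own placeholder policy, e.g. favoring map-data
# locality for map-heavy jobs and reduce-data locality otherwise.
POLICY = {
    "small/map-heavy": "schedule all tasks near the input blocks",
    "small/reduce-heavy": "pack tasks on one VPS to cut shuffle traffic",
    "large/map-heavy": "spread map tasks for locality, delay reduces",
    "large/reduce-heavy": "place reduces near expected map outputs",
}

job = Job("wordcount", input_blocks=4, map_cost=10.0, reduce_cost=2.0)
print(classify(job), "->", POLICY[classify(job)])
```

The point of the sketch is only the structure: classification is cheap and happens once per job, after which each class can be scheduled by a policy tuned to its locality needs.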





Cellular IP shows great benefit in comparison to existing host mobility proposals for environments where mobile hosts migrate frequently, which, the authors argue, will be the rule rather than the exception as Internet wireless access becomes ubiquitous. Cellular IP maintains distributed caches for location management and routing purposes. A distributed paging cache coarsely maintains the position of 'idle' mobile hosts in a service area. Cellular IP uses this paging cache to quickly and efficiently pinpoint 'idle' mobile hosts that wish to engage in 'active' communications. This approach is beneficial because it can accommodate a large number of users attached to the network without overloading the location management system. A distributed routing cache maintains the position of active mobile hosts in the service area and dynamically refreshes the routing state in response to the handoff of active mobile hosts. These distributed location management and routing algorithms lend themselves to a simple and low-cost implementation of Internet host mobility, requiring no new packet formats, encapsulations, or address space allocation beyond what is present in IP.

4. CONCLUSION

In this paper, we have presented our plan to investigate the auditing problem for data storage in cloud computing and proposed a set of requirements for designing third-party auditing protocols. We apply a two-level job scheduling process, which helps compare all types of processing issues in the storage system with respect to data and time. Finally, we plan to introduce some challenging issues in the design of efficient auditing protocols for data storage in cloud computing.

5. REFERENCES

[1] H. Luo, R. Ramjee, P. Sinha, L. Li, and S. Lu. UCAN: A unified cell and cloud computing architecture. In Proc. of MOBICOM, 2003.

[2] P. K. McKinley, H. Xu, A. H. Esfahanian, and L. M. Ni. Unicast-based cloud storage communication in wormhole-routed direct networks. TPDS, 1992.

[3] H. Wu, C. Qiao, S. De, and O. Tonguz. Integrated cell and ad hoc relaying systems: iCAR. J-SAC, 2001.

[4] Y. H. Tam, H. S. Hassanein, S. G. Akl, and R. Benkoczi. Optimal multi-hop cellular architecture for wireless communications. In Proc. of LCN, 2006.

[5] Y. D. Lin and Y. C. Hsu. Multi-hop cellular: A new architecture for wireless communications. In Proc. of INFOCOM, 2000.

[6] P. T. Oliver, Dousse, and M. Hasler. Connectivity in ad hoc and hybrid networks. In Proc. of INFOCOM, 2002.

[7] E. P. Charles and P. Bhagwat. Highly dynamic destination-sequenced distance vector routing (DSDV) for mobile computers. In Proc. of SIGCOMM, 1994.

[8] C. Perkins, E. Belding-Royer, and S. Das. RFC 3561: Ad hoc on-demand distance vector (AODV) routing. Technical report, Internet Engineering Task Force, 2003.

[9] D. B. Johnson and D. A. Maltz. Dynamic source routing in ad hoc wireless networks. IEEE Mobile Computing, 1996.

[10] V. D. Park and M. Scott Corson. A highly adaptive distributed routing algorithm for mobile wireless networks. In Proc. of INFOCOM, 1997.

[11] R. S. Chang, W. Y. Chen, and Y. F. Wen. Hybrid wireless network protocols. IEEE Transactions on Vehicular Technology, 2003.

[12] G. N. Aggelou and R. Tafazolli. On the relaying capacity of next-generation GSM cellular networks. IEEE Personal Communications Magazine, 2001.

