
A Novel Resource Handling Approach Through Scheduling Heterogeneous Peers (RHASHP) In Grid Computing

Abstract
One of the fundamental requirements of Grid computing is an efficient and effective resource discovery mechanism. Resource discovery involves finding the appropriate resources required by user applications. Various resource discovery mechanisms have been proposed in recent years, ranging from centralized to hierarchical information-server approaches. Most of the techniques built on these approaches have scalability and fault-tolerance limitations. To overcome these limitations, we propose a new resource recovery methodology based on a grid scheduling approach.

Keywords: Grid Computing, Resource Recovery Management (RRM), Grid Resource, Grid Scheduling

Introduction
Resource management is a very important and complex problem in grid computing environments. The problem becomes more complex when resources are geographically distributed, heterogeneous, dynamic and autonomous, so there is a need for a grid that responds to requests more quickly. To this end, we propose a method that optimizes resource grouping based on criteria such as delay, bandwidth and semantics, so that resources can be selected more quickly and appropriately. In addition, we are investigating how different scheduling methodologies can be applied to these parallel grids. Initial results indicate that the proposed method performs better than existing approaches.

Background
Various resource discovery mechanisms are being developed in the paradigm of distributed systems. The goal of almost every mechanism is efficient and effective resource management in a fault-tolerant and scalable manner. Since real-world computing environments are heterogeneous and highly unpredictable, these mechanisms have to be optimized, and sometimes combined, for proper resource discovery and management. The Grid inherits most of the properties of conventional distributed systems.
Resource management in the Grid has broadly the same goals as in other distributed systems, with the difference that the Grid is organized in a more structured way. The aim of this project is to provide a practical implementation scenario based on existing grid resource discovery mechanisms, and to introduce a novel grid scheduling methodology that enhances processing in accordance with the Resource Recovery Management (RRM) protocol.

Grid Computing Scenario
There are several motivating factors behind Grid deployment, which are outlined here. These factors are among the driving forces for effective resource management.

Resource Management Systems
Resource management is a complex task involving security and fault tolerance along with scheduling. It is the manner in which resources are allocated, assigned, authenticated, authorized, assured, accounted for, and audited. Resources include traditional resources such as compute cycles, network bandwidth, and storage space or storage systems, as well as services such as data transfer and simulation. A Resource Management System (RMS) is also responsible for naming the resources in the system, monitoring and reporting job and resource status, and accounting for resource usage. The RMS interacts with the security system to validate user requests, with the information service to obtain information about resource availability, and with the local resource management system to schedule jobs on local resources.
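The RMS responsibilities listed above (naming, monitoring and status reporting, and usage accounting) can be illustrated with a minimal in-memory sketch. The class and method names are hypothetical and do not correspond to any particular RMS; the interactions with the security system and the information service are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRecord:
    """One named resource tracked by the hypothetical RMS."""
    name: str
    kind: str                 # e.g. "compute", "storage", "network"
    status: str = "idle"      # last status reported for the resource
    usage_hours: float = 0.0  # accumulated usage, for accounting


@dataclass
class ResourceManager:
    """Toy RMS covering naming, status reporting and usage accounting."""
    resources: dict = field(default_factory=dict)

    def register(self, name: str, kind: str) -> None:
        # Naming: every resource gets a unique name in the system.
        if name in self.resources:
            raise ValueError(f"resource {name!r} is already registered")
        self.resources[name] = ResourceRecord(name, kind)

    def report_status(self, name: str, status: str) -> None:
        # Monitoring: a local agent pushes the current status.
        self.resources[name].status = status

    def charge(self, name: str, hours: float) -> None:
        # Accounting: accumulate usage per resource.
        self.resources[name].usage_hours += hours

    def idle_resources(self) -> list:
        # Reporting: names of resources currently marked idle.
        return sorted(n for n, r in self.resources.items() if r.status == "idle")
```

For example, after registering `node-a` and `node-b`, reporting `node-a` as `"busy"` and charging it 1.5 hours, `idle_resources()` returns `['node-b']`.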

Key Concerns Related to the Grid
Exploitation of underutilized resources: The main idea is to distribute the workload to underutilized resources across the Grid. At any given time some machines are idle while others are at peak utilization, so if an application is running on a busy machine, further applications or jobs can be executed on idle machines elsewhere on the grid. This idea is not new in distributed computing, but the Grid provides a framework for exploiting such underutilized resources in a very effective and broad way. There are two fundamental requirements for executing an application on a remote machine. First, the prospective application must be able to execute remotely without considerable overhead. Second, the remote machine must satisfy the resource requirements of the application.
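The two requirements above can be expressed as a simple dispatch check. This is a minimal sketch assuming that "idle" means a CPU load at or below a hypothetical threshold of 0.2 and that resource requirements are limited to free memory and operating system; a real grid would consult far richer resource descriptions.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    load: float        # fraction of CPU in use, 0.0 to 1.0
    free_mem_mb: int
    os: str


@dataclass
class Job:
    name: str
    mem_mb: int
    os: str
    remotely_executable: bool = True  # requirement 1: can run remotely


def pick_idle_machine(job, machines, idle_threshold=0.2):
    """Return the least-loaded idle machine that satisfies the job, or None."""
    if not job.remotely_executable:
        return None  # requirement 1 fails: the job cannot leave its host
    candidates = [
        m for m in machines
        if m.load <= idle_threshold      # machine is (nearly) idle
        and m.free_mem_mb >= job.mem_mb  # requirement 2: enough memory
        and m.os == job.os               # requirement 2: compatible OS
    ]
    return min(candidates, key=lambda m: m.load, default=None)
```

With a busy machine, an idle machine with too little memory, and an idle machine with enough memory, only the last one is returned; a job marked as not remotely executable always gets `None`.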

Parallel CPU capacity: This is one of the major benefits of the Grid. This computing capability has a wide range of applications, from biomedical research to high-energy physics. How much of this potential is actually realized depends on the design of the applications that use the computing facility. These applications should be divisible into sub-applications or sub-jobs, so that they can be submitted to distributed machines. The scalability of an application lies in the way it is subdivided into sub-jobs: the more finely an application is divided, the more it can scale.

Virtual resources and virtual organizations for collaboration: Enabling and simplifying collaboration among wider communities has been a major contribution of the Grid. Distributed systems developed in the past provided this facility and were quite successful in achieving this goal. The Grid has extended this capability to a larger scale, incorporating very heterogeneous systems and providing various standards and frameworks. In the Grid, users can be dynamically grouped into Virtual Organizations (VOs) with their own policies, and these VOs can share their resources within a larger grid.

Access to additional resources: Besides storage and processing resources, the grid provides access to various additional resources, ranging from remote printing to shared software licenses. Much expensive scientific equipment can be accessed remotely; the grid exploits this state-of-the-art resource access and makes it available from an ordinary end machine.

Resource balancing: As described earlier, the grid federates vast resources into a single large virtual resource. Various mechanisms have been proposed and developed for resource balancing on the grid, and these mechanisms can be vital at times of peak load. Balancing can be done in two ways: either an unexpected peak is transferred to a relatively idle machine on the grid, or, when the grid is fully utilized, low-priority jobs are suspended so that higher-priority jobs can be executed. This is accomplished through proper reservation and scheduling of resources.

Reliability: Reliability is one of the fundamental goals of any distributed system. Hardware reliability is usually achieved through redundant equipment. In the grid, the management software adds reliability beyond what hardware redundancy offers: it resubmits a job to alternate machines in case of failure, or, for critical jobs, executes multiple instances on different machines.
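The second resource-balancing strategy, suspending low-priority jobs when the grid is fully utilized, can be sketched with a min-heap of running jobs keyed by priority. The fixed number of execution slots and the class name are hypothetical simplifications; resuming suspended jobs and reservations are not modelled.

```python
import heapq


class PriorityBalancer:
    """Toy balancer with a fixed number of execution slots.

    When every slot is busy, a new job with higher priority than the
    lowest-priority running job suspends that job and takes its slot.
    """

    def __init__(self, slots):
        self.slots = slots
        self.running = []    # min-heap of (priority, job_name)
        self.suspended = []  # jobs preempted to make room, oldest first
        self.waiting = []    # jobs that could not start at all

    def submit(self, name, priority):
        if len(self.running) < self.slots:
            heapq.heappush(self.running, (priority, name))
            return "started"
        lowest_priority, lowest_name = self.running[0]
        if priority > lowest_priority:
            # Grid fully utilized: suspend the lowest-priority job.
            heapq.heapreplace(self.running, (priority, name))
            self.suspended.append(lowest_name)
            return f"started, suspended {lowest_name}"
        self.waiting.append(name)
        return "queued"
```

With two slots occupied by jobs of priority 1 and 2, submitting a job of priority 10 suspends the priority-1 job, while a later job of priority 0 is only queued.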

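The two software reliability techniques mentioned under Reliability, resubmission to alternate machines and replicated execution of critical jobs, can be sketched as follows. The `execute(job, machine)` callback is a hypothetical stand-in for real remote submission and is assumed to raise `RuntimeError` when a machine fails.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_with_resubmission(job, machines, execute):
    """Try machines in order, resubmitting the job after each failure."""
    failures = []
    for machine in machines:
        try:
            return machine, execute(job, machine)
        except RuntimeError as err:
            failures.append(f"{machine}: {err}")  # record, try next machine
    raise RuntimeError("all machines failed: " + "; ".join(failures))


def run_replicated(job, machines, execute):
    """Run a critical job on several machines at once; return the first success.

    Leaving the `with` block waits for the remaining replicas to finish,
    which keeps the sketch simple but does not cancel running work.
    """
    with ThreadPoolExecutor(max_workers=len(machines)) as pool:
        futures = {pool.submit(execute, job, m): m for m in machines}
        for fut in as_completed(futures):
            if fut.exception() is None:
                return futures[fut], fut.result()
    raise RuntimeError("all replicas failed")
```

If `execute` fails on `m1` and `m2` but succeeds on `m3`, `run_with_resubmission` returns `("m3", ...)` after two resubmissions, and `run_replicated` over `["m1", "m3"]` also returns the `m3` result.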
Research Problem Domain Identification
When applying grids in practice, everything has to be designed around the distributed environment, so our first focus is on methods of heterogeneous client interaction. The major functionalities that we aim to enhance in the grid resource management protocol have been identified as:
(i) Receiving jobs properly in grids.
(ii) Understanding the specifications of different processes.
(iii) Estimating resource requirements for the above processes.
(iv) Scheduling the application.

We focus our research mainly on the scheduling of application processes through a grid resource recovery mechanism, which can be considered a management protocol. In scheduling, we identify resource discovery, system selection and job execution as the major factors that play a vital role in improving parallel processing. This synopsis therefore proposes a Novel Resource Handling Approach Through Scheduling Heterogeneous Peers (RHASHP).

Related Study
Maheswaran et al. (2002) describe grid computing systems as a distributed computing model that provides access to heterogeneous, geographically distributed resources. The use of such grid systems has become common for sharing, selecting and aggregating computing resources, so the main concern in understanding the grid is how to utilize parallel resources and how to handle them. Heine et al. (2004) note that grid resources include hardware resources, software packages, grid services, etc.; consequently, the features needed to describe these resources tend to be highly complex, and so do the requests for them. The semantic grid has been proposed to respond to such requests. Chen et al. (2005) employed P2P technology for resource management at two levels: a wide-area manager that manages resources on hosts belonging to one or more virtual organizations, and a local manager that manages resources on a single host. Truong et al. (2006) used a super-peer and cluster model in which each super-peer acts as a central server for a set of clients, and super-peers communicate with each other. Nagargadde et al. (2006) used a supernode model for managing resources.
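The super-peer model described for Truong et al. (2006) can be illustrated with a small lookup sketch: each super-peer indexes its own clients' resources and forwards unanswered queries to neighbouring super-peers. The hop limit (`ttl`), the visited set and the "stop at the first local match" rule are assumptions made for this sketch, not details taken from the cited work.

```python
class SuperPeer:
    """Super-peer acting as a central index for its own clients."""

    def __init__(self, name):
        self.name = name
        self.index = {}       # resource type -> list of client names
        self.neighbours = []  # other SuperPeer objects

    def register_client(self, client, resource_types):
        for rtype in resource_types:
            self.index.setdefault(rtype, []).append(client)

    def query(self, rtype, ttl=2, visited=None):
        """Return clients offering rtype, searching neighbours up to ttl hops."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        hits = list(self.index.get(rtype, []))
        if hits or ttl == 0:
            return hits  # answer locally, or stop when the hop budget is spent
        for peer in self.neighbours:
            if peer.name not in visited:
                hits.extend(peer.query(rtype, ttl - 1, visited))
        return hits
```

In a chain `sp-a` to `sp-b` to `sp-c` where only `sp-c` knows a GPU client, a GPU query at `sp-a` finds it with the default two hops but returns an empty list with `ttl=1`.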
Because grid resources are heterogeneous, belong to different organizations and sites with their own policies and access rules, and are dynamic in nature, managing resources and computing requests in a grid is a vital and complicated issue, and scheduling and resource management remain areas of ongoing research and development (Jacob et al., 2005; Li et al., 2005; Prodan et al., 2006; Caron et al., 2007). To provide faster and more appropriate access to resources, resources are grouped according to QoS criteria such as network bandwidth, delay, and semantics. Each group includes a number of nodes. In each group, the strongest node in terms of the QoS criteria is selected as the super node, and each super node holds the most complete information about the nodes in its group (Tanenbaum, 1998; Huo et al., 2007; Somasundaram et al., 2006).

In the semantic grid, the characteristics and concepts of resources are used to discover resources. Some words are not synonyms but may still be semantically related, such as CPU and computer. Suppose a user requests a high-speed CPU resource and no node has a resource named CPU, while some node has a resource named computer that includes a CPU component. In the semantic grid, the request for a CPU should therefore also be interpreted as a request for a computer.

Based on resource type, El-Darieby et al. (2006) organized the grid environment as hierarchical clusters with three management levels: Individual Resource Manager (IRM), Cluster Resource Manager (CRM), and Grid Resource Manager (GRM). An IRM manages individual resources, stores up-to-date information on each resource's state and accessibility, and holds the management policies for the resource. A CRM manages a cluster and collects information on the resources of its clusters through its connections with IRMs. At the third level, the CRMs of the second level are grouped into virtual clusters managed by a CRM, and each CRM stores an abstract summary of information on its clusters. The GRM sits at the highest level and connects with corresponding GRMs, enabling it to participate in providing inter-organizational resources.
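The QoS-based grouping described above, in which the strongest node of each group becomes its super node, can be sketched as follows. The grouping rule (semantic category plus fixed 50 ms delay bands) and the linear score with unit weights are assumptions chosen for illustration; they are not the grouping method of the cited authors.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    category: str          # semantic class of the node's main resource
    bandwidth_mbps: float
    delay_ms: float


def delay_band(delay_ms, band_ms=50):
    """Nodes whose delay falls in the same band_ms window share a band."""
    return int(delay_ms // band_ms)


def qos_score(node, w_bw=1.0, w_delay=1.0):
    """Higher is stronger: reward bandwidth, penalize delay (hypothetical weights)."""
    return w_bw * node.bandwidth_mbps - w_delay * node.delay_ms


def build_groups(nodes):
    """Group nodes by (semantic category, delay band) and elect a super node."""
    groups = defaultdict(list)
    for n in nodes:
        groups[(n.category, delay_band(n.delay_ms))].append(n)
    directory = {}
    for key, members in groups.items():
        super_node = max(members, key=qos_score)
        # The super node keeps the complete member list of its group.
        directory[key] = {"super_node": super_node.name,
                          "members": sorted(m.name for m in members)}
    return directory
```

For two compute nodes with delays of 10 ms and 30 ms, the one with 1000 Mbps bandwidth outscores the one with 100 Mbps and becomes the super node of the `("compute", 0)` group; a compute node with 120 ms delay forms its own group.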
In the study above, the resources in a cluster are similar in terms of software, and managing them would be difficult because they can be geographically dispersed. The discovery and monitoring system defines and implements a mechanism for discovery and control in a distributed environment, and it supports traditional services; however, it does not support semantic descriptions of services and grid resources. Somasundaram et al. (2006) added a knowledge layer to the grid architecture to describe and discover semantic resources. This layer provides services that can look for patterns in existing data repositories and manage information services. An ontology is used for the semantic description of grid resources and to express semantic relations between resources, and the Web Ontology Language (OWL) is used to build the grid resource ontology. OWL is based on several logical models, which enables it to describe various concepts and to build complex concepts out of simpler ones. Heine et al. (2004) used Description Logic (DL) systems and a distributed hash table for semantic description. DL systems have evolved as knowledge representation systems in which knowledge is divided into two parts: the Terminological Box (TBox), which stores conceptual knowledge about terms and can be compared to the schema of a database system; and the Assertional Box (ABox), which represents concrete knowledge about individuals, including concept assertions and role assertions. Behnam Barzegar, Habib Esmaeelzadeh et al. (2011) observe that the resource management problem becomes complex when resources are geographically distributed, heterogeneous, dynamic and autonomous, and that there is a need for a grid that responds to requests more quickly. They proposed a method that optimizes resource grouping based on Quality of Service (QoS) criteria such as delay, bandwidth and semantics in order to select resources more quickly and appropriately, and their results show that the proposed method performs better than existing approaches.

Proposed Solution for Grid Scheduling
Grid scheduling involves scheduling resources across different, dispersed domains. This may involve searching for resources across multiple administrative domains to reach a single machine, or within a single administrative domain for one or more resources. As mentioned before, we focus our research on the grid scheduler or broker, which has to make scheduling decisions in an environment where it has no control over local resources; this scheduling is closely linked with the Grid Information Service (GIS). We divide our work into three recognized phases of scheduling: resource discovery, system selection and job execution.

Resource Discovery
The first stage is the authorization of the user submitting the job, which determines the user's access to the desired resource. This procedure is not much different from traditional remote authorization: the job is not permitted to execute if the user has no access to that resource. The GIS keeps track of the resources and the access records, so a user can query their access rights on various resources. As the numbers of resources and users grow dynamically in a grid, managing authorization and access control becomes more challenging. One way to address this issue is to store account, machine and password information in a secure location, but this approach has scalability and fault-tolerance issues.
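The authorization stage can be sketched as a lookup against access records kept by the information service. The `GridInformationService` class below is a hypothetical single in-memory table; because it is one central store, it also exhibits the scalability and fault-tolerance weakness mentioned above.

```python
class GridInformationService:
    """Hypothetical GIS holding resources and per-user access rights."""

    def __init__(self):
        self.access = {}  # resource name -> set of authorized user ids

    def add_resource(self, resource, users=()):
        self.access[resource] = set(users)

    def grant(self, resource, user):
        self.access.setdefault(resource, set()).add(user)

    def accessible_resources(self, user):
        """Answer a user's query about which resources they may use."""
        return sorted(r for r, users in self.access.items() if user in users)


def authorize(gis, user, resource):
    """Stage 1 of resource discovery: refuse the job if the user lacks access."""
    return user in gis.access.get(resource, set())
```

A job from a user who is not listed for a resource, or who targets an unknown resource, is rejected before any requirements are examined.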

The next step after authorization is specifying the user's requirements. These may include static information, such as the operating system, or more dynamic information, such as memory. It is better to include as much detail as possible. Currently the requirements are specified on the command line or in job submission scripts, e.g. in PBS and LSF. In most cases this information is readily available. In the grid environment, however, application requirements may well change according to the matched target resource; for example, memory and computing requirements may depend on the system architecture. Little work has been done on automatically gathering this data when requirements change. The coming chapters explain how peer-to-peer-based mechanisms address this issue. After authorization and requirements specification, the next step is to filter out resources that do not meet the minimum requirements of the user application. At the end of this step, the scheduler has a list of resources for detailed investigation and discovery.

System Selection
After information about the possible resources has been gathered, the most suitable resource has to be selected. This is done in two steps: dynamic information gathering and system selection. To find the best possible resource for a prospective job, it is important to have current information about the resource, since this information may change with respect to the scheduled application or the accessed resource. The main sources of such information are the GIS and the local resource schedulers. Given the dynamic information, the resource can be selected. Various mechanisms have been developed for this purpose; Condor matchmaking, for example, is one of several.

Job Execution

Once resources are chosen, the job can be submitted to the selected resource. Submission may be as simple as issuing a single command, or it may be complicated: in the grid, the ordinary process of job submission can become quite complicated because there is no agreed standard, and significant research is being done to develop common APIs. The next step after job submission involves preparation tasks, which may include claiming a reservation or preparing the target resource for the prospective job. Again, this process may become complicated because of differing authorization policies and administrative domains. After this step, the job's progress is monitored and the job completion tasks are carried out. Finally, the cleanup activities are executed. These steps are very similar to those in the traditional computing paradigm, but they are carried out taking into account the highly dynamic nature of the grid environment. Based on the methodology proposed above and on initial test results, we can enhance the capabilities of the grid through the proposed heterogeneous resource handling technique (RHASHP).
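Taken together, the filtration step of resource discovery and the two-step system selection described earlier can be sketched as follows. This is not Condor's ClassAd matchmaking; a simple lexicographic rank (lowest load first, then most free memory) stands in for it, and `fetch_dynamic` is a hypothetical callback standing in for a query to the GIS or a local scheduler.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    os: str          # static requirement
    min_mem_mb: int  # dynamic requirement


@dataclass
class Resource:
    name: str
    os: str
    free_mem_mb: int  # value from the catalogue, possibly stale
    load: float


def filter_resources(req, resources):
    """Filtration: drop resources that cannot meet the minimum requirements."""
    return [r for r in resources
            if r.os == req.os and r.free_mem_mb >= req.min_mem_mb]


def select_system(req, candidates, fetch_dynamic):
    """Refresh each candidate with current data, then pick the best one.

    fetch_dynamic(name) -> (free_mem_mb, load) is a hypothetical callback.
    Returns the chosen resource name, or None if nothing still fits.
    """
    best, best_rank = None, None
    for r in candidates:
        free_mem, load = fetch_dynamic(r.name)
        if free_mem < req.min_mem_mb:
            continue  # no longer suitable after the refresh
        rank = (-load, free_mem)  # prefer low load, then more memory
        if best_rank is None or rank > best_rank:
            best, best_rank = r.name, rank
    return best
```

The refresh step matters: a resource that passed filtration on catalogue data can be rejected once current data shows its free memory has dropped below the requirement.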

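Once a resource has been chosen, the job-execution steps (submission, preparation, monitoring while running, completion and cleanup) can be modelled as a small state machine. The stage names and the allowed transitions are assumptions for this sketch; a failure at any active stage still leads to cleanup, matching the requirement that cleanup activities always run.

```python
from enum import Enum, auto


class Stage(Enum):
    SUBMITTED = auto()
    PREPARED = auto()
    RUNNING = auto()     # the job is monitored while in this stage
    COMPLETED = auto()
    CLEANED_UP = auto()
    FAILED = auto()


# Allowed transitions in this hypothetical lifecycle.
TRANSITIONS = {
    Stage.SUBMITTED: {Stage.PREPARED, Stage.FAILED},
    Stage.PREPARED: {Stage.RUNNING, Stage.FAILED},
    Stage.RUNNING: {Stage.COMPLETED, Stage.FAILED},
    Stage.COMPLETED: {Stage.CLEANED_UP},
    Stage.FAILED: {Stage.CLEANED_UP},
    Stage.CLEANED_UP: set(),
}


class GridJob:
    def __init__(self, name):
        self.name = name
        self.stage = Stage.SUBMITTED
        self.history = [Stage.SUBMITTED]

    def advance(self, new_stage):
        """Move to new_stage, rejecting transitions the lifecycle forbids."""
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(f"{self.stage.name} -> {new_stage.name} not allowed")
        self.stage = new_stage
        self.history.append(new_stage)
```

A normal run advances through `PREPARED`, `RUNNING`, `COMPLETED` and `CLEANED_UP`; trying to jump from `SUBMITTED` straight to `RUNNING` raises `ValueError` because preparation was skipped.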
Conclusion

References
1. Daniel Minoli. A Networking Approach to Grid Computing. Hoboken, NJ, USA: John Wiley & Sons, 2004, p. 1.
2. Bart Jacob, Michael Brown, Kentaro Fukui, Nihar Trivedi. Introduction to Grid Computing, [www.ibm.com/redbooks].
3. The Globus Data Grid Effort, [www.globus.org/toolkit/docs/2.4/datagrid/].
4. Network File System, [http://www.redhat.com/docs/manuals/enterprise/RHEL-4Manual/ref-guide/ch-nfs.html].
5. John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, Michael J. West. Scale and Performance in a Distributed File System. ACM Transactions on Computer Systems, Volume 6, Issue 1, 1988.
6. Jason Barkes, Marcelo R. Barrios, Francis Cougard, Paul G. Crumley, Didac Marin, Hari Reddy, Theeraphong Thitayanun. GPFS: A Parallel File System, [www.ibm.com/redbooks].
7. AFS reference page, [http://www.cs.cmu.edu/afs/andrew.cmu.edu/usr/shadow/www/afs.html#general].
