
CLUSTER COMPUTING

1.Introduction
A cluster is a type of parallel or distributed processing system consisting of a collection of interconnected stand-alone computers that cooperatively work together as a single, integrated computing resource. The computers in a cluster share common network characteristics, such as the same namespace, and the cluster appears to other computers on the network as a single resource. The individual computers are linked together by high-speed network interfaces, and the actual binding of all of them into one cluster is performed by the operating system and the cluster software.

The following are some prominent components of cluster computers:

- Multiple high-performance computers (PCs, workstations, or SMPs)
- State-of-the-art operating systems (layered or micro-kernel based)
- High-performance networks/switches (such as Gigabit Ethernet and Myrinet)
- Network Interface Cards (NICs)
- Fast communication protocols and services (such as Active and Fast Messages)
- Cluster middleware (Single System Image (SSI) and System Availability Infrastructure), implemented in hardware (such as Digital (DEC) Memory Channel, hardware DSM, and MP techniques) or in the operating system kernel or a gluing layer (such as Solaris MC and GLUnix)

2. Logical View Of Cluster


A cluster uses a multi-computer architecture, as shown below:

Fig. 1. Multi-computer architecture

Nodes, or systems, are the individual members of a cluster. They can be computers, servers, and other such hardware, although each node generally has its own memory and processing capability. If one node becomes unavailable, the other nodes can carry the demand load so that applications or services are always available. There must be at least two nodes to compose a cluster; otherwise they are just servers. The master node acts as a server for the Network File System (NFS) and as a gateway to the outside world. As an NFS server, the master node provides user file space and other common system software to the compute nodes via NFS. As a gateway, the master node allows users to gain access through it to the compute nodes. The sole task of the compute nodes is to execute parallel jobs. In most cases, therefore, the compute nodes do not have keyboards, mice, video cards, or monitors. All access to the compute nodes is provided via remote connections from the master node. Because compute nodes do not need to access machines outside the cluster, nor do machines outside the cluster need to access compute nodes directly, compute nodes commonly use private IP addresses, such as the 10.0.0.0/8 or 192.168.0.0/16 address ranges.

Logical view of a cluster
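As a rough sketch of this layout, the following Python fragment shows how a script on the master node might start the same command on each compute node over SSH; the private addresses, the node list, and the command are assumptions made only for illustration and are not part of any particular cluster distribution.

import subprocess

# Hypothetical compute nodes on the private 192.168.0.0/16 network described above.
COMPUTE_NODES = ["192.168.0.11", "192.168.0.12", "192.168.0.13"]

def run_on_all(command):
    # Start the command on every compute node via SSH, then wait for each to finish.
    procs = [subprocess.Popen(["ssh", node, command]) for node in COMPUTE_NODES]
    return [proc.wait() for proc in procs]

if __name__ == "__main__":
    # Example: ask every compute node to report its hostname back to the master.
    run_on_all("hostname")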

3.Cluster Classification According To Architecture


Clusters can basically be classified into two types: Close Clusters and Open Clusters.

Close Clusters
They hide most of the cluster behind the gateway node. Consequently, they need fewer IP addresses and provide better security. They are well suited to computing tasks.

Close Cluster

Open Clusters
All nodes can be seen from outside, so they need more IP addresses and raise more security concerns. However, they are more flexible and are used for Internet/web/information server tasks.

Open Cluster

4.Clustering Concepts
Clusters are in fact quite simple. They are a bunch of computers tied together with a network, working on a large problem that has been broken down into smaller pieces. There are a number of different strategies for tying them together, and a number of different software packages that can be used to make the software side of things work. The name of the game in high-performance computing is parallelism. Parallelism operates at two levels: hardware parallelism and software parallelism.

a. Hardware parallelism

CPU-level parallelism


A CPU is commonly thought of as a device that executes one instruction after another in a straight line. However, modern CPU architectures have an inherent ability to do more than one thing at once: the logic of the CPU chip divides it into multiple execution units, which allows the CPU to process more than one instruction at a time. Two modern hardware features support multiple execution units: the cache and the pipeline.

System-level parallelism

It is the parallelism of multiple nodes coordinating to work on a problem in parallel that gives a cluster its power. There are other levels at which even more parallelism can be introduced. For example, if each node in the cluster is a multi-CPU system, a fundamental degree of parallel processing is introduced at the node level. Having more than one network interface on each node introduces communication channels that may be used in parallel to communicate with other nodes in the cluster.
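As a minimal sketch of node-level parallelism, the Python fragment below uses the standard multiprocessing module to run one worker per CPU inside a single multi-CPU node; the work function (squaring numbers) is only a placeholder for a real sub-problem.

from multiprocessing import Pool, cpu_count

def work(x):
    # Placeholder for a CPU-bound piece of the overall problem.
    return x * x

if __name__ == "__main__":
    # One worker process per CPU exploits the parallelism available inside the node.
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(work, range(1000))
    print(sum(results))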

b. Software parallelism
Software parallelism covers the parts of a program that can be distributed, and it is what gives us the speed-up we want from a high-performance computing system. Before we can run a program on a parallel cluster, we have to ensure that the problem we are trying to solve is amenable to being solved in a parallel fashion. Almost any problem that is composed of smaller sub-problems can be broken down into those pieces and worked on in parallel.

5.Cluster Components

The basic building blocks of a cluster fall into multiple categories: the cluster nodes, the cluster operating system, the network switching hardware, and the node/switch interconnect.

Cluster components

Application:
This layer includes the various applications being run for a particular group of users. These applications run in parallel.

Middleware:
These are software packages that mediate between the user and the operating system to support cluster computing.

Operating System:
Clusters can be supported by various operating systems, including Windows, Linux, etc.

Interconnect:

Interconnection between the various nodes of the cluster can be done using 10 Gigabit Ethernet (10GbE). In the case of a small cluster, the nodes can be connected with simple switches.

Nodes:
The nodes of a cluster are the individual computers that are connected together. Their processors can be Intel or other 64-bit microprocessors.

6.Cluster Types
There are several types of clusters, each with specific design goals and functionality. These are: High Availability/Failover clusters, Load Balancing clusters, and Parallel/Distributed Processing clusters.

High Availability / Failover cluster


The purpose of these clusters is to provide uninterrupted availability of data or services to the end-user community.

Fig. 6. High availability/failover cluster
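As a toy illustration of the failover idea (not the mechanism of any particular product), the Python sketch below has a standby node poll the primary over TCP and take over when the primary stops answering; the address, port, polling interval, and takeover action are all assumptions for illustration.

import socket
import time

# Hypothetical address and port of the service running on the primary node.
PRIMARY = ("192.168.0.10", 5000)

def primary_alive(timeout=1.0):
    # Crude health check: can we open a TCP connection to the primary?
    try:
        with socket.create_connection(PRIMARY, timeout=timeout):
            return True
    except OSError:
        return False

def standby_loop():
    while True:
        if not primary_alive():
            print("primary unreachable - standby taking over the service")
            break  # a real cluster would now start the service on this node
        time.sleep(2)

if __name__ == "__main__":
    standby_loop()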

Load Balancing Cluster


This type of cluster distributes incoming requests for resources or content among multiple nodes running the same programs or holding the same content. If a node fails, requests are redistributed among the remaining available nodes. This type of distribution is typically seen in a web-hosting environment.
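The idea can be sketched in Python as a simple round-robin dispatcher that skips failed nodes; the node names and the notion of a request are invented for illustration and do not correspond to any particular load-balancing product.

from itertools import cycle

# Hypothetical back-end nodes, all running the same programs and content.
NODES = ["node1", "node2", "node3"]
healthy = {node: True for node in NODES}
rotation = cycle(NODES)

def dispatch(request):
    # Hand the request to the next healthy node in round-robin order.
    for _ in range(len(NODES)):
        node = next(rotation)
        if healthy[node]:
            return f"{request} -> {node}"
    raise RuntimeError("no healthy nodes available")

if __name__ == "__main__":
    healthy["node2"] = False              # simulate a node failure
    for i in range(5):
        print(dispatch(f"request-{i}"))   # remaining nodes absorb the load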

Parallel/Distributed Processing Cluster

A parallel cluster is a system that uses a number of nodes to simultaneously solve a specific computational or data-mining task. Unlike load balancing or high availability clusters, which distribute requests or tasks to nodes where each node processes the entire request, a parallel environment divides the request into multiple sub-tasks that are distributed to multiple nodes within the cluster for processing. Parallel clusters are typically used for mathematical computation, scientific analysis (e.g., weather forecasting), and financial data analysis.

Parallel/distributed processing cluster
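One common way to express this divide-and-distribute pattern in Python is with mpi4py, an MPI binding that is assumed here to be installed on the cluster; the sketch below scatters pieces of a toy task (summing squares) to the participating processes and gathers the partial results on the master rank. It would typically be started with an MPI launcher, for example mpiexec -n 4 python script.py, with one rank per node or CPU core.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # The master rank splits the overall task into one sub-task per process.
    data = list(range(100))
    chunks = [data[i::size] for i in range(size)]
else:
    chunks = None

# Each process receives its own sub-task and computes a partial result.
chunk = comm.scatter(chunks, root=0)
partial = sum(x * x for x in chunk)

# The partial results are gathered back on the master rank and combined.
totals = comm.gather(partial, root=0)
if rank == 0:
    print(sum(totals))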

7.Cluster Applications
A few important cluster applications are: Google Search, petroleum reservoir simulation, Protein Explorer, space applications, and telecommunication applications.

8.Cluster Benefits

The main benefits of clusters are:

- Scalability: the clustering concept obeys scalability, i.e., any number of additional nodes can be added to the cluster.
- High availability: the service remains available even when individual nodes go down.
- Ability to handle unexpected peaks in workload.
- Significantly reduced cost, and easy to build by combining readily available components.
- Surprisingly powerful.
- High-speed network.

9.Future Trends

A hypercluster consists of a number of clusters.

Fig. 10. Hypercluster

10.Comparing Old And New


Today, open standards-based HPC systems are being used to solve problems ranging from high-end, floating-point-intensive scientific and engineering problems to data-intensive tasks in industry. Some of the reasons why HPC clusters outperform RISC-based systems include:

1. Collaboration
2. Scalability
3. Availability
4. Ease of technology refresh
5. Affordable service and support
6. Vendor lock-in
7. System manageability
8. Reusability of components
9. Disaster recovery

11.Cluster Classifications

1. Clusters are classified into several categories based on factors such as: 1) application target, 2) node ownership, 3) node hardware, 4) node operating system, and 5) node configuration.

2. Clusters based on node ownership are further classified into two:

- Non-dedicated clusters
- Dedicated clusters

3. Clusters based on node hardware are further classified into three:

- Clusters of PCs (CoPs)
- Clusters of Workstations (CoWs)
- Clusters of SMPs (CLUMPs)

4. Clusters based on node operating system are further classified into:

- Linux clusters (e.g., Beowulf)
- Solaris clusters (e.g., Berkeley NOW)
- Digital VMS clusters
- HP-UX clusters
- Microsoft Wolfpack clusters

12.Advantages and Disadvantages

Advantages:
- Supercomputing power at a fraction of the cost.
- Part of the cluster can go down and the remaining nodes can still run jobs while repairs are being made.
- Clusters are simple to build using components available from hundreds of sources.

Disadvantages:
- If the cluster head node goes down, the whole setup goes down.
- Unlike a grid, lightly used nodes cannot be used for processing other applications' data.
