
Linux Cluster Architecture

by

Alex Vrenios
mailto://vrenios@asu.edu

(Shameless Plug)

Linux Users Group Slide # 1 October 5th, 2002


Copyright © 2002 Alexander Vrenios
Overview:
• Why would anyone want to build a cluster system?

• Computer Architecture Review: UPs through Clusters

• Gathering the PC computer hardware (on the cheap!)

• Connecting the node computers into a local area network

• Configuring relevant Linux OS files for internetworking

• Client-Services and sockets make PCs work as a team

• The design of our simple master-slave cluster server

• Internal and external performance monitoring and tuning

Why would anyone want to build a cluster system?
• Hobbyists:
It’s a new and interesting pathway to experience; and
how many of your friends have a cluster server anyway?
• Professionals:
Sophisticated systems are often developed in parallel,
meaning the hardware won’t be ready when you want to
test your software. Having a test bed will get you past
the hardware-independent bugs and put you in a position
to polish your product when the platform is finally ready.
• Managers:
This is all bleeding-edge stuff; you’ll want to prepare for
the issues your people might face and the questions they
might ask. Experience gives you the insight you’ll need.
• Academics:
Analyze data from a live system, instead of questionable
and potentially over-simplified simulation output.
Computer Architecture Review:
• Uniprocessor or UP
SISD*: Single instruction, single data stream.
The typical PC is a uniprocessor.
[Diagram: CPU (instruction processor, arithmetic and logic), RAM (instructions and data), and I/O (input port, output port), all connected by a data bus.]

* Flynn proposed this taxonomy - some other configurations follow…

• Array or Vector Processor
SIMD: Single instruction, multiple data streams.
(ILLIAC IV, IBM 390, DSPs, etc.)
[Diagram: a controller sends instructions over an instruction bus to CPU0, CPU1, ... CPUN; each CPU works on its own data: A[0], B[0]; A[1], B[1]; ... A[n], B[n].]

• Pipeline Processor
MISD: Multiple instruction, single data stream?
(Some say there is no MISD.)
[Diagram: a CPU pipe (confluent instruction execution) connected by a data bus to RAM holding instructions and data.]

• Multiprocessor or MP
MIMD: Multiple instruction, multiple data streams.
Mainframe, Workstation, etc. (Mostly for the very wealthy!)
[Diagram: CPU0, CPU1, ... CPUN share a data bus to RAM holding instructions and data.]

• The MIMD is so interesting that it gets its own taxonomy:

MP (tightly coupled multiprocessor)

UMA: Uniform Memory Access
All CPUs access instructions and data at the same transfer rate.
[Diagram: CPU0, CPU1, ... CPUN share a data bus to RAM holding instructions and data.]

Multicomputer (loosely coupled)

NUMA: Non-uniform memory access
VME Chassis, some Beowulf Clusters, and many embedded processor systems.
Plug-in boards, for example, each with a CPU and some local memory.
[Diagram: CPU0, CPU1, ... CPUN, each with its own instructions and data, joined by a high-speed backplane.]

NORMA: No (hardware) Remote Memory Access (rare distinction)
Older Beowulf Clusters, Distributed Shared Memory Systems (IVY), and some modern-day cluster computers.
PCs, whose “personality” may be molded by their software!
[Diagram: CPU0, CPU1, ... CPUN, each with its own instructions and data, joined by a local area network.]

Gathering PC Computer Hardware:

• Small computer stores (Renaissance Computer, e.g.)

• Newspaper and club and organization newsletter ads

• Family, friends and neighbors (closets, garage sales)

• Large corporations? (hospitals, Am Exp, Mot, etc.)

• Computer salvage outlets: ASU Salvage
[Map: ASU Salvage, with Rio Salado, Pima 101, and University marked as landmarks.]

Connecting the Node PCs into a LAN:

[Diagram: nodes alpha (10.0.0.1), beta (10.0.0.2), ... omega (10.0.0.5) in the chaos.org domain, each joined to a hub by a network interface cable.]

[PC rear view: built-in or add-on network interface ports with an RJ-45 jack; RJ-45 plugs and hub ports (10/100 Base T).]


Network Block Diagram:

CHAOS: CHeap Array of Outmoded Systems

[Diagram: the multicomputer server is built from eight ‘386 nodes (10MB, 12MB, 12MB, 13MB, 13MB, 13MB, 13MB, and 16MB of RAM) plus a 32MB ‘486 (p75 equiv.), all running Linux rh4.2 and sharing files over NFS. An external monitor (real-time performance) and an interactive client on a desktop PC exchange client queries and responses with the server over Ethernet.]

Configuring a Linux Network – Local User Files:
/ (the root directory)
  /home
    /home/chief
      .rhosts
      /home/chief/src    pgm.c, makefile
      /home/chief/inc    pgm.h
      /home/chief/bin    pgm
      (others)

alpha:/home/chief/src> make pgm

makefile:

pgm:
	gcc -I../inc/ -o ../bin/pgm pgm.c

Configuring a Linux Network – Remote User Files:
• Network File System: the illusion of locality via remote-mount points

[Diagram: alpha (10.0.0.1, the NFS server) and omega (10.0.0.5, an NFS client) on chaos.org, joined through the hub. Each node has its own /dev/hda holding / and /home. After adduser, /home/chief and its bin, src, and inc subdirectories are visible from both nodes.]

Configuring a Linux Network:
• File /etc/hosts.equiv on every cluster node:
alpha.chaos.org chief
...
omega.chaos.org chief
• File /home/chief/.rhosts in the user’s home directory:
alpha.chaos.org chief
...
omega.chaos.org chief
• Test access using rsh, a remote shell command:
alpha:/home/chief> rsh omega
> Note that this, and what follows, may lead to a SECURITY leak!

Configuring a Linux Network (continued):
• First, file /etc/hosts belongs on all the network nodes:
127.0.0.1 localhost localhost.chaos.org
10.0.0.1 alpha alpha.chaos.org
...
10.0.0.5 omega omega.chaos.org
• File /etc/exports on 10.0.0.1, the NFS server named alpha:
/home (rw)
• File /etc/fstab on each cluster node except the server named alpha:
/dev/hda1 swap swap defaults 0 0
/dev/hda2 / ext2 defaults 1 1
alpha:/home /home nfs rw 0 0
/dev/fd0 /mnt/floppy ext2 noauto 0 0
none /proc proc defaults 0 0
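
The files above repeat the same handful of names on every node, so they can be generated rather than typed. The following is only a sketch, not the author's tooling: the function names are hypothetical, and the node table combines the addresses shown on the LAN and monitor slides (alpha through delta at 10.0.0.1 to 10.0.0.4, omega at 10.0.0.5).

```python
# Node table assembled from the slides; adjust it to match your own cluster.
NODES = {
    "alpha": "10.0.0.1",
    "beta": "10.0.0.2",
    "gamma": "10.0.0.3",
    "delta": "10.0.0.4",
    "omega": "10.0.0.5",
}
DOMAIN = "chaos.org"
USER = "chief"


def hosts_lines(nodes=NODES, domain=DOMAIN):
    """Lines for /etc/hosts, in the same layout the slide uses."""
    lines = [f"127.0.0.1 localhost localhost.{domain}"]
    for name, addr in nodes.items():
        lines.append(f"{addr} {name} {name}.{domain}")
    return lines


def trust_lines(nodes=NODES, domain=DOMAIN, user=USER):
    """Lines for /etc/hosts.equiv and ~/.rhosts (one trusted host per line)."""
    return [f"{name}.{domain} {user}" for name in nodes]


def nfs_fstab_line(server="alpha"):
    """The NFS mount line for /etc/fstab on every node except the server."""
    return f"{server}:/home /home nfs rw 0 0"


if __name__ == "__main__":
    print("\n".join(hosts_lines()))
    print("\n".join(trust_lines()))
    print(nfs_fstab_line())
```

The script only prints text; copying the output into the real files (and deciding whether rsh-style trust is acceptable on your network) is left to the administrator.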

Internetworking Services – Operation:

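[Diagram: on the local machine, the keyboard input "rcat myfile remote" starts the rcat client, which sends its request through a UDP or TCP socket across the network. On the remote machine, the rcatd service runs cat on myfile and sends the contents back, where they appear as screen output.]

Screen output on the local machine:

> rcat myfile remote
First line in myfile
Second line, etc.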

Internetworking services – Configuration (inetd):
• Add a line to file /etc/services on each remote-server node:
rcatd 5000/udp # remote-cat UDP service on port 5000

• Add a line to file /etc/inetd.conf on each remote-server node:
rcatd dgram udp wait chief /home/chief/bin/rcatd
(The first field, rcatd, refers to the entry in services.)

• [Reconfiguration if necessary] omega:/root> killall -HUP inetd

Sequence of events:
1. Client process sends a UDP packet to server’s port 5000
2. Daemon (inetd) starts process at /home/chief/bin/rcatd
3. Service reads incoming UDP packet data from “keyboard”
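
The request and reply exchange can be tried on a single machine before any inetd configuration exists. This is a rough sketch in Python rather than the book's C code, and it differs from the slide in one important way: the server binds its own UDP socket on the loopback address instead of being started by inetd and reading the packet from standard input. The names serve_once, rcat, and demo are hypothetical.

```python
import os
import socket
import tempfile
import threading


def serve_once(sock):
    """Stand-in for rcatd: read one file name, reply with that file's contents."""
    data, client = sock.recvfrom(1024)
    name = data.decode().strip()
    try:
        with open(name, "rb") as f:
            reply = f.read(60000)  # keep the reply inside a single datagram
    except OSError as err:
        reply = f"rcatd: {err}".encode()
    sock.sendto(reply, client)


def rcat(filename, host, port, timeout=2.0):
    """Stand-in for the rcat client: send the file name, return the reply text."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(filename.encode(), (host, port))
        reply, _ = s.recvfrom(65535)
        return reply.decode()


def demo():
    """Run one query against a throwaway server on 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(5.0)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("First line in myfile\nSecond line, etc.\n")
        path = f.name
    try:
        return rcat(path, "127.0.0.1", port)
    finally:
        worker.join()
        server.close()
        os.unlink(path)


if __name__ == "__main__":
    print(demo(), end="")
```

On the cluster itself the client would target a node name and port 5000 from /etc/services; the demo uses an OS-assigned port only so that it cannot collide with a real service.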

Internetworking Services – Configuration (xinetd):
• File /etc/xinetd.d/rcatd on each (xinetd) remote-server node:
service rcatd
{
    port = 5000
    socket_type = dgram
    protocol = udp
    wait = yes
    user = chief
    server = /home/chief/bin/rcatd
    only_from = 10.0.0.0
    disable = no
}
(rcatd on the first line refers to the name of the service; only_from = 10.0.0.0 means 10.0.0.*.)

• [Reconfiguration] omega:/root> /etc/rc.d/init.d/xinetd restart
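
To see which client addresses the only_from line is meant to admit, the same rule can be written out in a few lines. This is a sketch of the intent only, not of how xinetd evaluates the setting internally: it takes the slide's annotation at face value and treats 10.0.0.* as the network 10.0.0.0/24. The name allowed is hypothetical.

```python
import ipaddress

# Assumption taken from the slide's annotation: 10.0.0.0 stands for 10.0.0.*
CLUSTER_NET = ipaddress.ip_network("10.0.0.0/24")


def allowed(client_ip):
    """True when the client address lies inside the cluster's 10.0.0.* range."""
    return ipaddress.ip_address(client_ip) in CLUSTER_NET


if __name__ == "__main__":
    for addr in ("10.0.0.5", "10.0.1.5", "192.168.1.1"):
        print(addr, allowed(addr))
```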

Distributed Systems C-Language Skills:
[Diagram: four skill areas.
SUBTASKING: main and its subtasks cooperate through shared memory.
INTERNETWORKING: main talks to a remote process through sockets over the network.
SIGNAL HANDLING: main and its subtasks share memory and handle SIGALRM and SIGCHLD.
NETWORK SERVICES: main reaches a remote service, started by inetd, through sockets over the network.]

Many examples in the book!


Master-Slave Cluster Server - Initialization:

1. Broadcast starts slave tasks… The master starts a local subtask, one for each registering remote slave.

2. Local subtasks contact slaves… All tasks start perform subtasks.

[Diagram: alpha runs the master with local subtasks s1, s2, and s3; beta, gamma, and delta each run a slave; every node runs a perform task.]


Master-Slave Cluster Server - Operation:

Perform tasks send performance info to the monitor…

Incoming queries are processed by the first available slave, via a subtask.

[Diagram: a client sends a query to the master on alpha and receives a response. The master's subtasks s1, s2, and s3 pass work to the slaves on beta, gamma, and delta. The perform task on each node reports to the monitor.]
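
The "first available slave" rule can be illustrated without any networking. In this sketch, which is an assumption-laden stand-in and not the book's design, each master-side subtask is a thread tied to one slave name, and a shared queue hands the next query to whichever subtask is free. The names run_cluster, subtask, and handle are hypothetical, and handle takes the place of the socket exchange with a remote slave.

```python
import queue
import threading


def run_cluster(queries, slave_names, handle):
    """Dispatch each query to the first idle subtask; return {query: (slave, answer)}."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def subtask(slave):
        # One subtask per registered slave, as on the initialization slide.
        while True:
            query = pending.get()
            if query is None:  # sentinel: no more work
                break
            answer = handle(slave, query)  # stand-in for the remote call
            with lock:
                results[query] = (slave, answer)

    threads = [threading.Thread(target=subtask, args=(name,)) for name in slave_names]
    for t in threads:
        t.start()
    for q in queries:
        pending.put(q)
    for _ in threads:
        pending.put(None)  # one sentinel per subtask
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    out = run_cluster(["q1", "q2", "q3", "q4"],
                      ["beta", "gamma", "delta"],
                      lambda slave, q: q.upper())
    for q, (slave, answer) in sorted(out.items()):
        print(q, "->", slave, answer)
```

Which slave answers which query varies from run to run; that is the point of the rule, since a busy slave simply never picks up the next item.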

Real-Time Performance Monitoring – Internal:
• Resource utilization reporting via the /proc pseudo-files*:

- CPU Utilization in /proc/stat – Running Jiffy Counts in each State

cpu 1256 0 1566 565277
(fields, left to right: user, nice, system, idle)

- Disk Reads and Writes in /proc/stat – Running I/O Counts

disk_rio 1270 0 0 0
disk_wio 1337 0 0 0
(the first count on each line is for /dev/hda)

* Note that the exact meaning and content of proc files can be OS release dependent.
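
Because the jiffy counts only ever increase, a utilization percentage comes from the difference between two samples, not from one reading. A minimal sketch, assuming the four-field cpu line shown on the slide (newer kernels append more fields, which this code ignores); parse_cpu and utilization are hypothetical names, and the second sample in the example is invented for illustration.

```python
def parse_cpu(line):
    """Return (user, nice, system, idle) jiffy counts from a 'cpu' line."""
    fields = line.split()
    user, nice, system, idle = (int(x) for x in fields[1:5])
    return user, nice, system, idle


def utilization(prev, curr):
    """Percent of jiffies spent non-idle between two samples."""
    deltas = [c - p for p, c in zip(prev, curr)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    idle = deltas[3]
    return 100.0 * (total - idle) / total


if __name__ == "__main__":
    first = parse_cpu("cpu 1256 0 1566 565277")   # the slide's sample
    second = parse_cpu("cpu 1266 0 1576 565357")  # invented later sample
    print(f"{utilization(first, second):.0f}% busy")
```

On a live node the two lines would be read from /proc/stat a fixed interval apart, for example once per second.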

Real-Time Performance Monitoring – Internal (continued):
• Resource utilization reporting (continued):

- Memory Utilization in /proc/meminfo – Current Values

Mem: 14942208 13713408 1228800 . . .
(fields, left to right: total, used, free)

- Packets Sent and Received in /proc/net/dev – Running I/O Counts

lo: 80 0 0 0 0 80 0 0 0 0
eth0: 115 0 0 0 0 68 0 0 0 0
(the first count on each line is packets received; the sixth is packets transmitted)
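
The same kind of parsing works for the memory and packet counters. The sketch below assumes the older line layouts shown on the slide (a Mem: line with total, used, free; ten counters per interface with received packets first and transmitted packets sixth). Current kernels format both files differently, so treat the field positions as assumptions; mem_used_percent and packet_counts are hypothetical names.

```python
def mem_used_percent(line):
    """Percent of memory in use, from a 'Mem: total used free ...' line."""
    fields = line.split()
    total, used = int(fields[1]), int(fields[2])
    return 100.0 * used / total


def packet_counts(line):
    """Return (interface, packets_received, packets_transmitted)."""
    name, rest = line.split(":", 1)
    counts = [int(x) for x in rest.split()]
    return name.strip(), counts[0], counts[5]


if __name__ == "__main__":
    print(f"{mem_used_percent('Mem: 14942208 13713408 1228800'):.0f}% used")
    print(packet_counts("eth0: 115 0 0 0 0 68 0 0 0 0"))
```

Memory is a current value and can be reported directly; the packet counts are running totals, so a packets-per-second figure needs two samples and a subtraction, just like the CPU jiffies.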

Real-Time Performance Monitoring – monitor/perform(s):
NEAR REAL-TIME CLUSTER PERFORMANCE STATISTICS

10Base2
+----ALPHA-----+ | +-----BETA-----+
| Cpu Mem | | | Cpu Mem |
| 7% 94% |Rcvd 0 | 21 Rcvd| 28% 40% |
| Rio Wio +-----------+-----------+ Rio Wio |
| 1 0 |Sent 12 | 1 Sent| 0 1 |
+---10.0.0.1---+ | +---10.0.0.2---+
|
+----GAMMA-----+ | +----DELTA-----+
| Cpu Mem | | | Cpu Mem |
| 2% 75% |Rcvd 2 | 0 Rcvd| 5% 56% |
| Rio Wio +-----------+-----------+ Rio Wio |
| 4 0 |Sent 0 | 10 Sent| 3 0 |
+---10.0.0.3---+ | +---10.0.0.4---+
chaos.org

- Overall Network Loading -


23 Pkts/sec

Real-Time Performance Monitoring – External (displayed):
• Resource utilization reporting via a custom client process:

RESPONSE | OBSERVATIONS
TIME (msec) | 10 20 30 40 50
------------+----+----+----+----+----+----+----+----+----+----+
1 10 |
11 20 |
21 30 |************************
31 40 |************************
41 50 |**
51 60 |
61 70 |
71 80 |
81 90 |
91 100 |

50 Total Observations

Average = 30 milliseconds

…so what if you’re not happy with this level of performance?
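
A display like this one takes very little code: count the observations in 10-millisecond bins and print one asterisk per observation. The following is a sketch, not the book's client; histogram and render are hypothetical names, and the sample times are invented so that the bin counts match the chart (24, 24, and 2).

```python
def histogram(times_ms, width=10, top=100):
    """Count observations in bins 1-10, 11-20, ... up to `top` milliseconds."""
    bins = [0] * (top // width)
    for t in times_ms:
        index = (max(int(t), 1) - 1) // width
        bins[min(index, len(bins) - 1)] += 1  # clamp slow outliers into the last bin
    return bins


def render(bins, width=10):
    """One text row per bin, one asterisk per observation."""
    rows = []
    for i, count in enumerate(bins):
        low, high = i * width + 1, (i + 1) * width
        rows.append(f"{low:3d} {high:3d} |{'*' * count}")
    return rows


if __name__ == "__main__":
    sample = [25] * 24 + [35] * 24 + [45] * 2  # invented times matching the chart's bins
    print("\n".join(render(histogram(sample))))
    print(len(sample), "Total Observations")
```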

Performance Tuning – Defining Execution Phases:

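[Diagram: eight numbered execution phases along a query's path. The client sends a query to the master, whose subtasks S1 and S2 use the STP table; transit times over Ethernet carry the work to slave 1 and slave 2, which use a shared DB; the result returns to the client as the response.]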

Performance Tuning – Execution Phase Times:

[Chart: Initial MSI Phase Times. Average time (seconds, 0.00 to 0.02) for execution phases 1 to 8, plotted for three time distributions: Expon, Pulse, and Sweep. Annotations on the chart: "Found a bug!", "Not a SW issue.", "Leave the file open?"]

Performance Tuning – Final Times:
[Chart: Final MSI Phase Times. Average time (seconds, 0.00 to 0.02) for execution phases 1 to 8, plotted for three time distributions: Expon, Pulse, and Sweep. Annotations on the chart: "About a 10% improvement (not too bad)", "Dramatic reduction!"]

See book for further details on statistical distributions.

Performance Tuning:

• The proof is in the pudding!

RESPONSE | OBSERVATIONS
TIME (msec) | 10 20 30 40 50
------------+----+----+----+----+----+----+----+----+----+----+
1 10 |
11 20 |*******
21 30 |*********************************
31 40 |********
41 50 |**
51 60 |
61 70 |
71 80 |
81 90 |
91 100 |
50 Total Observations

Average = 25 milliseconds = 17% improvement!
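
The 17% figure follows from the two averages: (30 - 25) / 30 is about 0.167. As a one-function check (the name improvement_percent is hypothetical):

```python
def improvement_percent(before_ms, after_ms):
    """Relative reduction in average response time, as a percentage."""
    return 100.0 * (before_ms - after_ms) / before_ms


if __name__ == "__main__":
    print(f"{improvement_percent(30, 25):.0f}% improvement")
```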

Further Details are in the Book:
• Download all the source code for free:
http://www.samspublishing.com
- Search on “Linux cluster architecture” or “Vrenios”
- Click on the “Downloads” link in the book description
1. Individual chapter examples are in zip files
2. A complete user chief environment is in a tar.gz file
• Book Signings:
Sep 8th Borders Chandler, Sunday @ 2pm
Sep 15th Borders Arrowhead, Sunday @ 2pm
Oct 25th Barnes & Noble Arrowhead, Friday @ 7pm

References:
Distributed Operating Systems, Andrew S. Tanenbaum (of MINIX and AMOEBA fame!), Prentice Hall, 1995

Unix Distributed Programming, Chris Brown, Prentice Hall, 1994

Advanced Programming in the UNIX Environment, W. Richard Stevens, Addison-Wesley, 1992

“CHAOS: A CHeap Array of Outmoded Systems,” Alex Vrenios, LinuxGazette.com, October 1998

“CHAOS Part 2,” Alex Vrenios, LinuxGazette.com, December 1998

Linux Programming White Papers, Rusling, et al., Coriolis Open, 1999


You’ve been a terrific audience!

Any questions?

Hurry out and buy this book!

[Chart: Manifold Server - Final Performance Results. Time for 10 queries (seconds, 0 to 20) versus number of remote workers (1 to 7).]
