
Concurrent Programming with Threads

Rajkumar Buyya
School of Computer Science and Software Engineering Monash Technology Melbourne, Australia Email: rajkumar@ieee.org URL: http://www.dgs.monash.edu.au/~rajkumar

Objectives
- Explain parallel computing right from architecture, OS, and programming paradigms through to applications
- Explain the multithreading paradigm, and all aspects of how to use it in an application
- Cover all basic MT concepts
- Explore issues related to MT
- Contrast Solaris, POSIX, and Java threads
- Look at the APIs in detail
- Examine some Solaris, POSIX, and Java code examples
- Debate on: MPP and Cluster Computing

Agenda

- Overview of Computing
- Operating Systems Issues
- Threads Basics
- Multithreading with Solaris and POSIX threads
- Multithreading in Java
- Distributed Computing
- Grand Challenges
- Solaris, POSIX, and Java example code

Computing Elements

[Figure: computing elements -- applications sit on top of programming paradigms, a threads interface, a microkernel, the operating system, and multi-processor hardware (P = processor); threads execute within processes.]

Two Eras of Computing


[Figure: timeline, 1940-2030. In each era -- first sequential, then parallel -- architectures, compilers, applications, and problem-solving environments (PSEs) mature in turn, moving from R&D to commercialization to commodity.]

History of Parallel Processing

Parallel processing can be traced back to a tablet dated around 100 BC.

The tablet has three calculating positions. The inference is that multiple positions were used for reliability and/or speed.

Motivating Factors

Just as we learned to fly not by constructing a machine that flaps its wings like a bird, but by applying aerodynamic principles demonstrated by nature...

...we modeled parallel processing after the parallelism found in biological species.


7

Motivating Factors

The aggregate speed with which the brain carries out complex calculations, even though an individual neuron's response is slow (milliseconds), demonstrates the feasibility of parallel processing.
8

Why Parallel Processing?

- Computation requirements are ever increasing: visualization, distributed databases, simulations, scientific prediction (earthquakes), etc.
- Sequential architectures are reaching physical limitations (speed of light, thermodynamics).
9

Technical Computing
Solving technology problems using computer modeling, simulation and analysis
Application areas: Geographic Information Systems, Life Sciences, Aerospace, Mechanical Design & Analysis (CAD/CAM).

10

Computational Power Improvement

[Figure: computational power improvement (C.P.I.) versus number of processors -- a multiprocessor's power grows with the number of processors, while a uniprocessor's stays flat.]
11

Computational Power Improvement

[Figure: growth versus age -- vertical growth levels off after a certain age while horizontal growth continues; an analogy for uniprocessor versus multiprocessor power improvement.]
12

Why Parallel Processing?

- The technology of parallel processing is mature and can be exploited commercially; there is significant R&D work on the development of tools and environments.
- Significant developments in networking technology are paving the way for heterogeneous computing.

13

Why Parallel Processing?

- Hardware improvements like pipelining, superscalar execution, etc., are non-scalable and require sophisticated compiler technology.
- Vector processing works well only for certain kinds of problems.

14

Parallel Program has & needs ...

- Multiple processes active simultaneously solving a given problem, generally on multiple processors.
- Communication and synchronization among its processes (this forms the core of parallel programming efforts).

15

Processing Elements Architecture

16

Processing Elements
Simple classification by Flynn:
(No. of instruction and data streams)

SISD - conventional
SIMD - data parallel, vector computing
MISD - systolic arrays
MIMD - very general, multiple approaches

Current focus is on the MIMD model, using general-purpose processors (no shared memory).
17

SISD : A Conventional Computer


[Figure: SISD -- a single processor takes one instruction stream and one data input stream and produces one data output stream.]

Speed is limited by the rate at which the computer can transfer information internally.
Ex: PC, Macintosh, workstations.

18

The MISD Architecture

[Figure: MISD -- a single data input stream passes through processors A, B, and C, each driven by its own instruction stream (A, B, C), producing one data output stream.]

More of an intellectual exercise than a practical configuration; few were built, and none are commercially available.

19

SIMD Architecture
[Figure: SIMD -- a single instruction stream drives processors A, B, and C, each operating on its own data input stream (A, B, C) and producing its own data output stream (A, B, C).]

Example operation: Ci <= Ai * Bi. Ex: Cray vector-processing machines, Thinking Machines CM*.

20

MIMD Architecture
[Figure: MIMD -- processors A, B, and C each execute their own instruction stream (A, B, C) on their own data input stream (A, B, C) and produce their own data output stream (A, B, C).]

Unlike SISD and MISD machines, an MIMD computer works asynchronously. There are shared-memory (tightly coupled) MIMD and distributed-memory (loosely coupled) MIMD machines.

21

Shared Memory MIMD machine


[Figure: shared-memory MIMD -- processors A, B, and C each connect through a memory bus to a global memory system.]


- Communication: the source PE writes data to global memory and the destination PE retrieves it.
- Easy to build; conventional operating systems for SISD machines can easily be ported.
- Limitations: reliability and expandability. A memory component or processor failure affects the whole system, and increasing the number of processors leads to memory contention.
- Ex.: Silicon Graphics supercomputers.

22

Distributed Memory MIMD


[Figure: distributed-memory MIMD -- processors A, B, and C each have their own memory system (A, B, C) attached via a memory bus, and communicate over IPC channels.]

- Communication: IPC over a high-speed network; the network can be configured as a tree, mesh, cube, etc.
- Unlike shared-memory MIMD, easily and readily expandable.
- Highly reliable (a CPU failure does not affect the whole system).

23

Laws of caution.....


Speed of computers is proportional to the square of their cost:

    speed = k * cost^2, i.e. cost = sqrt(speed / k)

Speedup by a parallel computer increases as the logarithm of the number of processors:

    Speedup = log2(no. of processors)

24

Caution....
Very fast developments in PP and related areas have blurred concept boundaries, causing a lot of terminological confusion: concurrent computing/programming, parallel computing/processing, multiprocessing, distributed computing, etc.

25

It's hard to imagine a field that changes as rapidly as computing.

26

Caution....

Computer Science is an Immature Science.


(lack of standard taxonomy, terminologies)

27

Caution....
There are no strict delimiters for contributors to the area of parallel processing: computer architecture, OS, HLLs, databases, and computer networks all have a role to play. This makes it a hot topic of research.

28

Parallel Programming Paradigms


- Multithreading
- Task-level parallelism

29

Serial Vs. Parallel


[Figure: a queue of people served by a single counter (serial) versus the same queue served by two counters (parallel).]

30

High Performance Computing

function1( ) { /* ...function stuff... */ }   /* takes time t1 */
function2( ) { /* ...function stuff... */ }   /* takes time t2 */

Serial machine (single CPU):
    function1(); function2();
    Time: t1 + t2

Parallel machine (MPP -- a massively parallel system containing thousands of CPUs):
    function1() || function2()
    Time: max(t1, t2)

Single and Multithreaded Processes


[Figure: a single-threaded process has one thread of execution (a single instruction stream); a multithreaded process has several threads of execution (multiple instruction streams) sharing a common address space.]

32

OS: Multi-Processing, Multi-Threaded


Threaded libraries, multi-threaded I/O.

[Figure: multiple applications spread across multiple CPUs.]

- Better response times in multiple-application environments
- Higher throughput for parallelizable applications

33

Multi-threading, continued...
Multi-threaded OS enables parallel, scalable I/O
[Figure: several applications issue I/O requests through the multithreaded OS kernel, which runs on multiple CPUs.]

Multiple, independent I/O requests can be satisfied simultaneously because all the major disk, tape, and network drivers have been multithreaded, allowing any given driver to run on multiple CPUs simultaneously.

34

Basic Process Model

[Figure: two processes, each with its own STACK, DATA, and TEXT segments, communicating through shared memory segments, pipes, open files, or mmap'd files maintained by the kernel.]

What are Threads?

A thread is a piece of code that can execute concurrently with other threads. It is a schedulable entity on a processor.

[Figure: a thread object comprises local state, global/shared state, and a hardware context: registers, status word, and program counter (PC).]

Threaded Process Model

[Figure: a running process containing several threads -- each thread has its own stack, while the thread data and thread text live in memory shared by all threads of the process.]

- Independent executables
- All threads are parts of one process, hence communication is easier and simpler

37

Levels of Parallelism
[Figure: levels of parallelism by code granularity --
  Large grain (task level): program / tasks i-1, i, i+1
  Medium grain (control level): functions (threads), e.g. func1(), func2(), func3()
  Fine grain (data level): loop iterations, e.g. a(0)=.., a(1)=.., a(2)=..
  Very fine grain (multiple issue): individual instructions, exploited in hardware]

38

Simple Thread Example


void *func()
{
    /* define local data */
    - - - - - - - - - -
    /* function code */
    - - - - - - - - - -
    thr_exit(exit_value);
}

main()
{
    thread_t tid;
    int exit_value;
    - - - - - - - -
    thread_create(0, 0, func, NULL, &tid);
    - - - - - - - -
    thread_join(tid, 0, &exit_value);
    - - - - - - - -
}

39
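A minimal POSIX rendering of the sketch above, for concreteness. It is a hedged example: the worker function, its argument, and the doubled return value are invented for illustration; only the create/join/exit structure comes from the slide.

#include <pthread.h>
#include <stdio.h>

/* Hypothetical worker: reads its argument, does some work, returns a status. */
void *func(void *arg)
{
    int input = *(int *)arg;                     /* local data */
    /* ... function code ... */
    pthread_exit((void *)(long)(input * 2));     /* exit value collected by join */
}

int main(void)
{
    pthread_t tid;
    void *exit_value;
    int arg = 21;

    if (pthread_create(&tid, NULL, func, &arg) != 0)
        return 1;
    pthread_join(tid, &exit_value);              /* wait for the thread to finish */
    printf("thread returned %ld\n", (long)exit_value);
    return 0;
}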

Few Popular Thread Models


- POSIX threads, ISO/IEEE standard
- Mach C threads, CMU
- SunOS LWP threads, Sun Microsystems
- PARAS CORE threads, C-DAC
- Java threads, Sun Microsystems
- Chorus threads, Paris
- OS/2 threads, IBM
- Windows NT/95 threads, Microsoft
40

Multithreading - Uniprocessors
Concurrency Vs Parallelism
[Figure: processes/threads P1, P2, P3 time-share a single CPU.]

Concurrency: the number of simultaneous execution units is greater than the number of CPUs.

41

Multithreading Multiprocessors
Concurrency Vs Parallelism
[Figure: P1, P2, P3 each run on their own CPU.]

Parallelism: the number of executing processes equals the number of CPUs.

42

Computational Model
[Figure: user-level threads are mapped by the user-level scheduler onto virtual processors, which the kernel-level scheduler maps onto physical processors.]

Parallel execution is due to:
- concurrency of threads on virtual processors
- concurrency of threads on physical processors

True parallelism: threads : processor map = 1 : 1

General Architecture of Thread Model


- Hides the details of the machine architecture.
- Maps user threads to kernel threads.
- Process VM is shared: a state change made in the VM by one thread is visible to the other threads.


44

Process Parallelism
int add(int a, int b, int &result)
{ /* function stuff */ }

int sub(int a, int b, int &result)
{ /* function stuff */ }

pthread_t t1, t2;
pthread_create(&t1, add, a, b, &r1);
pthread_create(&t2, sub, c, d, &r2);
pthread_par(2, t1, t2);

[Figure: one processor executes instruction stream IS1 (add) while another executes IS2 (sub), both drawing operands a, b, r1, c, d, r2 from shared data -- MISD/MIMD-style processing.]

Data Parallelism
sort(int *array, int count)
{ /* ...... */ }

pthread_t thread1, thread2;
pthread_create(&thread1, sort, array, N/2);
pthread_create(&thread2, sort, &array[N/2], N/2);
pthread_par(2, thread1, thread2);

[Figure: two processors execute the same instruction stream (sort), one on data d0..dn/2 and the other on dn/2+1..dn -- SIMD-style processing.]

Process and Threaded models

Purpose                              Process Model   Threads Model
Creation of a new thread             fork()          thr_create()
Start execution of a new thread      exec()          [thr_create() builds the new thread
                                                      and starts its execution]
Wait for completion of thread        wait()          thr_join()
Exit and destroy the thread          exit()          thr_exit()

47

Code Comparison

Segment (Process):

main()
{
    fork();
    fork();
    fork();
}

Segment (Thread):

main()
{
    thread_create(0, 0, func, 0, 0);
    thread_create(0, 0, func, 0, 0);
    thread_create(0, 0, func, 0, 0);
}

48

Printing Thread

Editing Thread

49

Independent Threads
printing()
{
    - - - - - - - - - - -
}

editing()
{
    - - - - - - - - - - -
}

main()
{
    - - - - - - - - - - -
    id1 = thread_create(printing);
    id2 = thread_create(editing);
    thread_run(id1, id2);
    - - - - - - - - - - -
}

50

Cooperative threads - File Copy

reader()
{
    - - - - - - - -
    lock(buff[i]);
    read(src, buff[i]);
    unlock(buff[i]);
    - - - - - - - -
}

writer()
{
    - - - - - - - - -
    lock(buff[i]);
    write(dst, buff[i]);
    unlock(buff[i]);
    - - - - - - - - -
}

Shared buffers: buff[0], buff[1]

Cooperative Parallel Synchronized Threads

51

RPC Call
[Figure: the client issues RPC(func); the server executes func() { /* body */ } and returns the result.]

52

Multithreaded Server
[Figure: client processes send requests through the kernel's message-passing facility; a pool of server threads inside the server process (user mode) services them.]

53

Multithreaded Compiler

[Figure: source code -> preprocessor thread -> compiler thread -> object code.]

54

Thread Programming models

1. The boss/worker model 2. The peer model 3. A thread pipeline

55

The boss/worker model

[Figure: the boss thread (main()) takes requests from an input stream and hands taskX, taskY, and taskZ to worker threads, which use shared resources: files, databases, disks, special devices.]

56

Example
main()  /* the boss */
{
    forever {
        get a request;
        switch( request ) {
        case X: pthread_create(...., taskX); break;
        case Y: pthread_create(...., taskY); break;
        ....
        }
    }
}

taskX()  /* worker */
{
    perform the task, sync if accessing shared resources
}

taskY()  /* worker */
{
    perform the task, sync if accessing shared resources
}
....

The runtime overhead of creating a thread per request can be avoided with a thread pool: the boss thread creates all worker threads at program initialization, and each worker thread suspends itself immediately, waiting for a wakeup call from the boss.

57
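A hedged, concrete pthread rendering of the boss/worker pseudocode above. The request codes, get_request(), taskX(), and taskY() are placeholders for whatever the real application provides; the point is the create-a-detached-worker-per-request structure.

#include <pthread.h>

enum { REQ_X, REQ_Y };                 /* hypothetical request codes */
extern int  get_request(void);         /* assumed to block until a request arrives */
extern void *taskX(void *arg);         /* worker bodies; they synchronize internally */
extern void *taskY(void *arg);

/* Boss loop: one detached worker thread per incoming request. */
void boss_loop(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    for (;;) {
        switch (get_request()) {
        case REQ_X: pthread_create(&tid, &attr, taskX, NULL); break;
        case REQ_Y: pthread_create(&tid, &attr, taskY, NULL); break;
        }
    }
}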

The peer model

[Figure: in the peer model there is no boss -- peer threads taskX, taskY, and taskZ each take their own (static) input and use shared resources: files, databases, disks, special devices.]

58

Example
main()
{
    pthread_create(...., thread1, ..., task1);
    pthread_create(...., thread2, ..., task2);
    ....
    signal all workers to start
    wait for all workers to finish
    do any cleanup
}

task1()  /* worker */
{
    wait for start
    perform the task, sync if accessing shared resources
}

task2()  /* worker */
{
    wait for start
    perform the task, sync if accessing shared resources
}

59

A thread pipeline
[Figure: the input stream enters the program and flows through filter threads stage 1 -> stage 2 -> stage 3; each stage uses its own resources: files, databases, disks, special devices.]

60

Example
main()
{
    pthread_create(...., stage1);
    pthread_create(...., stage2);
    ....
    wait for all pipeline threads to finish
    do any cleanup
}

stage1()
{
    get next input for the program
    do stage 1 processing of the input
    pass result to next thread in pipeline
}

stage2()
{
    get input from previous thread in pipeline
    do stage 2 processing of the input
    pass result to next thread in pipeline
}

stageN()
{
    get input from previous thread in pipeline
    do stage N processing of the input
    pass result to program output
}

61

Multithreaded Matrix Multiply...

[Figure: A x B = C]

C[1,1] = A[1,1]*B[1,1] + A[1,2]*B[2,1] + ...
C[m,n] = sum of the products of corresponding elements in row m of A and column n of B.
Each resultant element can be computed independently.

Multithreaded Matrix Multiply


typedef struct {
    int id;
    int size;
    int row, column;
    matrix_t *MA, *MB, *MC;
} matrix_work_order_t;

main()
{
    int size = ARRAY_SIZE, row, column, id, i;
    matrix_t MA, MB, MC;
    matrix_work_order_t *work_orderp;
    pthread_t peer[size*size];
    ...
    /* process matrix, by row, column */
    for( row = 0; row < size; row++ )
        for( column = 0; column < size; column++ ) {
            id = column + row * ARRAY_SIZE;
            work_orderp = malloc( sizeof(matrix_work_order_t) );
            /* initialize all members of *work_orderp */
            pthread_create( &peer[id], NULL, peer_mult, work_orderp );
        }

    /* wait for all peers to exit */
    for( i = 0; i < size*size; i++ )
        pthread_join( peer[i], NULL );
}

63
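The peer routine itself is not shown on the slide. Below is a hedged sketch of what peer_mult() might look like, reusing matrix_work_order_t from the code above and assuming matrix_t is a typedef for a plain 2-D array of doubles (double[ARRAY_SIZE][ARRAY_SIZE]); the indexing would change for a different representation.

#include <stdlib.h>

/* Each peer computes exactly one element of C from its work order. */
void *peer_mult(void *arg)
{
    matrix_work_order_t *w = (matrix_work_order_t *)arg;
    double sum = 0.0;
    int k;

    for (k = 0; k < w->size; k++)
        sum += (*w->MA)[w->row][k] * (*w->MB)[k][w->column];

    (*w->MC)[w->row][w->column] = sum;   /* distinct element per peer, so no lock */
    free(arg);                           /* work order was malloc'd by main() */
    return NULL;
}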

Multithreaded Server...
void main( int argc, char *argv[] )
{
    int server_socket, client_socket, clilen;
    struct sockaddr_in serv_addr, cli_addr;
    int one, port_id;
#ifdef _POSIX_THREADS
    pthread_t service_thr;
#endif

    port_id = 4000;   /* default port_id */

    if( (server_socket = socket( AF_INET, SOCK_STREAM, 0 )) < 0 ) {
        printf( "Error: Unable to open socket in parmon server.\n" );
        exit( 1 );
    }

    memset( (char *) &serv_addr, 0, sizeof(serv_addr) );
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl( INADDR_ANY );
    serv_addr.sin_port = htons( port_id );
    setsockopt( server_socket, SOL_SOCKET, SO_REUSEADDR,
                (char *) &one, sizeof(one) );

64

Multithreaded Server...
    if( bind( server_socket, (struct sockaddr *) &serv_addr,
              sizeof(serv_addr) ) < 0 ) {
        printf( "Error: Unable to bind socket in parmon server->%d\n", errno );
        exit( 1 );
    }

    listen( server_socket, 5 );

    while( 1 ) {
        clilen = sizeof(cli_addr);
        client_socket = accept( server_socket,
                                (struct sockaddr *) &cli_addr, &clilen );
        if( client_socket < 0 ) {
            printf( "connection to client failed in server.\n" );
            continue;
        }
#ifdef _POSIX_THREADS
        pthread_create( &service_thr, NULL, service_dispatch, client_socket );
#else
        thr_create( NULL, 0, service_dispatch, client_socket,
                    THR_DETACHED, &service_thr );
#endif
    }
}

65

Multithreaded Server
// Service function -- thread function
void *service_dispatch( int client_socket )
{
    /* get user request */
    if( readline( client_socket, command, 100 ) > 0 ) {
        /* identify user request */
        /* do necessary processing */
        /* send results back to the client */
    }
    /* close connection and terminate thread */
    close( client_socket );
#ifdef _POSIX_THREADS
    pthread_exit( (void *) 0 );
#endif
}

66

The Value of MT
- Program structure
- Parallelism
- Throughput
- Responsiveness
- System resource usage
- Distributed objects
- Single source across platforms (POSIX)
- Single binary for any number of CPUs
67

To thread or not to thread

- Improve efficiency on uniprocessor systems
- Use multiprocessor hardware
- Improve throughput
- Simple to implement asynchronous I/O
- Leverage special features of the OS

68

To thread or not to thread

- If all operations are CPU intensive, multithreading will not take you far.
- Thread creation is very cheap, but it is not free: a thread that has only five lines of code would not be useful.

69

DOS - The Minimal OS


[Figure: DOS memory layout -- user space holds the user code, global data, stack & stack pointer, and program counter; kernel space holds the DOS code and DOS data, sitting directly on the hardware.]

70

Multitasking OSs
[Figure: a process occupies user space; its process structure lives in kernel space inside the OS (UNIX), which runs on the hardware.]

(UNIX, VMS, MVS, NT, OS/2, etc.)


71

Multitasking Systems
[Figure: processes P1..P4 each run on top of the kernel and hardware.]

(Each process is completely independent.)


72

Multithreaded Process
[Figure: a multithreaded process -- threads T1, T2, and T3 each have their own stack pointer (SP) and program counter (PC), while the user code, global data, and process structure are shared, all on top of the kernel.]

(Kernel state and address space are shared.)


73

Kernel Structures
[Figure: a traditional UNIX process structure holds the process ID, UID, GID, EUID, EGID, CWD, signal dispatch table, memory map, file descriptors, priority, signal mask, registers, CPU state, and kernel stack. In the Solaris 2 process structure the per-execution state (priority, signal mask, registers, kernel stack, CPU state) moves into LWPs (LWP 1, LWP 2), so one process can contain several LWPs.]

74

Scheduling Design Options

- M:1  (HP-UX)
- 1:1  (DEC, NT, OS/2, AIX, IRIX)
- M:M
- Two-level

75

SunOS Two-Level Thread Model


[Figure: processes Proc 1 (a traditional single-threaded process) through Proc 5 -- user threads are multiplexed onto LWPs, which are backed by kernel threads that the kernel schedules onto hardware processors.]

Thread Life Cycle


[Figure: thread T1 calls pthread_create(...func...) to spawn thread T2; T2 ends with pthread_exit().]

POSIX:
main()
{
    ...
    pthread_create( func, arg );
    ...
}
void *func()
{
    ....
}

Solaris:
main()
{
    thr_create( ..func.., arg.. );
    ...
}

77

Waiting for a Thread to Exit


[Figure: thread T1 calls pthread_join() and blocks until thread T2 calls pthread_exit().]

POSIX:
main()
{
    ...
    pthread_join( T2, &val_ptr );
    ...
}
void *func()
{
    ....
}

Solaris:
main()
{
    thr_join( T2, &val_ptr );
    ...
}

78

Scheduling States: Simplified View of Thread State Transitions

[Figure: simplified thread state transitions -- an ACTIVE thread can sleep (to SLEEPING), be preempted (back to RUNNABLE), or be stopped (to STOPPED); a SLEEPING thread wakes up to RUNNABLE; a RUNNABLE thread is dispatched to ACTIVE; a STOPPED thread continues back to RUNNABLE; RUNNABLE and SLEEPING threads can also be stopped.]

79

Preemption
The process of rudely interrupting a thread and forcing it to relinquish its LWP (or CPU) to another. CPU2 cannot change CPU3's registers directly; it can only issue a hardware interrupt to CPU3. It is up to CPU3's interrupt handler to look at CPU2's request and decide what to do. Higher-priority threads always preempt lower-priority threads. Preemption != time slicing. All of the libraries are preemptive.

EXIT Vs. THREAD_EXIT


The normal C function exit() always causes the process to exit. That means all of the process -- All the threads. The thread exit functions: UI : thr_exit() POSIX : pthread_exit() OS/2 : DosExitThread() and _endthread() NT : ExitThread() and endthread() all cause only the calling thread to exit, leaving the process intact and all of the other threads running. (If no other threads are running, then exit() will be called.) 81

Cancellation
Cancellation is the means by which a thread can tell another thread that it should exit.
[Figure: thread T2 cancels thread T1 -- T2 calls pthread_cancel(T1); T1 terminates as though it had called pthread_exit().]

POSIX:        main() { ... pthread_cancel(T1); ... }
OS/2:         main() { ... DosKillThread(T1); ... }
Windows NT:   main() { ... TerminateThread(T1); ... }

There is no special relationship between the killer of a thread and the victim. (UI threads must roll their own using signals.)

82

Cancellation State and Type


State:
- PTHREAD_CANCEL_DISABLE (cannot be cancelled)
- PTHREAD_CANCEL_ENABLE (can be cancelled; must consider type)

Type:
- PTHREAD_CANCEL_ASYNCHRONOUS (at any time whatsoever; not generally used)
- PTHREAD_CANCEL_DEFERRED (only at cancellation points)

(Only POSIX has state and type. OS/2 is effectively always enabled asynchronous. NT is effectively always enabled asynchronous.)

Cancellation is Always Complex!


It is very easy to forget a lock that's being held or a resource that should be freed. Use cancellation only when you absolutely require it. Be extremely meticulous in analyzing the possible thread states. Document, document, document!

84

Returning Status
POSIX and UI:
- A detached thread cannot be joined; it cannot return status.
- An undetached thread must be joined, and can return a status.

OS/2:
- Any thread can be waited for.
- No thread can return status.
- No thread needs to be waited for.

NT:
- No thread can be waited for.
- Any thread can return status.

85

Suspending a Thread
[Figure: thread T2 suspends thread T1 and later continues it.]

Solaris:
main()
{
    ...
    thr_suspend(T1);
    ...
    thr_continue(T1);
    ...
}

* POSIX does not support thread suspension.

86

Proposed Uses of Suspend/Continue


- Garbage collectors
- Debuggers
- Performance analysers
- Other tools?
  (These all must go below the API, so they don't count.)
- Isolation of VM system spooling (?!)
- NT Services specify that a service should be suspendable (a questionable requirement?)

Be careful!
87

Do NOT Think about Scheduling!


- Think about resource availability
- Think about synchronization
- Think about priorities

Ideally, if you're using suspend/continue, you're making a mistake!

88

Synchronization
Webster's: to represent or arrange events to indicate coincidence or coexistence.
Lewis: to arrange events so that they occur in a specified order.
* Serialized access to controlled resources.

Synchronization is not just an MP issue. It is not even strictly an MT issue!

Threads Synchronization :

- On shared memory: shared variables, semaphores
- On distributed memory:
  - within a task: semaphores
  - across tasks: by passing messages

90

Unsynchronized Shared Data is a Formula for Disaster


Thread 1:
    temp = Your->BankBalance;
    dividend = temp * InterestRate;
    newbalance = dividend + temp;
    Your->Dividend += dividend;
    Your->BankBalance = newbalance;

Thread 2:
    Your->BankBalance += deposit;

91

Atomic Actions
An action which must be started and completed with no possibility of interruption. A machine instruction could need to be atomic. (not all are!) A line of C code could need to be atomic. (not all are) An entire database transaction could need to be atomic. All MP machines provide at least one complex atomic instruction, from which you can build anything. A section of code which you have forced to be atomic is a Critical Section. 92

Critical Section (Good Programmer!)


T1:
reader()
{
    - - - - - - - -
    lock(DISK);
    ...........
    unlock(DISK);
    - - - - - - - -
}

T2:
writer()
{
    - - - - - - - - -
    lock(DISK);
    ..............
    unlock(DISK);
    - - - - - - - - -
}

Shared data: both threads take the lock before touching it.

Critical Section (Bad Programmer!)


T1:
reader()
{
    - - - - - - - -
    lock(DISK);
    ...........
    unlock(DISK);
    - - - - - - - -
}

T2:
writer()
{
    - - - - - - - - -
    ..............    /* no lock! */
    - - - - - - - - -
}

Shared data: the writer touches it without taking the lock.

Lock Shared Data!


- Globals
- Shared data structures
- Static variables (really just lexically scoped global variables)

95

Mutexes
Thread 1:
    item = create_and_fill_item();
    mutex_lock( &m );
    item->next = list;
    list = item;
    mutex_unlock( &m );

Thread 2:
    mutex_lock( &m );
    this_item = list;
    list = list->next;
    mutex_unlock( &m );
    ..... func( this_item );

POSIX and UI: owner not recorded; waiting threads block in priority order. OS/2 and NT: owner recorded; waiting threads block in FIFO order.
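For completeness, a runnable version of the list manipulation above, hedged: the item type, the values, and the thread bodies are invented for illustration; the locking pattern is the point.

#include <pthread.h>
#include <stdlib.h>

typedef struct item { int value; struct item *next; } item_t;

static item_t *list = NULL;                          /* shared data */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Thread 1: push a new item -- the lock makes the two-pointer update atomic. */
static void *pusher(void *arg)
{
    item_t *item = malloc(sizeof(*item));            /* create_and_fill_item() */
    item->value = 42;

    pthread_mutex_lock(&m);
    item->next = list;
    list = item;
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Thread 2: pop the head item under the same lock, then use it outside it. */
static void *popper(void *arg)
{
    item_t *this_item;

    pthread_mutex_lock(&m);
    this_item = list;
    if (this_item != NULL)
        list = this_item->next;
    pthread_mutex_unlock(&m);

    /* ... func(this_item) would go here ... */
    free(this_item);                                 /* free(NULL) is harmless */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, pusher, NULL);
    pthread_create(&t2, NULL, popper, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}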

Synchronization Variables in Shared Memory (Cross Process)

[Figure: a synchronization variable (S) placed in shared memory can be used by the threads of both process 1 and process 2; each process may also keep private synchronization variables.]

97

Synchronization Problems

98

Deadlocks
Thread 1:
    lock( M1 );
    lock( M2 );

Thread 2:
    lock( M2 );
    lock( M1 );

Thread1 is waiting for the resource(M2) locked by Thread2 and Thread2 is waiting for the resource (M1) locked by Thread1 99

Avoiding Deadlocks
Establish a hierarchy: always lock Mutex_1 before Mutex_2, etc. Use the trylock primitives if you must violate the hierarchy:

{
    while (1) {
        pthread_mutex_lock( &m2 );
        if( EBUSY != pthread_mutex_trylock( &m1 ) )
            break;
        else {
            pthread_mutex_unlock( &m2 );
            wait_around_or_do_something_else();
        }
    }
    do_real_work();   /* Got 'em both! */
}

Use LockLint or some similar static analysis program to scan your code for hierarchy violations.

100

Race Conditions
A race condition is where the results of a program are different depending upon the timing of the events within the program. Some race conditions result in different answers and are clearly bugs.
Thread 1:
    mutex_lock( &m );
    v = v - 1;
    mutex_unlock( &m );

Thread 2:
    mutex_lock( &m );
    v = v * 2;
    mutex_unlock( &m );

--> If v = 1, the result can be 0 or 1, depending on which thread gets the chance to enter the critical region first.

101

Operating System Issues

102

Library Goals
Make it fast! Make it MT safe! Retain UNIX semantics!

103

Are Libraries Safe ?


getc() OLD implementation:

extern int getc( FILE *p )
{
    /* code to read data */
}

getc() NEW implementation:

extern int getc( FILE *p )
{
    pthread_mutex_lock( &m );
    /* code to read data */
    pthread_mutex_unlock( &m );
}

104

ERRNO
In UNIX, the distinguished variable errno is used to hold the error code for any system calls that fail. Clearly, should two threads both issue system calls around the same time, it would not be possible to figure out which one set the value of errno. Therefore errno is defined in the header file to be a call to thread-specific data. This is done only when the flag _REENTRANT (UI) or _POSIX_C_SOURCE=199506L (POSIX) is passed to the compiler, allowing older, non-MT programs to continue to run. There is the potential for problems if you use libraries which are not reentrant. (This is often a problem when using third-party libraries.)

105

Are Libraries Safe?


MT-Safe          This function is safe.
MT-Hot           This function is safe and fast.
MT-Unsafe        This function is not MT-safe, but was compiled with _REENTRANT.
Alternative Call This function is not safe, but there is a similar function that is (e.g. getctime_r()).
MT-Illegal       This function wasn't even compiled with _REENTRANT and therefore can only be called from the main thread.

106

Threads Debugging Interface


- Debuggers
- Data inspectors
- Performance monitors
- Garbage collectors
- Coverage analyzers

Not a standard interface!

107

The APIs

108

Different Thread Specifications


Functionality Design Philosophy UI Threads POSIX Thteads Near-Base Primitives Local/Global Simple Simple Buildable Simple Buildable Impossible Yes Yes Yes No Most Some NT Threads Complex Primitives Global Complex Buildable Buildable Buildable Complex Yes Yes Yes n/a Yes All? Some OS/2 Threads Complex Primitives Global Complex Buildable Buildable Complex Yes Yes Yes n/a No All? Some

Base Primitives Scheduling Classes Local/ Global Mutexes Simple Counting Semaphores Simple R/W Locks Simple Buildable Condition Variables Simple Multiple-Object Buildable Synchronization Thread Suspension Yes Cancellation Buildable Thread-Specific Data Yes Signal-Handling Primitives Yes Compiler Changes Required No Vendor Libraries MT-safe? Moat ISV Libraries MT-safe? Some

109

POSIX and Solaris API Differences


[Figure: overlap of the POSIX and Solaris (UI) thread APIs.
POSIX only: thread cancellation, scheduling policies, sync attributes, thread attributes.
Common to both: thread creation, join, exit, key creation (thread-specific data), semaphore variables, mutex variables, condition variables, priorities, sigmask, kill.
Solaris only: suspend/continue, concurrency setting, reader/writer variables, daemon threads.]

110

Error Return Values


- Many threads functions return an error value which can be looked up in errno.h.
- Very few threads functions set errno (check the man pages).
- The "lack of resources" errors usually mean that you've used up all your virtual memory, and your program is likely to crash very soon.

111
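A hedged sketch of the convention described above: check the value a pthread call returns rather than errno. The worker function here is just a placeholder.

#include <pthread.h>
#include <stdio.h>
#include <string.h>

extern void *worker(void *arg);        /* assumed thread body */

int start_worker(pthread_t *tid)
{
    int err = pthread_create(tid, NULL, worker, NULL);
    if (err != 0) {
        /* pthread functions return the error code directly; they do not set errno */
        fprintf(stderr, "pthread_create: %s\n", strerror(err));  /* e.g. EAGAIN */
        return -1;
    }
    return 0;
}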

Attribute Objects
UI, OS/2, and NT all use flags and direct arguments to indicate what the special details of the objects being created should be. POSIX requires the use of attribute objects:

    thr_create(NULL, NULL, foo, NULL, THR_DETACHED);

vs:

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create(NULL, &attr, foo, NULL);

112

Attribute Objects
Although a bit of a pain in the *** compared to passing all the arguments directly, attribute objects allow the designers of the threads library more latitude to add functionality without changing the old interfaces. (If they decide they really want to, say, pass the signal mask at creation time, they just add a function pthread_attr_set_signal_mask() instead of adding a new argument to pthread_create().)

There are attribute objects for:
- Threads: stack size, stack base, scheduling policy, scheduling class, scheduling scope, scheduling inheritance, detach state
- Mutexes: cross-process, priority inheritance
- Condition variables: cross-process

113

Attribute Objects
Attribute objects must be:
- Allocated
- Initialized
- Values set (presumably)
- Used
- Destroyed (if they are to be freed)

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create(NULL, &attr, foo, NULL);
    pthread_attr_destroy(&attr);

114
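A small, self-contained sketch of the same life cycle, hedged: the thread body, the detached state, and the 1 MB stack size are arbitrary illustration values, not anything the slides prescribe.

#include <pthread.h>
#include <stdio.h>

void *foo(void *arg)
{
    printf("detached worker running\n");
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_attr_t attr;

    pthread_attr_init(&attr);                                      /* allocate + initialize */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);   /* set values */
    pthread_attr_setstacksize(&attr, 1024 * 1024);                 /* e.g. a 1 MB stack */

    pthread_create(&tid, &attr, foo, NULL);                        /* use */
    pthread_attr_destroy(&attr);                                   /* destroy once used */

    pthread_exit(NULL);   /* main thread exits; the process lives until the worker finishes */
    return 0;
}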

Thread Attribute Objects


pthread_attr_t: thread attribute object type.

int pthread_attr_init(pthread_attr_t *attr);
int pthread_attr_destroy(pthread_attr_t *attr);

int pthread_attr_getdetachstate(pthread_attr_t *attr, int *state);
int pthread_attr_setdetachstate(pthread_attr_t *attr, int state);
    Can the thread be joined?

int pthread_attr_getscope(pthread_attr_t *attr, int *scope);
int pthread_attr_setscope(pthread_attr_t *attr, int scope);

115

Thread Attribute Objects


int pthread_attr_getinheritpolicy(pthread_attr_t *attr, int *policy);
int pthread_attr_setinheritpolicy(pthread_attr_t *attr, int policy);
    Will the policy in the attribute object be used?

int pthread_attr_getschedpolicy(pthread_attr_t *attr, int *policy);
int pthread_attr_setschedpolicy(pthread_attr_t *attr, int policy);
    Will the scheduling be RR, FIFO, or OTHER?

int pthread_attr_getschedparam(pthread_attr_t *attr, struct sched_param *param);
int pthread_attr_setschedparam(pthread_attr_t *attr, struct sched_param *param);
    What will the priority be?

116

Thread Attribute Objects


int pthread_attr_getinheritsched(pthread_attr_t *attr, int *inheritsched);
int pthread_attr_setinheritsched(pthread_attr_t *attr, int inheritsched);
    Will the policy in the attribute object be used?

int pthread_attr_getstacksize(pthread_attr_t *attr, size_t *size);
int pthread_attr_setstacksize(pthread_attr_t *attr, size_t size);
    How big will the stack be?

int pthread_attr_getstackaddr(pthread_attr_t *attr, void **base);
int pthread_attr_setstackaddr(pthread_attr_t *attr, void *base);
    What will the stack's base address be?

117

Mutex Attribute Objects


pthread_mutexattr_t: mutex attribute object type.

int pthread_mutexattr_init(pthread_mutexattr_t *attr);
int pthread_mutexattr_destroy(pthread_mutexattr_t *attr);

int pthread_mutexattr_getpshared(pthread_mutexattr_t *attr, int *shared);
int pthread_mutexattr_setpshared(pthread_mutexattr_t *attr, int shared);
    Will the mutex be shared across processes?

118

Mutex Attribute Objects


int pthread_mutexattr_getprioceiling(pthread_mutexattr_t *attr, int *ceiling);
int pthread_mutexattr_setprioceiling(pthread_mutexattr_t *attr, int ceiling);
    What is the highest priority the thread owning this mutex can acquire?

int pthread_mutexattr_getprotocol(pthread_mutexattr_t *attr, int *protocol);
int pthread_mutexattr_setprotocol(pthread_mutexattr_t *attr, int protocol);
    Shall the thread owning this mutex inherit priorities from waiting threads?

119

Condition Variable Attribute Objects


pthread_condattr_t: condition variable attribute object type.

int pthread_condattr_init(pthread_condattr_t *attr);
int pthread_condattr_destroy(pthread_condattr_t *attr);

int pthread_condattr_getpshared(pthread_condattr_t *attr, int *shared);
int pthread_condattr_setpshared(pthread_condattr_t *attr, int shared);
    Will the condition variable be shared across processes?

120

Creation and Destruction (UI & POSIX)


int thr_create(void *stack_base, size_t stack_size,
               void *(*start_routine)(void *), void *arg,
               long flags, thread_t *new_thread);
void thr_exit(void *value_ptr);
int thr_join(thread_t thread, void **value_ptr);

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg);
void pthread_exit(void *value_ptr);
int pthread_join(pthread_t thread, void **value_ptr);
int pthread_cancel(pthread_t thread);

121

Suspension (UI & POSIX)


int thr_suspend(thread_t target);
int thr_continue(thread_t target);

122

Changing Priority (UI & POSIX)


int thr_setpriority(thread_t thread, int priority);
int thr_getpriority(thread_t thread, int *priority);

int pthread_getschedparam(pthread_t thread, int *policy, struct sched_param *param);
int pthread_setschedparam(pthread_t thread, int policy, struct sched_param *param);

123

Readers / Writer Locks (UI)


int rwlock_init(rwlock_t *rwlock, int type, void *arg);
int rw_rdlock(rwlock_t *rwlock);
int rw_wrlock(rwlock_t *rwlock);
int rw_tryrdlock(rwlock_t *rwlock);
int rw_trywrlock(rwlock_t *rwlock);
int rw_unlock(rwlock_t *rwlock);
int rwlock_destroy(rwlock_t *rwlock);

124

(Counting) Semaphores (UI & POSIX)


int sema_init(sema_t *sema, unsigned int sema_count, int type, void *arg);
int sema_wait(sema_t *sema);
int sema_post(sema_t *sema);
int sema_trywait(sema_t *sema);
int sema_destroy(sema_t *sema);

int sem_init(sem_t *sema, int pshared, unsigned int count);
int sem_wait(sem_t *sema);
int sem_post(sem_t *sema);
int sem_trywait(sem_t *sema);
int sem_destroy(sem_t *sema);

(POSIX semaphores are not part of pthread. Use the libposix4.so and posix4.h)

125
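A hedged sketch of the POSIX semaphore calls listed above, used as a simple signalling device between two threads. The item counts and messages are invented; on older Solaris you would link with -lposix4 as the note says, on most current systems with -lpthread (and possibly -lrt).

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

static sem_t items;                      /* counts produced-but-unconsumed items */

static void *producer(void *arg)
{
    for (int i = 0; i < 5; i++) {
        /* ... produce an item ... */
        sem_post(&items);                /* V operation: increment the count */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    for (int i = 0; i < 5; i++) {
        sem_wait(&items);                /* P operation: block until an item exists */
        printf("consumed item %d\n", i);
    }
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    sem_init(&items, 0, 0);              /* pshared = 0: shared by threads of one process */
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&items);
    return 0;
}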

Condition Variables (UI & POSIX)


int cond_init(cond_t *cond, int type, void *arg);
int cond_wait(cond_t *cond, mutex_t *mutex);
int cond_signal(cond_t *cond);
int cond_broadcast(cond_t *cond);
int cond_timedwait(cond_t *cond, mutex_t *mutex, timestruc_t *abstime);
int cond_destroy(cond_t *cond);

int pthread_cond_init(pthread_cond_t *cond, pthread_condattr_t *attr);
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_broadcast(pthread_cond_t *cond);
int pthread_cond_timedwait(pthread_cond_t *cond, pthread_mutex_t *mutex, struct timespec *abstime);
int pthread_cond_destroy(pthread_cond_t *cond);

126
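A minimal sketch of the canonical wait/signal pattern behind the calls above, hedged: the "done" flag and the two helper functions are invented for illustration.

#include <pthread.h>

static pthread_mutex_t m    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
static int             done = 0;

void wait_for_done(void)
{
    pthread_mutex_lock(&m);
    while (!done)                        /* re-test: wakeups may be spurious */
        pthread_cond_wait(&cv, &m);      /* atomically releases m while sleeping */
    pthread_mutex_unlock(&m);
}

void signal_done(void)
{
    pthread_mutex_lock(&m);
    done = 1;
    pthread_cond_signal(&cv);            /* or pthread_cond_broadcast for all waiters */
    pthread_mutex_unlock(&m);
}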

Signals (UI & POSIX)


int thr_sigsetmask(int how, const sigset_t *set, sigset_t *oset);
int thr_kill(thread_t target_thread, int sig);
int sigwait(sigset_t *set);                               /* UI */

int pthread_sigmask(int how, const sigset_t *set, sigset_t *oset);
int pthread_kill(pthread_t target_thread, int sig);
int sigwait(sigset_t *set, int *sig);                     /* POSIX */

127

Cancellation (POSIX)
int  pthread_cancel(pthread_t thread);
void pthread_cleanup_push(void (*function)(void *), void *arg);
void pthread_cleanup_pop(int execute);
int  pthread_setcancelstate(int state, int *old_state);
void pthread_testcancel(void);
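A hedged sketch tying these calls together: a worker holds a mutex, registers a cleanup handler so cancellation cannot leave the lock held, and is cancelled at a cancellation point. The sleep-based body is purely illustrative.

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler: runs if the thread is cancelled while the lock is held. */
static void unlock_mutex(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *worker(void *arg)
{
    pthread_mutex_lock(&m);
    pthread_cleanup_push(unlock_mutex, &m);   /* paired with the pop below */

    for (;;)
        sleep(1);                             /* sleep() is a cancellation point */

    pthread_cleanup_pop(1);                   /* not reached; 1 = run the handler */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);
    sleep(2);
    pthread_cancel(tid);                      /* deferred: acts at the next cancellation point */
    pthread_join(tid, NULL);
    return 0;
}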

Other APIs
thread_t thr_self(void);
void thr_yield(void);

int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void));
int pthread_equal(pthread_t t1, pthread_t t2);
int pthread_once(pthread_once_t *once_control, void (*init_routine)(void));
pthread_t pthread_self(void);
void pthread_yield(void);

(Thread IDs in Solaris recycle every 2^32 threads, or about once a month if you do create/exit as fast as possible.)

129
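pthread_once() deserves a tiny sketch: no matter how many threads call the wrapper below, the initialization routine runs exactly once. The routine's body is a placeholder.

#include <pthread.h>
#include <stdio.h>

static pthread_once_t once_control = PTHREAD_ONCE_INIT;

static void init_routine(void)
{
    printf("one-time initialization\n");
}

void module_init(void)
{
    pthread_once(&once_control, init_routine);   /* safe from any thread, any number of times */
}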

Compiling

130

Solaris Libraries
- Solaris has three libraries: libthread.so, libpthread.so, libposix4.so
- Corresponding new include files: synch.h, thread.h, pthread.h, posix4.h
- Bundled with all O/S releases
- Running an MT program requires no extra effort
- Compiling an MT program requires only a compiler (any compiler!)
- Writing an MT program requires only a compiler (but a few MT tools will come in very handy)

131

Compiling UI under Solaris


Compiling is no different than for non-MT programs
libthread is just another system library in /usr/lib. Examples:

    % cc -o sema sema.c -lthread -D_REENTRANT
    % cc -o sema sema.c -mt

All multithreaded programs should be compiled using the _REENTRANT flag


- Applies to every module in a new application.
- If omitted, the old definitions of errno, stdio, etc. would be used, which you don't want.

All MT-safe libraries should be compiled using the _REENTRANT flag, even though they may be used in a single-threaded program.

132

Compiling POSIX under Solaris


Compiling is no different than for non-MT programs
libpthread is just another system library in /usr/lib. Example:

    % cc -o sema sema.c -lpthread -lposix4 -D_POSIX_C_SOURCE=199506L

All multithreaded programs should be compiled using the _POSIX_C_SOURCE=199506L flag


- Applies to every module in a new application.
- If omitted, the old definitions of errno, stdio, etc. would be used, which you don't want.

All MT-safe libraries should be compiled using the _POSIX_C_SOURCE=199506L flag, even though they may be used in a single-threaded program.

133

Compiling mixed UI/POSIX under Solaris


If you just want to use the UI thread functions (e.g., thr_setconcurrency()):

    % cc -o sema sema.c -lthread -lpthread -lposix4 -D_REENTRANT

If you also want to use the UI semantics for fork(), alarms, timers, sigwait(), etc., add -D_POSIX_PTHREAD_SEMANTICS.

134

Summary
- Threads provide a more natural programming paradigm
- Improve efficiency on uniprocessor systems
- Allow you to take full advantage of multiprocessor hardware
- Improve throughput: simple to implement asynchronous I/O
- Leverage special features of the OS
- Many applications are already multithreaded
- MT is not a silver bullet for all programming problems
- There is already a standard for multithreading: POSIX
- Multithreading support is already available in the form of language syntax: Java
- Threads allow you to model real-world objects (e.g., in Java)

135

Java

Multithreading in Java

136

Java - An Introduction
- Java: the new programming language from Sun Microsystems
- Java allows anyone to publish a web page with Java code in it
- Java: a CPU-independent language
- Created for consumer electronics
- Java: by James Gosling, Arthur van Hoff, and others
- Java: the name that survived a patent search
- Oak: the predecessor of Java
- Java is C++ -- ++

Object Oriented Languages -A comparison


Feature                  C++     Objective C   Ada        Java
Encapsulation            Yes     Yes           Yes        Yes
Inheritance              Yes     Yes           No         Yes
Multiple Inheritance     Yes     Yes           No         No
Polymorphism             Yes     Yes           Yes        Yes
Binding (Early or Late)  Both    Both          Early      Late
Concurrency              Poor    Poor          Difficult  Yes
Garbage Collection       No      Yes           No         Yes
Genericity               Yes     No            Yes        No
Class Libraries          Yes     Yes           Limited    Yes

138

Sun defines Java as:


- Simple and Powerful
- Safe
- Object Oriented
- Robust
- Architecture Neutral and Portable
- Interpreted and High Performance
- Threaded
- Dynamic

139

Java Integrates Power of Compiled Languages and Flexibility of Interpreted Languages

140

Classes and Objects


- Classes and Objects
- Method Overloading
- Method Overriding
- Abstract Classes
- Visibility modifiers: default, public, protected, private (and private protected)

141

Threads
Java has built-in thread support for:
- Multithreading
- Synchronization
- Thread Scheduling
- Inter-Thread Communication

Key methods: currentThread, start, setPriority, yield, run, getPriority, sleep, stop, suspend, resume.

The Java garbage collector is a low-priority thread.

142

Ways of Multithreading in Java


Create a class that extends the Thread class Create a class that implements the Runnable interface

1st Method: Extending the Thread class


class MyThread extends Thread
{
    public void run()
    {
        // thread body of execution
    }
}

Creating a thread:
    MyThread thr1 = new MyThread();

Starting execution:
    thr1.start();

143

2nd method: Threads by implementing Runnable interface


class ClassName implements Runnable
{
    .....
    public void run()
    {
        // thread body of execution
    }
}

Creating an object:
    ClassName myObject = new ClassName();

Creating a Thread object:
    Thread thr1 = new Thread( myObject );

Starting execution:
    thr1.start();

144

Thread Class Members...


public class java.lang.Thread extends java.lang.Object implements java.lang.Runnable
{
    // Fields
    public final static int MAX_PRIORITY;
    public final static int MIN_PRIORITY;
    public final static int NORM_PRIORITY;

    // Constructors
    public Thread();
    public Thread(Runnable target);
    public Thread(Runnable target, String name);
    public Thread(String name);
    public Thread(ThreadGroup group, Runnable target);
    public Thread(ThreadGroup group, Runnable target, String name);
    public Thread(ThreadGroup group, String name);

    // Methods
    public static int activeCount();
    public void checkAccess();
    public int countStackFrames();
    public static Thread currentThread();
    public void destroy();
    public static void dumpStack();
    public static int enumerate(Thread tarray[]);
    public final String getName();

145

...Thread Class Members.


    public final int getPriority();   // 1 to 10; priority pre-emption at mid-point
    public final ThreadGroup getThreadGroup();
    public void interrupt();
    public static boolean interrupted();
    public final boolean isAlive();
    public final boolean isDaemon();
    public boolean isInterrupted();
    public final void join();
    public final void join(long millis);
    public final void join(long millis, int nanos);
    public final void resume();
    public void run();
    public final void setDaemon(boolean on);
    public final void setName(String name);
    public final void setPriority(int newPriority);
    public static void sleep(long millis);
    public static void sleep(long millis, int nanos);
    public void start();
    public final void stop();
    public final void stop(Throwable obj);
    public final void suspend();
    public String toString();
    public static void yield();
}

146

Manipulation of Current Thread


// CurrentThreadDemo.java class CurrentThreadDemo { public static void main(String arg[]) { Thread ct = Thread.currentThread(); ct.setName( "My Thread" ); System.out.println("Current Thread : "+ct); try { for(int i=5; i>0; i--) { System.out.println(" " + i); Thread.sleep(1000); } } catch(InterruptedException e) { System.out.println("Interrupted."); } } } Run: Current Thread : Thread[My Thread,5,main] 5 4 3 2 1

147

Creating new Thread...


// ThreadDemo.java class ThreadDemo implements Runnable { ThreadDemo() { Thread ct = Thread.currentThread(); System.out.println("Current Thread : "+ct); Thread t = new Thread(this,"Demo Thread"); t.start(); try { Thread.sleep(3000); } catch(InterruptedException e) { System.out.println("Interrupted."); } System.out.println("Exiting main thread."); }

148

...Creating new Thread.


public void run() { try { for(int i=5; i>0; i--) { System.out.println(" " + i); Thread.sleep(1000); } } catch(InterruptedException e) { System.out.println("Child interrupted."); } System.out.println("Exiting child thread."); } public static void main(String args[]) { new ThreadDemo(); } } Run: Current Thread : Thread[main,5,main] 5 4 3 Exiting main thread. 2 1 Exiting child thread.

149

Thread Priority...
// HiLoPri.java class Clicker implements Runnable { int click = 0; private Thread t; private boolean running = true; public Clicker(int p) { t = new Thread(this); t.setPriority(p); } public void run() { while(running) click++; } public void start() { t.start(); } public void stop() { running = false; } }

150

...Thread Priority
class HiLoPri { public static void main(String args[]) { Thread.currentThread().setPriority(Thread.MAX_PRIORITY); Clicker Hi = new Clicker(Thread.NORM_PRIORITY+2); Clicker Lo = new Clicker(Thread.NORM_PRIORITY-2); Lo.start(); Hi.start(); try { Thread.sleep(10000); } catch (Exception e) { } Lo.stop(); Hi.stop(); System.out.println(Lo.click + " vs. " + Hi.click); } } Run1: (on Solaris) 0 vs. 956228 Run2: (Window 95) 304300 vs. 4066666

151

The Java monitor model


[Figure: several threads contend for the key to the monitor guarding synchronised methods and blocks; the monitor solves the race-condition problem.]

Threads Synchronisation...
// Synch.java: race-condition without synchronisation class Callme { // Check synchronized and unsynchronized methods /* synchronized */ void call(String msg) { System.out.print("["+msg); try { Thread.sleep(1000); } catch(Exception e) { } System.out.println("]"); } } class Caller implements Runnable { String msg; Callme Target; public Caller(Callme t, String s) { Target = t; msg = s; new Thread(this).start(); }

153

...Threads Synchronisation.
public void run() { Target.call(msg); } } class Synch { public static void main(String args[]) { Callme Target = new Callme(); new Caller(Target, "Hello"); new Caller(Target, "Synchronized"); new Caller(Target, "World"); } } Run 1: With unsynchronized call method (race condition) [Hello[Synchronized[World] ] ] Run 2: With synchronized call method [Hello] [Synchronized] [World] Run3: With Synchronized object synchronized(Target) { Target.call(msg); } The output is the same as Run2

154

Queue (no inter-threaded communication)...


// pc.java: produce and consumer class Queue { int n; synchronized int get() { System.out.println("Got : "+n); return n; } synchronized void put(int n) { this.n = n; System.out.println("Put : "+n); } } class Producer implements Runnable { Queue Q; Producer(Queue q) { Q = q; new Thread( this, "Producer").start(); }

155

Queue (no inter-threaded communication)...


public void run() { int i = 0; while(true) Q.put(i++); } } class Consumer implements Runnable { Queue Q; Consumer(Queue q) { Q = q; new Thread( this, "Consumer").start(); } public void run() { while(true) Q.get(); } }

156

...Queue (no inter-threaded communication).


class PC { public static void main(String[] args) { Queue Q = new Queue(); new Producer(Q); new Consumer(Q); } } Run: Put: 1 Got: 1 Got: 1 Got: 1 Put: 2 Put: 3 Got: 3 ^C

157

Queue (interthread communication)...


// PCnew.java: produce-consumenr with interthread communication class Queue { int n; boolean ValueSet = false; synchronized int get() { try { if(!ValueSet) wait(); } catch(InterruptedException e) { } System.out.println("Got : "+n); ValueSet = false; notify(); return n; }

158

Queue (interthread communication)...


synchronized void put(int n) { try { if(ValueSet) wait(); } catch(InterruptedException e) { } this.n = n; System.out.println("Put : "+n); ValueSet = true; notify(); } } class Producer implements Runnable { Queue Q; Producer(Queue q) { Q = q; new Thread( this, "Producer").start(); }

159

Queue (interthread communication)...


public void run() { int i = 0; while(true) Q.put(i++); } } class Consumer implements Runnable { Queue Q; Consumer(Queue q) { Q = q; new Thread( this, "Consumer").start(); } public void run() { while(true) Q.get(); } }

160

...Queue (interthread communication).


class PCnew { public static void main(String[] args) { Queue Q = new Queue(); new Producer(Q); new Consumer(Q); } } Run: Put : 0 Got : 0 Put : 1 Got : 1 Put : 2 Got : 2 Put : 3 Got : 3 Put : 4 Got : 4 ^C

161

Deadlock...
// DeadLock.java class A { synchronized void foo(B b) { String name = Thread.currentThread().getName(); System.out.println(name + " entered A.foo"); try { Thread.sleep(1000); } catch(Exception e) { } System.out.println(name + " trying to call B.last()"); b.last(); } synchronized void last() { System.out.println("Inside A.last"); } }

162

Deadlock...
class B { synchronized void bar(A a) { String name = Thread.currentThread().getName(); System.out.println(name + " entered B.bar"); try { Thread.sleep(1000); } catch(Exception e) { } System.out.println(name + " trying to call A.last()"); a.last(); } synchronized void last() { System.out.println("Inside B.last"); } }

163

...Deadlock.
class DeadLock implements Runnable { A a = new A(); B b = new B(); DeadLock() { Thread.currentThread().setName("Main Thread"); new Thread(this).start(); a.foo(b); System.out.println("Back in the main thread."); } public void run() { Thread.currentThread().setName("Racing Thread"); b.bar(a); System.out.println("Back in the other thread"); } public static void main(String args[]) { new DeadLock(); } } Run: Main Thread entered A.foo Racing Thread entered B.bar Main Thread trying to call B.last() Racing Thread trying to call A.last() ^C

164

Grand Challenges (Is PP Practical?)


- Need OS and compiler support to use multiprocessor machines. Ideally the user would be unaware whether the problem is running on sequential or parallel hardware; there is a long way to go.
- With high-speed networks and improved microprocessor performance, multiple stand-alone machines can also be used as a parallel machine -- a popular trend (an appealing vehicle for parallel computing).
- Language standards have to evolve (portability).
- Re-orientation of thinking: Sequential -> Parallel.

165


Breaking High Performance Computing Barriers

[Figure: achievable GFLOPS rises as systems move from a single processor to shared memory, to a local parallel cluster, to a global parallel cluster.]

167

Thank You ...

168
