You are on page 1of 117

GHC(STG,Cmm,asm) illustrated

for hardware persons

exploring some mental models and implementations

Takenobu T.
Rev. 0.01.0
WIP

Any sufficiently advanced technology is

indistinguishable from magic.

Arthur C. Clarke

NOTE
- This is not an official document by the ghc development team.
- Please don't forget semantics. It's very important.
- This is written for ghc 7.8 (and ghc 7.10).

Contents
- Executable binary
- Compile steps
- Runtime System
- Development languages

- Machine layer/models
- STG-machine
- Heap objects in STG-machine
- STG-machine evaluation
- Pointer tagging
- Thunk and update
- Allocate and free heap objects
- STG - C land interface

- Thread
- Thread context switch
- Creating main and sub threads
- Thread migration
- Heap and Threads
- Threads and GC
- Bound thread
- Spark
- Mvar
- Software transactional memory
- FFI
- IO and FFI
- IO manager
- Bootstrap
- References

Executable binary

The GHC = Compiler + Runtime System (RTS)


Haskell
source (.hs)

GHC (compile)

object (.o)

RuntimeSystem
(libHsRts.o)

libraries
(GHC.Base, ...)

GHC (link)
Executable
binary

including the RTS

References : [1], [C1], [C3], [C10], [C19], [S7]

Compile steps

GHC transitions between five representations


Haskell language

GHC
compile
steps

Core language

STG language

Cmm language

Assembly language
(native or llvm)

each intermediate code can


be dumped by :
% ghc -ddump-parsed
% ghc -ddump-rn

% ghc -ddump-ds
% ghc -ddump-simpl
% ghc -ddump-prep

% ghc -ddump-stg

% ghc -ddump-cmm
% ghc -ddump-opt-cmm

% ghc -ddump-llvm
% ghc -ddump-asm

References : [1], [C3], [C4], [9], [C5], [C6], [C7], [C8[], [S7], [S8]

Runtime System

Generated binary includes the RTS

Haskell user code


executable
binary
Runtime System

software

hardware

Library

OS
(Linux, FreeBSD, Win, ...)

Physical Processor
(x86, ARM, ...)

References : [C10], [9]

Runtime System includes ...


Runtime System

Storage Manager

Byte-code interpreter

Software
Transactional Memory

User space
Scheduler

Profiling

...

References : [C10], [8], [9], [5], [17], [S13]

Development languages

The GHC is developed by some languages


compiler

runtime system

( $(TOP)/compiler/*)

( $(TOP)/rts/*)

Haskell
+
Alex (lex)
Happy (yacc)
Cmm (C--)
Assembly

C
+
Cmm
Assembly

library
( $(TOP)/libraries/*)

Haskell
+
C
References : [C2]

Machine layer/models

Machine layer

STG-machine
(Abstract machine)

HEC - Haskell Execution Context


(Capability, Virtual processor)

Physical Processor
(x86, ARM, ...)

Each Haskell code is executed in STG semantics.


References : [C14], [C6], [2], [C17], [8], [S15], [S16], [S11]

Machine layer
STG Registers

STG-machine
(Abstract machine)

Stack

Heap

Static region

Heap

Static region

BaseReg
R1
Hp
Sp
:

Register table

HEC - Haskell Execution Context


(Capability,
Virtual processor)

BaseReg
R1
Hp
:

Registers

Physical Processor
(x86, ARM, ...)

Memory

ebx
esi
edi
ebp
:

References : [C14], [C6], [2], [C17], [8], [S15], [S16], [S11]

Runtime system and HEC

user code

Haskell user code


by STG-semantics
STG-machine

runtime system

HECs

user space

Runtime System
OS Threads
(native threads, kernel threads)

OS Process

OS

OS
(Linux, FreeBSD, Win, ...)

supervisor
space
(kernel space)

hardware

Physical Processor
(x86, ARM, ...)
References : [C14], [C6], [2], [C17], [8], [S15], [S16], [S11]

many HECs
Multi HECs can be generated by compile and runtime options :
$ ghc -rtsopts -threaded
$ ./xxx +RTS -N4

HEC
(Capability,
Virtual processor)

Task
(Worker Thread)

HEC

HEC

HEC

HEC

Tasks
(abstract OS Thread)

OS Threads
OS Process

software

hardware

OS
(Linux, FreeBSD, Win, ...)

Physical Processor
(x86, ARM, ...)
References : [1], [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]

HEC (Capability) data structure


[rts/Capability.h] (ghc 7.8)
struct Capability_ {
StgFunTable f;
StgRegTable r;
register table
nat no;
Task *running_task;
rtsBool in_haskell;
nat idle;
run queue
rtsBool disabled;
StgTSO *run_queue_hd;
StgTSO *run_queue_tl;
InCall *suspended_ccalls;
bdescr **mut_lists;
bdescr **saved_mut_lists;
bdescr *pinned_object_block;
bdescr *pinned_object_blocks;
int context_switch;
int interrupt;

#if defined(THREADED_RTS)
Task *spare_workers;
nat n_spare_workers;
Mutex lock;
Task *returning_tasks_hd;
Task *returning_tasks_tl;
Message *inbox;
SparkPool *sparks;
SparkCounters spark_stats;
#endif

W_ total_allocated;
StgTVarWatchQueue *free_tvar_watch_queues;
StgInvariantCheckQueue *free_invariant_check_queues;
StgTRecChunk *free_trec_chunks;
StgTRecHeader *free_trec_headers;
nat transaction_tokens;

Each HEC (Capability) has a register table and a run queue and ...
Each HEC (Capability) is initialized at initCapabilities [rts/Capability.c]
References : [S15], [S16], [C11], [C17]

STG-machine

The STG-machine consists of three parts


STG Registers

BaseReg
R1
Hp

HpLim

Sp

SpLim

Stack
Sp

grows downwards

SpLim

Heap

HpLim

Hp

grows upwards

References : [2], [C15], [C11], [C12]

STG-machine is mapped to physical processor


logical view

STG Registers

physical view

physical register
(x86 example)

Register table

BaseReg
R1
Hp
Sp

ebx
esi
edi
ebp

ebx
esi
edi

...

References : [C15], [S1], [S2]

STG-machine is mapped to physical processor


logical view

physical view

Heap memory
Stack

TSO
Thread State Object

Heap

Stack

A stack and a TSO object are in the heap.


The stack is stored separately from the TSO for size extension and GC.
References : [C11], [C12], [S16], [S5]

TSO data structure


[includes/rts/storage/TSO.h] (ghc 7.8)
typedef struct StgTSO_ {
StgHeader
header;
struct StgTSO_*
_link;
struct StgTSO_*
global_link;
struct StgStack_
*stackobj;
link to stack object
StgWord16
what_next;
StgWord16
why_blocked;
StgWord32
flags;
StgTSOBlockInfo
block_info;
StgThreadID
id;
StgWord32
saved_errno;
StgWord32
dirty;
struct InCall_*
bound;
struct Capability_* cap;
struct StgTRecHeader_ * trec;
struct MessageThrowTo_ * blocked_exceptions;
struct StgBlockingQueue_ *bq;
StgWord32 tot_stack_size;
} *StgTSOPtr;

A TSO object is only ~17words + stack. Lightweight!

References : [S5]

Heap objects in STG-machine

Every heap object is represented uniformly

header

payload

info ptr

info table

meta data

entry code

actual machine code

Closure (header + payload) + Info Table + Entry Code

References : [C11], [S3], [S4], [S6], [2]

Heap object (closure)


logical view
header

payload

info ptr

physical view
heap memory

payload1
payload0
info ptr
info table
static memory
entry code
info table
entry code

References : [C11], [S3], [C9], [C8], [2]

Closure examples : Char, Int


'a' :: Char
header
C#
info
ptr

7 :: Int
payload

header

'a'

I#

GHC.Types.C#_static_info
layout : 0_1
type : CONSTR
bitmap :

inc %esi
jmp *0x0(%ebp)

info
ptr

payload
#7

GHC.Types.I#_static_info

info table

layout : 0_1
type : CONSTR
bitmap :

info table

entry code

inc %esi
jmp *0x0(%ebp)

entry code

References : [C11], [S3], [C9], [C8], [2], [S20]

Closure example (code)


[Example.hs]
module Example where
value1 :: Int
value1 = 7

[ghc -O -ddump-stg Example.hs]


STG

Example.value1 :: GHC.Types.Int
[GblId, Caf=NoCafRefs, Str=DmdType m, Unf=OtherCon []] =
NO_CCS GHC.Types.I#! [8];

Cmm

[ghc -O -ddump-opt-cmm Example.hs]


section "data" { __stginit_main:Example:
}
section "data" {
Example.value1_closure:
const GHC.Types.I#_static_info;
const 7;
}
section "relreadonly" { SMc_srt:
}

[ghc -O -ddump-asm

Example.hs]

.data

asm

.align 4
.align 1
.globl __stginit_main:Example
__stginit_main:Example:
.data
.align 4
.align 1
.globl Example.value1_closure
Example.value1_closure:
.long GHC.Types.I#_static_info
.long 7
.section .data
header
.align 4
I#
.align 1
SMd_srt:

payload
#7

References : [C11], [S3], [C9], [C8], [2], [S20]

Closure examples : Maybe


Just 7 :: Maybe Int
header

payload

header

Just

info
ptr

I#

Data.Maybe.Just_static_info
layout :
type : CONSTR
bitmap :

add
jmp

info table

$0x2,%esi entry code


*0x0(%ebp)

info
ptr

payload
#7

GHC.Types.I#_static_info
layout :
type : CONSTR
bitmap :

info table

inc %esi
entry code
jmp *0x0(%ebp)

References : [C11], [S3], [C9], [C8], [2], [S20]

Closure examples : List


[ 1, 2 ] :: [Int]
header
Cons
info
ptr

GHC.Types.[]_closure

payload

I#

Cons

GHC.Types.:_static_info
layout :
type : CONSTR
bitmap :

add
jmp

$0x2,%esi
*0x0(%ebp)

Nil

I#

GHC.Types.[]_static_info
layout :
type : CONSTR
bitmap :

inc %esi
jmp *0x0(%ebp)

References : [C11], [S3], [C9], [C8], [2], [S20]

Closure examples : Thunk


"thunk"
x + 1 :: Int
(free variable : x = 7)
header
x+1

header

payload
reserved

info
ptr

I#

payload
#7

info
ptr

info table
type : THUNK

info table
type : CONSTR

entry code for


x -> x + 1

entry code for


I#

References : [C11], [S3], [C9], [C8], [2], [S20]

STG-machine evaluation

STG evaluation flow

stack

Sp

continuation

stack

current
expression
R1

(1)
push a continuation code
(next code) to the stack
top

(2)
enter to R1 closure

jump
a value

Sp

continuation

R1

(3)
set a result to R1

(4)
jump (return)
to the stack top code

(5)
repeat from (1)

References : [C8], [3]

Enter to a closure
unevaluated closure
R1

(1) read R1
to get a closure address
(2) read header(info ptr)
to get a Entry code address
(3) jump to the Entry code address

(4) execute the Entry code


(5) set a result to R1

header

payload

info
ptr

layout
closure type
...
if ((Sp + -12) < SpLim)
goto c3h9;
else goto c3ha;
:

Info table

Entry code

(6) jump to the stack top address


(continuation)
References : [C11], [C9], [C8], [10], [3], [2], [12]

Pointer tagging

Pointer tagging
header
(info ptr)

pointer

payload

R1 or ...

pointer

00
01

10
11

... an unevaluated closure


... an evaluated closure;
1st constructor value or evaluated.
(for instance: "Nothing" )
... an evaluated closure; 2nd constructor value.
(for instance: "Just xx")
... an evaluated closure; 3rd constructor value.
* 32bit machine case

fast judgment!
check only pointer's lower bits without evaluating the closure.
References : [4], [2], [C16]

Thunk and update

Thunk and update


"thunk"

x + 1 :: Int (free variable : x = 7)


payload

header
Thunk
info
ptr

type : THUNK

(empty)

I#

#7

evaluate ( x + 1 )

x+1

Thunk
info
ptr

(empty)

type : THUNK

x+1

I#

#7

I#

#8

update
Ind (indirect)

info
ptr

type : THUNK
x+1

lock free
I#

#7

I#

#8

type : IND

GC (eliminate Indirect)

References : [3], [2], [C8]

Allocate and free heap objects

Allocate heap objects


heap memory (nursery)

HpLim

Hp

allocate
(without malloc)

HpLim

Hp

can't allocate
because full

HpLim

Hp

if (Hp > HpLim ) goto ...


call stg_gc_...

References : [C11], [C13], [8], [9], [5], [15], [12], [13], [19], [S25]

free and collect heap objects

HpLim

copying
collection
(minor GC)

from space

if (Hp > HpLim ) goto ...


call stg_gc_...

to space

References : [C11], [C13], [8], [9], [5], [15], [12], [13], [19], [S25]

STG - C land interface

STG (Haskell) land - C land interface

User code
R1
STG land
(Haskell land)
C land

RtsAPI
StgRun

StgReturn

result
function f
BaseReg

Runtime System
(Scheduler)

References : [S18], [S17], [S19], [S21]

Thread

Thread layer (single core)


Haskell Threads
Haskell
Threads

HEC
(Capability,
Virtual processor)

exclusive execution

...

HEC

user space

OS Thread
OS Process
OS

software

(Linux, FreeBSD, Win, ...)

hardware

Physical Processor

supervisor
space

(x86, ARM, ...)

References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]

Thread layer (multi core)


Haskell Threads

Haskell Threads

Haskell
Threads

HEC
(Capability,
Virtual processor)

...

...

HEC

HEC

user space
OS Thread

OS Thread
OS Process

software
hardware

supervisor
space

OS
(Linux, FreeBSD, Win, ...)

Physical Processor

Physical Processor

(x86, ARM, ...)

(x86, ARM, ...)

*Threaded option case (ghc -threaded)


References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]

Thread context switch

Threads and context switch


logical view

Thread
states

Thread #0

Thread #1

Registers

Registers

Registers

Stack

Stack

Stack

load
state
STG-machine

Registers
Stack

Thread #2

...

interleaved
exclusive execution

save
state

execution and
pre-empted via the context switch

heap

References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]

Threads and TSOs


logical view

Thread
states

Thread #0

Thread #1

Registers

Registers

Stack
load
state

STG-machine

Stack
save
state

physical view
Thread #2

heap memory

Registers

Stack

...

TSO #0
(Thread State
Object)

TSO #1
(Thread State
Object)

Registers
Stack

TSO #2
(Thread State
Object)

heap

References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]

Scheduling by run queue


STG-machine

Registers
Stack

STG land
(Haskell land)
StgRun

C land

StgReturn

Scheduler

run queue

popRunQueue

appendToRunQueue

round robin

...
heap

TSO

TSO

...

References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]

Context switch flow


Haskell thread

heap
heap
HpLim
-> 0

(2) HpLim -> 0

RTS
Interrupt

(8) context switch

(3) heap size check


if (Hp > HpLim)

TSO

(4) call GC

TSO

GC

Scheduler

(5) HpLim check


HpLim = 0 ?

(7) scheduling

(6) goto schedule


(1) interrupt
platform OS timer

context switch at safe points


References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24]

Context switch flow (code)


stg_gc_noregs
if (HpLim == 0) {
jump stg_returnToSched [R1];

STG land
(Haskell land)

C land

cap->r.rHpLim = NULL;
stopCapability

stg_returnToSched
W_ r1;
r1 = R1; // foreign calls may clobber R1
SAVE_THREAD_STATE();
foreign "C" threadPaused(MyCapability()
"ptr", CurrentTSO);
R1 = r1;
jump StgReturn [R1];

schedule

contextSwitchCapability
contextSwitchAllCapabilities
handle_tick

next
handle_tick ..

CreateTimerQueue

initTicker
initTimer

startTimer

hs_init_ghc
real_main

OS
*Windows case
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24]

Creating main and sub threads

Create a main thread


Runtime
System

Runtime system bootstrap code [rts/RtsAPI.c]


rts_evalLazyIO
createIOThread
createThread ... (1), (2), (3)
pushClosure ... (4)
scheduleWaitThread
appendToRunQueue ... (5)

scheduler
run queue

(5)
(1)
TSO

heap memory

(2)
stack

(3)

*stackobj

closure

(4)

static memory

ZCMain_main_closure

header
payload

info table
entry code

References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24]

Create a sub thread using forkIO


Haskell Threads
forkIO
stg_forkzh
ccall createIOThread ... (1), (2), (3), (4)
ccall scheduleThread ... (5)

User code
Runtime System

append
(5)

scheduler
run queue
(1)
TSO
heap memory

forked closure

(2)

*stackobj

(3)

stack

(4)

header
payload

closure

info table
static memory

entry code
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24]

Thread migration

Threads are migrated to idle HECs

Haskell
Threads

Idle HEC
run queue

Idle HEC

run queue

run queue

Idle HEC
run queue

empty

empty

empty

HEC

HEC

HEC

HEC

OS Thread

OS Thread

OS Thread

OS Thread

Physical Processor

Physical Processor

Work pushing

OS Process
hardware

Physical Processor

Physical Processor

References : [5], [8], [9], [14], [C17], [C18], [S17], [S16], [S23], [S24]

Heap and Threads

Threads shared a heap


Haskell Threads

Registers

Registers

Stack

Stack

...

Registers

Registers

Stack

Stack

...

heap memory (shared)

static memory (shared)

hardware

HEC

HEC

Physical Processor

Physical Processor

References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S17], [S16], [S25]

Local allocation area (nursery)


Haskell Threads

heap memory
(shared)

Registers

Registers

Stack

Stack

Registers

Stack

Stack

nursery

...

nursery
generation N

static memory
(shared)

hardware

...

Registers

static memory
HEC

HEC

Physical Processor

Physical Processor

fast access using nursery for each processors


References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S17], [S16], [S25]

Threads and GC

GC, nursery, generation, aging, promotion


Haskell
Thread

STG land
(Haskell land)
Runtime System

allocate

generation 0, step 0

Hp

nursery
HEC

nursery
aging

HEC

nursery
HEC

nursery
HEC

generation 0, step 1
promotion

generation 1

heap memory
References : [8], [9], [15], [C13], [C11], [S25]

Threads and minor GC


sequential GC for young generation (minor GC)
stop-the-world GC
Haskell
Threads

GC thread

GC thread

GC thread

GC thread

nursery

nursery

nursery

nursery

generation 0, step 1

heap

generation 1

HEC

HEC

HEC

HEC

Physical Processor

Physical Processor

Physical Processor

Physical Processor

References : [8], [9], [15], [C13], [C11], [S25]

Threads and major GC


parallel GC for oldest generation (major GC)
stop-the-world GC
Haskell
Threads

GC thread

GC thread

GC thread

GC thread

nursery

nursery

nursery

nursery

generation 0, step 1

heap

generation 1

HEC

HEC

HEC

HEC

Physical Processor

Physical Processor

Physical Processor

Physical Processor

References : [8], [9], [15], [C13], [C11], [S25]

GC discover live objects from the root


Runtime System
scheduler
run queue
root
heap memory
stack

closure

TSO
TSO

closure

stack
TSO

closure

reachable from root

GC aging or promote

unreachable
(garbage)

free
References : [8], [9], [15], [C13], [C11], [S25]

Bound thread

A bound thread has a fixed associated OS Thread


forkOS
Bound
Thread

Haskell
Threads

HEC
(Capability,
Virtual processor)

fixed association
for safe foreign calls

HEC

OS Thread

OS Thread
OS Process

Foreign calls from a bound thread are all made by the same OS thread.
A bound thread is created using forkOS.

The main thread is bound thread.

References : [6], [5], [8], [9], [14], [C17], [19], [S17], [S16], [S23], [S22]

forkIO, forkOn, forkOS

forkIO

forkOS

forkOn

Bound
Threads

Haskell
Threads
affinity

bound

HEC

HEC

HEC

OS Thread

OS Thread

OS Thread

create a haskell unbound


thread

create a haskell unbound


thread
on the specified HEC

OS Thread

create a haskell bound thread


and an OS thread

References : [6], [5], [8], [9], [14], [C17], [19], [S17], [S16], [S23], [S22]

Spark

Spark layer
serial execution on each Spark Threads

Sparks
Haskell
Thread
HEC
(Capability,
Virtual processor)

...
Spark Thread
HEC

user space

OS Thread
OS Process
OS

software

(Linux, FreeBSD, Win, ...)

hardware

Physical Processor

supervisor
space

(x86, ARM, ...)

Spark Threads are generated on idle HECs.


References : [C17], [19], [S17], [S26], [S27], [S33], [S12]

Sparks and Spark pool


logical view
Spark

Spark

Spark

Spark

Spark Thread

Spark Thread

Spark Thread

Spark Thread

HEC

HEC

HEC

HEC

Physical Processor

Physical Processor

Physical Processor

Physical Processor

Spark pool

rpar
Spark
(Thunk)
References : [C17], [19], [S17], [S26], [S27], [S33], [S12]

Spark pool and work stealing


physical view
Spark

Spark

Spark Thread

Spark

Spark Thread

Spark

Spark Thread

Spark Thread

Spark pool

Spark pool

Work stealing

HEC

Spark pool

Spark pool

References : [C17], [19], [S17], [S26], [S27], [S33], [S12]

Sparks and closures


Spark Thread
runSparks

STG land
(Haskell land)

getSpark

C land
Spark pool
(WSDeque)

HEC

rpar
push

...
heap

closure
(thunk)

closure
(thunk)

...

(not TSO objects, but closures. therefore very lightweight)


References : [C17], [19], [S17], [S26], [S27], [S33], [S12]

MVar

MVar

Haskell Thread #0

Haskell Thread #1

putMVar

takeMVar
empty?
or
full?
MVar

References : [16], [18], [19], [S31], [S12]

MVar and blocking


Haskell Thread

Haskell Thread

putMVar
BLOCKED
if full

takeMVar

full

empty

MVar

MVar

BLOCKED
if empty

References : [16], [18], [19], [S31], [S12]

MVar example
Haskell Thread #0

Haskell Thread #1

(3)
takeMVar

(2)
(1)

putMVar

MVar

timer context switch

time
Thread #0

Running

Blocked

Non Blocked

takeMVar
(1)
MVar

Running

(3) wakeup and takeMVar (atomic)

empty

full
putMVar

empty

(2)

Running

Thread #1
* single core case

References : [16], [18], [19], [S31], [S12]

MVar object view


User view

logical MVar object

heap

MVar
empty?
or
full?

physical MVar object

StgReturn
chain
of
StgMVarTSOQueue
head
tail

head
tail
value
closure

value
StgMVarTSOQueue

StgMVarTSOQueue

TSO

TSO

References : [16], [18], [19], [S31], [S12]

newEmptyMVar
Haskell Threads
newEmptyMVar
newMVar#
(1) call the Runtime premitive
Runtime System

stg_newMVarzh
ALLOC_PRIM_
SET_HDR
StgMVar_head
StgMVar_tail
StgMVar_value

(2) create a MVar object in the heap


MVar object

heap

stg_END_TSO_QUEUE_closure

head
tail
value
(3) link each fields

References : [16], [18], [19], [S31], [S12]

takeMVar (empty case)


Haskell Threads
takeMVar
takeMVar#

Runtime System

stg_takeMVarzh
create StgMVarTSOQueue (1)
append
(2)
StgReturn (3)

(3) return to the scheduler

(1) create
chain of
StgMVarTSOQueue

head

StgMVarTSOQueue

(2) append

tail
value
MVar object
References : [16], [18], [19], [S31], [S12]

takeMVar (full case)


Haskell Threads
takeMVar
takeMVar#

Runtime System

stg_takeMVarzh
(1) get value
(2) set empty
(3) remove head
(4) tryWakeupThread

(3) remove head


head

scheduler
run queue

fairness round robin


append

tail
value

(1) get value


(2) set empty

(4) wakeup the blocked thread

MVar object
References : [16], [18], [19], [S31], [S12]

Software transactional memory

Create a atomic block using atomically


atomically :: STM a -> IO a
atomically
STM a
writeTVar
writeTVar
readTVar
readTVar

TVar

readTVar

TVar

transactional
variable

Create and evaluate a composable atomic block


Atomic block = All or Nothing
References : [17], [19], [20], [C18], [S12], [S28]

Rollback and blocking control using retry


retry :: STM a

STM a
writeTVar
retry :: STM a

readTVar

TVar

TVar

Discard, blocking and try again

References : [17], [19], [20], [C18], [S12], [S28]

Compose OR case using orElse


orElse :: STM a -> STM a -> STM a

STM a

STM a
A

orElse ::
STM a

result

STM a -> STM a -> STM a

if retry A

or
B

if not retry A

A or B or Nothing
References : [17], [19], [20], [C18], [S12], [S28]

STM, TVar example (normal case)


STM a
writeTVar

TVar

TVar

atomic block (critical section)

time
Thread #0

writeTVar

writeTVar writeTVar
commit

TVar #A

old value

new value

TVar #B

old value

new value
atomic update

References : [17], [19], [20], [C18], [S12], [S28]

STM, TVar example (conflict case)

atomic block

rollback and
try again

atomic block

time
Thread #0
commit
old value

TVar #A
TVar #B

commit

old value

new value
other value

new value

commit
other thread

different

References : [17], [19], [20], [C18], [S12], [S28]

retry example

atomic block

retry

atomic block

time
blocked

Thread #0

wake up
TVar #A
TVar #B

commit

old value

new value

old value

changed value

new value

commit
other thread

References : [17], [19], [20], [C18], [S12], [S28]

STM, TVar data structure


retry
atomically

newTVar
StgTVar

StgTVar

writeTVar
readTVar
StgTVarWatchQueue

CurrentTSO

StgTVarWatchQueue

TRecEntry

TRecEntry

StgTRecHeader

StgTRecChunk
atomically

StgInvariantCheckQueue

References : [17], [19], [20], [C18], [S12], [S28]

newTVar, writeTVar, readTVar


newTVar

writeTVar

readTVar
or

Runtime System
TRecEntry
(transaction
Record)

TRecEntry

TRecEntry

invariant check
when commit
TVar
StgTVar

heap

References : [17], [19], [20], [C18], [S12], [S28]

block by retry, wake up by commit

retry :: STM a
Runtime System

commit by other thread

TSO
StgTVar

(4) commit

(1) retry

(3)yield
scheduler
run queue

WatchQueue

TVar

(4) commit

(2) append

(5)append

wakeup

References : [17], [19], [20], [C18], [S12], [S28]

FFI

FFI (Foreign Function Interface)


Haskell thread

ccall

STG land
(Haskell land)

foreign
out-call

StgReturn

Stg interface
C land

FFI

foreign
in-call

StgRun

Scheduler

FFI management

user space

Runtime system

Foreign
C code
OS API (system call)

OS

supervisor
space

References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]

FFI and OS Threads


Haskell
Threads

(4)

Haskell
Threads

non-blocked

(2)
(1)

HEC

HEC

(5)

Foreign
C code
OS Thread
Physical Processor

(1) a safe foreign call (FFI)

HEC

Foreign
C code

(3)

OS Thread

OS Thread

Physical Processor

(2) move the HEC to other OS thread


(3) spawn or draw an OS thread

(4) move Haskell threads


(5) call the foreign C code
References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]

A safe foreign call (code)


Haskell Threads
ccall suspendThread
ccall FOREIGN_C_CODE
ccall resumeThread

releaseCapability_
giveCapabilityToTask
startWorkerTask
createOSThread

(3)

non-blocked
Haskell
Threads

(3)

waitForReturnCapability

(1)
(2)

(4)

(1)
HEC

HEC

(4)

Foreign
C code

(2)

OS Thread

OS Thread

Physical Processor

References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]

a safe and an unsafe foreign call


a safe foreign call

an unsafe foreign call


non-blocked

Haskell
Threads

blocking
safe
foreign call

blocked
Haskell
Threads

HEC

HEC

Foreign
C code

blocking
unsafe
foreign call

HEC
Foreign
C code

OS Thread

OS Thread

Physical Processor

OS Thread
Physical Processor

faster,
but blocking to the other Haskell threads

References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]

Safe/unsafe foreign call and bound/unbound thread


a safe foreign call
non-blocked

unbound
thread

an unsafe foreign call


non-blocked

non-blocked

ccall

an unbound
thread

HEC #0

Processor #0

non-blocked

blocked

ccall
HEC #1

HEC #1

HEC #0

Foreign
C code
OS Thread

unbound
thread

OS Thread

Foreign
C code
OS Thread

Processor #1

non-blocked

HEC #1

bound
thread

OS Thread

OS Thread

Processor #0

Processor #1

non-blocked

blocked

ccall

a bound
thread

HEC #0

HEC #1

HEC #1

ccall
HEC #0

HEC #1

Foreign
C code
OS Thread
Processor #0

OS Thread

OS Thread

Processor #1

bound
thread

HEC #1
Foreign
C code

OS Thread
Processor #0

OS Thread

OS Thread

Processor #1

References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]

IO and FFI

IO

Haskell Thread

getLine (IO)

IO String

References : [6], [11]

IO example: getLine
getLine

IO String

standard IO lib
STG land
(Haskell land)

c_read

FFI (Foreign Function Interface)

C land

user
space

C IO lib (libio.a)
read system call
OS API (system call)
supervisor
space

OS
Device driver
Hardware
UART, serial, ...

References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]

IO example: getLine (code)


User code

getLine
hGetLine
hGetLineBuffered
hGetLineBufferedLoop
maybeFillReadBuffer
getSomeCharacters
readTextDevice

Library

Buffered.fillReadBuffer
readBuf
readBuf
RawIO.read

STG land
(Haskell land)

fdRead
readRawBufferPtr
c_read

switch safe/unsafe,
non-threaded/ioManager

FFI (Foreign Function Interface)

C land

read

Runtime
System

OS API (system call)

OS
References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]

IO manager

IO manager (single core)

Haskell Threads
IO manager
dispatcher thread
HEC

HEC

OS Thread

OS Thread

affinity

user space

OS Process
software

hardware

OS
(Linux, FreeBSD, Win, ...)

supervisor
space

Physical Processor
(x86, ARM, ...)

*Threaded option case (ghc -threaded)


References : [7], [5], [8]

IO manager (multi core)

Haskell
Threads

Haskell
Threads
IO manager
dispatcher
thread

IO manager
dispatcher
thread

HEC

HEC

HEC

HEC

OS Thread

OS Thread

OS Thread

OS Thread

user space

OS Process
OS

software

hardware

supervisor
space

(Linux, FreeBSD, Win, ...)

Physical Processor

Physical Processor

(x86, ARM, ...)

(x86, ARM, ...)

*Threaded option case (ghc -threaded)


References : [7], [5], [8]

IO manager
Haskell Threads

IO manager
event table

blocking IO

registerFD

event
loop
request and
set callback (MVar)
IO access
(epoll, kqueue)

takeMVar
(wait and wake up)

putMVar

HEC

HEC

OS Thread

OS Thread
OS Process

*Threaded option case (ghc -threaded)

system call
References : [7], [5], [8], [S29], [S30], [S32], [S37], [S35], [S3]

Bootstrap

Bootstrap sequence

ZCMain_main_closure

User
code

stg_ap_v_info
stg_enter_info
loadThreadState
emitLoadThreadState

STG land
(Haskell land)

stg_returnToStackTop
StgRun
schedule

C land

scheduleWaitThread

main

createIOThread
createThread, pushClosure
rts_evalLazyIO
hs_init_ghc
initScheduler, initStorage, initTimer, ioManagerStart, ...
real_main
hs_main

Runtime
System
(lib/libHSrts.a)

mainCRTStartup
(*Windows case)

OS
References : [S7], [S13], [S14], [S17], [S18], [S19], [S9], [S10], [S21], [S41]

Exit sequence
User
code

STG land
(Haskell land)

stg_stop_thread_info
StgReturn

C land

Runtime
System
(lib/libHSrts.a)

schedule
shutdownHaskellAndExit
stg_exit
exit

OS
References : [S19], [S18], [S17]

Initializing
STG land
(Haskell land)

startIOManagerThread
startIOManagerThreads
ensureIOManagerIsRunning

create IO Manager Threads (+RTS -N)

ioManagerStart

C land
initTimer, startTimer

create Timer

initGcThreads
allocNurseries
storageAddCapabilities
initGeneration
initStorage

initializing GC Threads (+RTS -N)


create nurseries (+RTS -N)

createOSThread
newTask
startWorkerTask
startWorkerTasks
initScheduler

create OS Threads (+RTS -N)


create Tasks (+RTS -N)

initCapabilities

create capabilities (+RTS -N)

create GC generations (+RTS -GN)

Runtime
System

hs_init_ghc
real_main
hs_main
main

OS

References : [1], [S7], [S13], [S14], [S17], [S15], [S16], [S24], [S21], [S34]

Create each layers


User code
Sparks
User
code

Runtime
system

Haskell threads

OS Threads

software

hardware

newSpark

forkIO
forkOn
forkOS

stg_forkzh, createThread
stg_forkOnzh, createThread
forkOS_createThread

OS Process

OS API

initCapaility

(Capability, Virtual processor)

(Worker thread) abstract OS Thread

OS

rpar

HECs
Tasks

Runtime system

forkOS

newTask

forkOS

createOSThread

pthread_create
fork

OS
(Linux, FreeBSD, Win, ...)

Physical Processor
(x86, ARM, ...)

References : [1], [5], [8], [9], [C11], [C17], [S12], [S26], [S22], [S15], [S23]

References

References
[1]

The Glorious Glasgow Haskell Compilation System User's Guide


https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/index.html

[2]

Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine Version 2.5
http://research.microsoft.com/en-us/um/people/simonpj/Papers/spineless-tagless-gmachine.ps.gz

[3]

Making a Fast Curry Push/Enter vs Eval/Apply for Higher-order Languages


http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/

[4]

Faster Laziness Using Dynamic Pointer Tagging


http://research.microsoft.com/en-us/um/people/simonpj/papers/ptr-tag/ptr-tagging.pdf

[5]

Runtime Support for Multicore Haskell


http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/multicore-ghc.pdf

[6]

Extending the Haskell Foreign Function Interface with Concurrency


http://community.haskell.org/~simonmar/papers/conc-ffi.pdf

[7]

Mio: A High-Performance Multicore IO Manager for GHC


http://haskell.cs.yale.edu/wp-content/uploads/2013/08/hask035-voellmy.pdf

[8]

The GHC Runtime System


web.mit.edu/~ezyang/Public/jfp-ghc-rts.pdf

[9]

The GHC Runtime System


http://www.scs.stanford.edu/14sp-cs240h/slides/ghc-rts.pdf

[10]

Evaluation on the Haskell Heap


http://blog.ezyang.com/2011/04/evaluation-on-the-haskell-heap/

References
[11]

IO evaluates the Haskell Heap


http://blog.ezyang.com/2011/04/io-evaluates-the-haskell-heap/

[12]

Understanding the Stack


http://www.well-typed.com/blog/94/

[13]

Understanding the RealWorld


http://www.well-typed.com/blog/95/

[14]

The GHC scheduler


http://blog.ezyang.com/2013/01/the-ghc-scheduler/

[15]

GHCs Garbage Collector


http://www.mm-net.org.uk/workshop190404/GHC's_Garbage_Collector.ppt

[16]

Concurrent Haskell
http://www.haskell.org/ghc/docs/papers/concurrent-haskell.ps.gz

[17]

Beautiful Concurrency
https://www.fpcomplete.com/school/advanced-haskell/beautiful-concurrency

[18]

Anatomy of an MVar operation


http://blog.ezyang.com/2013/05/anatomy-of-an-mvar-operation/

[19]

Parallel and Concurrent Programming in Haskell


http://community.haskell.org/~simonmar/pcph/

[20] Real World Haskell


http://book.realworldhaskell.org/

References
The GHC Commentary
[C1]
[C2]
[C3]
[C4]
[C5]
[C6]
[C7]
[C8]
[C9]
[C10]
[C11]
[C12]
[C13]
[C14]
[C15]
[C16]
[C17]
[C18]
[C19]

https://ghc.haskell.org/trac/ghc/wiki/Commentary
https://ghc.haskell.org/trac/ghc/wiki/Commentary/SourceTree
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/HscMain
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/CoreSynType
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StgSynType
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/CmmType
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/GeneratedCode
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/SymbolNames
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/Stack
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/GC
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/Registers
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Scheduler
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/STM
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Libraries

References
Source code
[S1] includes/stg/Regs.h
[S2] includes/stg/MachRegs.h
[S3] includes/rts/storage/ClosureTypes.h
[S4] includes/rts/storage/Closures.h
[S5] includes/rts/storage/TSO.h
[S6] includes/rts/storage/InfoTables.h
[S7] compiler/main/DriverPipeline.hs
[S8] compiler/main/HscMain.hs
[S9] compiler/cmm/CmmParse.y.source
[S10] compiler/codeGen/StgCmmForeign.hs
[S11] compiler/codeGen/Stg*.hs
[S12] rts/PrimOps.cmm
[S13] rts/RtsMain.c
[S14] rts/RtsAPI.c
[S15] rts/Capability.h
[S16] rts/Capability.c
[S17] rts/Schedule.c
[S18] rts/StgCRun.c
[S19] rts/StgStartup.cmm
[S20] rts/StgMiscClosures.cmm
[S21] rts/HeapStackCheck.cmm
[S22] rts/Threads.c
[S23] rts/Task.c
[S24] rts/Timer.c
[S25] rts/sm/GC.c

[S26] rts/Sparks.c
[S27] rts/WSDeque.c
[S28] rts/STM.h
[S29] rts/posix/Signals.c
[S30] rts/win32/ThrIOManager.c
[S31] libraries/base/GHC/MVar.hs
[S32] libraries/base/GHC/Conc/IO.hs
[S33] libraries/base/GHC/Conc/Sync.lhs
[S34] libraries/base/GHC/Event/Manager.hs
[S35] libraries/base/GHC/Event/Thread.hs
[S36] libraries/base/GHC/IO/BufferedIO.hs
[S37] libraries/base/GHC/IO/FD.hs
[S38] libraries/base/GHC/IO/Handle/Text.hs
[S39] libraries/base/System/IO.hs
[S40] libraries/base/System/Posix/Internals.hs
[S41] AutoApply.o (utils/genapply/GenApply.hs)

Connect the algorithm and transistor

You might also like