MMU Talk 20121031

Java for High Performance Computing
High Performance Cloud Computing

Performance Evaluation
Conclusions
Java for High Performance Cloud Computing
Guillermo Lpez Taboada
Computer Architecture Group

University of A Corua (Spain)
taboada@udc.es
October 31st, 2012, Future Networks and Distributed SystemS (FUNDS),

Manchester Metropolitan University, Manchester (UK)
Guillermo Lpez Taboada Java for High Performance Cloud Computing

Conclusions
Outline
1 Java for High Performance Computing

Java Shared Memory Programming
Java Message-Passing
Java GPGPU
Development of Efficient HPC Benchmarks
2 High Performance Cloud Computing
AWS IaaS for HPC and Big Data
3 Performance Evaluation
Evaluation of Java HPC
Evaluation of HPC in the Cloud
Case study: ProtTest-HPC

Java for High Performance Computing Java Shared Memory Programming
High Performance Cloud Computing Java Message-Passing
Performance Evaluation Java GPGPU
Conclusions Development of Efficient HPC Benchmarks
Java is an Alternative for HPC in the Multi-core Era
Language popularity:
(% skilled developers)
#1 C (19.8%)
#2 Java (17.2%)
#3 Objective-C
(9.48%)
#4 C++ (9.3%)
#19 Matlab (0.59%)
#27 Fortran (0.35%)
C and Java are the

most popular
languages. Fortran,
popular language in
HPC, is the 27#
(0.42%)

Java is an Alternative for HPC in the Multi-core Era
Many productive parallel/distributed

Interesting features:
programming libs:
Built-in networking
Java shared memory programming (high
Built-in multi-threading level facilities: Concurrency framework)
Portable, platform Java Sockets
independent
Java RMI
Object Oriented
Message-Passing in Java (MPJ) libraries
Main training language
Apache Hadoop

Java Adoption in HPC
HPC developers and

users usually want to use
Java in their projects. Pros and Cons:
Java code is no longer high programming productivity.
slow (Just-In-Time
compilation)! but they are highly concerned
about performance.
But still performance
penalties in Java
communications:

HPC developers and

users usually want to use
Java in their projects. JIT Performance:
Java code is no longer Like native performance.

slow (Just-In-Time Java can even outperform native
compilation)! languages (C, Fortran) thanks to
But still performance the dynamic compilation!!
penalties in Java
communications:

HPC developers and

users usually want to use High Java Communications Overhead:
Java in their projects. Poor high-speed networks support.
Java code is no longer The data copies between the Java
slow (Just-In-Time heap and native code through JNI.
compilation)! Costly data serialization.
But still performance The use of communication protocols
penalties in Java unsuitable for HPC.
communications:

Emerging Interest in Java for HPC

Current State of Java for HPC

Java for HPC
Current options in Java for HPC:

Java Shared Memory Programming (popular)
Java RMI (poor performance)
Java Sockets (low level programming)
Message-Passing in Java (MPJ) (extended, easy and reasonably good
performance)
Apache Hadoop (for High Throughput Computing)

Java for HPC
Java Shared Memory Programming:

Java Threads
Concurrency Framework (ThreadPools, Tasks ...)
Parallel Java (PJ)
Java OpenMP (JOMP and JaMP)

Listing 1: Java Threads

class BasicThread extends Thread {
/ / T h i s method i s c a l l e d when t h e t h r e a d runs
public void run ( ) {
f o r ( i n t i =0; i <1000; i ++)
increaseCount ( ) ;
}
}
class Test {
i n t c o u n t e r = 0L ;
increaseCount ( ) {
c o u n t e r ++;
}
public s t a t i c void main ( S t r i n g argv [ ] ) {

/ / Create and s t a r t t h e t h r e a d
Thread t 1 = new BasicThread ( ) ;
Thread t 2 = new BasicThread ( ) ;
t1 . s t a r t ( ) ;
t2 . s t a r t ( ) ;
System . o u t . p r i n t l n ( " Counter= " + c o u n t e r ) ;
}
}

Listing 2: Java Concurrency Framework highlights

class Test {
/ / create the thread pool

ThreadPool t h r e a d P o o l = new ThreadPool ( numThreads ) ;
/ / run example t a s k s ( t a s k s implement Runnable )

f o r ( i n t i = 0 ; i < numTasks ; i ++) {
t h r e a d P o o l . runTask ( createTask ( i ) ) ;
}
/ / c l o s e t h e p o o l and w a i t f o r a l l t a s k s t o f i n i s h .
threadPool . j o i n ( ) ;
}
/ / Another i n t e r e s t i n g c l a s s e s from concurrency framework :

/ / CyclicBarrier
/ / ConcurrentHashMap
/ / PriorityBlockingQueue
/ / Executors . . .

JOMP is the Java OpenMP binding
Listing 3: JOMP example

i n t myid ;
/ / omp p a r a l l e l p r i v a t e ( myid )
{
myid = OMP. getThreadNum ( ) ;
System . o u t . p r i n t l n ( H e l l o from + myid ) ;
}
/ / omp p a r a l l e l f o r
f o r ( i =1; i <n ; i ++) {
b [ i ] = ( a [ i ] + a [ i 1]) 0 . 5 ;
}
}

MPJ is the Java binding of MPI
Listing 4: MPJ example

import mpi . ;
public class H e l l o {

MPI . I n i t ( args ) ;
i n t rank = MPI .COMM_WORLD. Rank ( ) ;
i n t msg_tag = 1 3 ;
i f ( rank == 0 ) {
i n t peer_process = 1 ;
S t r i n g [ ] msg = new S t r i n g [ 1 ] ;
msg [ 0 ] = new S t r i n g ( " H e l l o " ) ;
MPI .COMM_WORLD. Send ( msg , 0 , 1 , MPI . OBJECT, peer_process , msg_tag ) ;
} else i f ( rank == 1 ) {
i n t peer_process = 0 ;
S t r i n g [ ] msg = new S t r i n g [ 1 ] ;
MPI .COMM_WORLD. Recv ( msg , 0 , 1 , MPI . OBJECT, peer_process , msg_tag ) ;
System . o u t . p r i n t l n ( msg [ 0 ] ) ;
}
MPI . F i n a l i z e ( ) ;
}
}

Socket High-speed API
Pure Java Impl.

impl. network
support
mpiJava 1.2
Other APIs
InfiniBand
JGF MPJ
Java NIO
Myrinet
Java IO
SCI
MPJava X X X
Jcluster X X X
Parallel Java X X X
mpiJava X X X X
P2P-MPI X X X X
MPJ Express X X X X
MPJ/Ibis X X X X
JMPI X X X
F-MPJ X X X X X X

FastMPJ
MPJ implementation developed at the Computer Architecture

Group of University of A Corunna.
Features of FastMPJ include:
High performance intra-node communication
Efficient RDMA transfers over InfiniBand and RoCE
Fully portable, as Java
Scalability up to thousands of cores
Highly productive development and maintenance
Ideal for multicore servers, cluster and cloud computing

FastMPJ Supported Hardware

FastMPJ runs on InfiniBand through IB Verbs, can rely on
TCP/IP and has shared memory support through Java threads:
MPJ Applications
MPJ API (mpiJava 1.2)
FastMPJ Library
MPJ Collective Primitives
MPJ PointtoPoint Primitives
The xxdev layer

mxdev psmdev ibvdev niodev/iodev smdev
Java Native Interface (JNI) Java Sockets Java Threads
MX/OpenMX InfiniPath PSM IBV API TCP/IP

Myrinet/Ethernet InfiniBand Ethernet Shared Memory

Java GPGPU
Table: Available solutions for GPGPU computing in Java

Java bindings User-friendly
CUDA JCUDA Java-GPU
jCuda Rootbeer
OpenCL JOCL Aparapi
JogAmps JOCL

Java GPGPU
Table: Description of the GPU-based testbed

CPU 1 Intel(R) Xeon hexacore X5650 @ 2.67GHz
CPU performance 64.08 GFLOPS DP (10.68 GFLOPS DP per core)
GPU 1 NVIDIA Tesla Fermi M2050
GPU performance 515 GFLOPS DP
Memory 12 GB DDR3 (1333 MHz)
OS Debian GNU/Linux, kernel 3.2.0-3
CUDA version 4.2 SDK Toolkit
JDK version OpenJDK 1.6.0_24

Java GPGPU
Matrix Multiplication performance (Single precision) Matrix Multiplication performance (Double precision)
1000 500
CUDA CUDA
900 jCuda 450 jCuda
800 Aparapi 400 Aparapi
Java Java
700 350
600 300
GFLOPS
GFLOPS
500 250
400 200
300 150
200 100
100 50
0 0
2048x2048 4096x4096 8192x8192 2048x2048 4096x4096 8192x8192
Problem size Problem size
Figure: Matrix multiplication kernel performance (SHOC GEMM)

Java GPGPU
Stencil2D performance (Single precision) Stencil2D performance (Double precision)

512 1024
Java Java
128 jCuda 256 jCuda
CUDA CUDA
Runtime (seconds)
Runtime (seconds)
64 128
32 64
16 32
8 16
4 8
2 4
1 2
0.5 1
2048x2048 4096x4096 8192x8192 2048x2048 4096x4096 8192x8192
Figure: Stencil 2D kernel performance (SHOC Stencil2D)

Java GPGPU
Stencil2D performance (Single precision) Stencil2D performance (Double precision)

120 60
110 CUDA 55 CUDA
jCuda jCuda
90 Java 45 Java
80 40
GFLOPS
GFLOPS
70 35
60 30
50 25
40 20
30 15
20 10
10 5
0 0
2048x2048 4096x4096 8192x8192 2048x2048 4096x4096 8192x8192
Figure: Stencil 2D kernel performance (SHOC Stencil)

Java GPGPU
FFT performance (Single precision) FFT performance (Double precision)

256 256
128 Java 128 Java
Aparapi Aparapi
64 jCuda 64 jCuda
32 CUDA 32 CUDA
16 16
8 8
4 4
2 2
1 1
0.5 0.5
0.25 0.25
0.125 0.125
0.0625 0.0625
2048x2048 4096x4096 8192x8192 2048x2048 4096x4096 8192x8192
Figure: FFT kernel performance (SHOC FFT)

Java GPGPU
FFT performance (Single precision) FFT performance (Double precision)

400 300
CUDA CUDA
350 jCuda jCuda
Aparapi 250 Aparapi
300 Java Java
200
250
GFLOPS
GFLOPS
200 150
150
100
100
50
50
0 0
2048x2048 4096x4096 8192x8192 2048x2048 4096x4096 8192x8192
Figure: FFT kernel performance (SHOC FFT)

NPB-MPJ Characteristics (10,000 SLOC (Source LOC))
Kernel
Applic.
Communicat.
Name Operation SLOC
intensiveness
CG Conjugate Gradient 1000 Medium X

EP Embarrassingly Parallel 350 Low X
FT Fourier Transformation 1700 High X
IS Integer Sort 700 High X
MG Multi-Grid 2000 High X
SP Scalar Pentadiagonal 4300 Medium X

Java HPC optimization: NPB-MPJ success case
NPB-MPJ Optimization:
JVM JIT compilation of heavy and frequent methods with
runtime information
Structured programming is the best option
Small frequent methods are better.
mapping elements from multidimensional to one-dimensional
arrays (array flattening technique:
arr3D[x][y][z]arr3D[pos3D(lenghtx,lengthy,x,y,z)])
NPB-MPJ code refactored, obtaining significant
improvements (up to 2800% performance increase)

Experimental Results on One Core (relative perf.)
NPB Kernels Serial Performance on Amazon EC2 (C Class)

2200
GNU compiler
2000 Intel compiler
1800 JVM
1600
1400
MOPS
1200
1000
800
600
400
200
0
C
FT
FT
IS
IS
M
G
G
(c
(c
(c
(c
(c
(c
(c
(c
c1
c2
c1
c2
c
c1
c2
1.
2.
.4
.8
.4
.8
.4
.8
4x
8x
xl
xl
xl
xl
xl
xl
ar
ar
ar
ar
l
l
ar
ar
ar
ar
ge
ge
ge
ge
ge
ge
ge
ge
)
)
)
)
)
)
Conclusions
IaaS: Infrastructure-as-a-Service
IaaS
The access to infrastructure supports the execution of
computationally intensive tasks in the cloud. There are many
IaaS providers:
Amazon Web Services (AWS)
Google Compute Engine (beta)
IBM cloud
HP cloud
Rackspace
ProfitBricks
Penguin-On-Demand (POD)
IMHO, AWS is the most suitable in terms of available resources,

high performance features
Guillermo and performance/cost
Lpez Taboada ratio.
Java for High Performance Cloud Computing
Conclusions
HPC and Big Data in AWS
AWS
Provides a set of instances for HPC
CC1: dual socket, quadcore Nehalem Xeon (93 GFlops)
CG1: dual socket, quadcore Nehalem Xeon con 2 GPUs
(1123 GFlops)
CC2: dual socket, octcore Sandy Bridge Xeon (333 GFlops)
Up to 63 GB RAM
10 Gigabit Ethernet high performance network

Conclusions
HPL Linpack in AWS (TOP500)
Table: Configuration of our EC2 AWS cluster
CPU 2 Xeon quadcore X5570@2.93GHz (46.88 GFlops DP each)

EC2 Compute Units 33.5
GPU 2 NVIDIA Tesla Fermi M2050 (515 GFlops DP each)
Memory 22 GB DDR3
Storage 1690 GB
Virtualization Xen HVM 64-bit platform (PV drivers for I/O)
Number of CGI nodes 32
Interconnection network 10 Gigabit Ethernet
Total EC2 Compute Units 1072
Total CPU Cores 256 (3 TFLOPS DP)
Total GPUs 64 (32.96 TFLOPS DP)
Total FLOPS 35.96 TFLOPS DP

Conclusions

HPL Linpack ranks the performance of supercomputers in TOP500 list
In AWS 14 TFlops are available for everybody! (67 USD$/h on demand, 21

USD$/h spot)
HPL Performance on Amazon EC2 HPL Scalability on Amazon EC2

16000 1400
CPU CPU
14000 CPU+GPU (CUDA) CPU+GPU (CUDA)
1200
12000
1000
10000
Speedup
GFLOPS
800
8000
600
6000
400
4000
2000 200
0 0
1 2 4 8 16 32 1 2 4 8 16 32
Number of Instances Number of Instances

Conclusions
References
Finis Terrae (CESGA): 14,01 TFlops
Our AWS Cluster: 14,23 TFlops
AWS (06/11): 41,82 TFlops (# 451 TOP500 06/11)
MareNostrum (BSC): 63,83 TFlops (# 299 TOP500)
MinoTauro (BSC): 103,2 TFlops (# 114 TOP500)
AWS (11/11): 240,1 TFlops (# 42 TOP500)

Conclusions
Radiography enhanced through Montecarlo (MC-GPU)

Conclusions
Image processing with Montecarlo (MC-GPU)

Time: originally 187 minutes, in AWS with HPC
6 seconds!
Cost: processing 500 radiographies in AWS: aprox. 70 USD$,

0,14 USD$ per unit
MC-GPU Execution Time on Amazon EC2 MC-GPU Scalability on Amazon EC2

1600 1800
CPU CPU
1400 CPU+GPU (CUDA) 1600 CPU+GPU (CUDA)
1400
1200
1200
1000
Runtime (s)
Speedup
1000
800
800
600
600
400
400
200 200
0 0
1 2 4 8 16 32 1 2 4 8 16 32
Number of Instances Number of Instances

Conclusions
AWS HPC and Big Data Remarks
HPC and Big Data in AWS

High computational power (new Sandy Bridge processors
and GPUs)
High performance communications over 10 Gigabit
Ethernet
Efficient access to file network systems (e.g., NFS,
OrangeFS)
Without waiting times in accessing resources
Easy applications and infrastructure deployment,
ready-to-run applications in AMIs

Conclusions
Java HPC in an InfiniBand cluster
Point-to-point Performance on DAS-4 (IB-QDR)

5 28
MVAPICH2
Open MPI 26
4.5
FastMPJ (ibvdev) 24
4 22
20
Bandwidth (Gbps)
3.5
18
Latency (s)
3
16
2.5 14
12
2
10
1.5 8
1 6
4
0.5
2
0 0
1 4 16 64 256 1K 4K 16K 64K 256K 1M 4M 16M
Message size (bytes)

Conclusions
NPB CG Kernel Performance on DAS-4 (IB-QDR)

70000
MVAPICH2
Open MPI
60000 FastMPJ (ibvdev)
50000
40000
MOPS
30000
20000
10000
0
1 16 32 64 128 256 512
Number of Cores

Conclusions
NPB FT Kernel Performance on DAS-4 (IB-QDR)

250000
MVAPICH2
Open MPI
FastMPJ (ibvdev)
200000
150000
MOPS
100000
50000
0
1 16 32 64 128 256 512
Number of Cores

Conclusions
NPB IS Kernel Performance on DAS-4 (IB-QDR)

6000
MVAPICH2
Open MPI
FastMPJ (ibvdev)
5000
4000
MOPS
3000
2000
1000
0
1 16 32 64 128 256 512
Number of Cores

Conclusions
NPB MG Kernel Performance on DAS-4 (IB-QDR)

400000
MVAPICH2
Open MPI
350000 FastMPJ (ibvdev)
300000
250000
MOPS
200000
150000
100000
50000
0
1 16 32 64 128 256 512
Number of Cores

Conclusions
NPB Performance Evaluation in AWS
Point-to-point Performance on 10 GbE (cc1.4xlarge)

75 10
MPICH2
OpenMPI 9
FastMPJ
70
8
Bandwidth (Gbps)
7
65
Latency (s)
6
60 5
4
55
3
2
50
1
45 0
1 4 16 64 256 1K 4K 16K 64K 256K 1M 4M 16M 64M

Conclusions
Point-to-point Performance on 10 GbE (cc2.8xlarge)

75 10
MPICH2
OpenMPI 9
FastMPJ
70
8
Bandwidth (Gbps)
7
65
Latency (s)
6
60 5
4
55
3
2
50
1
45 0
1 4 16 64 256 1K 4K 16K 64K 256K 1M 4M 16M 64M

Conclusions
Point-to-point Performance on SHMEM (cc1.4xlarge)

0.8 75
MPICH2
OpenMPI 70
0.7 FastMPJ 65
60
0.6 55
Bandwidth (Gbps)
50
Latency (s)
0.5 45
40
0.4
35
0.3 30
25
0.2 20
15
0.1 10
5
0 0
1 4 16 64 256 1K 4K 16K 64K 256K 1M 4M 16M 64M

Conclusions
Point-to-point Performance on SHMEM (cc2.8xlarge)

0.8 75
MPICH2
OpenMPI 70
0.7 FastMPJ 65
60
0.6 55
Bandwidth (Gbps)
50
Latency (s)
0.5 45
40
0.4
35
0.3 30
25
0.2 20
15
0.1 10
5
0 0
1 4 16 64 256 1K 4K 16K 64K 256K 1M 4M 16M 64M

Conclusions
CG C Class Performance (cc1.4xlarge)

10000
MPICH2
9000 OpenMPI
FastMPJ
8000
7000
6000
MOPS
5000
4000
3000
2000
1000
0
1 8 16 32 64 128 256 512
Number of Cores

Conclusions
FT C Class Performance (cc1.4xlarge)

25000
MPICH2
OpenMPI
FastMPJ
20000
15000
MOPS
10000
5000
0
1 8 16 32 64 128 256 512
Number of Cores

Conclusions
IS C Class Performance (cc1.4xlarge)

600
MPICH2
OpenMPI
FastMPJ
500
400
MOPS
300
200
100
0
1 8 16 32 64 128 256 512
Number of Cores

Conclusions
MG C Class Performance (cc1.4xlarge)

60000
MPICH2
OpenMPI
FastMPJ
50000
40000
MOPS
30000
20000
10000
0
1 8 16 32 64 128 256 512
Number of Cores

Conclusions
CG C Class Performance (cc2.8xlarge)

16000
MPICH2
OpenMPI
14000 FastMPJ
12000
10000
MOPS
8000
6000
4000
2000
0
1 8 16 32 64 128 256 512
Number of Cores

Conclusions
FT C Class Performance (cc2.8xlarge)

22000
MPICH2
20000 OpenMPI
FastMPJ
18000
16000
14000
MOPS
12000
10000
8000
6000
4000
2000
0
1 8 16 32 64 128 256 512
Number of Cores

Conclusions
IS C Class Performance (cc2.8xlarge)

700
MPICH2
OpenMPI
600 FastMPJ
500
400
MOPS
300
200
100
0
1 8 16 32 64 128 256 512
Number of Cores

Conclusions
MG C Class Performance (cc2.8xlarge)

90000
MPICH2
80000 OpenMPI
FastMPJ
70000
60000
MOPS
50000
40000
30000
20000
10000
0
1 8 16 32 64 128 256 512
Number of Cores

Conclusions
HPC in AWS
Efficient Cloud Computing support
CG Kernel OpenMPI Performance on Amazon EC2 CG Kernel OpenMPI Performance on Amazon EC2
13000 16000
Default Default
12000 Default (Fillup) 16 Instances
Default (Tuned) 14000 32 Instances
11000 16 Instances (Fillup) 64 Instances
16 Instances (Tuned)
10000
12000
9000
8000 10000
MOPS
MOPS
7000
8000
6000
5000 6000
4000
4000
3000
2000
2000
1000
0 0
1 8 16 32 64 128 1 8 16 32 64 128 256 512
Number of Processes
Guillermo Lpez Taboada Java for High Performance Number of Processes
Cloud Computing
Conclusions
ProtTest-HPC Phylogeny application
ProtTest: a phylogeny application

ProtTest is a Java application widely used in Phylogeny (4000 users)
Implements 112 models to validate the maximum likelihood (ML) of the
evolution of several taxa
The input data are amino acid sequencies from several taxa
Base on Markov Chains / Monte Carlo simulations
Time consuming sequential application. A fairly simple example can
last weeks/months
Relies on a C application for the computational intensive part of the
application (ML optimization)

Conclusions
ProtTest-HPC Phylogeny application

Conclusions
ProtTest-HPC design

Conclusions
ProtTest-HPC Implementation (Shared Memory)

Conclusions
ProtTest-HPC Implementation (MPJ)

Conclusions
ProtTest-HPC Implementation (Hybrid MPJ+Threads)

Conclusions
ProtTest-HPC Performance
ProtTestHPC shared memory performance (8core system)
8
RIB
COX
7 HIV
RIBML
COXML
6
HIVML
5
Speedup
0
1 2 4 8
Number of Threads
Figure: ProtTest-HPC shared memory (8-core machine)

Conclusions
ProtTestHPC shared memory performance (24core system)
20 RIB
COX
18 HIV
RIBML
16 COXML
HIVML
14
Speedup
12
10
0
1 2 4 8 16 24
Number of Threads
Figure: ProtTest-HPC shared memory (24-core machine)

Conclusions
ProtTestHPC distributed memory performance (Harpertown)
48
RIBML
44 COXML
HIVML
40 10K
20K
36
100K
32
Speedup
28
24
20
16
12
0
1 8 16 28 56 112
Number of Cores
Figure: ProtTest-HPC distributed memory (256-core cluster)

Conclusions
ProtTestHPC hybrid implementation performance (Harpertown)
160
RIBML
COXML
140 HIVML
10K
20K
120
100K
100
Speedup
80
60
40
20
0
1 8 16 28 56 112 224
Number of Cores
Figure: ProtTest-HPC hybrid shared/distributed memory (28 nodes)

Conclusions
ProtTestHPC hybrid implementation performance (Nehalem)
RIBML
60 COXML
HIVML
10K
50 20K
100K
40
Speedup
30
20
10
1 2 4 8 16 32 64
Number of Cores
Figure: ProtTest-HPC hybrid shared/distributed memory (8 nodes)

High Performance Cloud Computing Summary
Performance Evaluation Questions
Conclusions
Summary
Current state of Java for HPC (interesting/feasible

alternative)
Available programming models in Java for HPC:
Shared memory programming
Distributed memory programming
Distributed shared memory programming
Active research on Java for HPC (>30 projects)
...but still not a mainstream language for HPC
Adoption of Java for HPC in the cloud:

It is an alternative for the cloud (tradeoff some performance
for appealing features)
Performance evaluations are highly important
Analysis of current projects (promotion of joint efforts)
High Performance Cloud Computing Summary
Performance Evaluation Questions
Conclusions
Questions?
J AVA FOR H IGH P ERFORMANCE C LOUD C OMPUTING

FUNDS S EMINAR S ERIES 12
Guillermo Lpez Taboada
Email: taboada@udc.es
WWW: http://gac.udc.es/gltaboada/
Computer Architecture Group, University of A Corua

MMU Talk 20121031

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

MMU Talk 20121031

Uploaded by

Copyright:

Available Formats

Java for High Performance Computing

High Performance Cloud Computing

Java for High Performance Cloud Computing

Guillermo Lpez Taboada

Computer Architecture Group

October 31st, 2012, Future Networks and Distributed SystemS (FUNDS),

Guillermo Lpez Taboada Java for High Performance Cloud Computing

1 Java for High Performance Computing

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Java is an Alternative for HPC in the Multi-core Era

C and Java are the

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Java is an Alternative for HPC in the Multi-core Era

Many productive parallel/distributed

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Java Adoption in HPC

HPC developers and

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Java Adoption in HPC

HPC developers and

Java code is no longer Like native performance.

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Java Adoption in HPC

HPC developers and

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Emerging Interest in Java for HPC

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Current State of Java for HPC

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Java for HPC

Current options in Java for HPC:

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Java for HPC

Java Shared Memory Programming:

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Listing 1: Java Threads

public s t a t i c void main ( S t r i n g argv [ ] ) {

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Listing 2: Java Concurrency Framework highlights

public s t a t i c void main ( S t r i n g argv [ ] ) {

/ / create the thread pool

/ / run example t a s k s ( t a s k s implement Runnable )

/ / Another i n t e r e s t i n g c l a s s e s from concurrency framework :

Guillermo Lpez Taboada Java for High Performance Cloud Computing

JOMP is the Java OpenMP binding

Listing 3: JOMP example

public s t a t i c void main ( S t r i n g argv [ ] ) {

Guillermo Lpez Taboada Java for High Performance Cloud Computing

MPJ is the Java binding of MPI

Listing 4: MPJ example

public s t a t i c void main ( S t r i n g argv [ ] ) {

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Socket High-speed API

Pure Java Impl.

Guillermo Lpez Taboada Java for High Performance Cloud Computing

MPJ implementation developed at the Computer Architecture

Guillermo Lpez Taboada Java for High Performance Cloud Computing

FastMPJ Supported Hardware

The xxdev layer

Java Native Interface (JNI) Java Sockets Java Threads

MX/OpenMX InfiniPath PSM IBV API TCP/IP

Guillermo Lpez Taboada Java for High Performance Cloud Computing

Table: Available solutions for GPGPU computing in Java