
Java for High Performance Computing

High Performance Cloud Computing


Performance Evaluation
Conclusions

Java for High Performance Cloud Computing

Guillermo López Taboada

Computer Architecture Group


University of A Coruña (Spain)
taboada@udc.es

October 31st, 2012, Future Networks and Distributed SystemS (FUNDS),


Manchester Metropolitan University, Manchester (UK)

Guillermo López Taboada Java for High Performance Cloud Computing



Outline

1 Java for High Performance Computing


Java Shared Memory Programming
Java Message-Passing
Java GPGPU
Development of Efficient HPC Benchmarks
2 High Performance Cloud Computing
AWS IaaS for HPC and Big Data
3 Performance Evaluation
Evaluation of Java HPC
Evaluation of HPC in the Cloud
Case study: ProtTest-HPC


Java is an Alternative for HPC in the Multi-core Era

Language popularity (% skilled developers):
#1 C (19.8%)
#2 Java (17.2%)
#3 Objective-C (9.48%)
#4 C++ (9.3%)
#19 Matlab (0.59%)
#27 Fortran (0.35%)

C and Java are the most popular languages. Fortran, a popular language in HPC, ranks #27.


Java is an Alternative for HPC in the Multi-core Era

Interesting features:
Built-in networking
Built-in multi-threading
Portable, platform independent
Object Oriented
Main training language

Many productive parallel/distributed programming libs:
Java shared memory programming (high-level facilities: Concurrency framework)
Java Sockets
Java RMI
Message-Passing in Java (MPJ) libraries
Apache Hadoop


Java Adoption in HPC

HPC developers and users usually want to use Java in their projects.
Java code is no longer slow (Just-In-Time compilation)!
But there are still performance penalties in Java communications.

Pros and Cons:
High programming productivity.
But HPC developers are highly concerned about performance.


Java Adoption in HPC

JIT Performance:
Close to native performance.
Java can even outperform native languages (C, Fortran) thanks to dynamic compilation!


Java Adoption in HPC

High Java Communications Overhead:
Poor support for high-speed networks.
Data copies between the Java heap and native code through JNI.
Costly data serialization.
Use of communication protocols unsuitable for HPC.
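
The serialization and data-copy overheads can be illustrated with a minimal sketch. It compares standard Java object serialization of a double array with packing the same values into a direct (off-heap) buffer, which native code can read without an extra copy. The class and method names are hypothetical and are not taken from any of the libraries discussed in this talk.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

final class SerializationOverhead {

    /** Serializes the array with standard Java object serialization. */
    static byte[] serialize(double[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Packs the array into a direct (off-heap) buffer, 8 bytes per element. */
    static ByteBuffer packDirect(double[] data) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(data.length * Double.BYTES);
        buffer.asDoubleBuffer().put(data);
        return buffer;
    }

    public static void main(String[] args) {
        double[] payload = new double[1000];
        System.out.println("Serialized size:    " + serialize(payload).length + " bytes");
        System.out.println("Direct buffer size: " + packDirect(payload).capacity() + " bytes");
    }
}
```

The serialized form carries stream and class metadata on top of the raw values, and producing it requires an extra pass over the data. The direct buffer holds exactly the payload.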


Emerging Interest in Java for HPC


Current State of Java for HPC


Java for HPC

Current options in Java for HPC:


Java Shared Memory Programming (popular)
Java RMI (poor performance)
Java Sockets (low level programming)
Message-Passing in Java (MPJ) (widespread, easy to use, and reasonably good performance)
Apache Hadoop (for High Throughput Computing)


Java for HPC

Java Shared Memory Programming:


Java Threads
Concurrency Framework (ThreadPools, Tasks ...)
Parallel Java (PJ)
Java OpenMP (JOMP and JaMP)


Listing 1: Java Threads


class BasicThread extends Thread {
    // This method is called when the thread runs
    @Override
    public void run() {
        for (int i = 0; i < 1000; i++)
            Test.increaseCount();
    }
}

class Test {
    static int counter = 0;

    // synchronized so that concurrent increments are not lost
    static synchronized void increaseCount() {
        counter++;
    }

    public static void main(String[] argv) throws InterruptedException {
        // Create and start the threads
        Thread t1 = new BasicThread();
        Thread t2 = new BasicThread();
        t1.start();
        t2.start();
        // Wait for both threads to finish before reading the counter
        t1.join();
        t2.join();
        System.out.println("Counter=" + counter);
    }
}


Listing 2: Java Concurrency Framework highlights


import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class Test {

    public static void main(String[] argv) throws InterruptedException {
        int numThreads = 4, numTasks = 16;  // example values, not in the original slide

        // create the thread pool
        ExecutorService threadPool = Executors.newFixedThreadPool(numThreads);

        // run example tasks (tasks implement Runnable)
        for (int i = 0; i < numTasks; i++) {
            threadPool.execute(createTask(i));
        }

        // close the pool and wait for all tasks to finish
        threadPool.shutdown();
        threadPool.awaitTermination(1, TimeUnit.HOURS);
    }

    // example task: prints its id
    static Runnable createTask(int id) {
        return () -> System.out.println("Task " + id);
    }
}

// Other interesting classes from the concurrency framework:
// CyclicBarrier
// ConcurrentHashMap
// PriorityBlockingQueue
// Executors ...
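
As an illustration of one of the classes named above, the following minimal sketch uses CyclicBarrier to synchronize worker threads at the end of a computation phase. Each worker accumulates a partial sum, and the barrier action combines the partial results once the last worker arrives. The class name BarrierSum and the strided work distribution are assumptions made for this example.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicLong;

final class BarrierSum {

    /** Sums 1..n with the given number of worker threads meeting at a barrier. */
    static long parallelSum(int n, int threads) {
        long[] partial = new long[threads];
        AtomicLong total = new AtomicLong();

        // The barrier action runs once, after the last worker has arrived
        CyclicBarrier barrier = new CyclicBarrier(threads, () -> {
            long sum = 0;
            for (long p : partial) sum += p;
            total.set(sum);
        });

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Strided distribution: worker id handles id+1, id+1+threads, ...
                for (int i = id + 1; i <= n; i += threads) partial[id] += i;
                try {
                    barrier.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println("Sum 1..100 = " + parallelSum(100, 4));
    }
}
```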


JOMP is the Java OpenMP binding

Listing 3: JOMP example

public static void main(String[] argv) {
    // a, b and n are assumed to be declared and initialized elsewhere
    int myid;
    //omp parallel private(myid)
    {
        myid = OMP.getThreadNum();
        System.out.println("Hello from " + myid);
    }

    //omp parallel for
    for (int i = 1; i < n; i++) {
        b[i] = (a[i] + a[i-1]) * 0.5;
    }
}


MPJ is the Java binding of MPI

Listing 4: MPJ example


import mpi.*;

public class Hello {

    public static void main(String[] argv) throws Exception {
        MPI.Init(argv);
        int rank = MPI.COMM_WORLD.Rank();
        int msg_tag = 13;

        if (rank == 0) {
            // Rank 0 sends one String object to rank 1
            int peer_process = 1;
            String[] msg = new String[1];
            msg[0] = "Hello";
            MPI.COMM_WORLD.Send(msg, 0, 1, MPI.OBJECT, peer_process, msg_tag);
        } else if (rank == 1) {
            // Rank 1 receives the object from rank 0 and prints it
            int peer_process = 0;
            String[] msg = new String[1];
            MPI.COMM_WORLD.Recv(msg, 0, 1, MPI.OBJECT, peer_process, msg_tag);
            System.out.println(msg[0]);
        }
        MPI.Finalize();
    }
}


Table: MPJ implementations. Columns: Pure Java Impl.; Socket impl. (Java IO, Java NIO); High-speed network support (Myrinet, InfiniBand, SCI); API (mpiJava 1.2, JGF MPJ, Other APIs)

MPJava X X X
Jcluster X X X
Parallel Java X X X
mpiJava X X X X
P2P-MPI X X X X
MPJ Express X X X X
MPJ/Ibis X X X X
JMPI X X X
F-MPJ X X X X X X


FastMPJ

MPJ implementation developed at the Computer Architecture Group of the University of A Coruña.
Features of FastMPJ include:
High performance intra-node communication
Efficient RDMA transfers over InfiniBand and RoCE
Fully portable, as Java
Scalability up to thousands of cores
Highly productive development and maintenance
Ideal for multicore servers, cluster and cloud computing


FastMPJ Supported Hardware


FastMPJ runs on InfiniBand through IB Verbs, can rely on TCP/IP and has shared memory support through Java threads:

MPJ Applications
MPJ API (mpiJava 1.2)
FastMPJ Library: MPJ Collective Primitives, MPJ Point-to-Point Primitives
The xxdev layer:
mxdev: Java Native Interface (JNI), MX/Open-MX, Myrinet/Ethernet
psmdev: JNI, InfiniPath PSM, InfiniBand
ibvdev: JNI, IBV API, InfiniBand
niodev/iodev: Java Sockets, TCP/IP, Ethernet
smdev: Java Threads, Shared Memory


Java GPGPU

Table: Available solutions for GPGPU computing in Java

          Java bindings           User-friendly
CUDA      JCUDA, jCuda            Java-GPU, Rootbeer
OpenCL    JOCL, JogAmp's JOCL     Aparapi


Java GPGPU

Table: Description of the GPU-based testbed

CPU               1 x Intel Xeon hexacore X5650 @ 2.67 GHz
CPU performance   64.08 GFLOPS DP (10.68 GFLOPS DP per core)
GPU               1 x NVIDIA Tesla Fermi M2050
GPU performance   515 GFLOPS DP
Memory            12 GB DDR3 (1333 MHz)
OS                Debian GNU/Linux, kernel 3.2.0-3
CUDA version      4.2 SDK Toolkit
JDK version       OpenJDK 1.6.0_24


Java GPGPU

Figure: Matrix multiplication kernel performance (SHOC GEMM), single and double precision. GFLOPS vs. problem size (2048x2048, 4096x4096, 8192x8192) for CUDA, jCuda, Aparapi and Java.


Java GPGPU

Figure: Stencil 2D kernel performance (SHOC Stencil2D), single and double precision. Runtime (seconds) vs. problem size (2048x2048, 4096x4096, 8192x8192) for Java, Aparapi, jCuda and CUDA.


Java GPGPU

Figure: Stencil 2D kernel performance (SHOC Stencil2D), single and double precision. GFLOPS vs. problem size (2048x2048, 4096x4096, 8192x8192) for CUDA, jCuda, Aparapi and Java.


Java GPGPU

Figure: FFT kernel performance (SHOC FFT), single and double precision, vs. problem size (2048x2048, 4096x4096, 8192x8192) for Java, Aparapi, jCuda and CUDA.


Java GPGPU

Figure: FFT kernel performance (SHOC FFT), single and double precision. GFLOPS vs. problem size (2048x2048, 4096x4096, 8192x8192) for CUDA, jCuda, Aparapi and Java.


NPB-MPJ Characteristics (10,000 SLOC (Source LOC))

Name  Operation                SLOC  Communicat. intensiveness  Kernel  Applic.
CG    Conjugate Gradient       1000  Medium                     X
EP    Embarrassingly Parallel   350  Low                        X
FT    Fourier Transformation   1700  High                       X
IS    Integer Sort              700  High                       X
MG    Multi-Grid               2000  High                       X
SP    Scalar Pentadiagonal     4300  Medium                             X


Java HPC optimization: NPB-MPJ success case

NPB-MPJ Optimization:
JVM JIT compilation of heavy and frequent methods with runtime information
Structured programming is the best option
Small frequent methods are better
Mapping elements from multidimensional to one-dimensional arrays (array flattening technique: arr3D[x][y][z] → arr3D[pos3D(lengthx,lengthy,x,y,z)])
NPB-MPJ code refactored, obtaining significant improvements (up to 2800% performance increase)
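
The array flattening technique can be sketched as follows. The slide only gives the signature pos3D(lengthx,lengthy,x,y,z), so the index formula below (x varying fastest) is an assumption, as are the class name and the helper methods. A small, frequently called method such as pos3D is a natural candidate for JIT inlining.

```java
final class ArrayFlattening {

    /** Maps (x, y, z) to a position in a 1D array; x varies fastest (assumed layout). */
    static int pos3D(int lengthX, int lengthY, int x, int y, int z) {
        return x + lengthX * (y + lengthY * z);
    }

    /** Sums a grid stored as nested arrays: one object per row, extra indirections. */
    static double sumNested(double[][][] grid) {
        double sum = 0.0;
        for (double[][] plane : grid)
            for (double[] row : plane)
                for (double v : row) sum += v;
        return sum;
    }

    /** Sums the same grid stored in a single contiguous array. */
    static double sumFlat(double[] grid, int lengthX, int lengthY, int lengthZ) {
        double sum = 0.0;
        for (int z = 0; z < lengthZ; z++)
            for (int y = 0; y < lengthY; y++)
                for (int x = 0; x < lengthX; x++)
                    sum += grid[pos3D(lengthX, lengthY, x, y, z)];
        return sum;
    }

    public static void main(String[] args) {
        int lx = 4, ly = 3, lz = 2;
        double[][][] nested = new double[lx][ly][lz];
        double[] flat = new double[lx * ly * lz];
        for (int x = 0; x < lx; x++)
            for (int y = 0; y < ly; y++)
                for (int z = 0; z < lz; z++) {
                    double value = x + 10 * y + 100 * z;
                    nested[x][y][z] = value;
                    flat[pos3D(lx, ly, x, y, z)] = value;
                }
        System.out.println("Nested sum: " + sumNested(nested));
        System.out.println("Flat sum:   " + sumFlat(flat, lx, ly, lz));
    }
}
```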


Experimental Results on One Core (relative perf.)

Figure: NPB kernels serial performance on Amazon EC2 (C Class). MOPS for the GNU compiler, Intel compiler and JVM; CG, FT, IS and MG kernels on cc1.4xlarge and cc2.8xlarge instances.


IaaS: Infrastructure-as-a-Service
IaaS
The access to infrastructure supports the execution of
computationally intensive tasks in the cloud. There are many
IaaS providers:
Amazon Web Services (AWS)
Google Compute Engine (beta)
IBM cloud
HP cloud
Rackspace
ProfitBricks
Penguin-On-Demand (POD)

IMHO, AWS is the most suitable in terms of available resources, high performance features and performance/cost ratio.


HPC and Big Data in AWS

AWS
Provides a set of instances for HPC:
CC1: dual socket, quad-core Nehalem Xeon (93 GFlops)
CG1: dual socket, quad-core Nehalem Xeon with 2 GPUs (1123 GFlops)
CC2: dual socket, octa-core Sandy Bridge Xeon (333 GFlops)
Up to 63 GB RAM
10 Gigabit Ethernet high performance network


HPL Linpack in AWS (TOP500)

Table: Configuration of our EC2 AWS cluster

CPU                       2 x Xeon quad-core X5570 @ 2.93 GHz (46.88 GFlops DP each)
EC2 Compute Units         33.5
GPU                       2 x NVIDIA Tesla Fermi M2050 (515 GFlops DP each)
Memory                    22 GB DDR3
Storage                   1690 GB
Virtualization            Xen HVM 64-bit platform (PV drivers for I/O)
Number of CG1 nodes       32
Interconnection network   10 Gigabit Ethernet
Total EC2 Compute Units   1072
Total CPU Cores           256 (3 TFLOPS DP)
Total GPUs                64 (32.96 TFLOPS DP)
Total FLOPS               35.96 TFLOPS DP
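
The totals in the table follow from the per-device figures. A minimal sketch of the arithmetic, using only the values listed above (the class and method names are hypothetical):

```java
final class PeakFlops {

    /** Aggregate peak in TFLOPS for identical devices spread over identical nodes. */
    static double peakTflops(int nodes, int devicesPerNode, double gflopsPerDevice) {
        return nodes * devicesPerNode * gflopsPerDevice / 1000.0;
    }

    public static void main(String[] args) {
        int nodes = 32;                               // number of nodes in the cluster
        double cpu = peakTflops(nodes, 2, 46.88);     // 2 CPUs per node, 46.88 GFlops DP each
        double gpu = peakTflops(nodes, 2, 515.0);     // 2 GPUs per node, 515 GFlops DP each
        System.out.printf("CPU peak:   %.2f TFLOPS DP%n", cpu);
        System.out.printf("GPU peak:   %.2f TFLOPS DP%n", gpu);
        System.out.printf("Total peak: %.2f TFLOPS DP%n", cpu + gpu);
    }
}
```

The GPUs account for more than 90% of the aggregate peak, so the CPU-only and CPU+GPU results are expected to differ widely.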


HPL Linpack in AWS (TOP500)


The HPL Linpack benchmark ranks the performance of supercomputers in the TOP500 list.

In AWS, 14 TFlops are available to everybody (67 USD/h on demand, 21 USD/h spot)!

Figure: HPL performance (GFLOPS) and HPL scalability (speedup) on Amazon EC2 vs. number of instances (1 to 32), for CPU and CPU+GPU (CUDA).


HPL Linpack in AWS (TOP500)

References
Finis Terrae (CESGA): 14.01 TFlops
Our AWS Cluster: 14.23 TFlops
AWS (06/11): 41.82 TFlops (#451 TOP500 06/11)
MareNostrum (BSC): 63.83 TFlops (#299 TOP500)
MinoTauro (BSC): 103.2 TFlops (#114 TOP500)
AWS (11/11): 240.1 TFlops (#42 TOP500)
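
As a rough worked example, the measured 14.23 TFlops can be related to the 35.96 TFLOPS DP aggregate peak of the 32-node cluster and to the hourly prices quoted earlier (67 USD/h on demand, 21 USD/h spot). The sketch assumes that those prices correspond to the whole 32-node cluster; the class and method names are hypothetical.

```java
final class HplEfficiency {

    /** Fraction of the theoretical peak achieved by the benchmark. */
    static double efficiency(double measuredTflops, double peakTflops) {
        return measuredTflops / peakTflops;
    }

    /** Hourly price divided by sustained performance. */
    static double usdPerTflopsHour(double usdPerHour, double measuredTflops) {
        return usdPerHour / measuredTflops;
    }

    public static void main(String[] args) {
        double measured = 14.23;  // TFlops, our AWS cluster
        double peak = 35.96;      // TFLOPS DP, aggregate peak of the cluster
        System.out.printf("HPL efficiency:      %.1f%%%n", 100 * efficiency(measured, peak));
        System.out.printf("On demand: %.2f USD per TFlops-hour%n", usdPerTflopsHour(67, measured));
        System.out.printf("Spot:      %.2f USD per TFlops-hour%n", usdPerTflopsHour(21, measured));
    }
}
```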


Radiography enhanced through Monte Carlo (MC-GPU)


Image processing with Monte Carlo (MC-GPU)

Time: originally 187 minutes; in AWS with HPC, 6 seconds!

Cost: processing 500 radiographies in AWS: approx. 70 USD, 0.14 USD per unit
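
The two headline numbers can be checked with a short calculation. This is only a sketch of the arithmetic behind the figures quoted above; the class and method names are hypothetical.

```java
final class McGpuFigures {

    /** Ratio between the original and the accelerated execution time. */
    static double speedup(double originalSeconds, double acceleratedSeconds) {
        return originalSeconds / acceleratedSeconds;
    }

    /** Average cost of processing one item. */
    static double costPerUnit(double totalUsd, int units) {
        return totalUsd / units;
    }

    public static void main(String[] args) {
        double original = 187 * 60;   // 187 minutes expressed in seconds
        System.out.printf("Speedup:       %.0fx%n", speedup(original, 6));
        System.out.printf("Cost per unit: %.2f USD%n", costPerUnit(70, 500));
    }
}
```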

Figure: MC-GPU execution time (runtime in seconds) and MC-GPU scalability (speedup) on Amazon EC2 vs. number of instances (1 to 32), for CPU and CPU+GPU (CUDA).


AWS HPC and Big Data Remarks

HPC and Big Data in AWS
High computational power (new Sandy Bridge processors and GPUs)
High performance communications over 10 Gigabit Ethernet
Efficient access to network file systems (e.g., NFS, OrangeFS)
No waiting times to access resources
Easy application and infrastructure deployment, ready-to-run applications in AMIs


Java HPC in an InfiniBand cluster

Figure: Point-to-point performance on DAS-4 (IB-QDR). Latency (µs) and bandwidth (Gbps) vs. message size (1 byte to 16 MB) for MVAPICH2, Open MPI and FastMPJ (ibvdev).


Java HPC in an InfiniBand cluster

Figure: NPB CG kernel performance on DAS-4 (IB-QDR). MOPS vs. number of cores (1 to 512) for MVAPICH2, Open MPI and FastMPJ (ibvdev).


Java HPC in an InfiniBand cluster

Figure: NPB FT kernel performance on DAS-4 (IB-QDR). MOPS vs. number of cores (1 to 512) for MVAPICH2, Open MPI and FastMPJ (ibvdev).


Java HPC in an InfiniBand cluster

Figure: NPB IS kernel performance on DAS-4 (IB-QDR). MOPS vs. number of cores (1 to 512) for MVAPICH2, Open MPI and FastMPJ (ibvdev).


Java HPC in an InfiniBand cluster

Figure: NPB MG kernel performance on DAS-4 (IB-QDR). MOPS vs. number of cores (1 to 512) for MVAPICH2, Open MPI and FastMPJ (ibvdev).


NPB Performance Evaluation in AWS

Figure: Point-to-point performance on 10 GbE (cc1.4xlarge). Latency (µs) and bandwidth (Gbps) vs. message size (1 byte to 64 MB) for MPICH2, OpenMPI and FastMPJ.


NPB Performance Evaluation in AWS

Figure: Point-to-point performance on 10 GbE (cc2.8xlarge). Latency (µs) and bandwidth (Gbps) vs. message size (1 byte to 64 MB) for MPICH2, OpenMPI and FastMPJ.


NPB Performance Evaluation in AWS

Figure: Point-to-point performance on SHMEM (cc1.4xlarge). Latency (µs) and bandwidth (Gbps) vs. message size (1 byte to 64 MB) for MPICH2, OpenMPI and FastMPJ.


NPB Performance Evaluation in AWS

Figure: Point-to-point performance on SHMEM (cc2.8xlarge). Latency (µs) and bandwidth (Gbps) vs. message size (1 byte to 64 MB) for MPICH2, OpenMPI and FastMPJ.


NPB Performance Evaluation in AWS

Figure: CG C Class performance (cc1.4xlarge). MOPS vs. number of cores (1 to 512) for MPICH2, OpenMPI and FastMPJ.


NPB Performance Evaluation in AWS

Figure: FT C Class performance (cc1.4xlarge). MOPS vs. number of cores (1 to 512) for MPICH2, OpenMPI and FastMPJ.


NPB Performance Evaluation in AWS

Figure: IS C Class performance (cc1.4xlarge). MOPS vs. number of cores (1 to 512) for MPICH2, OpenMPI and FastMPJ.


NPB Performance Evaluation in AWS

Figure: MG C Class performance (cc1.4xlarge). MOPS vs. number of cores (1 to 512) for MPICH2, OpenMPI and FastMPJ.


NPB Performance Evaluation in AWS

Figure: CG C Class performance (cc2.8xlarge). MOPS vs. number of cores (1 to 512) for MPICH2, OpenMPI and FastMPJ.


NPB Performance Evaluation in AWS

Figure: FT C Class performance (cc2.8xlarge). MOPS vs. number of cores (1 to 512) for MPICH2, OpenMPI and FastMPJ.


NPB Performance Evaluation in AWS

Figure: IS C Class performance (cc2.8xlarge). MOPS vs. number of cores (1 to 512) for MPICH2, OpenMPI and FastMPJ.


NPB Performance Evaluation in AWS

Figure: MG C Class performance (cc2.8xlarge). MOPS vs. number of cores (1 to 512) for MPICH2, OpenMPI and FastMPJ.


HPC in AWS
Efficient Cloud Computing support

Figure: CG kernel OpenMPI performance on Amazon EC2. MOPS vs. number of processes. Left panel (1 to 128 processes): Default, Default (Fillup), Default (Tuned), 16 Instances (Fillup), 16 Instances (Tuned). Right panel (1 to 512 processes): Default, 16 Instances, 32 Instances, 64 Instances.


ProtTest-HPC Phylogeny application

ProtTest: a phylogeny application


ProtTest is a Java application widely used in phylogeny (4000 users)
Implements 112 models to validate the maximum likelihood (ML) of the evolution of several taxa
The input data are amino acid sequences from several taxa
Based on Markov chain / Monte Carlo simulations
Time-consuming sequential application: a fairly simple example can take weeks or months
Relies on a C application for the computationally intensive part of the application (ML optimization)
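
One way such a workload could be parallelized in shared memory is to treat each candidate model as an independent task and distribute the tasks over a thread pool. The sketch below illustrates that general pattern only. It assumes that the models can be evaluated independently, and it is not the ProtTest-HPC implementation: ModelFarm, optimizeModel and the placeholder score are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ModelFarm {

    /** Hypothetical stand-in for the costly likelihood optimization of one model. */
    static double optimizeModel(int modelId) {
        return -1000.0 - modelId;  // placeholder score, not a real likelihood
    }

    /** Evaluates all models on a fixed-size pool and returns the best-scoring id. */
    static int bestModel(int numModels, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Idle threads pick up the next pending model (dynamic scheduling)
            List<Future<Double>> scores = new ArrayList<>();
            for (int m = 0; m < numModels; m++) {
                final int id = m;
                Callable<Double> task = () -> optimizeModel(id);
                scores.add(pool.submit(task));
            }
            int best = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int m = 0; m < numModels; m++) {
                double score = scores.get(m).get();  // blocks until model m is done
                if (score > bestScore) {
                    bestScore = score;
                    best = m;
                }
            }
            return best;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Best model id: " + bestModel(112, 8));
    }
}
```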


ProtTest-HPC Phylogeny application


ProtTest-HPC design


ProtTest-HPC Implementation (Shared Memory)
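
As a rough illustration of the shared-memory approach, each candidate model can be handed to a thread pool from the Java concurrency framework. This is a minimal sketch under stated assumptions, not ProtTest-HPC's actual code: `optimizeModel` is a hypothetical stand-in for the ML optimization, and the class name and pool size are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedMemorySketch {

    // Hypothetical placeholder for the per-model likelihood optimization.
    static double optimizeModel(int modelId) {
        return -1000.0 - modelId; // dummy log-likelihood, not a real result
    }

    // Evaluates models 0..numModels-1 on a fixed-size pool; results are indexed by model.
    static double[] evaluateAll(int numModels, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int m = 0; m < numModels; m++) {
                final int id = m;
                Callable<Double> task = () -> optimizeModel(id);
                futures.add(pool.submit(task));
            }
            double[] results = new double[numModels];
            for (int m = 0; m < numModels; m++) {
                results[m] = futures.get(m).get(); // blocks until model m is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for models", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("model evaluation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] scores = evaluateAll(112, 8); // 112 models, 8 threads (assumed)
        System.out.println("Evaluated " + scores.length + " models");
    }
}
```

Because the pool's queue hands the next model to whichever thread becomes free, models with uneven run times are balanced dynamically without extra code.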


ProtTest-HPC Implementation (MPJ)
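
One common way to split independent work across message-passing processes is a static cyclic distribution, in which the process with rank r takes models r, r+P, r+2P, and so on for P processes. The sketch below shows only that index arithmetic in plain Java. It makes no MPJ calls: `rank` and `size` are ordinary parameters standing in for the values an MPJ runtime would supply, and this section does not state whether ProtTest-HPC distributes its models statically or dynamically.

```java
import java.util.ArrayList;
import java.util.List;

public class CyclicDistributionSketch {

    // Returns the model indices that the process with the given rank would handle.
    static List<Integer> modelsForRank(int rank, int size, int numModels) {
        if (size <= 0 || rank < 0 || rank >= size) {
            throw new IllegalArgumentException("invalid rank/size");
        }
        List<Integer> mine = new ArrayList<>();
        for (int m = rank; m < numModels; m += size) {
            mine.add(m);
        }
        return mine;
    }

    public static void main(String[] args) {
        int numModels = 112; // number of models mentioned in the talk
        int size = 8;        // hypothetical number of processes
        for (int rank = 0; rank < size; rank++) {
            int count = modelsForRank(rank, size, numModels).size();
            System.out.println("rank " + rank + " -> " + count + " models");
        }
    }
}
```

A static split like this needs no communication to assign work, but it balances load only if the models take similar time; a master/worker scheme is the usual alternative when they do not.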


ProtTest-HPC Implementation (Hybrid MPJ+Threads)


ProtTest-HPC Performance
Plot: speedup vs. number of threads (1, 2, 4, 8) on the 8-core system; series RIB, COX, HIV, RIBML, COXML, HIVML.

Figure: ProtTest-HPC shared memory (8-core machine)
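
Speedup in plots of this kind is usually the sequential run time divided by the parallel run time, and parallel efficiency is that speedup divided by the number of threads or cores. A small helper makes the arithmetic explicit; the timings in `main` are made-up placeholders, not measurements from the talk.

```java
public class SpeedupSketch {

    // Speedup: sequential time divided by parallel time.
    static double speedup(double sequentialSeconds, double parallelSeconds) {
        return sequentialSeconds / parallelSeconds;
    }

    // Efficiency: speedup normalized by the number of workers (threads or cores).
    static double efficiency(double speedup, int workers) {
        return speedup / workers;
    }

    public static void main(String[] args) {
        double t1 = 800.0; // hypothetical sequential time in seconds
        double t8 = 125.0; // hypothetical time on 8 threads in seconds
        double s = speedup(t1, t8);
        System.out.println("speedup = " + s + ", efficiency = " + efficiency(s, 8));
    }
}
```

An efficiency close to 1 means the added threads are almost fully used; values well below 1 point to load imbalance, synchronization, or memory contention.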


ProtTest-HPC Performance
Plot: speedup vs. number of threads (1, 2, 4, 8, 16, 24) on the 24-core system; series RIB, COX, HIV, RIBML, COXML, HIVML.

Figure: ProtTest-HPC shared memory (24-core machine)


ProtTest-HPC Performance
Plot: speedup vs. number of cores (1, 8, 16, 28, 56, 112) on the Harpertown system; series RIBML, COXML, HIVML, 10K, 20K, 100K.

Figure: ProtTest-HPC distributed memory (256-core cluster)


ProtTest-HPC Performance
Plot: speedup vs. number of cores (1, 8, 16, 28, 56, 112, 224) for the hybrid implementation on the Harpertown system; series RIBML, COXML, HIVML, 10K, 20K, 100K.

Figure: ProtTest-HPC hybrid shared/distributed memory (28 nodes)


ProtTest-HPC Performance
Plot: speedup vs. number of cores (1, 2, 4, 8, 16, 32, 64) for the hybrid implementation on the Nehalem system; series RIBML, COXML, HIVML, 10K, 20K, 100K.

Figure: ProtTest-HPC hybrid shared/distributed memory (8 nodes)


Summary

Current state of Java for HPC (interesting/feasible alternative)
Available programming models in Java for HPC:
Shared memory programming
Distributed memory programming
Distributed shared memory programming
Active research on Java for HPC (>30 projects)
...but still not a mainstream language for HPC

Adoption of Java for HPC in the cloud:
It is an alternative for the cloud (trading some performance for appealing features)
Performance evaluations are highly important
Analysis of current projects (promotion of joint efforts)

Questions?

Java for High Performance Cloud Computing

FUNDS Seminar Series 12
Guillermo López Taboada
Email: taboada@udc.es
WWW: http://gac.udc.es/gltaboada/
Computer Architecture Group, University of A Coruña
