
A Parallel Implementation of the Spiral

Algorithm for Cataloguing Fullerene Isomers


Jeffery L. Thomas Faculty Advisor: Prof. Daniel Bennett
Edinboro University of Pennsylvania
Motivation

Why this project?

• Lying at the intersection of mathematics and computer science, this project provides the opportunity to conduct independent research in both fields.
What is a Fullerene?
Fullerenes and Buckyballs
(Image: C60, the buckyball.)

Fullerenes exist for n = 20 and for every even atom count n ≥ 24.


A Fullerene Graph

Taken from “An Atlas of Fullerenes” by P.W. Fowler and D.E. Manolopoulos
The Spiral Algorithm
The Fullerene Isomer Problem

• To find and catalogue all fullerene isomers with a given vertex count, n

(Images: fullerene isomers of C32, ending with C32:6.)
The Spiral Algorithm

The Spiral Algorithm relies on two important ideas:

1. A 3-dimensional fullerene can be “unwound” into a continuous string of 5’s and 6’s.
2. There are exactly 12 pentagons in any fullerene isomer, regardless of the number of vertices.
Representing Spirals

65656566656656656566566565656566
Taken from “An Atlas of Fullerenes” by P.W. Fowler and D.E. Manolopoulos
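
As a quick sanity check on this representation (a sketch of mine, not part of the original implementation), the spiral above should have n/2 + 2 = 32 faces for C60, exactly 12 of which are pentagons:

#include <algorithm>
#include <iostream>
#include <string>

int main() {
    const int n = 60;                                   // vertex count of C60
    const std::string spiral =
        "65656566656656656566566565656566";             // spiral string from the slide
    const int faces = n / 2 + 2;                        // a fullerene on n vertices has n/2 + 2 faces
    const auto pentagons =
        std::count(spiral.begin(), spiral.end(), '5');  // every fullerene has exactly 12 pentagons

    std::cout << "faces expected: " << faces
              << ", faces in spiral: " << spiral.size()
              << ", pentagons: " << pentagons << "\n";
    return 0;
}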
The Spiral Algorithm

All fullerene isomers with a given vertex count, n, can be generated by checking at most

   C(n/2 + 2, 12)

spiral combinations, i.e. the number of ways to choose positions for the 12 pentagons among the n/2 + 2 faces of the spiral.
The Spiral Algorithm

 n   Spiral combinations        n   Spiral combinations
20                     1       36               125,970
22                    13       38               293,930
24                    91       40               646,646
26                   455       42             1,352,078
28                 1,820       44             2,704,156
30                 6,188       46             5,200,300
32                18,564       48             9,657,700
34                50,388       50            17,383,860
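
A short sketch (mine, not the authors' code) that reproduces this table directly from the bound C(n/2 + 2, 12):

#include <cstdint>
#include <iostream>

// C(n, k) via the multiplicative formula; exact for the small values used here.
std::uint64_t choose(std::uint64_t n, std::uint64_t k) {
    if (k > n) return 0;
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;   // stays an integer at every step
    return result;
}

int main() {
    for (int n = 20; n <= 50; n += 2)        // even vertex counts from the table
        std::cout << "n = " << n << ": "
                  << choose(n / 2 + 2, 12) << " spiral combinations\n";
    return 0;
}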
The Spiral Algorithm

 n   Fullerene isomers        n   Fullerene isomers
20                   1       36                  15
22                   0       38                  17
24                   1       40                  40
26                   1       42                  45
28                   2       44                  89
30                   3       46                 116
32                   6       48                 199
34                   6       50                 271
Justification for Parallelization
FORTRAN Implementation
• First published in 1991

• BRUTE FORCE

• Becomes rapidly complex as n increases beyond 100

      do 1 j1 = 1, m-11*jpr
      do 2 j2 = j1 +jpr, m-10*jpr
      do 3 j3 = j2 +jpr, m- 9*jpr
      do 4 j4 = j3 +jpr, m- 8*jpr
      do 5 j5 = j4 +jpr, m- 7*jpr
      do 6 j6 = j5 +jpr, m- 6*jpr
      do 7 j7 = j6 +jpr, m- 5*jpr
      do 8 j8 = j7 +jpr, m- 4*jpr
      do 9 j9 = j8 +jpr, m- 3*jpr
      do 10 j10 = j9 +jpr, m- 2*jpr
      do 11 j11 = j10+jpr, m- 1*jpr
      do 12 j12 = j11+jpr, m
      do 14 j = 1, m
         .
         .
         .
   14 continue
   13 continue
   12 continue
   11 continue
   10 continue
    9 continue
    8 continue
    7 continue
    6 continue
    5 continue
    4 continue
    3 continue
    2 continue
    1 continue
FORTRAN Implementation

Time complexity: O(n^16)

NOTE: THIS IS BAD!!!
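
For scale: using the earlier bound, n = 100 already requires checking C(52, 12) ≈ 2.06 × 10^11 candidate spirals, each of which must be wound up and tested.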
Sequential Implementation
Sequential Implementation Using C++

OBJECTIVES
• Translate main FORTRAN code to C++

• Ensure proper output

• Minimize / eliminate added complexity


Sequential Implementation Using C++

(Chart: Run-time Comparisons between C++ and FORTRAN. Time in seconds vs. number of vertices, n = 30 to 60; series: C++ time, FORTRAN time.)
Sequential Implementation Using C++
gprof profile for n = 50
Flat profile: (Each sample counts as 0.01 seconds.)

  %   cumulative    self               self     total
 time    seconds   seconds    calls  us/call  us/call  name
74.57     107.84    107.84  5096665    21.16    21.16  Matrix::FconvertC(int*)   <-- OUR CULPRIT
21.98     139.62     31.78  5096665     6.24     6.24  windup_
 2.24     142.86      3.24                             main
 1.21     144.61      1.75    59046    29.64    29.64  unwind_
 0.00     144.61      0.00     4101     0.00     0.00  std::setw(int)
 0.00     144.61      0.00     2144     0.00     0.00  __gnu_cxx::.....
 0.00     144.61      0.00     1543     0.00     0.00  __gnu_cxx::.....
 0.00     144.61      0.00     1533     0.00     0.00  __gnu_cxx::.....
 0.00     144.61      0.00     1052     0.00     0.00  bool __gnu_cxx::operator!=
 0.00     144.61      0.00      782     0.00     0.00  Full::~Full()
 0.00     144.61      0.00      782     0.00     0.00  void std::_Destroy<Full....
 0.00     144.61      0.00      782     0.00     0.00  operator new(.....
 0.00     144.61      0.00      511     0.00     0.00  void std::_Construct<.....
 0.00     144.61      0.00      271     0.00     0.00  Full::Full(int, .....
 0.00     144.61      0.00      271     0.00     0.00  __gnu_cxx::new_.....
 0.00     144.61      0.00      271     0.00     0.00  std::vector<Full,.....
Sequential Implementation Using C++

(Chart: Run-times of C++ and FORTRAN. Time in seconds vs. number of vertices, n = 32 to 82; series: C++ time, FORTRAN time.)
Strategy for Parallelization
Strategy for Parallelization
      do 1 j1 = 1, m-11*jpr              <-- MASTER NODE (outer loops j1-j4)
      do 2 j2 = j1 +jpr, m-10*jpr
      do 3 j3 = j2 +jpr, m- 9*jpr
      do 4 j4 = j3 +jpr, m- 8*jpr
      do 5 j5 = j4 +jpr, m- 7*jpr        <-- WORKER NODES (inner loops j5-j12 and loop body)
      do 6 j6 = j5 +jpr, m- 6*jpr
      do 7 j7 = j6 +jpr, m- 5*jpr
      do 8 j8 = j7 +jpr, m- 4*jpr
      do 9 j9 = j8 +jpr, m- 3*jpr
      do 10 j10 = j9 +jpr, m- 2*jpr
      do 11 j11 = j10+jpr, m- 1*jpr
      do 12 j12 = j11+jpr, m
      do 14 j = 1, m
         .
         .
         .
   14 continue
   13 continue
   12 continue
   11 continue
   10 continue
    9 continue
    8 continue
    7 continue
    6 continue
    5 continue
    4 continue
    3 continue
    2 continue
    1 continue                           <-- MASTER NODE (collects results)
Strategy for Parallelization

MASTER

Phase 1: distribute work
   Set j1 to 1
   Set j2 to j1+jpr
   Set j3 to j2+jpr
   Set j4 to j3+jpr
   wait for a message
   switch on message type
      case Get Data:    Send (j1, j2, j3, j4); Increment (j1, j2, j3, j4)
      case Point Found: Receive (j1-j12, nmr[], group[])
   until all j1-j4 are sent

Phase 2: shut down workers
   Set count to 0
   wait for a message
   switch on message type
      case Get Data:    send end; increment count
      case Point Found: Receive (j1-j12, nmr[], group[])
   until count = processes
   organize and output data
Strategy for Parallelization

WORKER

   do
      Request Data
      for j5 = j4 + jpr
         for j6 = j5 + jpr
            .
            .
            for j12 = j11 + jpr
               build sequence
               pass to math functions
               if point found:
                  send point
   until request returns end
Strategy for Parallelization

(Diagram: node 0 runs the MASTER; nodes 1 through p each run a WORKER, and every worker calls windup and unwind.)
Strategy for Parallelization

• Embarrassingly parallel! (13 nested loops)

• Scalability: does our performance hold as we add more processes?
   • Scalable up to [# of combinations being checked] processes.
   • This upper bound increases greatly as n increases.
Parallel Implementation
The Parallel Implementation

We chose the MPI library for C++:

MPI: the standard
• Defines the behavior of the implementations

MPI: the implementation
• Consists of a library of function calls
• Allows reuse of the sequential C++ implementation
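
A minimal sketch (not the project's actual source) of how the roles are typically split with MPI: rank 0 becomes the master and every other rank becomes a worker. The master() and worker() calls are placeholders for the loops shown on the next slides.

#include <mpi.h>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);                  // start the MPI runtime

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // this process's id (0 .. size-1)
    MPI_Comm_size(MPI_COMM_WORLD, &size);    // total number of processes

    if (rank == 0) {
        // master();   // hand out (j1..j4) prefixes, collect points found
    } else {
        // worker();   // request work, run the j5..j12 loops, report points
    }

    MPI_Finalize();                          // shut down the MPI runtime
    return 0;
}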
The Parallel Implementation

Sample code: node 0’s main loop

for(j1=1; j1 <= (m-(11*jpr)); j1++) {
 for(j2=(j1+jpr); j2 <= (m-(10*jpr)); j2++) {
  for(j3=(j2+jpr); j3 <= (m-(9*jpr)); j3++) {
   for(j4=(j3+jpr); j4 <= (m-(8*jpr)); j4++) {

    // WAIT FOR A MESSAGE.
    MPI_Recv( buf, 18, MPI_INT, MPI_ANY_SOURCE,
              MPI_ANY_TAG, MPI_COMM_WORLD,
              &status);

    if(status.MPI_TAG == GET_DATA)
    { // LOAD J-VALUES INTO BUFFER AND
      // SEND TO WORKER.
      buf[0]=j1;
      buf[1]=j2;
      buf[2]=j3;
      buf[3]=j4;
      dest = status.MPI_SOURCE;
      MPI_Send( buf, 4, MPI_INT, dest, GET_DATA,
                MPI_COMM_WORLD);
      iterate = true;
      iters++;
    }
    else if(status.MPI_TAG == PT_FOUND)
    { // RECEIVE J-VALUES AND RECORD.
      int arr[6];
      for(int i=0; i < 6; i++)
        arr[i] = buf[i+12];

      Full temp(count, buf[0], buf[1], buf[2],
                ……, buf[10], buf[11], arr);
      matches.push_back(temp);
      count++;
    }
   }
  }
 }
}
The Parallel Implementation

Sample code: a worker’s main loop

MPI_Send( buf, 18, MPI_INT, 0, GET_DATA,
          MPI_COMM_WORLD);
MPI_Recv( buf, 4, MPI_INT, 0, MPI_ANY_TAG,
          MPI_COMM_WORLD, &status);

while(status.MPI_TAG != END)
{
  j1=buf[0]; j2=buf[1]; j3=buf[2]; j4=buf[3];
  for(j5=(j4+jpr); j5 <= (m-(7*jpr)); j5++)
  {
    .
    .
    for(j12=(j11+jpr); j12 <= m; j12++)
    { // BUILD SEQUENCE.
      for(j=1; j <= m; j++) {
        s[j-1] = 6;
      }
      s[j1-1]=5; s[j2-1]=5;
      .........; s[j12-1]=5;

      // CHECK IF SPIRAL CREATES A FULLERENE
      int D1[MMAX*MMAX];
      windup_(s, &m, D1, &ipr, &ier);
      .
      .
      // CHECK SUCCESSFUL FULLERENE FOR
      // UNIQUENESS
      unwind_(D1, &m, s, group, nmr, &ier);
      .
      .
Record:
      for(j=0; j<5; j++)
      {
        if(nmr[j] == 0)
        { // LOAD J-VALUES AND NMR MATRIX
          // INTO BUFFER AND SEND TO MASTER
          buf[0] = j1; buf[1] = j2; buf[2] = j3;
          ………; buf[17] = nmr[5];

          MPI_Send( buf, 18, MPI_INT, 0, PT_FOUND,
                    MPI_COMM_WORLD);
          goto thirteen;
        }
      }
      .
      .
      .
  MPI_Send( buf, 18, MPI_INT, 0, GET_DATA,
            MPI_COMM_WORLD);
  MPI_Recv( buf, 4, MPI_INT, 0, MPI_ANY_TAG,
            MPI_COMM_WORLD, &status);
}
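
These loops are what each rank executes once the program is launched under MPI, for example with mpirun -np 4, which matches the np = 4 configuration used in the benchmarks that follow (one master plus three workers).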
Results: Benchmark Tests

(Chart: Benchmark Results for Parallel Implementation. Time in seconds vs. number of vertices, n = 32 to 82; series: Sequential time, Parallel time.)
Results: Benchmark Tests

Partial table of numeric results with np = 4 (times in seconds):

  n    Sequential time     Parallel time
 50             10.906             4.294
 60            131.711            44.779
 70          1,070.531           354.809
 80          6,423.951         2,132.237
 90         32,241.250        10,423.418
100        194,700.216        82,307.413
110     Still running…       155,916.244
Results: Benchmark Tests

Partial table of numeric results with np = 4 (rounded):

  n    Sequential time     Parallel time
 50             11 sec             4 sec
 60            132 sec            45 sec
 70             18 min             6 min
 80            107 min            36 min
 90            537 min           207 min
100             54 hrs          22.9 hrs
110     Still running…          43.3 hrs
Results: Speedup

Speedup refers to how much faster a parallel algorithm is compared to a corresponding sequential algorithm. It is defined as

   Sp = T1 / Tp        T1 = fastest sequential time
                       Tp = fastest parallel time with p processes

• Ideally, we want this ratio to be equal to p. This is referred to as linear speedup.
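
For example, with the np = 4 benchmark numbers above, n = 80 gives Sp = 6,423.951 s / 2,132.237 s ≈ 3.01, roughly 75% of the ideal linear speedup of 4.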
Results: Speedup

The resulting speedup for our parallel implementation is as follows:

S2 ≈ 1.01119   (50.6% of linear speedup with np = 2 processes)
S3 ≈ 1.99867   (66.6% of linear speedup with np = 3 processes)
S4 ≈ 2.99029   (74.8% of linear speedup with np = 4 processes)
S5 ≈ 3.94575   (78.9% of linear speedup with np = 5 processes)
S6 ≈ 4.93136   (82.2% of linear speedup with np = 6 processes)

Note that with one rank dedicated to the master, np = 2 leaves only a single worker, so S2 is essentially 1.
Results: Speedup

(Chart: Speedup for C60 vs. number of processes, Sp for np = 1 to 7.)
Results

What does it all mean?

• The parallelization allows the cataloguing of fullerenes with larger vertex counts.

• It also opens up discussion about raising the upper bound of the implementation’s validity to n = 1000 and beyond.
What’s Next?
The Fullerene Isomer Database Project

(Diagram: users, a web server backed by a DBMS, a run scheduler, and the Skynet cluster.)
References

1. P.W. Fowler and D.E. Manolopoulos, An Atlas of Fullerenes, 1995.

2. Douglas Puharic, The Face Consistency and Embeddability of Fullerenes, 2006.

3. Peter S. Pacheco, Parallel Programming with MPI, 1997.
Thank You for Your Time.

• Questions?
