
A Parallel Implementation of the Spiral

Algorithm for Cataloguing Fullerene Isomers


Jeffery L. Thomas Faculty Advisor: Prof. Daniel Bennett
Edinboro University of Pennsylvania
Motivation

Why this project?

• Lying at the intersection of mathematics and computer science, this project provides the opportunity to conduct independent research in both fields.
What is a Fullerene?
Fullerenes and Buckyballs
(Image: C60, the buckyball.)

Fullerenes exist for n = 20 and for every even atom count n ≥ 24.


A Fullerene Graph

Taken from “An Atlas of Fullerenes” by P.W. Fowler and D.E. Manolopoulos
The Spiral Algorithm
The Fullerene Isomer Problem

• To find and catalogue all fullerene isomers with a given vertex count, n

(Images: fullerene isomers of C32, ending with C32:6.)
The Spiral Algorithm

The Spiral Algorithm relies on two important ideas:

1. A 3-dimensional fullerene can be “unwound” into a continuous string of 5’s and 6’s.
2. There are exactly 12 pentagons in any fullerene isomer, regardless of the number of vertices.
Representing Spirals

65656566656656656566566565656566
Taken from “An Atlas of Fullerenes” by P.W. Fowler and D.E. Manolopoulos
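
As a quick sanity check on this representation (a sketch of mine, not part of the original implementation), the spiral above should have n/2 + 2 = 32 faces for C60, exactly 12 of which are pentagons:

#include <algorithm>
#include <iostream>
#include <string>

int main() {
    const int n = 60;                                   // vertex count of C60
    const std::string spiral =
        "65656566656656656566566565656566";             // spiral string from the slide
    const int faces = n / 2 + 2;                        // a fullerene on n vertices has n/2 + 2 faces
    const auto pentagons =
        std::count(spiral.begin(), spiral.end(), '5');  // every fullerene has exactly 12 pentagons

    std::cout << "faces expected: " << faces
              << ", faces in spiral: " << spiral.size()
              << ", pentagons: " << pentagons << "\n";
    return 0;
}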
The Spiral Algorithm

All fullerene isomers with a given vertex count, n, can be generated by checking at most

   C(n/2 + 2, 12)

spiral combinations, i.e. the number of ways to choose positions for the 12 pentagons among the n/2 + 2 faces of the spiral.
The Spiral Algorithm

 n   Spiral combinations        n   Spiral combinations
20                     1       36               125,970
22                    13       38               293,930
24                    91       40               646,646
26                   455       42             1,352,078
28                 1,820       44             2,704,156
30                 6,188       46             5,200,300
32                18,564       48             9,657,700
34                50,388       50            17,383,860
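
A short sketch (mine, not the authors' code) that reproduces this table directly from the bound C(n/2 + 2, 12):

#include <cstdint>
#include <iostream>

// C(n, k) via the multiplicative formula; exact for the small values used here.
std::uint64_t choose(std::uint64_t n, std::uint64_t k) {
    if (k > n) return 0;
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;   // stays an integer at every step
    return result;
}

int main() {
    for (int n = 20; n <= 50; n += 2)        // even vertex counts from the table
        std::cout << "n = " << n << ": "
                  << choose(n / 2 + 2, 12) << " spiral combinations\n";
    return 0;
}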
The Spiral Algorithm

 n   Fullerene isomers        n   Fullerene isomers
20                   1       36                  15
22                   0       38                  17
24                   1       40                  40
26                   1       42                  45
28                   2       44                  89
30                   3       46                 116
32                   6       48                 199
34                   6       50                 271
Justification for Parallelization
FORTRAN Implementation
• First published in 1991

• BRUTE FORCE

• Becomes rapidly complex as n increases beyond 100

      do 1 j1 = 1, m-11*jpr
      do 2 j2 = j1 +jpr, m-10*jpr
      do 3 j3 = j2 +jpr, m- 9*jpr
      do 4 j4 = j3 +jpr, m- 8*jpr
      do 5 j5 = j4 +jpr, m- 7*jpr
      do 6 j6 = j5 +jpr, m- 6*jpr
      do 7 j7 = j6 +jpr, m- 5*jpr
      do 8 j8 = j7 +jpr, m- 4*jpr
      do 9 j9 = j8 +jpr, m- 3*jpr
      do 10 j10 = j9 +jpr, m- 2*jpr
      do 11 j11 = j10+jpr, m- 1*jpr
      do 12 j12 = j11+jpr, m
      do 14 j = 1, m
         .
         .
         .
   14 continue
   13 continue
   12 continue
   11 continue
   10 continue
    9 continue
    8 continue
    7 continue
    6 continue
    5 continue
    4 continue
    3 continue
    2 continue
    1 continue
FORTRAN Implementation

Time complexity: O(n^16)

NOTE: THIS IS BAD!!!
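
For scale: using the earlier bound, n = 100 already requires checking C(52, 12) ≈ 2.06 × 10^11 candidate spirals, each of which must be wound up and tested.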
Sequential Implementation
Sequential Implementation Using C++

OBJECTIVES
• Translate main FORTRAN code to C++

• Ensure proper output

• Minimize / eliminate added complexity


Sequential Implementation Using C++

(Chart: Run-time Comparisons between C++ and FORTRAN. Time in seconds vs. number of vertices, n = 30 to 60; series: C++ time, FORTRAN time.)
Sequential Implementation Using C++
gprof profile for n = 50
Flat profile: (Each sample counts as 0.01 seconds.)

  %   cumulative    self               self     total
 time    seconds   seconds    calls  us/call  us/call  name
74.57     107.84    107.84  5096665    21.16    21.16  Matrix::FconvertC(int*)   <-- OUR CULPRIT
21.98     139.62     31.78  5096665     6.24     6.24  windup_
 2.24     142.86      3.24                             main
 1.21     144.61      1.75    59046    29.64    29.64  unwind_
 0.00     144.61      0.00     4101     0.00     0.00  std::setw(int)
 0.00     144.61      0.00     2144     0.00     0.00  __gnu_cxx::.....
 0.00     144.61      0.00     1543     0.00     0.00  __gnu_cxx::.....
 0.00     144.61      0.00     1533     0.00     0.00  __gnu_cxx::.....
 0.00     144.61      0.00     1052     0.00     0.00  bool __gnu_cxx::operator!=
 0.00     144.61      0.00      782     0.00     0.00  Full::~Full()
 0.00     144.61      0.00      782     0.00     0.00  void std::_Destroy<Full....
 0.00     144.61      0.00      782     0.00     0.00  operator new(.....
 0.00     144.61      0.00      511     0.00     0.00  void std::_Construct<.....
 0.00     144.61      0.00      271     0.00     0.00  Full::Full(int, .....
 0.00     144.61      0.00      271     0.00     0.00  __gnu_cxx::new_.....
 0.00     144.61      0.00      271     0.00     0.00  std::vector<Full,.....
Sequential Implementation Using C++

(Chart: Run-times of C++ and FORTRAN. Time in seconds vs. number of vertices, n = 32 to 82; series: C++ time, FORTRAN time.)
Strategy for Parallelization
Strategy for Parallelization
      do 1 j1 = 1, m-11*jpr              <-- MASTER NODE (outer loops j1-j4)
      do 2 j2 = j1 +jpr, m-10*jpr
      do 3 j3 = j2 +jpr, m- 9*jpr
      do 4 j4 = j3 +jpr, m- 8*jpr
      do 5 j5 = j4 +jpr, m- 7*jpr        <-- WORKER NODES (inner loops j5-j12 and loop body)
      do 6 j6 = j5 +jpr, m- 6*jpr
      do 7 j7 = j6 +jpr, m- 5*jpr
      do 8 j8 = j7 +jpr, m- 4*jpr
      do 9 j9 = j8 +jpr, m- 3*jpr
      do 10 j10 = j9 +jpr, m- 2*jpr
      do 11 j11 = j10+jpr, m- 1*jpr
      do 12 j12 = j11+jpr, m
      do 14 j = 1, m
         .
         .
         .
   14 continue
   13 continue
   12 continue
   11 continue
   10 continue
    9 continue
    8 continue
    7 continue
    6 continue
    5 continue
    4 continue
    3 continue
    2 continue
    1 continue                           <-- MASTER NODE (collects results)
Strategy for Parallelization

MASTER

Phase 1: distribute work
   Set j1 to 1
   Set j2 to j1+jpr
   Set j3 to j2+jpr
   Set j4 to j3+jpr
   wait for a message
   switch on message type
      case Get Data:    Send (j1, j2, j3, j4); Increment (j1, j2, j3, j4)
      case Point Found: Receive (j1-j12, nmr[], group[])
   until all j1-j4 are sent

Phase 2: shut down workers
   Set count to 0
   wait for a message
   switch on message type
      case Get Data:    send end; increment count
      case Point Found: Receive (j1-j12, nmr[], group[])
   until count = processes
   organize and output data
Strategy for Parallelization

WORKER

   do
      Request Data
      for j5 = j4 + jpr
         for j6 = j5 + jpr
            .
            .
            for j12 = j11 + jpr
               build sequence
               pass to math functions
               if point found:
                  send point
   until request returns end
Strategy for Parallelization

(Diagram: node 0 runs the MASTER; nodes 1 through p each run a WORKER, and every worker calls windup and unwind.)
Strategy for Parallelization

• Embarrassingly parallel! (13 nested loops)

• Scalability: does our performance hold as we add more processes?
   • Scalable up to [# of combinations being checked] processes.
   • This upper bound increases greatly as n increases.
Parallel Implementation
The Parallel Implementation

We chose the MPI library for C++:

MPI: the standard
• Defines the behavior of the implementations

MPI: the implementation
• Consists of a library of function calls
• Allows reuse of the sequential C++ implementation
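
A minimal sketch (not the project's actual source) of how the roles are typically split with MPI: rank 0 becomes the master and every other rank becomes a worker. The master() and worker() calls are placeholders for the loops shown on the next slides.

#include <mpi.h>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);                  // start the MPI runtime

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // this process's id (0 .. size-1)
    MPI_Comm_size(MPI_COMM_WORLD, &size);    // total number of processes

    if (rank == 0) {
        // master();   // hand out (j1..j4) prefixes, collect points found
    } else {
        // worker();   // request work, run the j5..j12 loops, report points
    }

    MPI_Finalize();                          // shut down the MPI runtime
    return 0;
}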
The Parallel Implementation

Sample code: node 0’s main loop

for(j1=1; j1 <= (m-(11*jpr)); j1++) {
 for(j2=(j1+jpr); j2 <= (m-(10*jpr)); j2++) {
  for(j3=(j2+jpr); j3 <= (m-(9*jpr)); j3++) {
   for(j4=(j3+jpr); j4 <= (m-(8*jpr)); j4++) {

    // WAIT FOR A MESSAGE.
    MPI_Recv( buf, 18, MPI_INT, MPI_ANY_SOURCE,
              MPI_ANY_TAG, MPI_COMM_WORLD,
              &status);

    if(status.MPI_TAG == GET_DATA)
    { // LOAD J-VALUES INTO BUFFER AND
      // SEND TO WORKER.
      buf[0]=j1;
      buf[1]=j2;
      buf[2]=j3;
      buf[3]=j4;
      dest = status.MPI_SOURCE;
      MPI_Send( buf, 4, MPI_INT, dest, GET_DATA,
                MPI_COMM_WORLD);
      iterate = true;
      iters++;
    }
    else if(status.MPI_TAG == PT_FOUND)
    { // RECEIVE J-VALUES AND RECORD.
      int arr[6];
      for(int i=0; i < 6; i++)
        arr[i] = buf[i+12];

      Full temp(count, buf[0], buf[1], buf[2],
                ……, buf[10], buf[11], arr);
      matches.push_back(temp);
      count++;
    }
   }
  }
 }
}
The Parallel Implementation

Sample code: a worker’s main loop

MPI_Send( buf, 18, MPI_INT, 0, GET_DATA,
          MPI_COMM_WORLD);
MPI_Recv( buf, 4, MPI_INT, 0, MPI_ANY_TAG,
          MPI_COMM_WORLD, &status);

while(status.MPI_TAG != END)
{
  j1=buf[0]; j2=buf[1]; j3=buf[2]; j4=buf[3];
  for(j5=(j4+jpr); j5 <= (m-(7*jpr)); j5++)
  {
    .
    .
    for(j12=(j11+jpr); j12 <= m; j12++)
    { // BUILD SEQUENCE.
      for(j=1; j <= m; j++) {
        s[j-1] = 6;
      }
      s[j1-1]=5; s[j2-1]=5;
      .........; s[j12-1]=5;

      // CHECK IF SPIRAL CREATES A FULLERENE
      int D1[MMAX*MMAX];
      windup_(s, &m, D1, &ipr, &ier);
      .
      .
      // CHECK SUCCESSFUL FULLERENE FOR
      // UNIQUENESS
      unwind_(D1, &m, s, group, nmr, &ier);
      .
      .
Record:
      for(j=0; j<5; j++)
      {
        if(nmr[j] == 0)
        { // LOAD J-VALUES AND NMR MATRIX
          // INTO BUFFER AND SEND TO MASTER
          buf[0] = j1; buf[1] = j2; buf[2] = j3;
          ………; buf[17] = nmr[5];

          MPI_Send( buf, 18, MPI_INT, 0, PT_FOUND,
                    MPI_COMM_WORLD);
          goto thirteen;
        }
      }
      .
      .
      .
  MPI_Send( buf, 18, MPI_INT, 0, GET_DATA,
            MPI_COMM_WORLD);
  MPI_Recv( buf, 4, MPI_INT, 0, MPI_ANY_TAG,
            MPI_COMM_WORLD, &status);
}
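
These loops are what each rank executes once the program is launched under MPI, for example with mpirun -np 4, which matches the np = 4 configuration used in the benchmarks that follow (one master plus three workers).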
Results: Benchmark Tests

(Chart: Benchmark Results for Parallel Implementation. Time in seconds vs. number of vertices, n = 32 to 82; series: Sequential time, Parallel time.)
Results: Benchmark Tests

Partial table of numeric results with np = 4 (times in seconds):

  n    Sequential time     Parallel time
 50             10.906             4.294
 60            131.711            44.779
 70          1,070.531           354.809
 80          6,423.951         2,132.237
 90         32,241.250        10,423.418
100        194,700.216        82,307.413
110     Still running…       155,916.244
Results: Benchmark Tests

Partial table of numeric results with np = 4 (rounded):

  n    Sequential time     Parallel time
 50             11 sec             4 sec
 60            132 sec            45 sec
 70             18 min             6 min
 80            107 min            36 min
 90            537 min           207 min
100             54 hrs          22.9 hrs
110     Still running…          43.3 hrs
Results: Speedup

Speedup refers to how much faster a parallel algorithm is compared to a corresponding sequential algorithm. It is defined as

   Sp = T1 / Tp        T1 = fastest sequential time
                       Tp = fastest parallel time with p processes

• Ideally, we want this ratio to be equal to p. This is referred to as linear speedup.
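
For example, with the np = 4 benchmark numbers above, n = 80 gives Sp = 6,423.951 s / 2,132.237 s ≈ 3.01, roughly 75% of the ideal linear speedup of 4.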
Results: Speedup

The resulting speedup for our parallel implementation is as follows:

S2 ≈ 1.01119   (50.6% of linear speedup with np = 2 processes)
S3 ≈ 1.99867   (66.6% of linear speedup with np = 3 processes)
S4 ≈ 2.99029   (74.8% of linear speedup with np = 4 processes)
S5 ≈ 3.94575   (78.9% of linear speedup with np = 5 processes)
S6 ≈ 4.93136   (82.2% of linear speedup with np = 6 processes)

Note that with one rank dedicated to the master, np = 2 leaves only a single worker, so S2 is essentially 1.
Results: Speedup

(Chart: Speedup for C60 vs. number of processes, Sp for np = 1 to 7.)
Results

What does it all mean?

• The parallelization allows the cataloguing of fullerenes with larger vertex counts.

• It also opens up discussion about raising the upper bound of the implementation’s validity to n = 1000 and beyond.
What’s Next?
The Fullerene Isomer Database Project

(Diagram: users, a web server backed by a DBMS, a run scheduler, and the Skynet cluster.)
References

1. P.W. Fowler and D.E. Manolopoulos, An Atlas of Fullerenes, 1995.

2. Douglas Puharic, The Face Consistency and Embeddability of Fullerenes, 2006.

3. Peter S. Pacheco, Parallel Programming with MPI, 1997.
Thank You for Your Time.

• Questions?
