
Building Scientific Workflows on the Grid: A Comparison between OpenMole and Taverna

Bojana Koteska, Boro Jakimovski, Anastas Mishev


November 2014, Bucharest, Romania


Overview
Scientific workflows
Characteristics of Scientific Workflow Systems: OpenMole and Taverna
Workflow Implementation in OpenMole and Taverna
Conclusion and future work


Scientific workflows
Workflows are needed for different scientific applications and scientific experiments
They are used for designing, automating, controlling and managing complex flows of data and processes
Scientific communities use workflow systems to perform experiments with large sets of data and computations
Business companies use workflow systems to automate the manufacturing process and to avoid redundant tasks


Scientific workflows
Workflow - the automation of a process during which documents, information, or tasks are passed from one participant to another for action, according to a set of procedural rules.
Workflow engine - a software service or "engine" that provides the run-time execution environment for a workflow instance.
Workflow management system - a system that defines, creates, and manages the execution of workflows through the use of software running on one or more workflow engines.


Grid
Coordinated use of many heterogeneous and distributed resources
Solves problems in many compute-intensive and data-intensive scientific applications (physics, biology and chemistry)
Offers large scalability, transparent and consistent access to resources, and support for distributed supercomputing, high-throughput computing and data-intensive computing


Difference between scientific and business workflows
Business workflows - model the control flow of the process
Scientific workflows - model large-scale data-intensive and compute-intensive scientific processes


OpenMole
Provides parallel execution environments for naturally parallel processes
Supports advanced numerical experiments on simulation models
Distributes workflows on multi-core machines, desktop grids, clusters and grids
Scalable: it supports terabytes of data and millions of tasks


Mole components
Tasks
Transitions
Prototypes
Samplings
Environments
Hooks
Sources
In order to run, a mole must contain at least one task and a starting task (capsule)


Mole components
Prototype - a variable that operates in the workflow (it must be part of a task input or output)
  Defined by a name, a type (Integer, String, etc.) and a dimension (0 - scalar, 1 - vector, 2 - matrix, etc.)
Task - there are several types of tasks: Groovy, Exploration, SystemExec, Mole Task, Agent-based model, Sensitivity, and Stat
  Defined by a name, input and output slots (for receiving a prototype from, or sending it to, another task) and an environment (which assigns computing resources to a given task: grid, cluster, ...)

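The description of prototypes and tasks above can be pictured as a small data model. The following is only an illustrative Python sketch, not OpenMole's actual API: the class names, the `label` and `can_feed` helpers, and the string used for the environment are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Prototype:
    """A typed workflow variable described by name, type and dimension."""
    name: str
    type_name: str      # e.g. "Integer", "String", "File"
    dimension: int = 0  # 0 = scalar, 1 = vector, 2 = matrix, ...

    def label(self) -> str:
        # Render one "[]" per dimension, e.g. "file[]: File" for a vector.
        return f"{self.name}{'[]' * self.dimension}: {self.type_name}"


@dataclass
class Task:
    """A task with input/output slots and an optional environment."""
    name: str
    inputs: List[Prototype] = field(default_factory=list)
    outputs: List[Prototype] = field(default_factory=list)
    environment: Optional[str] = None  # hypothetical label: "grid", "cluster", ...

    def can_feed(self, other: "Task") -> bool:
        # Simplified check used only in this sketch: every input slot of
        # `other` must be among the output slots of this task.
        return all(p in self.outputs for p in other.inputs)


i = Prototype("i", "Integer")
files = Prototype("file", "File", dimension=1)
defining_i = Task("defining i", outputs=[i])
combinations = Task("combinations", inputs=[i], outputs=[i, files])
```

Here `defining_i.can_feed(combinations)` is true because the only input of "combinations" is the prototype `i`, which "defining i" outputs.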

Mole components - Tasks


Groovy task - runs small pieces of script code or calls code running on the JVM (Java, Scala, ...)
Exploration task - embeds a sampling that provides an array of combinations of prototypes at runtime
SystemExec task - executes C, C++ and Python code
Mole task - runs another Mole, which makes it possible to embed a Mole within a task
Stat task - calculates the average, the sum and the median of a prototype with dimension one
Sensitivity task - enables variance-based sensitivity analysis

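As a rough illustration of what the Stat task computes, the sketch below takes a one-dimensional list of numbers and returns its average, sum and median. It uses Python's standard `statistics` module; the function name `stat_task` and the dictionary keys are assumptions for this example and do not come from OpenMole.

```python
import statistics


def stat_task(values):
    """Return average, sum and median of a dimension-one prototype (a list)."""
    if not values:
        raise ValueError("the prototype must contain at least one value")
    return {
        "average": statistics.mean(values),
        "sum": sum(values),
        "median": statistics.median(values),
    }


result = stat_task([4, 1, 7, 2])
```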

Mole components
Sampling - can be composed graphically
  Many kinds of samplings: complete, shuffle, zip, combine
  Many kinds of domains: range, multiple file, single file, uniform distribution, logarithmic range, etc.
Hook - a Mole listener that performs a particular action on the output
Source - reads data from a CSV file and maps it to prototypes included in the workflow

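To make the sampling kinds more concrete, here is a minimal Python sketch of shuffle, take, zip and combine operating on plain lists. It is an interpretation, not OpenMole code: the function names are invented for this example, and "combine" is read here as the Cartesian product of the domains, which matches the 4 x 4 x 4 count used in the workflow described later in the slides.

```python
import itertools
import random


def shuffle_sampling(values, seed=None):
    """Return a shuffled copy of the domain; the input is left untouched."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def take_sampling(values, n):
    """Keep only the first n elements of a sampling."""
    return list(values)[:n]


def zip_sampling(*domains):
    """Pair the domains element by element."""
    return list(zip(*domains))


def combine_sampling(*domains):
    """Build every combination of one element per domain (assumed meaning)."""
    return list(itertools.product(*domains))


numbers = [1, 2, 3]
letters = ["a", "b", "c"]
shuffled = shuffle_sampling(numbers, seed=42)
first_two = take_sampling(shuffled, 2)
zipped = zip_sampling(numbers, letters)        # 3 pairs
combined = combine_sampling(numbers, letters)  # 3 x 3 = 9 pairs
```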

Taverna
Available as a desktop client application (Taverna Workbench), as Taverna Server and as a command-line tool
Integrated with two other myGrid tools: myExperiment (a workflow sharing environment) and BioCatalogue (a catalogue of Web services for the life sciences)
A scalable scientific workflow system that has access to local and remote resources and grid services (more than 3500)


Taverna
Supports calling client libraries (in Ruby and Java) as well as tools and scripts on local or remote machines
Users define how data flows between services, but they do not need to care how the services are invoked
Workflows can be shared on myExperiment, where workflows from other users can be searched for and downloaded


Taverna components
Services
WSDL-style Web services (only the URL address needs to be provided)
BioMoby Web services
BioMart Web services
SoapLab Web services
Local Java services (Beanshell scripts)
Local Java API (API consumer)
R scripts on an R server (RShell scripts - analyses using the R statistical package)
XPath
OAuth services
Component services
String constant (for setting a fixed-value input for a service)


Taverna components
Lists and iterations - services in Taverna can return single values, lists of values or lists of lists
Loop - usually used for invoking asynchronous services (Web services or similar) where an action should be repeated
Control link - used to set dependencies between services that do not directly share data
Merge - combines the outputs of several services into one single input
Parallel service invocation - used when the same service should be invoked several times in parallel
Component - a reusable unit of functionality that is defined by a workflow

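The handling of single values, lists and lists of lists can be sketched as follows. This is a simplified Python model of list-depth matching, not Taverna code: when a value is nested more deeply than the service expects, the service is applied to each element, and when it is less deeply nested, the value is wrapped in a list. The names `depth`, `invoke` and `expected_depth` are assumptions for this example.

```python
def depth(value):
    """Nesting depth: 0 for a single value, 1 for a list, 2 for a list of lists."""
    if not isinstance(value, list):
        return 0
    return 1 + (depth(value[0]) if value else 0)


def invoke(service, value, expected_depth=0):
    """Call `service`, iterating or wrapping until the depths match."""
    d = depth(value)
    if d > expected_depth:
        # Input is too deep: invoke the service once per element.
        return [invoke(service, item, expected_depth) for item in value]
    while d < expected_depth:
        # Input is too shallow: wrap it in a single-element list.
        value = [value]
        d += 1
    return service(value)


single = invoke(str.upper, "a")
flat = invoke(str.upper, ["a", "b"])
nested = invoke(str.upper, [["a"], ["b", "c"]])
wrapped = invoke(len, "x", expected_depth=1)
```

The shape of the output follows the shape of the input, so a list of lists passed to a single-value service yields a list of lists of results.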

Comparison between OpenMole and Taverna


Similar in terms of functionality
Differ slightly in the way they are implemented
Taverna - addition of existing Web services, automatic conversion of list depths in order to connect two services
OpenMole - simpler definition of composite inputs (e.g. making combinations of the elements in an array)



The workflow
A circle represents a file containing some data
The squares a1, a2, a3, ..., a64 are file actions: merging of files' contents
64 combinations of 3 files are always chosen
Two copies of the list of files in the directory are made, giving three file lists
Each file list is shuffled and the first 4 files of each of the three file lists are chosen
"folderA" represents the directory from which the combinations are chosen
The result of each action a1, a2, a3, ..., a64 is a file saved in a new directory, "folderB"

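The selection of the 64 combinations can be sketched in a few lines of Python: three independently shuffled copies of the file list, the first four names of each, and every triple formed from them (4 x 4 x 4 = 64). The function name, its parameters and the example file names are hypothetical. Because the three lists are shuffled independently, the same file may appear more than once in a triple in this sketch; the slides do not say whether that is excluded.

```python
import itertools
import random


def choose_combinations(file_names, per_list=4, lists=3, seed=None):
    """Shuffle `lists` copies of the file list, keep the first `per_list`
    names of each copy, and return every combination across the copies."""
    if len(file_names) < per_list:
        raise ValueError("not enough files in the directory")
    rng = random.Random(seed)
    picks = []
    for _ in range(lists):
        copy = list(file_names)
        rng.shuffle(copy)
        picks.append(copy[:per_list])
    return list(itertools.product(*picks))


# Hypothetical contents of "folderA".
folder_a = [f"data{k}.txt" for k in range(10)]
combos = choose_combinations(folder_a, seed=1)
print(len(combos))  # 4 * 4 * 4 = 64
```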

The workflow
The actions a1, a2, a3, ..., a64 are performed in parallel
When three files have been created in "folderB", the action b is performed: it merges the contents of the 3 files into a new file
The merged files in "folderB" are deleted
b is repeated 21 times in each iteration (64 / 3, rounded down); it runs in parallel with the actions a1, a2, a3, ..., a64, immediately after 3 new files have been created in "folderB"
The new file is moved to "folderA"
The workflow can be executed repeatedly

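The count of 21 repetitions of b follows from integer division: 64 result files grouped three at a time give 21 full groups and one file left over. A small Python sketch of that grouping, with an invented helper name and invented file names:

```python
def batch_in_threes(files):
    """Split a list into full groups of three plus any leftover files."""
    full = len(files) - len(files) % 3
    batches = [files[k:k + 3] for k in range(0, full, 3)]
    leftover = files[full:]
    return batches, leftover


# Hypothetical names for the 64 files produced by a1 ... a64.
results = [f"a{k}.out" for k in range(1, 65)]
batches, leftover = batch_in_threes(results)
print(len(batches), len(leftover))  # 21 full groups, 1 file left over
```

How the remaining file is treated at the end of an iteration is not stated in the slides, so the sketch simply reports it.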


The Workflow Implementation in OpenMole


"defining i" capsule - Groovy task that sets the integer prototype i, which counts the workflow iterations, to 1; output: i
"combinations" capsule - Exploration task that defines the sampling
  Three multiple file domains "in folderA" and three file prototypes: "file", "file1" and "file2"
  All three multiple file domains have the same directory path
  The samplings "Zip with string name", "Zip with string1 name" and "Zip with string2 name" are used for making arrays of the file names in "folderA"
  Each array is shuffled (Shuffle sampling) and its first 4 elements are taken (Take (4) sampling)
  The four file names of each array are combined (Combine sampling)
  The 64 combinations (4 x 4 x 4) are made
  7 outputs from this task: the prototype i and 6 arrays (file[1], file1[1], file2[1], string[1], string1[1] and string2[1])


The Workflow Implementation in OpenMole


Capsule "comb. 3 files" - Groovy task that includes a jar library
  Calls the method from the library that takes three strings as input parameters (the file names from each combination)
  The three files are merged into a new file
  Output: the integer prototype i
Capsule "comb. 3 first" - Groovy task that includes a jar file
  Calls the method from the library that checks whether three files exist in the directory "folderB" and merges them
  Executed in parallel with the previous task "comb. 3 files"
  Output: the integer prototype i

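The jar library called by the two capsules is not shown in the slides. As a stand-in, the Python sketch below illustrates the two operations they describe: concatenating three files into a new one, and merging the first three files found in a folder only once three files exist. All function names, file names and the choice of sorted order are assumptions made for this example.

```python
import tempfile
from pathlib import Path


def merge_files(sources, target):
    """Concatenate the contents of the source files into the target file."""
    with open(target, "w", encoding="utf-8") as out:
        for src in sources:
            out.write(Path(src).read_text(encoding="utf-8"))
    return Path(target)


def merge_first_three(folder_b, target):
    """Merge three files from folder_b if they exist, then delete them.
    Returns None when fewer than three files are present."""
    candidates = sorted(p for p in Path(folder_b).iterdir() if p.is_file())[:3]
    if len(candidates) < 3:
        return None
    merged = merge_files(candidates, target)
    for p in candidates:
        p.unlink()  # the merged inputs are removed from folder_b
    return merged


with tempfile.TemporaryDirectory() as root:
    folder_a, folder_b = Path(root, "folderA"), Path(root, "folderB")
    folder_a.mkdir()
    folder_b.mkdir()
    (folder_b / "a1.txt").write_text("one\n", encoding="utf-8")
    (folder_b / "a2.txt").write_text("two\n", encoding="utf-8")
    too_early = merge_first_three(folder_b, folder_a / "b1.txt")  # only 2 files
    (folder_b / "a3.txt").write_text("three\n", encoding="utf-8")
    merged = merge_first_three(folder_b, folder_a / "b1.txt")
    merged_text = merged.read_text(encoding="utf-8")
    remaining = list(folder_b.iterdir())
```

Writing the merged file directly into "folderA" stands in for the separate move step described in the workflow.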

The Workflow Implementation in OpenMole


Capsule "increase i" - Groovy task used for increasing the integer prototype i
  The transition between "comb. 3 first" and this capsule is an aggregation
  Executed once in each workflow iteration
In order to run the tasks on the Grid, a Grid environment should be created and set as the execution environment of the capsules "comb. 3 files" and "comb. 3 first"

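The role of the aggregation transition can be illustrated with a small Python analogy: the 64 merge actions run in parallel, and the counter i is increased exactly once, after all of them have finished. This is not OpenMole code; a local thread pool stands in for the Grid environment, and `merge_action` and `run_iteration` are placeholder names.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_action(combination):
    """Placeholder for one action a_k; it only joins the three names."""
    return "+".join(combination)


def run_iteration(i, combinations, workers=8):
    """Run all actions in parallel, wait for every result (the aggregation),
    then increase the iteration counter once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(merge_action, combinations))
    return i + 1, results


combos = [(f"x{a}", f"y{b}", f"z{c}")
          for a in range(4) for b in range(4) for c in range(4)]
i, results = run_iteration(1, combos)
```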


The Workflow Implementation in Taverna


"Workflow 14" - nested workflow, executed n times
  Input n - a list of integer numbers
  Integer input port ni, increasing in each workflow iteration
Service "DirectoryPath" - Beanshell script in which the path of the directory "folderA" is specified as the string "directory"
Three services "List Files", "List Files 1" and "List Files 2" - identical; each makes the same array of the files in the input directory
  Each array is shuffled and its first four elements are taken (Java code)
  64 random combinations, each of three files
  Output of each service - a list of four files


The Workflow Implementation in Taverna


Service "concatenate and write in files by 3 Files"
  Inputs with list depth 0 - it takes the elements of each of the three lists of four files one by one (combinations)
  Also a Beanshell script; it merges the contents of each combination of three files and is repeated 64 times (a1, a2, a3, ..., a64)
Service "combine first three" - checks whether three files exist in the directory "folderB" and merges them
  Executed in parallel with the previous service "concatenate and write in files by 3 Files"
  The contents are merged into a new file


Conclusion and Future Work


We analyzed the most important characteristics of the two workflow systems
Different implementation logic
  Taverna offers natural parallelism of processes
  In OpenMole the user has to define the parallelism explicitly
Taverna - more suitable for designing complex workflows (plenty of predefined services)
OpenMole - simple and interactive interface, useful for people who do not have experience in designing workflows; samplings are defined automatically
Future work: analyzing and measuring the execution times of more complex workflows in both workflow systems


Thank you for your attention!
