
17-755 Architectures for Software Systems

Architectural Analysis of Platform Integration Strategies for Scientific Software


Amber Lynn McConahy, Carnegie Mellon University
Abstract - Scientific software communities have been largely ignored by software engineers, and yet these systems form the foundation for most scientific breakthroughs. Consequently, these communities are plagued with problems leading to the inability to provide a unified, operating platform for users in a similar domain. Although some communities are attempting to migrate towards this commonality, it is impossible to discern the viable and pertinent tools that could be useful to the mass of users. This paper presents an architectural analysis of both scientific software itself and the underlying community infrastructure upon which the software is provided. Based on this analysis, some architectural guidelines will be provided that can assist developers in implementing platform solutions that promote survivability in these communities.

Index Terms - scientific software, open grid services architecture, service oriented architecture, platform development

I. INTRODUCTION

SOFTWARE has permeated every facet of human life today. From Fortune 500 corporations to high-tech innovative pioneers, software is an indispensable asset that serves to propel these firms into the next era. However, the use of software is not restricted to these types of organizations. In fact, scientific researchers depend on software to complete the rigorous analytics required for their work.

Scientific software poses a unique area of study with regard to software usage. Scientific communities routinely need software, yet training software engineers in the science required to build it is not feasible because of the complexity of many of the algorithms involved. Consequently, scientists routinely write their own software without formal software engineering practices, a result of their lack of formal training. According to Howison and Herbsleb in [2], scientific software has been largely ignored by traditional software engineers, and therefore scientists routinely adopt methodologies haphazardly, perpetuating problems rather than alleviating them.

While scientific software developers are not following current trends, other communities have embraced software development strategies geared towards the implementation of platforms. Platforms provide a mechanism to extend the system boundary of software beyond a single organization or individual and provide a stable core set of tools that can be expanded upon by third party developers. Some popular platforms are Microsoft Windows, Android, and Eclipse IDE. These systems all offer a stable, maintained core upon which additional software components can be integrated, thus providing a customizable end-user product.

The benefits of a platform infrastructure for scientific software are extensive, yet scientific software communities do not currently provide platforms for their users. Instead, all available software is dumped into a repository without indication of common implementations or dependencies among software. This paper will analyze the architectures of scientific software itself, as well as the scientific software community infrastructures where this software is typically provided. Through this analysis, guidelines related to the implementation of a core platform of domain specific scientific software will be identified.

II. SCIENTIFIC SOFTWARE

A. Historical Perspective

Scientific software is plagued with problems, which leads to both duplication of effort and unwisely allocated funding. Some of the major issues affecting these development efforts include:
• Lack of incentives
• Lack of commercialization
• Tragedy of the Commons preventing platform development
• Reimplementation rather than reuse
• Discouragement of collaboration

In order to better understand the needs of scientific software, it is crucial to analyze the architectural drivers that guide developers in their endeavors. Although a precise list of drivers for all scientific software is impossible to discern, some common architectural drivers can be identified that offer a generalized perspective.

B. Common Architectural Drivers

1) Functional Requirements
• Supports high demand computations
• Provides data for research publications

2) Technical Constraints
• Must support peak load processing of data

3) Business Constraints
• Must fall within laboratory or NSF funding budget

4) Quality Attributes
• Performance: Scientific software typically requires large amounts of processing capability due to the complexity of the algorithms utilized.
• Reliability: Applications must provide consistent and accurate data and results at all times.
• Interoperability: Analysis capabilities must be possible on heterogeneous hardware and software systems.

5) Representative Scenarios

a) Performance

ID: SS_QA01
Source: Scientist
Stimulus: A scientist is using an application to measure neutrino activity using over 100,000 sensors, which takes a huge amount of processing capability.
Artifact: Scientific software application
Environment: Application under peak load conditions
Response: The application does not crash even when resources are scarce or unavailable. Also, processing continues at a reasonable pace as expected under harsh conditions.
Response Measure: Probability of application crashing due to lack of resources is kept to below 1%

b) Reliability

ID: SS_QA02
Source: Scientist
Stimulus: A scientist is analyzing data gathered from a structural biology application and plans on publishing the results of the research to a popular structural biology journal.
Artifact: Scientific software application
Environment: Application under normal operating conditions
Response: The data generated from the software is accurate. There are no anomalies in the data, and the scientist can be assured that publications based on the data collected are not skewed or misrepresented.
Response Measure: Probability of generating skewed or inaccurate results is kept below 1%

c) Interoperability

ID: QA03
Source: Research scientist at collaborating laboratory
Stimulus: A researcher at a laboratory that is collaborating on a current project needs to use the software developed at another laboratory in conjunction with a tool developed in-house. The two applications are running on different operating systems and are not compatible.
Artifact: Scientific software applications
Environment: Application being used in new environment
Response: The researcher is able to utilize the applications together without having to make major modifications to the operating environment. Thus, there is no need to purchase additional hardware or resources to deploy the applications. This is provided through an abstraction of functionality to a mechanism, such as a service, which provides a standard connection, such as a well-defined interface.
Response Measure: Effort required to deploy the application in the new environment takes no more than 1 person week

III. SCIENTIFIC SOFTWARE INFRASTRUCTURE ARCHITECTURES

Like most genres of software, scientific software communities also adhere to some common architectural patterns that will affect the ability to migrate towards the provision of platforms. These are highlighted in the subsequent sections.

A. Grid

Grid architectures are rapidly becoming the de facto standard environment for scientific software communities. The Open Grid Services Architecture (OGSA) serves as the governing framework designating the relevant standards for Grid-based scientific communities, such as the Open Science Grid, SBGrid, and SURAgrid. OGSA offers a unique perspective on service-oriented architectures.

According to Ian Foster in [5], Grid systems provide two fundamental components: infrastructure and tools. These components make resource sharing in a distributed environment simple and straightforward. The Grid infrastructure builds upon the Internet on which it is deployed and provides services that address end-to-end issues of authentication, resource discovery, and resource access. In order to provide new mechanisms for performing work, tools are then built upon the infrastructure services and include tools for resource discovery, data management, scheduling of computation, security, etc. Currently, science portals are being routinely developed that utilize Grid technologies and allow remote software packages to be invoked by reducing many of the barriers associated with the use of this software.

Figure 1 shows a layered view of a representative science Grid provided from Science Grid magazine in [10]. Foster in [4] points out that although Grid architectural diagrams are viewed as layers, they do not conform to a pure layered style where each layer can only communicate with the layer immediately below it. However, this diagram is still useful in getting a visual perspective of the Grid's architecture.

Figure 1: Layered View of Grid Architecture from [10]

Since Grid architectures are fundamental deployment environments for scientific software, the key architectural drivers are highlighted in the subsequent sections for consideration when choosing a Grid solution.

1) Functional Requirements

According to Foster in [4], OGSA was developed with the integration of heterogeneous and legacy systems in mind, and thus must adhere to the following functional requirements:
• Resource virtualization & sharing: facilitate the sharing of resources and collaborative effort across business and hardware domains, including site autonomy, metadata services, global name space, and data usage metrics.
• Optimization: flexible resource allocation policies, workload adjustment policies, and resource metering and monitoring procedures must be included to support optimal resource utilization.
• Quality of Service (QoS) Assurance: service level agreements, service level attainment, and migration policies should be in place, so that customers can negotiate the appropriate level of service and facilitate tuning for performance or availability.
• Job Execution: support must be provided for various job types (simple and composite workflows), job management capabilities, manageability interfaces, orchestration and choreography services, resource provisioning, and scheduling.
• Data Services: huge quantities of data must be managed by OGSA; therefore, data services, including data access, data consistency, data persistency, data location management, and data integration, must be provided.
• Virtual Organizations (VOs): VOs must be supported by the Grid, whereby members from around the globe can create virtual organizations that allow them to collaborate on research agendas and workflows. Figure 2 from [3] shows a visual representation of how a VO works.

Figure 2: Conceptual View of Virtual Organization from [3]

2) Technical Constraints
• Standard Protocols and Schemas
o Support must be provided for multiple existing security infrastructures.
o Policies for accessing resources across organizational boundaries through VOs (virtual organizations) must be provided.

3) Business Constraints
• Policy-based management: automation of Grid control systems is required to ensure conformance to service provisions.
• Application contents management: application-related information must be described and managed as a single logical unit to facilitate maintenance by administrators.
• Problem determination mechanisms: mechanisms must be put into place to respond to problems quickly and effectively.
• Common management capabilities: consistent and uniform management of resources must be provided by the Grid. This includes capabilities to discover resources and to perform queries with these resources.

4) Quality Attributes
• Interoperability: resources on the Grid are heterogeneous, and the ability of these resources to coexist and work in conjunction with one another is the paramount quality that the Grid must provide. This means that the Grid must support resources written in a variety of programming languages and running on a diverse set of operating systems. The Grid provides interoperability through the following mechanisms:
o Resource virtualization: provide a uniform operating environment for resources.
o Common management capabilities: policies must be enforced to manage diverse resources.
o Resource discovery and query: resource properties must be available so that users can find the desired resources.
o Standard protocols and schemas: ensure that resources are provided in a uniform manner and ease transition to the Grid.
• Security: security services must be supported by the Grid and include:
o Authentication: the identity of users must be verified.
o Authorization: policies must be in place to restrict the access of users to resources to which they have been granted privileges.
o Isolation: resources must be isolated from one another where deemed necessary.
o Delegation: support for the delegation of access from service requestors to service providers is required.
o Security policy exchange: the exchange of security policies between requestors and providers must be dynamically supported.
o Intrusion detection, protection, and secure logging: monitoring and logging for security breaches must be supported.
• Scalability: the Grid must support continued growth through the following:
o Management architecture: hierarchical or peer-to-peer management is needed to provide scaling capabilities for a multitude of resources.
o High-throughput computing mechanisms: the Grid must adjust and optimize parallel job execution and improve throughput as resources are added to the Grid.
• Availability: fault tolerant hardware clusters must be provided through the resource pool of member organizations that, when working together, provide a stable, reliable execution environment. However, in order to support heterogeneity on the Grid, components that are unstable or have a long mean-time-to-repair (MTTR) must be managed. Consequently, Policy-based Management and Provisioning are needed to provide recoverability. To address these problems, the Grid employs the following availability techniques:
o Disaster recovery mechanisms: support in the event of natural or human-caused disasters must be provided by the Grid to help alleviate the risk of long-term service disruption. These mechanisms include remote backup and automated recovery procedures.
o Fault management mechanisms: monitoring, fault detection, potential impact, and causal analysis must be provided by the Grid.
• Usability: OGSA should shield the user from the underlying complexity of the Grid. Therefore, customized abstractions must be provided to support both novice and expert users. This means that users are able to choose the level of interaction with the system that is required.
• Extensibility: components must support replacement and permit the evolution of the Grid. In addition, applications on the Grid should be able to support third party add-ons, which expand the functionality of the original application if deemed appropriate by the application's owner.

5) Example Quality Attribute Scenarios
a) Interoperability

ID: G_QA01
Source: Grid User
Stimulus: A user accessing the Grid via a Windows machine needs to access a resource running on an IBM mainframe and use it in conjunction with another resource running on a Debian server.
Artifact: Grid applications; Grid infrastructure
Environment: Grid under normal operating conditions
Response: The user is able to access and use both resources without making any changes to his/her Windows machine. The Grid orchestrates all interactions and provides the necessary mechanisms for the resources to be used simultaneously by default and without user interaction.
Response Measure: Probability of a user not being able to utilize a resource or set of resources is kept below 5%

b) Security

ID: G_QA02
Source: Grid User (scientist)
Stimulus: A scientist using the Grid knows that a competing laboratory is making progress on a complex research question and wants to access their data illegally to help further his/her own research.
Artifact: Grid applications; Grid infrastructure
Environment: Grid under normal operating conditions
Response: Authentication and authorization mechanisms are provided by the Grid. Additionally, private resources are isolated from other Grid users. Therefore, access to resources for which the user is not granted privileges is prevented.
Response Measure: Probability of a user accessing data for which he/she is not authorized is kept below 0.01%, assuming social engineering tactics have not been realized.

c) Scalability

ID: G_QA03
Source: Grid Management
Stimulus: Several new organizations wish to join the Grid network.
Artifact: Grid infrastructure
Environment: Grid under update and expansion conditions
Response: Grid management has policies in place to easily add additional organizations and resources to the Grid. Optimization mechanisms allow for increased loads on the Grid without decreasing performance.
Response Measure: Effort required to add an organization to the Grid is kept below 1 person week

d) Availability

ID: G_QA04
Source: Grid User
Stimulus: A server on the Grid has crashed while a user is performing a vital transaction.
Artifact: Grid applications; Grid infrastructure
Environment: Grid under error conditions
Response: The user does not lose any transaction data, and the transaction is completed by a backup server that holds redundant data.
Response Measure: The delay required to reroute the transaction and complete processing is kept to less than 30 minutes.

e) Usability

ID: G_QA05
Source: Grid User
Stimulus: A new user is accessing the Grid for the first time and wants to access an application without any specific knowledge of the Grid itself.
Artifact: Grid applications; Grid infrastructure
Environment: Grid under normal operating conditions
Response: The user is able to access the application without having any knowledge of the Grid. The application is accessible simply by choosing it from the list of available apps.
Response Measure: Time required to access and start using an application is kept to less than 1 hour

f) Extensibility

ID: G_QA06
Source: Grid Application Extension Provider
Stimulus: A third party application developer is offering an add-on to a current application housed on the Grid.
Artifact: Grid applications; Grid infrastructure
Environment: Grid under normal operating conditions
Response: The application provider is able to deploy the add-on with minimal effort, and without having to make modifications to the application for which the add-on was developed.
Response Measure: Effort required to deploy the add-on is less than 1 person week
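The scenarios in this section all share the same parts (ID, source, stimulus, artifact, environment, response, response measure), so they can be recorded as structured data and checked against measurements. The following is only a minimal sketch: the class name, the `threshold` and `unit` fields, and the observed failure counts are assumptions introduced for illustration and do not come from OGSA or from any measurement reported in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAttributeScenario:
    """One scenario record with the same parts as the tables above."""
    scenario_id: str
    source: str
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str
    threshold: float  # numeric bound read off the response measure (assumed field)
    unit: str         # unit of the bound, e.g. "percent" (assumed field)

    def is_satisfied(self, observed: float) -> bool:
        """Return True when the observed value stays strictly below the bound."""
        return observed < self.threshold


g_qa01 = QualityAttributeScenario(
    scenario_id="G_QA01",
    source="Grid User",
    stimulus="Windows user needs a mainframe resource and a Debian resource together",
    artifact="Grid applications; Grid infrastructure",
    environment="Grid under normal operating conditions",
    response="Both resources are usable without changes to the user's machine",
    response_measure="Probability of a user not being able to utilize a resource is kept below 5%",
    threshold=5.0,
    unit="percent",
)

# Hypothetical observation: 3 failed accesses out of 200 attempts.
observed_failure_rate = 100.0 * 3 / 200
print(g_qa01.scenario_id, g_qa01.is_satisfied(observed_failure_rate))
```

Keeping the numeric bound next to the prose response measure makes it possible to re-evaluate a scenario whenever new usage data is collected.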

B. SOA

Since scientific software typically is not written in a consistent manner, an architecture that supports heterogeneity and distribution of resources is needed. Service-oriented architectures (SOAs) are the traditional style by which distributed, diverse systems are integrated. SOAs represent a way to provide software as a service, typically utilizing web services and standards such as WSDL, BPEL, and BPN. VIEW, CAFISE-S, and SOAR are examples of SOA solutions for scientific software. A reference architecture for SOA in scientific computing is shown in Figure 3, as provided by [14].

Figure 3: SOA in science abstraction from [14]

SOAs vary in their functionality and capabilities, but the following generalized architectural drivers will be utilized for the discussion of SOAs in this paper and are based upon Eric Newcomer's description in [11].

1) Functional Requirements
• Services are defined through a formal contract that separates the functionality of the service from the technical implementation.
• Services only interact through their well-defined interfaces.
• Abstraction of services is based upon business activities and functionalities.
• Services must provide a service needed by a service requester.
• Services must be loosely coupled.
• Related services should utilize the same XML document types to facilitate communication.
• Each service must perform a discrete task and provide an interface for access.
• Services must provide metadata, generally through a repository, that defines both their constraints and capabilities.
• Services must not depend on other services.
• Support for composite services must be provided.

2) Technical Constraints
• Services must be accessible through invocation methodologies that comply with industry standards, such as SOAP, UDDI, HTTP, JMS, XML Schema, or WSDL.
• Legacy systems should be supported by services.

3) Business Constraints
• SOA must align with underlying business goals.
• SOA should be implemented to increase return on investment with regard to IT technologies.

4) Quality Attributes
• Reusability: service reuse is the foundational benefit of an SOA implementation, whereby a given service can be used by multiple requestors without the need for reimplementation. Reusability is supported by SOA through the following characteristics:
o Well-defined service level contracts
o Metadata driven
o Loose process coupling
o Loose technology coupling
o Open, standards based
o Predictable service-level agreements
o Multiple invocation styles available with a service
• Maintainability: modular and loosely coupled applications simplify the task of routine software maintenance by allowing programmers to make changes to services at will, provided they remain in conformance with the service contract.
• Interoperability: open standards and message communication protocols must provide the ability for heterogeneous components to interact with one another.

5) Representative Scenarios
a) Reusability

ID: SOA_QA01
Source: Service Provider
Stimulus: An additional service requestor has recently been identified and the existing service must be provided to them.
Artifact: SOA service
Environment: SOA service under normal operating conditions
Response: The service provider is able to reuse the currently deployed service without modification. This is possible through the service's well-defined interfaces and the service contract provided as a WSDL file. Additionally, the service provider exchanges messages with the service requester using standard communication protocols, such as SOAP and REST.
Response Measure: Effort required by the service provider is less than 1 person hour.

b) Maintainability

ID: SOA_QA02
Source: Service Developer
Stimulus: Routine maintenance must be done to an existing service, requiring some modifications to the service's code.
Artifact: SOA service
Environment: SOA service under update conditions
Response: The service itself can be modified, so long as the interface remains consistent, without affecting any service requestors. Thus, as long as the service contract is adhered to, the service can be modified independently of any other services or components. Changes are localized through loose coupling of services through standards.
Response Measure: Probability of a routine maintenance activity propagating the necessity for change to service requesters (assuming the service contract is conformed to) is less than 0.01%

c) Interoperability

ID: SOA_QA03
Source: Service Provider and Service Requestor
Stimulus: A service requestor wishes to access a service provider that is running in a heterogeneous environment.
Artifact: SOA services
Environment: SOA service communications
Response: The service requestor is able to utilize the service contract that was provided by the service provider to connect to the service without a need for additional hardware or other additional resources. Communications between the requestor and provider are handled using standard communication protocols.
Response Measure: Probability of service requestor having to purchase new equipment to utilize a provided service is less than 0.01%
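The maintainability scenario above rests on the idea that a requester depends only on the service contract, never on the implementation behind it. A minimal in-process sketch of that idea follows. It is an analogy rather than a web service: the contract is modeled as a Python `Protocol` instead of a WSDL file, and the names `MatchCountService`, `OriginalService`, `MaintainedService`, and `requester_report` are hypothetical.

```python
from typing import Protocol


class MatchCountService(Protocol):
    """Hypothetical service contract: the only thing a requester relies on."""

    def count_matches(self, seq_a: str, seq_b: str) -> int:
        """Return the number of positions at which the two sequences agree."""
        ...


class OriginalService:
    """First implementation of the contract."""

    def count_matches(self, seq_a: str, seq_b: str) -> int:
        count = 0
        for i in range(min(len(seq_a), len(seq_b))):
            if seq_a[i] == seq_b[i]:
                count += 1
        return count


class MaintainedService:
    """Rewritten implementation after routine maintenance; same contract."""

    def count_matches(self, seq_a: str, seq_b: str) -> int:
        return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


def requester_report(service: MatchCountService, seq_a: str, seq_b: str) -> str:
    """Requester code written against the contract only."""
    return f"{service.count_matches(seq_a, seq_b)} matching positions"


# Swapping the implementation requires no change on the requester side.
print(requester_report(OriginalService(), "GATTACA", "GACTATA"))
print(requester_report(MaintainedService(), "GATTACA", "GACTATA"))
```

Because `requester_report` never names a concrete class, replacing one implementation with the other is invisible to it, which is the property the SOA_QA02 response measure tries to quantify.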

C. Hybrid SOA

Sometimes a pure SOA implementation is not sufficient to meet the needs of the organization. In such instances, a hybrid SOA architecture may be appropriate and can provide the necessary infrastructure for the organization. These implementations would likely have the same architectural drivers as the generic SOA, as well as some additional drivers. An example of a hybrid SOA is the SORASCS system discussed by Schmerl and Garlan in [16], shown in Figure 4. Instead of constructing the SOA with only a user interface layer, services layer, and tools layer, the final version of SORASCS added a Socio-cultural layer between the user interface and services layers. This layer lets users establish custom orchestrations without forcing them to learn SOA coding standards, such as BPEL. In other words, the user is able to perform analysis and designate workflows using methodologies that are familiar.

Figure 4: SORASCS v. 2 Architecture from [16]

SORASCS is just one example of a hybrid SOA that adds a domain specific layer to the traditional SOA infrastructure, and others exist that employ similar mechanisms. In addition, many other hybrid SOA implementations exist with different structural variations that could be useful for scientific software implementations. The architectural drivers listed above for typical SOAs are still relevant for these systems, but the following additional architectural drivers may be seen in such systems.

1) Functional Requirements
• An additional domain specific layer must be supported
• Support for tools not offered as a service
• Ability to support orchestrations composed of both services and standalone applications

2) Technical Constraints
• Support for services and standalone applications

3) Business Constraints
• Conformance to typical business practices employed by community users

4) Quality Attributes
• Usability: user expectations of domain specific procedures and processes for performing routine tasks should be maintained to reduce the knowledge acquisition required to utilize the system and to facilitate ease of use.
• Flexibility: users should be able to make use of existing tools in conjunction with services provided via the hybrid SOA. In other words, it should be possible to construct composite workflow scenarios that use both services and standalone applications.
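The flexibility driver, composing a workflow from both services and standalone applications, can be sketched as a pipeline whose steps share one calling convention. This is a toy illustration under stated assumptions and not how SORASCS is implemented: `service_step` is a local stand-in for a remote service call, the "standalone application" is a Python one-liner run as a child process, and all function names are hypothetical.

```python
import subprocess
import sys
from typing import Callable, List

# Every workflow step takes text in and returns text out.
Step = Callable[[str], str]


def service_step(payload: str) -> str:
    """Stand-in for a remote service call; a real system would use the network."""
    return payload.upper()


def standalone_step(payload: str) -> str:
    """Run a standalone program as a child process and capture its output."""
    # The "application" is a one-line script that reverses whatever it reads on stdin.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.read().strip()[::-1])"],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def run_workflow(steps: List[Step], initial: str) -> str:
    """Feed the output of each step into the next one, in order."""
    data = initial
    for step in steps:
        data = step(data)
    return data


# Composite workflow: standalone application first, then the service.
print(run_workflow([standalone_step, service_step], "sample"))
```

Wrapping the standalone tool behind the same `Step` signature as the service is the design choice that lets the orchestrator treat both uniformly.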

5) Representative Scenarios

a) Usability

ID: HSOA_QA01
Source: Hybrid SOA User
Stimulus: A user wants to construct a custom workflow requiring the orchestration of two services, and he/she is not familiar with BPEL or other SOA specific technologies.
Artifact: Hybrid SOA Service
Environment: SOA service communications
Response: The domain specific layer will provide user-centered abstractions that facilitate workflow designs that are familiar to users. Therefore, users will not have to construct orchestrations using the underlying SOA infrastructure.
Response Measure: Average ease of use reported in user studies is above average

b) Flexibility

ID: HSOA_QA02
Source: Hybrid SOA User
Stimulus: A user wants to construct a custom workflow requiring the use of a service and a standalone application.
Artifact: Hybrid SOA Service and Standalone application
Environment: SOA service communications
Response: Orchestrations will be supported for workflows utilizing both standalone applications and services by providing data acquisition methodologies to gather the necessary information from the standalone application and make it available to the service. The resulting combined workflow will then be presented through the hybrid SOA user interface for review by the user.
Response Measure: Probability of establishing composite workflows composed of a service and a standalone application is greater than 90%, assuming that the standalone application does not have licensing provisions preventing such interaction.

IV. INFRASTRUCTURE TRADEOFFS AND CONSIDERATIONS

As with any architectural decision, the selection of an architectural pattern when designing for a scientific software platform requires some tradeoffs with regard to supported qualities and capabilities. This section outlines some of the key tradeoffs that could be expected with each of the aforementioned infrastructure architectures.

A. Grid

OGSA does provide numerous benefits with respect to supported quality attributes. However, this comes with some problematic tradeoffs that should be considered before deploying to the Grid. These include:
• Performance vs. Security: the security policies enforced by OGSA will likely impede responsiveness. OGSA does put in place optimization procedures to help mitigate this; however, it is unlikely that the same level of performance can be achieved as in an environment without stringent security rules.
• Performance vs. Interoperability: strategies geared towards modularization and loose coupling, such as web service standards, are known to slow processing down. In order to support heterogeneity, Grid users must sacrifice some performance to gain interoperability.
• Time Spent Learning Grid vs. Rapid Deployment: the barriers to entry for a developer wanting to use the Grid architecture are relatively high, assuming the user has not used OGSA standards before. On the other hand, the developer is probably seasoned at writing standalone applications, which can be rapidly deployed. A potential Grid developer must decide if the benefits of the Grid outweigh the time needed to learn the technologies.
• Usability vs. Interoperability: standards and control mechanisms may not provide the expected usability. Users may be accustomed to their operating environment and not want to sacrifice ease of use for interoperability.
• Complexity of Orchestrating Service Interactions vs. Time Spent Developing New Application: service orchestrations may be difficult to understand and utilize without prior knowledge. Therefore, it may take more time to learn these procedures than it would to write a tool that achieved the functionality desired from the composite service. However, once the knowledge of orchestration is acquired, compositions can be completed faster than writing new software.
• Collaborative Network vs. Security: there are obviously risks associated with the formation and use of VOs. Implementing in a collaborative environment such as the Grid may not be desirable because of data privacy concerns. However, VOs provide the ability for collaboration of distributed research teams, which could fuel both innovation and breakthrough findings.

B. SOA

SOAs offer a multitude of benefits with regard to reusability and interoperability. However, these qualities come at a price, and the following tradeoffs should be considered when implementing an SOA.
• Performance vs. Interoperability: as previously mentioned, web services tend to have more latency than standalone applications. However, SOA, in general, is not bound by the strict security guidelines needed by the VOs in OGSA, and it would be possible to offer a modified service contract that supported only the absolutely necessary data types and functions when implementing an SOA. This would minimize the performance loss incurred by parsing and marshalling XML files, yet it would still not provide the performance realized by solutions not requiring standard protocols.
• Reliability vs. Resource Sharing: there is no guarantee that a service a user interacts with is generating valid and reliable data when services are managed by multiple entities. Shared resources provided through services are an example. The user has a service contract, but there is no way to ensure that the algorithms employed in the service core are accurate.
• Vendor Lock In vs. Scalability/Flexibility: most SOA tools support a predefined set of functions that are available to users and thus inhibit flexibility. This means that support for all possible features and functionalities is not realistic. Over time, requirements may also change and the SOA tool may no longer provide the necessary functionality, which would in turn inhibit scalability.
• Usability vs. Manageability: support for some of the usability considerations, such as familiar user interfaces, may not be available in the service implementation. However, standardization provided by the SOA environment would provide consistency and standards to support manageability of the current system.

C. Hybrid SOA

Hybrid SOAs, such as SORASCS, offer a solution that mitigates some of the tradeoffs required by traditional SOAs. However, there are still some tradeoffs that should be considered when choosing this route. Relevant tradeoffs include:
• Development Costs for Customization vs. Usability: deviating from the generic SOA infrastructure would require additional development effort and increased costs to the organization. However, saving money on a less usable system could prevent the adoption of the system and ultimately lead to its failure.
• Standardization vs. Familiarity of Interfaces: providing familiar interfaces for users would likely impede the ability to provide a standard interface for interaction. This would mean that users would have to use multiple interfaces for which navigation and operating procedures could be vastly different.

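The marshalling overhead noted under the SOA tradeoffs can be illustrated with a small sketch. The Python below is not drawn from any of the systems discussed here. It assumes a hypothetical payload of floating-point samples and compares an XML encoding with a packed binary encoding of the same data. All function names are illustrative.

```python
import struct
import xml.etree.ElementTree as ET


def to_xml(samples):
    """Marshal a list of floats into a small XML document (bytes)."""
    root = ET.Element("samples")
    for value in samples:
        # repr() keeps enough digits for an exact float round trip.
        ET.SubElement(root, "sample").text = repr(value)
    return ET.tostring(root)


def from_xml(payload):
    """Parse the XML document back into a list of floats."""
    return [float(node.text) for node in ET.fromstring(payload)]


def to_binary(samples):
    """Pack the same floats as a 4-byte count followed by 8-byte doubles."""
    return struct.pack(f"<I{len(samples)}d", len(samples), *samples)


def from_binary(payload):
    """Unpack the count, then that many little-endian doubles."""
    (count,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from(f"<{count}d", payload, 4))


if __name__ == "__main__":
    samples = [0.1 * i for i in range(1000)]
    print("XML bytes:   ", len(to_xml(samples)))
    print("binary bytes:", len(to_binary(samples)))
```

Both encodings round-trip the data exactly, but the XML form carries tag text for every value and must be parsed as text, which is the kind of cost the tradeoff describes. The binary form gives up the self-describing, standard format that makes services interoperable.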
V. PLATFORM DEVELOPMENT RECOMMENDATIONS

As is evident from the above architectural analysis, there is no one-size-fits-all solution for establishing platforms for scientific software. Instead, potential platform contributors need to analyze the operating environment and tradeoffs relevant to their domain-specific software community. For example, a platform deployment strategy that is appropriate for the structural biology community may not suit the physics community because of differences in the priorities of their architectural drivers or tradeoffs. Therefore, architects should take the time to evaluate both the architectural drivers and the tradeoffs to determine the best-matching architectural style before finalizing architectural decisions. Although individual analysis of communities is required to achieve the optimal solution, some general guidelines with respect to the above infrastructures can still be extrapolated. These are highlighted in the subsequent sections.

A. Grid

Currently, several software communities, such as the Open Science Grid and SBGrid, provide community hardware and software resources on a network for collaboration that uses OGSA standards. Because these communities have already launched software on the Grid, it provides a viable launching infrastructure for platforms. Using metrics gathered from the software launched in these networks, it would be possible to ascertain which software is used most frequently, as well as any relevant dependencies needed when using these tools. Once an initial list of candidates is established, further analysis could determine the most suitable software for the platform. This analysis would involve establishing mechanisms for maintenance and proliferation through interviews with the original software developers, who would be needed to ensure that the software is kept up to date and functioning. Once the platform software tools are established, they could be bundled and made accessible to Grid users as a platform through a VO.

Grid architecture is also a good alternative when the community requires shared hardware resources for high-demand processing. Referring back to Figure 1, there is a resource layer that provides hardware resources pooled from multiple organizations. The coordination of these hardware components, when linked together, forms a hardware infrastructure whose computing capabilities exceed those of any single organization. Organizations can thus reduce overhead and the need to purchase additional hardware by establishing a Grid infrastructure that offers shared hardware resources.

Grid architectures are, however, a poor solution for the needs of certain organizations. Conditions that are not conducive to implementing a Grid infrastructure for a platform include:

Communities where there is little or no collaboration between organizations, so that VOs would not be beneficial
Communities that frequently deal with confidential or private data that would require isolation from other organizations
Communities where the knowledge barrier required to shift to Grid services outweighs the benefits
Communities where the performance of standalone applications exceeds that provided by the Grid and high performance is of the utmost importance to the organization's computing needs

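The usage-metrics step suggested for Grid communities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record format (a tool name paired with the dependencies loaded at launch) and the function name rank_candidates are hypothetical, and real Grid accounting data would need its own extraction step.

```python
from collections import Counter, defaultdict


def rank_candidates(launch_records, top_n=3):
    """Rank tools by launch count and collect the dependencies seen with each.

    launch_records is an iterable of (tool_name, dependency_list) pairs,
    a hypothetical format standing in for whatever the Grid's logs provide.
    """
    launches = Counter()
    dependencies = defaultdict(set)
    for tool, deps in launch_records:
        launches[tool] += 1
        dependencies[tool].update(deps)
    # most_common() orders tools from most to least frequently launched.
    return [
        (tool, count, sorted(dependencies[tool]))
        for tool, count in launches.most_common(top_n)
    ]


if __name__ == "__main__":
    records = [
        ("tool_a", ["libx"]),
        ("tool_b", ["liby"]),
        ("tool_a", ["libz"]),
        ("tool_c", []),
        ("tool_a", ["libx"]),
        ("tool_b", ["liby"]),
    ]
    for tool, count, deps in rank_candidates(records, top_n=2):
        print(tool, count, deps)
```

A ranking like this would only produce the initial candidate list. The maintenance questions raised above, such as whether the original developers can keep a tool current, still have to be settled outside the data.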
B. SOA

Service-oriented architectures are appropriate solutions for platforms in science in some situations. One such situation is when the community already employs web services for internal use. This means that the community is familiar with the necessary technologies and the barriers to entry are low. By integrating an SOA, users could extend current web service offerings to include services geared towards legacy systems.

Organizations would also benefit from an SOA implementation when tool reuse through services could help eliminate duplication of effort among organizations. Software provided as a service can be offered to various service requestors without reworking the implementation. Rather, the SOA service contract provides the standardization necessary for reusing software components in a heterogeneous environment. This would avoid redundant coding efforts and provide a viable infrastructure for communities wishing to leverage these benefits.

Communities where SOA implementations may not be desired include:

Communities where available SOA tooling does not support the required functionality
Communities where the performance degradation associated with standard protocols, such as XML, would impede the ability to achieve the needed functionality
Communities where user expectations necessitate adherence to current user interfaces and procedures

C. Hybrid SOA

A hybrid SOA may be an appropriate solution when a service implementation of software is not desired and integration of services and standalone applications is needed. Communities where traditional SOA tooling is insufficient, but where value can be gained from services, may be good candidates for a hybrid SOA. Hybrid SOA implementations may also be desirable in communities where users expect to still be able to access applications in a predictable manner.

A hybrid SOA would be a poor choice if the following conditions are prominent in the software community:

Communities that do not have the budget to implement a customized solution
Communities where a generic SOA implementation would work equally well and no additional value could be added through the hybrid SOA

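The conditions listed for the three infrastructures can be read as a first-pass screening checklist. The Python sketch below encodes one possible reading of them. The field names, the CommunityProfile type, and the way the conditions are combined are assumptions made for illustration, not a procedure defined in this paper, and the result is no substitute for the community-specific analysis recommended above.

```python
from dataclasses import dataclass


@dataclass
class CommunityProfile:
    """Hypothetical yes/no answers describing a scientific software community."""
    collaborates_across_orgs: bool
    handles_confidential_data: bool
    needs_shared_hardware: bool
    soa_tooling_sufficient: bool
    tolerates_protocol_overhead: bool
    requires_familiar_interfaces: bool
    can_fund_customization: bool


def candidate_styles(profile):
    """Return the infrastructure styles that are not ruled out by the checklist."""
    styles = []
    # Grid: needs cross-organization collaboration, no isolation requirement,
    # and a demand for shared hardware resources.
    if (profile.collaborates_across_orgs
            and not profile.handles_confidential_data
            and profile.needs_shared_hardware):
        styles.append("Grid")
    # Plain SOA: tooling covers the functionality, protocol overhead is
    # acceptable, and users do not insist on their current interfaces.
    soa_fits = (profile.soa_tooling_sufficient
                and profile.tolerates_protocol_overhead
                and not profile.requires_familiar_interfaces)
    if soa_fits:
        styles.append("SOA")
    # Hybrid SOA: only worth its cost when a generic SOA falls short on
    # tooling or interfaces and there is budget for customization.
    elif profile.can_fund_customization and (
            not profile.soa_tooling_sufficient
            or profile.requires_familiar_interfaces):
        styles.append("Hybrid SOA")
    return styles
```

An empty result does not mean that no platform is possible. It only indicates that none of the three styles passes this coarse screen, so the tradeoffs would need to be weighed individually.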
VI. CONCLUSION

In conclusion, research into platform strategies for scientific software communities is still needed. However, the provided architectural analysis does provide a starting point for potential implementation strategies. That being said, it should be reiterated that these are simply generalized guidelines, and that without proper analysis of each individual community, the appropriate architecture for the platform cannot be identified. Hopefully, with additional research, many of the problems found in scientific software can be eliminated, and the tendency to produce stable software is promoted rather than hindered.

REFERENCES

[1] Herbsleb, James H. Project Description SciSIP: The Scientific Software Network Map. NSF Grant Proposal. Spring 2011.
[2] Howison, James and James Herbsleb. CSCW 2011. March 19-23, 2011, ACM.
[3] Foster, I., Kesselman, C., Nick, J. and Tuecke, S. The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Argonne National Laboratory, 2002. URL: http://www.globus.org/alliance/publications/papers/ogsa.pdf
[4] Foster, Ian et al. The Open Grid Services Architecture, Version 1.0. Grid Forum, 2005. URL: http://www.gridforum.org/documents/GWDI-E/GFD-I.030.pdf
[5] Foster, Ian. The Grid: A New Infrastructure for 21st Century Science. Physics Today, 55(2):42-47, 2002.
[6] Foster, Ian. What is the Grid? A Three Point Checklist. GRIDToday, July 20, 2002.
[7] I. Foster, D. Gannon, H. Kishimoto, J. Von Reich. Open Grid Services Architecture Use Cases. Information Document, Global Grid Forum (GGF), October 28, 2004.
[8] S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman, T. Maguire, T. Sandholm, P. Vanderbilt, D. Snelling. Open Grid Services Infrastructure (OGSI) Version 1.0. Global Grid Forum Draft Recommendation, 6/27/2003.
[9] Mary Fran Yafchak and Mary Trauner. The Grid Technology Cookbook. Southeastern Universities Research Association (SURA) et al, 2006-8. URL: http://hv3.phys.lsu.edu:8000/cookbook/gtcb/index.php
[10] Science Grid. September 14, 2005. Picture from CERN in GridCafe. URL: http://www.interactions.org/sgtw/2005/0914/
[11] Newcomer, Eric and Greg Lomow. Understanding SOA with Web Services. Upper Saddle River, NJ: Pearson Education, Inc. 2005.
[12] Sriram Krishnan and Karan Bhatia. SOAs for Scientific Applications: Experiences and Challenges. Future Generation Computer Systems. 2009 April 1; 25(4): 466-473. doi:10.1016/j.future.2008.09.001.
[13] Lin, Cui et al. A Reference Architecture for Scientific Workflow Management Systems and the VIEW SOA Solution. IEEE Transactions on Services Computing. Vol. 2. No. 1. January-March 2009.
[14] Andrea Bosin, Nicoletta Dessi, Barbara Pes. Extending the SOA paradigm to e-Science environments. Future Generation Computer Systems, Volume 27, Issue 1, January 2011, Pages 20-31, ISSN 0167-739X, DOI: 10.1016/j.future.2010.07.003. URL: http://www.sciencedirect.com/science/article/B6V06-50JHBMY1/2/d29be0bba398b60a33734dd195949db4
[15] Zhuofeng Zhao, Jun Fang, Jing Cheng. CAFISE-S: An Approach to Deploying SOA in Scientific Information Integration. Web Services, 2008. ICWS '08. IEEE International Conference on, pp. 425-432, 23-26 Sept. 2008. doi: 10.1109/ICWS.2008.84. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4670204&isnumber=4670130
[16] Bradley Schmerl, David Garlan, Vishal Dwivedi, Michael Bigrigg and Kathleen M. Carley. SORASCS: A Case Study in SOA-based Platform Design for Socio-Cultural Analysis. In Proceedings of the 33rd International Conference on Software Engineering, Hawaii, USA, 2011.
[17] David Garlan, Kathleen M. Carley, Bradley Schmerl, Michael Bigrigg and Orieta Celiku. Using Service-Oriented Architectures for Socio-Cultural Analysis. In Proceedings of the 21st International Conference on Software Engineering and Knowledge Engineering (SEKE2009), Boston, USA, 1-3 July 2009.
[18] Kunszt, Peter Z. The Open Grid Services Architecture. A Summary and Evaluation. IT Division - Database Group, CERN, 1211 Geneva, Switzerland. April 16, 2002.
