
2017 IEEE International Conference on Big Data (BIGDATA)

Application of Big Data analytics in process safety and risk management

Pankaj Goel1,2, Aniruddha Datta1, M. Sam Mannan2

1 Department of Electrical and Computer Engineering,
2 Mary Kay O’Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering,
Texas A&M University, College Station, USA
Email: pankaj.goel@tamu.edu, datta@ece.tamu.edu, mannan@tamu.edu

Abstract—In recent years, there has been an increasing interest in the field of big data analytics. It has been established that there exist large amounts of data in the energy industry1. However, there is a need to develop methods combining domain knowledge to transform this data into meaningful information to return business intelligence. The existing literature on big data analytics focuses on applications in various fields such as healthcare, aviation industry, finance, energy industry, and supply chain. However, within the energy industry, the application of big data analytics in process safety and risk management is in the nascent stages. The objective of this study is to discuss the potential of big data analytics in the area of process safety and risk management in the energy industry. The paper outlines the systemic framework with different stakeholders, data sources, challenges, and discusses the benefits of big data analytics in process safety. Four case studies with different applications ranging from incident database analysis, predictive modeling for pump failures, dynamic risk mapping of operating plant, and image analysis to gain insights are demonstrated. It is concluded that the application of big data analytics would provide valuable insights for more informed policy, strategic, and operational risk decision-making leading to a safer and more reliable industry.

Keywords-Big Data Analytics; Process Safety; Risk Map; Incident Database; Fault Detection; Image Analysis

1 Energy industry includes oil & gas industry, petroleum refining, chemical manufacturing.

978-1-5386-2715-0/17/$31.00 ©2017 IEEE

I. INTRODUCTION

“Big Data” has transformed from a buzzword to a real value creator in recent years and is serving as a key enabler in boosting the performance of operations, economy, and businesses. Several countries and organizations have started various projects to harness the big data. In the United States, The Obama Administration launched the ‘Big Data Research and Development Initiative’ in 2012, and in 2016 the administration released “The Federal Big Data Research and Development Strategic Plan”. This highlights the emerging big data capabilities and provides guidance for developing or expanding federal big data research and development (R&D) plans [1], [2]. In China, Ministry of Industry and Information Technology (MIIT) have prepared a five year plan for developing big data infrastructure through standardized systems [3]. In Japan, big data is a key component of the national technological strategy since 2012. The United Nations (UN) established the “Global Pulse initiative” in 2009 to harness big data for development and humanitarian actions and published a report [4] highlighting the challenges and opportunities. According to [5], [6], a social, economic, and technical revolution has emerged around us, resulting in an exponential growth of data. This data is generated at different levels in the form of social media information, smart devices, Internet of Things (IoT), bank services, and reports etc. With the advancements in computing technologies, it is easier to store data (clouds, data warehouses), and draw insights with the help of tools such as artificial intelligence (AI), machine/deep learning, granular computing [7], cognitive computing, and computer vision. Big data has been defined by different users and selected definitions of big data are summarized in Table I. The attributes of big data are defined as 7 V’s and listed as follows [8], [9]:
• Volume: large amounts of data generated from devices.
• Variety: heterogeneity of data types, representation, and semantic interpretation.
• Velocity: data is generated at a rapid rate compared to the traditional systems and requires processing.
• Value: added value from the information extracted.
• Veracity: uncertainty, accuracy, and reliability of data.
• Variability: number of inconsistencies, variable data sources and data changes (dynamic).
• Valence: inter-connectedness, inter-relation.
This paper provides the basis of application of big data analytics in process safety that would provide valuable insights. This would result in more informed policy, strategic, and operational risk decision-making leading to a safer and more reliable industry. The paper is organized as follows: the applications in other industrial sectors and value created by big data analytics are described in Section II. Section III outlines the system framework with different stakeholders, data sources, challenges, and discusses the benefits of big data analytics in process safety. In Section IV, four different case studies in process safety and risk management and their results are explained. Section V concludes the paper and highlights future research areas and applications.

II. BIG DATA ANALYTICS AND APPLICATION

In addition to the attributes of big data mentioned in Section I, it is essential that mechanisms exist for visualization


and understanding of the information and relations between the data, and for the inference of meaningful information out of it, returning what is called business intelligence (BI). This requires data storage and management, hardware and software resources, appropriate domain knowledge, and new methods and technologies. Combining big data with analytics can provide a significant advantage to make timely and efficient decisions related to 1) cost, 2) time, 3) product development, and 4) optimization. A very large amount of data is captured from different sources (sensors, machines, applications, web, IoT) and stored by organizations in different formats (structured, semi-structured and unstructured). The data is captured, stored, and processed in batches or in real time with the help of algorithms or mechanical processes. Application of these methods varies across sectors, ranging from aviation, automotive industry, banking and capital investments, communications, energy, utilities and mining, government, health industry, insurance, retail, technology etc. It is important for these industries to make the most of the weak signals from several key data sources, both structured and unstructured, and deliver a real-time impact for easy, quick and effective decision making. Organizations and industries are exploring data analysis methods to discover insights and prepare personalized solutions to the challenges faced [14]. Some of the potential key areas and the big data applications related to them are highlighted in Table II.

Table I
DEFINITIONS OF BIG DATA
Organization | Definition
NIST [10] | “Big Data refers to the inability of traditional data architectures to efficiently handle the new datasets.”
International Data Corporation (IDC) [11] | “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”
Oracle [12] | “Big data is the data characterized by four key attributes: volume, variety, velocity and value.”
IBM [13] | “Big data is the data characterized by three attributes: volume, variety and velocity.”

Table II
BIG DATA APPLICATION BY INDUSTRY
Energy industry | Healthcare | Supply chain | Finance | Customer focused
Regulation and policy | Clinical decision support | Supply chain optimization | Advanced forecasting | Customer segmentation
Frauds, cybersecurity, risk management | Individual analytics | Customer satisfaction | Governance, risk and compliance | Brand & Sentiment analysis
Operational performance, optimization | Personalized medicine | Product reviews, profitability | Financial performance, frauds | Pricing, Profitability, satisfaction

III. PROCESS SAFETY BIG-DATA MANAGEMENT FRAMEWORK

A. System introduction
Considering the significance of big data in process safety, this study establishes the Process Safety Big Data Management Framework (PSBDMF). As shown in Figure 1, this framework is a two-pronged approach comprising Challenges and Elements. The challenges, which are policy-related, strategic, and operational, act as Drivers for big data analytics in process safety and risk management. The elements, which include data, stakeholders, methods, and technology, act as Enablers. Each of these elements has sub-elements that contribute to the overall process of PSBDMF.

Figure 1. Process Safety Big Data Management Framework

Figure 2. Process safety data sources

B. Process Safety Data
Within the energy industry, data is generated continuously from various sources and is available in different formats. Process safety related data can be broadly categorized into three different levels as depicted in Figure 2.
• Data collected by regulatory agencies such as Department of Transportation (DoT), Occupational Safety and Health Administration (OSHA), United States Environmental Protection Agency (USEPA) and similar agencies in other countries. Some examples of databases are incident statistics, statutory fines;
• Data collected by industry consortiums such as American Petroleum Institute (API), Oil and Gas Producers Association (OGP), and many more. Some examples of databases are metrics system, injury records, production data;
• Data collected by organizations (manufacturing facilities) such as chemical plants, oil and gas exploration units etc. These databases are further classified into seven areas based on the source and type of data. These are as follows:
– Historian: process parameters, production data, alarm logs, machine monitoring, system fault records.
– Design data: process flow diagrams (PFDs), piping and instrumentation diagrams (P&IDs), plant layouts, standard operating procedures (SOPs), instrument and equipment data-sheets.
– Operational data: work permits, mechanical integrity and quality assurance data.
– Centralized Maintenance Management System (CMMS): maintenance and reliability records, risk-based inspections and field visit records.
– Laboratory Information Management System (LIMS): quality reports, lab test reports.
– Process Safety Management (PSM) system: audit reports, Learning From Incident (LFI) communications, training records, safety culture assessments.
– Process safety studies: process hazard analysis (PHA)/hazard and operability studies (HAZOP), emergency response plan evaluation studies, incident investigation reports.
Process safety data can be of two types, static and dynamic, and of two classifications, structured and unstructured, as shown in Figure 3. Static data are data or reports generated over a period that remain fixed for a considerable amount of time, while dynamic data change with time and are continuous in nature. Structured data refers to data in table or specific report formats, whereas unstructured data refers to data primarily expressed as text.

Figure 3. Types and classification of process safety data

C. Process Safety Challenges
Many authors have established that one major challenge in process safety is that incidents continue to occur [15],[16],[17],[18]. Some reasons noted in literature are failure to learn from incidents [19], challenges in alarm management and decision making [20], inadequate tools to quantify social (human & organizational) aspects etc. [21]. Also, there has been an increase in the development of different process safety and risk assessment methodologies and tools over the past few decades (1970-2020) [15]. From the data application viewpoint, the authors of this study believe that in each of those development stages, data was collected and utilized in some form. However, a systematic approach has not been established to implement process safety big data management. This gap can be filled with the incorporation of PSBDMF in addition to current risk assessment and mitigation methods. Some of the critical questions that most risk assessors deal with are: what is the right format for data collection? what data are significant to collect? are our facilities becoming any safer? which metrics have an impact on safety? can we analyze the health of safety barriers? and what will be an effective maintenance schedule? The incorporation of PSBDMF will address the above-mentioned questions at different levels. Challenges related to these questions can be categorized into policy, strategic, and operational as follows (see Figure 4).
• Policy: This refers to the policy or rule making related challenges. These can be addressed by the regulatory agencies. Analysis of current databases can help infer knowledge on which other data may be relevant, or effective usage of collected data, or prioritizing the inspection schedules.
• Strategic: This refers to the industry consortiums such as API and OGP, which collect data for industrial sectors. Analysis of these databases can help in the identification of robust metrics that influence the process safety significantly, or improvement of data collection and management structure, or improvement in monitoring with insights on new metrics.
• Operational: This refers to the manufacturing plants/facilities, which collect a wealth of data within the organization. Analysis of these databases can help in the identification of weak signals, or evaluation of the effectiveness of safety barriers, or recognition of optimal maintenance schedule, or barriers prioritization and resource allocation for emergency response based on dynamic risk profiles.
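
The data levels and the static/dynamic and structured/unstructured classifications described above can be sketched as a small catalog. This is only an illustrative data structure, not part of PSBDMF itself: the class names, the `select` helper, and in particular the dynamic/structured flags assigned to each source are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    REGULATORY = "regulatory agency"      # policy-level challenges
    CONSORTIUM = "industry consortium"    # strategic-level challenges
    FACILITY = "manufacturing facility"   # operational-level challenges


@dataclass(frozen=True)
class DataSource:
    name: str
    level: Level
    dynamic: bool      # True: changes continuously; False: static
    structured: bool   # True: tables/fixed reports; False: mainly text


# Illustrative entries only; the two boolean flags are assumptions.
CATALOG = [
    DataSource("incident statistics", Level.REGULATORY, False, True),
    DataSource("injury records", Level.CONSORTIUM, False, True),
    DataSource("historian process parameters", Level.FACILITY, True, True),
    DataSource("alarm logs", Level.FACILITY, True, True),
    DataSource("incident investigation reports", Level.FACILITY, False, False),
]


def select(catalog, level=None, dynamic=None, structured=None):
    """Return the sources matching every filter that is not None."""
    return [
        s for s in catalog
        if (level is None or s.level is level)
        and (dynamic is None or s.dynamic == dynamic)
        and (structured is None or s.structured == structured)
    ]


# Dynamic facility-level data, e.g. candidates for real-time risk mapping.
realtime_sources = [s.name for s in select(CATALOG, level=Level.FACILITY, dynamic=True)]
```

A catalog of this kind makes it explicit which sources feed which level of challenge before any analytics is attempted.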

Figure 4. Challenges in process safety

D. Data analytics application and benefits
As explained in Section III.B, there exist huge amounts of data that is being generated related to process safety at different levels-regulatory agencies, industry consortiums, and plant/facility. Based on data science, this raw data needs to be subjected to the process of pre-processing or cleaning in order to organize it into information. The data in the form of information or developed databases is then available for use by analysts to extract value from it and convert it into intelligence. These then lead to appropriate actions and support decision-making. Figure 5 highlights PSBDMF items for various phases to support improved process safety and risk management. These phases are established following the [22] model with steps: data understanding, data preparation, modeling, evaluation, and deployment. After the first phase of collection of process safety databases, the following phases help in answering process safety questions:
• Descriptive analytics: deals with determining what happened and converting the data into information such as pattern charts or histograms.
• Diagnostic analytics: refers to data presentation to understand why something happened or underlying causes for undesirable situations or events.
• Predictive analytics: refers to developing models on existing datasets to extract information on what will happen or predict future trends.
• Prescriptive analytics: refers to support decision-making or what should be done by use of advanced analytics.

Figure 5. PSBDMF phases

Some of the significant benefits of implementing PSBDMF are as follows:
• Dynamic evaluation of risk profile of a facility with the support of real time visualization.
• Safer and reliable operations by incorporation of insights from data analytics enabling optimal maintenance schedules to avoid unplanned shutdowns.
• Resource allocation towards risk reduction and mitigation utilizing information on risk ranking for various areas in the facility.
• Effective action items from trending and analysis of process safety indicators.
• Improve monitoring by the introduction of new metrics and/or revision of existing metrics.
• Correlation development and use of detailed analysis (structured and unstructured data) to improve audits, incident investigations, hazard evaluation studies.
• Development of visualization dashboards for personnel from different levels within the organization.
For the above mentioned challenges and benefits, some of the application examples are described in Section IV as case studies.

IV. CASE STUDIES

A. Case study I: Pipeline and Hazardous Materials Safety Administration (PHMSA) incident database analysis

Table III
DETAILS OF PHMSA DATASETS USED FOR ANALYSIS
Details | Dataset-A (2002-2009) | Dataset-B (2010-2017)
Number of data points | 3029 | 2969
Number of missing values | 81 | 0
Number of states for reported incidents | 48 | 33
Property damage range for reported incidents (millions) | $0 to $150 | $0 to $840
Number of unique commodities | 4 | 5
Number of unique causes of incidents reported | 8 | 8
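
A summary of the kind reported in Table III is a typical first descriptive-analytics step. The sketch below shows one way it might be computed in plain Python; it is not the authors' code. The field names (`state`, `commodity`, `cause`, `property_damage_usd`), the toy records, and the choice to count records with any missing field are assumptions for illustration and do not reflect the actual PHMSA schema.

```python
# Hypothetical field names; the real PHMSA export uses its own column names.
REQUIRED_FIELDS = ("state", "commodity", "cause", "property_damage_usd")


def summarize_incidents(records):
    """Summarize a list of incident dicts in the spirit of Table III."""
    # A record is "complete" when none of the required fields is missing.
    complete = [
        r for r in records
        if all(r.get(field) is not None for field in REQUIRED_FIELDS)
    ]
    damages = [r["property_damage_usd"] for r in complete]
    return {
        "data_points": len(records),
        "records_with_missing_values": len(records) - len(complete),
        "states": len({r["state"] for r in complete}),
        "unique_commodities": len({r["commodity"] for r in complete}),
        "unique_causes": len({r["cause"] for r in complete}),
        "damage_range_usd": (min(damages), max(damages)) if damages else None,
    }


# Toy records, invented for the example.
toy = [
    {"state": "TX", "commodity": "crude oil", "cause": "corrosion",
     "property_damage_usd": 120000},
    {"state": "LA", "commodity": "natural gas", "cause": "excavation damage",
     "property_damage_usd": 0},
    {"state": "TX", "commodity": "crude oil", "cause": None,
     "property_damage_usd": 5000},
]
summary = summarize_incidents(toy)
```

The same function could be run separately on each dataset to produce the two columns of such a table.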

Most of the organizations in the oil & gas industry have implemented process safety management, which involves capturing the details related to near-misses or incidents along with other PSM elements. Similarly, several federal agencies such as DoT capture similar information related to their jurisdictions. From these databases important trends, areas of concern and improvement methods can be derived to reduce the losses due to downtime, injuries, property damage and environmental impact. A detailed analytics plan can be used to address the challenges and derive the information for the stakeholders. One of the main challenges during analysis is how to select specific variables from hundreds of variables and draw meaningful conclusions [23]. To demonstrate the application of such analysis on an incident reporting database, we used a publicly available dataset of HAZMAT incidents from the PHMSA website [24], processed it to build a predictive model, and validated it with the help of Python [25] and IBM SPSS Modeler [26]. The summary of the database is described in Table III. Two datasets, A and B, have been used in this analysis. In general, missing data is imputed during the analysis; however, in this case it is not possible since the database is an incident database, based on actual scenarios and investigations. Hence, the missing values observed were discarded. For ease in visualization the following changes were made to the datasets:
1) The description for commodity classification was made short.
2) The description of commodities and causes for both datasets were made universal.

Figure 6. No. of accidents by commodity

Figure 7. Choropleth incident map (developed from PHMSA database: 2002-2017)

Data analytics methods were used to analyze the data, and machine learning techniques were used to categorize and predict the significance of the incident. This was based on the available data, model generation and application of the model. For this purpose, one of the significance criteria was used, based on property loss >= $50,000 USD [24]. First, a descriptive analysis was performed in Python to understand the available datasets, the nature of the incidents, and the types of commodities involved, as illustrated in Figure 6. To categorize the states based on number of incidents, a choropleth map, shown in Figure 7, was prepared using ‘Python’ [25] and ‘Plotly’ [27]. To prepare a predictive model for the incident significance classification, first the dataset from 2002-2009 was used to train the model and generate various decision rules with IBM SPSS Modeler. The details observed from the different methods used for model generation and deployment are shown in Table IV, and a chi-squared automatic interaction detection (CHAID) tree is shown in Figure 8. Out of the mentioned models, classification and regression tree (C&RT) was selected as it was providing the predictions at a higher accuracy according to the significance rule defined prior to the study.

Table IV
MODELS TESTED ON DATASETS
Model | Lift | Overall Accuracy
No property loss selected
CHAID | 1.916 | 75.91
Random tree | 1.882 | 72.8
C&R tree | 1.831 | 75.916
Property loss selected
CHAID | 3.333 | 96.4
Random tree | 3.333 | 96.4
C&R tree | 3.333 | 100

The C&RT results are shown in Figures 9 and 10 for dataset A and dataset B respectively. The predicted significance result for dataset B is shown in Figure 11. It is observed that for dataset B, the generated model based on C&RT is able to predict the significance of the event with 96% accuracy. As there was no missing information the model was able to predict with great accuracy. Such studies and models are useful in identifying the areas of concern

both from the organization’s and regulatory authorities’ perspective. At the same time, such systems can be used to interpret the historical data available as reports, notes and copies to understand the importance of reporting and learning from incidents. Similar results could be further utilized by agencies to allocate resources optimally towards prioritization of inspections and understanding the key focus areas.

Figure 8. CHAID tree for dataset A
Figure 9. Tree for dataset A
Figure 10. Tree for dataset B
Figure 11. Predicted significance of dataset B

B. Case study II: Predictive model for equipment failure
A key element in Process Safety Management is Mechanical Integrity (MI), which is associated with equipment reliability and maintenance. Equipment reliability has been studied in literature due to its significance in avoiding mechanical problems [28], [29]. Predictive maintenance is an important component in the MI system and plays a key role in improving reliability to reduce the probability of unexpected shutdowns, production losses due to equipment downtime, and safety incidents. Prediction of these problems would improve operations and support effective maintenance. Various kinds of operational data for equipment, such as vibration and other condition monitoring data, planned or unplanned maintenance event data, fault occurrences, and failures, are collected from different systems such as CMMS, historian etc. in the manufacturing facility. With the use of this historical data and application of data analytics, model based failure predictions of equipment or equipment parts can be made.

Figure 12. Extracted features snapshot

Table V
DETAILS OF FEATURES EXTRACTED FOR THE DATASET
Variable | Extracted features
Vibration | 3-h and 24-h rolling mean and standard deviation
Voltage | 3-h and 24-h rolling mean and standard deviation
Faults | Number and type of faults
Maintenance | Days since last replacement of pump parts
Failures | Actual failures

Table VI
CONFUSION MATRIX
Actual \ Predicted | None | Part 1 | Part 2
None | 964 | 0 | 3
Part 1 | 1 | 2 | 0
Part 2 | 3 | 0 | 3

Table VII
EVALUATION METRICS FOR CONFUSION MATRIX
Class | Precision | Recall | F
None | 0.99587 | 0.99689 | 0.99638
Part 1 | 1 | 0.67 | 0.8
Part 2 | 0.5 | 0.5 | 0.5
Overall accuracy: 0.99283
Kappa score: 0.58539
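
The per-class precision, recall and F values, the overall accuracy and the kappa score reported in Table VII follow directly from the counts in Table VI. The sketch below recomputes them in plain Python. It assumes that rows are actual classes and columns are predicted classes, which is the orientation consistent with the reported precision for ‘None’; the function names are illustrative, and this is not the authors' code.

```python
LABELS = ["None", "Part 1", "Part 2"]
# Rows: actual class, columns: predicted class (counts from Table VI).
CM = [
    [964, 0, 3],
    [1, 2, 0],
    [3, 0, 3],
]


def per_class_metrics(cm, labels):
    """Return {label: (precision, recall, F)} for a square confusion matrix."""
    n = len(labels)
    metrics = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f_score)
    return metrics


def accuracy_and_kappa(cm):
    """Return (overall accuracy, Cohen's kappa)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / total ** 2
    return observed, (observed - expected) / (1 - expected)


metrics = per_class_metrics(CM, LABELS)
accuracy, kappa = accuracy_and_kappa(CM)
```

The F value is the harmonic mean of precision and recall, so for ‘None’ it must lie between 0.99587 and 0.99689; the computation gives about 0.99638. The low kappa relative to the accuracy reflects the strong class imbalance: almost all test samples belong to the ‘None’ class.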
A case study of a pump is used to demonstrate the application of data analytics for failure prediction. The data types used for this study are: time-series data consisting of hourly vibration and voltage, fault logs (vibration, voltage) for two parts (rotor, motor), planned and unplanned maintenance records, and failures of the parts. Synthetic datasets were generated following certain statistical distributions in the programming language R for the year 2016 [30]. For these datasets, complexities were added and certain reasoning was followed to make them similar to real situations. For example, outliers were placed in the time series vibration and voltage data; actual failures were assigned for the higher number of days since last replacement of a part. Features to predict the health of the pump are extracted from these data sources by using the time-stamps from the time-series data. Table V shows the different extracted features used to develop the prediction model [31],[32],[33]. Figure 12 provides the snapshot of all features that are incorporated in the training formula.
For the modeling process, the features dataset is divided into training and testing datasets. The training dataset comprises the first eight months of data and the testing dataset comprises the last four months. Figures 13 and 14 illustrate the results of the pump predictive maintenance model for three classes: ‘None’ represents no failure, ‘Part 1’ represents failure of the rotor, and ‘Part 2’ represents failure of the motor. Table VI represents the confusion matrix, which gives the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) for the classes. It is observed from this matrix that 964 of the 967 ‘None’, 2 of the 3 ‘Part 1’, and 3 of the 6 ‘Part 2’ classes were predicted correctly. Table VII provides metrics such as precision, recall, F factor, overall accuracy, and kappa score to interpret the robustness of the model. It is observed that for prediction of ‘Part 2’ failures, the model does not yield good results. This could be due to the use of synthetic data in the model; real datasets would provide a more accurate analysis and prediction of the failures. Application of similar studies would aid in the early recognition of failures, give operations time to adapt, reduce maintenance cost through a good strategy, and improve safety by reducing unexpected failures.

Figure 13. Actual vs Predicted (Part 1, Part 2 class)
Figure 14. Actual vs Predicted (None class)

C. Case study III: Dynamic risk mapping
Manufacturing facilities such as chemical plants and offshore platforms have been recognized as complex socio-technical systems by researchers [15]. These facilities have various sub-systems and/or components that have complex interactions, which result in a changing operations environment. This affects the risk profile of the facilities and hence it is important to study the emergent behavior of these interactions within the complex systems. There is a relatively small body of literature [34] that is concerned with dynamic risk profiles due to emergent behavior of complex process systems using big data analytics. In this paper, a systematic methodology is described and developed. For this purpose, the process unit system is reproduced as a system of layers as illustrated in Figure 15. Based on this system of layers, the dynamic risk profile is obtained by the incorporation of the wealth of data generated in the facility from various sources such as historian, CMMS, operational data, and PSM system. The evaluation of dynamic risk involves calculation of the initiating event frequency (F1), the operations hazard factor (OHF) (F2), and the final probability of failure on demand (F3) to give the final risk as R = F1 × F2 × F3. As in the layer of protection analysis (LOPA) approach, the consequence is assumed to be fixed.
Figure 16 presents a flowchart outlining the procedure and data sources for dynamic risk analysis calculations. In the current study, guidelines to evaluate the operations hazard factor and penalty factors for barriers are based on literature sources [35], [36]. However, with real plant data, these could be assessed from safety culture survey data, PSM system studies, audit reports and more. In this case study, an example of a modified accident scenario is considered to analyze and map the dynamic risk profile. The following scenario with two different cases is considered:
Scenario: Modified scenario based on sugar dust explosion

at Imperial Sugar manufacturing facility [37],[38].
Case1: Existing or conventional method
Case2: Proposed method with consideration of operations hazard factor (hot work) and penalty factors for safety barriers (no maintenance).
The methodology outlined in the flowchart is applied and Figures 17 and 18 show the input and output screens for the two cases. It is evident that the risk level changed for case2 with inclusion of dynamic components in plants such as OHF (hot work) and penalty of no maintenance. This type of dynamic risk profile analysis would support more informed operational decisions, improved maintenance plans, work execution strategies, and overall safer and more reliable operations.

Figure 17. Dynamic Risk Analysis Case 1

Figure 18. Dynamic Risk Analysis Case 2
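
The risk expression R = F1 × F2 × F3 used in this case study can be illustrated with a short sketch that contrasts the two cases. All numbers below are invented for illustration and are not the values used in Figures 17 and 18. The way a penalty factor is applied here (multiplying a barrier's probability of failure on demand, capped at 1) and the use of OHF = 1 for the conventional case are assumptions, not the authors' stated method.

```python
from math import prod


def final_pfd(barrier_pfds, penalties=None):
    """F3: product of barrier PFDs, each scaled by its penalty factor.

    Assumption: a penalty factor >= 1 degrades a barrier by multiplying its
    probability of failure on demand, capped at 1.0.
    """
    if penalties is None:
        penalties = [1.0] * len(barrier_pfds)
    return prod(min(1.0, pfd * k) for pfd, k in zip(barrier_pfds, penalties))


def dynamic_risk(f1, ohf, barrier_pfds, penalties=None):
    """R = F1 * F2 * F3 with the consequence held fixed."""
    return f1 * ohf * final_pfd(barrier_pfds, penalties)


# Hypothetical inputs: initiating event frequency per year and two barriers.
F1 = 0.1
BARRIER_PFDS = [0.1, 0.01]

# Case 1: conventional method, no operations hazard factor or penalties.
case1 = dynamic_risk(F1, ohf=1.0, barrier_pfds=BARRIER_PFDS)

# Case 2: hot work raises the OHF; the first barrier has missed maintenance.
case2 = dynamic_risk(F1, ohf=2.0, barrier_pfds=BARRIER_PFDS,
                     penalties=[5.0, 1.0])
```

With these made-up inputs the estimated frequency rises by a factor of ten between the two cases, which is the kind of shift in risk level a dynamic risk map is meant to expose.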

Figure 15. Big Data Dynamic Risk Framework

Figure 16. Dynamic Risk Analysis Procedure

D. Case study IV: Application of image analysis
As noted in Section II, with the advancement in IoT, digitization and use of handheld devices, the recording and capturing of unstructured data (images and videos) has increased significantly in manufacturing plants. During normal field visits, operators use these devices to enter the details related to the health of the equipment, instruments, process streams etc. This has resulted in mobility and real-time insights into the plant for the managers and other concerned engineers. They can access this information through a dedicated application either on their phone or computer at any global location. Image analysis can be used as a tool to assist and provide better insights for decision making by using images or videos (unstructured data type). One such example is the use of seismic imaging techniques to capture seismic image data and prepare 3D maps to improve prediction and mitigate the risk. Such analysis is performed at a higher level of the organization to improve productivity and make business decisions. Another application of image analysis is comparing two images captured at different times to understand the physical changes occurring in the system, equipment or instrument over a period of time. For this purpose, a dedicated application can be created, where images taken at different times can be compared to show the differences,

data, identify changes, and provide directions for the users to
take appropriate actions. This can reduce the overall analysis
time, action time and increase efficiency, productivity and
availability of the systems.

Figure 19. Original image [39]
Figure 20. Analyzed image

which may or may not be visible during the routine operational visits. One such case is described in this example. Two images representing a plant condition, an electrical panel in this case, are selected for this analysis. The analysis is conducted with code written in Python using the OpenCV library [40], [41], the scikit-image package [42], and the imutils package. Using this method, we are able to determine whether the two images are identical and to locate any differences by their (x, y) coordinates. The code compares the original image (Figure 19) with the modified image and highlights the observed changes with red rectangles in Figure 20. The quality of the images is measured with the Structural Similarity Index (SSIM). SSIM is an image quality metric used to assess the visual impact of three characteristics in an image: luminance, contrast, and structure. The overall index is a multiplicative combination of the three terms. The SSIM value lies in [-1, 1], where 1 represents a perfect match of the images [43].

$$\mathrm{SSIM}(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma}$$

where

$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$$

and $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$, and $\sigma_{xy}$ are the local means, standard deviations, and cross-covariance of images x and y. If $\alpha = \beta = \gamma = 1$ (the default exponents) and $C_3 = C_2/2$ (the default selection of $C_3$), then the index simplifies to:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

The SSIM value obtained for this analysis is 0.9942, which indicates that there were some differences between the two images, as observed and highlighted correctly by the code. The code prepared is specific to these images and can be modified for other cases or inputs. With the help of similar tools, several specific applications pertaining to image analysis and video analysis can be generated to identify the information hidden in the historical and current

V. CONCLUSION

The application of big data analytics in process safety and risk management is evolving. Its application would provide valuable insights for more informed policy, strategic, and operational risk decision-making, leading to a safer and more reliable industry. This paper represents a beginning in gathering process safety related data and harnessing the value of the data collected to improve process safety at these facilities. A systemic framework on process safety big data, called PSBDMF, is presented, and the various sources and types of data, along with the challenges that can be solved using big data analytics, are described. Large amounts of data are being, and will continue to be, generated and collected in this area at the three levels of PSBDMF: regulatory, industry consortiums, and manufacturing facilities. The challenge is to develop ideas and methods for analyzing data for detecting abnormal situations, optimizing processes, benchmarking performance, and preventing catastrophic failures. In this paper, four case studies on predictive modeling for pump failures, incident database analysis, image analysis to gain insights, and dynamic risk mapping of the plant were presented. These types of applications can be further developed into mature models and methods. Further work needs to be done to explore and address the following areas:

• Modifications needed to the databases collected by regulatory authorities to make the data more meaningful and useful.
• Improvements in the data collection and management structure for various consortiums in order to enable data sharing for analysis.
• Identification of correlations between incident statistics, plant conditions, and financial metrics for optimal allocation of resources.
• Identification of the scope for developing new policies or regulations based on patterns and trends across industrial sectors.
• Preparation of dynamic risk dashboards for manufacturing facilities to improve operating practices and detect emerging instrument and equipment failures.

VI. ACKNOWLEDGMENTS

The Mary Kay O’Connor Process Safety Center supported this research. The authors acknowledge receiving useful suggestions from Mr. Ramesh Desabhotla, an oil industry professional, and Prerna Jain, a doctoral student at Texas A&M University.

REFERENCES

[1] M. K. (2016) Administration issues strategic plan for big data research and development.
[2] N. NITRD et al., “The federal big data research and development strategic plan,” 2016.
[3] Xinhua. (2017) China to manage big data through standardized system.
[4] E. Letouzé et al., “Big data for development: Challenges & opportunities, New York: UN Global Pulse (white paper): Big data for development: Opportunities & challenges (2012),” Retrieved on, vol. 13, 2016.
[5] D. Maltby, “Big data analytics,” in 74th Annual Meeting of the Association for Information Science and Technology (ASIST), 2011, pp. 1–6.
[6] L. Chiang, B. Lu, and I. Castillo, “Big data analytics in chemical engineering,” Annual Review of Chemical and Biomolecular Engineering, no. 0, 2017.
[7] A. Skowron, A. Jankowski, and S. Dutta, “Interactive granular computing,” Granular Computing, vol. 1, no. 2, pp. 95–113, 2016.
[8] P. Bellini, M. Di Claudio, P. Nesi, and N. Rauch, “Tassonomy and review of big data solutions navigation,” Big Data Computing To Be Published 26th July, 2013.
[9] Y. Demchenko, P. Grosso, C. De Laat, and P. Membrey, “Addressing big data issues in scientific data infrastructure,” in Collaboration Technologies and Systems (CTS), 2013 International Conference on. IEEE, 2013, pp. 48–55.
[10] W. L. Chang, “NIST big data interoperability framework: Volume 1, definitions,” Special Publication (NIST SP)-1500-1, 2015.
[11] D. IDC-Vesset, B. Woo, H. Morris, R. Villars, G. Little, J. Bozman, L. Borovick, C. Olofson, S. Feldman, S. Conway et al., “Market analysis–worldwide big data technology and services 2012-2015 forecast,” IDC Analyze the Future, vol. 1, pp. 1–34, 2012.
[12] J.-P. Dijcks, “Oracle: Big data for the enterprise,” Oracle White Paper, 2012.
[13] IBM. (2017) Bringing big data to the enterprise. [Online]. Available: http://www-01.ibm.com/software/in/data/bigdata/
[14] A. Oussous, F.-Z. Benjelloun, A. A. Lahcen, and S. Belfkih, “Big data technologies: A survey,” Journal of King Saud University-Computer and Information Sciences, 2017.
[15] P. Jain, H. J. Pasman, S. P. Waldram, W. J. Rogers, and M. S. Mannan, “Did we learn about risk control since Seveso? Yes, we surely did, but is it enough? An historical brief and problem analysis,” Journal of Loss Prevention in the Process Industries, 2016.
[16] M. S. Mannan, O. Reyes-Valdes, P. Jain, N. Tamim, and M. Ahammad, “The evolution of process safety: current status and future direction,” Annual Review of Chemical and Biomolecular Engineering, vol. 7, pp. 135–162, 2016.
[17] MARSH, “The 100 largest losses 1974-2015.”
[18] P. Jain, A. M. Reese, D. Chaudhari, R. A. Mentzer, and M. S. Mannan, “Regulatory approaches-safety case vs US approach: Is there a best solution today?” Journal of Loss Prevention in the Process Industries, vol. 46, pp. 154–162, 2017.
[19] A. Hopkins et al., Failure to learn: the BP Texas City refinery disaster. CCH Australia Ltd, 2008.
[20] P. Goel, A. Datta, and M. S. Mannan, “Industrial alarm systems: Challenges and opportunities,” Journal of Loss Prevention in the Process Industries, vol. 50, pp. 23–36, 2017.
[21] P. Jain, H. J. Pasman, S. Waldram, E. Pistikopoulos, and M. S. Mannan, “Process resilience analysis framework (PRAF): A systems approach for improved risk and safety management,” Journal of Loss Prevention in the Process Industries, 2017.
[22] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, and R. Wirth, “CRISP-DM 1.0 step-by-step data mining guide,” 2000.
[23] S. Anand, N. Keren, M. J. Tretter, Y. Wang, T. M. O’Connor, and M. S. Mannan, “Harnessing data mining to explore incident databases,” Journal of Hazardous Materials, vol. 130, no. 1, pp. 33–41, 2006.
[24] PHMSA. (2017). [Online]. Available: https://www.phmsa.dot.gov/
[25] G. Van Rossum et al., “Python programming language.” in USENIX Annual Technical Conference, vol. 41, 2007, p. 36.
[26] I. S. Modeler, “14.2 algorithms guide,” IBM Corporation, 2011.
[27] P. T. Inc. (2015) Collaborative data science. Montréal, QC. [Online]. Available: https://plot.ly
[28] F. I. Khan and M. M. Haddara, “Risk-based maintenance (RBM): a quantitative approach for maintenance/inspection scheduling and planning,” Journal of Loss Prevention in the Process Industries, vol. 16, no. 6, pp. 561–573, 2003.
[29] M. Čepin, “Optimization of safety equipment outages improves safety,” Reliability Engineering & System Safety, vol. 77, no. 1, pp. 71–80, 2002.
[30] R. C. Team, “R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2014,” 2014.
[31] H. Wickham, ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009. [Online]. Available: http://ggplot2.org
[32] scales: Scale functions for visualization. [Online]. Available: https://cran.r-project.org/web/packages/scales/index.html
[33] Statisticat and LLC., LaplacesDemon: Complete Environment for Bayesian Inference, 2016, R package version 16.0.1. [Online]. Available: http://www.bayesian-inference.com/software
[34] M. Neill et al., “An integrated approach to operational risk management–the role of process safety management,” in SPE Health, Safety, Security, Environment, & Social Responsibility Conference-North America. Society of Petroleum Engineers, 2017.
[35] T. Whipple and R. Pitblado, “Applied risk-based process safety: A consolidated risk register and focus on risk communication,” Process Safety Progress, vol. 29, no. 1, pp. 39–46, 2010.
[36] T. Aven, S. Hauge, S. Sklet, and J. E. Vinnem, “Methodology for incorporating human and organizational factors in risk analysis for offshore installations,” International Journal of Materials & Structural Reliability, vol. 4, no. 1, pp. 1–14, 2006.
[37] N. Khakzad, F. Khan, and P. Amyotte, “Dynamic risk analysis using bow-tie approach,” Reliability Engineering & System Safety, vol. 104, pp. 36–44, 2012.
[38] CSB, “Imperial Sugar dust explosion and fire final investigation report,” 2009.
[39] Electric panel. [Online]. Available: https://www.flickr.com/photos/scottbb/202290560
[40] The OpenCV Reference Manual, 2nd ed., Itseez, April 2014.
[41] Itseez, “Open source computer vision library,” https://github.com/itseez/opencv, 2015.
[42] F. Boulogne, J. D. Warner, and E. Neil Yager, “scikit-image: Image processing in Python,” 2014.
[43] Mathworks. (2017) Structural similarity index (SSIM) for measuring image quality.
