I. INTRODUCTION
Organizations are established with the intent of profitability
[1]. To achieve and improve profitability, high-level business
goals must be established to position the organization for future
success. To disseminate these goals, an organization typically
pushes the goals downward through the business layers.
Meaningful and achievable strategies must be extracted from
the goals at each business layer.
This process of setting goals and then translating them into
objectives for the various organizational layers is designed to
provide top down control over the organization. The top layer
in this hierarchy of control is known as the strategic layer [1].
In this layer high-level business goals are established, and a
business strategy is devised to achieve these goals. The
business strategy will provide an overall direction for the
organization [1]. At the next layer of the organization, known
as the tactical layer, the business strategy must be translated
into tactics which various business units can employ to
contribute towards the strategy [1]. Finally, at the lowest level
of the organization control hierarchy, the operational layer, the
business units' tactics are transformed into actions which can
be accomplished by business processes in their day-to-day
operations [1].
II. PREDICTIVE ANALYTICS & ORGANIZATIONAL LAYERS
The organizational control flow begins with goal setting.
The process of establishing goals and the strategies to achieve
them requires a tremendous amount of insight. Today's
businesses have access to massive amounts of data
about the external environment, and the
ability to collect large amounts of data about their internal
operations. From this data, countless insights
can be drawn. The problem, however, is that this volume
of data is difficult to organize and analyze. Thus,
specialized fields have arisen to focus on organizing and
IV. SAS
SAS is a software suite originally developed at North
Carolina State University and released in 1972 [7]. Today it
comprises over 200 components [7]. It can mine, alter,
manage and retrieve data from a variety of sources and
perform statistical analysis on it [7]. SAS provides a graphical
point-and-click user interface for non-technical users and
more advanced options through the SAS programming
language.
A. Enterprise Miner
The SAS module most relevant to predictive analytics is
SAS Enterprise Miner. This module is focused
on the mining of data relationships and the creation of
accurate descriptive and predictive data models [8].
B. Predictive Analysis in Enterprise Miner
The process of data mining and analysis in Enterprise
Miner is described in five steps, abbreviated SEMMA, in Getting Started
with SAS Enterprise Miner 14.1 [8, page 1]:
Sample the data by creating one or more data sets. The
sample should be large enough to contain significant
information, yet small enough to process. This step includes
the use of data preparation tools for data import, merge,
append, and filter, as well as statistical sampling techniques.
Explore the data by searching for relationships, trends, and
anomalies in order to gain understanding and ideas. This step
includes the use of tools for statistical reporting and graphical
exploration, variable selection methods, and variable
clustering.
Modify the data by creating, selecting, and transforming the
variables to focus the model selection process. This step
includes the use of tools for defining transformations, missing
value handling, value recoding, and interactive binning.
Model the data by using the analytical tools to train a
statistical or machine learning model to reliably predict a
desired outcome. This step includes the use of techniques such
as linear and logistic regression, decision trees, neural
networks, partial least squares, LARS and LASSO, nearest
neighbor, and importing models defined by other users or even
outside SAS Enterprise Miner.
Assess the data by evaluating the usefulness and reliability
of the findings from the data mining process. This step
includes the use of tools for comparing models and computing
new fit statistics, cutoff analysis, decision support, report
generation, and score code management.
After the models have been created and assessed,
Enterprise Miner reports the models with the
highest degree of accuracy [8, page 2]. These models can then
be applied to new data to score the likelihood of the target
outcome [8, page 2].
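As a rough illustration of how the five SEMMA steps chain together, the following Python sketch walks a small synthetic data set through each step. It does not use SAS or Enterprise Miner. The function names, the toy data, and the simple threshold rule standing in for a model are all assumptions made for illustration.

```python
import random
import statistics

def sample(rows, n, seed=0):
    """Sample: draw a simple random sample of n rows without replacement."""
    return random.Random(seed).sample(rows, n)

def explore(rows):
    """Explore: summarize the input variable, ignoring missing values."""
    xs = [r["x"] for r in rows if r["x"] is not None]
    return {"mean": statistics.mean(xs), "missing": len(rows) - len(xs)}

def modify(rows, fill):
    """Modify: replace missing inputs with a fill value."""
    return [{**r, "x": fill if r["x"] is None else r["x"]} for r in rows]

def model(rows):
    """Model: fit a threshold rule placed midway between the class means."""
    pos = [r["x"] for r in rows if r["y"] == 1]
    neg = [r["x"] for r in rows if r["y"] == 0]
    threshold = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return lambda x: 1 if x >= threshold else 0

def assess(predict, rows):
    """Assess: fraction of rows classified correctly."""
    return sum(predict(r["x"]) == r["y"] for r in rows) / len(rows)

# Hypothetical toy data: y is 1 when i >= 50; every tenth x is missing.
data = [
    {"x": None if i % 10 == 0 else float(i), "y": int(i >= 50)}
    for i in range(100)
]

train = sample(data, 60)
summary = explore(train)
train = modify(train, summary["mean"])
predict = model(train)
accuracy = assess(predict, modify(data, summary["mean"]))
print(f"missing in sample: {summary['missing']}, accuracy: {accuracy:.2f}")
```

Each function maps to one SEMMA step, so a step can be swapped out (for example, a different imputation rule in modify) without changing the others.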
2) Explore
a) Association
The Association node is used to identify relationships
of interest between variables in the dataset [8, page 48].
b) Graph Explore
The Graph Explore node provides data visualizations
along with interactive exploration to aid in pattern and
trend discovery [8, page 49].
c) MultiPlot
The MultiPlot node will automatically create charts
of the input and target variables [8, page 49].
d) Path Analysis
The Path Analysis node is used to explore web logs
to determine the paths taken to navigate a website [8,
page 49].
e) StatExplore
The StatExplore node is used to examine the
statistical properties of an input data set [8, page 49].
f) Variable Clustering
The Variable Clustering node is used to replace
redundant and collinear variables in the data with a
single variable [8, page 49].
g) Variable Selection
The Variable Selection node is used to select which
variables are useful to the model for target prediction [8,
page 49].
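To make the exploration step more concrete, the Python sketch below computes basic statistical properties of an input variable and then keeps only the inputs that correlate with the target. This is not how Enterprise Miner implements its StatExplore or Variable Selection nodes. The function names, the sample values, and the correlation cutoff of 0.5 are assumptions chosen for illustration.

```python
import math
import statistics

def describe(values):
    """Basic statistical properties of one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_variables(inputs, target, min_abs_r=0.5):
    """Keep inputs whose absolute correlation with the target meets the cutoff."""
    return [
        name for name, xs in inputs.items()
        if abs(pearson(xs, target)) >= min_abs_r
    ]

# Hypothetical inputs: "spend" tracks the target closely, "noise" does not.
inputs = {
    "spend": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, 1.0, 2.0, 1.0, 2.0],
}
target = [2.1, 3.9, 6.2, 8.0, 9.8]

stats = describe(inputs["spend"])
selected = select_variables(inputs, target)
print(stats)
print(selected)
```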
3) Modify
a) Drop
The Drop node is used to remove variables from the
dataset or to hide metadata [8, page 49].
b) Impute
The Impute node is used to compute replacement values for
missing variable values and fill them into the dataset [8, page 50].
c) Transform Variables
The Transform Variables node is used to replace a
variable with a value that is a transformation of its
existing value [8, page 50].
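The three modification operations described above can be sketched in a few lines of Python. This is an illustration only, not the Enterprise Miner implementation. The helper names, the sample rows, mean imputation, and the base-10 logarithm are assumptions; the actual nodes offer other imputation and transformation methods.

```python
import math
import statistics

def drop(rows, names):
    """Drop: remove the named variables from every row."""
    return [{k: v for k, v in r.items() if k not in names} for r in rows]

def impute_mean(rows, name):
    """Impute: replace missing (None) values with the mean of observed values."""
    observed = [r[name] for r in rows if r[name] is not None]
    fill = statistics.mean(observed)
    return [{**r, name: fill if r[name] is None else r[name]} for r in rows]

def transform(rows, name, fn):
    """Transform: replace a variable with a function of its existing value."""
    return [{**r, name: fn(r[name])} for r in rows]

# Hypothetical rows with one missing income value.
rows = [
    {"id": 1, "income": 100.0},
    {"id": 2, "income": None},
    {"id": 3, "income": 10000.0},
]

rows = drop(rows, {"id"})                     # identifier carries no signal
rows = impute_mean(rows, "income")            # missing value becomes 5050.0
rows = transform(rows, "income", math.log10)  # compress the skewed scale
print(rows)
```

The order matters in this sketch: imputing before transforming means the fill value is the mean on the original scale, not on the log scale.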
4) Model
a) AutoNeural
The AutoNeural node is an automated tool that will
aid in discovering the optimal configuration for a neural
network model [8, page 50].
b) Decision Tree
The Decision Tree node is used to fit decision trees to
the data and includes auto-ranking of the input variables
[8, page 51].
c) Model Import
The Model Import node is used to import a model not
created in Enterprise Miner [8, page 52].
d) Neural Network
The Neural Network node is used to construct, train,
and validate multilayer, feed-forward neural networks
[8, page 52].
e) Regression
The Regression node is used to fit linear and logistic
regression models to the data [8, page 52].
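As a minimal example of what fitting a model means in this step, the Python sketch below fits a simple linear regression with one input by ordinary least squares. It illustrates the general technique rather than the Regression node itself. The function names and data points are assumptions made for illustration.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def predict(slope, intercept, x):
    """Apply the fitted line to a new input value."""
    return slope * x + intercept

# Hypothetical training data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
forecast = predict(slope, intercept, 5.0)
print(slope, intercept, forecast)
```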
5) Assess
a) Model Comparison
The Model Comparison node is used to compare
model nodes and their predictions. The node will
generate visualizations showing the usefulness of the
given models [8, page 53].
b) Score
The Score node is used to manage SAS scoring code
that is generated by the models. Scoring is the
generation of predicted values for a data set that might
not contain a target variable [8, page 53].
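The assessment step can be illustrated with a short Python sketch that compares two candidate models on validation data and then scores new rows that have no target variable. The candidate models, the validation rows, and the use of misclassification rate as the fit statistic are assumptions for illustration. The sketch does not reproduce the SAS scoring code that Enterprise Miner generates.

```python
def misclassification_rate(predict, rows):
    """Fraction of rows where the prediction differs from the target."""
    wrong = sum(predict(r["x"]) != r["y"] for r in rows)
    return wrong / len(rows)

def score(predict, rows):
    """Add a predicted value to rows that carry no target variable."""
    return [{**r, "predicted": predict(r["x"])} for r in rows]

# Hypothetical candidate models: two cutoff rules on a single input.
candidates = {
    "cutoff_3": lambda x: int(x >= 3),
    "cutoff_5": lambda x: int(x >= 5),
}

# Hypothetical validation data with known targets.
validation = [
    {"x": 1, "y": 0},
    {"x": 4, "y": 0},
    {"x": 6, "y": 1},
    {"x": 8, "y": 1},
]

rates = {
    name: misclassification_rate(m, validation)
    for name, m in candidates.items()
}
best = min(rates, key=rates.get)

# New data without a target variable is scored with the best model.
scored = score(candidates[best], [{"x": 2}, {"x": 7}])
print(rates, best, scored)
```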
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6] Guazzelli, A., Zeller, M., Lin, W., & Williams, G. (2009, May 1). PMML: An
Open Standard for Sharing Models. The R Journal, 1, 60-65.
[7] SAS. (n.d.). Retrieved December 7, 2015, from
https://en.wikipedia.org/wiki/SAS_(software)
[8] Getting started with SAS Enterprise Miner 14.1. (2015). Cary, North
Carolina: The SAS Institute.