
Introduction and Overview of SAP Data Services
Hello Everyone,
Nowadays almost all organizations depend on multiple applications to run their day-to-day business, and they also want to see what is happening in their organizations at a high level as well as at the smallest level of detail.
When the organization's data resides in multiple source systems, building a data warehousing system on top of them is difficult, because data integration and transformation play a major role and are often complex when multiple sources are involved.

The below image represents a data warehouse system landscape with multiple source systems.
Data integration and transformations can be performed using database programming languages like SQL and PL/SQL; however, it is expensive to manage and maintain such a landscape. This is where ETL (Extraction, Transformation and Loading) tools play a major role in the industry. These tools are specifically designed to provide a single platform where developers can build the transformation logic and administrators can easily maintain the system.
There are many competing tools in this sector, some of which are Informatica, SAP Data Services, Cognos, DataStage, SSIS (SQL Server Integration Services), etc.
In this article, I would like to give a brief introduction and overview of SAP's ETL tool, Data Services. The topics we are going to cover are:

What is Data Services?

Data Services Architecture

Important terminology

Development Components overview


What is SAP Data Services?
SAP Data Services provides a single enterprise-level solution for data integration, transformation, data quality, data profiling and text data processing, which allows us to:

1. Build a trusted data warehouse platform using data integration, data transformation and data profiling.

2. Develop everything in the system through a single graphical user interface application.

3. Manage the application through web-based tools, including reporting on system runtime statistics, metadata and user maintenance.

4. Maximize operational efficiency with a single solution for data warehousing, improving data quality and integrating data from heterogeneous systems.

The latest version of SAP Data Services is 4.2 as of 6/18/2015. SAP Data Services was originally two separate components, Data Integrator (BODI) and Data Quality (BODQ), until the release of version 4.0, when they were combined into one product named SAP Data Services.

Note: Data Services was initially built by a company called Business Objects, which was acquired by SAP in 2007. When SAP acquired Business Objects, it added "SAP" in front of all the tools, which turned Business Objects Data Integrator/Quality (BODI/BODQ) into SAP Business Objects Data Integrator/Quality (SAP BODI/SAP BODQ). When SAP released Data Services 4.1, it dropped "Business Objects" and changed the name to SAP Data Services.
Data Services Architecture:
The below figure shows the architecture of SAP Data Services and the relationship between its components. It also shows the different components involved in moving data from one system to another using SAP Data Services.

Below are the major components of SAP Data Services:

Designer

Management Console

Repository (Local/Central/Profiler)

Job Server

Access Server

Address Server

CMC (Central Management Console)


Let's look at what each of these components does.

Designer: This is the front-end GUI (Graphical User Interface) tool where developers log in and build jobs in SAP Data Services to move data from one system to another, or within a system, and define the transformation logic.
Management Console: This is a web-based console for managing SAP Data Services, for tasks such as scheduling jobs and reviewing system statistics like memory usage, job runtimes and CPU utilization.
Repository: SAP Data Services is a software application and needs a backing database to store the objects created by individual developers as well as system-generated metadata. Repositories are used to store all this data, and they can be created in databases such as Oracle, HANA, MS SQL Server and MySQL.
Note: We cannot open the Data Services Designer without selecting a repository at login.
We have three different types of repositories in Data Services, namely Local, Central and Profiler.
Local Repository: This is the mandatory repository in Data Services; at least one local repository must be set up before you can start using the system. Usually an individual local repository is created for each developer to store all of his or her development objects.
Central Repository: This provides additional security and integration between different development streams in a multi-user environment. When multiple developers work on the same development, this type of repository is recommended because of its robust version control: users access objects through check-out and check-in functionality. This is an optional repository in the system.
Profiler Repository: This is used to store the data profiling information generated by developers, along with the profiling task history. This repository is optional in the system.
Job Server: This is one of the main server components in Data Services; it executes all the batch jobs created by developers in the system. A repository must be attached to at least one job server before the jobs in that repository can be executed; otherwise developers cannot run them.
Access Server: This server is used to execute the real-time jobs created by developers
in the repositories.
Address Server: This server is used to perform data quality and data cleansing activities in Data Services. It is associated with the address directories and dictionaries required for data cleansing.
CMC (Central Management Console): This is a web-based tool for managing Data Services users, assigning repository configurations to job servers, security management, and so on.
Development Components Overview:
Let's look at the different development objects we can create in Data Services. The below image shows these objects and their hierarchy.

Projects: A project is the highest level of organization offered by Data Services. It is used to group the jobs relevant to an individual process or application. Having jobs grouped under projects makes the objects in the system easy to maintain.
We can have multiple projects in the same repository, but only one project can be open at a time in the project area of the Designer.
Example: PRJ_SALES, PROJ_PURCHASING, PROJ_FINANCE_DATAMART, etc.
Jobs: A job is the only executable object in SAP Data Services. It can contain any number of workflows, data flows, scripts and conditionals. Developers can execute jobs manually in the Designer, or schedule them using the Management Console.
Workflow: A workflow is used to define the data flow execution sequence, optionally with the help of conditional operations. Say we have five data flows in a job, of which two have to run daily and three have to run at month end. We can define two workflows to group the data flows and then write a condition based on the calendar date to control execution, as sketched in the example after the next paragraph. Workflows are mainly used to organize the data flows within a job, and it is not mandatory to have workflows in a job.
Conditionals: Conditionals are single-use objects that implement if-then-else logic in a workflow to control the execution process.
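To make the month-end scenario above concrete, here is a rough sketch of how such a conditional could be set up. A conditional's If expression is written in the Data Services script language; the workflow names below are illustrative only, and the sketch assumes the built-in sysdate() and last_date() functions (last_date() returns the last day of the month for a given date):

If:   sysdate() = last_date(sysdate())
Then: WF_MONTH_END (groups the three month-end data flows)
Else: (left empty, so the month-end flows are simply skipped)

WF_DAILY, grouping the two daily data flows, would be placed in the job before the conditional so that it runs on every execution.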
Scripts: Scripts are single-use objects used to call functions, assign values to variables, and manipulate tables at the database level within a workflow. Scripts cannot be created inside data flows. One example is maintaining job control, which requires scripts that pick up the job's start and end times and insert them into a database table, as in the sketch below.
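As an illustration of such a job-control script, here is a minimal sketch in the Data Services script language. The datastore name DS_STAGE, the audit table JOB_CONTROL and the global variables are assumptions made up for this example; sysdate(), job_name(), print() and sql() are built-in functions:

# Illustrative job-control script placed at the start of a job.
# $GV_JOB_NAME and $GV_START_TIME are assumed to be declared as
# global variables at the job level.
$GV_JOB_NAME = job_name();
$GV_START_TIME = sysdate();
print('Job [$GV_JOB_NAME] started at [$GV_START_TIME]');
# Write the start time into an assumed audit table through the
# sql() function; {$var} substitutes the quoted variable value.
sql('DS_STAGE', 'INSERT INTO JOB_CONTROL (JOB_NAME, START_TIME) VALUES ({$GV_JOB_NAME}, {$GV_START_TIME})');

A similar script at the end of the job would capture the end time and update the same row.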
Dataflow: A data flow contains the actual source and target objects along with the transformations defined between them. This is where we define everything about the data movement: source table, target table and transformation logic. Once a data flow is created, we can include it either directly in a job or in a workflow (which in turn goes into a job) to execute the ETL process.
Datastore: Datastores are defined to access other systems and use them as either sources or targets. We can create datastores for most of the applications, databases and software products used in the industry.
File Formats: If the source or target is an external file such as a flat file, Excel workbook or XML file, we can configure it using the File Formats option in Data Services. Once a file format is defined, we can use it as either a source or a target.
Transforms: SAP Data Services provides a large set of transforms to define the transformation logic in the system. These transforms are grouped into four categories:

1. Data Integrator

2. Data Quality

3. Platform

4. Text Data Processing

The above image shows the different transforms available under each group in Data Services. Depending on our requirements, we can use an individual transform or a combination of transforms to implement the business logic.

Thus we have looked at an introduction and overview of SAP Data Services, one of the leading ETL platforms in the industry.
SAP Data Services and SAP Information Steward are part of the Enterprise Information
Management suite of products that target the Information Management personas: the
administrator, the designer, and the subject matter experts in charge of data stewardship
and data governance.
SAP Data Services delivers a single enterprise-class solution for data integration, data quality, data profiling, and text data processing that:

Allows you to integrate, transform, improve, and deliver trusted data to critical business processes.

Provides development user interfaces, a metadata repository, a data connectivity layer, a run-time environment, and a management console, enabling IT organizations to lower total cost of ownership and accelerate time to value.

Enables IT organizations to maximize operational efficiency with a single solution to improve data quality and gain access to heterogeneous sources and applications.
SAP Information Steward provides business analysts, data stewards, and IT users with a
single environment to discover, assess, define, monitor, and improve the quality of their
enterprise data assets through the various modules:
Data Insight: Profile data, create and run validation rules, monitor data quality through
scorecards, and create data cleansing solutions based on your data's content-type
identification results and SAP best practices for your specific data.
Metadata Management: Catalog metadata across the system landscape, and analyze and understand the relationships among enterprise data assets.
Metapedia: Define business terms for data and organize the terms into categories.
Cleansing Package Builder: Define cleansing packages to parse and standardize data.
