
FPA applied to DATA WAREHOUSING

Version 1.1.0

www.nesma.nl

PUBLICATION OF THE NETHERLANDS SOFTWARE METRICS USERS ASSOCIATION


FPA applied to DATA WAREHOUSING

ISBN: 978-90-76258-25-6

© Copyright NESMA 2008.


All rights reserved. Netherlands software metrics users association (NESMA). No part of this publication may be
reproduced or made public in any way or in any form, without prior written consent by the NESMA. After having
been granted permission, the title page of the document that includes (parts of) this document must contain the
following text: “This work contains material from ‘FPA applied to DATA WAREHOUSING’. Permission for
publication has been granted by the NESMA”.


Table of Contents

1 Introduction
  1.1 Background to and version history of this document
  1.2 Overview
  1.3 Disclaimer
  1.4 References
2 General features
  2.1 Estimated count
  2.2 Counting the project size
  2.3 The end user
  2.4 Count for each DWH component – choice of architecture
3 The Data Staging Area
  3.1 Counting guidelines for the Data Staging Area
  3.2 Requirements design document for the Data Staging Area
4 The Star Schemas of the Data Warehouse
  4.1 Counting guidelines for the Dimensions of a Star Schema
  4.2 Counting guidelines for the Facts of a Star Schema
  4.3 Requirements design document for Dimensions and Facts
5 The Data Marts in the Data Mart Area
  5.1 Counting guidelines for the Data Mart Area
  5.2 Requirements of the design document for the Data Mart Area
6 Reports
  6.1 Counting guidelines for Reports
  6.2 Requirements of the design document for Reports
7 Other elements of a Data Warehouse environment
  7.1 Cleansing
  7.2 Meta Data relating to the logistic process
  7.3 Meta Data relating to the meaning of the data – Business meta data
  7.4 Importing FPA tables
8 Alternative architectures
  8.1 By-passing the Staging Area
  8.2 Reports from the Star Schema
  8.3 Multiple data groups in a single file delivery
  8.4 Operational Data Store – ODS
    8.4.1 Counting guidelines for the ODS
  8.5 The federated Data Warehouse
  8.6 Inmon
  8.7 BDWM
Appendix A. Data Warehousing Architecture Reference Model
Appendix B. Summary – Functions and Files in the DWH
Appendix C. Requirements basic design document relating to function point counts


Changes and versions

version: 1.1.0
date: 5-9-2008
author(s): Rob Eveleens, Theo Kersten, Nico Mak, Jolijn Onvlee
change: Version for publication via the NESMA website.
  Changes initiated after comments by IFPUG:
  • included the relation with IFPUG terminology;
  • rephrased parts to clarify the intentions for experts in the field of counting.
  The counting guideline is now usable for estimating both project size and product size.
  New guidelines for:
  • functionality to manage the DWH processes
  • ODS
  • BDWM


1 Introduction
This document provides counting guidelines for performing an estimated Function Point
Analysis to determine the size of Data Warehouse projects - from the source up to (and
including) the reporting stage in an existing Data Warehouse environment.

The FPA count is based on version 2.2 of NESMA’s ‘Definitions and counting guidelines for
the application of Function Point Analysis’ (Definities en telrichtlijnen voor de toepassing van
functiepuntanalyse), paying particular attention to the key principles involved when applying
FPA to Data Warehouse projects. The functional architecture – the distribution of the
functionality across the Data Warehouse as a system – is based on the Architecture
Reference Model of Atos Origin Nederland B.V.

1.1 Background to and version history of this document

These guidelines were developed collaboratively by Jos Hendriksen (Atos Origin BAS Oost)
and Rob Eveleens (Atos Origin BI&CRM) from their experience of using the FPA guidelines
in data warehouse projects at KPN. Based on their work, Nico Mak (ABN AMRO Bank N.V.),
Theo Kersten (ATOS Origin Software Development & Maintenance Center) and Rob
Eveleens reviewed the architecture and functionality with a view to improving alignment with
the NESMA guidelines. From version 0.2 onwards, Jolijn Onvlee (Onvlee Opleidingen &
Advies) and Freek Keijzer were brought in to refine the wording and the rationale. Version
0.5 was reviewed internally by NESMA and was also discussed by Achmea, as a result of
which several suggestions were made (and incorporated) that take account of situations in
which the core of the DWH has been developed more along the lines of Inmon’s approach.

1.2 Overview

This document assumes that the reader is familiar with FPA and its associated terminology
and abbreviations, and with Data Warehousing. As there are different ways of thinking about
Data Warehousing we have included in this document the approach used as the basis for
these counting guidelines. See Appendix A – Data Warehousing Architecture Reference
Model.

Each chapter on counting concludes with a brief summary of the results.

To be able to carry out an estimated count, it is preferable to have a design document that
includes the following:
• the conceptual (logical) data model
• a description of the functions and the inward and outward data flows

For each component of the Data Warehouse, the ‘additional requirements’ needed to ensure
that the count can be carried out on the basis of the design document have been made
explicit and have been included at the end of each chapter.

Please refer to the three appendices for a summary of counting guidelines, a framework for
Data Warehouses and an overview of the ‘additional requirements’.


1.3 Disclaimer

The FPA method described in this publication has been put into practice by various
businesses and projects. Nevertheless, the NESMA does not claim that the methodology
has been tested scientifically. Supplementary research and further practical use are needed
in order to establish its usability.

By publishing the present document, ‘FPA applied to Data Warehousing’, NESMA seeks to
help advance understanding of the application of FPA counting guidelines in other
environments. NESMA cannot be held responsible for the use of this publication, nor for the
results obtained from using it.

NESMA welcomes your comments on the document, information about your experience of
using it, and any suggested additions or improvements. Please contact NESMA via:
office@nesma.nl.

1.4 References

• [Kimball 2008] The Data Warehouse Lifecycle Toolkit: Practical Techniques for Building
  Data Warehouse and Business Intelligence Systems, 2nd edition. John Wiley & Sons, 2008.
• [Inmon 2005] Building the Data Warehouse, 4th edition. John Wiley & Sons, 2005.


2 General features
2.1 Estimated count

NESMA distinguishes three types of count:


• An indicative function point count gives an indication of the size of an information system
  or project, based solely on a conceptual data model.
• An estimated (or approximated) function point count determines the number of functions
  for each type of user function (user transactions and logical files), and uses standard
  values for complexity: ‘Average’ for the user transactions and ‘Low’ for the logical files.
• A detailed function point count is the most precise count, identifying all the
  specifications needed for an FPA in detail. The user transactions are broken down to the
  level of referenced logical files and data element types, and the logical files are broken
  down to the level of record types and data element types. This type of count therefore
  allows the complexity of each identifiable user function to be determined.

These guidelines describe the implementation of an estimated function point count for Data
Warehouses. In practice an estimated count is sufficiently accurate to allow confident
budgeting and subsequent costing of projects.

We repeat, perhaps unnecessarily: the user functions that need to be counted are identified
without examining the complexity of those functions. In an estimated function point count,
transactional functions are counted as being of ‘Average’ complexity and data functions as
being of ‘Low’ complexity.¹
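As a sketch only, the standard values implied by footnote 1 can be expressed as a small calculator. This is our own illustration, not part of the NESMA guideline: the names `ESTIMATED_WEIGHTS` and `estimated_fp` are assumptions, and the weights are simply the ‘Average’ values for transactions and the ‘Low’ values for files.

```python
# Weights of an estimated count: transactions at 'Average', files at 'Low'
# (cf. footnote 1: EIF 5, ILF 7, EI 4, EO 5, EQ 4).
ESTIMATED_WEIGHTS = {
    "EI": 4,   # External Input, 'Average' complexity
    "EO": 5,   # External Output, 'Average' complexity
    "EQ": 4,   # External Inquiry, 'Average' complexity
    "ILF": 7,  # Internal Logical File, 'Low' complexity
    "EIF": 5,  # External Interface File, 'Low' complexity
}

def estimated_fp(counts):
    """Sum function points for counts like {'ILF': 3, 'EI': 3}."""
    return sum(ESTIMATED_WEIGHTS[t] * n for t, n in counts.items())

# Example: 3 logical files and 3 input functions: 3*7 + 3*4 = 33 FP.
print(estimated_fp({"ILF": 3, "EI": 3}))  # 33
```

In an estimated count this is all the arithmetic there is; the work lies in identifying the user functions, which is what the remaining chapters address.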

2.2 Counting the project size

In applying FPA, NESMA distinguishes between product size and project size. The following
purposes of function point counts can be distinguished:
• determining the product size²:
The number of function points is a measure of the extent of the functionality delivered, or to
be delivered, to a user by an information system. It is also a measure of the size of an
information system that has to be maintained.
• determining the project size³:
The number of function points is a measure of the extent of the functionality either of a
complete (or partial) new information system to be created by a single project, or of the
extent of changes to an existing system. In the latter case, changes may include one or
more of the following: the addition, modification or removal of user functions. The project
size is an essential parameter for determining the effort needed for the project.

These counting guidelines are usable for both purposes.

¹ KGV (EIF): 5 FP; ILGV (ILF): 7 FP; IF (EI): 4 FP; UF (EO): 5 FP; OF (EQ): 4 FP.
² IFPUG: application count.
³ IFPUG: development count or enhancement count.

2.3 The end user

These guidelines assess functionality from the perspective of the end users: the managers
and staff in those parts of the organisation for which the DWH is being developed.

Where functionality is being developed for managing the Data Warehouse per se, the
perspective of the engineers who support the data warehouse becomes significant. The
current document does not contain any guidelines on this area as that functionality can be
counted as usual.

2.4 Count for each DWH component – choice of architecture

The FP count for each component of the system can be carried out on the basis of a
functional design document, which itemises all the tables for each component of the system
and identifies the transformation processes between those tables in general terms. For the
purposes of this count we distinguish the following groups of functions of the Data
Warehouse as a system:
1. Transporting data to the Data Warehouse’s Data Staging Area
2. Populating the Data Warehouse core
3. Preparing the Data Warehouse’s Data Marts
4. Producing Reports
The information architecture we envisage is set out in Appendix A – Data Warehousing
Architecture Reference Model.

The components of the DWH are not regarded as separate systems. Groups of data from
one component within the DWH are not regarded as External Interface Files (EIFs) for other
components.

KIMBALL AND INMON


There is little argument about the merit of structuring the data in the form of Star Schemas
(with Facts and Dimensions) as far as Data Marts are concerned. However, in regard to the
core of the Data Warehouse there are two schools of thought:

On the one hand, there are those who favour the Kimball approach based on a core of Star
Schemas [Kimball 2008], whilst others prefer a core based on a relational model [Inmon
2005].

There has been plenty of discussion about the functional counting of Facts and Dimensions
from Star Schemas, but much less about counting in relation to relational Data Warehouses.
When a Data Warehouse is set up using Inmon’s philosophy, the guidelines on counting
Dimensions and Facts are equally relevant, albeit at a somewhat later stage of the process,
i.e. when setting up the Data Marts.

So please do not be misled: these guidelines are applicable to both Kimball’s approach and
Inmon’s. This document discusses the counting of the Dimensions and Facts of a Star
Schema when populating the Data Warehouse (Kimball’s approach); a clear distinction can
thus be made in relation to the counting of Data Marts. The section on Inmon (§8.6) contains
a number of tips for the relational Data Warehouse.

REPORTS COVERED SEPARATELY


Reports are normally created using completely different tools from those used for the other
functions, and productivity (hours per function point) varies greatly between those tools.
Because of this variation it is desirable to keep the FP counted for reports separate, so that
those who have to assess the effort required can take account of the difference in
productivity.

3 The Data Staging Area


In essence, the processes in the Data Staging Area (DSA or STA) provide the separate
interface with the various sources of data for the Data Warehouse. Data are imported into
the STA and, after a minimum of processing, stored until further processing (integration into
the Data Warehouse) is possible. Integration of the data from the STA into the DWH only
begins when all the data required are available in the Data Warehouse or in the STA, when
all the preceding periods or deliveries (up to the period or deliveries to be integrated) have
been processed (‘process dependency’), or when the frequency of delivery is greater than
the frequency of processing (timing).

The processes work on a single flat file, usually converting it into a single table, in which the
information is supplemented to include identification of the source and the period to which
the data relate. The data received is kept as received, so that when new functionality is
introduced, or in the event of recovery operations, data processing may be carried out
sequentially again. These data are also frequently used to answer detailed queries about
reports, and to answer ad hoc queries.

This stage of the process is regarded as a technical solution for making data available to the
Data Warehouse. Technical solutions cannot be included in the functional count, so the total
of function points for this component is usually zero. This function is seen as a preparatory
component of the functions in the next chapter, which populate the Star Schema or Facts of
the Data Warehouse. Only if user reports have been defined on the basis of data direct from
the DSA are there any data that can be included in the functional count.

That could be all that needs to be said here about the DSA. However, we expect this
viewpoint to attract a lot of comment, so we examine the technical mechanism in detail to
enable readers to apply the count to their individual situations.

In practice, two types of interface with sources are encountered:


1. an interface using (flat) files
2. a direct interface between the STA and the source database.

Ad 1. AN INTERFACE USING (FLAT) FILES


Where the interface uses (flat) files, the following functions and files can be distinguished:
1. a function that creates the (flat) file from the system supplying the data
2. a flat file containing the data
3. a function that transfers the (flat) file from the source environment to the DWH
environment
4. a function which ensures sequential processing of the (flat) files and carries out a
number of substantive checks on the integrity of the data transport
5. a function that takes the data in the (flat) file and puts them into a table in the STA,
carries out a number of checks and, if necessary, adds extra information such as the
time of processing
6. a table into which the data is placed in the STA
7. a function that archives data supplied

Implementation of point 1 is usually outside the scope of the project, but could be counted as
an EO (output function) of the source system. The (flat) file (2) is not counted, as it is clearly
the output from the function in point 1. Points 3, 4 and 7 do not have to be redesigned and
rebuilt each time: they are functions of the application that controls the input of the DWH,
and their configuration is an element of the implementation plan. That leaves only points 5
and 6 to assess for the count.


If the functionality relates to only one or more of the points below, there is no reason to count
the data in the DSA as ILFs (internal logical files), as no user functionality is involved:
• Translating source codes to DWH codes
• The buffer function for matching the frequency of data deliveries (per day as
opposed to per week or per month)
• Adding the period to which the data relate
• Adding meta data relating to processing (e.g. source of the data)

When does a file actually have to be counted?


• When a user report is generated directly from the data in the DSA.

If the latter (reporting) is the reason for counting the data in the DSA as ILFs (Internal Logical
Files), along with associated EIs (External Inputs), the reports in question should, of course,
also be counted as External Output functions of the system.

To summarise: in the case of an interface using (flat) files, the imported tables are not to be
counted as ILFs in the DSA.

Ad 2. A DIRECT INTERFACE BETWEEN STA AND THE SOURCE DATABASE

A direct interface allows IT tools to be used to give one database environment (in this case,
the DWH) direct access to data in another database environment (in this case, a source
system). In this way the source data remain in the source system – no transfer takes place –
and the data are merely ‘made available’ to the Data Warehouse.

When a direct interface is used, each interfaced logical file in the source system is counted
as an EIF (External Interface File). Count these available data groups as EIFs for the Data
Warehouse. To make the counting result easier to understand, report them as counted
within the DSA.

3.1 Counting guidelines for the Data Staging Area

When importing data using (flat) files, the export functions from the other source system
should be counted as EOs (External Output functions) for that source system.

The imported data are not counted as ILFs within the DSA. The data groups and functions
are not counted until later, in the next component of the Data Warehouse.

When importing data using a direct interface with a logical data file in the source system,
each file is counted as an EIF (External Interface File).
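The two import mechanisms reduce to a very simple rule, which the hypothetical sketch below makes explicit. The function name and parameters are our own illustration, not NESMA terminology; the value 5 is the estimated weight of an EIF at ‘Low’ complexity.

```python
def dsa_estimated_fp(flat_file_imports, direct_interface_files):
    """Estimated FP counted within the DSA itself.

    flat_file_imports: number of (flat) file deliveries. These contribute
    nothing to the DSA count: the export is an EO of the *source* system,
    and the imported tables are not ILFs here.
    direct_interface_files: logical files in source systems accessed via a
    direct interface. Each one is an EIF: 5 FP at 'Low' complexity.
    """
    EIF_LOW = 5
    return direct_interface_files * EIF_LOW

# Four flat-file deliveries add nothing; two directly interfaced files
# contribute 2 * 5 = 10 FP.
print(dsa_estimated_fp(flat_file_imports=4, direct_interface_files=2))  # 10
```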

3.2 Requirements design document for the Data Staging Area

None.


4 The Star Schemas of the Data Warehouse


This section follows the philosophy that the processing of data that takes place in this
component of the Data Warehouse is determined primarily by the company’s own corporate
rules. It is only in the next step, described in the next chapter, that data are processed
according to requirements for the production of reports for departments, applications or
policy/corporate management functions.

The Data Warehouse’s Star Schemas store historical data in a standardised manner in
accordance with a range of views. Two important types of data group can be discerned: the
Facts (the basic events for which reports are wanted) and their Dimensions (the views,
groupings by which the Facts will be reported), together making up the Star Schema(s) of
the DWH. The tables in this component of the Data Warehouse are often referred to as the
Data Warehouse or the Star Schema. Facts are the core element of the star schema, while
the dimensions are its rays. This component of the Data Warehouse usually consists of a
number of Star Schemas, in which as many dimensions as possible are re-used.

First, the data about the Dimensions for a specified period of time must be processed before
the Facts for the same or earlier periods can be attached to the Dimensions.

DIMENSIONS

We use the structure of an organisation as an illustration of the counting dilemma that
Dimensions confront us with. First, we discuss how the number of ILFs is determined, and
then the number of EIs for each Dimension.

Suppose that in analysing information, relationships are described, i.e. those between
natural persons (internal employees and external relations), legal entities and organisational
entities, both inside and outside one’s own organisation. The organisational structure is
described as an organisational entity which may or may not be part of another organisational
entity or legal entity (a non-mandatory recursive relationship, also known as a hierarchical
relation). Natural persons may belong to a legal entity or an organisational unit. In the case
of natural persons who do not belong to one’s own organisation it is necessary to know
whether or not they are resident in the Netherlands. Only legal entities established in the
Netherlands should be registered with the Chamber of Commerce. Internal employees
always have a relationship with an internal organisational unit.

Supplementary information 1: Why are there Star Schemas in the Data Warehouse?

Without delving into data warehousing theory, there are two terms which are important to
understand when counting a data warehouse: facts and dimensions.

In the world of data transactions the database designer relies on principles such as the
normalisation of data, in which each relevant real event results in just a single change to
the data, and which is the basis of many groups of data relevant to the user. The guiding
principle for the designer of a Data Warehouse, by contrast, should be to minimise the effort
needed to carry out a query, given the expected usage of the data. A normalised data model
is not entirely appropriate for this purpose: when requests for management information have
to be answered in a normalised environment, the relationships between many data groups
have to be recreated repeatedly. The designer of the Data Warehouse tries to avoid that
situation by simplifying the layout into a star schema: a table of facts, along with tables of
dimensions in which all relationships are summarised.

The designer of the source system will model this in detail, leading to numerous ILFs and
EIs. The designer of the Data Warehouse has to simplify this into a structure which is easier
for users to query and navigate while reporting. He may opt to translate this structure into a
single Dimension, ‘relationships’, using corporate rules for the relationships within the
Dimension and its (optional or mandatory) attributes.


Does this mean there is just one ILF? Or as many as in the source system? Or as many ILFs
as there are possible layers in the hierarchy of the dimension “relationships”? And how many
EI functions must be distinguished?

The guideline is as follows. For a dimension, count 1 ILF. Then, in order to determine the
number of input functions (EIs), determine how many record types can be distinguished
within the Dimension: Examine the number of levels in the Dimension, examine whether
levels are handled differently (differences in the handling of attributes, for example), and set
out the chosen options in the counting report.

Where such information about the levels is not or not yet available, assume 3 record element
types (one for the highest layer, one for the lowest layer and one for all the layers in
between) if the system is known to be hierarchical; otherwise assume 1 record element
type.

Now that the number of record element types of the ILF for the Dimension is known, the
number of EIs can be determined:
• 1 EI for the addition of new occurrences of the record type;
• no EI for the processing of changes: a new occurrence is added, and closing the validity
  of the old one is part of the same logical unit of processing;
• and in exceptional cases only: 1 EI for the deletion of data (restricting the validity of an
  event which, up until that moment, was valid in perpetuity).

Supplementary information 2: History in the Data Warehouse

In the Data Warehouse, history is kept by registering the period in which a record was valid.
The situation at any given time is found in the record whose start and end of validity lie
either side of that time. The latest (actual) situation can be identified by examining the
record with a validity till eternity.

When new information for the dimension is available, records are inserted with an eternal
validity starting at insertion time.

When changes are made to (relevant) attributes, (a) a new record is added with perpetual
validity, but also (b) the validity of the record that, until that point, was perpetual is cut off at
that point in time (the record is ‘closed’).

Given the purpose of the Data Warehouse, i.e. to maintain a history, records are never
deleted. If it becomes necessary to delete a record, its validity is cut off (as in b). The latter
type of change occurs rarely compared to the other types.

The above is known as type 2 history. In the case of type 1 history, only the current value is
stored.
So, for each record element type of the ILF for the Dimension, it is usual to count 1 EI.
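The type 2 mechanism described above can be sketched in a few lines. This is an illustrative, in-memory rendering of the rule, not an implementation prescribed by the guideline; the record layout and the function name are our own assumptions.

```python
from datetime import date

ETERNITY = date.max  # represents validity 'till eternity'

def apply_dimension_change(records, key, attrs, as_of):
    """Apply a type 2 history change to an in-memory dimension table.

    Closing the previously valid record (b) and inserting the new one (a)
    happen in one logical unit of processing, which is why the guideline
    counts one EI for additions and no separate EI for changes.
    """
    for rec in records:
        if rec["key"] == key and rec["valid_to"] == ETERNITY:
            rec["valid_to"] = as_of           # (b) 'close' the old record
    records.append({"key": key, "attrs": attrs,
                    "valid_from": as_of,      # (a) new record, valid
                    "valid_to": ETERNITY})    #     in perpetuity

dim = []
apply_dimension_change(dim, "dept-1", {"name": "Sales"}, date(2008, 1, 1))
apply_dimension_change(dim, "dept-1", {"name": "Sales EU"}, date(2008, 6, 1))
# Two records remain; only the newest still has open-ended validity.
```

Note that nothing is ever physically deleted: a ‘deletion’ would simply cut off the validity of the open record, as described under (b).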

FACTS OF A STAR SCHEMA


Facts must be linked to Dimensions by functions. Counting the Facts, the core elements of
Star Schemas, is straightforward: Facts are added to the Data Warehouse, never changed
or deleted.
Count 1 ILF and 1 EI for each table of facts.⁴

MULTIPLE SYSTEMS AS SOURCES


Data Warehouses have multiple business systems as sources. Data for a specific file can
also be drawn from multiple business systems. Usually the processing by each business
system of a data file will be logically different from the processing by another. Therefore, for
an ILF, 1 EI will be counted for each source.

⁴ The number of Dimensions to be linked is an obvious measure of the complexity of the function, as is the number of
attributes to be calculated. An estimated count does not take these into consideration.


4.1 Counting guidelines for the Dimensions of a Star Schema

For each hierarchical dimension: 1 ILF.


For each record element type in the hierarchical dimension: 1 EI for the input.
If the number of record element types in the hierarchical dimension is not known, 1 ILF and 3
EIs should be counted.
For each non-hierarchical dimension: 1 ILF and 1 EI.
Remember that a new EI is needed for each source.
Only when deletion is described explicitly: For the relevant record type: 1 additional EI.

4.2 Counting guidelines for the Facts of a Star Schema

For each file containing Facts: 1 ILF and 1 EI.


Remember that a new EI is needed for each source.
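As a sketch, the guidelines of §4.1 and §4.2 can be expressed as two small formulas using the estimated weights (ILF at ‘Low’ = 7 FP, EI at ‘Average’ = 4 FP). The function names and parameters below are our own illustration, not NESMA terminology.

```python
ILF_LOW, EI_AVG = 7, 4  # estimated-count weights

def dimension_fp(record_types, sources=1, explicit_delete=False):
    """Section 4.1: 1 ILF per dimension; 1 EI per record element type,
    per source; 1 extra EI only when deletion is described explicitly.
    Pass record_types=3 for a hierarchical dimension whose layers are
    unknown (the guideline's default assumption), 1 when non-hierarchical.
    """
    eis = record_types * sources + (1 if explicit_delete else 0)
    return ILF_LOW + EI_AVG * eis

def fact_table_fp(sources=1):
    """Section 4.2: 1 ILF per file of Facts, plus 1 EI per source system."""
    return ILF_LOW + EI_AVG * sources

# A hierarchical dimension with unknown layers (assume 3 record types)
# from one source, and a fact table fed from two sources:
print(dimension_fp(record_types=3))  # 7 + 3*4 = 19
print(fact_table_fp(sources=2))      # 7 + 2*4 = 15
```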

4.3 Requirements design document for Dimensions and Facts

To simplify counting, it is preferable to have the following documentation available.

For each dimension:


• A logical model of the layers and/or data groups in the dimension;

For each layer in the dimension:


• the data from which it is drawn (refer to the logical data model of the DSA for this purpose)
• the handling
• the functionality to be realised, giving special attention to:
o the processing of changes;
o whether or not data is being deleted;
o whether data is drawn from multiple source systems (when populating the DSA).

For each Fact:


• the logical data model surrounding the Fact (the star schema), insofar as it is relevant to
the processing
• the data from which it is drawn (refer to the logical data model of the DSA for this purpose)
• the processes by which the Facts are extracted.


5 The Data Marts in the Data Mart Area


The previous chapter stated that the Star Schema records the business view of the events.
In the Data Mart Area, the data is tailored to specific recipients, such as information systems
(decision support), company departments (marketing) or policy/corporate management
roles (risk analyst), and stored in a
departments (marketing) or policy/corporate management roles (risk analyst) and stored in a
specific Data Mart. A Data Mart should be visualised as the central core of a smaller Star
Schema, which has fewer and/or weaker rays than the Star Schemas in the Data
Warehouse. A Data Mart is usually accessed by multiple reports or multiple applications.

We distinguish three possible characteristics of the data in Data Marts.


1. Data generated as a result of selection and aggregation (sum, average, maxima,
percentages) of Facts from the Data Warehouse.
2. Data generated as a result of combining Facts from the Data Warehouse (number of
operations per employee, margin per product, risk classes).
3. Data that can no longer be derived from the Facts of the Data Warehouse because,
due to the huge numbers of Facts, their period of storage in the DWH is shorter than
that of data in the Data Marts. This means that it is no longer possible to reconstruct
the Data Marts from older data.

The first characteristic is not sufficient to justify regarding the Data Mart as one or more ILFs,
but the second and the third are.⁵

From a logical point of view, data is appended to Data Marts. New occurrences are added to
the data already in the Data Mart. There are usually no functions for changing data in the
Data Mart and it is also rare for data to be deleted. When data are deleted, it is usually in the
form of data cleansing. Such cleansing is usually done as part of the overall control
mechanism of the Data Warehouse.

There are environments in which views of the Data Warehouse are made available as the
starting point for the generation of reports. As the data cannot exist independently of the
data in the Data Warehouse there is no ILF.

For each Data Mart, count 1 ILF for storage of the data and 1 EI for processing of the data.

In principle Data Marts use the same Dimensions as the Data Warehouse. Therefore no
ILFs need to be counted for the Dimensions of the Data Marts.

There are sometimes differences in technical implementation in regard to making the Data
Marts and associated Dimensions available to users. The Data Marts and Dimensions may
be accessed directly by the reporting functions. To improve performance, however, or to
maintain security of the data in the Data Warehouse, or to facilitate distribution of the data,
technical copies of the data may be published on a special platform, which is independent of
the Data Warehouse. Those technical copies should not be counted.

5
Suppose that, for each cash register transaction, the time and the outlet are recorded. A second table of facts contains the
clock times for each employee for each outlet. The number of transactions per week is calculated in a Data Mart. That is not
sufficient to justify regarding the Data Mart as an ILF. The same applies when the number of transactions per week per outlet is
calculated. However, a Data Mart containing the number of transactions per employee is regarded as an ILF.

5.1 Counting guidelines for the Data Mart Area

If Facts from the Data Warehouse are combined into new Facts (other than by aggregation)
or if some of the data can no longer be drawn from the Facts in the Data Warehouse, the
count for each Data Mart should be: 1 ILF and 1 EI.

5.2 Requirements of the design document for the Data Mart Area

To support ease of counting, the following documentation will help.

For each Data Mart:


• the logical data model of the Data Mart;
• the data that are drawn from (refer to the data in the DWH for this purpose);
• for each operation – the functionality to be achieved, with special attention to:
o the period of storage.


6 Reports
Reports are normally created using completely different tools from the other functions, and
productivity (hours per function point) varies greatly between them. Because of this variation
in productivity it is desirable to keep the FPs counted for reports separate, so that those who
have to assess the effort required can take account of that difference in productivity.
There are a large number of (functionally and technically) different report solutions on the
market. This counting guideline attempts to estimate the underlying functionality without
regard to such differences.

In many DWH environments we find two types of report, each addressing specific functions
within the organisation(s):
• Fixed or canned reports.
• OLAP reports.

Many fixed reports 6 from a Data Warehouse relate to a specific cross-section of the data
available in the Data Marts. As a result many reports of similar format and content are
specified, in many variants. According to the NESMA guidelines, simply counting each of the
required reports as an EO function is not right: too many of the underlying logical processes
are the same for the functions to be regarded as unique. But it is also not right to regard all
the reports together as one external output function: there are logical data groups which
sometimes are, and sometimes are not, shown in variants of the reports. In such cases, the
NESMA guidelines lay down that a number of EO functions must be distinguished.

Supplementary information 3: Report variants

Staff of the corporate sales department request reports to be designed that relate to the
number of sales (1) per week per employee, (2) per month per team and (3) per year per
department. The content is similar for the three reports but for the level of detail: each will
show either the name of the employee, the name of the team or the name of the department.

The user has to specify a selection meaningful to him from the enormous number of possible
reports. From the initial report requirements we identify time and organisation as Dimensions,
each with three levels (week, month, year and employee, team, department). Of the nine
possible combinations the user is asking for three.

The guidelines suggest counting these three reports as one EO, as the logical process of
each report is the same.
In OLAP environments7 the data to be reported, including all aggregations, are provided to
the user via the requested views for his ad hoc analyses. The user puts the views that are of
current interest to him into his report as columns and rows and then looks for the desired
level of detail within that report. All views are available constantly, but if they have not been
selected for the analysis they are displayed to the user as "all items in this viewpoint".
Whereas in the canned-reports case above there is an enormous number of possible reports,
in OLAP environments every possible report is actually available. The functional
specifications in this environment talk of the measured data to be displayed (and their
derivatives) followed by all possible groupings of the Facts. In this environment the user
sees this as one report, and thus one is tempted to count one EO.

This outlines the counting dilemma for us. In the fixed reports environment we may easily
incline to counting one EO function for each report designed. In the OLAP environment it
would be quite easy to count all the reports together as one EO function only. But that
means that the technical environment is determining the functions, rather than the functions
being defined by the functionality of the reports!

6
E.g. Business Objects® or Oracle Discoverer®
7
E.g. Cognos®

The guidelines therefore advocate grouping fixed or canned reports into report groups
based on a number of selection criteria, usually levels in dimensions, which makes the
functionality counted in the two reporting environments the same.

6.1 Counting guidelines for Reports

Group the reports required based on similar facts and layout but for selections within
dimensions and count one EO function for each report group.
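The grouping rule above can be sketched as follows. This is a minimal sketch with hypothetical report specifications (the field names `facts`, `layout` and `levels` are illustrative, not NESMA terminology): reports sharing the same Facts and layout fall into one group regardless of the dimension levels selected, and one EO is counted per group.

```python
# Hypothetical report specifications: the three sales variants from the
# example plus one unrelated margin report.
requested_reports = [
    {"facts": "sales",  "layout": "table", "levels": ("week",  "employee")},
    {"facts": "sales",  "layout": "table", "levels": ("month", "team")},
    {"facts": "sales",  "layout": "table", "levels": ("year",  "department")},
    {"facts": "margin", "layout": "graph", "levels": ("month", "product")},
]

# Group on shared facts and layout, ignoring the dimension-level selection.
report_groups = {(r["facts"], r["layout"]) for r in requested_reports}
eo_count = len(report_groups)

print(eo_count)  # 2: the three sales variants form one group, margin another
```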

6.2 Requirements of the design document for Reports

To support the ease of counting, look for design documentation describing, for each report
group:
• the reports that belong to that group;
• one model or a model of each report;
• a logical data model of the data in the report;
• the dimensions used in the DWH;
• the Data Marts used;
• (if applicable) the Facts used in the DWH;
• the required functionality, including:
• selections and choices of the end user.
Here, too, it is important to state whether the report group in question already exists or not.


7 Other elements of a Data Warehouse environment


7.1 Cleansing

As already stated, usually no data is deleted from a DWH by Data Warehouse functions. The
available history may be limited by data cleansing functions, which limit the history in the
Star Schema to (for instance) 25 months and the history in the Data Marts to (for instance)
61 months. Such cleansing functions are normally incorporated into the control of the data
warehouse, or result from physical re-use of parts of the database system (the latter
meaning that data are physically overwritten with new data after, say, 24 months).

Count the cleansing function at the level of functions (i.e. do not count one input function for
each ILF).
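Such a cleansing function might look as follows. This is a sketch only, using the example retention periods from the text (25 and 61 months); the table names and function signatures are hypothetical.

```python
from datetime import date

# Example retention periods from the text; table names are hypothetical.
RETENTION_MONTHS = {"star_schema": 25, "data_mart": 61}

def months_between(newer, older):
    # Whole calendar months between two dates.
    return (newer.year - older.year) * 12 + (newer.month - older.month)

def cleanse(table, rows, today):
    """Keep only the rows younger than the table's retention period."""
    limit = RETENTION_MONTHS[table]
    return [(d, v) for d, v in rows if months_between(today, d) < limit]

rows = [(date(2006, 1, 1), "old fact"), (date(2007, 12, 1), "recent fact")]
kept = cleanse("star_schema", rows, today=date(2008, 3, 1))
print(kept)  # only the December 2007 row survives (the other is 26 months old)
```

Note that, whatever its implementation, the cleansing is counted once as a function, not once per ILF.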

7.2 Meta Data relating to the logistic process

The data used to manage the Data Warehouse may be, for example: dates on which a
process has added data to a table from a source; the number of records added, amended or
rejected at that time; or the parameters used for that processing operation. The processes to
be developed must read and edit these Meta data.

These functions cannot be identified by end users and, strictly, should not be counted.
However, the control mechanism must be created when the Data Warehouse is set up. For
the purposes of counting this functionality, we regard the administrators as users and count
in accordance with the standard NESMA guidelines.

7.3 Meta Data relating to the meaning of the data – Business meta data

Describing the meaning of data is of great importance in any environment, and particularly
so in the case of the Data Warehouse.

These functions and the data required for them are not counted when estimating the project
size. The assumption made is that describing the meaning is a business activity and that the
description in question will be captured in a tool that has been set up on a once-only basis
for the Data Warehouse.

For the purpose of counting this functionality (the administration and presentation of the
business meta data), the end users of the DWH should be regarded as users and counting
should be carried out in accordance with the standard NESMA guidelines.

7.4 Importing FPA tables

If tables containing only codes and descriptions are imported from sources in order to be put
into the DWH tables or to transform the input received, doesn't that look very similar to the
transfer of so-called FPA tables? Do we need to count these imports? If so, do we count just
once for the whole DWH or once for each source system?


Count one FPA-table ILF for the whole Data Warehouse, plus the associated EI, EQ and
EO functions. If, when there is a subsequent increment of the Data Warehouse, a further FPA
table in a source system has to be accessed, we count one FPA-table ILF, one EI, one EQ
and one EO function.

Requirements of the design document for FPA tables

For each source: the code tables to be imported.


8 Alternative architectures
Data warehousing practice is constantly changing. We hope that the model offered here will
be easy to apply to your own practical situation. We will mention here a number of
alternative architectures or components and suggest a counting guideline.

8.1 By-passing the Staging Area

Data can be put directly into the Dimensions or Facts of a Star Schema. A DSA may not be
required or desired.

The count should be performed as for Facts and Dimensions.

8.2 Reports from the Star Schema

Reports can be based directly on the data in the Star Schema (without using the Data
Marts).

The count should be performed as for reports.

8.3 Multiple data groups in a single file delivery

The frequency of delivery of data to Data Warehouses is increasing. The volume of each
data delivery is consequently declining. Taking this to the extreme we find a situation in
which the Data Warehouse is populated from real-time message queues. To enhance the
efficiency of the process a single message may contain two or more data groups. The
functionality must place the data in the correct manner into a number of logical data groups.
For example, a file for a fuel expenses claim: vehicle and owner, driver and registration
number, plus the number of litres purchased at each refuelling, the cost and the odometer
reading. For such files XML is a good way of structuring the content flexibly and marking it.
An XML file thus often contains more than one data group.

The count should be performed in the same way as for Facts and Dimensions, for each data
group that results from the file, regardless of the frequency of delivery.
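Identifying the logical data groups inside such a single delivery can be sketched as follows. This is an illustrative sketch of the fuel-expenses example; the XML element and attribute names are hypothetical, not taken from any standard.

```python
import xml.etree.ElementTree as ET

# One delivered message carrying several logical data groups
# (hypothetical element names, following the fuel-expenses example).
message = """
<fuelClaim>
  <vehicle registration="AB-12-CD" owner="ACME BV"/>
  <driver name="Rob"/>
  <refuelling litres="41.5" cost="62.30" odometer="120345"/>
  <refuelling litres="38.0" cost="57.10" odometer="120991"/>
</fuelClaim>
"""

root = ET.fromstring(message)

# The load functionality must route each group to its own logical data
# group (counted as for Facts and Dimensions), even though the groups
# arrive together in one file.
groups = {}
for child in root:
    groups.setdefault(child.tag, []).append(child.attrib)

print(sorted(groups))             # ['driver', 'refuelling', 'vehicle']
print(len(groups["refuelling"]))  # 2 refuelling occurrences in one message
```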

8.4 Operational Data Store - ODS

We are familiar with the use of an operational data store (ODS). Into the ODS a selection of
the data from several transactional systems is "copied", each item of data being replicated
one-to-one in a different environment. The reports are made available from that environment
via the queues, action lists and signal lists current at that moment 8. The primary motivation
for doing this is integration of the data residing in the transactional systems, and relieving
the transactional systems of the burden imposed by demanding reports.

When the data in the ODS is only a copy of the source, count it as an EIF.
Only where data is integrated is a new ILF created, holding the keys in the source, a
reference to the source and the algorithm used to create the relation (including a reference
to the corporate identification, when available). Count that ILF, count an EIF as mentioned
above for each source/administration, and count 1 EI for the algorithm.

8
The ODS contains up-to-date information. The DWH contains history and is only current up to a certain point in time.

If an ODS is available in the environment of the Data Warehouse and the required data are
available there, then a direct link to the ODS, instead of to the source system, would seem
an obvious solution from the point of view both of relieving the burden on the transactional
system and of ease of accessibility.

One should regard the ODS as one of the sources of the DWH and count its functionality as
described in chapter 4.

8.4.1 Counting guidelines for the ODS

When data in the ODS is only a copy of the source, count it as an EIF. When data is integrated
and stored, count that ILF and count 1 EIF and 1 EI for each source/administration that is
integrated.
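The ODS guideline can be expressed as a small sketch. The function and source names below are hypothetical; the counts follow the rule stated above.

```python
# Sketch of the ODS counting guideline (hypothetical function and
# source names; the counts follow §8.4.1).
def count_ods(sources, integrated):
    """Return the FPA functions to count for an ODS."""
    if not integrated:
        # A one-to-one copy of each source: one EIF per source.
        return {"EIF": len(sources), "ILF": 0, "EI": 0}
    # Integrated data: one ILF holding the keys, source references and
    # relation algorithm, plus one EIF and one EI per integrated
    # source/administration.
    return {"EIF": len(sources), "ILF": 1, "EI": len(sources)}

print(count_ods(["crm", "billing"], integrated=False))
# {'EIF': 2, 'ILF': 0, 'EI': 0}
print(count_ods(["crm", "billing"], integrated=True))
# {'EIF': 2, 'ILF': 1, 'EI': 2}
```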

8.5 The federated Data Warehouse

Some organisations have a number of Data Warehouses existing in parallel and serving as
source systems for one another. Usually one Data Warehouse supplying data will be seen
as a source system by another.

There will probably be a direct interface from one DWH to the other, and the count should be
based on one EIF for each file interfaced from the supplying Data Warehouse.

8.6 Inmon

As indicated in section 2.4 (Count for each DWH Component – an Architecture Choice), the
core of the DWH may have been set up using a relational model. In that situation you must
take note of the following:
• Count an EI per record type of an ILF if a different logical handling is described for it;
• Count an EI for each other logical handling operation for the inputting of logical data
files;
Expect different EIs for the same ILF for each source. Usually the handling operation is
logically different for each source.

8.7 BDWM

As an example of an alternative structure in which the core of the Data Warehouse consists
of a (very highly) normalised model instead of a Star, we would mention the Banking Data
Warehouse Model, IBM's model for a data warehouse for banks. When implementing this
model, the logical model is sometimes translated directly into a technical structure. As a
result of that approach it becomes hard to see the difference between "new data" and
"new descriptions of data". How do we find only the new data? To illustrate, we will look at
the recording of the item "The job of the person with the name Rob is that of writer". The
technical recording can be represented as follows:

1. There is a table in which the existence of persons is recorded (BDWM: involved
party). A new entry is created. The attribute 'Forename' is given the value "Rob".
2. There is a generic table structure containing possible values for domains. A new
domain (for "jobs") and its possible values (like "writer") are described. Nothing
changes in these tables but for new occurrences.


3. There is a generic table in which the relationship between various tables, including
the one for persons and the aforementioned domain table, can be made with the
descriptive attribute "the job of a person". A new entry, referring to the new person
and his job, is thus created.
4. In a table the value for the relation in point 3 will refer to the involved party "Rob",
the relation "the job of a person" and the value "writer".

The tables in 2 and 3 describe new possibilities; the table in 4 holds the new information.
The item "The colour of the eyes of the person with the name Rob is brown" could be stored
in the same way, in exactly the same tables. The value "brown" must appear in the generic
table that contains the wide variety of possible values for "colours". In the third table
described above an occurrence would be created, referring to "involved parties" and
"colours", to hold "the eye colour of a person". The fourth table will actually hold the
information. If Rob's job changes then only the entry in the third table changes. If a job
history is required, this is achieved by using validity periods in that table. If the information
on job positions is no longer relevant, the validity period of the relation is limited.

As a result, the presentation of the logical design of the Data Warehouse contains an
enumeration of the identified relationships, classifications etc. The function point analyst's
dilemma is now plain: what are the logical data files that should be maintained? Should each
attribute be counted as a logical data file? How can the logical information model be
reconstructed from the technical implementation? It is hidden in the occurrences! It is
certainly a flexible approach.

We suggest not counting each attribute described as a logical data file (including a function),
but taking one additional logical step: group information with similar identifying keys. "Job"
and "eye colour" are relations between an involved party (one identifying key) and a domain,
and are easily grouped into one logical data file. "Rob has a son Bert" will have two
identifying keys and will be a separate logical data file. Refer to the NESMA FPA
documentation on grouping of data.

To summarise, for strongly normalised meta-data driven environments:

• do not count the structures for meta data;
• study the data to be inserted in the meta data, looking for the identifying relations;
• group based on these identifying relations;
• count for each group 1 ILF and 1 EI for the logic to maintain it.
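The grouping step above can be sketched as follows. This is a minimal sketch using the examples from the text; the relation and key names are hypothetical labels, not BDWM identifiers.

```python
# Group the relations recorded in the meta-data driven model by their
# identifying keys; each distinct key set becomes one candidate logical
# data file (counted as 1 ILF + 1 EI).
relations = [
    ("job",        ("involved_party",)),                   # Rob -> writer
    ("eye_colour", ("involved_party",)),                   # Rob -> brown
    ("has_son",    ("involved_party", "involved_party")),  # Rob -> Bert
]

logical_files = {}
for name, keys in relations:
    logical_files.setdefault(keys, []).append(name)

for keys, attrs in sorted(logical_files.items()):
    print(keys, "->", attrs, ": 1 ILF + 1 EI")
# ('involved_party',) -> ['job', 'eye_colour'] : one logical data file
# ('involved_party', 'involved_party') -> ['has_son'] : a separate file
```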


Appendix A. Data Warehousing Architecture Reference Model

The Data Warehousing Architecture Reference Model figure shows the flow of data from
source to user and a distribution of functions across the architecture:
1. The left-hand layer in the diagram (Source Layer) relates to systems that supply the
information or contain the organisation’s primary operational process. From that
layer, data is made available to the second layer in the model.
2. In the Data Layer, firstly the data is imported (into the DSA), then modelled according
to the company’s corporate business rules (product definition, organisational
structure, market channels) and stored in the Data Warehouse. Only after that are
data refined to make them suitable for use for specific queries from departments or
decision-supporting systems or for statistical analyses in preparation for making the
data available to the users. They are then stored in Data Marts.
3. In the final two layers at the right-hand side the data are used for reports or
applications and subsequent presentation.

The leftmost and rightmost layers are outside the scope of this counting guideline for DWH.

Between those two layers a range of groups of functions can be identified:


• Populating the second layer is often done in two stages:
1. collecting the data into the Data Staging Area (See §3)
2. integrating the data into the Data Warehouse (See §4)
• Similarly, preparing the data for the users is done in two stages:
3. aggregating and extracting data for the purpose of the applications up to Data
Marts (See §5)
4. working the data up into reports (See §6)
The guideline for the ODS is stated in §8.4 (Operational Data Store – ODS).

Appendix B. Summary – Functions and Files in the DWH

[Figure: summary diagram of the DWH application boundary, spanning the Sources, Data
Staging Area, Data Warehouse, Data Mart and Reports layers. It shows upload files and
Staging data, the Star Schema (Facts and Dimensions), a relational DWH (entities), Data
Marts, technical copies and Reports, annotated with the ILFs, EIs, EIFs and EOs counted
for each component.]

Component of the DWH – summary of the counting guidelines:

The Data Staging Area (see Guideline §3):
The imported data are not counted as ILF within the DSA. The data groups and functions
are not counted till later, in the next component of the Data Warehouse.

The Star Schemas of the Data Warehouse (see Guideline §4):
For each hierarchical dimension: 1 ILF. For each record element type in the hierarchical
dimension: 1 EI for the inputting. If the number of record element types in the hierarchical
dimension is not known, 1 ILF and 3 EIs should be counted. For each non-hierarchical
dimension: 1 ILF and 1 EI. Remember that a new EI is needed for each source. Only when
deletion is described explicitly: for the relevant record type, 1 additional EI.

The Data Marts (see Guideline §5):
If Facts from the Data Warehouse are combined into new Facts (other than by aggregation)
or if some of the data can no longer be drawn from the Facts in the Data Warehouse, the
count for each Data Mart should be: 1 ILF and 1 EI.

Reports (see Guideline §6):
Group the reports required based on similar facts and layout but for selections within
dimensions, and count one EO function for each report group.

ODS (see Guideline §8.4):
When data in the ODS is only a copy of the source, count it as an EIF. When data is
integrated and stored, count that ILF and count 1 EIF and 1 EI for each
source/administration that is integrated.


Appendix C. Requirements of the basic design document relating to function point counts

To enable efficient counting, documentation is a preferred source. It should contain the
information listed below.

Component of the DWH – summary of the requirements:

The Data Staging Area:
None; however, given the follow-up to the process, it is important to have a logical data
model of the DSA.

The Star Schemas of the Data Warehouse:
For each dimension:
• a logical model of the layers and/or data groups in the dimension;
• for each layer in the dimension:
  o the data that are drawn from (refer to the logical data model of the DSA for this
    purpose);
  o handling: the functionality to be achieved, giving special attention to:
    - handling of the changes;
    - whether or not data are deleted;
    - whether data are drawn from multiple source systems (when populating the DSA).
For each Fact:
• the logical data model surrounding the Fact (the Star Schema) insofar as it is relevant
  to the processing;
• the data that are drawn from (refer to the logical data model of the DSA for this
  purpose);
• the processes by which the Facts are extracted.

The Data Marts:
For each Data Mart:
• the logical data model of the Data Mart;
• the data that are drawn from (refer to the data in the DWH for this purpose);
• for each operation: the functionality to be achieved, giving special attention to:
  o the period of retention/storage.

Reports:
For each report group:
• the reports that belong to that group;
• one model or a model of each report;
• a logical data model of the data in the report;
• the Dimensions used in the DWH;
• the Data Marts used;
• (if applicable) the Facts used in the DWH;
• the required functionality, including:
  o selections and choices made by the end user.
