Version 1.1.0
www.nesma.nl
ISBN: 978-90-76258-25-6
FPA applied to DATA WAREHOUSING
Table of Contents
1 Introduction
1.1 Background to and version history of this document
1.2 Overview
1.3 Disclaimer
1.4 References
2 General features
2.1 Estimated count
2.2 Counting the project size
2.3 The end user
2.4 Count for each DWH component – choice of architecture
3 The Data Staging Area
3.1 Counting guidelines for the Data Staging Area
3.2 Requirements design document for the Data Staging Area
4 The Star Schemas of the Data Warehouse
4.1 Counting guidelines for the Dimensions of a Star Schema
4.2 Counting guidelines for the Facts of a Star Schema
4.3 Requirements design document for Dimensions and Facts
5 The Data Marts in the Data Mart Area
5.1 Counting guidelines for the Data Mart Area
5.2 Requirements of the design document for the Data Mart Area
6 Reports
6.1 Counting guidelines for Reports
6.2 Requirements of the design document for Reports
7 Other elements of a Data Warehouse environment
7.1 Cleansing
7.2 Meta Data relating to the logistic process
7.3 Meta Data relating to the meaning of the data – Business meta data
7.4 Importing FPA tables
8 Alternative architectures
8.1 By-passing the Staging Area
8.2 Reports from the Star Schema
8.3 Multiple data groups in a single file delivery
8.4 Operational Data Store - ODS
8.4.1 Counting guidelines for the ODS
8.5 The federated Data Warehouse
8.6 Inmon
8.7 BDWM
Appendix A. Data Warehousing Architecture Reference Model
Appendix B. Summary – Functions and Files in the DWH
Appendix C. Requirements basic design document relating to function point counts
1 Introduction
This document provides counting guidelines for performing an estimated Function Point
Analysis to determine the size of Data Warehouse projects - from the source up to (and
including) the reporting stage in an existing Data Warehouse environment.
The FPA count is based on version 2.2 of NESMA’s ‘Definitions and counting guidelines for
the application of Function Point Analysis’ (Definities en telrichtlijnen voor de toepassing van
functiepuntanalyse), paying particular attention to the key principles involved when applying
FPA to Data Warehouse projects. The functional architecture – the distribution of the
functionality across the Data Warehouse as a system – is based on the Architecture
Reference Model of Atos Origin Nederland B.V.
1.1 Background to and version history of this document
These guidelines were developed collaboratively by Jos Hendriksen (Atos Origin BAS Oost)
and Rob Eveleens (Atos Origin BI&CRM) from their experience of using the FPA guidelines
in data warehouse projects at KPN. Based on their work, Nico Mak (ABN AMRO Bank N.V.),
Theo Kersten (ATOS Origin Software Development & Maintenance Center) and Rob
Eveleens reviewed the architecture and functionality with a view to improving alignment with
the NESMA guidelines. From version 0.2 onwards, Jolijn Onvlee (Onvlee Opleidingen & Advies) and Freek Keijzer were brought in to refine the wording and the rationale. Version 0.5 was reviewed internally by NESMA and was also discussed by Achmea; as a result, several suggestions were made (and incorporated) that take account of situations in which the core of the DWH has been developed more along the lines of Inmon's approach.
1.2 Overview
This document assumes that the reader is familiar with FPA and its associated terminology
and abbreviations, and with Data Warehousing. As there are different ways of thinking about
Data Warehousing we have included in this document the approach used as the basis for
these counting guidelines. See Appendix A – Data Warehousing Architecture Reference
Model.
To be able to carry out an estimated count, it is preferable to have a design document that
includes the following:
• the conceptual (logical) data model
• a description of the functions and the inward and outward data flows
For each component of the Data Warehouse, the ‘additional requirements’ needed to ensure
that the count can be carried out on the basis of the design document have been made
explicit and have been included at the end of each chapter.
Please refer to the three appendices for a summary of counting guidelines, a framework for
Data Warehouses and an overview of the ‘additional requirements’.
1.3 Disclaimer
The FPA method described in this publication has been put into practice by various
businesses and projects. Nevertheless, the NESMA does not claim that the methodology
has been tested scientifically. Supplementary research and further practical use are needed
in order to establish its usability.
By publishing the present document, ‘FPA applied to Data Warehousing’, NESMA seeks to
help advance understanding of the application of FPA counting guidelines in other
environments. NESMA cannot be held responsible for the use of this publication, nor for the
results obtained from using it.
NESMA welcomes your comments on the document, information about your experience of
using it, and any suggested additions or improvements. Please contact NESMA via:
office@nesma.nl.
1.4 References
• [Kimball 2008] The Data Warehouse Lifecycle Toolkit: Practical Techniques for Building Data Warehouse and Business Intelligence Systems, 2nd edition. John Wiley & Sons, 2008.
• [Inmon 2005] Building the Data Warehouse, 4th edition. John Wiley & Sons, New York, NY, 2005.
2 General features
2.1 Estimated count
These guidelines describe the implementation of an estimated function point count for Data
Warehouses. In practice an estimated count is sufficiently accurate to allow confident
budgeting and subsequent costing of projects.
We repeat, perhaps unnecessarily: the user functions that need to be counted are identified without examining the complexity of those functions. In an estimated function point count, transactional functions are counted as being of 'Average' complexity and data functions as being of 'Low' complexity [1].
2.2 Counting the project size
In applying FPA, NESMA distinguishes between product size and project size. The following
purposes of function point counts can be distinguished:
• determining the product size [2]:
The number of function points is a measure of the extent of the functionality delivered, or to
be delivered, to a user by an information system. It is also a measure of the size of an
information system that has to be maintained.
• determining the project size [3]:
The number of function points is a measure of the extent of the functionality either of a
complete (or partial) new information system to be created by a single project, or of the
extent of changes to an existing system. In the latter case, changes may include one or
more of the following: the addition, modification or removal of user functions. The project
size is an essential parameter for determining the effort needed for the project.
[1] EIF: 5 FP; ILF: 7 FP; EI: 4 FP; EO: 5 FP; EQ: 4 FP.
[2] IFPUG: application count.
[3] IFPUG: development count or enhancement count.
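The estimated-count convention can be sketched in a few lines of Python. The weights below are the ones given in the footnote (data functions ILF and EIF at 'Low' complexity; transactional functions EI, EO and EQ at 'Average'); the function tally in the example is hypothetical.

```python
# Weights for an estimated count: data functions at 'Low' complexity
# (ILF 7, EIF 5), transactional functions at 'Average' (EI 4, EO 5, EQ 4).
ESTIMATED_WEIGHTS = {"ILF": 7, "EIF": 5, "EI": 4, "EO": 5, "EQ": 4}

def estimated_count(functions):
    """Return the estimated FP size for a tally of user functions per type."""
    return sum(ESTIMATED_WEIGHTS[ftype] * n for ftype, n in functions.items())

# Hypothetical tally for a small DWH increment: 2 ILFs, 1 EIF, 3 EIs, 2 EOs.
print(estimated_count({"ILF": 2, "EIF": 1, "EI": 3, "EO": 2}))  # 41
```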
2.3 The end user
These guidelines assess functionality from the perspective of the end users: the managers and staff in those parts of the organisation for which the DWH is being developed.
Where functionality is being developed for managing the Data Warehouse per se, the
perspective of the engineers who support the data warehouse becomes significant. The
current document does not contain any guidelines on this area as that functionality can be
counted as usual.
2.4 Count for each DWH component – choice of architecture
The FP count for each component of the system can be carried out on the basis of a
functional design document, which itemises all the tables for each component of the system
and identifies the transformation processes between those tables in general terms. For the
purposes of this count we distinguish the following groups of functions of the Data
Warehouse as a system:
1. Transporting data to the Data Warehouse’s Data Staging Area
2. Populating the Data Warehouse core
3. Preparing the Data Warehouse’s Data Marts
4. Producing Reports
The information architecture we envisage is set out in Appendix A – Data Warehousing
Architecture Reference Model.
The components of the DWH are not regarded as separate systems. Groups of data from
one component within the DWH are not regarded as External Interface Files (EIF’s) for other
components.
On the one hand, there are those who favour the Kimball approach, based on a core of Star Schemas [Kimball 2008], while others prefer a core based on a relational model [Inmon 2005].
There has been plenty of discussion about the functional counting of Facts and Dimensions
from Star Schemas, but much less about counting in relation to relational Data Warehouses.
When a Data Warehouse is set up using Inmon's philosophy, the guidelines on counting Dimensions and Facts are equally relevant, albeit at a somewhat later stage of the process, i.e. when setting up the Data Marts.
So please do not be misled: these guidelines are applicable to both Kimball's approach and Inmon's. This document also discusses the counting of the Dimensions and the Facts of a Star Schema when populating the Data Warehouse: Kimball's approach. A clear distinction can thus be made in relation to the counting of Data Marts. Section 8.6 (Inmon) contains a number of tips for the relational Data Warehouse.
3 The Data Staging Area
The processes work on a single flat file, usually converting it into a single table, in which the
information is supplemented to include identification of the source and the period to which
the data relate. The data received is kept as received, so that when new functionality is
introduced, or in the event of recovery operations, data processing may be carried out
sequentially again. These data are also frequently used to answer detailed queries about
reports, and to answer ad hoc queries.
This stage of the process is regarded as a technical solution for making data available to the Data Warehouse. Technical solutions cannot be included in the functional count, so the total of function points for this component is usually zero. The function is seen as a preparatory component of the functions in the next chapter, which populate the Star Schema or Facts of the Data Warehouse. Only if user reports have been defined on the basis of data taken directly from the DSA are there any data that can be included in the functional count.
That could be all we need to say here about the DSA. However, we expect plenty of comment on this viewpoint, and we therefore look at the technical mechanism in detail so that readers can apply the count to their individual situation.
Implementation of point 1 is usually outside the scope of the project, but could be counted as an EO (external output function) of the source system. The (flat) file (2) is not counted, as it is clearly the output from the function in point 1. Points 3, 4 and 7 do not have to be redesigned and rebuilt each time: they are functions of the application that controls the input of the DWH, and configuring them is an element of the implementation plan. That leaves only steps 5 and 6 to assess for the count.
If the functionality relates to only one or more of the points below, there is no reason to count
the data in the DSA as ILFs (internal logical files), as no user functionality is involved:
• Translating source codes to DWH codes
• The buffer function for matching the frequency of data deliveries (per day as
opposed to per week or per month)
• Adding the period to which the data relate
• Adding meta data relating to processing (e.g. source of the data)
If, on the other hand, reporting directly from the DSA is the reason for counting the data in the DSA as ILFs (Internal Logical Files), along with the associated EIs (External Inputs), the reports in question should, of course, also be counted as External Output functions of the system.
To summarise: in the case of an interface using (flat) files, the imported tables are not to be
counted as ILFs in the DSA.
A direct interface allows IT tools to be used to give one database environment (in this case,
the DWH) direct access to data in another database environment (in this case, a source
system). In this way the source data remain in the source system – no transfer takes place –
and the data are merely ‘made available’ to the Data Warehouse.
When a direct interface is used, each interfaced logical file in the source system is counted as an EIF (External Interface File). Count these available data groups as EIFs for the Data Warehouse. To make the counting result easier to understand, report them as counted within the DSA.
3.1 Counting guidelines for the Data Staging Area
When importing data using (flat) files, the export functions from the source system should be counted as EOs (External Output functions) for that source system.
The imported data are not counted as ILFs within the DSA. The data groups and functions are not counted until later, in the next component of the Data Warehouse. When importing data using a direct interface to a logical data file in the source system, each file is counted as an EIF (External Interface File).
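The two interface cases can be summarised as a small decision function. This is a sketch only; the argument names are ours, not the guideline's.

```python
def dsa_count(interface, num_source_files=0):
    """DSA contribution per the guideline: flat-file imports yield no ILFs in
    the DSA (the export is an EO of the source system); a direct interface
    yields one EIF per interfaced logical file in the source system."""
    if interface == "flat_file":
        return {"EIF": 0}  # imported tables are not counted as ILFs in the DSA
    if interface == "direct":
        return {"EIF": num_source_files}
    raise ValueError("unknown interface type: " + interface)

print(dsa_count("direct", 3))  # {'EIF': 3}
```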
3.2 Requirements design document for the Data Staging Area
None.
4 The Star Schemas of the Data Warehouse
The Data Warehouse's Star Schemas store historical data in a standardised manner in
accordance with a range of views. Two important types of data group can be discerned: the
Facts (the basic events for which reports are wanted) and their Dimensions (the views,
groupings by which the Facts will be reported), together making up the Star Schema(s) of
the DWH. The tables in this component of the Data Warehouse are often referred to as the
Data Warehouse or the Star Schema. Facts are the core element of the star schema, while
the dimensions are its rays. This component of the Data Warehouse usually consists of a
number of Star Schemas, in which as many dimensions as possible are re-used.
First, the data about the Dimensions for a specified period of time must be processed before
the Facts for the same or earlier periods can be attached to the Dimensions.
The designer of the source system will model this in detail, leading to numerous ILFs and EIs. The designer of the Data Warehouse has to simplify this into a structure that is easier for users to query and navigate when reporting. He may opt to translate this structure into a single Dimension, 'relationships', using corporate rules for the relationships within the Dimension and its (optional or mandatory) attributes.
Does this mean there is just one ILF? Or as many as in the source system? Or as many ILFs
as there are possible layers in the hierarchy of the dimension “relationships”? And how many
EI functions must be distinguished?
The guideline is as follows. For a Dimension, count 1 ILF. Then, in order to determine the number of input functions (EIs), determine how many record types can be distinguished within the Dimension: examine the number of levels in the Dimension, examine whether levels are handled differently (differences in the handling of attributes, for example), and set out the chosen options in the counting report.
[4] The number of Dimensions to be linked is an obvious measure of the complexity of the function, as is the number of attributes to be calculated. An estimated count does not take these into consideration.
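The Dimension guideline above can be sketched as follows; the record-type labels in the example are hypothetical.

```python
def dimension_count(record_types):
    """Per the guideline: one ILF per Dimension, plus one EI for each record
    type within the Dimension that is handled in a logically distinct way
    (e.g. levels whose attributes are processed differently)."""
    return {"ILF": 1, "EI": len(set(record_types))}

# Hypothetical 'relationships' Dimension with three distinctly handled levels.
print(dimension_count(["company", "business_unit", "contact"]))  # {'ILF': 1, 'EI': 3}
```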
4.3 Requirements design document for Dimensions and Facts
To simplify counting, it is preferable to have some documentation available. Look for the following documentation.
5 The Data Marts in the Data Mart Area
The first characteristic is not sufficient to justify regarding the Data Mart as one or more ILFs, but the second is, and so is the third [5].
From a logical point of view, data is appended to Data Marts. New occurrences are added to
the data already in the Data Mart. There are usually no functions for changing data in the
Data Mart and it is also rare for data to be deleted. When data are deleted, it is usually in the
form of data cleansing. Such cleansing is usually done as part of the overall control
mechanism of the Data Warehouse.
There are environments in which views of the Data Warehouse are made available as the
starting point for the generation of reports. As the data cannot exist independently of the
data in the Data Warehouse there is no ILF.
5.1 Counting guidelines for the Data Mart Area
For each Data Mart, count 1 ILF for storage of the data and 1 EI for processing of the data.
In principle Data Marts use the same Dimensions as the Data Warehouse. Therefore no
ILFs need to be counted for the Dimensions of the Data Marts.
There are sometimes differences in technical implementation in regard to making the Data
Marts and associated Dimensions available to users. The Data Marts and Dimensions may
be accessed directly by the reporting functions. To improve performance, however, or to
maintain security of the data in the Data Warehouse, or to facilitate distribution of the data,
technical copies of the data may be published on a special platform, which is independent of
the Data Warehouse. Those technical copies should not be counted.
[5] Suppose that, for each cash register transaction, the time and the outlet are recorded. A second table of facts contains the clock times for each employee for each outlet. The number of transactions per week is calculated in a Data Mart. That is not sufficient to justify regarding the Data Mart as an ILF. The same applies when the number of transactions per week per outlet is calculated. However, a Data Mart containing the number of transactions per employee is regarded as an ILF.
If Facts from the Data Warehouse are combined into new Facts (other than by aggregation)
or if some of the data can no longer be drawn from the Facts in the Data Warehouse, the
count for each Data Mart should be: 1 ILF and 1 EI.
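In outline, the Data Mart guideline amounts to the following sketch; the 'view_only' flag is our shorthand for views made available directly on the Data Warehouse data.

```python
def data_mart_count(marts):
    """Per the guideline: a mere view on the Data Warehouse data yields no
    ILF; every other Data Mart counts 1 ILF (storage) and 1 EI (processing)."""
    counts = {"ILF": 0, "EI": 0}
    for mart in marts:
        if mart["view_only"]:
            continue  # data cannot exist independently of the DWH: no ILF, no EI
        counts["ILF"] += 1
        counts["EI"] += 1
    return counts

print(data_mart_count([{"view_only": False}, {"view_only": True}]))  # {'ILF': 1, 'EI': 1}
```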
5.2 Requirements of the design document for the Data Mart Area
6 Reports
Reports are normally created using completely different tools from the other functions, with productivity per function point varying greatly. Because of this variation in productivity it is desirable to keep the FP counted for reports separate, so that those who have to assess the effort required can take account of that difference in productivity.
There are a large number of (functionally and technically) different report solutions on the
market. This counting guideline attempts to estimate the underlying functionality without
regard to such differences.
In many DWH environments we find two types of report, each addressing specific functions
within the organisation(s):
• Fixed or canned reports.
• OLAP reports.
Many fixed reports [6] from a Data Warehouse relate to a specific cross-section of the data available in the Data Marts. As a result many reports of similar format and content are specified, in many variants. According to the NESMA guidelines, simply counting each of the required reports as an EO function is not right, as too many of the underlying logical processes are the same to allow the functions to be regarded as unique. But it is also not right to regard all the reports together as one external output function: there are logical data groups which sometimes are, and sometimes are not, shown in variants of the reports. In such cases, the NESMA guidelines lay down that a number of EO functions must be distinguished.

Supplementary information 3: Report variants
Staff of the corporate sales department request reports to be designed that relate to the number of sales (1) per week per employee, (2) per month per team and (3) per year per department. The content is similar for the three reports except for the level of detail: each will show either the name of the employee, the team, or the name of the department. The user has to specify a selection meaningful to him from the enormous number of possible reports. From the initial report requirements we identify time and organisation as Dimensions, each with three levels (week, month, year and employee, team, department). Of the nine possible combinations the user is asking for three. The guidelines suggest counting these three reports as one EO, as the logical process of each report is the same.
In OLAP environments [7] the data to be reported, including all aggregations, are provided to the user via the requested views for his ad hoc analyses. The user puts the views that are of current interest to him into his report as columns and rows and then looks for the desired level of detail within that report. All views are available constantly, but if they have not been selected for the analysis they are displayed to the user as "all items in this viewpoint". In the case above, with the canned reports, there is an enormous number of possible reports; in OLAP environments every possible report is actually available. The functional specifications in this environment talk of the measured data to be displayed (and their derivatives), followed by all possible groupings of the Facts. In this environment the user sees this as one report, and thus one is tempted to count one EO.
This outlines the counting dilemma for us. In the fixed-reports environment we may easily incline towards counting one EO function for each report designed. In the OLAP environment it would be quite easy to count all the reports together as only one EO function. But that means that the technical environment is determining the functions, rather than the functions being defined by the functionality of the reports!
[6] E.g. Business Objects® or Oracle Discoverer®.
[7] E.g. Cognos®.
6.1 Counting guidelines for Reports
The guidelines therefore advocate grouping fixed or canned reports into report groups based on a number of selection criteria, usually levels in dimensions, which makes the functionality counted in the two reporting environments the same.
Group the required reports based on similar facts and layout, differing only in the selections within dimensions, and count one EO function for each report group.
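The grouping rule can be sketched as follows, using the sales example from the supplementary information box; the field names are ours, invented for illustration.

```python
from collections import defaultdict

def count_report_eos(reports):
    """Group reports that share the same facts and layout and differ only in
    the dimension levels selected; count one EO function per group."""
    groups = defaultdict(list)
    for report in reports:
        groups[(report["facts"], report["layout"])].append(report["name"])
    return len(groups)

# The three sales variants form one group; an unrelated report forms another.
reports = [
    {"name": "sales per week per employee",   "facts": "sales", "layout": "list"},
    {"name": "sales per month per team",      "facts": "sales", "layout": "list"},
    {"name": "sales per year per department", "facts": "sales", "layout": "list"},
    {"name": "stock per warehouse",           "facts": "stock", "layout": "list"},
]
print(count_report_eos(reports))  # 2
```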
6.2 Requirements of the design document for Reports
To support ease of counting, look for design documentation describing, for each report group:
• the reports that belong to that group;
• one model or a model of each report;
• a logical data model of the data in the report;
• the dimensions used in the DWH;
• the Data Marts used;
• (if applicable) the Facts used in the DWH;
• the required functionality, including the selections and choices of the end user.
Here, too, it is important to state whether the report group in question already exists or not.
7 Other elements of a Data Warehouse environment
7.1 Cleansing
As already stated, usually no data are deleted from a DWH by Data Warehouse functions. The available history may be limited by data-cleansing functions, which limit the history in the Star Schema to (for instance) 25 months and the history in the Data Marts to (for instance) 61 months. Such cleansing functions are normally incorporated into the control of the data warehouse, or result from physical re-use of parts of the database system (the latter meaning that data are physically overwritten with new data after, say, 24 months).
Count the cleansing function at the level of functions (i.e. do not count one input function for
each ILF).
7.2 Meta Data relating to the logistic process
The data used to manage the Data Warehouse may be, for example: dates on which a
process has added data to a table from a source; the number of records added, amended or
rejected at that time; or the parameters used for that processing operation. The processes to
be developed must read and edit these Meta data.
These functions cannot be identified by users and must not be counted. However, the
control mechanism must be created when the Data Warehouse is set up. For the purposes
of counting this functionality, we regard the administrators as users and count in accordance
with the standard NESMA guidelines.
7.3 Meta Data relating to the meaning of the data – Business meta data
These functions and the data required for them are not counted when estimating the project
size. The assumption made is that describing the meaning is a business activity and that the
description in question will be captured in a tool that has been set up on a once-only basis
for the Data Warehouse.
For the purpose of counting this functionality (the administration and presentation of the business meta data), the end users of the DWH should be regarded as users, and counting should be carried out in accordance with the standard NESMA guidelines.
7.4 Importing FPA tables
If tables containing only codes and descriptions are imported from sources in order to be put into the DWH tables, or to transform the input received, doesn't that look very similar to the transfer of so-called FPA tables? Do we need to count these imports? If so, do we count just once for the whole DWH, or once for each source system?
Count one FPA-table ILF for the whole Data Warehouse, plus the associated EI, EQ and EO functions. If, in a subsequent increment of the Data Warehouse, a further FPA table in a source system has to be accessed, we again count one FPA-table ILF, one EI, one EQ and one EO function.
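With the standard estimated-count weights (ILF 7, EI 4, EQ 4, EO 5), this rule contributes a fixed amount per increment in which a new FPA table is accessed; a sketch:

```python
# Estimated-count weights per function type (NESMA/IFPUG).
WEIGHTS = {"ILF": 7, "EIF": 5, "EI": 4, "EO": 5, "EQ": 4}

def fpa_tables_contribution():
    """One FPA-table ILF plus one EI, one EQ and one EO function."""
    return WEIGHTS["ILF"] + WEIGHTS["EI"] + WEIGHTS["EQ"] + WEIGHTS["EO"]

print(fpa_tables_contribution())  # 7 + 4 + 4 + 5 = 20 FP
```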
8 Alternative architectures
Data warehousing practice is constantly changing. We hope that the model offered here will
be easy to apply to your own practical situation. We will mention here a number of
alternative architectures or components and suggest a counting guideline.
8.1 By-passing the Staging Area
Data can be put directly into the Dimensions or Facts of a Star Schema. A DSA may not be required or desired.
8.2 Reports from the Star Schema
Reports can be based directly on the data in the Star Schema (without using the Data Marts).
8.3 Multiple data groups in a single file delivery
The frequency of delivery of data to Data Warehouses is increasing. The volume of each
data delivery is consequently declining. Taking this to the extreme we find a situation in
which the Data Warehouse is populated from real-time message queues. To enhance the
efficiency of the process a single message may contain two or more data groups. The
functionality must place the data in the correct manner into a number of logical data groups.
For example, a file for a fuel expenses claim: vehicle and owner, driver and registration
number, plus the number of litres purchased at each refuelling, the cost and the odometer
reading. For such files, XML is a good way of structuring the content flexibly and marking it up. An XML file thus often contains more than one data group.
The count should be performed in the same way as for Facts and Dimensions, for each data
group that results from the file, regardless of the frequency of delivery.
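As an illustration, consider a hypothetical XML message for the fuel-expenses example above (tags and values invented for illustration); each distinct logical data group found in the delivery is counted as if it had been delivered separately.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML message for a fuel expenses claim: one delivery carrying
# three logical data groups (vehicle, driver, refuelling).
message = """
<fuel_claim>
  <vehicle registration="AB-12-CD" owner="Lease Co"/>
  <driver name="J. Jansen"/>
  <refuelling litres="41.2" cost="78.50" odometer="120345"/>
  <refuelling litres="38.9" cost="74.10" odometer="121012"/>
</fuel_claim>
"""

root = ET.fromstring(message)
data_groups = {child.tag for child in root}  # logical data groups in the delivery
print(sorted(data_groups))  # ['driver', 'refuelling', 'vehicle']
```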
8.4 Operational Data Store - ODS
We are familiar with the use of an operational data store (ODS). Into the ODS a selection of the data from several transactional systems is "copied", each item of data being replicated one-to-one in a different environment. The reports are made available from that environment via the queues, action lists and signal lists current at that moment [8]. The primary motivation for doing this is integration of the data residing in the transactional systems, and relieving those systems of the burden imposed by demanding reports.
When the data in the ODS are only a copy of the source, count them as an EIF. Only where data are integrated is a new ILF created, holding the keys used in the source, a reference to the source, and the algorithm used to create the relation (including a reference to the corporate identification, when available). Count that ILF, count an EIF as mentioned above for each source/administration, and count 1 EI for the algorithm.
[8] The ODS contains up-to-date information. The DWH contains history and is only current up to a certain point in time.
8.4.1 Counting guidelines for the ODS
If an ODS is available in the environment of the Data Warehouse and the required data are
available there, then a direct link to the ODS, instead of to the source system, would seem
an obvious solution from the point of view both of relieving the burden on the transactional
system and of ease of accessibility.
One should regard the ODS as one of the sources of the DWH and count its functionality as
described in chapter 4.
When the data in the ODS are only a copy of the source, count them as an EIF. When data are integrated and stored, count that ILF, and count 1 EIF and 1 EI for each source/administration that is integrated.
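The ODS rule can be summarised as follows. This is a sketch; the 'integrated' flag is our shorthand for data that are integrated rather than copied one-to-one.

```python
def ods_count(sources):
    """Per the guideline: a one-to-one copy of a source file counts as an EIF;
    where data are integrated, one new ILF is created, plus one EIF and one
    EI per integrated source/administration."""
    counts = {"ILF": 0, "EIF": 0, "EI": 0}
    integrated = [s for s in sources if s["integrated"]]
    counts["EIF"] += len(sources) - len(integrated)  # plain one-to-one copies
    if integrated:
        counts["ILF"] += 1
        counts["EIF"] += len(integrated)
        counts["EI"] += len(integrated)
    return counts

print(ods_count([{"integrated": True}, {"integrated": True}, {"integrated": False}]))
# {'ILF': 1, 'EIF': 3, 'EI': 2}
```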
8.5 The federated Data Warehouse
Some organisations have a number of Data Warehouses existing in parallel and serving as source systems for one another. Usually a Data Warehouse supplying data will be seen as a source system by another.
There will probably be a direct interface from the DWH to the other component, and the count should be based on one EIF for each file interfaced from the supplying Data Warehouse.
8.6 Inmon
As indicated in section 2.4 (Count for each DWH component – choice of architecture), the core of the DWH may have been set up using a relational model. In that situation you must take note of the following:
• Count an EI per record type of an ILF if a different logical handling is described for it;
• Count an EI for each other logical handling operation for the inputting of logical data
files;
Expect different EIs for the same ILF for each source. Usually the handling operation is
logically different for each source.
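One way to read these rules is that each combination of record type and source with its own logical handling yields its own EI. A minimal sketch, with hypothetical record types and sources:

```python
# Hypothetical sketch: counting EIs for a relational DWH core.
# An EI is counted per record type of an ILF whenever its logical
# handling differs; the same record type fed by different sources
# usually has logically different handling, hence separate EIs.

def count_eis(handlings):
    """handlings: iterable of (record_type, source) pairs, each
    representing a distinct logical handling operation."""
    return len(set(handlings))

handlings = [
    ("customer", "CRM"),
    ("customer", "Billing"),  # same record type, other source -> extra EI
    ("contract", "Billing"),
]
print(count_eis(handlings))  # 3
```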
8.7 BDWM
As an example of an alternative structure, in which the core of the Data Warehouse consists
of a (very highly) normalised model instead of a Star Schema, we would mention the Banking
Data Warehouse Model, IBM's model for a data warehouse for banks. When implementing this
model, the logical model is sometimes translated directly into a technical structure. As a
result of that approach it becomes hard to see the difference between "new data" and
"new descriptions of data". How can the new data be identified? To illustrate, we will look at the
recording of the item "The job of the person with the name Rob is that of writer". The
technical recording can be represented as follows:
3. There is a generic table in which the relationship between various tables, including
the one for persons and the aforementioned table, can be made with the descriptive
attribute "the job of a person". A new entry, referring to the new person and his job, is
thus created.
4. In a table, the value for the relation in point 3 will refer to the involved party "Rob", the
relation "the job of a person" and the value "writer".
The tables in 2 and 3 describe new possibilities; the table in 4 holds the new information.
The item "The colour of the eyes of the person with the name Rob is brown" could be stored
in the same way, in exactly the same tables. The value "brown" must appear in the generic
table that contains the wide variety of possible values for "colours". In the third table
described above an occurrence would be created, referring to "involved parties" and "colours",
to hold "the eye colour of a person". The fourth table will actually hold the information. If Rob's job
changes, then only the entry in the third table changes. If a job history is required, this is
achieved by using validity periods in that table. If the information on job positions is no longer
relevant, the validity period of the relation is closed.
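The generic storage described above is essentially an entity-attribute-value structure. A minimal sketch of the idea, where all table and column names are assumptions and not taken from the BDWM itself:

```python
# Minimal entity-attribute-value sketch of the generic tables described
# above (all names are hypothetical, not taken from the BDWM itself).

# Table of possible relation types ("new descriptions of data"):
relation_types = {
    1: ("involved party", "job", "the job of a person"),
    2: ("involved party", "colour", "the eye colour of a person"),
}

# Table of occurrences ("new data"), with validity periods for history:
# (relation_type_id, entity, value, valid_from, valid_to)
occurrences = [
    (1, "rob", "writer", "2020-01-01", None),  # current job
    (2, "rob", "brown", "2020-01-01", None),   # eye colour
]

# A job change only closes one occurrence and adds another; the
# relation types ("descriptions of data") are untouched:
occurrences[0] = (1, "rob", "writer", "2020-01-01", "2023-06-30")
occurrences.append((1, "rob", "editor", "2023-07-01", None))

current = [(r, e, v) for (r, e, v, f, t) in occurrences if t is None]
print(current)  # [(2, 'rob', 'brown'), (1, 'rob', 'editor')]
```

Note that both "job" and "eye colour" live in exactly the same tables, which is what hides the logical model in the occurrences.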
As a result, the presentation of the logical design of the Data Warehouse contains an
enumeration of the identified relationships, classifications, etc. The function point analyst's
dilemma is now plain: what are the logical data files that should be maintained? Should each
attribute be counted as a logical data file? How can the logical information model be
reconstructed from the technical implementation? It is hidden in the occurrences! It is
certainly a flexible approach.
We suggest not counting each attribute described as a logical data file, including a function, but
performing one additional logical step: group information with similar identifying keys. "Job" and
"eye colour" are both relations between an involved party (one identifying key) and a domain, and are
easily grouped into one logical data file. "Rob has a son Bert" has two identifying keys
and will therefore be a separate logical data file. Refer to the NESMA FPA documentation on the
grouping of data.
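The suggested grouping step can be sketched as follows: relations with the same set of identifying keys end up in one candidate logical data file. Attribute and key names are illustrative only:

```python
from collections import defaultdict

# Illustrative sketch of the grouping step suggested above: attributes
# (relations) with the same identifying keys are grouped into one
# candidate logical data file. All names are hypothetical.

relations = {
    "job":        ("involved party",),                   # one key
    "eye colour": ("involved party",),                   # same key -> same file
    "has son":    ("involved party", "involved party"),  # two keys -> own file
}

files = defaultdict(list)
for attribute, keys in relations.items():
    files[keys].append(attribute)

for keys, attributes in files.items():
    print(keys, "->", attributes)
# ('involved party',) -> ['job', 'eye colour']
# ('involved party', 'involved party') -> ['has son']
```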
The figure above shows the flow of data from source to user and the distribution of functions
across the architecture:
1. The left-hand layer in the diagram (Source Layer) relates to systems that supply the
information or contain the organisation’s primary operational process. From that
layer, data is made available to the second layer in the model.
2. In the Data Layer, the data is first imported (into the DSA), then modelled according
to the company's corporate business rules (product definition, organisational
structure, market channels) and stored in the Data Warehouse. Only after that are the
data refined to make them suitable for specific queries from departments, for
decision-support systems, or for statistical analyses, in preparation for making the
data available to the users. They are then stored in Data Marts.
3. In the final two layers at the right-hand side the data are used for reports or
applications and subsequent presentation.
The leftmost and rightmost layers are outside the scope of this counting guideline for DWH.
[Figure: counting example across the application boundary of the DWH. Upload files are
loaded via EIs into Staging data (ILFs); the Relational DWH consists of entities (3 ILFs);
the Star consists of a Fact and Dimensions (5 ILFs for the dimensions, 1 ILF for the fact);
Data Marts are fed by EOs; a technical copy is counted as an EIF. The ODS (see Guideline
§8.4) is annotated with the rule that when data in the ODS is only a copy of the source it
counts as an EF, and when data is integrated and stored, that ILF is counted plus 1 EF and
1 EI for each source/administration that is integrated.]