Matthew Lawler lawlermj1@gmail.com Datawarehouse Names

Datawarehouse
Names

Matthew Lawler lawlermj1@gmail.com

D:\D\Documents\DW Me\0 Publish\DW Names.docx February 13, 2018 1 of 27



INTRODUCTION
HIGH LEVEL STANDARD
PROJECTION STANDARDS
CONSTRAINT STANDARD
ATTRIBUTE STANDARD
GENERAL RENAMING RULES
TIME STANDARD
PLATFORM CONSTRAINTS


Introduction
Licence
As these are generic software documentation standards, they will be covered by the 'Creative
Commons Zero v1.0 Universal' CC0 licence.

Warranty
The author does not make any warranty, express or implied, that any statements in this document
are free of error, or are consistent with any particular standard of merchantability, or that they will
meet the requirements for any particular application or environment. They should not be relied on for
solving a problem whose incorrect solution could result in injury or loss of property. If you do use this
material in such a manner, it is at your own risk. The author disclaims all liability for direct or
consequential damage resulting from its use.

This is really a sample document, illustrating particular choices taken. As each DW environment is
different, so these choices will also be different. There is no universal standard for naming
conventions.

Purpose
This naming standards document defines the standards to follow for schemas, subject areas,
projections, attributes, constraints, boilerplate columns, renaming, and time. This document is
needed to make it easier for Business users to understand column and table names, and reduce the
cost of egregious renaming.

Audience
The primary audience of this document is any staff who use the DW, or who design, develop or
maintain the DW. It will also be useful to Business Intelligence end users.

Assumptions
It is assumed that the naming conventions will support a variety of DW design patterns such as Data
Vault, Kimball, and source standards.

Approach
The naming standard will be based on common DW design patterns and other essential design
decisions.

Related Documents
There are many documents describing the DW design methodologies such as Kimball, Inmon, etc.
Some are below.

Author | Reference | Publisher | Year
W Inmon | Building the Data Warehouse | Wiley | 1996
CJ Date | Temporal Data & the Relational Model | Morgan Kaufmann | 2002
Ralph Kimball | The Data Warehouse Toolkit: The Complete Guide to Dimensional Modelling | Wiley | 2002

Definitions
Term | Source | Definition
Aggregate | Kimball | An aggregate is a summary table using GROUP BY, often based on fact and/or dimension tables.
Bridge | Kimball | A bridge table captures many-to-many relationships, such as when a fact table row can be associated with more than one value in a dimension.
Data Model | Chris Date | The data model must represent demonstrably true statements about the business area. That is, the entities must represent things that mean something to the business, and the relationships between entities must represent meaningful links.
Data Vault | Linstedt | Data Vault is a database method that is designed to provide long-term historical storage of data coming in from multiple operational systems. It provides a DW pattern that supports the Inmon goals of subject-orientation, non-volatility, integration and time-variance.
Dimension | Kimball | An independent entity in a dimensional model that serves as an entry point or as a mechanism for slicing and dicing the additive measure located in the fact table of the dimensional model. For example, all months, quarters, years, etc., make up a time dimension. Based on Measure Theory.
Disemvowel | DB | This is a function that removes all vowels from a text string. E.g. a sentence such as “The quick brown fox jumps over the lazy dog” would, after being disemvowelled, become “Th qck brwn fx jmps vr th lzy dg”.
EAV | DB | An Entity-Attribute-Value table. This is a type of RDF table. Data is recorded in only three columns: Entity: the item being described; Attribute or parameter: a foreign key into a table of attribute definitions; Value of the attribute. This pattern is very popular with business users, as they have the opportunity to redefine data in an application. Often, much of the useful reporting data is in these columns. However, it can be quite difficult to pivot this data into usable reporting tables. See the Resource Description Framework (RDF) as an example.
Fact | Kimball | A business performance measurement, typically numeric and additive, that is stored in a fact table. Based on Measure Theory.
Hub | Linstedt | A hub represents a core business concept such as Customer, Vendor, Sale or Product. The hub table is formed around the business key, as well as the source system keys.
Integrated | Inmon | Integration is the process of mapping dissimilar codes to a common base, developing consistent data element presentations and delivering this standardized data as broadly as possible.
Invariant | Algebra | An invariant is a property of a mathematical object that remains unchanged when transformations of a certain type are applied to the object. As an example in a DW, the count of network nodes should be identical in a source database and in the target DW at the same point in time.
Join | Codd | A join is a binary operator on two relations or database tables.
Link | Linstedt | A link represents a natural business relationship between business keys.
Logical name | Chris Date | These are full words from a standard business vocabulary. This contrasts with physical names that are often abbreviated due to name length limitations, etc.
Metadata | DB | Metadata is "data about data". While not often used in reporting, these tables are important in DW standards, and for generating and describing DW components.
Name Hygiene | Computer Science | The prevention of accidental capture of identifiers, or name collision. This occurs especially when generating identifiers or names. In programming languages, this is solved using hygienic macros.
Non-volatile | Inmon | Non-volatile design is essential. Non-volatility literally means that once a row is written, it is never modified. This is necessary to preserve incremental net change history. This, in turn, is required to represent data as of any point in time. When a data row is updated, the past information is destroyed. A fact or total that included the unmodified data can never be recreated.
Object | DB | For the purposes of the DW, an object is a projection over a table or a table join. They are often used in applications to layer and rename physical column names to attribute names. Consequently, the end user is more familiar with the attribute name, rather than the column name.
Pivot | DB | A pivot table is the transformation of an EAV table into columnar form.
Projection | Codd | A projection is a unary operation that selects a subset of attribute names. In database terms, this is a select. This can be implemented as tables, views, XML messages, etc.
RDF | W3C | The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description of information and/or knowledge management, which is implemented in web resources, and other formats. An RDF triple is a statement about resources in the form of a subject-predicate-object expression. RDF is a more general form of the original EAV Entity-Attribute-Value design pattern, where the subject is an entity, the predicate is an attribute and the object is a value. A set of such RDF triples forms an RDF graph.
Relational Algebra | Codd | Relational algebra is an offshoot of first-order logic and of algebra of sets concerned with operations over relations.
Satellite | Linstedt | A satellite contains the descriptive information (context) for a business key.
Subject-orientation | Inmon | Subject-orientation mandated a cross-functional slice of data drawn from multiple sources to support a diversity of needs. This is a departure from serving only either the vertical application views of data (supply-side) or the overlapping departmental needs for data (demand side).
Time Variant | Inmon | Time variance calls for storage of multiple copies of the underlying detail in aggregations of differing periodicity and/or time frames. There may be detail for seven years along with weekly, monthly and quarterly aggregates of differing duration. This is critical for maintaining the consistency of reported summaries over time.
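
The Disemvowel definition above can be illustrated with a short Python sketch. The function name is hypothetical, and treating "y" as a consonant is an assumption that agrees with the "lzy" in the example sentence.

```python
def disemvowel(text: str) -> str:
    """Return text with every vowel (a, e, i, o, u, in either case) removed."""
    vowels = set("aeiouAEIOU")  # assumption: 'y' is not treated as a vowel
    return "".join(ch for ch in text if ch not in vowels)


if __name__ == "__main__":
    print(disemvowel("The quick brown fox jumps over the lazy dog"))
```

Applied to the sentence in the definition, this reproduces the disemvowelled form quoted there.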

Tags
Business Intelligence ; Data Mapping ; Metadata ; Standards ; Data Transformation ; Data
Warehouse ; Database ; Fact / Dimension ; Data Load ; Data Model ; Data Vault ; Database Design ;
Extract Load Transform - ELT ; Extract Transform Load - ETL ; Inmon ; Kimball ; Massive Parallel
Processing - MPP ; Netezza ; Oracle ; Data Integration ; Data Lineage ; Data Traceability ; Time
Variant ; Metadata Glossary ; Hub / Satellite ; Projection ;


High Level Standard

Layer Standard
These are all defined as schemas in the DW. This assumes a simple 4 layer DW design. The final
layers start with A*, so they will appear first in any schema listing.

Layer | Aligned Area | Schema Name | Description
Gold Layer | Business | AU_GOLD_DECISION | This is a schema for tables and views that are available to Business Decision Makers. It represents the final state of the transformed data, suitably cleansed for the primary customer of the DW, the business decision maker.
Silver Layer | Business | AG_SILVER_INTEGRATION | This is a schema that enables integration across multiple source schemas. It represents Lead data transformed into more valuable Silver data.
Iron Layer | Business | FE_IRON_METADATA | This is a schema for all metadata.
Lead Layer | Source | PB_<Source System Code>_<DB Name/Path>_<Schema Name/File>_<Load Type> | This contains the raw loaded source data. The primary need is for a unique definition of the data, and this naming pattern supports that. The name defines the important characteristics of the load. The first code identifies the source system. The next code identifies the DBName for databases, or the address for web sources, or the directory path for file sources. The DBName must always be based on the ultimate production database name, not the development name. In the case of directory or path names, the protocol prefix will need to be stripped off, and illegal characters like ":/" will need to be substituted with underscores. The next code identifies the Schema Name for databases, or the page for web sources, or the file for file based sources. The last code identifies the load type. This more precise method of identifying sources will enable a much larger number of different sources to be identified, and will make it possible to manage multiple loads from the same source so that they do not impact each other. This approach is more self-documenting, with a clearer data lineage.
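
The Lead Layer naming pattern can be sketched as a small Python helper. This is only an illustration: the function names are hypothetical, the load type codes are taken from the Load Type table in this chapter, and upper-casing the result and collapsing runs of illegal characters into a single underscore are assumptions rather than part of the standard.

```python
import re

# Load type codes as listed in the Load Type table of this document.
LOAD_TYPES = {"CDC", "ELT", "ETL", "MNL", "REF"}


def clean_part(value: str) -> str:
    """Strip a protocol prefix and replace illegal characters with underscores."""
    # Remove a leading protocol such as "https://" or "file://".
    value = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", value)
    # Assumption: anything other than letters, digits and underscore is illegal,
    # and a run of illegal characters becomes a single underscore.
    value = re.sub(r"[^A-Za-z0-9_]+", "_", value)
    return value.strip("_").upper()  # assumption: schema names are upper case


def lead_schema_name(source_system: str, db_or_path: str,
                     schema_or_file: str, load_type: str) -> str:
    """Build PB_<Source System Code>_<DB Name/Path>_<Schema Name/File>_<Load Type>."""
    if load_type not in LOAD_TYPES:
        raise ValueError(f"unknown load type: {load_type!r}")
    parts = ["PB", clean_part(source_system), clean_part(db_or_path),
             clean_part(schema_or_file), load_type]
    return "_".join(parts)


if __name__ == "__main__":
    print(lead_schema_name("FIN", "PRODDB", "GL", "CDC"))
    print(lead_schema_name("WEB", "https://example.com/reports", "index.html", "MNL"))
```

The source system code FIN and the web address are invented sample values. Real names would also have to respect the identifier length limit of the target platform, which this sketch does not check.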

Load Type
This is needed to distinguish how the data has been loaded. If data is brought in by separate ETL or
CDC processes, it will be subject to different time variance and recovery processes. This is even an
issue within CDC, where data can be either refreshed or mirrored. If the target is not distinct, then it
may be impossible to properly reason about its state, leading to incorrect replication and, ultimately,
erroneous business decisions.

Load Type | Load Name | Description
CDC | Change Data Capture | This is the default process for loading data.
ELT | Extract Load Transform | A common pattern in DW is to load the data in before transformation. This results in faster loads in an MPP appliance.
ETL | Extract Transform Load | ETL is used to load data into the DW. The transformation occurs in a separate ETL tool, which is a common anti-pattern that leads to poor single threaded ETL performance.
MNL | Manually loaded | This is practical for data that does not need to be loaded more than once, or very infrequently, which is generally invariant data.
REF | Reference | Used mainly for schema analysis, and possibly prototyping, but not to be used for reporting. A throw away design area.

Just as there can be alternate means of loading, there can also be alternate source types from the
same source. In almost all cases, there is not a strict isomorphism between source and target, and
the source type plays an important part in determining the amount of change that occurs between
the source and target. In other words, the different source types have different constraints that
dictate how well the data is loaded. If the source type is also a file type, then it should conform to
standard file extensions. The source type may also be appended to the name, but it is not
mandatory.

File Type | File Type Name | Description
DOC | Microsoft Word Document |
HPR | Hyperion |
HTML | Hypertext Markup Language |
MDB | Microsoft Access Database |
MSP | Microsoft SharePoint |
NTZ | Netezza |
ODS | Operational Data Store | Many source systems currently provide an ODS layer. While well meaning, this anti-pattern often creates a reconciliation nightmare, as the ODS mappings back to the source system are rarely documented.
ORA | Oracle |
XLS | Microsoft Excel Spreadsheet |
XML | Extensible Markup Language |

Subject Area Standard


The physical name can be used in schema names. The Subject Areas will have a three letter code.
The Subject Area codes can be used in projection names, or in primary key values. These subject
area names are examples only.

SA Code | Subject Area (logical) | Subject Area (physical) | Layer
AAT | Authentication and Authorization | Authentication_Authorization, Athntctn_Athrztn | Iron
GLS | Glossary | Glossary | Silver
BUS | Business | Business | Gold
CTL | Control | CONTROL | Iron
DQ | Data Quality | DATA_QUALITY | Iron
DVT | Data Vault | DATA VAULT | Silver
ERL | Entity & Reference Links | Entity_Reference_Links | Iron
GLD | General Ledger | General_Ledger | Gold
HRS | Human Resources | Human_Resources | Gold
INV | Invariant | INVARIANT | Iron
LNG | Lineage | DATA Lineage | Iron
OPS | Operations | Operations | Gold
PRJ | Project Management | Project_Management | Gold
QLT | Quality | Quality | Iron
SLN | Source Lineage | Source_Lineage | Silver
TRC | Traceability | TRACEABILITY | Iron

Note that some metadata can be placed in other layers. However, if the metadata is shared across
schemas or layers, then it is best to place it in a separate schema. Lineage can be considered the
column to column mappings. Traceability can be considered the actual row value to row value
mappings.


Projection Standards
A projection can be thought of as a subset of columns. These are a series of codes or types that can
be used as part of a projection name. These codes will help to make the purpose of a projection more
transparent, through consistent usage of the name patterns. The next chapter will give examples of
how they can be combined to create projection names.

Logical Projection Type (LPT)


Logical means the primary purpose of this projection type within a given methodology. This purpose
will help determine the correct set of columns (or attributes) for the projection. All new projections
MUST have a Logical Projection Type as a prefix, except SAL tables copied from source tables. This is
necessary to avoid name collisions, as most of these projection types can be used in any
layer/schema. As far as possible, each type should belong within a type set, so the types are
mutually exclusive, and all types form a complete type set. For example, within the Inmon type set,
there can only be 3 types: Ephemeral, Invariant or Time Variant.

LPT Code | LPT Name | Standard | Definition
A, ACT | Active (current) | Source | A projection of the logical active columns for an active table, filtered to show current data only, and excluding deleted rows. An active column has a number of distinct values > 0 (greater than zero). An active table has at least one row. There is some redundancy here, as this is also defined by the column filter type. However, this will be retained as it is now standard practice.
B, BRD | Bridge, Helper | Kimball | A join over tables that can be used as a bridge in a Kimball schema.
C, CUB | Cube | Kimball | A projection that is a join of a Kimball Fact table, and related Kimball Dimension tables. This is useful when the users do not know how to join tables. This does not use GROUP BY, so it is different to an Aggregate.
D, DMN | Dimension | Kimball | A join over tables that can be used as a dimension in a Kimball schema. Typical columns are codes which have low cardinality, which allow for grouping. A dimension can be either a singleton view (over a single table), or based on a set of joined tables.
E, EPH | Ephemeral | Inmon | Something that is true at an instant in time, but has no duration or longevity. That is, it only lasts momentarily, or for a very short time. A typical example is a transaction or trade, which is effectively instantaneous. Typically, this data is not mutable. That is, it cannot be updated, except for error correction.
F, FCT | Fact | Kimball | A join over tables that can be used as a fact in a Kimball schema. Typical columns are numeric which represent either continuous data or countable data. This is often based on a single table.
G, AGR | Aggregate | Kimball | A join over tables that can be used as an aggregate or summary. This denormalisation can be used in all other standards. This uses GROUP BY, so it is different to a Cube.
H, HUB | Hub | Data Vault | A join over tables that can be used as a hub in a Data Vault schema.
I, INV | Invariant | Inmon | Something that is always true. For example, system setup data does not change over the life of a system, so this can be considered invariant. This data is not mutable. That is, it cannot be updated, except for error correction.
K, KEY | Key | Source | This represents cleansed Source tables that have had a surrogate and/or a distribution key added. A surrogate key is needed for Fact/Dimension joins. A distribution key is critical for adequate Netezza performance. Note that these tables would still retain their source names.
L, LNK | Link | Data Vault | A join over tables that can be used as a link in a Data Vault schema.
M, MTD | Metadata | Metadata | A join over tables that can be used for metadata. Examples include A&A, Glossary, Data Quality, Data Lineage, Data Traceability, Invariants, etc.
N, NUB | Nub | Source | This represents Source tables that have been cleansed, but without column name changes. For example, cleansing can discard boilerplate columns, de-duplicate rows, add default values (e.g. N/A for nulls), convert types (e.g. Text -> Dates), fix column lengths, etc.
O, OTH | Other | Source | This represents some source system based column and/or table grouping which is not defined as a table/view in the source RDBMS. For example, this could represent a manually maintained subject area grouping, or some set of data in the application layer.
P, PVT | Pivot (from RDF/EAV) | Source | A join over RDF/EAV tables that pivots or flattens the EAV data into multiple tables in columnar form. They can then be joined to other tables.
R, RDF | Resource Description Framework | Metadata | A join over source tables used to describe source data that is in RDF form. Typically, there are 4 tables needed: Resource (Entity) which defines tables and primary keys, Predicate (Relation) which defines relationships, ResourceType which defines the type of each attribute (e.g. date, char, etc.) and RDF (Value) which contains the RDF triples. This data can be used to pivot into standard tables.
S, STL | Satellite | Data Vault | A join over tables that can be used as a satellite in a Data Vault schema.
T, TMV | Time Variant (Historical) | Inmon | A projection of the logical active columns for an active table, which shows current and historical data, and excludes deleted rows. An active column has a number of distinct values > 0 (greater than zero). This is only required for source tables. The non-source DW Design Patterns all support Time Variance. This data is mutable. That is, it changes over time, and a new row is created whenever the source data changes. There is some redundancy here, as this is also defined by the column filter type. However, this will be retained as it is now standard practice.
U, UCN | Unconditional (Audit) | Source | A projection of all columns for all tables, without filtering the column set or the row set. Therefore, this will show both current and historical data. This also shows deleted rows. There is some redundancy here, as this is also defined by the column filter type. However, this will be retained as it is now standard practice.
Z, ZWK | ZWork | Metadata | These projections satisfy technical requirements, and not reporting requirements. These should not be visible to reporting users. Many will be required by the RDBMS to define projections like indices, etc.
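
To make the Pivot (P, PVT) type concrete, here is a minimal Python sketch of flattening EAV triples into columnar rows. The function names, the ENTITY header and the sample attribute names are hypothetical; in the DW itself this would normally be done in SQL.

```python
from collections import defaultdict


def pivot_eav(rows):
    """Group (entity, attribute, value) triples into {entity: {attribute: value}}."""
    pivoted = defaultdict(dict)
    for entity, attribute, value in rows:
        pivoted[entity][attribute] = value  # a later triple overwrites an earlier one
    return dict(pivoted)


def to_columnar(pivoted):
    """Return (header, table): one row per entity, one column per attribute."""
    columns = sorted({attr for attrs in pivoted.values() for attr in attrs})
    table = [[entity] + [attrs.get(col) for col in columns]
             for entity, attrs in sorted(pivoted.items())]
    return ["ENTITY"] + columns, table


if __name__ == "__main__":
    eav = [("N1", "STATUS_C", "UP"), ("N1", "REGION_C", "EAST"), ("N2", "STATUS_C", "DOWN")]
    header, table = to_columnar(pivot_eav(eav))
    print(header)
    for row in table:
        print(row)
```

Attributes that an entity does not carry come out as None, which corresponds to a null column in the pivoted table.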


This hierarchy shows what standards the logical projection types belong to. The projection designer
needs to ensure that the projection conforms to these projection types. For example, if the
projection is a fact table, it must begin with F_*, and no other prefix.

Logical Projection
Metadata: M Metadata, R RDF, Z Work
Source: A Active, K Key, N Nub, O Other, P Pivot, U Uncond
Kimball: B Bridge, C Cube, D Dimension, F Fact, G Aggregate
Data Vault: H Hub, L Link, S Satellite
Inmon: E Ephemeral, I Invariant, T Time Variant
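
The prefix rule and the hierarchy above could be checked mechanically. The following Python sketch is one possible approach, not part of the standard: the codes are transcribed from the LPT table, while the function name and the rule that a prefix must be followed by an underscore and a non-empty name are assumptions.

```python
# Short code -> (long code, standard), transcribed from the LPT table.
LPT = {
    "A": ("ACT", "Source"), "B": ("BRD", "Kimball"), "C": ("CUB", "Kimball"),
    "D": ("DMN", "Kimball"), "E": ("EPH", "Inmon"), "F": ("FCT", "Kimball"),
    "G": ("AGR", "Kimball"), "H": ("HUB", "Data Vault"), "I": ("INV", "Inmon"),
    "K": ("KEY", "Source"), "L": ("LNK", "Data Vault"), "M": ("MTD", "Metadata"),
    "N": ("NUB", "Source"), "O": ("OTH", "Source"), "P": ("PVT", "Source"),
    "R": ("RDF", "Metadata"), "S": ("STL", "Data Vault"), "T": ("TMV", "Inmon"),
    "U": ("UCN", "Source"), "Z": ("ZWK", "Metadata"),
}
LONG_TO_SHORT = {long_code: short for short, (long_code, _) in LPT.items()}


def logical_type(projection_name: str):
    """Return (short LPT code, standard) for a projection name, or raise ValueError."""
    prefix, separator, rest = projection_name.upper().partition("_")
    if not separator or not rest:
        raise ValueError(f"no LPT prefix found in {projection_name!r}")
    short = prefix if prefix in LPT else LONG_TO_SHORT.get(prefix)
    if short is None:
        raise ValueError(f"unknown LPT prefix {prefix!r} in {projection_name!r}")
    return short, LPT[short][1]


if __name__ == "__main__":
    print(logical_type("F_SALES_T"))
    print(logical_type("HUB_CUSTOMER_T"))
```

F_SALES_T and HUB_CUSTOMER_T are invented names used only to exercise the lookup.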


Physical Projection Type (PPT)


All physical types are defined, which will help minimize name collisions. The designer may choose
not to add any suffix, at the risk of subsequent name collisions. Not defining the Physical Projection
Type suffix is an unsafe practice when generating projections automatically.

PPT Code | PPT Name | Oracle | Netezza | Comment
A | Abstract | | | An ‘Abstract’ type is also provided, even though this does not exist as a type in any database. The *_A type can be a table or a view, etc. This will enable the designer to replace a view with a table, without having to change the name. This may help reduce impacts on downstream users.
F | Function | True | True |
G | Package | True | |
I | Index | True | True |
L | DB Link | True | |
M | Materialised | True | True | Avoid using this type. Use V instead.
O | Object View | True | |
P | Procedure | True | True |
Q | Queue | True | |
R | Synonym | True | True | While a synonym can point to any other physical type, it still needs to be managed as a particular type. Use *_A, if *_R is too restrictive.
S | Sequence | True | True |
T | Table | True | True | This covers sub-types such as temporary and external.
U | Type | True | |
V | View | True | True | Views can be implemented as materialized or dematerialized.
X | XML | True | |
Z | Trigger | True | |


The hierarchy below shows which environments physical projection types belong in.

Physical Projection
A Abstract
Shared Types: F Function, I Index, M Materialized, P Procedure, R Synonym, S Sequence, T Table, V View
Oracle Only Types: G Package, L DB Link, O Object View, Q Queue, U Type, X XML, Z Trigger
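
The split between shared and Oracle-only physical types lends itself to a simple lookup. The Python sketch below is illustrative only: the function name and the platform strings "oracle" and "netezza" are assumptions, and the code sets are transcribed from the hierarchy above.

```python
# PPT codes transcribed from the hierarchy above.
SHARED_TYPES = set("FIMPRSTV")       # available on both Oracle and Netezza
ORACLE_ONLY_TYPES = set("GLOQUXZ")   # available on Oracle only


def physical_type_supported(projection_name: str, platform: str) -> bool:
    """Return True if the PPT suffix of projection_name exists on the platform."""
    platform = platform.lower()
    if platform not in ("oracle", "netezza"):
        raise ValueError(f"unknown platform: {platform!r}")
    suffix = projection_name.rpartition("_")[2].upper()
    if suffix == "A" or suffix in SHARED_TYPES:
        return True  # Abstract is treated as platform independent
    if suffix in ORACLE_ONLY_TYPES:
        return platform == "oracle"
    raise ValueError(f"no recognised PPT suffix in {projection_name!r}")


if __name__ == "__main__":
    print(physical_type_supported("D_NODE_T", "netezza"))
    print(physical_type_supported("M_LOAD_G", "netezza"))
```

A name that carries no PPT suffix but happens to end in one of these letters after an underscore cannot be distinguished by this check, which is one more reason to always add the suffix.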


Constraint Standard
Constraints will be implemented as a single character suffix. Note that these are RDBMS specific.

Constraint Code
CT Suffix | Constraint Type | Oracle | Netezza | Description
_C | Check | True | |
_F | Foreign Key | True | True |
_H | Hash Expression | True | |
_O | Read Only View | True | |
_P | Primary Key | True | True |
_R | Referential Integrity | True | |
_S | Supplemental Logging | True | |
_U | Unique Key | True | |
_V | View Check | True | |

Constraint Name Patterns


This shows what would be the most common examples. It is assumed that there will be a maximum
of 9 unique column sets on any given table. Given that these are logically equivalent to the primary
key, more than this is unlikely. Note that the Distribution Key is not a constraint, but a part of the
table structure, so it cannot be altered with an add or drop statement. Therefore, no naming
standard applies to a Distribution Key as a constraint.

Constraint | Pattern | Example Base Projection | Result Constraint
Primary Key | <tableName>_P | D_NODE_T | D_NODE_T_P
Foreign Key | <tableName>_<integer>_F | D_NODE_T | D_NODE_T_1234_F
Unique | <tableName>_Unique<integer>_U | D_NODE_T | D_NODE_T_UNIQUE3_U
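
The three name patterns can be generated with a few helper functions, sketched here in Python. The function names are hypothetical. The unique key helper writes UNIQUE in upper case to match the example result, and it enforces the assumed maximum of 9 unique column sets per table.

```python
def primary_key_name(table_name: str) -> str:
    """<tableName>_P"""
    return f"{table_name}_P"


def foreign_key_name(table_name: str, number: int) -> str:
    """<tableName>_<integer>_F"""
    return f"{table_name}_{number}_F"


def unique_key_name(table_name: str, number: int) -> str:
    """<tableName>_Unique<integer>_U, limited to 9 unique column sets per table."""
    if not 1 <= number <= 9:
        raise ValueError("unique column set number must be between 1 and 9")
    return f"{table_name}_UNIQUE{number}_U"


if __name__ == "__main__":
    print(primary_key_name("D_NODE_T"))
    print(foreign_key_name("D_NODE_T", 1234))
    print(unique_key_name("D_NODE_T", 3))
```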


Attribute Standard
Attribute Data Type (ADT)
These standards only apply to new attributes created for the DW. All source column names and data
types should remain unchanged. All non-source columns MUST have an attribute data type. The DB
data type can be varied where it makes sense. Some choices are provided. For example, an
enumeration can be INTEGER or VARCHAR. However, a date must be DATE. Similarly, length can also
be varied where it makes sense.

Suffix | Description | Oracle Data Type | Netezza Data Type | Definition
_A, _AMT | Amount | NUMBER(28,10) | FLOAT(15) | Any currency or monetary amount, including balances, prices, etc. This can enable fractions of cents.
_B, _BL | Binary Large Object | BLOB | BINARY VARYING | Any Binary Large Object such as an image, audio or video. For example, well-known binary (WKB), which is used to define geometric objects in binary.
_C, _CD | Enumeration (aka code or List of Values) | VARCHAR2(30), NUMBER | VARCHAR(30), INTEGER | An enumeration is a collection of items that is a complete, ordered listing of all of the items in that collection. For example, Frequency which can be Day, Week, Month, etc. DAY_OF_WEEK_C can be VARCHAR2(9) or NUMBER(1,0).
_D, _DT | Date | DATE | DATE | Any date that does not include time. That is, only YYYY-MM-DD.
_E, _EX | Explain | VARCHAR2(255) | VARCHAR(255) | A description or definition statement.
_F, _FL | Float | NUMBER(28,18) | FLOAT(15) | Any numeric that is not an amount, a percentage, an integer, a degree or a rate.
_G, _DG | Degree | NUMBER(10,7) | FLOAT(15) | A number that can be used for latitude or longitude.
_I, _ID | Identifier (aka Name or business key) | VARCHAR2(80), NUMBER | VARCHAR(80), INTEGER | An identifier is a name that identifies (that is, labels the identity of) a unique object. This is always provided by the source system. It may even be a surrogate in the source, but is still an id. The length or type may be changed if needed. E.g. NODE_I
_J | Geometry | SDO_GEOMETRY | ST_GEOMETRY | A type that defines a geometric object. This type varies between platforms. The types can be converted to WKB or WKT format.
_K, _KN | Kind | VARCHAR2(255), NUMBER | VARCHAR(255), INTEGER | Any code that must follow rules. E.g. A FSA must be in SAAA format.
_N, _NT | Note | VARCHAR2(2000) | VARCHAR(2000) | Text or string is a sequence of characters. This can be used for any definition or description, except for Identifiers, Codes or Explanations. Oracle limit is 4,000; Netezza limit is 64,000.
_O, _OC | O’clock | TIME | TIME | Any moment in a 24 hour period, independent of date.
_P, _PCT | Percentage | NUMBER(28,18) | FLOAT(15) | A percentage as a fractional number.
_R, _RT | Rate/Percentage | NUMBER(28,18) | FLOAT(15) | A rate as any fractional number.
_S, _SK | Surrogate Key (Natural) | NUMBER(28,0) | BIGINT | An identifier that is based on the natural key of a table. On non-time variant tables, this is identical to the Unique Surrogate Key (_U). For time variant tables, this will change when there is a change to the key.
_T, _TS | Timestamp | TIMESTAMP(6) | TIMESTAMP | A date and time down to the micro-second. That is, YYYY-MM-DD HH:MM:SS.FFFFFF.
_U, _UK | Unique Surrogate Key | NUMBER(28,0) | BIGINT | A unique identifier for each row of a table. Incremented whenever a new row is added.
_W, _WH | While | NUMBER(28,0) | BIGINT | Any time duration measured in seconds.
_X, _XM | XML | CLOB | VARCHAR | XML Document. For example, well-known text (WKT), which is used to define geometric objects in XML.
_Y, _YN | Boolean (aka flag or indicator) | VARCHAR(1 CHAR) 'Y' or 'N', VARCHAR(5 CHAR) 'True' or 'False', NUMBER(1, 0), 0 or 1 | CHAR(1), CHAR(5), BYTEINT, BIT, 0 or 1 | An attribute type with only two possible values: yes or no. It must be expressed as a question with a clear true or false value e.g. is_deleted_YN.
_Z, _ZI | Integer | NUMBER | INTEGER | Any integer. This excludes monetary or fractional numerics. This could be a count or sequential value, etc. Z is the default symbol for integer in Maths.

The longer attribute type (e.g. _AMT) may be used if there are fewer than 27 characters in the name.
Otherwise, the shorter attribute type must be used, even if that means truncating the name to 28
characters.
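
The suffix length rule can be expressed as a short Python sketch. It assumes that "the name" means the attribute name before any type suffix is appended, and that the two limits exist to keep the finished name within 30 characters (28 plus a two character suffix). Both readings, and the function name, are assumptions.

```python
def attribute_name(base_name: str, short_suffix: str, long_suffix: str) -> str:
    """Append the long suffix when the base name is short enough, else the short one."""
    if len(base_name) < 27:
        # Fewer than 27 characters: the longer suffix (e.g. _AMT) may be used.
        return base_name + long_suffix
    # Otherwise truncate the base name to 28 characters and use the short suffix.
    return base_name[:28] + short_suffix


if __name__ == "__main__":
    print(attribute_name("BALANCE", "_A", "_AMT"))
    print(attribute_name("AVERAGE_MONTHLY_CLOSING_BALANCE_TOTAL", "_A", "_AMT"))
```

With the widest long suffix in the table (four characters, such as _AMT or _PCT), a base name of at most 26 characters yields a result of at most 30 characters, and the truncated form is never longer than 30.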

Why such large Numeric values for ORACLE?


Oracle implements NUMBER as a variable length column. So, the column size in bytes for a particular
numeric data value NUMBER(p), where p is the precision of a given value, can be calculated using
the following formula:

ROUND((length(p)+s)/2)+1

where s equals zero if the number is positive and s equals 1 if the number is negative. Therefore, the
large number sizes will not impact storage, as they are not fixed.
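
As a worked example, the formula can be evaluated in Python for a few integer values. This sketch follows the formula as stated, reading length(p) as the number of digits in the value. It is restricted to integers and does not attempt to model any other detail of Oracle's internal number format. The function name is hypothetical.

```python
import math


def oracle_number_bytes(value: int) -> int:
    """Evaluate ROUND((length(p)+s)/2)+1 for an integer value."""
    digits = len(str(abs(value)))  # length(p): number of digits in the value
    s = 1 if value < 0 else 0      # s is 1 for negative numbers, otherwise 0
    # Oracle's ROUND rounds halves away from zero, whereas Python's round()
    # rounds halves to even, so floor(x + 0.5) is used for this positive quantity.
    return math.floor((digits + s) / 2 + 0.5) + 1


if __name__ == "__main__":
    for sample in (1, 12345, -12345, 123456, 10**27):
        print(sample, oracle_number_bytes(sample))
```

Under this formula a six digit value needs 4 bytes, while a full 28 digit value needs 15, so declaring NUMBER(28,0) costs nothing extra for small values.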

Boilerplate Physical Columns


System Column Description

SOURCE_CHANGED_ON_TS The date and time when the record was last modified in the source
system.

SOURCE_CREATED_ON_TS The date and time when the record was initially created in the
source system.

EFFECTIVE_AT_TS This column stores the moment date and time at which the record
represents a true value. A value is either assigned by the Unified
Data Store or extracted from the source. Note that this column is
not needed on most tables. It will only be on event type tables, and
certain kinds of fact tables.

EFFECTIVE_FROM_TS This column stores the date and time from which the record
represents a true value. A value is either assigned by the Unified
Data Store or extracted from the source. This column will be on all
SCD2 tables.

EFFECTIVE_TO_TS This column stores the date and time after which the record does
not represent a true value. The EFFECTIVE_TO_TS of the previous
row MUST = EFFECTIVE_FROM_TS of the next row - 1 TIME QUANTUM. A
value is either assigned by the Unified Data Store or extracted from
the source. This column will be on all SCD2 tables.

IS_DELETED_YN This boolean indicates the deletion status of the record in the
source system. A value of Y (or True) indicates the record is deleted from
the source system and logically deleted from the source aligned
layer. A value of N (or False) indicates that the record is active.

PROCESS_INSERT_ID System field. This column is the unique identifier for the specific
ETL batch process used to insert this data row.
Both PROCESS_INSERT_ID and PROCESS_UPDATE_ID are necessary
in order to be able to easily back out incorrectly loaded data.

PROCESS_UPDATE_ID This column is the unique identifier for the specific process used to
update this row. Both PROCESS_INSERT_ID and
PROCESS_UPDATE_ID are necessary in order to be able to easily
back out incorrectly loaded data.

<ForeignKeyName>_DISTRIBUTION_ID, _DSTR_I ForeignKeyName is the name of the Primary Key on the
table that is being used for distribution over the Netezza nodes.

ROW_NATURAL_ID Concatenated artificial primary key. This is constructed out of the
columns that represent the true table key. These values are cast
into string type.

SOURCE_SYSTEM_C This should be a valid 3 character source system code.

TRANSACTION_TYPE_C This code indicates the kind of load transaction used for this
record. There are 4 possible values: I (Insert), U (Update), D
(Delete), R (Refresh).

A flag IS_CURRENT_YN can be added to views, but it will not be needed on the physical tables, as
it can be derived by checking whether EFFECTIVE_TO_TS holds the high date.
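
The derivation can be sketched in Python as follows. This is only an illustration: the function name is_current_yn is hypothetical, and it assumes that the current row of an SCD2 table carries the high date from the Time Standard (31-12-2999 00:00:00) in EFFECTIVE_TO_TS.

```python
from datetime import datetime

# High date as defined in the Time Standard section.
HIGH_DATE = datetime(2999, 12, 31, 0, 0, 0)


def is_current_yn(effective_to_ts: datetime) -> str:
    """Return 'Y' when the row is still open-ended, otherwise 'N'."""
    return "Y" if effective_to_ts == HIGH_DATE else "N"


print(is_current_yn(datetime(2999, 12, 31)))           # open-ended row
print(is_current_yn(datetime(2018, 2, 13, 9, 29, 59)))  # closed row
```

In a view, the same comparison would normally be written as a CASE expression over EFFECTIVE_TO_TS.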

Primary and Foreign Key Names


This standard applies to all views in all layers. It also applies to all tables that are not in the SAL.

The primary key standard is needed to support the automatic key matching in Tableau and other
tools.

Name                         Pattern                             Description

Primary Key Name             <Logical Table name>_I              Apply in all layers.

Foreign Key Name             <Logical Table name>_I              The only exception is when a child table has two
                                                                 relationships to the parent table. In this case, the
                                                                 most important relationship should have the same
                                                                 name. All other keys will have a prefix.

Foreign Unique Column Name   <Logical TableName> Unique <n> ID   This acts like a "foreign key" column that uses the
                                                                 nth unique key of a parent table. N ranges from 1 to
                                                                 9. These key types must also match easily in Tableau.

Unique Column Name           <Logical TableName> Unique <n> ID   This is the nth unique key for this table. N ranges
                                                                 from 1 to 9. This kind of column can also be called
                                                                 an Alternate Primary Key.

General Renaming Rules

These are global rules for all names, whether they are used for schemas, projections, attributes, etc.

Logical to Physical Name Mapping Automated Function


This describes the function that changes a logical name to a physical name. The key idea is to minimise
all renaming, as this only introduces confusion, and forces users to look up reference lists. The rules
are only applied when needed, which is not often. The default is NO change at all.

For Oracle, MaxNameLength = 30; for Netezza, MaxNameLength = 128. Restrict the length to
(MaxNameLength - 4) chars, so that 2 chars can be prefixed and 2 chars can be suffixed e.g. F_*_T
(for fact table). The rules below use the Oracle limit of 26 chars.

1. If the name length is 26 or less, then use the name as is, after taking out illegal characters
e.g. "Unique GNAFs incl Proposed" -> "Unique GNAFs incl Proposed" or
UNIQUE_GNAFS_INCL_PROPOSED

2. If the de-duplicated, disemvowelled name length is 26 or less, then de-duplicate and
disemvowel the name e.g. "ERP Project Template Parameters" -> "ERP Prjct Tmplt Prmtrs" or
ERP_PRJCT_TMPLT_PRMTRS

3. Else de-duplicate, disemvowel, truncate all words to a max of 4 letters and truncate the whole
name to 26 chars or less e.g. "ERP Actual Burdened Total Cost Amount Total" -> "ERP Actl Brdn Ttl Cst Amn" or
ERP_ACTL_BRDN_TTL_CST_AMN

Logical to Physical Name Definitions


Disemvowelling means removing vowels from a word, e.g. Assignment -> Assgnmnt. The first char is
always retained, regardless of whether it is a vowel or not. All other vowels are removed.

De-duplication removes repeated consonants, e.g. Access -> ACCSS -> ACS.

Truncation means removing all but the first 4 letters of a word, e.g. Burdened -> BRDND -> BRDN.
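
The three rules and the definitions above can be sketched in Python as shown below. This is one possible reading rather than the actual function: the names dedupe, disemvowel, shorten and physical_name are hypothetical, "illegal characters" is assumed to mean anything other than letters, digits and spaces, and de-duplication is assumed to run before disemvowelling on adjacent repeated consonants (which keeps "Ttl" for "Total"). Under this reading the third example keeps the final "t" of "Amnt", which still fits in 26 chars, so the original function presumably truncates slightly differently.

```python
import re

VOWELS = "aeiou"


def dedupe(word: str) -> str:
    """Collapse each run of a repeated consonant to a single letter."""
    out = []
    for ch in word:
        repeated = bool(out) and ch.lower() == out[-1].lower()
        if repeated and ch.isalpha() and ch.lower() not in VOWELS:
            continue
        out.append(ch)
    return "".join(out)


def disemvowel(word: str) -> str:
    """Keep the first character, then drop every remaining vowel."""
    return word[:1] + "".join(ch for ch in word[1:] if ch.lower() not in VOWELS)


def shorten(logical_name: str, max_len: int = 26) -> str:
    """Apply rules 1 to 3 in order and return the shortened logical name."""
    # Assumption: only letters, digits and spaces are legal.
    words = re.sub(r"[^A-Za-z0-9 ]", "", logical_name).split()
    name = " ".join(words)
    if len(name) <= max_len:                     # rule 1: no change
        return name
    short = [disemvowel(dedupe(w)) for w in words]
    name = " ".join(short)
    if len(name) <= max_len:                     # rule 2: de-duplicate and disemvowel
        return name
    # rule 3: also cut each word to 4 letters, then cut the whole name
    return " ".join(w[:4] for w in short)[:max_len].rstrip()


def physical_name(logical_name: str, max_len: int = 26) -> str:
    """Upper-case form with underscores, as used for physical objects."""
    return shorten(logical_name, max_len).upper().replace(" ", "_")


print(physical_name("Unique GNAFs incl Proposed"))
print(physical_name("ERP Project Template Parameters"))
```

Passing max_len=124 would give the Netezza equivalent, where the rules will almost never fire.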

Logical to Physical Name Mapping Manual Process


This applies where a manual approach is being used. More flexibility is available here, while retaining
the overall goal of not forcing the user to look up some dictionary of valid abbreviations. For
example, DESC is a standard abbreviation for DESCRIPTION. So "ACCESS TECHNOLOGY DESCRIPTION"
could be "ACCESS TECHNOLOGY DESC". However, please note that DESC could also mean DESCENDING,
or something else. So this approach should be used for standard abbreviations only. By comparison,
the automated function above would produce "ACS TCHNLGY DSCRPTN".

However, only use this approach when the name is approaching the Max Name Length (e.g. 26 for
Oracle). For example, if the logical name is Appointment, turning this into APNT may create
confusion, as this could also be Apparent, or many other words matching Ap*n*t. Use this
approach wisely.

Time Standard
Time Quantum or Time Delta
The DW time quantum will be a second. That is, this will be the smallest unit of time in the data.

High and Low Dates


The high date is 31-12-2999 00:00:00.

The low date will be the start of Epoch (UNIX) time, that is, 1 January 1970 (UTC) or
01-01-1970 00:00:00. However, in general, this low date should not be used. Instead, the low date of
a row should be the date that the row was first created in the source system. Only use the low date
when a date is mandatory and the data is missing from the source system.

Time Zone
This is Australian Eastern Standard Time (EST, also abbreviated AEST).

Table types and Additional Time keys


History Type                  Definition                                   Additional Time Keys   Example

Time Variant (aka Duration)   Data that is true over a period of time.     EFFECTIVE_FROM_TS,     Normally an SCD2 Dim table.
                              Must contain valid from and valid to as      EFFECTIVE_TO_TS
                              part of primary key. This information is
                              almost always based on load date, rather
                              than on a specific business defined event
                              date.

Ephemeral (aka Activity)      Data that is true at a point in time.        EFFECTIVE_AT_TS        This occurs on transaction
                              Must contain valid timestamp as part of                             grain Fact tables, as well
                              primary key. This information is based on                           as some source tables.
                              business information such as transaction
                              data, or effective date. As a last resort,
                              it can be based on load date.

Invariant (aka No History)    Data that is always true or data whose       Not Applicable         Equivalent to an SCD1 Dim
                              change is not of interest. Date is not                              table.
                              part of key.

Time periods must be non-overlapping


That is, the EFFECTIVE_TO_TS of the previous row = EFFECTIVE_FROM_TS of the next row - 1 TIME
QUANTUM. If there is no clear distinction between the timestamps, then a query can return 2 rows
of data that are both true for the overlapping time period. This will cause great confusion for any
analyst querying the data, and damage the credibility of the DW. The rule is to set the previous row's
EFFECTIVE_TO_TS to the next row's EFFECTIVE_FROM_TS - 1 TIME QUANTUM. This is a fundamental
standard that must be adhered to.
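
A minimal Python sketch of this rule follows, assuming a time quantum of one second and the high date from this Time Standard. The helper names close_periods and has_overlap are hypothetical and only illustrate the arithmetic; a real load process would apply the same logic per business key.

```python
from datetime import datetime, timedelta

TIME_QUANTUM = timedelta(seconds=1)     # the DW time quantum
HIGH_DATE = datetime(2999, 12, 31)      # open-ended EFFECTIVE_TO_TS


def close_periods(from_timestamps):
    """Pair each EFFECTIVE_FROM_TS with an EFFECTIVE_TO_TS one quantum before the next row."""
    starts = sorted(from_timestamps)
    periods = []
    for i, start in enumerate(starts):
        is_last = i + 1 == len(starts)
        end = HIGH_DATE if is_last else starts[i + 1] - TIME_QUANTUM
        periods.append((start, end))
    return periods


def has_overlap(periods):
    """True when any period ends at or after the start of the following period."""
    ordered = sorted(periods)
    return any(prev[1] >= nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


versions = close_periods([datetime(2018, 1, 1), datetime(2018, 2, 13, 9, 30)])
for effective_from, effective_to in versions:
    print(effective_from, effective_to)
print(has_overlap(versions))
```

A check like has_overlap can be run after each load to confirm that no two versions of a row are true at the same moment.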

Platform Constraints
Summary
Most of these standards are dictated by platform constraints. In many cases, this is obvious: for
example, the set of Physical Projection Types is clearly dictated by the Oracle and Netezza DB
engines. However, some other standards are more subtle. These are listed below.

Standard                    Platform         Rationale

Time Quantum                Oracle Logging   The minimum granularity of Oracle Logging is 1 second. Therefore,
                                             this is the finest/smallest discrete time unit available.

Upper Case only in Target   Datastage        This ETL tool does not support mapping to case sensitive object
                                             names. So physical tables loaded by Datastage will be in
                                             upper case. Note that this does not apply to the CDC tool.

Column count limit          Oracle           Oracle has a 1,000 column limit for tables and views.

Column size limit           Oracle           Oracle has a 4K column size limit for VARCHAR.

Row size limit              Oracle           Also known as record size. No limit.

Column size limit           Netezza          Netezza has a 64K column size limit.

Row size limit              Netezza          Netezza has a 65,535 byte total row size limit per table. This is
                                             the sum of the column lengths in a row.

Column count limit          Netezza          Netezza has a 1,600 column limit for tables and views.

Oracle
The following list of rules applies to both quoted and nonquoted Schema Object Name identifiers
unless otherwise indicated:

1. Names must be from 1 to 30 bytes long.

2. Nonquoted identifiers cannot be Oracle Database reserved words. Quoted identifiers can be
reserved words, although this is not recommended.

3. Oracle has many pseudo reserved words with special meanings, such as DIMENSION,
SEGMENT, ALLOCATE, DISABLE, and so forth. These words are not reserved, but as Oracle uses them
internally in specific ways, using them as names may lead to unpredictable results.

4. Use ASCII characters in names.

5. Nonquoted identifiers must begin with an alphabetic character from your database character set.
Quoted identifiers can begin with any character.

6. Nonquoted identifiers can contain only alphanumeric characters from your database character set
and the underscore (_), dollar sign ($), and pound sign (#). Oracle strongly discourages you from
using $ and # in nonquoted identifiers.

7. Within a namespace, no two objects can have the same name. The following schema objects
share one namespace: Tables, Views, Sequences, Private synonyms, Stand-alone procedures, Stand-
alone stored functions, Packages, Materialized views and User-defined types. Each of the following
schema objects has its own namespace: Indexes, Constraints, Clusters, Database triggers, Private
database links, Dimensions. Because tables and views are in the same namespace, a table and a view
in the same schema cannot have the same name. However, tables and indexes are in different
namespaces. Therefore, a table and an index in the same schema can have the same name.

Each of the following nonschema objects also has its own namespace: User roles, Public synonyms,
Public database links, Tablespaces, Profiles, Parameter files (PFILEs) and server parameter files
(SPFILEs). Because the objects in these namespaces are not contained in schemas, these namespaces
span the entire database.

8. Nonquoted identifiers are not case sensitive. Oracle interprets them as uppercase. Quoted
identifiers are case sensitive.

9. Columns in the same table or view cannot have the same name. However, columns in different
tables or views can have the same name.

10. Procedures or functions contained in the same package can have the same name, if their
arguments are not of the same number and datatypes. Creating multiple procedures or functions
with the same name in the same package with different arguments is called overloading the
procedure or function.
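
Some of the rules above (1, 2, 4, 5 and 6) can be checked mechanically for nonquoted identifiers. The Python sketch below is illustrative only: the function name check_nonquoted_identifier is hypothetical, SAMPLE_RESERVED is a tiny illustrative subset rather than Oracle's full reserved word list, and the byte length is assumed to be measured in UTF-8 (identical to the character count for ASCII names).

```python
import re

# Rules 5 and 6: start with a letter, then letters, digits, _, $ or #.
NONQUOTED_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")

# Illustrative subset only; the real reserved word list is much longer.
SAMPLE_RESERVED = {"SELECT", "TABLE", "FROM", "WHERE"}


def check_nonquoted_identifier(name: str, max_bytes: int = 30) -> list:
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    size = len(name.encode("utf-8"))
    if not 1 <= size <= max_bytes:
        problems.append(f"length must be 1 to {max_bytes} bytes, got {size}")
    if not name.isascii():
        problems.append("use ASCII characters only")
    elif not NONQUOTED_PATTERN.fullmatch(name):
        problems.append("must start with a letter and contain only letters, digits, _, $ or #")
    if name.upper() in SAMPLE_RESERVED:
        problems.append("reserved word")
    return problems


for candidate in ("CUSTOMER_DIM_T", "1ST_NAME", "TABLE", "A" * 31):
    print(candidate, check_nonquoted_identifier(candidate))
```

A check of this kind could be run over generated physical names before any DDL is issued.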

Netezza
Netezza objects include tables, views, and columns. Follow these naming conventions:

1. A name must be from 1 to 128 characters long.

2. A name must begin with a letter (A through Z), diacritic marks, or non-Latin characters (200-377
octal).

3. A name cannot begin with an underscore (_). Leading underscores are reserved for system
objects.

4. Names without quotes are not case sensitive. For example, CUSTOMER and Customer are the
same, but object names are converted to lowercase when they are stored in the Netezza database.
However, if a name is enclosed in quotation marks, then it is case sensitive.

For optimal Netezza performance, the distribution keys should be converted to integer data types.
