03-1 DWH Data Warehouse - Time Dimension

3-1 Data Warehouse The Time Dimension
Data Warehousing Spring Semester 2011
R. Marti
The Data Warehouse in the DWh Reference Architecture

[Figure: DWh reference architecture. Source databases feed the data warehouse (with master data); data marts derived from it serve reports, interactive analysis, and dashboards.]
Focus
Architectural options and variations in data warehouse projects
Design of the single integrated data warehouse, in particular
- how to model temporal aspects
- how to ensure common dimensions (=> Master Data Management)
R. Marti 3-1 DWh 2011: Data Warehouse
Recap: Time in Classical Data Mart Designs (1)
Recap: Time in Classical Data Mart Designs (2)

Rows in fact tables are associated with a specific time by the foreign key reference to the time dimension, indicating as of when they are valid. However, rows in dimension tables are not associated with a time!
- new rows (rows with an unknown source system identifier) are simply added
- usually, no rows are deleted from a dimension table, even if rows with known source system identifiers are missing in a batch upload:
  . existing (old) facts still refer to objects corresponding to these missing rows
  . if sources do not send explicit information on deletions, it is unclear whether the corresponding objects have effectively become invalid or not (Note: Sending this information might mean re-designing the source system!)
- changes in values of dimension rows with known source system identifiers are
  . either simply overwritten,
  . or a new row with a new surrogate (but the old source system id) is added (see topic slowly changing dimensions)
Temporal Database Systems + Languages

For some types of analysis, dimensions should also be historized, especially for comparisons of measures across different time periods. Example: How did the buying habits of customers change over the last few years, grouped by where they live? => The history of customer addresses should also be kept!
Since 1980, a lot of research has been conducted in temporal data models, temporal query languages, and temporal database systems. Generic support for temporal data is beginning to emerge in products: Teradata Database 13.10, IBM DB2 V10, Oracle
Notions of Time
Valid Time is the time during which a fact in the real world was, is, or will be true or, more precisely: was / is believed to be true or believed to become true. Note: This time is determined by the user. Sometimes also called effective time, as of time or business time.
Transaction Time is the time during which a fact in the real world was or is (rightly or wrongly) stored in the database. Note: This time is determined by the system (unless the user decides to delay entering the data, of course ... ) . Sometimes also called system time.
Example of an announcement made (and stored in a DB there and then) on October 1 2010 (= transaction time): David Cole will be Chief Risk Officer as of March 1 2011 (= valid time).
Associating Time with Data

Assumption: For each relation, a clock with a given temporal granularity is specified, e.g., a day, a second, or a millisecond. Conceptually, the extension of a temporal relation R can then be viewed as a sequence of snapshot relations $R_t = \tau_t(R)$, one for every time point t of this clock.
[Figure: temporal relation as a sequence of snapshots; labels: time, attributes, tuples, snapshot at time t.]
$\tau_t$ is called the snapshot operator (sometimes also the timeslice operator).

Benefits and Pitfalls of Sequence of Snapshots Model

Good for theoretical considerations, in particular
- determining equivalence of different temporal representations
- gauging the expressive power of temporal query languages
May be impractical as an implementation model, given that it may require lots of space, especially when
- granularity of time is fine-grained (minutes, seconds, milliseconds, ...)
- represented facts do not change often, i.e. stay the same over a longer interval (usually because they describe states rather than events)
From Sequence of Snapshots Model to Time Intervals

Remedy:
- Don't store data again that did not change since the previous clock tick
- Collect identical snapshots of suitable smaller parts of a relation (e.g., tuples or attribute values) and associate them with time intervals rather than time points
Alternatives:
(1) associate temporal intervals with every tuple
(2) associate temporal intervals with every attribute value (but this 2nd approach requires complex attributes, violating 1NF)
Valid Time Relations capturing State

Conceptually, every tuple which captures a state is timestamped with a time interval [t_from, t_to] indicating the validity of the (non-temporal) data represented in the tuple.
Remarks:
- Transformation into 1NF by replacing V_INTERVAL by V_FROM (valid from) and V_TO (valid to)
- The symbol ? means "unknown", "until now" or "until further notice". In standard SQL, it is usually represented by null or by the date 9999-12-31, neither of which is entirely satisfactory ...
Typical Queries (1): Snapshot of Valid Time Relation

Snapshots of the previous valid time relation:
Remarks:
- We assume that ID is the primary key at every point in time (in every snapshot).
- Producing a snapshot from a valid time relation is a simple selection in relational algebra:
    select ID, NAME, FNAME, ADDR, SAL
    from EMP
    where :t in V_INTERVAL   (or: where :t between V_FROM and V_TO)
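The snapshot selection can also be sketched outside SQL. The following Python fragment is only an illustration: the row layout `EmpRow`, the constant `FOREVER` (standing in for the date 9999-12-31) and the three sample rows are assumptions made up for this sketch, not the contents of the EMP table on the slide.

```python
from datetime import date
from typing import NamedTuple

FOREVER = date(9999, 12, 31)  # assumed stand-in for "until further notice"


class EmpRow(NamedTuple):
    id: int
    addr: str
    sal: int
    v_from: date
    v_to: date


def snapshot(rows, t):
    """Return the non-temporal tuples valid at time point t (closed intervals)."""
    return [(r.id, r.addr, r.sal) for r in rows if r.v_from <= t <= r.v_to]


# Made-up sample data in 1NF (V_FROM / V_TO instead of V_INTERVAL).
EMP = [
    EmpRow(676, "Baar", 7000, date(2006, 8, 1), date(2008, 2, 29)),
    EmpRow(676, "Bern", 7000, date(2008, 3, 1), date(2009, 12, 31)),
    EmpRow(676, "Bern", 7500, date(2010, 1, 1), FOREVER),
]

print(snapshot(EMP, date(2009, 6, 30)))  # [(676, 'Bern', 7000)]
```

Because ID is assumed to be the primary key in every snapshot, each call returns at most one tuple per ID.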
Valid Time Relations capturing Recurring States

A specific state of affairs can recur several times (=> several time periods).
[Table: valid time relation with recurring states, before and after transformation to 1NF.]
The first two tuples are called value equivalent since they have the same values in all attributes except the temporal attributes V_FROM and V_TO.
Options in the Representation of Time

Canonical representation using maximal time intervals (as on previous slide):
One (of many) possible alternative representations using two (non-maximal) contiguous intervals (assuming a temporal granularity of a day):
Issues with non-canonical Representations

Non-canonical representations may lead to incorrect answers:
Example Query: Who left the company before 2008-01-01 and when?
    select ID, NAME, FNAME, V_TO
    from EMP
    where V_TO < date '2008-01-01'
(Incorrect) Result:
Avoiding non-canonical Representations: By Design

Ensure that intervals remain maximal when inserting or updating. Let
- R be a valid time relation in canonical form (i.e., with maximal time intervals)
- n be a new valid time tuple to be inserted into the relation R
- x1, ..., xk (k >= 0) be all existing valid time tuples in relation R which are value equivalent to n (cf. p. 12)
Then, for all i, 1 <= i <= k, the following must hold (in pseudo-SQL notation):
    not exists (
      select * from R xi
      where xi = n                                          -- value equivalence
        and ( n.V_FROM - 1 between xi.V_FROM and xi.V_TO    -- intervals do not touch or overlap
           or n.V_TO + 1 between xi.V_FROM and xi.V_TO )
    )
(This could be specified as a declarative check constraint if the implementation supported it.)
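As a rough illustration of this insert-time check, here is a Python sketch assuming day granularity and closed intervals. The function names and the tuple layout `(values, v_from, v_to)` are invented for the example. Note that the sketch uses a symmetric "touch or overlap" test, which also rejects a new interval that completely contains an existing one; that is a slightly stronger condition than the two `between` predicates in the pseudo-SQL.

```python
from datetime import date


def touches_or_overlaps(a_from, a_to, b_from, b_to):
    """True if the closed day intervals overlap or are directly adjacent."""
    # Ordinals (plain ints) avoid overflow when an interval ends at 9999-12-31.
    return (a_from.toordinal() <= b_to.toordinal() + 1
            and b_from.toordinal() <= a_to.toordinal() + 1)


def keeps_canonical(relation, new):
    """relation: list of (values, v_from, v_to) in canonical form.
    Returns True if inserting `new` leaves all intervals maximal."""
    values, n_from, n_to = new
    return not any(
        x_values == values and touches_or_overlaps(n_from, n_to, x_from, x_to)
        for x_values, x_from, x_to in relation
    )


# Hypothetical existing tuple: employee 676 lives in Baar.
EXISTING = [((676, "Baar"), date(2006, 8, 1), date(2008, 2, 29))]

adjacent = ((676, "Baar"), date(2008, 3, 1), date(2008, 12, 31))
with_gap = ((676, "Baar"), date(2008, 3, 2), date(2008, 12, 31))
print(keeps_canonical(EXISTING, adjacent), keeps_canonical(EXISTING, with_gap))
```

If the check fails, the application would typically extend the existing tuple's interval instead of inserting a second value-equivalent tuple.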
Typical Queries (2): Temporal Projection

Unfortunately, (intermediate) query results may be non-canonical, even if applied to a canonical representation:
Example: Where did employees live and when (irrespective of salary)?
    select ID, NAME, FNAME, ADDR, V_FROM, V_TO
    from EMP
Result:
Avoiding non-canonical Representations: By Coalescing

Non-canonical representations can be transformed into the canonical representation by an operation called temporal coalescing which maximizes the length of all intervals by coalescing adjacent and overlapping intervals of value-equivalent tuples.
Coalesced form:
Temporal Coalescing in (Pseudo-) SQL

    with recursive Rclos as (
      -- initial ("anchor") query
      select R.values, R.V_FROM, R.V_TO
      from R
      union
      -- recursive query: executed until no new data generated
      select R.values, R.V_FROM, Rclos.V_TO
      from R, Rclos
      where Rclos.values = R.values
        and Rclos.V_FROM >= R.V_FROM
        and Rclos.V_FROM - 1 <= R.V_TO
    )
    -- keep only intervals that no touching or overlapping value-equivalent tuple extends
    -- (the touch conditions keep recurring states, which are separated by a gap)
    select Rclos.values, Rclos.V_FROM, Rclos.V_TO
    from Rclos
    where not exists (
      select * from R
      where R.values = Rclos.values
        and ( ( R.V_FROM < Rclos.V_FROM and R.V_TO >= Rclos.V_FROM - 1 )
           or ( R.V_TO > Rclos.V_TO and R.V_FROM <= Rclos.V_TO + 1 ) )
    )
Note: A more efficient implementation uses window functions (see [Zhou et al 2006]).
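For comparison, coalescing is easy to express procedurally as a sort followed by a single sweep. The Python sketch below is not the recursive SQL formulation and not the window-function approach of [Zhou et al 2006]; it simply assumes rows of the made-up shape `(values, v_from, v_to)` with closed day intervals and mutually comparable `values`.

```python
from datetime import date

FOREVER = date(9999, 12, 31)  # assumed stand-in for "until further notice"


def coalesce(rows):
    """Merge adjacent or overlapping intervals of value-equivalent rows."""
    result = []
    # Sorting groups value-equivalent rows and orders each group by v_from.
    for values, v_from, v_to in sorted(rows):
        if result:
            last_values, last_from, last_to = result[-1]
            touches = v_from.toordinal() <= last_to.toordinal() + 1
            if last_values == values and touches:
                result[-1] = (values, last_from, max(last_to, v_to))
                continue
        result.append((values, v_from, v_to))
    return result


# Made-up, non-canonical input: two contiguous "Baar" intervals plus a recurrence.
ROWS = [
    ((676, "Baar"), date(2008, 1, 1), date(2008, 2, 29)),
    ((676, "Bern"), date(2008, 3, 1), date(2009, 12, 31)),
    ((676, "Baar"), date(2006, 8, 1), date(2007, 12, 31)),
    ((676, "Baar"), date(2010, 1, 1), FOREVER),
]

for row in coalesce(ROWS):
    print(row)
```

The two contiguous Baar intervals are merged into one, while the later Baar interval stays separate because it is a genuinely recurring state.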
Typical Queries (3): Temporal Join

Sometimes, the history of information stored in two relations is of interest:
Example: Who worked on which projects and when? Result:
Temporal Join in SQL (without temporal coalescing!)

Construct time intervals of result by intersecting time intervals of operands (and keeping rows with non-empty intervals):
    select *
    from ( select w.PROJ_ID, w.EMP_ID, e.NAME, e.FNAME,
                  case when e.V_FROM > w.V_FROM then e.V_FROM else w.V_FROM end as V_FROM,
                  case when e.V_TO < w.V_TO then e.V_TO else w.V_TO end as V_TO
           from WORKS_ON w, EMP e
           where e.ID = w.EMP_ID )
    where V_FROM <= V_TO
Note: This gets more tedious when (temporally) joining 3 or more relations
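The interval intersection at the heart of the temporal join can be sketched in a few lines of Python. The row layouts and the sample data (project 1, employee "Meier") are hypothetical; like the SQL above, the sketch does not coalesce its result.

```python
from datetime import date

FOREVER = date(9999, 12, 31)  # assumed stand-in for "until further notice"


def temporal_join(works_on, emp):
    """works_on rows: (proj_id, emp_id, v_from, v_to)
    emp rows:      (id, name, v_from, v_to)
    Returns joined rows whose intersected validity interval is non-empty."""
    result = []
    for proj_id, emp_id, w_from, w_to in works_on:
        for e_id, name, e_from, e_to in emp:
            if e_id != emp_id:
                continue
            v_from = max(w_from, e_from)  # later of the two start dates
            v_to = min(w_to, e_to)        # earlier of the two end dates
            if v_from <= v_to:            # keep non-empty intersections only
                result.append((proj_id, emp_id, name, v_from, v_to))
    return result


# Made-up sample data.
WORKS_ON = [(1, 676, date(2007, 1, 1), date(2008, 6, 30))]
EMP = [
    (676, "Meier", date(2006, 8, 1), date(2007, 12, 31)),
    (676, "Meier", date(2008, 1, 1), FOREVER),
    (677, "Huber", date(2005, 1, 1), FOREVER),
]

for row in temporal_join(WORKS_ON, EMP):
    print(row)
```

Joining a third relation would repeat the same max/min step on the intermediate result, which is where the tedium mentioned in the note comes from.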
Proposals for Temporal Support in SQL

There are proposals to hide this (and more, see following slides) temporal complexity in SQL, e.g., the SQL/Temporal part of a future SQL3 standard. A temporal join (including temporal coalescing) would look as follows:
    validtime select w.PROJ_ID, w.EMP_ID, e.NAME, e.FNAME
    from WORKS_ON w, EMP e
    where e.ID = w.EMP_ID
See e.g. [Snodgrass 1999] Richard T. Snodgrass: Developing Time-Oriented Database Applications. Morgan Kaufmann, 1999.
Note: This publication is out of print, but available electronically as PDF at http://www.cs.arizona.edu/people/rts/publications.html
DB2 10 for z/OS and Teradata Database V13.10 support most of the SQL/ Temporal proposal.
Transaction Time Relations

Note that transaction time should be automatically determined by the system at insert/update/delete time (or, more precisely, commit time), not by the user; granularity is typically as fine as possible.
Transaction time can be represented exactly like valid time, by associating a time interval with tuples.
Example: Transaction time history of employee 676 (also see slide 10)
1. 2006-07-01: insert 676 lives in Baar and earns 7000.
2. 2008-04-01: update 676 lives in Bern.
3. 2009-11-01: update 676 earns 7500.
Using DBMS Logging to capture Transaction Time

Since transaction time can be automatically determined by the system, the DBMS logging facilities can be used. This is/was done e.g. in Postgres/PostgreSQL/Illustra (and in Oracle).
Example: Transaction time history of employee 676 (see slide 15)
1. 2006-07-01: insert 676 lives in Baar and earns 7000.
2. 2008-04-01: update 676 lives in Bern.
3. 2009-11-01: update 676 earns 7500.
Normal (snapshot) table containing current contents.
Undo log table containing changes to produce previous contents of associated snapshot table (before images).
Implementing Logging Using Triggers

    -- written in Oracle PL/SQL
    create or replace trigger TR_AU_EMP
    after update on EMP
    for each row
    declare
      l_log EMP_UNDO_LOG%rowtype;
    begin
      -- should probably check that ID has not changed
      -- and raise an application error if this were the case
      l_log.X_TIME       := current_timestamp;
      l_log.UNDO_OP_CODE := 'update';
      l_log.ID           := :old.ID;
      l_log.NAME         := :old.NAME;
      l_log.FNAME        := :old.FNAME;
      l_log.ADDR         := :old.ADDR;
      l_log.SAL          := :old.SAL;
      insert into EMP_UNDO_LOG values l_log;
    end TR_AU_EMP;
    /
    -- similar triggers required for inserts and deletes
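How the before images could be used to reconstruct an earlier state can be sketched as follows. This is an interpretation, not something spelled out on the slide: the Python function below assumes that the log holds only 'update' entries for a single ID, that each entry pairs the change time with the row contents before the change, and that the requested time lies after the row was inserted. All names and the data layout are invented for the illustration.

```python
from datetime import date


def emp_as_of(current_row, undo_log, t):
    """Return the row contents as of time t.

    current_row: dict with the current contents of the snapshot table row
    undo_log:    list of (x_time, before_image) pairs for the same ID,
                 'update' entries only (assumption of this sketch)
    """
    later = [entry for entry in undo_log if entry[0] > t]
    if not later:
        return current_row  # no change after t: current contents apply
    # The before image of the earliest change after t is the state at t.
    return min(later, key=lambda entry: entry[0])[1]


# Employee 676 after the three operations of the example.
CURRENT = {"id": 676, "addr": "Bern", "sal": 7500}
UNDO_LOG = [
    (date(2008, 4, 1), {"id": 676, "addr": "Baar", "sal": 7000}),
    (date(2009, 11, 1), {"id": 676, "addr": "Bern", "sal": 7000}),
]

print(emp_as_of(CURRENT, UNDO_LOG, date(2009, 1, 1)))
```

Handling inserts and deletes would need the additional undo operation codes written by the other triggers.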

Bitemporal Relations
Valid time and transaction time can be combined to allow for a complete history of what information was/is believed to be true and when this was stored in the database.
Example: Complete (bitemporal) history of employee 676
1. 2006-07-01: insert 676 lives in Baar and earns 7000 as of 2006-08-01.

Bitemporal Relations (2)
Example (continued): Complete (bitemporal) history of employee 676
2. 2008-04-01: update 676 lives in Bern as of 2008-03-01.

Bitemporal Relations (3)
Example (continued): Complete (bitemporal) history of employee 676
3. 2009-11-01: update 676 earns 7500 as of 2010-01-01.

Bitemporal Relations (4)
Example (continued): Complete (bitemporal) history of employee 676
4. 2009-11-11: update correction: 676 earns 7700 as of 2010-01-01.
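The salary part of this history can be written down as bitemporal rows and queried along both time axes. The Python sketch below is one possible encoding derived from the four steps above, not the table shown on the slides: it assumes closed day intervals, uses 9999-12-31 for "until further notice", and ignores the address attribute.

```python
from datetime import date

FOREVER = date(9999, 12, 31)  # assumed stand-in for "until further notice"

# (sal, v_from, v_to, t_from, t_to): valid time and transaction time per row.
SAL_676 = [
    # step 1, superseded in the database by step 3
    (7000, date(2006, 8, 1), FOREVER, date(2006, 7, 1), date(2009, 10, 31)),
    # step 3: the old salary now ends at the end of 2009 ...
    (7000, date(2006, 8, 1), date(2009, 12, 31), date(2009, 11, 1), FOREVER),
    # ... and 7500 applies from 2010 (believed only until the correction)
    (7500, date(2010, 1, 1), FOREVER, date(2009, 11, 1), date(2009, 11, 10)),
    # step 4: correction to 7700
    (7700, date(2010, 1, 1), FOREVER, date(2009, 11, 11), FOREVER),
]


def salary(valid_at, known_at):
    """Salary valid at `valid_at` according to the database state at `known_at`."""
    hits = [
        sal
        for sal, v_from, v_to, t_from, t_to in SAL_676
        if v_from <= valid_at <= v_to and t_from <= known_at <= t_to
    ]
    return hits[0] if hits else None


print(salary(date(2010, 6, 1), date(2009, 11, 5)))   # 7500 (before the correction)
print(salary(date(2010, 6, 1), date(2009, 12, 1)))   # 7700 (after the correction)
```

Nothing is ever overwritten: a correction closes the transaction time interval of the old row and adds a new one, so earlier beliefs remain queryable.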

Design of Temporal Databases

Basic idea
- Do non-temporal database design
- Annotate which tables / attributes need to be historized (especially valid time) and how (state-based vs. event-based)
- Generate temporal data structures ... but how?
Questions:
- Entity integrity (implemented by primary keys) => temporal entity integrity
- Referential integrity (implemented by foreign keys) => temporal referential integrity
Arbiter: sequence of snapshots model
Temporal Entity Integrity (1)

Temporal entity integrity = for every snapshot, entity integrity should hold.
Pro memoria:
- primary keys should consist of a minimal number of attributes which uniquely identify each tuple
- these attributes should ideally not change over time
Options for the primary key of a valid time relation (e.g. for table EMP)
(1) ID, V_FROM
(2) ID, V_TO
(3) ID, V_FROM, V_TO (non-minimal primary key!)
(4) ID, SEQ_NO (where SEQ_NO is a sequence number or counter)
Since all attributes except ID (and SEQ_NO) can change over the lifetime of the identified tuple
- alternative (4) is probably the best,
- followed by alternative (1), as V_FROM only changes in case of an error (and should not be referenced by foreign keys, as we'll see)
Temporal Entity Integrity (2)

In addition, it might be desirable to enforce other constraints, including
- Time intervals must not be empty
- Time intervals should be maximal (unless e.g. queries like "what was the case before or after a specific point in time" are not of importance)
    create table EMP (
      ID      integer not null,
      SEQ_NO  integer not null,
      NAME    varchar(20) not null,
      ...
      V_FROM  date not null,
      V_TO    date default date '9999-12-31',
      primary key (ID, SEQ_NO),
      check ( V_FROM <= V_TO ),               -- interval must not be empty
      check ( not exists (                     -- interval must be maximal
        select * from EMP other
        where other.ID = ID and other.NAME = NAME and ...
          and other.SEQ_NO <> SEQ_NO           -- do not compare a row with itself
          and ( other.V_FROM between V_FROM-1 and V_TO+1
             or other.V_TO between V_FROM-1 and V_TO+1 ) ) )
    )
Referential Integrity between Snapshot Relations

The foreign key (FK) attribute value(s) in the referencing relation must exist as primary key (PK) values in the referenced relation.
Example: $\mathit{Works\_On}[\mathit{Emp\_Id}] \subseteq \mathit{Emp}[\mathit{Id}]$
Note: In relational theory, this is sometimes also called an inclusion dependency.
Temporal Referential Integrity (1)

Temporal referential integrity = for every snapshot, referential integrity must hold.
Problem:
- primary keys now have a temporal part (on top of the non-temporal part)
- valid time periods in the foreign key (referencing) relation are not necessarily the same as those of the primary key (referenced) relation
At every point in time when the FK value was valid, the referenced PK value must be valid (with $\tau_t$ denoting the snapshot operator):
$\forall t \, ( \tau_t(\mathit{Works\_On}[\mathit{Emp\_Id}]) \subseteq \tau_t(\mathit{Emp}[\mathit{Id}]) )$

Temporal Referential Integrity (2)

$\forall t \, ( \tau_t(\mathit{Works\_On}[\mathit{Emp\_Id}]) \subseteq \tau_t(\mathit{Emp}[\mathit{Id}]) )$ holds for employee 676 because projection followed by temporal coalescing would result in:
Of course, performing temporal coalescing for
- adding tuples to and/or extending time intervals of the referencing relation
- deleting tuples from and/or shrinking time intervals in the referenced relation
would be an expensive proposition.
Recommendation: Track complete lifetimes of objects in a separate relation

Temporal Referential Integrity (3)

- Split valid time relation on referenced (PK) side into an object relation and a property relation.
- Add a referential integrity constraint from property relation to object relation.
- Re-route non-temporal referential integrity constraints from other relations to the object relation.

Temporal Referential Integrity (4)

In referencing relations, it might be desirable to enforce referential integrity
- non-temporal part: as usual
- temporal part: time interval contained in time interval of referenced object
    create table WORKS_ON (
      EMP_ID   integer not null,
      PROJ_ID  integer not null,
      SEQ_NO   integer not null,
      V_FROM   date not null,
      V_TO     date default date '9999-12-31',
      primary key (EMP_ID, PROJ_ID, SEQ_NO),
      check ( V_FROM <= V_TO ),
      foreign key (EMP_ID) references EMP_OBJ(ID),    -- non-temporal part
      check ( exists (                                -- temporal part
        select * from EMP_OBJ ref
        where ref.ID = EMP_ID
          and ref.V_FROM <= V_FROM and ref.V_TO >= V_TO ) )
      ...   -- e.g. temporal FK to a table PROJ_OBJ
    )
Temporal Normalization (1): Time-invariant Attributes

Assume that attribute FName cannot change over the lifetime of an Emp (except to correct mistakes). In other words, the functional dependency (FD) Id -> FName holds
=> relation Emp_Prop below is not in 2NF (attribute depends on part of PK)
=> relation Emp_Prop exhibits update anomalies when having to fix a mistake in Sue's first name (e.g. change to Susan)
Temporal Normalization (2): Time-invariant Attributes

Recommendation: Consider moving time-invariant attributes (e.g. FName) from the property relation (e.g. Emp_Prop) to the object relation (e.g. Emp_Obj).
- In Emp_Obj, the FD Id -> FName still holds (and is enforced by the PK), so the relation does not exhibit update anomalies.
- In Emp_Prop, all attributes are now fully dependent on the PK
but there is still an issue ...
Temporal Normalization (3): Asynchronous Changes

Example: After having inserted the salary raise for employee 676 as of the beginning of 2010, we learn that she actually moved to Aarau as of Dec 1 2009.
=> update anomaly: several tuples need to be changed (in addition to an insert)!
Recommendation: Attributes whose values change independently of other attributes should be put into different relations (somewhat like achieving 4NF in the face of multi-valued dependencies).
Temporal Normalization (4): Asynchronous Changes

Example: Since address and salary of an employee may change independently (and asynchronously), these attributes should be put into different relations.
=> no update anomaly: one tuple needs to be changed (in addition to an insert)!
Employee salaries remain untouched:
Summary of Design Recommendations
Remember: Following them is no free lunch!
- For kernel entity types (with objects whose existence is independent of other entities), consider the introduction of an object relation to capture the lifetime of these objects
  main benefits:
  . referential integrity checking over time
  . home for time-invariant attributes
- For relations representing object properties (or relationships between objects) and their history, consider choosing a temporal primary key consisting of the non-temporal primary key attributes plus a (meaningless) sequence number.
- For relations representing object properties (or relationships between objects), consider decomposing them into groups of attributes which
  . are either time-invariant => this attribute group is moved to the object relation
  . or change independently of one another (i.e., potentially at different times) => each such attribute group is moved into a separate relation keeping track of the history of the values
Return to (Valid) Time in Warehousing

Motivating Example
Compare profits over the years
- grouped by business divisions
- grouped by client ratings
What happens if, over time,
- business divisions change (e.g. profit centers are shifted)?
- ratings of clients change?
- two clients merge (e.g., primary insurers in the reinsurance business)?
[Figure: star schema with fact table POLICY_PTF and dimension tables TIME, PRODUCT, PROF_CENTER, and CLIENT]
POLICY_PTF (<ForeignKeys>, PREMIUM_AMT, LOSS_AMT, EXPENSE_AMT, PROFIT_AMT)
TIME
PRODUCT (PROD_ID)
PROF_CENTER (PC_ID, PC_NAME, DIV_ID, DIV_NAME)
CLIENT (CL_ID, CL_NAME, CL_RATING)
First impressions
[Figure: chart of the measure profit [CHF] over time (2009 vs. 2010) per dimensional value (e.g., names of business divisions): X +24%, Y -40%, Z +80%.]

First impressions can be deceiving

[Figure: chart of profit [CHF] over time (2009 vs. 2010), broken down by profit center (X1, X2, X3, Y1, Y2, S, Z1, Z2) and annotated "Profit Center Shift"; change labels: +24%, -40%, +0%, +80%, +11%.]
Terminology and Concepts: Dimensional Hierarchies

Dimensions often have a hierarchical structure, e.g., in previous example:
- Product: hierarchical LineOfBusiness
    All Lines
      P&C Lines
        Property
        Casualty
      Special Lines
      L&H Lines
        Life
        Health
- ProfitCenter: embedded in hierarchical org structure ProfitCenter -> Division -> Group
- Client: hierarchical groupings possible, e.g., grouping by country -> continent, ...
Coping with Business Change

[Figure: timeline with capture times tCapture[1] and tCapture[2] and report time tReport; changes to referenced dimensional structures occur in between.]
At capture time: successful completion of business transaction
- captured measures refer to dimensional structures valid at this time
At report time: report production
- which dimensional structure should reported measures refer to?
  . original structures valid at respective capture times (tCapture[i])? => need history + valid times
  . structures valid at report time (tReport)? => need succession mapping
  . other times?
Running Example
[Figure: running example schema. Dimension Country (CountryId, CountryName), which changes over time; dimension Year (Year); measure Population, keyed by CountryId and Year.]
Changes to Dimensional Structures

Type          Description
1 add         New value added
2 rename      Old value (name) will be replaced by new value
3 invalidate  A value will not any longer be available for new contracts
4 merge       n old values will be merged into one value
5 split       Old value will be divided into n values
6 move        One value changes position in hierarchy
Key Questions: Succession Mapping, Taxonomic Relationship
[Image column: before/after diagrams of the values A, A1, A2, B, C, D for each type.]
Examples of Changes to Dimensional Structures

[Figure: examples of add, rename, invalidate, merge, and split.]
adapted from Temporal Data Warehousing: Business Cases and Solutions, J. Eder et al.
Issues: History, Validity and Succession of Values

Dimensional values to be tracked over time must have
- a unique, invariant, not-to-be-reused identifier for the concept that the value represents
  e.g. an identifier for the country first named Zaire and later Kongo
- a validity period indicating the overall lifetime of the concept which the value represents
  e.g. the lifetime of the country first named Zaire and later Kongo
- validity periods indicating the lifetime of the values used to represent the concept
  e.g. the lifetimes of the names Zaire and Kongo
Invalid dimensional values must have another dimensional value as successor
  e.g., East Germany is succeeded by Germany
Unique Identifier
DB2 Colloquium 2006-10-25
Succession of Dimensional Values

Step 1: Find countries which have a successor
Step 2: Aggregate if two or more countries have the same successor
Step 3: Reassemble parts
SQL Statement to do all 3 steps
    SELECT COALESCE(s.CurrId, p.CountryId) AS CountryId
         , p.Year
         , SUM(p.Population) AS Population
    FROM CountryPopulation p
    LEFT OUTER JOIN CountrySuccession s ON s.Id = p.CountryId
    GROUP BY COALESCE(s.CurrId, p.CountryId), p.Year
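The three steps (find successors, aggregate, reassemble) can be mirrored in a short Python sketch. The country identifiers and population figures below are placeholders invented for the illustration, not real data; the succession mapping is assumed to be available as a dictionary from original ID to ultimate successor (CurrId).

```python
from collections import defaultdict


def population_by_current_country(population, succession):
    """population: list of (country_id, year, population)
    succession: dict mapping an original ID to its ultimate successor (CurrId)
    Returns {(current_country_id, year): summed population}."""
    totals = defaultdict(int)
    for country_id, year, pop in population:
        # Step 1: map to the ultimate successor if there is one (COALESCE).
        current_id = succession.get(country_id, country_id)
        # Steps 2 and 3: aggregate per current country and year.
        totals[(current_id, year)] += pop
    return dict(totals)


# Placeholder data: E and W were merged into G; F never changed.
SUCCESSION = {"E": "G", "W": "G"}
POPULATION = [
    ("E", 1980, 10),
    ("W", 1980, 50),
    ("F", 1980, 40),
    ("G", 2000, 70),
]

print(population_by_current_country(POPULATION, SUCCESSION))
```

Grouping by the mapped identifier rather than the original one is what makes the figures of merged countries add up under their common successor.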
Side Issue: Difficulties with the Split Operation

Example
- measures population and GNP (gross national product) have been collected for Czechoslovakia up to 1992
- as of 1993, the same measures are collected for the Czech Republic and Slovakia
Possible solutions
- from 1993 on, keep Czechoslovakia and compute its population and GNP figures by summing the figures of the Czech Republic and Slovakia
- up to 1992, compute approximate percentages of the population and GNP figures from Czechoslovakia for the Czech Republic and Slovakia
  note: in general, the percentages of the various measures are not identical
- leave countries as is and perform no mapping in either direction
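The second option, distributing historical figures of the old value over its successors, amounts to multiplying by allocation shares. A minimal Python sketch under stated assumptions: the shares are chosen per measure (since the percentages generally differ between measures), they sum to 1, and all identifiers and numbers are placeholders rather than real statistics.

```python
def allocate_split(old_values, shares):
    """old_values: dict mapping year -> measure value of the old (unsplit) value
    shares: dict mapping successor ID -> fraction of the measure (must sum to 1)
    Returns {(successor_id, year): approximated measure value}."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {
        (successor, year): value * share
        for year, value in old_values.items()
        for successor, share in shares.items()
    }


# Placeholder figures for one measure of an old value that was split into A1 and A2.
OLD_MEASURE = {1991: 100, 1992: 120}
SHARES = {"A1": 0.75, "A2": 0.25}

print(allocate_split(OLD_MEASURE, SHARES))
```

A second measure (e.g. GNP) would be allocated with its own set of shares.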
Handling Splits (Sketch)
Step 1: Aggregate over Taxonomy
Step 2: Extrapolate
Lifecycle of Concepts
[Figure: state diagram with the states Active, Inactive, and Superseded; transition labels: Start of validity, Introduction as Inactive, Move, define successor.]
- Active: can be used to book new business and appear on reports
- Inactive: can appear on reports but cannot be used to book new business
- Superseded: cannot appear on reports nor be used to book new business
Validity (Lifetime) of Concepts
Validity (Lifetime) of Names of Concepts
Modified Star Schema Design

Principle
- Add valid times in dimensions in the Data Warehouse using
  . an object table (Country)
  . a single property table (here: CountryNames)
  both with an associated valid time interval.
- Let foreign keys in fact tables refer to the unchanging ID in object tables.
- Generate standard Data Marts from this data model as needed, most often a history of measure according to the current dimensional structure.
Schema
Country (CountryId, VTimBeg, VTimEnd)
CountryNames (CountryId, VTimBeg, VTimEnd, CountryName)
CountrySuccession (
  Id      -- original identifier
  SuccId  -- direct successor
  CurrId  -- ultimate successor
)
Population (CountryId, Year)
Year (Year)
Coping with a Distributed Environment (Teaser)

Note: Of course, in a global enterprise, all of this happens in a distributed environment
[Figure: layered data stores connected by two flows, the flow of master data (e.g. dimension attributes + values) and the flow of transactional data]
- Analytical Data Stores
- Integration Data Stores: History Stores (DWh), Exchange Stores (ODS)
- Transactional Data Stores (additional identifiers, measures tied to ref data), e.g., Claims and Underwriting Systems
- Master Data Stores (identifiers, dimensional attributes), e.g., MDM, CRM, ForEx, Geo DB
Kimball's Types of Slowly Changing Dimensions

Ralph Kimball proposed 3 (well, actually only 2) "poor man's solutions" to the historization of dimensions, called slowly changing dimensions (SCD), in the context of the Star Schema.
- SCD Type 1: no history of the dimensional attribute is needed => simply overwrite the value
  e.g. the correction of mistakes in names, birthdays etc.
- SCD Type 2: versions of some dimensional attributes are needed => store new records in the dimension table, with a new DWh identifier (ID), the existing stable source system ID, and the new (changed) values
  e.g. a change in the rating of a client, or the new business division a profit center belongs to
- SCD Type 3: current and original (or previous) versions are needed => introduce a current and original attribute in the dimension table
  e.g. the current rating and the original rating of each client
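To make the difference between the three types concrete, the following Python sketch applies the same rating change to a small client dimension held as a list of dictionaries. The column names (`dwh_id`, `source_id`, `rating`, `rating_original`, `rating_current`) and the helper functions are hypothetical choices for this illustration, not Kimball's notation.

```python
def scd_type1(dim, source_id, attr, value):
    """Type 1: overwrite the value in place; no history is kept."""
    for row in dim:
        if row["source_id"] == source_id:
            row[attr] = value


def scd_type2(dim, source_id, attr, value):
    """Type 2: add a new row with a new DWh ID, the same source system ID,
    and the changed value. Returns the new DWh ID."""
    versions = [r for r in dim if r["source_id"] == source_id]
    latest = max(versions, key=lambda r: r["dwh_id"])
    new_row = dict(latest)
    new_row[attr] = value
    new_row["dwh_id"] = max(r["dwh_id"] for r in dim) + 1
    dim.append(new_row)
    return new_row["dwh_id"]


def scd_type3(dim, source_id, attr, value):
    """Type 3: keep <attr>_original untouched and overwrite <attr>_current."""
    for row in dim:
        if row["source_id"] == source_id:
            row[attr + "_current"] = value


dim1 = [{"dwh_id": 1, "source_id": "C1", "rating": "A"}]
scd_type1(dim1, "C1", "rating", "B")

dim2 = [{"dwh_id": 1, "source_id": "C1", "rating": "A"}]
new_id = scd_type2(dim2, "C1", "rating", "B")

dim3 = [{"dwh_id": 1, "source_id": "C1",
         "rating_original": "A", "rating_current": "A"}]
scd_type3(dim3, "C1", "rating", "B")

print(dim1, dim2, dim3, sep="\n")
```

With Type 2, facts captured after the change would reference the new DWh ID, while older facts keep pointing at the old row; note that the sketch, like plain Type 2, records no time of change.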
Slowly Changing Dimensions Type 1

Pros
- Simple to understand for business users and simple to implement (especially when using MOLAP tools)
- Requires the least space and has the best response time
Cons
- Simplicity for business users is deceiving
- A change in a dimensional attribute effectively changes the context for all facts captured prior to the change

Slowly Changing Dimensions Type 2

Pros
- Reasonably understandable and simple to implement (regardless of MOLAP / ROLAP tool)
- Captures parts of the history
Cons
- The time of a change in a dimension is not captured
- Requires more space since a single dimensional object is possibly represented in several rows (but this is usually not an issue)
- Can be confusing since changed dimensional data objects appear more than once, with identical source system IDs, but at least one changed attribute value
- Checking when it is ok to refer to which DWh IDs is not possible

Slowly Changing Dimensions Type 3

Pros
- Reasonably simple to implement (regardless of MOLAP / ROLAP tool)
- Captures parts of the history
Cons
- Can only have 2 versions of any attribute (usually original and current)
- Each historized attribute A must be represented by 2 or 3 attributes (namely, A_Original, A_Current and possibly A_Valid_From)
  . This requires more space
  . This can also be confusing to some users
- Checking when it is ok to refer to which DWh IDs is not possible
Literature
General Temporal Database Concepts
[Snodgrass 1999] Richard T. Snodgrass: Developing Time-Oriented Database Applications. Morgan Kaufmann, 1999. (see http://www.cs.arizona.edu/people/rts/publications.html)
[Zhou et al 2006] Xin Zhou, Fusheng Wang, Carlo Zaniolo: Efficient Temporal Coalescing Query Support in Relational Database Systems. Proc. 17th International Conference on Database and Expert Systems Applications - DEXA '06, 2006.
[Johnston & Weis 2010] Tom Johnston, Randall Weis: Managing Time in Relational Databases: How to Design, Update and Query Temporal Data. Morgan Kaufmann, 2010.
Data Warehouse Design
[Kimball & Ross 2002] Ralph Kimball, Margy Ross: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition. John Wiley, 2002.
[Imhoff et al 2003] Claudia Imhoff, Nicholas Galemmo, Jonathan G. Geiger: Mastering Data Warehouse Design: Relational and Dimensional Techniques. John Wiley, 2003.
[Golfarelli & Rizzi 2009] Matteo Golfarelli, Stefano Rizzi: Data Warehouse Design: Modern Principles and Methodologies. McGraw Hill, 2009.
[Adamson 2010] Christopher Adamson: Star Schema: The Complete Reference. McGraw Hill, 2010.
