ETL ARCHITECTURE
IN DEPTH
DATE: 10-13 June 2013
LOCATION: Amsterdam
INSTRUCTORS: Ralph Kimball and Bob Becker
INFORMATION AND REGISTRATION: www.Q4K.com
RALPH KIMBALL
Ralph Kimball is known worldwide as an innovator,
writer, educator, speaker and consultant in the field
of data warehousing. He has remained steadfast in
his long-term conviction that data warehouses must be
designed to be understandable and fast. His books
on dimensional design techniques have become the
all-time best sellers in data warehousing. His books
include The Data Warehouse Toolkit (2nd Edition),
The Data Warehouse Lifecycle Toolkit (2nd Edition),
The Data Webhouse Toolkit, The Data Warehouse ETL
Toolkit, The Microsoft Data Warehouse Toolkit (2nd
Edition) and The Kimball Group Reader. To date Ralph
has written more than 100 articles and columns for
Intelligent Enterprise and its predecessors, winning the
Readers' Choice Award five years in a row.
After receiving a Ph.D. in 1972 from Stanford in
electrical engineering (specializing in man-machine
systems), Ralph joined the Xerox Palo Alto Research
BOB BECKER
Bob Becker has worked with business managers and IT
professionals to prioritize, justify and implement large-scale decision support and data warehousing systems
since 1990. Regardless of the industry, he is highly
skilled at identifying business requirements, facilitating
organizational consensus and designing dimensional
data models. Bob leverages these consulting
experiences when teaching the Kimball University on-site
courses. He co-authored The Data Warehouse Lifecycle
Toolkit (2nd Edition) and The Kimball Group Reader.
Business needs
Compliance
Data profiling
Security
Integration needs
Latency (daily, hourly, seconds, instantaneous)
Archiving (recent history, very long term)
Lineage and impact
User profiles (developers, business users, analysts)
Existing IT skills (traditional EDW, new Big Data systems)
Existing technology licenses
Hand coding vs. ETL tool choice
Class roundtable exercise: Challenges in students' environments
Extract Steps: Bringing the Data to the Back Room
Data types used in ETL systems
(1) Data profiling
Source to target map
Access methods, source types (including new Big Data)
Software, techniques
(2) Change data capture
(3) Extract window
(3) Immediate transformations
(3) Extract staging table designs, table types, retention, backup
(3) Technical extraction tips
(3) Traditional mainframe sources
(3) XML sources and persistence of structures in back room
ERP system sources
Example vendors: Microsoft SSIS, Pentaho Kettle
Service oriented architectures, WSDL, and SOAP
Big data sources
(22) Job scheduler
(22) Exception handling architecture
(23) Backup
Short term and long term recovery, archiving, sunsetting
(24) Recovery, (24) Restart
Architecture of Data Cleansing
(4) Data quality architecture
(4) Data quality screens (column, structure, and business rule)
(4) Business rule screens from statistical forecasts
(4) Column and structure screens from data profiling
(5) Error event fact tables
Fact table surrogate keys
(6) Building the audit dimension and exposing in BI tool
(4, 5, 6) Implementing data quality architecture in agile environment
(7) De-duplication and survivorship
Real Time Data Warehousing
Hot partition
Streaming versus batch ETL
Streaming delivery, query, reporting, dashboards, notifications
Enterprise application integration (EAI) architecture
Micro-batch ETL (MBETL) architecture
Enterprise information integration (EII) architecture
Big Data Predictive Analytics
Big Data use cases
Four Vs: volume, variety, velocity, value
MapReduce, Hadoop, Pig, Hive, HBase
When to export to conventional RDBMS
Architecture of Data Integration
(8) Conforming dimensions, definition, impact on BI
(8) Centralized and distributed responsibilities using conformed dimensions
(8) Implementing conformed dimensions in agile environment
(8) Example vendors: Pentaho, Microsoft SSIS, Informatica, Zend Studio
(28) Sorting
(25) Version control
(26) System and version migration, testing and regression
(27) Workflow monitor
(27) Example vendors: Microsoft SSIS, IBM Tivoli, Informatica
(23) Job scheduler
(29) Lineage and dependency analyzer
(30) Problem escalation system
Registration fee
The fee for this 4-day course is EUR 2.695,00 per person. This includes four days of
instruction, lunch and morning/afternoon snacks, course materials and a KU Certificate
of Completion. Students receive a copy of The Data Warehouse ETL Toolkit.
We offer the following discounts. Discounts cannot be combined.
10% Early Bird discount for students registering before 19 April 2013. Payment must
be received before the cutoff date to receive the discount.
10% discount for groups of 3 or more students from the same company registering
at the same time.
20% discount for groups of 5 or more students from the same company registering
at the same time. Register 5 students, only pay for 4.
Note: Groups that register at a discounted rate must retain the minimum group size or the
discount will be revoked.
Venue
Steigenberger Airport Hotel Amsterdam is
located on the outskirts of the Amsterdam
Forest, only 7 minutes from Schiphol
airport and a few minutes away from the
highway. The hotel's underground car park
can accommodate up to 266 vehicles,
and the hotel offers a free shuttle bus
service to and from the airport.
Steigenberger Airport Hotel Amsterdam
Stationsplein ZW 951
1117 CE Amsterdam Schiphol-Oost
The Netherlands
T: +31 20 5400-777
F: +31 20 5400-700
E: airporthotel-amsterdam@steigenberger.nl
W: www.steigenberger.com
Company Details
Company Name:
E-mail:
Contact Name:
Telephone:
Address:
Fax:
Postal Code:
Website:
City:
Invoice Address:
Country:
Postal Address:
VAT Number:
Purchase Order no.:
Student Details
First Name:
Gender: Male / Female
Last Name:
E-Mail:
Job Title:
Telephone:
Authorization
Registration Information
Name:
Job Title:
Date:
Signature:
Organized by
Kimball University
Kimball University (KU) is the definitive source for dimensional
data warehouse education. KU provides the highest quality
and most practical education consistent with KU instructors'
books and extensive experience in the dimensional
approach. You'll learn from the best in the business. Kimball
University offers public classes in venues around the US and
internationally. In addition, KU teaches classes on-site at client
locations. All class content is vendor neutral, with the exception
of the Microsoft-centric course.