Describes the Data Warehouse Project Documentation Roadmap
Note: This is only a summary of presentations found on the internet.
Credits to the owners of the presentations gathered and summarized.
DATA WAREHOUSE PROJECT DOCUMENTATION
PART II:
1. DOCUMENTATION OVERVIEW
2. TEMPLATES

WHAT IS DOCUMENTATION?
• Documentation is a set of documents provided on paper, online, or on media such as audio tape or CDs.
• Documentation is distributed via websites, software products, and other online applications.

DOCUMENTATION ROADMAP TEMPLATES
- THE TEMPLATES ARE DIVIDED INTO 11 CATEGORIES.
- WITHIN EACH CATEGORY, THE DOCUMENTS ARE NUMBERED SEQUENTIALLY.
NOTE: THE 11 TEMPLATES ARE JUST GENERAL ONES THAT CAN BE USED AS REQUIRED.

1 CONCEPT
Definition: The business may have a concept, and the IT team will be able to describe the major components and concepts of a data warehouse.
1.1 Business Concepts for the Data Warehouse - Describes the subject areas and their broad relationships, as well as the key performance indicators used by the business.
1.2 Overview Architecture for Enterprise Data Warehouses - A design pattern for data warehousing that describes the basic concepts of the data warehouse.

2 REQUIREMENTS
Definition: The objective of these templates is to give breadth and depth to the requirements. Breadth ensures that all truly required information is covered, whilst depth is the amount of detail specified in the requirements, so that the developers have sufficient, unambiguous detail with which to develop.
2.1 Data Warehouse Business Requirements (WBR) - Details the ‘soft’ requirements for business information according to a number of subject areas of interest to the business.
2.2 Data Warehouse Data Requirements (WDR) - A refinement of the business requirements: the analysts use the business requirements to drive out the data required to answer the questions.
2.3 Data Warehouse Query Requirements (WQR) - Lists a number of potential queries to which the solution should be able to provide answers.
2.4 Data Warehouse Technical Requirements (WTR) - Details the functional and non-functional requirements that are expected of the solution.
2.5 Data Warehouse Interface Requirements (WIR) - Details the requirements for interfaces that feed from the data warehouse out to other systems.
2.6 Business Definitions Dictionary (BDD) - A common dictionary should be developed and maintained so that there is a common reference for the terms used.

3 ARCHITECTURE
Definition: The architecture category contains a number of documents that describe how the system should be built.
3.1 Technical Architecture - Describes the technical components that will be used to build the system, including the hardware, software and network configuration, along with specific versions where these are appropriate and standard.
3.2 Security Model - Describes all the roles, groups, etc. that will be required for each component of the system.
3.3 Resilience Plan - Covers the need for redundant hardware and networks; incremental, cumulative and full backups; restores of individual components or entire systems; how to back out records individually, as a group or as entire sets; and how disasters such as the loss of a data centre are managed.
3.4 Data Quality Plan (DQP) - Sets out the principles of where data is cleansed (in the source, in staging, in the data warehouse itself, etc.), how it is profiled, what type of cleansing is carried out (e.g. rule-based or heuristic), and what metrics are set and monitored for data improvement.

4 DATA MODELS
Definition: Data models are (normally) graphical representations of the data that is required.
4.1 Data Modelling Standards - Describes the naming conventions of objects in the database, as well as any particular modelling methods (e.g. a hierarchy must always be modelled in a specific way, with any exceptions noted along with a justification for the difference).
4.2 Logical Model - A model that represents the true structure of the data used by the business, independent of software or hardware implementation constraints.
4.3 Repository Data Model - A physical data model of the main storage area within a data warehouse.
4.4 Data Mart Data Model(s) - The physical models of the part of the system that the user will query.

5 ANALYSIS
Definition: The goal of the analysis phase is to identify the sources of the information required to populate the physical data models.
5.1 Source Systems Analysis (SSA) - A high-level analysis that gathers information about the available systems.
5.2 Data Profiling - A process whereby an existing source system is examined in order to collect information and statistics about the data it holds.
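As an illustration of the kind of statistics data profiling collects, here is a minimal Python sketch. It is not taken from the presentations: the function name `profile_column`, the chosen statistics and the sample values are all assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def profile_column(values: Iterable[Any]) -> Dict[str, Any]:
    """Collect simple statistics about one column of a source table.

    Hypothetical helper: a real profiling exercise would normally also look
    at data types, value patterns and relationships between columns.
    """
    rows = list(values)
    non_null = [v for v in rows if v is not None]
    counts = Counter(non_null)
    return {
        "row_count": len(rows),
        "null_count": len(rows) - len(non_null),
        "distinct_count": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical sample: country codes extracted from a source system.
sample = ["GB", "FR", None, "GB", "DE", None, "GB"]
profile = profile_column(sample)
print(profile)
```

Figures such as the null count and the number of distinct values are the sort of evidence that could then help decide whether a source is useful for the data warehouse.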
5.3 Source Entity Analysis (SEA) - The detailed documentation of the sources selected because data profiling has validated them as useful for the data warehouse.
5.4 Target Orientated Analysis (TOA) - Describes which sources will be used to populate which target entities.

6 DESIGN
Definition: The design phase concentrates on taking the analysis and creating a plan for the code build.
6.1 ETL Execution Plan - Explains, from the high level down to the low level, how the ETL code will be put together.
6.2 Initial Capacity Plan - Describes the sizes of the databases and database objects required for the initial build.
6.3 Coding Standards - Describes the naming conventions for all objects that will be created, including but not limited to database objects such as table and column names, ETL mapping names and script names.

7 BUILD
Definition: This category is a roadmap to the documentation that should be produced during the build of a data warehouse project.
7.1 Code Repository - A lot of the code will contain valuable documentation in the form of comments. It is also vital that the history of changes to the code is recorded.
7.2 Data Cleansing Integration - Documents the rules generated by data cleansing that have to be implemented in order to maintain data quality.

8 TEST
Definition: Testing software means operating the software under controlled conditions in order to:
1. Verify that it behaves “as specified”. Verification is the checking or testing of items, including software, for conformance and consistency by evaluating the results against pre-specified requirements.
2. Detect errors. Testing should intentionally attempt to make things go wrong, to determine whether things happen when they should not or do not happen when they should.
8.1 Unit Testing - Designed to validate that an individual unit of development work (normally an ETL mapping, input screen or report) is functioning as expected.
8.2 System Testing - Designed to check that a suite of newly developed or changed units works together in the expected manner.
8.3 Integration Testing - Designed to ensure that the suites of newly developed or changed units work with other suites that are already deployed on the system and do not damage the existing production environment.
8.4 Performance Testing - Designed to ensure that the system performs as required.

9 IMPLEMENTATION
Definition: After development and testing are over, the system has to be deployed into production and left operating.
9.1 Configuration Management Procedures - Cover all aspects of changes to the configuration, from applying patches and new releases through to system software upgrades.
9.2 Operations Guide - Intended for those with responsibility for looking after the system on a day-to-day basis.
9.3 Capacity Plan - Follows on from the Initial Capacity Plan (6.2), which will already have been produced.
9.4 Service Level Agreements (SLA) - A formal negotiated agreement between two parties.
9.5 Helpdesk Scripts - The helpdesk needs to be able to handle support calls.
9.6 Training Plan - The project will need to provide a training plan. This is how users become competent enough to use the system.
9.7 Operational Schedule - The list of tasks that must be performed each hour, day, week, month, etc., and any dependencies (e.g. must run after midnight, must only run if a previous job is successful).
9.8 System Monitoring Plan - The list of system components that are going to be monitored, along with the thresholds at which warnings and errors are signalled.

10 PROJECT MANAGEMENT
Definition: The previous categories have described documents required for individual phases of the project. This category covers the documents used to manage the project as a whole.
10.1 Documentation Roadmap - The document that describes all the documents that should be produced for each of the phases of a project.
10.2 Project Plan - The list of tasks and activities, with timescales, resources and dependencies, that must be performed to deliver the solution.
10.3 ‘DRIVE’ Statements - A short one-page template that helps a project manager assess whether a project or work package should be undertaken.
10.4 ‘SWOT’ Analysis - Often used in data warehouse projects as a way of comparing different approaches to a problem.
10.5 ‘MoSCoW’ Analysis - A method of prioritizing a list of requirements or features of the system by breaking the list down into Must have, Should have, Could have and Won't have items.
10.6 Change Requests (CR) - Change requests are a critical component of any project and are vital to data warehouse projects.
10.7 Risk Register - A list of events that may happen. If an event occurs, it will have some negative impact on the project in terms of cost, resource or time.
10.8 Issue Log (BUG) - Supports the active management of issues that have arisen.
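The breakdown performed in a MoSCoW analysis (10.5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `group_by_priority` and the sample requirements are hypothetical and do not come from the presentations.

```python
from collections import defaultdict

# The four MoSCoW categories, in priority order.
MOSCOW_ORDER = ["Must", "Should", "Could", "Won't"]


def group_by_priority(requirements):
    """Break a list of (requirement, category) pairs down by MoSCoW category."""
    groups = defaultdict(list)
    for name, category in requirements:
        if category not in MOSCOW_ORDER:
            raise ValueError(f"Unknown MoSCoW category: {category!r}")
        groups[category].append(name)
    # Return every category in priority order, even if it has no entries.
    return {category: groups[category] for category in MOSCOW_ORDER}


# Hypothetical requirements for a data warehouse release.
requirements = [
    ("Nightly load of sales data", "Must"),
    ("Self-service dashboards", "Should"),
    ("Export to spreadsheet", "Could"),
    ("Real-time streaming feed", "Won't"),
    ("Row-level security", "Must"),
]
prioritized = group_by_priority(requirements)
for category, names in prioritized.items():
    print(f"{category}: {names}")
```

Keeping the categories in a fixed order means the output always lists the Must items first, which is the order in which a project would normally review them.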
10.9 Key Design Decisions (KDD) - A template to record significant design decisions. It records the issue, the chosen option, any rejected options and the rationale behind the decision.

11 MISCELLANEOUS
Definition: The final category of this document describes some general-purpose documents that a project will find useful.
11.1 General Purpose Document - A document with a standard look and feel and the required categories, for any project document that is needed.
11.2 General Purpose Presentation - A presentation with a standard look and feel.
11.3 Meeting Agenda - A standard agenda template for meetings.
11.4 Memo - A standard memo format for anyone who is recording formal aspects of the project outside the documentation roadmap.

SUMMARY
Data Management & Warehousing has identified three aspects to essential documentation:
• A roadmap that describes what documentation is required and how it fits together.
• Team members within the project who use the templates, create quality documents and store them in the project repositories.
• Easy access to the documentation for people outside the project team, including publication or notification of changes, updates and new releases.

THANK YOU FOR LISTENING