
SvalTech

An Overview of Database Archiving


Toronto DAMA Chapter Meeting 16 September, 2009
Jack E. Olson jack.olson@SvalTech.com www.svaltech.com

Database Archiving: How to Keep Lots of Data for a Long Time Jack E. Olson, Morgan Kaufmann, 2008
Copyright SvalTech, Inc., 2009


Topics
Database Archiving Definitions
Database Archiving Application Profiles
Elements of a Successful Implementation
Solution Comparisons
Business Case Basics


Database Archiving Definitions


Definition
The process of removing from operational databases selected data items that are not expected to be referenced again, and storing them in an archive database where they can be retrieved if needed.

Physical Documents: application forms, mortgage papers, prescriptions

File Archiving: structured files, source code, reports

Document Archiving: Word, PDF, Excel, XML

Multi-media files: pictures, sound, telemetry

Email Archiving: Outlook, Lotus Notes

Database Archiving: DB2, IMS, ORACLE, SAP, PEOPLESOFT


Data Sources
Data created and maintained by either custom applications or packaged applications that store data in structured database management systems or structured records in file systems.
Types of data: transaction data, reference data
Database and file systems: DB2, IMS, ADABAS, IDMS, ORACLE, SQL SERVER, VSAM
Packaged applications: SAP, Oracle Financials, Siebel, PeopleSoft


Data Domain Business Records


The data captured and maintained for a single business event or to describe a single real world object.
Databases are collections of Business Records.
Database Archiving is Records Retention.
Examples: customer, employee, stock trade, purchase order, deposit, loan payment


Drivers
Longer Data Retention requirements
Expanded Business
Mergers and Acquisitions
Overloaded operational databases
Operational problems
Data Governance
e-Records Retention
e-Discovery Readiness concerns
Application Changes


Data Retention
The requirement to keep data for a business object for a specified period of time. The object cannot be destroyed until after the time for all such requirements applicable to it has passed.

[Figure: requirement time lines for Business Requirements and Regulatory Requirements]

The Data Retention requirement is the longest of all requirement lines.
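
The "longest requirement line" rule can be sketched in a few lines of Python. This is only an illustration: the requirement names and year counts below are hypothetical, and real retention rules are usually more involved than a whole number of years from the create date.

```python
from datetime import date
from typing import Mapping


def retention_end(created: date, requirement_years: Mapping[str, int]) -> date:
    """Date after which the record may be discarded.

    The longest applicable requirement governs, so take the maximum.
    """
    longest = max(requirement_years.values())
    try:
        return created.replace(year=created.year + longest)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return created.replace(year=created.year + longest, day=28)


# Hypothetical requirements for one business object type.
requirements = {"business": 3, "tax regulation": 7, "industry regulation": 10}
end = retention_end(date(2009, 9, 16), requirements)
```

Here the 10-year line governs, so the record cannot be discarded before the same day in 2019.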


Data Retention
Retention requirements vary by business object type
Retention requirements from regulations are exceeding business requirements
Retention requirements will vary by country
Retention requirements imply the obligation to maintain the authenticity of the data throughout the retention period
Retention requirements imply the requirement to faithfully render the data on demand in a common business form understandable to the requestor
The most important business objects tend to have the longest retention periods
The data with the longest retention periods tends to accumulate the largest number of instances
Retention requirements often exceed 10 years. Requirements exist for 25, 50, 70 and more years for some applications


Data Time Lines


for a single instance of a business record

create event -> operational phase -> reference phase -> inactive phase -> discard event

operational phase: can be updated, can be deleted, may participate in processes that create or update other data

reference phase: used for business reporting, extracted into business intelligence or analytic databases, anticipated queries

inactive phase: no expectation of being used again, no known business value, being retained solely for the purpose of satisfying retention requirements. Must be available on request in the rare event a need arises.


Data Time Lines


for a single instance of a business record

Events, in time order:
  Create PO
  Update PO
  Create Invoice
  Backorder
  Create Financial Record
  Update on Ship
  Update on Ack
  Weekly Sales Report
  Quarterly Sales Report
  Extract for data warehouse
  Extract for business analysis
  Common customer queries
  Common business queries
  Ad hoc requests
  Lawsuit e-Discovery requests
  Investigation data gathering

[Figure: the events plotted along the retention requirement, which spans the operational, reference, and inactive phases]


Data Time Lines


Some objects exit the operational phase almost immediately (financial records)
Some objects never exit the operational phase (customer name and address)
Most transaction data has an operational phase of less than 10% of the retention requirement and a reference phase of less than 20% of the retention requirement
Inactive data generally does not require access to application programs: only access to ad hoc search and extract tools
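
The percentages above translate directly into years. A small worked example, assuming a hypothetical 10-year retention requirement and treating the 10% and 20% figures as upper bounds:

```python
def phase_bounds(retention_years: float) -> dict:
    """Upper bounds for the active phases and the implied lower bound
    for the inactive phase, using the 10% / 20% rule of thumb."""
    operational_max = 0.10 * retention_years
    reference_max = 0.20 * retention_years
    inactive_min = retention_years - operational_max - reference_max
    return {
        "operational_max": operational_max,
        "reference_max": reference_max,
        "inactive_min": inactive_min,
    }


bounds = phase_bounds(10)  # 10 years is an assumed example value
```

With a 10-year requirement, a typical transaction record is operational for under 1 year, in the reference phase for under 2 years, and inactive for at least 7 years, which is at least 70% of its retained life.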


Application Segments
An application segment is a set of business objects generated from a single version of an application where all business records in the segment have data consistent with a single metadata definition.

A metadata break is a point in the life of the operational database where a change in metadata is implemented that changes the structure of the data or the manner in which data is encoded.

An application will have many segments over time
Minor changes in metadata can sometimes be implemented without forcing a segment change
Major metadata changes will always generate a segment change where data created in the previous segment cannot be recast to the new metadata definition without some compromise in the data
Application segments can be generated in parallel, with one operational implementation using one version of the application at the same time that another operational instance is using a different version of the application
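
As a rough illustration of these two definitions, the sketch below starts a new segment whenever the metadata version of incoming records changes. The `Segment` class, the version labels, and the record contents are all hypothetical; a real archive would compare full metadata definitions rather than a version string.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    metadata_version: str
    records: List[dict]


def split_into_segments(history: List[Tuple[str, dict]]) -> List[Segment]:
    """history: (metadata_version, business_record) pairs in creation order."""
    segments: List[Segment] = []
    for version, record in history:
        # A change of version is treated here as a metadata break.
        if not segments or segments[-1].metadata_version != version:
            segments.append(Segment(version, []))
        segments[-1].records.append(record)
    return segments


history = [
    ("v1", {"trade_id": 1}),
    ("v1", {"trade_id": 2}),
    ("v2", {"trade_id": 3, "currency": "CAD"}),  # new column: metadata break
]
segments = split_into_segments(history)
```

Each resulting segment holds only records that conform to one metadata definition, so nothing has to be recast when the definition changes.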


Application Segments
[Figure: application segment time lines, with major metadata breaks marked]

case 1
Application: customer stock transactions
Time lines: OS1 (S1)
Source 1 = Trades, All Stock Trades

case 2
Application: customer stock transactions
Time lines: OS1 (S1), OS2 (S2)
Source 1 = Stock Trades North American Division
Source 2 = Stock Trades Western Europe Division


Application Segments
[Figure: application segment time lines, with major metadata breaks marked]

case 3
Application: customer stock transactions
Time lines: OS1 (S1), OS2 (S2), OS3 (S3), OS4 (S2)
Source 1 = Stock Trades North American Division, application X
Source 2 = Stock Trades Western Europe Division, application Y
Source 3 = acquisition of Trader Joe: merged with Source 1 on 7/15/2009
Source 4 = acquisition of Trader Pete: merged with Source 1 on 8/15/2009


Application Segments
A well designed database archive preserves application segments
Data is always kept in segment format
Metadata is preserved at the segment level
The archive administrative catalog shows
  Segments
    Segment version number
    Time period covered
    System generated from
  Time order of consecutive segment strings
  Parallel segment strings for the same application
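
One possible shape for such catalog entries is sketched below. The field names, source labels, and dates are hypothetical and are not taken from any particular product. Grouping by source system gives the parallel segment strings, and sorting by start date gives the time order within each string.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    application: str
    source_system: str     # system the segment was generated from
    segment_version: int
    start: date            # time period covered
    end: date


def segment_strings(entries):
    """Group entries into one time-ordered string per source system."""
    strings = defaultdict(list)
    for entry in entries:
        strings[(entry.application, entry.source_system)].append(entry)
    return {key: sorted(group, key=lambda e: e.start) for key, group in strings.items()}


entries = [
    CatalogEntry("stock trades", "OS1", 2, date(2006, 1, 1), date(2009, 6, 30)),
    CatalogEntry("stock trades", "OS2", 1, date(2004, 1, 1), date(2009, 6, 30)),
    CatalogEntry("stock trades", "OS1", 1, date(2002, 1, 1), date(2005, 12, 31)),
]
strings = segment_strings(entries)
```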


Database Archiving Application Profiles


Overloaded Operational Database


Transaction data
Lots of data
  Hundreds of millions of rows
  High daily transaction rate
24/7 operational availability requirement
Long retention period (7 years or more)
Short useful active life (less than 2 years)
Low access requirements during the inactive period
  Very low access frequency
  Response time not critical
  Access requirements are simple, easily satisfied with ad hoc tools


If You Don't Archive

Continue to keep all data in the operational database.
Inactive data will impact operational performance
  Harder to tune
  Scans take longer
Utility functions will take longer to execute
  Backups
  Database reorganizations
Recovery operations take longer
  Outage recoveries
  Disaster recoveries
System costs will escalate
  Need more expensive online storage
  Need system upgrades
  Pay more for application and DBMS software
Older data will become less reliable


Retired Application
Merger of companies results in an operational application being duplicated
Data structures are not compatible
  One keeps data elements not in the other
  One encodes data elements differently
  One designed for a different OS/DBMS than the other
Decision is made to use one system and abandon the other one
Meets all requirements of an operational application

If You Don't Archive

Must retain old application environment to access data
  Old system
  Old application program
  Old DBMS
Must keep knowledgeable staff to access
  Application experts
  System experts
  DBA function
Pay the high cost of the old application environment and staff until the last record reaches the end of its retention period. $$$$

Or, must merge data into active application
  Higher cost and time of conversion
    Data conversion problems
    Data loss
    Resolution of data quality issues
  Resulting database is huge
    Operational problems
    Lengthy utility runs
    Lengthy recovery periods
    Escalating system costs



Application Renovation Project


Application is undergoing major change
  Replaced with packaged application
  Legacy modernization
  Legacy termination
  Rewritten to be web-centric
  Need to satisfy new requirements
Old data structures are out of date
  Legacy DBMS
  Legacy file system
Data meets all other requirements for archiving operational application



If You Don't Archive

Must convert all data in one system to other system
More expensive and complex design phase
  Must accommodate old data in new design
  May compromise new design
Higher cost and longer conversion period
  Data conversion problems
  Data loss
  Resolution of data quality issues
Resulting data is less reliable


Elements of a Successful Implementation


Archive Staff
Database Archive Specialist
  Received education on database archive design and implementation
  Knows tools available
  Experienced
  Full time job
Database Archive Administrator
  Received education on database archiving administration
  Full time job
Supporting Roles
  Storage Administrators
  Database Administrators
  Data Stewards
  Security Administrators
  Compliance staff
  IT management
  Business Unit Management


Architecture of Database Archiving


[Figure: architecture of database archiving. Components shown:]
  Operational System (application program, OP DB)
  Archive Extractor
  Archive Administrator
  Archive Designer
  Archive Data Manager
  Archive Access Manager
  Archive Server
  archive catalog
  archive storage


Archive Designer
Metadata
  Capture current metadata
  Validate it
  Enhance it
  Design archive storage format
Data
  Define business records to be archived
  Define source of data
  Define data structures within operational system
  Define reference data needed to include with it
  Define archive format of data
Policies
  Define extract policy (when a record becomes inactive)
  Define operational disposal policy (when to remove from operational database)
  Define storage policy (how to protect data in archive)
  Define discard policy (when to remove from archive)
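
The extract and discard policies are, in effect, two date tests applied to each business record. A minimal sketch follows, assuming policies can be expressed as day counts; the `ArchivePolicy` class and its thresholds are hypothetical, and real policies often depend on record status as well as age.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ArchivePolicy:
    inactive_after_days: int   # extract policy: when a record becomes inactive
    retention_days: int        # discard policy: when to remove from the archive

    def should_extract(self, last_activity: date, today: date) -> bool:
        return today - last_activity >= timedelta(days=self.inactive_after_days)

    def should_discard(self, created: date, today: date) -> bool:
        # Discard only after the retention period has fully passed.
        return today - created > timedelta(days=self.retention_days)


# Assumed values: inactive after 2 years, retained for 10 years.
policy = ArchivePolicy(inactive_after_days=730, retention_days=3650)
today = date(2009, 9, 16)
extract_now = policy.should_extract(date(2007, 1, 1), today)
discard_now = policy.should_discard(date(2005, 1, 1), today)
```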


Archive Extractors
Extractor process
  Verify consistency with design metadata
  Extract data as defined in designer
  Mark or delete from operational database as defined in designer
  Pass data to archive data manager
  Keep audit records on everything done
  Do not impact operational performance
  Support interruptions with transaction level recovery
  Support restart
  Finish scans within acceptable time periods
Scheduling
  Establish periodic executions
  Find non-disruptive periods
  Be consistent
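
Several of these extractor requirements (small units of work, audit records, transaction level recovery, restart) can be illustrated with Python's built-in sqlite3 module. This is a sketch under stated assumptions, not a production extractor: the `orders` and `archive_audit` tables are hypothetical, and a plain list stands in for the archive data manager.

```python
import sqlite3


def extract_batch(op_db, cutoff, archive_sink, batch_size=100):
    """Move one batch of inactive rows out of the operational table.

    Returns the number of rows moved; 0 means nothing is left to do.
    """
    rows = op_db.execute(
        "SELECT id, closed_on, amount FROM orders "
        "WHERE closed_on < ? ORDER BY id LIMIT ?",
        (cutoff, batch_size),
    ).fetchall()
    if not rows:
        return 0
    archive_sink.extend(rows)  # pass data to the (stand-in) data manager first
    with op_db:  # delete and audit record commit together, or roll back together
        op_db.executemany("DELETE FROM orders WHERE id = ?", [(r[0],) for r in rows])
        op_db.execute(
            "INSERT INTO archive_audit (action, row_count) VALUES ('extract', ?)",
            (len(rows),),
        )
    return len(rows)


op_db = sqlite3.connect(":memory:")
op_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, closed_on TEXT, amount REAL)")
op_db.execute("CREATE TABLE archive_audit (action TEXT, row_count INTEGER)")
op_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2005-03-01", 10.0), (2, "2006-07-15", 20.0), (3, "2009-08-01", 30.0)],
)
op_db.commit()

archived = []
while extract_batch(op_db, "2008-01-01", archived, batch_size=1):
    pass  # each loop pass is a restartable unit of work
```

Small batches keep locks short and let an interrupted run resume where it stopped. A real extractor would also need the hand-off to the data manager to be idempotent, since a batch whose delete is rolled back will be extracted again on restart.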


Archive Extractors
Physical vs. Application Extractors

[Figure: Operational System (application program, OP DB) and Archive Extractor]

Physical Extractor
  Gets/deletes data directly from the database: tables, rows, columns
Application Extractor
  Gets/deletes data from an application API: virtual tables, rows, columns


Archive Data Manager


Put data away
  Receive data from extractors
  Format into archive segment files
  Determine metadata version affinity
  Format and store metadata files if new
  Build or update segment indexes, both internal and external
Execute storage policies
  Encryption/signatures
  Backup copies created and stored
  Geographic dispersion of backups
  Register archive files with archive catalog
  Enter audit trail information
Fetch metadata on request
  Return to accessing programs
Fetch data on request
  Scan archive segments
  Search through indexes
Execute archive discard process
  Periodic scheduling
  Delete qualifying business records
  Update archive catalog
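
For the signature part of the storage policy, one common approach is a keyed hash (HMAC) recorded when a segment file is written and rechecked whenever it is read. The sketch below uses Python's standard hmac and hashlib modules; the key and segment contents are placeholders, and key management is out of scope here.

```python
import hashlib
import hmac


def seal(segment_bytes: bytes, key: bytes) -> str:
    """Keyed SHA-256 digest to store alongside the segment file."""
    return hmac.new(key, segment_bytes, hashlib.sha256).hexdigest()


def is_authentic(segment_bytes: bytes, key: bytes, recorded_seal: str) -> bool:
    """True if the segment still matches the seal recorded at write time."""
    return hmac.compare_digest(seal(segment_bytes, key), recorded_seal)


key = b"example-key-not-for-real-use"           # placeholder key
segment = b"trade_id,amount\n1,100.00\n2,250.50\n"  # placeholder segment content
recorded = seal(segment, key)

unchanged_ok = is_authentic(segment, key, recorded)
tampered_ok = is_authentic(segment.replace(b"250.50", b"999.99"), key, recorded)
```

The same check can double as a periodic readability test: a segment that cannot be read back, or whose seal no longer matches, is flagged before anyone needs the data.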


Archive Access
Query Capability
  Determine applicability based on archive segment versions of metadata
  SQL based is best, if possible
  Employ external indexes to determine which archive segments to look into
  Employ internal indexes to avoid reading all of an archive segment
Support standard access tools
  Report generation (such as Crystal Reports)
  Generic query tools
  JDBC interface
Support metadata version browsing
Support generation of load files based on query results
Support generation of load files for the original data source based on query results
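
The role of an external index can be shown with a small sketch: before any segment file is opened, the index is consulted to rule out segments that cannot contain matching records. The index layout, file names, and date ranges below are hypothetical; a real external index might cover several key columns.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SegmentIndexEntry:
    segment_file: str
    min_trade_date: date
    max_trade_date: date


def segments_to_search(index, start: date, end: date):
    """Return only the segment files whose date range overlaps the query."""
    return [
        entry.segment_file
        for entry in index
        if entry.min_trade_date <= end and entry.max_trade_date >= start
    ]


index = [
    SegmentIndexEntry("seg_0001.dat", date(2004, 1, 1), date(2005, 12, 31)),
    SegmentIndexEntry("seg_0002.dat", date(2006, 1, 1), date(2007, 12, 31)),
    SegmentIndexEntry("seg_0003.dat", date(2008, 1, 1), date(2009, 6, 30)),
]
hits = segments_to_search(index, date(2007, 3, 1), date(2007, 3, 31))
```

Only the segments returned need to be opened, and the internal index of each then narrows the read further.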


Archive Administration
Manage Archive Catalog
  Application archive designs
  Audit trails
  Results logs
Manage Archive Storage Systems
  Ensure periodic readability checks
  Maintain access audit trails
Manage Archive Access
  Authorizations for users
  Authorizations for specific events
  Unloads
  Ensure audit records are created for all access
Manage e-Discovery requests
Ensure Extract and Discard processes are run when they are supposed to
Manage Metadata Change Process


Solution Comparisons


How Archive Data is Stored


Parallel databases
Partitions of operational db
  typically homegrown solutions
Database LOAD files
Saved image copies
Reformatted archive segments stored as files
  load files
  XML files
  special files
  typically vendor solutions


Storage Comparisons
DB Solutions (parallel, partitioned, db arrays)
  Don't get $$$$ savings
  Requires database administration
  Problems handling metadata changes

Backup Solutions (image copies, unload files)
  Requires restaging data to access
  Not searchable in archive
  Problems handling metadata changes

Non-DB Special files (XML, load files plus, proprietary)
  Indexed
  Direct access via SQL
  Separated by archive segments
  Metadata resolution across archive segments
  Can exploit storage subsystem capabilities
  Can use hosted storage


Data Structure Comparisons


Things to Look for
  Is metadata maintained in archive
  Is metadata validated
  Is metadata enhanced
  Is data restructured to achieve source independence
    from application programs
    from DBMS type
    from source OS/hardware
  Is reference information captured in archive
  Is data maintained in original form in archive forever
  Can user see data form prior to conversions

Data Access in the Archive


Things to Look for
  Can requests be satisfied directly from the archive
  Can common generic tools be used
    JDBC
    Report generators
  Can data be unloaded in forms for re-platforming
  Can data be accessed efficiently
    Is it indexed
  Are representations consistent
    Are metadata differences accounted for


Administration of the Archive


Things to Look for
  Is there a full time administrator
  Is there an archive catalog database
    what is in the archive
    where is it stored
  Is security maintained separately from operational
  Are actions and events logged


A Myth
Homegrown Solutions are good enough.
Truth:
They do solve the problem of getting inactive data out of operational databases
However,
  They do not realize maximum cost savings
  They generally do not realize any cost savings
  They generally cannot be directly accessed
  They often require original application environment
  They are never indexed
  They often compromise data integrity across metadata changes
  They often offer less protection from data loss


A Myth
Homegrown Solutions are cheaper and faster to implement.
Truth:
  A good vendor solution will guide you through the process and get done quickly
  Managing the archive is easier and cheaper than managing databases


Business Case Basics


An Assertion
To Get the Benefits from Database Archiving
  improved operational efficiency
  better data governance
  lower risks
It does not need to cost a penny.
If done properly, database archiving can realize cost benefits larger than the cost of implementing and maintaining the archive.
In most cases the savings by themselves can justify database archiving.


Reason for Archiving


All data in operational db
  most expensive system
  most expensive storage
  most expensive software
In a typical op db, 60-80% of data is inactive. This percentage is growing.
Inactive data in archive db
  least expensive system
  least expensive storage
  least expensive software

[Figure: size today. All data held in the operational database, compared with the same data split between operational and archive.]


Cost Saving Elements


Look for and compute difference in storage costs
  front-line vs archive storage
  byte count differences between operational and archive
Look for and compute difference in system costs
  operational vs archive systems
  are operational system upgrades avoided
  are software upgrades avoided
  can systems be eliminated for application
  can software be eliminated for application
Look for savings on people costs
  can people be eliminated or redirected for retired applications
Potential savings on changes/application renovations
  simplification of design
  elimination of data conversions
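
The storage cost difference is simple arithmetic once the inputs are known. The sketch below shows the calculation; every number in the example call is an assumed placeholder (data volume, unit costs, and the archive-to-operational size ratio), apart from the inactive fraction, which is taken from the 60-80% range quoted earlier.

```python
def annual_storage_savings(
    total_gb: float,
    inactive_fraction: float,
    front_line_cost_per_gb: float,
    archive_cost_per_gb: float,
    archive_size_ratio: float = 1.0,
) -> float:
    """Yearly storage cost before archiving minus the cost after archiving.

    archive_size_ratio is archive bytes per operational byte, to account
    for byte count differences between the two forms of the data.
    """
    inactive_gb = total_gb * inactive_fraction
    before = total_gb * front_line_cost_per_gb
    after = (
        (total_gb - inactive_gb) * front_line_cost_per_gb
        + inactive_gb * archive_size_ratio * archive_cost_per_gb
    )
    return before - after


# Assumed inputs: 1,000 GB, 70% inactive, 10 vs 2 cost units per GB per year,
# and archive data taking half the space of the operational form.
savings = annual_storage_savings(1000, 0.70, 10.0, 2.0, archive_size_ratio=0.5)
```

With these placeholder inputs the saving is roughly 6,300 cost units a year out of an original 10,000. System, software, and people costs would be added to the business case in the same way.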


Operational Efficiency Impacts


Will operational performance be enhanced with less data
Will utility time periods be reduced (backup, reorganization)
  fewer occurrences needed
  less data to process each time
Will recovery times be reduced and what is that worth
  interruption recoveries
  disaster recoveries
Will implementation of data structure changes be improved
  avoided
  reduced amount of data to unload/modify/reload


Risk Factors
Will the saved data have better authenticity
  not changed in archive
  shielded from updates or damage
  traceable back to original form
Will e-Discovery benefit from archiving
  can locate and process data outside of operational environment
  can easily create legal-hold archive units
Will exposure of data be reduced
  fewer authorized users against the archive
  complete audit trails of all access


Business Case Summary


Database Archiving solutions generally provide for lower cost software, can use lower cost storage more efficiently, and run on smaller machines.
Each business case is different
Many factors can be used in building a business case
  Seen an application justified on storage costs alone
  Seen an application justified on disaster recovery time alone
  Seen an application justified on better data security alone
Each organization will have many potential applications
Having a database archiving practice can create synergies across many applications, thus adding more value


Final Thoughts
Database Archiving is coming
Database Archiving is good
  Reduces cost
  Improves operational efficiency
  Reduces risk
Need a complete solution to be effective
Need professional staff
  Educated
  Full time
