Database Archiving: How to Keep Lots of Data for a Long Time Jack E. Olson, Morgan Kaufmann, 2008
Copyright SvalTech, Inc., 2009
SvalTech
Topics
Database Archiving Definitions
Database Archiving Application Profiles
Elements of a Successful Implementation
Solution Comparisons
Business Case Basics
Definition
The process of removing from operational databases selected data items that are not expected to be referenced again, and storing them in an archive database where they can be retrieved if needed.
Figure: Document Archiving (word, pdf, excel, XML); Multi-media files (pictures, sound, telemetry); Email Archiving (outlook, lotus notes)
Data Sources
Data created and maintained by either custom applications or packaged applications that store data in structured database management systems or structured records in file systems.
transaction data
reference data
Database and file systems: DB2, IMS, ADABAS, IDMS, ORACLE, SQL SERVER, VSAM
Packaged applications: SAP, Oracle Financials, Siebel, PeopleSoft
Drivers
Longer data retention requirements
Expanded business
Mergers and acquisitions
Overloaded operational databases
Operational problems
Data Retention
The requirement to keep data for a business object for a specified period of time. The object cannot be destroyed until after the time for all such requirements applicable to it has passed.
Business Requirements
Regulatory Requirements
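The rule that an object cannot be destroyed until every applicable requirement has passed can be sketched in a few lines of Python. This is only an illustration: the function names, the example trade, and the assumption that every retention clock starts at the create date are hypothetical and not part of the presentation.

```python
from datetime import date

def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)

def earliest_discard_date(created: date, retention_years: list[int]) -> date:
    """The longest applicable requirement decides when the object may be discarded."""
    if not retention_years:
        return created
    return add_years(created, max(retention_years))

# Hypothetical trade with a 3-year business and a 7-year regulatory requirement.
print(earliest_discard_date(date(2009, 7, 15), [3, 7]))  # 2016-07-15
```

In practice each requirement may start from a different event (for example, account closure rather than creation), so a real implementation would carry a start event per requirement.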
Data Retention
Retention requirements vary by business object type
Retention requirements from regulations are exceeding business requirements
Retention requirements will vary by country
Retention requirements imply the obligation to maintain the authenticity of the data throughout the retention period
Retention requirements imply the requirement to faithfully render the data on demand in a common business form understandable to the requestor
The most important business objects tend to have the longest retention periods
The data with the longest retention periods tends to accumulate the largest number of instances
Retention requirements often exceed 10 years
Requirements exist for 25, 50, 70 and more years for some applications
Figure: data lifecycle from the create event through the operational phase, reference phase, and inactive phase to the discard event
operational phase
can be updated, can be deleted, may participate in processes that create or update other data
reference phase
used for business reporting, extracted into business intelligence or analytic databases, anticipated queries
inactive phase
no expectation of being used again, no known business value, being retained solely for the purpose of satisfying retention requirements. Must be available on request in the rare event a need arises.
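As a rough illustration of the three phases, the sketch below classifies a business object on a given day. It assumes the phase boundaries are known as dates, which is a simplification; the parameter names (`last_update`, `last_expected_use`, `discard`) are hypothetical.

```python
from datetime import date
from enum import Enum

class Phase(Enum):
    OPERATIONAL = "operational"
    REFERENCE = "reference"
    INACTIVE = "inactive"
    DISCARDED = "discarded"

def phase_on(day: date, last_update: date, last_expected_use: date, discard: date) -> Phase:
    """Classify an object by where `day` falls between its lifecycle boundaries."""
    if day >= discard:
        return Phase.DISCARDED      # retention satisfied, discard event has occurred
    if day > last_expected_use:
        return Phase.INACTIVE       # kept only to satisfy retention requirements
    if day > last_update:
        return Phase.REFERENCE      # no more updates, still used for reporting
    return Phase.OPERATIONAL        # can still be updated or deleted

print(phase_on(date(2012, 1, 1), date(2009, 6, 30), date(2011, 6, 30), date(2016, 7, 15)))
```

Only objects in the inactive phase are candidates for the archive in this simplified view.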
Figure: retention requirement shown across the operational, reference, and inactive phases
Application Segments
An application segment is a set of business objects generated from a single version of an application where all business records in the segment have data consistent with a single metadata definition.
A metadata break is a point in the life of the operational database where a change in metadata is implemented that changes the structure of the data or the manner in which data is encoded.
An application will have many segments over time
Minor changes in metadata can sometimes be implemented without forcing a segment change
Major metadata changes will always generate a segment change where data created in the previous segment cannot be recast to the new metadata definition without some compromise in the data
Application segments can be generated in parallel, with one operational implementation using one version of the application at the same time that another operational instance is using a different version of the application
Application Segments
Figure (cases 1 and 2): sources on a time axis, divided into segments at each major metadata break
case 1: a single source, OS1
case 2: two sources in parallel, OS1 and OS2
Source 1 = Stock Trades North American Division
Source 2 = Stock Trades Western Europe Division
Application Segments
Figure (case 3): four sources, OS1 to OS4, on a time axis, divided into segments at each major metadata break
Source 1 = Stock Trades North American Division, application X
Source 2 = Stock Trades Western Europe Division, application Y
Source 3 = acquisition of Trader Joe: merged with Source 1 on 7/15/2009
Source 4 = acquisition of Trader Pete: merged with Source 1 on 8/15/2009
Application Segments
A well-designed database archive preserves application segments
Data is always kept in segment format
Metadata is preserved at the segment level
The archive administrative catalog shows:
  Segments
    Segment version number
    Time period covered
    System generated from
  Time order of consecutive segment strings
  Parallel segment strings for the same application
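One way to picture the catalog entries listed above is as a small record per segment, grouped into time-ordered strings per source system. The sketch below is an assumption about how such a catalog could be modeled, not a description of any product; the class name, field names, and all dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class SegmentEntry:
    application: str
    source_system: str      # "System generated from"
    version: int            # "Segment version number"
    start: date             # "Time period covered": first day
    end: Optional[date]     # last day, or None while the segment is still open

def segment_strings(entries: list[SegmentEntry]) -> dict[str, list[SegmentEntry]]:
    """Group segments by source system, each group in time order.

    Each list is one consecutive segment string; more than one key for the
    same application means parallel segment strings.
    """
    strings: dict[str, list[SegmentEntry]] = {}
    for entry in sorted(entries, key=lambda e: (e.source_system, e.start)):
        strings.setdefault(entry.source_system, []).append(entry)
    return strings

catalog = [
    SegmentEntry("Stock Trades", "North America", 2, date(2005, 1, 1), None),
    SegmentEntry("Stock Trades", "Western Europe", 1, date(2003, 1, 1), None),
    SegmentEntry("Stock Trades", "North America", 1, date(2001, 1, 1), date(2004, 12, 31)),
]
for system, string in segment_strings(catalog).items():
    print(system, [s.version for s in string])
```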
24/7 operational availability requirement
Long retention period (7 years or more)
Short useful active life (less than 2 years)
Low access requirements during the inactive period
  Very low access frequency
  Response time not critical
  Access requirements are simple, easily satisfied with ad hoc tools
Retired Application
Merger of companies results in an operational application being duplicated
Data structures are not compatible
  One keeps data elements not in the other
  One encodes data elements differently
  One designed for a different OS/DBMS than the other
Decision is made to use one system and abandon the other one
Meets all requirements of an operational application
Archive Staff
Database Archive Specialist
  Received education on database archive design and implementation
  Knows tools available
  Experienced
  Full time job
Supporting Roles
  Storage Administrators
  Database Administrators
  Data Stewards
  Security Administrators
  Compliance staff
  IT management
  Business Unit Management
Figure: archive components. Archive Extractors, Archive Administrator, Archive Designer, Archive Data Manager, Archive Access Manager, and the Archive Server with its archive catalog and archive storage
Archive Designer
Metadata
  Capture current metadata
  Validate it
  Enhance it
  Design archive storage format
Data
  Define business records to be archived
  Define source of data
  Define data structures within operational system
  Define reference data needed to include with it
  Define archive format of data
Policies
  Define extract policy (when a record becomes inactive)
  Define operational disposal policy (when to remove from operational database)
  Define storage policy (how to protect data in archive)
  Define discard policy (when to remove from archive)
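The four policies above could be captured as one small, validated configuration record. The following is a minimal sketch under simplifying assumptions: every policy is reduced to a single number or flag, ages are measured in days from creation, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class ArchivePolicies:
    inactive_after_days: int     # extract policy: age at which a record counts as inactive
    delete_after_extract: bool   # operational disposal policy: delete (True) or only mark (False)
    storage_copies: int          # storage policy: number of protected copies kept
    discard_after_days: int      # discard policy: age at which a record leaves the archive

    def __post_init__(self) -> None:
        if self.storage_copies < 1:
            raise ValueError("at least one archive copy is required")
        if self.discard_after_days <= self.inactive_after_days:
            raise ValueError("discard must come later than extract")

    def is_extractable(self, created: date, today: date) -> bool:
        return today - created >= timedelta(days=self.inactive_after_days)

    def is_discardable(self, created: date, today: date) -> bool:
        return today - created >= timedelta(days=self.discard_after_days)

# Hypothetical values: inactive after 2 years, discard after 7 years.
policies = ArchivePolicies(730, True, 2, 2555)
print(policies.is_extractable(date(2006, 1, 1), date(2009, 1, 1)))  # True
print(policies.is_discardable(date(2006, 1, 1), date(2009, 1, 1)))  # False
```

Real extract policies are usually expressed over business fields (status, last activity) rather than age alone; age is used here only to keep the example short.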
Archive Extractors
Extractor process
  Verify consistency with design metadata
  Extract data as defined in designer
  Mark or delete from operational database as defined in designer
  Pass data to archive data manager
  Keep audit records on everything done
  Do not impact operational performance
  Support interruptions with transaction level recovery
  Support restart
  Finish scans within acceptable time periods
Scheduling
  Establish periodic executions
  Find non-disruptive periods
  Be consistent
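Several of the extractor requirements above (small units of work, audit records, restart after interruption) can be sketched with Python's standard `sqlite3` module. This is a toy illustration, not a production extractor: the table names (`trades`, `trades_archive`, `audit_log`), the date-based cutoff, and the two in-memory databases are all assumptions made for the example.

```python
import sqlite3

def extract_batch(op_db, archive_db, cutoff, batch_size=100):
    """Move one small batch of inactive rows to the archive; return rows moved."""
    rows = op_db.execute(
        "SELECT id, trade_date, payload FROM trades "
        "WHERE trade_date < ? ORDER BY id LIMIT ?",
        (cutoff, batch_size),
    ).fetchall()
    if not rows:
        return 0
    # Step 1: write to the archive and commit. INSERT OR IGNORE makes a rerun
    # after an interruption harmless for rows that were already archived.
    with archive_db:
        archive_db.executemany(
            "INSERT OR IGNORE INTO trades_archive (id, trade_date, payload) "
            "VALUES (?, ?, ?)", rows)
        archive_db.executemany(
            "INSERT INTO audit_log (trade_id, action) VALUES (?, 'archived')",
            [(r[0],) for r in rows])
    # Step 2: only after the archive commit, delete from the operational database.
    with op_db:
        op_db.executemany("DELETE FROM trades WHERE id = ?", [(r[0],) for r in rows])
    return len(rows)

op = sqlite3.connect(":memory:")
op.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, trade_date TEXT, payload TEXT)")
op.executemany("INSERT INTO trades VALUES (?, ?, ?)",
               [(1, "2001-03-01", "a"), (2, "2002-06-30", "b"), (3, "2009-01-15", "c")])
op.commit()

ar = sqlite3.connect(":memory:")
ar.execute("CREATE TABLE trades_archive (id INTEGER PRIMARY KEY, trade_date TEXT, payload TEXT)")
ar.execute("CREATE TABLE audit_log (trade_id INTEGER, action TEXT)")

# Small batches keep each transaction short so operational work is not blocked.
while extract_batch(op, ar, cutoff="2007-01-01", batch_size=1):
    pass
print(op.execute("SELECT COUNT(*) FROM trades").fetchone()[0])          # 1
print(ar.execute("SELECT COUNT(*) FROM trades_archive").fetchone()[0])  # 2
```

Writing to the archive before deleting from the source means an interruption can leave a row in both places but never in neither; a restart then re-selects the same rows and finishes the delete. In this sketch a restart would add a second audit entry for the repeated rows.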
Archive Extractors
Physical vs. Application Extractors
Figure: in the operational system, the archive extractor reaches the operational database (OP DB) either directly or through the application program
Physical Extractor
Gets/deletes data directly from the database: tables, rows, columns
Application Extractor
Gets/deletes data through an application API: virtual tables, rows, columns
Archive Access
Query Capability
Determine applicability based on archive segment versions of metadata
SQL based is best, if possible
Employ external indexes to determine which archive segments to look into
Employ internal indexes to avoid reading all of an archive segment
Support metadata version browsing
Support generation of load files based on query results
Support generation of load files for the original data source based on query results
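The idea of an external index that decides which archive segments to open can be illustrated with a simple date-range lookup. This sketch assumes the index records only the first and last date covered by each segment; the segment identifiers and dates are hypothetical.

```python
from datetime import date

# External index: segment id -> (first date covered, last date covered).
SEGMENT_INDEX = {
    "NA-v1": (date(2001, 1, 1), date(2004, 12, 31)),
    "NA-v2": (date(2005, 1, 1), date(2009, 7, 14)),
    "WE-v1": (date(2003, 1, 1), date(2009, 7, 14)),
}

def segments_for(index, query_start: date, query_end: date) -> list[str]:
    """Return only the segments whose covered period overlaps the query range."""
    return [
        segment
        for segment, (first, last) in index.items()
        if first <= query_end and query_start <= last
    ]

# A query over 2002 only needs to open one of the three segments.
print(segments_for(SEGMENT_INDEX, date(2002, 1, 1), date(2002, 12, 31)))  # ['NA-v1']
```

Within each selected segment, an internal index would then narrow the read further; that part is not shown here.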
Archive Administration
Manage Archive Catalog
  Application archive designs
  Audit trails
  Results logs
Manage e-Discovery requests
Ensure Extract and Discard processes are run when they are supposed to
Manage Metadata Change Process
Solution Comparisons
Reformatted archive segments stored as files: load files, XML files, special files (typically vendor solutions)
Storage Comparisons
DB Solutions
  parallel partitioned db arrays
  Don't get $$$$ savings
  Requires database administration
  Problems handling metadata changes
Backup Solutions
  image copies, unload files
  Requires restaging data to access
  Not searchable in archive
  Problems handling metadata changes
Archive segments stored as files
  Indexed
  Direct access via SQL
  Separated by archive segments
  Metadata resolution across archive segments
  Can exploit storage subsystem capabilities
  Can use hosted storage
A Myth
Homegrown Solutions are good enough.
Truth: They do solve the problem of getting inactive data out of operational databases
However,
  They do not realize maximum cost savings
  They generally do not realize any cost savings
  They generally cannot be directly accessed
  They often require original application environment
  They are never indexed
  They often compromise data integrity across metadata changes
  They often offer less protection from data loss
A Myth
Homegrown Solutions are cheaper and faster to implement.
Truth:
  A good vendor solution will guide you through the process and get it done quickly
  Managing the archive is easier and cheaper than managing databases
An Assertion
To get the benefits from database archiving
  improved operational efficiency
  better data governance
  lower risks
it does not need to cost a penny.
If done properly, database archiving can realize cost benefits larger than the cost of implementing and maintaining the archive.
In most cases the savings can justify database archiving by themselves.
Figure: operational database compared with archive. Inactive data in archive db: least expensive system, least expensive storage, least expensive software
Risk Factors
Will the saved data have better authenticity?
  not changed in archive
  shielded from updates or damage
  traceable back to original form
Will e-Discovery benefit from archiving?
  can locate and process data outside of operational environment
  can easily create legal-hold archive units
Will exposure of data be reduced?
  fewer authorized users against the archive
  complete audit trails of all access
Final Thoughts
Database Archiving is coming
Database Archiving is good
  Reduces cost
  Improves operational efficiency
  Reduces risk
Need a complete solution to be effective
Need professional staff
  Educated
  Full time