These materials are the copyright of John Wiley & Sons, Inc. and any dissemination, distribution, or unauthorized use is strictly prohibited.

Archiving For Dummies

Oracle Special Edition

by Lawrence C. Miller, CISSP

Archiving For Dummies, Oracle Special Edition

Published by
John Wiley & Sons, Inc.
111 River St.
Hoboken, NJ 07030-5774
www.wiley.com

Copyright 2012 by John Wiley & Sons, Inc., Hoboken, New Jersey

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Trademarks: Wiley, the Wiley logo, For Dummies, the Dummies Man logo, A Reference for the Rest of Us!, The Dummies Way, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not be used without written permission. Oracle is a registered trademark of Oracle International Corporation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Business Development Department in the U.S. at 317-572-3205. For details on how to create a custom For Dummies book for your business or organization, contact info@dummies.biz. For information about licensing the For Dummies brand for products or services, contact BrandedRights&Licenses@Wiley.com. ISBN: 978-1-118-28494-0 (pbk); ISBN: 978-1-118-28765-1 (ebk) Manufactured in the United States of America 10 9 8 7 6 5 4 3 2 1

Contents at a Glance

Introduction
Chapter 1: Recognizing Today's IT Challenges
  Explosive Data Growth
  Diverse Data Types and Uses
  Legal and Regulatory Requirements
  Logical and Physical Data Migration
  Rising Costs
Chapter 2: Archive 101
  How Does an Archive Differ from a Backup?
  What Are the Different Types of Archives?
  Why Is Tape the Best Archive Media?
Chapter 3: Archive Components
  Archive Software
  Tape Software
  Tape Libraries
  Drives and Media
Chapter 4: Archive Use Cases
  Healthcare
  Media and Entertainment
  Telecommunications
  High-Performance Computing (HPC)
Chapter 5: Ten Key Factors to Consider in Implementing Your Archive


Publisher's Acknowledgments
We're proud of this book and of the people who worked on it. For details on how to create a custom For Dummies book for your business or organization, contact info@dummies.biz. For details on licensing the For Dummies brand for products or services, contact BrandedRights&Licenses@Wiley.com. Some of the people who helped bring this book to market include the following:

Acquisitions, Editorial, and Vertical Websites


Senior Project Editor: Zo Wykes Editorial Manager: Rev Mengle Acquisitions Editor: Katie Feltman

Composition Services
Senior Project Coordinator: Kristie Rees Layout and Graphics: Lavonne Roberts Proofreader: John Greenough

Senior Business Development Representative: Karen L. Hattan
Custom Publishing Project Specialist: Michael Sullivan
Special Help from Oracle: Scott Allen, Doug Chamberlain, Donna Harland, Cindy McCurley, Arthur Pasquinelli, Christine Rogers, Allison Roth, Mark Schaffer, Kerstin Woods

Publishing and Editorial for Technology Dummies


Richard Swadley, Vice President and Executive Group Publisher Andy Cummings, Vice President and Publisher Mary Bednarek, Executive Director, Acquisitions Mary C. Corder, Editorial Director

Publishing and Editorial for Consumer Dummies


Kathleen Nebenhaus, Vice President and Executive Publisher

Composition Services
Debbie Stailey, Director of Composition Services

Business Development
Lisa Coleman, Director, New Market and Brand Development

Introduction
Since the beginning of time, mankind has communicated written ideas and information with symbols. From the cave paintings of the Paleolithic Age and the hieroglyphs of ancient Egypt to modern alphabets around the world, information becomes more or less permanent when it is written. All that is required to read these permanent records is the ability to see and interpret them.

Today, enormous amounts of information, whether trivial or profound, are written and recorded digitally in thousands of different applications and formats at an absolutely stunning pace. Yet, ironically, this digital information is written as symbols (1s and 0s on magnetic media) that represent other symbols (alphabets, for example) and that cannot possibly be seen by human eyes, let alone interpreted, without the proper tools: computers and their associated software and applications.

Managing these vast repositories and archives for our use today is a challenge in and of itself. But what computers and technology will exist 50, 100, or even 1,000 years from now to interpret the wealth of information that modern society has amassed? What will be the predominant file format? Will your expensive enterprise hard disks be unreadable fossils in the next millennium? Or will all of our achievements over the last 50 years be lost to future generations in what Popular Mechanics has called the "Digital Ice Age"?

While this book can't answer all of these questions for the ages, it can help you solve your organization's archive and data management challenges today, and for at least the foreseeable future!

About This Book

This book consists of five short chapters, covering today's data archiving challenges, the basics of archives, archive components, use cases, and key factors to consider for your archive solution. Each chapter is written to stand alone, so feel free to start reading anywhere and skip around throughout the book!

Icons Used in This Book

Throughout this book, we occasionally use icons to call attention to important information that is particularly worth noting. Here's what to expect.

This icon points out information that may well be worth committing to your nonvolatile memory!

If you're an insufferable insomniac or vying to be the life of a World of Warcraft party, take note. This icon explains the jargon beneath the jargon.

Thank you for reading, hope you enjoy the book, please take care of your writers! Seriously, this icon points out helpful suggestions and useful nuggets of information.

Chapter 1

Recognizing Today's IT Challenges

In This Chapter
Seeing the data forest and all its trees
Using, and re-using, different types of data
Complying with data retention regulations
Keeping data formats and media current
Managing data storage costs

Data retention in our modern digital era is a major challenge for businesses and organizations of all sizes, in all industries, worldwide. Common issues include the explosive growth of digital data, different data types and uses, complex regulatory requirements, data migration difficulties, and rising power, space, cooling, and management costs. This chapter explores these data retention challenges in depth.

Explosive Data Growth

The march of digital data growth continues at a stunning pace. The 2011 IDC Digital Universe Study estimates that by 2020 the total amount of digital information created, captured, and replicated will grow to 35,000 exabytes (see Figure 1-1). Just to put that in context, it would take almost 1.9 quadrillion (yes, quadrillion) trees to print 35,000 exabytes of data! That's nearly 5,000 times the number of trees on the entire planet (which NASA estimates at approximately 400 billion)!

A terabyte is equal to 1,024 gigabytes, a petabyte is equal to 1,024 terabytes, and an exabyte is equal to 1,024 petabytes.
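
To make the scale concrete, here is a minimal Python sketch of the arithmetic behind those figures. It assumes the binary (1,024-based) units defined above. The variable names are illustrative only, and the tree counts are simply the estimates quoted in the text.

```python
# Binary storage units as defined in the text: each step up is a factor of 1,024.
GIGABYTE = 1024 ** 3            # bytes
TERABYTE = 1024 * GIGABYTE
PETABYTE = 1024 * TERABYTE
EXABYTE = 1024 * PETABYTE

# Figures quoted in the text (estimates, not measurements).
PROJECTED_EXABYTES = 35_000     # projected digital universe by 2020
TREES_TO_PRINT = 1.9e15         # "almost 1.9 quadrillion" trees
TREES_ON_PLANET = 400e9         # approximately 400 billion trees

projected_bytes = PROJECTED_EXABYTES * EXABYTE
tree_ratio = TREES_TO_PRINT / TREES_ON_PLANET

print(f"35,000 exabytes is about {projected_bytes:.2e} bytes")
print(f"Trees needed versus trees available: {tree_ratio:,.0f} to 1")
```

The ratio works out to 4,750, which is where "nearly 5,000 times" comes from.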

Figure 1-1: The nature of storage and data management has to change!

In many industries, such as health care, life sciences, media/entertainment, and energy, and in specialized markets, such as video surveillance and product life cycle management, the shift to digital content is now beyond the point of no return. These digital transformations are already spurring exponential increases in image data and associated content. The expanding use of automated sensors, high-resolution medical scanners, earth observation satellites, and high-performance technical computing applications across a broad range of industries is likewise driving much of this data growth.

At the same time, companies are leveraging more collaboration, social networking, and web-based business applications to boost productivity and improve customer support. Large databases are at the heart of many of these applications. Data mining and analysis of these databases for business intelligence, to improve efficiencies and market opportunities, is driving the need for storage-intensive data warehousing.

Diverse Data Types and Uses

Enterprises must not only manage the growth of data, but also recognize the value and types of data and its anticipated uses within their organizations. It is widely estimated that more than 80 percent of all organizational data is unstructured. This means that the vast majority of your storage capacity is being used for e-mail, documents, images, and audio and video files. This unstructured data probably has a different value than the data in your business-critical databases, for example. Rather than treating all of your data equally, shouldn't your lower-value data have a correspondingly lower storage cost?

According to IDC, unstructured data is projected to grow at a compound annual growth rate (CAGR) of more than 60 percent, compared to approximately 20 percent CAGR for transactional data.

Data use and re-use presents another challenge, or opportunity, for organizations seeking innovative solutions to their growing data storage costs. Eighty percent of all data (both structured and unstructured) is never again used or accessed after 90 days. How often do you look at an e-mail message, a sales transaction, or a shipping manifest that is more than 90 days old? Yet this data is frequently stored on the same high-speed, high-performance, high-cost disks as the rest of your active production data.

At the same time, when you do need a file from last year, it holds high value to you again. Therefore, you cannot simply delete all of that data. And with advances and new ways to search and analyze data coming every day, the data you consider inactive today may hold untapped value just around the corner. Archive data must still be retained, protected, and readily accessible when needed, but there are lower-cost alternatives that are better suited to data that is infrequently or never again accessed.
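
The growth rates quoted from IDC earlier in this section compound quickly. The following Python sketch shows how far apart the two kinds of data drift over five years. The 100 TB starting volume and the function name are hypothetical, chosen only for illustration, and the rates are treated as constant, which real growth rarely is.

```python
def project_volume(start_tb: float, cagr: float, years: int) -> float:
    """Compound a starting volume forward at a fixed annual growth rate."""
    return start_tb * (1 + cagr) ** years


# Hypothetical starting point: 100 TB of each kind of data.
START_TB = 100.0

for year in range(6):
    unstructured = project_volume(START_TB, 0.60, year)    # 60 percent CAGR
    transactional = project_volume(START_TB, 0.20, year)   # 20 percent CAGR
    print(f"Year {year}: unstructured {unstructured:8.1f} TB, "
          f"transactional {transactional:6.1f} TB")
```

Under these assumptions, the unstructured data grows to roughly 1,049 TB after five years, while the transactional data reaches only about 249 TB.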

These materials are the copyright of John Wiley & Sons, Inc. and any dissemination, distribution, or unauthorized use is strictly prohibited.

Legal and Regulatory Requirements

Increasingly stringent data retention and protection regulations and complex compliance requirements also contribute to the data growth problem. These include the U.S. Health Insurance Portability and Accountability Act (HIPAA), Gramm-Leach-Bliley Act (GLBA), Sarbanes-Oxley (SOX), Canada's Management of Information Technology Security (MITS) directive, the EU Statutory Audit and Company Reporting Directive (EuroSox), and Japan's Financial Instruments and Exchange Law (J-SOX), among others.

According to the Storage Networking Industry Association (SNIA), 80 percent of organizations participating in a recent survey responded that they are required to retain data for more than 50 years, and 68 percent of companies require a 100-year archive! These include governments, digital libraries, research organizations, and industries that need to keep track of data on population-wide drug interactions or individual aircraft for 10, 20, or 50 years, for example.

How long does your organization's archive horizon need to be? Challenge your retention requirements to ensure that they do not expose your organization to excess costs and liability, but still meet your business needs.

Not only do organizations today have longer data retention requirements, but they also need their archives to be readily available and easily accessible, not just locked away in a cave somewhere. It is absolutely critical that your organization have the right combination of archive hardware and software to ensure your data can be archived efficiently, securely, and reliably. You must be able to accurately catalog and index the contents of your archives and quickly restore to online storage or other media when needed. In the event of litigation, this capability will help your organization reduce the scope of legal discovery and quickly comply with a subpoena while controlling legal costs.

Logical and Physical Data Migration

Long-term retention of digital information also creates unique technical issues for organizations. These include the logical and physical migration of archive data. Data must be updated, typically every three to five years, to newer formats that are supported by current and future applications. This cycle is known as logical data migration. Although most common applications today provide some level of backward compatibility for data created and saved with older versions of that application, there are limits to that compatibility, particularly for proprietary applications and unstructured data.

For example, many popular word processing, CAD (computer-aided design), and graphics file formats that were in widespread use just 10 or 15 years ago are now obsolete and unreadable. One solution to this problem is to convert data to a common plain-text format, such as ASCII (American Standard Code for Information Interchange) or Unicode. However, these formats do not maintain the original data structure or metadata, and cannot support rich-text features and graphic images.

Physical data migration refers to the need to copy archived data to newer storage media in order to preserve its integrity over time. Like logical data migration, it typically happens every three to five years, depending on the media type. Physical data migration is also necessary to ensure that current media formats are used, and that current backup and archiving software can read, write, and catalog the data properly.

Both logical and physical data migrations require extensive time and resources. As the volume of organizational data continues to grow, so too do the resources required to migrate that data.

Rising Costs

Although the cost per gigabyte of storage has steadily decreased over time, energy and storage management costs are increasing. Storage consumes almost half of all data center power today, and that share is growing at a rapid rate. Within ten years, the total power consumed by storage will easily represent the majority of the energy consumed in the data center.

The cost of managing this data is exploding as well. The increasing number of data sources, data formats, and government and industry regulations, along with the business-critical nature of data, is driving up management costs year over year, even faster than energy utilization rates. Data and storage management will soon become the number one cost within many data centers.

Considering that 80 percent of all data older than 90 days is never looked at again, you need a better way to deal with massive amounts of data storage. It is more important than ever to align the value of data with the capabilities and cost of the media on which it is stored. This can best be achieved with storage and archive solutions that

Drive the cost of storage used for data that is almost never accessed again to virtually zero
Assure access to valuable content that needs to be accessed over the long term
Increase the amount of data that storage and database administrators can manage
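
The 90-day observation above suggests a simple age-based policy for deciding what belongs on lower-cost storage. The Python sketch below is one hedged way to express it. The function name, the dictionary of last-access times, and the 90-day default are all assumptions made for illustration, not features of any particular archive product.

```python
from datetime import datetime, timedelta


def archive_candidates(last_access, now, max_idle=timedelta(days=90)):
    """Return paths idle for longer than max_idle, least recently used first.

    last_access maps a file path to the datetime it was last accessed.
    """
    stale = [(when, path) for path, when in last_access.items()
             if now - when > max_idle]
    return [path for when, path in sorted(stale)]


# Hypothetical catalog of files and their last-access times.
catalog = {
    "sales/q1-forecast.xls": datetime(2012, 5, 20),
    "shipping/manifest-1187.pdf": datetime(2011, 12, 1),
    "mail/2012-02.pst": datetime(2012, 2, 1),
}

for path in archive_candidates(catalog, now=datetime(2012, 6, 1)):
    print("Move to archive tier:", path)
```

With this sample catalog, the manifest and the mail file qualify for the archive tier, while the recently opened forecast stays on production storage. A real policy would likely weigh more than age alone, such as retention rules and data type.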

Chapter 2

Archive 101
In This Chapter
Defining archives versus backups
Using disk-based and tape-based archives
Choosing the best archive media

This chapter explains exactly what an archive is and helps you to differentiate archives from backups. You also find out about the different archive types and why tape is the best media for long-term archive data.

How Does an Archive Differ from a Backup?

An archive is data storage that is used for long-term retention of permanent records and information. Archives consist of data that is no longer modified or regularly accessed but is still important and has value to the organization. Archive data is retained for a period of time, as defined by organizational policy (or indefinitely) for future reference, and for legal or regulatory compliance. Archives must be cataloged, fully indexed, and searchable, so that data can be easily located and retrieved when needed.

Data archives are sometimes confused with data backups. Although both archives and backups may employ similar hardware and software technologies, they are distinctly different in several ways.

Data archives are used for long-term retention of permanent records. In contrast, a data backup is a copy of data that is still in production and is regularly accessed or modified. Archive data is analogous to finished product, whereas production data (and its associated backups) is analogous to work-in-progress (WIP).

A backup is a copy of data. An archive is the data.

Because production data is regularly accessed and modified, it is susceptible to corruption or destruction. In such an event, the backup copy is used to restore the original data.

The purpose of an archive is long-term retention of permanent records. The purpose of a backup is to create a short-term copy of production data in case the original data is corrupted or destroyed.

Organizations typically employ a combination of different backup routines to maintain an accurate copy of production data. These include

Full backups: All of the data is copied.
Incremental backups: Only data that has changed since the last backup is copied.
Differential backups: Only data that has changed since the last full backup is copied.
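
The difference between the three backup routines above comes down to which point in time each one compares against. The following Python sketch illustrates that selection logic only. The FileRecord type, the function name, and the use of simple day numbers for timestamps are assumptions for the example and do not describe how any specific backup product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    modified: int   # day number of the last change (illustrative timestamp)


def select_for_backup(files, kind, last_full, last_backup):
    """Return the paths a backup of the given kind would copy."""
    if kind == "full":
        return [f.path for f in files]          # everything is copied
    if kind == "incremental":
        cutoff = last_backup                    # changed since the last backup of any kind
    elif kind == "differential":
        cutoff = last_full                      # changed since the last full backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return [f.path for f in files if f.modified > cutoff]


# Hypothetical state: full backup on day 3, most recent backup on day 7.
files = [FileRecord("a.doc", 1), FileRecord("b.doc", 5), FileRecord("c.doc", 9)]
for kind in ("full", "incremental", "differential"):
    print(kind, select_for_backup(files, kind, last_full=3, last_backup=7))
```

In this sample, the incremental run copies only c.doc, while the differential run copies both b.doc and c.doc because both changed after the last full backup.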

By comparison, archiving simply moves data to a separate repository, based on a pre-defined policy such as the last time a file was accessed or modified.

Both archives and backups must be cataloged so that data can be located when needed. However, archives also require robust indexing and searching capabilities. A typical archive request may be: "I need to locate all files that contain the phrase 'clinical drug trials' created between 1995 and 2002." A similar request for a file restore from backup should result in the requester being banned from ever using a computer again: "I just accidentally deleted a file that contained the word 'oops' created between 1995 and 2002, but I have no idea what the complete name of the file was, what directory it was in, or what server it was on. Can you drop everything and restore it for me?!"

Speed is important to both archives and backups, but for different reasons. The ability to quickly index files and perform accurate full-text searches of extremely large (several terabytes or more) data repositories is critical for locating archive data. Archive data is, by definition, data that is not regularly accessed or modified, so it can be migrated to an archive from the production environment at pretty much any time.

Backups today are increasingly being performed on production data in near real-time as backup systems and software become more robust and sophisticated. But regardless of the backup systems and software, backups can still limit access to certain files while running, and can adversely affect system and network performance. For these reasons, most backups still occur in a backup window during nonproduction hours. Speed is critical to ensure that all production data can be backed up during the allotted backup window.

Speed is also critical when restoring backups. In the event of a disaster, quickly and correctly restoring systems and data can be a daunting task that is of utmost importance to the continuity of business operations. On a much smaller scale, individual disasters happen almost every day, requiring a fast recovery capability: "Eek, I just deleted my presentation for our sales meeting and it's only an hour away!"

Finally, archives and backups often use similar types of storage media. However, archives and backups each have unique characteristics that should more clearly dictate the storage media that is most appropriate for each use.

Archive data is written to media only once, but may be accessed many times over a period of many, many years. Over time, the amount of archive data within an organization typically grows exponentially (refer to Chapter 1). For these reasons, your primary factors for selecting archive media (in order) should be

Reliability
Cost
Speed

Backup tapes are constantly handled and rotated through a backup cycle that performs numerous high-speed reads and writes of data. This significantly shortens the life of a backup tape. Although you may be replacing backup
tapes on a regular basis, archive tapes are not normally subjected to that same level of wear and tear. Archive tapes typically have a 30-year life, though you may perform data migrations more frequently (refer to Chapter 1).

Backup data is written to the same media many times over a relatively short period, defined by your organization's backup cycle. For example, your organization may have a six-week backup cycle that enables it to recover data or system configurations up to six weeks old. Typically, a backup must be completed during a limited backup window to minimize its impact during production hours. For these reasons, your primary factors for selecting backup media (in order) should be

Speed
Reliability
Cost

Do not confuse a backup cycle with a recovery point objective (RPO). A backup cycle defines the oldest version of data that can be recovered. An RPO, used in disaster recovery and business continuity plans, defines the most current version of data that can be recovered.
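
The distinction between a backup cycle and an RPO can be seen by looking at both ends of the set of retained backups. The Python sketch below is a simplified illustration. It assumes daily midnight backups, the six-week cycle from the example above, and an RPO expressed as the maximum acceptable age of the newest recoverable copy. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def recovery_window(backup_times, now, cycle):
    """Return (oldest, newest) backup times still inside the backup cycle."""
    retained = sorted(t for t in backup_times if now - cycle <= t <= now)
    if not retained:
        return None
    return retained[0], retained[-1]


def meets_rpo(newest_backup, failure_time, rpo):
    """True if the newest backup is recent enough to satisfy the RPO."""
    return failure_time - newest_backup <= rpo


# Hypothetical schedule: one backup at midnight for each of the last 60 days.
now = datetime(2012, 3, 1, 12, 0)
backups = [datetime(2012, 3, 1) - timedelta(days=d) for d in range(60)]

oldest, newest = recovery_window(backups, now, cycle=timedelta(weeks=6))
print("Oldest recoverable version (backup cycle):", oldest)
print("Newest recoverable version (compare to RPO):", newest)
print("Meets a 24-hour RPO:", meets_rpo(newest, now, timedelta(hours=24)))
print("Meets a 4-hour RPO:", meets_rpo(newest, now, timedelta(hours=4)))
```

Here the backup cycle governs how far back you can reach, while the RPO check looks only at how stale the most recent copy is. A daily schedule satisfies a 24-hour RPO in this example but not a 4-hour one.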

What Are the Different Types of Archives?

Archives can be either disk-based or tape-based. A disk-based archive usually consists of large disk subsystems or storage arrays and is typically implemented with a tiered storage system.
A tiered storage system maintains an organization's production data on its highest performance drives, such as serial-attached SCSI (SAS) or solid-state drives (SSD), and automatically moves archive data to slower drives, such as serial ATA (SATA). Disk-based archive data must also be maintained at an off-site location for disaster recovery purposes. This requires a similar disk configuration at a secondary data center with sufficient network bandwidth for copying and replication between the two sites. Disk-based archives can be very costly to acquire, operate, and maintain.

A tape-based archive is usually implemented with a tape library. Tape-based archive data can be quickly accessed and restored when needed. Today's tape technology reduces the latency to access data to very acceptable times for most organizations. Today's enterprise-class tape libraries have advanced capabilities that include automatic compression, WORM (Write-once, Read-many) technology, and encryption. These tape libraries can be automatically managed to augment expensive disk storage capacity with less costly tape-based storage.

Finally, tapes containing archive data can be easily and securely copied locally, and then transported to an off-site location for disaster recovery purposes. Alternatively, an additional copy of the archive data can be created at a remote location.


Why Is Tape the Best Archive Media?

Disk space is an important part of any enterprise data storage strategy, but it is simply not practical or even desirable to use disk exclusively for all of your enterprise storage needs. Enterprise storage is not an either/or proposition. Flash storage, disk, and tape all have their place in an enterprise tiered-storage strategy, and you have to use the right tool for the job. Flash storage is ideal for tasks with intensive I/O requirements where speed is the most critical factor. Disk works best for primary storage and as a staging area for backups. And tape is ideal for backups and archives.

Tape and disk storage systems can and should coexist in a tiered-storage strategy. Many storage vendors paint tape storage as an inferior solution to disk, a last-generation technology on the verge of extinction; a dinosaur, if you will. But the reality is that tape storage is not a dinosaur. Tape storage continues to be a key component in the enterprise data center, and most of the world's information is actually stored on tape! This has been true for many years, and will be well into the future.

Tape also has better error correction rates and longer refresh cycles than disk (see Table 2-1).

Table 2-1   Tape versus Disk Performance

Characteristic                        Disk       Tape
Max shelf life (bit rot)              10 yrs     30 yrs
Best practices for data migration     4-5 yrs    10-15 yrs
Uncorrected bit error rate            10^-14     10^-19
Power and cooling                     290x       1x
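
To get a feel for what the uncorrected bit error rates in Table 2-1 mean in practice, the following Python sketch estimates how many uncorrected bit errors you might expect when reading a given volume of data. It assumes the rates are per bit read and that errors are independent, which is a simplification, and it uses the binary petabyte defined in Chapter 1. The function name is illustrative.

```python
BITS_PER_PETABYTE = 8 * 1024 ** 5   # binary petabyte, expressed in bits


def expected_bit_errors(petabytes_read: float, bit_error_rate: float) -> float:
    """Expected number of uncorrected bit errors for a given volume read."""
    return petabytes_read * BITS_PER_PETABYTE * bit_error_rate


disk_errors = expected_bit_errors(1, 1e-14)   # disk rate from Table 2-1
tape_errors = expected_bit_errors(1, 1e-19)   # tape rate from Table 2-1

print(f"Reading 1 PB from disk: about {disk_errors:.0f} expected bit errors")
print(f"Reading 1 PB from tape: about {tape_errors:.4f} expected bit errors")
```

Under these assumptions, reading one petabyte would be expected to produce roughly 90 uncorrected bit errors on disk, compared with well under one on tape.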

Finally, tape has a significantly lower total cost of ownership (TCO) compared to disk. The cost and performance advantages of tape include

Acquisition costs. The Clipper Group (www.clipper.com) estimates that the cost to implement a disk-based archive is 15 times more than the cost of a tape-based archive.

Energy savings. An enterprise-class tape library uses much less energy (290 times less, according to the Clipper Group!) than disk because it doesn't spin 24/7 like disk. In a 2010 study, the Clipper Group concluded that the cost of energy alone for the average disk-based solution exceeds the entire TCO for the average tape-based solution.

Management savings. Tape has a higher ratio of petabytes managed per storage administrator than disk. This translates to lower overall labor costs.

Longevity. No matter how you store your data, eventually it has to be moved, either due to obsolescence or deterioration of the storage media. It is not uncommon for archive data to remain on tape for up to a decade (though the tape itself can last up to 30 years), whereas disk archives typically need to be replaced every three to five years.

Scalability. Tape storage systems are highly scalable: simply add more tapes for additional capacity and more drives for performance. The amount of tape and capacity that can be stored in a tape library dwarfs the capacity of comparable disk storage systems. Thus, you get more petabytes of storage per square foot in the data center with tape than with disk.

Data integrity and auditing. Assuming the data is good when you archive it and the storage media is properly maintained, with tape, WYSIWYG (what you see is what you get) becomes "what you store is what you get." But disk is constantly subject to corruption due to bad sectors, disk failure, malware, or accidental overwrites.

Debunking five myths about tape storage


Myth #1: Tape is more expensive than disk. Tape costs less per terabyte, consumes less energy, and is less expensive to operate than disk.

Myth #2: Tape is cheaper to buy, but more expensive to operate. The Data Mobility Group reports the TCO for a Serial ATA-based disk storage system is 11 times higher than an LTO-based tape configuration over a seven-year period.

Myth #3: Tape has gone away; no enterprise data center uses it. Most enterprise organizations use a tiered storage strategy with tape as the foundation layer, and nearly half of the world's data is stored on magnetic tape.

Myth #4: Tape is unreliable. The bit error rate (BER) for Oracle's enterprise tape products is more than 4 million times better than enterprise disk.

Myth #5: Tape is a greater security risk than disk. Tape is designed to be portable and therefore has a higher potential for loss. As a result, tape encryption became a necessity long before other storage media encryption and is far more advanced. Tape encryption is built into the tape drive and runs without performance degradation.


Chapter 3

Archive Components
In This Chapter
Managing your archive data
Managing your tape data
Checking out tape libraries
Comparing tape drives and media

In this chapter, you learn about the key components of an enterprise archive: archive software, tape libraries, and drives and media.

Archive Software

Archive software components consist of content and data management software.

Content management software

Oracle's WebCenter Content unifies data into a single repository where organizations can track information uniformly using metadata and logging. This information can then be integrated with business processes and enterprise applications.

WebCenter Content offers best-in-class capabilities for managing data logically throughout its lifecycle based on business needs. Questions such as when data will need to be accessed, when it needs to be stored, and when it needs to be deleted apply to every instance of data a company manages or generates. WebCenter Content automatically manages those lifecycle decisions based on organizational policies to help organizations extract more value from the data.

Archiving best practice is to always keep at least two copies of your archive data on tape. With WebCenter Content, you can keep up to four copies on tape.

WebCenter Content provides

Intelligent content management
Collaboration and re-use access to multiple applications
Content management policies based on content
Central search engine capabilities

Data management software

Working in conjunction with content management software applications, such as WebCenter Content (see the preceding section), Oracle's Sun Storage Archive Manager (SAM) provides physical storage management and is used to optimize data placement across multiple tiers of storage, which can include tape and remote storage, as well as high-capacity disk storage.

SAM presents the file system as if all data is located on primary disk. As data is accessed that is on archive devices only, SAM dynamically stages the data to the primary disk or directly to the application for immediate access. SAM works transparently in the background with tiered storage and makes archive copies based on policies that define file system characteristics.

SAM can manage

Thousands of SAN clients
Hundreds of file systems
Billions of files
Petabytes of disk cache
Exabytes of archive

WebCenter Content manages what the data does and what it means; SAM dynamically manages exactly where the data resides in a hierarchy of storage media and protects the data with advanced features that include integrity checks, WORM, and encryption.
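The idea of a single namespace with on-demand staging can be illustrated with a toy model. The sketch below is not SAM's interface or implementation; the class and method names are hypothetical, and two in-memory dictionaries stand in for the disk and archive tiers.

```python
from dataclasses import dataclass, field

@dataclass
class TieredStore:
    """Toy two-tier store: a disk cache in front of an archive tier."""
    disk: dict = field(default_factory=dict)     # primary disk (fast tier)
    archive: dict = field(default_factory=dict)  # tape or remote tier

    def write(self, path: str, data: bytes) -> None:
        # New data lands on disk and an archive copy is made by policy.
        self.disk[path] = data
        self.archive[path] = data

    def release(self, path: str) -> None:
        # Free disk space, but only if an archive copy exists.
        if path in self.archive:
            self.disk.pop(path, None)

    def read(self, path: str) -> bytes:
        # Stage from the archive tier on demand if the file is not on disk.
        if path not in self.disk:
            self.disk[path] = self.archive[path]
        return self.disk[path]

    def listing(self) -> list:
        # The namespace shows every file, wherever the data currently lives.
        return sorted(set(self.disk) | set(self.archive))

store = TieredStore()
store.write("scan-001.dcm", b"image bytes")
store.release("scan-001.dcm")           # data now lives on the archive tier only
print(store.listing())                  # the file is still visible
print(store.read("scan-001.dcm"))       # the read stages it back to disk
```

The caller never needs to know which tier holds the file, which is the property the text describes.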

Tape Software

Tape software components consist of tape analytics and the Linear Tape File System (LTFS).

Tape analytics

Oracle's StorageTek Tape Analytics software simplifies tape storage management, taking a proactive approach to eliminate library, drive, and media errors through an intelligent monitoring software application exclusively available for Oracle StorageTek tape libraries.

With StorageTek Tape Analytics software, you gain insight into detailed health information that helps you to make decisions about your tape environment prior to device failures (see Figure 3-1).

Figure 3-1: Oracle StorageTek Tape Analytics.

Efficiently monitoring your storage environment is key to cost management. When archive applications encounter problems due to tape drive or media exchange errors, assets sit idle, administrators scramble, and data transfer to end-users is delayed. Any of these setbacks may have significant costs associated with them, leaving storage budgets depleted and users frustrated. With StorageTek Tape Analytics' proactive approach to tape monitoring, errors are reduced, data flows freely, and the cost of managing an archive is ultimately reduced.

StorageTek Tape Analytics is built to meet four key needs of all archive storage environments:

Smart: Intelligent algorithms compute hardware health recommendations.
Secure: Out-of-band tape monitoring adds zero risk for implementation.
Simple: A tool that monitors tape so customers don't have to. Easy to deploy with a single IP connection to each library and a single pane-of-glass interface.
Scalable: Supports multiple libraries and multiple sites, designed to meet needs ranging from single-library users to the world's largest archives.

Linear Tape File System (LTFS)

In order to present a complete file image to a user, two types of data need to be stored: the file metadata, containing the file structure, file names, file format, and other data elements that are indexed to simplify access to the data on the tape; and the file data, the raw file content that is stored on the tape.

A tape that is LTFS-formatted is split into two partitions. The smaller of the two partitions, at the beginning of the tape, holds all of the file metadata for all of the files on the tape. In the metadata partition, files are described in a hierarchical directory structure. The rest of the tape, the second partition, is dedicated to data storage, as tape storage has done for decades.

Because LTFS is an open format, anyone with a compatible tape drive and the drivers to operate it can read an LTFS tape without assistance from any other software. Oracle's open source StorageTek Linear Tape File System (LTFS), Open Edition software enables customers to write files to tape in this self-describing format, much the same way files are written to disk and flash storage devices.
When a piece of tape media is loaded into a tape drive, the complete file folder image is displayed, with the file structure being pulled from the first partition and the raw file content being accessed from the second partition. StorageTek LTFS is extremely flexible, with support for all three major tape drive offerings: Oracle's StorageTek T10000C tape drive and Oracle's StorageTek LTO-5 tape drives from HP or IBM.
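The two-partition idea can be sketched in a few lines. This is a toy model only: it does not reproduce the actual LTFS on-tape format, and the class and method names are hypothetical. It shows why a directory listing needs only the index, while reading a file uses the index to locate content in the data partition.

```python
from dataclasses import dataclass, field

@dataclass
class ToyLtfsTape:
    """Toy model of a two-partition tape: an index plus raw file data."""
    index: dict = field(default_factory=dict)           # metadata partition
    data: bytearray = field(default_factory=bytearray)  # data partition

    def write_file(self, path: str, content: bytes) -> None:
        # Append content to the data partition and record where it lives.
        offset = len(self.data)
        self.data.extend(content)
        self.index[path] = (offset, len(content))

    def list_files(self) -> list:
        # A directory listing reads only the index partition.
        return sorted(self.index)

    def read_file(self, path: str) -> bytes:
        # Look up the extent in the index, then read it from the data partition.
        offset, length = self.index[path]
        return bytes(self.data[offset:offset + length])

tape = ToyLtfsTape()
tape.write_file("/reports/q1.txt", b"first quarter")
tape.write_file("/reports/q2.txt", b"second quarter")
print(tape.list_files())
print(tape.read_file("/reports/q2.txt"))
```

Because the index travels with the tape, any reader that understands the layout can present the files without a separate catalog, which is what makes the format self-describing.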

Tape Libraries

A tape library is a key infrastructure component in a tiered-storage strategy. Tiered storage aligns the value of your data assets with the most appropriate storage media in order to reduce cost and effectively manage data throughout its lifecycle (see Figure 3-2). Tape libraries provide comprehensive, highly scalable storage solutions for backup and archive applications in enterprise, midrange, distributed, and entry-level data center environments.

Figure 3-2: Tiered-storage is comprised of disk and tape.

To learn more about tiered storage, download your free copy of Storage Tiering For Dummies, Oracle Special Edition at www.oracle.com/us/products/servers-storage/storage/index.html.

Large-scale archive (more than 500 TB)


For enterprise data centers storing more than 500 TB, Oracle offers enterprise-class tape libraries: Oracle's StorageTek SL3000 and StorageTek SL8500 modular library systems.

For large archives (greater than 5 PB), the Oracle StorageTek SL8500 modular library system, the world's first exabyte storage solution, delivers significant value through heterogeneous data consolidation and multigeneration media support in an ultra-dense footprint. Both the StorageTek SL8500 and StorageTek SL3000 use a unique centerline architecture in which drives are kept at the center of the library, thereby alleviating robot contention. Robots travel one-third to one-half the distance required by other libraries, thereby improving cartridge-to-drive performance by up to 50 percent over other libraries.

StorageTek SL8500 and SL3000: At a glance


The StorageTek SL8500 and StorageTek SL3000 modular tape library systems (see the following table) are flexible, highly scalable storage solutions that feature

Scalability and performance with capacity on demand so that you can install physical capacity in advance, then tap into it incrementally when you need it
RealTime Growth capability to non-disruptively add more cartridge slots, drives, and robotics
Easy consolidation with Any Cartridge Any Slot technology for seamless mixed-media support that allows you to combine heterogeneous data sources and media types slot by slot for optimal consideration
Industry-leading availability with redundant and hot-swappable robotics and library control cards

SL8500
Cartridge Slots: Up to 100,000
Capacity (Compressed): Up to 1 exabyte
Tape Drives: Up to 640
Native Throughput (TB/hr): Up to 552.9
Tape Drive Choices: T10000C/B/A, T9840D/C/B/A, LTO 5/4/3/2
Number of Physical Partitions: Eight
Redundant Components: Robotics, electronics, control path CAPS, fans, power

SL3000
Cartridge Slots: Up to 5925
Capacity (Compressed): Up to 60 petabytes
Tape Drives: Up to 56
Native Throughput (TB/hr): Up to 48.4
Tape Drive Choices: T10000C/B/A, T9840D/C, LTO 5/4/3
Number of Physical Partitions: Eight
Redundant Components: Robotics, electronics, control path CAPS, fans, power


StorageTek SL500: At a glance


The SL500 is a reliable, scalable, simple rack-mounted tape automation solution that features

Enterprise-class reliability with superior robotics and easy serviceability
Cartridge and drive expansion modules and capacity on demand for industry-leading scalability
Ideal for consolidation with up to eight native partitions and up to 575 LTO cartridges, providing maximum native (uncompressed) capacity of over 860 TB (LTO-5) and more than 9 TB of native (uncompressed) throughput

When integrated with a tiered-storage strategy that includes disk, Oracle's Storage Archive Manager (SAM), and Oracle's StorageTek T10000C tape media, the StorageTek SL8500 delivers a highly scalable enterprise-class archive system.

Mid-range archive (50-500 TB)

Oracle's StorageTek SL500 tape library is ideal for consolidation of distributed environments, which helps you save time, space, and energy by consolidating multiple libraries and applications into a central location. The StorageTek SL500 is also ideal for rack-based D2D2T (disk-to-disk-to-tape) solutions, when combined with SAM and Oracle's Pillar Axiom 600. For organizations with mid-range (50-500 TB) data archiving needs, the StorageTek SL500 provides a flexible and scalable archive solution.
However, for customers in this segment who have high availability requirements or also need mainframe connectivity, the StorageTek SL3000 (discussed in the previous section) is the right archive solution.

Small-scale archive (less than 50 TB)

Oracle's StorageTek SL24 and SL48 tape libraries are designed to meet the data storage demands (including backup, archiving, and disaster recovery) of fast-growing businesses, workgroups, and remote offices.

StorageTek SL24 and SL48: At a glance


The SL48 tape library and SL24 tape autoloader (see the following table) provide reliability, simplicity, and value.

SL48
Cartridge Slots: Up to 48
Capacity (Native): Up to 72 terabytes
Tape Drives: Two full-height or four half-height
Native Throughput (TB/hr): 1.92
Tape Drive Choices: LTO 5/4/3
Number of Physical Partitions: Four

SL24
Cartridge Slots: Up to 24
Capacity (Native): Up to 36 terabytes
Tape Drives: One full-height or two half-height
Native Throughput (TB/hr): 0.96
Tape Drive Choices: LTO 5/4/3
Number of Physical Partitions: Two

These small form-factor, rack-mounted libraries offer a number of interchangeable parts and are ideal for small and medium-sized businesses or remote office locations.
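The size bands in this section can be summarized as a simple lookup. The sketch below is only a restatement of the guidance in the text, not a sizing tool; the function name is hypothetical, and treating 5 PB as 5,000 TB (decimal units) is an assumption.

```python
def suggest_library(archive_tb: float, needs_ha_or_mainframe: bool = False) -> str:
    """Map an archive size in terabytes to the library class named in the text."""
    if archive_tb < 0:
        raise ValueError("archive size cannot be negative")
    if archive_tb < 50:
        # Small-scale archive (less than 50 TB)
        return "StorageTek SL24 or SL48"
    if archive_tb <= 500:
        # Mid-range archive (50-500 TB); the SL3000 is suggested when high
        # availability or mainframe connectivity is required.
        return "StorageTek SL3000" if needs_ha_or_mainframe else "StorageTek SL500"
    if archive_tb <= 5000:
        # Large-scale archive (more than 500 TB)
        return "StorageTek SL3000 or SL8500"
    # Large archives greater than 5 PB (assumed to mean 5,000 TB)
    return "StorageTek SL8500"

for size in (20, 200, 2000, 8000):
    print(f"{size} TB: {suggest_library(size)}")
```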

Drives and Media

The most common tape drive and media format in use today is LTO (Linear Tape-Open). The latest version is LTO-5 with a native capacity of 1.5 TB (uncompressed) and a maximum speed of 140 MB/s. LTO-5 tape supports dual partitioning and the Linear Tape File System (LTFS, discussed earlier in this chapter), which enables creation of tape-based file systems that are similar to disk-based file systems.

For organizations that grow beyond the scalability of LTO, enterprise-class drives (such as Oracle's StorageTek T10000C) and cartridges (such as Oracle's StorageTek T10000 T2) provide higher capacity and throughput performance.

Tape drive capacity and throughput are two key considerations when comparing the overall expense of different tape storage solutions. Other factors to consider when comparing tape drive technology include

Acquisition cost. It's important to evaluate the combined cost of all drives, media, and library slots, not just individual drive and media costs, as the drives have different capacity and performance points.

Media re-use. Tape drives have different media re-use strategies. Some drives are able to write to previous generation media, while some tape drives allow you to reuse existing media at the full, higher capacity of future drive generations.

Data integrity. StorageTek enterprise tape drives, like the StorageTek T10000C, have many features to improve reliability in archiving environments, including data integrity validation (DIV), which ensures that data is not corrupted while traveling along the data path, and StorageTek T10000 T2 media, which boasts a shelf life of 30+ years.

Reliability. Drives are designed for different duty cycles and have different features to improve overall reliability.

Table 3-1 summarizes the characteristics of StorageTek T10000C drives and StorageTek LTO-5 tape drives.

Table 3-1: Tape Drive Characteristics

                              T10000C Drive    LTO-5
Capacity (uncompressed)       5 TB*            1.5 TB
Throughput (uncompressed)     252 MB/sec**     140 MB/sec

* 5.5 TB with StorageTek Maximum Capacity feature.
** Native sustained data rate. 240 MB/sec full file host data rate, includes wrap turnarounds.
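The capacity and throughput figures in Table 3-1 translate directly into cartridge counts and fill times. The sketch below works through that arithmetic for a 1 PB archive. The archive size, the decimal units (1 TB = 1,000,000 MB), and the single drive streaming at its native rate are assumptions for illustration; real-world throughput depends on the workload.

```python
import math

def cartridges_needed(archive_tb: float, cartridge_tb: float) -> int:
    """Number of cartridges required to hold an archive (uncompressed)."""
    return math.ceil(archive_tb / cartridge_tb)

def hours_to_fill(cartridge_tb: float, throughput_mb_per_sec: float) -> float:
    """Hours for one drive to fill one cartridge at its native data rate."""
    seconds = cartridge_tb * 1_000_000 / throughput_mb_per_sec
    return seconds / 3600

archive_tb = 1000  # assumed archive size: 1 PB
drives = {"T10000C": (5.0, 252), "LTO-5": (1.5, 140)}  # (TB, MB/sec) from Table 3-1
for name, (capacity_tb, rate) in drives.items():
    count = cartridges_needed(archive_tb, capacity_tb)
    hours = hours_to_fill(capacity_tb, rate)
    print(f"{name}: {count} cartridges, {hours:.2f} hours to fill each")
```

Under these assumptions the 1 PB archive needs 200 T10000C cartridges or 667 LTO-5 cartridges, which is why library slot counts belong in the cost comparison alongside drive and media prices.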


Oracle's Optimized Solution for Lifecycle Content Management

For unstructured content management across any industry, Oracle's Optimized Solution for Lifecycle Content Management (see the following figure) brings together many of the archive components (discussed in this chapter) into a ready-to-implement architecture that removes guesswork for customers. From content ingestion and creation to long-term retention, this architecture brings together best-of-breed components in a streamlined solution for the best TCO, speed to implementation, and reduced risk.


Chapter 4

Archive Use Cases


In This Chapter
Examining archive requirements for healthcare
Looking out for media and entertainment needs
Talking about telecommunications
Crunching data for research and HPC applications

Archive systems are critical for managing data in virtually every business and organization, in every industry. Data archiving needs are driven by many factors, including customer needs (such as online access to digitized check images for bank customers), legal requirements (such as legal holds for pending litigation), and regulatory compliance (such as Sarbanes-Oxley and the Health Insurance Portability and Accountability Act, or HIPAA). Not only does archive data need to be retained for long (or indefinite) periods, but it must be fully indexed, searchable, and easily retrievable. In this chapter, we explore several industry use cases for archives.


University of Michigan Radiology Department praises StorageTek SL8500

"Oracle's StorageTek SL8500 Modular Library System is part of our tiered storage strategy, providing a very cost-effective, high-performance archive of our critical patient data. The library provides us with an extremely reliable, scalable, and cost-effective storage solution within a very constrained datacenter environment with limited power and cooling."
Steve Ramsey, Director of Image Management and Computing Services, University of Michigan

Healthcare

Medical records and documents are typically managed through an enterprise content-management system that requires securely storing and accessing documents and records from multiple sources. Archived records are frequently compared with current results throughout the life of the patient, requiring quick and reliable access to all patient data, regardless of where it may be stored. Retention requirements for medical data extend well beyond the life of the patient. SAM is a proven solution for PACS (Picture Archiving and Communication System) and has been for more than ten years. Refer to Chapter 3 for more about SAM.

Media and Entertainment

The media and entertainment industry requires streaming input and non-linear editing of very large files.
Having an archive server behind a Digital Asset Manager (DAM) solution gives this industry the ability to access more files using less costly storage. The ability to share files among editors improves time to revenue, as well as providing access to older data to generate new revenue.

Telecommunications

Various regulations, such as the European Union's Directive 2006/24/EC, require companies operating in the telecommunications industry to retain detailed communications data for various periods. Given the volume of communications traffic today, whether telephony or Internet, from both mobile and landline sources, the need for robust, highly scalable archive solutions to store the vast amounts of data being generated is of paramount importance in the telecommunications industry.

Thought Equity Motion gets disk performance on tape with SAM


Thought Equity Motion (www.thoughtequity.com) is a leading provider of cloud-based video management and licensing services for master-quality video.

"SAM has helped Thought Equity Motion treat multi-petabyte libraries as nearly equivalent to spinning disk without the cost, and with a much longer data management horizon. SAM streamlines the interaction between our applications and the data, eliminating the need for costly direct integrations with storage applications."
Mark Lemmons, CTO, Thought Equity Motion

High-Performance Computing (HPC)

HPC environments, such as those found in many research settings and in the public sector, use multiple types of data. One example is streaming input from instrumentation that generates video and sound from sensors and other data collection devices; this data is shared by multiple groups for immediate analysis and archived for easy access in future comparisons. Another example typical of HPC environments is data that is originally stored on a parallel file system. Parallel jobs are run to analyze the information, and the raw data as well as the results of the analysis are archived for future access and analysis.

Oracle StorageTek solutions help ECMWF forecast data archiving needs


The European Centre for Medium-Range Weather Forecasts (ECMWF) develops medium-range and seasonal forecasting using numerical methods and has produced operational medium-range weather forecasts since 1979.

Challenges

Ensure the safe and cost-effective storage of 23 TB of worldwide meteorological data, collected daily
Enable fast access to 20 PB of archive data for 2,300 users and researchers across Europe and the world to improve the accuracy of forecasting and severe weather warnings
Ensure that the ECMWF can continue to collect, store, retrieve, and analyze weather forecasting data, even with a projected 60 percent annual data growth rate

Solution

Implemented three Oracle StorageTek SL8500 modular library systems to provide scalable storage

Results

Provided researchers with fast and easy access to a well-organized archive holding at least 50 years of data collected from weather stations around the globe
Stored data in a safe and secure environment that minimized electricity usage and that has grown from 14 terabytes annually in 1995 to 23 terabytes daily in 2011
Enabled fast retrieval of data held on thousands of tapes, with the system typically handling 9,000 tape mounts per day and rising to 12,000 daily at peak times
Accelerated access to 20 PB of archive data, enabling researchers to improve accuracy of forecasts and severe weather warnings

"Oracle's StorageTek SL8500 modular library system offers us an excellent balance in terms of cost, access, speed, and ease of use. The data stored in it is very close to being online, but without the prohibitive expense of spinning storage."
Francis Dequenne, Principal Systems Analyst, Data Handling Systems, ECMWF
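To see why a 60 percent annual growth rate is a scalability challenge, it helps to compound it. The sketch below assumes, purely for illustration, that the growth rate applies to the 20 PB archive as a whole and holds steady for five years; the case study does not say either, and the function name is hypothetical.

```python
def project_archive_pb(current_pb: float, annual_growth: float, years: int) -> float:
    """Compound an archive size forward at a fixed annual growth rate."""
    return current_pb * (1 + annual_growth) ** years

# Assumed starting point: the 20 PB archive, growing 60 percent per year.
for year in range(6):
    print(f"year {year}: {project_archive_pb(20, 0.60, year):.1f} PB")
```

Under these assumptions the archive passes 200 PB in five years, roughly a tenfold increase.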


Chapter 5

Ten Key Factors to Consider in Implementing Your Archive


In This Chapter
Knowing what's important in an archive solution

Enterprise archive requirements go beyond capacity considerations alone. Here are ten things you should consider for your archive.

Availability

An archive needs to provide high availability for your organization's archive data and fast, reliable access for your users, when they need it. An enterprise archive must have the following capabilities:

Full-text indexing of all archive data
Strong search engine with simple and advanced user-defined search variables
Access to different audiences as different uses are determined for data
Version control for multiple copies of data


Archiving America's Archives

"Reliable retrieval is key: An archive that only stores content is indistinguishable from a landfill. We need technology that reliably delivers all the content whenever requested and tells us proactively if there are issues affecting the retrieval of archived content."
Scott Rife, U.S. Library of Congress

Integrity

An archive must preserve the integrity of the original data without data loss or corruption. Archive hardware and software should be fully integrated and work together to minimize silent data corruption (data that is altered without logging an error), sustain data through time, and guarantee data integrity.
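One common way to detect silent data corruption is to record a checksum when data is archived and compare it when the data is retrieved. The sketch below uses SHA-256 from Python's standard library; it illustrates the general technique and is not a description of how any particular archive product implements its integrity checks. The function names are hypothetical.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest to store alongside the archived data."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, recorded_digest: str) -> bool:
    """Check retrieved data against the digest recorded at archive time."""
    return fingerprint(data) == recorded_digest

original = b"archived record"
recorded = fingerprint(original)   # stored when the data is archived

corrupted = b"archived recorc"     # a single byte changed, no error logged
print(verify(original, recorded))   # True
print(verify(corrupted, recorded))  # False
```

Without the recorded digest, the changed byte would go unnoticed until someone relied on the data.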

Authenticity

Data that is stored in an archive must be saved in its original format, including all versions and associated metadata. Additionally, an enterprise archive solution should provide the option to preserve the data not only in its original format, but also in a transformed format that conforms to a universal archive standard.

Reusability and Collaboration

Archive data must be available for all users within the organization based on permissions managed with role-based access. Archive data should be stored in open formats that facilitate reusability and collaboration between different user groups. Re-using data allows organizations to save money, for example, by not having to collect data again to run new tests. Companies shorten the product-to-market cycle and hit the mark with new products through analysis of collaborative data.

Security

An enterprise archive solution must protect data against corruption. Robust security features, including access control and encryption, protect the confidentiality, integrity, and availability of your archives.

Sustainability

An archive system must be sustainable through technology changes that occur over time. The 100-year archive is increasingly becoming a standard requirement in many enterprises. Your archive must be sustainable and able to migrate stored data through numerous inevitable technology changes over a period of many, many years. An open (rather than proprietary) format helps ensure sustainability and facilitates transforming your archives to new formats in the future.

Trustworthiness

Your archive vendor must be committed for the long term. While there are never any guarantees that even the most reputable and financially stable companies will be around in 50 or 100 years, investing in bleeding-edge archive technology from the latest overnight dot.com sensation is a recipe for disaster. Look for a vendor with an established reputation in the industry and a known commitment to your archive success.

Cost-effectiveness

When selecting your archive solution, you must consider the total cost of ownership. This includes not only your initial capital expenditures, but the ongoing operating expenses as well. Automated tape is the most cost-effective storage medium, at between 8 and 12 cents per gigabyte. The Clipper Group (www.clipper.com) estimates this to be up to a 15:1 savings over disk.
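The per-gigabyte figure scales up quickly, so a quick calculation helps when budgeting. The sketch below applies the 8 to 12 cents per gigabyte range quoted above to a 1 PB archive; the archive size, the decimal units (1 PB = 1,000,000 GB), and the function name are assumptions for illustration.

```python
def tape_cost_range_usd(capacity_gb: int, low_cents: int = 8, high_cents: int = 12):
    """Return the (low, high) cost in US dollars for a capacity in gigabytes."""
    return capacity_gb * low_cents / 100, capacity_gb * high_cents / 100

one_petabyte_gb = 1_000_000  # assumes decimal units: 1 PB = 1,000,000 GB
low, high = tape_cost_range_usd(one_petabyte_gb)
print(f"1 PB on automated tape: ${low:,.0f} to ${high:,.0f}")
# prints: 1 PB on automated tape: $80,000 to $120,000
```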

Automated Data and Storage Management

Tiered storage enables you to take advantage of disk and tape in your archive. Your archive software must integrate with all of your storage hardware to manage your archive data efficiently and dynamically.

Infrastructure Analytics

Maintaining a healthy archive environment is critical. Analytic software proactively monitors the health of your archive devices.
