
Archiving and Content Indexing Training Guide

R00.2

Copyright 1999-2008 CommVault Systems, Inc. All rights reserved. CommVault, the CV logo, CommVault Systems, Solving Forward, SIM, Singular Information Management, Simpana, CommVault Galaxy, Unified Data Management, QiNetix, Quick Recovery, QR, GridStor, Vault Tracker, QuickSnap, QSnap, Recovery Director, CommServe, CommCell, and InnerVault are trademarks or registered trademarks of CommVault Systems, Inc. All other third party brands, products, service names, trademarks, or registered service marks are the property of and used to identify the products or services of their respective owners. All specifications are subject to change without notice.

CommVault Archiving and Content Indexing Course R00.2


TABLE OF CONTENTS

Migration Archiving Module
  Overview
  Why Migration Archiving?
  Moving your Data
  Supported Applications
  Setting up for Archiving
  Windows/UNIX/Netware
  Network Storage
  SharePoint Archiver
  Domino E-mail
  Exchange
  Publishing the Organizational Form
  Outlook Add-In
  Outlook Web Access (OWA)
  Using Local (Offline) Archive
  PST File Considerations
  Archiving PST Files
  Configuring Migration Archiving in the CommVault Software
  Agents
  Default Archive Set
  Subclients
  Defining Content
  Archive Rules
  Exchange Archive Rules
  SharePoint Archive Rules
  Other Subclient Options
  Data Classification
  Data Classification Overview
  Data Classification Operation
  Data Classification - Rules
  Migration Archive Process
  Stubbing
  On Demand Archive
  Recovering Archived Data
  Archiving Tools
  Best Practices
  Summary


Compliance Archiving Module
  Why Compliance Archiving?
  Setting up Exchange for Archiving
  Configuring Compliance Archiving
  Compliance Archive Process
  Retrieving Archived Messages
  Summary

Content Indexing Module
  Why Content Indexing?
  Functional Architecture & Terminology
  Content Indexing Roles
  Content Indexing Data flow
  Planning Offline Content Indexing
  Minimum System Requirements
  Install Options
  Single Node vs. Multi Node
  Folder Locations
  Sizing Considerations
  Performance Considerations
  Location Considerations
  Online Content Indexing
  Minimum System Requirements
  Installation & Configuration
  Maintaining Content Index Engine
  Protecting the Content Index Engine
  Recovering a Content Indexing Node
  Content Indexing Operations
  Configure Storage Policy for Content Indexing
  Selecting Jobs for Content Indexing
  Running Content Indexing Jobs
  Monitoring Content Indexing
  Content Director
  Best Practices
  Summary

Search & Discovery Module
  Overview
  Preliminaries
  Functional Architecture & Terminology
  Types of Search
  Supported Applications and Data Types
  Search Consoles
  CommCell Console Search
  Search Options


  Restore Options
  Web Search Console
  Installing Web Search Server
  End User Search Console
  Using the Web Search Console
  E-Mail Filters
  File Filters
  Advanced Filters
  Delegate Search
  Compliance Search Console
  Accessibility
  Discovery Options
  Job Options
  Tag
  Legal Hold
  Legal Hold Overview
  Legal Hold Set
  Legal Hold Process
  Legal Hold Recovery
  Exchange Message Search
  Search Administration
  Managing Search Permissions
  User Resource Administration
  User Constraints
  Protecting the Web Search Server
  Best Practices
  Summary


Migration Archiving

www.commvault.com/training

Migration Archiving Module


Overview
- Why Migration Archiving?
- Setting up for Archiving
- Configuring Migration Archiving in the CommVault software
- Migration Archive Process
- Recovering Archived Data
- Best Practices



Why Migration Archiving?


- Moving your Data
- Supported Applications

Why Migration Archiving?


The primary purpose of a migration archiving operation is to move data from primary storage to secondary storage, in order to free up primary storage space and to reduce the amount of data that is backed up. Significant cost savings can be realized by not storing infrequently used data on primary storage, and by minimizing the time it takes to perform backups. Also, archived data can be easily recovered when it is needed, from either the CommCell Console or through the use of a third-party application.


Moving your Data


- Move from expensive primary storage to lower cost secondary storage
- Support for different data types

[Figure: data movement diagram. Labels: Email, File & Print, Document Management, Archive, Tape, Delete]

Moving your Data


File Archiver is a hierarchical data management solution that allows an organization to reclaim critical primary storage space by moving data to secondary storage and automatically recalling it for active use. By applying policies to select, move and retain fixed content, organizations gain significant economic and productivity benefits. It allows companies to address aggressive data growth by redistributing older data across a tiered storage architecture.

File Archiver's transparent in-place recall mechanism maximizes data accessibility for users, without imposing the need for extensive training, physical staging or catalog maintenance over the data's lifecycle. With this, administrators maintain more efficient primary storage architectures. As part of the CommVault platform, File Archiver fully complements data protection strategies for various environments.

The impact from archiving data is immediate: the substantial reduction in the production environment's physical storage space directly translates into reclaimed disk capacity and improvements in the speed and size of backup and recovery operations.


Supported Applications
- Windows, Unix, Netware File System
- NetApp, BlueArc, EMC Celerra File System
- Exchange Mailbox and Public Folders
- SharePoint Documents
- Domino Mailbox

Supported Applications
Microsoft Windows

File Archiver for Windows is a policy-based migration of files from Volumes, Directories or Shares to secondary storage locations. These policies contain rules definable on thresholds, age, size, filters, and/or folders. This process is transparent to the end-user and integrated through the file system. Migration also retains the ACL security of the files that are moved. A pointer, or stub, replaces the actual file. To the end-user this stub looks like the original file and gives the user or application the ability to recall the original file for use. All content is containerized and optionally compressed and encrypted.

UNIX

File Archiver for UNIX supports the ability to migrate and recover data from Network File System (NFS) mount points, through the use of CXFS (which is a special file system type for Data Archiver on UNIX) and the "cxfstab" file. Once you have configured a subclient and run a migration, the directories and mount points specified in the subclient content become CXFS partitions or mount points. Like any other locally mounted file system, CXFS must be explicitly exported in order to become available to remote NFS clients.

Netware

File Archiver for Netware provides support for native NSS volumes by removing data from production storage and moving it to secondary storage.


Supported protocols include NCP with Novell Client 4.91 SP2, as well as data recalls by Mac clients using the AFP protocol (Netware 6.5 with Support Pack 6 required). The option for a CIFS share or Netware Client is supported.

Network Appliance

Data files stored on a NAS device in Windows/CIFS shares are eligible for Archive. The agent must be installed on a host with an existing supported Media Agent component. Note that the DM-NS Agent and Client cannot be located on the same host. The agent enables definition of migration rules, content, and storage of the managed data.

The FPolicy Subclient feature for File Share Archiver takes advantage of built-in monitoring services available in ONTAP 7.0 platforms called "FPolicy Screens" to allow archived data residing on CIFS shares to be recalled without the need for the File Share Archiver Client component to be installed on end-user workstations. It does this by eliminating the need for the GXHSM Recall Service to be running on end-user workstations connected to those shares in order to perform stub recalls.

The Proxy Stub Subclient feature allows the File Archiver for Windows Agent to archive and recover data residing on an EMC Celerra File Server (running DART OS versions 5.5 and later).

Microsoft Exchange

The Mailbox Archiver for Exchange Agent will automatically migrate messages satisfying certain criteria, and replace them with stubs containing information for recovery. Users can double-click the message stub in Outlook or Outlook Web Access (OWA) to recover the original message. Alternatively, users can ask the administrator to browse the CommCell Console to recover the message. The latter method is especially useful if the stub is deleted.

Exchange Public Folders

The Exchange Public Folder Archiver Agent will automatically archive Public Folder items satisfying certain criteria, and replace them with stubs containing information for recovery.

Web Proxy Agent for Exchange

If you plan to install the agent in an off-host proxy configuration, or in a 32-bit on 64-bit configuration, and you would like to provide functionality support for the Outlook Add-In, Archived Mail Browser and/or OWA (if applicable), then you must install the OWA Proxy Enabler on the Exchange Server.

SharePoint Documents

SharePoint Archiver provides data protection support for documents from both versioned and non-versioned Document or Picture Libraries. Exceptions, which the system automatically filters out of archive operations and which cannot be archived, are listed in Books Online.

Domino Mailbox

The Domino Mailbox Archiver Agent is a software module that periodically moves unused or infrequently used Lotus Notes Mailbox messages on a host computer to secondary storage.
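
The policy rules mentioned above for File Archiver for Windows (rules definable on age and size, among others) can be pictured with a small sketch. This is only an illustration and not the product's implementation: the names FileInfo, ArchiveRule and select_for_archive are hypothetical, and the assumption that a file must satisfy both the age and the size condition is made here for simplicity. The actual rule options are configured in the subclient properties.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class FileInfo:
    """Minimal description of a candidate file (hypothetical type)."""
    path: str
    size_bytes: int
    last_modified: datetime


@dataclass
class ArchiveRule:
    """Hypothetical age/size rule; not the product's actual rule model."""
    min_age_days: int
    min_size_bytes: int

    def matches(self, f: FileInfo, now: datetime) -> bool:
        # Assumption: a file qualifies only if it is both old enough and large enough.
        old_enough = now - f.last_modified >= timedelta(days=self.min_age_days)
        big_enough = f.size_bytes >= self.min_size_bytes
        return old_enough and big_enough


def select_for_archive(files: Iterable[FileInfo], rule: ArchiveRule, now: datetime) -> List[str]:
    """Return the paths of the files that satisfy the rule."""
    return [f.path for f in files if rule.matches(f, now)]
```

With a rule such as ArchiveRule(min_age_days=180, min_size_bytes=1_000_000), only files untouched for at least 180 days and at least 1 MB in size would be selected.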


Setting up for Archiving


- Windows/UNIX/Netware
- Network Storage
- SharePoint
- Domino E-mail
- Exchange

Setting up for Archiving


Each data type has different requirements for configuration and Archive. In the following sections we will explain each agent and the nuances of each.


Windows/UNIX/Netware
- Supported Data Types
- Supported File Systems
- Supported Protocols
- What's not recommended to archive

Windows/UNIX/Netware
Supported Data Types

The File Archiver Agent provides data protection support for all data types, except for those listed below. This agent does not require the operating system File System agent to be installed.

Not Recommended for Archiving

The following file types/objects are not recommended for archiving. Although it is possible to archive them, doing so could cause applications or the operating system to not function properly.

- Files with extensions *.dll, *.bat, *.exe, *.cur, *.ico, *.nlm, *.a, *.ksh, *.csh, *.sh, *.lib, *.lnk, and *.so are automatically filtered out of subclient content by default in the Subclient Properties (Filters) tab; therefore the system will not archive them.
- Application files/folders, including software installation, database and mount path folders on which backup data resides, must be manually filtered out of subclient content.
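
As a rough illustration of the default filter described above, the following sketch checks a file name against the listed extensions. The extension list is taken from this guide; the function name is hypothetical, and the case-insensitive comparison is an assumption rather than documented product behavior.

```python
from pathlib import PurePath

# Extensions this guide lists as filtered out of subclient content by default.
DEFAULT_FILTERED_EXTENSIONS = {
    ".dll", ".bat", ".exe", ".cur", ".ico", ".nlm", ".a",
    ".ksh", ".csh", ".sh", ".lib", ".lnk", ".so",
}


def is_filtered_by_default(path: str) -> bool:
    """Return True if the file's final extension is in the default filter list.

    Assumption: extensions are compared without regard to case.
    """
    return PurePath(path).suffix.lower() in DEFAULT_FILTERED_EXTENSIONS
```

A check like this can help when reviewing subclient content before a first archive job; application and database folders still have to be filtered manually.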


Not Supported for Archiving

The following file types/objects are automatically filtered out of archive operations by the system, and cannot be archived.

CommVault software specific:
- Install folder
- Job Results folder
- Index cache folder
- Log files folder

Windows specific:
- Windows system32 folder
- Mount points to volumes
- Unix data residing on a Windows NFS share
- Files with attributes of encrypted or sparse (e.g., DataArchiver stub files)
- Files with attributes of hidden and system (Note that these file types can only be archived through the use of the GXHSMIFINDDISABLEHIDDEN registry key.)

UNIX specific:
- /usr system directory
- /kernel system directory
- /etc system directory
- /tmp directory

Netware specific:
- NetWare system folder
- Mount points to volumes
- Files with attributes of hidden, system, read-only or REMOTE_DATA_INHIBIT
- Files smaller than 4 KB


Network Storage
[Figure: Data Archiver for Network Storage Clients. Labels: Primary Storage, CIFS Data View, Data Archiver, Media Agent, Secondary Storage, Archive to Disk, Archive to Tape]

Network Storage
The File Share Archiver Agent is a software module residing on NetApp, EMC or BlueArc NAS devices that is responsible for periodically moving unused or infrequently used data on a host computer to secondary storage, thereby reducing the size of data on the primary storage. Migration Archiver Agents reduce the duration of backup windows by reducing the amount of data to be backed up by an iDataAgent.

In addition to the CommServe database, Media Agent and File System iDataAgent, the following components are installed for File Share Archiver:

- File Share Archiver Agent
- File Share Archiver Client (optional)

These components can be installed locally, and the Client component can also be remote installed. The remote install feature lets the administrator install the File Share Archiver Client on one or multiple network enterprise computers without physically going to each computer.

The FPolicy Subclient feature for File Share Archiver takes advantage of built-in monitoring services available in ONTAP 7.0 platforms called "FPolicy Screens" to allow archived data residing on CIFS shares to be recalled. To start the screening, type the following at a command prompt on the filer:

    fpolicy create CVNSDM screen


To use the Proxy Stub Subclient feature for EMC Celerra, the EMC Celerra File Server and IIS must be configured to communicate with the proxy computer hosting the File Archiver for Windows Agent. After the initial setup, the Proxy Stub Subclient is created by the user on the File Archiver for Windows Agent from the CommCell Console, and then authentication credentials must be provided in the Agent Properties in order to monitor shares for transaction requests.

Migration Archiving

After configuration is completed, the recovery administrator can schedule migration archiving jobs. File Share Archiver will archive the files meeting the pre-set archiving criteria and, if applicable, put them into a list for the stubbing phase and prune expired stubs.

Recovery

File Share Archiver provides two ways to recover an archived file: from the CommCell Console (stand-alone application) or from Windows Explorer.

1. From the CommCell Console, the browse of File Share Archiver data is forced to be a non-image browse to view all the archived files in this archiving cycle (i.e., since last creation of new index). The recovery administrator can browse and find the file to recover. To find files archived earlier than the last creation of new index, the administrator can provide a point-in-time in the Browse Options dialog box.
2. The second method of recovery is recovering migrated file(s) from stub(s) in Windows Explorer by double-clicking the stub(s).
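
The Migration Archiving step above (archive the qualifying files, then place them on a list for the stubbing phase) can be sketched as a two-phase loop. This is a simplified illustration, not the agent's actual code: the function name and the archive and create_stub callbacks are hypothetical stand-ins, and pruning of expired stubs is left out. It assumes, as this guide states for SharePoint Archiver, that only successfully archived files are turned into stubs.

```python
from typing import Callable, Iterable, List, Tuple


def run_migration_archive(
    candidates: Iterable[str],
    archive: Callable[[str], bool],
    create_stub: Callable[[str], None],
) -> Tuple[List[str], List[str]]:
    """Archive first, then stub only the files that archived successfully.

    Returns (stubbed_paths, failed_paths).
    """
    stub_list: List[str] = []
    failed: List[str] = []

    # Phase 1 (archiving): only successes are queued for the stubbing phase.
    for path in candidates:
        if archive(path):
            stub_list.append(path)
        else:
            failed.append(path)

    # Phase 2 (stubbing): replace each successfully archived file with a stub.
    for path in stub_list:
        create_stub(path)

    return stub_list, failed
```

Keeping the two phases separate means a file whose archive copy failed is never replaced by a stub, so the original remains on primary storage.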


SharePoint
- Supported Data Types
- What's not supported
- Configuration considerations

SharePoint Archiver
During archive operations, the SharePoint Archiver Agent will archive versioned and non-versioned documents that meet the preset criteria specified in the Archiving Rules tab of the Subclient Properties dialog box. Information about each archived object is placed into a stub (when enabled) that can then be used as a link to recover the data. This agent does not require the Windows File System agent to be installed.

Supported Data Types

SharePoint Archiver provides data protection support for documents from both versioned and non-versioned Document or Picture Libraries, except for those listed in the section below.

Not Supported for Archiving

The following file types/objects are automatically filtered out of archive operations by the system, and cannot be archived:

- List items
- .aspx files
- Other Library items (e.g., Form Library items)

Configuration

Archiving Rules

The configuration of the archiving rules is central to customizing migration archiving operations to meet the needs of your organization. The Archiving Rules are discussed in detail in Migration Archiving - SharePoint Archiver in Books Online documentation.


Archiving Rules are initially disabled by default. After creating a subclient, you must clear the Disable All Rules option and configure archiving rules before archive operations are possible. Otherwise, the system will ignore all archiving rules (except for the Do Not Create Stub rule) for the subclient.

Migration Archiving

After configuration is completed, migration archiving jobs can be scheduled. To ensure that only successfully archived files will be changed into stubs, the migration archiving operation is divided into two phases: archiving and stubbing. In the archiving phase, SharePoint Archiver will archive the files meeting the pre-set archiving criteria and, if applicable, put them into a list for the stubbing phase and prune expired stubs. Refer to Migration Archiving for more information on archiving.

Recovery

SharePoint Archiver provides two ways to recover an archived file: from the SharePoint Server User Interface or the CommCell Console (stand-alone application).

1. The first method of recovery is recovering archived file(s) from stub(s) in SharePoint Server. In the SharePoint Server User Interface, browse and select the item to restore, expand the Edit menu, and click Recall Archived Data.
   o In the SharePoint Server User Interface, you can recover items only if Yes appears under the Archive column next to the item. If the Archive column is not visible after archiving, you may need to manually refresh the browser.
2. The second method is from the CommCell Console. The browse of SharePoint Archiver data is forced to be a non-image browse to view all the archived files in this archive cycle (i.e., since last creation of new index). The administrator can browse and find the file to recover.

Migration Archiving Considerations

Before performing any archive procedure, review the following information:

- When the SharePoint Archiver agent archives data, it creates a database file to track the stubs that will appear in the SharePoint user interface. Stubs contain information about the archived data for recovery purposes. This file, SPDAStubDatabase.db, resides in the JobResults folder and is not backed up with the SharePoint Database iDataAgent. Therefore, you should back up this file with the Windows File System iDataAgent to ensure that stubs can be recovered.
- For successful archiving and recovery, the user performing the operation must have write permissions to the SPDAStubDatabase.db file that resides in the JobResults folder. Otherwise, you may receive the error "Failed to update database with new version number."
- At the completion of an archive operation, the SharePoint Archiver checks for any new Virtual Servers. An alert can be configured to restart Internet Information Server (IIS) Services when new Virtual Servers are detected.
- If all items in a Library with a Major and Minor Version setting are getting archived, the latest version will remain available and a stub will not be created.
- In a SharePoint environment with more than 500 site collections, it is recommended to increase the size and number of log files with the <logfile>_MaxLogFileSize and <logfile>_MaxLogFileBackups registry keys. Replace <logfile> with any desired name (e.g., SPDBBackup, SPDocBackup, SPDocArchive, etc.).
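
To make the <logfile> substitution in the last item concrete, the sketch below builds the two registry key names for a given log file name. The function name is hypothetical; the key name patterns and the example log names come from this guide. The registry location and suitable values are not given here, so they are deliberately left out.

```python
from typing import Dict


def log_registry_key_names(logfile: str) -> Dict[str, str]:
    """Build the two registry key names for one log file name.

    Only the key names are produced; where the keys live in the registry and
    what values to set are not covered by this sketch.
    """
    return {
        "max_size_key": f"{logfile}_MaxLogFileSize",
        "max_backups_key": f"{logfile}_MaxLogFileBackups",
    }


if __name__ == "__main__":
    # Example log names taken from the guide.
    for name in ("SPDBBackup", "SPDocBackup", "SPDocArchive"):
        print(log_registry_key_names(name))
```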


Domino E-mail
Supported Data Types
What's not supported
Configuration considerations

Domino E-mail
The Domino Mailbox Archiver Agent is a software module that periodically moves unused or infrequently used Lotus Notes Mailbox messages on a host computer to secondary storage. This agent does not require the operating system File System agent to be installed.

The Domino Mailbox Archiver Agent provides support for two distinct archiving scenarios:

o Migration Archiving, which is useful if you want to reduce the number of messages routinely backed up to primary storage by the Lotus Notes/Domino Server iDataAgent.
o Compliance Archiving, which is useful if you want to protect messages copied to the journaling mailbox to meet security and compliance standards.

These archiving scenarios are further contrasted below:

Migration Archiving (Domino Mailbox Archiver Agent with Migration-only configurations set)

o Message content is moved from the source mailbox to the designated secondary storage location, leaving only the message header and a corresponding stub icon.
o Users can later recover the original message contents by double-clicking the stub icon.

o Primary storage resources and data protection operation time used by the Lotus Notes/Domino Server iDataAgent are reduced.
o Messages may be recovered in-place (i.e., to the source mailbox) or out-of-place (i.e., to another mailbox).

Compliance Archiving (Domino Mailbox Archiver Agent with Migration and Compliance configurations set)

o Messages copied to the Domino Server journaling mailbox are moved to the designated secondary storage location.
o Only authorized users may retrieve messages archived from the journaling mailbox.
o Client mailbox copies of archived messages remain intact within their respective client mailboxes.
o Messages are archived securely with no impact on the source mailboxes or databases.
o Messages are retrieved out-of-place (i.e., to a secure mailbox created exclusively for compliance purposes).

Supported Data Types

The Domino Mailbox Archiver Agent provides data protection support for all .nsf mail databases visible to the Domino Server. Data Protection Operations for all other data types not mentioned above are not supported by the Domino Mailbox Archiver Agent.

Stub Icons

After installing the Domino Mailbox Archiver Agent, the Lotus Notes Mailbox master template should be configured to display a stub icon for each message archived. When this configuration is performed, the Lotus Notes client mailbox will automatically display archived messages with a stub icon. Users can then recall the archived message by opening the message in the Lotus Notes client mailbox. For more information on configuring stubs, refer to Books Online documentation.

Migration Archiving

Once the Domino Mailbox Archiver Agent is installed and the Domino Server has been configured to display stub icons, the administrator can schedule migration archiving jobs.
When mailbox messages are archived in a migration archiving scenario, the content of the messages, including any attachments, is automatically removed from the source mailbox as it is copied to secondary storage. However, the source mailbox will retain the original message header, as well as a corresponding stub icon, both of which can be used to recover the message at a later time.

Recovery

The migration archive process allows three methods for recovering an archived message:

o using the Browse functionality of the CommCell Console.
o double-clicking the message's stub icon in the source Lotus Notes client mailbox.
o double-clicking the message itself in the source Lotus Notes client mailbox.

Compliance Archiving and Retrieve

Once the Domino Mailbox Archiver Agent is installed and the Domino Server has been configured to display stub icons, the administrator can schedule compliance archiving jobs. The Domino Mailbox Archiver Agent works in conjunction with the message journaling feature of the Domino Server software to archive all incoming and outgoing messages and attachments. As incoming or outgoing messages pass through the Domino Server, they are instantly captured and recorded in the Domino Server Journal Mailbox. The Domino Mailbox Archiver Agent can then archive the journal mailbox to secondary storage, where the messages will remain readily available for access and retrieval by authorized users. Once archived, all messages are protected from any editing or alteration.

Retrieve

To retrieve archived messages, you can search or scan the secondary storage to which the journaling mailbox messages were archived using certain criteria, such as the contents of the subject, the original sender (from) and the original intended recipient (to). Additionally, if Content Indexing is enabled, you can also search for messages by their content and attachments.

Archiving Considerations

To allow stub icons to be placed with archived messages in the Lotus Notes client mailboxes, each mailbox must be configured to display stub icons prior to performing any archive operation.

Messages that have been migrated do not automatically appear as stubs in the Lotus Notes client mailbox when the archive operation has been completed.
To display the stubs created during the archive operation, users must refresh the client mailbox manually by either:

o using the F9 function on the keyboard.
o restarting the Lotus Notes client mailbox.

Filters can be used in conjunction with the "Items That Failed" list on the data protection Job History Report to eliminate backup or archive failures by excluding items that consistently fail and are not integral to the operation of the system or applications. Some items fail because they are locked by the operating system or application and cannot be opened at the time of the data protection operation. This often occurs with certain system-related files and database application files.


Lotus Notes Client Add-In


Browse and Recover Messages
Archive and Recall Messages
Accessible from the Actions Menu option

Overview

The Lotus Notes Client Add-In provides Lotus Notes users with a convenient way to browse and recover mail messages that were archived and removed from their Lotus Notes mailboxes. Once the e-mail messages are located, they are restored back to the All Documents view by default. In addition, a time range for the browse operation can be specified, and messages meeting the browse time range criteria can then be selected and recovered.

The Lotus Notes Client Add-In also provides the facility to archive and recover messages while offline. This capability is useful if users need to access archived mail messages but are not able to connect to the Domino Server.

Browsing Messages

Once installed on the Lotus Notes Client computer, the Lotus Notes Client Add-In is accessible from the Lotus Notes client application's Actions menu. Users simply need to click the Find and Recover menu option to launch the Add-In's Browse Option dialog box. From here, users can choose between two distinct options to find their messages:

o Browse the Latest Data, which will include all messages archived during the last archive operation.
o Specify Browse Time, which will include only messages created during a period of time specified by the user. Users can also choose to exclude messages that were created prior to a specific point-in-time.
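The Specify Browse Time option amounts to filtering archived messages by creation time. The sketch below is only an illustration of that idea, not the Add-In's implementation; the message dictionaries and the browse function are hypothetical.

```python
from datetime import datetime


def browse(messages, start=None, end=None):
    """Return messages created within [start, end].

    Passing only `start` excludes messages created before that point in
    time; passing neither bound returns every message.
    """
    selected = []
    for msg in messages:
        created = msg["created"]
        if start is not None and created < start:
            continue
        if end is not None and created > end:
            continue
        selected.append(msg)
    return selected
```

For example, `browse(msgs, start=datetime(2008, 1, 1))` would keep only messages created on or after 1 January 2008.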

Offline Archive and Recovery

The Lotus Notes Client Add-In's Offline Archive feature allows you to retain a full copy of archived messages on the Lotus Notes client computer for recall access where a network connection is not available. These copies will reside in the original mailbox until an offline archive operation has been performed. At that point, these copies will be moved to a local cache, leaving a stub in the original mailbox.

Configuring Offline Archiving Options

Optionally, Lotus Notes Client users can configure the amount of time desired to retain messages that were archived offline. This capability helps reduce the amount of space consumed in the local computer's cache. Once these options are configured, users are automatically given an opportunity to perform a new archive operation with the new settings.


Exchange
General Information The Organizational Form Outlook Add-in Outlook Web-Access PST File considerations


Exchange General Information


Supported Data Types
What's not supported
Configuration considerations

Exchange
The size of the Exchange Server Information Store will grow over time. To keep its size under control, administrators and users have to either delete or archive unwanted messages periodically to release mailbox disk space. The Exchange Mailbox Archiver Agent is designed for this purpose. It will automatically archive messages satisfying certain criteria, and replace them with stubs containing information for recovery. A reduction in the mailbox size shown in the Exchange System Manager will be seen immediately after a migration archiving operation. However, in order for the database size to show the space savings, an offline defragmentation will need to be run.

Supported Data Types

Protection Operations for the following data types are supported by the Exchange Mailbox Archiver and the Public Folder Archiver Agents:

o Mailboxes and public folders
o Folders within a mailbox or public folders
o Messages within a folder
o Attachments
o PST files

Exchange Mailbox Archiver Agent

For Exchange Server 2003/2007, this component can be installed either on an Exchange Server or on an off-host proxy computer, and is similar to the Exchange Mailbox iDataAgent installation. For Exchange 2000, this component is installed on the Exchange Server.

Archive and Recovery

After configuration is completed, the administrator can schedule migration archiving jobs. DataArchiver will archive the messages/items meeting the pre-set archiving criteria and, if applicable, put them into a list for the stubbing phase and prune expired stubs. During this phase of the migration archiving operation for the Exchange Mailbox Archiver Agent, messages and folders that were added to the Archive List in Outlook will also be included in the migration archiving operation. Migration archiving of data in the Notes folder is not supported by the Exchange Mailbox Archiver Agent.

Recovery

There are four ways to recover an archived message: from the CommCell Console (standalone application), Archived Mail Browser, Outlook, or Outlook Web Access (OWA). From the CommCell Console, the browse of Exchange Migration Archiver data is forced to be a non-image browse to view all the archived messages/items in this migration archiving cycle (i.e., since last creation of new index). The administrator can browse and find the message/item to recover.

Recovering All Protected Mail

The Exchange Mailbox Archiver Agent provides an option to recover all archived e-mail messages for specified mailboxes or folders on active indexes (i.e., data that has not been aged). This feature is useful for recovering all protected data from deleted mailboxes or folders in a single recovery operation, without having to perform a separate recovery operation for each index cycle. Data can be recovered from the latest index, or a point-in-time, all the way back to the least recent index that has not been aged.


Publishing the Organizational Form


Setup prior to installing the Outlook add-in
Provides special icons in Outlook

Publishing the Organizational Form


OFL Configuration

Before installing the DataArchiver Outlook Add-In, you must publish a form in the Exchange Server Organizational Forms Library (OFL) from a computer that has Outlook installed. This task only needs to be performed once per Exchange Organization.

Initially, the Exchange administrator needs to create the Organizational Forms Library if it does not already exist. After the OFL is available, the administrator can log on as the owner of the OFL from a computer with Outlook installed and run the PublishForm tool (located on the Resource Pack) to publish the forms. If there are multiple Exchange Servers, the OFL should be replicated to make the forms available to other servers. In addition, you need to ensure that only one OFL per language exists in the Exchange Organization, taking replication into account. For details about how to create and manage the OFL, please refer to Exchange Server documentation and related Microsoft Knowledge Base articles.

Steps 1 through 5 are for creating the OFL, and Steps 6 through 8 are for publishing the forms. If you have already created an OFL, you can use the existing one and do not need to create a separate OFL. In that case, just follow Steps 6 through 8.

Creating the OFL for Exchange 2000/2003:

1. Open the Exchange System Manager and expand the folders node.
2. Right-click the Public Folders tree and select View System Folders from the context menu.
3. Right-click the EFORMS REGISTRY system folder and select New -> Organizational Form from the context menu.
4. Name the new Organizational Form "DataArchiver Organizational Forms" and make sure that the Language field is set to English (USA). Select Apply, then click OK.
5. Exit the Exchange System Manager.

Creating the OFL for Exchange 2007:

Refer to the CreateOFL instructions (located in the Intel32\Windows\CreateOFL folder on the Resource Pack) for information on creating a custom Public Folder using the Exchange Management Shell. Afterwards, run the CreateOFL tool to create the OFL. Once the OFL has been created, continue on to the procedure for Publishing the Forms below.

Publishing the Forms:

6. Publish the forms using the latest version of the PublishForm tool and OFT forms (located in the windows\DMEPublishForm folder on the Resource Pack or in the tools\Exchange DataArchiver folder on the installation DVD-ROM, whichever is most current). After identifying the location of the latest version of the PublishForm tool and forms, double-click the publishform.exe file from the appropriate folder listed above, then click Publish.
   NOTE: If this step fails, it may be due to Outlook and Exchange Server being installed on the same computer, which is not recommended by Microsoft. In such a situation, we recommend that you perform this task from a computer that has Outlook but not Exchange Server installed.
7. The publish form program will prompt for a profile to use in order to publish the forms to the DataArchiver Organizational Forms (or previously existing OFL) folder. The profile used to publish the forms must have owner privileges to that folder.
8. Close the publish form program.

NOTE: An Organizational Forms Library is a special type of public folder that is listed only with system folders. You can have only one Organizational Forms Library for each language per organization.


Outlook Add-In

Adds items to Outlook Tools Menu


Add/remove items for Migration Search and recall migrated messages

Replaces migrated messages with stubs Offline Archive for local migration

Outlook Add-In
The DataArchiver Outlook Add-In software needs to be installed and registered on every Outlook client. This enables Outlook users to identify the messages that have been archived, because the icon of an archived message changes to indicate that it has become a stub. The Outlook Add-In also enables Outlook users to add data objects such as mail messages or folders to an Archive List for later archiving, remove data objects from the Archive List, and recover archived data from stubs.

Additionally, mail messages and folders archived with the Exchange Mailbox Archiver Agent can be browsed, searched and recovered or erased through Outlook for clients that have the DataArchiver Outlook Add-In installed. Similar browse, search and recovery or erase data capabilities are also available through the CommCell Console (as a Java applet), which does not require Outlook to be present on the computer.

Using the Archive List

The DataArchiver Outlook Add-In adds a series of menu items under the Outlook Tools menu, allowing the end-user to make additions to the Archive List and remove entries from the Archive List. When selected messages or folders are added to the Archive List, the next migration archiving operation on the Exchange Server will include them for archiving.

Recovering a Mail Message

In Outlook, the stubs will appear as messages/items with the stub icon. The stubs include information on sender, recipient, subject and body text if the Archive message attachments only option is enabled. To find a stub, users can look for the icon in the folder view. Once a stub is located, users can open the stub by clicking on it in Outlook. The Outlook Add-In will intercept the Outlook user's open item event and then ask for the user's confirmation. Once the user decides to recover the original message or item, the Outlook Add-In will get the recovery information from the stub and submit the recovery request to the system. A recovery job will be started and the message/item will be recovered back to Exchange Server. After the recovery is completed, Outlook users will see the recovered message/item.

During recovery, the Outlook Add-In allows the user to cancel the recovery job. At the end of recovery, the Outlook Add-In will display the result and related information to the user. Because a recovered message/item will have the most recent modification time, it will not be archived again until it meets the minimal message age or is added to the Archive List by the Outlook user (applicable for Exchange Mailbox Archiver), unless the user decides to delete the stub or the recovered message/item after use.
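The re-archiving behavior for recovered messages can be pictured as a simple eligibility check. The following is a sketch under stated assumptions: the function name, the min_age_days parameter, and the in_archive_list flag are hypothetical and do not correspond to actual product settings.

```python
from datetime import datetime, timedelta


def eligible_for_archive(modified, now, min_age_days, in_archive_list=False):
    """Decide whether a message qualifies for the next archiving operation.

    A freshly recovered message carries a recent modification time, so it
    fails the age test until enough time has passed, unless the user has
    added it to the Archive List explicitly.
    """
    if in_archive_list:
        return True
    return now - modified >= timedelta(days=min_age_days)
```

With a 30-day minimum age, a message recovered yesterday would be skipped, while the same message placed on the Archive List would be picked up by the next job.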

For data archived after installing version 7.0 Service Pack 3, Exchange 2003/2007 Mailbox Archiver allows Subject line text to be customized using the StubIdentifier registry key; otherwise, no stub identification text will display in the Subject line.

Location - Windows: HKEY_LOCAL_MACHINE\SOFTWARE\CommVault Systems\Galaxy\Instance<xxx>\MSExchangeDMAgent, New String Value "StubIdentifier" (any valid string works). This key is user-created on the computer on which the agent is installed.

This key allows you to enter a customized text string that will be displayed at the beginning of the Subject line of archived messages in Outlook and OWA. When this key is not present, the default behavior is to NOT display any string at the beginning of the Subject line of archived messages in Outlook or OWA.

Archived Mail Browser

The Archived Mail Browser is a type of search tool for end-users that provides browse, search and recovery capabilities for archived e-mail messages residing within their mailboxes. It offers an alternative interface to the DataArchiver Outlook Add-In for performing these tasks, and does not require the Outlook application to be present on the computer.
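The effect of the StubIdentifier key on a stub's Subject line can be sketched as follows. This is an illustration only: the stub_subject function is hypothetical, and the single space placed between the identifier and the original subject is an assumption, since the exact separator is not specified here.

```python
def stub_subject(subject, stub_identifier=None):
    """Return the Subject line shown for an archived message stub.

    stub_identifier models the value of the StubIdentifier registry key;
    None (or an empty string) models the key being absent.
    """
    if not stub_identifier:
        return subject  # default: no identification text is added
    # Assumption: identifier and subject are separated by one space.
    return f"{stub_identifier} {subject}"
```

For instance, `stub_subject("Q3 report", "[ARCHIVED]")` gives "[ARCHIVED] Q3 report", while `stub_subject("Q3 report")` leaves the subject unchanged.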


Outlook Web Access

Outlook Web Access (OWA)


When an Exchange Migration Archiver Agent is installed, you have the option to enable support for recovering archived message/item stubs from Outlook Web Access. If this option is enabled, the stubs will appear as messages in OWA, which can be double-clicked for recovery. Since no stub icons are available for OWA 2003, the following enhancements have been made to allow you to easily distinguish them from normal messages in that version of OWA and any supported version of Outlook:

o For Public Folder Archiver and 7.0 pre-Service Pack 3 data archived by Exchange 2003/2007 Mailbox Archiver, message stubs are distinguished in Outlook and OWA 2003/2007 by displaying the word "ARCHIVED:" in the Subject line.
o For data archived after installing 7.0 Service Pack 3, Exchange 2003/2007 Mailbox Archiver allows Subject line text to be customized using the StubIdentifier registry key; otherwise, no stub identification text will display in the Subject line.

OWA Proxy Enabler

If you plan to install the agent in an off-host proxy configuration, or in a 32-bit on 64-bit configuration, and you would like to provide functionality support for the Outlook Add-In, Archived Mail Browser and/or OWA (if applicable), then you must install the OWA Proxy Enabler on the Exchange Server.

If the DataArchiver WebProxy Agent for Exchange is installed on an IIS server in your network, during the install of the DataArchiver Outlook Add-In, you can select the option to route stub recovery through the IIS server over HTTPS. Normally, two direct channels of communication are opened from the Outlook Add-In client, one to the Exchange Server and one to the CommServe, over ports 8401 and 8402. When using the WebProxy Agent, a single channel of communication is opened through the IIS server. Stub recovery is performed through a single, encrypted and authenticated SSL (Secure Sockets Layer) port. This solution takes advantage of the RPC over HTTP facilities of Windows Server 2003; therefore, this solution only applies to Exchange 2003 accounts on an Exchange 2003 Server, running on Windows Server 2003.

To change the method of communication for stub recovery (HTTPS or Direct) after installation, uninstall and then reinstall the Outlook Add-In, selecting the desired method of communication.

To avoid possible intermittent message recall failures when using the Outlook Add-In in RPC over HTTP mode, worker process recycling and Web Garden options under IIS 6.0 should be disabled for the DMProxy Service.

In order to support stub recalls from OWA for Exchange 2003 agents in an off-host proxy configuration, run the RegisterWSSForm tool located on the Resource Pack after installing the OWA Proxy Enabler.

For Exchange 2007 on a cluster virtual server, in either an off-host proxy configuration or a 32-bit on 64-bit configuration, because of a Microsoft issue with Exchange 2007 Server, components must be deployed in the following manner:

o The cluster virtual server must have the Exchange Mailbox role only, not the Client Access role.
o The Exchange Client Access role must be on another, non-clustered server.
o For OWA recalls, OWA Proxy Enabler must be installed on both servers.
o For functionality support for the Outlook Add-In and Archived Mail Browser, OWA Proxy Enabler must be installed on the cluster virtual server which has the Exchange Mailbox role.
For Exchange 2007 in a non-clustered environment, when deployed with the Exchange Mailbox role on one server and the Exchange Client Access role on another server, components must be deployed in the following manner: For OWA recalls, OWA Proxy Enabler must be installed on both servers. For functionality support for the Outlook Add-In and Archived Mail Browser, OWA Proxy Enabler must be installed on the server which has the Exchange Mailbox role.


Using Local (Offline) Archive


[Figure: Data is archived to secondary storage and locally.]

Using Local (Offline) Archive


Offline Archiving is the process where full copies of archived messages or items are maintained in local cache on the Outlook client to provide end-users with the ability to recover archived data even when not connected to the Exchange Server. This is especially useful for remote users who are working offline yet need the capability to recover mail messages or other items in their inbox that have been archived.

When Offline Archiving has been enabled on the subclient and configured for use on the Outlook client, any items that are archived off the Exchange Server to secondary storage are also archived to a file that resides in local cache on the Outlook client for easy offline recovery. Keep in mind that Outlook Add-In users must have local administrative level privileges to configure their clients for offline archiving.

Configuring the Client for Offline Archiving

Before you can start using this feature, some initial setup is required to configure an Outlook client for offline archiving. The sequence of steps to set up offline archiving is provided below. For more detailed information, see the documentation.

1. Enable the Outlook Add-In client to display the "Change Offline Archiving Options" option in the Mailbox Archiver Tools menu by changing the value of the UIOptions registry key to the appropriate number. See UIOptions in the Registry Keys appendix for more information.
2. Enable the "Offline Archiving" option in the Subclient Properties (General) tab.
3. Optionally, you can configure the amount of time you want to retain files created by the offline archive process by specifying the local file Pruning options.
4. Optionally, you can customize the location on the Outlook client where you would like the files containing full copies of the archived items to reside by creating the LM_PST_Location registry key; otherwise, the default location of <software_install_path>/AddInLM will be used. See LM_PST_Location in the Registry Keys appendix for more information.
5. Optionally, you can customize the settings for the frequency and time at which Offline Archive jobs will automatically execute by changing the values for the AutoLMJobFreq and AutoLMJobRunTime registry keys. Note that the Outlook session must be active in order for the system to automatically execute Offline Archive jobs.
6. Optionally, you can configure the EnableLMToolBar registry key to display a toolbar button in Outlook for triggering an on-demand Offline Archive operation to synchronize with the Exchange Server.
7. Optionally, you can create the nOfflineArchivingTimeout registry key to configure a timeout value to complete the stubbing operation for messages marked as offline archiving candidates. This avoids situations where messages are held indefinitely in the offline archiving candidate state because no manual or automated offline archiving operation is being performed on the Outlook client.

Once the above setup tasks have been completed, any migration archiving operations performed for the subclient with the Offline Archiving option enabled will have the added benefit of performing offline archiving on Outlook Add-In clients configured for offline archiving.
Keep in mind that offline archiving will only be performed for users whose mailboxes are included in the content of the subclient configured for offline archiving when they are using the Outlook Add-In. Also, local users logged into the Outlook Add-In must have administrative level privileges to configure their clients for offline archiving.

Best Practices

o You must first perform a migration archiving operation from the CommCell Console before you can perform an Offline Archive operation from Outlook.
o It is recommended that Offline Archiving operations be performed on a regular basis for performance reasons. Frequent Offline Archiving can also prevent potential file corruption in cases where ANSI-based files created by Offline Archiving for Outlook 2000 exceed the 2GB limit. Keep in mind that Unicode-based files created by Offline Archiving for Outlook 2003/2007 have a 20GB recommended limit.
o To support Offline Archive functionality, it is recommended that the Outlook Add-In not be used on clients that have multiple instances; it is intended for use on end-user desktop workstations.
o If you would like to append the Subject line of messages archived offline with a customized text string, this can be accomplished using the LMArchivePrefix registry key.

o Files created by the Offline Archive process for Outlook 2003 and higher will be in Unicode format.
o The Offline Archive process allows users to perform other tasks while the Offline Archive progress dialog displays.
o Recalled messages are opened automatically after recovery, including messages that were archived offline.
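The file size limits mentioned in the best practices (2GB for ANSI-based files, a recommended 20GB for Unicode-based files) lend themselves to a simple check. The sketch below is hypothetical: the needs_attention function and its warn_ratio threshold are illustrative assumptions, not part of the product.

```python
GB = 1024 ** 3

# Limits taken from the best practices above.
LIMITS = {"ansi": 2 * GB, "unicode": 20 * GB}


def needs_attention(size_bytes, file_format, warn_ratio=0.8):
    """Return True when a local archive file is close to its size limit.

    warn_ratio is an assumed safety margin (80% of the limit by default).
    """
    limit = LIMITS[file_format]
    return size_bytes >= warn_ratio * limit
```

An administrator could run such a check against the local cache files to decide when pruning or a fresh Offline Archive operation is due.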


PST File Considerations

Archiving Requirements for PST Files


[Figure: PST files and the Exchange Server.]

PST File Considerations


In a typical organization, end-users' personal folders (*.PST files) reside on their local hard drives, network shares or a combination of both. Since PST files can only be archived from either a local drive of the Exchange Server or a network share connected to the Exchange Server, the PST files need to be collected from various places and moved to the appropriate centralized location for archiving. After the PST files have been moved to a centralized location, they can then be archived to secondary storage by the Exchange Mailbox Archiver Agent.
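Collecting PST files starts with finding them. The sketch below walks a directory tree and lists every file with a .pst extension. It is a generic illustration written for this guide, not the PstDiscovery tool or any other CommVault utility, and the find_pst_files name is hypothetical.

```python
import os


def find_pst_files(root):
    """Return a sorted list of paths to *.pst files under `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match the extension case-insensitively (e.g., MAIL.PST).
            if name.lower().endswith(".pst"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

The resulting list could then be reviewed before the files are moved to the centralized location used for archiving.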


Archiving Requirements for PST Files


Cannot be password protected
PST Folder structure must follow MAPI directory name structure
PSTs can be migrated with File Archiver
Recall into a PST file or Recovered Items folder of the mailbox

Archiving PST Files


There are two different methods for moving PST files to a centralized location accessible by the Exchange Server, which are described below: The preferred method is the PstDiscovery tool, available in your Resource Pack, which automates moving PST files to a centralized location for archiving. The destination share is specified as a parameter which is passed to the tool on the command line. It is recommended that the centralized location to which the PST files will be moved is a share that is not on the Exchange Server. If locating PST files is a problem, another utility is available called PST Hound. This utility can locate PST files throughout the enterprise. It is a Visual Basic script that can be manually run against servers and workstations. This can be done as part of a logon routine to discover PSTs on end user workstations. The PST Hound also provides information about who is accessing or has access to the PST file as well as the location of the file. The PST Hound is available with Professional Services only. For step-by-step instructions on using PSTDiscovery tool, see Books Online documentation. Ensure the following before archiving PST files: The PST files to be archived must reside on either a local drive of the Exchange Server or a network share connected to the Exchange Server. There must be no passwords for the PST files. CommVault Archiving and Content Indexing Course R00.2

Migration Archiving Module - 35

MAPI32.dll Requirement for the Exchange 2003/2007 Mailbox Archiver Agent

For PST support in ANSI format, you do not need to copy the MAPI32.dll, because the agent supports PST files in ANSI format by default. For PST support in UNICODE format, you must manually copy Outlook's MAPI32.dll into the <Software Installation Path>\Base installation folder prior to performing a PST archiving operation or PST recovery with this agent. Follow the instructions below to locate the correct MAPI32.dll for your version of Exchange:

o For Exchange Server 2007, ensure that Outlook 2007 is installed on the client, then copy the MAPI32.dll file from c:\windows\syswow64 into the <Software Installation Path>\Base installation folder.

o For Exchange Server 2003, ensure that Outlook 2003 is installed on the client, then from Registry Editor, locate the path listed for MSMAPI32.dll under the key HKEY_LOCAL_MACHINE\SOFTWARE\Clients\Mail\Microsoft Outlook\DLLPathEx, which is typically C:\Program Files\Common Files\System\MSMAPI\1033. Note the path where MSMAPI32.dll resides, because the correct UNICODE MAPI32.dll resides in the same folder. Do not copy MSMAPI32.dll; copy only MAPI32.dll from that folder into the <Software Installation Path>\Base installation folder. PST archiving operations and recoveries using the Exchange 2003 Mailbox Archiver Agent will then be correctly set up to run.
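
The Exchange 2003 path lookup above can be sketched in Python. This is a minimal illustration, not a CommVault utility: the function name and the install path used in the example are hypothetical, and the DLLPathEx value is assumed to have already been read from Registry Editor. It only computes the source and destination paths; it does not copy anything.

```python
import ntpath  # Windows path rules, usable on any platform


def unicode_mapi32_copy_plan(dll_path_ex: str, install_path: str) -> tuple[str, str]:
    """Return (source, destination) for the MAPI32.dll copy.

    dll_path_ex  -- the path noted from the DLLPathEx registry entry; it may
                    point at MSMAPI32.dll itself or at its folder.
    install_path -- hypothetical stand-in for <Software Installation Path>.
    """
    if dll_path_ex.lower().endswith(".dll"):
        folder = ntpath.dirname(dll_path_ex)
    else:
        folder = dll_path_ex
    # Copy MAPI32.dll from that folder, never MSMAPI32.dll.
    source = ntpath.join(folder, "MAPI32.dll")
    destination = ntpath.join(install_path, "Base", "MAPI32.dll")
    return source, destination


source, destination = unicode_mapi32_copy_plan(
    r"C:\Program Files\Common Files\System\MSMAPI\1033\MSMAPI32.dll",
    r"D:\ExampleInstall",  # hypothetical install location
)
print(source)
print(destination)
```

Keeping the path calculation separate from the copy makes it easy to confirm which file will be taken before anything is changed on the server.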

If the location for the PST files to be archived is a network share connected to the Exchange Server, then the MailboxAdmin user account associated with the agent (that was set up prior to agent installation) must have the necessary permissions to access those files on the share. Regardless of whether the location for the PST files to be archived is a local drive on the Exchange Server or a network share connected to the Exchange Server, the folders where the PST files reside must follow the same hierarchical structure as Exchange MAPI directory names.


Configuring Migration Archiving


Archive Agents
Default Archive Set
Subclients

Configuring Migration Archiving in the CommVault Software


Archive Agents

File System

Stub Recovery - Limits the number of recalls

Exchange
Profile and Mailbox Name
Auto Discovery
AD Server List
Customized Messages

Agents
Configuring and Using the Agent
From the General tab of the Properties page of the Data Archiver iDataAgent you can configure the following options.

Setting the Stub Recovery Parameters
Data Archiver for File System Agents can enable recall throttling to limit the maximum number of stubs that can be recovered by an agent within a specified timeframe.
Note: For the File Share Archiver for Network Storage, this feature can be enabled through the following registry entries on the computer that has the NAS client component installed:
GXHSM_limit
GXHSM_interval
GXHSM_cooldown

Filer tab (Network Appliance)
Used to enter authentication information for the domain where the network storage file server resides. This user must have read/write permissions on the CIFS share that contains the files.

Modifying Profile and/or Mailbox name (Exchange)
You can modify the name of the profile that is associated with the appropriate administrator mailbox for the following agents: Exchange Mailbox Archiver and Exchange Public Folder Archiver.
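
The recall throttling described under Setting the Stub Recovery Parameters can be pictured with a small Python sketch. The guide does not define the exact behavior or units of the three registry entries, so the semantics here are assumptions: at most `limit` recalls are allowed per `interval` seconds, and once that is exceeded, recalls are refused for `cooldown` seconds. The class and parameter names are hypothetical and only mirror the registry entry names.

```python
from collections import deque


class RecallThrottle:
    """Illustrative throttle; the semantics are assumed, not taken from the product."""

    def __init__(self, limit: int, interval: float, cooldown: float):
        self.limit = limit          # max recalls per window (assumed meaning)
        self.interval = interval    # window length in seconds (assumed unit)
        self.cooldown = cooldown    # refusal period in seconds (assumed meaning)
        self._recent = deque()      # timestamps of recalls inside the window
        self._blocked_until = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a stub recall at time `now` may proceed."""
        if now < self._blocked_until:
            return False
        # Forget recalls that have aged out of the window.
        while self._recent and now - self._recent[0] >= self.interval:
            self._recent.popleft()
        if len(self._recent) >= self.limit:
            self._blocked_until = now + self.cooldown
            return False
        self._recent.append(now)
        return True


throttle = RecallThrottle(limit=2, interval=10, cooldown=30)
print([throttle.allow(t) for t in (0, 1, 2, 20, 32)])
```

Passing the time in explicitly keeps the sketch deterministic and easy to reason about.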


Recall items to Recovered Items Folder for Outlook/Outlook Web Access (Exchange)
Specifies whether messages or items recalled from Outlook should be placed in the Recovered Items folder. When selected, data will be recalled to the Recovered Items folder. When cleared, data will be recalled to the original folder from which the data was archived.

CSVDE Filtering for Discovery Operations
A CSVDE filtering option is provided for administrators familiar with LDAP queries to increase performance for manual and auto-discovery operations in cases where the discovery process takes a long time to complete. Filtering objects in this way can reduce the number of objects returned from the query, thereby speeding up the process of discovering new mailboxes for data protection. This feature is supported by Exchange Compliance Archiver, Exchange Mailbox Archiver and the Exchange Mailbox iDataAgent. The following CSVDE filtering example filters out system mailboxes residing on the SERVERNAME server from discovery operations for these agents:

"(&(msExchHomeServerName=*/cn=SERVERNAME)(!(CN=SystemMailbox{*)))"

CAUTION: Use of this option requires expert-level knowledge of CSVDE, which is a Microsoft tool used for extracting and filtering information from Active Directory. Incorrect use of the CSVDE filtering option can result in failed discovery and data protection operations for these agents. For more information on CSVDE, refer to documentation from Microsoft Corporation.

Configuring the Active Directory Server List (Exchange)
For Exchange Mailbox Archiver you can specify or remove the domain name of one or more Exchange Servers containing mailboxes and accounts that reside in a non-default domain or in multiple domains. To specify or remove a domain name, open the properties page of the client computer and select the AD Server tab. You can add the domain name of the Exchange server(s) here.
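
The CSVDE filter example given earlier in this section follows standard LDAP filter syntax, so it can be assembled programmatically when the same filter is needed for several servers. The sketch below is only an illustration: the function name is hypothetical, and the escaping table covers the characters that are special in LDAP filter strings.

```python
# Characters that must be escaped inside an LDAP filter value.
_LDAP_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def discovery_filter(server_name: str) -> str:
    """Build the guide's example filter for one Exchange server name.

    Matches objects homed on the server, excluding system mailboxes.
    """
    escaped = "".join(_LDAP_ESCAPES.get(ch, ch) for ch in server_name)
    # "{{" in an f-string produces the literal "{" of "SystemMailbox{*".
    return f"(&(msExchHomeServerName=*/cn={escaped})(!(CN=SystemMailbox{{*)))"


print(discovery_filter("SERVERNAME"))
```

As the CAUTION in this section states, any generated filter should be reviewed by someone with expert-level CSVDE knowledge before it is used for discovery.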
Customized Messages (Exchange)
The Exchange Mailbox Archiver Agent and Exchange Public Folder Archiver Agent allow you to customize recovery messages that are displayed in Outlook during stub recovery operations. Messages that can be customized include the "Recovery in progress" message for stub recoveries from magnetic and tape libraries, as well as an error message for stub recoveries attempted for media outside the designated library.

SharePoint Server Name (SharePoint)
Displays the name of the SharePoint Server that is installed on the client computer. Use this space to modify this server name if the name displayed is incorrect (e.g., not the same as the Client name or Host name).

SharePoint Administrator Account (SharePoint)
Displays the SharePoint Administrator Account that has the required rights to create and modify SharePoint databases. This account will be used for data protection and recovery operations.


Default Archive set

File System Archiver

Create Subclients Manually

Exchange Mailbox Archiver

The purpose of Auto Discover

Default Archive Set


Data Archiver for File Systems
After installation of the Data Archiver Agents for File Systems, the system automatically creates a default Archive Set. For the Data Archiver Agent for Windows File System, a DataClassSet is also created if Data Classification is installed. For these agents, you must create a user-defined subclient in order to perform migrations on the client. To perform migrations using the agent with Data Classification, the user-defined subclient must be created from the DataClassSet backup set. Since the subclients contain the data to be migrated, users must take extreme care not to delete any subclients that are created. If a subclient is deleted, the data contained in that subclient will not be available to browse unless a point-in-time browse is performed using a date before the deletion of the subclient.

Default Archive Set in Exchange Mailbox Archiver
The Exchange Mailbox Archiver Agent also has a default Archive Set. This type of Archive Set is a complete set of subclients that contain all the Exchange Mailboxes on the client computer.


Archive Using the Data Classification Enabler
The Exchange Mailbox Archiver Agent can use the Data Classification Enabler to facilitate selecting data for archive operations.

Discovering and Assigning New Mailboxes (Auto Discover)
Over time, new mailboxes are created to accommodate new Exchange accounts. For the Data Archiver for Exchange Mailbox Agent, the system performs an auto-discovery process at the beginning of each migration operation. The auto-discovery process detects any new mailboxes and automatically assigns them to the default subclient (unless otherwise configured for the Exchange Mailbox Archiver Agent). While this feature ensures that all the new mailboxes will be automatically migrated, you may want to manually discover and assign new mailboxes to specific subclients yourself.

Exchange Mailbox Archiver Agents allow you to specify the assignment method that will be used to migrate newly discovered mailboxes for all subclients of a particular migration set. Auto-discovered mailboxes can be assigned based on storage group affinity, Active Directory user group affinity, or wildcard pattern (for user-defined subclients only). This feature ensures that all the new mailboxes will be automatically scanned for migration. For these agents, you must first enable auto-discovery at the Default Archive Set level and specify the assignment method, in order to auto-discover and assign new mailboxes.
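
The wildcard-pattern assignment method can be illustrated with a short Python sketch. It is a simplified model, not the product's matching engine: the subclient names and patterns are hypothetical, shell-style wildcards stand in for whatever pattern syntax the agent accepts, and "first matching subclient wins" is an assumption. Mailboxes that match no user-defined subclient fall through to the default subclient, as described above.

```python
from fnmatch import fnmatchcase


def assign_mailbox(mailbox: str, patterns_by_subclient: dict, default_subclient: str = "default") -> str:
    """Return the subclient a newly discovered mailbox would be assigned to."""
    name = mailbox.lower()
    for subclient, patterns in patterns_by_subclient.items():
        # Case-insensitive wildcard match against each pattern of the subclient.
        if any(fnmatchcase(name, pattern.lower()) for pattern in patterns):
            return subclient
    return default_subclient


# Hypothetical user-defined subclients and their wildcard patterns.
rules = {"sales": ["sales-*"], "executives": ["ceo", "cfo"]}
for mailbox in ("Sales-East", "cfo", "jsmith"):
    print(mailbox, "->", assign_mailbox(mailbox, rules))
```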


Subclients
Defining Content Archive Rules

Exchange specific

Archiving PST files

SharePoint specific

File Share Archive Options
Data Classification

Subclients
All Migration Archiver Agents
When creating and configuring a subclient you will need to establish the archiving rules for the subclient, in addition to specifying the subclient name, storage policy association and content. If you enable the Do Not Create Stub or Archive Files Only, Do Not Create a Stub archiving rule option, then a non-browse recovery operation is not possible and archived data must be recovered via the browse/recovery operation from the CommCell Console.

Exchange Mailbox Archiver
Subclient content is defined at the mailbox level. At the beginning of each archive operation, the auto-discovery process detects any new or unassigned mailboxes and automatically assigns them to the default subclient (unless otherwise configured) for archiving. When assigning mailboxes for auto-discovery using regular expressions, keep in mind that the option to Match Mailboxes by Regular Expressions Specified Below is not applicable for default subclients.
You must clear the Disable All Rules option and configure archiving rules before rules-based Archive Operations are possible. Otherwise, the system will ignore all archiving rules (except for the Stub Rules and Select Items with attachment(s) only for items added to the Archive List in Outlook) for the subclient. When archiving rules are disabled for Exchange Mailbox Archiver, the only way to perform migration archiving operations for the subclient is by manually adding messages to the Archive List through Outlook.

Exchange Public Folder Archiver
Subclient content is defined at the folder level. You must clear the Disable All Rules option and configure archiving rules for Exchange Public Folder Archiver; otherwise no data will be archived for the subclient.

File Archiver and File Share Archiver
Subclient content is defined at the folder/directory/volume level. Since default subclients are not supported, you will have to create a user-defined subclient after installing the agent in order to perform migration archiving operations. During subclient creation, you will specify which portion of the client's folders/directories/volumes will be scanned for archiving by the subclient. To support Data Classification, the associated subclient must be created from the DataClassSet Backup Set.
After creating a subclient, you must clear the Disable All Rules option and configure archiving rules before Archive Operations are possible. Otherwise, the system will ignore all archiving rules (except for the Do Not Create Stub rule) for the subclient. In that case, the only way to perform migration archiving operations for the subclient is by using the On Demand File List feature (not applicable for File Archiver for Unix).
For the File Share Archiver Agent, filtering is only supported for wildcard patterns and files on the Subclient Properties (Filters) tab. Paths cannot be filtered out.
For the File Share Archiver Agent, if your subclient content includes files from a Unix share (NFS share) and you perform a (migration) archiving operation on these files, the file names will become corrupted if they either include any characters not supported in Windows (including *, ?, <, >, or |) or contain more than the maximum number of characters allowed by Windows (255 characters). Therefore, after a recover operation, carefully examine the recovered files if any of them were added from a Unix share (NFS share).

SharePoint Archiver
Subclient content is defined up to the library folder level. After creating a subclient, you must clear the Disable All Rules option and configure archiving rules before Archive Operations are possible. Otherwise, the system will ignore all archiving rules (except for the Do Not Create Stub rule) for the subclient.
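
The NFS file name caution in the File Share Archiver notes above can be checked ahead of an archiving operation. The following Python sketch flags names that contain one of the characters listed in this guide or that exceed 255 characters. The function name is hypothetical, and the character set contains only the characters named above, which the guide presents as examples rather than a complete list.

```python
# Characters named in this guide as unsupported in Windows (not exhaustive).
WINDOWS_UNSUPPORTED = set("*?<>|")
MAX_WINDOWS_NAME_LENGTH = 255


def name_at_risk(file_name: str) -> bool:
    """Return True if an NFS file name may be corrupted when archived."""
    too_long = len(file_name) > MAX_WINDOWS_NAME_LENGTH
    bad_char = any(ch in WINDOWS_UNSUPPORTED for ch in file_name)
    return too_long or bad_char


for name in ("report.txt", "report?.txt", "a|b.log"):
    print(name, name_at_risk(name))
```

Running a check like this against an NFS share before archiving gives a list of files to examine closely after any later recover operation.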


Defining Content

Agent                            Type of Data                                 Default Subclient   Other Types of Subclients Supported by the Agent
Domino Mailbox Archiver          Lotus Notes client mailbox messages/         No                  Do Not Archive, User Defined
                                 Domino Server journaling mailbox messages
Exchange Mailbox Archiver        Exchange mailboxes/messages                  Yes                 Do Not Archive, User Defined
Exchange Public Folder Archiver  Exchange public folder messages/items        Yes                 User Defined
File Archiver for NetWare        NetWare files                                No                  User Defined
File Share Archiver              Filer data                                   No                  FPolicy Subclient, User Defined
SharePoint Archiver              SharePoint Documents                         No                  User Defined
File Archiver for Unix           Unix files                                   No                  User Defined
File Archiver for Windows        Windows Files                                No                  Proxy Stub Subclient, User Defined

Defining Content
You can define the content of the subclient. Most agents include a Configure button that displays a dialog where you can add or modify the data included as subclient content.

Auto-discovery of Mailboxes (Exchange)
The Exchange Mailbox Archiver Agent allows you to specify the assignment method that will be used to archive newly discovered mailboxes for all subclients of a particular archive set. Auto-discovered mailboxes can be assigned based on storage group affinity, Active Directory user group affinity, or wildcard pattern (for user-defined subclients only). This feature ensures that all the new mailboxes will be automatically scanned for archiving. For these agents, you must first enable auto-discovery at the archive set level and specify the assignment method, in order to auto-discover and assign new mailboxes.


Archive Rules

Migration Hierarchy
Archiving disabled by default
Volume high/low watermarks
File size, modification, or access dates
Attachments (Exchange)

Selection criteria for files

Rules for Data Classification (names, folders, types)

Archive Rules
When creating and configuring a subclient you will need to establish the archiving rules for the subclient, in addition to specifying the subclient name, storage policy association and content. If you enable the Do Not Create Stub or Archive Files Only, Do Not Create a Stub archiving rule option, then a non-browse recovery operation is not possible and archived data must be recovered via the browse/recovery operation from the CommCell Console.

The migration operation is based on subclients, so the following configuration is done on a per-subclient basis. The migration options are located in the Subclient Properties (Rules) dialog box.


Exchange Archive Rules


Individual Mailbox folders
Disk space watermark
Mailbox Quotas
Migrations from Outlook
Attachments
Message rules
Archiving PST rules

Exchange Archive Rules


Archiving Specific Root-Level Mailbox Folders
The Exchange Mailbox Archiver Agent offers you the option to archive specific root-level mailbox folders (and associated subfolders) instead of the entire mailbox. Since subclient content for this agent is defined at the mailbox level, all mailbox folders are scanned by default and data contained within them are archived according to the archiving rules. This feature offers a solution for further refining the mailbox content that you want archived without the need to apply exclusion filters.


SharePoint Archive Rules


Document Rules
User Rules
Special Considerations

SharePoint Archive Rules


Document Rules are applied to allow the archiving of documents based on a document's age, size, version, type, SharePoint user name, SharePoint site group name, etc. The document rules are split between three tabs: General, User, and Type. An "AND" condition is applied to the rules in each tab. The criteria in all three tabs must be satisfied collectively before documents are archived.

General Rules
For the SharePoint Archiver, rules relating to age, size, or version of the document can be defined from the Document Rule (General) subtab. The criteria configured here are used together with the User and Type criteria; all must be satisfied collectively before documents are archived.

Archive versions older than the last specified versions
This option defines the version and any earlier versions to be archived. The number specified here is how many versions of the document are counted back from the current version. Configuring version criteria is not supported when recovering items from non-version-enabled libraries.


User Rules
For the SharePoint Archiver, rules relating to including or excluding documents "created by" specific SharePoint individuals or users belonging to a particular SharePoint site group are defined from the Document Rule (User) subtab. The criteria configured here are used together with the General and Type criteria; all must be satisfied collectively before documents are archived.

Archive documents whose owner is listed below
This option displays the list of SharePoint users whose documents will be archived for this subclient. These users appear under the "Created By" column in the SharePoint UI. You can manually add a user or browse from a list of SharePoint users. When adding, be sure to include the domain name following this format (w2kex\spuser1).

Archive documents whose owner belongs to one of the site groups listed below
This option displays the list of SharePoint site groups whose documents will be archived for this subclient. You can manually add a site group or browse from a list of SharePoint site groups.

Type Rules
For the SharePoint Archiver, you can specify the type of document (e.g., .doc, .pdf, .gif) to be archived. This is defined from the Document Rule (Type) subtab. The criteria configured here are used together with the General and User criteria; all must be satisfied collectively before documents are archived.

Archive documents whose type is listed below
This option displays the list of document types that will be included in the archive operations for this subclient. When adding a type, be sure to include the dot before the file extension (e.g., .doc, .pdf, .gif).

Special Considerations
Before performing any archive procedure, review the following information:
At the completion of an archive operation, the SharePoint Archiver checks for any new Virtual Servers. An alert can be configured to restart Internet Information Server (IIS) Services when new Virtual Servers are detected.
If all items in a Library with a Major and Minor Version setting are being archived, the latest version will remain available and a stub will not be created.
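
The way the General, User, and Type criteria combine can be sketched in Python. This is a simplified model under stated assumptions: the class and field names are hypothetical, only age, size, owner, and type are modeled (version and site group rules are omitted), and an empty owner or type list is treated as "no restriction", which the guide does not state.

```python
import ntpath
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    age_days: int
    size_kb: int
    owner: str  # domain\user, as shown in the "Created By" column


@dataclass
class DocumentRules:
    min_age_days: int = 0                      # General tab (assumed form)
    min_size_kb: int = 0                       # General tab (assumed form)
    owners: set = field(default_factory=set)   # User tab
    types: set = field(default_factory=set)    # Type tab, with leading dot


def should_archive(doc: Document, rules: DocumentRules) -> bool:
    """All three tabs must be satisfied before a document is archived."""
    general_ok = doc.age_days >= rules.min_age_days and doc.size_kb >= rules.min_size_kb
    user_ok = not rules.owners or doc.owner.lower() in {o.lower() for o in rules.owners}
    extension = ntpath.splitext(doc.name)[1].lower()
    type_ok = not rules.types or extension in {t.lower() for t in rules.types}
    return general_ok and user_ok and type_ok


rules = DocumentRules(min_age_days=90, owners={r"w2kex\spuser1"}, types={".doc", ".pdf"})
print(should_archive(Document("plan.doc", 120, 40, r"w2kex\spuser1"), rules))
print(should_archive(Document("plan.gif", 120, 40, r"w2kex\spuser1"), rules))
```

The second document meets the General and User criteria but fails the Type criteria, so it is not selected.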


File Share Archive Options


FPolicy Subclient
Proxy Stub Subclient

Other Subclient Options


FPolicy Subclient
The FPolicy Subclient feature for File Share Archiver takes advantage of built-in monitoring services available in ONTAP 7.0 platforms called "FPolicy Screens" to allow archived data residing on a share to be recalled without the need for the File Share Archiver Client component to be installed on end-user workstations. It does this by eliminating the need for the GXHSM Recall Service to be running on end-user workstations connected to those shares in order to perform stub recalls.

To utilize this feature, you must first create an "FPolicy screen" service (using the name CVNSDM) from the ONTAP Console so that ONTAP monitors shares for transaction requests. You must also configure the subclient from the CommCell Console to enable the FPolicy Subclient option and provide the Filer Name and a domain administrator level User Name, in order to recall the data contained within subclients that have FPolicy enabled.

NOTES
Data is always archived on a CIFS share.
For recalls from CIFS clients, the volume or qtree does not need to be a mixed mode type on the filer.
For recalls from NFS clients, the volume or qtree must be a mixed mode type on the filer. Refer to vendor documentation for instructions on configuring mixed mode.
We recommend using the File System iDataAgent that is installed on the same client as the File Share Archiver Agent to back up the stub files residing on CIFS shares. This can be done by specifying the UNC path for the CIFS share, where the stub files reside, as subclient content for that File System iDataAgent. Using any other File System iDataAgent to back up the stubs will execute unintended recalls from the stubs instead of backing them up. The NAS NDMP iDataAgent can be used to back up the stubs without executing unintended recalls.
Share names that are exported from NetApp filers should not contain the '$' symbol.

Proxy Stub Subclient
The Proxy Stub Subclient feature allows the File Archiver for Windows Agent to archive and recover data residing on an EMC Celerra File Server running Dart 5.5, without the need to install the File Share Archiver Agent or Client component. A specially configured subclient of the File Archiver for Windows Agent, referred to as a Proxy Stub Subclient, serves as a proxy to conduct recalls of archived data from the EMC Celerra File Server.

To utilize this feature, you must first configure the EMC Celerra File Server and IIS to communicate with the proxy computer hosting the File Archiver for Windows Agent. After the initial setup, you then create the Proxy Stub Subclient on the File Archiver for Windows Agent from the CommCell Console and provide authentication credentials in the Agent Properties in order to monitor shares for transaction requests.

This feature requires the following combination of components:
A File Archiver for Windows Agent with a Proxy Stub Subclient configured
A license for the Proxy Stub Subclient feature
This must be a 32-bit server.

NOTES
The file system name and the CIFS share name must be the same for successful archiving operations with the Proxy Stub Subclient. For example, if the file system name is vtlib01, then the CIFS share name must also be vtlib01. When mapping the CIFS share from a Windows computer, using this example, it would appear as: \\<celerra_computer_name>\vtlib01


Data Classification
Overview
Operation
Rules

Mikes Excel Spreadsheets

Financial Documents

Classified data

Data Classification
Data Classification is an independent enabler that can be used in conjunction with several agents, including File Archiver for Windows. The Data Classification Enabler helps maximize the scan speed, resulting in faster archive operations.


Data Classification - Overview


Define data by attributes
Highly fault tolerant
File selection not restricted to location

Data Classification Overview


Data Classification greatly enhances a significant data management process in the industry: the "Information Life Cycle Management" of data, which is the process of determining the "treatment" of data over its life cycle. Traditionally, decisions regarding such treatment have been made based on the location of the data, typically folders/files. However, the folder/file structure suggests that the associated data belongs to one class or category and is therefore not very conducive to categorizing data for classification purposes. By eschewing traditional folder/file boundaries and by searching for common attributes, the Data Classification Enabler provides a much more powerful treatment of data.

The Data Classification Enabler is a feature whose purpose is to enhance or "enable" agent capabilities beyond their standard scope. The enhancements that the enabler brings to the agent include:
Enhanced and improved scan capabilities and speeds
Ability to extend the rules for which data archiving takes place beyond traditional files and folder paths. Supported for the File Archiver for Windows Agent (Local File System instance) and File Archiver for Unix Agent.

Data Classification is designed to be robust and fault-tolerant. Some of the ways the enabler accomplishes this include the following:
It automatically recreates its metadata if the database is deleted or compromised.
It automatically re-scans if the initial scan does not complete.
It automatically resynchronizes with the system data if the services are interrupted.
It automatically detects and scans new volumes as they come online.

Data Classification on Windows can attempt to scan all the affected volumes even if the Data Classification scan fails on one volume. Therefore, the fallback scan methods (including Classic File Scan and Change Journal), if available, will be used only on those volumes where Data Classification is not accessible.

The following components can be used with the Data Classification Enabler on Unix:
File Archiver for Unix Agent
Unix File System iDataAgents

The following components can be used with the Data Classification Enabler on Windows:
Exchange Mailbox Archiver
File Archiver for Windows Agent (Local File System instance)
Microsoft Exchange Server 2003/2007 Mailbox iDataAgents
Online Content Indexing for Exchange
Online Content Indexing for Windows File System iDataAgent
Windows File System iDataAgents
SRM Windows File System Agent

Supported Data Types
For Windows, NTFS volumes on local disks are supported. For Exchange, all mailbox contents except journal mailbox contents are supported. The Data Classification Enabler on UNIX supports the following file system types:

Supported File System(s)               Platform Associated with the Enabler
Enhanced Journal File System (JFS2)    AIX
Extended 2 File System (ext2)          Linux
Extended 3 File System (ext3)          Linux
Unix File System (UFS)                 Solaris
VERITAS File System (VxFS)             Solaris
'X' File System (XFS)                  Linux
Zettabyte File System (ZFS)            Solaris

For UNIX, you can add or delete the file system types to be monitored. To use a Data Classification Enabler scan, you must ensure that the Data Classification Enabler software is installed on the client.


Data Classification - Operation


Monitors Change Journal
Creates database for each volume
Support for NTFS volumes only
Administration Utility

Data Classification Operation


Once the Data Classification Enabler is installed and enabled, it performs an initial data collection of all the data and then creates SQL-like (meta) databases. Once this initialization is completed, the supported components can use Data Classification.

Database Considerations

Exchange
To administer Exchange data, the Data Classification Enabler uses two processes: enumeration and sink. The enabler uses enumeration to log in to the Exchange Server, parse each Exchange mailbox on the server, and create a map of the data in the Data Classification database. The enabler uses sink to hook to the Exchange server in order to capture the state changes (events) of the mailbox contents and to record this information in the Data Classification transaction logs. Once these logs reach the specified maximum size, they are consumed in the Data Classification database. As such, the Data Classification database includes a record of the data change events and a corresponding time stamp for each event.

UNIX
Each meta database contains information about the files in the associated volume. Thereafter, the Data Classification service constantly monitors all the files on these volumes, and it detects new volumes at a prescribed time interval. The service updates the databases and keeps track of the updates (e.g., file additions, content updates to files, etc.) made to each database; in effect, this provides an almost real-time view of the data in the system.


By default, the meta database is located at the root of each mount point, and it is named .db.cv (e.g., for /home, it would be /home/.DATACLASS_1/.db.cv). Journals from the FSF driver are used to keep track of the updates to each meta database.

Windows
Each meta database contains information about the files in the associated volume. Thereafter, the Data Classification service constantly monitors all the files on these volumes, and it detects new volumes at a prescribed time interval. The service updates the databases and keeps track of the updates (e.g., file additions, content updates to files, etc.) made to each database; in effect, this provides an almost real-time view of the data in the system.
By default, the meta database is located at the root of each NTFS volume or mount point, and it is named [volume]_db.db (e.g., for C:\, it would be c_db.db). For mount points, the database name is [mountpoint]_db.db (e.g., for a mount point C:\mountpoint, the file is mountpoint_db.db, and it resides in the C:\mountpoint directory). Change Journal is used to keep track of the updates to each meta database. Data Classification works with NTFS volumes but not with FAT volumes. New volumes that are added to your system are automatically recognized.

Space and Performance Considerations
The meta databases created by Data Classification usually consume about 5% of the total space on the hard disk. Depending on the type of data and folder layout, the metafiles may consume additional space.
For Data Classification on Unix, each Data Classification update record consumes about 256 bytes (this assumes an average short name length of 16 bytes and an average full path length of 256 bytes).
For Data Classification on Windows, you can administer the size of the Data Classification databases by using the DC_CREATE_INDEX registry key. This key also allows you to administer other items associated with Data Classification, such as the time required for database initialization as well as backup and archiving speed for some agents.
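
The Windows naming convention for the meta database can be expressed as a short Python function, which is handy when excluding these files from other tools or checking disk usage. The function name is hypothetical, and the handling of nested mount points (using the last path component) is an assumption, because the guide only shows a single-level example.

```python
import ntpath  # Windows path rules, usable on any platform


def windows_meta_db_path(volume_or_mount_point: str) -> str:
    """Return the default meta database path for an NTFS volume or mount point."""
    drive, tail = ntpath.splitdrive(volume_or_mount_point)
    tail = tail.strip("\\/")
    if not tail:
        # Volume root: C:\ -> C:\c_db.db
        volume_letter = drive.rstrip(":").lower()
        return ntpath.join(drive + "\\", f"{volume_letter}_db.db")
    # Mount point: C:\mountpoint -> C:\mountpoint\mountpoint_db.db
    mount_name = ntpath.basename(tail)
    return ntpath.join(volume_or_mount_point.rstrip("\\/"), f"{mount_name}_db.db")


print(windows_meta_db_path("C:\\"))
print(windows_meta_db_path("C:\\mountpoint"))
```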


Data Classification - Rules


Folders/Files
Owned By
File Paths
SQL Query Strings

Data Classification - Rules


Rules and Queries for the Data Classification Enabler
Data Classification uses the traditional rules established for the File Archiver for Windows Agent. In addition, it provides several unique rules. These rules are available only when Data Classification is enabled for the supported agent. All the rules for Data Classification are configurable from the DataClassSet subclient properties Rules tab of the File Archiver for Windows Agent.

Whenever a DataClassSet subclient is created, a default set of rules is established. These rules are reflected in the various rules tabs. The SQL query string, which has its own tab, is automatically disabled; however, the default set of rules is formulated into the SQL query string. At this point, you can start changing rules either from the various rules tabs or within the SQL query string. If you change rules from the various rules tabs, these rules will be formulated into the SQL query string (which is still disabled). However, if you choose to enable and edit the SQL query string, the rules reflected in the string will be enforced, and the rules previously established by the various rules tabs will be disabled. Subsequently, if you disable the rules in the SQL query string, the rules that were previously in effect via the various rules tabs will be enforced.

Each DataClassSet subclient has its own rules, and these rules can be shared by other DataClassSet subclients; however, changing the rules in one DataClassSet subclient will not affect the rules in another DataClassSet subclient.

File Archiver for UNIX Agent

This agent can use the Data Classification Enabler to define archiving rules based on file attributes and not just on volumes and basic attributes, such as size and modified times. For example, you can use Data Classification to define the agent's subclient content to contain all files starting with 'A', all files modified after a specific date, etc. You can make the associated queries for these and more complex definitions by issuing SQL database-like commands from the CommCell Console against the metadata databases.

File Archiver for Windows Agent Local File System Instance

This agent can use the Data Classification Enabler to define archiving rules based on file attributes and not just on volumes and basic attributes, such as size and modified times. You can make the associated queries for these and more complex definitions by issuing SQL database-like commands from the CommCell Console against the metadata databases.

This agent can also use the Data Classification Enabler to support domain users and user groups. You can authenticate the users whose files you want to archive against the Active Directory domain. Using Data Classification for this purpose is especially useful when you are archiving data for user groups across multiple volumes. Data Classification can archive data for users in these groups using rules that you define, without your having to specify the exact paths to find this data.

Online Content Indexing

Online Content Indexing can content-index various data that are scanned or selected by the Data Classification Enabler.

UNIX File System iDataAgents

These agents can use the Data Classification Enabler to improve the scan speed of file system data before data management operations. If the enabler is not available, Classic File Scan is used to scan the data. Scans using Data Classification for these agents must be enabled from the CommCell Console.

Windows File System iDataAgents

These agents can use the Data Classification Enabler to improve the scan speed of file system data before data management operations. If the enabler is not available, Change Journal or Classic File Scan is used to scan the data. Scans using Data Classification for these agents must be enabled from the CommCell Console.
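To give a feel for the kind of "SQL database-like" selection described for the File Archiver agents (files starting with 'A', files modified after a specific date), here is a sketch that uses Python's built-in sqlite3 module. The table name file_meta, its columns, and the query text are assumptions made for illustration; they are not the Data Classification metadata schema or query syntax.

```python
import sqlite3


def select_archive_candidates(rows, cutoff_date):
    """Return paths whose name starts with 'A' or that were modified after cutoff_date.

    rows: iterable of (path, name, owner, modified) tuples, with modified
    given as an ISO date string (YYYY-MM-DD) so string comparison orders by date.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical metadata table, for illustration only.
    conn.execute(
        "CREATE TABLE file_meta (path TEXT, name TEXT, owner TEXT, modified TEXT)"
    )
    conn.executemany("INSERT INTO file_meta VALUES (?, ?, ?, ?)", rows)
    # Note: SQLite's LIKE is case-insensitive for ASCII letters by default.
    cursor = conn.execute(
        "SELECT path FROM file_meta "
        "WHERE name LIKE 'A%' OR modified > ? "
        "ORDER BY path",
        (cutoff_date,),
    )
    result = [row[0] for row in cursor]
    conn.close()
    return result
```

A rule based on ownership would simply add a condition on the owner column in the same WHERE clause.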


Migration Archive Process


Migration Phase
Stubbing Phase
The On Demand Archive


Migration Phase

Archive Job Phases


Moving the data
Stubbing
Removing old stubs

Migration Archive Process


After configuration is completed, the recovery administrator can schedule migration archiving jobs, similar to scheduling backup jobs for iDataAgents. To ensure that only successfully archived messages will be changed into stubs, the migration archiving operation is divided into two phases: archiving and stubbing. The stubbing phase starts after the archiving phase succeeds. However, stubs will only be created if the subclient properties were configured to create them during migration archiving operations. In the archiving phase, DataArchiver will archive the messages/items meeting the pre-set archiving criteria and, if applicable, put them into a list for the stubbing phase and prune expired stubs. During this phase of the migration archiving operation for the Exchange Mailbox Archiver Agent, messages and folders that were added to the Archive List in Outlook will also be included in the migration archiving operation. Information about each archived item is placed into a stub that can then be used as a link to recover the item.


Stubbing Phase

Replaces the actual files
Stub contains the location of the file
Has special attributes
Usually has a unique icon

Stubbing
When files are archived, the system removes the data and leaves in place of each file a stub that contains information about the location of the actual file, including the identity of the archive file and the offset location in the chunk. Basically, the "stub" is a Windows file with special attributes. If a recall of a file is initiated through Windows Explorer, the index cache is not necessary to locate the file because the location information is contained within the stub. The index cache is only used if the recall is initiated through the CommCell Console.

Stub Retention

By default, stub retention length is decided by the retention of the archived messages/items associated with the stub. After the messages/items on the media (all the copies) have expired, the stub will be pruned in the next migration archiving operation. Users can also specify a value in days for stub retention time, from the subclient properties Archiving Rules tab. Keep in mind that the stub could be pruned earlier than this value if the storage policy retention time is shorter than this value.

Stub retention does not apply when using the Erase Archived Data feature, since it causes the permanent erasure of stubs. When migrated data is erased, it cannot be recovered, regardless of the stub retention rules.
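The interaction between the user-specified stub retention and the storage policy retention amounts to taking the earlier of two dates. The sketch below illustrates that arithmetic only; the function and parameter names are hypothetical, storage_policy_days is assumed to stand for the longest retention across all copies, and actual pruning still waits for the next migration archiving operation after the returned date.

```python
from datetime import date, timedelta
from typing import Optional


def earliest_stub_prune_date(
    archived_on: date,
    storage_policy_days: int,
    stub_retention_days: Optional[int] = None,
) -> date:
    """Return the earliest date on which a stub becomes eligible for pruning."""
    # Default behavior: the stub lives as long as the archived data it points to.
    data_expiry = archived_on + timedelta(days=storage_policy_days)
    if stub_retention_days is None:
        return data_expiry
    # A user-specified stub retention cannot outlive the archived data,
    # so the shorter of the two periods wins.
    stub_expiry = archived_on + timedelta(days=stub_retention_days)
    return min(stub_expiry, data_expiry)
```

For example, a 365-day stub retention combined with a 30-day storage policy retention yields a prune date 30 days after archiving.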


On Demand Archive
Define data from an external list
Archive data regardless of rules defined
Provides maximum flexibility

On Demand Archive
On Demand Data Protection Operations allow content to be specified as an external input at the time of initiating the archive operation. Whereas traditional archive operations are performed on subclients, which have fixed content configured prior to performing the operation, On Demand Archive Operations allow you the flexibility of specifying content each time you perform the archive operation. Although the concept is the same, the implementation differs somewhat based on the type of agent, as described below.

Windows/Unix/Macintosh File System iDataAgents

Content for on demand backups is specified through the use of a Directive File and one or more Content Files, which define data for a subclient of an On Demand Backup Set.

NAS NDMP iDataAgents

Content for on demand backups is specified directly through the use of a Content File (no Directive File is required), which defines data for a subclient of an On Demand Backup Set.

File Archiver for Windows

Content for on demand data protection operations can be defined through either of the following methods:
o On Demand File List, which can work along with existing subclients and rule processing to archive data that you specify.
o On Demand Archive Set, which uses a specialized subclient dedicated to this purpose and does not perform rule processing.

Exchange Mailbox Archiver

Content for on demand archive operations is specified through the use of a Migration List via Outlook.

You are allowed to use multiple Content Files for an On Demand operation, provided that they are listed in the Directive File. For NAS NDMP iDataAgents and the File Archiver for Windows Agent, which do not use a Directive File, only one Content File is used.

Content File Location

For NAS NDMP iDataAgents, the Content File must be placed on the CommServe; in all other cases, the Content File must be placed on the client.
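The relationship between a Directive File and its Content Files can be sketched as a two-level lookup. This example assumes a simple layout that the guide does not spell out: each file is plain text with one fully qualified path per line, the Directive File listing Content Files and each Content File listing the data to protect. The function names are hypothetical.

```python
from pathlib import Path


def read_list_file(path):
    """Return the non-empty lines of a text file, with whitespace stripped."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def expand_directive_file(directive_path):
    """Collect the content paths named by every Content File in a Directive File.

    Assumed layout (illustrative): the Directive File lists Content File
    paths, one per line; each Content File lists content paths, one per line.
    """
    content = []
    for content_file in read_list_file(directive_path):
        content.extend(read_list_file(content_file))
    return content
```

For agents that take a single Content File and no Directive File, read_list_file alone corresponds to the whole operation.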


Recovering Archived Data


Recover data through the CommCell Console
Recovering data from Stubs
Persistent pipe vs. a recovery operation

Recovering Archived Data


File Archiver Agents

The File Archiver Agents support the following types of recoveries:
Recovery of archived files per file paths provided using the CommCell Console.
Browse and Recovery of archived files using the CommCell Console.
Non-Browse Recovery of archived files from stubs, using third-party applications such as Windows Explorer, Unix Terminal or Console Window, and NetWare System Console.

Recoveries can be performed in-place or out-of-place, and in certain scenarios cross-application recoveries and recoveries to a network drive or NFS-mounted file systems are also supported. Recoveries using file paths for the File Archiver Agents can be performed from the archive set level in the CommCell Browser. Depending on the agent, browse and recovery operations for these agents can be performed from the client, agent, and subclient levels in the CommCell Browser.

Exchange Mailbox Archiver Agent

The Exchange Mailbox Archiver Agent supports the following types of recoveries:
Browse and Recovery of archived mail messages, including messages within archived PST files, using the CommCell Console and Outlook Add-In.
Non-Browse Recovery of archived mail messages from stubs, using Outlook and Outlook Web Access (OWA). Outlook also supports the recovery of messages within archived PST files.

Exchange Public Folder Archiver Agent

The Exchange Public Folder Archiver Agent supports the following types of recoveries:
Browse and Recovery of archived public folder items using the CommCell Console.
Non-Browse Recovery of archived public folder items from stubs, using Outlook and Outlook Web Access (OWA).

Recoveries can be performed in-place or out-of-place, and in certain scenarios cross-application and cross-platform recoveries are also supported. Browse and recovery operations for Exchange Migration Archiver Agents can be performed from the client, agent, and archive set levels in the CommCell Browser.

Persistent Recovery

Multiple stub recoveries from magnetic media or tape are submitted to the Job Controller as one job. For such stub recoveries, only one job (called a Persistent Recovery job) will display in the Job Controller. However, the Event Viewer and Job History log will treat the jobs as separate jobs (using the same Job ID associated with the common open pipeline). Also, the job will wait for approximately 5 seconds in order to allow other stub recovery requests being submitted on the same client to be batched into the same job. It is worth noting that stub recoveries from magnetic media are faster than those from tape, because the pipeline remains open for up to 20 minutes of idle time, allowing quicker recovery and avoiding the time needed to find and load tapes.

The stub recall history is viewable at the client and agent levels in the CommCell Console and is associated with the first instance that was created in the File Archiver agent. Stub recall history can be turned on by creating the nDMRSendFileStatus key and setting the value to 1.
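The effect of the open pipeline on how many Persistent Recovery jobs appear can be illustrated with a simple grouping of request times. This is a simplified model and not the product's scheduling logic: it ignores the time spent actually recovering data, treats idle time as the gap between consecutive requests, and does not model the approximately 5-second batching wait separately because such short gaps always fall inside the idle window. The function name and default are hypothetical, with the default taken from the 20-minute figure above.

```python
def group_into_persistent_jobs(request_times, idle_timeout=20 * 60):
    """Group stub recall request times (in seconds) into persistent jobs.

    A request joins the current job if it arrives no more than idle_timeout
    seconds after the previous request; otherwise the pipeline is assumed
    to have closed and a new job is started.
    """
    jobs = []
    for t in sorted(request_times):
        if jobs and t - jobs[-1][-1] <= idle_timeout:
            jobs[-1].append(t)  # pipeline still open: reuse the same job
        else:
            jobs.append([t])    # pipeline closed (or first request): new job
    return jobs
```

Under these assumptions, requests at 0, 3, and 600 seconds share one job, while a request at 2000 seconds arrives after more than 20 idle minutes and starts a second one.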


Archiving Tools

File System Tools: FileSystemDMPredictor, FSDMRestubbing, StubCopy

SharePoint Tools: SPDMPredictor

Lotus Notes Tools: LNDMPredictor

MS Exchange Tools: ExMBDMPredictor, DMEPublishForm, OutlookAddinADMTemplate, PSTDiscovery, PST Hound

Archiving Tools
There are several tools that are offered to help with archiving data. Some tools help the administrator predict the number of files that qualify for a set of rules. Others are offered to manage stubs, and still other tools are available to help with archiving PST files. The following tools, with the exception of PST Hound, are all located on the resource pack.

File System Tools

FileSystemDMPredictor

Although this tool is listed in the resource pack as FileSystemDMPredictor, it is actually run as GXHSMPredictor.exe. It allows you to enter archive parameters for a specified volume or folder to predict how many files would be archived, and their combined total size. The output log file defaults to Comma Separated Values (*.csv) format. The tool also creates a list of files that would qualify for archive. This resembles the collect file for our archive process that lists all the files. After the tool has been run, check the Windows Temp directory for a file called collect.tot. If the predictor tool is run again using the same path and file name for the output log file, the previous log file will be overwritten. GXHSMPredictor can be run even if you don't have CommVault software installed, and it is release independent.
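The general idea behind a predictor (scan a folder, count the files that meet the archive parameters, total their size, and write a CSV report) can be sketched as follows. This is not GXHSMPredictor and does not reproduce its parameters or output layout; the function name, the two rules (minimum size and minimum age), and the CSV columns are assumptions chosen for illustration.

```python
import csv
import os
import time


def predict_archive(root, min_size_bytes, min_age_days, report_path, now=None):
    """Count files under root that meet hypothetical size and age rules.

    Writes the qualifying files to a CSV report and returns
    (file_count, total_size_bytes).
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # files modified at or before this qualify
    qualifying = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            if info.st_size >= min_size_bytes and info.st_mtime <= cutoff:
                qualifying.append((path, info.st_size))
    # Opening with mode "w" overwrites any previous report at the same path.
    with open(report_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "size_bytes"])
        writer.writerows(qualifying)
    return len(qualifying), sum(size for _path, size in qualifying)
```

Keeping the report outside the scanned folder avoids counting an earlier report as a candidate on a later run.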


FSDMRestubbing

This tool is listed in the resource pack as FSDMRestubbing, but the executable is actually named GXHSMRestub.exe. This tool allows a user to run restubbing without running a normal file system archive job. If the restubbing feature in File System Archiver is enabled, when a job runs and finds a file that has previously been stubbed but not modified since it was last stubbed, it will replace the file with a stub and not archive it again. Some users may not want to run an archive job at all, and may just want files that have been recovered but not modified to be replaced with stubs. When this tool is executed, it will find and replace qualifying files with stubs.

StubCopy

The StubCopy.exe utility allows you to copy or move a stub without forcing a recall for the Archiver Agent for File System. If the location you copied or moved the stubs to is not currently defined as a File Archiver subclient, you can manually add a registry key to monitor the volume to which the stub was moved. The location and value of the "Drives" registry key are detailed in the Registry Keys appendix.

SharePoint Tools

SPDMPredictor

This tool is used to predict the number (and cumulative size) of documents that can be archived by the CommVault SharePoint Data Archiver product. This tool provides the user with a forms screen where they can enter the rules (filters) that are used as criteria for choosing a document or document version for Data Archival. These rules are exactly the same as those available from the Data Archiver agent. The user is presented with a browse screen to choose the root node from which documents are recursively scanned and checked against the Data Archival filter criteria.

This node can be one of the following:
WebServer Level (all Virtual Servers hosted by the web server are scanned)
Virtual Server Level (all Top Level Site collections hosted by that Virtual Server are scanned)
Top Level Site collection Level (all SubSites hosted by that Top Level Site collection are scanned)
SubSite Level (all Lists hosted by that SubSite are scanned)
List Level (all Documents of that List are scanned)

After scanning through the defined node and its children, the tool generates an output (both as a pop-up window and as an XML file) that specifies the details of the space savings that can be accomplished by Data Archiver.


Lotus Notes Tools

LNDMPredictor

This utility is used to predict the amount of local Domino server email data that will be archived based on specified rules. All results and any potential error messages will be output to the text file that is entered in the "Output (results) file:" edit box.

MS Exchange Tools

ExMBDMPredictor

Users can use this utility program to predict the amount of local Exchange Server data that will be archived based on specified rules. This program logs error and information messages into <TEMP>\ExMBDMPredictor.log.

DMEPublishForm

This utility is used to publish Outlook forms to the Exchange server for Exchange Mailbox Archiver. This utility needs to be run on a machine where Outlook 2000 (or above) of the desired language is installed. The Organizational Form Library (OFL) with the same language must exist on the Exchange server (the OFL must be created if it does not exist), and the logon user must have owner permission to the OFL.

OutlookAddin-ADMTemplate

The CVOutlookAddin.adm custom administrative template file can be used to push registry key settings out to client computers to configure the Outlook Add-In installation, using the Active Directory Group Policy. This administrative template file should be deployed in conjunction with the Outlook Add-In installation package.

PSTDiscovery

PSTDiscovery.exe allows you to scan a local hard drive for PST files, and then copies them to a network share that is provided.

PSTDiscoveryGui.exe runs on the Exchange Server where the PST files were copied, and allows the user to then manage the PST files for archiving into the local Exchange server.

PSTHound

The PST Hound utility is designed to find and report on PST files throughout the enterprise. The Hound is a Visual Basic script that can be executed manually against servers and workstations but is best used as part of a logon routine for the discovery of PST files on end user workstations. Deploying the Hound as a logon routine provides administrators with information about who is accessing or has access to the PST file rather than just the name and location of the file.

All of these tools, with the exception of PSTHound, are available on the resource pack. Along with the tools is an HTML document detailing the configuration and usage of each tool.


Best Practices
Use Filters extensively to avoid archiving essential data
Configure Anti-Virus to allow proper recall service
Agent Specific considerations

Best Practices
Application files/folders, magnetic mount path folders, databases and mount path folders should not be archived by the File Archiver Agents and must be manually filtered out.

To avoid stub recall failures for File Archiver for Windows, ensure that your anti-virus software is configured to allow the GXHSM Recaller service and associated driver to function properly without being blocked. For more information, refer to the following KB articles on the Maintenance Advantage website:
10945 Using McAfee Anti-virus 8.0i with Windows File System DataMigrator Agent
10948 Using Symantec AntiVirus 10.0 with Windows File System DataMigrator Agent
10941 Using Symantec AntiVirus 10.1 with Windows File System DataMigrator Agent
10946 Using Symantec AntiVirus 9 with Windows File System DataMigrator Agent
10944 Using eTrust Anti-virus with Windows File System DataMigrator Agent
10947 Using Trend Micro ServerProtect with Windows File System DataMigrator Agent

Before you can run a stub recovery operation of encrypted data (i.e., using pass-phrase security) from Windows Explorer, Unix Terminal Window, or NetWare System Console, you must export the pass-phrase to the local computer.


Agent Specific

For File Archiver for Windows:

If you want to enable the scanning and archiving of hidden and system files, set the value of the GXHSMIFINDDISABLEHIDDEN registry key to N.

Although scanning and archiving of read-only files is supported without the use of a registry key, if you want to enable stubbing and recall capabilities for read-only files, then the value of the GXHSMSTUBPERSERVEREADONLY registry key must be set to Y.

Stub files are created as sparse files by default so that the file system can utilize space more efficiently. However, if you would like to configure the system to create stub files that are not sparse files, you can create the GXHSMSTUBCREATESPARSE registry key and set the value to N. See GXHSMSTUBCREATESPARSE in Registry Keys for more information.

You can determine how to handle file migration archiving with regard to the sparse attribute. You can choose either to ignore the sparse attribute when archiving files or to archive only files without the sparse attribute set. See GXHSMIGNORESPARSE in Registry Keys for more information. However, if you have drives that are being monitored by VERITAS Storage Exec, ensure that GXHSMIGNORESPARSE is enabled so that subsequent migration archiving jobs, and not just the initial migration archiving job, complete successfully.

You can determine how to handle ACLs on a stub with regard to migration archiving. You can choose either to replace the original ACLs on the stub after migration archiving or to preserve the original ACLs on the stub. See GXHSMSERVICERESTOREACLS in Registry Keys for more information.

If you run an archive job using a non-DataClassSet subclient, the Data Classification install folder (if present) will not be excluded from the job unless you either have installed the Data Classification Enabler in the default location (under Program Files) or filter out the folder from the archive job. Therefore, the easiest way to prevent this problem is to install the Data Classification Enabler in the default location.

If the stub cache is enabled, and if a stub file is recalled and the ACLs on the file are then modified, the changes to the ACLs are not picked up during the next migration archiving operation unless the file itself is modified. This is because when a stub file is recalled under this scenario, a copy of the stub is put in the stub cache, and the stub is copied back from the stub cache during the next migration archiving operation. The agent looks at the modified file time to determine if the file data has changed, and ACLs are not tracked by the modified file time when the stub cache is being used. As a workaround, modify the data in the file itself to pick up the changes made to its ACLs.


For File Archiver for UNIX/Windows Agents:

In order to speed up migration archiving operations, the software can attempt to restub files under the following conditions:
The file must have been previously archived and then recalled by a non-browse recovery (i.e., stub recovery).
The stub for this file must exist in the "stub cache directory".
The Archive index must still exist in the system (i.e., was not removed by a data aging operation).
The file must not have been modified after it had been recalled.

In this case, during migration archiving, the system will simply swap the file for the stub in the stub cache, thereby speeding up the operation by not transferring or storing the file again. For File Archiver for Windows, this feature is enabled by default; however, it can be disabled by setting the GXHSMMAINTAINSTUBCACHE registry key value to N. For File Archiver for Unix, this feature is disabled by default; however, it can be enabled by setting the nUSE_STUB_CACHE registry key to Y.

For the File Share Archiver Agent:

If you perform a (migration) archiving operation on files added from a Unix share (NFS share), the file names for these files will become corrupted if the file names either include any characters not supported in Windows (including *, ?, <, >, or |) or contain more than the maximum number of characters allowed by Windows (255 characters). Therefore, after a recover operation, carefully examine the recovered files if any of them were added from a Unix share (NFS share).

The client hosting the File Share Archiver Agent must either be in the same domain in which the filer resides or be in a different domain that has a trust relationship with the filer domain. Otherwise, migration archiving operations will fail.

For File Archiver for NetWare:

Ensure that the migration archiving attribute is set on the NSS volume in order for files to be archived.

The software will report the same value for the size of the file(s) archived and the size of the disk. Thus, to determine the actual amount of space freed by the archive operation, you must compare the amount of free space available on the appropriate volume before and after the archive operation.
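The four restubbing conditions listed for the File Archiver for UNIX/Windows Agents combine into a single yes/no check. The sketch below only restates those conditions in code; the class and field names are hypothetical, and how the agent actually determines each fact is not shown.

```python
from dataclasses import dataclass


@dataclass
class RecalledFile:
    """Facts about a file, named after the four conditions (hypothetical fields)."""

    recalled_by_stub_recovery: bool  # previously archived, then recalled via a stub
    stub_in_cache: bool              # its stub exists in the stub cache directory
    archive_index_exists: bool       # index not removed by a data aging operation
    modified_after_recall: bool      # file changed since it was recalled


def can_restub(f: RecalledFile) -> bool:
    """True only when every restubbing condition holds."""
    return (
        f.recalled_by_stub_recovery
        and f.stub_in_cache
        and f.archive_index_exists
        and not f.modified_after_recall
    )
```

A file that fails any one of the checks would go through a normal archive operation instead of a stub swap.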


Module Summary

Key points to remember

Summary
Migration Archiving your data can have many benefits, including freeing up production storage and more efficient backups.
The CommVault Archiving solution supports a wide variety of operating systems and applications, including email.
Data is moved off production storage while still being accessible to users and applications.
Data Classification is available to access data in a more efficient and flexible manner, for example by file type or ownership.
As with any CommVault product, configuration and usage are easy to understand.


Compliance Archiving

www.commvault.com/training


Overview
Why Compliance Archiving?
Setting up Exchange for Archiving
Configuring Compliance Archiving
Compliance Archive Process
Retrieving Archived Messages

Compliance Archiving Module


Why Compliance Archiving?


Preserve data outside of the operational environment.

Store large amounts of data and review at a later time.

Why Compliance Archiving?


The Exchange Compliance Archiver agent is designed for long-term storage and indexing of data and to meet security and compliance standards. The primary function of Compliance Archiver is to preserve data outside of the operational environment. It removes the data from the source client once it has been archived and/or indexed. In this way, large amounts of data can be stored and reviewed at a later time. This works with the Microsoft Exchange Journaling feature.

What Exactly Is Journaling?

Journaling is the ability to record all communications in an organization. E-mail communications are one of many different forms of communication that you may be required to journal. Therefore, journaling in Exchange has been developed to enable the email administrator to feed messaging data into a larger journaling solution, while using minimum overhead.

It is important to understand the difference between journaling and archiving. Journaling is the ability to record all communications; archiving, by contrast, refers to reducing the strain of storing data by backing it up, removing it from its native environment, and storing it elsewhere. Exchange journaling can be used as a tool in your e-mail retention or archival strategy.


Setting up Exchange for Archiving


Considerations for Exchange Environment
Types of Journaling
Enabling Journaling

Setting up Exchange for Archiving


Considerations for Exchange Environment

Compliance Archiver for Exchange works in conjunction with the message journaling feature of Microsoft Exchange Server software to archive all incoming and outgoing messages and attachments. This is done at the Exchange Message Store level. All incoming messages and outgoing messages are captured in the Exchange Journal Mailbox, which is then archived with the Exchange Compliance Archiver agent according to the schedules you set. Incoming messages are written to the journal as they come into the Exchange Server. Therefore, all messages are recorded, regardless of whether the message recipient deletes the message in his or her mailbox.

When journaling is enabled in a mailbox database and a user sends a message, the server generates two messages: one for the recipients and one for the journal recipient. When a message is submitted to a journalized mailbox database, the mailbox database processes the message as it typically would to deliver it, but it also creates a message for the journaling recipient.

When a journalized mailbox database receives a message, most of the time, the message has been journalized already. In the receive case, extra processing (beyond reading the journaling property) is required only when the receiving server is the expansion server for the distribution list or when the distribution list is hidden or query-based.


Therefore, you can estimate the effect of journaling on a mailbox database by assuming that the enabled mailbox database can process approximately half of the messages being sent, as long as all other conditions, such as CPU power, bandwidth, storage space, and disk speed, remain constant.

Note: This approximation is just a starting figure for planning purposes. Only complete testing in a lab environment that closely resembles your production environment can provide a more accurate evaluation.

Types of Journaling

There are three different types of journaling that you can enable in Exchange Server 2003. However, we will only discuss two of them, because the third option (Bcc journaling) captures less data than envelope journaling.

Message-only journaling creates a copy of all messages and the corresponding P2 message header data to and from users on a mailbox database and sends the message copy to a specified mailbox. The P2 message header contains only the message recipient data that the sender declared to the recipients. If an external message is received from the Internet, Exchange journals the P1 message headers. The P1 message header is the address information that is used by message transfer agents (MTAs) to route mail. By default, when message-only journaling is enabled, Exchange does not account for blind carbon copy (Bcc) recipients, recipients from transport forwarding rules, or recipients from distribution group expansions.

Envelope journaling differs from message-only journaling and Bcc journaling because it permits you to archive transport envelope information (P1 message headers). This includes information about the recipients who actually received the message, including Bcc recipients and recipients from distribution groups. Envelope journaling delivers messages that are flagged to be archived by using an envelope message that contains a journal report together with the original message. The original message is delivered as an attachment. The body of the journal report contains the transport envelope data of the archived message.

Although three different journaling methods exist, the majority of regulations that require journaling will likely require envelope journaling for compliance. Therefore, unless specifically noted, all discussions about journaling in this guide refer to envelope journaling in an Exchange Server 2003 environment (or Exchange 2000 SP3 with the envelope journaling software update).

Where Journaling Does Not Work

Exchange does not journal the following scenarios and data types:
Posts to public folders: Journaling cannot be enabled on public folder stores.
Mail sent to external contacts: External contacts (users or distribution lists) cannot be journalized. Exchange journals a record of the sender's mail that lists the external contacts, but if a distribution list external to the sending Exchange organization is listed, the recipients on the external distribution list will not be recorded as recipients.
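The planning estimate given earlier in this section (a journaled mailbox database handles roughly half the sent messages, because each sent message produces a second copy for the journal recipient) is simple arithmetic. The helper below is a hypothetical sketch of that starting figure only; the function name and the sample rate are made up, and lab testing remains the only way to get a realistic number.

```python
def journaled_capacity(baseline_msgs_per_hour, copies_per_message=2):
    """Rough planning figure for a journaled mailbox database.

    baseline_msgs_per_hour: messages the database handles without journaling.
    copies_per_message: messages generated per send when journaling is on
    (one for the recipients plus one for the journal recipient).
    Assumes CPU, bandwidth, storage space, and disk speed stay constant.
    """
    return baseline_msgs_per_hour / copies_per_message
```

With a hypothetical baseline of 10,000 sent messages per hour, the starting estimate with journaling enabled is 5,000 per hour.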


Enabling Journaling

By default, envelope journaling is disabled. Enabling envelope journaling involves two steps:
1. Enable standard journaling in Exchange System Manager.
2. Enable envelope journaling.

Note: If you are running Exchange 2000 and you want to enable envelope journaling, you must install Service Pack 3 and the Exchange 2000 envelope journaling software update.

Enabling Standard Journaling

Before you enable envelope journaling, you must enable standard journaling on each mailbox store in your organization for which you want envelope journaling enabled. It is recommended that you designate a dedicated Exchange server as the journaling server. Additionally, if you use a dedicated journaling server, you do not have to enable standard journaling on the server. Enable journaling only on those servers with mailbox stores for which you want to journal.

Enabling Envelope Journaling

You can enable envelope journaling by using the exejcfg.exe tool or by manually setting the last bit on the Exchange organization heuristic objects. You can run the tool from any server with access to Active Directory, but it is recommended that you run the tool from a domain controller. The exejcfg.exe tool is available in the Exchange Server 2003 SP1 download in the i386\RTW directory and can be used in Exchange 2000 or Exchange 2003 environments.


Configuring Compliance Archiving

Compliance Agent

Target Mailboxes

SubClient
Journal Mailbox(es) Deleting Archived messages

Configuring Compliance Archiving


Configuring and using the Agent

After installation of the Exchange Compliance Archiver Agent, a user-defined subclient must be created before any archive operations are performed. No default subclient is created with this Agent.

General Tab

Modifying the Profile Name - You can modify the name of the profile that is associated with the appropriate administrator mailbox.

Modifying the Mailbox Name - You can modify the name of the mailbox that is associated with the appropriate administrator profile.

Use CSVDE for Discovery - When checked, this option instructs the system to use CSVDE filtering for discovery operations on the selected agent. A CSVDE filter must be entered into the corresponding entry space.

Modifying the Exchange Server Name - For all Exchange-based agents, you can modify the name of the Exchange server that is installed on the client computer without having to manually edit the registry. This feature is useful in situations where the Exchange Server Name has changed since the iDataAgent installation or where multiple network interface cards (NICs) are being used on the same client.

Threshold for number of messages in Journal Mailbox - Specifies the number of messages in the Journal Mailbox at which a warning must be generated. If the number of messages in the journal mailbox exceeds the specified number, the system generates an event message and, if configured, an alert. This field requires that the Enable Threshold option be selected first.

Target Mailboxes Tab

The journal mailbox is the target for Compliance Archiver to monitor and conduct archiving. This mailbox is designated in the contents of the subclient.

A discovery filter safeguard is used in the Compliance Archiver subclient content discovery process to filter out accounts not assigned to an archiving target. (This can be overridden with a registry key to discover all mailboxes.)

Customers who wish to employ selective filtering can use Exchange rules to redirect selected messages to a second collection folder, which makes it possible to use a different subclient's storage policy.

For data retrieval, a list of Target retrieval mailboxes is presented for selection. Normal discovery is used to select mailboxes from the Exchange server with the Compliance Archiver iDataAgent. Retrieval creates a new folder in the selected mailbox and copies all selected messages to that folder. This folder is at the same level as the Inbox and is annotated with Retrieval job information.

AD (Active Directory) Server Tab

For Compliance Archiver for Exchange, you can specify or remove the domain name of one or more Exchange Servers containing mailboxes and accounts that reside in a non-default domain or in multiple domains.
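The journal mailbox threshold described under the General Tab can be pictured with a minimal Python sketch. It only illustrates the documented rule (a warning when the message count exceeds the threshold, and only when Enable Threshold is selected). The names `JournalThreshold` and `journal_warning` are hypothetical and are not part of the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JournalThreshold:
    enabled: bool      # stands in for the Enable Threshold option
    max_messages: int  # stands in for the threshold field


def journal_warning(message_count: int, setting: JournalThreshold) -> Optional[str]:
    """Return warning text when the count exceeds the threshold, else None."""
    if not setting.enabled:
        return None  # the threshold field has no effect unless enabled
    if message_count > setting.max_messages:
        return (f"Journal mailbox holds {message_count} messages, "
                f"above the threshold of {setting.max_messages}")
    return None
```

Note that the comparison is strictly "exceeds", so a mailbox sitting exactly at the threshold produces no warning in this sketch.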
Configuring and using Subclients

Since Compliance Archiver for Exchange Agents do not support default subclients, you must create a user-defined subclient after installing the agent in order to perform archive operations. During subclient creation, you specify which Exchange message journal mailboxes will be archived by the subclient. Compliance Archiver subclients are similar to Data Archiver subclients, but they do not apply rules or filters: they consume and, if configured to do so, delete all objects found in the journal mailbox.


General Tab

Create a New Index for Archive Operations - You can specify the interval for index creation for archive operations on this subclient. Note that this setting can be overridden by selecting the Create New Index option on the Advanced Backup/Archive Options (Data) tab. If overridden, the new index interval counter starts again at zero.

Content Tab

Contents of Subclient - To choose the mailboxes you wish to collect data from, use the Configure button on the Contents tab. Normal discovery is used to select the journal mailbox(es) from the Exchange server.

Delete archived messages after successful archive operations - This option specifies whether to delete the contents of the mailbox after the data has been successfully archived.

Archive All Mailboxes

The Exchange Compliance Archiver discovers and archives Journaling mailboxes defined through the Exchange Server. (The journal mailbox captures new incoming and outgoing messages.) If you want to archive messages that exist outside of the journal mailbox, i.e., messages that were sent and/or delivered before archives of the journal began, you must archive the non-journaling mailboxes on the server. To do this, create the FindAllMailboxes registry key on the computer on which the Exchange Compliance Archiver is installed. Once this key has been created, you can discover all mailboxes and add them as subclient content to be archived.

Be sure that the Delete archived messages after successful archive operations option is NOT selected when configuring subclient content for this operation. Otherwise, all of the messages that are archived will be deleted from the mailboxes. This option should only be selected when archiving journaling mailboxes.

Archives of the existing messages and mailboxes (those created prior to implementing the message journal mailbox) need only be done once, after which new messages are captured by and archived through the message journal mailbox.
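The warning about combining FindAllMailboxes discovery with the delete option lends itself to a simple pre-flight check. The following Python sketch is only an illustration of that rule. The `Mailbox` and `SubclientContent` types and the `mailboxes_at_risk` helper are hypothetical and do not correspond to any CommVault interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mailbox:
    name: str
    is_journal: bool  # True only for Exchange journal mailboxes


@dataclass
class SubclientContent:
    mailboxes: List[Mailbox]
    delete_after_archive: bool  # "Delete archived messages after successful archive operations"


def mailboxes_at_risk(content: SubclientContent) -> List[str]:
    """List non-journal mailboxes whose messages would be deleted after archiving."""
    if not content.delete_after_archive:
        return []
    return [m.name for m in content.mailboxes if not m.is_journal]
```

An empty result means the subclient follows the guidance above: either the delete option is off, or every mailbox in the content is a journal mailbox.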


Compliance Archive Process


Phases Scheduled or On Demand Retention considerations

Compliance Archive Process


Once you have discovered the Exchange Journal mailboxes and configured a subclient, you can perform immediate and scheduled archive jobs, similar to performing and scheduling backup jobs. Archive operations move the contents of the subclient to the archive media. The purpose of an archive is long-term retention of data and compliance with regulations.

Phases

The Archive Job runs in several phases:

1. Create Index - Checks whether the index is available for the job to run. A new index is created if the advanced option Create new index is selected, or when a new index is due according to the Create new index every n Archive Operations setting in the subclient properties.
2. Archive - The agent collects the data and copies it to the archive media.
3. Archive Index - Writes the index to the cache.
4. Cleanup - Deletes the messages from the journal mailbox, if configured to do so.

Running Archive jobs

After configuring the data to archive, you must run an archive job. This can be done on demand (immediately) or on a schedule. Archive jobs have many of the same options as regular backup jobs, so an administrator who is familiar with the backup process will find the archive process easy to understand.
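To make the order of the archive job phases concrete, here is a small Python sketch that walks through them for an in-memory list of messages. It is a teaching aid only: the function name, its parameters, and the rule that a new index is due once the counter reaches the configured interval are assumptions, not the product's actual logic.

```python
from typing import List, Tuple


def run_archive_job(journal: List[str], ops_since_new_index: int, new_index_every: int,
                    force_new_index: bool = False,
                    delete_after_archive: bool = False) -> Tuple[List[str], List[str], int]:
    """Walk the four phases; return (phase log, messages left in journal, updated counter)."""
    log = []

    # Phase 1: Create Index (new index if forced or if the interval has been reached)
    if force_new_index or ops_since_new_index >= new_index_every:
        log.append("Create Index: new index created")
        ops_since_new_index = 0  # the interval counter starts again at zero
    else:
        log.append("Create Index: existing index reused")

    # Phase 2: Archive (copy the data to the archive media)
    archived = list(journal)
    log.append(f"Archive: {len(archived)} messages copied to archive media")

    # Phase 3: Archive Index (write the index to the cache)
    log.append("Archive Index: index written to cache")

    # Phase 4: Cleanup (delete from the journal mailbox only if configured)
    if delete_after_archive:
        remaining = []
        log.append(f"Cleanup: {len(archived)} messages deleted from journal mailbox")
    else:
        remaining = list(journal)
        log.append("Cleanup: nothing deleted")

    return log, remaining, ops_since_new_index + 1
```

Calling the function repeatedly and feeding the returned counter back in shows how the "every n Archive Operations" setting would periodically trigger a fresh index under these assumptions.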


Advanced Archive Options

You can choose to apply Advanced Archive options to your operation. The advanced archive options provide media management tools at the operation level, as well as tools to optimize your archive operations for specific circumstances.

Retention considerations

Storage policy configuration is very important when using the Exchange Compliance Archiver Agent. The retention period of the storage policy associated with the Exchange Compliance Archiver subclient determines how long archived data is retained. The storage policies used by the Exchange Compliance Archiver Agent have infinite retention periods by default, so the data is never pruned. Take care when adjusting the retention period of these storage policies.


Retrieving Archived Messages

Retrieve data
Performing Compliance Searches from Outlook Add-In
Retrieve Destinations

Retrieving Archived Messages


Retrieving Messages

The Exchange Compliance Archiver Agent supports the following types of retrieve operations:

Browse and Retrieve of archived mail messages to a Target mailbox.
Retrieve of archived mail messages to a PST file.

All retrieves can be performed in-place, out-of-place or cross-application. For the Exchange Compliance Archiver Agent, retrieve operations can be performed from the client and agent levels in the CommCell Browser.

Retrieving Messages to a Target Mailbox

Retrieve operations are similar to out-of-place restores of other iDataAgents. Archived messages are retrieved into target mailboxes. You must set up at least one target mailbox before retrieving archived data. Archived data resides on the storage resources associated with the storage policy for the subclient containing the messages to be retrieved.

When retrieving data, the selected archived messages are copied from the archive media into the specified target mailbox. Exchange Compliance Archiver creates a new folder in the target mailbox, annotates and time-stamps it with the retrieval job reference, and deposits all the contents of the retrieval within that folder.

The system provides a powerful set of retrieve options that enable you to specify and recover only the messages you need. In addition to searching for a particular message, the Exchange Compliance Archiver Agent also allows you to search by content if the archived data has been content indexed. Furthermore, you can retrieve a sample of the archive data by specifying a sample number when searching archived files.


Module Summary

Key points to remember

Summary
Compliance Archiver for Exchange allows the administrator to monitor all incoming and outgoing email by taking advantage of the native Exchange feature of journaling messages.

Provides Email Discovery and Compliance.
Archives data from Journal Mailboxes.
Supports multiple subclients. Multiple Journal mailboxes can belong to a single subclient if desired.
Default discovery finds only Journal mailboxes. The FindAllMailboxes registry key can be used to discover any or all mailboxes.


Content Indexing

www.commvault.com/training

CommVault Archiving and Content Indexing Course R00


Overview
Preliminaries Planning Content Indexing Engines Maintaining a Content Index Engine Content Indexing Operations Best Practices

Content Indexing Module


Preliminaries
Why Content Indexing? Functional Architecture & Terminology Content Indexing Roles Content Indexing Data Flow Planning Content Indexing

Offline Online


Why Content Indexing?

Data Discovery
Compliance Search Lost Data Search

Data Mining
Search current and previous versions of a file Search across various data types

Why Content Indexing?


Data Discovery

Electronic data discovery is quickly becoming mainstream in civil discovery. Recent surveys confirm that more than 90 percent of all documents produced since 1999 were created in digital form. Aggressive law firms are now seeking computer-generated evidence, especially in cases related to defamation, trade secret and intellectual property theft, sexual harassment in the workplace, fraud, breach of contract, divorce proceedings and spoliation of evidence. Even in small personal injury auto cases, defense attorneys are going after e-mail and other electronic evidence related to wage and injury claims.

Data Mining

Data mining is the practice of sorting through large amounts of data and picking out relevant information. Trends, patterns, and statistical analysis can be used to project the future and facilitate business decisions. Access to data residing in backup or archive storage is essential for this type of analysis.


Functional Architecture & Terminology

[Diagram: the CommServe; a Media Agent reading Storage Policy Copy data on disk or tape (OFFLINE); a Client with the OCI Agent and DCE (ONLINE); and the Content Indexing Engine, consisting of the Admin Node and the Search & Index Node with FIXML.]

Functional Architecture & Terminology


Content Indexing Node - Single host performing any of the Content Indexing functions.

Content Indexing Engine/Cluster/Cloud - Collection of CI Nodes functioning as a single index.

Admin Node - Host functioning as data source status/distribution/query results server for the CI Engine. Only one Admin Node can exist within a Content Indexing Engine.

FIXML (Fast Index XML) - Document database upon which the index is based. The FIXML database can be backed up, restored, or moved and the index regenerated. Each Search/Index Node in a Content Indexing Engine has a FIXML database.

Search/Index Node - Host with the ability to create/query an index from a FIXML database. If multiple Search/Index Nodes are used, each Node can index/search only its allocated portion of the data.

Offline - Storage Policy based source of data. Data resides within a library and is accessed via a Media Agent for both indexing and viewing.

Online - Client based source of data. Data resides on the client and is accessed via an Agent for indexing and a share path for viewing.

OCI Agent - Online Content Indexing Agent. Software installed on a Client host to collect data for indexing.

DCE - Data Classification Enabler. Required component for Online Content Indexing. Pre-collects metadata for content indexing.

Notice there is no direct CommServe management control of the Content Indexing Engine. All control is exercised internally or via the web interface on the Admin Node. Each Engine and its associated Nodes appear in the CommCell Console under Storage Resources. The Content Indexing nodes are Clients of the CommCell Browser through the File System iDataAgent.


Content Indexing Roles

The administrative node performs the following functions: Content Distributor role Query Results Server role Status Server role The Search/Index nodes perform the following functions: Search Processing Index Processing

Content Indexing Roles


The administrative node performs the following functions:

Content Distributor role - Dispatches incoming data to be Content Indexed to the appropriate document processing pipeline(s).

Query Results Server role - Performs query and results processing and presents the results to the search interface.

Status Server role - Maintains a database of the Content Indices in the cloud. In a multi-node cloud, files are distributed for indexing across the index nodes in batches. The batches are sent to the index nodes in a serial fashion, creating a round-robin load-balanced environment. The status server maintains information about the state and location of documents in the index.

The Search / Index nodes perform the following functions:

Search Processing - Queries processed by the Query Results Server are passed to the Search server to be executed against the indices.

Index Processing - After a file is processed by the Content Distributor, it is sent to an index engine for processing. The index engine processes the document through the necessary pipelines, resulting in a searchable index.
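The round-robin batching described for a multi-node cloud can be illustrated with a short Python sketch. It splits a file list into batches, hands the batches to nodes in turn, and records where each file went, which loosely mirrors the Status Server's location tracking. The function names, the batch size, and the node names are illustrative assumptions. The actual engine's batching rules are not documented here.

```python
from itertools import cycle, islice
from typing import Dict, Iterable, Iterator, List, Tuple


def make_batches(files: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size files."""
    it = iter(files)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def distribute(files: Iterable[str], nodes: List[str],
               batch_size: int) -> Tuple[Dict[str, List[List[str]]], Dict[str, str]]:
    """Send batches to index nodes in turn; return (batches per node, file -> node map)."""
    if not nodes or batch_size < 1:
        raise ValueError("need at least one node and a positive batch size")
    per_node: Dict[str, List[List[str]]] = {node: [] for node in nodes}
    location: Dict[str, str] = {}
    # zip stops as soon as the batches run out; cycle repeats the node list forever
    for batch, node in zip(make_batches(files, batch_size), cycle(nodes)):
        per_node[node].append(batch)
        for name in batch:
            location[name] = node
    return per_node, location
```

With five files, two nodes, and a batch size of two, the first node receives the first and third batches and the second node receives the second batch.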


Content Indexing Data Flow

Content Indexing Data flow


Submission - Content is submitted to the Content Indexing Admin Node via a Media Agent (offline data) or Content Indexing Agent (online data).

Normalization - Index-able data is normalized into an internal XML format for processing.

Processing - Metadata such as language, ownership, etc. is determined and attached to the document. The processed document is stored in the FIXML data store.

Indexing - Indices are created for the various fields in the extended normalized document.

Organizing - For faster query responses, documents are organized into logical collections based on source, data type, and owner/access.
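As a rough mental model of the stages above, the Python sketch below normalizes a document into XML, attaches metadata, stores the result in a dictionary standing in for the FIXML data store, and then builds a per-field word index and logical collections from that store. The XML layout, the field names, and all function names are invented for illustration. The real internal format is not described in this guide.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def normalize(doc_id: str, text: str) -> ET.Element:
    """Normalization: wrap the raw text in a simple XML document."""
    root = ET.Element("document", id=doc_id)
    ET.SubElement(root, "body").text = text
    return root


def process(doc: ET.Element, metadata: Dict[str, str], store: Dict[str, str]) -> None:
    """Processing: attach metadata fields, then save the document to the store."""
    for key, value in metadata.items():
        ET.SubElement(doc, key).text = value
    store[doc.get("id")] = ET.tostring(doc, encoding="unicode")


def build_index(store: Dict[str, str]) -> Dict[Tuple[str, str], Set[str]]:
    """Indexing: map (field, word) pairs to the ids of documents containing them."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for doc_id, xml_text in store.items():
        for field in ET.fromstring(xml_text):
            for word in (field.text or "").lower().split():
                index[(field.tag, word)].add(doc_id)
    return index


def organize(store: Dict[str, str]) -> Dict[Tuple[str, str, str], List[str]]:
    """Organizing: group document ids by (source, data type, owner)."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for doc_id, xml_text in store.items():
        root = ET.fromstring(xml_text)
        key = (root.findtext("source"), root.findtext("data_type"), root.findtext("owner"))
        groups[key].append(doc_id)
    return groups
```

Because the index is derived entirely from the stored XML, it can be rebuilt from the store at any time, which matches the guide's remark that the index can be regenerated from a FIXML database.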


Planning Offline Content Indexing


Minimum System Requirements Installation Options Sizing Considerations Performance Considerations Location Considerations

Planning Offline Content Indexing


Minimum System Requirements


Windows 2003 Server
Dual/Quad-core Intel 5100 processor (32 Bit)
4GB RAM (8GB recommended)
High Speed SCSI Drives
1GB Free Disk Space

Minimum System Requirements


The Content Indexing Engine is a resource-intensive application and should be installed on a dedicated physical server. Running it in a virtual machine (VMware, Microsoft Virtual Machine, etc.) is not recommended, as this may reduce system performance below acceptable levels. For the same reason, it is also recommended that the Content Indexing Engine not be installed on a computer running other applications, such as Microsoft Exchange Server or an Oracle database.

A Windows 2003 Enterprise Server is specified in the requirements documentation because of its expansion capability. A Windows 2003 Standard Server can be used with the understanding of its limitations in RAM and CPU count. Simpana suite 8.0 currently supports only Windows 2003 as the host operating system. Other operating systems may be added in the future.

The Content Indexing Engine requires that the clock is kept in sync and not abruptly corrected forwards or backwards. Avoid manual clock adjustments and consider using professional software for keeping clocks adjusted and in sync. Set the server Time Zone to either GMT or UTC and always uncheck the option Automatically adjust clock for daylight saving changes on the Time Zone settings.

It is recommended that anti-virus software not be run on the servers running the Content Indexing Engine. If anti-virus software must be installed as a business requirement within your environment, you must configure exclude rules within the anti-virus software to avoid scanning the Content Indexing data directories.

Avoid having the Windows Indexing Service running on the Content Indexing data directories. The Windows Indexing Service will index all the content on the server, which can lock up files used by data search and cause serious problems.

If possible, install the Content Indexing Engine on a separate physical disk from the one Windows is running from (e.g., C:\). Do not place the paging file or system directories on this disk. The optimal configuration is to install the Content Indexing Engine on a striped disk array (RAID 0).

Up to 4000 sequential ports can be used to support concurrent document indexing between nodes in the Content Indexing Engine.


Installation Options

Single Node vs. Multi Node

Plan for growth Consistent locations Performance considerations Capacity for growth

Folder Locations

Install Options


Single Node vs. Multi Node

Single Node vs. Multi Node


Single Node installations place all services on a single server. Single node installations should be limited to small or test deployments where the total number of objects to be indexed will remain under 10,000,000 and the end user query load will remain minimal. It is important to account for the environment's growth rate when determining object and load counts. Unless the pool of data to be indexed will remain static, a single node installation is NOT recommended for production use.

A multi node installation is recommended for all production deployments. Multi node installations leverage a dedicated Administrative node and up to 8 Index / Search nodes.

Admin Node

The administrative node processes are resource intensive, requiring substantial processing power and memory, as dictated by the size and volume of objects to index. The administrative node performs the following functions:

Distributes ingested data to the appropriate document processing pipelines.
Processes queries and returns results to the search page.
Maintains a database about the state and location of documents in the index.
Hosts both the search and compliance search web pages.

Search / Index Nodes

The Search / Index nodes process data received from the Content Distributor to create the searchable index. The index engine is not resource intensive, but does require SCSI disk to handle the I/O demand on search. The Search Engine can be set up to allow for load balancing based on the following criteria:

number of documents per search engine node
input data rate
network characteristics
query rates that the system is set up to provide


Folder Locations

Installation and Index Profile

Common directories used by all CI nodes Fixml and data directories, shared by all nodes

Data Path

Best Practices

Folder Locations
Three folders are maintained in an Indexing environment, all of which are specified during install. These folders must reside on high speed local disk and should not be placed on disks where the System and/or Page file partitions reside. It is advisable to place the installation and data directories on separate physical disks to provide optimal performance. 15k SCSI disks for index storage are recommended for all deployments.

Index and Install Profile path - This path contains the installation and index profile xml files. The install profile defines the architecture of the indexing cloud, such as the number of index nodes and the location of data and installation directories. This path should be consistent on all nodes for easy expansion and recovery.

Data path - This path contains the fixml and data directories. The fixml directory contains an xml representation of all data that has been indexed. The data directory maintains the actual binary index. This path should be on the fastest and largest volume available and be consistent on all nodes.

Install path - This path contains the program files required to perform all functions in the indexing cloud.

Best Practice

Do not install the Content Indexing Data Path on the System Drive.
Do not install the Content Indexing Data Path on the same volume as pagefile.sys.
Use the same location for FIXML on all nodes.

Sizing Considerations

Volume of data (# of objects)
Amount of Indexable text (Avg size per object)
Query Rate
Index Retention
Rate of Growth

[Diagram: File Size divided into Indexable Text and Non Indexable format code or graphics.]

Sizing Considerations
Volume of data (# of objects) - Each Search & Index node can handle approximately 15 million objects. The actual number of objects depends on the index-able content of each object.

Amount of Index-able text (Avg size per object) - The average file size of the objects within the Content Index is a key factor in sizing a Content Index server cloud. When evaluating the typical file size in a given environment, it is critical to know whether the body of the file contains a large amount of text content. Large files with minimal text content require less storage and processing power than files containing mostly textual content in the body, such as Microsoft Office documents.

While the index is being built, the index process consumes disk space equal to about two and one half times the size of the finished index. Once complete, the index is about 35% of the ingested data size. For example, if 500 GB of data is ingested for indexing, the finished index consumes only 175 GB (35% of 500 GB), but roughly 435 GB of disk space (about 2.5 x 175 GB) is required while the index builds.

Optimal performance is achieved with SCSI attached disk on the search / index nodes, but SAN disk can also be utilized.

Understanding your use case and requirements is vital in designing a system that will easily grow with your needs. Adding Search / Index nodes to a cloud is much smoother if there is an existing Search / Index node to replicate, as opposed to splitting the Search / Index node from the Admin node.
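The sizing figures above can be turned into a quick back-of-the-envelope calculator. The Python sketch below is only an estimate under stated assumptions: it treats the two-and-one-half factor as a multiple of the finished index size (the reading that matches the 500 GB example's figures of roughly 435 GB and 175 GB), uses the approximate 15 million objects per Search & Index node, and uses the limit of 8 Index / Search nodes quoted for multi node installations. The function and constant names are hypothetical.

```python
import math
from typing import Dict, Union

INDEX_FRACTION = 0.35          # finished index as a share of ingested data
BUILD_FACTOR = 2.5             # assumed to multiply the finished index size
OBJECTS_PER_NODE = 15_000_000  # approximate capacity of one Search & Index node
MAX_NODES = 8                  # Index / Search nodes in one multi node installation


def size_index(ingested_gb: float, object_count: int) -> Dict[str, Union[float, int, bool]]:
    """Estimate disk needs and Search / Index node count from the guide's rules of thumb."""
    final_index_gb = ingested_gb * INDEX_FRACTION
    build_gb = final_index_gb * BUILD_FACTOR
    nodes = max(1, math.ceil(object_count / OBJECTS_PER_NODE))
    return {
        "final_index_gb": final_index_gb,
        "build_gb": build_gb,
        "nodes": nodes,
        "exceeds_one_engine": nodes > MAX_NODES,
    }
```

For 500 GB of ingested data this gives a finished index of about 175 GB and a build-time requirement of about 437.5 GB, which the guide rounds to 435 GB. Real requirements depend heavily on how much index-able text the objects contain.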


Query Rate - The number of queries per second the system must support.

Index Retention - Indexes are usually retained for the life of the data. Hence, long-term retention requires larger index capacity.

Rate of Growth - Indexes are usually incremental in nature, capturing new or changed objects. If your data is constant, the index will be constant in size. Dynamic data sources with a significant number of new or changed objects will grow the index accordingly.

Notes on Space Requirements

Local disk requirements vary based on the type of files being Content Indexed. On average, uncompressed file server data (Office documents) generally requires 5 - 15% of the file size in Content Index space. Email can require 50 - 100% or more, for three reasons: email is heavily text based; email is stored in archive form highly compressed, so the percentage appears higher; and a large amount of metadata is additionally Content Indexed. The Content Index size is directly related to the word density in the files being Content Indexed.

At times during normal maintenance of the indexes, 2x or more the size of the index at that time will be required. This allows the Content Index to remain searchable while the maintenance processes (index consolidation/pruning) are performed.

You can reduce the disk space requirement by setting proper retention settings for the Content Indexes. For example, if the vast majority of searches will be performed within 90 days of the data entering the system, you can set the retention settings so that the Content Indexes expire after 90 days. Any data that is still retained can be re-Content Indexed if there is a special need to search it after its Content Indexes have expired.


Performance Considerations

Query Load Performance

Max # Queries/sec

Index / Search Load Performance Tunable Resources


RAM CPU Nodes Disk Speed Network

Performance Considerations
Query Load Performance

Query performance is measured by the maximum number of queries per second (QPS) the system is able to process within acceptable response times. When the load is too high, query response times begin to increase. The Administrative server processes all Query and Results requests. Query and Results processing is CPU and system memory intensive. If the CPUs on the Administrative server are fully utilized, query and results performance will be severely degraded. If this condition occurs frequently, additional processing power and system memory must be added to the server. Likewise, if excessive paging occurs on the Administrative server, query and results processing may be severely degraded, and additional memory must be added to the server.

Index / Search Load Performance

Index and Search processing is spread across the Index / Search nodes in the Content Indexing cloud. Search and Index processing is highly CPU and disk IO intensive. In most deployments, disk performance becomes the bottleneck before CPU performance. Disk performance can be determined by measuring the number of disk IO operations per second. The maximum number of disk operations per second varies between disk systems but is usually between 200 and 400 operations per second.


Because Index and Search processing is disk IO intensive, system memory should be monitored to ensure that excessive paging is not occurring. If excessive paging is found, additional memory needs to be added to the system. Index and Search processing is multithreaded and performs best on systems with multiple processors. Index and Search nodes should also be monitored for high CPU utilization. CommVault recommends two CPUs of 3 GHz or better for each server in the Content Indexing cloud.

Performance Bottleneck Solutions

If systems are found to be paging excessively, more memory needs to be added to the system.
If CPU utilization is found to be consistently high, additional and faster CPUs need to be added to the system.
If disk performance is found to be a bottleneck, consider moving the Index to a faster disk system.
Adding Search / Index nodes to the cloud will increase Search and Index performance across all performance factors.
To increase disk IO performance, use disks with high spindle speeds or add more physical disks. Disks must be tied together into a single logical unit using RAID 0 or similar. SATA drives are NOT recommended for v7 Content Indexing configurations.
All nodes within the cloud need to be connected on a high speed network. Gigabit networks should be used whenever possible. If network bottlenecks are present, consider moving the index cloud to a dedicated high speed network.
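The bottleneck guidance above amounts to a small decision table, sketched here in Python. All numeric thresholds are assumptions chosen for illustration. The guide does not define "excessive" paging or "consistently high" CPU, and the only figure taken from the text is the lower end of the typical 200 to 400 disk operations per second ceiling. The `NodeMetrics` type and `recommend` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List

CPU_HIGH_PERCENT = 90.0        # assumed cutoff for "consistently high" CPU
PAGING_HIGH_PER_SEC = 1000.0   # assumed cutoff for "excessive" paging
DISK_IOPS_CEILING = 200.0      # lower end of the typical 200 to 400 range
NETWORK_HIGH_PERCENT = 80.0    # assumed cutoff for a network bottleneck


@dataclass
class NodeMetrics:
    cpu_percent: float
    pages_per_sec: float
    disk_iops: float
    network_percent: float


def recommend(m: NodeMetrics) -> List[str]:
    """Map observed metrics to the remedies listed in the guide."""
    advice: List[str] = []
    if m.pages_per_sec > PAGING_HIGH_PER_SEC:
        advice.append("add memory")
    if m.cpu_percent > CPU_HIGH_PERCENT:
        advice.append("add additional and faster CPUs")
    if m.disk_iops >= DISK_IOPS_CEILING:
        advice.append("move the index to a faster disk system or add physical disks")
    if m.network_percent > NETWORK_HIGH_PERCENT:
        advice.append("move the index cloud to a dedicated high speed network")
    if advice:
        # adding nodes helps across all performance factors
        advice.append("consider adding Search / Index nodes")
    return advice
```

In practice the thresholds would be replaced with values observed over time on the actual servers, since a single sample says little about whether a condition is sustained.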


Location Considerations
Proximity to Data Source Proximity to Web Search Server Proximity to CommServe database

Location Considerations
Proximity to Data Source

Offline data is fed to the Content Indexing Engine's Admin node via the Media Agent with local read access to the specified storage policy copy library. By default, only data that has not been previously indexed is passed. Depending upon your data dynamics, indexing performance will be affected by the volume passed and by the data path capacity between the Media Agent and the CI Admin node.

If Online data is being indexed, the data passes directly from the Client to the Content Indexing Engine's Admin node. If multiple Clients are being indexed, or if both Online and Offline data is being indexed, volume, frequency, data path capacity, latency (distance), and resource availability need to be considered when locating the Content Indexing Engine's Admin node. CommVault's recommendation is to put the Content Indexing Engine's Admin node next to the largest daily volume data source, connected by the largest capacity data path.

Proximity to Web Search Server

If a high volume of queries is anticipated, the location of the Web Search Server relative to the Content Indexing Engine's Admin node should be evaluated for data path performance. Normally this is not an issue, but in some cases it could produce a less than acceptable response time for users querying the index.


Proximity to CommServe system

The CommServe system's relationship to the Content Indexing Engine's Administrative node is the query capability provided within the CommCell Console. Queries of the Content Indexing Engine are passed from the CommCell Console through the CommServe system to the Admin node. As with the Web Search Server, if a high volume of queries is anticipated, the location of the CommServe system relative to the Content Indexing Engine's Admin node should be evaluated for data path performance.


Planning Online Content Indexing


Minimum System Requirements
Installation & Configuration

Online Content Indexing


Minimum System Requirements


Windows 2000/XP/2003/Vista (32 & 64 bit)
.Net Framework 2.0+
Data Classification Enabler
Windows File System iDataAgent

Minimum System Requirements


Installation & Configuration

Automatic install of required components
Select Content Indexing Engine
Define Subclient/Content
Schedule Content Indexing Job
Configure CIFS share path

Installation & Configuration


Automatic inclusion of required components
The Content Indexing iDataAgent requires the presence of the File System iDataAgent and the Data Classification Enabler. The Data Classification Enabler requires .Net Framework 2.0 or higher. Installation of the Content Indexing iDataAgent includes whichever of these components are still required on the Client.

Select Content Indexing Engine
Similar to the selection of a Storage Policy for other iDataAgents, the installation of the Content Indexing iDataAgent asks for a Content Indexing Engine to be assigned. If a Content Indexing Engine is not installed, the selection can be deferred to the properties page of the subclient, but it must be made before a Content Indexing job can be run.
NOTE: You cannot share a Content Indexing Engine between CommCell groups.

Define Subclient/Content
The default subclient has the same relationship to other subclients as the File System default subclient. User-defined subclient content is mutually exclusive, and the default subclient content includes all data not otherwise included in other subclient content. Best practice is not to use or schedule the default subclient, and instead to define index content using user-defined subclients.

Schedule Content Indexing Job
Online Content Indexing jobs must be scheduled. Job types include Full and Incremental. An Incremental job includes all new and changed files defined within the subclient content.
NOTE: You can use schedule policies for online content indexing jobs, but not for offline jobs.

Configure share path
Search Console access to the online file is enabled via a CIFS share path. The user must create the share and set appropriate permissions on it. Once the share has been created, you need to give the Content Indexing iDataAgent the share name and path. The Share Name tab is available on the Content Indexing Agent's property page. This tab is used to configure directories and corresponding share names for the Online Content Indexing agent so that full copies of the original files returned as search results can be viewed from the Search Console.

Configure Application Specific Settings
If you wish to perform content indexing operations for the Domino Server's journaling mailbox, you must configure the following administrative settings for that mailbox:
The Method must be set to Send from Mail-In Database.
The option to Encrypt Incoming Mail must be set to NO.
For the Online Content Indexing for Exchange agent, defining content to index is similar to defining the subclient content for an Exchange Mailbox iDataAgent. The Online Content Indexing for Exchange agent allows you to content index the following data types:
Mailboxes
Folders within a mailbox
Messages within a folder
Attachments
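The relationship between user-defined subclients and the default subclient can be sketched in Python as follows. This is only an illustration of the rule that the default subclient picks up everything not claimed elsewhere; the function name, the dictionary layout, and the sample paths are hypothetical and do not reflect how the software stores subclient content.

```python
from pathlib import PureWindowsPath
from typing import Dict, List


def owning_subclient(path: str, user_subclients: Dict[str, List[str]]) -> str:
    """Return the user-defined subclient whose content covers path, else 'default'.

    user_subclients maps a subclient name to its content root folders.
    Content is assumed to be mutually exclusive, so the first match is the only match.
    """
    target = PureWindowsPath(path)
    for name, roots in user_subclients.items():
        for root in roots:
            root_path = PureWindowsPath(root)
            # A file belongs to a subclient if it is the root or lives beneath it.
            if target == root_path or root_path in target.parents:
                return name
    return "default"


subclients = {
    "projects": [r"D:\Projects"],
    "finance": [r"E:\Finance\Reports"],
}
print(owning_subclient(r"D:\Projects\2008\plan.doc", subclients))
print(owning_subclient(r"D:\Scratch\notes.txt", subclients))
```

Anything that falls through to `default` is exactly the data that would be indexed only if the default subclient were scheduled, which is why the notes advise defining content explicitly.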


Maintaining a Content Index Engine


Protecting the Content Index Engine
Recovering a Content Index Node

Maintaining Content Index Engine


Protecting the Content Index Engine

File System iDataAgent included in install
Use VSS for backup
Backup contents of FIXML folder
Backup contents of the Data folder

Protecting the Content Index Engine


Performing full system backups of all nodes in the Content Indexing Engine provides the fastest route to recovery in the event of failure. It is recommended to use VSS when backing up the nodes in the Content Indexing Engine. If VSS is not used, the necessary CI processes must be stopped to ensure that all files are protected.

Full System Backups
The simplest and most effective means of protecting the Content Indexing Engine is to perform Full System Backups on all nodes. A typical schedule of weekly FULL and daily INCREMENTAL backups should be used to minimize the backup storage footprint and the backup window.

Partial System Backups
The primary folders to protect are the data and fixml folders residing on the search index nodes. The data folder holds the actual binary index. The fixml folder contains an XML representation of the files that have been indexed.


Recovering a Content Index Node

Search & Index Node
Rebuild lost Node
Re-install Content Indexing software
Restore FIXML

Admin Node
Rebuild lost Node
Restore or Re-install Content Indexing software

Recovering a Content Indexing Node


It is important to understand that if any one Index/Search node is lost and cannot be recovered from backup, the entire Content Indexing Engine must be rebuilt and all indexing operations must be re-run. In that case the Content Indexing Engine is rebuilt as new with empty indexes, and the indexing operations are re-run to rebuild the index.

If any Index/Search node is lost, the index cannot be recovered unless it has been protected by backup. At a minimum, the fixml folder for ALL index nodes must be available for restore. If the fixml databases are available, the index can be regenerated from them. However, this is a long and resource-intensive process.
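The recovery choices described here can be summarized as a simple decision function. The following Python sketch is an interpretation of the notes, not a CommVault procedure: the function name, its two boolean inputs, and the wording of the returned steps are assumptions made for illustration.

```python
from typing import List


def recovery_plan(full_backup_available: bool, fixml_available_all_nodes: bool) -> List[str]:
    """Return an ordered list of recovery steps for a lost Search/Index node."""
    if full_backup_available:
        # Fastest route: the node, its fixml folder, and its data folder all come back.
        return ["restore node from full system backup"]
    if fixml_available_all_nodes:
        # Minimum protection: fixml for ALL index nodes; the index is regenerated.
        return [
            "rebuild lost node",
            "re-install Content Indexing software",
            "restore fixml folder",
            "regenerate index from fixml (long, resource intensive)",
        ]
    # Nothing usable: start over with empty indexes.
    return [
        "rebuild Content Indexing Engine with empty indexes",
        "re-run all indexing operations",
    ]


for step in recovery_plan(full_backup_available=False, fixml_available_all_nodes=True):
    print(step)
```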


Content Indexing Operations


Configuring Storage Policy for Content Indexing
Selecting Jobs for Content Indexing
Online Content Indexing
Running Content Indexing Jobs
Monitoring Content Indexing

Content Indexing Operations


Configuring Your Storage Policy for Content Indexing


Select Content Indexing Engine
Select Subclients
Select Backup Rules and Filters
Set Retention
Select Source Copy

Configure Storage Policy for Content Indexing


Offline Content Indexing is enabled in the Storage Policy Properties dialog window. Select the Content Indexing tab and check the Enable Content Indexing option.

Content Indexing Engine
Select the appropriate Content Indexing Engine. This choice is made primarily on data path, but other criteria can be considered.

Include Subclients
Supported subclients associated with the Storage Policy can be included in the Content Indexing operation only if the Enable Content Indexing option is checked in the Advanced tab of the respective Client Properties window. A license is consumed when you enable a Client for Offline Content Indexing.

Set Filters
An exclusion or inclusion filter for file extension types can be set to manage which files are indexed.

Set Retention
When you prune data based on data retention rules, the corresponding content index for the data is also pruned from the Content Indexing Engine. However, if there are multiple copies of the data and you prune the data in only one of the copies, the content index for that data is not pruned automatically. Also, if you need to prune the content index before pruning the data, you can do so using the retention rules available in the Storage Policy Properties (Content Indexing) tab.

Select Source Copy
By default, the Primary copy is selected. However, you can select any copy within the storage policy to use as the source copy for content indexing. The primary criteria for making this selection are:
Content
Data path
Performance
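The index retention behavior described under Set Retention can be expressed as a small rule. The Python sketch below is a simplified model under stated assumptions: the function name and parameters are hypothetical, and the optional `index_retention_days` stands in for the retention rule on the Content Indexing tab.

```python
from datetime import date, timedelta
from typing import Optional


def index_should_be_pruned(
    copies_holding_data: int,
    indexed_on: date,
    today: date,
    index_retention_days: Optional[int] = None,
) -> bool:
    """Decide whether the content index entry for a piece of data is prunable."""
    # Rule 1: the index goes away with the data once no copy still holds it.
    if copies_holding_data == 0:
        return True
    # Rule 2: an explicit index retention can prune the index before the data.
    if index_retention_days is not None:
        return today - indexed_on > timedelta(days=index_retention_days)
    # Otherwise the index stays while any copy of the data remains.
    return False


# Data pruned from one of two copies: the index is kept.
print(index_should_be_pruned(1, date(2008, 1, 1), date(2008, 6, 1)))
```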


Selecting Jobs for Content Indexing

All selection done from the Storage Policy's View -> Jobs list

Pick for Content Indexing
Re-Pick for Content Indexing (partial/killed job)
Prevent Content Index
Delete Content Index

Selecting Jobs for Content Indexing


When you run the Content Indexing operation for the first time, by default the system content indexes data from the date on which the Content Indexing Engine was configured for the storage policy. If you wish to content index older data (or re-content index data that was already content indexed), you must manually select the jobs to be content indexed and then re-run the content indexing operation.

Regardless of which Storage Policy Copy is used as the source for Content Indexing, the selection and status of content indexed jobs are managed at the Storage Policy level through the View -> Jobs task list. Select the appropriate filters to view the job list. As appropriate, each job has the following task options available in the right-click menu:
Pick for Content Indexing - Available if the job has never been content indexed. Select this option to content index the specific job.
Re-Pick for Content Indexing - Available if the job has been content indexed, or partially content indexed. Select this option to re-content index a job.
Prevent Content Index - Select this option to prevent a job from being content indexed by content indexing operations.
Delete Content Index - Select this option to delete an existing content index for data associated with this job.
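The availability of the right-click options can be modeled as a lookup on the job's content index status. In this Python sketch, `success` and `partial` are the status names the course uses elsewhere; `not_indexed`, the function name, and the conditions for Prevent and Delete are assumptions, since the notes state availability only for Pick and Re-Pick.

```python
from typing import List

INDEXED_STATES = ("success", "partial")   # status names used in the course notes
NEVER_INDEXED = "not_indexed"             # hypothetical label for an unindexed job


def right_click_options(status: str) -> List[str]:
    """List the content indexing tasks offered for a job with the given status."""
    if status == NEVER_INDEXED:
        options = ["Pick for Content Indexing"]
    elif status in INDEXED_STATES:
        options = ["Re-Pick for Content Indexing"]
    else:
        raise ValueError(f"unknown content index status: {status}")
    options.append("Prevent Content Index")        # assumed to be always offered
    if status in INDEXED_STATES:
        options.append("Delete Content Index")     # assumed to need an existing index
    return options


print(right_click_options("partial"))
```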


Running Content Indexing Jobs


Full or Incremental for Online CI
Offline CI job equivalent to a Data Verification job
Offline and Online Content Indexing Jobs: non-preemptible, restartable

Running Content Indexing Jobs


Full vs. Incremental
When an incremental online content indexing operation is run, the Data Classification Enabler identifies the files that are new or have changed and indexes only those files. By default, the software uses Data Classification to scan files and folders. If the data classification services are not available, the software automatically uses the Change Journal to scan the files and folders. A Full online content indexing job re-indexes all the subclient content. The previous index is then deleted, so only the latest version of each file is indexed and viewable.

When an offline content indexing job is run, the backup/archive jobs and objects that have been content indexed during the operation are marked as such. Previously indexed jobs and objects are then skipped during subsequent content indexing operations. There is no Full offline content indexing job option.

If you kill and restart an offline content indexing operation, jobs that were already content indexed (content index status success) are not re-content indexed. Partially content indexed jobs (content index status partial) are re-content indexed from the last chunk file where the indexing operation was in progress. However, if you pick a partially content indexed job for content indexing, the job is re-content indexed from the beginning.
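The restart rules for offline content indexing can be captured in a few lines. The following Python sketch is a model of the behavior described in the notes; the function name, the zero-based chunk numbering, and the `picked` flag are hypothetical.

```python
from typing import Optional


def restart_chunk(status: str, last_chunk: int = 0, picked: bool = False) -> Optional[int]:
    """Return the chunk to resume indexing from, or None if the job is skipped.

    status     -- content index status of the job: 'success', 'partial', or other
    last_chunk -- chunk that was in progress when the operation was killed
    picked     -- True if the job was manually picked/re-picked for content indexing
    """
    if picked:
        return 0              # a picked job is re-content indexed from the beginning
    if status == "success":
        return None           # already indexed: skipped on restart
    if status == "partial":
        return last_chunk     # resume where the earlier operation stopped
    return 0                  # never indexed: start from the first chunk


print(restart_chunk("partial", last_chunk=14))
print(restart_chunk("partial", last_chunk=14, picked=True))
```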


Monitoring Content Indexing

Job Summary Reports
Data Protection -> Online Content Indexing
Administrative -> Offline Content Indexing

Job History
Online: View Content Indexing History (from client)
Offline CI: Storage Policy -> View Jobs

Monitoring Content Indexing


Reports
Content Indexing Reports contain detailed content indexing information such as:
The client source and destination.
The duration of the job.
The total objects successful, skipped, or failed to be restored.
The data size and the CommCell user that initiated the job.
The failure reason (if necessary), and associated events.
The failed objects (if necessary), and the files that were successfully restored.
The options that were selected for the job before the job was run.
The status of the job, and the throughput unit.

Offline Content Indexing Job History can be viewed from the Storage Policy by selecting the View -> Jobs task and setting the appropriate filter options. Online Content Indexing Job History can be viewed using the Client Groups, Client, iDataAgent, ContentIndexset, and subclient level tasks of View -> Job History or View -> Content Indexing History, if available.


Content Director

Content Director
Visually, Simpana Content Director is the consolidation and presentation of key components used in support of records management. Tags, Legal Holds, and ERM (Enterprise Records Management) Connectors are records management tools that help classify, archive, preserve, and destroy records. Automated Content Classification Policies provide the mechanism that ties all of these components together, along with Search (identification) and Review (auditing), into a consistent, schedulable workflow. As a solution, Content Director is a compilation of licensed features in the Simpana software suite that provides a full set of data management tools.

Legal Hold
Legal Hold allows a compliance user to segregate relevant information found during a data discovery and search operation and preserve it for long-term retention for legal purposes. It uses a policy-based approach to search relevant data and retain a subset of the data for a long retention period. Legal Holds can be created from the Search Console as well as the CommCell Console.

Tagging
When searching content indexed data, you can assign tags to the discovered items for easy identification and classification. These tagged items can then be searched based on the tags. Pre-defined tags (system tags) are already available in the CommServe. In addition, you can create user-defined tags from the CommCell Console. Tagging is applicable only for Compliance users and administrators.

ERM Connectors
ERM (Enterprise Records Management) Connectors allow you to submit discovered documents and files to a records management system. Currently, the software supports submission of documents to Microsoft SharePoint Record Center. When you create an ERM Connector, you pre-define the mapping of documents to a specific ERM server in the records management site. ERM Connectors can be used only by Compliance users. You can create and use ERM Connectors from the Search Console as well as the CommCell Console.

Automated Content Classification Policy
The Automated Content Classification Policy is a component under the Content Director node in the CommCell Console that allows you to automate and schedule data discovery and search operations such as Legal Hold, Tagging, Restore to Review Set, and ERM Connector. You can also use the policy to restore the discovered items to a review set in the Web Search Server. When automating these operations, you can also specify the date from which the backup/archive data will be considered for the search. If a particular job qualifies to be processed by the Automated Content Classification Policy, it will not be pruned, even if it is eligible for pruning, until it has been acted upon by the policy.


Best Practices

Do not install on System Drive or drive with Page File
Dedicate Admin Node and scale Search/Index Nodes appropriately
Use Disk based Storage Policy Copies
Set Index Retention as needed

Best Practices
Plan the initial installation to handle anticipated growth. While the CI Engine may be expanded with additional nodes, it is easier to have the nodes already in place.
Distribute the Search and Index roles separately from the Admin role for better performance.
Resource all nodes and data paths with high-performance components.
Locate the CI Engine's Admin node within high-capacity network reach of the majority of the source data.
For the Offline source copy, use a Magnetic Library-based storage policy copy.
Evaluate availability requirements against on-demand or scheduled indexing and retention.
Do not content index the same data online that is already available offline.


Module Summary

Key points to remember

Summary
A Content Indexing Engine is a singly administered collection of one to nine Content Indexing Nodes.
Only one Admin Node role is allowed in a Content Indexing Engine. All data in and out of the Engine is processed through the Admin Node.
Volume capacity scaling is enabled by adding nodes and load-balancing the index folders across the nodes.
Content Indexing is resource intensive. Performance is affected by RAM, CPU, disk I/O, and network bandwidth.
Offline Content Indexing is enabled at the Storage Policy level and sourced from any copy within the storage policy.
Subclients of supported data types associated with a Content Index enabled Storage Policy must have Client-level Content Indexing enabled in order to be content indexed.
Offline Content Indexing is schedulable and multi-stream capable. Incremental data is content indexed. Previous data/jobs can be selected for indexing/re-indexing.
Offline Content Indexes are retained for the life of the data within the storage policy or for a set number of days.
Online Content Indexing is installed as an Agent on client systems. Only Windows platforms are supported.
Online Content Indexing is schedulable as Full or Incremental jobs. Only the latest version of the index for each file is retained, for the life of the source file.
The Content Indexing fixml database on each CI node must be backed up for recovery of a failed node. The index can also be backed up to save time in rebuilding the index after restore of the Content Indexing Service.


Search & Discovery

www.commvault.com/training

Search & Discovery Module


Overview
Preliminaries
Search Consoles
Search Administration
Best Practices

Overview


Preliminaries
Functional Architecture & Terminology
Types of Searches
Supported Application and Data Types

Preliminaries


[Figure: functional architecture diagram with ONLINE and OFFLINE data paths. Labels: Client, OCI Agent, DCE, Share Name, Data; CommServe, Schedule & Save Search; Content Indexing Engine, Content Indexing Nodes (Admin + Search & Index, Search & Index Expansion), FIXML, Data & Index; Media Agent, Index Cache, Storage Policy Copy (Disk, Tape); Web Search Server, Queries & Results, Restore Cache, DM2; CommCell Console, Search Console, Outlook Add-in; Find, Search.]

Functional Architecture & Terminology


Content Indexing Node - Single host performing any of the Content Indexing functions.
Content Indexing Engine/Cluster/Cloud - Collection of CI Nodes functioning as a single index.
Admin Node - Host functioning as the data source integration/distribution/management server for the CI Engine. Only one Admin Node can exist within a Content Indexing Engine.
FIXML - Document database upon which the index is based. The FIXML database can be backed up, restored, or moved, and the index re-generated. Each Search/Index Node in a Content Indexing Engine has a FIXML database.
Search/Index Node - Host with the ability to create/query an index from a FIXML database. If multiple Search/Index Nodes are used, each Node can index/search only its allocated portion of the data.
Search - Use of the Content Indexing Engine index to locate objects. Allows for key word/phrase and ownership searches.
Find - Use of the indexes maintained by Media Agents to locate objects. Allows for object name and metadata searches.
Offline - Storage Policy based source of data. Data resides within a library and is accessed via a Media Agent for both indexing and viewing.
Online - Client based source of data. Data resides on the client and is accessed via an Agent for indexing and a share path for viewing.
OCI Agent - Online Content Indexing Agent. Software installed on a Client host to collect data for indexing.
DCE - Data Classification Enabler. Required component for Online Content Indexing. Pre-collects metadata for content indexing.
Web Search Server - SQL 2005-based/IIS-managed web site that maintains queries and review sets.
CommCell Console - Find and Offline Content Index Search capable interface.
Outlook Add-in - Optional component that enables integrated Find and Search via the Outlook interface.
Search Console - Web Search Server interface for Compliance and End User Search.


Types of Search

Compliance Search - All associated objects
End User Search - All associated objects with Read permission
Offline Search - All associated content indexed protected storage data
Online Search - All associated content indexed production data
Find - All associated index-based agent data

Types of Search
Compliance Search
Compliance Search transcends Active Directory permission boundaries. It allows legal or compliance administrators to locate and view any indexed object within the scope of the associated data.
The user can be an internal CommCell User or an enabled external Active Directory user.
The user must be a member of a CommCell User Group with Compliance Search capability.
The CommCell User Group must be associated with at least one Content Indexed enabled client.
Search criteria are extensive and include key word search within the content of the object.

End User Search
End User Search is restricted by Active Directory permissions to only those objects for which the End User has ownership or read permission.
The user must be an enabled external Active Directory user.
The user must be a member of a CommCell User Group with End User Search capability.
The CommCell User Group must be associated with at least one Content Indexed enabled client.
Search criteria are extensive and include key word search within the content of the object.

Offline Search
Offline Search covers all indexed data maintained within a Content Indexed enabled storage policy. Data from a single source copy is indexed. If that data also exists on other copies within the storage policy (via auxiliary copy), the index remains and can retrieve data from any copy until all copies of the data are deleted or the index is deleted, whichever comes first.

Online Search
Online Search covers all indexed data residing on Content Indexed enabled Clients in the CommCell browser. Indexes are retained for as long as the data exists on the client.

Find
Find is a metadata-based search capability inherent within the normal protected storage object index maintained by a Media Agent. Search criteria for file objects are limited to the filename and time range. Messages can also be searched on From, To, CC, and Subject line criteria.
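The difference in scope between Compliance Search and End User Search can be illustrated with a short filter. This Python sketch is a conceptual model only: the `IndexedObject` class, its fields, and the `search_type` strings are hypothetical, and real permission checks are made against Active Directory rather than an in-memory set.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class IndexedObject:
    """Hypothetical stand-in for one object returned by the content index."""
    name: str
    owner: str
    readers: Set[str] = field(default_factory=set)   # users with read permission


def visible_results(objects: List[IndexedObject], user: str, search_type: str) -> List[IndexedObject]:
    """Filter search hits according to the type of search being run."""
    if search_type == "compliance":
        # Compliance Search transcends permission boundaries.
        return list(objects)
    if search_type == "end_user":
        # End User Search: only objects the user owns or can read.
        return [o for o in objects if o.owner == user or user in o.readers]
    raise ValueError(f"unknown search type: {search_type}")


hits = [
    IndexedObject("budget.xls", owner="alice", readers={"bob"}),
    IndexedObject("review.doc", owner="carol"),
]
print([o.name for o in visible_results(hits, "bob", "end_user")])
print([o.name for o in visible_results(hits, "bob", "compliance")])
```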


Supported Applications and Data Types


Exchange Mailbox
Lotus Notes Mail
SharePoint Documents
Windows File Systems

Supported Applications and Data Types


Exchange Mailbox - Offline or Online. Exchange 2000/2003/2007 private and public folder mailboxes that have been protected (backed up), archived (migrated), or journaled (Compliance archive).
Lotus Notes Mail - Offline only. Lotus Notes mailboxes and the Journal mailbox that have been archived (migrated).
SharePoint Documents - Offline only. SharePoint 2003/2007 documents that have been protected (backed up) or archived (migrated).
Windows File Systems - Online or Offline. NDMP backups and NTFS formatted volume files only.


Search Consoles
CommCell Console Search
Web Search Console
Outlook Add-in Search

Search Consoles


CommCell Console Search


Search Options
Schedule Options
Restore Options

CommCell Console Search


The CommCell Console Search is available within the Console, and its search interface offers a Search dialog similar to the Web-based Search Console. This Search Console is available from the CommCell group level of the CommCell Browser's navigation panel and from any Client computer that has its Advanced Enable Content Indexing option checked. Only Compliance Search enabled users can use the CommCell Console Search. Only Offline content can be searched using the CommCell Console Search.


Search Options

Search possible at CommCell or Client levels
Offline data search only
Searches can be scheduled
Search results can be restored, reported, and/or saved.

Search Options
The CommCell Search Console provides Compliance Search capability for authorized CommCell users. The various portions of the interface are described below:
Search For - Use the entry space to enter (or select) the text string or wildcard pattern that you wish to search for.
Files - Provides a group of fields that allow you to specify additional search criteria for files, to further refine the search.
Mail - Provides a group of fields that allow you to specify additional search criteria for e-mails, to further refine the search.
Advanced Options - Displays additional search criteria, such as group or user ownership and Client computer or Client Computer groups, that you can define to further refine your search.
Schedule - Click to schedule the search.


Restore Options

To Mailbox: Different Path only
To PST File: Outlook must be installed

Restore Options
E-mail messages can be restored on demand to the original Client or to a different compatible Client. They can also be restored to a PST file. Scheduled restores can only restore data as files in native format, or as .msg (Exchange) or .xml (Domino Mail) files. Normal advanced restore options are available for both on-demand and scheduled restores.

It is important to note that you cannot restore search result messages using the same path option. However, you can restore to the originating mailbox or any alternate mailbox using the Recover to Different Path option. The reason lies in the way the data is located on the storage media. In a search-based restore, the location information is derived from the content index, not the normal Media Agent index, and the originating mailbox is not part of that index. This is called a Power Restore, and the job appears as such in the CommCell Console's Job Controller window.

Before restoring Exchange e-mails as a PST file, or as .msg files along with other data types, from the search results in the CommCell Console, make sure that Outlook 2003 or above is installed on the destination client; otherwise the restore operation may fail.


Web Search Console


Installing the Web Search Server
End User Search Console
Compliance Search Console
Understanding Quick View

Web Search Console


The Web Search Console is provided via the Web Search Server. The Web Search Server can be installed on, and co-exist with, other CommCell components such as the CommServe database, Media Agents, or the Content Indexing Admin node. However, performance under the expected demand and the resources available on the host should be carefully considered before combining these components on one host.


Installing Web Search Server

Minimum Requirements
Windows 2003/SQL 2005 32bit host
Microsoft IIS 6.0+
.Net Framework 2.0
Visual J# 2.0
File System iDataAgent

Locating the Web Search Server
Standalone (recommended)
On Content Indexing Node
On CommServe database


Installing Web Search Server


The Search Console is a resource-intensive application and should be installed on a dedicated server. We therefore recommend that the Search Console not be installed on a computer running other applications, such as Microsoft Exchange Server or an Oracle database. The Search Console is installed on a Windows Server with IIS configured to provide a web-based interface through which end users and compliance users search for data. The IIS Server can be configured to provide additional security beyond that provided by the Web Search page.


End User Search Console


Main Menu Options
E-Mail Filters
File Filters
Advanced Filters

End User Search Console


The web-based Search Console is a search interface client that works in conjunction with a Web Search Server, allowing searches to be performed remotely through a web browser. The Search Console is a convenient way to perform offline and online searches without needing access to a CommCell group.

Because the Search Console allows you to search data from both online and offline content indexes, some of the data objects returned as search results may not need to be restored in order to view full copies of the data. Specifically, full copies of data objects returned as search results from online content indexes can be viewed by simply clicking the link provided for the data object in the Search Results display pane of the Search Console. However, this capability requires that a directory on the Web Search Server be configured as a share.

Data restored from Search Results is cached in the JobResults folder of the File System iDataAgent installed on the Web Search Server. Storing job results on a UNC path for data restored from Search Results is not supported.


Main Menu Options

Only authorized AD users can log on
Queries can be crafted and saved
Review Sets can be saved
Owned/Accessible Objects can be accessed and saved

Using the Web Search Console


A description of the various portions of the Web Search Console interface is given below:

Use the entry space at the top of the screen to enter the text string or wildcard pattern that you wish to search for.
Search All Data - Provides the options to search only protected/archived data, only file server/desktop data, or both.
All Words - Provides the options to search for all the words, any word, or the phrase from the text specified in the entry space.

Main Menu Options
Home - Returns to the Home view of the Search Console.
Job Status - Displays the status of search restore operations. The Job Status window displays at most the 25 most recent search restore operations that are no older than 7 days.
Query Builder - Provides an entry pane where you can construct your own custom SQL queries and save them for later use. For more information on writing SQL queries, refer to Microsoft documentation. Note that this tool can be used only by a compliance user.
My Queries - Displays a list of custom SQL queries that you have set up.
Search Result - Displays the results of a search operation.
Downloads - Displays files containing search results that were previously exported to a PST or CAB file format, which can then be downloaded.
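
The Job Status limit described above (at most 25 entries, none older than 7 days) can be pictured with a short Python sketch. This is only an illustration of the rule, not the Search Console's code: the function name, the `(job_id, finished_at)` tuple layout, and the inclusive 7-day cutoff are assumptions made for the example.

```python
from datetime import datetime, timedelta

MAX_ROWS = 25                 # limit stated in the guide
MAX_AGE = timedelta(days=7)   # age cutoff stated in the guide


def visible_job_statuses(jobs, now):
    """Return the jobs a Job Status window would list under the stated rule.

    `jobs` is a list of (job_id, finished_at) tuples; this layout is
    hypothetical and only used for the illustration.
    """
    recent = [job for job in jobs if now - job[1] <= MAX_AGE]
    recent.sort(key=lambda job: job[1], reverse=True)  # newest first
    return recent[:MAX_ROWS]


now = datetime(2008, 6, 30, 12, 0)
# Forty sample jobs, one finishing every six hours going back in time.
jobs = [(i, now - timedelta(hours=6 * i)) for i in range(40)]
shown = visible_job_statuses(jobs, now)
print(len(shown), shown[0][0], shown[-1][0])
```

Applying the age filter before the count limit means an old job never takes one of the 25 slots.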

E-Mail Filters
This option group specifies search criteria for e-mail data types.

Search in mail - Select this option to enable search criteria for this option group. Checkboxes within this option group allow you to specify which message types you would like to search (Archived Mail, Protected Mail, and/or Journaled Mail).
Email address - Use this space to narrow the search to the specified e-mail addresses, entered as an Alias Name, a Display Name, or in Simple Mail Transfer Protocol (SMTP) format (for example: user1@company.com;user2@company.com). If more than one e-mail address is entered, use a semicolon (;) to separate the entries.
Subject - Use this space to narrow the search to e-mails with a subject line containing the specified text string or wildcard pattern. This field allows you to search partial words without the need for wildcard characters at the beginning and/or end of the search string.
From - Use this space to narrow the search to e-mails that were sent from the specified user(s). If more than one user is entered, use a semicolon (;) to separate the entries. Optionally, you can further refine searches for this field by selecting a Condition from the corresponding list (All of them/Any of them/Exact Phrase/None of them).
To - Use this space to narrow the search to e-mails that were sent to the specified user(s). If more than one user is entered, use a semicolon (;) to separate the entries. Optionally, you can further refine searches for this field by selecting a Condition from the corresponding list (All of them/Any of them/Exact Phrase/None of them). When searching Public Folder data using this field, keep in mind that only e-mails posted to mail-enabled Public Folders will be searchable. If you wish to search posts made to a Public Folder, use the Subject or From fields instead.
CC - Use this space to narrow the search to e-mails that were sent to the specified Carbon Copy (CC) recipients. If more than one user is entered, use a semicolon (;) to separate the entries. Optionally, you can further refine searches for this field by selecting a Condition from the corresponding list (All of them/Any of them/Exact Phrase/None of them).
BCC - Use this space to narrow the search to e-mails that were sent to the specified Blind Carbon Copy (BCC) recipients. If more than one user is entered, use a semicolon (;) to separate the entries. Optionally, you can further refine searches for this field by selecting a Condition from the corresponding list (All of them/Any of them/Exact Phrase/None of them).
Attachment Name - Use this space to narrow the search to e-mails containing the specified attachment name. If more than one attachment name is entered, use a semicolon (;) to separate the entries. Optionally, you can further refine searches for this field by selecting a Condition from the corresponding list (All of them/Any of them/Exact Phrase/None of them).
Received - Select an entry from the drop-down list to narrow the search according to the specified received date criteria (Any, Today, Yesterday, This Week, This Month, This Year, Is, After, Before, Between). Depending on your selection, additional date range fields may appear below the Received field.
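
To make the semicolon-separated entries and the Condition list more concrete, here is a minimal Python sketch of how one field could be evaluated. The guide does not document the matching logic, so the meaning given to each Condition below is an assumed reading of its name, and `split_entries` and `field_matches` are hypothetical helpers.

```python
def split_entries(raw):
    """Split a semicolon-separated filter entry into clean values."""
    return [part.strip() for part in raw.split(";") if part.strip()]


def field_matches(field_values, raw_entry, condition):
    """Check one message field against a filter entry and a Condition.

    The rules here are an assumed reading of the Condition names,
    not the Search Console's documented behaviour.
    """
    wanted = [value.lower() for value in split_entries(raw_entry)]
    present = [value.lower() for value in field_values]
    if condition == "All of them":
        return all(w in present for w in wanted)
    if condition == "Any of them":
        return any(w in present for w in wanted)
    if condition == "None of them":
        return not any(w in present for w in wanted)
    if condition == "Exact Phrase":
        # Treat the whole entry as one literal string.
        return raw_entry.strip().lower() in present
    raise ValueError(f"unknown condition: {condition}")


recipients = ["user1@company.com", "user3@company.com"]
entry = "user1@company.com;user2@company.com"
print(field_matches(recipients, entry, "Any of them"))
print(field_matches(recipients, entry, "All of them"))
```

With these sample recipients, "Any of them" matches because user1 is present, while "All of them" does not because user2 is missing.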

File Filters
This option group specifies search criteria for file or document data types.

Search in Files - Select this option to enable search criteria for this option group. Checkboxes within this option group allow you to specify which file types you would like to search (Archived Files, Backup Files, Archived Documents and/or Backup Documents).
Look in folder - Use this space to narrow the search to the specified folder or directory.
Search by Modified Date - Select or specify a date range for narrowing file searches (Any, Current Week, Last Week, Current Month, Last Month, Current Year, Last Year, Specify Date Range). This field is only enabled when Modified Date/time or Created Date/time is specified in the Search by field.
All or part of the File Name - Use this space to narrow the search to the specified file name or wildcard pattern (for example: *.doc or *.pdf).
Size - Use this space to narrow the search by file size or size range.
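
As a rough illustration of what a file name wildcard pattern such as *.doc selects, the sketch below uses Python's standard `fnmatch` module. It assumes shell-style wildcards and case-insensitive matching; the guide does not state how the Search Console interprets patterns, and the file names are invented.

```python
from fnmatch import fnmatch


def filter_by_name(file_names, pattern):
    """Keep the names that match a shell-style wildcard pattern, ignoring case."""
    return [name for name in file_names if fnmatch(name.lower(), pattern.lower())]


names = ["Budget.DOC", "report.pdf", "notes.txt", "summary.doc"]
print(filter_by_name(names, "*.doc"))
```

Lower-casing both sides keeps the result the same on every operating system, since `fnmatch` on its own follows the platform's case rules.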

Advanced Filters
This option group allows you to search all versions or only the latest version of the data, and to limit the search to the specified client(s).

Client Computers - Specifies the clients on which the search will be performed. List boxes are provided to choose the clients from a list of available clients.
Select Version - Specifies whether to search all versions or only the latest version of the files. Options are provided to Select All Versions or Select only latest version. If you choose Select only latest version, note that the total hit count displayed at the top of the page may not match the actual number of search items listed.
Select Query Language - Select the language in which the search will be performed.

Delegate Search
The Delegate Search dialog window does not appear in the web search until you check the Enable Exchange Delegate Search option in the Browse/Search/Recovery control of the CommCell Console's Control Panel. The Exchange client's Optimize Data for Search option should also be checked in the Advanced tab of the Client Properties dialog window.

Delegate Access: A delegate is someone granted permission to open another person's folders, create items, and respond to requests for that person. The person granting delegate permission determines the folders the delegate can access and the changes the delegate can make. By default, if you grant someone access to your folders, that delegate has access to the items in the folders, except items marked private. You must grant additional permissions to allow access to private items.


Compliance Search Console


Accessibility
Discovery Options
Job Options
Understanding Quick View
Legal Hold


Accessibility

Authorized AD users and CommCell Users can log on
Queries can be crafted and saved
Review Sets can be saved
All Objects can be accessed and saved

Discovery Options
This option group allows you to select additional compliance search criteria for Files and E-mails. You can narrow the search to files and/or messages owned by the specified user(s) and/or user group(s). A drop-down list is provided which allows you to specify whether to search only Files, only E-mails, or Both.

Users - Use this space to narrow the search by specifying one or more users who are owners of the data objects to be searched (for example: Domain\User). If more than one User is entered, use a semicolon (;) to separate the entries. Optionally, you can further refine searches for this field by selecting a Condition from the corresponding list (All of them/Any of them/None of them).
Search all users within user groups - Specifies whether to search data owned by all users within the User Group(s) specified below.
User Group(s) - Use this space to select one or more User Groups in which to search (for example: Administrators). If more than one User Group is entered, use a semicolon (;) to separate the entries. Optionally, you can further refine searches for this field by selecting a Condition from the corresponding list (All of them/Any of them/None of them).
Files accessible by - Specifies whether to narrow the search to files accessible by the specified user(s).
Users - Use this space to narrow the search by specifying one or more users who are owners of the files to be searched (for example: Domain\User). If more than one User is entered, use a semicolon (;) to separate the entries. Optionally, you can further refine searches for this field by selecting a Condition from the corresponding list (All of them/Any of them/None of them).

Job Options
This option group allows you to perform a compliance search by the Job ID associated with Content Indexing jobs and/or Backup jobs, and to search for files that failed to content index. Restricting or focusing the search by job gives you faster and more specific results. Searching for files that failed to content index helps meet compliance rules that require proof that all objects have been searched. These files can then be included in the results and manually searched if necessary.

Backup or Content Indexing Job IDs - Use this space to enter one or more backup or content indexing Job IDs. If more than one Job ID is entered, use a semicolon (;) to separate the entries. While searching for file server/desktop items using Job ID, the search will display results based on the following conditions:
Search objects failed to index - Select this option to perform a compliance search for data that failed to be content indexed.
Failed from date - Click the calendar icon for this field to specify the starting date to search for data objects that failed to be content indexed.
Failed till date - Click the calendar icon for this field to specify the ending date to search for data objects that failed to be content indexed.

Tag
Tagging is the categorizing of data in a search review set. Tags associated with a result item can be used in later search criteria. Compliance users can associate tags with search result items from the Search Console or through an Automatic Content Classification policy operation. While the Search Console allows you to tag documents interactively, the Automatic Content Classification policy enables you to schedule the tagging operation. More than one tag can be associated with a data item. Apart from user-defined tags, there are also pre-defined system tags available.

Tags can be created, enabled/disabled, and deleted. Disabled tags cannot be assigned to new items but are still available for search. If no item is associated with a disabled tag, the tag will be automatically deleted. Deleted tags are removed from existing items and are not available for search.

Whenever a search item is tagged, the Content Indexing Engine updates the search item with the tag information. When you associate a tag with search items in a review set, you have the option to synchronize the tags with the Content Indexing Engine. Associated tags that are not synchronized with the Content Indexing Engine are called Transient Tags. Transient tags are associated with the search item in the specific review set only and are not reflected for the same search item in other review sets. When you synchronize the tags with the Content Indexing Engine, they become Persisted Tags and are available for the search item in all review sets. You will also be able to search for the specific search item based on the persisted tag from the Advanced Search window, and to filter the search items in a review set based on the persisted tags.

The new tags associated with the search item will be stored in the web search server database. This information will be available in the web search server as long as the search items with the tags are present in the review set.
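
The difference between Transient and Persisted tags can be sketched as a small data model. This is a conceptual illustration only: the `ReviewSetItem` class, its methods, and the dictionary standing in for the Content Indexing Engine are all hypothetical and do not reflect how the product stores tags.

```python
class ReviewSetItem:
    """A search item inside one review set (illustrative model only)."""

    def __init__(self, item_id, index):
        self.item_id = item_id
        self.index = index            # stands in for the Content Indexing Engine
        self.transient_tags = set()   # visible in this review set only

    def tag(self, name, synchronize=False):
        if synchronize:
            # Persisted tag: stored with the item in the shared index.
            self.index.setdefault(self.item_id, set()).add(name)
        else:
            # Transient tag: kept with this review set's copy of the item.
            self.transient_tags.add(name)

    def tags(self):
        return self.transient_tags | self.index.get(self.item_id, set())


shared_index = {}
in_set_a = ReviewSetItem("msg-1", shared_index)   # item as seen in review set A
in_set_b = ReviewSetItem("msg-1", shared_index)   # same item in review set B

in_set_a.tag("privileged")                        # transient
in_set_a.tag("responsive", synchronize=True)      # persisted

print(sorted(in_set_a.tags()))
print(sorted(in_set_b.tags()))
```

Review set A sees both tags, while review set B sees only the synchronized one, which mirrors the behaviour described above.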


Legal Hold

Legal Hold Overview
Legal Hold Set
Legal Hold Process
Legal Hold Recovery


Legal Hold Overview

Identify and segregate relevant data
Send Result set to a Legal Hold Set
Define unique retention criteria
Available in the Web Search Console only
Preserve relevant information
Performed by a compliance user

Legal Hold is a status and action assigned by a user to data in a Compliance Search Console review set. You can only initiate a Legal Hold action from the review set page view of the Search Console. All application types supported by the Content Indexing platform are supported by Legal Hold.

The Legal Hold feature is provided to enable Compliance administrators to capture and uniquely hold selected objects pertaining to a legal or corporate investigation. Objects found in a search may reside on different media with different retention criteria. Consistently preserving all relevant objects is best handled by collecting these objects for management in a single container.

Legal Hold Set


A Legal Hold Set is a special type of On Demand Backup Set containing an unaltered copy of the original data. Once you perform a search operation and move the search items to a review set, you can select specific search items from the review set and move them to a new or an existing Legal Hold Set to retain them for a selected retention time.

A Legal Hold Set is used/generated on the CommServe's File System iDataAgent. Multiple Legal Hold Sets can be created. New or additional review set data can be added to a new or an existing Legal Hold Set. The items in the Legal Hold Set are retained for a period specified either by the retention policy or by the extended retention time selected by the user while creating the new Legal Hold Set.

All the files selected for Legal Hold are assigned to the Legal Hold Set and are archived as a Legal Hold archive operation. During this process, the Job Controller window in the CommCell Console displays a restore job followed by a backup job.

When you browse a Legal Hold Set for data, you will notice that the Browse window does not display the actual path to the files. This is because in a Legal Hold situation the backup files are restored from the client to a web server and then archived as a Legal Hold operation in the CommServe system. As a result, all the Legal Hold operations will be associated only with the CommServe system and hence the actual path to the files is not available.

Specific notes on Legal Hold:
Legal Hold data can be encrypted and stored on single-instanced storage for long-term retention. To do this, once the Legal Hold Set is created, you need to enable single instancing and data encryption on the Legal Hold Set Subclient.
Once the data is restored to the web server, Legal Hold archives the data as file system data.

Legal Hold Process


On initiating a Legal Hold, the selected review set items are retrieved from their respective storage policy copies to the cache on the web server and archived together as a Legal Hold Set to a Legal Hold-enabled storage policy. One or more Legal Hold-enabled storage policies must be created prior to initiating a Legal Hold action. When creating a new storage policy, an option is displayed in the Storage Policy Creation Wizard to select whether this new storage policy can be used for Legal Hold purposes or not.

Each Legal Hold action can either use a previously created Legal Hold Set and its associated storage policy, or create a new Legal Hold Set with the same or a different storage policy. Multiple Legal Hold actions to the same Legal Hold Set will appear in the CommCell Console as separate jobs in the same Legal Hold Set.

Specific notes on Legal Hold:
You can create multiple Legal Hold storage policies.
Once you enable Legal Hold for a storage policy, it cannot be disabled later.
Once you create a Legal Hold storage policy, it is available for selection from the Search Console when creating Legal Holds.
When you define a storage policy for Legal Hold, if the storage policy has multiple copies with varying retention times, the highest retention time among the copies will be displayed as the default retention period for the Legal Hold. However, you can also extend the default retention period during Legal Hold creation.
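
The default retention rule in the last note amounts to taking the maximum across the storage policy copies. The sketch below shows that arithmetic; the function names are hypothetical, retention is assumed to be expressed in days, and treating the user's extension as something that can lengthen but never shorten the default is an assumption drawn from the word "extend".

```python
def default_legal_hold_retention(copy_retention_days):
    """Return the default retention offered for a Legal Hold, in days."""
    if not copy_retention_days:
        raise ValueError("a storage policy needs at least one copy")
    return max(copy_retention_days)


def effective_retention(copy_retention_days, extended_days=None):
    """Apply an optional extension; assumed to lengthen, never shorten."""
    default = default_legal_hold_retention(copy_retention_days)
    if extended_days is None:
        return default
    return max(default, extended_days)


# A storage policy with three copies kept for 30, 365 and 90 days.
print(default_legal_hold_retention([30, 365, 90]))
print(effective_retention([30, 365, 90], extended_days=730))
```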

Legal Hold Recovery


The Web Search Console enables you to retrieve the entire Legal Hold set of data to a new review set and view it there. The review set actions pertaining to a normal review set are also applicable to a review set containing Legal Hold data. The CommCell Console lets you retrieve all items or selected items from the Legal Hold to a desired location.

You can retrieve the Legal Hold data from the Legal Hold set using either of the following methods:
View the Legal Hold operations from the Legal Hold set and retrieve the selected Legal Hold data.
Browse the Legal Hold set and retrieve selected Legal Hold data.

When you browse for Legal Hold data, you will notice that the Browse window does not display the actual path to the files. This is because the backup files are restored from the client to a web server and then archived as a Legal Hold operation in the CommServe Host. As a result, all the Legal Hold operations will be associated only with the CommServe Host and hence the actual path to the files is not available. To resolve this issue, the Browse window displays a system-generated path in the following order:
o <Legal Hold Set name>
o Legal Hold
o <Legal Hold name>
o <CommCell ID Number> (This is the CommCell ID of the CommCell from which the files were initially backed up to the media)
o <Client name> (This reflects the name of the client when the files were backed up initially)
o <iDataAgent name> (The iDataAgent for the file type during original backup)
o <Files>
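
The ordering of the system-generated path can be shown with a small Python sketch that joins the levels in the sequence listed above. The function name, the sample values, and the use of "/" as a separator are assumptions for illustration; the Browse window presents these levels as a tree rather than as a single string.

```python
from pathlib import PurePosixPath


def legal_hold_browse_path(hold_set, hold_name, commcell_id, client, agent, file_name):
    """Assemble the system-generated browse path in the documented order."""
    return PurePosixPath(
        hold_set,          # <Legal Hold Set name>
        "Legal Hold",      # fixed level
        hold_name,         # <Legal Hold name>
        str(commcell_id),  # <CommCell ID Number>
        client,            # <Client name> at original backup time
        agent,             # <iDataAgent name>
        file_name,         # <Files>
    )


path = legal_hold_browse_path(
    "Case-42", "Hold-1", 2, "fileserver01", "Windows File System", "contract.doc"
)
print(path)
```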


Outlook Add-in Search

Each Message and Attachment counts as an object
Outlook must be installed on the search host in order to view recalled message(s) (.msg format)
Viewed/Recalled Message(s) can be forwarded through Exchange or saved as a .pst file

Exchange Message Search


Prior to restoring Exchange emails from the search results in the Search Console, make sure that Outlook 2003 or above is installed on the web server, or else the restore operation will fail. Similarly, prior to restoring Exchange emails along with other data types from the search results in the CommCell Console, make sure that Outlook 2003 or above is installed on the destination client, or else the restore operation may fail.

Perform the following security configuration tasks for the Outlook Add-In as appropriate for your implementation:

Enable Single Sign On for each mailbox user that you would like to grant search and restore capabilities.
In order to take advantage of basic Find and Search Console capabilities from the Outlook Add-In, end-users and compliance users must be granted full permissions for the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\Outlook\Addins\Galaxy.Ex2KMBDM.CVEAAddin
To set permissions for this key, right-click the registry key in Registry Editor and select Permissions, click Add, type in the <User ID>, then click OK. Select Allow Full Control, then click Apply.

In order to take advantage of Search Console capabilities from the Outlook Add-In, perform the following configuration tasks:

On the client where the Outlook Add-In is installed, edit the UIOptions registry key to add 128 to the existing value. This will enable the Search Console toolbar button in Outlook with the default capability of performing End-User Searches.
After editing the UIOptions registry key, if you want to change the default capability to Compliance Searches instead of End-User Searches, create the SearchPageURLOption registry key with a value of 1 on the Outlook Add-In client.
Stop and restart the Outlook session for each change to take effect.
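
The effect of the two registry values can be worked through in plain Python before touching a client. The sketch below performs no registry access. It assumes that UIOptions behaves as a bit mask, which the guide does not state, so the 128 is OR-ed in rather than added a second time; both function names are hypothetical.

```python
SEARCH_CONSOLE_BUTTON = 128  # value the guide says to add to UIOptions


def enable_search_button(ui_options):
    """Return the UIOptions value with the Search Console button turned on.

    Assumes UIOptions is a bit mask, so 128 is OR-ed in; the result is
    unchanged if the option is already set.
    """
    return ui_options | SEARCH_CONSOLE_BUTTON


def search_page(search_page_url_option=None):
    """Map SearchPageURLOption to the search type the button opens."""
    return "Compliance" if search_page_url_option == 1 else "End-User"


print(enable_search_button(3))   # existing value 3 plus the 128 flag
print(search_page())             # key absent: End-User Searches
print(search_page(1))            # key set to 1: Compliance Searches
```

If UIOptions turns out to be a plain number rather than a bit mask on your version, add 128 exactly once by hand instead.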


Search Administration
Managing Search Permissions
User Resource Administration
User Constraints
Protecting the Web Search Server

One of the key features of content search is the ability to filter the search by object ownership and/or read access. When allowing End User search capability, this is particularly important. Prevention of inadvertent exposure of information requires that objects have adequate user security in place.


Managing Search Permissions

Active Directory Integration
Required for End User Search

CommCell User Group Capability
Compliance Search
End User Search

CommCell User Group Association
Client Group, Client, iDataAgent, or BackupSet

Active Directory Integration
Only Active Directory (AD) enabled users can perform end-user searches. The AD user GUID is required in order to determine access privileges. The user's AD group must be associated with a CommCell User Group that has End User Search capability.

CommCell User Group Capability
CommCell Users who are not AD enabled can only perform compliance searches. To do so, they must be a member of a CommCell User Group that has Compliance Search capability. Only members of the Master CommCell User Group have Compliance or End User Search capabilities by default.

CommCell User Group Association
For user-defined CommCell User Groups, Search capability is only extended to those Client Groups, Clients, iDataAgents, or BackupSets which have been associated with the CommCell User Group. CommCell User Groups associated only at the subclient level cannot perform searches.
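
The three rules above combine into a simple decision, sketched here in Python. This is a reading aid rather than the product's authorization code: the `UserGroup` class, the level names, and the `allowed_searches` function are hypothetical, and the model ignores which specific clients a group is associated with.

```python
from dataclasses import dataclass, field

# Association levels at which search capability is honoured.
SEARCHABLE_LEVELS = {"client group", "client", "idataagent", "backupset"}
SEARCH_TYPES = {"Compliance Search", "End User Search"}


@dataclass
class UserGroup:
    name: str
    capabilities: set = field(default_factory=set)   # e.g. {"End User Search"}
    associations: set = field(default_factory=set)   # levels the group is tied to


def allowed_searches(is_ad_user, groups):
    """Return the search types a user may run under the rules above."""
    allowed = set()
    for group in groups:
        if group.name == "Master":
            caps = set(SEARCH_TYPES)       # both capabilities by default
        elif group.associations & SEARCHABLE_LEVELS:
            caps = set(group.capabilities)
        else:
            caps = set()                   # e.g. subclient-level association only
        allowed |= caps & SEARCH_TYPES
    if not is_ad_user:
        allowed.discard("End User Search")  # end-user search needs an AD user
    return allowed


legal = UserGroup("Legal", {"Compliance Search"}, {"client"})
staff = UserGroup("Staff", {"End User Search"}, {"backupset"})
helpdesk = UserGroup("Helpdesk", {"End User Search"}, {"subclient"})

print(allowed_searches(False, [legal]))
print(allowed_searches(True, [staff]))
print(allowed_searches(True, [helpdesk]))
```

The Helpdesk group yields an empty set because it is associated only at the subclient level.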


User Resource Administration

Managed via SearchAdmin web page
Must be a CommCell Administrator
Limits set per user:
Display in Search/Result set (20)
Saved Query/Result Sets (20)
Disk Quota (20000 KB)
Retention Days

The User Administration page for the Search Console provides options to manage the disk space utilization and search result display for each user. In order to access the User Administration page, you need to be a member of the Master CommCell User Group.

User Constraints
In addition to resource management options, the User Administration page displays the Last login time and Last logged in system details for each user. Note that you can configure and view individual user details only for users who have performed a search operation using the Search Console.

NOTE: The default settings used for each new user can be changed by editing the appropriate field/row in the DMSetting table of the Web Search Server's DM2 database. However, editing of the SQL Server database is not recommended or supported for end users.

To optimize the disk space in the job result directory and customize the display of the search results, configure the following options for each user in the User Administration page:

Display in Search/Review set - For faster display or scrolling, the number of items displayed in a Search results window or in a saved Review set window can be configured. The default is 20 each. These settings can be changed for each individual user on the SearchAdmin web page.
Saved Queries/Review sets - Queries or Review sets created by the user in the Search Console can be saved, viewed, and re-used. The query syntax and review set listing are saved in the Web Search Server. Each user has a quota on the number of queries/review sets they can save. The default is 20 of each. These settings can be changed for each individual user on the SearchAdmin web page.
Disk Quota - Offline Review set items selected for quick view must be restored to a temporary cache in order to allow viewing. This temporary cache is located in the JobResults folder of the Web Search Server.
Retention Days - Queries are retained until manually deleted. Review sets are retained for a set number of days. The default is 90 days. This setting can be changed for each individual user on the SearchAdmin web page.
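
For planning purposes, the per-user defaults can be captured in a small record, as in the sketch below. The class and field names are hypothetical and do not correspond to columns in the DMSetting table. The values are the defaults quoted in this section, with the disk quota taken from the 20000 KB figure on the User Resource Administration slide.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UserSearchLimits:
    display_items: int = 20        # items shown per Search/Review set window
    saved_queries: int = 20        # saved query quota
    saved_review_sets: int = 20    # saved review set quota
    disk_quota_kb: int = 20000     # Disk Quota value shown on the slide
    retention_days: int = 90       # review set retention


DEFAULTS = UserSearchLimits()

# A per-user override, as an administrator might set on the SearchAdmin page.
power_user = replace(DEFAULTS, display_items=50, retention_days=180)
print(DEFAULTS)
print(power_user)
```

Keeping the record frozen and deriving overrides with `replace` leaves the defaults untouched, much as changing one user's settings does not affect other users.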


Protecting the Web Search Server


Use File System iDataAgent to back up installed software
Install SQL iDataAgent to protect the DM2 database

A File System iDataAgent is required and is automatically installed when installing the Web Search Server. The File System iDataAgent provides the means to restore/backup review sets from offline storage and should be used to protect the Web Search Server.

If the sole function of the host system is the Web Search Server, then only a File System backup of the JobResults folder (review set/view cache) and a SQL iDataAgent backup of the DM2 database (review set information/saved queries) would be necessary. However, since IIS is involved and may be custom configured along with other system settings, a full system backup is recommended.

Should the Web Search Server fail, the operating system can be re-installed following company procedures, and the Web Search Server and File System iDataAgent software can be installed from the original Simpana software installation DVD. A restore of the JobResults folder and the SQL DM2 database should follow.

If a SQL iDataAgent is not available to protect the DM2 database and SQL Server instance, you could, as an alternative, do a File System VSS backup of the DM2 database files.


Best Practices

Verify user security to Console and limit End User Searches
Restrict access to Compliance Search Console
Monitor restore cache space for large number of users
Implement Training for all users

Verify your user security implementation and limit the scope if enabling End User Search.
Severely limit who has Compliance Search capability.
Monitor restore cache space consumption with a large number of users.
Train your users!
Remember that Legal Hold is only available with a license and only via the Compliance Search web page.


Module Summary

Key points to remember

Summary
Search uses content indexes generated by offline and online data passed to the Content Indexing Engine.
Find uses data protection indexes maintained by the CommServe system and Media Agents.
Compliance Search and End User Search capabilities must be assigned to the CommCell User Group and associated down to the backupset level in order for a search of the data to be performed.
Compliance Search can be performed from the CommCell Console Search. End User Search cannot.
Offline data can be searched using the CommCell Console Search feature from the CommServe system or enabled Client levels. Online data can only be searched at the job level.
Legal Hold is only available with a license and only via the Compliance Search web page.
Both End User and Compliance Search windows can be enabled in Outlook via registry values.
Constraints on user search resource consumption are managed via the Search Administration web page.
Protect the JobResults folder and DM2 database on the Web Search Server.
Train your users!
