Abstract
This session will explore TSM server operations, daily maintenance, and best practices to optimize your TSM server. The speaker will also discuss the administrative and reporting capabilities for the server along with examples and rationale for managing and scheduling server maintenance tasks.
Agenda
Revisiting TSM practices Lifecycle Best Practices Workflow Scripts and Sequencing Schedules Operational Limits Monitoring References
TSM Server
The TSM server and database change based on: growth; changes to H/W and vendors.
Storage Hierarchy
The storage hierarchy changes based on: capacity requirements; changes to H/W and vendors; performance and cost needs of an organization.
Client workloads change over time: more clients; more data; different types of clients; network/infrastructure changes.
Take a look at your TSM environment: Are you using and exploiting the best functions and features TSM offers? Has your environment changed such that TSM is not being used optimally?
Reporting and monitoring were introduced in V6.1, with release-to-release improvements in install, configuration, deployment, and additional reports.
An aggregated view of reporting and monitoring for the entire TSM environment
With TSM 6.2, the Administration Center can be used to orchestrate the push of updates to Windows clients.
Server Workload
Whether viewed as a sine wave or as the Wheel of Life, some view of the cyclic nature of TSM operations, and of the daily support those operations require, is helpful.
10 2011 IBM Corporation
Observing System Resource Relative to Workflow Cycle Will Help Provide Guidance For Changes
The peaks during workload are limited by the total available resources on the machine (CPU, memory, I/O throughput, etc.).
Client workloads are usually driven by schedules. Most often, the main data ingest is a nightly backup window, which may consist of one or more schedules that initiate backups for various groups of clients.
The server actions are the back-end maintenance actions necessary to:
  protect the client data by performing storage pool backups;
  position the data appropriately in the hierarchy based on policies, storage management, and the data flow through the server;
  perform the other server operations that keep the database, storage hierarchy, and system healthy and ready for the next set of actions.
Client operations may happen (and often do) throughout the day. For example, archive operations for DB logs can occur as needed rather than only during the nightly ingest window. Resources such as mount points need to be considered for these always-possible operations.
[Diagram: TSM Server, its DB2 database, and the storage hierarchy]
Disaster Recovery and Availability: Onsite recovery through DB restore or clustering (where available). Offsite recovery through DB restore + copy storage pools. Other offsite recovery techniques
Protect the server: data movement activities (reclamation, migration); expiration; identify processing for deduplication-enabled environments. Protect the client data: storage pool backup; copy active; database backup.
Prepare and Execute for Disaster Recovery: DELETE VOLHIST, MOVE DRMEDIA, PREPARE
Identify
Table Reorganization
Server A
(Disaster Recovery)
Replication of server database (DB, log) using either: Device level replication with consistency groups. V6.2 server using database HADR.
Replication of storage pool(s) using: Disk device level replication with consistency groups. VTL to VTL system replication.
PARALLEL: the commands that follow run in parallel (5 commands in this illustration).
SERIAL: execution re-converges to a single stream when the SERIAL keyword is encountered.
Example Script
PARALLEL
BACKUP STGPOOL X WAIT=YES
BACKUP STGPOOL Y WAIT=YES
BACKUP STGPOOL Z WAIT=YES
SERIAL
PARALLEL
MIGRATE STGPOOL X HIGHMIG=nn LOWMIG=mm RECLAIM=NO WAIT=YES
MIGRATE STGPOOL Y HIGHMIG=nn LOWMIG=mm RECLAIM=NO WAIT=YES
MIGRATE STGPOOL Z HIGHMIG=nn LOWMIG=mm RECLAIM=NO WAIT=YES
EXPIRE INVENTORY DURATION=qq RESOURCE=nn WAIT=YES
SERIAL
PARALLEL
RECLAIM STGPOOL X THRESHOLD=nn DURATION=qq WAIT=YES
RECLAIM STGPOOL Y THRESHOLD=nn DURATION=qq WAIT=YES
RECLAIM STGPOOL Z THRESHOLD=nn DURATION=qq WAIT=YES
SERIAL
BACKUP DB TYPE=FULL WAIT=YES
BACKUP VOLHIST FILENAMES=/path1/volhist,/path2/volhist,/path3/volhist
BACKUP DEVCONFIG FILENAMES=/path1/dc,/path2/dc,/path3/dc
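To make the sequencing in the example script easier to see, the following is a minimal Python sketch that groups a script's lines into phases at each PARALLEL or SERIAL keyword. It is an illustration only, not part of TSM: the function name `group_phases` and the shortened script text are hypothetical, and the sketch assumes that commands before any keyword run serially.

```python
def group_phases(script_lines):
    """Group script commands into (mode, commands) phases.

    Assumption for this sketch: a script starts in SERIAL mode, and each
    PARALLEL or SERIAL keyword closes the phase that precedes it.
    """
    phases = []
    mode = "SERIAL"
    current = []
    for raw in script_lines:
        line = raw.strip()
        if not line:
            continue
        if line.upper() in ("PARALLEL", "SERIAL"):
            # A keyword ends the current phase (if it has any commands).
            if current:
                phases.append((mode, current))
                current = []
            mode = line.upper()
        else:
            current.append(line)
    if current:
        phases.append((mode, current))
    return phases


# Shortened, hypothetical version of the example script (parameters omitted).
EXAMPLE = """
PARALLEL
BACKUP STGPOOL X WAIT=YES
BACKUP STGPOOL Y WAIT=YES
BACKUP STGPOOL Z WAIT=YES
SERIAL
PARALLEL
MIGRATE STGPOOL X WAIT=YES
MIGRATE STGPOOL Y WAIT=YES
MIGRATE STGPOOL Z WAIT=YES
EXPIRE INVENTORY WAIT=YES
SERIAL
PARALLEL
RECLAIM STGPOOL X WAIT=YES
RECLAIM STGPOOL Y WAIT=YES
RECLAIM STGPOOL Z WAIT=YES
SERIAL
BACKUP DB TYPE=FULL WAIT=YES
""".splitlines()

for phase_mode, commands in group_phases(EXAMPLE):
    print(phase_mode, len(commands), "command(s)")
```

Each parallel phase must finish before the next phase starts, so the groups printed here correspond to the priority order of storage pool backup, then migration with expiration, then reclamation, then database backup.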
Script Illustrated
[Figure: the example script's phases, with labels Parallel, Reclamation (x3), and Serial]
Overlap 1
Overlap 2
An example of using a spreadsheet and schedule window information to visualize schedule sequencing and overlaps.
[Chart: schedule windows plotted against Time of Day, 0:00 to 16:00]
Proposed adjustments: eliminate most overlap. The only remaining overlap is between expiration and migration, which generally contend for different resources.
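The overlap check done visually in the spreadsheet can also be sketched in a few lines of Python. This is a hedged illustration: the function `find_overlaps`, the schedule names, and all start times and durations are hypothetical, and the sketch assumes that no window wraps past midnight.

```python
from itertools import combinations


def find_overlaps(windows):
    """Return (name_a, name_b, start, end) for every pair of overlapping windows.

    `windows` maps a schedule name to (start_hour, duration_hours).
    Assumption for this sketch: all windows fall within one day.
    """
    overlaps = []
    for (a, (a_start, a_len)), (b, (b_start, b_len)) in combinations(windows.items(), 2):
        start = max(a_start, b_start)
        end = min(a_start + a_len, b_start + b_len)
        if end > start:  # windows that merely touch do not count as overlapping
            overlaps.append((a, b, start, end))
    return overlaps


# Hypothetical schedule windows, in hours since midnight.
SCHEDULES = {
    "backup_stgpool": (6.0, 2.0),
    "migration": (7.5, 2.0),
    "expiration": (9.0, 1.5),
    "reclamation": (10.5, 2.0),
    "db_backup": (12.5, 1.0),
}

for a, b, start, end in find_overlaps(SCHEDULES):
    print(f"{a} overlaps {b} from {start:.2f}h to {end:.2f}h")
```

With these made-up values, two overlaps are reported. Shifting the migration start to 8.0 would remove the first and leave only the expiration and migration overlap.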
An operational limit may be reached when:
The server overruns or saturates the available CPU on the system at peak workload or at less than peak workload.
The server overruns the available RAM on the system and drives high pagefile use.
I/O bandwidth is saturated: DB or active log performance is degraded because I/O can't keep up, or storage pool actions are degraded because I/O can't keep up.
In short, the limit is a saturation or overrun of CPU, RAM, or I/O bandwidth at or before peak workload. For example, in the lab we've demonstrated that more than 1500 concurrent client sessions to the server pushed it to saturation of available memory and CPU such that performance degraded significantly.
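One way to look for an operational limit in collected monitoring data is to find the lowest workload at which any resource crosses a saturation threshold. The Python sketch below shows the idea under stated assumptions: the function `first_saturation`, the 0.95 threshold, the field names, and every sample value are hypothetical and are not measurements from the lab test mentioned above.

```python
def first_saturation(samples, threshold=0.95):
    """Return (sessions, saturated_resources) for the lowest workload that
    saturates at least one resource, or None if nothing saturates.

    Each sample is a dict with a 'sessions' count plus utilization
    fractions (0.0 to 1.0) for each monitored resource.
    """
    for sample in sorted(samples, key=lambda s: s["sessions"]):
        saturated = sorted(
            name
            for name, value in sample.items()
            if name != "sessions" and value >= threshold
        )
        if saturated:
            return sample["sessions"], saturated
    return None


# Hypothetical utilization samples at increasing client session counts.
SAMPLES = [
    {"sessions": 500, "cpu": 0.40, "ram": 0.55, "io": 0.30},
    {"sessions": 1000, "cpu": 0.75, "ram": 0.80, "io": 0.50},
    {"sessions": 1500, "cpu": 0.97, "ram": 0.96, "io": 0.70},
]

print(first_saturation(SAMPLES))
```

If the workload returned is at or below the expected peak, the server has reached an operational limit for the resources listed.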
Taking steps to improve infrastructure may result in faster operations and may mitigate or remedy the operational limit
For example, if the operational limit is database backup, using a faster device for the DB backup may eliminate the limit, and improving the I/O subsystem and bandwidth for the DB and logs may address the issue.
In cases where it is not possible or practical to resolve the limit through improved or changed infrastructure, this may represent a cap on the existing server, and it may be necessary to implement another TSM server and balance the workload onto it.
TSM Best Practice Monitoring May Involve More than Simply TSM
TSM is a large, multi-threaded software application. It exploits or has dependencies on:
CPU: the application and database perform many calculation- and instruction-intensive operations.
I/O to disk: this relates to the database, active log, and archive log. Bottlenecks such as insufficient parallel I/O capability or insufficient bandwidth (small channels) can affect server performance, scalability, and throughput.
I/O to storage hierarchy: this can be disk (TSM device classes of type DISK or FILE) and sequential media (real tape and VTLs). Controllers or other virtualized appliances (SVC, VTL, etc.) are often used for storage devices. Devices may be locally attached (SCSI) or fiber attached (SAN).
Network: TSM is a client/server application, and its client operations are almost entirely network driven.
LAN/WAN
SAN
Network teams/owners typically have monitoring tools in place to identify and alert on outages and on degradation.
From the TSM perspective: symptoms would be failed client operations due to communication issues (socket error, send error, receive error). The network is not usually evaluated or investigated unless issues are occurring.
Virtualization can hide or mask errors. VTLs, SAN controllers, etc. are systems unto themselves, running an embedded host, OS, drivers, devices, etc. Evaluating their health may require vendor involvement, because the relationship between logs, devices, and errors or symptoms may not be surfaced to the end user.
Conclusion
Server workflow priorities: protect client data, maintain the server, protect the server.
These priorities provide the sequencing of actions, which can be orchestrated via scheduled (type=admin) scripts. Scripts are structured using PARALLEL and SERIAL semantics to sequence actions and manage resources while satisfying the workflow priorities.
Operational limits have been defined, and steps to identify them and possible actions have been discussed.
Monitoring considerations have been discussed for the server topology and for the server itself.