Eric Barrett, Technical Global Advisor
Bikash R. Choudhury, Technical Global Advisor
Bruce Clarke, Consulting Systems Engineer
Ed Hsu, Systems Engineer
Christopher Slater, Database Consulting Systems Engineer
Michael Tatum, Database Consulting Systems Engineer
Network Appliance, a pioneer and industry leader in data storage technology, helps organizations understand and meet complex technical challenges with advanced storage solutions and global data management strategies.
Table of Contents
Introduction
1. Network Appliance System Configuration
1.1. Appliance Network Settings
1.1.1. Ethernet: Gigabit Ethernet, Autonegotiation, and Full Duplex
1.2. Volume Setup and Options
1.2.1. Databases
1.2.2. Volume Size
1.2.3. Oracle Optimal Flexible Architecture (OFA) on NetApp Storage
1.2.4. Best Practices for Control and Log Files
1.3. RAID Group Size
1.4. Snapshot and SnapRestore
1.5. Snap Reserve
1.6. System Options
1.6.1. The minra Option
1.6.2. File Access Time Update
1.6.3. NFS Settings
2. Operating Systems
2.1. Linux
2.1.1. Linux: Recommended Versions
2.1.2. Linux: Kernel Patches
2.1.3. Linux: OS Settings
2.1.4. Linux Networking: Full Duplex and Autonegotiation
2.1.5. Linux Networking: Gigabit Ethernet Network Adapters
2.1.6. Linux Networking: Jumbo Frames with GbE
2.1.7. Linux NFS Protocol: Mount Options
2.1.8. iSCSI Initiators for Linux
2.1.9. FC-AL Initiators for Linux
2.2. Sun Solaris Operating Systems
2.2.1. Solaris: Recommended Versions
2.2.2. Solaris: Kernel Patches
2.2.3. Solaris: OS Settings
2.2.4. Solaris Networking: Full Duplex and Autonegotiation
2.2.5. Solaris Networking: Gigabit Ethernet Network Adapters
2.2.6. Solaris Networking: Jumbo Frames with GbE
2.2.7. Solaris Networking: Improving Network Performance
2.2.8. Solaris IP Multipathing (IPMP)
2.2.9. Solaris NFS Protocol: Mount Options
2.2.10. iSCSI Initiators for Solaris
2.2.11. Fibre Channel SAN for Solaris
2.3. Microsoft Windows Operating Systems
2.3.1. Windows Operating System: Recommended Versions
2.3.2. Windows Operating System: Service Packs
2.3.3. Windows Operating System: Registry Settings
2.3.4. Windows Networking: Autonegotiation and Full Duplex
2.3.5. Windows Networking: Gigabit Ethernet Network Adapters
2.3.6. Windows Networking: Jumbo Frames with GbE
2.3.7. iSCSI Initiators for Windows
2.3.8. FC-AL Initiators for Windows
3. Oracle Database Settings
3.1. DISK_ASYNCH_IO
3.2. DB_FILE_MULTIBLOCK_READ_COUNT
3.3. DB_BLOCK_SIZE
3.4. DBWR_IO_SLAVES and DB_WRITER_PROCESSES
3.5. DB_BLOCK_LRU_LATCHES
4. Backup, Restore, and Disaster Recovery
4.1. How to Back Up Data from a NetApp System
4.2. Creating Online Backups Using Snapshot Copies
4.3. Recovering Individual Files from a Snapshot Copy
4.4. Recovering Data Using SnapRestore
4.5. Consolidating Backups with SnapMirror
4.6. Creating a Disaster Recovery Site with SnapMirror
4.7. Creating Nearline Backups with SnapVault
4.8. NDMP and Native Tape Backup and Recovery
4.9. Using Tape Devices with NetApp Systems
4.10. Supported Third-Party Backup Tools
4.11. Backup and Recovery Best Practices
4.11.1. SnapVault and Database Backups
References
Revision History
Introduction
Thousands of Network Appliance (NetApp) customers have successfully deployed Oracle Databases on NetApp filers for their mission- and business-critical applications. NetApp and Oracle have worked over the past several years to validate Oracle products on NetApp filers and a range of server platforms, and NetApp and Oracle support have established a joint escalations team that works hand in hand to resolve customer support issues in a timely manner. In the process, the team discovered that most escalations are due to failure to follow established best practices when deploying Oracle Databases with NetApp filers. This document describes those best practices for running Oracle Databases on NetApp filers with system platforms such as Solaris, HP/UX, AIX, Linux, and Windows. These practices were developed through the interaction of technical personnel from NetApp, Oracle, and joint customer sites. This guide assumes a basic understanding of the technology and operation of NetApp products and presents options and recommendations for planning, deployment, and operation of NetApp products to maximize their effective use.
When configuring or reconfiguring NICs or VIFs in a cluster, it is imperative to include the appropriate partner <interface> name or VIF name in the configuration of the cluster partner's NIC or VIF to ensure fault tolerance in the event of cluster takeover. Please consult your NetApp support representative for assistance. A NIC or VIF being used by a database should not be reconfigured while the database is active. Doing so can result in a database crash.
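As a sketch of what such a configuration looks like in each filer's /etc/rc, pairing interface e5 on one filer with e5 on its cluster partner (the IP addresses and netmask are hypothetical; verify the exact syntax for your Data ONTAP release):

#Excerpt from /etc/rc on filer A
ifconfig e5 192.168.10.11 netmask 255.255.255.0 partner e5
#Excerpt from /etc/rc on filer B, the cluster partner
ifconfig e5 192.168.10.12 netmask 255.255.255.0 partner e5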
Leave the filer's Ethernet interfaces in their default autonegotiation state, unless no link is established, performance is poor, or other conditions arise that might warrant further troubleshooting. Flow control should by default be set to full on the filer in its /etc/rc file by including the following entry (assuming the Ethernet interface is e5):
ifconfig e5 flowcontrol full
If the output of the ifstat -a command does not show full flow control, then the switch port will also have to be configured to support it. (The ifconfig command on the filer always shows the requested setting; ifstat shows the flow control that was actually negotiated with the switch.)
NetApp recommends that no fewer than 10 data disks be configured in volumes that require high data throughput.
[Figure: Oracle Optimal Flexible Architecture (OFA) directory layout. ORACLE_BASE contains one or more Oracle home directories (home1, home2), each with /dbs, /log, and /admin subdirectories. Under OFA, $ORACLE_HOME holds the Oracle libraries, $ORACLE_HOME/dbs the data files, and $ORACLE_HOME/log the log files.]
2. Put the first online redo log group on one volume and the next on another volume. Oracle writes each committed transaction to the first member of each online redo log group until it fills up, then goes to the next member of each online redo log group, and so on. When all members are full, it does a checkpoint to flush the members of the Oracle redo log groups to the archived log files.
Redo Grp 1: $ORACLE_HOME/Redo_Grp1 (on filer volume /vol/oracle)
Redo Grp 2: $ORACLE_HOME/Redo_Grp2 (on filer volume /vol/oralog)
Archived Log Files
1. Set your init parameter LOG_ARCHIVE_DEST to a directory in the log volume, such as $ORACLE_HOME/log/ArchiveLog (on filer volume /vol/oralog).

Control Files
Multiplex your control files. To do that:
1. Set your CONTROL_FILES init parameter to point to destinations on at least two different filer volumes:
Dest 1: $ORACLE_HOME/Control_File1 (on filer volume /vol/oracle)
Dest 2: $ORACLE_HOME/log/Control_File2 (on filer volume /vol/oralog)

[Figure: Example layout of filer volumes and server mountpoints. The filer root volume is /vol/vol0; the root partition of the Oracle database server holds no database files. Volume /vol/oracle is mounted at /var/opt/oracle ($ORACLE_HOME) and holds the Oracle binaries, Redo_Grp1 (redo log members 1-3), and Control_File1. Volume /vol/oradata is mounted at /var/opt/oracle/dbs ($ORACLE_HOME/dbs) and holds the data files. Volume /vol/oralog is mounted at /var/opt/oracle/log ($ORACLE_HOME/log) and holds Redo_Grp2 (redo log members 1-3) and Control_File2. Database clients reach the database server, which connects to the filer over Gigabit Ethernet (GbE).]
RAID-DP augments standard RAID4 with a second parity disk. Given this additional protection, the likelihood of data loss due to a double disk failure has been nearly eliminated, and therefore larger RAID group sizes can be supported. With Data ONTAP 6.5 or later, RAID group sizes larger than 14 disks can be safely configured using RAID-DP. However, we recommend the default RAID group size of 16 for RAID-DP.
If you want to make the .snapshot directory invisible to clients, issue the following command:
vol options <volname> nosnapdir on
With automatic Snapshot copies disabled, regular Snapshot copies are created as part of the Oracle backup process when the database is in a consistent state. For additional information on using Snapshot and SnapRestore to back up/restore an Oracle Database, see [5].
To set the volume snap reserve size (the default is 20%), issue this command:
snap reserve <volume> <percentage>
Do not use a percent sign (%) when specifying the percentage. The snap reserve should be adjusted to reserve slightly more space than the Snapshot copies of a volume consume at their peak. The peak Snapshot copy size can be determined by monitoring a system over a period of a few days when activity is high. The snap reserve may be changed at any time. Don't raise the snap reserve to a level that exceeds free space on the volume; otherwise, client machines may abruptly run out of storage space. NetApp recommends that you frequently observe the amount of snap reserve being consumed by Snapshot copies. Do not allow the amount of space consumed to exceed the snap reserve. If the snap reserve is exceeded, consider increasing the percentage of the snap reserve or deleting Snapshot copies until the amount of space consumed is less than 100%. NetApp DataFabric Manager (DFM) can aid in this monitoring.
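A minimal sketch of this monitoring on the filer console (the volume name oracle is hypothetical):

df /vol/oracle          # the .snapshot line shows snap reserve capacity and usage
snap list oracle        # space consumed per Snapshot copy; candidates for deletion
snap reserve oracle 25  # raise the reserve above the 20% default if needed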
Generally, the read-ahead operation is beneficial to databases, and the minra option should be left alone. However, NetApp recommends experimenting with the minra option to observe the performance impact, since it is not always possible to determine how much of an application's activity is sequential versus random. This option is transparent to client access and can be changed at will without disrupting client I/O. Be sure to allow two to three minutes for the cache on the appliance to adjust to the new minra setting before looking for a change in performance.
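minra is set per volume with the vol options command (volume name hypothetical):

vol options oracle minra on    # minimize read ahead for mostly random workloads
vol options oracle minra off   # revert to the default read-ahead behavior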
1.6.3. NFS Settings
Set the NFS transfer size on the filer to the maximum. There is no penalty for setting this value to the maximum of 32,768. However, if the transfer size is set to a small value and an I/O request exceeds that value, the I/O request is broken up into smaller chunks, resulting in degraded performance.
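A sketch of the corresponding filer options; the option names nfs.tcp.xfersize and nfs.udp.xfersize are assumed from standard Data ONTAP NFS tuning rather than taken from the surviving text:

options nfs.tcp.xfersize 32768
options nfs.udp.xfersize 32768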
2. Operating Systems
2.1. Linux
For additional information about getting the most from Linux and NetApp technologies, see [6].
The volumes used for storing Oracle Database files should still be mounted with the "noac" mount option for Oracle9i RAC databases. The uncached I/O patch has been developed by Red Hat and tested by Oracle, NetApp, and Red Hat.
This is especially useful for NFS over UDP and when using Gigabit Ethernet. Consider adding this to a system startup script that runs before the system mounts NFS file systems. The recommended size (262,143 bytes) is the largest safe socket buffer size NetApp has tested. On clients with 16MB of memory or less, leave the default socket buffer size setting to conserve memory. Red Hat distributions after 7.2 contain a file called /etc/sysctl.conf where changes such as this can be added so they will be executed after every system reboot. Add these lines to the /etc/sysctl.conf file on these Red Hat systems:
net.core.rmem_max = 262143
net.core.wmem_max = 262143
net.core.rmem_default = 262143
net.core.wmem_default = 262143
2.1.3.2. Other TCP Enhancements
The following settings can help reduce the amount of work clients and filers do when running NFS over TCP:
echo 0 > /proc/sys/net/ipv4/tcp_sack
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
These operations disable optional features of TCP to save a little processing time and network bandwidth. When building kernels, be sure that CONFIG_SYNCOOKIES is disabled. SYN cookies slow down TCP connections by adding extra processing on both ends of the socket. Some Linux distributors provide kernels with SYN cookies enabled. Linux 2.2 and 2.4 kernels support large TCP windows (RFC 1323) by default. No modification is required to enable large TCP windows.
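On distribution kernels where rebuilding is impractical, the same TCP features can be disabled persistently in /etc/sysctl.conf; a sketch (net.ipv4.tcp_syncookies=0 is the runtime counterpart of building without CONFIG_SYNCOOKIES, assuming the kernel exposes it):

net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_syncookies = 0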
Gigabit adapters consume server CPU and memory that would otherwise be available to the application, so make sure resources are adequate. Most gigabit cards that support 64-bit PCI or better should provide good performance. Any database using NetApp storage should utilize Gigabit Ethernet on both the filer and database server to achieve optimal performance. NetApp has found that the following Gigabit Ethernet cards work well with Linux:

SysKonnect. The SysKonnect SK-98XX series cards work very well with Linux and support single- and dual-fiber and copper interfaces for better performance and availability. A mature driver for this card exists in the 2.4 kernel source distribution.

Broadcom. Many cards and switches use this chipset, including the ubiquitous 3Com solutions. This provides a high probability of compatibility between network switches and Linux clients. The driver software for this chipset appeared in the 2.4.19 Linux kernel and is included in Red Hat distributions with earlier 2.4 kernels. Be sure the chipset firmware is up to date.

AceNIC Tigon II. Several cards, such as the NetGear GA620T, use this chipset, but none are still being manufactured. A mature and actively maintained driver for this chipset exists in the kernel source distribution.

Intel EEPro/1000. This appears to be the fastest gigabit card available for systems based on Intel, but the card's driver software is included only in recent kernel source distributions (2.4.20 and later) and may be somewhat unstable. The card's driver software for earlier kernels can be found on the Intel Web site. There are reports that the jumbo frame MTU for Intel cards is only 8998 bytes, not the standard 9000 bytes.
The default is conservative because it allows the Linux NFS client to work without adjustment in most environments. Usually, on clean high-performance networks or with NFS over TCP, overall NFS performance can be improved by explicitly increasing these values. With NFS over TCP, setting rsize and wsize to 32kB usually provides good performance by allowing a single RPC to transmit or receive a large amount of data. It is very important to note that the capabilities of the Linux NFS server are different from the capabilities of the Linux NFS client. As of the 2.4.19 kernel release, the Linux NFS server does not support NFS over TCP and does not support rsize and wsize larger than 8kB. The Linux NFS client, however, supports NFS over both UDP and TCP and rsize and wsize up to 32kB. Some online documentation is confusing when it refers to features that "Linux NFS" supports; usually such documentation refers to the Linux NFS server, not the client.

fg vs. bg. Consider using the "bg" option if a client system needs to be available even if it cannot mount some servers. This option causes mount requests to put themselves in the background automatically if a mount cannot complete immediately. By default, when a client starts up and a server is not available, the client waits for a very long time before giving up. The "fg" option is useful when mount requests must be serialized during system initialization; for example, a system must mount /usr before proceeding with multiuser boot, so when /usr or other critical file systems are mounted from an NFS server, the fg option should be specified. Because the boot process can complete without all the file systems being available, care should be taken to ensure that required file systems are present before starting the Oracle Database processes.

nosuid. The "nosuid" mount option can be used to improve security. This option causes the client to disable the special bits on files and directories. The Linux man page for the mount command recommends also disabling or removing the suidperl command when using this option.

actimeo/nocto. Due to the requirements of the NFS protocol, clients must check back with the server at intervals to be sure cached attribute information is still valid. The attribute cache timeout interval can be lengthened with the "actimeo" mount option to reduce the rate at which the client tries to revalidate its attribute cache. With the 2.4.19 kernel release, the "nocto" mount option can also be used to reduce the revalidation rate even further, at the expense of cache coherency among multiple clients.
timeo. The units specified for the timeout option are tenths of a second, which causes confusion for many users. This option controls RPC retransmission timeouts. By default, the client retransmits an unanswered UDP RPC request after 0.6 seconds (timeo=6). In general, it is not necessary to change the retransmission timeout, but in some cases a shorter retransmission timeout for NFS over UDP may improve latencies due to packet losses. As of kernel 2.4.20, an estimation algorithm that adjusts the timeout for optimal performance governs the UDP retransmission timeout for some types of RPC requests. The TCP network protocol contains its own timeout and retransmission mechanism; the RPC client depends on this mechanism for recovering from the loss of RPC requests and thus uses a much longer timeout setting for NFS over TCP by default. Due to a bug in the mount command, the default retransmission timeout value on Linux for NFS over TCP is six seconds, unlike other NFS client implementations. To obtain standard behavior, you may wish to specify "timeo=600" explicitly when mounting via TCP. Using a short retransmission timeout with NFS over TCP does not have performance benefits and may introduce the risk of data corruption.

sync. The Linux NFS client delays application writes to combine them into larger, more efficiently processed requests. The sync option guarantees that a client immediately pushes every write system call an application makes to servers. This is useful when an application must guarantee that data is safe on disk before it continues. Frequently such applications already use the O_SYNC open flag or invoke the flush system call when needed, so the sync mount option is often not necessary. Oracle Database software specifies O_DSYNC when it opens files, so the use of the sync option is not required in an Oracle environment.

noac. The "noac" mount option prevents an NFS client from caching file attributes. This means that every file operation on the client that requires file attribute information results in a GETATTR operation to retrieve a file's attribute information from the server. Note that noac also causes a client to process all writes to that file system synchronously, just as the sync mount option does. Disabling attribute caching is only one part of noac; it also guarantees that data modifications are visible on the server so that other clients using noac can detect them immediately. Thus noac is shorthand for "actimeo=0,sync". When the noac option is in effect, clients still cache file data as long as they detect that a file has not changed on the server. This allows a client to keep very close track of files on a server so it can discover changes made by other clients quickly. This option is normally not used, but it is important when an application that depends on single-system behavior is deployed across several clients. noac generates a very large number of GETATTR operations and sends write operations synchronously; both of these add significant protocol overhead. The noac mount option trades off
single-client performance for client cache coherency. With uncached I/O, the number of GETATTR calls is reduced during reads, and data is not cached in the NFS client cache on reads and writes. Uncached I/O is available with Red Hat Advanced Server 2.1, update 3, kernel 2.4.9-e35 and up, and works on file systems mounted with the noac mount option. Only applications that require tight cache coherency among multiple clients require that file systems be mounted with the noac mount option.

nolock. For some servers or applications, it is necessary to prevent the Linux NFS client from sending network lock manager requests. Use the "nolock" mount option to prevent the Linux NFS client from notifying the server's lock manager when an application locks a file. Note, however, that the client still uses more restrictive write-back semantics when a file lock is in effect: the client always flushes all pending writes whenever an application locks or unlocks a file.
NetApp recommended mount options for an Oracle single-instance database on Linux:
rw,bg,vers=3,tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768

NetApp recommended mount options for Oracle9i RAC on Linux (without direct I/O support, e.g., RHEL 2.1 Update 3):
a) The uncached I/O patch for RHEL 2.1 is released in Update 3 (e35).
b) Add this entry to the /etc/modules.conf file: options nfs nfs_uncached_io=1
c) Use the noac NFS client mount option.
d) Complete mount options:
rw,bg,vers=3,tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,noac

NetApp recommended mount options for Oracle9i RAC on Linux (with direct I/O support, e.g., RHEL 3.0):
a) Apply the direct I/O patch for RHEL 3.0 Update 2, available from the Oracle MetaLink site as patch 2448994.
b) Enable the Oracle init.ora parameter: filesystemio_options=directio
c) Use the actimeo=0 NFS client mount option.
d) Complete mount options:
rw,bg,vers=3,tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,actimeo=0

NetApp recommended mount options for Oracle10g RAC on Linux (with direct I/O support, e.g., RHEL 3.0):
a) Direct I/O support is built in to 10g RAC and RHEL 3.0 Update 2.
b) Enable the Oracle init.ora parameter: filesystemio_options=directio
c) Use the actimeo=0 NFS client mount option.
d) Complete mount options:
rw,bg,vers=3,tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,actimeo=0
2.1.8. iSCSI Initiators for Linux
Experience with iSCSI initiators for Linux is insufficient to recommend any best practices at this time. This section will be revisited in the future for any recommendations or best practices for running Oracle Databases on Linux with iSCSI initiators.
NetApp recommends the use of Solaris 2.9 or Solaris 2.8 for optimal server performance.
These recommendations are in addition to, not a replacement for, the Solaris patch recommendations included in the Oracle installation or release notes.

List of desired Solaris 8 patches as of January 21, 2004:
108813-16 SunOS 5.8: Sun Gigabit Ethernet 3.0
108806-17 SunOS 5.8: Sun Quad FastEthernet qfe driver
108528-27 SunOS 5.8: kernel update patch
108727-26 SunOS 5.8: /kernel/fs/nfs and /kernel/fs/sparcv9/nfs patch (108727-25 addresses Solaris NFS client caching [wcc] bug 4407669: VERY important performance patch)
111883-23 SunOS 5.8: Sun GigaSwift Ethernet 1.0 driver patch

List of desired Solaris 9 patches as of January 21, 2004:
112817-16 SunOS 5.9: Sun GigaSwift Ethernet 1.0 driver patch
113318-10 SunOS 5.9: /kernel/fs/nfs and /kernel/fs/sparcv9/nfs patch (addresses Solaris NFS client caching [wcc] bug 4407669: VERY important performance patch)
113459-02 SunOS 5.9: udp patch
112233-11 SunOS 5.9: kernel patch
112854-02 SunOS 5.9: icmp patch
112975-03 SunOS 5.9: patch /kernel/sys/kaio
112904-09 SunOS 5.9: kernel/drv/ip patch; obsoletes 112902-12
112764-06 SunOS 5.9: Sun Quad FastEthernet qfe driver

Failure to install the patches listed above can result in database crashes and/or slow performance; they must be installed. Please note that the "Sun EAGAIN bug" (Sun Alert 41862, referenced in patch 108727) can result in Oracle Database crashes accompanied by this error message:

SVR4 Error 11: Resource temporarily unavailable

The patches listed here may have other dependencies that are not listed. Read all installation instructions for each patch to ensure that any dependent or related patches are also installed.
Solaris file descriptors:
rlim_fd_cur. The "soft" limit on the number of file descriptors (and sockets) that a single process can have open.
rlim_fd_max. The "hard" limit on the number of file descriptors (and sockets) that a single process can have open.
Setting these values to 1024 is STRONGLY recommended to avoid database crashes resulting from Solaris resource starvation.

Solaris kernel "maxusers" setting: The Solaris kernel parameter "maxusers" controls the allocation of several major kernel resources, such as the maximum size of the process table and the maximum number of processes per user.
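A minimal /etc/system sketch applying the recommended limits (a reboot is required for /etc/system changes to take effect):

* raise per-process file descriptor limits to 1024
set rlim_fd_cur=1024
set rlim_fd_max=1024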
Note: The instance may be other than 0 if there is more than one Gigabit Ethernet interface on the system. Repeat for each instance that is connected to NetApp storage. For servers using /etc/system, add these lines:
set ge:ge_adv_pauseRX=1
set ge:ge_adv_pauseTX=1
set ge:ge_intr_mode=1
set ge:ge_put_cfg=0
Note that placing these settings in /etc/system changes every Gigabit interface on the Sun server. Switches and other attached devices should be configured accordingly.
SysKonnect provides SK-98xx cards that do support jumbo frames. To enable jumbo frames, execute the following steps:
1. Edit /kernel/drv/skge.conf and uncomment this line:
JumboFrames_Inst0=On;
3. Reboot.
If using jumbo frames with a SysKonnect NIC, use a switch that supports jumbo frames and enable jumbo frame support on the NIC on the NetApp system.
Setting transmit flow control can reduce dropped packets or retransmits, because this setting forces the NIC card to perform flow control. If the NIC gets overwhelmed with data, it will signal the sender to pause. It may sometimes be beneficial to set this parameter to 0 to determine if the sender (the NetApp system) is overwhelming the client. Recommended settings were described in section 2.2.6 of this document.

/dev/ge adv_pauseRX 1. Forces receive flow control for the Gigabit Ethernet adapter. Receive flow control provides a means for the receiver to govern the amount of data received. A setting of "1" is the default for Solaris.

/dev/ge adv_1000fdx_cap 1. Forces full duplex for the Gigabit Ethernet adapter. Full duplex allows data to be transmitted and received simultaneously. This should be enabled on both the Solaris server and the NetApp system. A duplex mismatch can result in network errors and database failure.

sq_max_size. Sets the maximum number of messages allowed for each IP queue (STREAMS synchronized queue). Increasing this value improves network performance. A safe value for this parameter is 25 for each 64MB of physical memory in a Solaris system, up to a maximum value of 100. The parameter can be optimized by starting at 25 and incrementing by 10 until network performance reaches a peak.

nstrpush. Determines the maximum number of modules that can be pushed onto a stream; should be set to 9.

ncsize. Determines the size of the DNLC (directory name lookup cache). The DNLC stores lookup information for files in the NFS-mounted volume. A cache miss may require a disk I/O to read the directory when traversing the pathname components to get to a file. Cache hit rates can significantly affect NFS performance; getattr, setattr, and lookup usually represent greater than 50% of all NFS calls. If the requested information isn't in the cache, the request generates a disk operation that results in a performance penalty as significant as that of a read or write request. The only limit to the size of the DNLC cache is available kernel memory; each DNLC entry uses about 50 bytes of extra kernel memory. Network Appliance recommends that ncsize be set to 8000.

nfs:nfs3_max_threads. The maximum number of threads that the NFS V3 client can use. The recommended value is 24.

nfs:nfs3_nra. The read-ahead count for the NFS V3 client. The recommended value is 10.
nfs:nfs_max_threads. The maximum number of threads that the NFS V2 client can use. The recommended value is 24.

nfs:nfs_nra. The read-ahead count for the NFS V2 client. The recommended value is 10.
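Collected as an /etc/system sketch, the client-side values recommended in this section (tune sq_max_size upward from 25 as described above):

* Solaris network and NFS client tuning values from this section
set sq_max_size=25
set nstrpush=9
set ncsize=8000
set nfs:nfs3_max_threads=24
set nfs:nfs3_nra=10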
Under all circumstances, the NFS read/write size should be the same as or greater than the Oracle block size. For example, specifying a DB_FILE_MULTIBLOCK_READ_COUNT of 4 multiplied by a database block size of 8kB results in a read buffer size (rsize) of 32kB. NetApp recommends that DB_FILE_MULTIBLOCK_READ_COUNT be set from 1 to 4 for an OLTP database and from 16 to 32 for DSS.

vers. Sets the NFS version to be used. Version 3 yields optimal database performance with Solaris.

proto. Tells Solaris to use either TCP or UDP for the connection. Previously UDP gave better performance but was restricted to very reliable connections. TCP has more overhead but handles errors and flow control better. If maximum performance is required and the network connection between the Sun and the NetApp system is short, reliable, and all one speed (no speed matching within the Ethernet switch), UDP can be used. In general, it is safer to use TCP. In recent versions of Solaris (2.8 and 2.9) the performance difference is negligible.

forcedirectio. An option introduced with Solaris 8. It allows the application to bypass the Solaris kernel cache, which is optimal for Oracle. This option should only be used with volumes containing data files. It should never be used to mount volumes containing executables: using it with a volume containing Oracle executables will prevent all executables stored on that volume from being started. If programs that normally run suddenly won't start and immediately core dump, check to see if they reside on a volume being mounted using forcedirectio.

The introduction of forced direct I/O with Solaris 8 is a tremendous benefit. Direct I/O bypasses the Solaris file system cache: when a block of data is read from disk, it is read directly into the Oracle buffer cache and not into the file system cache. Without direct I/O, a block of data is read into the file system cache and then into the Oracle buffer cache, double-buffering the data and wasting memory space and CPU cycles, since Oracle does not use the file system cache. Using system monitoring and memory statistics tools, NetApp has observed that without direct I/O enabled on NFS-mounted file systems, large numbers of file system pages are paged in; this adds system overhead in context switches, and system CPU utilization increases. With direct I/O enabled, file system page-ins and CPU utilization are reduced. Depending on the workload, a significant increase can be observed in overall system performance; in some cases the increase has been more than 20%. Direct I/O for NFS is new in Solaris 8, although it was introduced for UFS in Solaris 2.6.

Direct I/O should only be used on mountpoints that house Oracle Database files, not on nondatabase files or Oracle executables, or when doing normal file I/O operations such as dd; normal file I/O operations benefit from caching at the file system level. A single volume can be mounted more than once, so it is possible to have certain operations utilize the advantages of forcedirectio while others don't. However, this can create confusion, so care should be taken. NetApp recommends the use of forcedirectio on selected volumes where the I/O pattern associated with the files under that mountpoint does not lend itself to NFS client caching. In general these will be data files with access patterns that are mostly random, as well as any online redo log files and archive log files. The forcedirectio option should not be used for mountpoints that contain executable files, such as the ORACLE_HOME directory; using it on such mountpoints will prevent the programs from executing properly.
NetApp recommended mount options for an Oracle single-instance database on Solaris:
rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio

NetApp recommended mount options for Oracle9i RAC on Solaris:
rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio,noac
Multiple Mountpoints
To achieve the highest performance, transactional OLTP databases benefit from configuring multiple mountpoints on the database server and distributing the load across them. The performance improvement is generally from 2% to 9%; this is a very simple change to make, so any improvement justifies the effort. To accomplish this, create another mountpoint to the same file system on the NetApp filer, then either rename the data files in the database (using the ALTER DATABASE RENAME FILE command) or create symbolic links from the old mountpoint to the new mountpoint, as sketched below.
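A sketch of the approach, assuming a filer named filer1 exporting /vol/oradata and two new mountpoints (all names hypothetical):

# Mount the same filer file system at two mountpoints (Solaris syntax)
mount -F nfs -o rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio filer1:/vol/oradata /oradata1
mount -F nfs -o rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio filer1:/vol/oradata /oradata2
# With the database mounted (not open), redistribute data files across the paths:
# SQL> ALTER DATABASE RENAME FILE '/oradata1/users01.dbf' TO '/oradata2/users01.dbf';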
2.3. Microsoft Windows Operating Systems
2.3.1. Windows Operating System: Recommended Versions
Microsoft Windows NT 4.0, Windows 2000 Server and Advanced Server, and Windows 2003 Server
The following table explains some of these items and offers tuning suggestions:

MaxMpxCt. The maximum number of outstanding requests a Windows client can have against a NetApp system. This must match the filer's cifs.max_mpx option. Look at the performance monitor redirector/current item; if it is constantly running at the current value of MaxMpxCt, increase this value.

TcpWindow. The maximum transfer size for data across the network. This value should be set to 64,240 (0xFAF0).
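A hedged sketch of applying the TcpWindow value with the reg utility; the registry value name (TcpWindowSize under Tcpip\Parameters) is assumed from standard Windows TCP tuning rather than spelled out in the surviving text, and a reboot is required:

REM Sketch only: set the TCP window size to 64,240 (0xFAF0)
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpWindowSize /t REG_DWORD /d 64240 /f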
NetApp has tested the Intel PRO/1000 F Server Adapter. The following settings can be tuned on this adapter; each should be tested and optimized as necessary to achieve optimal performance.

Coalesce buffers = 32. The number of buffers available for transmit acceleration.

Flow control = receive pause frame. The flow control method used. This should match the setting for the Gigabit Ethernet adapter on the NetApp system.

Jumbo frames. Allows larger Ethernet packets to be transmitted. NetApp filers support this in Data ONTAP 6.1 and later releases.

Receive descriptors. The number of receive buffers and descriptors that the driver allocates for receiving packets.

Transmit descriptors. The number of transmit buffers and descriptors that the driver allocates for sending packets.
hardware). NetApp currently supports Microsoft initiator 1.02 and 1.03, available from www.microsoft.com.
3. Oracle Database Settings
3.1. DISK_ASYNCH_IO
Enables or disables Oracle asynchronous I/O. Asynchronous I/O allows processes to proceed with the next operation without having to wait for an issued write operation to complete, thereby improving system performance by minimizing idle time. This setting may improve performance depending on the database environment. If the DISK_ASYNCH_IO parameter is set to FALSE, then DB_WRITER_PROCESSES and DB_BLOCK_LRU_LATCHES (Oracle versions prior to 9i) or DBWR_IO_SLAVES must also be used, as described below. The calculation looks like this:

DB_WRITER_PROCESSES = 2 * (number of CPUs)

Recent performance findings on Solaris 8 patched to 108813-11 or later and on Solaris 9 have shown that setting

DISK_ASYNCH_IO = TRUE
DB_WRITER_PROCESSES = 1

can result in better performance compared to when DISK_ASYNCH_IO is set to FALSE. NetApp recommends enabling asynchronous I/O for Solaris 2.8 and above.
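A minimal init.ora sketch of the Solaris finding above:

# init.ora sketch (Solaris 8/9, per the findings above)
disk_asynch_io = TRUE
db_writer_processes = 1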
3.2. DB_FILE_MULTIBLOCK_READ_COUNT
Determines the maximum number of database blocks read in one I/O operation during a full table scan. The number of database bytes read is calculated by multiplying DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT. The setting of this parameter can reduce the number of I/O calls required for a full table scan, thus improving performance. Increasing this value may improve performance for databases that perform many full table scans but degrade performance for OLTP databases where full table scans are seldom (if ever) performed. Setting this number to a multiple of the NFS READ/WRITE size specified in the mount will limit the amount of fragmentation that occurs in the I/O subsystem. Be aware that this parameter is specified in DB Blocks, and the NFS setting is in bytes, so adjust as required. As an example, specifying a DB_FILE_MULTIBLOCK_READ_COUNT of 4 multiplied by a DB_BLOCK_SIZE of 8kB will result in a read buffer size of 32kB. NetApp recommends that DB_FILE_MULTIBLOCK_READ_COUNT should be set from 1 to 4 for an OLTP database and from 16 to 32 for DSS.
3.3. DB_BLOCK_SIZE
For best database performance, DB_BLOCK_SIZE should be a multiple of the OS block size. For example, if the Solaris page size is 4096: DB_BLOCK_SIZE = 4096 * n The NFS rsize and wsize options specified when the file system is mounted should also be a multiple of this value. Under no circumstances should it be smaller. For example, if the Oracle DB_BLOCK_SIZE is set to 16kB, the NFS read and write size parameters (rsize and wsize) should be set to either 16kB or 32kB, never to 8kB or 4kB.
3.4. DBWR_IO_SLAVES and DB_WRITER_PROCESSES
The first rule of thumb is to always enable DISK_ASYNCH_IO if it is supported on that OS platform. Next, check to see whether it is supported for NFS or only for block access (FC/iSCSI). If it is supported for NFS, then consider enabling async I/O at both the Oracle level and the OS level and measure the performance gain; if performance is acceptable, use async I/O for NFS. If async I/O is not supported for NFS, or if the performance is not acceptable, then consider enabling multiple DBWRs or DBWR I/O slaves as described next. Multiple DBWRs and DBWR I/O slaves cannot coexist; one or the other should be used to compensate for the performance loss resulting from disabling DISK_ASYNCH_IO. Oracle MetaLink note 97291.1 provides guidelines on usage. NetApp recommends that DBWR_IO_SLAVES be used for single-CPU systems and that DB_WRITER_PROCESSES be used with systems having multiple CPUs.
3.5. DB_BLOCK_LRU_LATCHES
The number of DBWRs cannot exceed the value of the DB_BLOCK_LRU_LATCHES parameter: DB_BLOCK_LRU_LATCHES = DB_WRITER_PROCESSES Starting with Oracle9i, DB_BLOCK_LRU_LATCHES is obsolete and need not be set.
Snapshot copies are point-in-time copies of the file system. They must be coordinated with the state of the Oracle Database to ensure database consistency. With Fibre Channel or iSCSI protocols, Snapshot copies and SnapMirror commands must always be coordinated with the server: the file system on the server must be blocked and all data flushed to the filer before invoking the Snapshot command. Data can be backed up within the same NetApp filer, to another NetApp filer, to a NearStore system, or to a tape storage device. Tape storage devices can be directly attached to an appliance, or they can be attached to an Ethernet or Fibre Channel network, and the appliance can be backed up over the network to the tape device. Possible methods for backing up data on NetApp systems include:

- Use automated Snapshot copies to create online backups
- Use scripts on the server that rsh to the NetApp system to invoke Snapshot copies to create online backups
- Use SnapMirror to mirror data to another filer or NearStore system
- Use SnapVault to vault data to another NetApp filer or NearStore system
- Use server operating system-level commands to copy data to create backups
- Use NDMP commands to back up data to a NetApp filer or NearStore system
- Use NDMP commands to back up data to a tape storage device
- Use third-party backup tools to back up the filer or NearStore system to tape or other storage devices
For hot backups, put the tablespaces into hot backup mode prior to creating a Snapshot copy. NetApp has several technical reports that contain details on backing up an Oracle Database. For additional information on determining data protection requirements, see [13]. NetApp recommends using Snapshot copies for performing cold or hot backup of Oracle Databases; no performance penalty is incurred for creating a Snapshot copy. It is recommended to turn off the automatic Snapshot scheduler and coordinate the Snapshot copies with the state of the Oracle Database, as sketched below. For more information on integrating Snapshot technology with Oracle Database backup, refer to [5] and [9].
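A minimal sketch of that coordination, assuming a filer named filer1, a volume named oracle, a tablespace named USERS, and rsh access from the database server (all hypothetical; see [5] for complete scripts):

#!/bin/sh
# Hot backup sketch: quiesce the tablespace, snapshot the filer volume, resume
sqlplus -s "/ as sysdba" <<EOF
alter tablespace users begin backup;
EOF
rsh filer1 snap create oracle oracle_hot   # create the Snapshot copy on the filer
sqlplus -s "/ as sysdba" <<EOF
alter tablespace users end backup;
alter system archive log current;
EOF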
A SnapRestore operation reverts a file or volume to a Snapshot copy in seconds to minutes, and this reduces downtime while performing Oracle Database recovery. If using SnapRestore at the volume level, it is recommended to store the Oracle log files, archive log files, and copies of control files on a separate volume from the main data file volume and to use SnapRestore only on the volume containing the Oracle data files. For more information on using SnapRestore for Oracle Database restores, refer to [5] and [9].
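SnapRestore can revert a whole volume or a single file; a sketch of both forms on the filer console (volume, file, and Snapshot copy names are hypothetical):

snap restore -t vol -s oracle_hot /vol/oradata
snap restore -t file -s oracle_hot /vol/oradata/users01.dbf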
SnapVault keeps backups online for faster restoration. SnapVault also gives users the power to choose which data gets backed up, the frequency of backup, and how long the backup copies are retained. SnapVault software builds on the asynchronous, block-level incremental transfer technology of SnapMirror with the addition of archival technology. This allows data to be backed up via Snapshot copies on a filer and transferred on a scheduled basis to a destination filer or NearStore appliance. These Snapshot copies can be retained on the destination system for many weeks or even months, allowing recovery operations to the original filer to occur nearly instantaneously. For additional references on data protection strategies using SnapVault, refer to [10], [11], and [13].
Furthermore, a separate host bus adapter must be used in the filer for tape backup. This adapter must be attached to a separate Fibre Channel switch that contains only filers, NearStore appliances, and certified tape libraries and tape drives. The backup server must either communicate with the tape library via NDMP or have library robotic control attached directly to the backup server.
Using nearline disk-based storage for backups of the active data set improves performance and lowers the cost of operations. Periodically moving data from primary to nearline storage increases free space and improves performance, while generating considerable cost savings. Note: If NetApp NearStore nearline storage is not part of your backup strategy, refer to [5] for information on Snapshot-based Oracle backup and recovery on filers. The remainder of this section assumes both filers and NearStore systems are in use.
The example in this subsection assumes the primary filer for database storage is named descent and the NearStore appliance for database archival is named rook. The following steps occur on the primary filer, descent:

1. License SnapVault and enable it on the filer, descent:
descent> license add ABCDEFG
descent> options snapvault.enable on
descent> options snapvault.access host=rook

2. License SnapVault and enable it on the NearStore appliance, rook:
rook> license add ABCDEFG
rook> options snapvault.enable on
rook> options snapvault.access host=descent
3. Create a volume for use as a SnapVault destination on the NearStore appliance, rook:
rook> vol create vault -r 10 10
rook> snap reserve vault 0
Step 2: Set up schedules (disable automatic Snapshot copies) on filer and NearStore system.
1. Disable the normal Snapshot schedule on the filer and the NearStore system, which will be replaced by SnapVault Snapshot schedules:
descent> snap sched oracle 0 0 0
rook> snap sched vault 0 0 0
2. Set up a SnapVault Snapshot schedule to be script driven on the filer, descent, for the oracle volume. Each command disables scheduled creation and specifies how many of the named Snapshot copies to retain:
descent> snapvault snap sched oracle sv_hourly 5@-
This schedule creates a Snapshot copy called sv_hourly and retains the most recent five copies, but does not specify when to create the copies; that is done by a cron script, described later in this procedure.
descent> snapvault snap sched oracle sv_daily 1@-
Similarly, this schedule creates a Snapshot copy called sv_daily and retains only the most recent copy. It does not specify when to create the Snapshot copy.
descent> snapvault snap sched oracle sv_weekly 1@-
This schedule creates a Snapshot copy called sv_weekly and retains only the most recent copy. It does not specify when to create the Snapshot copy.
3. Set up the SnapVault Snapshot schedule to be script driven on the NearStore appliance, rook, for the SnapVault destination volume, vault. This schedule also specifies how many of the named Snapshot copies to retain:
rook> snapvault snap sched vault sv_hourly 5@-
This schedule creates a Snapshot copy called sv_hourly and retains the most recent five copies, but does not specify when to create the copies; that is done by a cron script, described later in this procedure.
rook> snapvault snap sched vault sv_daily 1@-
Similarly, this schedule creates a Snapshot copy called sv_daily and retains only the most recent copy. It does not specify when to create the Snapshot copy.
rook> snapvault snap sched vault sv_weekly 1@-
This schedule creates a Snapshot copy called sv_weekly and retains only the most recent copy. It does not specify when to create the Snapshot copy.
Step 3: Start the SnapVault process between filer and NearStore appliance.
At this point, the schedules have been configured on both the primary and secondary systems, and SnapVault is enabled and running. However, SnapVault does not know which volumes or qtrees to back up or where to store them on the secondary. Snapshot copies will be created on the primary, but no data will be transferred to the secondary. To provide SnapVault with this information, use the snapvault start command on the secondary:
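The command itself falls on a page missing from this copy; a hedged sketch using this example's names, where the primary path (non-qtree data, denoted by the trailing "-") and the destination qtree are assumptions:

rook> snapvault start -S descent:/vol/oracle/- /vol/vault/oracle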
Step 5: Use a cron script to drive the Oracle hot backup script enabled by SnapVault in step 4.
A scheduling application such as cron on UNIX systems or the task scheduler on Windows systems is used to create an sv_hourly Snapshot copy each day at every hour except 11:00 p.m., and a single sv_daily Snapshot copy each day at 11:00 p.m., except on Saturday evenings, when an sv_weekly Snapshot copy is created instead. Sample cron script:

# sample cron script with multiple entries for Oracle hot backup
# using SnapVault, NetApp filer (descent), and NetApp NearStore (rook)
# Hourly Snapshot copy/SnapVault at the top of each hour
0 * * * * /home/oracle/snapvault/sv-dohot-hourly.sh
# Daily Snapshot copy/SnapVault at 2:00 a.m. every day except Saturday
0 2 * * 0-5 /home/oracle/snapvault/sv-dohot-daily.sh
# Weekly Snapshot copy/SnapVault at 2:00 a.m. every Saturday
0 2 * * 6 /home/oracle/snapvault/sv-dohot-weekly.sh

In step 4 above, there is a sample script for daily backups, sv-dohot-daily.sh. The hourly and weekly scripts are identical to the script used for daily backups, except that the Snapshot copy name is different (sv_hourly and sv_weekly, respectively).
References
1. Power and System Requirements for Network Appliance Filers: http://now.netapp.com/NOW/knowledge/docs/hardware/hardware_index.shtml
2. DS14 Disk Shelf Installation Guide, page 37: http://now.netapp.com/NOW/knowledge/docs/hardware/filer/ds14hwg.pdf
3. Installation Tips Regarding Power Supplies and System Weight: http://now.netapp.com/NOW/knowledge/docs/hardware/filer/warn_fly.pdf
4. Definition of FCS and GA Terms: http://now.netapp.com/NOW/download/defs/ontap.shtml
5. Oracle9i for UNIX: Backup and Recovery Using a NetApp Filer: http://www.netapp.com/tech_library/3130.html
6. Using the Linux NFS Client with Network Appliance Filers: Getting the Best from Linux and Network Appliance Technologies: http://www.netapp.com/tech_library/3183.html
7. Installation and Setup Guide 1.0 for Fibre Channel Protocol on Linux: http://now.netapp.com/NOW/knowledge/docs/hba/fcp_linux/fcp_linux10/pdfs/install.pdf
8. Oracle9i for UNIX: Integrating with a NetApp Filer in a SAN Environment: http://www.netapp.com/tech_library/3207.html
9. Oracle9i for UNIX: Backup and Recovery Using a NetApp Filer in a SAN Environment: http://www.netapp.com/tech_library/3210.html
10. Data Protection Strategies for Network Appliance Filers: http://www.netapp.com/tech_library/3066.html
11. Data Protection Solutions Overview: http://www.netapp.com/tech_library/3131.html
12. Simplify Application Availability and Disaster Recovery: http://www.netapp.com/partners/docs/oracleworld.pdf
13. SnapVault Deployment and Configuration: http://www.netapp.com/tech_library/3240.html
14. Oracle8i for UNIX: Providing Disaster Recovery with NetApp SnapMirror Technology: http://www.netapp.com/tech_library/3057.html
Revision History
Version 1.0, October 30, 2004: Creation date
© 2005 Network Appliance, Inc. All rights reserved. Specifications subject to change without notice. NetApp, the Network Appliance logo, DataFabric, NearStore, SnapManager, SnapMirror, SnapRestore, and SnapVault are registered trademarks and Network Appliance, Data ONTAP, RAID-DP, SnapDrive, and Snapshot are trademarks of Network Appliance, Inc. in the U.S. and other countries. Intel is a registered trademark of Intel Corporation. Solaris and Sun are trademarks of Sun Microsystems, Inc. Linux is a registered trademark of Linus Torvalds. Microsoft, Windows, and Windows NT are registered trademarks of Microsoft Corporation. Oracle is a registered trademark and Oracle8i, Oracle9i, and Oracle10g are trademarks of Oracle Corporation. UNIX is a registered trademark of The Open Group. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.