
SQL Server

Planning for and choosing storage wisely


David Chernicoff, Greg A. Larsen, Brian Moran, Lavon Peters, Mel Shum, Alan Sugano

sponsored by Dell

Contents

SQL Server Storage Options

Chapter 1: SQL Server on a SAN
    SAN Fundamentals
    SAN Security
    SAN Benefits
    When Using DAS Makes Sense
    I'm Ready for a SAN. Now What?
    Step Up to a SAN
    Selecting a Storage Array for a SAN
    SANs: Always Better than DAS?

Chapter 2: Solid State Storage for SQL Server
    Advantages
    Disadvantages
    When To Use SSDs
    Other Considerations
    An Emerging Alternative
    Other SQL Server Storage Options
        DAS
        NAS
        SAN
        iSCSI SANs

Chapter 3: SQL Server Storage Options: Sort through the acronyms (SATA, iSCSI, NAS, SAN) to choose the right storage for your applications
    Hard Disk Drive Technology
    Server Storage Designs
    It's Your Choice

Chapter 4: Avoiding the Red Zone: A Two-Step Process for Tracking Disk Usage
    The Stored Procedure
    The Permanent Table
    The SQL Server Agent Job
    Growth-Rate Calculation

Chapter 1:

SQL Server on a SAN


By Mel Shum
As a DBA, one of your many tasks is to manage your SQL Server databases' ever-expanding storage requirements. How often do you find yourself adding more disk, trying to accurately size a database, or wishing you could more efficiently use your existing disk capacity? Storing database data on a SAN can make such tasks much easier and can also improve disk performance and availability and shorten backup and restore times. Start your search for a SAN here, as you learn the basics of SAN technology and the benefits of using a SAN to store SQL Server databases. The sidebar "Selecting a Storage Array for a SAN" covers several features you'll want to consider when selecting a storage array for your SAN.

SAN Fundamentals
A SAN is basically a network of switches that connect servers with storage arrays. SAN topology is similar to the way Ethernet switches are interconnected, as Figure 1 shows. A SAN's physical layer comprises a network of either Fibre Channel or Ethernet switches. Fibre Channel switches connect to host bus adapter (HBA) cards in the server and storage array; Ethernet switches connect to Ethernet NICs in the servers and storage array.

Figure 1: SAN topology

Brought to you by Dell and SQL Server Magazine


A storage array is an external disk subsystem that provides storage for one or more servers. Storage arrays are available in a range of prices and capabilities. On the low end, an array consists simply of a group of disks in an enclosure connected by either a physical SCSI cable or Fibre Channel Arbitrated Loop (FC-AL). This type of plain-vanilla array is commonly called Just a Bunch of Disks (JBOD). In high-end arrays, storage vendors provide features such as improved availability and performance, data snapshots, data mirroring within the storage array and across storage arrays, and the ability to allocate storage to a server outside the physical disk boundaries that support the storage.

Two types of SANs exist: Fibre Channel and iSCSI. Fibre Channel SANs require an HBA in the server to connect it to the Fibre Channel switch. The HBA is analogous to a SCSI adapter, which lets the server connect to an external chain of disks and access those disks via the SCSI protocol. The HBA lets a server access not just a single SCSI chain of disks but any disk on any storage array connected to the SAN, still via SCSI. iSCSI SANs use Ethernet switches and adapters to communicate between servers and storage arrays via the iSCSI protocol on a TCP/IP network. Typically, you'd use a Gigabit Ethernet switch and adapter, although 10Gb Ethernet switches and adapters are becoming more popular in Windows server environments.

On a SAN, a server is a storage client to a storage array, aka the storage server. The server that acts as the primary consumer of disk space is called the initiator, and the storage server, which provides the disk space, is called the target. The disks that the storage arrays provide on the SAN are called LUNs and appear to a Windows server on the network as local hard drives.
Storage-array vendors use a variety of methods to make multiple hard drives appear local to the storage array and to present a LUN to a Windows server by using parts of multiple hard drives. Vendors also use different RAID schemes to improve performance and availability for data on the LUN. Whether the SAN uses Fibre Channel or Ethernet switches, ultimately what appears from the Windows server through the Microsoft Management Console (MMC) Disk Management snap-in are direct-attached disks, no different from those physically located within the server itself. In addition, most arrays have some type of RAID protection, so that the storage that represents a given LUN is distributed across multiple hard drives that are internal to the storage array.

SAN Security
SAN architecture provides two measures for securing access to LUNs on a SAN. The first is a switch-based security measure called a zone. A zone, which is analogous to a virtual LAN (VLAN), restricts access by granting only a limited number of ports on certain hosts an access path to some, but not all, storage arrays on the SAN. The second security measure is storage-array-based: a storage array can use LUN masking to restrict access. Depending on the vendor, this security feature comes free of charge with the storage array or is priced separately as a licensed product. LUN masking can be configured either by the administrator or by the storage-array vendor for a fee. When masking is configured, the array grants only explicitly named ports on named hosts an access path to the specified LUNs. LUN masking functions similarly to ACLs on Common Internet File System (CIFS) shares in a Windows environment.
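LUN masking behaves like an array-side access-control table, much as the CIFS ACL analogy suggests. The sketch below is a minimal illustrative model, not any vendor's implementation; the host-port names and LUN IDs are invented:

```python
# Minimal illustrative model of LUN masking: an array-side table maps
# (host port, LUN) pairs to an access grant. Names and IDs are invented.

masking_table = {
    ("hostA-port1", "LUN-05"): True,   # explicitly granted
    ("hostA-port1", "LUN-06"): True,
    ("hostB-port1", "LUN-07"): True,
}

def can_access(host_port, lun):
    """The array allows I/O only for explicitly named (port, LUN) pairs."""
    return masking_table.get((host_port, lun), False)

print(can_access("hostA-port1", "LUN-05"))  # granted
print(can_access("hostB-port1", "LUN-05"))  # masked: pair not in the table
```

The real mechanism lives in array firmware, but the logic is the same: anything not explicitly granted is denied.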

SAN Benefits
Now that you have a grasp of what a SAN is, you're probably wondering how a SAN could benefit your SQL Server environment. To address this question, we'll first examine problems inherent in local DAS, then explore how using a SAN avoids these problems.

Performance and availability. As part of the typical process of designing a database that will reside on a local disk, or DAS, you'd determine how the disks on which the database will be stored are attached (i.e., which disks


are attached to which SCSI adapter). You want to carefully organize the database files to minimize contention for disk access: for example, between a table and its indexes, between two tables that are frequently joined, or between data and log files. To minimize contention (i.e., competing disk I/O operations), you'd try to ensure that the two contending objects are separated not only on different disks but also across SCSI adapters.

Another disk-related issue that you must consider in designing a database is availability. You need to use some type of disk redundancy to guard against disk failures. Typically, you'd use either RAID 1 (mirroring) or RAID 5 to provide redundancy and thus improved availability. After you create the RAID devices by using Windows Disk Management, you might lay out the database across these multiple RAID storage structures. When allocating such structures, you have to decide how to size them. Determining the amount of storage each server needs is like estimating your taxes: whether you overestimate or underestimate, you'll be penalized either way. If you overestimate your storage needs and buy too much, you'll have overspent on storage. If you underestimate, you'll soon be scrambling to find ways to alleviate your shortages.

A SAN addresses the issues of contention, availability, and capacity. On a SAN, the storage array typically pools together multiple disks and creates LUNs that reside across all disks in the pool. Different disks in the pool can come from different adapters on the storage array, so that traffic to and from the pool is automatically distributed. Because the storage array spreads the LUNs across multiple disks and adapters, the Windows server that's attached to the SAN sees only a single disk in Disk Management.
You can use just that one disk and not have to worry about performance and availability related to the disk, assuming that your storage or network administrator has properly configured the SAN. How complex or simple a storage array is to configure depends on the vendor's implementation. I recommend that you meet with the IT person responsible for configuring your storage and ask him or her to explain your storage array's structure. Also, determine your storage requirements ahead of time and give them to this person. In addition to storage size, note your requirements for performance (e.g., peak throughput of 40Mbps); availability (e.g., 99.999 percent availability); backup and recovery (e.g., hourly snapshot backups take 1 minute; restores take 10 minutes); and disaster recovery, based on metrics for recovery time objective (RTO), the time it takes to restore your database to an operational state after a disaster has occurred, and recovery point objective (RPO), how recent the data is that's used for a restore. Using these metrics to define your requirements will help your storage administrator better understand your database-storage needs. Some vendors' storage arrays let you dynamically expand a LUN that you created within the disk pool without incurring any downtime for the SQL Server database whose files reside on that LUN. This feature lets DBAs estimate their disk-space requirements more conservatively and add storage capacity without downtime.

Backup control. As a database grows, so does the amount of time needed to perform database backups. In turn, a longer backup requires a longer backup window. Partial backups, such as database log backups, take less time but require more time to restore. Increasingly, upper management is mandating smaller backup windows and shorter restore times for essential applications, many of which access SQL Server databases. SANs can help decrease backup windows and restore times.
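The backup-window pressure is straightforward to quantify: a streaming full backup takes time roughly proportional to database size. The throughput figure in this sketch is an assumed rate for illustration, not a measurement:

```python
# Estimate a full-backup window as database size over streaming throughput.
# The 100 MB/sec rate is an assumption for illustration, not a measurement.

BACKUP_THROUGHPUT_MB_S = 100

def backup_window_minutes(db_size_gb):
    """Minutes to stream a full backup at the assumed throughput."""
    return db_size_gb * 1024 / BACKUP_THROUGHPUT_MB_S / 60

for size_gb in (100, 500, 2000):
    print(f"{size_gb:5d} GB database -> "
          f"{backup_window_minutes(size_gb):6.1f} minute backup window")
```

At this rate a 500GB database already needs a window of well over an hour, which is exactly the pressure that snapshot-based backups relieve.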
Some storage arrays can continuously capture database snapshots (i.e., point-in-time copies of data), which are faster to back up and restore than traditional database-backup methods. The snapshot doesn't contain any actual copied data; instead, it contains duplicate pointers to the original data as it existed at the moment the snapshot was created. To back up SQL Server database data by using snapshots, you'd typically want to put your database in a ready state, more commonly called a hot-backup state, for a few moments to perform the snapshot. If you didn't put your database in a hot-backup state, the snapshot could take a point-in-time copy of your database before SQL Server had finished making a consistent database write. Storage-array vendors often use Microsoft's SQL Server


Virtual Backup Device Interface (VDI) API to enable their software to put the database in a hot-backup state. This lets the system copy the point-in-time snapshot image to separate backup media without causing a database outage. Snapshots are minimally intrusive, so you can use them frequently without affecting database performance. Restoring data from a snapshot takes only a few seconds. By using a SAN-connected storage array along with a snapshot capability, DBAs can minimize backup windows and restore times, in part because snapshot images are maintained on distributed disks in the array instead of on one local disk.

Reduced risks for database updates. Changes to a database, such as SQL Server or application upgrades or patches, can be risky, especially if the changes might cause database outages or, worse, database corruption. To test changes without putting the production database at risk, you'd need to set aside an amount of storage equivalent to the size of the production database. On this free storage, you'd restore the most recent backup of that database (typically 1 week old). You'd spend a few hours (maybe even days) restoring the database from tape to disk, applying the changes, then testing to see whether the changes were successfully applied and whether they adversely affected the database. After you verified that the changes were successfully implemented, you'd apply them to the production database. Some vendors' SAN storage arrays let you quickly clone your database data for testing purposes. Cloning the data takes only a few seconds, versus hours to restore it from tape. An added benefit of cloning is reduced disk utilization. Some cloning technology lets you take a read-only database snapshot and turn it into a writeable clone. For testing purposes, the clone consumes far less disk storage than a full backup of a database because only modified blocks of data are copied to the clone database.
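The space savings of a writeable clone can be sketched with simple block accounting: the clone consumes new space only for blocks modified after it was created, while unmodified blocks remain shared with the snapshot. This is an illustrative model with invented sizes, not any vendor's implementation:

```python
# Illustrative block accounting for a writeable clone: only blocks written
# after the clone is taken consume new space; the rest share the snapshot.
# All sizes are invented for illustration.

TOTAL_BLOCKS = 1_000_000      # size of the source database, in blocks

modified = set()              # blocks the test workload has rewritten

def write_to_clone(block_id):
    # The first write to a block allocates clone-private space for it.
    modified.add(block_id)

# Simulate a test run that touches 2 percent of the database's blocks.
for block_id in range(0, TOTAL_BLOCKS, 50):
    write_to_clone(block_id)

clone_blocks = len(modified)
print(f"Clone consumes {clone_blocks} of {TOTAL_BLOCKS} blocks "
      f"({100 * clone_blocks / TOTAL_BLOCKS:.0f}% of a full copy)")
```

Under this workload the clone holds 2 percent of the data a full restore would, which is why cloning is so much cheaper than restoring from tape for test cycles.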

When Using DAS Makes Sense


Storing database data in a SAN gives you features not available with DAS, such as local and remote mirroring, data cloning, the ability to share data across multiple hosts, and the ability to capture data snapshots. However, if you don't need these features, storing your SQL Server databases on DAS might make more sense. A SAN environment consists of multiple SAN clients with multiple HBAs on SAN switches connected to storage arrays. If the SAN wasn't properly designed and configured (i.e., to provide redundancy), the storage array or a component on the SAN could fail, leaving servers on the SAN unable to access data on the storage array. To enable you to troubleshoot storage problems, you'll need to make sure that SQL Server binaries and message-log files stay on the local disk. Storing the message log and binaries on a disk other than the local disk puts the database in a Catch-22 situation, in which a database-access failure caused by a storage-connection failure can't be logged, because logging occurs only on the device where the logs and binaries are stored.

I'm Ready for a SAN. Now What?


If your organization doesn't already have a Fibre Channel SAN switching network in place, iSCSI will most likely give you a greater ROI and minimize your equipment investment. For a Fibre Channel SAN, you need to buy a storage array, Fibre Channel SAN switches, and HBAs. For an iSCSI SAN, you need to buy a storage array, but you can use your existing Ethernet switches and Gigabit Ethernet adapters. To include your Windows servers in the iSCSI SAN, you need only download and install an iSCSI driver for your particular OS. (You can download the latest version of Microsoft's iSCSI driver, Microsoft iSCSI Software Initiator, at http://www.microsoft.com/downloads/details.aspx?familyid=12cb3c1a-15d6-4585-b385-befd1319f825&displaylang=en.) Carving up the storage array and presenting it to your Windows server could get complicated, depending on the storage vendor. As I mentioned earlier, you should discuss your storage requirements with your storage administrator. Most modern storage arrays let you access LUNs on the same storage array via either Fibre Channel or iSCSI.


I've found that many IT environments don't take full advantage of their SAN's features. If your organization already uses a Fibre Channel SAN switching network, you can try out storage-array features such as cloning and snapshots in a development or test environment. If your organization doesn't have a SAN yet, you can still try some of these features relatively inexpensively by setting up an iSCSI SAN.

Step Up to a SAN
As you can see, housing databases on a SAN can benefit DBAs in various ways. SANs can reduce the pain of sizing storage requirements for databases, enhance overall storage throughput, simplify storage performance tuning, and improve availability. Using a SAN can also decrease backup and restore windows, enable quicker and easier testing cycles, and reduce overhead in test storage. The availability of iSCSI removes the cost barriers that have until now inhibited some users from investigating SANs. Now's the time to check out SAN technology and see whether it can improve your database-storage environment.

Selecting a Storage Array for a SAN


Storage arrays are available in a wide spectrum of capacities and capabilities, and sorting through the options can be confusing. These guidelines can help you narrow down the type of storage array you need to house your SQL Server databases.

Snapshot methodologies. Snapshots work about the same in all storage arrays: the idea is to freeze all the blocks of data in a database, and the structure of the data being captured, at a point in time. Vendors use one of two basic methodologies for handling snapshots after data has been modified. The first methodology, which Figure A shows, is to leave the snapshot block alone and use a free block to write the modified block information. Of the two approaches, this is the more efficient because it requires only one block I/O operation to write the new block and one update to a pointer.

Figure A: First snapshot-updating methodology

The second methodology is to copy the snapshot block to a free block, then overwrite the block that was just copied. This approach, which Figure B shows, is often called copy-on-write. Copy-on-write requires more data movement and overhead on the storage array's part than the first approach. In Figure B, block D is moved from the current block to a new block so that the new contents of D can be




written to D's old location. Doing so requires three block I/Os and an update to a link, whereas the first approach requires only one block I/O. This difference becomes significant for disk performance as large numbers of blocks are updated.
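The I/O difference between the two methodologies can be counted directly. In this simplified model (pointer and link updates are ignored), the first approach costs one block I/O per updated block while copy-on-write costs three, matching the figures in the text:

```python
# Count block I/Os for updating N snapshotted blocks under the two
# snapshot-preservation schemes described above (simplified model that
# ignores pointer/link bookkeeping).

def redirect_io(blocks_updated):
    # First methodology: write each new version to a free block.
    # 1 block I/O per updated block.
    return blocks_updated * 1

def copy_on_write_io(blocks_updated):
    # Second methodology: read the old block, write it to a free block,
    # then write the new contents to the old location. 3 block I/Os.
    return blocks_updated * 3

for n in (1, 1_000, 100_000):
    print(f"{n:7d} updated blocks: redirect={redirect_io(n)} I/Os, "
          f"copy-on-write={copy_on_write_io(n)} I/Os")
```

The 3x gap is constant per block, so it compounds linearly with write volume, which is why the text calls it significant under heavy update loads.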

Figure B: Second snapshot-updating methodology

Support for Fibre Channel and iSCSI on the same array. Consider buying a storage array that supports both Fibre Channel and iSCSI, so that you have the flexibility to switch from one to the other or implement both. (For example, you might want to use an iSCSI SAN for testing and development and a Fibre Channel SAN for production.)

Ability to create, grow, and delete LUNs dynamically. Being able to create, grow, and delete LUNs without bringing a database down is a major benefit of putting the database on a SAN. If you need this capability, consider storage arrays that provide it.

Integration of snapshot backups with SQL Server. The process of taking a snapshot copy of your SQL Server database needs to be coordinated with your database and NTFS. Storage-array vendors can use Microsoft's SQL Server Virtual Backup Device Interface (VDI) API to accomplish this coordination. If the snapshot process isn't synchronized with NTFS and the database, the created snapshot might not be in a consistent state, because either NTFS or the database might not have completely flushed pending writes from memory to the LUN.

A uniform storage OS as you scale up. You'd most likely want to start with a small storage array to test and validate the SAN's benefits before deploying it enterprise-wide. Look for a storage array that lets you grow without having to do a forklift upgrade or learn a new storage OS. Maintaining a consistent OS lets you upgrade your storage array as your needs grow, with a minimum of database downtime.

A transport mechanism to mirror data over the WAN to a recovery site. The storage array should provide a uniform transport method for sending mirrored data across the WAN to another storage array for disaster recovery purposes.

Ability to instantaneously create a writeable copy of your database.
Look for storage arrays that let you instantaneously create a writeable copy (i.e., clone) of your database for testing upgrades and large data loads without affecting the production database. This feature could reduce outages and corruption of the production database, giving DBAs a tool to test major changes without endangering data.



SANs: Always Better than DAS?

By Brian Moran


Hardware-level I/O isn't one of my areas of expertise, so I've shied away from discussing certain SAN-specific topics in the past. But recent experiences with SQL Server customers have led me to make the following observation: contrary to popular belief, SANs aren't necessarily the best solution for a SQL Server system that must support high-end I/O on a budget. It's true that SANs offer advanced features for I/O and storage management, and they can be configured to offer extremely high levels of availability and performance. But here's a dirty little secret: it's generally harder to configure a SAN correctly than it is to use DAS. And usually, the cost of a direct-attached I/O solution is much less than the cost of buying a SAN that has the same number of disks offering comparable cache and performance levels. This means that you can often end up with more spindles for the same price by using DAS rather than a SAN. Customers often assume that buying a SAN is a best practice. But many people don't need SANs' high-end data management and virtualization features. In most cases, you can get better performance by deploying more spindles through DAS. Don't get me wrong: I'm not saying that you shouldn't use a SAN. I'm simply pointing out that you don't have to use a SAN to build solutions that support very large I/O environments for online transaction processing (OLTP) and data warehousing.
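The spindle argument can be made concrete with rough arithmetic. The per-spindle prices below are invented placeholders, not real quotes; the point is only that a lower cost per spindle buys more spindles for the same budget:

```python
# Back-of-the-envelope spindle math for a fixed storage budget.
# Both per-spindle prices are invented placeholders for illustration.

BUDGET = 100_000              # total storage budget, in dollars (example)

DAS_COST_PER_SPINDLE = 500    # enclosure + disk, amortized (assumed)
SAN_COST_PER_SPINDLE = 2000   # array + fabric + HBA + disk, amortized (assumed)

das_spindles = BUDGET // DAS_COST_PER_SPINDLE
san_spindles = BUDGET // SAN_COST_PER_SPINDLE

print(f"DAS: {das_spindles} spindles, SAN: {san_spindles} spindles")
print(f"At this budget, DAS buys {das_spindles // san_spindles}x the spindles")
```

Since random-I/O throughput scales roughly with spindle count, whatever the real prices are in a given quote, the ratio between them is what decides which option delivers more raw I/O per dollar.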




Chapter 2:

Solid State Storage for SQL Server


By Lavon Peters
Historically, the most widely used types of storage have been DAS, NAS, SANs, and, more recently, iSCSI SANs. Each type has its niche, including associated advantages and disadvantages. But a new trend in the storage market threatens to blow them all out of the water: the solid state disk (SSD). For a summary of other SQL Server storage options, see the sidebar "Other SQL Server Storage Options." SSDs are the latest advancement in storage technology. The aerospace and military industries have used SSD technology since the mid-1990s, when M-Systems introduced flash-based SSDs, but recent price drops have brought SSDs into the enterprise IT market. SSDs are also gaining popularity for use in ultraportable notebook computers. SSDs have standard Serial ATA (SATA) connections and use solid-state memory to store data. In addition to flash memory, SSDs can use static RAM (SRAM) or DRAM. Because SSDs mimic hard drives, they can easily replace them.

Advantages
SSDs provide numerous advantages over previous storage options. Because no rotation is involved (i.e., there are no moving parts), data access is faster than with DAS, NAS, or SANs: SSD access times range from 10 to 15 microseconds, which is 250 times faster than hard disk drives. The lack of moving parts also means increased reliability. In addition, SSDs use less power than other storage options. BiTMICRO Networks Product Officer Ces Martorillas notes that, from a cost-benefit standpoint, SSDs can replace approximately 200 HDDs, "not to mention the savings on power and cooling energy required per device." SanDisk, a leading SSD storage vendor, asserts that SSD is "rugged, fast, and power efficient. It's just what you need . . . to drive your business more successfully." One of the most crucial SQL Server database performance factors is I/O; SSDs' faster data access times give them an important advantage over other storage mechanisms.
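The 250x claim is consistent with the access times involved. A quick arithmetic check, assuming a typical hard-drive access time of about 3 milliseconds (an assumed figure, not one from the text):

```python
# Sanity-check the claimed speedup: SSD access in the 10-15 microsecond
# range versus an assumed hard-drive access time of roughly 3 milliseconds
# (typical seek plus rotational latency; assumption for illustration).

ssd_access_s = 12e-6      # midpoint-ish of the 10-15 microsecond range
hdd_access_s = 3e-3       # assumed typical hard-drive access time

speedup = hdd_access_s / ssd_access_s
print(f"HDD access time / SSD access time = {speedup:.0f}x")
```

For a random-I/O workload like an OLTP database, per-access latency dominates, so this ratio translates almost directly into the performance advantage the text describes.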

Disadvantages
Traditionally, SSDs have had a fairly high cost per I/O operation, although recent advances in flash technology are lowering those costs. EMC spokesperson Colin Boroski notes that over the next few years, "we expect flash drive prices will decline at a faster rate than traditional Fibre Channel drives due to the rapid advances being made in semiconductor manufacturing technologies and the natural effects of increased volumes in the market." In the past, SSD storage capacity was limited. One workaround for this problem is to store only your most frequently accessed data (e.g., tables, database components) on SSD. However, BiTMICRO Networks, which already offers 256GB SSDs, has announced plans to release SSDs with up to 832GB capacity in third quarter 2008, and the company claims to have packed a whopping 1.6TB into its E-Disk Altima E3S320 SSD.

When To Use SSDs


According to Texas Memory Systems, the SQL Server environments that can benefit the most from SSDs are those in which the servers have long I/O wait times. In the white paper Faster SQL Server Access with Solid State



Disks (www.texmemsys.com/files/f000174.pdf), the company recommends that you investigate the following SQL Server database components to determine the cause of increased I/O:

- The entire database: Look for excessively large databases, concurrent access by many users, or users frequently accessing all the tables in the database.
- Transaction logs: Look for a high number of entries, which occur during write transactions and increase a database's I/O time.
- The temporary database: This database stores temporary data during many types of operations; complex operations can complete more quickly if the temporary database is on SSD.
- Indexes: SQL Server updates table indexes every time it adds or modifies a record, and it accesses these indexes during each read transaction, which results in frequent, small, random transactions and thus increased I/O time.
- Frequently accessed tables: As with indexes, when users access tables frequently, random data requests occur. Random requests translate directly into higher I/O wait times.

Moving to SSD can alleviate all these causes of I/O delay and improve SQL Server performance.
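A simple way to start such an investigation is to rank database files by their share of total I/O stall time. SQL Server's sys.dm_io_virtual_file_stats DMV reports per-file stall figures of this kind; the numbers below are invented sample data, not real measurements:

```python
# Rank database files by their share of total I/O stall time, using
# invented sample figures of the kind SQL Server's
# sys.dm_io_virtual_file_stats DMV reports per file.

file_stats = [
    # (file name, cumulative io_stall in milliseconds) -- sample data
    ("SalesDB.mdf",     480_000),
    ("SalesDB_log.ldf",  95_000),
    ("tempdb.mdf",      310_000),
    ("Archive.ndf",      15_000),
]

total_stall = sum(stall for _, stall in file_stats)

for name, stall in sorted(file_stats, key=lambda row: row[1], reverse=True):
    print(f"{name:16s} {100 * stall / total_stall:5.1f}% of total stall time")
```

In this sample, the data file and tempdb dominate the stall time, which would point toward the "entire database" and "temporary database" items on the checklist as the first candidates for SSD.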

Other Considerations
Several additional factors can come into play in the decision to move from traditional storage to SSD in a SQL Server environment. The size of an organization and the extent of its SQL Server use are important, if only indirectly. For example, small companies might not have the budget to implement SSDs; traditionally, SSD prices have been five times those of standard SATA drives. In addition, Mark Hayashida, CTO of Solid Data Systems, notes that typically, "larger companies tend to utilize a greater number of enterprise-class applications that benefit greatly by deploying SSDs within their storage pool." The most important thing to consider is the particular SQL Server applications that your organization employs. Eric Schott, the senior director of product management for Dell EqualLogic Storage, says the decision to switch to SSD is "a function of the performance requirements of the SQL application (database), and the importance of the application to the business." The type of database (i.e., OLAP or OLTP) isn't a factor; whether SSD will benefit a SQL Server organization depends on the types of transactions within that database. Virtualization, as you might expect, is a factor you'll want to consider before implementing SSDs in your SQL Server environment. The performance gains achieved by implementing SSDs are magnified in a virtualized SQL Server environment. In The High Performance SAN Alliance: SAN, SSD, and Virtualization, Texas Memory Systems notes that a superior storage virtualization solution simplifies storage provisioning and reduces administrative overhead. It also enables and simplifies the targeted provisioning of resources, so that the fastest storage (e.g., SSD) can be provisioned to those applications that need it, when they need it, for maximum performance.
Eric Schott, of Dell, echoes, "Storage virtualization helps make it easier for the administrator to place appropriate data on the faster storage devices; storage virtualization lends to easier management." And as BiTMICRO Networks' Martorillas says, "A superior storage virtualization solution simplifies the complexity in handling and managing storage needs for every application. With virtualized storage, applications requiring high levels of performance can be provisioned with the fastest storage, like SSD, in their storage pool. Virtualized SSDs, like any other virtualized storage, can be allocated and de-allocated depending on customer needs. Applications may only require a high level of performance for a certain period of time, so SSDs allocated in these applications may be re-assigned to other applications when needed." Such flexibility is useful in a SQL Server environment in which applications periodically generate high volumes of transactions.

Brought to you by Dell and SQL Server Magazine

Chapter 2: Solid State Storage for SQL Server

An Emerging Alternative
SSDs are no longer limited to government or niche markets. They're widely available from several vendors for use in enterprise applications. SQL Server environments can especially benefit from SSDs because SQL Server is such an I/O-intensive application. As the price of these devices continues to drop and their storage capacity continues to increase, they present an affordable, high-performance alternative to traditional storage options.

Other SQL Server Storage Options


DAS
DAS is the traditional, or legacy, type of storage. This type of storage is non-networked; that is, it's connected directly to a server. The main advantages of DAS are that it's easy to install and manage, and it's inexpensive. However, because DAS isn't shared between computers, its use is limited to individual machines. Therefore, when you need to increase storage capacity, you often end up with multiple (albeit inexpensive) hard drives. With DAS, you also run the risk of running out of storage capacity, and DAS is difficult and time consuming to upgrade or expand. NAS and SANs address some of these limitations.

NAS
NAS is shared storage connected to the network. NAS appliances are simple to install. Initially, NAS is more expensive than DAS (but less expensive than a SAN); however, centralized storage makes NAS cheaper and easier to administer overall than DAS. NAS provides high storage capacity, easy data sharing, resource consolidation, and quick file access for multiple clients. However, NAS is less efficient than a SAN for moving large chunks of data, and it doesn't provide as many configuration options. In addition, NAS isn't entirely suitable for SQL Server. Although you can use NAS for SQL Server database storage, few organizations do so because NAS often doesn't provide the same level of performance as DAS. NAS is a shared storage solution, which means that I/O is limited by the network connection. (For more information about NAS, see Chapter 4, "Network Attached Storage.")

SAN
A SAN is a separate network specifically for attaching storage devices. SANs are very reliable, scalable, and fault tolerant. In addition, SANs provide better availability than NAS because all the storage devices are available to all the servers. End users benefit from the optimized network capacity and maximum utilization of server power. SANs are extremely efficient at moving large chunks of data. They're most applicable to large databases or to bandwidth-intensive or mission-critical applications. However, SANs are complex to manage and are very expensive. Early implementations of SANs required Fibre Channel connections, which added to the complexity and cost.

iSCSI SANs
With the emergence of iSCSI SAN technology, the cost of implementing a SAN has greatly decreased. iSCSI SANs use a standard Ethernet infrastructure to transmit data between devices. Although iSCSI SANs offer greater flexibility in remote storage than traditional Fibre Channel SANs, they're slower. In addition, iSCSI SANs aren't any easier to manage than Fibre Channel SANs are.


Chapter 3:

SQL Server Storage Options


Sort through the acronyms (SATA, iSCSI, NAS, SAN) to choose the right storage for your applications
By Alan Sugano
People often wonder about server storage options. What are the differences between the different storage types? Which solution is best for a particular situation? You have to sort through a lot of acronyms with server storage: U320 SCSI, SATA, SAS, NAS, iSCSI, SAN... the list goes on and on. In this article, I'll demystify server storage options and help you determine which solution is best for different situations.

Hard Disk Drive Technology


There have been quite a few changes in hard disk drive technology in recent years, focused primarily on speeding up the slowest part of any server: drive access. Here's a summary of the different drive technologies and where they might be used.

Parallel ATA drives. PATA drives, also known as ATA, ATAPI, or Integrated Drive Electronics (IDE) drives, are the type of drives that were generally used in workstations until recently. PATA drives typically have a maximum transfer rate of 133MBps. Most workstations still include an IDE controller on the motherboard that's used to interface with PATA CD-ROM and DVD drives. You can typically have two devices per controller channel, although some IDE RAID controllers let you connect more drives to a controller channel. If your workstation is more than three years old, it probably contains a PATA drive.

Serial ATA drives. SATA drives have become the new standard for workstations and low-end servers. The biggest improvement over PATA drives is the transfer rate. Older SATA drives have a maximum transfer rate of 1.5Gbps, but newer SATA drives have a maximum transfer rate of 3Gbps. Instead of a shared bus, which PATA drives use, each SATA drive has a dedicated channel to the controller. This design improves performance because drives don't compete on the same communication channel. If you plan to use SATA drives in a server, you should verify the duty cycle that was used to calculate the mean time between failures (MTBF). The MTBF for SATA drives might be rated at a duty cycle more appropriate to workstations (around 20 to 30 percent) than to true server-class drives, whose MTBF is often calculated at a duty cycle of around 80 to 90 percent. If you use a workstation-rated SATA drive in a server, you'll probably experience a high rate of drive failure. However, SATA drive density is pretty good; 500GB drives are readily available. SATA drives are a good fit for nearline or archive applications where large amounts of data must be readily accessible but the highest performance isn't necessary.

Ultra320 SCSI drives. Ultra320 SCSI drives (or, sometimes, U320 SCSI drives) were the standard for servers and other high-end storage until a few years ago. As the name implies, Ultra320 SCSI has a maximum transfer rate of 320MBps. You can typically connect 14 devices to each SCSI bus. Ultra320 SCSI uses a shared bus, so the chance of SCSI bus contention increases with each additional drive you add to the SCSI channel. The largest readily available Ultra320 SCSI drives are 300GB.

Serial Attached SCSI drives. SAS drives are replacing Ultra320 SCSI in the server-class storage market. SAS drives have a transfer rate of 3Gbps, although most drive manufacturers have plans to release 6Gbps SAS drives in the future. SAS drives are designed to go into heavily used servers, so their MTBF is calculated with a high duty rating. Just like SATA, there's a dedicated communication channel for each drive, eliminating any shared-bus contention. Although SAS drive performance is significantly better than Ultra320 SCSI, the drive density isn't very good. The largest drive you can get in the 2.5" form factor is 146GB; the 3.5" form factor can get you up to 300GB. This limitation usually isn't an issue, though, if you need to build a high-performance disk array, because using more drives will improve the performance of the array. But if you have a lot of data to store, 146GB drives might not be adequate.

Server Storage Designs


After you've chosen the appropriate drive type, you still have to decide where to install the drives: locally attached storage, a disk subsystem, Network Attached Storage (NAS), or a Storage Area Network (SAN)? Your application requirements should help you determine the storage option that's the best fit for your company.

Locally attached storage. Locally attached storage is installed directly in the server or is connected to an external storage device with a SCSI cable. A common configuration for a server is to use a RAID 1 array for the OS partition and a RAID 5 array for data. The best performing and most fault tolerant disk array is RAID 1+0 (or RAID 10), which combines data striping and mirroring. For the best performance, use a hardware RAID controller that has a hardware cache. The controller should have a battery backup to protect any data left in the hardware cache in the event of a server crash. Microsoft SQL Server log files perform best on a RAID 1 array, but the data portion of the database (or any randomly accessed file) will perform better on a RAID 5 or RAID 10 array. Locally attached storage is a good solution for servers that don't have high availability requirements. You can set up locally attached storage for as little as $1,000.

NAS. NAS devices are appliances that are capable of holding multiple hard disk drives (usually eight or more). They have one or more built-in Ethernet network cards. NAS devices serve files but don't have any other server capabilities, such as email, database, DNS, or DHCP. Although they can be placed on a dedicated network, NAS devices are usually placed on the public Ethernet network so workstations and servers can access them. A drawback of NAS devices is their tendency to become obsolete. For example, early NAS devices typically didn't support Active Directory (AD), and they didn't have an upgrade path. If you had a change in your environment that required your NAS device to support AD, you had to replace the entire unit with a new NAS device with AD support. NAS devices are a good fit for applications where the data must be online but isn't accessed frequently. For example, a NAS device filled with SATA drives might be an appropriate choice for an email archive. Prices for NAS solutions typically start at around $3,000.

Just a Bunch of Disks. JBOD, just like it sounds, is a disk subsystem that holds many hard disk drives. The drives are often configured in a RAID 0 array, with multiple drives striped together to create one large logical disk partition. JBODs typically have SCSI, SATA, SAS, or Fibre Channel interfaces. JBODs are commonly used to back up data stored on a SAN: Data is copied from the SAN to the JBOD, then from the JBOD to offline media such as tape. By copying the data to the JBOD first, the backup is performed faster, and you don't have to worry as much about data contention resulting from open files on the SAN; open files often can't be reliably backed up. JBODs are also used to consolidate data from multiple sources before the data is backed up to tape. In enterprises with a lot of data to back up (more than a few terabytes) and a small backup window, JBODs are usually part of the backup strategy. The cost of implementing a JBOD solution starts at around $5,000.

SAN. SANs are at the high end of server storage options. They come in two types, iSCSI and Fibre Channel, which I'll discuss in more detail below. A SAN's main advantage is shared storage: Unlike with locally attached storage, more than one server can access data on a SAN. On lower-end SAN configurations, you have a single point of failure in the SAN chassis. You can configure the SAN to eliminate this single point of failure; however, the SAN price goes up significantly as a result. SANs are typically used for high-availability solutions, such as Microsoft Cluster Server or VMware's ESX Server with VMware High Availability. Because a SAN allows for shared storage between two or more server nodes, a passive node can take the place of an active node in the event of a hardware failure in the active node. You would typically start considering a SAN if you have more than 400 users; if the cost of downtime is extremely high, you should consider a SAN at lower user numbers. For example, I worked with an organization that estimated its downtime costs at $20,000 per minute; even though they had only 20 users in the office, they opted to use a SAN. The applications you run in conjunction with the SAN significantly impact how your LUNs should be created. LUNs are how each server node views the logical disk partitions. Each LUN is typically made up of a RAID array, commonly RAID 1, RAID 5, or RAID 10. For optimum performance, sequentially written data such as log files should be placed on LUNs made up of RAID 1 arrays, and randomly accessed data such as database files should be stored on LUNs made up of either RAID 5 or RAID 10 arrays.

iSCSI SANs. iSCSI SANs are the less expensive SAN solution; they typically start at around $15,000. iSCSI SANs use Gigabit Ethernet to transfer data between the server nodes and the SAN, which means the server nodes don't have to be in the same physical location; iSCSI is therefore a little more flexible to set up than Fibre Channel. If you choose this solution, I strongly suggest using a TCP Offload Engine (TOE) card to process the iSCSI requests, because these requests can place a significant load on the server's processor. For the best performance, run the iSCSI SAN on a dedicated network that's separate from your LAN traffic. An iSCSI SAN is a good solution when you need high availability but don't have extremely high disk throughput requirements. The money you save by using an iSCSI SAN can be used to purchase higher-end server nodes, which might give you the best performance per dollar. An ideal application for an iSCSI SAN is a SQL Server database that has high-availability requirements but relatively light database transactions, has to run a large number of stored procedures, and has powerful server nodes connected to the SAN. Because the database transaction load is light, you probably don't need a really fast disk subsystem, but the large number of stored procedures places a significant load on the processors. If you have enough memory installed on each server node, a lot of data can be cached to further reduce disk I/O, especially if your servers run on the x64 platform. Figure 1 shows a typical iSCSI SAN configuration.

Fibre Channel SANs. Fibre Channel is the higher-end SAN solution; typical solutions start at around $25,000. Early Fibre Channel SANs ran at 2Gbps, but newer solutions run at 4Gbps and 8Gbps. A 4Gbps Fibre Channel SAN gives you the best disk performance available today. Instead of using a Gigabit Ethernet switch like iSCSI SANs, Fibre Channel SANs use a Fibre Channel switch to connect the nodes and the SAN. Some vendors charge for each connection on the Fibre Channel switch, so you might have to pay a connection fee to add additional nodes. In a typical configuration, each server node has redundant connections to the SAN. Figure 2 shows a typical Fibre Channel SAN configuration.
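For SQL Server specifically, the RAID guidance above (sequential log files on RAID 1, randomly accessed data files on RAID 5 or RAID 10) is applied by pointing each database file at a volume backed by the appropriate array when you create the database. A minimal T-SQL sketch; the database name, drive letters, paths, and sizes are illustrative assumptions (E: presumed to be a RAID 5 or RAID 10 volume, F: a RAID 1 volume):

```sql
-- Sketch: separate data and log placement at database-creation time.
-- All names, paths, and sizes below are hypothetical examples.
CREATE DATABASE Sales
ON PRIMARY
  ( NAME = Sales_data,
    FILENAME = 'E:\MSSQL\Data\Sales_data.mdf',  -- RAID 5/10 volume (assumed)
    SIZE = 500MB,
    FILEGROWTH = 100MB )
LOG ON
  ( NAME = Sales_log,
    FILENAME = 'F:\MSSQL\Logs\Sales_log.ldf',   -- RAID 1 volume (assumed)
    SIZE = 100MB,
    FILEGROWTH = 50MB )
GO
```

The same separation can be carried over to a SAN by mapping the data file's volume to a RAID 5/10 LUN and the log file's volume to a RAID 1 LUN.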


Figure 1: A typical iSCSI SAN configuration

Figure 2: A typical Fibre Channel SAN configuration



When you purchase a Fibre Channel SAN, a dedicated engineer typically comes out to assist with the implementation. These specialist engineers verify that everything is properly installed and configured. If onsite installation isn't included in the cost of the SAN, I suggest purchasing this service, especially if this is your first SAN installation. Fibre Channel SANs are a good solution for Microsoft Exchange Server 2003 installations with large databases (e.g., more than 500GB). All other things being equal, the speed of the disk subsystem on an Exchange 2003 server determines the ultimate performance of the mail system. Note that the disk I/O requirements are significantly lower on Exchange 2007 than on Exchange 2003 because Exchange 2007 takes advantage of 64-bit processors and can cache a significant amount of data in memory.

It's Your Choice


Your requirements for applications, disk performance, fault tolerance, and high availability should help you narrow down your storage choices very quickly. For instance, if you need high availability on your server, you'll probably need a SAN. If you don't have strict high-availability requirements, you can probably get by with locally attached storage. You still have many different storage options to consider for your servers, but you should no longer be afraid of that morass of acronyms. Use the information presented here to match a solution with your needs.


Chapter 4:

Avoiding the Red Zone: A Two-Step Process for Tracking Disk Usage
By Greg A. Larsen
Have your customers or managers ever asked you how much their databases grew during the past year? Have you needed to plan how much disk capacity you'll need for the next year based on your databases' average growth rate during the past 12 months? How long will your existing unallocated disk space last at your current growth rate? To answer these kinds of database-growth questions or similar disk-space questions, you need some historical space-usage information about your databases. I've developed a process that you can use to automatically collect space-usage statistics for each of your databases. You can then use the collected space information to perform a simple growth-rate calculation.

Several months ago, I decided to build a process to capture space-usage information for each database on a system so that I could track disk-space consumption over time. I wanted to find the amount of space allocated and used for both the data and the log files. I was looking for the same information that you see in Enterprise Manager's Database Details pane when you're viewing Space Allocated information, but I needed the information to be available to T-SQL code. Using SQL Server Profiler, I discovered that Enterprise Manager obtains space-allocated information by using two DBCC statements. One of the statements, SQLPERF, is documented; the other DBCC statement, SHOWFILESTATS, isn't. By manually running DBCC SHOWFILESTATS on each database and comparing the output with what Enterprise Manager displayed, I determined that this command would provide me with used-disk-space information by database. Both SQL Server 2000 and SQL Server 7.0 use these DBCC statements to populate Enterprise Manager's Space Allocated display.

The DBCC SQLPERF(LOGSPACE) statement returns transaction log space information for all databases: the allocated log size for each database in megabytes and the percentage of log space used for each database. With some simple math, you can easily convert the percentage of log space used into megabytes. This DBCC statement helped me obtain the log file space information I wanted to track. I used the undocumented DBCC SHOWFILESTATS statement, which returns space-usage information for one database's data, to obtain the rest of the disk-space statistics I wanted. This statement returns one record per physical data file. Each statistics record returned appears in six columns: Fileid, FileGroup, TotalExtents, UsedExtents, Name, and FileName. You can use the TotalExtents column to determine the total space allocated to data and the UsedExtents column to determine the total space used for data. By summing the TotalExtents and UsedExtents values of all files within a database, then converting the number of extents into megabytes, I calculated the total space allocated and the total space used for data. These calculations gave me the data space-usage information I wanted to track over time. Figure 1 shows sample output of the DBCC SHOWFILESTATS command after you run it against the master database.
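The two conversions just described are simple arithmetic: an extent is 8 pages of 8KB each, or 64KB, so data megabytes = extents * 64 / 1024, and used log megabytes = log size * percent used / 100 (these are the same formulas the stored procedure uses later in this chapter). A minimal T-SQL sketch with sample input values; the variable names are mine, not part of the process:

```sql
-- Extents to megabytes: 1 extent = 8 pages x 8KB = 64KB.
DECLARE @TotalExtents int, @UsedExtents int
SELECT @TotalExtents = 171, @UsedExtents = 168  -- sample values from Figure 1
SELECT @TotalExtents * 64 / 1024 AS DataAllocatedMB,
       @UsedExtents * 64 / 1024 AS DataUsedMB
       -- both round down to 10MB here (integer division)

-- DBCC SQLPERF(LOGSPACE) percent-used to megabytes.
DECLARE @LogSizeMB real, @LogPctUsed real
SELECT @LogSizeMB = 4.99, @LogPctUsed = 42.0    -- sample LOGSPACE values
SELECT (@LogSizeMB * @LogPctUsed) / 100.0 AS LogUsedMB
```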


Fileid  FileGroup  TotalExtents  UsedExtents  Name    FileName
------  ---------  ------------  -----------  ------  -------------------------
1       1          171           168          master  g:\mssql7\data\master.mdf

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Figure 1: Sample output of DBCC SHOWFILESTATS

I've built these two DBCC statements into a process that automatically collects space information by database. This process runs periodically and saves space-usage statistics in a database table. The process consists of a SQL Server Agent job that contains two steps. The first step executes a stored procedure named usp_get_dbstats, which generates a T-SQL script. The resulting script consists of a DBCC SQLPERF(LOGSPACE) statement to gather the log information for all databases, a DBCC SHOWFILESTATS statement for each database, and some code to manipulate the DBCC data into the right format for saving the space-usage information. The second step executes the T-SQL script that the first step generates. After extracting the space-usage information from SQL Server and formatting the data, this script populates a permanent database table with the current data and log space-usage information. You can then use this permanent table to answer a wealth of disk-space allocation questions. This process of gathering space-usage statistics is an example of using T-SQL code to generate T-SQL code. I used this two-step process to minimize the complexity of writing a stored procedure that would need to issue a USE statement to let me run the DBCC SHOWFILESTATS command against each database. Now, let's look at my homegrown disk-space collection process in a little more detail.
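Once the permanent table has accumulated history, the growth-rate calculation mentioned earlier is a simple aggregate. A hedged sketch: it assumes DBSTATS also carries a capture timestamp (STAT_DATE is a hypothetical column name; Listing 1 shows only the size columns being inserted) and that data usage grows roughly monotonically over the window:

```sql
-- Average monthly data growth per database over the past 12 months.
-- STAT_DATE is an assumed capture-timestamp column, and MAX - MIN only
-- approximates total growth when usage grows roughly monotonically.
SELECT DBNAME,
       (MAX(DATA_USED) - MIN(DATA_USED)) / 12.0 AS AvgMonthlyGrowthMB
FROM DBA.dbo.DBSTATS
WHERE STAT_DATE >= DATEADD(month, -12, GETDATE())
GROUP BY DBNAME
ORDER BY AvgMonthlyGrowthMB DESC
```

Dividing your unallocated disk space by the sum of these per-database growth figures gives a rough answer to the "how long will my free space last?" question.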

The Stored Procedure


The usp_get_dbstats stored procedure, which Listing 1 shows, is the main body of the space-usage statistics-gathering process. The stored procedure queries the system tables and programmatically generates and executes PRINT statements to produce a T-SQL script that, when executed, uses two DBCC statements to extract current space-usage information. Let's walk through this stored procedure one section at a time.

LISTING 1: The usp_get_dbstats Stored Procedure
IF EXISTS (SELECT * FROM sysobjects
   WHERE id = object_id(N'[dbo].[usp_get_dbstats]')
   AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
  DROP PROCEDURE [dbo].[usp_get_dbstats]
GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO

CREATE PROCEDURE usp_get_dbstats AS
DECLARE @DBSTATS_DB char(3)
SET @DBSTATS_DB = 'DBA'

-- Begin callout A
PRINT 'DECLARE @cmd nvarchar(1024)'
PRINT 'IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE id = object_id(N' + char(39) + '[tempdb]..[#tmplg]' + char(39) + '))'
PRINT '  DROP TABLE #tmplg'
PRINT 'CREATE TABLE #tmplg ('
PRINT '  DBName varchar(32),'
PRINT '  LogSize real,'
PRINT '  LogSpaceUsed real,'
PRINT '  Status int'
PRINT '  )'
PRINT 'SELECT @cmd = ' + char(39) + 'dbcc sqlperf (logspace)' + char(39)
PRINT 'INSERT INTO #tmplg EXECUTE (@cmd)'
-- End callout A

-- Begin callout B
PRINT 'IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE id = object_id(N' + char(39) + '[tempdb]..[#tmp_stats]' + char(39) + '))'
PRINT '  DROP TABLE #tmp_stats'
PRINT 'CREATE TABLE #tmp_stats ('
PRINT '  totalextents int,'
PRINT '  usedextents int,'
PRINT '  dbname varchar(40),'
PRINT '  logsize real,'
PRINT '  logspaceused real'
PRINT '  )'
PRINT 'go'
-- End callout B

-- Begin callout C
DECLARE AllDatabases CURSOR FOR
  SELECT name FROM master..sysdatabases
OPEN AllDatabases
DECLARE @DB nvarchar(128)
FETCH NEXT FROM AllDatabases INTO @DB
WHILE (@@FETCH_STATUS = 0)
BEGIN
  PRINT 'USE [' + @DB + ']'
  PRINT 'GO'
  PRINT 'IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE id = object_id(N' + char(39) + '[tempdb]..[#tmp_sfs]' + char(39) + '))'
  PRINT '  DROP TABLE #tmp_sfs'
  PRINT 'CREATE TABLE #tmp_sfs ('
  PRINT '  fileid int,'
  PRINT '  filegroup int,'
  PRINT '  totalextents int,'
  PRINT '  usedextents int,'
  PRINT '  name varchar(1024),'
  PRINT '  filename varchar(1024)'
  PRINT '  )'
  PRINT 'go'
  PRINT 'DECLARE @cmd nvarchar(1024)'
  PRINT 'SET @cmd=' + char(39) + 'DBCC SHOWFILESTATS' + char(39)
  PRINT 'INSERT INTO #tmp_sfs EXECUTE(@cmd)'
  PRINT 'DECLARE @logsize real'
  PRINT 'DECLARE @logspaceused real'
  PRINT 'SELECT @logsize = logsize FROM #tmplg WHERE dbname = ' + char(39) + @DB + char(39)
  PRINT 'SELECT @logspaceused = (logsize*logspaceused)/100.0'
  PRINT '  FROM #tmplg WHERE dbname = ' + char(39) + @DB + char(39)
  PRINT 'SET @cmd = ' + char(39) + 'INSERT INTO #tmp_stats ' + char(39) + ' +'
  PRINT '  ' + char(39) + '(totalextents,usedextents,dbname,logsize,logspaceused) ' + char(39) + ' +'
  PRINT '  ' + char(39) + 'SELECT SUM(totalextents), SUM(usedextents), ' + char(39) + ' + char(39) + ' + char(39) + @DB + char(39) + ' + char(39) + ' + ' + char(39) + ',' + char(39) + ' +'
  PRINT '  CAST(@logsize AS varchar) + ' + char(39) + ',' + char(39) + ' + CAST(@logspaceused AS varchar) +'
  PRINT '  ' + char(39) + ' FROM #tmp_sfs' + char(39)
  PRINT 'EXEC sp_executesql @cmd'
  FETCH NEXT FROM AllDatabases INTO @DB
END --(@@FETCH_STATUS = 0)
-- End callout C

-- Begin callout D
PRINT 'INSERT INTO ' + @DBSTATS_DB + '.dbo.DBSTATS'
PRINT '  (RECORD_TYPE, DBNAME, DATA_SIZE, DATA_USED, LOG_SIZE, LOG_USED)'
PRINT 'SELECT 1, dbname, totalextents*64/1024, usedextents*64/1024,'
PRINT '  logsize, logspaceused FROM #tmp_stats'
-- End callout D

CLOSE AllDatabases
DEALLOCATE AllDatabases
GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO

The code at callout A in Listing 1 gathers the log-space usage information. This block of code, like the others in Listing 1, dynamically generates a series of PRINT statements whose output becomes the T-SQL script that gathers the space-usage statistics. The code at callout A produces a set of T-SQL statements that create a temporary table called #tmplg, then populate the table with the output from DBCC SQLPERF(LOGSPACE). The INSERT INTO statement that has the EXECUTE option puts the DBCC statement's output into the #tmplg table, which will eventually contain one record for each database on the server. Each record will contain information that goes into columns labeled DBName, LogSize, LogSpaceUsed, and Status. You can find the definitions of each of these columns in SQL Server Books Online (BOL) under the heading "DBCC SQLPERF."

Callout B shows the code that creates the #tmp_stats temporary table. Each record in this table will hold both the data and log space-usage information for a database. Later code blocks will populate and use this temporary table. This section of the code executes a series of PRINT statements to append to the T-SQL script that the code at callout A started.

The code at callout C generates the DBCC SHOWFILESTATS statement for each database. This chunk of code also generates the T-SQL statements that merge the DBCC SHOWFILESTATS information with records in the #tmplg table for each database to produce one record per database containing data and log disk-space statistics. Again, the code uses PRINT statements to generate T-SQL code that will gather DBCC SHOWFILESTATS information. Remember that the DBCC SQLPERF statement generates log-size information for all databases, whereas the DBCC SHOWFILESTATS statement gathers the data sizes for only the current database. This scope limitation of the DBCC SHOWFILESTATS command requires that the stored procedure generate code that will execute the DBCC SHOWFILESTATS statement against each database on the SQL Server box. The code at callout C uses a cursor, AllDatabases, to hold a list of the databases on the system. This cursor lets the stored procedure iterate through the list of databases inside a WHILE loop to generate a DBCC SHOWFILESTATS statement for each database. Inside the WHILE loop, the stored procedure generates code to create a temporary table, #tmp_sfs, to hold the output of the DBCC SHOWFILESTATS statement, followed by an INSERT INTO statement. Again, I used the INSERT INTO statement with the EXECUTE option to insert the DBCC SHOWFILESTATS statement's information into the temporary table. The last several lines of code in this section generate the code that will convert the LogSpaceUsed column in table #tmplg from a percentage of used space into a megabyte value. Then, the code populates the #tmp_stats table with current data and log space-usage statistics for the current database.


The code at callout D generates the T-SQL statements that put the data and log space-usage statistics into a permanent table. This section uses a simple INSERT INTO statement to populate a permanent table, DBSTATS, with the current calculated database space-usage statistics that the temporary table #tmp_stats holds. Listing 2 shows a sample of what the T-SQL script would look like if you executed usp_get_dbstats on a server that had only a few databases. This output was produced on a system that had only the standard SQL Server installed databases (master, model, msdb, Northwind, Pubs, and tempdb), plus one user-defined database (DBA). Note that in this listing, one chunk of code collects transaction log space information by using the DBCC SQLPERF(LOGSPACE) command. Seven sections of similar code, one for each database, use the DBCC SHOWFILESTATS statement to gather data space-usage statistics.

LISTING 2: Example of Using the usp_get_dbstats Stored Procedure
declare @cmd nvarchar(1024)
if exists (select * from tempdb..sysobjects
  where id = object_id(N'[tempdb]..[#tmplg]'))
  drop table #tmplg
CREATE TABLE #tmplg (
  DBName varchar(32),
  LogSize real,
  LogSpaceUsed real,
  Status int
  )
SELECT @cmd = 'dbcc sqlperf (logspace)'
INSERT INTO #tmplg EXECUTE (@cmd)
if exists (select * from tempdb..sysobjects
  where id = object_id(N'[tempdb]..[#tmp_stats]'))
  drop table #tmp_stats
create table #tmp_stats (
  totalextents int,
  usedextents int,
  dbname varchar(40),
  logsize real,
  logspaceused real
  )
go
use [dba]
go
if exists (select * from tempdb..sysobjects
  where id = object_id(N'[tempdb]..[#tmp_sfs]'))
  drop table #tmp_sfs
create table #tmp_sfs (
  fileid int, filegroup int, totalextents int, usedextents int,
  name varchar(1024), filename varchar(1024))
go
declare @cmd nvarchar(1024)
set @cmd='DBCC SHOWFILESTATS'
insert into #tmp_sfs execute(@cmd)
declare @logsize real
declare @logspaceused real
select @logsize = logsize from #tmplg where dbname = 'dba'
select @logspaceused = (logsize*logspaceused)/100.0
  from #tmplg where dbname = 'dba'
set @cmd = 'insert into #tmp_stats ' +
  '(totalextents,usedextents,dbname,logsize,logspaceused) ' +
  'select sum(totalextents), sum(usedextents), ' +
  char(39) + 'dba' + char(39) + ',' +
  cast(@logsize as varchar) + ',' +
  cast(@logspaceused as varchar) + ' from #tmp_sfs'
exec sp_executesql @cmd
use [master]
go
if exists (select * from tempdb..sysobjects
  where id = object_id(N'[tempdb]..[#tmp_sfs]'))
  drop table #tmp_sfs
create table #tmp_sfs (
  fileid int, filegroup int, totalextents int, usedextents int,
  name varchar(1024), filename varchar(1024))
go
declare @cmd nvarchar(1024)
set @cmd='DBCC SHOWFILESTATS'
insert into #tmp_sfs execute(@cmd)
declare @logsize real
declare @logspaceused real
select @logsize = logsize from #tmplg where dbname = 'master'
select @logspaceused = (logsize*logspaceused)/100.0
  from #tmplg where dbname = 'master'
set @cmd = 'insert into #tmp_stats ' +
  '(totalextents,usedextents,dbname,logsize,logspaceused) ' +
  'select sum(totalextents), sum(usedextents), ' +
  char(39) + 'master' + char(39) + ',' +
  cast(@logsize as varchar) + ',' +
  cast(@logspaceused as varchar) + ' from #tmp_sfs'
exec sp_executesql @cmd
use [model]
go
if exists (select * from tempdb..sysobjects
  where id = object_id(N'[tempdb]..[#tmp_sfs]'))
  drop table #tmp_sfs
create table #tmp_sfs (
  fileid int, filegroup int, totalextents int, usedextents int,
  name varchar(1024), filename varchar(1024))
go
declare @cmd nvarchar(1024)
set @cmd='DBCC SHOWFILESTATS'
insert into #tmp_sfs execute(@cmd)
declare @logsize real
declare @logspaceused real
select @logsize = logsize from #tmplg where dbname = 'model'
select @logspaceused = (logsize*logspaceused)/100.0
  from #tmplg where dbname = 'model'
set @cmd = 'insert into #tmp_stats ' +
  '(totalextents,usedextents,dbname,logsize,logspaceused) ' +
  'select sum(totalextents), sum(usedextents), ' +
  char(39) + 'model' + char(39) + ',' +
  cast(@logsize as varchar) + ',' +
  cast(@logspaceused as varchar) + ' from #tmp_sfs'
exec sp_executesql @cmd
use [msdb]
go
if exists (select * from tempdb..sysobjects
  where id = object_id(N'[tempdb]..[#tmp_sfs]'))
  drop table #tmp_sfs
create table #tmp_sfs (
  fileid int, filegroup int, totalextents int, usedextents int,
  name varchar(1024), filename varchar(1024))
go
declare @cmd nvarchar(1024)
set @cmd='DBCC SHOWFILESTATS'
insert into #tmp_sfs execute(@cmd)
declare @logsize real
declare @logspaceused real
select @logsize = logsize from #tmplg where dbname = 'msdb'
select @logspaceused = (logsize*logspaceused)/100.0
  from #tmplg where dbname = 'msdb'
set @cmd = 'insert into #tmp_stats ' +
  '(totalextents,usedextents,dbname,logsize,logspaceused) ' +
  'select sum(totalextents), sum(usedextents), ' +
  char(39) + 'msdb' + char(39) + ',' +
  cast(@logsize as varchar) + ',' +
  cast(@logspaceused as varchar) + ' from #tmp_sfs'
exec sp_executesql @cmd
use [Northwind]
go
if exists (select * from tempdb..sysobjects
  where id = object_id(N'[tempdb]..[#tmp_sfs]'))
  drop table #tmp_sfs
create table #tmp_sfs (
  fileid int, filegroup int, totalextents int, usedextents int,
  name varchar(1024), filename varchar(1024))
go
declare @cmd nvarchar(1024)
set @cmd='DBCC SHOWFILESTATS'
insert into #tmp_sfs execute(@cmd)
declare @logsize real
declare @logspaceused real
select @logsize = logsize from #tmplg where dbname = 'Northwind'
select @logspaceused = (logsize*logspaceused)/100.0
  from #tmplg where dbname = 'Northwind'
set @cmd = 'insert into #tmp_stats ' +
  '(totalextents,usedextents,dbname,logsize,logspaceused) ' +
  'select sum(totalextents), sum(usedextents), ' +
  char(39) + 'Northwind' + char(39) + ',' +
  cast(@logsize as varchar) + ',' +
  cast(@logspaceused as varchar) + ' from #tmp_sfs'
exec sp_executesql @cmd
use [pubs]
go
if exists (select * from tempdb..sysobjects
  where id = object_id(N'[tempdb]..[#tmp_sfs]'))
  drop table #tmp_sfs
create table #tmp_sfs (
  fileid int, filegroup int, totalextents int, usedextents int,
  name varchar(1024), filename varchar(1024))
go
declare @cmd nvarchar(1024)
set @cmd='DBCC SHOWFILESTATS'
insert into #tmp_sfs execute(@cmd)
declare @logsize real
declare @logspaceused real
select @logsize = logsize from #tmplg where dbname = 'pubs'
select @logspaceused = (logsize*logspaceused)/100.0
  from #tmplg where dbname = 'pubs'
set @cmd = 'insert into #tmp_stats ' +
  '(totalextents,usedextents,dbname,logsize,logspaceused) ' +
  'select sum(totalextents), sum(usedextents), ' +
  char(39) + 'pubs' + char(39) + ',' +
  cast(@logsize as varchar) + ',' +
  cast(@logspaceused as varchar) + ' from #tmp_sfs'
exec sp_executesql @cmd
use [tempdb]
go
if exists (select * from tempdb..sysobjects
  where id = object_id(N'[tempdb]..[#tmp_sfs]'))
  drop table #tmp_sfs
create table #tmp_sfs (
  fileid int, filegroup int, totalextents int, usedextents int,
  name varchar(1024), filename varchar(1024))
go
declare @cmd nvarchar(1024)
set @cmd='DBCC SHOWFILESTATS'
insert into #tmp_sfs execute(@cmd)
declare @logsize real
declare @logspaceused real
select @logsize = logsize from #tmplg where dbname = 'tempdb'
select @logspaceused = (logsize*logspaceused)/100.0
  from #tmplg where dbname = 'tempdb'
set @cmd = 'insert into #tmp_stats ' +
  '(totalextents,usedextents,dbname,logsize,logspaceused) ' +
  'select sum(totalextents), sum(usedextents), ' +
  char(39) + 'tempdb' + char(39) + ',' +

Brought to you by Dell and SQL Server Magazine

Chapter 4 Avoiding the Red Zone 29

cast(@logsize as varchar) + , + cast(@logspaceused as varchar) + from #tmp_sfs exec sp_executesql @cmd INSERT INTO dba.dbo.DBSTATS (RECORD_TYPE, DBNAME, DATA_SIZE, DATA_USED, LOG_SIZE, LOG_USED) SELECT 1,dbname,totalextents*64/1024 , usedextents*64/1024 , logsize ,logspaceused from #tmp_stats
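In the final INSERT of the generated script, extent counts become megabytes: a SQL Server extent is eight 8KB pages, or 64KB, which is why the script computes totalextents*64/1024. Here is a quick sketch of that conversion in Python (the extent counts are hypothetical sample values, not output from any real server):

```python
# A SQL Server extent is 8 pages x 8KB = 64KB.
KB_PER_EXTENT = 64

def extents_to_mb(extents: int) -> float:
    # Mirrors the totalextents*64/1024 expression in the generated script.
    return extents * KB_PER_EXTENT / 1024

# Hypothetical DBCC SHOWFILESTATS totals: 1,600 extents allocated, 1,200 used.
print(extents_to_mb(1600))  # 100.0 (MB allocated)
print(extents_to_mb(1200))  # 75.0 (MB used)
```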

The Permanent Table


The usp_get_dbstats stored procedure assumes that the space-usage statistics it gathers will be stored in a permanent table called DBSTATS. So before executing the T-SQL script that usp_get_dbstats generates, you'll need to create the DBSTATS table. Running the code that Listing 3 shows creates the table, which will hold all the historical database space-usage information. Records are appended to DBSTATS each time you execute the commands that usp_get_dbstats generates.

LISTING 3: Code That Creates the Permanent Table DBSTATS
IF EXISTS (SELECT * FROM sysobjects
      WHERE id = object_id(N'[dbo].[DBSTATS]')
      AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
   DROP TABLE [dbo].[DBSTATS]
GO
CREATE TABLE [dbo].[DBSTATS] (
   [ID] [int] IDENTITY (1, 1) NOT NULL ,
   [RECORD_TYPE] [int] NOT NULL ,
   [DBNAME] [char] (50) NOT NULL ,
   [DATA_SIZE] [decimal](9, 2) NULL ,
   [DATA_USED] [decimal](9, 2) NULL ,
   [LOG_SIZE] [decimal](9, 2) NULL ,
   [LOG_USED] [decimal](9, 2) NULL ,
   [STAT_DATE] [datetime] NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DBSTATS] WITH NOCHECK ADD
   CONSTRAINT [DF_DBSTATS_STAT_DATE] DEFAULT (getdate()) FOR [STAT_DATE]
GO
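Because rows accumulate run after run, growth becomes a simple aggregate over DBSTATS. The pattern can be sketched as follows; SQLite stands in for SQL Server here, the schema is trimmed to the columns the example needs, and the dates and sizes are hypothetical:

```python
import sqlite3

# In-memory stand-in for the permanent DBSTATS table (simplified from Listing 3).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE DBSTATS (
    ID INTEGER PRIMARY KEY,
    RECORD_TYPE INTEGER NOT NULL,
    DBNAME TEXT NOT NULL,
    DATA_SIZE REAL,
    DATA_USED REAL,
    STAT_DATE TEXT)""")

# Each run of the generated script appends one row per database,
# so history accumulates instead of being overwritten.
runs = [(1, "dba", 100.0, 75.0, "2001-07-01"),
        (1, "dba", 110.0, 82.0, "2001-08-01")]
con.executemany(
    "INSERT INTO DBSTATS (RECORD_TYPE, DBNAME, DATA_SIZE, DATA_USED, STAT_DATE)"
    " VALUES (?, ?, ?, ?, ?)", runs)

# With history in place, growth per database is a simple aggregate.
rows = con.execute(
    "SELECT DBNAME, MAX(DATA_USED) - MIN(DATA_USED) AS growth_mb"
    " FROM DBSTATS GROUP BY DBNAME").fetchall()
print(rows)  # [('dba', 7.0)]
```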

In my shop, we have a DBA database that contains the DBSTATS table and the usp_get_dbstats stored procedure. If your shop has a database that your DBAs use to hold stored procedures and tables such as DBSTATS, you can change the default database at the beginning of usp_get_dbstats (the @DBSTATS_DB variable) to a database appropriate to your site. Note that if you do change the default database name, you need to change the @DBSTATS_DB declaration to match the size of your database name.

Brought to you by Dell and SQL Server Magazine


The SQL Server Agent Job


You could manually execute the usp_get_dbstats stored procedure to generate the T-SQL script that gathers the space-usage statistics, then copy the generated script into Query Analyzer to load the current statistics into the DBSTATS table. However, this manual approach would quickly become tedious and waste your valuable time. Instead of generating your statistics manually, I recommend that you build a SQL Server Agent job like the Get DBSTATS job that Figure 2 shows.

Figure 2: Sample GetDBSTATS job

The Get DBSTATS SQL Server Agent job has two steps. The first step, which Figure 3 shows, uses the osql command to execute the usp_get_dbstats stored procedure. Running the procedure through osql lets you capture its output in a file that the second step can execute. The -o option tells osql to write the output from usp_get_dbstats to a file called C:\temp\get_dbstats.sql; this file is the T-SQL script that the second step of the SQL Server Agent job will execute.

Figure 3: First step in the GetDBSTATS job


Chapter 4 Avoiding the Red Zone 31

The second step of Get DBSTATS, which Figure 4 shows, executes the statements that usp_get_dbstats generated, extracting and saving the disk-space usage information. Figure 4 shows the osql command that executes the script the first step produced. The input (-i) parameter feeds the T-SQL script that the first step built into the osql process.

Figure 4: Second step in the GetDBSTATS job

In my shop, I've scheduled the SQL Server Agent job to run once a week so that I can capture the database space-usage statistics and monitor the growth of our databases week by week. You need to determine how frequently you should gather space-usage statistics for your environment. Capturing disk-space usage lets me perform several kinds of analysis: I can track monthly and yearly disk usage, both for individual databases and overall, and see how much additional disk space was used when we migrated data related to a particular project.

Growth-Rate Calculation
If you don't have any disk-space usage information, predicting an average database growth rate is extremely difficult. After you've implemented a disk-usage collection method such as the one I've outlined, you have statistics available to help you calculate a database's average growth rate. Each month, I produce a simple Microsoft Excel chart that tracks our disk-space usage over time. Figure 5 shows the monthly disk-space usage for one of our production servers, SQLPROD1. This graph represents the amount of disk space that all our production databases on SQLPROD1 were using on different dates over a period of 7 months. Note the several spikes in the graph. Over time, I can associate the peaks and valleys with specific events that cause unusual growth in our databases, so I can better predict growth rates for upcoming database work. In Figure 5, you can see when we added DB_TEST: the used space on server SQLPROD1 grew almost 3GB.


Figure 5: Monthly disk-space usage for the SQLPROD1 server

Although this graph represents disk-space usage statistics starting only in July 2001, getting a picture of the average disk-space growth rate for a more recent or longer period on this server is easy. I can determine the monthly growth rate by using the following simple formula:

MONTHLY_GROWTH_RATE = (SPACE_USED_END - SPACE_USED_BEGIN) / NUMBER_OF_MONTHS

The amount of disk space occupied on July 1, 2001 (SPACE_USED_BEGIN), was 6.5GB. By February 4, 2002 (SPACE_USED_END), the used disk space had grown to 7.66GB. The number of months between the July and February data points is a little more than 7. According to this formula, the monthly growth rate for our SQLPROD1 box is a little more than 0.16GB per month. Now that I can calculate the monthly growth rate from statistics, I can predict the number of months before our database growth consumes the available free disk space, and I'll have time to acquire more disk space in advance. Calculating a monthly growth rate for our SQLPROD1 server would be impossible without collecting statistics over time.

This homegrown solution, using documented and undocumented DBCC statements, meets my organization's needs. Other organizations might find they need to collect more historical space-usage information, such as space usage by tables within a database. Whether you acquire canned software to track space usage or choose a homegrown solution, gathering database-growth information over time can give you valuable insight into the growth patterns of your databases. Without historical growth-rate information, you have no way to adequately understand a database's disk usage. Knowing the current growth rate of each database will help you more accurately plan for future disk acquisitions.
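The growth-rate calculation, and the runway estimate it enables, can be sketched as follows. The SQLPROD1 figures come from the text; the free-space value is a hypothetical input:

```python
def monthly_growth_rate(space_used_begin: float, space_used_end: float,
                        months: float) -> float:
    # MONTHLY_GROWTH_RATE = (SPACE_USED_END - SPACE_USED_BEGIN) / NUMBER_OF_MONTHS
    return (space_used_end - space_used_begin) / months

def months_until_full(free_space_gb: float, rate_gb_per_month: float) -> float:
    # How long before growth consumes the remaining free space.
    return free_space_gb / rate_gb_per_month

# SQLPROD1: 6.5GB used on July 1, 2001; 7.66GB on February 4, 2002 (about 7 months).
rate = monthly_growth_rate(6.5, 7.66, 7)
print(f"{rate:.3f} GB/month")  # 0.166 GB/month -- "a little more than 0.16GB"

# Hypothetical: with 10GB free, roughly how many months of headroom remain?
print(f"{months_until_full(10, rate):.1f} months")  # 60.3 months
```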

