20464C
Developing Microsoft® SQL Server®
Databases
MCT USE ONLY. STUDENT USE PROHIBITED
Information in this document, including URL and other Internet Web site references, is subject to change
without notice. Unless otherwise noted, the example companies, organizations, products, domain names,
e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with
any real company, organization, product, domain name, e-mail address, logo, person, place or event is
intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the
user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in
or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording, or otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Microsoft, the furnishing of this document does not give you any license to these
patents, trademarks, copyrights, or other intellectual property.
The names of manufacturers, products, or URLs are provided for informational purposes only and
Microsoft makes no representations or warranties, either expressed, implied, or statutory, regarding
these manufacturers or the use of the products with any Microsoft technologies. The inclusion of a
manufacturer or product does not imply endorsement by Microsoft of the manufacturer or product. Links
may be provided to third party sites. Such sites are not under the control of Microsoft and Microsoft is not
responsible for the contents of any linked site or any link contained in a linked site, or any changes or
updates to such sites. Microsoft is not responsible for webcasting or any other form of transmission
received from any linked site. Microsoft is providing these links to you only as a convenience, and the
inclusion of any link does not imply endorsement by Microsoft of the site or the products contained
therein.
© 2014 Microsoft Corporation. All rights reserved.
Released: 08/2014
MICROSOFT LICENSE TERMS
MICROSOFT INSTRUCTOR-LED COURSEWARE
These license terms are an agreement between Microsoft Corporation (or based on where you live, one of its
affiliates) and you. Please read them. They apply to your use of the content accompanying this agreement which
includes the media on which you received it, if any. These license terms also apply to Trainer Content and any
updates and supplements for the Licensed Content unless other terms accompany those items. If so, those terms
apply.
BY ACCESSING, DOWNLOADING OR USING THE LICENSED CONTENT, YOU ACCEPT THESE TERMS.
IF YOU DO NOT ACCEPT THEM, DO NOT ACCESS, DOWNLOAD OR USE THE LICENSED CONTENT.
If you comply with these license terms, you have the rights below for each license you acquire.
1. DEFINITIONS.
a. “Authorized Learning Center” means a Microsoft IT Academy Program Member, Microsoft Learning
Competency Member, or such other entity as Microsoft may designate from time to time.
b. “Authorized Training Session” means the instructor-led training class using Microsoft Instructor-Led
Courseware conducted by a Trainer at or through an Authorized Learning Center.
c. “Classroom Device” means one (1) dedicated, secure computer that an Authorized Learning Center owns
or controls that is located at an Authorized Learning Center’s training facilities that meets or exceeds the
hardware level specified for the particular Microsoft Instructor-Led Courseware.
d. “End User” means an individual who is (i) duly enrolled in and attending an Authorized Training Session
or Private Training Session, (ii) an employee of a MPN Member, or (iii) a Microsoft full-time employee.
e. “Licensed Content” means the content accompanying this agreement which may include the Microsoft
Instructor-Led Courseware or Trainer Content.
f. “Microsoft Certified Trainer” or “MCT” means an individual who is (i) engaged to teach a training session
to End Users on behalf of an Authorized Learning Center or MPN Member, and (ii) currently certified as a
Microsoft Certified Trainer under the Microsoft Certification Program.
g. “Microsoft Instructor-Led Courseware” means the Microsoft-branded instructor-led training course that
educates IT professionals and developers on Microsoft technologies. A Microsoft Instructor-Led
Courseware title may be branded as MOC, Microsoft Dynamics or Microsoft Business Group courseware.
h. “Microsoft IT Academy Program Member” means an active member of the Microsoft IT Academy
Program.
i. “Microsoft Learning Competency Member” means an active member of the Microsoft Partner Network
program in good standing that currently holds the Learning Competency status.
j. “MOC” means the “Official Microsoft Learning Product” instructor-led courseware known as Microsoft
Official Course that educates IT professionals and developers on Microsoft technologies.
k. “MPN Member” means an active Microsoft Partner Network program member in good standing.
l. “Personal Device” means one (1) personal computer, device, workstation or other digital electronic device
that you personally own or control that meets or exceeds the hardware level specified for the particular
Microsoft Instructor-Led Courseware.
m. “Private Training Session” means the instructor-led training classes provided by MPN Members for
corporate customers to teach a predefined learning objective using Microsoft Instructor-Led Courseware.
These classes are not advertised or promoted to the general public and class attendance is restricted to
individuals employed by or contracted by the corporate customer.
n. “Trainer” means (i) an academically accredited educator engaged by a Microsoft IT Academy Program
Member to teach an Authorized Training Session, and/or (ii) a MCT.
o. “Trainer Content” means the trainer version of the Microsoft Instructor-Led Courseware and additional
supplemental content designated solely for Trainers’ use to teach a training session using the Microsoft
Instructor-Led Courseware. Trainer Content may include Microsoft PowerPoint presentations, trainer
preparation guides, train-the-trainer materials, Microsoft OneNote packs, classroom setup guides, and
pre-release course feedback forms. To clarify, Trainer Content does not include any software, virtual hard
disks or virtual machines.
2. USE RIGHTS. The Licensed Content is licensed, not sold. The Licensed Content is licensed on a one copy
per user basis, such that you must acquire a license for each individual that accesses or uses the Licensed
Content.
2.1 Below are five separate sets of use rights. Only one set of rights applies to you.
2.2 Separation of Components. The Licensed Content is licensed as a single unit and you may not
separate its components and install them on different devices.
2.3 Redistribution of Licensed Content. Except as expressly provided in the use rights above, you may
not distribute any Licensed Content or any portion thereof (including any permitted modifications) to any
third parties without the express written permission of Microsoft.
2.4 Third Party Notices. The Licensed Content may include third party code that Microsoft, not the
third party, licenses to you under this agreement. Notices, if any, for the third party code are included
for your information only.
2.5 Additional Terms. Some Licensed Content may contain components with additional terms,
conditions, and licenses regarding its use. Any non-conflicting terms in those conditions and licenses also
apply to your use of that respective component and supplement the terms described in this agreement.
a. Pre-Release Licensed Content. This Licensed Content subject matter is based on the Pre-release version of
the Microsoft technology. The technology may not work the way a final version of the technology will
and we may change the technology for the final version. We also may not release a final version.
Licensed Content based on the final version of the technology may not contain the same information as
the Licensed Content based on the Pre-release version. Microsoft is under no obligation to provide you
with any further content, including any Licensed Content based on the final version of the technology.
b. Feedback. If you agree to give feedback about the Licensed Content to Microsoft, either directly or
through its third party designee, you give to Microsoft without charge, the right to use, share and
commercialize your feedback in any way and for any purpose. You also give to third parties, without
charge, any patent rights needed for their products, technologies and services to use or interface with
any specific parts of a Microsoft technology, Microsoft product, or service that includes the feedback.
You will not give feedback that is subject to a license that requires Microsoft to license its technology,
technologies, or products to third parties because we include your feedback in them. These rights
survive this agreement.
c. Pre-release Term. If you are a Microsoft IT Academy Program Member, Microsoft Learning
Competency Member, MPN Member or Trainer, you will cease using all copies of the Licensed Content on
the Pre-release technology upon (i) the date which Microsoft informs you is the end date for using the
Licensed Content on the Pre-release technology, or (ii) sixty (60) days after the commercial release of the
technology that is the subject of the Licensed Content, whichever is earlier (“Pre-release term”).
Upon expiration or termination of the Pre-release term, you will irretrievably delete and destroy all copies
of the Licensed Content in your possession or under your control.
4. SCOPE OF LICENSE. The Licensed Content is licensed, not sold. This agreement only gives you some
rights to use the Licensed Content. Microsoft reserves all other rights. Unless applicable law gives you more
rights despite this limitation, you may use the Licensed Content only as expressly permitted in this
agreement. In doing so, you must comply with any technical limitations in the Licensed Content that only
allow you to use it in certain ways. Except as expressly permitted in this agreement, you may not:
• access or allow any individual to access the Licensed Content if they have not acquired a valid license
for the Licensed Content,
• alter, remove or obscure any copyright or other protective notices (including watermarks), branding
or identifications contained in the Licensed Content,
• modify or create a derivative work of any Licensed Content,
• publicly display, or make the Licensed Content available for others to access or use,
• copy, print, install, sell, publish, transmit, lend, adapt, reuse, link to or post, make available or
distribute the Licensed Content to any third party,
• work around any technical limitations in the Licensed Content, or
• reverse engineer, decompile, remove or otherwise thwart any protections or disassemble the
Licensed Content except and only to the extent that applicable law expressly permits, despite this
limitation.
5. RESERVATION OF RIGHTS AND OWNERSHIP. Microsoft reserves all rights not expressly granted to
you in this agreement. The Licensed Content is protected by copyright and other intellectual property laws
and treaties. Microsoft or its suppliers own the title, copyright, and other intellectual property rights in the
Licensed Content.
6. EXPORT RESTRICTIONS. The Licensed Content is subject to United States export laws and regulations.
You must comply with all domestic and international export laws and regulations that apply to the Licensed
Content. These laws include restrictions on destinations, end users and end use. For additional information,
see www.microsoft.com/exporting.
7. SUPPORT SERVICES. Because the Licensed Content is “as is”, we may not provide support services for it.
8. TERMINATION. Without prejudice to any other rights, Microsoft may terminate this agreement if you fail
to comply with the terms and conditions of this agreement. Upon termination of this agreement for any
reason, you will immediately stop all use of and delete and destroy all copies of the Licensed Content in
your possession or under your control.
9. LINKS TO THIRD PARTY SITES. You may link to third party sites through the use of the Licensed
Content. The third party sites are not under the control of Microsoft, and Microsoft is not responsible for
the contents of any third party sites, any links contained in third party sites, or any changes or updates to
third party sites. Microsoft is not responsible for webcasting or any other form of transmission received
from any third party sites. Microsoft is providing these links to third party sites to you only as a
convenience, and the inclusion of any link does not imply an endorsement by Microsoft of the third party
site.
10. ENTIRE AGREEMENT. This agreement, and any additional terms for the Trainer Content, updates and
supplements are the entire agreement for the Licensed Content, updates and supplements.
12. LEGAL EFFECT. This agreement describes certain legal rights. You may have other rights under the laws
of your country. You may also have rights with respect to the party from whom you acquired the Licensed
Content. This agreement does not change your rights under the laws of your country if the laws of your
country do not permit it to do so.
13. DISCLAIMER OF WARRANTY. THE LICENSED CONTENT IS LICENSED "AS-IS" AND "AS
AVAILABLE." YOU BEAR THE RISK OF USING IT. MICROSOFT AND ITS RESPECTIVE
AFFILIATES GIVE NO EXPRESS WARRANTIES, GUARANTEES, OR CONDITIONS. YOU MAY
HAVE ADDITIONAL CONSUMER RIGHTS UNDER YOUR LOCAL LAWS WHICH THIS AGREEMENT
CANNOT CHANGE. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAWS, MICROSOFT AND
ITS RESPECTIVE AFFILIATES EXCLUDE ANY IMPLIED WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
14. LIMITATION ON AND EXCLUSION OF REMEDIES AND DAMAGES. YOU CAN RECOVER FROM
MICROSOFT, ITS RESPECTIVE AFFILIATES AND ITS SUPPLIERS ONLY DIRECT DAMAGES UP
TO US$5.00. YOU CANNOT RECOVER ANY OTHER DAMAGES, INCLUDING CONSEQUENTIAL,
LOST PROFITS, SPECIAL, INDIRECT OR INCIDENTAL DAMAGES.
This limitation also applies even if Microsoft knew or should have known about the possibility of the
damages. The above limitation or exclusion may not apply to you because your country may not allow
the exclusion or limitation of incidental, consequential or other damages.
Please note: As this Licensed Content is distributed in Quebec, Canada, some of the clauses in this
agreement are provided below in French.
Remarque : Le contenu sous licence étant distribué au Québec, Canada, certaines des clauses
dans ce contrat sont fournies ci-dessous en français.
EXONÉRATION DE GARANTIE. Le contenu sous licence visé par une licence est offert « tel quel ». Toute
utilisation de ce contenu sous licence est à vos seuls risques et périls. Microsoft n’accorde aucune autre garantie
expresse. Vous pouvez bénéficier de droits additionnels en vertu du droit local sur la protection des
consommateurs, que ce contrat ne peut modifier. Là où elles sont permises par le droit local, les garanties
implicites de qualité marchande, d’adéquation à un usage particulier et d’absence de contrefaçon sont exclues.
EFFET JURIDIQUE. Le présent contrat décrit certains droits juridiques. Vous pourriez avoir d’autres droits
prévus par les lois de votre pays. Le présent contrat ne modifie pas les droits que vous confèrent les lois de votre
pays si celles-ci ne le permettent pas.
Acknowledgments
Microsoft Learning would like to acknowledge and thank the following for their contribution towards
developing this title. Their effort at various stages in the development has ensured that you have a good
classroom experience.
Contents
Module 1: An Introduction to Database Development
Module Overview 1-1
Lesson 2: Storing XML Data and XML Schemas in SQL Server 13-9
Lesson 1: Considerations for Working with Data Files in SQL Server 2014 15-2
Lesson 2: Implementing FILESTREAM and FileTables 15-9
Module 13 Lab: Storing and Querying XML Data in SQL Server L13-1
Course Description
This five-day instructor-led course introduces SQL Server 2014 and describes logical table design, indexing,
and query plans. It also focuses on the creation of database objects, including views, stored procedures,
and functions, along with parameters. Other common aspects of procedure coding, such as transactions,
error handling, triggers, and SQL CLR, are also covered in this course. This course helps people prepare for
exam 70-461: Writing Queries Using Microsoft® SQL Server® 2014 Transact-SQL.
Audience
The primary audience for this course is IT professionals who want to become skilled in SQL Server 2014
product features and technologies for implementing a database.
Student Prerequisites
This course requires that you meet the following prerequisites:
In addition to their professional experience, students who attend this training should already have the
following technical knowledge:
Course Objectives
After completing this course, students will be able to:
Describe the concepts of database development.
Course Outline
This section provides an outline of the course:
Course Materials
The following materials are included with your kit:
Course Handbook: A succinct classroom learning guide that provides all the critical technical
information in a crisp, tightly focused format, which is just right for an effective in-class learning
experience.
Lessons: Guide you through the learning objectives and provide the key points that are critical to
the success of the in-class learning experience.
Labs: Provide a real-world, hands-on platform for you to apply the knowledge and skills learned
in the module.
Module Reviews and Takeaways: Provide improved on-the-job reference material to boost
knowledge and skills retention.
Lab Answer Keys: Provide step-by-step lab solution guidance at your fingertips when it’s
needed.
Modules: Include companion content, such as questions and answers, detailed demo steps and
additional reading links, for each lesson. Additionally, they include Lab Review questions and answers
and Module Reviews and Takeaways sections, which contain the review questions and answers, best
practices, common issues and troubleshooting tips with answers, and real-world issues and scenarios
with answers.
Resources: Include well-categorized additional resources that give you immediate access to the most
up-to-date premium content on TechNet, MSDN®, and Microsoft Press®.
Course evaluation: At the end of the course, you will have the opportunity to complete an online
evaluation to provide feedback on the course, training facility, and instructor.
The following table shows the role of each virtual machine used in this course:
Software Configuration
The following software is installed on each VM:
Course Files
There are files associated with the labs in this course. The lab files are located in the folder
D:\Labfiles\LabXX on the 20464C-MIA-SQL virtual machine.
Classroom Setup
Each classroom computer will have the same virtual machine configured in the same way.
Module 1
An Introduction to Database Development
Contents:
Module Overview 1-1
Module Overview
Before beginning to work with SQL Server in either a development or an administration role, it is
important to understand the overall SQL Server platform. In particular, it is useful to understand that SQL
Server is not just a database engine, but a complete platform for managing enterprise data.
Along with a strong platform, SQL Server provides a series of tools that make the product easy to manage
and a good target for application development.
Individual components of SQL Server can operate within separate security contexts. Correctly configuring
SQL Server services is important where enterprises operate with a policy of least privilege.
Objectives
After completing this module, you will be able to:
Lesson 1
Introduction to the SQL Server Platform
Microsoft® SQL Server® data management software is a platform for developing business applications
that are data focused. Rather than being a single, monolithic application, SQL Server is structured as a
series of components. It is important to understand the use of each component.
You can install more than one copy of SQL Server on a server. Each copy is called an instance and you can
separately configure and manage each one.
There are various editions of SQL Server, and each edition has a different set of capabilities. It is important
to understand the target business cases for each SQL Server edition and how SQL Server has evolved
through a series of improving versions over many years. It is a stable and robust platform.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the overall SQL Server platform.
Explain the role of each of the components that make up the SQL Server platform.
Enterprise Ready
High Availability
Impressive performance is necessary, but not at the cost of availability. Organizations need constant
access to their data. Many enterprises are now finding it necessary to provide access to their data 24 hours
a day, seven days a week. The SQL Server platform was designed with the highest levels of availability in
mind. As each version of the product has been released, more capabilities have been added to minimize
any potential downtime.
Security
Uppermost in the minds of enterprise managers is the need to secure organizational data. It is not
possible to retrofit security after an application or product has been created. From the very beginning,
SQL Server has been built with the highest levels of security as a goal.
Scalability
Organizations need data management capabilities for systems of all sizes. SQL Server scales from the
smallest needs to the largest via a series of editions that have increasing capabilities.
Cost of Ownership
Many competing database management systems are expensive both to purchase and to maintain. SQL
Server offers very low total cost of ownership. SQL Server tooling (both management and development)
builds on existing Windows® knowledge. Most users tend to become familiar with the tools quite quickly.
The productivity that users achieve when they use the tools is enhanced by the high degree of integration
between the tools. For example, many of the SQL Server tools have links to launch and preconfigure other
SQL Server tools.
Component Purpose
Component Purpose
Multiple Instances
Applications that need an organization to support them may require server configurations that are
inconsistent or incompatible with the server requirements of other applications. Each instance of SQL
Server is separately configurable.
Application databases might need to be supported with different levels of service, particularly in
relation to availability. You can use SQL Server instances to separate workloads with differing service
level agreements (SLAs) that need to be met.
Applications might require different server-level collations. Although each database can have
different collations, an application might be dependent on the collation of the tempdb database
when the application is using temporary objects.
You can often install different versions of SQL Server side by side by using multiple instances. This can
assist when testing upgrade scenarios or performing upgrades.
Prior to SQL Server 2000, it was only possible to install a single copy of SQL Server on a server system. SQL
Server was addressed by the name of the server. To maintain backward compatibility, this mode of
connection is still supported and is known as a ‘‘default’’ instance.
Additional instances of SQL Server require an instance name in addition to the server name and are
known as ‘‘named’’ instances. You do not need to install a default instance before installing named
instances. Not all components of SQL Server can be installed in more than one instance. A
substantial change in SQL Server 2012 enabled multiple-instance support for SQL Server Integration
Services.
There is no need to install SQL Server tools more than once. A single installation of the tools can manage
and configure all instances.
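Because each instance is addressed either by the server name alone (the default instance) or by ServerName\InstanceName (a named instance), a quick way to confirm which instance a connection is using, and how it is configured, is to query the server properties. A minimal Transact-SQL sketch (the instance names in the comments are illustrative):

```sql
-- Report which instance the current connection is using.
SELECT @@SERVERNAME AS ServerAndInstance;               -- e.g. MIA-SQL, or MIA-SQL\SQL2 for a named instance
SELECT SERVERPROPERTY('InstanceName') AS InstanceName;  -- NULL when connected to the default instance
SELECT SERVERPROPERTY('Collation') AS ServerCollation;  -- server-level collation, which can differ per instance
```

Running the collation query against each instance is one way to verify the server-level (and therefore tempdb) collation requirements discussed above.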
Early Versions
Later Versions
Version 7.0 saw a significant rewrite of the product. Substantial advances were made in reducing the
administration workload for the product. OLAP Services (which later became Analysis Services) was
introduced.
SQL Server 2000 featured support for multiple instances and collations. It also introduced support for data
mining. SQL Server Reporting Services was introduced after the product release as an add-on
enhancement to the product, along with support for 64-bit processors.
SQL Server 2005 provided another significant rewrite of many aspects of the product:
It introduced support for nonrelational data that was stored and queried as XML.
SQL Server Management Studio was released to replace several previous administrative tools.
SQL Server Integration Services replaced a tool formerly known as Data Transformation Services (DTS).
Another key addition to the product was the introduction of support for objects that had been
created by using the common language runtime (CLR).
The Transact-SQL language was substantially enhanced, including structured exception handling.
Dynamic Management Views and Functions were introduced to enable detailed health monitoring,
performance tuning, and troubleshooting.
Substantial high-availability improvements were included in the product. Database mirroring was
introduced.
The SQL Server “AlwaysOn” technologies were introduced to reduce potential downtime.
Full-text indexing was integrated directly within the database engine. (Previously, full-text indexing
was based on interfaces to services at the operating system level.)
A policy-based management framework was introduced to assist with a move to more declarative-
based management practices, rather than reactive practices.
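Two of the Transact-SQL additions described above, structured exception handling and Dynamic Management Views, can be illustrated with a short sketch:

```sql
-- Structured exception handling (introduced in SQL Server 2005).
BEGIN TRY
    SELECT 1 / 0;   -- raises a divide-by-zero error
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER()  AS ErrorNumber,
           ERROR_MESSAGE() AS ErrorMessage;
END CATCH;

-- A Dynamic Management View: requests currently executing on the instance.
SELECT session_id, status, command
FROM sys.dm_exec_requests;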
The enhancements and additions to the product in SQL Server 2008 R2 included:
Support for managing reference data with the introduction of Master Data Services.
The introduction of StreamInsight, which enabled users to query data that was arriving at high speed,
before storing the data in a database.
The enhancements and additions to the product in SQL Server 2012 included:
The migration of Business Intelligence projects into Microsoft Visual Studio® 2010.
Data-tier applications, which assisted with packaging database applications as part of application
development projects.
Strong enhancements to the Transact-SQL language, such as the addition of sequences, new error-
handling capabilities, and new window functions.
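The SQL Server 2012 language additions mentioned above can be sketched as follows; the sequence name dbo.OrderNumbers is hypothetical, chosen only for illustration:

```sql
-- A sequence object (new in SQL Server 2012).
CREATE SEQUENCE dbo.OrderNumbers START WITH 1 INCREMENT BY 1;
SELECT NEXT VALUE FOR dbo.OrderNumbers AS NextOrderNumber;

-- A window function using the enhanced OVER clause support.
SELECT name, database_id,
       ROW_NUMBER() OVER (ORDER BY database_id) AS RowNum
FROM sys.databases;

-- THROW, one of the new error-handling capabilities
-- (commented out so the script runs without raising an error):
-- THROW 50001, 'Example error message.', 1;
```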
The enhancements and additions to the product in SQL Server 2014 include:
Substantial performance gains from the introduction of in-memory tables and native stored
procedures.
Enhanced security.
Improved scalability.
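The SQL Server 2014 in-memory features mentioned above combine memory-optimized tables with natively compiled stored procedures. The following is a sketch only, with hypothetical object names; it assumes a database that already has a MEMORY_OPTIMIZED_DATA filegroup:

```sql
-- A memory-optimized, durable table (assumes a MEMORY_OPTIMIZED_DATA filegroup exists).
CREATE TABLE dbo.SessionState
(
    SessionId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    Payload   NVARCHAR(200) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- A natively compiled stored procedure that reads the table.
CREATE PROCEDURE dbo.GetSession @SessionId INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    SELECT SessionId, Payload
    FROM dbo.SessionState
    WHERE SessionId = @SessionId;
END;
```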
Lesson 2
Working with SQL Server Tools
Working effectively with SQL Server requires familiarity with the tools that are used in conjunction with it.
Before any tool can connect to SQL Server, it needs to make a network connection to the server. In this
lesson, you will see how these connections are made, and then look at the tools that are most commonly
used when you are working with SQL Server.
Lesson Objectives
After completing this lesson, you will be able to:
TDS is a high-level protocol that is transported by lower-level protocols. It is most commonly transported
by the TCP/IP protocol or the Named Pipes protocol, or implemented over a shared memory connection.
Authentication
For most applications and organizations, data must be held securely and access to the data is based on
the identity of the user who is attempting to access the data. The process of verifying the identity of a
user (or more formally, of any principal) is known as authentication. SQL Server supports two forms of
authentication:
1. It can store the login details for users directly within its own system databases. These logins are
known as SQL Server logins.
2. It can be configured to trust a Windows authenticator (such as Active Directory®). In that case, a
Windows user can be granted access to the server, either directly or via his or her Windows group
memberships.
When a connection is made, the user is connected to a specific database, which is known as his or her
“default” database.
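Both forms of authentication, and the default database assigned at connection time, can be seen in the Transact-SQL used to create logins. The login name AppLogin below is hypothetical; the Windows account and password match those used elsewhere in this course:

```sql
-- A SQL Server login: credentials stored in SQL Server's own system databases.
CREATE LOGIN AppLogin
    WITH PASSWORD = 'Pa$$w0rd',
         DEFAULT_DATABASE = AdventureWorks;

-- A Windows login: SQL Server trusts the Windows authenticator for this user.
CREATE LOGIN [ADVENTUREWORKS\Student]
    FROM WINDOWS
    WITH DEFAULT_DATABASE = AdventureWorks;
```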
Client Libraries
OLEDB is a library that does not translate commands. OLEDB originally stood for Object Linking and
Embedding for Databases, but that meaning is no longer very relevant. When an application sends an SQL
command, OLEDB passes it to the database server without modification.
The SQL Server Native Client (SNAC) is a software layer that encapsulates commands that
libraries such as OLEDB, ODBC, and JDBC have issued into commands that SQL Server can understand. It
then encapsulates results that SQL Server returns ready for consumption by these libraries. This primarily
involves wrapping the commands and results in the TDS protocol.
Network Libraries
SQL Server exposes endpoints that client applications can connect to. The endpoint is used to pass
commands and data to and from the database engine.
SNAC connects to these endpoints via network libraries such as TCP/IP, or Named Pipes. For client
applications that are executing on the same computer as the SQL Server service, a special “shared
memory” network connection is also available.
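The endpoints that an instance exposes, and the protocols they use, can be inspected through a catalog view. For example:

```sql
-- List the endpoints exposed by this instance and their transport protocols
-- (TSQL endpoints for TCP, Named Pipes, and shared memory appear here).
SELECT name, protocol_desc, state_desc
FROM sys.endpoints;
```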
SQL Server receives commands via endpoints and sends results to clients via endpoints. Clients interact
with the Relational engine, which in turn utilizes the Storage engine to manage the storage of databases.
The SQL Server Operating System (SQLOS) is a software layer that provides a layer of abstraction between
the Relational engine and the available server resources.
All SQL Server relational database management tasks can be performed by using the Transact-SQL
language, but many users prefer graphical administration tools because they are typically easier to use
than the Transact-SQL commands. SQL Server Management Studio provides graphical interfaces for
configuring databases and servers.
SQL Server Management Studio can connect to a variety of SQL Server services including the Database
Engine, Analysis Services, Integration Services, Reporting Services, and SQL Server Compact edition.
Register servers.
Demonstration Steps
Use SSMS to connect to an on-premises instance of SQL Server 2014
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
4. In the Connect to Server window, ensure that Server type is set to Database Engine.
6. In the Authentication drop-down list, select Windows Authentication, and then click Connect.
Run a T-SQL script
1. If required, on the View menu, click Object Explorer.
2. In Object Explorer, expand Databases, expand AdventureWorks, and then expand Tables. Review
the database objects.
3. Right-click the AdventureWorks database, and then click New Query.
5. Note the use of IntelliSense while you are typing this query, and then on the toolbar, click Execute.
Note how the results can be returned.
6. On the File menu, click Save SQLQuery1.sql. Note that this saves the query to a file. In the Save File
As window, click Cancel.
7. On the Results tab, right-click the cell for ProductID 1 (first row and first cell), and then click Save
Results As. In the FileName text box, type Demonstration2AResults and then click Save. Note that
this saves the query results to a file.
8. On the Query menu, click Display Estimated Execution Plan. Note that SQL Server Management
Studio can do more than simply execute queries.
10. In the Options pane, expand Query Results, expand SQL Server, and then click General. Review the
available configuration options and then click Cancel.
11. On the File menu, click Close. In the Microsoft SQL Server Management Studio window, click No.
3. On the View menu, click Solution Explorer. Note the contents of Solution Explorer.
2. On the File menu, click New, and then click Database Engine Query to open a new connection.
3. In the Connect to Database Engine window, in the Server name box, type (local).
4. In the Authentication drop-down list, select Windows Authentication, and then click Connect.
5. In the Available Databases drop-down list on the toolbar, click tempdb. Note that this will change
the database against which the query is executed.
6. Right-click in the query window, click Connection, and then click Change Connection. This will
reconnect the query to another instance of SQL Server.
2. In the Registered Servers window, expand Database Engine, right-click Local Server Groups, and
then click New Server Group.
3. In the New Server Group Properties window, in the Group name box, type Dev Servers and then
click OK.
5. In the New Server Registration window, click the Server name drop-down list, select (local) and
then click Save.
8. In the Registered Servers window, right-click the Dev Servers group, and then click New Query.
9. Type the query as shown in the snippet below, and then click Execute.
SELECT @@version;
11. In the Microsoft SQL Server Management Studio window, click No.
Lesson 3
Configuring SQL Server Services
Users can configure each SQL Server service individually. The ability to provide individual configuration for
services assists organizations that aim to minimize the permissions assigned to service accounts as part of
a policy of least-privilege execution. SQL Server Configuration Manager is used to configure services,
including the accounts under which the services operate, and the network libraries that the SQL Server
services use.
SQL Server also ships with various tools. It is important to know what each of these tools is used for.
Lesson Objectives
After completing this lesson, you will be able to:
Managing client protocols. When client applications (such as SQL Server Management Studio) are
installed on a server, it is necessary to configure how connections from those tools are made to SQL
Server. Users can use SQL Server Configuration Manager to configure the protocols required and to
create aliases for the servers to simplify connectivity.
Each service has a start mode. This mode can be set to Automatic, Manual, or Disabled. Services that are
set to the Automatic start mode are automatically started when the operating system starts. Services that
are set to the Manual start mode can be manually started. Services that are set to the Disabled start mode
cannot be started.
Instances
Many SQL Server components are instance-aware and can be installed more than once on a single server.
When SQL Server Configuration Manager lists each service, it shows the associated instance of SQL Server
in parentheses after the name of the service.
Many protocols provide multiple levels of configuration. For example, the configuration for the TCP/IP
protocol makes it possible to have different settings on each configured IP address if required, or a
general set of configurations that is applied to all IP addresses.
Client Configurations
Every computer that has SNAC installed needs to be able to configure how that library will access SQL
Server services.
SNAC is installed on the server in addition to being installed on client systems. When SQL Server
Management Studio is installed on the server, it uses the SNAC library to make connections to the SQL
Server services that are on the same system. Users can use the client configuration nodes within SQL
Server Configuration Manager to configure how those connections are made. Note that two sets of client
configurations are provided and that they only apply to the computer where they are configured. One set
is used for 32-bit applications; the other set is used for 64-bit applications. SQL Server Management
Studio is a 32-bit application, even when SQL Server is installed as a 64-bit application.
Aliases
Hard-coding connection details for a specific server,
protocol, and port within an application is not
desirable because these might need to change over
time.
Each client system that utilizes SNAC (including the server itself) can have one or more aliases configured.
Aliases for 32-bit applications are configured independently of the aliases for 64-bit applications.
Tool                                         Purpose
Master Data Services Configuration Manager   Configure and manage SQL Server Master
                                             Data Services
SQL Server Error and Usage Reporting         Configure the level of automated reporting
                                             back to the SQL Server product team about
                                             errors that occur and on usage of different
                                             aspects of the product
SQL Server Management Objects (SMO)          Provide a detailed .NET-based library for
                                             working with management aspects of SQL
                                             Server directly from application code
Demonstration Steps
Start a SQL Server Profiler trace
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
4. In the Connect to Server window, ensure that Server type is set to Database Engine.
5. In the Server name text box, type (local).
6. In the Authentication drop-down list, select Windows Authentication, and then click Connect.
8. In the Connect to Server window, ensure that Server type is set to Database Engine.
10. In the Authentication drop-down list, select Windows Authentication, and then click Connect.
11. In the Trace Properties window, in Trace name, type Demonstration.
12. Click Run. Note that this will start a new trace with the default options.
View a SQL Server Profiler trace
1. Switch to SQL Server Management Studio, and then click New Query.
2. In the query window, type the query as shown below, and then click Execute.
USE AdventureWorks;
GO
SELECT * FROM Person.Person
ORDER BY FirstName;
GO
3. Switch to SQL Server Profiler. Note the statement trace occurring in SQL Server Profiler.
6. Close SQL Server Management Studio and SQL Server Profiler without saving any changes.
Objectives
After completing this lab, you will have:
Password: Pa$$w0rd
3. In File Explorer, navigate to the D:\Labfiles\Lab01\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.
4. In the User Account Control dialog box, click Yes, and then wait for the script to finish.
1. Check That the Database Engine and Reporting Services Have Been Installed
2. Ensure That All Required Services Including SQL Server Agent Are Started and Set To Autostart for Both
Instances
3. Configure the TCP Port for the SQL3 Database Engine Instance to 51550
Task 1: Check That the Database Engine and Reporting Services Have Been Installed
1. Open SQL Server Configuration Manager.
2. Check the installed list of services for the MSSQLSERVER instance and ensure that the database
engine and Reporting Services have been installed for the default instance.
Task 2: Ensure That All Required Services Including SQL Server Agent Are Started and
Set To Autostart for Both Instances
1. Ensure that all of the services for the default instance are set to autostart. (Ignore the SQL Full-text
Filter Daemon Launcher service at this time.)
Task 3: Configure the TCP Port for the SQL3 Database Engine Instance to 51550
1. Using the property page for the TCP/IP server protocol, configure the use of the fixed port 51550.
(Make sure that you clear the dynamic port.)
3. Ensure that the SQL3 database engine instance has been restarted successfully.
Question: How can you configure SQL Server to use a different IP port?
Question: What is the difference between a version of SQL Server and an edition of SQL
Server?
Module 2
Designing and Implementing Tables
Contents:
Module Overview
Module Overview
In relational database management systems (RDBMSs), user and system data is stored in tables. Each table
consists of a set of rows that describe entities and a set of columns that hold the attributes of an entity.
For example, a Customer table would have columns such as CustomerName and CreditLimit and a row
for each customer. In Microsoft® SQL Server® data management software, tables are contained within
schemas that are very similar in concept to folders that contain files in the operating system. Designing
tables is often one of the most important roles that a database developer undertakes because incorrect
table design leads to the inability to query the data efficiently. After an appropriate design has been
created, it is then important to know how to correctly implement the design.
Objectives
After completing this module, you will be able to:
Design tables.
Lesson 1
Using Data Types
The most basic types of data that get stored in database systems are numbers, dates, and strings. There is
a range of data types that can be used for each of these. In this lesson, you will see the Microsoft-supplied
data types that you can use for numeric and date-related data. You will also see what NULL means and
how to work with it. In the next lesson, you will see how to work with string data types.
Lesson Objectives
After completing this lesson, you will be able to:
Constraining Values
Data types are a form of constraint that is placed on
the values that can be stored in a location. For
example, if you choose a numeric data type, you
will not be able to store text in the location.
In addition to constraining the types of values that can be stored, data types also constrain the range of
values that can be stored. For example, if you choose a smallint data type, you can only store values
between –32,768 and +32,767.
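This range constraint can be demonstrated directly; the following sketch (variable name illustrative) shows a value that fits and one that does not:

```sql
DECLARE @SmallValue smallint;

SET @SmallValue = 32767;   -- succeeds: within the smallint range
SET @SmallValue = 32768;   -- fails with an arithmetic overflow error
```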
Query Optimization
When SQL Server identifies that the value in a column is an integer, it may be able to generate an entirely
different and more efficient query plan than one where it identifies that the location is holding text values.
The data type also determines which sorts of operations are permitted on that data and how those
operations work.
Self-Documenting Nature
Choosing an appropriate data type provides a level of self-documentation. If all values were stored in
string data types (which could potentially represent any type of value) or XML data types, you would
probably need to store separate documentation about what sort of values can be stored in those locations.
Data Types
There are three basic sets of data types:
System data types. SQL Server provides a large number of built-in (or intrinsic) data types. Examples of
these include integer, varchar, and date.
Alias data types. Users can also define data types that provide alternate names for the system data types
and potentially further constrain them. These are known as alias data types. For example, you could use
an alias data type to define the name PhoneNumber as being equivalent to nvarchar(16). Alias data
types can help to provide consistency of data type usage across applications and databases.
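A minimal sketch of the PhoneNumber example above (the schema, type, and variable names are illustrative):

```sql
-- Define an alias data type based on nvarchar(16)
CREATE TYPE dbo.PhoneNumber FROM nvarchar(16) NOT NULL;

-- The alias type can then be used wherever a data type is expected
CREATE TABLE dbo.Contacts
(
    ContactID int NOT NULL,
    HomePhone dbo.PhoneNumber
);
```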
User-defined data types. By using managed code via SQL Server integration with the common language
runtime (CLR), you can create entirely new data types. There are two categories of these CLR types. One
category is system CLR data types, such as the geometry and geography spatial data types. The other is
user-defined CLR data types, which enable users to create their own data types.
Question: Why would it be faster to compare two integer variables that are holding the
values 3,240 and 19,704 than two varchar(10) variables that are holding the values "3240"
and "19704"?
smallint is stored in 2 bytes (that is, 16 bits) and stores values from –32,768 to 32,767.
int is stored in 4 bytes (that is, 32 bits) and stores values from –2,147,483,648 to 2,147,483,647. It is a
very commonly used data type. SQL Server uses the full word “integer” as a synonym for “int.”
bigint is stored in 8 bytes (that is, 64 bits) and stores very large integer values. Although it is easy to
refer to a 64-bit value, it is hard to comprehend how large these values are. If you placed a value of
zero in a 64-bit integer location and executed a loop to simply add one to the value, on most
common servers currently available, you would not reach the maximum value for many months.
decimal is an ANSI-compatible data type that enables you to specify the number of digits of
precision and the number of decimal places (referred to as the scale). A decimal(12,5) location can
store up to 12 digits with up to five digits after the decimal point. decimal is the data type that you
should use for monetary or currency values in most systems and any exact fractional values such as
sales quantities (where part quantities can be sold) or weights.
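For example, a decimal(12,5) quantity and a decimal price multiply exactly, with no rounding drift (values are illustrative):

```sql
DECLARE @Quantity decimal(12,5) = 3.75000;  -- up to 12 digits, 5 after the decimal point
DECLARE @Price    decimal(18,2) = 19.99;    -- a common choice for currency values

SELECT @Quantity * @Price AS ExtendedPrice; -- 74.9625, exact
```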
money and smallmoney are data types that are specific to SQL Server and have been present since
the early days of the platform. They store currency values with a fixed precision of four decimal
places.
Note: Four is often the wrong number of decimal places for many monetary applications,
and the money and smallmoney data types are not standard data types. In general, use decimal
for monetary values.
Note that there is no literal string format for bit values in SQL Server. The string values TRUE and FALSE
can be converted to bit values, as can the integer values 1 and 0. TRUE is converted to 1 and FALSE is
converted to 0.
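These conversions can be seen directly:

```sql
SELECT CAST('TRUE' AS bit)  AS TrueValue,   -- returns 1
       CAST('FALSE' AS bit) AS FalseValue,  -- returns 0
       CAST(1 AS bit)       AS OneValue;    -- returns 1
```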
Higher-level programming languages differ about how they store true values in Boolean columns. Some
languages store true values as 1; others store true values as –1. In two's complement notation (which is
the encoding used to store smallint, int, and bigint), a 1-bit value would range from –1 to 0.
To avoid any chance of mismatch, in general, when working with bits in applications, test for false values
by using the following code.

IF (@InputValue = 0)   -- test for false
IF (@InputValue <> 0)  -- test for true (any nonzero value)

This is preferable to testing for a value being equal to 1 because it provides more reliable code across
languages.
bit, along with other data types, is also nullable, which can be a surprise to new users. That means that a
bit location can be in three states: NULL, 0, or 1. (Nullability is discussed in more detail later in this
module.)
Question: What would be a suitable data type for storing the value of a check box that can
be 0 for cleared, 1 for selected, or –1 for disabled?
The float data type is an approximate numeric type in SQL Server and occupies either 4 or 8 bytes,
enabling the storage of approximate values with a defined precision. The permitted values of n in float(n)
are from 1 to 53, and the default is 53. Even though a range of values is provided for in the syntax, the
current SQL Server implementation is that if n is from 1 to 24, it is implemented as 24; for any larger
value, 53 is used.
Common Errors
A very common error for new developers is to use approximate numeric data types to store values that
need to be stored exactly. This causes rounding and processing errors. A “code smell” for identifying
programs that new developers have written is a column of numbers that do not exactly add up to the
displayed totals. It is common for small rounding errors to creep into calculations, for example, a total that
is incorrect by 1 cent in dollar-based or euro-based currencies.
The inappropriate use of numeric data types can cause processing errors. Look at the following code and
decide how many times the PRINT statement would be executed.
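A sketch of such a loop, consistent with the output shown below (the variable name is illustrative):

```sql
DECLARE @Value float = 0;

WHILE (@Value <> 1.0)          -- compares an approximate value against an exact one
BEGIN
    SET @Value = @Value + 0.1; -- 0.1 cannot be stored exactly in a float
    PRINT @Value;
END;
```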
It might surprise you to learn that this query would never stop running and would need to be cancelled.
After cancelling the query, if you looked at the output, you would see the following code.
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
…
What has happened? The problem is that the value 0.1 cannot be stored exactly in a float or real data
type, so the termination value of the loop is never hit exactly. If a decimal value had been used instead,
the loop would have executed as expected.
Consider how you would write the answer to 1÷3 in decimal form. The answer isn't 0.3, it is 0.3333333
recurring. There is no way in decimal form to write 1÷3 as an exact decimal fraction. You have to
eventually settle for an approximate value.
The same problem occurs in binary fractions; it just occurs at different values. 0.1 ends up being stored as
the equivalent of 0.099999 recurring. 0.1 in decimal form is a nonterminating fraction in binary. Therefore,
when you put the system in a loop adding 0.1 each time, the value never exactly equals 1.0, which can be
stored precisely.
The time data type is aligned to the SQL standard form of hh:mm:ss with optional decimal places up to
hh:mm:ss.nnnnnnn. Note that when you are defining the data type, you need to specify the number of
decimal places, such as time(4), if you do not want to use the default value of seven decimal places, or if
you want to save some storage space. The format that SQL Server uses is similar to the ISO 8601 definition
for TIME.
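A short sketch of the precision specification (variable names are illustrative):

```sql
DECLARE @DefaultTime time    = '23:59:59.1234567';  -- time(7) by default
DECLARE @ShorterTime time(4) = '23:59:59.1234567';  -- rounded to 4 decimal places

SELECT @DefaultTime AS DefaultPrecision,
       @ShorterTime AS FourDecimalPlaces;
```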
The ISO 8601 standard makes it possible to use 24:00:00 to represent midnight and to have a leap second
over 59. These are not supported in the SQL Server implementation.
The datetime2 data type is a combination of a date data type and a time data type.
Another problem with the datetime data type is that the way it converts strings to dates is based on
language format settings. A value in the form “YYYYMMDD” will always be converted to the correct date,
but a value in the form “YYYY-MM-DD” might end up being interpreted as “YYYY-DD-MM,” depending
on the settings for the session.
It is important to understand that this behavior does not happen with the new date data type, so a string
that was in the form “YYYY-MM-DD” could be interpreted as two different dates by the date (and
datetime2) data type and the datetime data type. You should specifically check any of the formats that
you intend to use, or always use formats that cannot be misinterpreted. Another option that was
introduced in SQL Server 2012 can help. A series of functions that enable date and time values to be
created from component parts was introduced. For example, there is now a DATEFROMPARTS function
that enables you to create a date value from a year, a month, and a day.
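Both approaches can be sketched briefly (the dates are illustrative):

```sql
-- Unambiguous: 'YYYYMMDD' is always interpreted correctly for datetime
SELECT CAST('20140214' AS datetime) AS FromUnambiguousString;

-- Build a date from component parts (SQL Server 2012 and later)
SELECT DATEFROMPARTS(2014, 2, 14) AS FromParts;
```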
Time Zones
The datetimeoffset data type is a combination of a datetime2 data type and a time zone offset. Note
that the data type is not aware of the time zone; it can simply store and retrieve time zone values.
Note that the time zone offset values extend for more than a full day (a range of –14:00 to +14:00). A
range of system functions has been provided for working with time zone values, and for all of the data
types related to dates and times.
Question: Why is the specification of a date range from the year 0000 to the year 9999
based on the Gregorian calendar not entirely meaningful?
Unique Identifiers
Globally unique identifiers (GUIDs) have become
common in application development. They are used
to provide a mechanism where any process can
generate a number and know that it will not clash
with a number that any other process has
generated.
GUIDs
Numbering systems have traditionally depended on
a central source for the next value in a sequence to
make sure that no two processes use the same
value. GUIDs were introduced to avoid the need for
anyone to function as the “number allocator.” Any
process (on any system) can generate a value and know that it will not clash with a value generated by
any process across time and space and on any system to an extremely high degree of probability.
This is achieved by using extremely large values. When discussing the bigint data type earlier, you learned
that the 64-bit bigint values were really large. GUIDs are 128-bit values. The magnitude of a 128-bit value
is well beyond our capabilities of comprehension.
The IDENTITY property is used to automatically assign values to columns. (IDENTITY is discussed in
Module 3.) The IDENTITY property is not used with uniqueidentifier columns. New values are not
calculated by code in your process. They are calculated by calling system functions that generate a value
for you. In SQL Server, this function is the NEWID() function.
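For example (the table and column names are illustrative):

```sql
-- Generate a GUID on demand
SELECT NEWID() AS GeneratedGuid;

-- Use NEWID() as a column default so new rows receive a GUID automatically
CREATE TABLE dbo.AuditEntry
(
    AuditEntryID     uniqueidentifier NOT NULL DEFAULT NEWID(),
    EventDescription nvarchar(200)    NOT NULL
);
```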
The random nature of GUIDs has also caused significant problems in current storage subsystems. SQL
Server 2005 introduced the NEWSEQUENTIALID() function to try to circumvent the randomness of the
values that the NEWID() function generated. However, the function does so at the expense of some
guarantee of uniqueness.
The usefulness of the NEWSEQUENTIALID() function is also quite limited because the main reason for
using GUIDs is to enable other layers of code to generate the values and know that they can just insert
them into a database without clashes. If you need to request a value from the database via
NEWSEQUENTIALID(), it usually would have been better to use an IDENTITY column instead.
A very common development error is to store GUIDs in string values rather than in uniqueidentifier
columns.
Question: The slide mentions that a common error is to store GUIDs as strings. What would
be wrong with this?
NULL
NULL is a state of a column in a particular row,
rather than a type of value that is stored in a
column. You do not say that a value equals NULL;
you say that a value is NULL. This is why, in
Transact-SQL, you do not check whether a value is
NULL with the equality operator. For example, you
would not write the following code.
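The comparison to avoid, and its correct form, look like this (the table and column names are illustrative):

```sql
-- Incorrect: a predicate comparing to NULL with = never evaluates to TRUE
SELECT * FROM Sales.Orders WHERE ShippedDate = NULL;

-- Correct: test the state of the column with IS NULL
SELECT * FROM Sales.Orders WHERE ShippedDate IS NULL;
```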
Common Errors
New developers often confuse NULL values with zero, blank (or space), zero-length strings, and so on. The
misunderstanding is exacerbated by other database engines that treat NULL and zero-length strings or
zeroes as identical. NULL indicates the absence of a value.
Careful consideration must be given to the nullability of a column. In addition to specifying a data type
for a column, you specify whether a value needs to be present. (Often, this is referred to as whether a
column is mandatory.)
Look at the NULL and NOT NULL declarations on the slide and decide why each decision might have been
made.
Demonstration Steps
Work with NULL and insert GUIDs into a table
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Run D:\Demofiles\Mod02\Setup.cmd as an administrator to revert any changes.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
6. If Solution Explorer is not visible, click the View menu and click Solution Explorer.
7. Expand the Queries folder and then double-click 11 - Demonstration 1A.sql.
8. Follow the instructions contained within the comments of the script file.
Lesson 2
Working with Character Data
In the last lesson, you saw that the most basic types of data stored in database systems today are
numbers, dates, and strings, and you looked at the range of data types that can be used for numeric and
date-related data. In this lesson, you will look at the other very common category of data: the
string-related data types.
Another common class of design and implementation errors relates to collations. Collations define how
string data is sorted. In this lesson, you will also see how collations are defined and used.
Lesson Objectives
After completing this lesson, you will be able to:
Unicode
Traditionally, most computer systems stored one
character per byte. This only allowed for 256
different character values, which is not enough to
store characters from many languages.
When a user types a phonetic value in an Input Method Editor (IME), a numbered list of candidate
characters is displayed; the user can then enter the number beside the character to select the intended
word. This might not seem important to an English-speaking person, but given that the first option means
“horse”, the second option is like a question mark, and the third option means “mother”, there is
definitely a need to select the correct option!
Character Groups
An alternate way to enter the characters is via radical groupings. Please note the third character in the
screenshot above. The left-hand part of that character, 女, means “woman”. Rather than entering English-
like characters (that could be quite unfamiliar to the writers), select a group of characters based on what is
known as a radical.
Please note that the character representing “mother” is the first character on the second line. For this sort
of keyboard entry to work, the characters must be in appropriate groups, not just stored as one large sea
of characters. An additional complexity is that the radicals themselves are also in groups. You can see in
the screenshot that the woman radical was part of the third group of radicals.
Unicode
In the 1980s, a variety of researchers worked to determine how many bytes would be required to hold all
characters from all languages while also storing them in their correct groupings. The answer from all
researchers was three bytes. Three was not an ideal number for computing, however, and at the time most
systems worked with 2-byte (that is, 16-bit) words.
Unicode introduced a two-byte character set that attempts to fit the values from the three bytes into two
bytes. Inevitably then, trade-offs had to occur.
Unicode allows any combination of characters that are drawn from any combination of languages to exist
in a single document. There are multiple encodings for Unicode, including UTF-7, UTF-8, UTF-16, and
UTF-32. (UTF stands for Unicode Transformation Format.) SQL Server currently implements double-byte
UTF-16 characters for its Unicode implementation.
For string literal values, an N prefix on a string allows the entry of double-byte characters into the string
rather than just single-byte characters. (N stands for “National” in “National Character Set”).
When working with character strings, the LEN function returns the number of characters (Unicode or not)
whereas DATALENGTH returns the number of bytes.
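For example:

```sql
SELECT LEN(N'Hello')        AS CharacterCount,  -- 5 characters
       DATALENGTH(N'Hello') AS ByteCount;       -- 10 bytes (2 bytes per Unicode character)
```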
Trailing Spaces
DECLARE @String1 char(10);
DECLARE @String2 char(10);
SET @String1 = 'Hello';
SET @String2 = 'There';
SELECT @String1 + @String2;  -- returns 'Hello     There     '

Note the trailing spaces: char values are always padded to their declared length. The char and nchar
data types are not very useful for data that varies in length, but they are ideal for short strings that are
always the same length, for example, state codes in the U.S.A.
The varchar and nvarchar data types are limited to 8000 and 4000 characters, respectively. This is
roughly what fits in a data page in a SQL Server database.
char is restricted to a particular code page, so it is likely that applications will not be able to store input
values that do not fit in that code page. This could be as simple as an accent in a user's name. These
problems also occur when exporting data. For example, you might send data to a vendor to produce a
report, a code page mismatch occurs, and the output appears as square boxes or question marks. The
nchar and nvarchar types support the main Unicode plane and avoid these encoding conversion
problems. This is particularly important for web applications, where browsers may be set to any number
of code pages.
Question: Why would you use the sysname data type rather than the nvarchar(128) data
type?
Understanding Collations
Collations in SQL Server are used to control the
code page that is used to store non-Unicode data
and the rules that govern how SQL Server sorts and
compares character values.
Code Pages
It was mentioned earlier that computer systems
traditionally stored one byte per character. This
allowed for 256 possible values, with a range from 0
to 255. The values from 0 to 31 were reserved for
“control characters” such as backspace (character 8)
and tab (character 9). Character 32 was allocated
for a space and so on, up to the Delete character
which was assigned the value 127.
For values above 127 though, standards were initially not very clear. It was common to store characters
such as line drawing characters or European characters with accents or umlauts in these codes.
In fact, a number of computer systems only used 7 bits to store characters instead of 8 bits. (As an
example, the DEC10 system from Digital Equipment Corporation stored 5 characters of 7 bits each per 36-
bit computer “word”. It used the final bit as a parity check bit).
Problems did arise when different vendors used the upper characters for different purposes. In the 1970s,
it was not uncommon to type a character on your screen and see a different character when that
document was printed, as the screen and the printer were using different characters in the values above
127.
A number of standard character sets that described what should be in the upper code values did appear.
The MS-DOS operating system categorized these as “code pages”. What a code page really defines is
which characters are used for the values from 128 to 255.
Both the operating systems and SQL Server support a range of code pages. A default code page is chosen
while installing SQL Server.
A collation name consists of several parts:

SortRules         A string identifying the alphabet or language whose rules are applied
                  when dictionary sorting is specified.
CodePage          One to four digits that define the code page used by the collation. For
                  curious historic reasons, CP1 specifies code page 1252, but for all
                  others the number indicates the code page; for example, CP850 specifies
                  code page 850.
ComparisonStyle   Either BIN for binary, or a combination of case and accent sensitivity:
                  CI is case-insensitive, CS is case-sensitive; AI is accent-insensitive,
                  AS is accent-sensitive.

Windows collation names use a similar format with fewer fields. For example, the Windows collation
Latin1_General_CI_AS refers to Latin1_General as the alphabet being used, and is case-insensitive and
accent-sensitive.
Collation Issues
The main issues with collations occur when you try to compare values that are stored with different
collations. It is possible to set default collations for servers, databases, and even columns.
When comparing values from different collations, you need to then specify which collation (which could
be yet another collation) will be used for the comparison.
Another use of this is as shown in the example in the slide. In this case, you are forcing the query to
perform a case-sensitive comparison between the string '%ball%' and the value in the column. If the
column contained 'Ball', it would not then match.
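Such a comparison can be forced with the COLLATE clause. The following query is a sketch of the technique; the table and column names are assumptions, not taken from the course files:

```sql
-- Force a case-sensitive comparison, overriding the column's default
-- collation. A value of 'Ball' would not match '%ball%' here.
SELECT ProductID, ProductName
FROM dbo.Product
WHERE ProductName COLLATE Latin1_General_CS_AS LIKE '%ball%';
```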
Question: What are the code page and sensitivity values for the collation
SQL_Scandinavian_Cp850_CI_AS?
SC Collations
SQL Server 2012 (code-named “Denali”) introduced support for collations with supplementary characters. Current Microsoft Windows® operating systems already support these SC collations. Supplementary characters are stored in four bytes per character. The two consecutive 16-bit words that are used to store these characters are known as surrogate pairs.
Unicode characters are defined in 17 planes. Planes are ranges of allowed values. The planes of particular interest are denoted in the standard as follows:
0x0000 to 0xFFFF is the Basic Multilingual Plane (BMP)
The supplementary multilingual plane mostly includes further Asian language elements and the other
planes include less common (but still useful) characters such as musical notes. SQL Server collations that
have an SC suffix (such as Japanese_Bushu_Kakusu_100_CI_AS_SC) permit the use of supplementary
characters.
Lesson 3
Designing Tables
The most important aspect of designing tables involves determining what data each column will hold. All
organizational data is held within database tables, so it is critical to store the data with an appropriate
structure.
The best practices for table and column design are often represented by a set of rules that are known as
“normalization” rules. In this lesson, you will learn the most important aspects of normalized table design
along with the appropriate use of primary and foreign keys. In addition, you will learn to work with the
system tables that are supplied when SQL Server is installed.
Lesson Objectives
After completing this lesson, you will be able to:
What Is a Table?
Relational databases store data about entities in
tables that are defined by columns and rows. Rows
represent entities and columns define the attributes
of the entities. The rows of a table have no
predefined order and can be used as a security
boundary.
Tables
Tables store data about entities such as customers, suppliers, orders, products, and sales. Each row of a
table represents the details of a single entity, such as a single customer, supplier, order, product, or sale.
Columns define the information that is being held about each entity. For example, a Product table might
have columns such as ProductID, Size, Name, and UnitWeight. Each of these columns is defined by
using a specific data type. For example, the UnitWeight column of a product might be allocated a
decimal(18,3) data type.
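A table such as the Product table described above might be sketched as follows; the exact column definitions are illustrative, not part of any shipped schema:

```sql
CREATE TABLE dbo.Product
(
    ProductID  int           NOT NULL,
    Name       nvarchar(50)  NOT NULL,
    Size       nvarchar(5)   NULL,
    UnitWeight decimal(18,3) NULL   -- weight held with three decimal places
);
```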
Naming Conventions
Strong disagreement exists in the industry over naming conventions for tables. The use of prefixes (such
as tblCustomer or tblProduct) is widely discouraged. Prefixes were widely used in higher-level
programming languages before the advent of strong typing (that is, the use of strict data types rather
than generic data types), but are now rare. The main reason for this is that names should represent the
entities, not how they are stored. For example, during a maintenance operation, it might become
necessary to replace a table with a view or vice versa. This could lead to views named tblProduct or
tblCustomer when trying to avoid breaking existing code.
Another area of strong disagreement relates to whether table names should be singular or plural. For
example, should a table that holds the details of a customer be called Customer or Customers?
Proponents of plural naming argue that the table holds the details of many customers, whereas
proponents of singular naming argue that it is common to expose these tables via object models in
higher-level languages and that the use of plural names complicates this process. Although we might
have a Customers table, in a high-level language, we are likely to have a Customer object. SQL Server
system tables (and views) have plural names.
The argument is not likely to be resolved either way and is not a problem that is specific to the SQL
language. For example, an array of customers in a higher-level language could sensibly be called
“Customers,” yet referring to a single customer via “Customers[49]” seems awkward. The most important
aspect of naming conventions is that you should adopt a naming convention that you can work with and
apply it consistently.
Security
It is possible to use tables as security boundaries because users can be assigned permissions at the table
level. However, note that SQL Server supports the assignment of permissions at the column level in
addition to at the table level. Row-level security is not available for tables, but can be implemented via a
combination of views, stored procedures, and/or triggers.
Row Order
Tables are containers for rows, but they do not define any order for the rows that they contain. When
users select rows from a table, they should only specify the order that the rows should be returned in if
the output order matters. SQL Server may have to expend additional sorting effort to return rows in a
given order and it is important that this effort is only expended when necessary.
Normalizing Data
Normalization is a systematic process that is used to
improve the design of databases.
Normalization
Codd introduced first normal form in 1970, followed by second normal form, and then third normal form
in 1971. Since that time, higher forms of normalization have been introduced by theorists, but most
database designs today are considered to be “normalized” if they are in third normal form.
Intentional Denormalization
Not all databases should be normalized. It is common to intentionally denormalize databases for
performance reasons or for ease of end-user analysis.
For example, dimensional models that are widely used in data warehouses (such as the data warehouses
that are commonly used with SQL Server Analysis Services) are intentionally designed not to be
normalized.
Tables might also be denormalized to avoid the need for time-consuming calculations or to minimize
physical database design constraints such as locking.
Although there is disagreement on the interpretation of these rules, general agreement exists on most
common symptoms of violating the rules.
First Normal Form
Eliminate repeating groups in individual tables. Create a separate table for each set of related data.
Identify each set of related data by using a primary key.
No repeating groups should exist. For example, a Product table should not include columns such as
Supplier1, Supplier2, and Supplier3. Column values should not include repeating groups. For example, a
column should not contain a comma-separated list of suppliers.
Duplicate rows should not exist in tables. You can use unique keys to avoid having duplicate rows. A
candidate key is a column or set of columns that you can use to uniquely identify a row in a table. An
alternate interpretation of first normal form rules would disallow the use of nullable columns.
Second Normal Form
Create separate tables for sets of values that apply to multiple records. Relate these tables by using a foreign key.
A common error with second normal form would be to hold the details of products that a supplier
provides in the same table as the details of the supplier's credit history. These values should be stored
separately.
Third Normal Form
Eliminate fields that do not depend on the key.
Imagine a Sales table that had OrderNumber, ProductID, ProductName, SalesAmount, and SalesDate
columns. This table would not be in third normal form. A candidate key for the table might be the
OrderNumber column. The ProductName column only depends on the ProductID column, and not on
the candidate key. The Sales table should be separated from a Product table and likely linked to it by
ProductID.
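The decomposition described above might be sketched as follows; all names and data types are assumptions:

```sql
-- ProductName depends only on ProductID, so it moves to its own table.
CREATE TABLE dbo.Product
(
    ProductID   int          NOT NULL PRIMARY KEY,
    ProductName nvarchar(50) NOT NULL
);

CREATE TABLE dbo.Sales
(
    OrderNumber int           NOT NULL PRIMARY KEY,
    ProductID   int           NOT NULL
        REFERENCES dbo.Product (ProductID),
    SalesAmount decimal(18,2) NOT NULL,
    SalesDate   date          NOT NULL
);
```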
Formal database terminology is precise, but can be hard to follow when it is first encountered. In the next
demonstration, you will see examples of common normalization errors.
Primary Keys
A primary key is a form of constraint that is applied
to a table. A candidate key is used to identify a
column or set of columns that can be used to
uniquely identify a row. A primary key is chosen
from any potential candidate keys.
Primary Key
A primary key must be unique and cannot be NULL.
Primary keys are a form of constraint. (Constraints
are discussed later in this course.)
Consider a table that holds an EmployeeID column
and a NationalIDNumber column, along with the
employee's name and personal details. The EmployeeID and NationalIDNumber columns are both likely
to be possible candidate keys. In this case, the EmployeeID column might be the primary key, but either
candidate key could be used. You will see later that some data types will lead to better performing
systems when they are used as primary keys, but logically any candidate key could be nominated to be
the primary key.
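A sketch of this scenario follows; the column definitions are assumptions. Note that the candidate key that is not chosen as the primary key can still be enforced with a UNIQUE constraint:

```sql
CREATE TABLE dbo.Employee
(
    EmployeeID       int           NOT NULL PRIMARY KEY,  -- chosen primary key
    NationalIDNumber nvarchar(15)  NOT NULL UNIQUE,       -- remaining candidate key
    FullName         nvarchar(100) NOT NULL
);
```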
It may be necessary to combine multiple columns into a key before the key can be used to uniquely
identify a row. In formal database terminology, no candidate key is more important than any other
candidate key. However, when tables are correctly normalized, they will usually have only a single
candidate key that could be used as a primary key. However, this is not always the case. Ideally, keys that
are used as primary keys should not change over time.
Natural vs. Surrogate Keys
A surrogate key is another form of key that is used as a unique identifier within a table, but it is not
derived from “real” data. Natural keys are formed from data within the table.
For example, a Customer table may have a CustomerID or CustomerCode column that contains numeric,
GUID, or alphanumeric codes. The surrogate key would not be related to the other attributes of a
customer.
The use of surrogate keys is another topic that can lead to strong debate between database professionals.
Foreign Keys
A foreign key is used to establish references or
relationships between tables.
Self-Referencing Tables
A table can hold a foreign key reference to itself. For example, an Employees table might contain a
ManagerID column. An employee's manager is also an employee. A foreign key reference can be made
from the ManagerID column of the Employees table to the EmployeeID column in the same table.
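A self-referencing foreign key of this kind might be declared as follows (a minimal sketch; the column definitions are assumptions):

```sql
CREATE TABLE dbo.Employees
(
    EmployeeID int           NOT NULL PRIMARY KEY,
    FullName   nvarchar(100) NOT NULL,
    ManagerID  int           NULL
        REFERENCES dbo.Employees (EmployeeID)  -- a manager is also an employee
);
```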
Reference Checking
It is not possible to update or delete referenced keys unless options that cascade the changes to related
tables are used. For example, you cannot change the ID for a customer when there are orders in a
CustomerOrders table that reference that customer's ID.
Tables might also include multiple foreign key references. For example, an Orders table might have
foreign keys that refer to a Customers table and a Products table.
Terminology
Foreign keys are referred to as being used to “enforce referential integrity.” Foreign keys are a form of
constraint and will be covered in more detail in a later module.
The ANSI SQL 2003 definition refers to self-referencing tables as having “recursive foreign keys.”
This approach allowed SQL Server to have improved designs for these tables while avoiding the chance of breaking existing applications. For example, when it was necessary to expand the syslogins table, a new sysxlogins table was added instead of changing the existing table.
In SQL Server 2005, these tables were hidden and replaced by a set of system views that show the
contents of the system tables. These views are permission-based and display data to a user only if the user
has appropriate permission to view the data.
msdb is the database that SQL Server Agent uses, primarily for organizing scheduled background tasks
that are known as “jobs.” A large number of system tables are still present in the msdb database. Again,
while it is acceptable to query these tables, they should not be directly modified. Unless the table is
documented, no dependency on its format should be taken when designing applications.
Lesson 4
Working with Schemas
SQL Server 2005 introduced a change to how schemas are used. Since that version, schemas are used as
containers for objects such as tables, views, and stored procedures. Schemas can be particularly helpful in
providing a level of organization and structure when large numbers of objects are present in a database.
It is also possible to assign security permissions at the schema level rather than individually on the objects
that are contained within the schemas. Doing this can greatly simplify the design of system security
requirements.
Lesson Objectives
After completing this lesson, you will be able to:
What Is a Schema?
Schemas are used to contain objects and to provide
a security boundary for the assignment of
permissions. In SQL Server, schemas are used as
containers for objects, rather like a folder is used to
hold files at the operating system level. Since their
introduction in SQL Server 2005, schemas can be
used to contain objects such as tables, stored
procedures, functions, types, and views. Schemas
form a part of the multipart naming convention for
objects. In SQL Server, an object is formally referred
to by a name of the form
Server.Database.Schema.Object.
Security Boundary
Schemas can be used to simplify the assignment of permissions. An example of applying permissions at
the schema level would be to assign the EXECUTE permission on a schema to a user. The user could then
execute all stored procedures within the schema. This simplifies the granting of permissions because there
is no need to set up individual permissions on each stored procedure.
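For example, a schema-level EXECUTE grant covers every stored procedure in the schema, including procedures created later. The schema and user names here are hypothetical:

```sql
-- One grant instead of one grant per stored procedure.
GRANT EXECUTE ON SCHEMA::Sales TO SalesAppUser;
```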
It is important to understand that schemas are not used to define physical storage locations for data, as
occurs in some other database engines.
If you are upgrading applications from SQL Server 2000 and earlier versions, it is important to understand
that the naming convention changed when schemas were introduced. Previously, names were of the form
Server.Database.Owner.Object.
Objects still have owners, but the owner's name does not form a part of the multipart naming convention
from SQL Server 2005 onward. When upgrading databases from earlier versions, SQL Server will
automatically create a schema that has the same name as existing object owners, so that applications that
use multipart names will continue to work.
When locating an object, SQL Server will first check the user's default schema. If the object is not found,
SQL Server will then check the dbo schema to try to locate the object.
It is important to include schema names when referring to objects, instead of depending upon schema name resolution.
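For example (using an illustrative table name):

```sql
-- Relies on name resolution: the user's default schema is checked first,
-- then the dbo schema.
SELECT * FROM Owner;

-- Preferred: schema-qualified, independent of default schema settings.
SELECT * FROM PetStore.Owner;
```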
Apart from rare situations, using multipart names leads to more reliable code that does not depend upon
default schema settings.
Creating Schemas
Schemas are created by using the CREATE
SCHEMA command. This command can also
include the definition of objects to be created
within the schema at the time the schema is
created.
CREATE SCHEMA
Schemas have both names and owners. In the first
example shown on the slide, a schema named
Reporting is being created. It is owned by the user,
Terry. Although both schemas and the objects
contained in the schemas have owners and the
owners do not have to be the same, having
different owners for schemas and the objects contained within them can lead to complex security issues.
Besides creating schemas, the CREATE SCHEMA statement can include options for object creation.
Although the second example on the slide might appear to be three statements (CREATE SCHEMA,
CREATE TABLE, and GRANT), it is in fact a single statement. Both CREATE TABLE and GRANT are
options that are being applied to the CREATE SCHEMA statement.
Within the newly created KnowledgeBase schema, the Article table is being created and the SELECT permission on the schema is being granted to Salespeople.
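The slide examples described above are not reproduced in this extract; they would read roughly as follows. The column definitions of the Article table are assumptions:

```sql
-- First example: a schema named Reporting, owned by the user Terry.
CREATE SCHEMA Reporting AUTHORIZATION Terry;
GO

-- Second example: a single statement that creates the schema, creates a
-- table within it, and grants a permission.
CREATE SCHEMA KnowledgeBase
    CREATE TABLE Article
    (
        ArticleID int           NOT NULL PRIMARY KEY,
        Title     nvarchar(100) NOT NULL
    )
    GRANT SELECT ON SCHEMA::KnowledgeBase TO Salespeople;
GO
```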
Statements such as the second CREATE SCHEMA example on the slide can lead to issues if the entire
statement is not executed together.
Create a schema, create a schema with an included object, and drop a schema.
Demonstration Steps
Create a schema, create a schema with an included object, and drop a schema
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Ensure that you have completed the previous demonstrations in this module.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod02\Demo02.ssmssln, and then click Open.
7. Follow the instructions contained within the comments of the script file.
Lesson 5
Creating and Altering Tables
Now that you understand the core concepts surrounding the design of tables, this lesson introduces you
to the Transact-SQL syntax that is used when defining, modifying, or dropping tables. Temporary tables
are a special form of table that can be used to hold temporary result sets. Computed columns are used to
create columns where the value held in the column is automatically calculated, either from expressions
involving other columns from the table or from the execution of functions.
Lesson Objectives
After completing this lesson, you will be able to:
Create tables.
Drop tables.
Alter tables.
Use temporary tables.
Creating Tables
Tables are created by using the CREATE TABLE
statement. This statement is also used to define the
columns that are associated with the table and
identify constraints such as primary and foreign keys.
CREATE TABLE
Nullability
You should specify NULL or NOT NULL for each column in the table. SQL Server has defaults for this that
you can change via the ANSI_NULL_DEFAULT setting. Scripts should always be designed to be as reliable
as possible and specifying nullability in data definition language (DDL) scripts helps to improve script
reliability.
Primary Key
You can specify a primary key constraint beside the name of a column if only a single column is included
in the key. It must be included after the list of columns when more than one column is included in the
key. See the following example, where the SalesID value is only unique for each SalesRegisterID value:
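A sketch of such a table, with names and data types assumed, might be:

```sql
CREATE TABLE dbo.SalesReceipt
(
    SalesRegisterID int           NOT NULL,
    SalesID         int           NOT NULL,  -- unique only per register
    SalesTotal      decimal(18,2) NOT NULL,
    PRIMARY KEY (SalesRegisterID, SalesID)   -- composite key follows the columns
);
```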
Primary keys are constraints and are more fully described along with other constraints in a later module.
Dropping Tables
The DROP TABLE statement is used to drop tables
from a database. If a table is referenced by a
foreign key constraint, it cannot be dropped.
Altering Tables
Altering a table is useful because permissions on
the table are retained along with the data in the
table. If you drop and re-create the table with a
new definition, both the permissions on the table
and the data in the table are lost. If the table is
referenced by a foreign key, it cannot be dropped.
However, it can be altered.
Note that the syntax for adding and dropping columns is inconsistent. The word COLUMN is required for DROP, but is not used for ADD; in fact, it is not even permitted as an optional keyword for ADD. If the word COLUMN is omitted in a DROP, SQL Server assumes that a constraint is being dropped.
In the slide example, the PreferredName column is being added to the PetStore.Owner table. Later, the
PreferredName column is being dropped from the PetStore.Owner table. Note the difference in syntax
regarding the word COLUMN.
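In Transact-SQL, the example described would read roughly as follows; the PreferredName data type is an assumption:

```sql
-- ADD does not use the word COLUMN.
ALTER TABLE PetStore.Owner
    ADD PreferredName nvarchar(30) NULL;

-- DROP requires the word COLUMN; without it, SQL Server assumes a
-- constraint is being dropped.
ALTER TABLE PetStore.Owner
    DROP COLUMN PreferredName;
```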
Demonstration Steps
Create tables and alter tables
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Ensure that you have completed the previous demonstrations in this module.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
7. Follow the instructions contained within the comments of the script file.
Temporary Tables
Temporary tables are used to hold temporary result
sets within a user's session. They are created within
the tempdb database and deleted automatically
when they go out of scope. This typically occurs
when the code in which they were created
completes or aborts. Temporary tables are very
similar to other tables, except that they are only
visible to the creator and in the same scope (and
sub-scopes) within the session. They are
automatically deleted when a session ends or when
they go out of scope. Although temporary tables
are deleted when they go out of scope, you should
explicitly delete them when they are no longer required, to reduce overall resource requirements on the
server. Temporary tables are often created in code by using the SELECT INTO statement.
A table is created as a temporary table if its name has a number sign (#) prefix. A global temporary table
is created if the name has a double-number-sign (##) prefix. Global temporary tables are visible to all
users and are not commonly used.
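A minimal sketch of a local temporary table follows; all names are illustrative:

```sql
-- The # prefix makes this a local temporary table in tempdb, visible
-- only within the creating session and scope.
CREATE TABLE #TopCustomers
(
    CustomerID int           NOT NULL,
    TotalSales decimal(18,2) NOT NULL
);

-- ... work with the table ...

-- Explicit cleanup, rather than waiting for the table to go out of scope.
DROP TABLE #TopCustomers;
```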
Passing Temporary Tables
Temporary tables are also often used to pass rowsets between stored procedures. For example, a
temporary table that is created in a stored procedure is visible to other stored procedures that are
executed from within the first procedure. Although this use is possible, it is not considered good practice
in general. It breaks common rules of abstraction for coding and also makes it more difficult to debug or
troubleshoot the nested procedures. SQL Server 2008 introduced table-valued parameters (TVPs) that can
provide an alternate mechanism for passing tables to stored procedures or functions. (TVPs are discussed
later in this course.)
The overuse of temporary tables is a common Transact-SQL coding error that often leads to performance
and resource issues. Extensive use of temporary tables can be an indicator of poor coding techniques,
often due to a lack of set-based logic design.
2. Ensure that you have completed the previous demonstrations in this module.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
7. Follow the instructions contained within the comments of the script file.
Computed Columns
Computed columns are columns that are derived
from other columns or from the result of executing
functions.
A nonpersisted computed column is calculated every time a SELECT operation occurs on the column and
it does not consume space on disk. A persisted computed column is calculated when the data in the row
is inserted or updated and does consume space on the disk. The data in the column is then selected like
the data in any other column.
The core difference between persisted and nonpersisted computed columns relates to when the
computational performance impact is exerted. Nonpersisted computed columns work best for data that is
modified regularly, but selected rarely. Persisted computed columns work best for data that is modified
rarely, but selected regularly. In most business systems, data is read much more regularly than it is
updated. For this reason, most computed columns would perform best as persisted computed columns.
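The difference can be sketched as follows; the table and column names are assumptions:

```sql
CREATE TABLE dbo.OrderLine
(
    OrderID   int           NOT NULL,
    UnitPrice decimal(18,2) NOT NULL,
    Quantity  int           NOT NULL,
    -- Nonpersisted: recalculated on every read, no disk space consumed.
    LineTotal AS (UnitPrice * Quantity),
    -- Persisted: calculated on insert/update and stored on disk.
    LineTotalStored AS (UnitPrice * Quantity) PERSISTED
);
```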
Demonstration Steps
Work with computed columns
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Ensure that you have completed the previous demonstrations in this module.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
The new tables need to be isolated in their own schema. You need to create the required schema called
DirectMarketing. The owner of the schema should be dbo.
When the schema has been created, if you have enough time, you need to create the tables that have
been designed.
Objectives
After completing this lab, you will be able to:
Create a schema.
Create tables.
Password: Pa$$w0rd
2. Run the Setup Windows Command Script file (Setup.cmd) in the D:\Labfiles\Lab02\Starter folder as
Administrator.
2. Review the supporting documentation for details of the PhoneCampaign, Opportunity, and
SpecialOrder tables and determine column names, data types, and nullability for each data item in
the design.
Created a schema.
Created tables.
Created a schema.
Created the tables that you designed in the first exercise of this lab.
Review Question(s)
Question: What is a primary key?
Module 3
Ensuring Data Integrity through Constraints
Contents:
Module Overview 3-1
Module Overview
The quality of data in your database largely determines the usefulness and effectiveness of applications
(and people) that rely on it, and it can play a major role in the success or failure of an organization or a
business venture. Ensuring data integrity is a critical step in maintaining high-quality data.
You should enforce data integrity at all levels of an application from first entry or collection through
storage. Microsoft® SQL Server® data management software provides various features that simplify the
enforcement of data integrity.
Objectives
After completing this module, you will be able to:
Explain the available options for enforcing data integrity and the levels at which they should be
applied.
Lesson 1
Enforcing Data Integrity
An important step in database planning is deciding the best way to enforce the integrity of the data. Data
integrity refers to the consistency and accuracy of data that is stored in a database.
Lesson Objectives
After completing this lesson, you will be able to:
Explain how data integrity checks need to apply across different layers of an application.
Application Levels
User-interface level
Data tier
User-Interface Level
There are several advantages of enforcing integrity at the user-interface level. The responsiveness to the
end user is usually higher because it is possible to trap minor errors before any calls are made to other
layers of code. Error messages are often clearer because the code is more aware of the action that the
user has taken that caused the error to occur.
The main disadvantage of enforcing integrity at the user-interface level is that more than a single
application might need to work with the same underlying data and each application might enforce the
rules differently.
Middle Tier
Many integrity issues are directly related to business logic requirements. The middle tier is often where
the bulk of those requirements exist in code. In addition, multiple user interfaces often reuse the middle
tier. Implementing integrity at this level helps to avoid different user interfaces applying different rules
and checks. At this level, the logic is still quite aware of the functions that cause errors, so the error
messages that are returned to the user can still be quite specific.
It is also easy for integrity checks that are only applied in the middle tier to be ineffective due to race
conditions. For example, it might seem easy to check that a customer exists and then enable an order to
be placed for the customer. Consider, though, the possibility that another user could remove the
customer between the time that you check for the customer's existence and the time that you record the
order.
Data Tier
The advantage of implementing integrity at the data tier is that upper layers cannot bypass it. In
particular, it is common for the same data to be accessed by multiple applications or even directly
through tools such as SQL Server Management Studio. If integrity is not maintained at the data-tier level,
all applications need to consistently apply all of the rules and checks.
The challenge of implementing some forms of integrity at the data tier (usually within the database) is
that the data tier is often unaware of the user actions that caused an error to occur, so the error messages
that are returned from this layer tend to be very precise in describing the issue, but quite cryptic for an
end user to understand. They typically need to be retranslated by upper layers of code before being
presented to end users.
Multiple Tiers
The correct solution in most situations involves applying rules and checks at multiple levels. However, the
challenge with this approach is in maintaining consistency between the rules and checks at different
application levels.
Entity (or table) integrity requires that all rows in a table have a way of being uniquely identified. This is
commonly called a primary key value. Whether the primary key value can be changed or whether the
whole row can be deleted depends on the level of integrity that is required between the primary key and
any other tables, based on referential integrity.
Referential integrity ensures that the relationships among the primary keys (in the referenced table) and
foreign keys (in the referencing tables) are always maintained. You are not permitted to insert a value in
the referencing column that doesn’t exist in the referenced column in the target table. A row in a
referenced table cannot be deleted nor can the primary key be changed if a foreign key refers to the row
unless a form of cascading action is permitted. You can define referential integrity relationships within the
same table or between separate tables.
As an example of referential integrity, you may need to ensure that an order cannot be placed for a
nonexistent customer.
Data Types
Nullability
The nullability of a column determines whether a value must be present in the column. This is often
referred to as whether a column is mandatory or not.
Default Values
If a column is not nullable, a value must be placed in it whenever a new row is inserted. A default value
enables users to insert a specific value into a column when no value is supplied in the statement that
inserted the row.
Constraints
Constraints are used to limit the permitted values in a column further than the limits that the data type
provides. For example, a tinyint column can have values from 0 to 255. You might decide to further
constrain the column so that only values between 1 and 9 are permitted in the column.
You can also apply constraints at the table level and enforce relationships between the columns of a table.
For example, you might have a column that holds an order number, but it is not mandatory. You might
then add a constraint that specifies that the column must have a value if the Salesperson column also has
a value.
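The two kinds of constraint described above might be sketched like this; all names are assumptions:

```sql
CREATE TABLE dbo.Sale
(
    SaleID      int          NOT NULL PRIMARY KEY,
    -- Column-level CHECK: narrows tinyint's 0-to-255 range to 1-to-9.
    Rating      tinyint      NOT NULL CHECK (Rating BETWEEN 1 AND 9),
    OrderNumber int          NULL,
    Salesperson nvarchar(50) NULL,
    -- Table-level CHECK: OrderNumber becomes mandatory whenever a
    -- Salesperson value is present.
    CHECK (Salesperson IS NULL OR OrderNumber IS NOT NULL)
);
```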
Triggers
Triggers are procedures (somewhat like stored procedures) that are executed whenever specific events
such as INSERT or UPDATE occur on a specific object such as a table. In the code for the trigger, you can
then enforce even more complex rules for integrity. Triggers are discussed in Module 10.
Objects from Earlier Versions
Early versions of SQL Server supported objects called rules and defaults. Note that defaults were a type of
object and not the same as DEFAULT constraints. Defaults were separate objects that were then bound to
columns. They were reused across multiple columns.
These objects have been deprecated because they were not compliant with Structured Query Language
(SQL) standards. Code that is based on these objects should be replaced. In general, you should replace
rules with CHECK constraints and defaults with DEFAULT constraints.
Lesson 2
Implementing Domain Integrity
Domain integrity limits the range and type of values that can be stored in a column. It is usually the most
important form of data integrity when first designing a database. If domain integrity is not enforced,
processing errors can occur when unexpected or out-of-range values are encountered.
Lesson Objectives
After completing this lesson, you will be able to:
Describe how you can use data types to enforce domain integrity.
Describe how you can use DEFAULT constraints to provide default values for columns.
Describe how you can use CHECK constraints to enforce domain integrity.
Data Types
Choosing an appropriate data type for each column
is one of the most important decisions that you
must make when you are designing a table as part of
a database. Data types were discussed in Module 2.
You can assign data types to a column by using one
of the following methods:
Using SQL Server system data types.
SQL Server supplies system data types and a large range of data types is available. Choosing a data type
determines both the types of data that can be stored and the range of values that is permitted.
It is common for consistency problems to occur when tables are designed. This is even more common
when more than one person has designed the tables. For example, you may have several tables that store
the weight of a product that was sold. One column might be defined as decimal(18,3), another column
might be defined as decimal(12,2), and another column might be defined as decimal(16,5). For
consistency, alias data types enable you to create a data type called ProductWeight, define it as
decimal(18,3), and then use it as the data type for all of these columns. This helps lead to more
consistent database designs.
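As a sketch (the ProductWeight name comes from the example above; the table is hypothetical), an alias data type is created once and then reused wherever the same kind of value is stored:

```sql
-- Define the alias data type once...
CREATE TYPE dbo.ProductWeight FROM decimal(18,3) NOT NULL;

-- ...then use it wherever a product weight is stored, guaranteeing a
-- consistent definition across tables.
CREATE TABLE dbo.ProductShipment
(
    ShipmentID int NOT NULL,
    ShippedWeight dbo.ProductWeight
);
```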
An additional advantage of alias data types is that code generation tools can create more consistent code
when the tools have the additional information about the data types that alias data types provide. For
example, you could decide to have a user-interface design program that always displayed and/or
prompted for product weights in a specific way.
The addition of managed code to SQL Server as part of SQL Server 2005 onward made it possible to
create entirely new data types. Although alias data types are user-defined, they are still effectively subsets
of the existing system data types. User-defined data types that are created in managed code enable the
design of not only the data that is stored in a data type, but also the behavior of the data type. For
example, you could design a jpeg data type. Besides designing how it would store images, you could
decide that it could be updated by calling a predesigned Resize method. Designing user-defined data
types is discussed in Module 12.
DEFAULT Constraints
A DEFAULT constraint provides a value for a column
when no value is specified in the statement that
inserted the row. You can view the existing
definition of DEFAULT constraints by querying the
sys.default_constraints view.
DEFAULT Constraint
DEFAULT constraints are associated with a table column. They are used to provide a default value for the
column when the user does not supply a value. The value is retrieved from the evaluation of an expression
and the data type that the expression returns must be compatible with the data type of the column.
However, note that if the statement that inserted the row explicitly inserted NULL, the default value would
not be used.
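A minimal sketch (the table and constraint names are hypothetical) showing a DEFAULT constraint whose expression is evaluated only when no value is supplied:

```sql
CREATE TABLE dbo.CustomerOrder
(
    OrderID int NOT NULL,
    OrderDate datetime2 NOT NULL
        CONSTRAINT DF_CustomerOrder_OrderDate DEFAULT (SYSDATETIME())
);

-- No OrderDate supplied: the default expression provides the value.
INSERT INTO dbo.CustomerOrder (OrderID) VALUES (1);

-- An explicitly supplied NULL would NOT be replaced by the default.
```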
Named Constraints
SQL Server does not require you to supply names for constraints that you create. If a name is not supplied,
SQL Server will automatically generate a name. However, the names that are generated are not very
intuitive. Therefore, it is generally considered a good idea to provide names for constraints as you create
them and to do so in a consistent naming pattern.
A good example of why naming constraints is important is that if a column needs to be deleted, you must
first remove any constraints that are associated with the column. Dropping a constraint requires you to
provide a name for the constraint that you are dropping. Having a consistent naming standard for
constraints helps you to know what that name is likely to be, rather than having to execute a query to find
the name. (Locating the name of a constraint would involve querying system views such as
sys.check_constraints, sys.default_constraints, or sys.key_constraints, or searching in Object Explorer.)
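With a consistent naming standard, the name needed to drop a constraint is predictable. A sketch (the table and constraint names are illustrative):

```sql
-- Because the constraint was created with a predictable name,
-- it can be dropped without first querying the system views.
ALTER TABLE dbo.CustomerOrder
    DROP CONSTRAINT DF_CustomerOrder_OrderDate;
```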
CHECK Constraints
CHECK constraints limit the values that a column
can accept by controlling the values that can be put
in the column.
Logical Expression
CHECK constraints work with any logical (Boolean) expression that can return TRUE, FALSE, or
UNKNOWN. Particular care must be given to any expression that could return NULL. CHECK
constraints reject only values for which the expression evaluates to FALSE; a value that causes the
expression to return UNKNOWN is not rejected.
Table-Level CHECK Constraints
Apart from checking the value in a particular column, you can apply CHECK constraints at the table level
to check the relationship between the values in more than a single column from the same table. For
example, you could decide that the FromDate column should not have a larger value than the ToDate
column in the same row.
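A sketch of such a table-level CHECK constraint (the table name is hypothetical; the FromDate and ToDate columns come from the example above):

```sql
CREATE TABLE dbo.Assignment
(
    AssignmentID int NOT NULL,
    FromDate date NOT NULL,
    ToDate date NULL,
    -- Table-level constraint comparing two columns in the same row.
    -- When ToDate is NULL the expression returns UNKNOWN, so the row
    -- is not rejected.
    CONSTRAINT CK_Assignment_DateRange CHECK (FromDate <= ToDate)
);
```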
Demonstration Steps
Enforce data and domain integrity
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
6. If Solution Explorer is not visible, click the View menu and click Solution Explorer.
8. Follow the instructions contained within the comments of the script file.
Lesson 3
Implementing Entity and Referential Integrity
It is important to be able to uniquely identify rows within tables and to be able to establish relationships
across tables. For example, you will need to make sure that a customer can be identified and that the
customer exists before you allow an order to be placed for that customer. This can be enforced by using a
combination of entity and referential integrity.
Lesson Objectives
After completing this lesson, you will be able to:
Explain how PRIMARY KEY constraints are used to enforce entity integrity.
Explain how FOREIGN KEY constraints are used to enforce referential integrity.
Describe how table relationships can be maintained while deleting or updating data through
cascading relationships.
As with other types of constraints, even though a name is not required when defining a PRIMARY KEY
constraint, it is desirable to choose a name for the constraint rather than leaving SQL Server to do so.
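As a sketch (the table and column names are hypothetical), a PRIMARY KEY constraint with an explicit name:

```sql
CREATE TABLE dbo.Customer
(
    -- Naming the constraint avoids an auto-generated, unintuitive name.
    CustomerID int NOT NULL
        CONSTRAINT PK_Customer PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);
```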
UNIQUE Constraints
A UNIQUE constraint indicates that the values in a
column or combination of columns must be unique.
One row can hold NULL (if the column's nullability
permits it). SQL Server internally creates an index to
support the UNIQUE constraint.
If you were storing a tax identifier for employees in Spain, you would store one of these values, include a
CHECK constraint to make sure that the value was in one of the two valid formats, and have a UNIQUE
constraint on the column that stores these values. Note that this may be unrelated to the fact that the
table has another unique identifier such as EmployeeID that is used as a primary key for the table.
As with other types of constraints, even though a name is not required when defining a UNIQUE
constraint, it is desirable to choose a name for the constraint rather than leaving SQL Server to do so.
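The tax identifier scenario above could be sketched like this (the table and column names are hypothetical):

```sql
CREATE TABLE dbo.Employee
(
    -- EmployeeID is the primary key...
    EmployeeID int NOT NULL
        CONSTRAINT PK_Employee PRIMARY KEY,
    -- ...while the tax identifier is kept unique independently of it.
    TaxIdentifier nvarchar(20) NULL
        CONSTRAINT UQ_Employee_TaxIdentifier UNIQUE
);
```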
Note that you cannot change the length of a column when a FOREIGN KEY constraint is defined on it.
The target table can be the same table. For example, an Employee row might reference a manager who is
another row in the same Employee table.
As with other types of constraints, even though a name is not required when defining a FOREIGN KEY
constraint, it is desirable to choose a name for the constraint rather than leaving SQL Server to do so.
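A sketch of both forms of FOREIGN KEY constraint (the tables, columns, and constraint names are hypothetical, and the referenced key columns are assumed to exist already):

```sql
-- A foreign key referencing another table.
ALTER TABLE dbo.CustomerOrder
    ADD CONSTRAINT FK_CustomerOrder_Customer
        FOREIGN KEY (CustomerID) REFERENCES dbo.Customer (CustomerID);

-- A self-referencing foreign key: an employee row points at a manager
-- who is another row in the same table.
ALTER TABLE dbo.Employee
    ADD CONSTRAINT FK_Employee_Manager
        FOREIGN KEY (ManagerID) REFERENCES dbo.Employee (EmployeeID);
```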
When you add a FOREIGN KEY constraint to a column (or columns) in a table, SQL Server will check the
data that is already in the column to make sure that the reference to the target table is valid. However, if
you specify WITH NOCHECK, SQL Server does not apply the check to existing rows and will only check
the reference in future when rows are inserted or updated. The WITH NOCHECK option can be applied
to other types of constraints, too.
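A sketch of adding a constraint without validating the existing rows (the names here are hypothetical):

```sql
-- WITH NOCHECK: existing rows are not validated; only future inserts
-- and updates are checked against the constraint.
ALTER TABLE dbo.CustomerOrder WITH NOCHECK
    ADD CONSTRAINT FK_CustomerOrder_SalesPerson
        FOREIGN KEY (SalesPersonID) REFERENCES dbo.SalesPerson (SalesPersonID);
```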
REFERENCES Permission
Before you can place a FOREIGN KEY constraint on a table, you must at least have REFERENCES
permission on the target table. This avoids the situation where another user could place a reference to
one of your tables, leaving you unable to drop or substantially change your own table until the other user
removed that reference. However, in terms of security, keep in mind that providing REFERENCES
permission to a user on a table for which they do not have SELECT permission does not totally prevent
them from working out what the data in the table is by a brute force attempt that involves trying all
possible values.
1. NO ACTION is the default. For example, if you attempt to delete a customer and there are orders for
the customer, the deletion will fail.
2. CASCADE makes the required changes to the referencing tables. If the customer is being deleted, his
or her orders will be deleted, too. If the customer primary key is being updated (although note that
this is undesirable anyway), the customer key in the orders table will also be updated so that the
orders still refer to the correct customer.
3. SET NULL causes the values in the columns in the referencing table to be nullified. For the customer
and orders example, this means that the orders would still exist, but they would not refer to any
customer.
4. SET DEFAULT causes the values in the columns in the referencing table to be set to their default
values. This provides more control than the SET NULL option, which always sets the values to NULL.
Caution
Although cascading referential integrity is easy to set up, you should exercise extreme caution when using
it within database designs.
For example, if you used the CASCADE option in the example above, would it really be okay for the
orders for the customer to be removed when you decided to remove the customer? Most organizations
might not mind orders disappearing, but might be much less happy to see other objects such as invoices
disappearing. Also, keep in mind the cascading nature of this situation. When you remove the customer,
you remove the orders. However, there may be other tables that reference the orders table (such as order
details or even invoices), and these would be removed, too.
Naming
Changing Constraints
You can create, alter, or drop constraints without having to drop and re-create the underlying table.
You use the ALTER TABLE statement to add, alter, or drop constraints.
Error Checking in Applications
Even though you have specified constraints in your database layer, you may also want to check the same
logic in higher layers of code. Doing so will lead to more responsive systems because they will go through
fewer layers of code. It will also provide more meaningful errors to users because the code is closer to the
business-related logic that led to the errors. The challenge is in keeping the checks between different
layers consistent.
When you are performing bulk loading or updates of data, you can often achieve better performance by
disabling CHECK and FOREIGN KEY constraints while performing the bulk operations and then reenabling
them afterwards, rather than having them checked row by row during the bulk operation.
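The pattern could be sketched as follows (the table and constraint names are hypothetical):

```sql
-- Disable checking before the bulk operation.
ALTER TABLE dbo.CustomerOrder
    NOCHECK CONSTRAINT FK_CustomerOrder_Customer;

-- ... perform the bulk load or update here ...

-- Re-enable the constraint; WITH CHECK also validates the existing rows
-- so that the constraint is trusted by the optimizer again.
ALTER TABLE dbo.CustomerOrder
    WITH CHECK CHECK CONSTRAINT FK_CustomerOrder_Customer;
```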
Define entity integrity for a table, define referential integrity for tables, and define cascading referential
integrity constraints.
Demonstration Steps
Define entity integrity for a table, define referential integrity for tables, and define cascading referential
integrity constraints
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
7. Follow the instructions contained within the comments of the script file.
IDENTITY Property
It is common to require a series of numbers to be
automatically provided for an integer column. The
IDENTITY property on a database column indicates
that an INSERT statement will not provide the value
for the column; instead, SQL Server will provide it
automatically.
When you specify the IDENTITY property, you specify a seed and an increment. The seed is the starting
value, and the increment is the amount by which the value increases for each new row. Both the seed and
the increment default to 1 if they are not specified.
Although explicit inserts are not normally allowed for columns that have an IDENTITY property, it is
possible to insert values explicitly. You can temporarily enable this by using the SET IDENTITY_INSERT
connection option, which allows the user to supply explicit values for the IDENTITY column instead of
having them auto-generated.
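A sketch of both the property and the connection option (the table and column names are hypothetical):

```sql
CREATE TABLE dbo.AuditLog
(
    LogID int IDENTITY(1,1) NOT NULL,   -- seed 1, increment 1 (the defaults)
    LogMessage nvarchar(200) NOT NULL
);

-- Normal insert: SQL Server supplies LogID automatically.
INSERT INTO dbo.AuditLog (LogMessage) VALUES (N'First entry');

-- Temporarily allow an explicit value in the IDENTITY column.
SET IDENTITY_INSERT dbo.AuditLog ON;
INSERT INTO dbo.AuditLog (LogID, LogMessage) VALUES (100, N'Explicit value');
SET IDENTITY_INSERT dbo.AuditLog OFF;
```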
Having the IDENTITY property on a column does not in itself ensure that the column is unique. Unless
there is also a UNIQUE constraint on the column, there is no guarantee that values in a column that has
the IDENTITY property will be unique.
After inserting a row into a table, you often need to know the value that was placed into the column that
has the IDENTITY property. The @@IDENTITY system function returns the last identity value that was
generated within the session, in any scope. This can be a problem when a trigger on the target table
performs an insert into another table that also has an IDENTITY column.
For example, if you insert a row into a customer table, the customer might be assigned a new identity
value. However, if a trigger on the customer table caused an entry to be written into an audit logging
table when inserts are performed, the @@IDENTITY variable would return the identity value from the
audit logging table, rather than the one from the customer table.
To deal effectively with this, the SCOPE_IDENTITY() function was introduced. It provides the last identity
value within the current scope only. In the previous example, it would return the identity value from the
customer table.
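The difference could be sketched like this (assuming a Customer table whose key is an IDENTITY column and that has the audit trigger described above):

```sql
INSERT INTO dbo.Customer (CustomerName) VALUES (N'New Customer');

-- Last identity value generated in the current scope only: returns the
-- new CustomerID even if a trigger inserted into an audit table.
SELECT SCOPE_IDENTITY() AS NewCustomerID;

-- Last identity value generated in the session, in ANY scope: would
-- return the audit table's identity value instead.
SELECT @@IDENTITY AS LastIdentityAnyScope;
```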
Another complexity relates to multi-row inserts, which were introduced in SQL Server 2008. In this
situation, you may want to retrieve the IDENTITY column value for more than one row at a time.
Typically, this would be implemented by the use of the OUTPUT clause on the INSERT statement.
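A sketch of using the OUTPUT clause to capture all of the generated identity values (the names are hypothetical):

```sql
DECLARE @NewIDs table (CustomerID int);

INSERT INTO dbo.Customer (CustomerName)
OUTPUT inserted.CustomerID INTO @NewIDs
VALUES (N'Customer A'), (N'Customer B'), (N'Customer C');

-- One identity value per inserted row.
SELECT CustomerID FROM @NewIDs;
```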
Sequences
You can use sequences in a similar way to
IDENTITY properties when a sequence of values is
required. However, unlike IDENTITY properties,
sequences are not tied to any specific table. This
means that you could use a single sequence to
provide key values for a group of tables.
Sequences can be cyclic. They can return to a low
value when a specified maximum value has been
exceeded.
In the example on the slide, a sequence called
BookingID is created in the Booking schema. The
sequence is defined as generating integer values. By
default, sequences generate bigint values.
Values from sequences are retrieved by using the NEXT VALUE FOR clause. In the example shown on the
slide, the sequence is being used to provide the default value for the FlightBookingID column in the
Booking.FlightBooking table.
Sequences are created by the CREATE SEQUENCE statement, modified by the ALTER SEQUENCE
statement, and deleted by the DROP SEQUENCE statement.
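The slide example could be sketched as follows (assuming the Booking schema already exists; the PassengerName column is illustrative):

```sql
-- A sequence generating int values (the default would be bigint).
CREATE SEQUENCE Booking.BookingID AS int
    START WITH 1
    INCREMENT BY 1;

-- The sequence provides the default value for the key column.
CREATE TABLE Booking.FlightBooking
(
    FlightBookingID int NOT NULL
        DEFAULT (NEXT VALUE FOR Booking.BookingID),
    PassengerName nvarchar(100) NOT NULL
);

-- The same sequence could also supply key values for other booking tables.
```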
Other database engines provide sequence values, so the addition of sequence support in SQL Server 2012
and SQL Server 2014 can assist with migrating code to SQL Server from other database engines.
Note that values that are retrieved from the sequence are never returned for reuse. This means that gaps
can occur in the set of sequence values. In addition, a range of sequence values can be retrieved in a
single call via the sp_sequence_get_range system stored procedure. Options also exist to cache sets of
sequence values to improve performance. When a server failure occurs, the entire cached set of values is
lost.
Work with identity constraints, create a sequence, and use a sequence to provide key values for two
tables.
Demonstration Steps
Work with identity constraints, create a sequence, and use a sequence to provide key values for two tables
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
7. Follow the instructions contained within the comments of the script file.
8. Close SQL Server Management Studio and SQL Server Profiler without saving any changes.
The following table should be used when you are designing your constraints.
Objectives
In this lab, you will add constraints to tables.
Password: Pa$$w0rd
3. In File Explorer, navigate to the D:\Labfiles\Lab03\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.
4. In the User Account Control dialog box, click Yes, and then wait for the script to finish.
2. Work through the list of requirements and alter the table to make columns the primary key based on
the requirements.
3. Work through the list of requirements and alter the table to make columns foreign keys based on the
requirements.
4. Work through the list of requirements and alter the table to add DEFAULT constraints to columns
based on the requirements.
Results: Having completed this lab, you will have added constraints to the DirectMarketing.Opportunity
table.
Note: This query should fail due to the FOREIGN KEY constraint.
Results: After completing this exercise, you should have successfully tested your constraints.
Question: In SQL Server Management Studio, you have successfully run a script that created
a table, but you don’t see the table in Object Explorer. What do you need to do?
Question: What does the DEFAULT option do when you create a column?
Question: What requirement does a PRIMARY KEY constraint have that a UNIQUE constraint
does not?
Review Question(s)
Question: Why implement CHECK constraints if an application is already checking the input
data?
Question: What are some scenarios in which you may want to temporarily disable constraint
checking?
Module 4
Introduction to Indexes
Contents:
Module Overview 4-1
Module Overview
An index is a collection of pages associated with a table. Indexes are used to improve the performance of
queries or enforce uniqueness. Before learning to implement indexes, it is important to understand how
they work, how effective different data types are when used within indexes, and how indexes can be
constructed from multiple columns. This module discusses table structures without indexes and the
different index types available in SQL Server.
Objectives
After completing this module, you will be able to:
Lesson 1
Core Indexing Concepts
Although it is possible for Microsoft® SQL Server® data management software to read all of the pages in
a table when it is calculating the results of a query, doing so is often highly inefficient. Instead, you can
use indexes to point to the location of required data and to minimize the need for scanning entire tables.
In this lesson, you will learn how indexes are structured and learn the key measures that are associated
with the design of indexes. Finally, you will see how indexes can become fragmented over time.
Lesson Objectives
After completing this lesson, you will be able to:
Sometimes SQL Server creates its own temporary indexes to improve query performance. However, doing
so is up to the optimizer and beyond the control of the database administrator or programmer, so these
temporary indexes will not be discussed in this module. The temporary indexes are only used to improve a
query plan if no proper indexing already exists.
In this module, you will consider standard indexes that are created on tables. SQL Server also includes
other types of index:
Integrated full-text search is a special type of index that provides flexible searching of text.
Spatial indexes are used with the GEOMETRY and GEOGRAPHY data types.
Primary and secondary XML indexes assist when querying XML data.
Columnstore indexes are used to speed up operations for data that is not constantly changing, such
as data in data warehouses.
Each of these other index types is discussed in later modules in this course.
At this point, it is useful to consider an analogy that might be easier to relate to. Consider a physical
library. Most libraries store books in a given order, which is basically an alphabetical order within a set of
defined categories.
Note that even when you store the books in alphabetical order, there are various ways to do so. The order
of the books could be based on the name of the book or the name of the author. Whichever option is
chosen makes one form of access easy and other forms of access harder. For example, if books were
stored in book name order, how would you locate books that were written by a particular author? Indexes
assist with this type of problem.
Index Structures
Tree structures provide rapid search capabilities for
large numbers of entries in a list, while avoiding the
need for excessive depth within an index. Depth is defined as the number of levels from the top node
(called the root node) to the bottom nodes (called leaf nodes).
Locating the book in the bookcases based on the information in that next index entry.
You would need to keep repeating the same steps until you had found all of the books by that author.
Now imagine doing the same for a range of authors, such as one-third of all of the authors in the library.
You quickly reach a point where it would be quicker to just scan the whole library and ignore the author
index rather than running backward and forward between the index and the bookcases.
Density is a measure of the lack of uniqueness of the data in a table. A dense column is one that has a
high number of duplicates.
Index depth is a measure of the number of levels from the root node to the leaf nodes. Users often
imagine that SQL Server indexes are quite deep, but the reality is quite different. The large number of
children that each node in the index can have produces a very flat index structure. Indexes with only three
or four layers are very common.
Index Fragmentation
Index fragmentation is the inefficient use of pages
within an index. Fragmentation occurs over time as
data is modified.
For operations that read data, indexes perform best
when each page of the index is as full as possible.
Although indexes may initially start full (or relatively
full), modifications to the data in the indexes can
cause the need to split index pages.
Internal fragmentation is similar to what would occur if an existing bookcase was split into two bookcases.
Each bookcase would then be only half full.
External fragmentation relates to where the new bookcase would be physically located. It would probably
need to be placed at the end of the library, even though it would “logically” need to be in a different
order. That means that to read the bookcases in order, you could no longer just walk directly from
bookcase to bookcase. Instead, you would need to follow pointers around the library to follow a chain
from one bookcase to another.
Detecting Fragmentation
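One common way to detect both forms of fragmentation is to query the sys.dm_db_index_physical_stats dynamic management function. A sketch for the current database:

```sql
-- avg_page_space_used_in_percent reflects internal fragmentation
-- (how full each page is); avg_fragmentation_in_percent reflects
-- external (logical) fragmentation.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_page_space_used_in_percent,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id;
```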
Character data values tend to be larger than numeric values. For example, a character column might hold
a customer's name or address details. This means that far fewer entries can exist in a given number of
index pages, which makes character-based indexes slower to seek.
Character-based indexes also tend to cause fragmentation problems because new values are almost never
ascending or descending.
Date-related data types are only slightly less efficient than integer data types. Date-related data types are
relatively small and can be compared and sorted quickly.
Globally unique identifier (GUID) values are reasonably efficient within indexes. There is a common
misconception that they are large, but they are 16 bytes long and can be compared in a binary fashion.
This means that they pack quite tightly into indexes and can be compared and sorted quite quickly.
There is a very common misconception that bit columns are not useful in indexes. This stems from the
fact that there are only two values. However, the number of values is not the issue.
Selectivity of queries is the most important issue. For example, consider a transaction table that contains
100 million rows, where one of the columns (IsFinalized) indicates whether a transaction has been
completed. There might only be 500 transactions that are not completed. An index that uses the
IsFinalized column would be very useful for finding the unfinalized transactions. It would be highly
selective.
Demonstration Steps
Identify fragmented indexes
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
9. Follow the instructions contained within the comments of the script file.
Lesson 2
Single-Column and Composite Indexes
The indexes that have been discussed so far have been based on data from single columns. Indexes can
also be based on data from multiple columns and constructed in ascending or descending order. This
lesson investigates these concepts and the effects that they have on index design along with details of
how SQL Server maintains statistics on the data that is contained within indexes.
Lesson Objectives
After completing this lesson, you will be able to:
Higher selectivity.
Similarly, an index by topic would be of limited value, too. After the correct topic had been located, it
would be necessary to search all of the books on that topic to determine if they were by the specified
author.
The best option would be an author index that also included details of each book's topic. In that case, a
scan of the index pages for the author would be all that was required to work out which books needed to
be accessed.
When you are constructing composite indexes, in the absence of any other design criteria, you should
typically index the most selective column first.
Composite indexes can benefit from each component having a different order. Often this is used to avoid
sorts. For example, you might need to output orders by date descending within customer ascending.
From our physical library analogy, imagine that an author index contains a list of books by release date
within the author index. Answering the query would be easier if the index was already structured this way.
Index Statistics
SQL Server keeps statistics on indexes to assist when
making decisions about how to access the data in a
table.
Earlier in the module, you saw that SQL Server
needs to make decisions about how to access the
data in a table. For each table that is referenced in a
query, SQL Server might decide to read the data
pages or it might decide to use an index.
When discussing the physical library analogy earlier, it was mentioned that if you were looking up the
books for an author, using an index that is ordered by author could be useful. However, if you were
locating books for a range of authors, there would be a point at which scanning the entire library would
be quicker than running backward and forward between the index and the shelves of books.
The key issue here is that, before executing the query, you need to know how selective (and therefore
useful) the indexes would be. The statistics that SQL Server holds on indexes provide this knowledge.
Lesson 3
Table Structures in SQL Server
Tables in SQL Server can be structured in two ways. Rows can be added in any order, or rows can be
ordered. In this lesson, you will investigate both options, and gain an understanding of how each option
affects common data modification operations. Finally, you will see how unique, clustered indexes are
structured differently to non-unique, clustered indexes.
Lesson Objectives
After completing this lesson, you will be able to:
Describe how unique clustered indexes are structured differently to non-unique, clustered indexes.
What Is a Heap?
A heap is a table that has no enforced order for
either the pages within the table or for the data
rows within each page.
In the physical library analogy, a heap would be represented by structuring your library so that every book
is just placed in any available space that is large enough. Without any other assistance, finding a book
would involve scanning one bookcase after another.
Operations on Heaps
The most common operations that are performed
on tables are INSERT, UPDATE, DELETE, and
SELECT operations. It is important to understand
how each of these operations is affected by
structuring a table as a heap.
A DELETE operation could be imagined as scanning the bookcases until the book is found, removing the
book, and throwing it away. More precisely, it would be like placing a tag on the book to say that it is to
be thrown out the next time the library is cleaned up or space on the bookcase is needed.
An UPDATE operation would be represented by replacing a book with a (potentially) different copy of the
same book. If the replacement book was the same (or smaller) size as the original book, it could be placed
directly back in the same location as the original book. However, if the replacement book was larger, the
original book would be removed and the replacement placed into another location. The new location for
the book could be in the same bookcase or in another bookcase.
There is a common misconception that adding additional indexes always reduces the performance of data
modification operations. However, it is clear that for the DELETE and UPDATE operations described
above, having another way to find these rows might well be useful. In Module 5, you will see how to
achieve this.
Forwarding Pointers
When other indexes point to rows in a heap, data
modification operations cause forwarding pointers
to be inserted into the heap. This can cause
performance issues over time.
In the heap version of the library analogy, there was no order to the books on the bookcases, so when an entry was found in the ISBN index, the
entry would refer to the physical location of the book. The entry would include an address like “Bookcase
12, Shelf 5, Book 3.” That is, there would need to be a specific address for a book.
An update to the book that meant that it needed to be moved to a different location would be
problematic. One option for resolving this would be to locate all index entries for the book and update
the new physical location.
An alternate option would be to leave a note in the location where the book used to be that points to
where the book has been moved to. This is what a forwarding pointer is in SQL Server. A forwarding
pointer enables rows to be updated and moved without the need to update other indexes that point to
them.
A further challenge arises if the book needed to be moved again. There are two ways in which this could
be handled. Either yet another note could be left pointing to the new location or the original note could
be modified to point to the new location. Either way, the original indexes would not need to be updated.
SQL Server deals with this by updating the original forwarding pointer. This way, performance does not
continue to degrade by having to follow a chain of forwarding pointers.
Forwarding pointers were a common performance problem with tables in SQL Server that were structured as heaps, because there was no straightforward way of “cleaning up” a heap to remove the forwarding pointers. Although options existed for removing them, each had significant disadvantages. SQL Server 2008 introduced a direct method for dealing with this problem: the ALTER TABLE REBUILD command.
Note that although options to rebuild indexes were available in prior versions, the option to rebuild a
table was not available. You can also use this command to change the compression settings for a table.
(Page and row compression are advanced topics that are beyond the scope of this course.)
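The rebuild command described above can be sketched as follows. The table name is hypothetical and used only for illustration:

```sql
-- Rebuild a heap to remove forwarding pointers (SQL Server 2008 and later).
-- dbo.LogData is a hypothetical table name.
ALTER TABLE dbo.LogData REBUILD;

-- The same statement can also change the compression settings for a table.
ALTER TABLE dbo.LogData REBUILD WITH (DATA_COMPRESSION = PAGE);
```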
There is a common misconception that pages in a clustered index are “physically stored in order.”
Although this is possible in rare situations, it is not commonly the case. If it were true, fragmentation of
clustered indexes would not exist. SQL Server tries to align physical and logical order while it creates an
index, but disorder can arise as data is modified.
Index and data pages are linked within a logical hierarchy and also double-linked across all pages at the
same level of the hierarchy to assist when scanning across an index.
In the library analogy, a clustered index is similar to storing all books in a specific order. An example of
this would be to store books in International Standard Book Number (ISBN) order. Clearly, the library can
only be in a single order.
An UPDATE operation in this ordered library would again involve replacing a book with a (potentially) different copy. If there was insufficient space in the bookcase to accommodate a larger replacement book, the bookcase would need to be split. If the ISBN of the replacement book was different from that of the original book, the original book would need to be removed and the replacement book treated like the insertion of a new book.
A DELETE operation would involve the book being removed from the bookcase. (Again, more formally, it
would be flagged as free in a free space map, but simply left in place for later removal.)
When a SELECT operation is performed, if the ISBN is known, the required book can be quickly located by
efficiently searching the library. If a range of ISBNs was requested, the books would be located by finding
the first book and continuing to collect books in order until a book was encountered that was out of
range or until the end of the library was reached.
A unique, clustered index is similar to having a library rule that says that no more than a single copy of any book can ever be stored. If someone tried to insert a new
book and another book was found to have the same ISBN (assuming that the ISBN was the clustering
key), the insertion of the new book would be refused.
It is important to understand that the comparison is made only on the clustering key. The book would be
rejected for having the same ISBN, even if other properties of the book were different.
A non-unique, clustered index is similar to having a rule that allows more than a single book that has the
same ISBN. The issue is that it is likely to be desirable to track each copy of the book separately. The
uniqueifier that SQL Server adds would be like a “Copy Number” being added to books that can be
duplicated. The uniqueifier is not visible to users.
Create a table as a heap, check the fragmentation and forwarding pointers for a heap, and rebuild a
heap.
Demonstration Steps
Create a table as a heap, check the fragmentation and forwarding pointers for a heap, and rebuild a heap
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
8. Follow the instructions contained within the comments of the script file.
Lesson 4
Working with Clustered Indexes
If a decision has been made to structure a table by using a clustered index, it is important to be familiar
with how the indexes are created, dropped, or altered. In this lesson, you will see how to perform these
actions, understand how SQL Server performs them automatically in some situations, and see how to
incorporate free space within indexes to improve insert performance.
Lesson Objectives
After completing this lesson, you will be able to:
In the first example on the slide, the dbo.Article table was created. The ArticleID column has a PRIMARY
KEY constraint associated with it. There is no other clustered index on the table, so the index that is
created to support the PRIMARY KEY constraint will be created as a clustered primary key. ArticleID will
be both the clustering key and the primary key for the table.
In the second example on the slide, the dbo.LogData table is initially created as a heap. When the
PRIMARY KEY constraint is added to the table, no other clustered index is present on the table, so SQL
Server will create the index to support the PRIMARY KEY constraint as a clustered index.
If a table has been created as a heap, it can be converted to a clustered index structure by adding a
clustered index to the table. In the fourth command shown in the examples on the slide, a clustered index
named CL_LogTime is added to the dbo.LogTime table and the LogTimeID column is the clustering key.
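The slide examples described above are not reproduced in the text; the following is a hedged reconstruction based on the table and index names mentioned. Column names other than the keys are hypothetical:

```sql
-- Example 1: a PRIMARY KEY constraint with no other clustered index
-- present is supported by a clustered index by default.
CREATE TABLE dbo.Article
(
    ArticleID int NOT NULL PRIMARY KEY,  -- becomes the clustered primary key
    Title nvarchar(100) NOT NULL          -- illustrative column
);

-- Example 2: a table created as a heap, then given a clustered
-- primary key when the constraint is added.
CREATE TABLE dbo.LogData
(
    LogID int NOT NULL,                   -- illustrative column
    LogText nvarchar(max) NULL
);

ALTER TABLE dbo.LogData
ADD CONSTRAINT PK_LogData PRIMARY KEY (LogID);

-- Converting an existing heap to a clustered index structure directly.
CREATE CLUSTERED INDEX CL_LogTime ON dbo.LogTime (LogTimeID);
```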
This command not only creates an index over the data, it also causes the entire structure of the table to be reorganized.
Restructuring an index is not permitted within an ALTER INDEX statement. You cannot add or remove
columns that make up the clustering key by using this command and you cannot move the index to a
different filegroup.
WITH DROP_EXISTING
An option to change the structure of an index is provided while creating a replacement index. The
CREATE INDEX command includes a WITH DROP_EXISTING clause that can enable the statement to
replace an existing index. This operation is also typically much faster than dropping and re-creating the
index because SQL Server can build the index based on the old index structure.
Note that you cannot change an index from being clustered to nonclustered or back by using this
command. (Nonclustered indexes are covered in Module 5.)
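A minimal sketch of replacing an index in a single operation; the index, table, and column names are hypothetical:

```sql
-- Rebuild an existing index in place rather than dropping and
-- re-creating it. SQL Server can reuse the old index structure.
CREATE CLUSTERED INDEX CL_LogTime
ON dbo.LogTime (LogTimeID)
WITH (DROP_EXISTING = ON);
```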
Disabling Indexes
Although the ALTER INDEX statement includes a DISABLE option that can be applied to any index, this
option is of limited use with clustered indexes. After a clustered index is disabled, no access to the data in
the table is then permitted until it is rebuilt.
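The disable and rebuild cycle described above might look like this; the index and table names are hypothetical:

```sql
-- Disabling a clustered index makes the table's data inaccessible.
ALTER INDEX CL_LogTime ON dbo.LogTime DISABLE;

-- Access is restored only when the index is rebuilt.
ALTER INDEX CL_LogTime ON dbo.LogTime REBUILD;
```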
You can alleviate the performance impacts of page splits by leaving empty space on each page when you
are creating an index, including a clustered index. You can achieve this by specifying a FILLFACTOR value.
FILLFACTOR defaults to 0, which means “fill 100 percent.” Any other value (including 100) is taken as the
percentage of how full each page should be. For the example on the slide, this means 70 percent full and
30 percent free space on each page.
FILLFACTOR only applies to leaf-level pages in an index. PAD_INDEX is an option that, when it is
enabled, causes the same free space to be allocated in the nonleaf levels of the index.
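The FILLFACTOR and PAD_INDEX options described above can be sketched as follows, using hypothetical object names:

```sql
-- Fill leaf-level pages to 70 percent, leaving 30 percent free space
-- for future inserts. PAD_INDEX applies the same fill factor to the
-- nonleaf levels of the index as well.
CREATE CLUSTERED INDEX CL_LogTime
ON dbo.LogTime (LogTimeID)
WITH (FILLFACTOR = 70, PAD_INDEX = ON);
```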
Static. Clustering keys should be based on data values that do not change. This is one reason why primary keys are often used for this purpose. A change to the clustering key means that the row must be moved, and you have already seen that moving rows is generally not desirable.
Increasing. This assists with INSERT behavior. If the keys within the data are increasing as they are
inserted, the inserts happen directly at the logical end of the table. This minimizes fragmentation (the
need to split pages) and reduces the amount of memory that is needed for page buffers.
Unique. Unique clustering keys do not require SQL Server to add a uniqueifier column. It is important
to declare unique values as being unique. Otherwise, SQL Server will still add a uniqueifier column to
the key.
Although this list provides good general guidelines, you must evaluate typical query patterns when you
are designing clustering keys.
You can use character data types for clustering keys, but sorting and comparing character values is slower than for numeric values, and character values often change in typical business applications.
Date data is typically not unique, but provides excellent advantages in size and sorting performance. It
works well for date range queries that are common in typical business applications.
An indexed view is a view on which you create a unique, clustered index. The index stores the data set that is the result of the query that the view contains, so the data set is said to be persisted or materialized.
When the view is called, SQL Server can return the data set directly from the index, and does not need to
run the query. By avoiding the costs of processing of the query logic, including the joins, aggregations,
and filters that the query contains, SQL Server can significantly improve response times. Indexed views can
potentially provide additional performance benefits, because the query optimizer can choose to use an
index that is built on a view even if the view is not referenced in the FROM clause of the query. For
example, if a query has the same definition as the syntax of an indexed view, or it queries a subset of the
data that the indexed view contains, the optimizer can use the indexed view to answer the query.
When you are planning indexed views, consider the following points:
Indexed views provide the most significant performance benefits for queries that are commonly used
or high priority, and queries that include operations such as joins or aggregations. Creating indexes
for infrequently run, low-priority queries might deliver improved performance for those queries, but
the costs of index maintenance will probably outweigh the benefits.
Indexed views can cause performance degradation when data sets are frequently modified because
inserts, updates, and deletes all require the data to be changed in both the index and the supporting
tables. Furthermore, SQL Server might need to perform aggregations every time a row is modified in
the underlying table.
When you drop a view, all indexes on the view are also dropped. If you drop only the clustered index on a view, the data set is no longer persisted, and the query optimizer processes the view in the same way as a standard view.
o You must set the ANSI_NULLS and QUOTED_IDENTIFIER options to ON when you execute the
CREATE VIEW statement.
o You must set the ANSI_NULLS option to ON when you execute the CREATE TABLE statements
to create the tables that the view will reference. For this reason, you should ensure that you
consider early in the planning stage whether you might use indexed views.
o A view that has an index can only reference base tables in the same database as the view, and it
cannot reference other views.
o The definition of an indexed view must be deterministic. Deterministic expressions always return
the same result when you execute them with the same set of input values. Certain functions are
not deterministic, so you cannot use them in an indexed view. For example, the DATEADD
function is deterministic because it always returns the same result when it is used with a specific
set of parameter values. However, the GETDATE function is not deterministic because the value it
returns changes each time it is executed.
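The requirements above can be illustrated with a sketch; the schema, view, table, and column names are hypothetical:

```sql
-- Required SET options must be ON when the view is created.
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO
-- The view must be schema bound and reference base tables in the
-- same database by two-part names.
CREATE VIEW Sales.vw_OrderTotals
WITH SCHEMABINDING
AS
SELECT s.ProductCode,
       COUNT_BIG(*) AS OrderCount,          -- COUNT_BIG(*) is required in
       SUM(s.SalesAmount) AS TotalRevenue   -- aggregate indexed views
FROM Sales.SalesOrder AS s
GROUP BY s.ProductCode;
GO
-- The first index on a view must be unique and clustered; creating it
-- materializes the view's result set.
CREATE UNIQUE CLUSTERED INDEX IX_vw_OrderTotals
ON Sales.vw_OrderTotals (ProductCode);
```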
Reference Links: For a full list of the requirements for creating indexed views, see the
Creating Indexed Views topic in SQL Server Books Online.
When the data in the base columns that the computed column references changes, the index is
correspondingly updated. If the data changes frequently, these index updates can impair
performance.
When you rebuild an index on a computed column, SQL Server recalculates the values in the column.
The amount of time that this takes will depend on the number of rows and the complexity of the
calculation, but if you rebuild indexes often, you should consider the impact that this can have.
You can only build indexes on computed columns that are deterministic.
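A sketch of an index on a deterministic computed column; the table and column names are hypothetical:

```sql
CREATE TABLE dbo.OrderLine
(
    OrderLineID int NOT NULL PRIMARY KEY,
    Quantity int NOT NULL,
    UnitPrice money NOT NULL,
    LineTotal AS (Quantity * UnitPrice)  -- deterministic, precise expression
);

-- The computed column can be indexed because its expression is
-- deterministic.
CREATE INDEX IX_OrderLine_LineTotal ON dbo.OrderLine (LineTotal);
```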
Reference Links: For information about the requirements for creating indexes on
computed columns, see the Indexes on Computed Columns topic in SQL Server Books Online.
Create a table that has a clustered index, detect fragmentation in a clustered index, and correct
fragmentation in a clustered index.
Demonstration Steps
Create a table that has a clustered index, detect fragmentation in a clustered index, and correct
fragmentation in a clustered index
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
8. Follow the instructions contained within the comments of the script file.
Lesson 5
Working with Nonclustered Indexes
In this lesson, you will learn how SQL Server structures nonclustered indexes and how they can provide
performance improvements for your applications. You will also see how to create, alter, and drop
nonclustered indexes.
Lesson Objectives
After completing this lesson, you will be able to:
Nonclustered Indexes
Whenever you update key columns from the nonclustered index or update clustering keys on the base
table, the nonclustered indexes need to be updated, too. This affects the data modification performance
of the system. Each additional index that is added to a table increases the work that SQL Server might
need to perform when modifying the data rows in the table. You must take care to balance the number of
indexes that are created against the overhead that they introduce.
Ongoing Review
An application's data access patterns may change over time, particularly in enterprises where ongoing
development work is being performed on the applications. This means that nonclustered indexes that are
created at one point in time may need to be altered or even dropped at a later point in time, to continue
to achieve high performance levels.
Physical Analogy
Continuing our library analogy, nonclustered indexes are indexes that point back to the bookcases. They
provide alternate ways to look up the information in the library. For example, they might enable access by
author, by release date, or by publisher. They can also be composite indexes where you could find an
index by release date within the entries for each author.
You can create multiple nonclustered indexes on a table regardless of whether the table is structured as a
heap or has a clustered index.
Physical Analogy
Based on the library analogy, a nonclustered index over a heap is like an author index pointing to books
that have been stored in no particular order within the bookcases. When an author is found in the index,
the entry in the index for each book would have an address like “Bookcase 4, Shelf 3, Book 12.” Note that
it would be a pointer to the exact location of the book.
If the clustered index is not a unique, clustered index, the leaf level of the nonclustered index also needs
to hold the uniqueifier value for the data rows.
Physical Analogy
In the library analogy, a nonclustered index over a clustered index is like having an author index built over
a library where the books are all stored in ISBN order. When the required author is found in the author
index, the entry in the index provides details of the ISBNs for the required books. These ISBNs are then
searched for by using the second index to locate the books within the bookcases. If the bookcases need to
be rearranged (for example, due to other rows being modified), it is not necessary to make any changes
to the author index because it is only providing keys that are used for locating the books, rather than the
physical location of the books.
If an index is created only to enhance performance, rather than as part of the initial schema of an
application, one suggested standard is to include in the name of the index the date of creation and a
reference to documentation that describes why the index was created. Database administrators are often
hesitant to remove indexes when they do not know why those indexes were created. Keeping
documentation that explains why indexes were created avoids that confusion.
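The naming standard suggested above might look like the following sketch; the index name, date, documentation reference, and column are all hypothetical:

```sql
-- Performance-only index: the name embeds the creation date and a
-- reference to the documentation that explains why it was created.
CREATE NONCLUSTERED INDEX IX_SalesOrder_OrderDate_20140315_PERF0042
ON dbo.SalesOrder (OrderDate);
```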
A composite index specifies more than one column as the key value. Using composite indexes can
enhance query performance, especially when users regularly search for information in more than one way.
However, wide keys increase the storage requirements of an index.
Most useful nonclustered indexes in business applications are composite indexes. A common error is to
create single-column indexes on many columns of a table. These indexes are rarely useful.
In composite indexes, the ordering of key columns is important. In the absence of any other requirements,
you should specify the most selective column first. You can specify each column that makes up the key as
ASC (ascending) or DESC (descending). Ascending is the default order.
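A composite key sketch reflecting the guidance above, with hypothetical object names:

```sql
-- The most selective column is listed first; sort order can be set
-- per key column (ascending is the default).
CREATE NONCLUSTERED INDEX IX_SalesOrder_Customer_OrderDate
ON dbo.SalesOrder (CustomerID ASC, OrderDate DESC);
```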
INCLUDE Clause
In earlier versions of SQL Server (prior to 2005), it
was common for database administrators or
developers to create indexes that had a large
number of columns, to attempt to cover important
queries. Covering a query avoids the need for
lookup operations and can greatly increase the
performance of queries. The INCLUDE clause was
introduced to make the creation of covering
indexes easier.
SQL Server 2005 introduced the ability to include one or more columns (up to 1,023 columns) only at the
leaf level of the index. The index structure in other levels is unaffected by these included columns. They
are included only to help with covering queries. If more than one column is listed in an INCLUDE clause,
the order of the columns within the clause is not relevant.
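The INCLUDE clause described above can be sketched as follows; the object names are hypothetical:

```sql
-- Nonkey columns are stored only at the leaf level, so queries that
-- filter on OrderDate and return CustomerID and SalesAmount can be
-- covered without widening the index key.
CREATE NONCLUSTERED INDEX IX_SalesOrder_OrderDate_Covering
ON dbo.SalesOrder (OrderDate)
INCLUDE (CustomerID, SalesAmount);
```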
Performance Impacts
Indexes that provide all columns required for a query are considered to “cover” the query. Covering
indexes can have a very positive performance impact on the queries that they are designed to support.
However, although it would be possible to create an index to cover most queries, doing so could be
counterproductive. Each index that is added to a table can negatively impact the performance of data
modifications on the table. For this reason, it is important to decide which queries are most important and
to aim to cover only those queries.
Demonstration Steps
Create covering indexes
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod04\Demo04.ssmssln, and then click Open.
8. Follow the instructions contained within the comments of the script file.
9. Close SQL Server Management Studio and SQL Server Profiler without saving any changes.
Objectives
After completing this lab, you will have:
Password: Pa$$w0rd
3. In File Explorer, navigate to the D:\Labfiles\Lab04\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.
4. In the User Account Control dialog box, click Yes, and then wait for the script to finish.
Results: After completing this exercise, you will have created tables with clustered indexes.
Results: After completing this lab, you will have created a nonclustered index.
Question: Which table structure is automatically assigned when a table is assigned a primary
key during table creation and no structure is specified?
Review Question(s)
Question: What is the main problem with using unique identifiers as primary keys?
Question: Where are newly inserted rows placed when a table is structured as a heap?
Module 5
Advanced Indexing
Contents:
Module Overview 5-1
Module Overview
In earlier modules, you have seen that one of the most important decisions that Microsoft® SQL Server®
takes when executing a query is how to access the data in any of the tables involved in the query. SQL
Server can read the underlying table (which might be structured as a heap or with a clustered index), but
it might also choose to use another index. It is important to know how to determine the outcomes of the
decisions that SQL Server makes. Execution plans show how each step of a query was executed. In this
module, you will learn how to read and interpret execution plans and you will see how nonclustered
indexes have the potential to significantly enhance the performance of your applications. You will also
learn to use a tool that can help you design these indexes appropriately.
Objectives
After completing this module, you will be able to:
Lesson 1
Core Concepts of Execution Plans
The first steps in working with execution plans in Microsoft® SQL Server® data management software are
to understand why execution plans are so important and to understand the phases that SQL Server passes
through when it executes a query. When you have that information, you can learn what an execution plan
is, what the different types of execution plans are, and how execution plans relate to execution contexts. It
is possible to retrieve execution plans in a variety of formats. It is also important to understand the
differences between each of these formats and to know when to use each format.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the phases that SQL Server passes through while executing a query.
Explain what execution plans are.
I created an index to make access to the table fast, but SQL Server is ignoring the index. Why won't it
use my index?
I have created an index on every column in the table, yet SQL Server still takes the same time to
execute my query. Why is it ignoring the indexes?
SQL Server provides tools to help answer these common questions. Execution plans show how SQL Server
intends to execute a query or how it executed a query. The ability to interpret these execution plans
enables you to answer the questions above.
Many users capture execution plans and then try to resolve the worst performing aspects of a query.
However, the best use of execution plans is in verifying that the plan that you expected to be used was, in
fact, used. This means that you already need to have an idea of how you expect SQL Server to execute
your queries.
Transact-SQL Parsing
In the second phase, SQL Server resolves the names of objects to their underlying object IDs. SQL Server
needs to know exactly which object is being referred to. For example, consider the statement in the
following code example.
At first glance, it might seem that mapping the Product table to its underlying object ID would be easy,
but remember that SQL Server supports more than a single object that has the same name in a database,
through the use of schemas. For example, note that each of the objects in the following code could be
completely different in structure and that the names relate to entirely different objects.
SQL Server needs to apply a set of rules to relate the table name “Product” to the intended object.
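The code examples referred to above are not reproduced in the text. A plausible illustration of the same point, with hypothetical schema names, is that each of the following statements can refer to an entirely different object:

```sql
-- Three distinct objects that share the name "Product" but live in
-- different schemas. Name resolution must pick the intended one.
SELECT ProductID FROM Production.Product;
SELECT ProductID FROM Sales.Product;
SELECT ProductID FROM dbo.Product;
```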
Query Optimization
After the object IDs have been resolved, SQL Server needs to decide how to execute the overall batch.
Based on the available statistics, SQL Server will make decisions about how to access the data that is
contained in each of the tables that are part of each query. This might involve creating new statistics or
updating existing statistics before executing the query.
SQL Server does not always find the best possible plan. It weighs up the cost of a plan, based on its
estimate of the cost of resources that are required to execute the plan. The cost is based on CPU
resources, memory, and I/O operations and is strongly influenced by the available statistics. The aim is to
find a satisfactory plan in a reasonable period of time. The more complex a Structured Query Language
(SQL) batch is, the longer it could take SQL Server to evaluate all of the possible plans that could be used
to execute the batch. Finding the best plan might take longer than executing a less optimal plan.
There is no need to consider alternate plans for data definition language (DDL) statements such as
CREATE, ALTER, or DROP. Many simple queries also have trivial plans that are quickly identified.
Query Plan Execution
After a plan is found, the execution engine and storage engine work together to execute the plan. Execution might still fail, because run-time errors can occur.
Plan Caching
If the plan is considered sufficiently useful, it may be stored in the Plan Cache. On later executions of the
batch, SQL Server will attempt to reuse execution plans from the Plan Cache. This is not always possible
and, for certain types of query, not always desirable.
SQL Server uses a cost-based optimizer: each element of the query plan is assigned a cost in relation to
the total cost of the batch. SQL Server Management Studio also calculates a relationship between the
costs of each statement, which is useful where a batch contains more than one statement.
The costs that are either estimated or calculated as part of the plan can only be interpreted within the
context of the plan. It is possible to compare the cost of individual elements across statements in a single
batch, but you should not make comparisons between the costs of elements in different batches. You can
only use costs to determine whether an operation is cheaper or more expensive than another operation.
You cannot use costs to estimate execution time.
Another option on the Query menu is Display Estimated Execution Plan. This asks SQL Server to
calculate an execution plan for a query (or batch) based on how it would attempt to execute the query.
This is calculated without actually executing the query. This type of plan is known as an “estimated”
execution plan. Estimated execution plans are very useful when you are designing queries or when you
are debugging queries that are suffering from performance problems.
Note that it is not always possible to retrieve an estimated execution plan. One common reason for this is
that the batch might include statements that create objects and then access them. The objects do not
exist yet, so SQL Server has no knowledge of them and cannot create a plan for processing them. You will
see an example of this in the next demonstration.
When SQL Server executes a plan, it may also make choices that differ from an estimated plan. This is
commonly related to the available resources (or more likely the lack of available resources) at the time
when the batch is executed.
Execution plans include row counts in each data path. For estimated execution plans, these are based on
estimates from the available statistics. For actual execution plans, both the estimated and actual row
counts are shown.
Execution contexts are cached for reuse in a very similar way to the caching that occurs with execution
plans. When a user executes a plan, SQL Server retrieves an execution context from the cache if there is
one available, even if it was generated for a different user.
To maximize performance and minimize memory requirements, execution contexts are not fully
completed when they are created. Branches of the code are “fleshed out” when the code needs to move
to the branch. This means that if a procedure includes a set of procedural logic statements (like the IF
statement), the execution context that is retrieved from the cache may have gone in a different logical
direction and not yet have all the details that are required, even if it was a different user who executed the
procedure.
For caching reuse, it is useful to avoid too much procedural logic in stored procedures. You should favor
set-based logic instead.
Plan Portability
SQL Server provided a graphical rendering of execution plans to make reading text-based plans easier.
One challenge with this, however, was that it was very difficult to send a copy of a plan to another user for
review. XML plans can be saved as an .sqlplan file type and are entirely portable between systems. You
can render graphical plans from XML plans, including plans that have been received from other users.
Note that graphical plans include only a subset of the information that is available in an XML plan.
Although it is not easy to read XML plans directly, you can obtain further information by reading the
contents of the XML plan.
XML plans are also ideal for programmatic access for users who are creating tools and utilities because
XML is relatively easy to consume programmatically in an application.
SET Statements
The Transact-SQL SET statements enable you to
view execution plan information in text format, or
to capture it in XML format so that you can use
other applications to view it or process it. The
output from these statements is displayed on the
Messages tab in the Results pane in SQL Server
Management Studio.
SET STATISTICS IO
The SET STATISTICS IO statement reports information about the disk activity that a statement generates, including the following values:
Scan count. The scan count value represents the number of seeks or scans started against a table or index. A value greater than 1 indicates that a table was accessed repeatedly while executing
the query, for example, when a nested loops join is used that requires an index read for each value that
it is attempting to match.
Physical reads. The physical reads value represents the number of pages that have been read from
the disk. If the required data is already in the cache, this will be 0. If the data is not in the cache, SQL
Server accesses the pages from the disk, places them in the data cache, and then reads them from
there.
Logical reads. The logical reads value represents the number of pages that have been read from the
cache. The fewer reads that a query requires, the faster it will execute.
Read-ahead reads. The read-ahead reads value represents the number of pages that SQL Server read
from the disk into the cache to execute the query. The read-ahead mechanism anticipates the data
pages and index pages that might be needed to execute the query, and accesses them before they are
required for processing, which improves performance.
Large object (LOB) logical reads, LOB physical reads, and LOB read-ahead reads. These values
indicate the number of logical reads, physical reads, and read-ahead reads that were performed to
access LOB data.
The code example below includes the SET STATISTICS IO ON option in query execution:
SET STATISTICS IO
SET STATISTICS IO ON;
SELECT MONTH(s.OrderDate) AS OrderMonth, p.ProductName, SUM(s.SalesAmount) AS Revenue
FROM SalesOrder AS s
JOIN Product AS p ON s.ProductCode = p.ProductCode
WHERE YEAR(s.OrderDate) = YEAR(getdate())
GROUP BY MONTH(s.OrderDate), p.ProductName
ORDER BY MONTH(s.OrderDate), p.ProductName
SET STATISTICS IO OFF;
SET SHOWPLAN_XML
The SET SHOWPLAN_XML command displays the execution plan in XML format, which enables you to
use the output in other applications. SET SHOWPLAN_XML does not execute the Transact-SQL
statement.
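A minimal usage sketch follows (the table and column names reuse the earlier illustrative SalesOrder example and are not from a real database). SET SHOWPLAN_XML must be the only statement in its batch, so each statement is separated by GO:

```sql
SET SHOWPLAN_XML ON;
GO
-- This statement is not executed; its estimated plan is returned as an XML document.
SELECT OrderDate, SalesAmount
FROM SalesOrder
WHERE ProductCode = 'BK-1001';
GO
SET SHOWPLAN_XML OFF;
GO
```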
Note: SET SHOWPLAN_TEXT, SET SHOWPLAN_ALL, and SET STATISTICS PROFILE will
be deprecated in a future version of SQL Server, so you should avoid using them. Instead of SET
SHOWPLAN_TEXT and SET SHOWPLAN_ALL, you should use SET SHOWPLAN_XML. Instead
of SET STATISTICS PROFILE, you should use SET STATISTICS XML.
Reference Links: For more information about the SET commands, see the Displaying
Execution Plans by Using the Showplan SET Options topic in the Microsoft Developer Network
(MSDN) library.
Demonstration Steps
Use execution plans
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Run D:\Demofiles\Mod05\Setup.cmd as an administrator to revert any changes.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
9. Follow the instructions contained within the comments of the script file.
Lesson 2
Common Execution Plan Elements
Now that you have learned about the role and format of execution plans, it is important to learn to
interpret them. Execution plans can contain many different types of elements, but certain elements
appear regularly. In this lesson, you will learn to interpret execution plans and explore their most
common elements.
Lesson Objectives
After completing this lesson, you will be able to:
Describe table scans, clustered index scans, and clustered index seeks.
Describe aggregations.
Describe filter and sort operations.
If a query's logic is related to the clustering key for the table, SQL Server may be able to use the index that
supports it to quickly locate the row or rows required. For example, if a Customer table is clustered on a
CustomerID column, consider how the following query would be executed.
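The query that this passage refers to is not reproduced in this copy. The following sketch, which assumes a dbo.Customer table clustered on a CustomerID column, is representative of a query that would produce a clustered index seek:

```sql
-- The predicate on the clustering key lets SQL Server seek directly to the row.
SELECT CustomerID, CustomerName
FROM dbo.Customer
WHERE CustomerID = 12345;
```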
SQL Server does not need to read the entire table and can use the index to quickly locate the correct
customer. This is referred to as a clustered index seek. By comparison, if the WHERE clause had been on
another nonindexed column, a table scan would have occurred.
In some earlier documentation, a Key Lookup was also referred to as a Bookmark Lookup. The Key
Lookup operator was introduced in SQL Server 2005 Service Pack 2; in earlier builds of SQL Server 2005,
the lookup was shown as a Clustered Index Seek operator that had a LOOKUP keyword associated
with it.
In the physical library analogy, a lookup is similar to reading through an author index and for each book
that is found in the index, going to collect it from the bookcases.
Lookups are often expensive operations because they need to be executed once for every row of the
upper input source. Note that in the execution plan shown, more than half of the cost of the query is
accounted for by the Key Lookup operator. In the next module, you will see options for minimizing this
cost in some situations. The Nested Loops operator is the preferred choice whenever the number of rows
in the upper input source is small when compared with the number of rows in the lower input source.
Merge Joins
The answer depends upon the order of the sheets. If the customer sheets were in customer ID order and
the customer order sheets were also in customer ID order, merging the two piles would be easy. The
process involved is similar to what occurs when you use a Merge Join operator. You can only use this
operator when the inputs are already in the same order. One option to consider would be to presort the
two piles.
You can use the Merge Join operator to implement a variety of join types such as left outer joins, left
semi joins, left anti semi joins, right outer joins, right semi joins, right anti semi joins, and unions.
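As a sketch (the table names are assumptions), a join between two inputs that are both ordered on the join column can use a Merge Join; the OPTION (MERGE JOIN) hint forces the operator purely for illustration:

```sql
SELECT c.CustomerID, o.OrderDate
FROM dbo.Customer AS c
JOIN dbo.CustomerOrder AS o
    ON c.CustomerID = o.CustomerID  -- both inputs can be delivered in CustomerID order
OPTION (MERGE JOIN);                -- forces the Merge Join operator for demonstration
```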
Hash Matches
Now imagine how you would merge the piles of customers and customer orders if the customers were in
customer ID order, but the customer orders were ordered by customer order number. The same problem
would occur if the customer sheets were in postal code order. These situations are similar to the problem
that Hash Match operations encounter. There is no easy way to merge the piles. One option would be to
presort the data and then use a Merge Join operation, but a Hash Match operation is often more
efficient in this case.
Hash Match operations use a relatively “brute force” method of joining. One input is broken into a set of
“hash buckets” based on an algorithm. The other input is processed based on the same algorithm. In the
analogy with the piles of paper, the algorithm could be to obtain the first digit of the customer ID. Using
this algorithm, 10 buckets would be created. Now you can calculate the hash value for one row, and look
in the bucket that contains matching rows from the other table. The bucket will contain a relatively small
number of rows, and can be searched without having to do an entire table scan. If a match is found, the
rows are joined and returned. If no match is found, the input row is discarded and the next one is
examined.
Although it may not always be possible to avoid Hash Match operations in query plans, their presence is
often an indication of a lack of appropriate indexing on the underlying tables. In data warehouses, Hash
Match joins are often the most common form of join due to minimal indexing.
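The following sketch reuses the earlier illustrative SalesOrder and Product tables; the OPTION (HASH JOIN) hint forces the operator purely so that you can observe it in the plan:

```sql
SELECT p.ProductName, s.SalesAmount
FROM SalesOrder AS s
JOIN Product AS p
    ON s.ProductCode = p.ProductCode
OPTION (HASH JOIN);  -- forces the Hash Match join operator for demonstration
```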
Aggregations
There are two types of Aggregate operator: Stream
Aggregate and Hash Match Aggregate. Stream
Aggregate operations are very efficient.
One option would be to sort all of the customer orders by customer ID first, and then to count all of the
customer orders for each customer ID.
Another option is to process the input by using a hashing algorithm like the one that is used for Hash
Match operations. This is what SQL Server does when it uses a Hash Match Aggregate operation. The
presence of these operations in a query plan is often (but not always) an indication of a lack of
appropriate indexing on the underlying table.
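A minimal sketch of an aggregation that can take either form (table name reused from the earlier illustrative example):

```sql
-- If an index delivers rows already ordered by ProductCode, a Stream Aggregate
-- can count each group as the rows stream past; otherwise, a Hash Match
-- Aggregate buckets the rows first.
SELECT ProductCode, COUNT(*) AS OrderCount
FROM SalesOrder
GROUP BY ProductCode;
```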
Sort operations are often used to implement ORDER BY clauses in queries, but they have other uses. For
example, SQL Server can use a Sort operator to sort rows before they are passed to other operations,
such as Merge Join operations, or to perform DISTINCT or UNION operations.
Sorting data rows can be an expensive operation. You should avoid unnecessary ORDER BY operations.
Not all data needs to be put in a specific order. However, if a sorted result is required, you should always
use an ORDER BY clause. Do not depend upon a sorted outcome from an execution plan always staying in
that same order.
Demonstration Steps
Run queries that demonstrate the most common execution plan elements
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. If you have not completed the previous demonstrations in this module, run
D:\Demofiles\Mod05\Setup.cmd as an administrator to revert any changes.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
8. Follow the instructions contained within the comments of the script file.
Lesson 3
Working with Execution Plans
Now that you understand the importance of execution plans and are familiar with the common elements
that they contain, you need to consider the different ways in which plans can be captured. In this lesson,
you will see various ways to capture plans and explore the criteria by which SQL Server decides whether
to reuse them. SQL Server exposes several dynamic management views (DMVs) that you can use to
explore query plan reuse. You will also see how execution plans are used.
Lesson Objectives
After completing this lesson, you will be able to:
Explain how SQL Server decides whether to reuse existing plans when it reexecutes queries.
Use DMVs that are related to execution plans.
SQL Server Profiler has a Performance events > Showplan XML event that you can add to a trace; the
trace will then include the actual execution plans. You need to take care when you use this option
because, without appropriate filtering, you could quickly generate a huge trace output and degrade the
overall performance of the system.
SQL Server Profiler is still very commonly used, but over time, it will be replaced by the Extended Events
profiling sessions that are integrated into SQL Server Management Studio in SQL Server 2014. The
Extended Events profiling capability is more extensive than that provided by SQL Server Profiler. However,
you should continue to use SQL Server Profiler for capturing traces of SQL Server Analysis Services activity.
Dynamic management views provide information about recent expensive queries and missing indexes
that SQL Server detected when it created the plan. Activity Monitor in SQL Server can display the results of
querying these DMVs.
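As a sketch of this kind of DMV query (the DMVs and columns are documented; the TOP clause and ordering are arbitrary choices), the following returns the ten most CPU-expensive cached queries together with their plans:

```sql
SELECT TOP (10)
       qs.total_worker_time,          -- cumulative CPU time, in microseconds
       qs.execution_count,
       st.text AS query_text,
       qp.query_plan                  -- the cached plan as XML
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_worker_time DESC;
```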
The Data Collector in SQL Server collects information from the DMVs, uploads it to a central database,
and provides a series of reports based on the data. Unlike Activity Monitor, which shows recent expensive
queries, Data Collector can show historical entries. This can be very useful when a user asks about a
problem that occurred last Tuesday morning rather than at the time when the problem is occurring.
Reexecuting Queries
SQL Server attempts to reuse execution plans where
possible. Although this is often desirable, reusing
existing plans can be counterproductive to
performance.
Even for cached plans, SQL Server may eventually decide to evict them from the cache and recompile the
queries. The two main reasons for this are:
Correctness (for example, the schema of a referenced object has changed, so the existing plan may no
longer be valid).
Optimality (data has been sufficiently modified to require a new plan to be considered).
SQL Server assigns a cost to each plan that is cached, to estimate its “value.” The value is a measure of
how expensive the execution plan was to generate, and SQL Server gradually reduces it over time. When
memory resources become tight, SQL Server needs to decide which plans are the most useful to keep,
and the decision to evict a plan from memory is based on this reduced cost value.
Options are available to force compilation behavior of code, but they should be used sparingly and only
where necessary.
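For example, the OPTION (RECOMPILE) query hint discards the plan for a single statement after use, and the WITH RECOMPILE option on a stored procedure forces a fresh plan on every execution. A minimal sketch (the table name reuses the earlier illustrative example):

```sql
-- Compile a fresh plan for this statement every time it runs (use sparingly).
SELECT OrderDate, SalesAmount
FROM SalesOrder
WHERE ProductCode = 'BK-1001'
OPTION (RECOMPILE);
```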
Demonstration Steps
Viewing cached execution plans
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
8. Follow the instructions contained within the comments of the script file.
Lesson 4
Designing Effective Nonclustered Indexes
Before you start to implement nonclustered indexes, you need to design them appropriately. In this
lesson, you will learn how to find information about the indexes that have been created and how to create
filtered indexes.
Lesson Objectives
After completing this lesson, you will be able to:
Each index has a property page that details the structure of the index and its operational, usage, and
physical-layout characteristics.
SQL Server Management Studio also includes a set of prebuilt reports that show the state of a database.
Many of these reports relate to index structure and usage.
The sp_helpindex system stored procedure returns details of the indexes that have been created on a
specified table.
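A usage sketch (the table name is illustrative):

```sql
EXEC sp_helpindex 'dbo.Customer';
```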
SQL Server provides a series of catalog views that provide information about indexes. Some of the more
useful views are shown in the following table.
sys.index_columns Column ID, position within the index, type (key or nonkey),
and sort order (ASC or DESC)
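A sketch of querying these catalog views for the indexes on one table (the table name is an assumption):

```sql
SELECT i.name AS index_name,
       ic.index_column_id,                                  -- position within the index
       COL_NAME(ic.object_id, ic.column_id) AS column_name,
       ic.is_descending_key,                                -- sort order
       ic.is_included_column                                -- key column or included column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON i.object_id = ic.object_id
   AND i.index_id = ic.index_id
WHERE i.object_id = OBJECT_ID('dbo.Customer');
```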
SQL Server provides a series of dynamic management objects that contain useful information about the
structure and usage of indexes. Some of the most useful views and functions are shown in the following
table.
View Notes
System Functions
SQL Server provides a set of functions that provide information about the structure of indexes. Some of
the more useful functions are shown in the following table.
Function Notes
INDEXKEY_PROPERTY Index column position within the index and column sort
order (ASC or DESC)
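A usage sketch of INDEXKEY_PROPERTY (the table name and index ID are illustrative):

```sql
-- Properties of the first key column of index ID 1 on the table.
SELECT INDEXKEY_PROPERTY(OBJECT_ID('dbo.Customer'), 1, 1, 'ColumnId')     AS column_id,
       INDEXKEY_PROPERTY(OBJECT_ID('dbo.Customer'), 1, 1, 'IsDescending') AS is_descending;
```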
Filtered Indexes
Unless you specify otherwise, when you create a
nonclustered index on a table, the index will include
every row in the table. Although indexing all of the
rows in a table is frequently desirable, there are
scenarios when it might not be:
Tables that have many NULL values. When a column includes many NULL values, a nonclustered
index that is built on that column can be inefficient.
You can use filtered indexes to create smaller, more focused indexes that deliver greater efficiency and
better performance.
Filtered indexes are nonclustered indexes that you define by including a WHERE clause in the CREATE
INDEX statement. The WHERE clause filter limits the rows that the index will include, which has several
benefits, including:
The index is more efficient to manage, for example, rebuild and reindex operations will be faster.
The index will deliver faster response times because small indexes take less time to read than large
ones.
The size of the index statistics is correspondingly smaller, so updating statistics for a filtered index is
faster.
For example, most queries against the Employee table in the HumanResources database specify the data
value New York for the City column in the WHERE clause. By creating an index that includes only rows
that have New York in the City column, you can create a more efficient index that offers better
performance than an unfiltered index. When you are planning your indexing strategy, you should
consider the trade-off between indexes that have a broad coverage and indexes that are focused, but
might deliver better performance. Focused indexes are useful when you have a small number of high-
priority queries as the focus of your strategy. Broader indexes are useful when you have many queries of
equal priority.
The code example below creates a filtered index that includes a WHERE clause to limit the number of
rows that the index contains:
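The example itself is not reproduced in this copy; the following is a representative sketch that matches the Employee/City scenario described above (the object names are assumptions):

```sql
-- Only rows with City = 'New York' are included in the index.
CREATE NONCLUSTERED INDEX IX_Employee_City
ON dbo.Employee (City)
WHERE City = 'New York';
```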
You can use an indexed view to achieve a similar result to that achieved by using a filtered index; you just
need to specify a filter in the indexed view definition to exclude the unwanted rows. However, there are
some important differences between the two solutions. When you are deciding between using an indexed
view or a filtered index, consider the following points:
You can use indexed views to create indexes that are based on multiple tables, but you can only
create a filtered index on a single table.
Filtered indexes only support simple comparison operators in the WHERE clause of the index
definition; for example, you cannot use the LIKE operator to create a filtered index. If you need to
filter by using more complex logic, you can use an indexed view.
The query optimizer uses filtered indexes in more situations than indexed views, so by using a filtered
index, you are more likely to improve performance across more queries.
You can perform index rebuild operations while a filtered index is online, but indexed views do not
support online rebuilds.
Updates of filtered indexes generally require fewer CPU resources than updates to indexed views,
which helps to minimize maintenance costs.
Filtered indexes do not need to be unique indexes, but indexed views do, because the first index that is
built on a view must be a unique clustered index.
Demonstration Steps
Viewing index information
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. If you have not completed the previous demonstrations in this module, run
D:\Demofiles\Mod05\Setup.cmd as an administrator to revert any changes.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod05\Demo05.ssmssln, and then click Open.
8. Follow the instructions contained within the comments of the script file.
Lesson 5
Performance Monitoring
Many factors can affect database performance. Correct indexing, hardware, network performance,
application design, logical and physical database design, data changes, and operating system
configuration are just a few of the things that could have major effects on database performance for the
user. This lesson describes the options for monitoring performance, and explains how you can create a
baseline to aid troubleshooting.
Lesson Objectives
After completing this lesson, you will be able to:
There can be no definitive rules about what a performance monitoring and tuning strategy should
include, and you should take each system on a case-by-case basis. Tune the system to meet your goals,
remove any bottlenecks that prevent you from meeting your goals, benchmark the system to provide a
performance baseline, and then monitor your system to ensure that you are meeting or exceeding your
baseline.
Note: SQL Server Profiler is being deprecated. It will be removed from a future version of
SQL Server and replaced by Extended Events Profiler and SQL Server Distributed Replay.
However, SQL Server Profiler is still the current recommended tool for capturing and replaying
traces.
Activity Monitor. Activity Monitor is available on the SQL Server Management Studio toolbar.
Activity Monitor enables you to identify expensive queries, and to view data file I/O, resource wait
times, processes, percentage of processor time, the number of waiting tasks, database I/O, and batch
requests per second. Activity Monitor is useful for identifying performance problems after you have
used Performance Monitor to determine that it is SQL Server that is causing the problem, not a
different component of the system such as Windows or Microsoft SharePoint® Server.
DMVs. The sys.dm_db_index_usage_stats DMV returns a large amount of information about index
operations, how many times they were performed, and when they were last performed.
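A sketch of querying this DMV for the current database (the join to sys.indexes retrieves index names; the ordering is an arbitrary choice):

```sql
SELECT OBJECT_NAME(ius.object_id) AS table_name,
       i.name AS index_name,
       ius.user_seeks,
       ius.user_scans,
       ius.user_lookups,
       ius.user_updates,
       ius.last_user_seek
FROM sys.dm_db_index_usage_stats AS ius
JOIN sys.indexes AS i
    ON ius.object_id = i.object_id
   AND ius.index_id = i.index_id
WHERE ius.database_id = DB_ID()
ORDER BY ius.user_seeks + ius.user_scans + ius.user_lookups DESC;
```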
A baseline provides a sound basis for hardware planning because it enables you to spot trends and
create projections for future hardware requirements. When hardware budgets are limited, this
approach can help to ensure that you spend the budget in the most cost-effective way.
A baseline enables you to assess the impact of changes in database design or hardware. After you
make the changes, you can use the baseline to verify that you have achieved the desired
improvements; if there is no improvement, you can roll back the changes, and if there is, you can keep
them. After you have made changes to a server, you should create a new baseline that reflects the new
configuration.
When you are planning a performance baseline, you should aim to create samples that monitor system
resource usage over an extended period of time, and to include periods of low, normal, and high usage.
This will help you to gain a true picture of system performance, rather than just a snapshot of
performance at a single point in time. The longer you monitor, the more reliable the statistics will be;
however, you will need to balance this against the impact of monitoring on system resources, including
storage space and CPU utilization, so that monitoring itself does not become a factor that negatively
affects performance. To minimize the impact of monitoring, you should monitor your servers from a
remote workstation, and connect to them by using Performance Monitor. You can specify the server that
you want to monitor in the Add Counters dialog box. You should avoid using remote desktop
connections to connect to a server and then running Performance Monitor on that server because this
uses server resources.
After you create a baseline, you should periodically compare current server performance with the baseline
figures. You should investigate any values that are significantly above or below baseline figures. You
should investigate unexpected improvement in addition to unexpected performance degradation. For
example, if no customers can access your website because of a denial-of-service attack, you might find
that the database server is running unusually quickly. This improvement is actually caused by a problem
elsewhere.
You can create a baseline by monitoring the following counters, and recording the information in a log:
o Memory:Available Mbytes. This counter captures the amount of available memory on the
server in megabytes. If there is not enough free memory, the operating system will use the
paging file, which impairs performance. There is no ideal figure for this counter that will suit all
servers, but you should ensure that there is enough free memory to handle not just the SQL
Server workloads, but any other workloads that run on the server, such as backup jobs and
administrative connections.
o Paging File:% Usage. This counter captures page file usage, and ideally should be a very low
value. A high value indicates that the server has insufficient memory. You can also use the
Memory:Pages/sec counter to verify this.
o SQL Server:Buffer Manager:Buffer cache hit ratio. This counter indicates the percentage of
pages that are read from the data cache without having to read from disk. Ideally, this figure
should be over 90 percent; if it is lower than this, the impact of disk I/O can become a problem.
o SQL Server:Buffer Manager:Page life expectancy. This counter indicates in seconds how long
pages that are read into memory will remain in the cache before being removed to enable the
caching of other pages. Higher values indicate that there is sufficient memory available; if the
value falls, this could be because the workload has increased and you need to add more memory.
Alternatively, it could indicate that poorly written queries are using table or index scans, which
bring the entire table or index into memory, forcing other items to be removed.
o SQL Server:Memory Manager:Memory Grants Pending. This counter indicates the number of
queries that are currently waiting to be allocated memory so that they can execute. The ideal
value for this counter is 0. A value higher than this is a strong indication that the server has
insufficient memory.
o PhysicalDisk: Avg. Disk Queue Length. A value greater than 2 for an individual disk often
indicates a potential bottleneck, particularly if you are also experiencing high disk latency.
o Processor:% Privileged Time. This counter indicates the percentage of total time that a CPU or
CPU core spends executing kernel commands, which includes SQL Server disk I/O requests. You
can use it to help identify inefficient and over-utilized disk subsystems.
o The counters that are described above measure all disk activity, regardless of its source. To
identify disk I/O that results specifically from SQL Server activity, you can use the following
counters:
SQL Server:Buffer Manager: Page reads/sec
SQL Server:Buffer Manager: Page writes/sec
SQL Server:Buffer Manager: Checkpoint pages/sec
SQL Server:Buffer Manager: Lazy writes/sec
Counters for assessing CPUs:
o Processor:% Processor Time. This counter indicates the percentage of time that a processor
spends processing workloads (sometimes referred to as executing non-idle threads). You can use
this counter to monitor individual CPUs and CPU cores or to monitor the total for all CPUs and
cores. If the value of this counter is consistently greater than 80 percent, it may indicate that the
CPU or CPUs represent a bottleneck in the system. On the other hand, a value of 20 percent or
less indicates spare capacity, which you could use to consolidate other databases or instances.
o System:Processor Queue Length. This counter indicates the number of threads that are waiting
for CPUs to become available so that they can be processed. On a single processor system, a
value that is consistently greater than five can indicate that the CPU or CPUs represent a
bottleneck in the system. On multiprocessor systems, you should divide the queue length by the
number of processors to obtain the relevant value.
Counters for assessing network performance:
o Network Interface:Bytes Total/sec. This counter captures the total number of bytes that are
sent and received over a network connection for each second.
o Network Interface:Current Bandwidth. This counter records the actual capacity (as opposed to
the rated capacity) of a network interface card.
You can calculate network utilization for a specific network adapter by multiplying Bytes Total/sec by 8
(to convert bytes to bits), dividing by Current Bandwidth, and multiplying the result by 100 to express it
as a percentage.
o IPv4:Datagrams/sec and IPv6:Datagrams/sec. You can use these counters to capture the
number of IP datagrams that are sent and received over a defined period of time, and use this as
a benchmark when you are testing network performance.
In addition to the counters that are described above, SQL Server includes a range of dedicated
performance objects and counters that you can use to create a baseline and to troubleshoot, including
the SQL Server:General Statistics and SQL Server:SQL Statistics performance objects. These objects
include a range of counters that you can use to create a baseline and to troubleshoot CPU-related
performance issues:
SQL Server:General Statistics:User Connections. You can use this counter to establish the number
of user connections to a server, and then monitor this over time. This can be used to corroborate the
data from other counters. For example, if you identify a CPU issue that is getting gradually worse, you
can check this against the number of user connections over the same time period to see if there is a
correlation.
o SQL Server:SQL Statistics:SQL Compilations/sec and SQL Server:SQL Statistics:SQL Re-
Compilations/sec. You can use these counters to track the number of times SQL Server compiles
and recompiles execution plans. Compiling an execution plan can be resource-intensive, so you
typically want to see a small number of compilations and recompilations. You can compare the
SQL Server:SQL Statistics:SQL Compilations/sec counter against the SQL Server:SQL
Statistics:Batch Requests/sec counter to see how many of the batches that are submitted to the
server require a compilation. The number of recompilations should be significantly lower than the
number of compilations, ideally about 10 percent. If this figure is significantly higher, you should
investigate the cause of the recompilations.
One of the company developers has provided you with a list of the most important queries that the new
marketing management system will execute. Depending upon how much time you have available, you
need to determine the best column orders for indexes to support each query.
Objectives
After completing this lab, you will have:
Password: Pa$$w0rd
2. View Statistics
3. Review the Results
4. Create Statistics
7. Answer Questions
2. Use a full scan of the data when you are creating the statistics.
Task 8: Execute an SQL Command and Check the Accuracy of Some Statistics
1. Execute the following command to check the accuracy of the statistics that have been generated.
Query 2
Results: After this exercise, you will have assessed selectivity on various queries.
Results: After completing this exercise, you will have created a covering index.
Question: Can two different queries end up with the same execution plan?
Review Question(s)
Question: What is the difference between a graphical execution plan and an XML execution
plan?
Question: Why might a Transact-SQL DELETE statement have a complex execution plan?
Module 6
In-Memory Database Capabilities
Contents:
Module Overview 6-1
Module Overview
The capacity of physical memory has grown substantially in recent years, while the cost of memory
modules has dropped. As a result, modern servers generally have much higher memory specifications
than servers in the past. Microsoft® SQL Server® 2014 data management software includes new and
enhanced features that take advantage of the increasing amount of memory in modern servers to
improve I/O performance. This module explores some of these features and explains how to use them to
maximize the performance and scalability of your database applications.
Objectives
After completing this module, you will be able to:
Use the buffer pool extension to improve performance for read-heavy online transaction processing
(OLTP) workloads.
Lesson 1
The Buffer Pool Extension
SQL Server uses a buffer pool of memory to cache data pages, reducing I/O demand and improving
overall performance. As database workloads intensify over time, you can add more memory to maintain
performance, but this solution is not always practical. Adding storage is often easier than adding memory,
and SQL Server 2014 introduces the buffer pool extension to enable you to use fast storage devices for
buffer pool pages.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the key features and purpose of the buffer pool extension.
Identify scenarios where the buffer pool extension can improve performance.
Performance of OLTP applications that perform a large number of read operations can improve
significantly.
SSD devices are often less expensive per megabyte than physical memory, making this approach a
cost-effective way to improve performance in I/O-bound databases.
Enabling the buffer pool extension is straightforward, and doing so requires no changes to existing
applications.
Note: The buffer pool extension is only available in 64-bit installations of SQL Server 2014
Enterprise.
Scenarios where the buffer pool extension is unlikely to significantly improve performance include:
To disable the buffer pool extension, use the ALTER SERVER CONFIGURATION statement with the SET
BUFFER POOL EXTENSION OFF clause.
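The following statements illustrate enabling and then disabling the buffer pool extension. The file path and size shown are illustrative values for a hypothetical SSD volume, not requirements:

```sql
-- Enable the buffer pool extension (file path and size are illustrative).
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'E:\SSDCACHE\ExtensionFile.BPE', SIZE = 10 GB);

-- Disable the buffer pool extension.
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION OFF;
```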
To resize or relocate the buffer pool extension file, you must disable the buffer pool extension and then
reenable it with the required configuration. When you disable the buffer pool extension, SQL Server will
have less buffer memory available, which may cause an immediate increase in memory pressure and I/O
and result in performance degradation. You should therefore plan reconfiguration of the buffer pool
extension carefully to minimize disruption to application users.
You can view the status of the buffer pool extension by querying the
sys.dm_os_buffer_pool_extension_configuration dynamic management view (DMV), and you can monitor
its usage by querying the sys.dm_os_buffer_descriptors DMV.
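The following queries sketch how you might use these DMVs to check the configuration and usage of the buffer pool extension:

```sql
-- Check whether the buffer pool extension is enabled, and where its file is located.
SELECT path, file_id, state_description, current_size_in_kb
FROM sys.dm_os_buffer_pool_extension_configuration;

-- Count the pages currently held in the buffer pool extension file.
SELECT COUNT(*) AS extension_pages
FROM sys.dm_os_buffer_descriptors
WHERE is_in_bpool_extension = 1;
```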
Demonstration Steps
Configure the buffer pool extension
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod06\Demo06.ssmssln, and then click Open.
9. Follow the instructions contained within the comments of the script file.
Lesson 2
Columnstore Indexes
SQL Server 2012 introduced significant new indexing functionality that can dramatically improve query
response times. This functionality, known as columnstore indexes, has been significantly enhanced
in SQL Server 2014.
Lesson Objectives
After completing this lesson, you will be able to:
Storage. Data is stored in a compressed columnar data format (stored by column) instead of rowstore
format (stored by row). Compression ratios of up to seven times those of row storage are achievable
with a columnstore index.
Batch mode execution. Data is processed in batches (of 1,000-row blocks) instead of row by row.
Depending on filtering and other factors, a query may also benefit from “segment elimination,” which
involves bypassing million-row chunks (segments) of data and further reducing I/O.
The size of the tables. Consider using columnstore indexes for very large fact or
dimension tables that have millions of rows. For smaller tables, columnstore indexes might not
provide a major performance benefit.
Data compression. Use columnstore indexes on tables that contain data that compresses well, such as
character or numeric columns with frequently repeated values.
The types of queries. Columnstore indexes deliver the best results with certain types of queries, such
as aggregate queries that join two tables and simple aggregate queries on a single table.
If you are unsure whether a columnstore index is suitable, you can create one and test the impact on your
query workload.
Nonclustered columnstore indexes are read-only; you cannot perform INSERT, UPDATE, DELETE, or
MERGE operations on a table that has a nonclustered columnstore index. To update the data in a
table with a nonclustered columnstore index, you can drop the index, update the data, and then re-
create the index or use partition switching to add new data. Alternatively, you can use a clustered
columnstore index, which you can update.
You cannot use columnstore indexes in conjunction with the following SQL Server features:
Change tracking
FILESTREAM columns
Replication
Sparse columns
You cannot store in-memory OLTP data as a SQL Server data file in Microsoft Azure™. This is because
it requires FILESTREAM data, which is not currently supported in Microsoft Azure. It is possible to use
in-memory functionality in a Microsoft Azure virtual machine.
Reference Links: For a full list of the limitations of using columnstore indexes, see the
Columnstore Indexes topic in SQL Server Books Online.
A clustered columnstore index does not store the columns in a sorted order; instead, it optimizes
storage for compression and performance.
Note: Clustered columnstore indexes are new in SQL Server 2014. In SQL Server 2012, you
can only create nonclustered columnstore indexes.
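The following statement shows how a table might be converted to a clustered columnstore index. The table and index names are illustrative:

```sql
-- Create a clustered columnstore index, converting the table's storage
-- to columnstore format (table and index names are illustrative).
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;
```

Note that a clustered columnstore index has no column list; it always includes all of the columns in the table.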
Clustered columnstore indexes store the data in compressed columnstore segments. However, some data
is stored in a rowstore table that is referred to as the “deltastore,” which is an intermediary storage
location for use until the data can be compressed and moved into a columnstore segment. The following
rules are used to manage data modifications:
When you use an INSERT statement to insert a new row, it is stored in the deltastore until there are
enough rows to meet the minimum size for a rowgroup. This rowgroup is then compressed and
moved into the columnstore segments.
When you execute a DELETE statement, affected rows that are stored in the deltastore are physically
deleted. Affected data in the columnstore segments is marked as deleted and the physical storage is
only reclaimed when the index is rebuilt.
When you execute an UPDATE statement, affected rows in the deltastore are updated. Affected rows
in the columnstore are marked as deleted and a new row is inserted into the deltastore.
You cannot update a table that contains a nonclustered columnstore index; it is read-only. You can
handle updates in the following ways:
Periodically drop the index, perform the updates to the table, and then re-create the index.
This approach is the simplest way of handling updates, and fits in with the way in which many
organizations already perform data loads into their data warehouses. The disadvantage of this
approach is that creating a columnstore index can be time-consuming when the base table is very
large, and this can be problematic when the window for performing a data load is relatively short.
Use table partitioning. When you create an index on a partitioned table, SQL Server automatically
aligns the index with the table, meaning that the index is divided up in the same way as the table.
When you switch a partition out of the table, the aligned index partition switches out of the table,
too. You can use partition-switching to perform inserts, updates, merges, and deletes:
o To perform a bulk insert, partition the table, load new data into a staging table, build a
columnstore index on the staging table, and then use partition-switching to load the data into
the partitioned data warehouse table.
o For other types of updates, you can switch a partition out of the data warehouse table into a
staging table, drop or disable the columnstore index on the staging table, perform the updates,
re-create or rebuild the columnstore index on the staging table, and then switch the staging
table back into the data warehouse table.
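The bulk insert pattern described above can be sketched as follows. The table names, column list, and partition number are illustrative assumptions:

```sql
-- 1. Load new data into an empty staging table that matches the
--    structure of the partitioned fact table (names are illustrative).
-- 2. Build a matching columnstore index on the staging table.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales_Staging
ON dbo.FactSales_Staging (OrderDateKey, ProductKey, SalesAmount);

-- 3. Switch the staging table into the empty target partition.
ALTER TABLE dbo.FactSales_Staging
SWITCH TO dbo.FactSales PARTITION 5;
```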
Trickle Updating
The techniques that are described above enable administrators to update nonclustered columnstore index
tables in a typical data warehouse scenario, where access to static data is adequate. However, it is
sometimes necessary to provide users with access to live data and recent updates between data loads.
Although you cannot update a table that has a nonclustered columnstore index directly, you can provide
access to changing data by using a delta table. A delta table is a table that has the same columns as the
table that has the columnstore index, and contains changed data such as new rows. You can write queries
that use the UNION operator to combine the changed data in the delta table with the static data in the
table that has the columnstore index. This approach is sometimes called trickle updating. During the
periodic data warehouse data load, you can remove the data from the delta table and load it into the
columnstore table. This helps to keep the delta table relatively small, which is necessary to ensure that you
maintain the performance benefit that the columnstore index provides.
For queries that involve aggregating data from the columnstore table and the delta table, you can use a
common table expression to perform local-global aggregation. Local-global aggregation involves
separately aggregating the required values from the delta table and the columnstore table, and then
combining and aggregating the two result sets.
The following code example uses a common table expression to combine and aggregate data from a
columnstore index and data from a delta table:
Combining Data from a Columnstore Index with Data from a Delta Table
WITH AggregateSOD (ProductKey, UnitPrice)
AS (SELECT ProductKey, SUM(UnitPrice) FROM SalesOrderDetail
GROUP BY ProductKey
UNION ALL
SELECT ProductKey, SUM(UnitPrice) FROM SOD_Delta
GROUP BY ProductKey)
SELECT ProductKey, SUM(UnitPrice) AS Total FROM AggregateSOD
GROUP BY ProductKey
ORDER BY Total DESC;
To create a nonclustered columnstore index, use the CREATE NONCLUSTERED COLUMNSTORE INDEX
statement as shown in the following code example:
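A representative example is shown below; the table and column names are illustrative:

```sql
-- Create a nonclustered columnstore index on a fact table
-- (table and column names are illustrative).
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (OrderDateKey, ProductKey, SalesAmount, OrderQuantity);
```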
To create a columnstore index by using SQL Server Management Studio, in Object Explorer, expand the
relevant database, expand the Tables node, expand the table that you want to index, right-click the
Indexes node, click New Index, and then create the required kind of columnstore index.
Demonstration Steps
Create a columnstore index
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod06\Demo06.ssmssln, and then click Open.
8. Follow the instructions contained within the comments of the script file.
You are planning to optimize some database workloads by using the in-memory database capabilities of
SQL Server 2014. To test these capabilities, you will enable the buffer pool extension and create
columnstore indexes.
Objectives
After completing this lab, you will be able to:
Password: Pa$$w0rd
You want to extend the buffer pool onto the SSD device by using a 10-GB file named BufferCache.bpe.
o Size: 10 GB
Results: After completing this exercise, you should have enabled the buffer pool extension.
New data is loaded to the FactInternetSales table on a weekly basis by an ETL process that drops all
indexes, loads the new data, and re-creates all indexes. The FactProductInventory table is updated on an
ongoing basis.
You want to retain the existing indexes on the FactInternetSales table, but you do not need to retain any
existing indexes or keys on the FactProductInventory table.
The main tasks for this exercise are as follows:
2. Configure SQL Server Management Studio to include the actual execution plan, and then execute the
script in the AdventureWorksDW database. Review the execution plan, and note the indexes that
were used.
3. Based on the scenario for this exercise, decide whether a clustered or nonclustered index is
appropriate for the FactInternetSales table.
4. Create the required columnstore index, dropping existing indexes and keys if required and including
all columns in the FactInternetSales table. Then reexecute the query to verify that the new
columnstore index is used along with existing indexes.
2. Configure SQL Server Management Studio to include the actual execution plan, and then execute the
script in the AdventureWorksDW database. Review the execution plan, and note the indexes that
were used.
3. Based on the scenario for this exercise, decide whether a clustered or nonclustered index is
appropriate for the FactProductInventory table.
4. Create the required columnstore index, dropping existing indexes and keys if required. Then
reexecute the query to verify that the new columnstore index is used along with existing indexes.
Results: After completing this exercise, you should have created columnstore indexes.
Module 7
Designing and Implementing Views
Contents:
Module Overview 7-1
Module Overview
Views are a type of virtual table because the result set of a view is not usually saved in the database. Views
can simplify the design of database applications by abstracting the complexity of the underlying objects.
Views can also provide a layer of security. It is possible to give users permission to access a view without
permission to access the objects on which the view is constructed.
Objectives
After completing this module, you will be able to:
Lesson 1
Introduction to Views
In this lesson, you will gain an understanding of views and how they are used. You will also investigate the
system views that Microsoft® SQL Server® data management software supplies. A view is effectively a
named SELECT query. Unlike ordinary tables (base tables) in a relational database, a view is not part of the
physical schema; it is a dynamic, virtual table that is computed or collected from data in the database.
Effective use of views in database system design helps improve performance and manageability. In this
lesson, you will learn about views, the different types of views, and how to use them.
Lesson Objectives
After completing this lesson, you will be able to:
Describe views.
What Is a View?
You can think of a view as a named virtual table
that is defined through a SELECT statement. To an
application, a view behaves very similarly to a table.
Horizontal filtering is used to limit the rows that the view returns. For example, a Sales table might hold
details of the sales for the entire organization. Sales staff might only be permitted to view sales for their
own region or state. You could create a view that limits the rows that are returned to those for a particular
state or region.
Types of Views
There are four basic types of view: standard views,
system views (including dynamic management
views), indexed views, and partitioned views
(including distributed partitioned views).
Standard Views
Standard views combine data from one or more
base tables (or views) into a new virtual table. From
the base tables (or views), particular columns and
rows can be returned. Any computations, such as
joins or aggregations, are performed during query
execution for each query that references the view.
System Views
SQL Server provides system views, which show details of the system catalog or aspects of the state of SQL
Server. Dynamic management views (DMVs) were introduced in SQL Server 2005 and enhanced in every
version since then. DMVs provide dynamic information about the state of SQL Server, such as information
about the current sessions or the queries those sessions are executing.
Indexed Views
Indexed views materialize the view through the creation of a clustered index on the view. This is usually
done to improve query performance and will consume disk space. You can avoid complex joins or lengthy
aggregations at execution time by precalculating the results. Indexed views are discussed later in this
module.
Partitioned Views
Partitioned views unite data from multiple tables into a single view. One column in the view defines which
underlying table stores the data and CHECK constraints on the table enforce this. Distributed partitioned
views are formed when the tables that are being combined by a UNION operation are located on
separate instances of SQL Server.
Advantages of Views
Views are generally used to focus, simplify, and
customize the perception that each user has of the
tables in the database.
Many external applications cannot execute stored procedures or Transact-SQL code, but can select data
from tables or views. Creating a view enables you to isolate the data that is needed for these export
functions.
It is possible to use views to provide a backward-compatible interface to emulate a table that previously
existed, but whose schema has changed. For example, if a Customer table has been split into two tables,
CustomerGeneral and CustomerCredit, a Customer view could be created over the two new tables to
make it appear that the Customer table still exists. This would enable existing applications to query the
data without requiring the applications to be altered.
Reporting applications often need to execute complex queries to retrieve the report data. Rather than
embedding this logic in the reporting application, a view could be created to supply the data that the
reporting application requires in a much simpler format.
System Views
SQL Server provides information about its
configuration through a series of system views.
These views also provide metadata that describes
both the objects that you create in the database
and the objects that SQL Server provides. Catalog
views are primarily used to retrieve metadata about
tables and other objects in databases.
The International Organization for Standardization (ISO) has standards for Structured Query Language
(SQL). Each database engine vendor uses different methods of storing and accessing metadata, so a
standard mechanism was designed. This interface is provided by the views in the INFORMATION_SCHEMA
schema. The most commonly used INFORMATION_SCHEMA views are:
INFORMATION_SCHEMA.CHECK_CONSTRAINTS
INFORMATION_SCHEMA.COLUMNS
INFORMATION_SCHEMA.PARAMETERS
INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS
INFORMATION_SCHEMA.ROUTINE_COLUMNS
INFORMATION_SCHEMA.ROUTINES
INFORMATION_SCHEMA.TABLE_CONSTRAINTS
INFORMATION_SCHEMA.TABLE_PRIVILEGES
INFORMATION_SCHEMA.TABLES
INFORMATION_SCHEMA.VIEW_COLUMN_USAGE
INFORMATION_SCHEMA.VIEW_TABLE_USAGE
INFORMATION_SCHEMA.VIEWS
You can see the list of current DMVs by browsing the System Views node in Object Explorer in SQL
Server Management Studio. Similarly, you can see the list of current dynamic management functions
(DMFs) by browsing the System Functions node in Object Explorer.
You can use dynamic management objects (DMOs), the collective term for DMVs and DMFs, to view and
monitor the internal health and performance of a server, along with aspects of its configuration. They also
play an important role in troubleshooting problems (such as blocking issues) and in performance tuning.
Demonstration Steps
Query system views and dynamic management views
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
8. Follow the instructions contained within the comments of the script file.
Lesson 2
Creating and Managing Views
In the previous lesson, you learned about the role of views. In this lesson, you will learn how to create,
drop, and alter views. You will also learn how views and the objects on which they are based have owners
and how this can affect the use of views. You will see how to find information about existing views and
how to obfuscate the definitions of views.
Lesson Objectives
After completing this lesson, you will be able to:
Create views.
Drop views.
Alter views.
Creating Views
To create a view, the database owner must grant
you permission to do so. Creating a view involves
associating a name with a SELECT statement.
CREATE VIEW
Views can be based on other views instead of being
based on the underlying tables. Up to 32 levels of
nesting are permitted. You should take care when
nesting views deeply because it can become
difficult to understand the complexity of the
underlying code and to troubleshoot performance
problems that are related to the views.
Views have no natural output order. Queries that access the views should specify the order for the
returned rows. You can use the ORDER BY clause in a view, but only to satisfy the needs of a clause such
as the TOP clause.
If you specify the WITH SCHEMABINDING option, the underlying tables cannot be changed in a way
that would affect the view definition. If you later decide to index the view, you must use the WITH
SCHEMABINDING option.
Expressions that are returned as columns need to be aliased. You can define column aliases in the
SELECT statement within the view definition, or provide a column list after the name of the view.
CREATE VIEW
CREATE VIEW HumanResources.EmployeeList
(EmployeeID, FamilyName, GivenName)
AS
SELECT EmployeeID, LastName, FirstName
FROM HumanResources.Employee;
Dropping Views
Dropping a view removes the definition of the view
and all permissions that are associated with the
view.
DROP VIEW
Even if a view is re-created with exactly the same
name as a view that has been dropped, permissions
that were formerly associated with the view are
removed.
If a view was created by using the WITH SCHEMABINDING option, you must drop the view before you
can change the structure of the underlying tables.
The DROP VIEW statement supports the dropping of multiple views via a comma-delimited list, as shown
in the following code example:
DROP VIEW
DROP VIEW Sales.WASales, Sales.CTSales, Sales.CASales;
Altering Views
After a view is defined, you can modify its definition
without dropping and re-creating the view.
ALTER VIEW
The ALTER VIEW statement modifies a previously
created view. (This includes indexed views, which
are discussed in the next lesson.)
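For example, the HumanResources.EmployeeList view created earlier could be redefined to return an additional column. (The JobTitle column is assumed, for illustration, to exist in the underlying table.)

```sql
-- Redefine an existing view to add a column
-- (JobTitle is an assumed column, shown for illustration).
ALTER VIEW HumanResources.EmployeeList
(EmployeeID, FamilyName, GivenName, JobTitle)
AS
SELECT EmployeeID, LastName, FirstName, JobTitle
FROM HumanResources.Employee;
```

Unlike dropping and re-creating the view, ALTER VIEW retains the permissions that are associated with the view.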
Ownership Chaining
One of the key reasons for using views is to provide
a layer of security abstraction so that access is given
to views and not to the underlying table or tables.
For this mechanism to function correctly, an
unbroken ownership chain must exist.
For example, a user, John, has no access to a table that Nupur owns. If Nupur creates a view or stored
procedure that accesses the table and gives John permission to the view, John can then access the view
and through it, the data in the underlying table. However, if Nupur creates a view or stored procedure
that accesses a table that Tim owns and grants John access to the view or stored procedure, John would
not be able to use the view or stored procedure, even if Nupur has access to Tim's table, because of the
broken ownership chain. Two options are available to correct this situation:
In Transact-SQL, you can obtain the list of views in a database by querying the sys.views view.
In earlier versions of SQL Server, you could locate object definitions (including the definitions of
unencrypted views) by executing the sp_helptext system stored procedure.
The OBJECT_DEFINITION() function enables you to query the definition of an object in a relational
format. The output of the function is easier to consume in an application than the output of a system
stored procedure such as sp_helptext.
If you change the name of an object that a view references, you must modify the view so that its text
reflects the new name. Therefore, before renaming an object, display its dependencies to determine
whether the proposed change will affect any views.
You can find overall dependencies by querying the sys.sql_expression_dependencies view. You can find
column-level dependencies by querying the sys.dm_sql_referenced_entities view.
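The following queries sketch these techniques; the object names are illustrative:

```sql
-- Retrieve the definition of a view (the view name is illustrative).
SELECT OBJECT_DEFINITION(OBJECT_ID('HumanResources.EmployeeList')) AS ViewDefinition;

-- List all views in the current database.
SELECT name, create_date, modify_date
FROM sys.views;

-- Before renaming a table, find the objects that reference it.
SELECT OBJECT_SCHEMA_NAME(referencing_id) AS referencing_schema,
       OBJECT_NAME(referencing_id) AS referencing_object
FROM sys.sql_expression_dependencies
WHERE referenced_entity_name = 'Employee';
```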
Updatable Views
It is possible to update data in the base tables by
updating a view.
It is possible to modify a row in a view in such a way that the row no longer belongs to the view. For
example, a view might select rows where the State column contains the value WA. If you update a row
and set the State column to the value CA, the row seems to have vanished when the view is queried
again. To prevent this, you can specify the WITH CHECK OPTION clause when you define the view. This
option verifies during data modifications that any modified row will still be returned by the view.
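A minimal sketch of this behavior, assuming a hypothetical Sales.Customer table with a State column:

```sql
-- A view restricted to Washington customers. WITH CHECK OPTION prevents
-- modifications through the view that would move a row out of the view
-- (table and column names are illustrative).
CREATE VIEW Sales.WACustomers
AS
SELECT CustomerID, CustomerName, State
FROM Sales.Customer
WHERE State = 'WA'
WITH CHECK OPTION;

-- This update fails, because the modified row would no longer be
-- returned by the view:
-- UPDATE Sales.WACustomers SET State = 'CA' WHERE CustomerID = 1;
```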
Data that is modified in a base table via a view still needs to meet the restrictions on those columns (such
as nullability, constraints, and defaults) as if the base table was modified directly. This can be particularly
challenging if not all of the columns in the base table are present in the view. For example, an INSERT
operation on the view would fail if the base table upon which it was based required mandatory columns
that were not exposed in the view and did not have DEFAULT values.
WITH ENCRYPTION
The WITH ENCRYPTION clause provides limited
obfuscation of the definition of a view.
The encryption that is provided is not very strong. Many third-party tools can decrypt the source code,
so you should not depend on this option to protect critical intellectual property.
Demonstration Steps
Create, query, and drop views
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
7. Follow the instructions contained within the comments of the script file.
Lesson 3
Performance Considerations for Views
Now that you understand why views are important and know how to create them, it is important to
understand the potential performance impacts of using views.
In this lesson, you will see how views are incorporated directly into the execution plans of queries in which
they are used. You will see the effect and potential disadvantages of nesting views and see how it is
possible to improve performance in some situations.
Finally, you will see how it is possible to combine the data from multiple tables into a single view, even if
those tables are on different servers.
Lesson Objectives
After completing this lesson, you will be able to:
Standard views do not appear in execution plans for queries because SQL Server expands the view
definition into the query. Instead, the underlying objects that the view references appear in the
execution plan.
You should avoid using SELECT * in a view definition. For example, if you add a new column to the base
table, the view will not reflect the column until the view has been refreshed. You can correct this
situation by executing an appropriate ALTER VIEW statement or by calling the sp_refreshview system
stored procedure.
Partitioned Views
Partitioned views enable you to split the data in a
large table into smaller member tables. The data is
partitioned between the member tables based on
ranges of data values in one of the columns.
Data ranges for each member table in a partitioned
view are defined in a CHECK constraint that is
specified on the partitioning column. A UNION ALL
statement is used to combine selects of all of the
member tables into a single result set.
In a local partitioned view, all participating tables and the view reside on the same instance of SQL Server.
In most cases, you should use table partitioning instead of local partitioned views.
In a distributed partitioned view, at least one of the participating tables resides on a different (remote)
server. You can use distributed partitioned views to implement a federation of database servers.
Good planning and testing are crucial because major performance problems can arise if the design of the
partitioned views is not appropriate.
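A minimal local partitioned view sketch, assuming yearly sales tables; the names, columns, and ranges are illustrative:

```sql
-- Each member table constrains the partitioning column with a CHECK
-- constraint; the partitioning column is part of the primary key so
-- that the view remains updatable (names are illustrative).
CREATE TABLE dbo.Sales2013
(SalesID int NOT NULL,
 SalesYear int NOT NULL CHECK (SalesYear = 2013),
 Amount money NOT NULL,
 CONSTRAINT PK_Sales2013 PRIMARY KEY (SalesID, SalesYear));

CREATE TABLE dbo.Sales2014
(SalesID int NOT NULL,
 SalesYear int NOT NULL CHECK (SalesYear = 2014),
 Amount money NOT NULL,
 CONSTRAINT PK_Sales2014 PRIMARY KEY (SalesID, SalesYear));
GO

-- The partitioned view combines the member tables with UNION ALL.
CREATE VIEW dbo.AllSales
AS
SELECT SalesID, SalesYear, Amount FROM dbo.Sales2013
UNION ALL
SELECT SalesID, SalesYear, Amount FROM dbo.Sales2014;
```

The CHECK constraints enable the query optimizer to eliminate member tables that cannot contain qualifying rows when a query filters on the partitioning column.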
Demonstration Steps
Investigate how views can affect query performance
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
7. Follow the instructions contained within the comments of the script file.
8. Close SQL Server Management Studio and SQL Server Profiler without saving any changes.
You can only build indexes on views that are deterministic. That is, the views must always return the same
data unless the underlying table data is altered. For example, an indexed view could not contain a column
that returned the outcome of the SYSDATETIME() function.
A view must have been created with the WITH SCHEMABINDING option before you can create an index
on it. The WITH SCHEMABINDING option prevents changes to the schema of the underlying tables while
the view exists.
You can imagine an indexed view as a special type of table that has a clustered index. The differences are
that the schema of the table is not defined directly; it is defined by the SELECT statement in the view.
Also, you don't modify the table directly; you modify the data in the “real” tables that underpin the view.
When the data in the underlying tables is modified, SQL Server automatically updates the data in the
indexed view.
Indexed views have a negative impact on the performance of INSERT, DELETE, and UPDATE operations
on the underlying tables, but they can also have a positive impact on the performance of SELECT queries
on the view. They are most useful for data that is regularly selected, but much less frequently updated.
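A minimal indexed view sketch, assuming a hypothetical Sales.OrderDetail table with a non-nullable LineTotal column. Note that an indexed view that uses GROUP BY must include COUNT_BIG(*), and the first index on the view must be unique and clustered:

```sql
-- Create a schema-bound view over a base table
-- (table and column names are illustrative).
CREATE VIEW Sales.OrderTotals
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(LineTotal) AS TotalSales,
       COUNT_BIG(*) AS OrderCount   -- required for indexed views with GROUP BY
FROM Sales.OrderDetail
GROUP BY ProductID;
GO

-- Materialize the view by creating a unique clustered index on it.
CREATE UNIQUE CLUSTERED INDEX IX_OrderTotals
ON Sales.OrderTotals (ProductID);
```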
Details of organizational contacts are contained in several tables. The relationship management system
that the account management team is using needs to be able to gain access to these contacts. However,
the team needs a single view that contains all contacts. You need to design, implement, and test the
required view.
Objectives
After completing this lab, you will be able to:
Password: Pa$$w0rd
Supporting Documentation
View1: OnlineProducts
ViewColumn SourceColumn
ProductID ProductID
ProductName ProductName
ProductNumber ProductNumber
Size Size
UnitOfMeasure SizeUnitMeasureCode
Price ListPrice
Weight Weight
This view is based on the Marketing.Product table. Rows should only appear if the product has begun to
be sold and is still being sold. (Derive this from SellStartDate and SellEndDate.)
View2: AvailableModels
ViewColumn SourceColumn
ProductID ProductID
ProductName ProductName
ProductModelID ProductModelID
ProductModel ProductModel
This view is based on the Marketing.Product and Marketing.ProductModel tables. Rows should only
appear if the product has at least one model, has begun to be sold, and is still being sold. (Derive this
from SellStartDate and SellEndDate.)
Supporting Documentation
View3: Contacts
Question: What considerations are there for views that involve multiple tables?
Question: What is required for columns in views that are created from expressions?
Review Question(s)
Question: How does SQL Server store the view in the database?
Module 8
Designing and Implementing Stored Procedures
Contents:
Module Overview 8-1
Module Overview
Stored procedures enable you to create Transact-SQL logic that will be stored and executed at the server.
This logic might enforce business rules or data consistency. Stored procedures are also used to return sets
of rows based upon input parameters. You will see the potential advantages of the use of stored
procedures in this module along with guidelines on creating them.
Objectives
After completing this module, you will be able to:
Describe the role of stored procedures and the potential benefits of using them.
Work with stored procedures.
Lesson 1
Introduction to Stored Procedures
Microsoft® SQL Server® data management software provides many system stored procedures, and users
can create their own stored procedures, too. In this lesson, you will see the role of stored procedures and the potential
benefits of using them. System stored procedures provide a large amount of prebuilt functionality that
you can take advantage of when you are building applications. When you are designing stored
procedures, it is also important to realize that not all Transact-SQL statements are permitted within stored
procedures.
Lesson Objectives
After completing this lesson, you will be able to:
Identify statements that are not permitted within the body of a stored procedure declaration.
Alternatively, a stored procedure could be created at the server level to encapsulate all of the Transact-
SQL statements that are required. Stored procedures are given names and are called by name. The
application can then simply ask to execute the stored procedure each time it needs to use that same
functionality, rather than sending all of the statements that would otherwise be required.
Stored Procedures
Stored procedures are similar to procedures, methods, and functions in high-level languages. They can
have input and output parameters and a return value.
As a side effect of executing the stored procedure, rows of data can also be returned from the stored
procedure. In fact, multiple rowsets can be returned from a single stored procedure.
Stored procedures can be created in either Transact-SQL code or managed .NET code and are run by the
EXECUTE Transact-SQL statement. The creation of stored procedures in managed code will be discussed
in Module 12, Implementing Managed Code in SQL Server.
Security Boundary
Stored procedures can be part of a scheme that
helps to increase application security. They can be
treated as a security boundary. Users can be given
permission to execute a stored procedure without
being given permission to access the objects that
the stored procedure accesses.
For example, you can give a user (or set of users via
a role) permission to execute a stored procedure that updates a table without granting the user any
permissions directly on the table.
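A minimal sketch of this pattern, assuming a hypothetical user Pat and procedure Sales.UpdateCustomerPhone that updates a Sales.Customer table:

```sql
-- Pat can execute the procedure, but has no permissions on the table itself.
GRANT EXECUTE ON OBJECT::Sales.UpdateCustomerPhone TO Pat;

EXECUTE AS USER = 'Pat';
-- Succeeds via ownership chaining, even though Pat cannot UPDATE the table directly.
EXECUTE Sales.UpdateCustomerPhone @CustomerID = 1, @Phone = N'555-0100';
REVERT;
```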
Modular Programming
Code reuse is important. Stored procedures help by enabling logic to be created once and then enabling
the logic to be called many times and from many applications. Maintenance is easier because if a change
is needed, you only need to change the procedure, without needing to change the application code at all
in many cases. Changing a stored procedure could avoid the need to change the data access logic in a
group of applications.
Delayed Binding
It is possible to create a stored procedure that accesses (or references) a database object that does not yet
exist. This can be helpful in simplifying the order in which database objects need to be created. This is
referred to as deferred name resolution.
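For example, creating this procedure succeeds even though the referenced table does not yet exist (the names are illustrative):

```sql
CREATE PROCEDURE dbo.GetFutureData
AS
BEGIN
    -- The table name is resolved when the procedure is executed,
    -- not when it is created.
    SELECT SomeColumn FROM dbo.TableThatDoesNotExistYet;
END;
GO
-- Executing the procedure before the table exists raises an error;
-- creating it does not.
```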
Performance
Sending the name of a stored procedure to be executed rather than hundreds or thousands of lines of
executable Transact-SQL code can offer a significant reduction in the level of network traffic.
Before Transact-SQL code is executed, it needs to be compiled. When a stored procedure is compiled, in
many cases, SQL Server will attempt to retain (and reuse) the query plan that it previously generated, to
avoid the cost of the compilation of the code.
Although it is possible to reuse execution plans for ad-hoc Transact-SQL code that applications have
issued, SQL Server favors the reuse of stored procedure execution plans. Query plans for ad-hoc
Transact-SQL statements are among the first items to be removed from memory when the server comes
under memory pressure.
The rules that govern the reuse of query plans for ad-hoc Transact-SQL code are largely based on
matching the text of the queries exactly. Any difference at all (for example, white space or casing) will
cause a different query plan to be used, unless the difference is only a value that SQL Server decides must
be the equivalent of a parameter.
Stored procedures have a much higher chance of achieving query plan reuse.
Originally, there was a basic distinction in the naming of these stored procedures, where system stored
procedures had an sp_ prefix and system extended stored procedures had an xp_ prefix. Over time, the
need to maintain backward compatibility has caused a mixture of these prefixes to appear in both types
of procedure. Now, most system stored procedures have an sp_ prefix and most system extended stored
procedures have an xp_ prefix.
You should now use managed-code stored procedures instead of user-defined extended stored
procedures. The use of managed code to create stored procedures will be described in Module 12,
Implementing Managed Code in SQL Server.
Demonstration Steps
Execute system stored procedures
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
8. Follow the instructions contained within the comments of the script file.
Lesson 2
Working with Stored Procedures
Now that you understand why stored procedures are important, you need to understand the practicalities
that are involved in working with stored procedures.
Lesson Objectives
After completing this lesson, you will be able to:
Stored procedures are always created in the current database with the single exception of stored
procedures that are created with a number sign (#) prefix in their name. The # prefix on a name indicates
that it is a temporary object. As such, it would be created in the tempdb database and removed at the
end of the user's session.
Note: Although wrapping the body of a stored procedure with a BEGIN…END block is not
required, doing so is considered a good practice. Note also that you can terminate the execution
of a stored procedure by executing a RETURN statement within the stored procedure.
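A minimal sketch combining these points: a temporary procedure (# prefix, created in tempdb), a body wrapped in BEGIN…END, and an early exit via RETURN (the procedure name and logic are illustrative):

```sql
CREATE PROCEDURE #CheckValue
    @Value int
AS
BEGIN
    IF @Value IS NULL
        RETURN;                    -- terminates execution of the procedure
    SELECT @Value AS CheckedValue; -- reached only for non-NULL input
END;
```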
EXECUTE Statement
The EXECUTE statement is mostly used to execute
stored procedures, but can also be used to execute
other objects such as dynamic Structured Query
Language (SQL) statements.
As mentioned in the first lesson, you can execute
system stored procedures within the master
database without having to explicitly refer to that
database. That does not apply to other stored procedures.
If the stored procedure name starts with sp_ (not recommended for user stored procedures):
SQL Server first looks in the master database in the sys schema for the stored procedure.
SQL Server then looks in the default schema for the user who is executing the stored procedure.
SQL Server then looks in the dbo schema in the current database for the stored procedure.
Having SQL Server perform unnecessary steps to locate a stored procedure reduces performance for no
reason.
ALTER PROC
The main reason for using the ALTER PROC
statement is to retain any existing permissions on
the procedure while it is being changed. Users may
have been granted permission to execute the
procedure. If you drop the procedure and re-create
it, those permissions that had been granted to the
users would be removed when the procedure was
dropped.
Procedure Type
Note that the type of procedure cannot be changed. For example, a Transact-SQL procedure cannot be
changed to a managed-code procedure by using an ALTER PROCEDURE statement or vice versa.
Connection Settings
The connection settings, such as QUOTED_IDENTIFIER and ANSI_NULLS, that will be associated with the
modified stored procedure will be those taken from the session that makes the change, not from the
original stored procedure, so it is important to keep these consistent when you are making changes.
Complete Replacement
Note that when you alter a stored procedure, you need to resupply any options (such as the WITH
ENCRYPTION clause) that were supplied while creating the procedure. None of these options are retained
and they are replaced by whatever options are supplied in the ALTER PROC statement.
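For example, a procedure that was originally created WITH ENCRYPTION must restate that option when altered, or it silently becomes unencrypted (the procedure name and body are illustrative):

```sql
ALTER PROCEDURE dbo.GetProductNames
    @ProductIDLimit int
WITH ENCRYPTION  -- must be resupplied here, or the option is lost
AS
BEGIN
    SELECT ProductID, Name
    FROM Production.Product
    WHERE ProductID < @ProductIDLimit;
END;
```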
Permissions
Dropping a procedure requires either ALTER
permission on the schema that the procedure is
part of or CONTROL permission on the procedure itself.
sp_depends
Earlier versions of SQL Server used the sp_depends
system stored procedure to return details of
dependencies between objects. It was known to
have issues and to report incomplete information
due to issues with deferred name resolution.
sys.sql_expression_dependencies
Use of the sys.sql_expression_dependencies view
replaces the previous use of the sp_depends system stored procedure. The
sys.sql_expression_dependencies view provides a “one row per name” dependency on user-defined
entities in the current database. sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities
provide more targeted views over the data that the sys.sql_expression_dependencies view provides.
You will see an example of these dependency views being used in the next demonstration.
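As a sketch, a query such as the following lists the objects that a given procedure references (the procedure name is illustrative):

```sql
SELECT referenced_schema_name, referenced_entity_name
FROM sys.sql_expression_dependencies
WHERE referencing_id = OBJECT_ID(N'dbo.GetProductNames');
```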
There is no right or wrong way to do this in all situations, but you should decide on a method for naming
objects that your applications are to use and apply the method consistently. It is possible to enforce
naming conventions on most objects by using Policy-Based Management (first introduced in SQL Server
2008 and beyond the scope of this course) or DDL triggers (first introduced in SQL Server 2005 and also
beyond the scope of this course).
WITH ENCRYPTION
As mentioned in Module 7, it is important to
understand that although SQL Server provides the
WITH ENCRYPTION clause to obfuscate the
definition of your stored procedures, the encryption
is not particularly strong.
In fact, the encryption is known to be relatively easy to defeat because the encryption keys are stored in
known locations within the encrypted text. There are both direct methods and several third-party tools
that can reverse the encryption.
You need to keep original copies of the source code regardless of the fact that decryption might be
possible. Do not depend upon this.
Encrypted code is much harder to work with in terms of diagnosing and tuning performance issues.
Demonstration Steps
Create, execute, and alter a stored procedure
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Ensure that you have run the previous demos in this module.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod08\Demo08.ssmssln, and then click Open.
8. Follow the instructions contained within the comments of the script file.
Lesson 3
Implementing Parameterized Stored Procedures
The stored procedures that you have seen earlier in this module have not involved parameters. They have
produced their output without needing any input from the user and they have not returned any values
apart from the rows that they have returned. Stored procedures are more flexible when you include
parameters as part of the procedure definition because you can create more generic application logic.
Stored procedures can use both input and output parameters and return values.
Although the reuse of query execution plans is desirable in general, there are situations where this reuse is
detrimental. You will see situations where this can occur and consider options for workarounds to avoid
the detrimental outcomes.
Lesson Objectives
After completing this lesson, you will be able to:
Parameterize stored procedures.
Input Parameters
Parameters are used to exchange data between
stored procedures and the application or tool that
called the stored procedure. They enable the caller
to pass a data value to the stored procedure. To
define a stored procedure that accepts input
parameters, you declare one or more variables as
parameters in the CREATE PROCEDURE statement.
You will see an example of this in the next topic.
Output Parameters
Output parameters enable the stored procedure to pass a data value or a cursor variable back to the
caller. To use an output parameter within Transact-SQL, you must specify the OUTPUT keyword in both
the CREATE PROCEDURE statement and the EXECUTE statement.
Return Values
Every stored procedure returns an integer return code to the caller. If the stored procedure does not
explicitly set a value for the return code, the return code is 0 if no error occurs; otherwise a negative value
is returned.
Return values are commonly used to return a status result or an error code from a procedure and are sent
by the Transact-SQL RETURN statement.
Although it is possible to send a value that is related to business logic via a RETURN statement, in
general, you should use output parameters to return such values rather than the RETURN value.
Default Values
Provide default values for a parameter where appropriate. If a default is defined, a user can execute the
stored procedure without specifying a value for that parameter.
Default Values
CREATE PROCEDURE Sales.OrdersByDueDateAndStatus
@DueDate datetime, @Status tinyint = 5
AS
-- Body omitted in the original; a minimal illustrative body (table name assumed):
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE DueDate = @DueDate AND Status = @Status;
Two parameters have been defined (@DueDate and @Status). The @DueDate parameter has no default
value and must be supplied when the procedure is executed. The @Status parameter has a default value
of 5. If a value for the parameter is not supplied when the stored procedure is executed, a value of 5 will
be used.
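An execution that supplies both parameter values by position might look like this (the literal values are illustrative):

```sql
EXECUTE Sales.OrdersByDueDateAndStatus '2008-07-01', 3;
```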
This execution supplies a value for both @DueDate and @Status. Note that the names of the parameters
have not been mentioned. SQL Server knows which parameter is which by its position in the parameter
list.
This is an example of the previous stored procedure with one input parameter supplied and one
parameter using the default value:
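For example (the date value is illustrative):

```sql
EXECUTE Sales.OrdersByDueDateAndStatus '2008-07-01';
```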
In this case, a value for the @DueDate parameter has been supplied, but no value for the @Status
parameter has been supplied. In this case, the procedure will be executed with the @Status value set at a
default value of 5.
This is an example of a stored procedure being executed and both parameters are defined by name.
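For example (the values are illustrative):

```sql
EXECUTE Sales.OrdersByDueDateAndStatus @DueDate = '2008-07-01', @Status = 3;
```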
In this case, the stored procedure is being called by using both parameters, but they are being identified
by name.
In this example, the results will be the same, even though they are in a different order, because the
parameters are defined by name:
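For example (the values are illustrative):

```sql
EXECUTE Sales.OrdersByDueDateAndStatus @Status = 3, @DueDate = '2008-07-01';
```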
In this case, the @DueDate parameter is an input parameter and the @OrderCount parameter has been
specified as an output parameter. Note that, in SQL Server, there is no true equivalent of a .NET output
parameter. SQL Server output parameters are really input/output parameters.
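Such a procedure might be defined as follows (a sketch; the procedure name and underlying table are assumed):

```sql
CREATE PROCEDURE Sales.GetOrderCountByDueDate
    @DueDate datetime,
    @OrderCount int OUTPUT
AS
BEGIN
    SELECT @OrderCount = COUNT(*)
    FROM Sales.SalesOrderHeader
    WHERE DueDate = @DueDate;
END;
```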
First, variables to hold the parameter values have been declared. In this case, a variable to hold a due date
has been declared, along with another to hold the order count.
In the EXEC call, note that the @OrderCount parameter is followed by the OUTPUT keyword. If you do
not specify the output parameter in the EXEC statement, the stored procedure would still execute as
normal, including preparing a value to return in the output parameter. However, the output parameter
value would simply not be copied back into the @OrderCount variable. This is a common bug when
working with output parameters.
Finally, you would then use the returned value in the business logic that follows the EXEC call.
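The calling pattern described above can be sketched like this (against a hypothetical procedure with an @OrderCount output parameter):

```sql
DECLARE @DueDate datetime = '2008-07-01'; -- value to pass in
DECLARE @OrderCount int;                  -- receives the output value

-- Without the OUTPUT keyword here, @OrderCount would remain unset.
EXECUTE Sales.GetOrderCountByDueDate @DueDate, @OrderCount OUTPUT;

SELECT @OrderCount AS OrdersDue;          -- use the returned value
```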
SQL Server provides various ways to deal with this problem, which is often called a “parameter-sniffing”
problem. Note that parameter sniffing only applies to parameters, not to variables within the batch. The
code for these looks very similar, but variable values are not “sniffed” at all, which can also lead to poor
execution plans.
WITH RECOMPILE
You can add a WITH RECOMPILE option when you are declaring a stored procedure. This causes the
procedure to be recompiled every time it is executed.
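For example (the procedure name and body are illustrative):

```sql
CREATE PROCEDURE dbo.GetProductNamesRecompiled
    @ProductIDLimit int
WITH RECOMPILE  -- a fresh query plan is compiled on every execution
AS
SELECT ProductID, Name
FROM Production.Product
WHERE ProductID < @ProductIDLimit;
```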
OPTIMIZE FOR
There is an OPTION (OPTIMIZE FOR) query hint that enables you to specify the value of a parameter
that should be assumed when compiling the procedure, regardless of the actual value of the parameter.
OPTIMIZE FOR
CREATE PROCEDURE dbo.GetProductNames
    @ProductIDLimit int
AS
BEGIN
    SELECT ProductID, Name
    FROM Production.Product
    WHERE ProductID < @ProductIDLimit
    OPTION (OPTIMIZE FOR (@ProductIDLimit = 1000));
END;
Demonstration Steps
Pass parameters to stored procedures
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Ensure that you have run the previous demos in this module
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
8. Follow the instructions contained within the comments of the script file.
Lesson 4
Controlling Execution Context
Stored procedures normally execute in the security context of the user who is calling the procedure. As
long as a chain of ownership extends from the stored procedure to the objects that are referenced, the
user can execute the procedure without the need for permissions on the underlying objects. Ownership-
chaining issues with stored procedures are identical to those for views. Sometimes, however, more precise
control over the security context in which the procedure is executing is desired.
Lesson Objectives
After completing this lesson, you will be able to:
Execution Contexts
A login token and a user token represent an
execution context. The tokens identify the primary
and secondary principals against which permissions
are checked and the source that is used to
authenticate the token. A login that connects to an
instance of SQL Server has one login token and one
or more user tokens, depending on the number of databases to which the account has access.
Login token: A login token is valid across the instance of SQL Server. It contains the primary and
secondary identities against which server-level permissions and any database-level permissions that are
associated with these identities are checked. The primary identity is the login itself. The secondary identity
includes permissions that are inherited from roles and groups.
User token: A user token is valid only for a specific database. It contains the primary and secondary
identities against which database-level permissions are checked. The primary identity is the database user
itself. The secondary identity includes permissions that are inherited from database roles. User tokens do
not contain server-role memberships and do not honor the server-level permissions that are granted to
the identities in the token including those that are granted to the server-level public role.
For example, if you add a WITH EXECUTE AS 'Pat' clause to the definition of a stored procedure, it will
cause the procedure to be executed with 'Pat' as the security context rather than with the default security
context that is supplied by the caller of the stored procedure.
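As a sketch (the procedure name and table are assumptions; 'Pat' is the user from the example above):

```sql
CREATE PROCEDURE dbo.UpdateAuditLog
WITH EXECUTE AS 'Pat'  -- runs in Pat's security context, not the caller's
AS
BEGIN
    -- The INSERT is checked against Pat's permissions.
    INSERT INTO dbo.AuditLog (EventTime) VALUES (SYSDATETIME());
END;
```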
Explicit Impersonation
SQL Server supports the ability to impersonate
another principal either explicitly by using the
stand-alone EXECUTE AS statement, or implicitly
by using the EXECUTE AS clause on modules.
Implicit Impersonation
You can perform implicit impersonations by using the WITH EXECUTE AS clause on modules to
impersonate the specified user or login at the database or server level. This impersonation depends on
whether the module is a database-level module, such as a stored procedure or function, or a server-level
module, such as a server-level trigger.
When you impersonate a principal by using the EXECUTE AS LOGIN statement, or within a server-scoped
module by using the EXECUTE AS clause, the scope of the impersonation is server-wide. This means that,
after the context switch, it is possible to access any resource within the server on which the impersonated
login has permissions.
However, when you impersonate a principal by using the EXECUTE AS USER statement, or within a
database-scoped module by using the EXECUTE AS clause, the scope of impersonation is restricted to the
database by default. This means that references to objects that are outside the scope of the database will
return an error.
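A minimal sketch of explicit, database-scoped impersonation (assuming a database user named Pat exists):

```sql
EXECUTE AS USER = 'Pat';           -- switch to Pat's database-scoped context
SELECT USER_NAME() AS CurrentUser; -- reflects the impersonated context
REVERT;                            -- return to the original context
```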
Demonstration Steps
View and change the execution context
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Ensure that you have run the previous demos in this module.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
8. Follow the instructions contained within the comments of the script file.
Objectives
After completing this lab, you will be able to:
Password: Pa$$w0rd
Supporting Documentation
Stored procedure Marketing.GetProductColors
2. In the D:\Labfiles\Lab08\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
Supporting Documentation
Stored procedure Marketing.GetProductsByColor
Notes: Colors should not be returned more than once in the output.
NULL values should not be returned.
Note: Ensure that approximately 26 rows are returned for blue products. Ensure that approximately
248 rows are returned for products that have no color.
Question: When do you need the OUTPUT keyword for output parameters when you are
working with stored procedures?
Review Question(s)
Question: What happens to the WITH RECOMPILE option when you use it with a CREATE
PROC statement?
Question: What happens to the WITH RECOMPILE option when you use it with an
EXECUTE statement?
Module 9
Designing and Implementing User-Defined Functions
Contents:
Module Overview 9-1
Module Overview
Functions are routines that are used to encapsulate frequently performed logic. Rather than having to
repeat all of the function logic, any code that must perform the logic can call the function.
In this module, you will learn to design and implement user-defined functions (UDFs) that enforce
business rules or data consistency, and to modify and maintain existing functions that other developers
have written.
Objectives
After completing this module, you will be able to:
Lesson 1
Overview of Functions
Functions are routines that consist of one or more Transact-SQL statements that you can use to
encapsulate code for reuse. A function takes zero or more input parameters and returns either a scalar
value or a table. Functions do not support output parameters, but do return results, either as a single
value or a table.
Lesson Objectives
After completing this lesson, you will be able to:
Types of Functions
Most high-level programming languages offer
functions as blocks of code that are called by name
and can process input parameters. Microsoft® SQL
Server® data management software offers three
types of functions: scalar functions, table-valued
functions (TVFs), and system functions. You can
create two types of TVFs: inline TVFs and
multistatement TVFs.
Scalar Functions
Scalar functions return a single data value of the
type that is defined in a RETURNS clause. An
example of a scalar function would be a function
that extracts the protocol from a URL. From the string “http://www.microsoft.com”, the function would
return the string “http”.
For example, if a table holds details of sales for an entire country, you could create individual views to
return details of sales for particular states within the country. You could write an inline TVF that takes the
state code or ID as a parameter and returns all of the details of sales for the state that match the
parameter. In this way, you would only need a single function to provide details for all states, rather than
separate views for each state.
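Such an inline TVF might be sketched as follows (the sales table and column names are assumed):

```sql
CREATE FUNCTION Sales.GetSalesByState
( @StateCode nchar(2) )
RETURNS TABLE
AS
RETURN
    SELECT SalesOrderID, OrderDate, TotalDue
    FROM Sales.SalesOrderHeader
    WHERE StateCode = @StateCode;
```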
System Functions
System functions are built-in functions that SQL Server provides to help you perform a variety of
operations. You cannot modify them. System functions are described in the next topic.
System Functions
SQL Server has a wide variety of built-in functions
that you can use in queries to return data or to
perform operations on data.
Aggregates such as MIN, MAX, AVG, SUM, and COUNT perform calculations across groups of rows. Many
of these functions automatically ignore NULL rows.
Ranking functions such as ROW_NUMBER, RANK, DENSE_RANK, and NTILE perform windowing
operations on rows of data.
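For example, ROW_NUMBER can number rows within an ordered window (the table and columns are illustrative):

```sql
SELECT ProductID, Name, ListPrice,
       ROW_NUMBER() OVER (ORDER BY ListPrice DESC) AS PriceRank
FROM Production.Product;
```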
Lesson 2
Designing and Implementing Scalar Functions
You have seen that functions are routines that consist of one or more Transact-SQL statements that you
can use to encapsulate code for reuse, and that functions can take zero or more input parameters and
return either scalar values or tables.
This lesson provides an overview of scalar functions and explains why and how you use them, in addition
to explaining the syntax for creating them.
Lesson Objectives
After completing this lesson, you will be able to:
Scalar Functions
Unlike the definition of a stored procedure, where it
is optional to use a BEGIN…END block that wraps
the body of the stored procedure, the body of a
function must be defined in a BEGIN…END block.
The function body contains the series of Transact-
SQL statements that return the value.
For example, consider the function definition in the following code example.
CREATE FUNCTION
CREATE FUNCTION dbo.ExtractProtocolFromURL
( @URL nvarchar(1000))
RETURNS nvarchar(1000)
AS BEGIN
    RETURN CASE WHEN CHARINDEX(N':', @URL, 1) >= 1
                THEN SUBSTRING(@URL, 1, CHARINDEX(N':', @URL, 1) - 1)
           END;
END;
Note that the body of the function consists of a single RETURN statement that is wrapped in a
BEGIN…END block.
You can use the function in the following code example as an expression wherever a single value could be
used.
You can also implement scalar functions in managed code. Managed code will be discussed in Module
12, Implementing Managed Code in SQL Server. The allowable return values for scalar functions differ
between functions that are defined in Transact-SQL and functions that are defined by using managed
code.
Scalar UDFs
You use scalar functions to return information from
a database. A scalar function returns a single data
value of the type that is defined in a RETURNS
clause. The body of the function, which is defined in a BEGIN…END block, contains the series of Transact-
SQL statements that return the value.
Guidelines
Consider the following guidelines when you create scalar UDFs:
Make sure that you use two-part naming for the function and for all database objects that the
function references.
Avoid Transact-SQL errors that lead to a statement being canceled and the process continuing with
the next statement in the module (such as within triggers or stored procedures) because they are
treated differently inside a function. In functions, such errors cause the execution of the function to
stop.
Side-Effects
A function that modifies the underlying database is considered to have “side-effects.” In SQL Server,
functions are not permitted to have side-effects. You cannot change data in a database within a function,
you may not call a stored procedure, and you may not execute dynamic Structured Query Language (SQL)
code.
Deterministic Functions
A deterministic function is one that will always
return the same result when it is provided with the
same set of input values for the same database
state.
Deterministic Function
CREATE FUNCTION dbo.AddInteger
(@FirstValue int, @SecondValue int)
RETURNS int
AS BEGIN
RETURN @FirstValue + @SecondValue;
END;
GO
Every time the function is called with the same two integer values, it will return exactly the same result.
Nondeterministic Functions
A nondeterministic function is one that may return different results for the same set of input values each
time it is called, even if the database remains in the same state.
Nondeterministic Function
CREATE FUNCTION dbo.CurrentUTCTimeAsString()
RETURNS varchar(40)
AS BEGIN
RETURN CONVERT(varchar(40),SYSUTCDATETIME(),100);
END;
Each time the function is called, it will return a different value, even though no input parameters are
supplied.
Demonstration Steps
Work with scalar functions
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod09\Demo09.ssmssln, and then click Open.
8. Follow the instructions contained within the comments of the script file.
Lesson 3
Designing and Implementing Table-Valued Functions
In this lesson, you will learn how to work with functions that return tables instead of single values. There
are two types of table-valued functions (TVFs): inline and multistatement. Both types will be covered in this lesson.
The ability to return a table of data is important because it enables a function to be used as a source of
rows in place of a table in a Transact-SQL statement. In many cases, this can avoid the need to create
temporary tables.
Lesson Objectives
After completing this lesson, you will be able to:
Describe TVFs.
Table-Valued Functions
There are two ways to create TVFs. Inline TVFs
return an output table that is defined by a RETURN
statement that consists of a single SELECT
statement. If the logic of the function is too
complex to include in a single SELECT statement,
you need to implement the function as a
multistatement TVF.
Multistatement TVFs construct a table within the body of the function and then return the table. They also
need to define the schema of the table to be returned.
You can use both types of TVF as the equivalent of parameterized views.
For inline functions, the body of the function is not enclosed in a BEGIN…END block. A syntax error occurs
if you attempt to use this block. The CREATE FUNCTION statement still needs to be the only statement in
the batch.
In the same way that you use a view, you can use a TVF in the FROM clause of a Transact-SQL statement.
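As a sketch (the table and column names are illustrative), an inline TVF consists of a single SELECT statement in its RETURN clause, with no BEGIN…END block, and is then used in a FROM clause exactly as a view would be:

```sql
-- Inline TVF: the body is a single SELECT, with no BEGIN...END block.
CREATE FUNCTION dbo.GetOrdersForCustomer (@CustomerID int)
RETURNS TABLE
AS
RETURN
(
    SELECT o.OrderID, o.OrderDate, o.TotalDue
    FROM dbo.SalesOrder AS o
    WHERE o.CustomerID = @CustomerID
);
GO

-- Used like a parameterized view in the FROM clause:
SELECT OrderID, OrderDate
FROM dbo.GetOrdersForCustomer(17);
```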
Implement TVFs.
Demonstration Steps
1. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
2. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
Lesson 4
Considerations for Implementing Functions
Although the ability to create functions in Transact-SQL is very important, you need to bear in mind some
key considerations when you are creating functions. In particular, it is important to avoid negative
performance impacts through inappropriate use of functions. Performance problems due to such
inappropriate usage are very common. This lesson provides guidelines for the implementation of
functions and describes how to control their security context.
Lesson Objectives
After completing this lesson, you will be able to:
You can use the CROSS APPLY operator to call a TVF for each row in the table on the left within the
query. Designs that require the calling of a TVF for every row in a table can lead to significant
performance overhead. You should examine the design to see if there is a way to avoid the need to call
the function for each row.
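The pattern being cautioned against can be sketched as follows (the table and function names are illustrative); CROSS APPLY invokes the TVF once for each row of the outer table:

```sql
-- The TVF is called once per row of dbo.Customer. For multistatement
-- TVFs on large tables this per-row invocation can be expensive;
-- consider whether a join against an inline TVF or view can replace it.
SELECT c.CustomerID, o.OrderID, o.OrderDate
FROM dbo.Customer AS c
CROSS APPLY dbo.GetOrdersForCustomer(c.CustomerID) AS o;
```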
SQL Server supports the ability to impersonate another principal either explicitly by using the stand-alone
EXECUTE AS statement, or implicitly by using the EXECUTE AS clause on modules. You can use the stand-
alone EXECUTE AS statement to impersonate server-level principals, or logins, by using the EXECUTE AS
LOGIN statement. You can also use the stand-alone EXECUTE AS statement to impersonate database-
level principals, or users, by using the EXECUTE AS USER statement.
Implicit impersonations that are performed through the EXECUTE AS clause on modules impersonate the
specified user or login at the database or server level. This impersonation depends on whether the module
is a database-level module, such as a stored procedure or function, or a server-level module, such as a
server-level trigger.
When you are impersonating a principal by using the EXECUTE AS LOGIN statement, or within a server-
scoped module by using the EXECUTE AS clause, the scope of the impersonation is server-wide. This
means that, after the context switch, it is possible to access any resource within the server on which the
impersonated login has permissions.
However, when you are impersonating a principal by using the EXECUTE AS USER statement, or within a
database-scoped module by using the EXECUTE AS clause, the scope of impersonation is restricted to the
database by default. This means that references to objects that are outside the scope of the database will
return an error.
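The two stand-alone forms can be sketched as follows (the login and user names are illustrative); the REVERT statement switches the execution context back to the caller:

```sql
-- Server-wide impersonation of a login:
EXECUTE AS LOGIN = 'ADVENTUREWORKS\PayrollService';
SELECT SUSER_SNAME() AS CurrentLogin;   -- reflects the impersonated login
REVERT;

-- Database-scoped impersonation of a user:
EXECUTE AS USER = 'PayrollUser';
SELECT USER_NAME() AS CurrentUser;      -- reflects the impersonated user
REVERT;
```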
Use two-part naming to qualify the name of any database objects that are referred to within the
function and also use two-part naming when you are choosing the name of the function.
Consider the impact of using functions in combination with indexes. In particular, note that a WHERE
clause predicate that wraps a column such as CustomerID in a function call is likely to remove the
usefulness of an index on CustomerID.
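The idea can be sketched as follows (dbo.FormatCustomerID is a hypothetical scalar function, not from the course files); applying a function to the indexed column makes the predicate non-sargable, while leaving the column bare preserves an index seek:

```sql
-- The function call on the indexed column prevents an index seek
-- on CustomerID; SQL Server must evaluate the function for every row:
SELECT CustomerID, FullName
FROM dbo.Customer
WHERE dbo.FormatCustomerID(CustomerID) = 'CUST-00042';

-- Rewriting the predicate so that the column stands alone
-- allows the index on CustomerID to be used:
SELECT CustomerID, FullName
FROM dbo.Customer
WHERE CustomerID = 42;
```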
Avoid statements that will raise Transact-SQL errors because exception handling is not permitted within
functions.
Demonstration Steps
Alter the execution context of a function
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
3. In the virtual machine, on the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod09\Demo09.ssmssln, and then click Open.
8. Follow the instructions contained within the comments of the script file.
Lesson 5
Alternatives to Functions
Functions are only one option for implementing code. This lesson explores situations where other
solutions may be appropriate and helps you to make decisions about which solution to use.
Lesson Objectives
After completing this lesson, you will be able to:
Stored procedures can execute dynamic SQL statements. Functions are not permitted to execute dynamic
SQL statements.
Stored procedures can include detailed exception handling. Functions cannot contain exception handling.
Stored procedures can return multiple result sets from a single call. A TVF can return only a single
rowset from a function call; there is no mechanism for returning multiple rowsets from a single
function call.
Objectives
After completing this lab, you will be able to:
Create a function.
Password: Pa$$w0rd
2. In the D:\Labfiles\Lab09\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
2. Review the Function Specifications: Phone Number section in the supporting documentation.
Results: After this exercise, you should have created a new FormatPhoneNumber function within the
dbo schema.
4. Test the function by using an alternate delimiter such as the pipe character (|).
2. Review the requirement for the dbo.IntegerListToTable function in the supporting documentation.
Results: After this exercise, you should have created a new IntegerListToTable function within the dbo
schema.
Review Question(s)
Question: When you are using the EXECUTE AS clause, what privileges should you grant to
the login or user that is being impersonated?
Question: When you are using the EXECUTE AS clause, what privileges should you grant to
the login or user that is creating the code?
Module 10
Responding to Data Manipulation via Triggers
Contents:
Module Overview
Module Overview
Data manipulation language (DML) triggers are a powerful tool that enables you to enforce domain,
entity, and referential data integrity and business logic. The enforcement of integrity helps you to build
reliable applications. In this module, you will learn what DML triggers are and how they enforce data
integrity, the different types of trigger that are available to you, and how to define triggers in your
database.
Objectives
After completing this module, you will be able to:
Lesson 1
Designing DML Triggers
Before you begin to create DML triggers, you need to become familiar with how they should be designed,
so that you can avoid making common design errors. Several types of DML trigger are available. It is
important to know what they do, how they work, and how they differ from data definition language (DDL)
triggers. DML triggers need to be able to work with both the previous state of the database and its
changed state. You will see how the inserted and deleted virtual tables provide that capability. DML
triggers are often added after applications are built, so you need to make sure that adding a trigger does
not cause errors in the applications that were designed without them being in place. The SET NOCOUNT
ON command helps to avoid the side-effects of triggers.
Lesson Objectives
After completing this lesson, you will be able to:
Describe DML triggers.
Explain how AFTER triggers differ from INSTEAD OF triggers and where you should use each of them.
Access both the prior and final states of the database data by using the inserted and deleted virtual
tables.
Trigger Operation
The trigger and the statement that fires it are treated as a single operation, which you can roll back from
within the trigger. By rolling back an operation, you can undo the effect of a Transact-SQL statement if
the logic in your triggers decides that the statement should not have been executed. If the statement is
part of another transaction, that outer transaction is also rolled back.
Triggers can cascade changes through related tables in the database; however, in many cases, you can
execute these changes more efficiently by using cascading referential integrity constraints.
Unlike CHECK constraints, triggers can reference columns in other tables. For example, a trigger can use a
SELECT statement from another table to compare to the inserted or updated data and to perform
additional actions, such as modifying the data or displaying a user-defined error message.
Triggers can evaluate the state of a table before and after a data modification and take actions based on
that difference. For example, you may want to check that the balance of a customer’s account does not
change by more than a certain amount if the person processing the change is not a manager.
Triggers also enable the use of custom error messages for when constraint violations occur. This could
make the messages that are passed to end users more meaningful.
Multiple Triggers
Multiple triggers of the same type (INSERT, UPDATE, or DELETE) on a table enable multiple different
actions to occur in response to the same modification statement. You might create multiple triggers to
separate the logic that each performs, but note that you do not have complete control over the order in
which they fire. You can only specify which trigger should fire first and which should fire last.
AFTER Triggers
AFTER triggers fire after the data modifications that are part of the event to which they relate have
completed. This means that an INSERT, UPDATE, or DELETE statement executes and modifies the data in
the database; after that modification has completed, any AFTER triggers that are associated with that
event fire, but still within the same operation that triggered them.
In many cases, you can replace trigger-based code with other forms of code. For example, Microsoft®
SQL Server® data management software might provide auditing. Relationships between tables are more
typically implemented by using foreign key constraints. Default values and calculated values are typically
implemented by using DEFAULT constraints and persisted calculated columns. However, in some
situations, the complexity of the logic that is required will make triggers a good solution.
If the trigger executes a ROLLBACK statement, the data modification statement with which it is associated
will be rolled back. If that statement was part of a larger transaction, that outer transaction would be
rolled back, too.
INSTEAD OF Triggers
An INSTEAD OF trigger is a special type of trigger that executes alternate code instead of executing the
statement from which it was fired.
When you use an INSTEAD OF trigger, only the code in the trigger is executed. The original INSERT,
UPDATE, or DELETE operation that caused the trigger to fire does not occur.
A common use case for INSTEAD OF triggers is to enable views that are based on multiple base tables to
be updatable.
After an UPDATE operation, the inserted virtual table holds details of the modified versions of the rows.
The underlying table also contains those rows in the modified form.
After a DELETE operation, the deleted virtual table holds details of the rows that have just been deleted.
The underlying table no longer contains those rows.
After an UPDATE operation, the deleted virtual table holds details of the rows from before the
modification was made. The underlying table holds the modified versions.
INSTEAD OF Triggers
When you attempt an INSERT, UPDATE, or DELETE statement and an INSTEAD OF trigger is associated
with the event on the table, the inserted and deleted virtual tables hold details of the modifications that
need to be made, but have not happened yet.
SET NOCOUNT ON
When you are adding a trigger to a table, you need
to avoid breaking any existing applications that are
accessing the table unless the intended purpose of
the trigger is to prevent misbehaving applications
from making inappropriate data changes.
It is common for application programs to issue data
modification statements and to check the returned
count of the number of rows that are affected.
This process is often performed as part of an
optimistic concurrency check. For example, consider
the following code example:
UPDATE Statement
UPDATE Customer
SET Customer.FullName = @NewName,
Customer.Address = @NewAddress
WHERE Customer.CustomerID = @CustomerID
AND Customer.Concurrency = @Concurrency;
In this case, the Concurrency column is a rowversion data type column. The application was designed so
that the update occurs only if the Concurrency column has not been altered. With rowversion columns,
every modification to a row causes a change in that row’s rowversion value.
When the application intends to modify a single row, it issues an UPDATE statement for that row. The
application then checks the count of updated rows that SQL Server returns. When the application sees
that only a single row has been modified, the application knows that only the row that it intended to
change was affected. It also knows that no other user had modified the row since the application read the
data.
A common problem when you are adding triggers is that, if the trigger itself also causes row
modifications (for example, by writing an audit row into an audit table), the count of those
modifications is returned in addition to the expected count. You can avoid this situation by using the
SET NOCOUNT ON statement; most triggers should include it.
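A sketch of the pattern (the table and column names are illustrative): the trigger writes an audit row, and SET NOCOUNT ON prevents that extra row count from reaching the client, so the application still sees only the count of rows that its own statement modified:

```sql
CREATE TRIGGER TR_Customer_Update
ON dbo.Customer
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;  -- suppress the audit insert's row count so the client
                     -- still receives only the count of updated rows

    INSERT INTO dbo.CustomerAudit (CustomerID, AuditedAt)
    SELECT i.CustomerID, SYSDATETIME()
    FROM inserted AS i;
END;
```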
Returning Rowsets
Although it is possible to include a SELECT statement within a trigger and for it to return rows, the
creation of this type of side-effect is discouraged. The ability to do this is now deprecated and should not
be used in new development work. There is a configuration setting, ‘disallow results from triggers’, which,
when it is set to 1, disallows this capability.
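The setting is changed by using sp_configure; because it is an advanced option, ‘show advanced options’ must be enabled first:

```sql
-- Prevent triggers from returning result sets (recommended for new work).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'disallow results from triggers', 1;
RECONFIGURE;
```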
Constraints are checked before any data modification is attempted, so they often provide much higher
performance than is possible with triggers, particularly in ROLLBACK situations. You can use constraints
when the checks that you need to perform are relatively simple. Triggers make it possible to check
complex logic.
Lesson 2
Implementing DML Triggers
The first lesson provided information about designing DML triggers. You now need to consider how to
implement the designs that have been created.
Lesson Objectives
After completing this lesson, you will be able to:
The trigger can examine the inserted virtual table to determine what to do in response to the
modification.
Multirow Inserts
In the code example on the slide, insertions for the Sales.Opportunity table are being audited to a table
called Sales.OpportunityAudit. Note that the trigger processes all inserted rows at the same time. A
common error when designing AFTER INSERT triggers is to write them with the assumption that only a
single row is being inserted.
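The slide code is not reproduced here; the pattern it describes can be sketched as follows (the audit column names are assumed for illustration). The key point is that the trigger selects from the inserted virtual table in a single set-based statement rather than assuming that exactly one row was inserted:

```sql
CREATE TRIGGER TR_Opportunity_Insert
ON Sales.Opportunity
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Audit every inserted row in one statement; never assume
    -- that only a single row was inserted.
    INSERT INTO Sales.OpportunityAudit (OpportunityID, AuditedAt)
    SELECT i.OpportunityID, SYSDATETIME()
    FROM inserted AS i;
END;
```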
Demonstration Steps
Create an AFTER INSERT trigger
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod10\Demo10.ssmssln, and then click Open.
8. Follow the instructions contained within the comments of the script file.
The trigger can examine the deleted virtual table to determine what to do in response to the
modification.
Multirow Deletes
In the code example on the slide, rows in the Product.Product table are being flagged as discontinued if
the product category row with which they are associated in the Product.Category table is deleted. Note
that the trigger processes all deleted rows at the same time. A common error when designing AFTER
DELETE triggers is to write them with the assumption that only a single row is being deleted.
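Again, the slide code is not reproduced here; a sketch of the pattern (the column names are assumed for illustration) joins to the deleted virtual table so that multirow deletes are handled correctly:

```sql
CREATE TRIGGER TR_Category_Delete
ON Product.Category
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Flag every product whose category row was deleted; joining to the
    -- deleted virtual table handles multirow DELETE statements correctly.
    UPDATE p
    SET p.Discontinued = 1
    FROM Product.Product AS p
    JOIN deleted AS d
        ON p.CategoryID = d.CategoryID;
END;
```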
TRUNCATE TABLE
When rows are deleted from a table by using a DELETE statement, any AFTER DELETE triggers are fired
when the deletion is completed. TRUNCATE TABLE is an administrative operation that removes all rows
from a table. It requires additional permissions beyond those needed to delete rows, and it does not fire
any AFTER DELETE triggers that are associated with the table.
Demonstration Steps
Create and test AFTER DELETE triggers
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
2. If you have not completed the previous demonstration in this module, then run
D:\Demofiles\Mod10\Setup.cmd as an administrator to revert any changes
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod10\Demo10.ssmssln, and then click Open.
8. Follow the instructions contained within the comments of the script file.
The trigger can examine both the inserted and deleted virtual tables to determine what to do in response
to the modification.
Multirow Updates
In the code example on the slide, the Product.ProductReview table contains a column called
ModifiedDate. The trigger is being used to ensure that when changes are made to the
Product.ProductReview table, the value in the ModifiedDate column always reflects when any changes
last happened. Note that the trigger processes all updated rows at the same time. A common error when
designing AFTER UPDATE triggers is to write them with the assumption that only a single row is being
updated.
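The slide code is not reproduced here; a sketch of the pattern (the key column name is assumed for illustration) stamps every updated row by joining back through the inserted virtual table:

```sql
CREATE TRIGGER TR_ProductReview_Update
ON Product.ProductReview
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Stamp every updated row, not just one; joining to the inserted
    -- virtual table handles multirow UPDATE statements correctly.
    -- (The trigger's own UPDATE does not re-fire it, because the
    -- RECURSIVE_TRIGGERS database option is OFF by default.)
    UPDATE pr
    SET pr.ModifiedDate = SYSDATETIME()
    FROM Product.ProductReview AS pr
    JOIN inserted AS i
        ON pr.ProductReviewID = i.ProductReviewID;
END;
```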
Demonstration Steps
Create and test AFTER UPDATE triggers
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
2. If you have not completed the previous demonstrations in this module, then run
D:\Demofiles\Mod10\Setup.cmd as an administrator to revert any changes.
3. On the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
8. Follow the instructions contained within the comments of the script file.
Lesson 3
Advanced Trigger Concepts
In the previous two lessons, you have learned to design and implement DML AFTER triggers. However, to
make effective use of these triggers, you need to understand some additional areas of complexity that are
related to them. You also need to understand where to use triggers and where to consider alternatives to
triggers.
Lesson Objectives
After completing this lesson, you will be able to:
Explain how nested triggers work and how configurations might affect their operation.
Use the UPDATE function to build logic based on the columns being updated.
Describe the limited control that you can exert over the order in which triggers fire when multiple
triggers are defined for the same event on the same object.
INSTEAD OF Triggers
INSTEAD OF triggers cause the execution of
alternate code instead of executing the statement
that caused them to fire.
Updatable Views
A very common use case for INSTEAD OF triggers is to enable views that are based on multiple base
tables to be updatable. You can define INSTEAD OF triggers on views that have one or more base tables,
where they can extend the types of updates that a view can support.
This trigger executes instead of the original triggering action. INSTEAD OF triggers increase the variety of
types of updates that you can perform against a view. Each table or view is limited to one INSTEAD OF
trigger for each triggering action (INSERT, UPDATE, or DELETE).
You can specify an INSTEAD OF trigger on both tables and views. You cannot create an INSTEAD OF
trigger on views that have the WITH CHECK OPTION clause defined. You can perform operations on the
base tables within the trigger. This avoids the trigger being called again. For example, you could perform
a set of checks before inserting data and then perform the insert on the base table.
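A highly simplified sketch of that approach (the view, tables, and columns are hypothetical): the trigger intercepts an INSERT against a two-table view and routes each column to its base table, so the original INSERT against the view never occurs:

```sql
CREATE TRIGGER TR_vCustomerOrders_Insert
ON dbo.vCustomerOrders        -- a view joining dbo.Customer and dbo.SalesOrder
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Add any customers that do not already exist in the base table.
    INSERT INTO dbo.Customer (CustomerID, FullName)
    SELECT i.CustomerID, i.FullName
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Customer AS c
                      WHERE c.CustomerID = i.CustomerID);

    -- Insert the order rows into the other base table.
    INSERT INTO dbo.SalesOrder (OrderID, CustomerID, OrderDate)
    SELECT i.OrderID, i.CustomerID, i.OrderDate
    FROM inserted AS i;
END;
```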
Demonstration Steps
Create and test an INSTEAD OF DELETE trigger
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
2. If you have not completed the previous demonstrations in this module, then run
D:\Demofiles\Mod10\Setup.cmd as an administrator to revert any changes.
3. On the taskbar, click SQL Server 2014 Management Studio.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
7. Follow the instructions contained within the comments of the script file.
A failure at any level of a set of nested triggers cancels the entire original statement, and all data
modifications are rolled back.
A nested trigger will not fire twice in the same trigger transaction; a trigger does not call itself in response
to a second update to the same table within the trigger.
Complexity of Debugging
It was mentioned in an earlier lesson that debugging triggers can be difficult. Nested triggers are
particularly difficult to debug. One common method that is used during debugging is to include PRINT
statements within the body of the trigger code so that you can determine where a failure occurred.
However, it is important that these statements are only used during debugging phases.
Direct Recursion
Direct recursion occurs when a trigger fires and
performs an action on the same table that causes
the same trigger to fire again. For example, an
application updates table T1, which causes trigger Trig1 to fire. Trigger Trig1 updates table T1 again,
which causes trigger Trig1 to fire again.
Indirect Recursion
Indirect recursion occurs when a trigger fires and performs an action that causes another trigger to fire on
a different table, which subsequently causes an update to occur on the original table, which then causes
the original trigger to fire again. For example, an application updates table T2, which causes trigger Trig2
to fire. Trig2 updates table T3, which causes trigger Trig3 to fire. Trigger Trig3 in turn updates table T2,
which causes trigger Trig2 to fire again.
To prevent indirect recursion of this sort, turn off the nested triggers option at the server instance level.
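The option is changed at the instance level by using sp_configure:

```sql
-- Turn off nested trigger firing for the server instance.
-- This stops indirect recursion, but it also stops all legitimate
-- trigger nesting, so apply it with care.
EXEC sp_configure 'nested triggers', 0;
RECONFIGURE;
```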
UPDATE Function
It is a common requirement to build logic that only
takes action if particular columns are being
updated.
Change of Value
Note that the UPDATE function does not indicate if the value is actually changing. It only indicates if the
column is part of the list of columns in the SET clause of the UPDATE statement. To detect if the value in
a column is actually being changed to a different value, you need to interrogate the inserted and deleted
virtual tables.
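Both points can be sketched together (the table and column names are illustrative): UPDATE() gates the logic on the column appearing in the SET clause, and the join between inserted and deleted finds the rows where the value actually changed:

```sql
CREATE TRIGGER TR_Customer_EmailChange
ON dbo.Customer
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- UPDATE() is true whenever the column appears in the SET clause,
    -- even if the new value equals the old one...
    IF UPDATE(EmailAddress)
    BEGIN
        -- ...so compare inserted and deleted to audit only real changes.
        INSERT INTO dbo.EmailChangeAudit (CustomerID, OldEmail, NewEmail)
        SELECT d.CustomerID, d.EmailAddress, i.EmailAddress
        FROM inserted AS i
        JOIN deleted AS d
            ON i.CustomerID = d.CustomerID
        WHERE ISNULL(i.EmailAddress, '') <> ISNULL(d.EmailAddress, '');
    END;
END;
```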
COLUMNS_UPDATED Function
SQL Server also provides a function called COLUMNS_UPDATED. This function returns a bitmap that
indicates which columns are being updated. The values in the bitmap depend upon the positional
information for the columns. Hard-coding that sort of information in the code within a trigger is generally
not considered good coding practice because it affects the readability (and hence the maintainability) of
your code. It also reduces the reliability of your code because schema changes to the table could break
the code.
sp_settriggerorder
Developers often seek to control the firing order of
multiple triggers that are defined for a single event
on a single object. For example, a developer might
create three AFTER INSERT triggers on the same
table, each implementing different business rules or
administrative tasks.
The possible values for the @order parameter are First, Last, or None; None is the default. An error
occurs if you attempt to designate the same trigger as both First and Last.
For DML triggers, the possible values for the @stmttype parameter are INSERT, UPDATE, or DELETE.
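For example (the trigger name is illustrative), the following call makes one trigger fire first among the AFTER INSERT triggers on its table; any remaining triggers fire in an undefined order:

```sql
EXEC sp_settriggerorder
    @triggername = 'Sales.TR_Opportunity_Validate',  -- hypothetical trigger
    @order = 'First',
    @stmttype = 'INSERT';
```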
Checking Values
You could use triggers to check that values in
columns are valid or within given ranges. In general,
you should use CHECK constraints instead of
triggers for this because CHECK constraints perform
the check before the data modification is
attempted.
If you are using triggers to check the correlation of values across multiple columns within a table, you
should usually create table-level CHECK constraints instead.
Defaults
You can use triggers to provide default values for columns when no values have been provided in INSERT
statements. However, you should generally use DEFAULT constraints for this instead.
Foreign Keys
You can use triggers to check the relationship between tables. However, you should generally use
FOREIGN KEY constraints for this.
Computed Columns
You can use triggers to maintain the value in one column based on the value in other columns. In general,
you should use computed columns or persisted computed columns for this.
Precalculating Aggregates
You can use triggers to maintain precalculated aggregates in one table, based on the values in rows in
another table. In general, you should use indexed views to provide this functionality.
As another example, a FOREIGN KEY constraint cannot be defined on a column that is also used for
other purposes. Consider a column that holds an employee number only if another column holds the
value ‘E’. This typically indicates a poor database design, but you can use triggers to enforce this sort of
relationship.
Demonstration Steps
Replace a trigger with a computed column
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
2. If you have not completed the previous demonstrations in this module, then run
D:\Demofiles\Mod10\Setup.cmd as an administrator to revert any changes.
8. Follow the instructions contained within the comments of the script file.
Supporting Documentation
The Production.ProductAudit table is used to hold changes to high-value products. When inserting rows
into this table, the data required in each column is shown in the following table.
Objectives
After completing this lab, you will be able to:
Create triggers.
Modify triggers.
Password: Pa$$w0rd
Note: Inserts or deletes on the table do not need to be audited. Details of the current user
can be taken from the ORIGINAL_LOGIN() function.
3. Design a Trigger
2. In the D:\Labfiles\Lab10\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
3. Review the existing structure of the Production.ProductAudit table and the values required in each
column, based on the supporting documentation.
4. Review the existing structure of the Production.Product table.
Results: After this exercise, you should have created a new trigger. Tests should have shown that it is
working as expected.
2. Use an ALTER TRIGGER statement to change the existing trigger so that it will meet the updated
requirements.
Results: After this exercise, you should have altered the trigger. Tests should show that it is now working
as expected.
1. In many business scenarios, it makes sense to mark records as deleted with a status column and use a
trigger or stored procedure to update an audit trail table. The changes can then be audited, the data
is not lost, and the IT staff can perform purges or archival of the deleted records.
Review Question(s)
Question: How do constraints and triggers differ regarding timing of execution?
Module 11
Using In-Memory Tables
Contents:
Module Overview
Module Overview
Microsoft® SQL Server® 2014 data management software introduces in-memory OLTP functionality
to improve the performance of OLTP workloads. Memory-optimized tables are stored primarily in
memory, which improves performance by reducing hard disk access; natively compiled stored
procedures further improve performance over traditional interpreted Transact-SQL.
Objectives
After completing this module, you will be able to:
Use memory-optimized tables to improve performance for latch-bound workloads.
Lesson 1
Memory-Optimized Tables
SQL Server 2014 introduces memory-optimized tables as a way to improve the performance of latch-
bound OLTP workloads. Memory-optimized tables are stored in memory, and do not use locks to enforce
concurrency isolation. This dramatically improves performance for many OLTP workloads.
Lesson Objectives
After completing this lesson, you will be able to:
Memory-optimized tables:
Can persist their data to disk as FILESTREAM data, or they can be nondurable.
Can be queried by using Transact-SQL through interop services that the SQL Server query processor
provides.
Cannot include some data types, including text, image, and nvarchar(max).
A table contains “hot” pages. For example, a table that contains a clustered index on an incrementing
key value will inherently suffer from concurrency issues because all insert transactions occur in the last
page of the index.
Repeatable read validation failures. These occur when a row that the transaction has read has
changed since the transaction began.
Serializable validation failures. These occur when a new (or phantom) row is inserted into the range
of rows that the transaction accesses while it is still in progress.
Commit dependency failures. These occur when a transaction has a dependency on another
transaction that has failed to commit.
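Because these validation failures abort the transaction rather than block it, client code is expected to retry. The following is a minimal retry sketch; dbo.usp_UpdateCart and its parameters are hypothetical, and the error numbers shown (41301 commit dependency, 41305 repeatable read validation, 41325 serializable validation) are those documented for SQL Server 2014 in-memory OLTP and should be verified against your version:

```sql
-- Retry a natively compiled procedure call when a transient
-- in-memory OLTP validation failure aborts the transaction.
DECLARE @Retries int = 3;
WHILE @Retries > 0
BEGIN
    BEGIN TRY
        EXEC dbo.usp_UpdateCart @SessionID = 42, @Quantity = 2; -- hypothetical call
        BREAK; -- success: leave the retry loop
    END TRY
    BEGIN CATCH
        SET @Retries -= 1;
        IF ERROR_NUMBER() NOT IN (41301, 41305, 41325) OR @Retries = 0
        BEGIN
            THROW; -- not a transient validation failure, or retries exhausted
        END
    END CATCH;
END;
```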
Migration Validation
Migration validation reports on any features of your
disk-based tables that are not supported in
memory-optimized tables.
Migration Warnings
Migration warnings don’t prevent a disk-based table from being migrated to a memory-optimized table,
or stop the table from functioning once it’s converted, but the warnings will list any other associated
objects, such as stored procedures, that might not function correctly post-migration.
Migration Options
You can now specify options such as the filegroup, the new name for the original unmigrated disk-based
table, and whether to transfer the data from the original table to the new memory-optimized table.
Index Migration
Index migration gives you the same options as primary key migration for each of the indexes on the table.
Summary
The summary lists the options that you have specified in the previous stages and allows you to migrate
the table, or to create a script to migrate the table at a subsequent time.
To start Memory Optimization Advisor, in SQL Server Management Studio, right-click a table in Object
Explorer and select Memory Optimization Advisor.
You can add a filegroup for memory-optimized data to a database by using the ALTER DATABASE
statement, as the following example shows:
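The original slide example did not survive extraction. A minimal sketch, assuming a database named InternetSales and an illustrative filegroup name and path:

```sql
-- Add a filegroup for memory-optimized data, then add a container
-- (a directory path, not an 8-KB-page data file) to that filegroup.
ALTER DATABASE InternetSales
    ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE InternetSales
    ADD FILE (NAME = 'imoltp_data', FILENAME = 'D:\Data\imoltp_data')
    TO FILEGROUP imoltp_fg;
```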
Note: When the durability option is set to SCHEMA_AND_DATA, the data is written to
disk as a stream, not in 8-KB pages as used by disk-based tables. The ability to set the durability
option to SCHEMA_ONLY is useful when the table is used for transient data, such as a session
state table in a web server farm.
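The CREATE TABLE example referenced here did not survive extraction. A minimal sketch of a memory-optimized table with an inline single-column primary key (table and column names are illustrative):

```sql
-- A durable memory-optimized table; the primary key is a hash index,
-- specified inline because it covers a single column.
CREATE TABLE dbo.SessionState
(
    SessionID int NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    Payload varbinary(4000) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```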
All tables that have a durability option of SCHEMA_AND_DATA must include a primary key. You can
specify this inline for single-column primary keys, as shown in the previous example, or you can specify it
after all of the column definitions.
To create a memory-optimized table that has a composite primary key, you must specify the PRIMARY
KEY constraint after the column definitions, as shown in the following example:
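The slide example did not survive extraction. A sketch with illustrative names, showing the PRIMARY KEY constraint placed after the column definitions:

```sql
-- A composite primary key hash index must be declared at table level.
CREATE TABLE dbo.ShoppingCart
(
    SessionID int NOT NULL,
    ProductKey int NOT NULL,
    Quantity int NOT NULL,
    PRIMARY KEY NONCLUSTERED HASH (SessionID, ProductKey)
        WITH (BUCKET_COUNT = 100000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```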
To create indexes in addition to the primary key, you must specify the indexes after the column
definitions:
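Again, the slide example is missing; a sketch with illustrative names, adding a nonclustered range index alongside the primary key:

```sql
-- Additional indexes on a memory-optimized table are declared inline,
-- after the column definitions.
CREATE TABLE dbo.CartAudit
(
    AuditID int NOT NULL PRIMARY KEY NONCLUSTERED,
    SessionID int NOT NULL,
    TimeAdded datetime2 NOT NULL,
    INDEX ix_SessionID NONCLUSTERED (SessionID) -- additional range index
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```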
Query Interop
You can use Transact-SQL statements to access
memory-optimized tables in the same way as
traditional disk-based tables. The SQL Server 2014
query engine provides an interop layer that does
the necessary interpretation to query the compiled
in-memory table. You can use this technique to
create queries that access both memory-optimized
tables and disk-based tables, for example, by using
a JOIN clause.
Native Compilation
You can increase the performance of workloads that use memory-optimized tables further by creating
natively compiled stored procedures. You can define these by using CREATE PROCEDURE statements
that the SQL Server 2014 query engine converts to native C code. The C version of the stored procedure is
compiled into a DLL, which is loaded into memory. You can only use natively compiled stored procedures
to access memory-optimized tables; they cannot reference disk-based tables.
Demonstration Steps
Using memory-optimized tables
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod11\Demo11.ssmssln, and then click Open.
9. Follow the instructions contained within the comments of the script file.
Lesson 2
Natively Compiled Stored Procedures
Natively compiled stored procedures are stored procedures that are compiled into native code. They are written in traditional Transact-SQL code, but are compiled when they are created rather than when they are executed, which improves performance.
Lesson Objectives
After completing this lesson, you will be able to:
For more information, see the Introduction to Natively Compiled Stored Procedures article on
MSDN.
http://go.microsoft.com/fwlink/?LinkID=394850&clcid=0x409
Natively compiled stored procedures must include the NATIVE_COMPILATION, SCHEMABINDING, and EXECUTE AS clauses. The procedure body must be a single ATOMIC block, which must specify one of the following transaction isolation levels:
SNAPSHOT. Using this isolation level, all data that the transaction reads is consistent with the version
that was stored at the start of the transaction. Data modifications that other, concurrent transactions
have made are not visible and attempts to modify rows that other transactions have modified result
in an error.
REPEATABLE READ. Using this isolation level, every read is repeatable until the end of the transaction.
If another, concurrent transaction has modified a row that the transaction had read, the transaction
will fail to commit due to a repeatable read validation error.
SERIALIZABLE. Using this isolation level, all data is consistent with the version that was stored at the
start of the transaction, and repeatable reads are validated. In addition, the insertion of “phantom”
rows by other, concurrent transactions will cause the transaction to fail.
The following code example shows a CREATE PROCEDURE statement that is used to create a natively
compiled stored procedure:
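The slide's code example did not survive extraction. A sketch borrowing the ShoppingCart names from the lab exercise, showing the required clauses and ATOMIC block options:

```sql
-- The ATOMIC block must specify a transaction isolation level and a language.
CREATE PROCEDURE dbo.DeleteItemFromCart
    @SessionID int, @ProductKey int
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
    (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    DELETE FROM dbo.ShoppingCart
    WHERE SessionID = @SessionID AND ProductKey = @ProductKey;
END;
```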
Demonstration Steps
Create a natively compiled stored procedure
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod11\Demo11.ssmssln, and then click Open.
9. Follow the instructions contained within the comments of the script file.
10. Close SQL Server Management Studio without saving any changes.
You are planning to optimize some database workloads by using the in-memory database capabilities of
SQL Server 2014. You will create memory-optimized tables and natively compiled stored procedures to
optimize OLTP workloads.
Objectives
After completing this lab, you will be able to:
Password: Pa$$w0rd
2. In the D:\Labfiles\Lab11\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
2. Add a file for memory-optimized data to the InternetSales database. You should store the file in the
filegroup that you created in the previous step.
o SessionID: integer
o TimeAdded: datetime
o CustomerKey: integer
o ProductKey: integer
o Quantity: integer
3. The table should include a composite primary key hash index on the SessionID and ProductKey
columns with 100000 buckets.
4. Test the table by inserting some rows and querying the table. You can use any valid values for this
test.
Results: After completing this exercise, you should have created a memory-optimized table and a natively
compiled stored procedure in a database with a filegroup for memory-optimized data.
2. Create a natively compiled stored procedure named DeleteItemFromCart. The stored procedure
should include SessionID and ProductKey parameters, and should delete matching rows from the
ShoppingCart table by using a SNAPSHOT isolation transaction.
3. Create a natively compiled stored procedure named EmptyCart. The stored procedure should accept a SessionID parameter, and should delete matching rows from the ShoppingCart table by using a SNAPSHOT isolation transaction.
4. Test each of the stored procedures by writing Transact-SQL statements to call them with appropriate
parameter values.
Results: After completing this exercise, you should have created a natively compiled stored procedure.
Module 12
Implementing Managed Code in SQL Server
Contents:
Module Overview 12-1
Module Overview
As a database professional, you are asked to create databases and related objects to meet business needs.
You can meet most requirements by using Transact-SQL. However, there are times when the requirements
go beyond the abilities of Transact-SQL. These requirements may include functionality such as:
Complex or compound data types, such as currency values that include culture information, complex
numbers, and dates that include a calendar system, or storing entire arrays of values in a single
column.
Accessing image files on the operating system and reading them or copying them into the database.
All of these are examples of requirements that you can meet by using common language runtime (CLR)
integration in Microsoft® SQL Server® data management software. You can use integrated code to
create user-defined functions, stored procedures, aggregates, types, and triggers. You can develop these
objects by using any .NET language and they can be highly specialized. In this module, you will learn
about using CLR integrated code to create user-defined database objects that the .NET Framework
manages.
Objectives
After completing this module, you will be able to:
Lesson 1
Introduction to CLR Integration in SQL Server
Among database professionals, there is a constant desire to extend the built-in functionality of SQL Server.
For example, you might want to add a new aggregate to the existing list of aggregates that SQL Server
supplies. There is no right or wrong method to extend the product. Particular methods are more or less
suited to particular needs and situations. CLR integration in SQL Server is one method for extending SQL
Server. It is important to understand CLR integration in SQL Server and its appropriate use cases.
Lesson Objectives
After completing this lesson, you will be able to:
Managed Code
Managed code is code that is written to operate within the .NET Framework. Some database administrators are concerned about running managed code within the Database Engine, but it is important to realize that even managed code cataloged with the UNSAFE permission set is safer than equivalent extended stored procedure code.
You can create many applications by using the “out-of-the-box” tools and functionality that SQL Server
provides. However, being able to reuse previously developed functionality helps to produce higher quality
outcomes. Therefore, it is desirable to package that reusable functionality as an extension of the SQL
Server product.
Many SQL Server components are extensible. As an example, SQL Server Reporting Services enables you
to create rendering extensions, security extensions, data processing extensions, delivery extensions,
custom code, and external assemblies.
.NET Framework
The .NET Framework is a layer of software that sits above the Win32 and Win64 APIs and abstracts the
underlying complexity. This framework is written in a consistent fashion to a tightly written set of design
guidelines. Many people describe it as appearing to have been “written by one brain.” It is not specific to
any one programming language and also contains many thousands of prebuilt and pretested objects.
These objects are collectively referred to as the .NET Framework class libraries.
These capabilities make the .NET Framework a good base for building code to extend SQL Server.
Security features to ensure that managed code will not compromise the server.
The ability to create new resources by using .NET languages such as Microsoft Visual C#® and
Microsoft Visual Basic® .NET.
Memory Management
A key problem that arose in development directly against the Win32 and Win64 APIs related to memory
management. In older Component Object Model (COM) programming that was used with these APIs,
releasing memory when it was no longer needed was based on reference counting. The idea was that the
following sequence of events would occur:
Object C might then acquire a reference to Object B, too. Object B then notes that it has two
references to itself.
Object C releases its reference. Object B then notes that it has only a single reference to itself.
Object A releases its reference, too. Object B then notes that it now has no references to itself, so it
proceeds to destroy itself.
The problem with this scheme is that it is easy to create situations where memory is lost. For a simple
example, consider circular references. If two objects have references to each other, but no other object
has any reference to either of them, they can both sit in memory forever as long as they have a reference
to each other. This causes a leak (or loss) of the memory that those objects consume. Over time, creation
of such situations could cause the loss of all available memory on the system.
This sort of memory management scheme would not be suitable within the Database Engine. The .NET
Framework includes a sophisticated memory management system that is known as garbage collection. It
is designed to avoid any chance of such memory leaks. Instead of objects needing to count references, the
CLR periodically checks which objects are “reachable” and disposes of the other objects.
Type Safety
Another common problem with Win32 and Win64 code relates to what is known as type safety. When a
function or procedure is called, all that is known to the caller is the address in memory of the function.
The caller assembles a list of any required parameters, places them in an area that is called the stack, and
jumps to the memory address of the function. Problems arise when the design of the function and/or its
parameters change and the calling code is not updated. The function can then end up referring to
memory locations that do not exist.
The .NET CLR is designed to avoid such problems. As an example, in addition to providing details of the
address of a function, it provides details of what is called the signature of a function. This specifies the
data types of each of the parameters and the order that they need to be in. The CLR will not enable a
function to be called with the wrong number or types of parameters. This is referred to as “type safety.”
CLS
The CLS is the common language specification. It specifies the rules that languages must conform to, so
that interoperability between languages is possible. For example, even though it is possible in C# to create
a method called SayHello and another method called Sayhello, these methods could not be called from
another language that was not case-sensitive. The CLS states that, to avoid interoperability problems, you
should not create these two methods.
Although there have been advances in error handling in Transact-SQL in recent years, the error handling
that the Transact-SQL language provides is still well short of the type of error handling that higher-level
languages typically provide. Writing managed code enables you to take advantage of these more
extensive error-handling capabilities.
Stored procedures
User-defined aggregates
Transact-SQL
Transact-SQL is the primary method for
manipulating data within databases. It is designed
for direct data access and offers high performance,
particularly when it is working against very large
sets of data. However, Transact-SQL is not a fully-
fledged high-level programming language.
Managed Code
Managed code provides full object-oriented capabilities, although this only applies within the managed
code itself. Transact-SQL code does not support object-oriented capabilities.
Managed code works well in situations that require intensive calculations (such as encryption) or string
handling.
General Rules
Two good general rules apply when you are making a choice between using Transact-SQL and managed
code:
The more data-oriented the need is, the more likely it is that Transact-SQL will be the better answer.
The more the need is focused on calculation, strings, or external access, the more likely it is that
managed code will be the better answer.
Scalar UDFs
It is well-known that scalar user-defined functions
(UDFs) that are written in Transact-SQL can cause
performance problems in SQL Server environments.
Managed code is often a good option for
implementing scalar UDFs as long as the function
does not depend on data access.
Table-Valued UDFs
The more data-related table-valued UDFs are, the more they are likely to be best implemented in
Transact-SQL. A common use case for managed code in table-valued UDFs is for functions that need to
access external resources such as the file system, environment variables, and the registry.
Stored Procedures
Stored procedures have traditionally been written in Transact-SQL. Most stored procedures should
continue to be written in Transact-SQL. There are very few good use cases for managed code in stored
procedures. The exceptions to this are stored procedures that need to access external resources or
perform complex calculations. However, you should consider whether code that performs these tasks
should be implemented within SQL Server at all.
DML Triggers
Almost all data manipulation language triggers are heavily oriented toward data access and are written in
Transact-SQL. There are very few valid use cases for implementing DML triggers in managed code.
DDL Triggers
Data definition language triggers are again often data-oriented. However, some DDL triggers need to do
extensive XML processing, particularly based on the XML EVENTDATA structure that SQL Server passes to
these triggers. The more that extensive XML processing is required, the more likely it is that the DDL
trigger would be best implemented in managed code. Managed code would also be a better option if the
DDL trigger needed to access external resources, but this is rarely a good idea within any form of trigger.
User-Defined Aggregates
Transact-SQL offers no concept of user-defined aggregates. You need to implement these in managed
code.
Lesson 2
Importing and Cataloging Assemblies
Assemblies are the unit of both deployment and security in the .NET Framework. Managed code in SQL
Server resides within assemblies. Before you can start to work with managed code in SQL Server, you need
to learn about assemblies and how you can import them into SQL Server and secure them.
Lesson Objectives
After completing this lesson, you will be able to:
Detail the permission sets that are available for securing assemblies.
Import an assembly.
What Is an Assembly?
Assemblies are the unit of both deployment and
security in the .NET Framework. They contain the
code that will be executed, are self-describing, and
may contain resources.
Structure of an Assembly
Prior to managed code, executable files (.exe files)
and dynamic-link libraries (.dll files) contained only
executable code. Compilers produce executable
code by converting instructions in higher-level
languages into the binary codes that the
computer’s processor requires for execution.
Managed code assemblies have a specific structure. In addition to executable code, they contain a
manifest. This manifest provides a list of the contents of the assembly and of the programming interfaces
that the assembly provides. This enables other code to interrogate an assembly to determine both what it
contains and what it can do. As an example, SQL Server can gain a great deal of understanding of an
assembly by reading this manifest when it is cataloging an assembly.
Assemblies can contain other resources such as icons. These are also listed in the manifest. You can
structure assemblies as either .exe files or .dll files. The only difference between the two is that .exe files also include an area that is called the portable executable (PE) header, which the operating system uses to find out where the executing code of an .exe file starts. SQL Server will only import .dll files and
will refuse to import .exe files.
Assemblies also form a boundary at which security is applied. In the next topic, you will see how this
security is configured.
SAFE
Administrators should regard SAFE as really meaning what the name says. It is a particularly limited
permission set, but it does allow access to the SQL Server database in which it is cataloged via a special
type of connection that is known as a context connection. Administrators should be comfortable with the
cataloging of SAFE assemblies. SAFE is the default permission set.
EXTERNAL_ACCESS
EXTERNAL_ACCESS is the permission set that is required before code in an assembly can access local and
network resources, environment variables, and the registry of the server. This permission set is still quite
safe and is typically used when any form of external access is required. Administrators should be fairly
comfortable with the cataloging of EXTERNAL_ACCESS assemblies, after a justification for the external
access requirements has been made.
UNSAFE
UNSAFE is the unrestricted permission set. It should be rarely used for general development. UNSAFE is
required for code that calls external unmanaged code or code that holds state across function calls, and
so on. Administrators should only allow the cataloging of UNSAFE assemblies in situations that have been
very carefully considered and justified.
You can flag the database as TRUSTWORTHY by using the ALTER DATABASE statement with the SET TRUSTWORTHY ON option. In general, this is not recommended unless you understand the changes that it makes to the database security environment.
An asymmetric key is created from the assembly file that is cataloged in the master database. Next, a
login mapping to that key is created. Finally, the login is granted the EXTERNAL ACCESS ASSEMBLY
permission on the assembly. This is the recommended method of granting permission to use
EXTERNAL_ACCESS or UNSAFE permission sets, but setting it up is an advanced topic that is beyond
the scope of this course.
Importing an Assembly
Before you can use the code in an assembly within
SQL Server, you must import and catalog the
assembly within a database.
CREATE ASSEMBLY
You can use the CREATE ASSEMBLY statement
both to import and catalog an assembly within the
current database. SQL Server assigns a permission
set to the assembly that is based on the WITH
PERMISSION_SET clause in the CREATE ASSEMBLY
statement. If no permission set is explicitly
requested, the assembly will be cataloged as a SAFE
assembly and the code within the assembly will only
be able to execute tasks that the SAFE permission set permits.
Before you can execute any code in a user-created assembly, you must set the ‘clr enabled’ option to 1
(enabled) at the instance level. It is still possible to catalog an assembly and the objects within it even if
this option is disabled. It only prevents code execution.
After the assembly is cataloged in the database, the contents of the assembly are contained within the
database and SQL Server no longer needs the file from which it was cataloged. After the assembly is
cataloged, it will be loaded from within the database when it is required, not from the file system.
Assembly Path
There are three locations from which an assembly can be imported:
1. A .dll file on a local drive. The drive cannot be a mapped drive.
2. A .dll file from a Universal Naming Convention (UNC) path. (A UNC path is of the form
\\SERVER\Share\PathToFile\File.dll.)
3. A binary string that contains the contents of the .dll file.
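The import steps described above can be sketched as follows; the assembly name and file path are illustrative, and SAFE is the default permission set:

```sql
-- Enable CLR code execution at the instance level
-- (cataloging works without it; execution does not).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

-- Import and catalog the assembly from a local .dll file.
CREATE ASSEMBLY SQLCLR_Demo
FROM 'D:\Demofiles\Mod12\SQLCLR_Demo.dll'
WITH PERMISSION_SET = SAFE;
```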
At first, it might seem odd to consider cataloging an assembly from a binary string, but this is how Visual
Studio catalogs assemblies if you deploy an assembly directly from Visual Studio. Visual Studio cannot
assume that you have access to the file system of the server. You might be using an instance of SQL Server
or using a database that a hosting company is hosting and have no access to the file system of the server
at all.
Cataloging an assembly from a binary string enables you to stream an assembly to the server within the
CREATE ASSEMBLY statement. It is worth noting that, if you later generate a script for the database, any
contained assemblies will also be scripted as binary strings.
Demonstration Steps
Import and catalog an assembly
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod12\Demo12.ssmssln, and then click Open.
8. Follow the instructions contained within the comments of the script file.
Lesson 3
Implementing CLR Integration in SQL Server
After an assembly has been cataloged, you also need to catalog any objects within it. This will make the
objects visible within SQL Server so that they can be called from within Transact-SQL code.
Lesson Objectives
After completing this lesson, you will be able to:
Explain how appropriate attribute usage is important when you are creating assemblies.
Implement stored procedures that have been written in managed code and that require access to
external resources.
Implement user-defined data types that have been written in managed code.
Take into account considerations for user-defined data types that have been written in managed
code.
Attribute Usage
Attributes are metadata that is included within code
and is used to describe that code. When you are
implementing managed code within SQL Server,
attributes are used for reasons of deployment,
performance, and correctness.
Attributes
If you have not written any managed code, the
concept of attributes may be unfamiliar to you.
Attributes are metadata (or data about data) that is
used to describe functions, methods, and classes.
Attributes do not form part of the logic of the
objects; instead, they describe aspects of them.
For example, consider an attribute that records the name of the author of a method. This does not change
how the method operates, but it could be useful information for anyone who uses the method. The .NET
Framework also has a special set of logic called Reflection that enables one set of managed code to
interrogate details of another set of managed code. Attributes are returned as part of this process. SQL
Server accesses the attributes that you associate with your code through reflection.
Deployment
The first reason why attributes are helpful relates to deployment. Adding a SqlFunction attribute to a
managed code method tells Visual Studio (or other code that is used for deployment) that the method
should be cataloged as a function within SQL Server. Adding an attribute to a method is also referred to
as “adorning” the method with the attribute.
If you do not add a SqlFunction attribute to a method, you can still manually catalog the method as a
function in SQL Server. The limitation is that automated deployment systems will not know to do so.
You might wonder why SQL Server does not just automatically catalog all methods as functions when it
catalogs an assembly. The reason is that you can use methods for more than just functions. Some
methods are only used within the assembly and are not intended to be used by code that utilizes the
functionality that the assembly provides.
Performance
The second reason why attributes are helpful relates to performance. Consider the DataAccess property
of the SqlFunction attribute that is shown on the slide. This property tells SQL Server that no data context
needs to be provided for this method. It does not access data from the database. This makes the function
quicker to execute and reduces its memory requirements.
As another example of how an attribute can help with performance, consider an attribute that tells SQL
Server that a method call always returns NULL if the parameter that is passed to the method is NULL. In
that case, SQL Server knows that it does not need to call the method at all if the value is NULL.
Correctness
The final reason why attributes are helpful relates to correctness. If a new Circle data type is created, it
might provide a method that is called Shrink. SQL Server needs to know that if this method is called, the
internal state of the user-defined data type will be changed when the method returns. This helps SQL
Server to know how the method can be used. For example, SQL Server would then know that the method
could be called in the SET clause of an UPDATE statement. It would also prevent SQL Server from
enabling the method to be called in a SELECT list or WHERE clause in a SELECT statement.
Scalar UDFs
Scalar user-defined functions are a common use
case for managed code and often offer a higher-
performing alternative to their equivalent Transact-
SQL functions.
CREATE FUNCTION
You can use the CREATE FUNCTION statement to
catalog a scalar user-defined function that has been
written in managed code. In the statement, you
need to provide the details of the returned data
type and a path to the method within the assembly.
Note that the name that a function is called within
SQL Server does not have to match the name that
the method is called within the assembly. However, it is considered good practice to have these matched
with each other to avoid confusion.
EXTERNAL NAME
When you are cataloging the function, the EXTERNAL NAME clause is used to point to where the method
exists within the assembly. This normally consists of a three-part name:
The first part of the name refers to the alias for the assembly that was used in the CREATE
ASSEMBLY statement.
The second part of the name must contain the namespace that contains the method. In the example
on the slide, UserDefinedFunctions is a class. However, the UserDefinedFunctions class itself could
be contained within another namespace. If that other namespace was called CompanyFunctions, the
second part of the name would need to be specified as
[CompanyFunctions.UserDefinedFunctions].
The third part of the name refers to the method within the class.
Note that even if the code has been built in a case-insensitive language such as Visual Basic, and the
database collation is set to case-insensitive, the assembly name that is provided in the EXTERNAL NAME
clause is case-sensitive.
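To make the three-part naming concrete, the following sketch catalogs a scalar function whose class sits inside a containing namespace. The assembly alias, function name, and method name here are illustrative, not taken from the course files.

```sql
-- Illustrative names: SQLCLR_Demo is the alias from CREATE ASSEMBLY,
-- [CompanyFunctions.UserDefinedFunctions] is the namespace-qualified class,
-- and IsWeekend is the static method within that class.
CREATE FUNCTION dbo.IsWeekend (@DateToCheck date)
RETURNS bit
AS EXTERNAL NAME
    SQLCLR_Demo.[CompanyFunctions.UserDefinedFunctions].IsWeekend;
```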
Table-Valued UDFs
Table-valued functions (TVFs) are cataloged in a
similar way to scalar functions, but they need to
include the definition of the returned table.
CREATE FUNCTION
You can also use the CREATE FUNCTION
statement to catalog TVFs that are written in
managed code. The return data type, however,
must be TABLE. After the data type, you need to
provide the definition of the schema of the table. In
the example shown on the slide, the table consists
of two columns, both of integer data type.
Deployment Attribute
The definition of TVFs provides an example of why the properties of an attribute are useful. First, the
SqlFunction attribute indicates that the method should be cataloged as a function. The properties of the
attribute indicate:
The name of the FillRow method. (Do not be concerned with the FillRowMethodName property at
this point. Although it must be present, it relates to the internal design of the function.)
The schema for the returned table. An automated deployment system (such as the one provided in
Visual Studio) needs to know the format of the returned table to be able to automatically catalog this
function in SQL Server.
Parameter Naming
The names that you choose for the parameter in Transact-SQL do not need to match the names that you
use in the managed code.
MCT USE ONLY. STUDENT USE PROHIBITED
12-16 Implementing Managed Code in SQL Server
For example, you could catalog the function in the example on the slide in the following way:
Parameter Naming
CREATE FUNCTION dbo.RangeOfIntegers
(@StartValue int, @EndValue int)
RETURNS TABLE (PositionInList int, IntegerValue int)
AS EXTERNAL NAME
SQLCLR_Demo2.UserDefinedFunctions.RangeOfIntegers
However, you should create Transact-SQL parameters that have the same name as the parameters in the
managed code unless there is a compelling reason to make them different. An example of this would be a
parameter name that was used in managed code that was not a valid parameter name in Transact-SQL.
Even in this situation, a better option would be to change the parameter names in the managed code
wherever possible.
CREATE PROCEDURE
You can use the CREATE PROCEDURE statement to
catalog a stored procedure that is written in
managed code. The relevant deployment attribute
is the SqlProcedure attribute. This attribute tells
Visual Studio (or any other deployment tool) that
the method should be cataloged as a stored procedure.
You should list parameters that need to be passed to the stored procedure in the same way that they are
listed for a Transact-SQL stored procedure definition.
SqlPipe
Stored procedures that are written in managed code support both input and output parameters, just like
their equivalent procedures in Transact-SQL.
Like stored procedures that are written in Transact-SQL, stored procedures that are written in managed
code need a way to return rows of data. You use the SqlPipe object within the stored procedure code to
achieve this; the SqlPipe object can send rows of data back to the caller.
If you call the Send method of the SqlPipe object and pass a string value to it, the outcome is the same
as if you had issued a PRINT statement in a Transact-SQL–based stored procedure. You will see the values
returned on the Messages tab in SQL Server Management Studio.
You can see the SqlPipe object used in the following code example:
SqlPipe
using System;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    [SqlProcedure]
    public static void ProductsByColor(SqlString Color)
    {
        // The context connection reuses the session in which the
        // procedure is executing, rather than opening a new connection.
        SqlConnection conn =
            new SqlConnection("context connection=true");
        SqlCommand command = conn.CreateCommand();
        SqlPipe outputPipe = SqlContext.Pipe;
        // Sending a string behaves like a Transact-SQL PRINT statement.
        outputPipe.Send("Hello. It's now " +
            DateTime.Now.ToLongTimeString() + " at the server.");
        if (Color.IsNull)
        {
            command.CommandText =
                "SELECT * FROM Production.Product "
                + "WHERE (Color IS NULL) ORDER BY ProductID";
        }
        else
        {
            command.CommandText =
                "SELECT * FROM Production.Product "
                + "WHERE (Color = @Color) ORDER BY ProductID";
            command.Parameters.Add(
                new SqlParameter("@Color", Color.Value));
        }
        conn.Open();
        // Sending a data reader returns its rows as a result set.
        outputPipe.Send(command.ExecuteReader());
        conn.Close();
    }
}
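Assuming the assembly was cataloged with an alias such as SQLCLR_Demo (an illustrative name), the ProductsByColor method could then be cataloged as a stored procedure in the following way:

```sql
-- The Transact-SQL parameter maps to the SqlString argument
-- of the managed method.
CREATE PROCEDURE dbo.ProductsByColor
    @Color nvarchar(15)
AS EXTERNAL NAME
    SQLCLR_Demo.StoredProcedures.ProductsByColor;
```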
Access to the file system requires EXTERNAL_ACCESS permission when the assembly that contains the
method is cataloged.
The xp_cmdshell extended stored procedure is disabled by default in SQL Server, yet many applications
require it to be enabled because it enables them to perform operations at the file-system level. Enabling
xp_cmdshell is undesirable from a security perspective, and managed code provides alternative ways to
implement this required functionality in a much safer form.
Triggers
You can implement both DML and DDL triggers
from within managed code.
CREATE TRIGGER
You can use the CREATE TRIGGER statement to
catalog methods in managed code assemblies as
triggers. The relevant deployment attribute is
SqlTrigger. The SqlTrigger attribute properties
that are most useful are:
Access to Modifications
Like triggers that are written in Transact-SQL, triggers that are written in managed code can access the
details of the changes being made or the commands that have been executed.
Within DML triggers, access is provided to the inserted and deleted virtual tables in exactly the same way
as in DML triggers that are written in Transact-SQL.
Similarly, within DDL triggers, access is provided to the XML EVENTDATA structure.
SqlTriggerContext
A DML trigger can be associated with multiple events on an object. Within the code of a DML trigger, you
may need to know which event has caused the trigger to fire. You can use the SqlTriggerContext class to
build logic based on the event that caused the trigger to fire.
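As a sketch, a managed DML trigger might be cataloged as follows; the assembly alias, class, and method names are illustrative:

```sql
-- Catalogs a managed method as an AFTER trigger for all three DML events.
CREATE TRIGGER dbo.AuditProductChanges
ON Production.Product
AFTER INSERT, UPDATE, DELETE
AS EXTERNAL NAME
    SQLCLR_Demo.Triggers.AuditProductChanges;
```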
User-Defined Aggregates
User-defined aggregates are an entirely new type of
object for SQL Server; you cannot create them in
Transact-SQL. The ability to create aggregates
enables you to provide additional aggregates that
the built-in set of aggregates does not provide. For
example, you might decide that you need a
MEDIAN aggregate, but SQL Server does not supply
one. Another good use case for creating aggregates
occurs when you are migrating code from another
database engine that offers aggregates that differ
from those that SQL Server provides. You could also
create aggregates to operate on data types that the
built-in aggregates do not support.
CREATE AGGREGATE
You can use the CREATE AGGREGATE statement to catalog user-defined aggregates that are written in
managed code. The relevant deployment attribute is SqlUserDefinedAggregate. Note that the path to a
struct or class will be a two-part name, as shown in the EXTERNAL NAME clause on the slide.
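For example, a hypothetical MEDIAN aggregate (all names here are illustrative) would be cataloged with a two-part path to the struct:

```sql
-- Two-part EXTERNAL NAME: the assembly alias, then the struct that
-- implements the aggregation contract (Init, Accumulate, Merge, Terminate).
CREATE AGGREGATE dbo.Median (@Value int)
RETURNS int
EXTERNAL NAME SQLCLR_Demo.[Median];
```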
Serialization
SQL Server needs to be able to store interim results while it calculates the value of an aggregate. In
managed code, the ability to save an object as a stream of data is called “serializing” the object. User-
defined aggregates need to be serializable. In managed code, you can implement them as either classes
or structs (data structures). Most user-defined aggregates would be implemented as structs rather than as
classes, because structs are easier to implement.
The Format.Native property that is shown in the example on the slide indicates that the struct will be
serialized by using the standard serialization mechanisms that are built in to the .NET Framework. You can
only use the built-in serialization with simple data types. For more complex data types, you need to add
user-defined serialization.
Attribute Properties
A few more useful attribute properties are shown in the example on the slide.
IsInvariantToDuplicates. This attribute property tells SQL Server that the result of the aggregate is
the same even if it does not see the values from every row. It only needs to see unique values. To
visualize this, consider which rows the built-in MAX or MIN aggregates need to process and how this
compares to which rows the built-in COUNT aggregate needs to see.
IsInvariantToNulls. This attribute property tells SQL Server that the result of the aggregate is
unaffected by seeing rows that do not have a value in the relevant column.
IsNullIfEmpty. This attribute property tells SQL Server that if no rows need to be processed, the
aggregate does not need to be called because the result will be NULL anyway.
Name. This attribute property tells Visual Studio (or any other deployment tool) what name the
aggregate should have when it is cataloged.
Note: This is not a complete list of all the possible properties, just the most useful ones.
CREATE TYPE
You can use the CREATE TYPE statement to catalog
user-defined data types. The data type will be
defined as a class in a managed code assembly.
Similar to user-defined aggregates, data types need
to be serializable because SQL Server needs to be able to store them. The deployment attribute is
SqlUserDefinedType.
The geometry, geography, and hierarchyid system data types are, in fact, system CLR data types. Their
operation is unrelated to the ‘clr enabled’ configuration setting at the SQL Server instance level. The ‘clr
enabled’ option only applies to user-created managed code.
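A minimal sketch of cataloging a user-defined data type follows; the assembly alias and class name are illustrative:

```sql
-- Catalogs a serializable managed class as a data type.
-- The path to the class is a two-part name: assembly alias, then class.
CREATE TYPE dbo.ComplexNumber
EXTERNAL NAME SQLCLR_Demo.[ComplexNumber];
```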
Accessing Properties
InstanceOfTheType.Property, for example, @Shape.STArea
The methods of an instance of a managed code data type are accessed by using the code in the following
example:
Accessing Methods
InstanceOfTheType.Method(), for example, @Shape.STDistance(@OtherShape)
Managed code data types might also include functionality that is useful without creating an object of the
data type first. This enables you to expose functions from within a data type somewhat like a code library.
The methods of the managed code data type itself are accessed by using the code in the following
example:
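A sketch of the static-method syntax, using the built-in geometry type as the example:

```sql
-- TypeName::StaticMethod(): no instance of the type is needed first.
DECLARE @Point geometry = geometry::STGeomFromText('POINT(1 1)', 0);
SELECT @Point.STX;
```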
Note: The one exception to this is that binary comparisons are permitted when the
IsByteOrdered property of the SqlUserDefinedType attribute is set to true. Even in this
situation, only a simple binary comparison is performed.
For example, you cannot compare two geometry data types by using the code that is shown in the
example below:
However, you can compare the properties of the two data types by using the code that is shown in the
example below:
For user-defined data types, there is no method for creating new types of index to support them. What
you can do is create a persisted calculated column in the same table and use it to “promote” the
properties of the user-defined data type into standard relational columns. You can then index these
columns.
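As a sketch, assuming a table with a geometry column (the table, column, and index names are illustrative), a property could be promoted and indexed like this:

```sql
-- Promote a value from the CLR type into a persisted computed column...
ALTER TABLE dbo.Shapes
ADD ShapeArea AS ShapeData.STArea() PERSISTED;

-- ...and then index the promoted relational column.
CREATE INDEX IX_Shapes_ShapeArea ON dbo.Shapes (ShapeArea);
```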
Operator Overloading
In object-oriented programming, it is possible to define or change the operators that operate on the
object. User-defined data types do not offer this capability. For example, you cannot define a customized
meaning for a > (greater than) operator.
Demonstration Steps
Create aggregates and user-defined data types
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
2. Run D:\Demofiles\Mod12\Setup.cmd as an administrator to revert any changes.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
6. If the previous demonstration was not performed, open the 21 - Demonstration 2A.sql script file
and execute steps 1 to 3.
10. Close SQL Server Management Studio without saving any changes
Supporting Documentation
The following list details the proposed functionality that is being considered for managed code.
Trigger that records balance movements that have a value of more than 1,000.
Stored procedure that writes an XML file for a given XML parameter.
Objectives
After completing this lab, you will be able to:
Password: Pa$$w0rd
2. For each object that is listed, determine whether it is appropriate to implement it in managed code.
Supporting Documentation
The following list details the proposed functionality being considered for managed code.
Trigger that records balance movements with a value of more than 1000.
Stored procedure that writes an XML file for a given XML parameter.
2. In the D:\Labfiles\Lab12\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
Results: After this exercise, you should have created a list of which objects should be implemented in
managed code and the reasons for your decision.
2. Catalog the assembly and the functions that are contained within it.
2. Query the sys.assemblies and sys.assembly_files system views to confirm the details of how the
assembly has been cataloged.
3. Use the CREATE FUNCTION statement to catalog the dbo.IsValidEmailAddress function. It takes a
parameter named @email of type NVARCHAR(4000) and returns one bit. It is found in the assembly
at SQLCLRDemo.[SQLCLRDemo.CLRDemoClass].IsValidEmailAddress.
5. Use the CREATE FUNCTION statement to catalog the dbo.FolderList function. It takes two
parameters: @RequiredPath of type NVARCHAR(4000) and @FileMask of type NVARCHAR(4000).
It returns a table of file names, with one column called FileName of type NVARCHAR(4000). It is
found in the assembly at SQLCLRDemo.[SQLCLRDemo.CLRDemoClass].FolderList.
SELECT dbo.IsValidEmailAddress('test@somewhere.com');
GO
SELECT dbo.IsValidEmailAddress('test.somewhere.com');
GO
SELECT dbo.FormatAustralianPhoneNumber('0419201410');
GO
SELECT dbo.FormatAustralianPhoneNumber('9 87 2 41 23');
GO
SELECT dbo.FormatAustralianPhoneNumber('039 87 2 41 23');
GO
SELECT * FROM dbo.FolderList(
'D:\Labfiles\Lab12\Starter','*.txt');
GO
Results: After this exercise, you should have three functions working as expected.
Review Question(s)
Question: Which types of database objects can you implement by using managed code?
Module 13
Storing and Querying XML Data in SQL Server
Contents:
Module Overview 13-1
Lesson 2: Storing XML Data and XML Schemas in SQL Server 13-9
Module Overview
XML provides rules for encoding documents in a machine-readable form. It has become a widely adopted
standard for representing data structures rather than sending unstructured documents. Servers that are
running Microsoft® SQL Server® data management software often need to use XML to interchange data
with other systems and many SQL Server tools provide an XML-based interface.
SQL Server offers extensive handling of XML both for storage and for querying. This module introduces
XML, shows how it is possible to store XML data within SQL Server, and shows how to query the XML data.
The ability to query XML data directly avoids the need to shred it to a relational format before executing
Structured Query Language (SQL) queries. To effectively process XML, you need to be able to query XML
data in several ways: returning existing relational data as XML, querying data that is already XML, and
shredding XML data into a relational format.
Objectives
After completing this module, you will be able to:
Lesson 1
Introduction to XML and XML Schemas
Before you discover how to work with XML in SQL Server, it is important to understand XML itself and
how it is used outside SQL Server. You need to understand some core XML-related terminology, along
with how you can use schemas to validate and enforce the structure of XML. One common problem with
using XML in SQL Server is a tendency to overuse it. It is important to understand the appropriate uses for
XML when you are working with SQL Server.
Lesson Objectives
After completing this lesson, you will be able to:
Determine appropriate use cases for XML data storage in SQL Server.
Data Interchange
XML came to prominence as a format for
interchanging data between systems. It follows the
same basic structure rules as other markup
languages (such as HTML) and is used as a self-
describing language.
XML Document
<?xml version="1.0" encoding="iso-8859-1" ?>
<?xml-stylesheet href="orders.xsl"?>
<order id="ord123456">
<customer id="cust0921">
<first-name>Dare</first-name>
<last-name>Obasanjo</last-name>
<address>
<street>One Microsoft Way</street>
<city>Redmond</city>
<state>WA</state>
<zip>98052</zip>
</address>
</customer>
</order>
Even without any additional context or information, you can determine that this document holds the
details of an order, the customer who placed the order, and the customer’s name and address. This is
why XML is described as a self-describing language. In formal terminology, inferring the structure in this
way is described as “deriving a schema” from a document.
XML Specifics
The line in the example document that starts with “?xml” is referred to as a processing instruction. These
instructions are not a part of the data, but determine the details of encoding. The first instruction in the
example shows that version “1.0” of the XML specification is being used along with a specific encoding of
“iso-8859-1.” The second instruction indicates the use of the extensible style sheet “orders.xsl” to format
the document for display, if displaying the document is necessary.
The third line of the example is the “order” element. Note that the document data starts with an opening
order element and finishes with a closing order element shown as "</order>". The order element also has
an associated attribute named “id.”
It is important to realize that elements in XML (as in most other markup languages) are case-sensitive.
Element-Centric XML
<Customer>
<Name>Tailspin Toys</Name>
<Rating>12</Rating>
</Customer>
Attribute-Centric XML
<Customer Name="Tailspin Toys" Rating="12">
</Customer>
Note that if all data for an element is contained in attributes, a shortcut form of element is available.
Attribute-Centric Shortcut
<Customer Name="Tailspin Toys" Rating="12"></Customer>
<Customer Name="Tailspin Toys" Rating="12" />
XML Document
<order id="ord123456">
<customer id="cust0921" />
</order>
This code provides the details for a single order and would be considered to be an XML document.
XML Fragment
<order id="ord123456">
<customer id="cust0921" />
</order>
<order id="ord123457">
<customer id="cust0925" />
</order>
This text contains the details of multiple orders. Although it is perfectly reasonable XML, it is considered to
be a fragment of XML rather than a document.
To be called a document, the XML needs to have a single root element, as shown in the following
example.
Single Root
<orders>
<order id="ord123456">
<customer id="cust0921" />
</order>
<order id="ord123457">
<customer id="cust0925" />
</order>
</orders>
XML Namespaces
An XML namespace is a collection of names that
you can use as element or attribute names. It is
used to avoid conflicts with other names. Imagine
an XML instance that contains references to both a
product and an order. Both of these elements could
have a child element called id, so any reference to
the id element could easily be ambiguous.
Namespaces are used to remove that ambiguity.
XML Namespace
xmlns="http://schemas.microsoft.com/sqlserver/profiles/gml"
Note that specifying an address in a namespace does not necessarily mean that you could use the URI
that is provided to retrieve the details in any particular format. Many URIs that are used in namespaces
only link to an address where a human-readable description of the namespace is found. Many other URIs
do not lead to any real resources at all. The URI is simply used as a unique identifier for the namespace to
reduce the possibility of duplicate entries.
Prefixes
When you are declaring a namespace, an alias for the namespace is assigned. In XML terminology, this
alias is called a “prefix” because of the way it is used within the remainder of the XML.
XML Prefix
xmlns="urn:AW_NS" xmlns:o="urn:AW_OrderNS"
Two namespaces have been declared. The second namespace has been assigned the prefix o.
The prefix is then used later to identify which namespace any element name is part of, as shown below.
Using Prefixes
<o:Order SalesOrderID="43860" Status="5"
OrderDate="2001-08-01T00:00:00">
<o:OrderDetail ProductID="761" Quantity="2"/>
<o:OrderDetail ProductID="770" Quantity="1"/>
</o:Order>
In this snippet, the Order and OrderDetail elements are identified as being part of the urn:AW_OrderNS
namespace by being prefixed by o.
XML Schemas
XML schemas are used to provide rules that
determine the specific elements, attributes, and
layout that should be permitted within an XML
document.
XML schemas are often referred to as XML Schema Definitions (XSDs). XSD is also the default file
extension that most products use when they are storing XML schemas in operating system files.
There is no suggestion that this would make for a good database design, but note that you could use this
table design to store all objects from an application—customers, orders, payments, and so on—in a single
table. Compare this to how tables have been traditionally designed in relational databases.
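A minimal sketch of such a single-table design (illustrative only, and not a recommendation) might look like this:

```sql
-- One table holds every kind of application object as XML.
CREATE TABLE App.BusinessObjects
( ObjectID int IDENTITY(1,1) PRIMARY KEY,
  ObjectType sysname NOT NULL,   -- e.g. 'Customer', 'Order', 'Payment'
  ObjectData xml NOT NULL
);
```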
SQL Server gives the developer a wide range of choices, from a simple XML design at one end of the
spectrum to fully normalized relational tables at the other end. It is important to understand that there is
no generic right and wrong answer for where a table should be designed in this range of options.
You may be dealing with data that is already in XML, such as an order that you are receiving
electronically from a customer. You may want to share, query, and modify the XML data in an
efficient and transacted way.
You may need to achieve a level of interoperability between your relational and XML data. Imagine
the need to join a customer table with a list of customer IDs that are being sent to you as XML.
You may need to use XML formats to achieve cross-domain applications and need to have maximum
portability for your data. Other systems that you are communicating with may be based on entirely
different technologies and may not represent data in the same way as your server.
You may not know the structure of your data in advance. It is common to have a mixture of
structured and semistructured data. A table might hold some standard relational columns, but also
hold some less structured data in XML columns.
You may have very sparse data. Imagine a table that has thousands of columns where only a few
columns or rows ever tend to have any data in them. (Sparse column support in SQL Server provides
another mechanism for dealing with this situation, but it also uses XML in the form of XML column
sets. Sparse columns are an advanced topic that is beyond the scope of this course.)
You may need to have order within your data. For example, you might need to retain order detail
lines in a specific order. Relational tables and views have no implicit order. XML documents can
exhibit a predictable order.
You may want to have SQL Server validate that your XML data meets a particular XML schema before
processing it.
You may want to store transferred XML data for historical reasons.
You may want to create indexes on your XML data to make it faster to query.
Demonstration Steps
Structure XML and structure XML schemas
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod13\Demo13.ssmssln, and then click Open.
8. Follow the instructions contained within the comments of the script file.
Lesson 2
Storing XML Data and XML Schemas in SQL Server
Now that you have learned about XML, schemas, and the surrounding terminology, you can consider how
to store XML data and schemas within SQL Server. This is the first step in learning how to process XML
effectively within SQL Server.
You need to see how the XML data type is used, how to define schema collections that contain XML
schemas, how to declare both typed and untyped variables and database columns, and how to specify
how well-formed the XML data needs to be before it can be stored.
Lesson Objectives
After completing this lesson, you will be able to:
Choose whether XML fragments can be stored rather than entire XML documents.
XML Variable
DECLARE @Settings xml;
After you have declared a variable that has the xml data type, you can store any well-formed XML in it by
default.
Well-Formed XML
SET @Settings = '<Customer Name="Terry"></Customer>';
SET @Settings = '<Customer Name="Terry"><Customer>';
The first assignment would be successful and the second assignment would fail because the value that is
being assigned there is not well-formed XML.
Canonical Form
SQL Server stores XML data in an internal format that makes it easier for it to process the XML data when
required. It does not store the XML in the same format (including white space) as the data was received in.
Canonical Form
DECLARE @Settings xml;
SET @Settings = N'<Customer Name="Terry"></Customer>';
SELECT @Settings;
<Customer Name="Terry"/>
Note that the output that is returned is logically equivalent to the input, but the output is not in exactly
the same format as the input. It is referred to as having been returned in a “canonical” or logically
equivalent form.
XML Schemas
XML schemas are legible to humans at some level, but they are designed to be processed by computer
systems. Even simple schemas tend to have quite a high level of complexity. Fortunately, you do not need
to be able to read (or worse, write!) such schemas. Tools and utilities generally create XML schemas, and
SQL Server can create them, too. You will see an example of this in a later lesson.
XML Schema
<xsd:schema targetNamespace="urn:schemas-microsoft-com:sql:SqlRowSet1"
xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet1"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:sqltypes=
"http://schemas.microsoft.com/sqlserver/2004/sqltypes"
elementFormDefault="qualified">
<xsd:import namespace=
"http://schemas.microsoft.com/sqlserver/2004/sqltypes"
schemaLocation="http://schemas.microsoft.com/
sqlserver/2004/sqltypes/sqltypes.xsd" />
<xsd:element name="Production.Product">
<xsd:complexType>
<xsd:attribute name="ProductID" type="sqltypes:int"
use="required" />
<xsd:attribute name="Name" use="required">
<xsd:simpleType sqltypes:sqlTypeAlias=
"[AdventureWorks].[dbo].[Name]">
<xsd:restriction base="sqltypes:nvarchar"
sqltypes:localeId="1033" sqltypes:sqlCompareOptions=
"IgnoreCase IgnoreKanaType IgnoreWidth">
<xsd:maxLength value="50" />
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
<xsd:attribute name="Size">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nvarchar"
sqltypes:localeId="1033" sqltypes:sqlCompareOptions=
"IgnoreCase IgnoreKanaType IgnoreWidth">
<xsd:maxLength value="5" />
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
<xsd:attribute name="Color">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nvarchar"
sqltypes:localeId="1033" sqltypes:sqlCompareOptions=
"IgnoreCase IgnoreKanaType IgnoreWidth">
<xsd:maxLength value="15" />
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:complexType>
</xsd:element>
</xsd:schema>
You create an XML schema collection by using the CREATE XML SCHEMA COLLECTION syntax that is
shown in the following code snippet.
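A minimal example follows; the collection name matches the one used later in this lesson, and the schema content is illustrative:

```sql
CREATE XML SCHEMA COLLECTION SettingsSchemaCollection AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Customer">
      <xsd:complexType>
        <xsd:attribute name="Name" type="xsd:string" use="required" />
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>';
```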
System Views
You can see the details of the existing XML schema collections by querying the
sys.xml_schema_collections system view. You can see the details of the namespaces that are referenced by
XML schema collections by querying the sys.xml_schema_namespaces system view. Like XML, XML schema
collections are not stored in the format that you use to enter them. They are stripped into an internal
format.
You can get an idea of how XML schema collections are stored by querying the
sys.xml_schema_components system view, as shown in the following code example.
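For example, the following sketch joins the stored components to their owning collections:

```sql
SELECT xsc.name AS collection_name,
       comp.name AS component_name,
       comp.kind_desc
FROM sys.xml_schema_components AS comp
JOIN sys.xml_schema_collections AS xsc
    ON comp.xml_collection_id = xsc.xml_collection_id;
```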
Untyped XML
You may choose to store any well-formed XML.
One reason is that you might not have a schema for
the XML data. Another reason is that you might
want to avoid the processing overhead that is
involved in validating the XML against the XML
schema collection. For complex schemas, validating
the XML can involve substantial work.
The following example shows the creation of a table that has an untyped XML column.
Untyped XML
CREATE TABLE App.Settings
( SessionID int PRIMARY KEY,
WindowSettings xml
);
You can store any well-formed XML in the WindowSettings column, up to the maximum size of a SQL
Server XML object, which is currently 2 GB.
Typed XML
You may want to have SQL Server validate your data against a schema. You might want to take advantage
of storage and query optimizations based on the type information or want to take advantage of this type
information during the compilation of your queries.
The following example shows the same table being created, but this time, it has a typed XML column.
Typed XML
CREATE TABLE App.Settings
( SessionID int PRIMARY KEY,
WindowSettings xml (SettingsSchemaCollection)
);
In this case, a schema collection that is called SettingsSchemaCollection has been defined. SQL Server will
not enable data to be stored in the WindowSettings column if it does not meet the requirements of at
least one of the XML schemas in SettingsSchemaCollection.
This is equivalent to defining the table by using the following code because the CONTENT keyword is the
default value for typed XML declarations.
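Because CONTENT is the default for typed XML declarations, the equivalent explicit definition of the same table is:

```sql
CREATE TABLE App.Settings
( SessionID int PRIMARY KEY,
  WindowSettings xml (CONTENT SettingsSchemaCollection)
);
```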
Note the addition of the CONTENT keyword. When CONTENT is specified, you can store XML fragments
and entire well-formed XML documents in the typed XML location.
DOCUMENT Keyword
CREATE TABLE App.Settings
( SessionID int PRIMARY KEY,
WindowSettings xml (DOCUMENT SettingsSchemaCollection)
);
In this case, XML fragments would not be able to be stored in the WindowSettings column. Only well-
formed XML documents could be stored. For example, a column that is intended to store a customer
order can then be presumed to actually hold a customer order and not some other type of XML
document.
Demonstration Steps
Work with typed and untyped XML
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
7. Follow the instructions contained within the comments of the script file.
Lesson 3
Implementing XML Indexes
Indexes on XML columns are critical for achieving the high performance of XML-based queries. There are
four types of XML index: a primary index and three types of secondary index. It is important to know how
you can use each of them to achieve the maximum performance gain for your queries.
Lesson Objectives
After completing this lesson, you will be able to:
It is important to note that XML indexes can be quite large compared to the underlying XML data.
Relational indexes are often much smaller than the tables on which they are built, but it is not uncommon
to see XML indexes that are larger than the underlying data.
You should also consider alternatives to XML indexes. Promoting a value that is stored within the XML to
a persisted calculated column would make it possible to use a standard relational index to quickly locate
the value.
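The promotion pattern can be sketched as follows. Because computed columns cannot call xml data type methods directly, the value is wrapped in a schema-bound scalar function; the /Settings/Theme path and all object names here are hypothetical:

```sql
-- Hypothetical wrapper function that extracts a value from the stored XML.
CREATE FUNCTION App.GetThemeName (@settings xml)
RETURNS nvarchar(50)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @settings.value('(/Settings/Theme)[1]', 'nvarchar(50)');
END;
GO

-- Promote the value to a persisted computed column and index it relationally.
ALTER TABLE App.Settings
ADD ThemeName AS App.GetThemeName(WindowSettings) PERSISTED;

CREATE INDEX IX_Settings_ThemeName ON App.Settings (ThemeName);
```

Queries that filter on ThemeName can then use an ordinary relational index instead of an XML index.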
13-16 Storing and Querying XML Data in SQL Server
Based on the App.Settings table that was used as an example earlier, you could create a primary XML
index by executing the following code.
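A sketch of such a statement; the index name is illustrative:

```sql
CREATE PRIMARY XML INDEX PXML_Settings_WindowSettings
ON App.Settings (WindowSettings);
```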
A PATH index supports queries that check whether a particular path to an element or attribute exists. It is
typically used with the exist() XQuery method. (XQuery is discussed in a later lesson in this module.)
A PROPERTY index is used when retrieving multiple values through PATH expressions.
You can only create a secondary XML index after a primary XML index has been established.
When you are creating the secondary XML index, you need to reference the primary XML index.
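For example, a secondary PATH index might be declared as shown below; the index names are illustrative, and the USING XML INDEX clause supplies the required reference to the primary XML index:

```sql
CREATE XML INDEX IXML_Settings_WindowSettings_Path
ON App.Settings (WindowSettings)
USING XML INDEX PXML_Settings_WindowSettings
FOR PATH;
```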
Demonstration Steps
Implement XML indexes
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod13\Demo13.ssmssln, and then click Open.
7. Follow the instructions contained within the comments of the script file.
Lesson 4
Using the Transact-SQL FOR XML Statement
There is a common requirement to return data that is stored in relational database columns as XML
documents. Typically, this requirement relates to the need to exchange data with other systems, including
those from other organizations. When you add the FOR XML clause to a Transact-SQL SELECT statement,
it causes the output to be returned as XML instead of as a relational rowset. SQL Server provides several
modes for the FOR XML clause to enable the production of many styles of XML document.
Lesson Objectives
After completing this lesson, you will be able to:
3. EXPLICIT mode enables you to have more control over the shape of the XML. You can use it when
other modes do not provide enough flexibility, but this is at the cost of greater complexity. You can
mix attributes and elements as you like in deciding the shape of the XML.
4. PATH mode together with the nested FOR XML query capability provides much of the flexibility of
the EXPLICIT mode in a simpler manner.
1 A. Leonetti SC
2 A. Wright GC
3 A. Scott Wright EM
4 Aaron Adams IN
5 Aaron Alexander IN
6 Aaron Allen IN
Now look at the modified statement after adding the FOR XML clause.
Note that one XML element is returned for each row from the rowset, the element has a generic name of
row, and all columns are returned as attributes. The returned order is based on the ORDER BY clause.
In the example on the slide, you can see how to override the generic element name. In that example, the
elements have been named Order instead.
In addition, notice that the results have been returned as an XML fragment rather than as an XML
document. This is because there is no root element. Also, in the example on the slide, you can see how to
automatically add a root element called Orders.
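The slide query is not reproduced here, but a query of the following shape would produce Order elements wrapped in an Orders root element; the table and column list are assumptions:

```sql
SELECT SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
ORDER BY SalesOrderID
FOR XML RAW('Order'), ROOT('Orders');
```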
Element-Centric XML
You will notice that, in the previous examples, the columns from the rowset have been returned as
attributes. This is referred to as “attribute-centric” XML. You can modify this behavior to produce
“element-centric” XML by adding the ELEMENTS keyword to the FOR XML clause.
Element-Centric XML
SELECT FirstName, LastName, PersonType
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML RAW, ELEMENTS;
<row>
<FirstName>A.</FirstName>
<LastName>Leonetti</LastName>
<PersonType>SC</PersonType>
</row>
<row>
<FirstName>A.</FirstName>
<LastName>Wright</LastName>
<PersonType>GC</PersonType>
</row>
Note that each column has been returned as a subelement of the row element.
Look at the following query (which is a modified version of the query that you saw in the last topic).
Each table in the FROM clause, from which at least one column is listed in the SELECT clause, is
represented as an XML element. The columns that are listed in the SELECT clause are mapped to
attributes, or to subelements if the optional ELEMENTS option is specified in the FOR XML clause. You can
see the output of this query below:
For this reason, it is common to provide an alias for the table, as shown in the following code.
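In AUTO mode, the element names come from the table name or its alias, so aliasing Production.Product as Product produces elements named Product rather than elements named by the full table name. A sketch:

```sql
SELECT Product.ProductID, Product.Name
FROM Production.Product AS Product
ORDER BY Product.ProductID
FOR XML AUTO;
```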
Note that in the example on the slide, the nesting of the resultant XML is based upon the ORDER BY
clause, and not on any form of grouping statement.
NULL Columns
Look at the following query.
NULL Results
SELECT ProductID, Name, Color
FROM Production.Product AS Product
ORDER BY ProductID
FOR XML AUTO;
Note that several products do not have any color. In the resultant XML, NULL values are not returned as
zero-length strings; they are omitted from the results by default. Although this is appropriate in general, it
can cause a specific problem when you are deriving an XML schema from an XML document. For
example, if someone sent you an XML document that contained product details and none of the products
happened to have a color, you would assume that there was no Color column.
XSINIL
To assist in situations where a schema needs to be derived from a document that contains nullable
columns, SQL Server provides an additional option called XSINIL. This option includes an element in the
output for each NULL value, marked with the xsi:nil="true" attribute to indicate that the element exists,
but that its value is currently NULL.
XSINIL
SELECT ProductID, Name, Color
FROM Production.Product AS Product
ORDER BY ProductID
FOR XML AUTO, ELEMENTS XSINIL;
<Product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ProductID>533</ProductID>
<Name>Seat Tube</Name>
<Color xsi:nil="true" />
</Product>
<Product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ProductID>534</ProductID>
<Name>Top Tube</Name>
<Color xsi:nil="true" />
</Product>
<Product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ProductID>535</ProductID>
<Name>Tension Pulley</Name>
<Color xsi:nil="true" />
</Product>
<Product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ProductID>679</ProductID>
<Color>Silver</Color>
</Product>
<Product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ProductID>680</ProductID>
<Color>Black</Color>
</Product>
<Product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ProductID>706</ProductID>
<Color>Red</Color>
</Product>
Note the difference between the rows that have no color and the rows that do have a color. You can also
use XSINIL in other modes such as PATH and RAW.
EXPLICIT mode gives you the power to mix attributes and elements at will, create wrappers and nested
complex properties, create space-separated values (for example, an OrderID attribute that holds a list of
order ID values), and create mixed content.
PATH mode, together with the nesting of FOR XML queries and the TYPE clause, gives enough power to
replace most of the EXPLICIT mode queries in a simpler, more maintainable way. EXPLICIT mode is rarely
needed now and is complicated to write queries for.
The slide provides an example of an XML PATH query. Note that the path to c.ContactID is shown as
@EmpID. Values that start with an at sign (@) in XPath refer to attributes. You can see in the output that
the c.ContactID value has been returned as the EmpID attribute of the row element.
The next two columns that are listed in the example on the slide detail the path to the values. For
example, the c.FirstName path is shown as EmpName/First. This indicates that the c.FirstName value
should be generated as an element named First that is a subelement of an element named EmpName,
which is itself returned as a subelement of the row element.
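The query being described can be sketched along these lines; the table name Person.Contact and the third column are assumptions based on the description above:

```sql
SELECT c.ContactID AS "@EmpID",
       c.FirstName AS "EmpName/First",
       c.LastName AS "EmpName/Last"
FROM Person.Contact AS c
FOR XML PATH;
```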
You can use FOR XML EXPLICIT mode queries to construct such XML from a rowset, but PATH mode
provides a simpler alternative to the potentially time-consuming EXPLICIT mode queries.
PATH mode, together with the ability to write nested FOR XML queries and the TYPE directive to return
xml data type instances, enables you to write less complex queries and gives enough power to replace
most of the EXPLICIT mode queries in a simpler, more maintainable way.
TYPE Keyword
In the previous topics in this lesson, you have seen
how FOR XML AUTO queries can return attribute-
centric or element-centric XML. If this data is
returned from a subquery, it needs to be returned
as a specific data type.
SQL Server 2005 introduced the xml data type, but for backward compatibility, the data type for return
values from FOR XML subqueries was not changed to xml. However, a new keyword, TYPE, was
introduced that changes the return data type of FOR XML subqueries to xml.
XML Subquery
SELECT Customer.CustomerID, Customer.TerritoryID,
(SELECT SalesOrderID, [Status]
FROM Sales.SalesOrderHeader AS soh
WHERE Customer.CustomerID = soh.CustomerID
FOR XML AUTO) as Orders
FROM Sales.Customer as Customer
WHERE EXISTS(SELECT 1 FROM Sales.SalesOrderHeader AS soh
WHERE soh.CustomerID = Customer.CustomerID)
ORDER BY Customer.CustomerID;
The previous query will return the Orders subquery as an nvarchar(max) column rather than as hyperlinked XML.
Now look at the following modified query.
TYPE Keyword
SELECT Customer.CustomerID, Customer.TerritoryID,
(SELECT SalesOrderID, [Status]
FROM Sales.SalesOrderHeader AS soh
WHERE Customer.CustomerID = soh.CustomerID
FOR XML AUTO, TYPE) as Orders
FROM Sales.Customer as Customer
WHERE EXISTS(SELECT 1 FROM Sales.SalesOrderHeader AS soh
WHERE soh.CustomerID = Customer.CustomerID)
ORDER BY Customer.CustomerID;
The addition of the TYPE keyword means that the second query will return the subquery data with an xml
data type that will be hyperlinked in SQL Server Management Studio.
Demonstration Steps
Use FOR XML queries
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod13\Demo13.ssmssln, and then click Open.
7. Follow the instructions contained within the comments of the script file.
Lesson 5
Getting Started with XQuery
In Lesson 4, you learned how to query relational data and return it as XML. Sometimes, however, the data
is already in XML and you may need to query it directly. You might want to extract part of the XML into
another XML document; you might want to retrieve the value of an element or attribute; you might want
to check whether an element or attribute exists; and finally, you might want to directly modify the XML.
XQuery methods make it possible to perform these tasks.
Lesson Objectives
After completing this lesson, you will be able to:
What Is XQuery?
XQuery is a query language that is designed to
query XML documents. It also includes elements of
other programming languages, such as looping
constructs.
XPath Expression
/InvoiceList/Invoice[@InvoiceNo=1000]
This XPath expression specifies a need to traverse the InvoiceList node (that is the root element because
the expression starts with a slash mark (/)), then traverse the Invoice subelements (note that there may be
more than one of these), and then to access the InvoiceNo attribute. All invoices that have invoice
number 1,000 are returned.
Although there is unlikely to be more than one invoice that has the number 1,000, nothing about XML
syntax (without a schema) enforces this. One thing that can be hard to get used to with the XPath syntax
is that you constantly need to specify that you want the first entry of a particular type, even though
logically you may think that it should be obvious that there would only be one. You indicate the first entry
in a list by the expression [1].
In XPath, you indicate attributes by using the at sign (@) prefix. The content of the element itself is
referred to by the token text().
FLWOR Expressions
In addition to basic path traversal, XQuery supports an iterative expression language that is known as
FLWOR and commonly pronounced “flower.” FLWOR stands for “for, let, where, order by, and return,” which
are the basic operations in a FLWOR query.
FLWOR Expression
SELECT @xmlDoc.query('<OrderedItems>
{
for $i in /InvoiceList/Invoice/Items/Item
return $i
}
</OrderedItems>');
This query wraps the results in an OrderedItems element. Within that element, it returns every item on
every invoice that is contained in the XML document as a subelement of the OrderedItems element. An
example of the output from this query is shown below.
<OrderedItems>
</OrderedItems>
Note that becoming proficient at XQuery is an advanced topic that is beyond the scope of this course. The
aim of this lesson is to make you aware of what is possible when you are using XQuery methods. The
available XQuery methods are shown in the following table.
Method Purpose
The nodes() method will be covered in the next lesson, which discusses shredding XML to relational data.
query() Method
You can use the query() method to extract XML
from an existing XML document. The XML that is
generated can be a subset of the original XML
document. Alternatively, it is possible to generate
entirely new XML based on the values that are
contained in the original XML document.
An XQuery expression in SQL Server consists of two sections: a prolog and a body. The prolog can contain
a namespace declaration. You will see how to do this later in this module. The body of an XQuery
expression contains query expressions that define the result of the query. Both the input and output of a
query() method are XML.
Note that if NULL is passed to a query() method, the result that the method returns is also NULL.
query() Method
SELECT XmlEvent.query(
'<EventSPIDs>
{
for $e in /EVENT_INSTANCE
return <SPID>
{number($e/SPID[1])}
</SPID>
}
</EventSPIDs>')
FROM dbo.DatabaseLog;
This query tells SQL Server to return one xml value for each row in the dbo.DatabaseLog table. The xml
value that is returned for each row will have a root element that is called EventSPIDs. For each
EVENT_INSTANCE node that is contained in the XmlEvent column within each row, a subelement
named SPID should be returned. The contents of that node will be the value of the first SPID subelement
of the EVENT_INSTANCE node returned as a number.
<EventSPIDs>
<SPID>69</SPID>
</EventSPIDs>
You will see how this works in the demonstration at the end of this lesson.
value() Method
The value() method is useful for extracting scalar
values from XML documents as a relational value.
This method takes an XQuery expression that
identifies a single node and the desired SQL type to
be returned. The value of the XML node is returned
cast to the specified SQL type.
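As a minimal sketch (reusing the dbo.DatabaseLog table from the query() example, and without the namespace handling used in the slide example), the following returns the first SPID of each event cast to int:

```sql
SELECT XmlEvent.value('(/EVENT_INSTANCE/SPID)[1]', 'int') AS EventSPID
FROM dbo.DatabaseLog;
```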
Do not be too concerned with the namespace declaration in the example shown on the slide. It is only
specified because the examples in the AdventureWorks database require this.
Example Output
You can see the output from this query in the following table.
Result
1 19
2 23
3 25
4 28
5 34
6 35
Note that, as with the query() method, if NULL is passed to the value() method, NULL will be returned.
exist() Method
Use the exist() method to check for the existence
of a specified value. The exist() method enables the
user to perform checks on XML documents to
determine whether the result of an XQuery
expression is empty or nonempty. The result of this
method is 1 if the expression returns a nonempty result, 0 if the result is empty, and NULL if the xml
instance against which the query runs is NULL. When you only need to check for existence, use the
exist() method on the xml data type instead of the value() method. The exist() method is most helpful
when it is used in a SQL WHERE clause and utilizes XML indexes more effectively than the value()
method.
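For example, the following sketch (reusing the dbo.DatabaseLog table and its DatabaseLogID key from earlier examples) returns only the rows whose XmlEvent column contains an EVENT_INSTANCE node with an SPID subelement:

```sql
SELECT DatabaseLogID
FROM dbo.DatabaseLog
WHERE XmlEvent.exist('/EVENT_INSTANCE/SPID') = 1;
```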
modify() Method
You can perform data manipulation operations on
an XML instance by using the modify() method.
The modify() method changes the contents of an
XML document.
Note that, unlike the previous methods, an error is returned if NULL is passed to the modify() method.
Examples
In the following example, a new Salesperson node with the text() value of Bill is inserted into the first position
of the first invoice in the list of invoices.
Insert
SET @xmlDoc.modify(
'insert element Salesperson {"Bill"}
as first
into (/InvoiceList/Invoice)[1]');
In the following example, the value of the Salesperson node is replaced by Ted.
Replace
SET @xmlDoc.modify(
'replace value of
(/InvoiceList/Invoice/Salesperson/text())[1]
with "Ted"');
In the following example, the Salesperson subelement would be removed from the InvoiceList/Invoice
path.
Delete
SET @xmlDoc.modify(
'delete
(/InvoiceList/Invoice/Salesperson)[1]');
Demonstration Steps
Use XQuery in DDL triggers
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod13\Demo13.ssmssln, and then click Open.
7. Follow the instructions contained within the comments of the script file.
Lesson 6
Shredding XML
Another common need that can arise when you are working with XML data in SQL Server is to be able to
extract relational data from within an XML document. For example, you might receive a purchase order
from a customer in XML format. You need to parse the XML to retrieve the details of the items that you
need to supply.
The extraction of relational data from within XML documents is referred to as “shredding” the XML
documents. There are two basic ways to do this. SQL Server 2000 supported the creation of an in-memory
tree that you could then query by using an OPENXML function. Although that is still supported, SQL
Server 2005 introduced the XQuery nodes() method, which in many cases will be an easier way to shred
XML data.
In addition to covering these areas in this module, you will see how Transact-SQL provides a way of
simplifying how namespaces are referred to in queries.
Lesson Objectives
After completing this lesson, you will be able to:
Describe how to shred XML data.
Use system stored procedures for creating and managing in-memory node trees that have been
extracted from XML documents.
Use the OPENXML function.
Shredding XML
The process for shredding XML is:
1. By calling sp_xml_preparedocument, an in-memory node tree is created, based on the input XML.
2. The OPENXML table-valued function is then used to query the in-memory node tree and extract
relational data.
3. The relational data that has been extracted is normally combined with other relational data as part of
standard Transact-SQL queries.
sp_xml_preparedocument
sp_xml_preparedocument is a system stored
procedure that takes XML either as the untyped
xml data type or as XML stored in the nvarchar
data type, creates an in-memory node tree from the
XML (to make it easier to navigate), and returns a
handle to that node tree.
sp_xml_preparedocument reads the XML text that was provided as input, parses the text by using the
Microsoft XML Core Services (MSXML) parser (Msxmlsql.dll), and provides the parsed document in a state
that is ready for consumption. This parsed document is a tree representation of the various nodes in the
XML document, such as elements, attributes, text, and comments.
Before you call sp_xml_preparedocument, you need to declare an integer variable to be passed as an
output parameter to the procedure call. When the call returns, the variable will then be holding a handle
to the node-tree.
It is important to realize that the node tree must stay available and unmoved in visible memory because
the handle is basically a pointer that needs to remain valid. This means that, on 32-bit systems, the node
tree will not be able to be stored in Address Windowing Extensions (AWE) memory.
sp_xml_removedocument
sp_xml_removedocument is a system stored procedure that frees the memory that a node tree occupies
and invalidates the handle.
In SQL Server 2000, sp_xml_preparedocument created a node tree that was session-scoped, that is, the
node tree remained in memory until the session ended or until sp_xml_removedocument was called. A
common coding error was to forget to call sp_xml_removedocument. Leaving too many node trees to
remain in memory was known to cause a severe lack of available low-address memory on 32-bit systems.
Therefore, a change was made in SQL Server 2005 that made the node trees created by
sp_xml_preparedocument become batch-scoped rather than session-scoped. Even though the tree will be
removed at the end of the batch, it is considered good practice to explicitly call sp_xml_removedocument
to minimize the use of low-address memory as much as possible.
Note that 64-bit systems generally do not have the same memory limitations.
OPENXML Function
The OPENXML function provides a rowset over in-
memory XML documents, which is similar to a table
or a view. OPENXML enables access to the XML
data as though it is a relational rowset. It does this
by providing a rowset view of the internal
representation of an XML document.
The parameters that are passed to OPENXML are the XML document handle; a rowpattern, which is an
XPath expression that maps the nodes of XML data to rows; and an indication of whether to use attributes
rather than elements by default. Associated with the OPENXML clause is a WITH clause that provides a
mapping between the rowset columns and the XML nodes.
The ColPattern that is shown is an optional, generic XPath pattern that describes how the XML nodes
should be mapped to the columns. If ColPattern is not specified, the default mapping (attribute-centric
or element-centric mapping as specified by flags) occurs.
In the example on the slide, note how the alias o has been assigned to the urn:AW_OrderNS XML
namespace. That alias is then used throughout the document when an element that is defined in that
namespace is used.
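Putting the pieces together, the full pattern can be sketched as follows; the invoice XML and the column list are hypothetical, and no namespace is used in this simplified version:

```sql
DECLARE @doc nvarchar(max) = N'<InvoiceList>
  <Invoice InvoiceNo="1000" CustomerID="17"/>
  <Invoice InvoiceNo="1001" CustomerID="24"/>
</InvoiceList>';
DECLARE @handle int;

-- Parse the XML and obtain a handle to the in-memory node tree.
EXEC sp_xml_preparedocument @handle OUTPUT, @doc;

-- Query the node tree as a rowset. The flags value 1 requests
-- attribute-centric mapping, so each WITH column maps to an attribute.
SELECT InvoiceNo, CustomerID
FROM OPENXML(@handle, '/InvoiceList/Invoice', 1)
WITH (InvoiceNo int, CustomerID int);

-- Free the node tree as soon as it is no longer needed.
EXEC sp_xml_removedocument @handle;
```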
nodes() Method
The nodes() method provides a much easier way to
shred XML into relational data than OPENXML and
its associated system stored procedures.
The result of the nodes() method is a rowset that contains logical copies of the original XML instances. In
these logical copies, the context node of every row instance is set to one of the nodes that is identified
with the query expression. This enables subsequent queries to navigate relative to these context nodes.
It is important to be careful about the query plans that are generated when you use the nodes() method.
In particular, no cardinality estimates are available when you use this method. This has the potential to
lead to poor query plans. In some cases, the cardinality is simply estimated to be a fixed value of 10,000
rows. This might cause an inappropriate query plan to be generated if your XML document contained
only a handful of nodes.
APPLY operations cause table-valued functions to be called for each row in the left table of the query.
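The query being described can be sketched as shown below; the four value() columns are assumptions based on the standard EVENT_INSTANCE schema produced by the EVENTDATA() function:

```sql
SELECT EventDetail.value('(EventType/text())[1]', 'nvarchar(100)') AS EventType,
       EventDetail.value('(PostTime/text())[1]', 'nvarchar(30)') AS PostTime,
       EventDetail.value('(SPID/text())[1]', 'int') AS SPID,
       EventDetail.value('(LoginName/text())[1]', 'nvarchar(128)') AS LoginName
FROM dbo.DatabaseLog
CROSS APPLY XmlEvent.nodes('/EVENT_INSTANCE') AS EventInfo(EventDetail);
```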
In this query, for every row in the dbo.DatabaseLog table, the nodes() method is called on the XmlEvent
column from the dbo.DatabaseLog table. When table-valued functions are used in queries like this, you
must provide an alias for both the derived table and the columns that it contains. In this case, the alias
that is provided to the derived table is EventInfo and the alias that is provided to the extracted column is
EventDetail.
One output row is being returned for each node at the level of the XPath expression /EVENT_INSTANCE.
From the returned XML column (EventDetail), a series of columns is generated by calling the value()
method. Note that it is called four times for each output row in this example. Also note that the path to
the value to be returned and the data type of that value are being specified along with output column
aliases.
Demonstration Steps
Shred XML data by using the nodes() method
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod13\Demo13.ssmssln, and then click Open.
7. Follow the instructions contained within the comments of the script file.
You also have an upcoming project that will require the use of XML data within SQL Server. No members
of your current team have experience working with XML data in SQL Server. You need to learn how to
process XML data within SQL Server and you have been provided with some sample queries to assist with
this learning.
Objectives
After completing this lab, you will be able to:
Use Cases
2. In the D:\Labfiles\Lab13\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
Results: After this exercise, you will have seen how to analyze requirements and determine appropriate
use cases for XML storage.
Task 1: Review, Execute, and Review the Results of the XML Queries
1. On the taskbar, click SQL Server 2014 Management Studio. In the Connect to Server window,
ensure that Server name is MIA-SQL, and then click Connect.
2. On the File menu, click Open, click File, navigate to D:\Labfiles\Lab13\Starter, and then select
InvestigateStorage.sql and click Open.
3. Review the queries, execute the queries, and determine how the output results relate to the queries.
Results: After this exercise, you will have seen how XML data is stored in variables.
2. Review the queries, execute the queries, and note the output. Do this one query at a time for scripts
13.10 and 13.11.
Results: After this exercise, you will have seen how to create XML schema collections.
2. Review the query, execute the query, and review the results for scripts 13.21 to 13.29.
Results: After this exercise, you will have executed queries that return SQL Server relational data as XML.
Supporting Documentation
Output Order: Rows within the XML should be in order of SellStartDate ascending and
then ProductName ascending. That is, sort by SellStartDate first and then
ProductName within SellStartDate.
EXEC Production.GetAvailableModelsAsXML;
Results: After this exercise, you will have created and tested the required stored procedure that returns
XML.
Question: You could pass XML data to a stored procedure by using either the xml data type
or the nvarchar data type. What advantage does the xml data type provide over the
nvarchar data type for this purpose?
Question: Which XML query mode did you use for implementing the
WebStock.GetAvailableModelsAsXML stored procedure?
Review Question(s)
Question: What is XML?
Module 14
Working with Spatial Data in SQL Server
Contents:
Module Overview 14-1
Module Overview
Business applications routinely deal with addresses and locations, yet they rarely provide effective ways to
process distances and proximity. Spatial data in Microsoft® SQL Server® data management software
enables the effective storage and processing of locations, addresses, and shapes. This capability can help
business applications make better decisions and you can also use it to help visualize results, which often
makes results easier to interpret.
Objectives
After completing this module, you will be able to:
Describe the importance of spatial data and the industry standards that are related to it.
Explain how to store spatial data in SQL Server.
Lesson 1
Introduction to Spatial Data
Before starting to work with spatial data, it is important to understand where it is typically used in
applications and what types of spatial data there are. Most business applications need to work with
addresses or locations. SQL Server can process both planar and geodetic data. It is important to
understand the difference between these two types of data in addition to how the SQL Server data types
relate to the relevant industry standards and measurement systems.
Lesson Objectives
After completing this lesson, you will be able to:
Target Applications
There is a perception that spatial data is not useful
in mainstream applications. However, this
perception is invalid: almost every business
application can benefit from the use of spatial data.
Business Applications
Although mapping provides an interesting
visualization in some cases, business applications
can make good use of spatial data for much more
routine tasks. Almost all business applications
involve the storage of addresses or locations.
Customers or clients have street addresses, mailing
addresses, and delivery addresses. The same is true
for stores, offices, suppliers, and many other business-related entities.
It could be true that customers really do purchase from their local store. The owner may have just come
across a small sample of data and been misled by it. Alternatively, it could be true that customers do not
purchase from their local stores. If so, it might be interesting to know what they purchase when they
travel to another store. Perhaps the local store does not hold a wide enough variety of stock. Alternatively,
perhaps the customers purchase everything they need from a more remote store because they do not like
the staff at the local store. A situation might also exist where two stores are “cannibalizing” each other’s
business. The data might also be used to find new locations for stores.
It is important to realize that these sorts of questions are normal business questions, not specialized
mapping questions. This is the sort of problem that you can solve quite easily if you can process spatial
data in a database.
Key Points
In the spatial data community, several types of
spatial data are used. SQL Server works with vector-
based two-dimensional (2-D) data, but has some
storage options for three-dimensional (3-D) values.
Spatial data in SQL Server is currently based on 2-D technology. Some of the objects and properties that
SQL Server provides support the storage and retrieval of 3-D and 4-D values, but it is important to realize
that the third and fourth dimensions are ignored during calculations. This means
that if you calculate the distance between, say, a point and a building, the calculated distance is the same
regardless of which floor or level in the building the point is located.
Question: Which existing SQL Server data type could you use to store (but not directly
process) raster data?
Key Points
Planar systems represent the Earth as a flat surface.
Geodetic systems represent the Earth more like its
actual shape.
Planar Systems
Prior to the advent of computer systems, it was very
difficult to perform calculations on round models of
the Earth. For convenience, mapping tended to be
two-dimensional in nature. Most people are familiar
with traditional flat maps of the world.
However, as soon as larger distances are involved, flat maps provide a significant distortion, particularly as
you move from the center of the map. When most of the standard maps from atlases were first drawn,
they were oriented around where the people who were drawing the maps lived. That meant that the least
distortion occurred where the people who were using the maps were based.
As an example, in the flat map that is shown on the slide, it is not obvious how Africa's area (about 30
million square kilometers) compares to North America's area (about 24 million square kilometers). Also,
note how large Antarctica appears on the map, even though it is really only about 13 million square
kilometers in size.
Geodetic Systems
Geodetic systems represent the Earth as a round shape. Some systems use simple spheres, but it is
important to realize that the Earth is not actually spherical.
Spatial data in SQL Server offers several systems for representing the shape of the Earth. Most systems
model the Earth as an ellipsoid rather than as a sphere.
Key Points
The Open Geospatial Consortium (OGC) is the
industry body that provides specifications for how
processing of spatial data should occur in systems
that are based on Structured Query Language
(SQL).
SQL Specification
One of the two data types that SQL Server provides
is the geometry data type. It conforms to the OGC
Simple Features for SQL Specification version 1.1.0
and is used for planar spatial data. In addition to defining how to store the data, the specification details
common properties and methods to be applied to the data.
The OGC defines a series of data types that form an object tree. In the chart that is shown on the slide, the
objects that are supported and can be created in spatial data in SQL Server are shown in blue (or the
darker color). Other objects in the OGC Geometry hierarchy are shown in yellow (or the lighter color).
Curved arc support was added in SQL Server 2012.
Extensions
SQL Server also extends the standards in several ways. SQL Server provides a round-earth data type that is
called geography, along with several additional useful properties and methods.
Methods and properties that are related to the OGC standard have been defined by using an ST prefix
(such as STDistance). Those without an ST prefix are Microsoft extensions to the standard (such as
MakeValid).
Key Points
There have been many systems of measurement
over time. SQL Server supports many of these
measurement systems directly. When you specify a
spatial data type in SQL Server, you also specify the
measurement system to be used. You specify this by
associating a spatial reference ID with the data. A
spatial reference ID of zero indicates the lack of a
measurement system. This is commonly used where
there is no need for a specific measurement system.
SRID 4326
The World Geodetic System (WGS) is commonly used in cartography, geodetics, and navigation. The latest
standard is WGS 1984 (WGS 84) and is best known to most people through the Global Positioning System
(GPS). GPS is often used in navigation systems and uses WGS 84 as its coordinate system.
In spatial data in SQL Server, SRID 4326 provides support for WGS 84.
If you query the list of SRIDs in SQL Server, the entry for SRID 4326 has the name WGS 84 and the
following definition, which is formally called the Well-Known Text (WKT) that is associated with the ID:
GEOGCS["WGS 84", DATUM["World Geodetic System 1984", ELLIPSOID["WGS 84", 6378137,
298.257223563]], PRIMEM["Greenwich", 0], UNIT["Degree", 0.0174532925199433]]
This specifies how WGS 84 models the Earth as an ellipsoid (you can imagine it as a slightly squashed sphere),
with its major radius of 6,378,137 meters at the equator, a flattening of 1 / 298.257223563 (or about 21
kilometers) at the poles, a prime meridian (that is, a starting point for measurement) at Greenwich, and a
measurement that is based on degrees. The starting point at Greenwich is specifically based at the Royal
Observatory. The units are shown as degrees and the size of a degree is specified in the final value in the
definition. Most geographic data today would be represented by SRID 4326.
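You can retrieve this definition yourself from the sys.spatial_reference_systems catalog view. A minimal sketch:

```sql
-- List the WKT definition for SRID 4326 (WGS 84)
SELECT spatial_reference_id, authority_name, well_known_text
FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4326;
```

Removing the WHERE clause returns every spatial reference system that SQL Server supports.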
Demonstration Steps
View the available spatial reference systems
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
5. On the File menu, click Open, click Project/Solution, navigate to
D:\Demofiles\Mod14\Demo14\Demo14.ssmssln, and then click Open.
8. Follow the instructions contained within the comments of the script file.
Question: Do you currently use GPS data in any existing applications within your
organization?
Lesson 2
Working with Spatial Data Types in SQL Server
SQL Server supports two spatial data types, geometry and geography, which have been created as
system common language runtime (CLR) data types. You need to know how to use each of these data
types and how to interchange data by using industry-standard formats.
Lesson Objectives
After completing this lesson, you will be able to:
Explain how system CLR types differ from user CLR types.
Use the geometry data type.
Use Microsoft extensions to the OGC standard when working with spatial data.
Key Points
SQL Server supplies rich support for spatial data. It
provides two data types: the geometry data type,
which is suited to flat-earth (planar) models, and
the geography data type, which is suited to round-
earth (geodetic) models.
Note that although “latitude and longitude” is a commonly used phrase in the general community, the
geographical community uses the terminology in the reverse order. When you are specifying inputs for
geographic data in SQL Server, the longitude value precedes the latitude value.
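For example, the following sketch (the coordinates are arbitrary sample values) creates a geography point; note that the WKT input lists longitude first, while the Lat and Long properties let you read the values back:

```sql
-- Longitude precedes latitude in geography WKT input
DECLARE @pt geography = geography::STGeomFromText('POINT (-122.349 47.651)', 4326);
SELECT @pt.Long AS Longitude, @pt.Lat AS Latitude;
```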
Additional Support
The Microsoft Bing® Maps software development kit (SDK) has been updated to work closely with spatial
data in SQL Server. SQL Server Reporting Services includes a map control that you can use to render
spatial data and a wizard to help to configure the map control. The map control is available for reports
that are built by using Business Intelligence Development Studio and for reports that are built by using
Report Builder.
An application that stores or retrieves spatial data from a database in SQL Server needs to be able to work
with that data as a spatial data type. To make this possible, a separate installer (MSI) file has been
provided as part of the SQL Server 2012 Feature Pack to enable client applications to use the spatial data
types in SQL Server. The installer is called “Microsoft System CLR Types for SQL Server 2012.” By installing
this file on client systems, an application on the client can “rehydrate” a geography object that has been
read from a SQL Server database into a SqlGeography object within .NET managed code.
ST Prefix
For the properties and methods that are implementations of the OGC standards, an ST prefix has been
added to the names of the properties and methods. For example, the X and Y coordinates of a geometry
object are provided by STX and STY properties and the Distance calculation is provided by the
STDistance method.
For Microsoft extensions to the OGC standards, no prefix has been added to the name of the methods or
properties, so, for example, there is a MakeValid method. You must also take care when you refer to
properties and methods because they are case-sensitive, even on servers that are configured for case-
insensitivity.
Question: You may have used a web service to calculate the coordinates of an address. What
is this process commonly called?
SQL Server 2008 introduced the concept of a system CLR data type, which was separate from the user-
defined data types, but also implemented in managed code. In addition, SQL Server 2008 replaced the 8-
KB limit on serialization with a 2-GB limit. This increased limit makes it possible to create complex data
types by using managed code.
In SQL Server, there are three system CLR data types that take advantage of this large data type support:
geometry, geography, and hierarchyid. Unlike user-defined CLR data types, these system data types
operate even when the ‘clr enabled’ setting for the server instance is disabled.
You can see the currently installed assemblies and whether they are user-defined by executing the
following query:
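The query box is not reproduced here, but a query of the following form against the sys.assemblies catalog view shows each installed assembly and whether it is user-defined:

```sql
-- System CLR assemblies have is_user_defined = 0
SELECT name, permission_set_desc, is_user_defined
FROM sys.assemblies;
```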
As an example of this, look at the following code that is accessing the STX property of a variable called
@Location:
Accessing Properties
SELECT @Location.STX;
You can access a method of an instance of a spatial data type by referring to it as Instance.Method().
As an example of this, look at the following code that is calling the MakeValid method of a variable called
@InputLocation:
Accessing Methods
SELECT @Location = @InputLocation.MakeValid();
It is also possible to call methods that are defined on the data types (geometry and geography) rather
than on instances (that is, columns or variables) of those types. This is an important distinction.
As an example of this, look at the following code that is calling the GeomFromText method of the
geometry data type:
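The code box is not reproduced here; the OGC-derived static method is exposed in SQL Server as STGeomFromText, so the call was probably of this form (the variable name is illustrative):

```sql
-- Calling a static method on the geometry type itself, not on an instance
DECLARE @shape geometry = geometry::STGeomFromText('POINT (12 15)', 0);
SELECT @shape.STAsText();
```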
Note that you are not calling the method on a column or variable of the geometry data type, but on the
geometry data type itself. In .NET terminology, this would be referred to as calling a public static method
on the geometry class. Note also that the methods and properties of the spatial data types are case-
sensitive, even on servers that are configured with case-insensitive default collations.
Key Points
The geometry data type is used for flat-earth (that
is, planar) data storage and calculations. It provides
comprehensive coverage of the OGC standard.
SQL Server enables you to store Z (elevation) and M (measure) values in the geometry data type, but it ignores these values when it performs calculations.
You can see the input and output of X, Y, Z, and M in the following code:
geometry Results
POINT (12 15)
POINT (12 15 2 9)
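The code that produced these results is not reproduced here, but a sketch that matches them follows; STAsText returns only the X and Y coordinates, whereas the AsTextZM extension also returns the Z and M values:

```sql
DECLARE @g geometry = geometry::STGeomFromText('POINT (12 15 2 9)', 0);
SELECT @g.STAsText();   -- POINT (12 15)
SELECT @g.AsTextZM();   -- POINT (12 15 2 9)
SELECT @g.STX AS X, @g.STY AS Y, @g.Z AS Z, @g.M AS M;
```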
The SQL Server geometry data type provides comprehensive coverage of the OGC Geometry data type.
The X and Y coordinates are represented by STX and STY properties.
Key Points
The geography data type is used for round-earth
values, typically involving actual positions or
locations on the Earth. It is an extension to the OGC
standard.
In versions of SQL Server earlier than SQL Server 2012, a single geography instance could not span more
than a single hemisphere. This restriction did not refer to predefined hemispheres such
as the northern or southern hemispheres, but just that no two points could be more than half the Earth
apart if they were contained in the same instance of the geography data type. This limitation was
removed in SQL Server 2012.
To enclose points, they are listed in counterclockwise order. As you draw a shape, all of the points to the
left of the line that you draw will be enclosed by the shape. The points on the line are also included.
If you draw a postal code region in a clockwise direction, you are defining all points outside the region. In
earlier versions of SQL Server, because results were not permitted to span more than a single hemisphere,
an error would have been returned. This restriction was removed in SQL Server 2012.
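As a sketch of ring orientation (the coordinates are arbitrary sample values), the same four points define either a small region or everything outside it, depending on the order in which they are listed:

```sql
-- Counterclockwise ring order: encloses the small square
DECLARE @inside geography = geography::STGeomFromText(
    'POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))', 4326);
-- Clockwise ring order: encloses everything outside the square
DECLARE @outside geography = geography::STGeomFromText(
    'POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))', 4326);
-- In SQL Server 2012 and later, both succeed; the areas differ enormously
SELECT @inside.STArea() AS SmallArea, @outside.STArea() AS HugeArea;
```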
The spatial results viewer in SQL Server Management Studio is limited to displaying the first 5,000 objects
from the result set. For geography data, the viewer is quite configurable. You can set which column to
display and the geographic projection to use for display (for example, Mercator, Bonne, and so on), and
you can choose to display another column as a label over the relevant displayed region.
Key Points
The internal binary format of any CLR data type is
not directly used for input and output of the data
type in most cases. You need to accommodate
string-based representations of the data.
CLR data types (including the geometry and
geography system CLR data types) are stored in a
binary format that the designer of the data type
determines. Although it is possible to both enter
values and generate output for instances of the
data type by using a binary string, this is not
typically very helpful because you would need to have a detailed understanding of the internal binary
format.
Well-Known Text (WKT). This is the most common string format and is quite human-readable.
Well-Known Binary (WKB). This is a more compact binary representation that is useful for
interchange between computers.
Geography Markup Language (GML). This is the XML-based representation for spatial data.
All CLR data types must implement two string-related methods. The Parse method is used to convert a
string to the data type and the ToString method is used to convert the data type back to a string. Both of
these methods are implemented in the spatial types and both assume a WKT format.
Several variations of these methods are used for input and output. For example, the STAsText method
provides a specific WKT format as output and the AsTextZM method is a Microsoft extension that
provides the Z and M values in addition to the two-dimensional coordinates.
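The following sketch emits one value in each of the three formats; STAsBinary returns the WKB representation and the AsGml extension returns GML:

```sql
DECLARE @g geometry = geometry::Parse('POINT (12 15)');  -- Parse assumes WKT input
SELECT @g.ToString()   AS WKT,
       @g.STAsBinary() AS WKB,
       @g.AsGml()      AS GML;
```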
Question: Why is there a need to represent spatial data types as strings within SQL Server?
Key Points
A wide variety of OGC methods and properties has
been provided in spatial data in SQL Server, along
with a number of OGC-defined collections. Several
of the common methods and properties are
described here, but many more exist.
Common Methods
Common OGC methods include:
The STDistance method, which returns the distance between two spatial objects. Note that this does
not only apply to points. It is also possible to calculate the distance between two polygons. The result
is returned as the minimum distance between any two points on the polygons.
The STIntersects method, which returns 1 when two objects intersect and returns 0 otherwise.
The STArea method, which returns the total surface area of a geometry instance.
The STLength method, which returns the total length of the objects in a geometry instance. For
example, for a polygon, STLength returns the total length of all line segments that make up the
polygon.
The STUnion method, which returns a new object that is formed by uniting all points from two
objects.
The STBuffer method, which returns an object whose points are within a certain distance of an
instance of a geometry object.
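A few of these methods are sketched below with simple planar values:

```sql
DECLARE @p1 geometry = geometry::STGeomFromText('POINT (0 0)', 0);
DECLARE @p2 geometry = geometry::STGeomFromText('POINT (3 4)', 0);
SELECT @p1.STDistance(@p2) AS Distance;                  -- 5
SELECT @p1.STBuffer(2).STArea() AS BufferArea;           -- approximately pi * 2^2
SELECT @p1.STBuffer(2).STIntersects(@p2) AS Intersects;  -- 0 (point is outside the buffer)
```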
Microsoft Extensions
Key Points
In addition to the OGC properties and methods,
Microsoft has provided several useful extensions to
the standards. Several of these extensions are
described in this topic, but many more exist.
Common Extensions
Although the coverage that the OGC specifications
provide is good, Microsoft has enhanced the data
types by adding properties and methods that
extend the standards. Note that the extended
methods and properties do not have the ST prefix.
The MakeValid method takes an arbitrary shape and returns another shape that is valid for storage in a
geometry data type. SQL Server produces only valid geometry instances, but enables you to store and
retrieve invalid instances. You can retrieve a valid instance that represents the same point set of any
invalid instance by using the MakeValid method.
You can use the Reduce method to reduce the complexity of an object while attempting to maintain the
overall shape of the object.
The IsNull method returns 1 if an instance of a spatial type is NULL; otherwise it returns 0.
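A common illustration of MakeValid is a "bowtie" ring that crosses itself, which is not a valid polygon; a sketch:

```sql
-- A self-crossing ring can be stored but is reported as invalid
DECLARE @bowtie geometry = geometry::STGeomFromText(
    'POLYGON ((0 0, 2 2, 2 0, 0 2, 0 0))', 0);
SELECT @bowtie.STIsValid();               -- 0
SELECT @bowtie.MakeValid().STIsValid();   -- 1
SELECT @bowtie.MakeValid().ToString();    -- typically a MULTIPOLYGON covering the same point set
```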
GML
<Point xmlns="http://www.opengis.net/gml">
<pos>12 15</pos>
</Point>
GML is excellent for information interchange, but you can see that the representation of objects in XML
can quickly become very large.
The BufferWithTolerance method returns a buffer around an object, but uses a tolerance value to allow
for minor rounding errors.
Demonstration Steps
Work with spatial data types in SQL Server
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
8. Follow the instructions contained within the comments of the script file.
Lesson 3
Using Spatial Data in Applications
After you have learned how spatial data is stored and accessed in SQL Server, you need to understand the
implementation issues that need to be addressed when you are building applications that use spatial data
in SQL Server. In particular, you can create spatial indexes to improve performance, so you need to
understand how spatial indexes work and which methods can take advantage of them in order to build
applications that perform well.
Lesson Objectives
After completing this lesson, you will be able to:
Explain the basic tessellation process that is used within spatial indexes in SQL Server.
Key Points
Spatial queries can often involve a large number of data points. Executing methods such as
STIntersects against a large number of points is slow because complex geometric calculations are
involved. Spatial indexes help to avoid unnecessary calculations.
Tessellation Process
Key Points
Spatial indexes help to avoid unnecessary
calculations by breaking down larger problems into
problems that need to be solved and problems that
do not.
Tessellation
In the example from the discussion where you were
considering how to find streets that intersect your
suburb or region, the biggest problem is that
checking every street in the state or, worse, in the
country, would take a very long time. The irony of this is that almost all of these calculations would return
an outcome that showed no intersection.
To avoid making unnecessary calculations, SQL Server breaks the problem-space into relevant areas by
using a four-level grid. Each grid level consists of several cells.
The basic idea is that if your suburb is located within a region of cells, any streets that do not extend into
those cells do not need to be checked. You can use grid levels to quickly isolate large areas that do not
need to be checked.
For example, if you are checking for streets that are part of Vienna, you do not need to check for streets
that are contained entirely within Paris. Moreover, you do not need to check any street that is contained
entirely within France. You can quickly eliminate those as not being of interest to you.
Spatial Indexes
Key Points
Spatial indexes are unlike standard relational
indexes. Instead of locating specific rows to be
returned, queries that use spatial indexes operate in
a two-phase manner. In the first phase, possible
candidates are found. In the second phase, the
returned list of candidates is individually checked.
Spatial Indexes
When you traverse a clustered or nonclustered
index on a SQL Server table, you apply the
predicates in the WHERE clause to filter the specific rows. After you have applied the predicate, you are
left with precisely the rows that you require. Spatial indexes work in a different way. Instead of precisely
locating the specific rows, you can use spatial indexes to locate rows that could potentially be of interest.
You saw in the last topic how you can apply tessellation to minimize the number of calculations that need
to be performed. Spatial indexes use this tessellation process to quickly reduce the overall number of rows
to a list of candidate rows that might potentially be of interest. In the street-based example that was
mentioned previously, if Vienna was contained inside a grid cell and a street entered that cell, you still do
not know if the street actually intersects the boundaries of Vienna. However, you know that you need to
check whether it does, because it is possible that it might.
To enable a check on the effectiveness of the primary filter, SQL Server provides a Filter method that only
applies to the primary filter. You can then compare the number of rows that the Filter method returns to
the total number of rows to see how effective the spatial index has been. This will be shown in the next
demonstration.
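Assuming hypothetical dbo.Streets and dbo.Suburbs tables with a spatial index on the Shape column, the comparison might look like this:

```sql
DECLARE @suburb geometry = (SELECT Boundary FROM dbo.Suburbs WHERE Name = 'Vienna');

-- Primary filter only: fast, but may include false positives
SELECT COUNT(*) FROM dbo.Streets WHERE Shape.Filter(@suburb) = 1;

-- Primary filter plus the exact secondary check
SELECT COUNT(*) FROM dbo.Streets WHERE Shape.STIntersects(@suburb) = 1;
```

Comparing the two counts shows how many candidate rows the primary filter passed on for exact checking.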
Key Points
Spatial indexes are created by using the CREATE
SPATIAL INDEX statement. Indexes on the
geometry data type should specify a
BOUNDING_BOX setting.
Index Bounds
Unlike more traditional types of index, a spatial index is most useful when it knows the overall area that
the spatial data covers. Spatial indexes that are created on the geography data type do not need to
specify a bounding box because the data type is naturally limited by the Earth itself.
Spatial indexes on the geometry data type specify a BOUNDING_BOX setting. This provides the
coordinates of a rectangle that would contain all possible points or shapes of interest to the index. The
geometry data type has no natural boundaries, so specifying a bounding box enables SQL Server to
produce a more useful index. If values arise outside the bounding box coordinates, the primary filter
would need to return the rows in which they are contained.
Grid Density
SQL Server also enables you to specify grid densities when you are creating spatial indexes. You can
specify a value for the number of cells in each grid for each grid level in the index:
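A sketch of the statement, assuming a hypothetical dbo.Streets table with a planar Shape column:

```sql
CREATE SPATIAL INDEX SIX_Streets_Shape
ON dbo.Streets (Shape)
USING GEOMETRY_GRID
WITH (
    -- Rectangle containing all shapes of interest (geometry only)
    BOUNDING_BOX = (XMIN = 0, YMIN = 0, XMAX = 500, YMAX = 200),
    -- Density for each of the four grid levels
    GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM,
             LEVEL_3 = HIGH, LEVEL_4 = HIGH),
    CELLS_PER_OBJECT = 16
);
```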
Spatial indexes are also different from other types of index because it might make sense to create multiple
spatial indexes on the same table and column. Indexes that have one set of grid densities might be more
useful than a similar index that has a different set of grid densities for locating data in a specific query.
To make spatial indexes easier to configure, SQL Server 2012 introduced automatic grid density and level
selections: GEOMETRY_AUTO_GRID and GEOGRAPHY_AUTO_GRID. The automated grid configuration
defaults to an eight-level grid.
Limitations
Spatial indexes do not support the use of ONLINE build operations, which are available for other types of
index in SQL Server Enterprise.
Key Points
Not all geometry methods and not all predicate
forms can benefit from the presence of spatial
indexes. The table on the slide shows the specific
predicates that can potentially make use of a spatial
index as a primary filter. If the predicate in your
query is not in one of these forms, spatial indexes
that you create will be ignored.
Key Points
In a similar way to the geometry data type, not all
geography methods and not all predicate forms can
benefit from the presence of spatial indexes. The
table on the slide shows the specific predicates that
can potentially make use of a spatial index as a
primary filter. Unless the predicate in your query is
in one of these forms, spatial indexes that you
create will be ignored.
Key Points
An active community that contributes user-created
extensions to spatial data in SQL Server exists on
the CodePlex site. The functions, types, and
aggregates that were present in the
sqlspatial.codeplex.com project at the time of
writing this module are listed on the slide. As the
project continues to evolve, the capabilities that are
provided in the project will change. You may be
able to use some of these extensions directly. They
may also be useful as starting points when you
create your own extensions to the spatial data in SQL Server. In addition, note that several additional
built-in aggregates were added to the spatial data in SQL Server in SQL Server 2012.
Demonstration Steps
Use spatial data in SQL Server to solve some business questions
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
4. In the Connect to Server window, in Server name, type MIA-SQL and then click Connect.
8. Follow the instructions contained within the comments of the script file.
Supporting Documentation
Stored Procedure Specifications
Objectives
After completing this lab, you will have:
Password: Pa$$w0rd
2. Review the query, execute the query, and then review the results for scripts 19.1 to 19.9.
Results: After this exercise, you should have seen how to work with the geometry data type.
Results: After this exercise, you should have replaced the existing Longitude and Latitude columns with
a new Location column.
Question: Where would you imagine you might use spatial data in your own business
applications?
Review Question(s)
Question: What is the main difference between the geometry and geography data types?
Question: When you are defining a polygon, why does it matter how you specify the order
of the points?
Module 15
Incorporating Data Files into Databases
Contents:
Module Overview 15-1
Lesson 1: Considerations for Working with Data Files in SQL Server 2014 15-2
Module Overview
Organizations store and manage data files in a wide range of formats. Very often, this data is stored on
the file system of the server operating system, but organizations are increasingly choosing to integrate
data files into their relational databases because of the benefits this can bring.
This module provides an overview of the options for storing data in a database in Microsoft® SQL
Server® 2014 data management software, and the benefits of each storage type. It also explains the key
factors to consider when you are planning to incorporate data files into your databases, and describes
how you can use full-text indexes and semantic search functionality to search documents in various ways.
Objectives
After completing this module, you will be able to:
Describe the options for storing data files in SQL Server 2014 and plan an appropriate storage
solution for a given scenario.
Explain how to implement FILESTREAM and FileTables.
Describe the benefits of full-text indexing and semantic search functionality, and explain how to use
these features to search data in data files.
Lesson 1
Considerations for Working with Data Files in SQL Server
2014
SQL Server 2014 provides various ways to incorporate data files into a database. This lesson describes the
benefits and challenges of storing data files in databases, reviews the options for storing data files in a
SQL Server 2014 database, explains the considerations for planning to store data files, and explains how to
choose the appropriate storage solution for a given scenario.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the options for storing data files in a SQL Server 2014 database.
Choose an appropriate storage option for data files for a given scenario.
Maintaining referential and transactional integrity. Some data might relate directly to data that is
stored in a database, so the organization might want to make this relationship explicit. For example, a
company might hold a résumé of each employee as a data file and want to explicitly relate this file to
the rest of the information about the employee that is held in a relational database.
Enabling developers to create applications that can easily and efficiently make use of data files.
Centralized storage for relational and nonrelational data, which can streamline management and
maintenance tasks such as performing backups, and helps to reduce the effort that is required to
manage the data.
Access to both data files and relational data can be controlled through database security, which
simplifies security and reduces the possibility of errors in security configuration.
The ability to perform better and more efficient searches by using the built-in indexing features in the
database management system.
The relational database engine can maintain referential and transactional consistency between data
files and relational data.
Reduced complexity for developers who write applications that use data files.
It can offer better read performance than simply storing data in the database by using a data type
such as image or varbinary(max); however, this depends on the size of each file and to some extent,
on the type of file that you are storing. For example, for files that are larger than 1 MB in size, or for
streaming video files, the file system will typically deliver better read performance.
The file system will typically handle fragmentation better than SQL Server for files that are frequently
modified.
There is no single point of management for the data in the database and the data in the file system,
so you will need to plan separate maintenance and backup schedules.
Maintaining two separate stores adds an extra layer of complexity for developers who write
applications that use the data.
You cannot take advantage of integrated services such as Full-Text Search and Statistical Semantic
Search to query textual data.
Earlier versions of SQL Server included the image data type for storing BLOBs. This data type adds a
pointer to each row that points to the location of the data in the BLOB data files. The image data type is
still available in SQL Server 2014, but because it is now deprecated, you should not use it in any new
databases.
To store BLOB data in a SQL Server 2014 database, you can use the varbinary(max) large-value data
type. The varbinary(max) data type enables you to store binary data up to 2,147,483,647 bytes
(approximately 2 GB) in size. When you designate a column as varbinary(max), SQL Server stores data
that is up to 8,000 bytes in size in the same data page as the rest of the row. For data that is larger than
8,000 bytes in size, SQL Server stores the data in separate data pages and includes a pointer to the pages
in the row. This flexible approach to storage can help to improve read performance when BLOBs are
smaller than 8,000 bytes, because SQL Server can read the data directly from the table’s data pages, so it
does not have to incur extra page reads by accessing the dedicated BLOB data pages.
Note: You can use the large value types out of row table option of the sp_tableoption
stored procedure to change the way that SQL Server stores varbinary(max) data. A value of 0
forces SQL Server to store BLOBs that are smaller than 8,000 bytes in the appropriate data row
and to store BLOBs that are larger than 8,000 bytes in the BLOB storage pages. A value of 1
forces SQL Server to always store the data in varbinary(max) columns in the BLOB storage
pages, regardless of the size of the BLOB. By default, the large value types out of row table
option is set to 0.
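For example, a call of the following form (the table name dbo.Documents is hypothetical) forces SQL Server to always store varbinary(max) data for that table in the BLOB storage pages:

```sql
-- Store all varbinary(max) values for this table out of row,
-- regardless of size.
EXEC sp_tableoption 'dbo.Documents', 'large value types out of row', 1;
```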
The advantages of storing data files by using the varbinary(max) data type include:
SQL Server maintains transactional and relational integrity for BLOB data.
Reduced complexity for developers who write applications that use the data.
The ability to take advantage of integrated services such as Full-Text Search and Semantic Search to
query textual data.
The disadvantages of storing data files by using the varbinary(max) data type include:
You can only access BLOB data that is stored as varbinary(max) programmatically; you cannot access
the BLOB data directly through the file system. For example, if you store Microsoft Word documents
by using varbinary(max), you cannot open these documents directly by using Word.
Read performance is directly affected by the number of pages that are required to store each BLOB;
more page reads leads to slower response times, so performance can be degraded for large BLOBs
that require many data pages.
The advantages of storing data files by using FILESTREAM include:
Improved performance over storing BLOBs in SQL Server data pages when BLOBs are larger than
1 MB on average.
SQL Server maintains transactional and relational integrity for BLOB data.
Reduced complexity for developers who write applications that use the data.
The ability to take advantage of integrated services such as Full-Text Search and Semantic Search to
query textual data.
The disadvantages of storing data files by using FILESTREAM include:
You can only access BLOB data that is stored in FILESTREAM columns programmatically; you cannot
access the BLOB data directly through the file system.
You must store FILESTREAM BLOBs on the same server where the database data files are located or
on a storage area network (SAN). You cannot use a remote location such as a shared folder on the
network for BLOB storage.
A FileTable is a SQL Server table that has a predefined schema. The columns in a FileTable include a
varbinary(max) column that has the FILESTREAM attribute enabled, and a series of metadata columns
that store information including the file size, file creation time, and the last write time. FileTable files are
part of a hierarchy that includes a database-level directory and a separate directory for each FileTable in
the database. Each row in a FileTable represents either a file or a directory in the FileTable shared
directory. Each FileTable has two columns, which are called path_locator and parent_path_locator. These
columns use the hierarchyid data type to keep track of the place of each file and folder in the FileTable
folder hierarchy.
To use FileTables, you must create or alter a database to set the NON_TRANSACTED_ACCESS option
on a database that has a filegroup that is configured for FILESTREAM. Nontransactional access enables
access to BLOB files through the file system, but enabling it means that you cannot restore BLOB data to a
specific point in time. You can configure NON_TRANSACTED_ACCESS by using the following values:
FULL. When you set NON_TRANSACTED_ACCESS to FULL, you can read and write files and folders
by using the FILESTREAM shared directory. For example, you can drag a new file to the directory, and
this file is stored in the FileTable.
READONLY. When you set NON_TRANSACTED_ACCESS to READONLY, you can read files in the
FILESTREAM shared directory, but you cannot modify them or save new files to the directory.
OFF. When you set NON_TRANSACTED_ACCESS to OFF, you cannot access BLOB files in the
FILESTREAM shared directory. However, you can still access the files programmatically.
Reference Links: For more information about RBS, see the Remote Blob Store (RBS) (SQL
Server) topic in SQL Server Books Online.
Ease of manageability. For example, will BLOB data be included in database backups, or will you
have to schedule separate backups?
Ease of configuring and maintaining security. The simpler the security model, the less prone it will
be to configuration errors.
The need to maintain nontransactional file access through the file system. Do users need to
access the files that you want to store by using programs such as Word?
Ease of development effort for applications that use the data. It will be less complex for
developers if all of the data is in a single store.
The maximum size of the data files. When you use varbinary(max), the largest file size that you
can accommodate is 2 GB. Other storage mechanisms do not have this limitation.
The need to maintain transactional integrity for data files and relational data. Transactional
integrity is required for point-in-time restores, but it is not possible to provide transactional integrity
when you use a FileTable that has full nontransactional access configured.
The need to perform full-text and semantic searches on data files. To use full-text indexes and
semantic searches, the data needs to be stored in SQL Server.
The location of the directory that will host the files. Storage can be either local or on the network,
but not all of the available solutions will support network storage.
The following matrix compares the various storage options against the factors that are described in the list
above.
Performance: file system storage is good (uses the file system); varbinary(max) is good when files are
smaller than 1 MB on average; FILESTREAM is good (uses file system streaming); FileTable is good (uses
file system streaming).
Note: The matrix does not include RBS because factors such as manageability and
performance depend on the RBS solution that you use.
Lesson 2
Implementing FILESTREAM and FileTables
FILESTREAM and FileTables enable you to integrate the storage of data files on the file system with data
that is stored in your relational database, while maintaining good levels of performance and fast response
times. This lesson describes the benefits that these technologies offer, and explains the considerations for
implementing and working with FILESTREAM and FileTables. The lesson also demonstrates how to enable
FILESTREAM and implement a FileTable.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the options for accessing FILESTREAM data and FileTable data.
You can set the filestream access level option to one of the following values:
o 0. This disables FILESTREAM support for the instance.
o 1. This enables FILESTREAM for Transact-SQL access only.
o 2. This enables FILESTREAM for Transact-SQL access and Win32 streaming access.
After you have enabled FILESTREAM, you should restart the SQL Server service.
The following code example shows how to use sp_configure to configure the FILESTREAM access level for
a SQL Server instance.
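A sketch of such a statement, here setting the access level to 2 (Transact-SQL and Win32 streaming access):

```sql
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO
```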
The following code example creates a database that contains a dedicated FILESTREAM filegroup.
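A sketch of such a statement follows; the database name and file paths are illustrative. Note that for a FILESTREAM filegroup, FILENAME specifies a folder; the parent folder must exist, but the folder itself must not.

```sql
CREATE DATABASE Archive
ON PRIMARY
    (NAME = Archive_data, FILENAME = 'D:\Data\Archive_data.mdf'),
FILEGROUP FileStreamGroup CONTAINS FILESTREAM
    (NAME = Archive_fs, FILENAME = 'D:\Data\ArchiveFilestream')
LOG ON
    (NAME = Archive_log, FILENAME = 'D:\Data\Archive_log.ldf');
GO
```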
The following code example creates a table that has a FILESTREAM column.
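A sketch of such a statement follows; the table and column names are illustrative. A table that contains a FILESTREAM column must also include a uniqueidentifier ROWGUIDCOL column that has a unique constraint.

```sql
CREATE TABLE dbo.Documents
(
    DocumentID uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    DocumentName nvarchar(100) NOT NULL,
    DocumentFile varbinary(max) FILESTREAM NULL
) FILESTREAM_ON FileStreamGroup;  -- the FILESTREAM filegroup name is illustrative
GO
```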
Note: You cannot enable the FILESTREAM attribute on an existing column. To convert an
existing varbinary(max) column to FILESTREAM, you should first create a new FILESTREAM-
enabled column, and then copy the data from the existing column into the new column.
You should place the filegroups that contain FILESTREAM data on separate volumes from the
operating system files, paging files, SQL Server database and log files, and SQL Server tempdb.
FILESTREAM data is included in database backups and restores, so you do not need to maintain a
separate backup for data files.
For BLOBs that are smaller than 1 MB in size, FILESTREAM may not perform as well as storing data as
varbinary(max) without the FILESTREAM attribute enabled.
When you enable transparent database encryption for a database, the data in the FILESTREAM
column is not encrypted.
Reference Links: For more information about the best practices for using FILESTREAM, see
the FILESTREAM Best Practices topic in SQL Server Books Online.
The following code example configures the HumanResources database for FileTables.
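A sketch of such a statement, with an illustrative directory name:

```sql
ALTER DATABASE HumanResources
SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL,
                DIRECTORY_NAME = N'HumanResources_Files');
GO
```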
Configuring a FileTable
To create a FileTable, you use the AS FILETABLE option with a CREATE TABLE statement and specify a
name for the FileTable directory that will contain the data for the FileTable. This directory will be a
subdirectory of the folder that you created by setting the DIRECTORY_NAME option.
Creating a FileTable
USE Archive;
GO
CREATE TABLE Images AS FILETABLE
WITH
(FILETABLE_DIRECTORY = 'Images');
GO
When the NON_TRANSACTED_ACCESS option is set to FULL, you can add files to a FileTable by saving
them or dragging them into the FileTable shared folder. For example, the Images FileTable in the
preceding code example would have a Universal Naming Convention (UNC) path of
\\MIA-SQL\MSSQLSERVER\Archive_Files\Images, where MIA-SQL is the server name, MSSQLSERVER is the
instance-level FILESTREAM share name, and Archive_Files is the database-level FileTable directory. After
you create the FileTable, you can open and work with the files in the same way that you would if they
were not contained in a FileTable.
Note: You can access FileTable files by navigating to the shared folder by using the UNC or
by navigating to the folder on the local file system. If you use the latter method to open FileTable
files, you will not be able to open files that were created by using applications that use memory-
mapped files. Examples of applications that use memory-mapped files include Notepad and
Paint. However, you can open these files by using the UNC path.
FileTables do not support the following SQL Server features:
o Table partitioning.
o Database replication.
FileTables support the following SQL Server features with some limitations:
o Including a database with a FileTable in an AlwaysOn availability group changes the way that
failover works. After failover, you can still access FileTable data on the primary replica, but you
cannot access it on readable secondary replicas.
o FileTables support AFTER triggers for data manipulation language (DML) operations, but they do
not support INSTEAD OF triggers for DML operations. FileTables fully support data definition
language (DDL) triggers.
o You can create views on FileTables, but you cannot include FILESTREAM data in indexed views.
FileTable is an extension of FILESTREAM, so the same restriction applies to them.
Reference Links: For more information about compatibility with other SQL Server features,
see the FileTable Compatibility with Other SQL Server Features topic in SQL Server Books Online.
For more information about using FileTable with AlwaysOn Availability Groups, see the
FILESTREAM and FileTable with AlwaysOn Availability Groups topic in SQL Server Books Online.
FileTableRootPath
The FileTableRootPath function returns the root-level UNC path for FileTables for the current database or
for a specified FileTable. You can format the path into the format that your application requires by
setting the @option argument as follows:
A value of 0 returns the path with the server name converted to NetBIOS format. This is the default.
A value of 1 returns the path with the server name unconverted.
A value of 2 returns the path with the server name displayed as a fully qualified domain name
(FQDN).
The following code example returns the root-level UNC path for the current database.
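A sketch of such a query, assuming the current database is the Archive database used in the earlier examples:

```sql
USE Archive;
GO
SELECT FileTableRootPath() AS RootPath;
GO
```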
GetFileNamespacePath
The GetFileNamespacePath function returns the UNC path to a directory or file in the FileTable directory
hierarchy. You can use the is_full_path argument with a value of 1 to return the full UNC path or with a
value of 0 to return the relative path. 0 is the default value.
You can also set the @option argument to determine the formatting of the path in the same way that you
can for the FileTableRootPath function.
The following code example returns the relative paths of the files in the Images FileTable in the Archive
database.
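A sketch of such a query; calling GetFileNamespacePath with no arguments returns the relative path, because 0 is the default for is_full_path:

```sql
USE Archive;
GO
SELECT file_stream.GetFileNamespacePath() AS RelativePath
FROM dbo.Images;
GO
```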
You can use the FileTableRootPath and GetFileNamespacePath functions to avoid the use of hard-
coded file paths in applications. By using these functions to return the required paths, you can help to
ensure that applications can function regardless of changing environmental factors, such as databases
that are hosted on different servers.
Reference Links: For more information about using the FileTableRootPath and
GetFileNamespacePath functions, see the Work with Directories and Paths in FileTables article in
the MSDN library.
GetPathLocator
The GetPathLocator function returns the hierarchyid value for a FileTable file or directory. You must
supply the path name to the file or directory.
The following code example uses the GetPathLocator function to return the path locator hierarchyid
value for the Images FileTable directory.
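A sketch of such a query; the UNC path is the one used earlier in this lesson:

```sql
SELECT GetPathLocator('\\MIA-SQL\MSSQLSERVER\Archive_Files\Images') AS PathLocator;
```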
When you migrate files from the file system to a FileTable, you can use GetPathLocator to replace the
original UNC path for each file in the metadata with the FileTable UNC path, which helps to ensure that
the metadata for the files is correct.
Reference Links: For more information about using GetPathLocator when you are
migrating files from the file system to a FileTable, see the Load Files into FileTables article in the
MSDN library.
PathName
The PathName function returns the path of a BLOB in a FILESTREAM column. You can include the
@option argument to obtain the path in the correct format for your applications. The values for @option
are the same as the values that were described in the FileTableRootPath section at the beginning of this
topic.
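For example, a query of the following form (the dbo.Documents table and its DocumentFile FILESTREAM column are hypothetical) returns the path for each BLOB with the server name displayed as an FQDN:

```sql
SELECT DocumentFile.PathName(2) AS FilePath
FROM dbo.Documents;
```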
Demonstration Steps
View FILESTREAM configuration
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as AdventureWorks\Student with the password Pa$$w0rd.
2. Run D:\Demofiles\Mod15\Setup.cmd as an administrator to revert any changes.
9. In the SQL Server (MSSQLSERVER) Properties dialog box, click the FILESTREAM tab.
10. Note that FILESTREAM is enabled for Transact-SQL access, File I/O access, and remote access, and that
the FILESTREAM share is named MSSQLSERVER, and then click Cancel.
2. In the Connect to Server dialog box, in the Server Name field, type MIA-SQL and then click
Connect.
3. In SQL Server Management Studio, click File, point to Open, and then click File.
4. In the Open File dialog box, browse to D:\Demofiles\Mod15, and then double-click FilesDemo.sql.
5. In the query window, under the Enable filestream comment, highlight the Transact-SQL statement,
and then click Execute.
6. Under the Create filestream database comment, highlight the Transact-SQL statement, and then
click Execute.
7. Under the Create filetable comment, highlight the Transact-SQL statement, and then click Execute.
8. In File Explorer, in the D:\Demofiles\Mod15 folder, click Document1, press Shift, click Document3,
right-click Document3, and then click Copy.
9. On the taskbar, click Start, type Run and then press Enter.
10. In the Run dialog box, type \\localhost\MSSQLSERVER\FilestreamData\Documents and then click
OK.
12. In SQL Server Management Studio, in the query window, under the Query FileTable comment,
highlight the Transact-SQL statement, and then click Execute.
13. Leave SQL Server Management Studio open for the next demonstration.
Lesson 3
Searching Data Files
In SQL Server, you can create full-text indexes that enable you to perform fast and efficient searches on
data files. This lesson explains the benefits of using full-text indexes and the considerations for
implementing a full-text index. It also explains the options for performing searches against full-text
indexes, and explains how you can use Semantic Search to perform more sophisticated searches.
Lesson Objectives
After completing this lesson, you will be able to:
Generation term search. A generation term search locates all inflected forms of specified words. For
example, a generation term search on the word “walk” could locate the words “walk,” “walks,”
“walked,” and “walking.”
Proximity term search. A proximity term search locates specified words or phrases that are near to
other specified words or phrases.
Weighted term search. A weighted term search uses supplied weighting values that are associated
with the search terms to ensure that the query returns the most relevant rows first.
Thesaurus search. A thesaurus search identifies words that are synonyms of the search terms.
In addition to these search types, you can also use Semantic Search to perform searches to identify
documents that are stored in a varbinary(max) column and have similarities or are related in some way.
Performance
Using a full-text index delivers much better performance than simply using the LIKE predicate to identify
words or phrases in text. Furthermore, you cannot use the LIKE predicate to search formatted binary data.
Consequently, if you anticipate that users will need to perform frequent searches on data files, you should
consider creating a full-text index.
Property-Scoped Searches
SQL Server 2014 supports the searching of the properties of files in varbinary(max) columns. (This
functionality applies to varbinary(max) columns regardless of whether the FILESTREAM attribute is
enabled.) For example, you could use a property search to identify all Word documents that have a
particular author. Property-scoped searches are possible only for documents for which there is an
appropriate filter available that can recognize the properties for that document type.
Full-text search includes word breakers, stemmers, stoplists, and thesaurus files for each supported
language. These tools enable the identification of word boundaries, the rejection of words such as “the”
and “an” that have no inherent meaning beyond their syntactical function, and the recognition of the
conjugational forms of verbs.
Language support. Full-text indexes enable support for multiple languages, so you can use a single
index to service queries against data in different languages.
Filegroup placement. Creating and populating a full-text index can incur high I/O, so for optimal
performance, you should consider placing the full-text index on a filegroup that can provide good
I/O performance and that is separate from other data. Conversely, when manageability is a greater
concern than performance, you can place a table and its associated full-text index on the same
filegroup.
Managing updates. By default, SQL Server incrementally updates full-text indexes automatically as
changes occur in the underlying data. To minimize performance disruption at busy times, you can
configure SQL Server to update full-text indexes based on a schedule that you define, or you can
configure only manual updates. You can create a schedule for a full-text index by opening the
properties of the index, and on the Schedules page, clicking New, and specifying the schedule that
you require. Note that populating a full-text index uses a SQL Server Agent job, so you should ensure
that the SQL Server Agent service is running.
Note: In SQL Server 2005 and earlier, a full-text catalog was a physical structure that you
placed in a specified filegroup when you created the catalog. In SQL Server 2008 and SQL Server
2014, a full-text catalog is a logical structure that does not reside in a filegroup.
You can create a full-text catalog by using a CREATE FULLTEXT CATALOG Transact-SQL statement.
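For example, a statement of the following form creates a catalog named ArchiveCatalog and makes it the default catalog for the database:

```sql
CREATE FULLTEXT CATALOG ArchiveCatalog AS DEFAULT;
```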
You can use the CREATE FULLTEXT INDEX Transact-SQL statement to create a full-text index. You must
provide a name for the index, the name of the table and the names of the columns to include in the
index, the name of the column that will act as the key index column, and the name of the full-text catalog
for the index. You can also specify the language for each column.
If the index will include columns that use the varbinary(max) data type, you should also provide the
name of the column that identifies the type of document for each row in the table. This column is
specified as the TYPE COLUMN column in the CREATE FULLTEXT INDEX statement. SQL Server uses
filters to extract information from different types of documents, and can automatically select an
appropriate filter for each document type by using the information in the TYPE COLUMN column. When
you create a FileTable, the schema includes a column named file_type, which identifies the document
type for each row. You should specify this column as the TYPE COLUMN column when you are creating
full-text indexes on FileTables.
The following code example creates a full-text index in the ArchiveCatalog full-text catalog on a FileTable
named Archive.
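A sketch of such a statement follows. The key index name is illustrative; you would substitute the name of the unique key index that SQL Server generated for the FileTable. LANGUAGE 1033 specifies English.

```sql
CREATE FULLTEXT INDEX ON dbo.Archive
    (file_stream TYPE COLUMN file_type LANGUAGE 1033)
    KEY INDEX PK_Archive_ID  -- substitute the FileTable's generated key index name
    ON ArchiveCatalog;
GO
```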
To keep the index up to date with the data, you can configure SQL Server to use change tracking to
perform incremental updates to the index. You can enable change tracking by using the
CHANGE_TRACKING AUTO or CHANGE_TRACKING MANUAL options with either the CREATE
FULLTEXT INDEX or the ALTER FULLTEXT INDEX statements. The AUTO option automatically updates
the index when data changes in the source table. However, because this occurs as a background process,
changes may not immediately appear in the index. The MANUAL option tracks changes to the source
data in the same way, but it does not update the index until you issue a START UPDATE POPULATION
command in an ALTER FULLTEXT INDEX statement.
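For example, statements of the following form (dbo.Archive is the FileTable from the earlier example) enable manual change tracking and later apply the tracked changes:

```sql
ALTER FULLTEXT INDEX ON dbo.Archive SET CHANGE_TRACKING MANUAL;
GO
-- Later, at a quiet time, apply the tracked changes to the index:
ALTER FULLTEXT INDEX ON dbo.Archive START UPDATE POPULATION;
GO
```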
Reference Links: For more information about populating full-text indexes, see the
Populate Full-Text Indexes topic in SQL Server Books Online.
Schedule regular index defragmentation for the clustered and nonclustered indexes on the source
table.
Reorganize the full-text catalog by using the ALTER FULLTEXT CATALOG REORGANIZE statement.
Typically, SQL Server uses multiple internal tables, known as index fragments, to store data in a full-
text index. Indexes for which the source data changes frequently will often contain a large number of
fragments. Running the ALTER FULLTEXT CATALOG REORGANIZE statement merges these
fragments into a single larger fragment, which can dramatically improve performance.
Reference Links: For more information about improving performance for full-text indexes,
see the Improve the Performance of Full-Text Queries topic in SQL Server Books Online.
The following code example uses the CONTAINS predicate to find matches for the word “Mountain” in
the Production.Product table.
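A sketch of such a query, assuming that a full-text index exists on the Name column:

```sql
SELECT ProductID, Name
FROM Production.Product
WHERE CONTAINS(Name, 'Mountain');
```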
Typically, you use the FREETEXT predicate when you need to identify words and phrases that match the
meaning of the search term that you provide, even if the results do not match the actual words that are
used in the search term. FREETEXT uses the thesaurus to achieve this.
The following code example uses the FREETEXT predicate to find words and phrases that match the
meaning of the term “safety components” in the Production.Document table.
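A sketch of such a query, assuming that a full-text index exists on the Document column:

```sql
SELECT Title, DocumentSummary
FROM Production.Document
WHERE FREETEXT(Document, 'safety components');
```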
When you define search terms in a full-text query, you can use the AND, OR, and NOT Boolean operators
to combine conditions.
The following code example returns all products from the Production.Product table that contain the
words "frame", "wheel", or "tire" in the product name. The words are weighted, enabling ranking of the
results in terms of the closeness of the match. The rows that match best are returned first.
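A sketch of such a query; the weight values are illustrative. CONTAINSTABLE returns a RANK column that reflects the weighted match quality, so ordering by RANK descending returns the best matches first:

```sql
SELECT p.Name, r.RANK
FROM Production.Product AS p
INNER JOIN CONTAINSTABLE(Production.Product, Name,
    'ISABOUT (frame WEIGHT (0.8), wheel WEIGHT (0.5), tire WEIGHT (0.2))') AS r
    ON p.ProductID = r.[KEY]
ORDER BY r.RANK DESC;
```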
Semantic Search
Semantic Search extends the capabilities of full-text
search to enable you to identify documents that are
similar or related in some way. For example, you
could use Semantic Search to identify the résumés
in a FileTable that relate to a specific job role.
Although a standard full-text query will reveal
résumés that contain similar keywords or phrases,
these searches may miss relevant résumés where
the author has not used the specified keywords that
are contained in the search term. By identifying
deeper semantic patterns, Semantic Search can
provide a results set that more accurately matches
the search query.
Semantic Search uses a database named the Semantic Language Statistics database, which contains the
statistical models that are used to perform semantic searches. You must install this database from the SQL
Server installation media before you can use Semantic Search.
Reference Links: For more information about installing and configuring the Semantic
Language Statistics database, see the Install and Configure Semantic Search topic in SQL Server
Books Online.
Note: Semantic Search does not support as many languages as a full-text index. To view
the list of supported languages for Semantic Search, query the sys.fulltext_semantic_languages
catalog view.
The following code example adds Semantic Search to an existing full-text index on the Document table in
the AdventureWorks database.
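A sketch of such a statement:

```sql
ALTER FULLTEXT INDEX ON Production.Document
    ALTER COLUMN Document
    ADD STATISTICAL_SEMANTICS;
GO
```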
Finding key phrases in a document. You can use the SemanticKeyPhraseTable function to identify
key phrases. SemanticKeyPhraseTable returns a table that includes the following columns:
o Document_key. This column contains the key value of the document that contains the matched
term.
o Keyphrase. This column contains the matching term in the document.
o Score. This column contains a weighting value between 0 and 1 that evaluates the quality of the
match. The higher the value, the better the match.
Finding similar or related documents. You can use the SemanticSimilarityTable function to find
related documents. SemanticSimilarityTable returns a table that includes the following columns:
o Matched_document_key. This column contains the key value of the document that is identified
as having similarities with the source document.
o Score. This column contains a weighting value between 0 and 1 that evaluates the degree of
similarity. The higher the value, the greater the similarity.
Identifying the key phrases that make documents similar. You can use the
SemanticSimilarityDetailsTable function to identify the key phrases that make documents similar.
SemanticSimilarityDetailsTable returns a table that includes the following columns:
o Keyphrase. This column contains the phrases that are identified as making the documents
similar.
o Score. This column contains a weighting value between 0 and 1 that evaluates the key phrases
according to the degree of similarity that they indicate between the two documents. The higher the
value, the stronger the link between the phrases.
Demonstration Steps
Create and query a full-text index
1. Ensure that the 20464C-MIA-DC and 20464C-MIA-SQL virtual machines are running and then log on
to 20464C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Ensure that you have run the previous demonstrations in this module.
3. On the taskbar, click SQL Server 2014 Management Studio.
4. In SQL Server Management Studio, in the FilesDemo.sql query window, under the Create full-text
catalog comment, highlight the Transact-SQL statement, and then click Execute.
5. Under the Get index ID for FileTable PK comment, highlight the Transact-SQL statement, and then
click Execute.
6. In the Results pane, in the name column, right-click the value in row 1, which begins with
PK_FileStor_, and then click Copy.
7. In the FilesDemo.sql query window, under the Create full-text index comment, after KEY INDEX,
highlight PK_FileStor_REPLACE_WITH_INDEX_ID, right-click
PK_FileStor_REPLACE_WITH_INDEX_ID, and then click Paste.
8. Under the Create full-text index comment, highlight the Transact-SQL statement, and then click
Execute.
9. Under the Find documents containing "imperdiet" near "vivamus" (within 15 search terms)
comment, highlight the Transact-SQL statement, click Execute, and then review the results.
10. Close the FilesDemo.sql query window, and do not save any changes.
You have decided to use a FileTable to store the résumés. In this lab, you will implement and test the data
files storage solution.
Objectives
After completing this lab, you will have:
Created a FileTable.
Password: Pa$$w0rd
4. Create a FileTable
2. In the D:\Labfiles\Lab15\Starter folder, run the Setup Windows Command Script file (Setup.cmd) as
Administrator.
3. Add a file to the FileStreamGroup filegroup. The name of the file should be FileStreamData and the
file path should be 'D:\Labfiles\Lab15\Starter\Filestream'.
o NON_TRANSACTED_ACCESS = FULL
o DIRECTORY_NAME = N'HRFiles'
o FILETABLE_DIRECTORY = 'Resumes'
2. Copy Max Benson.doc, Shai Bassli.doc, and Stephen Jiang.doc from the D:\Labfiles\Lab15\Starter
folder to the \\MIA-SQL\MSSQLSERVER\HRFiles\Resumes FileTable share.
2. In SQL Server Management Studio, type and execute a SELECT statement that returns the following
metadata columns from the Resumes table:
o Name
o Cached_file_size
o Last_write_time
o A column named [Full Path] that uses the GetFileNamespacePath() FILESTREAM function to
return the full UNC path for each file.
Created a FileTable.
2. Query the sys.sysindexes view to obtain the full index name for the index name that begins
'PK_Resume'; in the Results pane, note the name of the primary key index.
3. Type and execute a Transact-SQL statement that creates a full-text index on the Resumes table, using
the columns and values in the following table.
4. Replace the value in the KEY INDEX clause with the value that you obtained in the previous step.
Place the index on hr_catalog.
2. Type and execute a Transact-SQL statement that returns all résumés that contain the word
“machinist” within 50 terms of the word “degree.”
2. In SQL Server Management Studio, type and execute a Transact-SQL statement to add the
Statistical_Semantics option to the full-text index on the Resumes table.
2. In SQL Server Management Studio, type and execute a Transact-SQL statement that uses the
SemanticKeyPhraseTable function to return the top 10 phrases in the Shai Bassli.doc file.
3. In SQL Server Management Studio, type and execute a Transact-SQL statement that uses the
SemanticKeyPhraseTable function to return the top two résumés that are about production.
Results: At the end of this exercise, you will have created a full-text catalog and a full-text index, and you
will have tested the index by running queries against it.
Question: If the lab scenario were modified as described in the list below, how might this
influence your choice of storage solution for the data files?
Administrators want to be able to perform point-in-time restores for all database data,
including data files.
Review Question(s)
Question: How have you enabled the storage of data files in your places of work? How
could you use the features of SQL Server 2014 to improve the storage of data files?
Course Evaluation
Your evaluation of this course will help Microsoft understand the quality of your learning experience.
Please work with your training provider to access the course evaluation form.
Microsoft will keep your answers to this survey private and confidential and will use your responses to
improve your future learning experience. Your open and honest feedback is valuable and appreciated.
3. In File Explorer, navigate to the D:\Labfiles\Lab01\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.
4. In the User Account Control dialog box, click Yes, and then wait for the script to finish.
2. In the Connect to Server window, in Server type, ensure that Database Engine is selected.
3. In Server name, ensure that MIA-SQL has been entered.
5. Click Connect.
6. Expand Databases.
7. Expand AdventureWorks.
3. In the right pane, ensure that the following services are listed:
Task 2: Ensure That All Required Services Including SQL Server Agent Are Started and
Set To Autostart for Both Instances
1. Double-click SQL Server (MSSQLSERVER).
2. Click Service.
4. Click OK.
5. Repeat steps 1-4 for each MSSQLSERVER service except the SQL Full-text Filter Daemon Launcher
service.
Task 3: Configure the TCP Port for the SQL3 Database Engine Instance to 51550
1. In the SQL Server Configuration Manager window, in the left pane, expand SQL Server Network
Configuration, and then click Protocols for SQL3.
9. In the SQL Server Configuration Manager window, in the left pane, click SQL Server Services.
10. On the toolbar, click the Refresh icon, and then make sure that the SQL Server (SQL3) service has
started.
3. In File Explorer, navigate to the D:\Labfiles\Lab02\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.
4. In the User Account Control dialog box, click Yes, and then wait for the script to finish.
2. Review the supporting documentation for details of the PhoneCampaign, Opportunity, and
SpecialOrder tables and determine column names, data types, and nullability for each data item in
the design.
Created a schema.
Created tables.
4. Click Execute.
6. Click Execute.
8. Click Execute.
Created the tables that you designed in the first exercise of this lab.
3. In File Explorer, navigate to the D:\Labfiles\Lab03\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.
4. In the User Account Control dialog box, click Yes, and then wait for the script to finish.
2. In Object Explorer, expand the MIA-SQL server, expand Databases, right-click the AdventureWorks
database, and then click New Query.
4. Click Execute.
7. Click Execute.
Results: Having completed this lab, you will have added constraints to the DirectMarketing.Opportunity
table.
Note: This query should fail due to the PRIMARY KEY constraint.
Note: This query should fail due to the FOREIGN KEY constraint.
Results: After completing this exercise, you should have successfully tested your constraints.
3. In File Explorer, navigate to the D:\Labfiles\Lab04\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.
4. In the User Account Control dialog box, click Yes, and then wait for the script to finish.
Results: After completing this exercise, you will have created tables with clustered indexes.
Results: After completing this lab, you will have created a nonclustered index.
3. In File Explorer, navigate to the D:\Labfiles\Lab05\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.
4. In the User Account Control dialog box, click Yes, and then wait for the script to finish.
USE AdventureWorks
GO
SELECT * FROM sys.stats WHERE object_id = OBJECT_ID('Production.Product');
GO
2. Check to see whether any autostats have been generated. If they have, they will appear in the results
with a _WA prefix.
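The check in step 2 can be narrowed to just the auto-created statistics. This sketch runs against the same Production.Product table and relies on the documented auto_created flag alongside the _WA_Sys naming convention:

```sql
USE AdventureWorks;
GO
-- Auto-created statistics carry auto_created = 1 and a _WA_Sys_ name prefix.
SELECT name, auto_created, user_created
FROM sys.stats
WHERE object_id = OBJECT_ID('Production.Product')
  AND auto_created = 1;
GO
```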
DBCC SHOW_STATISTICS('Production.Product',Product_Color_Stats);
GO
Note: The results returned can vary. Sample results are shown in the following table.
Question Answer
Task 8: Execute an SQL Command and Check the Accuracy of Some Statistics
1. In Object Explorer, expand MIA-SQL, and then expand Databases.
-- Query 1
SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName LIKE 'A%';
GO
-- Query 2
SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName LIKE 'Alejandro%';
GO
-- Query 3
SELECT ProspectID, FirstName, LastName FROM Marketing.Prospect WHERE FirstName LIKE 'Arif%';
GO
Results: After this exercise, you will have assessed selectivity on various queries.
2. Click Connect.
4. In Workload, ensure that File is selected, and then click Browse for a workload file.
5. Navigate to D:\Labfiles\Lab05\Starter.
6. Click PersonQuery.sql, and then click Open.
2. Navigate to D:\Labfiles\Lab05\Starter.
3. Click PersonIndex.sql, and then click Open.
4. Notice that Database Engine Tuning Advisor has created a script to create a nonclustered covering
index by using INCLUDE.
5. Click Execute.
Results: After completing this exercise, you will have created a covering index.
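The script that Database Engine Tuning Advisor generates follows this general shape. The index and column names below are illustrative, not the contents of PersonIndex.sql:

```sql
-- A covering index: the key column supports the search predicate, and the
-- INCLUDE columns let the query be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_Person_LastName
ON Person.Person (LastName)
INCLUDE (FirstName, MiddleName);
GO
```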
2. In the D:\Labfiles\Lab06\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
2. Click New Query, and then enter the following Transact-SQL code.
3. Click Execute.
2. Click Execute.
3. View the results and verify that the buffer pool cache is enabled and uses a 10-GB file named
S:\BufferCache.bpe.
4. Use File Explorer to view the contents of drive S, and verify that the BufferCache.bpe file exists.
5. Close File Explorer, but keep SQL Server Management Studio open for the next exercise.
Results: After completing this exercise, you should have enabled the buffer pool extension.
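Enabling and verifying the extension uses statements along these lines. The file name and size match the lab notes; run them only in the lab environment:

```sql
-- Enable a 10-GB buffer pool extension file on drive S.
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'S:\BufferCache.bpe', SIZE = 10 GB);
GO
-- Verify: state_description should report that the extension is enabled.
SELECT path, state_description, current_size_in_kb
FROM sys.dm_os_buffer_pool_extension_configuration;
GO
```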
5. On the Execution plan tab, view the execution plan that is used for the query. Examine the icons
from right to left, noting the indexes that were used. Note also that the query processor has identified
that adding a missing index could improve performance.
6. Click New Query, and then enter the following Transact-SQL code to create a nonclustered
columnstore index on the FactInternetSales table. Alternatively, in the D:\Labfiles\Lab06\Solution
folder, you can open the Create Columnstore Index on FactInternetSales.sql script file.
USE AdventureWorksDW
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_NCS_FactInternetSales]
ON dbo.FactInternetSales
(
[ProductKey],
[OrderDateKey],
[DueDateKey],
[ShipDateKey],
[CustomerKey],
[PromotionKey],
[CurrencyKey],
[SalesTerritoryKey],
[SalesOrderNumber],
[SalesOrderLineNumber],
[RevisionNumber],
[OrderQuantity],
[UnitPrice],
[ExtendedAmount],
[UnitPriceDiscountPct],
[DiscountAmount],
[ProductStandardCost],
[TotalProductCost],
[SalesAmount],
[TaxAmt],
[Freight],
[CarrierTrackingNumber],
[CustomerPONumber],
[OrderDate],
[DueDate],
[ShipDate]
);
8. Switch back to the Query FactInternetSales.sql tab, and then click Execute to rerun the query.
9. On the Execution plan tab, view the execution plan that is used for the query. Examine the icons
from right to left, noting the indexes that were used. Note that the columnstore index is used, and
that the query processor does not identify any missing indexes.
USE [AdventureWorksDW]
GO
ALTER TABLE [dbo].[FactProductInventory]
DROP CONSTRAINT [PK_FactProductInventory];
GO
ALTER TABLE [dbo].[FactProductInventory]
DROP CONSTRAINT [FK_FactProductInventory_DimDate];
GO
ALTER TABLE [dbo].[FactProductInventory]
DROP CONSTRAINT [FK_FactProductInventory_DimProduct];
GO
CREATE CLUSTERED COLUMNSTORE INDEX [IX_CS_FactProductInventory]
ON dbo.FactProductInventory;
GO
8. Switch back to the Query FactProductInventory.sql tab, and then click Execute to rerun the query.
9. On the Execution plan tab, view the execution plan that is used for the query. Examine the icons
from right to left, noting the indexes that were used. Note that the columnstore index is used, and
that the query processor does not identify any missing indexes.
Results: After completing this exercise, you should have created columnstore indexes.
2. In the D:\Labfiles\Lab08\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
Note: Ensure that approximately nine colors are returned and that no NULL row is returned.
Note: Ensure that approximately 26 rows are returned for blue products. Ensure that approximately
248 rows are returned for products that have no color.
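Checks of the kind those notes describe can be written directly against the base table. These statements are a sketch; the lab's actual queries go through the views it creates:

```sql
-- Distinct non-NULL colors: expect around nine rows, with no NULL row.
SELECT DISTINCT Color
FROM Production.Product
WHERE Color IS NOT NULL;

-- Row counts for blue products and for products that have no color.
SELECT COUNT(*) AS BlueProducts
FROM Production.Product
WHERE Color = 'Blue';

SELECT COUNT(*) AS NoColorProducts
FROM Production.Product
WHERE Color IS NULL;
```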
2. In the D:\Labfiles\Lab09\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
2. Review the Function Specifications: Phone Number section in the supporting documentation.
Results: After this exercise, you should have created a new FormatPhoneNumber function within the
dbo schema.
2. Review the requirement for the dbo.IntegerListToTable function in the supporting documentation.
PositionInList IntegerValue
1 234
2 354253
3 3242
4 2
PositionInList IntegerValue
1 234
2 354253
3 3242
4 2
Results: After this exercise, you should have created a new IntegerListToTable function within a dbo
schema.
2. In the D:\Labfiles\Lab10\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
3. Review the supplied table requirements in the supporting documentation for the
Production.ProductAudit table.
4. On the taskbar, click SQL Server 2014 Management Studio. In the Connect to Server window, ensure that Server name is MIA-SQL, and then click Connect.
5. In Object Explorer, expand MIA-SQL, and then expand Databases.
6. Expand AdventureWorks, expand Tables, expand Production.Product, and then expand Columns.
UPDATE Production.Product
SET ListPrice=3978.00
WHERE ProductID BETWEEN 749 and 753;
GO
SELECT * FROM Production.ProductAudit;
GO
Results: After this exercise, you should have created a new trigger. Tests should have shown that it is
working as expected.
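A trigger that passes the test above would take roughly this shape. The Production.ProductAudit column list here is an assumption, since the real definition comes from the supporting documentation:

```sql
-- AFTER UPDATE trigger that copies changed rows into the audit table.
-- The ProductAudit column names are illustrative.
CREATE TRIGGER Production.TR_Product_Update
ON Production.Product
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO Production.ProductAudit (ProductID, ListPrice, AuditTime)
    SELECT i.ProductID, i.ListPrice, SYSDATETIME()
    FROM inserted AS i;
END;
GO
```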
3. Expand the MarketDev database, expand Tables, expand Marketing.CampaignBalance, and then
expand Triggers.
Results: After this exercise, you should have altered the trigger. Tests should show that it is now working
as expected.
2. In the D:\Labfiles\Lab11\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
4. In the Database Properties - InternetSales dialog box, on the Filegroups page, in the MEMORY
OPTIMIZED DATA section, click Add Filegroup.
5. In the Name box, type MemFG, and then press Enter.
6. In the Database Properties - InternetSales dialog box, on the Files page, click Add.
USE InternetSales
GO
CREATE TABLE dbo.ShoppingCart
(SessionID INT NOT NULL,
TimeAdded DATETIME NOT NULL,
CustomerKey INT NOT NULL,
ProductKey INT NOT NULL,
Quantity INT NOT NULL,
PRIMARY KEY NONCLUSTERED HASH (SessionID, ProductKey) WITH (BUCKET_COUNT=100000))
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
3. Click New Query, and then type the following Transact-SQL code. (Alternatively, in the
D:\Labfiles\Lab11\Solution folder, open the TestShoppingCart.sql script file.)
USE InternetSales
GO
INSERT INTO dbo.ShoppingCart (SessionID, TimeAdded, CustomerKey, ProductKey,
Quantity)
VALUES (1, GETDATE(), 2, 3, 1);
INSERT INTO dbo.ShoppingCart (SessionID, TimeAdded, CustomerKey, ProductKey,
Quantity)
VALUES (1, GETDATE(), 2, 4, 1);
SELECT * FROM dbo.ShoppingCart;
Results: After completing this exercise, you should have created a memory-optimized table and a natively
compiled stored procedure in a database with a filegroup for memory-optimized data.
USE InternetSales
GO
CREATE PROCEDURE dbo.AddItemToCart
@SessionID INT, @TimeAdded DATETIME, @CustomerKey INT, @ProductKey INT, @Quantity
INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = 'us_english')
INSERT INTO dbo.ShoppingCart (SessionID, TimeAdded, CustomerKey, ProductKey,
Quantity)
VALUES (@SessionID, @TimeAdded, @CustomerKey, @ProductKey, @Quantity)
END
GO
3. In SQL Server Management Studio, click New Query, and then enter the following Transact-SQL
code. (Alternatively, in the D:\Labfiles\Lab11\Solution folder, open the Create
DeleteItemFromCart.sql script file.)
USE InternetSales
GO
CREATE PROCEDURE dbo.DeleteItemFromCart
@SessionID INT, @ProductKey INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = 'us_english')
DELETE FROM dbo.ShoppingCart
WHERE SessionID = @SessionID
AND ProductKey = @ProductKey
END
GO
5. In SQL Server Management Studio, click New Query, and then enter the following Transact-SQL
code. (Alternatively, in the D:\Labfiles\Lab11\Solution folder, open the Create EmptyCart.sql script
file.)
USE InternetSales
GO
CREATE PROCEDURE dbo.EmptyCart
@SessionID INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = 'us_english')
DELETE FROM dbo.ShoppingCart
WHERE SessionID = @SessionID
END
GO
7. Click New Query, and then enter the following Transact-SQL code. (Alternatively, in the
D:\Labfiles\Lab11\Solution folder, open the Test Procs.sql script file.)
USE InternetSales
GO
DECLARE @now DATETIME = GETDATE();
EXEC dbo.AddItemToCart @SessionID = 3,
@TimeAdded = @now,
@CustomerKey = 2,
@ProductKey = 3,
@Quantity = 1;
EXEC dbo.AddItemToCart @SessionID = 3,
@TimeAdded = @now,
@CustomerKey = 2,
@ProductKey = 4,
@Quantity = 1;
SELECT * FROM dbo.ShoppingCart;
EXEC dbo.DeleteItemFromCart @SessionID = 3, @ProductKey = 4;
SELECT * FROM dbo.ShoppingCart;
EXEC dbo.EmptyCart @SessionID = 3;
SELECT * FROM dbo.ShoppingCart;
Results: After completing this exercise, you should have created a natively compiled stored procedure.
2. In the D:\Labfiles\Lab12\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
Table-valued function that returns a list of files in a particular folder: Yes, good use of external access.
Function that formats phone numbers as strings: Yes, good use of string handling.
Trigger that records balance movements that have a value of more than 1,000: No, only involves data access.
Stored procedure that writes an XML file for a given XML parameter: Yes, good use of external access.
Function that counts rows in a table: No, only involves data access.
Results: After this exercise, you should have created a list of which objects should be implemented in
managed code and the reasons for your decision.
3. Expand System Databases, right-click the master database, and then click New Query.
USE AdventureWorks
GO
CREATE ASSEMBLY SQLCLRDemo
FROM 'D:\Labfiles\Lab12\Starter\SQLCLRDemo.DLL'
WITH PERMISSION_SET = EXTERNAL_ACCESS;
GO
6. Highlight the query above and then, in the toolbar, click Execute.
8. Highlight the query above and then, in the toolbar, click Execute.
10. Highlight the query above and then, in the toolbar, click Execute.
12. Highlight the query above and then, in the toolbar, click Execute.
SELECT dbo.IsValidEmailAddress('test@somewhere.com');
GO
SELECT dbo.IsValidEmailAddress('test.somewhere.com');
GO
SELECT dbo.FormatAustralianPhoneNumber('0419201410');
GO
SELECT dbo.FormatAustralianPhoneNumber('9 87 2 41 23');
GO
SELECT dbo.FormatAustralianPhoneNumber('039 87 2 41 23');
GO
SELECT * FROM dbo.FolderList(
'D:\Labfiles\Lab12\Starter','*.txt');
GO
Results: After this exercise, you should have three functions working as expected.
2. In the D:\Labfiles\Lab13\Starter folder, right-click Setup.cmd, and then click Run as administrator.
3. When you are prompted, click Yes to confirm that you want to run the command file, and then wait
for the script to finish.
Results: After this exercise, you will have seen how to analyze requirements and determine appropriate
use cases for XML storage.
2. On the File menu, click Open, click File, navigate to D:\Labfiles\Lab13\Starter, and then select
InvestigateStorage.sql and click Open.
USE tempdb;
GO
4. Highlight and execute scripts 13.1 to 13.9 separately, comparing the results of each script with the
script comment.
Results: After this exercise, you will have seen how XML data is stored in variables.
USE tempdb;
GO
3. Highlight and execute scripts 13.10 to 13.11 separately, comparing the results of each script with the
script comment.
Results: After this exercise, you will have seen how to create XML schema collections.
2. In the query below, highlight and execute scripts 13.21 to 13.29 separately, comparing the results of
each script with the script comment.
13.21 FOR XML AUTO. One row per product. Note that the element name is based on the table name.
13.24 Note the effect of the column alias compared to 13.23. The element name is now Product, based on the alias name of the table.
13.26 Nested XML with TYPE. Note that rows with a value in the Description column show that value as XML.
Results: After this exercise, you will have executed queries that return SQL Server relational data as XML.
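The behaviors compared in the table above can be reproduced with minimal queries like these, run against AdventureWorks (TOP keeps the output short):

```sql
-- FOR XML AUTO: one element per row, named after the table (or its alias).
SELECT TOP (2) ProductID, Name
FROM Production.Product
FOR XML AUTO;

-- Aliasing the table changes the element name to Product.
SELECT TOP (2) ProductID, Name
FROM Production.Product AS Product
FOR XML AUTO;
```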
CREATE PROCEDURE Production.GetAvailableModelsAsXML
AS
BEGIN
SELECT p.ProductID,
p.name as ProductName,
p.ListPrice,
p.Color,
p.SellStartDate,
pm.ProductModelID,
pm.Name as ProductModel
FROM Production.Product AS p
INNER JOIN Production.ProductModel AS pm
ON p.ProductModelID = pm.ProductModelID
WHERE p.SellStartDate IS NOT NULL
AND p.SellEndDate IS NULL
ORDER BY p.SellStartDate, p.Name
FOR XML RAW('AvailableModel'), ROOT('AvailableModels');
END;
GO
EXEC Production.GetAvailableModelsAsXML;
GO
Results: After this exercise, you will have created and tested the required stored procedure that returns
XML.
2. In File Explorer, navigate to the D:\Labfiles\Lab14\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.
3. In the User Account Control dialog box, click Yes and then wait for the script to finish.
2. Click File, click Open, click File, navigate to D:\Labfiles\Lab14\Starter and click Lab Exercise 1.
3. Click Open.
4. In the query below, highlight and execute scripts 19.1 to 19.9 separately, comparing the results of
each script with the script comment.
5. Review the results from each script. Remember to click the Spatial results tab to see the output.
19.3 Draw a more complex shape. The shape is drawn. Note how a polygon is represented as text in the query.
Results: After this exercise, you should have seen how to work with the geometry data type.
UPDATE Marketing.ProspectLocation
SET Location = geography::STGeomFromText(
'POINT(' + CAST(Longitude AS varchar(20))
+ ' ' + CAST(Latitude AS varchar(20))
+ ')', 4326);
GO
Results: After this exercise, you should have replaced the existing Longitude and Latitude columns with
a new Location column.
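Once the Location column is populated, spatial methods become available on it. This proximity query is a sketch; the key column name and the reference point are assumptions:

```sql
-- Find the five prospects nearest a reference point (latitude, longitude, SRID 4326).
DECLARE @point geography = geography::Point(47.6062, -122.3321, 4326);

SELECT TOP (5) ProspectID,
       Location.STDistance(@point) AS DistanceInMeters
FROM Marketing.ProspectLocation
ORDER BY Location.STDistance(@point);
```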
3. In File Explorer, navigate to the D:\Labfiles\Lab15\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.
4. In the User Account Control dialog box, click Yes, and then wait for the script to finish.
2. In the Connect to Server dialog box, in the Server type list, click Database Engine, in the Server
name list, click MIA-SQL, in the Authentication list, click Windows Authentication, and then click
Connect.
3. In SQL Server Management Studio, on the File menu, point to Open, and then click File.
4. In the Open File dialog box, navigate to the D:\Labfiles\Lab15\Starter folder, click FileTable.sql,
and then click Open.
5. Under the Configure filestream access level comment, highlight the Transact-SQL statement, and
then click Execute.
2. Under the Add a file to the new filegroup comment, highlight the Transact-SQL statement, and
then click Execute.
3. Under the Set Filestream options comment, highlight the Transact-SQL statement, and then click
Execute.
3. Navigate to the D:\Labfiles\Lab15\Starter folder, click Max Benson.doc, press and hold down Ctrl,
click Shai Bassli.doc, click Stephen Jiang.doc, right-click Stephen Jiang.doc, and then click Copy.
4. In File Explorer, in the navigation bar, type \\MIA-SQL\MSSQLSERVER\HRFiles\Resumes and then
press Enter.
3. In SQL Server Management Studio, in the FileTable.sql query window, under the Query the
FileTable metadata comment, highlight the Transact-SQL statement, and then click Execute.
4. Review the results, noting that the GetFileNamespacePath function returned the full UNC path to
each file in the FileTable shared folder.
5. Close the FileTable.sql query window, and do not save any changes.
Created a FileTable.
2. In the Open File dialog box, navigate to the D:\Labfiles\Lab15\Starter folder, click
FullTextIndex.sql, and then click Open.
3. Under the Create a full-text catalog comment, highlight the Transact-SQL statement, and then click
Execute.
4. Under the Get index name for the FileTable primary key comment, select the Transact-SQL
statement, and then click Execute.
5. In the Results pane, note the name of the primary key index.
6. Under the Create full-text index comment, in the Transact-SQL statement, locate the KEY INDEX
clause.
7. In the KEY INDEX clause, replace the index name (which begins with PK_Resumes_ followed by a string of numbers and letters) with the index name that you noted in step 5.
8. Under the Create full-text index comment, select the Transact-SQL statement, and then click
Execute.
3. Under the Find resumes containing ‘machinist’ within 50 terms of ‘degree’ comment, highlight
the Transact-SQL statement, and then click Execute.
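The statement under that comment uses the custom proximity form of CONTAINS. This is a sketch of the pattern, assuming the FileTable is named dbo.Resumes:

```sql
-- NEAR((term1, term2), 50) matches documents in which the two terms
-- occur within 50 terms of each other.
SELECT name
FROM dbo.Resumes
WHERE CONTAINS(file_stream, 'NEAR((machinist, degree), 50)');
```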
5. Close the FullTextIndex.sql query window, and do not save any changes.
2. In the Open File dialog box, navigate to the D:\Labfiles\Lab15\Starter folder, click
SemanticSearch.sql, and then click Open.
3. Under the Register the semantic language database comment, highlight the Transact-SQL
statement, and then click Execute.
4. Under the Add semantic search index comment, highlight the Transact-SQL statement, and then click Execute.
2. Under the Get the top 10 phrases in Shai Bassli.doc comment, review the Transact-SQL SELECT
statement, select the Transact-SQL SELECT statement, click Execute, and then review the results.
3. Under the Find the top two resumes that are about 'Production' comment, review the Transact-
SQL SELECT statement, select the Transact-SQL SELECT statement, click Execute, and then review the
results.
4. Close the SemanticSearch.sql query window, and do not save the changes.
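The two SELECT statements reviewed above are built on the semantic search table-valued functions. These sketches assume the FileTable is named dbo.Resumes:

```sql
-- The top key phrases across the indexed documents.
SELECT TOP (10) keyphrase, score
FROM SEMANTICKEYPHRASETABLE(dbo.Resumes, file_stream)
ORDER BY score DESC;

-- The two resumes most strongly about 'production'.
SELECT TOP (2) document_key, score
FROM SEMANTICKEYPHRASETABLE(dbo.Resumes, file_stream)
WHERE keyphrase = 'production'
ORDER BY score DESC;
```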
Results: At the end of this exercise, you will have created a full-text catalog and a full-text index, and you
will have tested the index by running queries against it.