SQL

The Ultimate Beginners Guide!

Andrew Johansen

Copyright 2015 by Andrew Johansen - All rights reserved.
This document is geared towards providing exact and
reliable information in regards to the topic and issue
covered. The publication is sold with the idea that the
publisher is not required to render accounting, officially
permitted, or otherwise, qualified services. If advice is
necessary, legal or professional, a practiced individual in the
profession should be ordered.
- From a Declaration of Principles which was accepted and
approved equally by a Committee of the American Bar
Association and a Committee of Publishers and
Associations.
In no way is it legal to reproduce, duplicate, or transmit any
part of this document in either electronic means or in printed
format. Recording of this publication is strictly prohibited
and any storage of this document is not allowed unless with
written permission from the publisher. All rights reserved.
The information provided herein is stated to be truthful and
consistent, in that any liability, in terms of inattention or
otherwise, by any usage or abuse of any policies, processes,
or directions contained within is the solitary and utter
responsibility of the recipient reader. Under no
circumstances will any legal responsibility or blame be held
against the publisher for any reparation, damages, or
monetary loss due to the information herein, either directly or
indirectly.
Respective authors own all copyrights not held by the
publisher.
The information herein is offered for informational purposes
solely, and is universal as so. The presentation of the
information is without contract or any type of guarantee
assurance.
The trademarks that are used are without any consent, and the
publication of the trademark is without permission or
backing by the trademark owner. All trademarks and brands
within this book are for clarifying purposes only and are
owned by the owners themselves, not affiliated with this
document.

Table of Contents
Introduction
Chapter 1: SQL - The Basics
Chapter 2: The SQL Commands That You Can Use
Chapter 3: Data Types
Chapter 4: How to Manage Database Objects
Chapter 5: Database Normalization
Chapter 6: Data Manipulation
Chapter 7: How to Manage Database Transactions
Chapter 8: How to Get Excellent Results from Database
Queries
Chapter 9: Categorize Information Using Database Operators
Conclusion

Introduction
I want to thank you and congratulate you for getting my
book, SQL: The Ultimate Beginners Guide!

This book will teach you the basics of SQL and database
operations. Since SQL is a language used to manage
databases, you have to familiarize yourself with its basics
and nuances. Don't worry if you have never used SQL
before: this book will turn you from a beginner into an
efficient SQL user.
This book covers the important topics in SQL. For
instance, one chapter focuses on the operators that you can
use, while another concentrates on getting accurate results
from your database queries. Overall, you'll be an effective
SQL user after reading this book.
Thanks again for getting a copy of this book. I hope you enjoy
it!

Chapter 1: SQL - The Basics


Every company generates, holds, and uses data. Thus,
companies need an organized way of storing information.
The software that provides this is known as a DBMS
(Database Management System). Business organizations have
been using these database systems for years.
At first, these systems were extremely simple. With
modern technology, however, database management systems
have evolved. Even the most basic ones have powerful
features you can take advantage of. The DBMSs you'll see
today can store large amounts of data and work over the
internet.
The new breed of data management is often implemented
using an RDBMS (Relational Database Management
System), which is derived from the classic DBMS. Relational
database systems often work with Web and client/server
technologies. That means they can help businesses
manage various types of information and stay
competitive in today's market.
This book will teach you how to use SQL and relational
databases. By reading this book, you'll learn how to manage
large amounts of data using SQL.

What Is SQL?
SQL stands for Structured Query Language. This is the basic
language used to interact with databases. The original
version was created by IBM during the 1970s. In 1979,
right after IBM released the prototype, Relational Software
Inc. published the first SQL tool in the world. They called
this product ORACLE. This product became so successful that
the entire company is now called Oracle Corporation. Today,
Oracle is one of the leaders in database technologies.
You can pronounce SQL in two ways:
1. Read the letters one by one, as in "S-Q-L"
2. Read the letters as "Sequel"
Many users prefer the second way of pronouncing the
name. Basically, you will use SQL to tell a database what
you need. This process is similar to ordering
food in a restaurant: you state your request in SQL, and the
database returns the information you asked for.
Databases
Simply put, databases are collections of information. Some
people describe databases as organized mechanisms for
storing information, through which users can access data
efficiently and effectively.
You are probably already using databases without knowing it. A
phone book can be considered a database: it contains
information about people's names, phone numbers, and
physical addresses. The information is presented in
alphabetical order, which allows you to find certain pieces
of data easily.
Here is an image of a simple database:

Relational Databases - An Introduction


A relational database is a database divided into tables, or
logical units. These tables are interconnected inside the
database. Relational databases allow you to break data into
smaller units. Thus, you can easily maintain your database and
optimize it based on the needs of your organization. Here's a
visual representation of a relational database:
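The same idea can also be sketched in code. The example below is only an illustration: it assumes SQLite (through Python's built-in sqlite3 module) as the database, and the customers and orders tables are hypothetical. It shows two small tables that are connected through a shared column.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real RDBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 'Lamp'), (11, 1, 'Desk'), (12, 2, 'Chair');
""")

# The shared cust_id column is what connects the two tables.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.cust_id = c.cust_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

Each customer's name is stored once, yet it can be combined with any number of orders when needed.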

Client/Server Technology - An Introduction


Previously, mainframe computers dominated the computer
industry. Mainframe computers are large, robust
systems capable of storing and processing
information. People interacted with mainframe computers
using dumb terminals (i.e. terminals not capable of thinking
on their own). To perform their functions, dumb terminals
relied on the mainframe's storage, memory, and processor.
This setup worked well. In fact, some companies are
still using it in their business operations. However, a
better solution was developed: client/server technology.
In client/server systems, the main computer (i.e. the server)
can be accessed over a network, usually a WAN (wide
area network) or a LAN (local area network). Users
typically access the server using desktop computers or other
servers, rather than dumb terminals. Every computer,
referred to as a client, has access to the system, enabling
interaction between the server and its clients.
The major difference between mainframe and client/server
environments is that with the latter, the user's computer can
think on its own and run its own applications. Because of
these features, client/server environments are preferred by
modern businesses and non-profit organizations.
Here's a visual representation of a client/server setup:

Internet-Based Database Systems - An Introduction


In general, database systems are moving towards Internet
integration. Users can access databases over the internet:
customers can check a company's database using ordinary
web browsers. Customers (i.e. the people who use data) can
purchase items, pay online, check inventories, view the status
of their transactions, and make changes to their accounts.
To see the company's database, you just need to open a
web browser, go to the company's website, log in (if
required), and search for the information you need. Many
businesses require customers to create an account with them,
mainly for security reasons. These businesses provide
usernames and passwords to their customers for free.
A lot of things happen behind the scenes while a
customer checks an internet-based database. For instance, the
website's server may execute SQL to find the data needed by
the user. The SQL is used to query the database; the results go
to the website's server, which relays that data to the user's
web browser.

Chapter 2: The SQL Commands That You Can Use
This chapter will define the basic commands that you can use
in SQL. These commands can help you in performing
different actions on your database. To help you understand
these commands, let's divide them into six categories. These
categories are:
Data Definition Language (also called DDL) - This
is the aspect of SQL that allows you to create and
restructure database objects (e.g. adding or deleting a
database table). Here are the well-known DDL
commands that you can use:
DROP TABLE
CREATE TABLE
ALTER TABLE
CREATE VIEW
DROP VIEW
DROP INDEX
ALTER INDEX
CREATE INDEX
Important Note: These commands (and the ones you'll see
below) will be explained further in the succeeding
chapters.
Data Manipulation Language (also known as DML) -
This is the aspect of SQL used to modify information
inside database objects. DML has three main
commands, namely:
UPDATE
INSERT
DELETE
Data Query Language (also called DQL) - This is the
most powerful aspect of SQL, particularly when used
with modern database systems. DQL is composed of a
single command, which is:
SELECT
You can use this command to run queries against relational
databases. If you want to get detailed results, you may
add clauses and options to this command.
Data Control Language (also known as DCL) - You
should use these SQL commands if you want to control the
access rights for a database. DCL commands are often
used to generate database objects related to access
rights. This includes the distribution of access
privileges among data users. Here are some of the
well-known DCL commands:
REVOKE
GRANT
ALTER PASSWORD
CREATE SYNONYM
Data Administration Commands - With these
commands, you can audit and analyze the operations
done on a database. You can also use them to assess the
system's overall performance. Here are the
administration commands you should use:
STOP AUDIT
START AUDIT
Important Note: Data administration and database
administration are two different things. Basically,
database administration is the overall management of a
database, which involves the utilization of all SQL
commands. This kind of administration also tends to be more
specific to each SQL implementation than the core SQL
commands are.
Transactional Control Commands - You should use
these commands if you want to manage the transactions
within a database. Here are some examples:
COMMIT - This saves the changes made by
database transactions.
SAVEPOINT - This creates points inside
transaction groups to which you can later roll back.
You should use this with the ROLLBACK command.
ROLLBACK - You should use this command if
you want to undo database transactions.
SET TRANSACTION - This command sets the
characteristics of a database transaction; in some
implementations, such as Oracle, it can also assign a
name to the transaction.
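To make the categories more concrete, here is a minimal sketch that runs one or two commands from several of them. It assumes SQLite, driven through Python's built-in sqlite3 module, and a hypothetical products_tbl table. SQLite has no user accounts, so the DCL commands (GRANT, REVOKE) and the audit commands are not shown.

```python
import sqlite3

# isolation_level=None lets us issue the transaction commands ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define a table (hypothetical name and columns).
conn.execute("CREATE TABLE products_tbl (prod_id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# DML: add and change rows.
conn.execute("INSERT INTO products_tbl VALUES (1, 'Lamp', 20.0)")
conn.execute("INSERT INTO products_tbl VALUES (2, 'Desk', 150.0)")
conn.execute("UPDATE products_tbl SET price = 25.0 WHERE prod_id = 1")

# Transaction control: undo part of a transaction and keep the rest.
conn.execute("BEGIN")
conn.execute("DELETE FROM products_tbl WHERE prod_id = 2")
conn.execute("SAVEPOINT before_second_delete")
conn.execute("DELETE FROM products_tbl WHERE prod_id = 1")
conn.execute("ROLLBACK TO before_second_delete")  # restores the Lamp row only
conn.execute("COMMIT")

# DQL: query what is left.
remaining = conn.execute(
    "SELECT name, price FROM products_tbl ORDER BY prod_id"
).fetchall()
print(remaining)
```

Only the deletion made after the savepoint is undone; the first deletion is kept when the transaction is committed.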

Chapter 3: Data Types


In this chapter, you'll learn more about data types and tables.
You will also learn how to use each type of data.
The Basic Types of Data
Data types are attributes of the data itself, and they are
defined for the columns of a table. For instance, you
may require that a field should hold numeric values only,
stopping any user from entering alphanumeric data. By
assigning data types to certain fields in a database, you can
minimize the possibility of errors in data entry.
Important Note: Each implementation of SQL has its own
array of data types. You should use the
implementation-specific data types if you want to manage your
database properly. The basics, however, are the same for all
SQL implementations.
Here are the basic types of data:
Numeric strings
Character strings
Time and date values

The Fixed-Length Characters


Strings that always have the same length are
stored using fixed-length data types. Here is the typical data
type for a fixed-length character in SQL:
CHARACTER(n)
n is the assigned (or maximum) length of the field you are
working on.
Some SQL implementations use the CHAR data type to
store fixed-length information. You can use this data type to
store alphanumeric information. State abbreviations are an
excellent example: each state abbreviation is
composed of two characters.
With fixed-length data types, spaces are normally used
to fill the unused places. That means if the assigned length is
15 and the data you enter fills only 11 places, the
remaining four places are padded with spaces. This padding
makes sure that each value has the same fixed length.
Important Note:
If you're working on fields that may hold values of
different lengths (e.g. usernames), make sure that you
are not using fixed-length data types. If you use this
type incorrectly, you may encounter problems related
to data accuracy and storage space.
The Variable Characters

SQL also allows you to use varying-length strings (i.e.
strings whose length may change from one data unit to
another). Here's the standard notation for varying-length
characters:
CHARACTER VARYING(n)
n is a number that identifies the maximum or assigned field
length.
VARCHAR2 and VARCHAR are two of the well-known
variable-length data types. ANSI (American National
Standards Institute) considers VARCHAR as the standard
data type for variable-length characters. Because of this,
popular systems such as MySQL and Microsoft SQL Server
use it for their regular operations. Oracle, a major
database system provider, supports both VARCHAR2 and VARCHAR.
Character-defined columns may hold alphanumeric data: you
can enter letters and numbers into these columns.
Varying-length data types don't need spaces to fill unused
places. For example, if the assigned length for a field is 15,
and you enter a string of 11 characters, the overall length of
that value is just 11. You don't have to use spaces to populate
empty places.
Important Note:
If you are working with character strings of varying length,
you should use a varying-length data type. This allows
you to make the most of database space.
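The difference between the two kinds of character types can be imitated in a few lines of plain Python. This is only a sketch of the padding behavior described above: store_fixed and store_varying are hypothetical helper names, and real database systems differ in how they treat values that are too long (some raise an error, others truncate).

```python
def store_fixed(value: str, n: int) -> str:
    """Mimic CHARACTER(n): reject long values, pad short ones with spaces."""
    if len(value) > n:
        raise ValueError(f"value longer than {n} characters")
    return value.ljust(n)


def store_varying(value: str, n: int) -> str:
    """Mimic CHARACTER VARYING(n): reject long values, keep others unchanged."""
    if len(value) > n:
        raise ValueError(f"value longer than {n} characters")
    return value


fixed = store_fixed("SQL-learner", 15)      # 11 characters plus 4 spaces
varying = store_varying("SQL-learner", 15)  # 11 characters, no padding
print(len(fixed), len(varying))
```

The fixed-length value always occupies all 15 places, while the varying-length value occupies only the 11 places it needs.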

The Numeric Values


A numeric value is stored in a field defined as a number.
Numeric values are commonly referred to as REAL,
NUMBER, DECIMAL, and INTEGER.
Here are the SQL standards for numeric values:
INTEGER
FLOAT(p)
BIT(n)
REAL(s)
DECIMAL(p, s)
BIT VARYING(n)
DOUBLE PRECISION(p)
p is the precision: the maximum (or assigned) total number
of digits in the field.
s is the scale: the number of digits to the right of the decimal
point (e.g. 15.ss).
The Decimal Values
A decimal value is a numeric value that contains a decimal
point. Here is the SQL standard for decimal values.
Remember that p represents precision and s represents the
scale of the decimal.
DECIMAL(p, s)

Important Notes:
Precision is the overall length of a numeric value, i.e. the
total number of digits. For a column defined as DECIMAL(5, 3),
5 is the precision.
Scale refers to the number of digits found on
the decimal point's right side. In the previous example
(i.e. DECIMAL(5, 3)), 3 is the scale of the numeric value.
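As a small worked example, the sketch below counts the precision and scale of a decimal literal using Python's standard decimal module. The function name precision_and_scale is hypothetical and is only meant to illustrate the two terms.

```python
from decimal import Decimal


def precision_and_scale(text: str):
    """Return (precision, scale) for a decimal literal such as '12.345'."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    scale = max(-exponent, 0)             # digits to the right of the decimal point
    precision = max(len(digits), scale)   # total number of digits
    return precision, scale


print(precision_and_scale("12.345"))  # this value fits a DECIMAL(5, 3) column
print(precision_and_scale("199"))     # a whole number has a scale of 0
```

The value 12.345 has five digits in total, three of which are to the right of the decimal point.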
The Integers
Integers are numeric values that don't involve a decimal
point. That means integers are whole numbers (whether
they are negative or positive). Here are some examples of
valid integers:
2
0
-3
99
-199
200
The Floating-Point Decimals
A floating-point decimal is a decimal value whose scale and
precision have varying lengths. Within the limits of the data
type, any scale and precision is valid. The data type called REAL
defines a column that holds single-precision, floating-point
decimals. The data type called DOUBLE PRECISION, on
the other hand, defines a column that holds double-precision,
floating-point decimals.
The boundary between the two depends on the implementation.
In the rule given here, a floating-point number is considered
single-precision if its precision is 1 to 21, and
double-precision if its precision is 22 to
53.
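The practical effect of single versus double precision can be seen with Python's standard struct module, which can store a number as a 4-byte or an 8-byte floating-point value. This is a sketch of the general idea only; it does not reproduce any particular database's REAL or DOUBLE PRECISION type.

```python
import struct

value = 0.1

# Round-trip the value through a 4-byte (single) and an 8-byte (double) float.
as_single = struct.unpack("f", struct.pack("f", value))[0]
as_double = struct.unpack("d", struct.pack("d", value))[0]

print(as_single)  # drifts away from 0.1 after about seven significant digits
print(as_double)
print(struct.calcsize("f"), struct.calcsize("d"))  # bytes used by each format
```

The single-precision copy takes half the space but keeps fewer accurate digits.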
Time and Dates
These data types are used to record data regarding
time and dates. Typical SQL implementations support data types
known collectively as DATETIME. Here are some of the popular
members of this category:
TIME
DATE
TIMESTAMP
INTERVAL
When working with these data types, you'll encounter the
following elements:
Year
Month
Day
Hour
Minute
Second
The Literal Strings
Literal strings are series of characters (e.g. names, phone
numbers, etc.) that are specified by a program or database
user. In general, a literal string is composed of data with
similar characteristics. The value of a literal is known,
because it is written out explicitly. A column's value, on the
other hand, is often unknown since different values exist
between data columns and data rows.
When using literal strings, you don't really specify the data
types. You are just specifying the strings. The following list
shows some literal strings:
'Morning'
50000
'50000'
5.60
'August 1, 1991'
Alphanumeric strings require quotation marks (either single
or double). Number strings, on the other hand, can be stored
without any quotation mark.
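The effect of the quotation marks can be checked directly. The sketch below assumes SQLite (through Python's sqlite3 module) and uses its typeof() function, which reports how a literal is interpreted; other database systems offer different ways of inspecting this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# typeof() is SQLite's function for reporting how a value is stored.
row = conn.execute(
    "SELECT typeof('Morning'), typeof(50000), typeof('50000'), typeof(5.60)"
).fetchone()
print(row)
```

The quoted '50000' is treated as text, while the unquoted 50000 is treated as a number.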
The Null Values
Basically, null values are missing values, i.e. columns in a
data row that haven't received a value yet. These values play
an important role in almost every aspect of SQL. You will
use null values in creating tables, assigning search
conditions, and entering literal strings.
If you need to reference null values, you may use the
following methods:
Enter '' (i.e. two single quotation marks with nothing between
them; some implementations, such as Oracle, treat this empty
string as NULL)
Enter NULL (i.e. the word NULL itself)
When working with null values, you should know that data
doesn't have to be entered in every field. If the fields you are
working on require data, use a data type followed by NOT
NULL. If there's a possibility that a field won't have
data, declare it as NULL (or simply leave out NOT NULL).
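Here is a short sketch of NOT NULL in action, assuming SQLite through Python's sqlite3 module. The employee_tbl table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# emp_name must always have a value; emp_fax may be left empty (NULL).
conn.execute("CREATE TABLE employee_tbl (emp_name TEXT NOT NULL, emp_fax TEXT)")
conn.execute("INSERT INTO employee_tbl VALUES ('SMITH', NULL)")

try:
    conn.execute("INSERT INTO employee_tbl VALUES (NULL, '555-0100')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the NOT NULL constraint refused the row

# NULL is tested with IS NULL, not with "= NULL".
no_fax = conn.execute(
    "SELECT emp_name FROM employee_tbl WHERE emp_fax IS NULL"
).fetchall()
print(rejected, no_fax)
```

The row with a missing fax number is accepted, but the row with a missing name is refused.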
The Boolean Values
Boolean values are values that can be TRUE, FALSE, or
NULL. You should use BOOLEAN values when comparing
data units. For instance, when you specify conditions for a
search, every condition evaluates to TRUE, FALSE, or
NULL. If a condition returns the BOOLEAN value of
NULL or FALSE for a data row, that row is not retrieved. If
the value is TRUE, however, the row is retrieved.
Here's a simple example:
WHERE NAME = 'SMITH'
A database user may have used this line to perform a search.
The system will evaluate this line for each data row. If
NAME's value is equal to SMITH in one of the data rows,
the search gives TRUE as the result. Afterward, the user will
get the data linked with that search result.
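The three possible results of a condition can be observed with a small experiment. The sketch assumes SQLite through Python's sqlite3 module and a hypothetical employee_tbl table; SQLite reports TRUE as 1, FALSE as 0, and NULL as Python's None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_tbl (emp_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employee_tbl VALUES (?, ?)",
    [(1, "SMITH"), (2, "JONES"), (3, None)],
)

# Evaluate the condition for every row: TRUE, FALSE, then NULL.
truth = conn.execute(
    "SELECT emp_id, name = 'SMITH' FROM employee_tbl ORDER BY emp_id"
).fetchall()

# Only the row for which the condition is TRUE is retrieved.
matches = conn.execute(
    "SELECT emp_id FROM employee_tbl WHERE name = 'SMITH'"
).fetchall()
print(truth, matches)
```

The row whose name is missing produces NULL rather than FALSE, and it is left out of the search result just like the FALSE row.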

Chapter 4: How to Manage Database Objects
This chapter will discuss database objects: their nature,
behaviors, storage requirements, and interrelatedness.
Basically, database objects are the backbone of relational
databases. You use these objects to store data (i.e. they are
logical units found inside a database). Together, these
objects are often referred to as the back-end database.
What is a Database Object?
Database objects are the defined objects within a database
utilized to save or retrieve information. Here are several
examples of database objects: views, clusters, tables,
indexes, synonyms, and sequences.
The Schema
A schema is a set of database objects linked to a certain
database user. This user, known as the schema owner,
owns the set of objects linked to his or her username. Simply
put, any person who creates an object has just created
his or her own schema. That means users have control over
the database objects that they create, delete, and
manipulate.
Let's assume that you received login credentials (i.e.
username and password) from a database administrator. The
username is PERSON1. Let's say you accessed the database
and created a table named EMPLOYEES_TBL. In the
database's records, the table's actual name is
PERSON1.EMPLOYEES_TBL. The table's schema name
is PERSON1, which is also the creator/owner of that table.
When accessing a schema that you own, you are not required
to use the schema name. That means you have two ways of
referring to the example table given above. These are:
PERSON1.EMPLOYEES_TBL
EMPLOYEES_TBL
As you can see, the second option involves fewer characters.
This is the reason why schema owners prefer this method of
accessing their tables. If other users want to view the table,
however, they must include the schema name in their database
query.
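Qualified and unqualified names can be tried out with a rough analogy. SQLite has no user accounts or user-owned schemas, so the sketch below assumes that an attached database named person1 stands in for the PERSON1 schema. It only imitates how the two forms of the name resolve to the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: an attached database named person1 plays the role of the schema.
conn.execute("ATTACH DATABASE ':memory:' AS person1")
conn.execute("CREATE TABLE person1.employees_tbl (emp_id INTEGER, emp_name TEXT)")
conn.execute("INSERT INTO person1.employees_tbl VALUES (1, 'SMITH')")

# Both forms of the name reach the same table.
qualified = conn.execute("SELECT emp_name FROM person1.employees_tbl").fetchall()
unqualified = conn.execute("SELECT emp_name FROM employees_tbl").fetchall()
print(qualified == unqualified)
```

In a multi-user database system, the unqualified form works only for the owner; everyone else needs the qualified form.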
The screenshot below shows two schemas within a database.

Tables - The Main Tool for Storing Data


Modern database users consider tables as the main storage
tool. In general, a table is formed by rows and columns.
Tables take up space within a database and may be
temporary or permanent.
Fields/Columns
Fields, referred to as columns when working with a
relational database, are the parts of a table to which a
particular data type is assigned. You should name a field so
that it reflects the data it will hold. You may specify
fields as NULL (i.e. nothing has to be entered) or NOT
NULL (i.e. something needs to be entered).
Each table should have at least one field. Fields are the
elements inside a table that store certain kinds of data (e.g.
names, addresses, phone numbers, etc.). For instance, you'll
find a "customer name" column when checking a database
table for customer information.

Rows
Rows are records of data within a table. For instance, a row
in a customer database table might hold the name, fax
number, and identification number of a certain customer.
Rows are composed of fields that hold information from a
single record in the table.
SQL Statement - CREATE TABLE
CREATE TABLE is an SQL statement used to generate a
table. Even though you can create tables quickly and easily,
you should spend time and effort in planning the structures of
your new table. That means you have to do some research
and planning before issuing this SQL statement.
Here are some of the questions you should answer when
creating tables:
What kind of data am I working on?
What name should I choose for this table?
What column will form the primary key?
What names should be assigned to the fields/columns?
What type of data can be assigned to those columns?
Which columns can be empty?
What is the maximum length for every column?

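Once you have answered these questions, using the CREATE
TABLE command becomes simple.
Here's the general syntax for creating a new table:
CREATE TABLE table_name
(field1 data_type [NOT NULL],
field2 data_type [NOT NULL],
field3 data_type [NOT NULL]);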
The final character of that statement is a semicolon. Almost
all SQL implementations use certain characters to terminate
statements or submit statements to the server. MySQL and
Oracle use semicolons to perform these functions. Transact-SQL
tools, on the other hand, use the GO command to submit
statements. To make this book consistent, statements will be
terminated or submitted using a semicolon.
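As a concrete illustration, the sketch below creates a table and then lists its columns. It assumes SQLite through Python's sqlite3 module; the column names are hypothetical, and the declared lengths act as documentation only, because SQLite does not enforce them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical columns; the statement ends with a semicolon.
conn.execute("""
    CREATE TABLE employees_tbl (
        emp_id    CHAR(9)     NOT NULL,
        emp_name  VARCHAR(40) NOT NULL,
        emp_phone VARCHAR(15)
    );""")

# PRAGMA table_info lists each column's name, declared type, and NOT NULL flag.
columns = [
    (name, col_type, notnull)
    for _cid, name, col_type, notnull, _default, _pk
    in conn.execute("PRAGMA table_info(employees_tbl)")
]
print(columns)
```

The listing answers several of the planning questions above at a glance: the column names, their data types, and which columns may be left empty.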
The STORAGE Clause
Some SQL implementations offer STORAGE clauses. These
clauses help you in assigning the table sizes. That means you
can use them while creating tables. MySQL uses the
following syntax for its STORAGE clause:

The Naming Conventions


When naming database objects, particularly columns and
tables, you should choose names that reflect the data they
will be used for. For instance, you may use the name
EMPLOYEES_TBL for a table used to hold employee
information. You need to name columns using the same
principle. A column used to store the phone number of
employees may be named PHONE_NUMBER.
SQL Command - ALTER TABLE
You can use ALTER TABLE, a powerful SQL command, to
modify existing database tables. You may add fields, remove
columns, change field definitions, include or exclude
constraints, and, in certain SQL implementations, change the
table's STORAGE values. Here's the syntax for this
command:

Altering the Elements of a Database Table


A column's attributes refer to the rules and behavior of
data inside that column. You may change a column's
attributes using ALTER TABLE. Here, the term "attributes"
refers to:
The type of data assigned to a column.
The scale, length, or precision of a column.
Whether you can enter NULL values into a column.
In the screenshot below, ALTER TABLE is used on the
EMPLOYEE_TBL to change the attributes of a column
named EMP_ID:

If you are using MySQL, you'll use the following statement:

Adding Columns to a Database Table


You must remember certain rules when adding columns to
existing database tables. One of the rules is this: you can't
add a NOT NULL column if the table has data in it, unless
you also give the column a default value (where the
implementation allows this).
Basically, you use NOT NULL to indicate that the
column must hold some value for each data row within the
table. If you add a NOT NULL column without a default, you
go against this new constraint, because the current data rows
don't have values for the added column.
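This rule can be tried out with a short experiment. The sketch assumes SQLite through Python's sqlite3 module and a hypothetical employee_tbl table; the exact error raised differs between database systems, so the code catches the general sqlite3.DatabaseError.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE employee_tbl (emp_id INTEGER)")
conn.execute("INSERT INTO employee_tbl VALUES (1)")

try:
    # The existing row would have no value for the new column.
    conn.execute("ALTER TABLE employee_tbl ADD COLUMN emp_city TEXT NOT NULL")
    refused = False
except sqlite3.DatabaseError:
    refused = True

# Supplying a default gives every existing row a value, so this succeeds.
conn.execute(
    "ALTER TABLE employee_tbl ADD COLUMN emp_city TEXT NOT NULL DEFAULT 'unknown'"
)
rows = conn.execute("SELECT emp_id, emp_city FROM employee_tbl").fetchall()
print(refused, rows)
```

The first attempt is refused because the table already holds a row; the second works because the default fills in the missing value.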
Modifying Fields/Columns
Here are the rules you should follow when altering existing
columns:
You can always increase a column's length.
You can decrease a column's length only if the longest
value stored in the column fits within the
desired length.
You can always increase the number of digits for
numeric data types.
You can only decrease the number of digits for numeric
data types if the value with the most digits in
the column has no more digits than the desired number
of digits.
You can increase or decrease the number of decimal
places for numeric data types.
You can usually change the data type of a column, provided
the existing data is compatible with the new type.
Important Note: Be extremely careful when changing or
dropping tables. You might lose valuable information if you
make typing or logical mistakes while executing these
SQL statements.
How to Create New Tables from Existing Ones
You may duplicate an existing table using a combination of
these SQL statements: (1) CREATE TABLE and (2) SELECT. After
executing these statements, you'll get a new table whose
column definitions are identical to those of the old one. This
feature is customizable: you may copy all of the columns or
just the ones you need. The columns generated using this pair
of statements will assume the size needed to store the
information. Here's the main syntax for generating a table
from an existing one:

This syntax involves a new keyword (i.e. SELECT). This
keyword can help you perform database queries. In modern
database systems, SELECT can help you generate tables
using search results.
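A minimal sketch of this technique, assuming SQLite through Python's sqlite3 module. The table names are hypothetical, and the example copies only some of the columns and rows to show that the copy is customizable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products_tbl (prod_id INTEGER, name TEXT, price REAL);
    INSERT INTO products_tbl VALUES (1, 'Lamp', 20.0), (2, 'Desk', 150.0);

    -- Copy only two of the three columns, and only the rows that match.
    CREATE TABLE cheap_products_tbl AS
        SELECT prod_id, name FROM products_tbl WHERE price < 100;
""")

copied = conn.execute("SELECT * FROM cheap_products_tbl").fetchall()
print(copied)
```

Leaving out the WHERE clause and selecting every column would produce a full copy of the original table's data.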
How to Drop Tables
You can drop tables easily. If you use the RESTRICT
option and the table is referenced by a view or constraint,
the DROP command will give you an error message. If you
use CASCADE, however, DROP will succeed and all
referencing constraints and/or views will be dropped. The
syntax for dropping a table is:
DROP TABLE table_name [RESTRICT | CASCADE];
Important Note: When dropping a database table, specify the
owner or schema name of the table you are working on. This
is important since dropping the wrong table can result in loss
of data. If you can access multiple database accounts, make
sure that you are logged in to the right account prior to
dropping any table.
The Integrity Constraints
You can use integrity constraints to ensure the consistency
and accuracy of data within a database. In general, database
users handle integrity concerns through a concept called
Referential Integrity. In this section, you'll learn about the
integrity constraints that you can use in SQL.
Primary Key
A primary key identifies the column or columns that make
data rows unique. You can form primary keys using one or more
columns. For instance, either the product's name or an
assigned reference number can serve as a primary key for a
product table. The goal is to provide each record with a
unique detail or primary key. In general, you assign a
primary key during table creation.
In the example below, the table's primary key is the column
named EMP_ID.

You can assign primary keys this way while creating a new
table. In this example, the table's primary key is an implicit
constraint. As an alternative, you may specify primary keys as
explicit constraints while creating a table. Here's an
example:

In the example given above, the primary key is given after the
comma-separated list of columns.
If you need to form a primary key using multiple columns,
you use this method:
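The three ways of declaring a primary key can be sketched as follows. The example assumes SQLite through Python's sqlite3 module; products_tbl and order_items_tbl, along with their columns, are hypothetical tables added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Primary key declared as part of the column definition.
    CREATE TABLE employee_tbl (
        emp_id CHAR(9) NOT NULL PRIMARY KEY,
        emp_name VARCHAR(40)
    );

    -- Primary key declared after the list of columns.
    CREATE TABLE products_tbl (
        prod_id VARCHAR(10) NOT NULL,
        prod_desc VARCHAR(40),
        PRIMARY KEY (prod_id)
    );

    -- Primary key made of two columns.
    CREATE TABLE order_items_tbl (
        order_id INTEGER NOT NULL,
        line_no INTEGER NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );
""")

conn.execute("INSERT INTO order_items_tbl VALUES (1, 1)")
conn.execute("INSERT INTO order_items_tbl VALUES (1, 2)")  # same order, new line: allowed
try:
    conn.execute("INSERT INTO order_items_tbl VALUES (1, 2)")  # exact repeat
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

With a multi-column key, it is the combination of values that must be unique, not each column on its own.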

Unique Column Constraint


Unique column constraints are similar to primary keys: the
column should have a unique value for each row. While a
table can have only one primary key, you may
place unique constraints on several different columns. Here's an
example:

In the example above, EMP_ID serves as the primary key.
That means the column for employee identification numbers
is being used to guarantee the uniqueness of each record.
Users often reference primary key columns for database
queries, especially when joining tables. The EMP_PHONE
column has a unique constraint, which means each employee
must have a unique phone number.
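A unique constraint on EMP_PHONE can be sketched like this, assuming SQLite through Python's sqlite3 module. The column types and sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_tbl (
        emp_id    CHAR(9)     NOT NULL PRIMARY KEY,
        emp_name  VARCHAR(40) NOT NULL,
        emp_phone VARCHAR(15) UNIQUE
    )
""")
conn.execute("INSERT INTO employee_tbl VALUES ('100', 'SMITH', '555-0100')")

try:
    # Different EMP_ID, but the phone number is already taken.
    conn.execute("INSERT INTO employee_tbl VALUES ('101', 'JONES', '555-0100')")
    phone_rejected = False
except sqlite3.IntegrityError:
    phone_rejected = True
print(phone_rejected)
```

The second row has its own primary key value, yet it is still refused because of the repeated phone number.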
Foreign Key
You can use this key while working on parent and child
tables. Foreign keys are columns in a child table that point
to a primary key inside the parent table. This type of key
serves as the primary tool in enforcing referential integrity
within a database. You may use a foreign key column to
reference a primary key from a different table.
In the example below, you'll learn how to create a foreign
key:

Here, EMP_ID serves as a foreign key for a table named
EMPLOYEE_PAY_TBL. This key points to the EMP_ID
column of another table (i.e. the EMPLOYEE_TBL table).
With this key, the database administrator can make sure that
each EMP_ID inside the EMPLOYEE_PAY_TBL has a
corresponding entry in EMPLOYEE_TBL. SQL practitioners
call this the parent/child relationship.
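The parent/child relationship between these two tables can be sketched as follows. The example assumes SQLite through Python's sqlite3 module, where foreign keys are enforced only after a PRAGMA is switched on; the pay_rate column and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE employee_tbl (
        emp_id CHAR(9) NOT NULL PRIMARY KEY,
        emp_name VARCHAR(40)
    );
    CREATE TABLE employee_pay_tbl (
        emp_id   CHAR(9) NOT NULL,
        pay_rate REAL,
        FOREIGN KEY (emp_id) REFERENCES employee_tbl (emp_id)
    );
    INSERT INTO employee_tbl VALUES ('100', 'SMITH');
    INSERT INTO employee_pay_tbl VALUES ('100', 18.5);
""")

try:
    # No employee '999' exists in the parent table.
    conn.execute("INSERT INTO employee_pay_tbl VALUES ('999', 12.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

A pay record is accepted only when its EMP_ID already exists in the parent table.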
Study the following figure. This will help you to understand
the relationship between child tables and parent tables.

How to Drop Constraints


You can use the option DROP CONSTRAINT to drop the
constraints (e.g. primary key, foreign key, unique column,
etc.) you applied to your tables. For instance, if you want to
remove the primary key in a table named EMPLOYEES,
you may use this command:

Some SQL implementations offer shortcuts for removing
constraints. Oracle, for instance, uses this command to drop a
primary key constraint:

On the other hand, certain SQL implementations allow users
to deactivate constraints. Rather than dropping constraints
permanently, you may disable them temporarily. This way,
you can reactivate the constraints you will need in the future.

Chapter 5: Database Normalization


In this chapter, you'll learn about the process called
normalization. Normalization is the process of breaking a
database into smaller units. Developers use this procedure to
create databases that are easy to manage and organize. In
addition, normalization helps in ensuring the integrity and
accuracy of information within a database.
How to Normalize a Database
Basically, normalization means decreasing the redundancies
inside a database. You can use this technique to design or
redesign your databases. Many people consider it as a
collection of principles for optimizing information systems.
Raw Databases
Databases that haven't been normalized may contain tables
that share identical pieces of information.
Redundant data can have negative effects on your database.
Here are some of the problems you may encounter: poor
security, inefficient database updates, and slow queries.
Before normalization, a database hasn't been divided into
smaller tables. The image below shows a database that needs
normalization:

The Importance of Logical Designs


Each database must be created and designed with the
end-users in mind. Logical modeling, also called logical database
design, is a process in which you arrange data into
smaller groups. These data groups must be logical,
organized, and manageable. Logical designs can help you
reduce (or sometimes eliminate) data repetition.
The Needs of an End-User
The end-users' needs play an important role in designing
databases. Keep in mind that these people will be the
ultimate consumers of the databases that are being developed.
In general, databases should have a user-friendly front-end
mechanism (i.e. the GUI of the database). Having an intuitive
interface is good, but it isn't enough. Good visuals must be
supported by excellent performance.
Here are the questions that must be answered when designing
a new database:
What kind of data will be stored?
How can the users access the data?
What kind of privilege do the users need?
How can the users group the data inside the database?
What is the connection between the pieces of data to be
stored?
How can accuracy and integrity of data be ensured?
Data Repetition
Data should not be repeated unnecessarily. That means you
should minimize data redundancy wherever you can. For
instance, it is wasteful to keep a person's name in multiple
tables. Aside from wasting available storage space,
repetitive data entries can lead to confusion. This happens
when the data in one table doesn't match that of a different
one, even though both tables describe the same object or
person.
Normal Forms
A normal form is a way of identifying the depth, or level,
to which a database has been normalized. In other words,
normal forms are used to determine the normalization level
of a database.
The list below shows the three normal forms commonly used in
normalizing databases:
The first form
The second form
The third form
Every subsequent form relies on the normalization work done
for the normal form before it. For instance, a database must
be in the second form before it can be normalized using the
third normal form.
Let's discuss each normal form in detail:
1. The First Form - The goal of this form is to segregate
the data into tables. Once all of the tables have been
designed, the user assigns a primary key to some or all
of the tables. In the image below, the raw database
given earlier is improved using this normal form:

To attain the first form, you should divide the data into
smaller units. Each unit must have a primary key and be
free from redundant data. The large table was transformed
into smaller tables, namely PRODUCTS_TBL,
EMPLOYEE_TBL, and CUSTOMER_TBL. Often, the
primary key is the first column of each table. For this
example, the primary keys are CUST_ID, PROD_ID, and
EMP_ID.
2. The Second Form - The goal of this form is to find data
that is only partially dependent on the primary key. This
data is then moved to a different table. The image below
explains this normal form:

As the figure shows, you can reach the second normal
form by breaking the tables into smaller units.
The table named EMPLOYEE_TBL is divided into
two: EMPLOYEE_PAY_TBL and EMPLOYEE_TBL.
The employee's personal information depends on
EMP_ID (which is the primary key), so that data (e.g.
LAST_NAME, MIDDLE_NAME, FIRST_NAME, etc.)
stayed inside EMPLOYEE_TBL.
Meanwhile, the data that is only partially dependent on
EMP_ID (i.e. on the individual employee) populates the
table named EMPLOYEE_PAY_TBL. As you can see,
EMP_ID is present in both tables. EMP_ID acts as the
primary key for the two tables: it helps in matching
related information between the tables.
The CUSTOMER_TBL table was divided into
ORDERS_TBL and CUSTOMER_TBL. The process is
similar to the one explained above. The columns that
partly depend on the primary key were transferred to a
different table.
3. The Third Form - The goal of this form is to eliminate
information that doesn't depend on a primary key. This
concept is illustrated by the following screenshot:

A new table was generated to show the importance of
the third form. The table named
EMPLOYEE_PAY_TBL is divided into two smaller
tables. One of the tables contains the pay information
for the employee. The other table contains position
descriptions, because this data doesn't belong in
EMPLOYEE_PAY_TBL: POSITION_DESC does not
depend directly on the primary key (i.e. EMP_ID).
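To make the result of the three forms concrete, here is a minimal sketch of how the employee tables might be declared. It assumes SQLite, driven through Python's built-in sqlite3 module. The table and column names POSITIONS_TBL, POSITION_ID, and PAY_RATE are hypothetical, since the original figures are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: personal data, pay data, and position
# descriptions each live in their own table, linked by keys.
conn.executescript("""
CREATE TABLE EMPLOYEE_TBL (
    EMP_ID     INTEGER PRIMARY KEY,
    LAST_NAME  TEXT NOT NULL,
    FIRST_NAME TEXT NOT NULL
);
CREATE TABLE POSITIONS_TBL (
    POSITION_ID   INTEGER PRIMARY KEY,
    POSITION_DESC TEXT NOT NULL
);
CREATE TABLE EMPLOYEE_PAY_TBL (
    EMP_ID      INTEGER PRIMARY KEY REFERENCES EMPLOYEE_TBL (EMP_ID),
    POSITION_ID INTEGER REFERENCES POSITIONS_TBL (POSITION_ID),
    PAY_RATE    REAL
);
""")

# List the tables that now exist in the database.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Each position description is stored once in the hypothetical POSITIONS_TBL, and the pay table only points to it through POSITION_ID.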
The Naming Conventions
You should consider naming conventions during the
normalization process. You have to use names in order to
store and retrieve data. In general, you should assign names
that are relevant to the information you are working with.
This will help you to organize your database and avoid
confusion. Most businesses design and enforce naming
conventions for their database systems.
The Benefits of Normalizing a Database
Normalization gives you many benefits. Here are the main
ones:
Better organization for your databases
Reduction of repetitive information
Greater consistency for the whole database
Better flexibility for the database design
Better security for the entire database system
The normalization process organizes the information within a
database. It can streamline the work of everyone, from
common users to database administrators. Normalizing a
database also minimizes data repetition, which improves
data structure and optimizes storage space. Since duplicate
information is reduced, you will also enjoy better data
consistency. For instance, without normalization a person's
name can read "Mark Smith" in one table and "Mark A. Smith"
in a different table. Since you have normalized the database
and broken it into smaller parts, you have more flexibility in
terms of changing data structures. Finally, you will get better
security because database admins can give some users
restricted access to specific tables. You can easily control
security once the database has been normalized.
The Downsides of Normalization
It is true that normalization can boost the effectiveness of
most database systems. However, this process has its own
drawback: normalization can reduce the overall performance
of a database. Normalized databases may need more memory,
processing power, and I/O (Input/Output) to complete
database queries and transactions. Once normalized, a
database needs to find the needed tables and join their data
to provide the information required by the user.
Referential Integrity

Basically, referential integrity means the data in one table
depends on the data stored in a different table. For example,
before you can record a customer's order in ORDERS_TBL,
you should first record that customer in CUSTOMER_TBL.
You may use integrity constraints to control database values.
In this case, you should create the constraint on the table it
will be applied to. Often, database users utilize foreign and
primary keys to control referential integrity.
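The rule described above can be sketched with a foreign key. This is a minimal illustration, assuming SQLite through Python's sqlite3 module, with hypothetical column names (CUST_NAME, ORD_NUM). Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued; other implementations enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

conn.execute(
    "CREATE TABLE CUSTOMER_TBL (CUST_ID INTEGER PRIMARY KEY, CUST_NAME TEXT)")
conn.execute("""
    CREATE TABLE ORDERS_TBL (
        ORD_NUM INTEGER PRIMARY KEY,
        CUST_ID INTEGER NOT NULL REFERENCES CUSTOMER_TBL (CUST_ID)
    )""")

conn.execute("INSERT INTO CUSTOMER_TBL VALUES (1, 'MARK SMITH')")
conn.execute("INSERT INTO ORDERS_TBL VALUES (100, 1)")  # customer 1 exists

try:
    # Customer 99 was never recorded, so the order is refused.
    conn.execute("INSERT INTO ORDERS_TBL VALUES (101, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```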
Denormalization
This is the process of modifying a normalized database to
allow some data repetition. People denormalize a database
for one reason: to improve its performance.
Here's an important point to remember: normalization can
slow the performance of database systems because queries
have to join data from many tables. Sometimes, it is better to
have redundant information than to work with a slow system.

Chapter 6: Data Manipulation


DML (Data Manipulation Language) is the aspect of SQL that
helps you to perform changes within a database. Through
DML, you can fill tables with new information, update old
tables, and remove unnecessary data from any table.
How to Populate a Table with New Information
You can complete this process in two ways: (1) enter the
new data manually or (2) use computer programs to enter the
data automatically. Manual data population refers to entering
new data using a keyboard. Automated data population, on
the other hand, refers to loading data from an outside source
(e.g. a different database) and transferring it into the
preferred database.
When entering new data, different factors can influence the
type and quantity of data you can work with. Here are the
main factors you have to consider: current constraints, the
table's physical size, and the column's length.
Important Note: You can type SQL statements without
worrying about lowercase or uppercase characters.
However, the data itself is often case-sensitive, depending
on your SQL implementation and its settings. For instance, if
you entered the data into a database using uppercase
characters, you should use uppercase characters when
referencing that data. The examples given below use
uppercase and lowercase characters in the statements just to
show that this factor does not influence the result.
How to Insert Data
You should use the INSERT command to insert data into an
existing table. This command has several options; check the
syntax below:

With this syntax, you should supply a value for each column
in the list that follows VALUES. As you can see, the values in
this list are separated by commas. You should use single
quotation marks to enclose the values you want to insert if
you are working with date/time and character data types. You
don't have to use quotation marks for NULL values or numeric
data. Each column within the table should receive a value.
In the example below, you will insert a record into a table
named PRODUCTS_TBL.
The table's current structure:

Use this INSERT statement:

For this example, you inserted three values into a table that
has three columns. The values you inserted follow the
arrangement of columns within the table. Two of the values
are enclosed in quotation marks because their columns are
of the character type. The final value (i.e. cost) is a numeric
data type, so it doesn't need quotation marks.
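As an illustration of the statement described above, here is a minimal sketch that assumes SQLite through Python's sqlite3 module. The column names (PROD_ID, PROD_DESC, COST) and the inserted values are assumptions, because the original table structure is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed structure: two character columns and one numeric column.
conn.execute(
    "CREATE TABLE PRODUCTS_TBL (PROD_ID TEXT, PROD_DESC TEXT, COST REAL)")

# Character values are quoted; the numeric value is not.
conn.execute(
    "INSERT INTO PRODUCTS_TBL VALUES ('7725', 'LEATHER GLOVES', 24.99)")

rows = conn.execute("SELECT * FROM PRODUCTS_TBL").fetchall()
print(rows)
```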
How to Insert Data into Specific Columns
You may insert data into certain columns only. For example,
let's assume that you need to insert the values for an employee
except his pager number. In this case, you should specify both
a column list and a VALUES list in the INSERT statement.
Here's a screenshot of the values you may use:

When inserting values into specific columns, here is the
syntax you should use:

In the example below, you'll insert values into specific
columns inside a table named ORDERS_TBL. This is the
table's current structure:

Let's say you used this INSERT statement:

In this INSERT statement, you specified a list of columns by
enclosing the column names in parentheses. The column list
must be entered after the table's name. You listed only the
columns that you need, which means you excluded the column
named ORD_DATE.
If you check the table definition, you'll see that
ORD_DATE is an optional column: it doesn't require a value
because you didn't specify NOT NULL for it in the table
definition. NOT NULL means that a column does not accept
NULL values, so a column defined without it can be left
empty. Moreover, the values in the VALUES list should follow
the arrangement of the columns in the column list.
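A minimal sketch of this kind of INSERT, assuming SQLite through Python's sqlite3 module. The column names other than ORD_DATE, and all of the values, are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ORD_DATE is the only column defined without NOT NULL.
conn.execute("""
    CREATE TABLE ORDERS_TBL (
        ORD_NUM  TEXT    NOT NULL,
        CUST_ID  TEXT    NOT NULL,
        PROD_ID  TEXT    NOT NULL,
        QTY      INTEGER NOT NULL,
        ORD_DATE TEXT
    )""")

# The column list names four columns; ORD_DATE is left out.
conn.execute("""
    INSERT INTO ORDERS_TBL (ORD_NUM, CUST_ID, PROD_ID, QTY)
    VALUES ('23A16', '109', '7725', 2)""")

row = conn.execute(
    "SELECT ORD_NUM, QTY, ORD_DATE FROM ORDERS_TBL").fetchone()
print(row)
```

The omitted column comes back as NULL, which Python shows as `None`.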
How to Insert Data from a Different Table
You can accomplish this by combining two SQL statements:
INSERT and SELECT. Here's the syntax you should follow:

This syntax has new keywords: FROM, WHERE, and
SELECT. Let's discuss them one by one. FROM is the part of
the database query that determines the location of the needed
data. This part should contain the name of one or more tables.
WHERE, another part of the database query, applies
conditions to narrow the search results. Here's a sample
condition: WHERE PRODUCT = 'CAR'. Lastly, SELECT is the
primary statement used to begin the SQL query.
Important Note: Applying a condition means adding
criteria that limit the information affected by an SQL
command.
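The combination of INSERT and SELECT can be sketched as follows, assuming SQLite through Python's sqlite3 module. Both tables and their columns (PRODUCTS_TBL, PRODUCTS_TMP, PROD_ID, PRODUCT, COST) are hypothetical names chosen to match the sample condition above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCTS_TBL (PROD_ID TEXT, PRODUCT TEXT, COST REAL)")
conn.execute(
    "CREATE TABLE PRODUCTS_TMP (PROD_ID TEXT, PRODUCT TEXT, COST REAL)")
conn.executemany(
    "INSERT INTO PRODUCTS_TBL VALUES (?, ?, ?)",
    [("1", "CAR", 9000.0), ("2", "BIKE", 150.0), ("3", "CAR", 12000.0)])

# Copy only the rows that satisfy the WHERE condition.
conn.execute("""
    INSERT INTO PRODUCTS_TMP
    SELECT PROD_ID, PRODUCT, COST
    FROM PRODUCTS_TBL
    WHERE PRODUCT = 'CAR'""")

copied = conn.execute(
    "SELECT PROD_ID FROM PRODUCTS_TMP ORDER BY PROD_ID").fetchall()
print(copied)
```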
How to Insert NULL Values
Inserting NULL values into an existing table is easy and
simple. Why would you add this kind of value into your
tables or databases? Well, you need to insert NULL values
into a column if you don't know the specific value that should
be placed there. For example, not every individual owns a
cellphone, so it would be wrong to insert a made-up
cellphone number. You can use the word NULL to insert null
values into your desired column. Here's the syntax:
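As an illustration (not the original figure), here is a minimal sketch assuming SQLite through Python's sqlite3 module, with a hypothetical CELLPHONE column. The keyword NULL is written without quotation marks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE_TBL (
        EMP_ID    TEXT NOT NULL,
        LAST_NAME TEXT NOT NULL,
        CELLPHONE TEXT
    )""")

# NULL is a keyword here, not the text 'NULL'.
conn.execute("INSERT INTO EMPLOYEE_TBL VALUES ('311', 'SMITH', NULL)")

missing = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE_TBL WHERE CELLPHONE IS NULL"
).fetchone()[0]
print(missing)
```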

How to Update Data


You can use the UPDATE statement to modify data. This SQL
statement doesn't add or delete records: it simply updates
the data inside the table you are working on. In general,
UPDATE is used to modify one table at a time. You may
update a single row or multiple rows, depending on your
needs.
How to Update a Column
This is perhaps the simplest way of using the UPDATE
command. If you update a column, you can update either a
single row or multiple rows. Here is the syntax for this
process:

How to Update Multiple Columns


As stated earlier, you may update many columns using a
single UPDATE statement. Check this syntax:

This syntax has one SET and three columns: the columns are
separated by commas. In general, SQL uses commas to
separate the items in a list of arguments.
For this example, a comma separates the columns that must
be updated.
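Both kinds of update can be sketched as follows, assuming SQLite through Python's sqlite3 module. The table layout and values are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ORDERS_TBL (ORD_NUM TEXT, CUST_ID TEXT, QTY INTEGER)")
conn.executemany(
    "INSERT INTO ORDERS_TBL VALUES (?, ?, ?)",
    [("A1", "109", 2), ("A2", "110", 5)])

# Update one column in the row that matches the condition.
conn.execute("UPDATE ORDERS_TBL SET QTY = 3 WHERE ORD_NUM = 'A1'")

# Update two columns at once; a comma separates the assignments.
conn.execute(
    "UPDATE ORDERS_TBL SET CUST_ID = '111', QTY = 10 WHERE ORD_NUM = 'A2'")

rows = conn.execute(
    "SELECT * FROM ORDERS_TBL ORDER BY ORD_NUM").fetchall()
print(rows)
```

Without a WHERE clause, an UPDATE applies its changes to every row in the table.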
How to Delete Data

You may use the DELETE statement to eliminate data rows
from a table. This statement removes whole records,
including the values in every column. Thus, you shouldn't use
it if you just want to remove some values from several
columns. You must be extremely careful when using DELETE:
it is an effective and efficient command. In this section, you'll
learn about the different techniques for removing data.
To delete a record or multiple records, you should follow
this syntax:

This syntax uses WHERE as a supporting clause. This clause
is an important aspect of the DELETE command, particularly
if you are trying to eliminate specific data rows. Actually,
you'll use WHERE with DELETE most of the time. Without
the WHERE clause, you'll get a result similar to this one:

Important Note: All data rows inside a table will be deleted
if you omit WHERE.
Keep in mind that this SQL statement can inflict permanent
damage on your database. In ideal situations, you can
undo erroneous deletions using a backup file. In
some cases, however, it may be impossible to retrieve lost
data. If you can no longer recover the deleted data, you have
to re-enter it into your database. This is not a problem with a
single data row, but it can make you pull your hair out if you
are dealing with hundreds (or thousands) of data rows.
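The difference between the two kinds of DELETE can be sketched as follows, assuming SQLite through Python's sqlite3 module and a hypothetical two-column ORDERS_TBL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS_TBL (ORD_NUM TEXT, QTY INTEGER)")
conn.executemany(
    "INSERT INTO ORDERS_TBL VALUES (?, ?)",
    [("A1", 2), ("A2", 5), ("A3", 1)])

# With WHERE: only the matching row is removed.
conn.execute("DELETE FROM ORDERS_TBL WHERE ORD_NUM = 'A1'")
after_where = conn.execute(
    "SELECT COUNT(*) FROM ORDERS_TBL").fetchone()[0]

# Without WHERE: every remaining row is removed.
conn.execute("DELETE FROM ORDERS_TBL")
after_all = conn.execute(
    "SELECT COUNT(*) FROM ORDERS_TBL").fetchone()[0]

print(after_where, after_all)
```

A cautious habit is to run a SELECT with the same WHERE clause first, so you can see which rows the DELETE would remove.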

Chapter 7: How to Manage Database Transactions
Simply put, transactions are units or sets of work done on a
database. You can accomplish database transactions
manually (i.e. by typing) or automatically (i.e. using a
database program). For relational databases that use SQL,
you can use the DML statements to complete transactions.
DML statements were discussed in the previous chapter.
A database transaction can either be a single DML command
or a sequence of commands. A transaction is treated as one
unit: all of the commands in it need to be successful. If at
least one command fails, the whole transaction fails.
Here are the characteristics of a database transaction:
Each transaction has a starting point and an endpoint.
Each transaction can be undone or saved.
If a transaction fails to complete, none of its changes can
be saved.
Transactional Control

This is the capability to control the different transactions
that may happen inside a database management system.
Whenever you talk about transactions, you will be referring
to the DML commands (i.e. UPDATE, DELETE and INSERT).
Once a transaction is successfully completed, the changes to
the affected data tables are not necessarily permanent yet.
Sometimes, you have to use transactional control statements
to finalize your database transactions. These control
statements can help you save or undo the changes you have
made.
Here are the control statements that you can use:
ROLLBACK
COMMIT
SAVEPOINT
When a database transaction is completed, the information
about it is kept either in an assigned area or a short-term
rollback area inside the database. These areas hold
transactional information until a control statement is
executed. As stated earlier, control statements may save or
discard transactions. The rollback area will be emptied once
the transaction is saved or discarded.
The image below shows how changes are performed on a
database:

The ROLLBACK Statement


You can use this statement to reverse unsaved changes.
ROLLBACK can only be applied to changes made after
the last ROLLBACK or COMMIT statement. Here's the
syntax for the ROLLBACK statement:

Here, WORK is completely optional.
Important Note: In MySQL, ROLLBACK only affects tables
that use a transactional storage engine such as InnoDB. It
has no effect on tables that use a non-transactional engine
such as MyISAM.
The COMMIT Statement
You'll use this statement to save the changes caused by your
transaction. This statement finalizes all changes made after
the last ROLLBACK or COMMIT statement.
When using this command, you must follow this syntax:

This syntax has a mandatory part: COMMIT. This part is
followed by the character or statement used to finish the
command (typically a semicolon). The keyword WORK is
optional: use it to make the command more readable.
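COMMIT and ROLLBACK can be sketched together as follows. This is a minimal illustration assuming SQLite through Python's sqlite3 module. Passing `isolation_level=None` stops the module from opening transactions on its own, so the BEGIN, COMMIT, and ROLLBACK statements below are the only transaction control in play. The table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE ORDERS_TBL (ORD_NUM TEXT, QTY INTEGER)")

# First transaction: the insert is saved by COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO ORDERS_TBL VALUES ('A1', 2)")
conn.execute("COMMIT")

# Second transaction: the delete is undone by ROLLBACK.
conn.execute("BEGIN")
conn.execute("DELETE FROM ORDERS_TBL")
conn.execute("ROLLBACK")

count = conn.execute("SELECT COUNT(*) FROM ORDERS_TBL").fetchone()[0]
print(count)
```

The row inserted in the first transaction survives because the ROLLBACK only reaches back to the last COMMIT.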
The SAVEPOINT Statement
A save point is a point within a transaction to which you
can roll back without undoing the whole transaction. This is
the syntax for SAVEPOINT:

The SAVEPOINT statement only creates a save point
within a transaction. If you want to undo changes, you
must use ROLLBACK. SAVEPOINT allows you to manage
database transactions by dividing them into small groups.
Going Back to a Save Point
If you want to roll back to a certain SAVEPOINT, use the
following syntax:

Removing a Save Point


You can use RELEASE SAVEPOINT to remove a save point
you have made. After removing a save point this way, you
won't be able to use that point in rolling back database
changes. You can use the RELEASE SAVEPOINT statement
to prevent unwanted reversals of database modifications.
Here's the syntax that you should follow:
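The three save point statements can be sketched in one transaction. This is an illustration rather than the original figure, assuming SQLite through Python's sqlite3 module. The save point name `sp1` and the table are hypothetical, and exact syntax can differ between SQL implementations.

```python
import sqlite3

# isolation_level=None: transactions are controlled only by our statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE ORDERS_TBL (ORD_NUM TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO ORDERS_TBL VALUES ('A1')")
conn.execute("SAVEPOINT sp1")                 # mark a point to return to
conn.execute("INSERT INTO ORDERS_TBL VALUES ('A2')")
conn.execute("ROLLBACK TO SAVEPOINT sp1")     # undoes only the 'A2' insert
conn.execute("RELEASE SAVEPOINT sp1")         # sp1 can no longer be used
conn.execute("COMMIT")

kept = [row[0] for row in conn.execute("SELECT ORD_NUM FROM ORDERS_TBL")]
print(kept)
```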

Chapter 8: How to Get Excellent Results from Database Queries
This chapter will focus on database queries. Here, you'll
learn how to use the SELECT command to retrieve data and
shape the results of your queries. In general, you will use
SELECT lots of times once your database has been
established. This command helps you to search and view the
information stored in your database.
The Query
Queries are inquiries into a database. These inquiries are
submitted through the SELECT command. You must use
queries to get data from a database. For example, if you have
a product table, you may execute an SQL command to
identify your best-selling product. This request for usable
product information is normal for modern relational
databases.
The Select Command
This command represents the DQL (i.e. Data Query
Language) aspect of SQL. You can use the SELECT
command to start and execute database queries. In general,
this statement cannot stand alone: you have to use additional
clauses to make queries possible. Aside from the mandatory
clauses, optional clauses exist to help users improve the
effectiveness of database queries.
When using the SELECT command, there are four clauses
(each introduced by a keyword) that you must consider. These
clauses are:
1. SELECT - This clause is combined with FROM to
obtain data in a readable, organized format. You can use it
to determine the data you need to get. Here's the syntax of a
basic SELECT command:

The SELECT clause introduces the list of columns you would
like to see in the query results. FROM, on the other hand,
introduces the tables you want to retrieve data from. You
can use the asterisk (*) to indicate that every column should
be displayed in the query results. ALL, which is the default,
allows you to view all of the values for a column, even
redundant data. DISTINCT is an option that you can use to
hide duplicate information. As you can see, commas are used
to separate the columns listed after SELECT and the tables
listed after FROM.

2. FROM - You should use this clause in combination with
SELECT. It's a mandatory element of any database query.
The purpose of this clause is to specify the tables that must
be accessed during the search. When running a query, you
should indicate at least one table in the FROM clause.
The syntax of this clause is:

3. WHERE - A condition is the element of a query that
limits the output to the data selected by the database user.
This clause can have multiple conditions. If you use more
than one, you should connect the conditions using the OR
and AND operators.
This is the syntax for WHERE:

4. ORDER BY - You can use this clause to arrange the output
of a database query. This clause organizes the search results
using your selected format. By default, this clause organizes
query output in ascending order: the output will be
displayed from A-Z if you are working with names. This
statement's syntax is:
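The four clauses can be seen working together in one query. This is a minimal sketch, assuming SQLite through Python's sqlite3 module, with a hypothetical PRODUCTS_TBL and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCTS_TBL (PROD_ID TEXT, PROD_DESC TEXT, COST REAL)")
conn.executemany(
    "INSERT INTO PRODUCTS_TBL VALUES (?, ?, ?)",
    [("1", "WIDGET", 5.0), ("2", "GADGET", 25.0),
     ("3", "GIZMO", 15.0), ("4", "GADGET", 25.0)])

rows = conn.execute("""
    SELECT DISTINCT PROD_DESC, COST   -- columns to show, duplicates hidden
    FROM PRODUCTS_TBL                 -- table to read
    WHERE COST > 10                   -- condition each row must meet
    ORDER BY PROD_DESC                -- ascending (A-Z) by default
""").fetchall()
print(rows)
```

The two GADGET rows collapse into one because of DISTINCT, and WIDGET is filtered out by the WHERE condition.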

Case Sensitivity
You should understand this concept completely if you want to
use SQL. Usually, SQL statements and clauses are not
sensitive to uppercase and lowercase characters. That
means you can enter clauses and statements with the Caps
Lock on: it won't affect your SQL commands in any way.
However, case sensitivity becomes extremely important
when you are dealing with data objects. Whether comparisons
on data are case-sensitive depends on your implementation
and its settings. Many databases store data using uppercase
letters. This method helps database users in maintaining the
consistency of data.
For example, your database will be inconsistent if you
enter data this way:
JOHN
John
john

If the data was stored as JOHN and you executed a query for
John, you may not get relevant output.
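The effect can be sketched as follows, assuming SQLite through Python's sqlite3 module, where the `=` comparison on text is case-sensitive by default. Other implementations may behave differently depending on their collation settings. The table and the helper function `count` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE_TBL (FIRST_NAME TEXT)")
conn.execute("INSERT INTO EMPLOYEE_TBL VALUES ('JOHN')")

def count(condition, value):
    """Count the rows that satisfy the given WHERE condition."""
    sql = "SELECT COUNT(*) FROM EMPLOYEE_TBL WHERE " + condition
    return conn.execute(sql, (value,)).fetchone()[0]

exact = count("FIRST_NAME = ?", "John")                 # case differs
upper = count("FIRST_NAME = ?", "JOHN")                 # same case
folded = count("UPPER(FIRST_NAME) = UPPER(?)", "John")  # both uppercased

print(exact, upper, folded)
```

Converting both sides with UPPER is one common way to compare text without regard to case.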

Chapter 9: Categorize Information Using Database Operators
Operators: The Basics
Operators are reserved words or characters mainly used in
the WHERE clause of SQL statements. As their name
suggests, operators are used to perform operations (e.g.
comparisons and mathematical operations). Operators can
specify parameters for your SQL statements. Lastly, they can
connect multiple parameters within the same SQL statement.
This chapter will use the following operators:
Logical Operators
Comparison Operators
Arithmetic Operators
Operators for Negating Conditions
Let's discuss each operator type in detail:
Logical Operators

These operators use keywords to perform comparisons. In
this section, you'll learn about the following logical
operators:
IN
LIKE
UNIQUE
EXISTS
BETWEEN
IS NULL
ANY and ALL
IN
With this operator, you'll compare a value to a set of
specified literal values. You will only get TRUE if at least
one of the specified values is equal to the value being tested.
Here's an example:
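As an illustration (not the original figure), here is a minimal sketch of IN, assuming SQLite through Python's sqlite3 module and a hypothetical PRODUCTS_TBL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS_TBL (PROD_ID TEXT, PRICE INTEGER)")
conn.executemany(
    "INSERT INTO PRODUCTS_TBL VALUES (?, ?)",
    [("A", 100), ("B", 200), ("C", 300), ("D", 400)])

# A row qualifies if its PRICE equals any value in the list.
rows = [row[0] for row in conn.execute("""
    SELECT PROD_ID FROM PRODUCTS_TBL
    WHERE PRICE IN (200, 300, 500)
    ORDER BY PROD_ID""")]
print(rows)
```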

LIKE
Here, you'll use wildcard operators to compare a value
against similar ones. You can combine LIKE with the
following wildcard operators:
_ (i.e. the underscore)
% (i.e. the percent sign)
You should use the underscore to represent exactly one
character or number. On the other hand, you must use % to
represent zero, one, or several characters. You may combine
these wildcard operators in your SQL statements. Here are
some examples:
WHERE PRICE LIKE '100%' - This statement will find
values that begin with 100.
WHERE PRICE LIKE '%100%' - This SQL statement will
search for values that include 100.
WHERE PRICE LIKE '_11%' - This statement will search
for any value that has 11 as its second and third digits.
WHERE PRICE LIKE '1_%_%_%' - This statement will
find values that begin with 1 and are at least four characters
long.
WHERE PRICE LIKE '%1' - This will search for values
that have 1 as their last character.
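The patterns above can be tried out with a minimal sketch, assuming SQLite through Python's sqlite3 module. SQLite converts the integer PRICE to text before applying LIKE; other implementations may require an explicit conversion. The table, the sample prices, and the helper function `matching` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS_TBL (PRICE INTEGER)")
conn.executemany(
    "INSERT INTO PRODUCTS_TBL VALUES (?)",
    [(100,), (1000,), (2100,), (211,), (51,)])

def matching(pattern):
    """Return the prices that match the LIKE pattern, smallest first."""
    sql = "SELECT PRICE FROM PRODUCTS_TBL WHERE PRICE LIKE ? ORDER BY PRICE"
    return [row[0] for row in conn.execute(sql, (pattern,))]

results = {p: matching(p) for p in ("100%", "%100%", "_11%", "%1")}
for pattern, prices in results.items():
    print(pattern, prices)
```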
UNIQUE
With this operator, you can check the uniqueness of one or
more data rows. Check this simple example:

EXISTS
You can use this operator to check whether a subquery
returns any data rows that meet your chosen criteria. Here's
an example:
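As an illustration (not the original figure), here is a minimal sketch of EXISTS, assuming SQLite through Python's sqlite3 module. The tables and rows are hypothetical: the query lists only the customers who have at least one order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER_TBL (CUST_ID INTEGER, CUST_NAME TEXT)")
conn.execute("CREATE TABLE ORDERS_TBL (ORD_NUM INTEGER, CUST_ID INTEGER)")
conn.executemany(
    "INSERT INTO CUSTOMER_TBL VALUES (?, ?)", [(1, "ANN"), (2, "BOB")])
conn.execute("INSERT INTO ORDERS_TBL VALUES (100, 1)")

# EXISTS is TRUE when the subquery returns at least one row.
rows = [row[0] for row in conn.execute("""
    SELECT CUST_NAME FROM CUSTOMER_TBL C
    WHERE EXISTS (SELECT 1 FROM ORDERS_TBL O
                  WHERE O.CUST_ID = C.CUST_ID)""")]
print(rows)
```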

BETWEEN
You can use BETWEEN to find values within a specific
range. Here, you'll assign the minimum value and the
maximum value. Both the minimum and maximum values are
included in the range. Check this example:
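As an illustration (not the original figure), here is a minimal sketch of BETWEEN, assuming SQLite through Python's sqlite3 module and a hypothetical PRODUCTS_TBL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS_TBL (PRICE INTEGER)")
conn.executemany(
    "INSERT INTO PRODUCTS_TBL VALUES (?)", [(100,), (200,), (300,), (400,)])

# Both endpoints, 200 and 300, belong to the range.
rows = [row[0] for row in conn.execute("""
    SELECT PRICE FROM PRODUCTS_TBL
    WHERE PRICE BETWEEN 200 AND 300
    ORDER BY PRICE""")]
print(rows)
```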

IS NULL
You can use IS NULL to check whether a value is NULL.
For instance, you can identify the products that
don't have wheels by checking for NULL values in the
WHEEL column of your PRODUCTS_TBL table.
In the example below, you won't get a NULL value:
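The wheel check mentioned above can be sketched as follows, assuming SQLite through Python's sqlite3 module. The PROD_DESC column and the rows are assumptions. The sketch also shows why `= NULL` is not a substitute for IS NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS_TBL (PROD_DESC TEXT, WHEEL INTEGER)")
conn.executemany(
    "INSERT INTO PRODUCTS_TBL VALUES (?, ?)", [("CAR", 4), ("BOAT", None)])

# IS NULL finds the rows where no wheel count was recorded.
rows = [row[0] for row in conn.execute(
    "SELECT PROD_DESC FROM PRODUCTS_TBL WHERE WHEEL IS NULL")]

# A comparison with = NULL is never TRUE, so it matches nothing.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM PRODUCTS_TBL WHERE WHEEL = NULL").fetchone()[0]

print(rows, equals_null)
```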

ANY and ALL


ANY is an operator that compares a value against the
values in a list, which is usually produced by a subquery. The
condition is TRUE if the comparison holds for at least one
value in the list. Here's an example:

ALL, however, compares your selected value against every
value contained in a different value set. The condition is
TRUE only if the comparison holds for all of them. Check this
example:

Comparison Operators
These operators can test single values within SQL
statements. This category is composed of <, >, <>, and =.
You can use these operators to test:

Non-equality
Equality
Greater-than values
Less-than values
Non-equality
As an SQL user, you should use <> to test non-equality.
The operation gives TRUE if the data is not equal; FALSE if
the data is equal.
Important Note: You may also use the != operator.
Many SQL implementations support this operator
to test inequality. Check the implementation you are using to
find out more about this topic.
Equality
You can use this operator to test single values in your SQL
statements. Obviously, = (i.e. the equal sign) represents
equality. When checking for equality, you will only get data if
the chosen values are identical. If the values are equal, youll
get TRUE as the result. If the values arent equal, youll get
FALSE.
Greater-than, Less-than
In general, < and > can serve as stand-alone operators.
However, you can improve the effectiveness of your
operations if you combine them with other operators.

Comparison Operators: Simple Combos
You can combine = with < and >. Check the examples
below:

With <= 20,000 (i.e. less-than or equal-to 20,000), you'll
get 20,000 and all of the values below it. If a database object
is within that range, you'll get TRUE from the operation. If
the object's value is 20,001 or higher, on the other hand, you
will get FALSE.
The second example follows the same principle. The only
difference is that you'll get TRUE for objects whose value is
20,000 and above. You'll get FALSE for objects with the
value of 19,999 and below.
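The two combined operators can be sketched as follows, assuming SQLite through Python's sqlite3 module. The table EMPLOYEE_PAY_TBL with a SALARY column is an assumption, chosen only to have values around 20,000 to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE_PAY_TBL (SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE_PAY_TBL VALUES (?)",
    [(19999,), (20000,), (20001,)])

# <= keeps 20,000 and everything below it.
at_most = [row[0] for row in conn.execute(
    "SELECT SALARY FROM EMPLOYEE_PAY_TBL "
    "WHERE SALARY <= 20000 ORDER BY SALARY")]

# >= keeps 20,000 and everything above it.
at_least = [row[0] for row in conn.execute(
    "SELECT SALARY FROM EMPLOYEE_PAY_TBL "
    "WHERE SALARY >= 20000 ORDER BY SALARY")]

print(at_most, at_least)
```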
Arithmetic Operators
These operators can help you perform mathematical
operations in the SQL language. In this section, you'll learn
about the typical operators used in relational databases: +, -,
*, and /.
Let's discuss each operator in detail:

Addition
You can perform addition using + (i.e. the plus sign). Study
the following SQL statements:
SELECT MATERIALS + OVERHEAD FROM
PRODUCTION_COST_TBL - In this SQL statement, you'll
add up the values in the MATERIALS column and the
OVERHEAD column.
SELECT MATERIALS FROM PRODUCTION_COST_TBL
WHERE MATERIALS + OVERHEAD < 500 - This
operation will return values where the sum of MATERIALS
and OVERHEAD is less than 500.
Subtraction
You can use - (i.e. the minus sign) to perform subtraction.
To help you understand this process, two examples are given
below:
SELECT SALES - COST FROM
COMPANY_FINANCIALS_TBL - For this SQL statement,
the COST column will be deducted from the SALES column.
SELECT SALES FROM COMPANY_FINANCIALS_TBL
WHERE SALES - COST < 100000 - This statement will
give you values where SALES minus COST is less than
100,000.
Multiplication
You should use * (i.e. the asterisk) to perform
multiplication. Check the examples below:
SELECT SALES * 10 FROM
COMPANY_FINANCIALS_TBL - The values in the SALES
column will be multiplied by ten.
SELECT SALES FROM COMPANY_FINANCIALS_TBL
WHERE SALES * 10 < 100000 - This statement will
return values where the product of (SALES * 10) is less than
100,000.
Division
You must use / (i.e. the slash symbol) when performing
division. Here are two examples:
SELECT SALES / 5 FROM
COMPANY_FINANCIALS_TBL - The SALES column is
divided by 5.
SELECT SALES FROM COMPANY_FINANCIALS_TBL
WHERE SALES / 5 < 100000 - This SQL statement will
return data rows where the result of (SALES / 5) is less than
100,000.
Some Combinations of Arithmetic Operators
You may combine arithmetic operators to streamline your
database processes. Keep in mind that SQL applies the
principles of precedence in mathematics. That means
multiplication and division are performed first. Then, the
process is completed by performing addition and subtraction.
You can control the sequence of mathematical
operations by using parentheses.
Important Note: Precedence is the sequence in which
mathematical expressions are performed. Here are some
basic examples:
Expression              Result
2 + 2 * 5               12
(2 + 2) * 5             20
20 - 8 / 4 + 2          20
(20 - 8) / (4 + 2)      2

When working with multiple arithmetic operators, always
apply the principles of precedence. If you forget about
precedence and the usage of parentheses, you will get
inaccurate results from your arithmetic operations in SQL.
Logical errors can still exist even if you have perfect
syntax in your SQL statements.
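You can check precedence yourself by letting the database evaluate the expressions. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; SQLite allows a SELECT without a FROM clause, which some other implementations do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

expressions = [
    "2 + 2 * 5",
    "(2 + 2) * 5",
    "20 - 8 / 4 + 2",
    "(20 - 8) / (4 + 2)",
]

# Evaluate each expression with a SELECT and keep the single result.
results = {e: conn.execute("SELECT " + e).fetchone()[0] for e in expressions}
for expression, value in results.items():
    print(expression, "=", value)
```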
In the next examples, the parentheses don't influence the
result because only multiplication and division are performed
and the division comes out even. Precedence does not matter
in these situations. Study these examples:
Expression              Result
8 * 12 / 4              24
(8 * 12) / 4            24
8 * (12 / 4)            24

Operators for Negating Conditions


In this section, you'll learn how to negate the logical
operators discussed above. Negating the effects of logical
operators is necessary if you want to reverse the meaning of
a condition.
You should use NOT to reverse the operator it is used with.
NOT is a logical operator in SQL that can be utilized with
these techniques:
NOT EQUAL
Earlier, you learned how to check for inequality using <>.
It is important to mention inequality here since if
you are checking for it, you are already negating the =
operator. Here are two ways to test inequality:
WHERE PRICE <> 10000 - Price is not equal to 10,000
WHERE PRICE != 10000 - Price is not equal to 10,000
In the second statement, the ! negates the comparison for
equality. Some SQL implementations allow users to combine
! with the other comparison operators (i.e. < and >).


NOT BETWEEN
You can negate BETWEEN using the NOT BETWEEN
operator. Here's an example:
WHERE PRICE NOT BETWEEN 5000 AND 10000 -
The value of PRICE can't fall within the 5,000 to 10,000
range.
NOT IN
You can use NOT IN to negate the IN operator. In the
example below, all prices that are not included in the list
will be returned.
WHERE PRICE NOT IN (200, 300, 400) - Action will
only be taken if PRICE is not equal to any value in the list.
NOT LIKE
NOT LIKE negates the wildcard operator LIKE. If you are
using NOT LIKE, you will only get values that don't match
the pattern you specified. Here are some examples:
WHERE PRICE NOT LIKE '100%' - This SQL statement
will find values that don't begin with 100.
WHERE PRICE NOT LIKE '%100%' - This statement will
get values that don't have 100 in them.
WHERE PRICE NOT LIKE '_11%' - This SQL statement
will give you values that don't have 11 in their second and
third positions.
WHERE PRICE NOT LIKE '1_%_%' - This statement
WILL NOT find values that begin with 1 and are at least
three characters long.
IS NOT NULL
You can use the IS NOT NULL operator to negate IS NULL.
This procedure is usually done to check for data that isn't
NULL. Here's an example:
WHERE PRICE IS NOT NULL - This operation will return
price values that are not null.
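Several of the negated conditions above can be compared side by side. This is a minimal sketch, assuming SQLite through Python's sqlite3 module, with a hypothetical PRODUCTS_TBL that includes one row whose PRICE is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS_TBL (PROD_ID TEXT, PRICE INTEGER)")
conn.executemany(
    "INSERT INTO PRODUCTS_TBL VALUES (?, ?)",
    [("A", 100), ("B", 200), ("C", 7500), ("D", None)])

conditions = [
    "PRICE NOT BETWEEN 5000 AND 10000",
    "PRICE NOT IN (200, 300, 400)",
    "PRICE NOT LIKE '100%'",
    "PRICE IS NOT NULL",
]

# Run one query per condition and collect the matching product IDs.
results = {}
for condition in conditions:
    sql = ("SELECT PROD_ID FROM PRODUCTS_TBL WHERE "
           + condition + " ORDER BY PROD_ID")
    results[condition] = [row[0] for row in conn.execute(sql)]
    print(condition, results[condition])
```

Product D never appears: a comparison against NULL is neither TRUE nor FALSE, so negating it with NOT does not make the row qualify.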
NOT EXISTS
This operator can help you negate EXISTS. Study the example
below:

In this example, the maximum cost is shown in the output
section. This is because the cost of all existing records is
less than 100.
NOT UNIQUE
Use this operator to negate UNIQUE.
WHERE NOT UNIQUE (SELECT PRICE FROM
PRODUCT_TBL) - This statement checks whether there are
non-UNIQUE prices in the PRODUCT_TBL table.
Conjunctive Operators
Sometimes, you have to use multiple criteria. This is usually
the case when a single condition doesn't narrow down the
results of your database queries enough. You can combine
different criteria in your SQL statements using the
conjunctive operators. These are:
OR
AND
OR
You can use this operator to combine conditions in the
WHERE clause of your SQL statement. Before an SQL
statement can take any action, at least one of the criteria
separated by OR should be TRUE. Here's an example:
WHERE PRICE = 100 OR PRICE = 300 - This statement
will find values in the PRICE column that match either 100
or 300.
AND
This operator allows you to include multiple criteria in your
SQL statement's WHERE section. Your SQL statement will
only take action if the criteria separated by AND are all
TRUE. Analyze the example below:
WHERE PRODUCT_ID = 'ABC' AND PRICE = 200 -
This statement will look for data objects whose
PRODUCT_ID value is ABC and PRICE value is 200.
Important Note: Keep in mind that you can always combine
multiple operators and conditions in your SQL statements.
You can also improve the readability of your statements by
using parentheses.
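Parentheses affect more than readability when AND and OR appear together, because AND is evaluated before OR. The difference can be sketched as follows, assuming SQLite through Python's sqlite3 module and a hypothetical PRODUCTS_TBL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS_TBL (PRODUCT_ID TEXT, PRICE INTEGER)")
conn.executemany(
    "INSERT INTO PRODUCTS_TBL VALUES (?, ?)",
    [("ABC", 200), ("ABC", 300), ("XYZ", 100), ("XYZ", 300)])

# Product must be ABC, and the price may be 200 or 300.
with_parens = conn.execute("""
    SELECT PRODUCT_ID, PRICE FROM PRODUCTS_TBL
    WHERE PRODUCT_ID = 'ABC' AND (PRICE = 200 OR PRICE = 300)
    ORDER BY PRODUCT_ID, PRICE""").fetchall()

# Read as (PRODUCT_ID = 'ABC' AND PRICE = 200) OR PRICE = 300.
without_parens = conn.execute("""
    SELECT PRODUCT_ID, PRICE FROM PRODUCTS_TBL
    WHERE PRODUCT_ID = 'ABC' AND PRICE = 200 OR PRICE = 300
    ORDER BY PRODUCT_ID, PRICE""").fetchall()

print(with_parens)
print(without_parens)
```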

Chapter 10: Creating Table and Inserting Data
Let's say that we have a database. It has no data in it yet.
What sort of data should we store in our first table? Let's
start with a shopping list, which you've probably used in real
life. Let's say you have a list of three items, along with how
much we want to buy of each of them.
Our first bit of SQL will be the command to make the table to
store this list. In the following examples, note that we will be
using SQLite to type the code for the SQL language. Note that
the numbers on the left-hand side of our example code are
placed there to represent each specific line of the code. They
are not part of the actual SQL code per se.
So again, to create the table to store the list, let's start with
the command below:
1 /** Shopping list:
2 Blouse (4)
3 Pants (1)
4 Underwear (2)
6 **/
7
8 CREATE TABLE shopping_list ();
If you look at the sample code above, at line 8 we type in
CREATE TABLE in all caps, followed by the name of the
table, which in this case is "shopping_list," followed by an
opening and a closing parenthesis, and then a semicolon.
Chances are, you'll see an error pop up because the SQL
interpreter expects to see the column names inside the
parentheses. What columns should we have in order to
describe each item on our list?
First, we need a name for the column, which we'll call
"name," and we need to follow that with a data type. We have
a few options for data type, but for this example we'll just go
with the TEXT data type. In the next chapter, we'll have a
more in-depth discussion about data types in SQL. Now if
you typed everything correctly, your code should now look
something like our example below:
1 /** Shopping list:
2 Blouse (4)
3 Pants (1)
4 Underwear (2)
6 **/
7
8 CREATE TABLE shopping_list (name TEXT);
If you're using SQLite to follow along, you will see that our
new table is now listed with one column at the right-hand
side. Keep in mind that, for our list, we also need to specify
how many of each thing to buy, like our four blouses. So let's
go ahead and add a quantity column as well. Keep in mind
that the values in the quantity column will always be a whole
number. So, let's use an integer for the data type. Your code
should now look something like the example below:
1 /** Shopping list:
2 Blouse (4)
3 Pants (1)
4 Underwear (2)
6 **/
7
8 CREATE TABLE shopping_list (name TEXT, quantity INTEGER);
And now, looking at the right hand side if you're using
SQLite, you can see the new column listed in our table. At
this point, our database now looks pretty good. But we're
missing something that we need in databases, and that is a
unique identifier for each row.
In SQL, we almost always need unique IDs for each row in a
database. Why? Because we need a way to identify rows
later when we're updating or deleting them. Under no
circumstances should an SQL programmer be dependent on
other columns for row identification. This is because the
values within those columns could change anytime.
When creating columns in SQL, we typically specify the ID
column first. So go ahead and move your cursor before the
"name" column. We'll call our ID column "id," which is
standard. And for the data type, we'll have to type the SQL
constraint "INTEGER PRIMARY KEY." This signals the
database that it should treat it as the row identifier, and that
each row must have a unique numeric value for this column.
Generally in SQL, constraints define rules for the data in a
database table. The "INTEGER PRIMARY KEY" constraint
specifically just says that the value should be a number,
not a null value, and that it must be unique. Your code should
now look something like the example below:
1 /** Shopping list:
2 Blouse (4)
3 Pants (1)
4 Underwear (2)
6 **/
7
8 CREATE TABLE shopping_list (id INTEGER
PRIMARY KEY, name TEXT, quantity INTEGER);
Now, we have our shopping_list table with three columns in
it. However, at this point, it's empty. Our next task would be
to put some data in it. On a new line, we'll write "INSERT
INTO" and then the table name "shopping_list," and then
"VALUES", and then followed by an opening parenthesis.
After the opening parenthesis, we now start listing the
column values in the order that we declared the columns.
The first column is "id," so we'll put "1" for it since we
haven't used it yet. The second column is name, so we'll
write "Blouse" since it's one of the items on our list. The
third column is quantity, so we'll write the number "4" since
the quantity that we declared for Blouse in our list is four.
After that, go ahead and type a closing parenthesis,
followed by a semicolon. Your code should now look like
the one below:

1 /** Shopping list:
2 Blouse (4)
3 Pants (1)
4 Underwear (2)
6 **/
7
8 CREATE TABLE shopping_list (id INTEGER PRIMARY KEY, name TEXT, quantity INTEGER);
9
10 INSERT INTO shopping_list VALUES (1, "Blouse", 4);
If you're using SQLite, you'll see that the schema now says
that there's one row in the shopping_list table. In other words,
our insertion worked. Go ahead and insert the other items on
the list using the same procedure. On the next line, go ahead
and type "INSERT INTO," followed by the name of the table,
"shopping_list," followed by "VALUES," and then followed
by an opening parenthesis. After the opening parenthesis, go
ahead and type "2" as our second unique identifier, followed
by the name of the item, which in this case is "Pants,"
followed by the quantity "1", followed by a closing
parenthesis, and then finally our semicolon, and so forth.
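If you'd like to check the uniqueness rule that INTEGER PRIMARY KEY enforces, here is a small sketch that uses Python's built-in sqlite3 module to run the SQL. It tries to reuse the id 1 for a second row. The duplicate insert is something we made up for illustration, and the error type shown is the one the Python module raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shopping_list (id INTEGER PRIMARY KEY, name TEXT, quantity INTEGER)"
)
conn.execute("INSERT INTO shopping_list VALUES (1, 'Blouse', 4)")

try:
    # Reusing id 1 breaks the primary key's uniqueness rule.
    conn.execute("INSERT INTO shopping_list VALUES (1, 'Pants', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM shopping_list").fetchone()[0]
print(duplicate_rejected, row_count)
```

This is why each new row in our list gets the next unused number for its id.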
If you typed everything correctly, your code should now look
like the one below:
1 /** Shopping list:
2 Blouse (4)
3 Pants (1)
4 Underwear (2)
6 **/
7
8 CREATE TABLE shopping_list (id INTEGER
PRIMARY KEY, name TEXT, quantity INTEGER);
9
10 INSERT INTO shopping_list VALUES (1, "Blouse",
4);
11 INSERT INTO shopping_list VALUES (2, "Pants",
1);
12 INSERT INTO shopping_list VALUES (3,
"Underwear", 2);
SQLite will now report that the table has three rows. However, to
confirm that the database actually contains data, you have to
either click on the table name on the right, or type the line of
code below:
13 SELECT * FROM shopping_list;
This adds a SELECT statement to your code that
displays all the data within the table. Your code should
now look like this:
1 /** Shopping list:
2 Blouse (4)
3 Pants (1)
4 Underwear (2)
6 **/
7
8 CREATE TABLE shopping_list (id INTEGER
PRIMARY KEY, name TEXT, quantity INTEGER);
9
10 INSERT INTO shopping_list VALUES (1, "Blouse",
4);
11 INSERT INTO shopping_list VALUES (2, "Pants",
1);
12 INSERT INTO shopping_list VALUES (3,
"Underwear", 2);
13 SELECT * FROM shopping_list;
As you can see in our example code above, the code
"SELECT * FROM shopping_list;" is inserted at line 13.
And by inserting this line, you're actually making a query
request to display everything in the shopping list table.
Hence, it would yield the output below:
id    name        quantity
1     Blouse      4
2     Pants       1
3     Underwear   2

So that's basically it. That's all we need to do to create our
first table and add data into it. In the next section, we'll
discuss how to get the data back out of the table in more
interesting ways.
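If you'd rather run this chapter's statements from a script instead of typing them into an SQLite editor, the sketch below is one way to do it with Python's built-in sqlite3 module. The in-memory database is an assumption made for convenience, and the string values use single quotes, which is the standard SQL way of writing text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # nothing is written to disk
conn.executescript("""
CREATE TABLE shopping_list (id INTEGER PRIMARY KEY, name TEXT, quantity INTEGER);
INSERT INTO shopping_list VALUES (1, 'Blouse', 4);
INSERT INTO shopping_list VALUES (2, 'Pants', 1);
INSERT INTO shopping_list VALUES (3, 'Underwear', 2);
""")

# Each row comes back as a tuple: (id, name, quantity).
rows = conn.execute("SELECT * FROM shopping_list").fetchall()
for row in rows:
    print(row)
```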

Chapter 11: Querying The Table


1 CREATE TABLE shopping_list (id INTEGER
PRIMARY KEY, name TEXT, quantity INTEGER, aisle
INTEGER);
2 INSERT INTO shopping_list VALUES (1, "Blouse",
4, 7);
3 INSERT INTO shopping_list VALUES (2, "Pants", 1,
2);
4 INSERT INTO shopping_list VALUES (3,
"Underwear", 2, 2);
5 INSERT INTO shopping_list VALUES (4, "Shorts", 1,
12);
6 INSERT INTO shopping_list VALUES (5, "T-shirt",
6, 2);
7 INSERT INTO shopping_list VALUES (6, "Hat", 1,
4);
8
9
We're back with our shopping list table, but we've expanded
on it a bit. It now has a column for the aisle number where we
can find the item at the apparel store. We've also added a few
more items to our shopping list. You can really see the power
of SQL in the different ways that you can retrieve data from
your database. This is also where it can get a bit tricky.
So, how do we retrieve all of a column's rows from our
table? To perform any query, we write "SELECT," followed
by the column we're interested in. In this case, let's say
we're interested in the column "name," followed by the
keyword "FROM," and then finally the name of the table that we're
selecting from, in this case, "shopping_list." Your code
should now look like the one below:
1 CREATE TABLE shopping_list (id INTEGER
PRIMARY KEY, name TEXT, quantity INTEGER, aisle
INTEGER);
2 INSERT INTO shopping_list VALUES (1, "Blouse",
4, 7);
3 INSERT INTO shopping_list VALUES (2, "Pants", 1,
2);
4 INSERT INTO shopping_list VALUES (3,
"Underwear", 2, 2);
5 INSERT INTO shopping_list VALUES (4, "Shorts", 1,
12);
6 INSERT INTO shopping_list VALUES (5, "T-shirt",
6, 2);
7 INSERT INTO shopping_list VALUES (6, "Hat", 1,
4);
8
9 SELECT name FROM shopping_list;
10

Again, you'll see the list of shopping items under the "name"
column on the right-hand side if you're using SQLite. Now,
what if we want all of the columns? To get all of the
columns, we can just replace "name" with an asterisk (*)
in line 9 of our code above. Note that "SELECT *
FROM shopping_list;" is the same code that gets inserted in
a new line when you click on the table name in SQLite.
The list that it will show is out of order, though. If we went
from top to bottom at the apparel store with this list, we'd
have to keep changing aisles. We'd rather have it ordered by
aisle, so that we can be more efficient at the store. To do that,
we can just add an "ORDER BY" command to our query,
specifying which column we want to order by. Your code
should now look like the one below:
1 CREATE TABLE shopping_list (id INTEGER
PRIMARY KEY, name TEXT, quantity INTEGER, aisle
INTEGER);
2 INSERT INTO shopping_list VALUES (1, "Blouse",
4, 7);
3 INSERT INTO shopping_list VALUES (2, "Pants", 1,
2);
4 INSERT INTO shopping_list VALUES (3,
"Underwear", 2, 2);
5 INSERT INTO shopping_list VALUES (4, "Shorts", 1, 12);
6 INSERT INTO shopping_list VALUES (5, "T-shirt", 6, 2);
7 INSERT INTO shopping_list VALUES (6, "Hat", 1, 4);
8
9 SELECT * FROM shopping_list ORDER BY aisle;
10
As you can see in the example above, we've just added the
command "ORDER BY" at the end of our code in line 9. This
is followed by the name of the column we want to order the
list by. If you're using SQLite, you should have a query output
similar to the one below:
id    name        quantity   aisle
2     Pants       1          2
3     Underwear   2          2
5     T-shirt     6          2
6     Hat         1          4
1     Blouse      4          7
4     Shorts      1          12
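Here is a minimal sketch of the same ORDER BY query, run through Python's built-in sqlite3 module. One small assumption is added: "id" is used as a second sort column so that items sharing an aisle always come back in the same order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shopping_list "
    "(id INTEGER PRIMARY KEY, name TEXT, quantity INTEGER, aisle INTEGER)"
)
conn.executemany("INSERT INTO shopping_list VALUES (?, ?, ?, ?)", [
    (1, "Blouse", 4, 7), (2, "Pants", 1, 2), (3, "Underwear", 2, 2),
    (4, "Shorts", 1, 12), (5, "T-shirt", 6, 2), (6, "Hat", 1, 4),
])

# Sort by aisle first; ties within an aisle are broken by id (our addition).
ordered = conn.execute(
    "SELECT name, aisle FROM shopping_list ORDER BY aisle, id"
).fetchall()
for name, aisle in ordered:
    print(aisle, name)
```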

Now that's better. We can now get our items faster. However,
to be even more efficient, let's say you and your significant
other like to shop together and split the store. You both shop
one half of the store and you both meet at the checkout
counter. There are twelve aisles in this apparel store. So, for
your list, let's say you just want to know which items are in
aisles 6 through 12.
To filter the results of our query, we need to use a
WHERE clause, followed by the column name, and then what
we want to compare it to. At this point, this is what your code
should look like:
1 CREATE TABLE shopping_list (id INTEGER
PRIMARY KEY, name TEXT, quantity INTEGER, aisle
INTEGER);
2 INSERT INTO shopping_list VALUES (1, "Blouse",
4, 7);
3 INSERT INTO shopping_list VALUES (2, "Pants", 1,
2);
4 INSERT INTO shopping_list VALUES (3,
"Underwear", 2, 2);
5 INSERT INTO shopping_list VALUES (4, "Shorts", 1,
12);
6 INSERT INTO shopping_list VALUES (5, "T-shirt", 6, 2);
7 INSERT INTO shopping_list VALUES (6, "Hat", 1, 4);
8
9 SELECT * FROM shopping_list WHERE aisle > 5
ORDER BY aisle;
10
As you can see from our example code above, we used the
greater than operator ">" to keep only the rows
whose aisle number is greater than 5. Even
though we used the greater than operator here, keep in
mind that there are many other comparison operators that you
can use as well, depending on what you are trying to
filter by.
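To try the WHERE clause with more than one comparison operator, here is a short sketch using Python's built-in sqlite3 module. The second query, which uses "<=" to get the other shopper's half of the store, is our own illustrative addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shopping_list "
    "(id INTEGER PRIMARY KEY, name TEXT, quantity INTEGER, aisle INTEGER)"
)
conn.executemany("INSERT INTO shopping_list VALUES (?, ?, ?, ?)", [
    (1, "Blouse", 4, 7), (2, "Pants", 1, 2), (3, "Underwear", 2, 2),
    (4, "Shorts", 1, 12), (5, "T-shirt", 6, 2), (6, "Hat", 1, 4),
])

# Your half of the store: aisles 6 through 12.
my_half = conn.execute(
    "SELECT name, aisle FROM shopping_list WHERE aisle > 5 ORDER BY aisle"
).fetchall()

# The other half: aisles 1 through 5 (hypothetical companion query).
other_half = conn.execute(
    "SELECT name, aisle FROM shopping_list WHERE aisle <= 5 ORDER BY aisle, id"
).fetchall()
print(my_half)
print(other_half)
```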
Great work so far! Not only do we know what items to get,
but you've also learned a few ways to use SQL to query. In
the next chapter, we're going to discuss even more ways we
can select data in SQL.

Chapter 12: Aggregating Data


1 CREATE TABLE shopping_list (id INTEGER
PRIMARY KEY, name TEXT, quantity INTEGER, aisle
INTEGER);
2 INSERT INTO shopping_list VALUES (1, "Blouse",
4, 7);
3 INSERT INTO shopping_list VALUES (2, "Pants", 1,
2);
4 INSERT INTO shopping_list VALUES (3,
"Underwear", 2, 2);
5 INSERT INTO shopping_list VALUES (4, "Shorts", 1,
12);
6 INSERT INTO shopping_list VALUES (5, "T-shirt",
6, 2);
7 INSERT INTO shopping_list VALUES (6, "Hat", 1,
4);
8
We're back with our shopping list. As you can see, it has six
rows in it. However, we need to buy more than one of some
items, like our Blouse. We're not sure offhand how many items
we'll end up buying in total. We'd like to know the total number
so that when we reach the counter to check out, we can just
perform a quick check to see if we have the correct number
of items in our cart.
To do that in SQL, we can use what's called an aggregate
function. An aggregate function is useful for things like
getting the maximum, minimum, sum, and average of values
in our database. In our example code, to get the total number
of items, we'll start with the "SELECT" keyword, followed by the
name of the function, in this case "SUM," and then
immediately followed by an open parenthesis. After the open
parenthesis, type in the name of the column that we want, and
then a closing parenthesis. After the closing parenthesis, we
type "FROM," followed by the name of the table we're
selecting from, and then lastly, a semicolon.
Your code should now look something like the one below:
1 CREATE TABLE shopping_list (id INTEGER
PRIMARY KEY, name TEXT, quantity INTEGER, aisle
INTEGER);
2 INSERT INTO shopping_list VALUES (1, "Blouse",
4, 7);
3 INSERT INTO shopping_list VALUES (2, "Pants", 1,
2);
4 INSERT INTO shopping_list VALUES (3,
"Underwear", 2, 2);
5 INSERT INTO shopping_list VALUES (4, "Shorts", 1,
12);
6 INSERT INTO shopping_list VALUES (5, "T-shirt", 6, 2);
7 INSERT INTO shopping_list VALUES (6, "Hat", 1,
4);
8
9 SELECT SUM(quantity) FROM shopping_list;
10
If you're using SQLite, you can see on the right-hand side that
the SUM is 15. This means that you should have 15 items in
your cart if you picked up everything correctly. Notice that if you go
back to line 2 of our code and change the number of Blouses,
you can see the SUM change in real time.
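SUM is only one of the aggregate functions mentioned earlier. The sketch below, which uses Python's built-in sqlite3 module, asks for the sum, maximum, minimum, and average of the quantity column in a single query. Putting all four in one SELECT is our own choice for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shopping_list "
    "(id INTEGER PRIMARY KEY, name TEXT, quantity INTEGER, aisle INTEGER)"
)
conn.executemany("INSERT INTO shopping_list VALUES (?, ?, ?, ?)", [
    (1, "Blouse", 4, 7), (2, "Pants", 1, 2), (3, "Underwear", 2, 2),
    (4, "Shorts", 1, 12), (5, "T-shirt", 6, 2), (6, "Hat", 1, 4),
])

# One row comes back, with one value per aggregate function.
total, largest, smallest, average = conn.execute(
    "SELECT SUM(quantity), MAX(quantity), MIN(quantity), AVG(quantity) "
    "FROM shopping_list"
).fetchone()
print(total, largest, smallest, average)
```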
Now, we could easily try out other aggregate functions here,
because SUM is not the only one. If we want to know what is
the most that we'll be buying of any one item for example, we
could use the MAX aggregate function. What if we wanted to
make sure we had the right number of items after each aisle?
Well, we can do that in SQL using the GROUP BY clause.
We add the GROUP BY clause at the end of a query,
specifying the column name to GROUP BY, which in this
case is "aisle." Your code should now look something like
the one below:
1 CREATE TABLE shopping_list (id INTEGER
PRIMARY KEY, name TEXT, quantity INTEGER, aisle
INTEGER);

2 INSERT INTO shopping_list VALUES (1, "Blouse", 4, 7);
3 INSERT INTO shopping_list VALUES (2, "Pants", 1,
2);
4 INSERT INTO shopping_list VALUES (3,
"Underwear", 2, 2);
5 INSERT INTO shopping_list VALUES (4, "Shorts", 1,
12);
6 INSERT INTO shopping_list VALUES (5, "T-shirt",
6, 2);
7 INSERT INTO shopping_list VALUES (6, "Hat", 1,
4);
8
9 SELECT SUM(quantity) FROM shopping_list
GROUP BY aisle;
10
So now, we can see that in one aisle we have nine items, and
in another we have one item. However, we don't actually
know which aisle we're getting each of those in. So, what we
can do here is just add "aisle" immediately after the SELECT
clause like so:
1 CREATE TABLE shopping_list (id INTEGER
PRIMARY KEY, name TEXT, quantity INTEGER, aisle
INTEGER);
2 INSERT INTO shopping_list VALUES (1, "Blouse",
4, 7);

3 INSERT INTO shopping_list VALUES (2, "Pants", 1, 2);
4 INSERT INTO shopping_list VALUES (3,
"Underwear", 2, 2);
5 INSERT INTO shopping_list VALUES (4, "Shorts", 1,
12);
6 INSERT INTO shopping_list VALUES (5, "T-shirt",
6, 2);
7 INSERT INTO shopping_list VALUES (6, "Hat", 1,
4);
8
9 SELECT aisle, SUM(quantity) FROM shopping_list
GROUP BY aisle;
10
If you're using SQLite, the result that you see on the right
hand side should look something like this:
aisle   SUM(quantity)
2       9
4       1
7       4
12      1

As you can see, we're going to get nine items in aisle two,
one item in aisle four, four items in aisle seven, and one in
aisle twelve. Now you might be asking, how did that actually
work behind the scenes? What happened is that the SQL
engine first grouped the rows based on aisle. In other words, it
executed the GROUP BY clause first. Then, it summed up the
quantity in each of those groups. And then finally, it selected
the first aisle value that it saw in each group. We already
know that the aisle value is the same for every row
within a group, so we got the aisles back out. In other
words, SQL did not process line 9 in the order it is written:
the GROUP BY clause at the end was applied before the
SELECT list at the start.
We could also type "name" instead of "aisle" after the
SELECT clause of our code at line 9. However, it's a bit
misleading because for some of these aisles, there are
actually multiple items in that group. Therefore, you shouldn't
be using something different from what you're grouping by,
because you might not get a sensible result.
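The grouping can be reproduced with the sketch below, which runs the SQL through Python's built-in sqlite3 module. It also counts the rows in each group, which is our own addition to show why a single "name" per aisle would be misleading: aisle 2 holds three different items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shopping_list "
    "(id INTEGER PRIMARY KEY, name TEXT, quantity INTEGER, aisle INTEGER)"
)
conn.executemany("INSERT INTO shopping_list VALUES (?, ?, ?, ?)", [
    (1, "Blouse", 4, 7), (2, "Pants", 1, 2), (3, "Underwear", 2, 2),
    (4, "Shorts", 1, 12), (5, "T-shirt", 6, 2), (6, "Hat", 1, 4),
])

# Total quantity per aisle.
per_aisle = conn.execute(
    "SELECT aisle, SUM(quantity) FROM shopping_list GROUP BY aisle ORDER BY aisle"
).fetchall()

# Number of distinct rows (items) that were folded into each group.
items_per_aisle = conn.execute(
    "SELECT aisle, COUNT(*) FROM shopping_list GROUP BY aisle ORDER BY aisle"
).fetchall()
print(per_aisle)
print(items_per_aisle)
```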
So far so good. Now you know how to use aggregate functions,
and how to group their results. You can now officially gather
useful statistics on your data, a great tool in your SQL
toolbox.

Chapter 13: Queries with AND, OR


We've spent the last few chapters making a shopping list.
Let's now go ahead and track a different type of information
in our database. Let's now see how much exercise you're
doing. So, let's start by typing in the usual code for creating a
table, and then insert a few rows of data:
1 CREATE TABLE workout_logs
2     (id INTEGER PRIMARY KEY AUTOINCREMENT,
3     activity TEXT,
4     minutes INTEGER,
5     calories INTEGER,
6     pulse_rate INTEGER);
7
8
9 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 30, 100, 110);
10 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 10, 30, 105);
11 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("tread_mill", 15, 200, 120);
12
13 SELECT * FROM workout_logs;
Let's now see what we have in our workout_logs table. We
have an "id" of course, an "activity," which is a string like
"bench_press" or "tread_mill," "minutes," for how many
minutes you've spent, "calories," for how many you've
burned, and "pulse_rate," for how high it went.
If you look closely, you'll see that we're inserting the data
using a slightly different syntax from before. Notice how
we're specifying the column names between an open and
close parenthesis after the table name. When we do that, we
only have to supply values for the columns that we listed,
in that same order, instead of a value for every column in
the table. Here's a
question, do you see which column we didn't specify in our
example code?
The missing column is "id," and that's very much on purpose.
As you can see in line 2 of our code, the "id" column is the
table's primary key and it's set to autoincrement.
What this means is the database will
automatically put an id that's different from the other rows in
the table for us. It's usually a number that's one bigger than
the biggest number so far.
We'd rather have the database figure out the id designation
for us instead of us figuring it out by ourselves. In the SQL
world, it's usually recommended to do data insertions this
way.
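To watch the database hand out the ids by itself, here is a minimal sketch using Python's built-in sqlite3 module. The "?" placeholders belong to the Python module and simply stand in for the values of each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workout_logs ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, activity TEXT, "
    "minutes INTEGER, calories INTEGER, pulse_rate INTEGER)"
)

logs = [
    ("bench_press", 30, 100, 110),
    ("bench_press", 10, 30, 105),
    ("tread_mill", 15, 200, 120),
]
# The column list leaves out id, so the database fills it in for every row.
conn.executemany(
    "INSERT INTO workout_logs (activity, minutes, calories, pulse_rate) "
    "VALUES (?, ?, ?, ?)",
    logs,
)

ids = [row[0] for row in conn.execute("SELECT id FROM workout_logs ORDER BY id")]
print(ids)
```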
So there we go. Our database table is set up. Let's now find
out through which workout routine you're burning the most
calories with the help of a simple query. We'll just modify
our current code a little bit and add the "WHERE calories >
50 ORDER BY calories" clause at the end of our code at line
13. Your code should now look like the one below:
1 CREATE TABLE workout_logs
2     (id INTEGER PRIMARY KEY AUTOINCREMENT,
3     activity TEXT,
4     minutes INTEGER,
5     calories INTEGER,
6     pulse_rate INTEGER);
7
8
9 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 30, 100, 110);
10 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 10, 30, 105);
11 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("tread_mill", 15, 200, 120);
12
13 SELECT * FROM workout_logs WHERE calories > 50 ORDER BY calories;
Once you type that code in at line 13, you'll most likely
get the result below:
id    activity      minutes   calories   pulse_rate
1     bench_press   30        100        110
3     tread_mill    15        200        120

As you can see, according to the database, you've done two
activities where you burned more than 50 calories. It also
shows that the treadmill routine is the activity where
you burned the most calories. Now, let's find out which
activities in the database made you not only burn more than
50 calories, but also took you less than 30 minutes to do.
It's pretty easy to tell just by looking at the current results above,
because there are only two rows. However, what if there are
thousands of rows in this database? Maybe this is data from
thousands of users. In that case, we would need a way to
filter down the SQL query itself. To do that, we can use the
"AND" operator to combine multiple conditions. For
example, let's say we only need to return the rows where
calories are greater than 50 and minutes are less than 30.
Take a look below at how this can be implemented in code:
1 CREATE TABLE workout_logs
2     (id INTEGER PRIMARY KEY AUTOINCREMENT,
3     activity TEXT,
4     minutes INTEGER,
5     calories INTEGER,
6     pulse_rate INTEGER);
7
8
9 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 30, 100, 110);
10 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 10, 30, 105);
11 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("tread_mill", 15, 200, 120);
12
13 SELECT * FROM workout_logs WHERE calories > 50 AND minutes < 30;

As you will most likely see in the result, the treadmill
workout is the only activity that you've done that burned more
than 50 calories, but took less than 30 minutes to do.
In a similar way, we could use the "OR" operator to return
rows that meet at least one of several conditions. For example, let's say
we only need to return rows where "calories" is greater than
50, or "pulse_rate" is above 100.
1 CREATE TABLE workout_logs
2     (id INTEGER PRIMARY KEY AUTOINCREMENT,
3     activity TEXT,
4     minutes INTEGER,
5     calories INTEGER,
6     pulse_rate INTEGER);
7
8
9 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 30, 100, 110);
10 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 10, 30, 105);
11 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("tread_mill", 15, 200, 120);
12
13 SELECT * FROM workout_logs WHERE calories > 50 OR pulse_rate > 100;
After executing the above code in SQLite, you'll see that all
three rows are returned. The first bench press log and the
treadmill log burned more than 50 calories, and the second
bench press log, although it burned fewer, pushed your pulse
rate above a hundred.
You can have as many "OR" and "AND" operators in your
query as you'd like. The "AND" operator has precedence
over the "OR" operator if you've got both of them in the
same query. However, you can always use parentheses to
change the order of evaluation, just like with math
expressions.
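The difference that precedence and parentheses make can be checked with the sketch below, which uses Python's built-in sqlite3 module. The helper function ids_where and the last two conditions, which add a "minutes < 15" test, are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workout_logs ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, activity TEXT, "
    "minutes INTEGER, calories INTEGER, pulse_rate INTEGER)"
)
conn.executemany(
    "INSERT INTO workout_logs (activity, minutes, calories, pulse_rate) "
    "VALUES (?, ?, ?, ?)",
    [("bench_press", 30, 100, 110), ("bench_press", 10, 30, 105),
     ("tread_mill", 15, 200, 120)],
)

def ids_where(condition):
    """Hypothetical helper: return the ids of the rows matching a WHERE condition."""
    sql = f"SELECT id FROM workout_logs WHERE {condition} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

both = ids_where("calories > 50 AND minutes < 30")
either = ids_where("calories > 50 OR pulse_rate > 100")
# Without parentheses, AND binds first: calories > 50 OR (pulse_rate > 100 AND minutes < 15)
default_order = ids_where("calories > 50 OR pulse_rate > 100 AND minutes < 15")
# Parentheses force the OR to be evaluated first.
grouped = ids_where("(calories > 50 OR pulse_rate > 100) AND minutes < 15")
print(both, either, default_order, grouped)
```

The last two queries use the same three comparisons, yet they return different rows, which is why parentheses are worth adding whenever AND and OR appear together.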

Chapter 14: Querying IN subqueries


We're back with our table of workout logs, and we've added
a few more rows.
1 CREATE TABLE workout_logs
2     (id INTEGER PRIMARY KEY AUTOINCREMENT,
3     activity TEXT,
4     minutes INTEGER,
5     calories INTEGER,
6     pulse_rate INTEGER);
7
8
9 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 30, 100, 110);
10 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 10, 30, 105);
11 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("tread_mill", 15, 200, 120);
12 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 30, 70, 90);
13 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 25, 72, 80);
14 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("deadlift", 30, 70, 90);
15 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("squats", 60, 80, 85);
Now, if we wanted to filter these logs to just show your
bench press logs, we could just add a "WHERE activity"
clause. Just add the line of code below at the bottom of your
workout logs SQL code:
SELECT * FROM workout_logs WHERE activity = "bench_press";
As soon as you add that code in, you'll have the result
below:
id    activity      minutes   calories   pulse_rate
1     bench_press   30        100        110
2     bench_press   10        30         105

Now, let's say you need to ramp up your workout efforts.
Therefore, what we want to find are the logs for a whole set of
workout activities, not just bench_press. To do that, we can
use the "OR" operator that we just learned, checking each of
the different workout activities. Here's what your code should
look like at this point:
1 CREATE TABLE workout_logs
2     (id INTEGER PRIMARY KEY AUTOINCREMENT,
3     activity TEXT,
4     minutes INTEGER,
5     calories INTEGER,
6     pulse_rate INTEGER);
7
8
9 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 30, 100, 110);
10 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 10, 30, 105);
11 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("tread_mill", 15, 200, 120);
12 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 30, 70, 90);
13 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 25, 72, 80);
14 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("deadlift", 30, 70, 90);
15 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("squats", 60, 80, 85);
16
17 SELECT * FROM workout_logs WHERE activity = "bench_press" OR activity = "dumbbells" OR activity = "deadlift" OR activity = "squats";
If you typed everything correctly, you should now get a
result like the one below:
id    activity      minutes   calories   pulse_rate
1     bench_press   30        100        110
2     bench_press   10        30         105
4     dumbbells     30        70         90
5     dumbbells     25        72         80
6     deadlift      30        70         90
7     squats        60        80         85

All right, so that worked. However, there is actually a
simpler way to do this query, and that's by using the "IN"
operator. The "IN" operator will check to see if a particular
value is in a list of values.
Here's how you do it. So in line 17 of our SQL code, we'll
replace the equals sign after "WHERE activity" with
"IN," then put an opening parenthesis, and then we're just
going to separate each of the strings with a comma, instead of
the long chain of "OR activity =" conditions, and finish with
a closing parenthesis. Here's what line 17 of your code
should look like:
17 SELECT * FROM workout_logs WHERE activity IN ("bench_press", "dumbbells", "deadlift", "squats");
There you go. It yields the same results that we expected.
However, this query is easier to read and it's a bit shorter as
well. We could easily also do the inverse query, if we just
want to see the other miscellaneous activities. We just
replace "IN" with "NOT IN." Doing this will
give you the result below:
id    activity     minutes   calories   pulse_rate
3     tread_mill   15        200        120

As you can see, it tells us that the miscellaneous routine is
the treadmill routine. If you're following along with SQLite,
or any other SQL engine for that matter, go ahead and
change the "NOT IN" clause back to "IN." We're going to
show you something a little more interesting that we can do
with "IN."
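Before moving on, here is a minimal sketch that runs both the "IN" and the "NOT IN" versions of the query through Python's built-in sqlite3 module. Only the id column is selected to keep the output short, which is our own simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workout_logs ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, activity TEXT, "
    "minutes INTEGER, calories INTEGER, pulse_rate INTEGER)"
)
conn.executemany(
    "INSERT INTO workout_logs (activity, minutes, calories, pulse_rate) "
    "VALUES (?, ?, ?, ?)",
    [("bench_press", 30, 100, 110), ("bench_press", 10, 30, 105),
     ("tread_mill", 15, 200, 120), ("dumbbells", 30, 70, 90),
     ("dumbbells", 25, 72, 80), ("deadlift", 30, 70, 90),
     ("squats", 60, 80, 85)],
)

activity_list = "('bench_press', 'dumbbells', 'deadlift', 'squats')"
# Rows whose activity is in the list.
listed = [r[0] for r in conn.execute(
    f"SELECT id FROM workout_logs WHERE activity IN {activity_list} ORDER BY id")]
# Rows whose activity is anything else.
not_listed = [r[0] for r in conn.execute(
    f"SELECT id FROM workout_logs WHERE activity NOT IN {activity_list} ORDER BY id")]
print(listed, not_listed)
```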
First, we're going to need another table of fitness trainer
recommended activities.
1 CREATE TABLE trainer_favorites
2     (id INTEGER PRIMARY KEY,
3     activity TEXT,
4     reason TEXT);
5
6 INSERT INTO trainer_favorites(activity, reason) VALUES ("bench_press", "Improves upper body strength.");
7 INSERT INTO trainer_favorites(activity, reason) VALUES ("squats", "Improves lower body strength.");
So, here we've created a table of fitness trainer
recommended activities, which just has an "activity," which
is the activity from our previous code, and a "reason," which
states why the trainer recommended it. Now, what if we
wanted to see all of your workout routines that correspond to
trainer-recommended activities? First, you might want to see
what those trainer recommended activities are.
So, let's go ahead and type the code below at line 8 of our
trainer_favorites code:
SELECT activity FROM trainer_favorites;
Typing the code above would give you the result below:
activity
bench_press
squats
Do take note that, at this point, we are working with two
tables in one SQL script. These are the workout_logs
and trainer_favorites tables. What we want to know next is
which of the activities in our workout_logs correspond to
trainer recommended activities. To do that, go ahead and type
the code below:
SELECT * FROM workout_logs WHERE activity IN (SELECT activity FROM trainer_favorites);
If you look closely at the logic of the code above, we're
basically telling the SQL engine to look at our
workout_logs table and keep only the rows whose activity
appears in the
trainer_favorites table. With that in mind, the result of your
code should look like the one below:
id    activity      minutes   calories   pulse_rate
1     bench_press   30        100        110
2     bench_press   10        30         105
7     squats        60        80         85

If you look closely at our code, you'll notice we're putting a
query statement within a query. In SQL, we call this inner
query a "subquery". And now, this query will always display
information based on whatever is in the trainer_favorites
table at the time. It will stay up to date.
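The "stays up to date" behavior can be seen in the sketch below, which uses Python's built-in sqlite3 module. After the first run, a third favorite is added and the very same query is run again. The deadlift recommendation and its reason text are hypothetical and were made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workout_logs (id INTEGER PRIMARY KEY AUTOINCREMENT, activity TEXT,
    minutes INTEGER, calories INTEGER, pulse_rate INTEGER);
CREATE TABLE trainer_favorites (id INTEGER PRIMARY KEY, activity TEXT, reason TEXT);
INSERT INTO trainer_favorites (activity, reason) VALUES ('bench_press', 'Improves upper body strength.');
INSERT INTO trainer_favorites (activity, reason) VALUES ('squats', 'Improves lower body strength.');
""")
conn.executemany(
    "INSERT INTO workout_logs (activity, minutes, calories, pulse_rate) "
    "VALUES (?, ?, ?, ?)",
    [("bench_press", 30, 100, 110), ("bench_press", 10, 30, 105),
     ("tread_mill", 15, 200, 120), ("dumbbells", 30, 70, 90),
     ("dumbbells", 25, 72, 80), ("deadlift", 30, 70, 90),
     ("squats", 60, 80, 85)],
)

query = """
SELECT id, activity FROM workout_logs
WHERE activity IN (SELECT activity FROM trainer_favorites)
ORDER BY id
"""
before = conn.execute(query).fetchall()

# Hypothetical new recommendation; the outer query itself is not changed.
conn.execute(
    "INSERT INTO trainer_favorites (activity, reason) "
    "VALUES ('deadlift', 'Improves back strength.')"
)
after = conn.execute(query).fetchall()
print(before)
print(after)
```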
While the subquery that we just wrote is a pretty simple one, it
could also get really complex. It could be as complex as any
of the queries that we've learned so far. Here's an example.
What if we only want to select the activities that the fitness
trainer recommended for improving lower body strength?
We could add WHERE reason = "Improves lower body
strength." after our "SELECT activity FROM trainer_favorites"
clause. Here's what your overall code should look like at this
point:
1 CREATE TABLE workout_logs
2     (id INTEGER PRIMARY KEY AUTOINCREMENT,
3     activity TEXT,
4     minutes INTEGER,
5     calories INTEGER,
6     pulse_rate INTEGER);
7
8
9 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 30, 100, 110);
10 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 10, 30, 105);
11 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("tread_mill", 15, 200, 120);
12 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 30, 70, 90);
13 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 25, 72, 80);
14 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("deadlift", 30, 70, 90);
15 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("squats", 60, 80, 85);
16
17 CREATE TABLE trainer_favorites
18     (id INTEGER PRIMARY KEY,
19     activity TEXT,
20     reason TEXT);
21
22 INSERT INTO trainer_favorites(activity, reason) VALUES ("bench_press", "Improves upper body strength.");
23 INSERT INTO trainer_favorites(activity, reason) VALUES ("squats", "Improves lower body strength.");
24
25 SELECT * FROM workout_logs WHERE activity IN (SELECT activity FROM trainer_favorites WHERE reason = "Improves lower body strength.");
Typing the code in line 25 will yield the below result in the
SQL compiler:
id    activity   minutes   calories   pulse_rate
7     squats     60        80         85

What if we removed the period or dot at the end of our
"Improves lower body strength." string? It is most likely that
you'll see nothing for that query. Why? Because it's trying to
do an exact match, and it can't since there's a character
missing. Remember, when dealing with strings, everything
inside the opening and closing quotation marks should match.
This means that every character, be it a period, a
comma, a number, a letter, etc. should be present.
There are, however, times when we want to do an inexact
match, and we can do that with the "LIKE" operator, which
is a pretty neat operator. Let's go ahead and see how we can
implement the "LIKE" operator in SQL. To do that, we're
just going to modify our code in line 25 a little bit. We'll
replace the equal sign after "WHERE reason"
with "LIKE" and get rid of the words "Improves" and
"strength." Because all we care about is the essential phrase,
which is "lower body." That is what we're looking for.
In addition, we can also put a percentage sign on both sides
of the phrase "lower body." This percentage sign acts like a
wildcard. The wildcards tell the SQL engine to match any
row that contains the phrase "lower body" anywhere. And with
those conditions stated in our query, it'll yield the below
result:
id  activity  minutes  calories  pulse_rate
7   squats    60       80        85
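If you'd like to see the exact match and the "LIKE" match side by side, here is a minimal sketch. It assumes you run the SQL through Python's built-in sqlite3 module with a throwaway in-memory database (that runner is our assumption, not something this book's setup requires), and it only recreates the trainer_favorites table. The strings use single quotes, which SQL engines accept most widely.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trainer_favorites (id INTEGER PRIMARY KEY, activity TEXT, reason TEXT);
    INSERT INTO trainer_favorites (activity, reason)
        VALUES ('bench_press', 'Improves upper body strength.');
    INSERT INTO trainer_favorites (activity, reason)
        VALUES ('squats', 'Improves lower body strength.');
""")

# Exact match, trailing period included: finds the squats row.
exact = conn.execute(
    "SELECT activity FROM trainer_favorites "
    "WHERE reason = 'Improves lower body strength.'"
).fetchall()

# Exact match with the period removed: one character is missing, so no row matches.
no_period = conn.execute(
    "SELECT activity FROM trainer_favorites "
    "WHERE reason = 'Improves lower body strength'"
).fetchall()

# Inexact match: % on each side stands for any text before and after 'lower body'.
like = conn.execute(
    "SELECT activity FROM trainer_favorites WHERE reason LIKE '%lower body%'"
).fetchall()

print(exact)      # [('squats',)]
print(no_period)  # []
print(like)       # [('squats',)]
```

The same "LIKE" condition drops straight into the subquery on line 25 in place of the equal sign.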

Chapter 15: Restricting Grouped Results with HAVING
In this chapter, consider the code below:
1 CREATE TABLE workout_logs
2     (id INTEGER PRIMARY KEY AUTOINCREMENT,
3     activity TEXT,
4     minutes INTEGER,
5     calories INTEGER,
6     pulse_rate INTEGER);
7
8
9 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 30, 115, 110);
10 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 10, 45, 105);
11 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("treadmill", 15, 200, 120);
12 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("treadmill", 15, 165, 120);
13 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 30, 70, 90);
14 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 25, 72, 80);
15 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("deadlift", 30, 70, 90);
16 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("squats", 60, 80, 85);
Let's go ahead and continue with more queries of our
workout_logs. Let's say that we want to see how many
calories you've burned for each type of activity. We can do
that with an aggregate query. Consider typing the code below
in line 18 of our code above:
SELECT activity, SUM(calories) FROM workout_logs
GROUP BY activity;
Typing the above code in will give you a result showing each
activity and the total calories you've burned for that
workout routine.
activity     SUM(calories)
bench_press  160
treadmill    365
squats       80
deadlift     70
dumbbells    142

Now, do you notice how the column shows up as
SUM(calories) in the results? If you want, you can actually
tell SQL to give that column a new name. You can do this by
adding the "AS total_calories" clause in between
SUM(calories) and FROM workout_logs in our code.
Here's how your code should look at this point:
1 CREATE TABLE workout_logs
2     (id INTEGER PRIMARY KEY AUTOINCREMENT,
3     activity TEXT,
4     minutes INTEGER,
5     calories INTEGER,
6     pulse_rate INTEGER);
7
8
9 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 30, 115, 110);
10 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 10, 45, 105);
11 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("treadmill", 15, 200, 120);
12 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("treadmill", 15, 165, 120);
13 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 30, 70, 90);
14 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 25, 72, 80);
15 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("deadlift", 30, 70, 90);
16 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("squats", 60, 80, 85);
17
18 SELECT activity, SUM(calories) AS total_calories FROM workout_logs GROUP BY activity;
This definitely makes our results easier to read. Next, we
want to filter the results to only show activities where you
burned more than 150 calories total, across all the times
you've done that workout routine.
Your first instinct might be to use the WHERE calories >
150 clause in between the FROM and GROUP BY clauses
at line 18 of our code. However, the only result that we
would see is treadmill. We won't see bench press, which is
one of the activities that we expect to see. The reason we
only see treadmill is that WHERE filters each individual
row before the rows are grouped, and treadmill is the only
workout routine where you burned more than 150 calories in
a single log.
What we want to know is in which workout activities you
have burned more than 150 calories across all of the logs for
that activity type. We have to use something new for this
query. It is called the "HAVING" clause. We'll use it by
typing HAVING total_calories > 150 after the GROUP BY
clause.
activity     total_calories
bench_press  160
treadmill    365

Now, we'll see bench press included in the result. The
reason is that the query now looks at the total calories, and
you've indeed burned more than 150 over time in your logs.
As you can see, those two queries are actually different, but
they are easy to confuse.
When we use the "HAVING" clause, we're applying the
conditions to the grouped values, not the individual values in
the individual rows. We could use any aggregate function on
a grouped column that makes sense; it could be SUM,
MIN, MAX, AVG, whatever we want to check.
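To make the difference between the two filters concrete, here is a small sketch that runs both versions of the query against this chapter's eight logs. As an assumption on our part, it uses Python's built-in sqlite3 module and an in-memory database as the runner; the "?" placeholders are just that module's way of passing in the sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE workout_logs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        activity TEXT, minutes INTEGER, calories INTEGER, pulse_rate INTEGER)
""")
conn.executemany(
    "INSERT INTO workout_logs (activity, minutes, calories, pulse_rate) VALUES (?, ?, ?, ?)",
    [
        ("bench_press", 30, 115, 110), ("bench_press", 10, 45, 105),
        ("treadmill", 15, 200, 120), ("treadmill", 15, 165, 120),
        ("dumbbells", 30, 70, 90), ("dumbbells", 25, 72, 80),
        ("deadlift", 30, 70, 90), ("squats", 60, 80, 85),
    ],
)

# WHERE runs before grouping: every single log of 150 calories or less is dropped.
where_rows = conn.execute("""
    SELECT activity, SUM(calories) AS total_calories
    FROM workout_logs
    WHERE calories > 150
    GROUP BY activity
    ORDER BY activity
""").fetchall()

# HAVING runs after grouping: each activity's total is compared with 150.
having_rows = conn.execute("""
    SELECT activity, SUM(calories) AS total_calories
    FROM workout_logs
    GROUP BY activity
    HAVING total_calories > 150
    ORDER BY activity
""").fetchall()

print(where_rows)   # [('treadmill', 365)]
print(having_rows)  # [('bench_press', 160), ('treadmill', 365)]
```

SQLite lets "HAVING" refer to the total_calories alias. Some other databases may want the aggregate repeated instead, as in HAVING SUM(calories) > 150.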


Let's go ahead and do another example. Let's say we want to
see the average calories for all the activities where we
burned more than 70 calories on average. Take a look below
at how we modified our code at line 18 to accommodate this
query:
18 SELECT activity, AVG(calories) AS avg_calories
FROM workout_logs GROUP BY activity HAVING
avg_calories > 70;
Typing that code in would give the following result:
activity     avg_calories
bench_press  80
treadmill    182.5
squats       80
dumbbells    71

Now, let's do something a little bit different. Let's say we
want to see all the workout routines where we logged at least
two activities for that type of routine. For that, we're going to
use a "COUNT" function. Again, let's go ahead and take our
code at line 18 and modify it a bit to cater to this query:
18 SELECT activity FROM workout_logs GROUP BY
activity HAVING COUNT(*) >= 2;

In the results, we can now see bench_press, treadmill, and
dumbbells:
activity
bench_press
treadmill
dumbbells
The results now show all the activities that you performed at
least twice. Now that you know how to use the "HAVING"
clause in SQL, keep in mind that it's easy to confuse
"HAVING" with "WHERE" when using these aggregate
functions. So think through your results first and make sure
they make sense.
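Here is a short sketch of the last two "HAVING" queries, one with AVG and one with COUNT, run against the same eight logs. It assumes Python's built-in sqlite3 module as the runner, and the times_logged alias is a name we made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE workout_logs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        activity TEXT, minutes INTEGER, calories INTEGER, pulse_rate INTEGER)
""")
conn.executemany(
    "INSERT INTO workout_logs (activity, minutes, calories, pulse_rate) VALUES (?, ?, ?, ?)",
    [
        ("bench_press", 30, 115, 110), ("bench_press", 10, 45, 105),
        ("treadmill", 15, 200, 120), ("treadmill", 15, 165, 120),
        ("dumbbells", 30, 70, 90), ("dumbbells", 25, 72, 80),
        ("deadlift", 30, 70, 90), ("squats", 60, 80, 85),
    ],
)

# Activities whose average calories per log is above 70.
# deadlift averages exactly 70, so it is left out.
avg_rows = conn.execute("""
    SELECT activity, AVG(calories) AS avg_calories
    FROM workout_logs
    GROUP BY activity
    HAVING avg_calories > 70
    ORDER BY activity
""").fetchall()

# Activities that were logged at least twice.
count_rows = conn.execute("""
    SELECT activity, COUNT(*) AS times_logged
    FROM workout_logs
    GROUP BY activity
    HAVING COUNT(*) >= 2
    ORDER BY activity
""").fetchall()

print(avg_rows)
# [('bench_press', 80.0), ('dumbbells', 71.0), ('squats', 80.0), ('treadmill', 182.5)]
print(count_rows)
# [('bench_press', 2), ('dumbbells', 2), ('treadmill', 2)]
```

The ORDER BY is only there to make the output order predictable; without it, the rows may come back in a different order.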

Chapter 16: Calculating results with CASE
Let's go ahead and try a few more advanced SQL features.
We've been logging pulse rate in our table. However, we
haven't done anything interesting with this piece of data yet.
Consider the code below for this chapter:
1 CREATE TABLE workout_logs
2     (id INTEGER PRIMARY KEY AUTOINCREMENT,
3     activity TEXT,
4     minutes INTEGER,
5     calories INTEGER,
6     pulse_rate INTEGER);
7
8
9 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 30, 100, 110);
10 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 10, 30, 105);
11 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("treadmill", 15, 200, 120);
12 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("treadmill", 15, 165, 120);
13 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 30, 70, 90);
14 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 25, 72, 80);
15 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("deadlift", 30, 70, 90);
16 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("squats", 60, 80, 85);
17
If you do a little research on the Internet, you'll find that a
commonly cited estimate of your maximum pulse rate is 220
minus your current age. One thing we can do with our table is
query our logs to see if your pulse rate ever goes above that
maximum. To do that, type the code below in lines 18 and 20
of your code above:
18 SELECT * FROM workout_logs;
19
20 SELECT COUNT(*) FROM workout_logs WHERE pulse_rate > 220 - 30;
As you can see at line 20 of our code, we're selecting
COUNT(*) from our workout_logs table, where your pulse
rate is above 220 minus your current age. In this case, let's
say your age is 30, which makes the maximum 190. If you look
at the result, it'll return zero. This means your pulse rate
never went above the maximum.
Also, notice that we're using the subtraction operator. In fact,
you can use most math operators in SQL, like addition,
subtraction, multiplication, and division. If you need to, you
can add parentheses to change the order of evaluation,
just like in actual math.
Okay. So we've confirmed that your pulse rate doesn't go
above the maximum, which is great. Now, let's see if your
pulse gets into the target pulse rate zone, which is 50 to 90
percent of the maximum. To do this, we're going to modify
the "WHERE" clause of our code in line 20 a little bit. Take
a look at the code below:
20 SELECT COUNT(*) FROM workout_logs WHERE
pulse_rate >= ROUND(0.50 * (220-30)) AND
pulse_rate <= ROUND(0.90 * (220-30));
As you can see from our example, we are stating the
conditions by which SQL will check whether your pulse
rate is within 50% to 90% of the maximum pulse rate. If you
look closely, you'll see that we're combining a few math
operators here.
We also threw in the "ROUND" function so that each
threshold is rounded to the nearest whole number before it is
compared with the pulse rate. Based on the result below,
there are four logs that fall in that target pulse rate zone.
COUNT(*)
4
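If you want to check the arithmetic yourself, the sketch below computes the thresholds and runs both counts. It assumes Python's built-in sqlite3 module as the runner and keeps the age of 30 from the text; the variable names are ours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE workout_logs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        activity TEXT, minutes INTEGER, calories INTEGER, pulse_rate INTEGER)
""")
conn.executemany(
    "INSERT INTO workout_logs (activity, minutes, calories, pulse_rate) VALUES (?, ?, ?, ?)",
    [
        ("bench_press", 30, 100, 110), ("bench_press", 10, 30, 105),
        ("treadmill", 15, 200, 120), ("treadmill", 15, 165, 120),
        ("dumbbells", 30, 70, 90), ("dumbbells", 25, 72, 80),
        ("deadlift", 30, 70, 90), ("squats", 60, 80, 85),
    ],
)

# SQL can evaluate plain arithmetic: the maximum, then 50% and 90% of it.
thresholds = conn.execute(
    "SELECT 220 - 30, ROUND(0.50 * (220 - 30)), ROUND(0.90 * (220 - 30))"
).fetchone()

# How many logs went above the maximum of 190?
above_max = conn.execute(
    "SELECT COUNT(*) FROM workout_logs WHERE pulse_rate > 220 - 30"
).fetchone()[0]

# How many logs fall between 50% and 90% of the maximum (95 to 171)?
in_target = conn.execute("""
    SELECT COUNT(*) FROM workout_logs
    WHERE pulse_rate >= ROUND(0.50 * (220 - 30))
      AND pulse_rate <= ROUND(0.90 * (220 - 30))
""").fetchone()[0]

print(thresholds)  # (190, 95.0, 171.0)
print(above_max)   # 0
print(in_target)   # 4
```

The four logs in the target zone are the two bench_press and two treadmill entries, with pulse rates of 110, 105, 120, and 120.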
But what about your other logs? What zones are they in? At
this point, all we know is that four of them are in that target
zone, while the rest could be around 90 to 100 percent, or
maybe they're less than 50 percent. What we would really
like to see is a summary of all your logs, and how many
were in each of the pulse rate zones.
This sounds like a situation where we would use a "GROUP
BY" clause. We can group the data by a particular column
that says which pulse rate zone this row is in. However, we
don't actually have a column like that to group by. In this
case, we can effectively create a column using the "CASE"
statement. Take note that many people find the "CASE"
statement the trickiest statement to use in SQL, so don't be
surprised if you struggle with it a bit at first.
It is similar to an "if" or a "switch" statement from various
programming languages. So, let's start by outputting what we
know, which is 'activity' and 'pulse_rate' from
'workout_logs.' Here's how your code at line 20 should look
at this point:
20 SELECT activity, pulse_rate FROM workout_logs;
Once you enter the above code in line 20, you'll get the result
below:
activity     pulse_rate
bench_press  110
bench_press  105
treadmill    120
treadmill    120
dumbbells    90
dumbbells    80
deadlift     90
squats       85

Now, for each of these rows, we want to add a new column
that says which zone the pulse rate is in. To do that, we're
going to start below the "SELECT" clause of your code at
line 20 and type "CASE" in line 21 to begin the CASE
statement. On the next line, we type in a "WHEN" clause
followed by the first condition, which checks if the pulse
rate is above the maximum. So that's going to be
"pulse_rate > 220-30." We immediately follow that up
with a "THEN" clause and give it a string value of "above
max." This is our first condition under the "CASE" clause.
Now, we're going to keep going with our other conditions. In
line 23, type "WHEN pulse_rate > ROUND(0.90 * (220-30))"
and then give the "THEN" clause a string value of
"above target." This condition tells SQL that if the
pulse rate goes above 90% of the maximum, then it should
say that the pulse rate is "above target." The conditions are
checked in order and the first one that matches wins, so a
row only reaches this line if it was not already "above max."
In line 24, we'll enter a condition that tells SQL to output
a string value of "within target" if the pulse rate is more than
50% of the maximum. Lastly, everything else would be
below target. We could write another condition for that.
However, if you want to do it the easy way, you can just say,
ELSE "below target."
Here's what your code should look like at this point:
20 SELECT activity, pulse_rate,
21     CASE
22         WHEN pulse_rate > 220-30 THEN "above max"
23         WHEN pulse_rate > ROUND(0.90 * (220-30)) THEN "above target"
24         WHEN pulse_rate > ROUND(0.50 * (220-30)) THEN "within target"
25         ELSE "below target"
26
27 FROM workout_logs;
At this point, our conditions are all done. However, we
haven't closed the CASE statement or told SQL what to name
this new column. For that, just insert the code below on line
26 of our code:
26     END as "pulse_zone"

So now, for each of the rows, we can see the new column
with a nice description of what zone they're in.
activity     pulse_rate  pulse_zone
bench_press  110         within target
bench_press  105         within target
treadmill    120         within target
treadmill    120         within target
dumbbells    90          below target
dumbbells    80          below target
deadlift     90          below target
squats       85          below target
Now that we've done that successfully, we can now make a
query that summarizes how many of your logs are in each of
the zones. This query is a whole lot easier compared to when
we set up our "CASE" clause. What we're going to do is just
take our code from lines 20 through 27, copy and paste it in a
new line, and then just add the GROUP BY statement below
at the last line:
GROUP BY pulse_zone;
After that, just modify the SELECT statement to include a
"COUNT(*)" statement and then remove "pulse_rate" and
"activity." Here's how your code should look like overall:
1 CREATE TABLE workout_logs
2     (id INTEGER PRIMARY KEY AUTOINCREMENT,
3     activity TEXT,
4     minutes INTEGER,
5     calories INTEGER,
6     pulse_rate INTEGER);
7
8
9 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 30, 100, 110);
10 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("bench_press", 10, 30, 105);
11 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("treadmill", 15, 200, 120);
12 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("treadmill", 15, 165, 120);
13 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 30, 70, 90);
14 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("dumbbells", 25, 72, 80);
15 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("deadlift", 30, 70, 90);
16 INSERT INTO workout_logs(activity, minutes, calories, pulse_rate) VALUES ("squats", 60, 80, 85);
17
18 SELECT * FROM workout_logs;
19
20 SELECT activity, pulse_rate,
21     CASE
22         WHEN pulse_rate > 220-30 THEN "above max"
23         WHEN pulse_rate > ROUND(0.90 * (220-30)) THEN "above target"
24         WHEN pulse_rate > ROUND(0.50 * (220-30)) THEN "within target"
25         ELSE "below target"
26     END as "pulse_zone"
27 FROM workout_logs;
28
29
30
31 SELECT COUNT(*),
32     CASE
33         WHEN pulse_rate > 220-30 THEN "above max"
34         WHEN pulse_rate > ROUND(0.90 * (220-30)) THEN "above target"
35         WHEN pulse_rate > ROUND(0.50 * (220-30)) THEN "within target"
36         ELSE "below target"
37     END as "pulse_zone"
38 FROM workout_logs
39 GROUP BY pulse_zone;
And now we see that we've got four below target and four
within target in the results.
COUNT(*)  pulse_zone
4         below target
4         within target
That "CASE" statement sure is a tricky one, but it sure is
handy once you get the hang of it. You may not use it all the
time, but it's good to know, just in case.
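Here is the summary query once more as a runnable sketch, so you can see "CASE" and "GROUP BY" working together. It assumes Python's built-in sqlite3 module as the runner and uses single-quoted strings for the zone labels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE workout_logs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        activity TEXT, minutes INTEGER, calories INTEGER, pulse_rate INTEGER)
""")
conn.executemany(
    "INSERT INTO workout_logs (activity, minutes, calories, pulse_rate) VALUES (?, ?, ?, ?)",
    [
        ("bench_press", 30, 100, 110), ("bench_press", 10, 30, 105),
        ("treadmill", 15, 200, 120), ("treadmill", 15, 165, 120),
        ("dumbbells", 30, 70, 90), ("dumbbells", 25, 72, 80),
        ("deadlift", 30, 70, 90), ("squats", 60, 80, 85),
    ],
)

# CASE builds the pulse_zone column row by row; the first matching WHEN wins.
# GROUP BY then counts how many rows landed in each zone.
zones = conn.execute("""
    SELECT COUNT(*),
        CASE
            WHEN pulse_rate > 220 - 30 THEN 'above max'
            WHEN pulse_rate > ROUND(0.90 * (220 - 30)) THEN 'above target'
            WHEN pulse_rate > ROUND(0.50 * (220 - 30)) THEN 'within target'
            ELSE 'below target'
        END AS pulse_zone
    FROM workout_logs
    GROUP BY pulse_zone
    ORDER BY pulse_zone
""").fetchall()

print(zones)  # [(4, 'below target'), (4, 'within target')]
```

One detail to keep in mind: the "CASE" conditions use ">" while the earlier count used ">=", so a pulse rate of exactly 95 would be counted as within the target there but labeled "below target" here. None of our logs sits on that boundary, so both queries agree.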

Chapter 17: JOINing tables


For this chapter, we've set up two tables in the database: A
"students" table with detailed information about each student
like their name and email, and a "student_grades" table,
which has their student ID, test name, and grade.
1 CREATE TABLE students (id INTEGER PRIMARY KEY,
2     first_name TEXT,
3     last_name TEXT,
4     email TEXT,
5     phone TEXT,
6     birthdate TEXT);
7
8 INSERT INTO students (first_name, last_name, email, phone, birthdate)
9     VALUES ("Peter", "Rabbit", "peter@rabbit.com", "555-6666", "2002-06-24");
10 INSERT INTO students (first_name, last_name, email, phone, birthdate)
11     VALUES ("Alice", "Wonderland", "alice@wonderland.com", "555-4444", "2002-07-04");
12
13 CREATE TABLE student_grades (id INTEGER PRIMARY KEY,
14     student_id INTEGER,
15     test TEXT,
16     grade INTEGER);
17
18 INSERT INTO student_grades (student_id, test, grade)
19     VALUES (1, "Nutrition", 95);
20 INSERT INTO student_grades (student_id, test, grade)
21     VALUES (2, "Nutrition", 92);
22 INSERT INTO student_grades (student_id, test, grade)
23     VALUES (1, "Chemistry", 85);
24 INSERT INTO student_grades (student_id, test, grade)
25     VALUES (2, "Chemistry", 95);
26
27 SELECT * FROM student_grades;
As you can see at line 27 of our SQL code, we made a query
to display or select everything in our "student_grades" table.
The result that you should get should be like the one below:
id  student_id  test       grade
1   1           Nutrition  95
2   2           Nutrition  92
3   1           Chemistry  85
4   2           Chemistry  95

As you can see from the result above, SQL returned a result
that shows the student IDs, test names, and grades. However,
what we actually want to be able to see are student names,
emails, etc. Now, what you might notice is that the student ID
in "student_grades" actually corresponds to the ID in
"students." So, student_id number one in student_grades is
actually Peter Rabbit in our "students" table, and so forth.
What we want to do next is form a query that will output the
student name and email next to each test and grade. Note that
we have to extract that information from two different
tables. So basically, what we will be doing is joining two
tables. There are a few ways to join multiple tables in SQL.
One of the most basic is the Cross Join.
Cross Join
The simplest way to join two tables in a database is called a
Cross Join. We can make that happen by putting both table
names after the "FROM" clause at line 27 of our code. Look
below for how your line 27 should look after applying
the Cross Join:
27 SELECT * FROM student_grades, students;
After applying the cross join in your SQL code, you'll most
likely see lots of rows. Basically, what the cross join did
was for each row in the students table, it created a row for
each of the rows in the student_grades table. That means we
end up with eight rows, because it created the four rows for
each of the rows in the students table.
While the Cross Join is the simplest join in SQL, it is also
the least useful. So, we don't want every row matched with
every other row. We only want them matched if the student ID
matches. To do that, we can apply what's called an Inner
Join.
Inner Join
The Inner Join isn't actually much more work at all, and it is
way more useful. There are a few ways of doing an
inner join. We'll start by building off of the last query
and adding a "WHERE" clause. The "WHERE" clause will
check and make sure that the student_id in our
student_grades table matches the id in our students table.
Here's how your code should look at lines 27 and 28 after
applying an Inner Join:
27 SELECT * FROM student_grades, students
28     WHERE student_grades.student_id = students.id;

So now, we actually have what we wanted: test grades next
to names. We did it by doing a cross join first, and then we
limited the rows to those where the two ID columns were the
same, which is what makes it an inner join.
id  student_id  test       grade  id  first_name  last_name
1   1           Nutrition  95     1   Peter       Rabbit
2   2           Nutrition  92     2   Alice       Wonderland
3   1           Chemistry  85     1   Peter       Rabbit
4   2           Chemistry  95     2   Alice       Wonderland
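To see how much the "WHERE" clause trims the cross join, here is a minimal sketch that counts the rows of the plain cross join and then lists the matched rows. It assumes Python's built-in sqlite3 module as the runner and recreates the two tables from this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY,
        first_name TEXT, last_name TEXT, email TEXT, phone TEXT, birthdate TEXT);
    INSERT INTO students (first_name, last_name, email, phone, birthdate)
        VALUES ('Peter', 'Rabbit', 'peter@rabbit.com', '555-6666', '2002-06-24');
    INSERT INTO students (first_name, last_name, email, phone, birthdate)
        VALUES ('Alice', 'Wonderland', 'alice@wonderland.com', '555-4444', '2002-07-04');

    CREATE TABLE student_grades (id INTEGER PRIMARY KEY,
        student_id INTEGER, test TEXT, grade INTEGER);
    INSERT INTO student_grades (student_id, test, grade) VALUES (1, 'Nutrition', 95);
    INSERT INTO student_grades (student_id, test, grade) VALUES (2, 'Nutrition', 92);
    INSERT INTO student_grades (student_id, test, grade) VALUES (1, 'Chemistry', 85);
    INSERT INTO student_grades (student_id, test, grade) VALUES (2, 'Chemistry', 95);
""")

# Cross join: every grade row paired with every student row (4 x 2).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM student_grades, students"
).fetchone()[0]

# Implicit inner join: keep only the pairs where the IDs match.
inner_rows = conn.execute("""
    SELECT student_grades.test, student_grades.grade, students.first_name
    FROM student_grades, students
    WHERE student_grades.student_id = students.id
    ORDER BY student_grades.id
""").fetchall()

print(cross_count)  # 8
print(inner_rows)
# [('Nutrition', 95, 'Peter'), ('Nutrition', 92, 'Alice'),
#  ('Chemistry', 85, 'Peter'), ('Chemistry', 95, 'Alice')]
```

Half of the eight cross join rows pair a grade with the wrong student, which is why only four survive the "WHERE" clause.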

The syntax we just used above is what's called an implicit
inner join. It does the job for us, but it's not actually
considered a best practice. The best way to do it is by using
what's called an explicit inner join. To use an explicit inner
join, we must make use of the "JOIN" clause.
First, we're going to start by joining the "students" table with
the "student_grades" table using the "JOIN" clause. And then,
instead of using a "WHERE" clause, we use the "ON" clause.
The "ON" clause specifies what columns are being matched.
To see the implementation of this, take a look at the code
below:
27 SELECT * FROM students
28     JOIN student_grades
29     ON students.id = student_grades.student_id;

Once you put that code in, you'll now have the same results
as before. Now that we have these tables joined, we're just
going to whittle down the columns we're outputting to the
names, the email, the test name, and the grade. We can do this
by just modifying our explicit inner join code a bit to
something like the one below:
27 SELECT first_name, last_name, email, test, grade FROM students
28     JOIN student_grades
29     ON students.id = student_grades.student_id;
The above code should give you the result below:
first_name  last_name   email                 test       grade
Peter       Rabbit      peter@rabbit.com      Nutrition  95
Alice       Wonderland  alice@wonderland.com  Nutrition  92
Peter       Rabbit      peter@rabbit.com      Chemistry  85
Alice       Wonderland  alice@wonderland.com  Chemistry  95

Now the cool thing is, once we've done joins, we can still
use "WHERE" and "GROUP BY," and all those nifty SQL
statements. For example, let's say that we want to filter down
the results to just grades that are greater than 90. To see how
this is done, refer to the code below:
27 SELECT first_name, last_name, email, test, grade FROM students
28     JOIN student_grades
29     ON students.id = student_grades.student_id
30     WHERE grade > 90;
Now, what would happen if our tables both contain columns
with the same column name but different meanings? For
example, right now, our student grades table has a grade
column. But what if our students table also has a grade
column for their overall class grade?
If that were the case, then at line 27 of our code where we
selected grade, SQL wouldn't know which table to pull it
from. Why? Because there'd be a grade column in both of
our tables. So, just to be on the safe side, we should
prefix our columns with the table name that they're from.
Instead of just typing "first_name," "last_name," "email,"
"test," and "grade," we should type them as
"students.first_name," "students.last_name," and so forth.
Good job so far. Now we know that we're going to get the
columns from the table we expect them to be in. There you
have it. That is our basic explicit inner join. It is what you'll
use for most of your joins across related tables.
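The sketch below puts the pieces of this chapter together: an explicit inner join, prefixed column names, and a "WHERE" filter on top. It assumes Python's built-in sqlite3 module as the runner. To show the name clash, the students table here is trimmed down and given a hypothetical grade column for the overall class grade, which is not part of this chapter's real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed students table; the grade column is hypothetical, added only
    -- so that both tables have a column with the same name.
    CREATE TABLE students (id INTEGER PRIMARY KEY,
        first_name TEXT, last_name TEXT, email TEXT, grade TEXT);
    INSERT INTO students (first_name, last_name, email, grade)
        VALUES ('Peter', 'Rabbit', 'peter@rabbit.com', 'A');
    INSERT INTO students (first_name, last_name, email, grade)
        VALUES ('Alice', 'Wonderland', 'alice@wonderland.com', 'A');

    CREATE TABLE student_grades (id INTEGER PRIMARY KEY,
        student_id INTEGER, test TEXT, grade INTEGER);
    INSERT INTO student_grades (student_id, test, grade) VALUES (1, 'Nutrition', 95);
    INSERT INTO student_grades (student_id, test, grade) VALUES (2, 'Nutrition', 92);
    INSERT INTO student_grades (student_id, test, grade) VALUES (1, 'Chemistry', 85);
    INSERT INTO student_grades (student_id, test, grade) VALUES (2, 'Chemistry', 95);
""")

# Without a table prefix, SQLite refuses to guess which grade column we mean.
try:
    conn.execute("""
        SELECT first_name, grade FROM students
        JOIN student_grades ON students.id = student_grades.student_id
    """)
    ambiguous = False
except sqlite3.OperationalError as err:
    ambiguous = "ambiguous column name" in str(err)

# With prefixes, the query is unambiguous and can still be filtered with WHERE.
rows = conn.execute("""
    SELECT students.first_name, students.last_name, students.email,
           student_grades.test, student_grades.grade
    FROM students
    JOIN student_grades ON students.id = student_grades.student_id
    WHERE student_grades.grade > 90
    ORDER BY student_grades.id
""").fetchall()

print(ambiguous)  # True
for row in rows:
    print(row)
# ('Peter', 'Rabbit', 'peter@rabbit.com', 'Nutrition', 95)
# ('Alice', 'Wonderland', 'alice@wonderland.com', 'Nutrition', 92)
# ('Alice', 'Wonderland', 'alice@wonderland.com', 'Chemistry', 95)
```

Peter's Chemistry grade of 85 is the only row the "WHERE" clause filters out.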

Chapter 18: Joining Related Tables with Left Outer Joins
We're back with our related tables about students: the
students and the student grades. We also added a new table to
keep track of what projects students are working on. Once
again this table has a "student_id" to relate it to the "students"
table, just like what "student_grades" had. It also has a
"title" for each project.
1 CREATE TABLE students (id INTEGER PRIMARY KEY,
2     first_name TEXT,
3     last_name TEXT,
4     email TEXT,
5     phone TEXT,
6     birthdate TEXT);
7
8 INSERT INTO students (first_name, last_name, email, phone, birthdate)
9     VALUES ("Peter", "Rabbit", "peter@rabbit.com", "555-6666", "2002-06-24");
10 INSERT INTO students (first_name, last_name, email, phone, birthdate)
11     VALUES ("Alice", "Wonderland", "alice@wonderland.com", "555-4444", "2002-07-04");
12
13 CREATE TABLE student_grades (id INTEGER PRIMARY KEY,
14     student_id INTEGER,
15     test TEXT,
16     grade INTEGER);
17
18 INSERT INTO student_grades (student_id, test, grade)
19     VALUES (1, "Nutrition", 95);
20 INSERT INTO student_grades (student_id, test, grade)
21     VALUES (2, "Nutrition", 92);
22 INSERT INTO student_grades (student_id, test, grade)
23     VALUES (1, "Chemistry", 85);
24 INSERT INTO student_grades (student_id, test, grade)
25     VALUES (2, "Chemistry", 95);
26
27 CREATE TABLE student_projects (id INTEGER PRIMARY KEY,
28     student_id INTEGER,
29     title TEXT);
30
31 INSERT INTO student_projects (student_id, title)
32     VALUES (1, "Carrotapault");

What we want now is a list of names of students and the
projects they're currently working on. That means we need to
join the "student_projects" table with the "students" table. As
you can probably guess, we can do that using the inner join
that we learned in the previous chapter. Look below for an
SQL code implementation of this:
33
34 SELECT students.first_name, students.last_name, student_projects.title
35     FROM students
36     JOIN student_projects
37     ON students.id = student_projects.student_id;
The result that you'll get is the one below:
first_name  last_name  title
Peter       Rabbit     Carrotapault

Based on the result, we can see one project: Peter and his
very promising carrotapault experiment. But where is the
student named Alice? Well, we're missing Alice because an
"INNER JOIN" only creates rows if there are matching
records in the two tables. There's no row for Alice, because
there's no row in "student_projects" that has Alice's student
ID in it.
This makes sense since, as is often the case with joins, we do
only want rows where the records matched. However, in this
case, we want a comprehensive list of every student and their
project, and we want every student to be on that list, even if
they don't have a project yet. This is where an "OUTER
JOIN" is super useful.
The great thing about an "OUTER JOIN" is that it is easy to
use. To apply the "OUTER JOIN", we're just going to modify
the JOIN statement in our code and change it to "LEFT
OUTER JOIN" instead of just "JOIN." Once you do that,
you'll have the result below:
first_name  last_name   title
Peter       Rabbit      Carrotapault
Alice       Wonderland  NULL

Now we see Alice, and there's a big old "NULL" for the
project title. How this works is that "LEFT" tells SQL to
make sure to retain every row from the left table, which is
the one named right after "FROM" (here, "students"). "OUTER"
tells SQL that it should retain those rows even if they do not
match anything in the right table, which is
"student_projects."
Basically, that's all there is to an outer join. There are many
cases where you might find you want to use an outer join.
Just keep in mind the behavior of the inner join versus the
outer join. There are also other variants of the outer join.
There's a "RIGHT OUTER JOIN," and it basically does the
opposite of the "LEFT OUTER JOIN". It makes sure that it
keeps everything from the right table and joins with the left.
The SQL environment we're using for this chapter's examples
doesn't support right outer joins, but if you want one, you
can just switch the table order in a "LEFT OUTER JOIN" and
it's the same thing. So you never strictly need a "RIGHT
OUTER JOIN," and the "LEFT OUTER JOIN" is the more
common convention.
There's also something called a "FULL OUTER JOIN,"
which matches rows where it can and keeps the unmatched
rows from both the left and the right side, filling in "NULL"
values for whichever side is missing. That's pretty
interesting. However, it is also not supported in the
environment we're using in this chapter. That is one of the
interesting things about learning SQL: we're showing you lots
of things that work here and also in other SQL environments,
but every environment is a little different. So, you'll
constantly be tweaking the tools in your SQL toolbox for each
new environment.
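Here is a side-by-side sketch of the inner join and the left outer join from this chapter, so you can watch Alice disappear and come back. It assumes Python's built-in sqlite3 module as the runner and leaves out the columns and tables the query doesn't touch. Note that Python shows an SQL "NULL" as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Only the columns this example needs.
    CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO students (first_name, last_name) VALUES ('Peter', 'Rabbit');
    INSERT INTO students (first_name, last_name) VALUES ('Alice', 'Wonderland');

    CREATE TABLE student_projects (id INTEGER PRIMARY KEY,
        student_id INTEGER, title TEXT);
    INSERT INTO student_projects (student_id, title) VALUES (1, 'Carrotapault');
""")

# Inner join: only students that have a matching project row.
inner_rows = conn.execute("""
    SELECT students.first_name, students.last_name, student_projects.title
    FROM students
    JOIN student_projects ON students.id = student_projects.student_id
    ORDER BY students.id
""").fetchall()

# Left outer join: every student, with NULL where no project matches.
left_rows = conn.execute("""
    SELECT students.first_name, students.last_name, student_projects.title
    FROM students
    LEFT OUTER JOIN student_projects ON students.id = student_projects.student_id
    ORDER BY students.id
""").fetchall()

print(inner_rows)  # [('Peter', 'Rabbit', 'Carrotapault')]
print(left_rows)   # [('Peter', 'Rabbit', 'Carrotapault'), ('Alice', 'Wonderland', None)]
```

The only difference between the two queries is the words "LEFT OUTER" in front of "JOIN".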

Chapter 19: Changing Rows with UPDATE and DELETE
Let's say that we want to create tables to store the data for a
journal app. A basic setup would be a users table and a
journal logs table.
1 CREATE TABLE users (
2     id INTEGER PRIMARY KEY,
3     name TEXT);
4
5 CREATE TABLE journal_logs (
6     id INTEGER PRIMARY KEY,
7     user_id INTEGER,
8     date TEXT,
9     content TEXT
10     );
Now, when a user writes up their journal on the website, we
would have some code to execute an "INSERT" statement
just like the code below:
11
12 INSERT INTO journal_logs (user_id, date, content) VALUES (1, "2015-04-01", "I had a horrendous fight with my boyfriend and I tried to get over it by eating 3 bars of milk chocolate.");
13 INSERT INTO journal_logs (user_id, date, content) VALUES (1, "2015-04-02", "We already made up and we celebrated by going to the beach.");
14
The code above actually inserts two rows for two journal
logs. The logs were made one day after the other. The first
log was about someone who had a fight with her boyfriend,
and the following day's log was of course about them getting
back together and celebrating by going to the beach. If you
type "SELECT * FROM journal_logs" in your SQL code,
you'll see the two posts there in the result.
Now, let's say the user came back and wants to modify their
logs, maybe because they don't want to admit, even to
themselves, that they had eaten that many bars of milk
chocolate at one time. At this point, since we've already
inserted the data, we would need to have code that would
execute an "UPDATE" statement, like this:
15 SELECT * FROM journal_logs;
16
17 UPDATE journal_logs SET content = "I had a stupid
fight with my boyfriend."
The code at line 17 is telling SQL to update that table and set
the content. However, it did not indicate which row to
update. We need to tell the database which row to update
because we don't want it updating every row with that
content. We just need to update that one row. So how do we
find the log to update? We can use the ID. If we didn't know
the ID, we could also filter by columns that we knew were
unique to the log, like "user_id," and the date, which was
2015-04-01.
15 SELECT * FROM journal_logs;
16
17 UPDATE journal_logs SET content = "I had a stupid
fight with my boyfriend." WHERE user_id=1 AND
date = "2015-04-01";
In cases like these, if you know the ID, then use it. It's safer
because you do not have to rely on having to filter columns
which could actually be the same across multiple rows,
especially if you have a journal app where users are allowed
to write multiple logs per day. It really depends on how the
app and the tables are designed.
15 SELECT * FROM journal_logs;
16
17 UPDATE journal_logs SET content = "I had a
horrible fight with my boyfriend." WHERE id = 1;
Go ahead and do another SELECT to see if our table
changed in the results. Now, let's say the user made a change.
However, she later decides that she actually wants to delete
the entire log entry. In that case, we would make use
of the "DELETE" statement. Take a look at the code below:
15 SELECT * FROM journal_logs;
16
17 UPDATE journal_logs SET content = "I had a stupid fight with my boyfriend." WHERE id = 1;
18
19 DELETE FROM journal_logs WHERE id = 1;
If you do another SELECT statement to see our journal_logs,
you'll notice that we only have one row left, which means
that our "DELETE" worked. You should be very careful
whenever you're doing "UPDATE" or "DELETE." You want
to make sure that you're updating the actual rows that you
intend to update. You don't want to update the wrong rows, or
delete the wrong rows and lose data.
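One cautious habit, sketched below, is to run a "SELECT" with the same "WHERE" clause first and look at which rows it finds before you change anything. The sketch assumes Python's built-in sqlite3 module as the runner; rowcount is that module's way of reporting how many rows a statement changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journal_logs (id INTEGER PRIMARY KEY,
        user_id INTEGER, date TEXT, content TEXT);
    INSERT INTO journal_logs (user_id, date, content)
        VALUES (1, '2015-04-01', 'I had a horrendous fight with my boyfriend and I tried to get over it by eating 3 bars of milk chocolate.');
    INSERT INTO journal_logs (user_id, date, content)
        VALUES (1, '2015-04-02', 'We already made up and we celebrated by going to the beach.');
""")

# Preview: which rows would WHERE id = 1 touch?
preview = conn.execute("SELECT id, date FROM journal_logs WHERE id = 1").fetchall()

# Update only that row, then read it back.
updated = conn.execute(
    "UPDATE journal_logs SET content = 'I had a stupid fight with my boyfriend.' "
    "WHERE id = 1"
).rowcount
new_content = conn.execute(
    "SELECT content FROM journal_logs WHERE id = 1"
).fetchone()[0]

# Delete the same row and see what is left.
deleted = conn.execute("DELETE FROM journal_logs WHERE id = 1").rowcount
remaining = conn.execute("SELECT id, date FROM journal_logs").fetchall()

print(preview)      # [(1, '2015-04-01')]
print(updated)      # 1
print(new_content)  # I had a stupid fight with my boyfriend.
print(deleted)      # 1
print(remaining)    # [(2, '2015-04-02')]
```

If the preview had returned more rows than you expected, that would be your cue to tighten the "WHERE" clause before running the "UPDATE" or "DELETE."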
In fact, some apps never issue "DELETE" at all, so they never
really delete rows. Instead, they'll add a "deleted" column to
their database, and they'll do something like set "deleted" to
"true" whenever the user wants to delete. They would then
filter the data based on "deleted = FALSE" in the SELECT
queries.
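Here is a rough sketch of that idea. It assumes Python's built-in sqlite3 module as the runner, and the "deleted" column with its default of 0 is our own hypothetical addition to the journal_logs table. The sketch stores 1 for true and 0 for false, which works even in environments that don't know the words TRUE and FALSE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical extra column: deleted is 0 (false) until the user deletes the log.
    CREATE TABLE journal_logs (id INTEGER PRIMARY KEY,
        user_id INTEGER, date TEXT, content TEXT,
        deleted INTEGER DEFAULT 0);
    INSERT INTO journal_logs (user_id, date, content)
        VALUES (1, '2015-04-01', 'I had a stupid fight with my boyfriend.');
    INSERT INTO journal_logs (user_id, date, content)
        VALUES (1, '2015-04-02', 'We already made up and we celebrated by going to the beach.');
""")

# "Delete" the first log by flagging it instead of removing the row.
conn.execute("UPDATE journal_logs SET deleted = 1 WHERE id = 1")

# What the user sees: only logs that are not flagged.
visible = conn.execute(
    "SELECT id, date FROM journal_logs WHERE deleted = 0"
).fetchall()

# What the database still holds: both rows.
total = conn.execute("SELECT COUNT(*) FROM journal_logs").fetchone()[0]

print(visible)  # [(2, '2015-04-02')]
print(total)    # 2
```

The trade-off is that every "SELECT" has to remember the "deleted = 0" filter, but nothing is lost if the user changes her mind.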
So now, with "SELECT," "INSERT," "UPDATE," and
"DELETE," you have all the commands you need to handle
what a user would want out of your journal app. And now,
hopefully, you can better imagine which commands are
happening behind the scenes when you use your favorite apps
every day.

Conclusion
I hope this book really helped you master the basics of SQL.
The next step is to practice using the SQL statements you've
learned. Once you are comfortable using them, you will be
able to manage databases easily.

Would you do me a favor?
Finally, if you enjoyed this book, please take the time to
share your thoughts and post a positive review on Amazon.
It'd be greatly appreciated!
Thank you and good luck!
