You are on page 1of 23

CPS 216: Advanced Database

Systems
Shivnath Babu
Fall 2006

Outline for Today


What this class is about: Data management
What we will cover in this class
Logistics

Application

Data Management
Query

Query

Query

Data

DataBase Management System (DBMS)

Example: At a Company
Query 1: Is there an employee named Nemo?
Query 2: What is Nemos salary?
Query 3: How many departments are there in the company?
Query 4: How many employees have Salary >= 80K?
Query 5: What is the name of Nemos department?
Query 6: How many employees are there in the
Accounts department?
Employee

Department

ID

Name

DeptID

Salary

ID

Name

10

Nemo

12

120K

12

IT

20

Dory

156

79K

34

Accounts

40

Gill

89

76K

89

HR

52

Ray

34

85K

156

Marketing

DataBase Management System (DBMS)


High-level
Query Q

Answer

DBMS

Data

Translates Q into
best execution plan
for current conditions,
runs plan

Example: Store that Sells Cars


Make
Model OwnerID
ID Name
Owners of
12
12 Nemo
Honda Accords Honda Accord
who are <=
Honda Accord
156
156 Dory
23 years old
Join (Cars.OwnerID = Owners.ID)
Filter (Make = Honda and
Model = Accord)

Cars

Age
22
21

Filter (Age <= 23)

Owners

Make

Model

OwnerID

ID

Name

Age

Honda

Accord

12

12

Nemo

22

Toyota

Camry

34

34

Ray

42

Mini

Cooper

89

89

Gill

36

Honda

Accord

156

156

Dory

21

DataBase Management System (DBMS)


High-level
Query Q

Answer

DBMS
Keeps data safe
and correct
despite failures,
concurrent
updates, online
processing, etc.

Data

Translates Q into
best execution plan
for current conditions,
runs plan

DBMS is multi-user
Example
Get account balance from database;
If balance > amount of withdrawal then
balance = balance - amount of withdrawal;
dispense cash;
store new balance into database;

Homer at ATM1 withdraws $100


Marge at ATM2 withdraws $50
Initial balance = $400, final balance = ?
Should be $250 no matter who goes first

Final balance = $300


Homer withdraws $100:
Marge withdraws $50:
read balance; $400

read balance; $400


if balance > amount then
balance = balance - amount;
write balance; $350
if balance > amount then
balance = balance - amount; $300
write balance; $300

Final balance = $350


Homer withdraws $100:
Marge withdraws $50:
read balance; $400
read balance; $400

if balance > amount then


balance = balance - amount; $300
write balance; $300
if balance > amount then
balance = balance - amount;
write balance; $350

Concurrency control in DBMS


Similar to concurrent programming problems
But data is not all in main-memory
Appears similar to file system concurrent
access?
Approach taken by MySQL initially; now
MySQL offers better alternatives
But want to control at much finer granularity
Or else one withdrawal would lock up all
accounts!

Recovery in DBMS
Example: balance transfer
decrement the balance of account X
by $100;
increment the balance of account Y
by $100;
Scenario 1: Power goes out after the first
instruction
Scenario 2: DBMS buffers and updates data in
memory (for efficiency); before they are written
back to disk, power goes out
Log updates; undo/redo during recovery

DataBase Management System (DBMS)


High-level
Query Q

Answer

DBMS
Keeps data safe
and correct
despite failures,
concurrent
updates, online
processing, etc.

Data

Translates Q into
best execution plan
for current conditions,
runs plan

Summary of modern DBMS features


Persistent storage of data
Logical data model; declarative queries and
updates ! physical data independence
Multi-user concurrent access
Safety from system failures
Performance, performance, performance
Massive amounts of data (terabytes ~ petabytes)
High throughput (thousands ~ millions
transactions per minute)
High availability ( 99.999% uptime)

Modern DBMS Architecture


Applications
SQL
DBMS

Parser
Logical query plan
Query Optimizer
Physical query plan
Query Executor
Access method API calls
Storage Manager

File system API calls


Storage system API calls
OS
Disk(s)

Course Outline
50% of the class is about core DBMS concepts
Query execution, query optimization, transactions,
recovery, etc.
Textbook material

50% of the class is on what is happening today


in data management

Data streams
Web search Google, Yahoo!
XML and data integration
Data mining
Sensor data management

Using a Traditional DBMS


User/Application
Query
Query

Loader

Result
Result

Table R
Table S

New Approach for Data Streams


User/Application
Register
Continuous Query
(Standing Query)

Input streams

Result

Stream Query
Processor

Example Continuous (Standing) Queries


Web
Amazons best sellers over last hour
Network Intrusion Detection
Track HTTP packets with destination address
matching a prefix in given table and content
matching *\.ida
Finance
Monitor NASDAQ stocks between $20 and
$200 that have moved down more than 2% in
the last 20 minutes

Course Outline
50% of the class is about core DBMS concepts
Query execution, query optimization, transactions,
recovery, etc.
Textbook material

50% of the class is on what is happening today


in data management

Data streams
Web search Google, Yahoo!
XML and data integration
Data mining
Sensor data management

New Challenges in DBMSs


High-level
Query Q

Answer

DBMS
TeraBytes PetaBytes

Data

<CD>
<TITLE>Empire B.</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia
</COMPANY>
<PRICE>10.90</PRICE>
</CD>

Course Logistics
Recommended reference: Database Systems: The
Complete Book, by H. Garcia-Molina, J. D. Ullman,
and J. Widom
Web site: http://www.cs.duke.edu/education/courses/fall06/cps216

Grading:
Homework Assignments 15%
Project 25%
Midterm 25%
Final 35%

Summary: Data Management is


Important
Core aspect of most sciences and
engineering today
Core need in industry
Cool mix of theory and systems
Chances are you will find something
interesting even if you primary interest is
elsewhere

You might also like