A relationship, I think, is like a shark, you know? It has to constantly move forward or it dies. And I think what we got on our hands is a dead shark.
Woody Allen (from Annie Hall, 1979)
Administrivia
Homework 4 due one week from today
Midterm is graded
Exam Summary
Exam Details
Section               Low    High / Total   (Average)
1: External Sorting   10     24 / 24        (18)
2: Query Execution    17     43 / 44        (36)
3: Query Optim.       6.5    29 / 32        (20)
Section 1 (cont)
Database systems sometimes use blocked I/O because reading a block of contiguous pages is more efficient than doing a separate I/O for each page. Assume that in our computer system, it is much more efficient to read and write blocks of 32 pages at a time, so all reads and writes from files must be in blocks of 32 pages (and if a file has fewer than 32 pages, it is padded with blank pages). Consider doing external sort with blocked I/O, which gets faster I/O at the expense of more sorting passes. If the database must read and write 32 pages at a time, how many sorting passes are required? Show either the formula you use, or the temp file sizes after each pass. (4pts)
Pass 0: 105 runs of 96 pages
Pass 1: 2-way merge of 96 + 96 -> result is 53 runs of 192 pages
Pass 2: 2-way merge of 192 + 192 -> result is 27 runs of 384 pages
Pass 3: -> result is 14 runs of 768 pages
Pass 4: -> result is 7 runs of 1536 pages
Pass 5: -> result is 4 runs of 3072 pages
Pass 6: -> result is 2 runs of 6144 pages
Pass 7: -> result is 1 run of 10,000 pages
8 passes total
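The pass counts above can be re-derived with a short sketch. It assumes a 10,000-page file and a 96-page buffer (three 32-page blocks); the buffer size is inferred from the 96-page runs in Pass 0, not stated in the question. The function name and parameters are hypothetical.

```python
import math

def blocked_sort_runs(file_pages, buffer_pages, block_pages):
    """Return the number of sorted runs left after each pass (Pass 0 first)."""
    # Pass 0 sorts one buffer-full of pages at a time.
    runs = math.ceil(file_pages / buffer_pages)
    history = [runs]
    # Each merge pass reserves one block for output; the rest are input blocks.
    fan_in = buffer_pages // block_pages - 1
    if fan_in < 2:
        raise ValueError("need at least two input blocks to merge")
    while runs > 1:
        runs = math.ceil(runs / fan_in)
        history.append(runs)
    return history

history = blocked_sort_runs(file_pages=10_000, buffer_pages=96, block_pages=32)
print(history)       # [105, 53, 27, 14, 7, 4, 2, 1]
print(len(history))  # 8 passes, counting Pass 0
```

With three blocks in the buffer, only two can feed a merge, which is why every merge pass is 2-way.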
Section 1 (cont)
Blocked I/O is used because reading a 32-page block is faster than 32 separate 1-page I/Os. For each sorting pass, you still must read and write every page in the file, but instead of doing 10,000 1-page I/Os, you do (10,000/32) block I/Os. Assume that in our system, we can read a 32-page block in the time it would normally take to do 8 single-page I/Os. Is the blocked I/O sort faster or slower than a regular sort from question 2? By approximately what ratio? (4pts)
Assume that instead of a heap file, the records from EMP are stored in a clustered B-Tree index, whose key is Ename, using alternative 1 (i.e., the full data records are stored in the leaves of the tree). The B-tree has depth 3. Assuming the B-Tree already exists, what is the approximate I/O cost to use the B-tree to get the records in sorted order? (4pts)
What is the I/O cost of this operation if there is no index? (4pts) Lecture 15 slide 9: no index -> sequential scan, N I/Os, meaning 10,000 I/Os
What is the reduction factor? (4pts) 5000/100,000 = 0.05
Assume that instead of a heap file, the records from EMP are stored in a clustered B-Tree index, whose key is Ename, using alternative 1 (i.e., the full data records are stored in the leaves of the tree). The B-tree has depth 3. Assuming the B-Tree already exists, what is the I/O cost of this selection operation? (4pts)
Due to a copy-paste error on my part, the B-Tree index is *not* useful for the query. I gave credit either way, whether people tried to use the index as an index or for a sequential scan. In considering the cost of using the index, either as an index or for a scan, full credit required knowing that B-Trees have an overhead of approximately 50%, i.e., there are 50% more leaves than the number of pages in a heap file.
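The arithmetic behind these answers, as a minimal sketch. The figures (5,000 matching tuples, 100,000 tuples, a 10,000-page heap file, 50% B-Tree overhead) come from the answers above; the function names are hypothetical.

```python
def reduction_factor(matching_tuples, total_tuples):
    """Fraction of tuples that satisfy the selection."""
    return matching_tuples / total_tuples

def btree_leaf_pages(heap_pages, overhead=0.5):
    """Approximate leaf count of a clustered alternative-1 B-Tree,
    using the rule of thumb that leaves take about 50% more pages
    than the same records stored in a heap file."""
    return round(heap_pages * (1 + overhead))

print(reduction_factor(5_000, 100_000))  # 0.05
print(btree_leaf_pages(10_000))          # 15000
```

So scanning the B-Tree leaves costs roughly 15,000 I/Os, compared with 10,000 I/Os for a sequential scan of the heap file.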
EMP.Ename = Joe: We don't know the distribution of names, so 1/10.
IN_DEPT.EID = 003: Since there are 100,000 employees, this is 1/100,000.
Given the following query, where X is the join operator:
π_{EMP.Ename}(σ_{Dept.Budget > 500000}(Emp X In_Dept X Dept))
Mark whether each of the following queries is equivalent (True/False).
__T__ π_{EMP.Ename}(σ_{Dept.Budget > 500000}(Emp X Dept X In_Dept))
__T__ π_{EMP.Ename}(σ_{Dept.Budget > 500000}(Dept X In_Dept X Emp))
__T__ π_{EMP.Ename}(Emp X In_Dept X σ_{Dept.Budget > 500000}(Dept))
__F__ σ_{Dept.Budget > 500000}(π_{EMP.Ename}(Emp) X In_Dept X Dept)
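To see why applying the Dept.Budget selection before the join gives the same answer as applying it after, here is a small sketch over made-up relations. The column layouts (eid, did) and all rows are assumptions for illustration only; the exam does not give the schemas.

```python
# Tiny made-up relations; column layouts are assumptions of this sketch.
emp = [(1, "Joe"), (2, "Ann"), (3, "Raj")]   # (eid, ename)
in_dept = [(1, 10), (2, 20), (3, 10)]        # (eid, did)
dept = [(10, 600_000), (20, 400_000)]        # (did, budget)

def select_after_join():
    """Join all three relations, select on budget, then project Ename."""
    return {
        ename
        for (eid, ename) in emp
        for (eid2, did) in in_dept
        for (did2, budget) in dept
        if eid == eid2 and did == did2 and budget > 500_000
    }

def select_before_join():
    """Select on Dept first, join the smaller result, then project Ename."""
    rich_depts = [(did, budget) for (did, budget) in dept if budget > 500_000]
    return {
        ename
        for (eid, ename) in emp
        for (eid2, did) in in_dept
        for (did2, _budget) in rich_depts
        if eid == eid2 and did == did2
    }

print(select_after_join() == select_before_join())  # True
```

The last alternative fails for a different reason: projecting Emp down to Ename first throws away the column needed to join with In_Dept, and the final result is no longer projected onto Ename.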
Students
sid     name    login        age   gpa
53666   Jones   jones@cs     18    3.4
53688   Smith   smith@eecs   18    3.2
53650   Smith   smith@math   19    3.8
Database Design
The process of modelling things in the real world into elements of a data model, i.e., describing things in the real world using a data model.
E.g., describing students and enrollments using various tables with key/foreign key relationships.
The relational model is not the only model in use.
With complicated schemas, it may be hard for a person to understand the structure from the data definition.
Enrolled
cid           grade   sid
Carnatic101   C       53666
Reggae203     B       53666
Topology112   A       53650
History105    B       53666
Students
sid     name    login        age   gpa
53666   Jones   jones@cs     18    3.4
53688   Smith   smith@eecs   18    3.2
53650   Smith   smith@math   19    3.8
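The key/foreign key link between these two tables can be sketched with Python's built-in sqlite3 module. The column types and the (cid, sid) primary key on Enrolled are assumptions; the slides show only the data. Physics101 and sid 99999 are made up to show a rejected row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("""
    CREATE TABLE Students (
        sid INTEGER PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL
    )""")
conn.execute("""
    CREATE TABLE Enrolled (
        cid TEXT, grade TEXT,
        sid INTEGER REFERENCES Students(sid),  -- foreign key into Students
        PRIMARY KEY (cid, sid)                 -- assumed key for this sketch
    )""")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    (53666, "Jones", "jones@cs", 18, 3.4),
    (53688, "Smith", "smith@eecs", 18, 3.2),
    (53650, "Smith", "smith@math", 19, 3.8),
])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("Carnatic101", "C", 53666),
    ("Reggae203", "B", 53666),
    ("Topology112", "A", 53650),
    ("History105", "B", 53666),
])

# An enrollment for a student who does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('Physics101', 'A', 99999)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

rows = conn.execute("""
    SELECT s.name, e.cid, e.grade
    FROM Students s JOIN Enrolled e ON s.sid = e.sid
    ORDER BY e.cid""").fetchall()
print(dangling_rejected)  # True
print(rows[0])            # ('Jones', 'Carnatic101', 'C')
```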
[Figure: ER diagram fragment; surviving labels: ssn, Enrolled_in]
Conceptual Design
Define enterprise entities and relationships.
What information about entities and relationships should be in the database?
What are the integrity constraints or business rules that hold?
A database schema in the ER Model is represented pictorially (ER diagrams).
Can map an ER diagram into a relational schema.
ER Model Basics
[Figure: entity set Employees with attributes ssn, name, lot]
Entity: Real-world thing, distinguishable from other objects. Entity described by set of attributes.
Entity Set: A collection of similar entities. E.g., all employees. All entities in an entity set have the same set of attributes. (Until we consider hierarchies, anyway!) Each entity set has a key (underlined). Each attribute has a domain.
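One way to make these definitions concrete is to write the Employees entity set as a table, sketched here with sqlite3. Treating ssn as the key and the chosen column types (the attribute domains) are assumptions of this sketch, and the ssn values and the second employee are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity set -> table; attributes -> columns; domains -> column types;
# key -> PRIMARY KEY (assumed here to be ssn).
conn.execute("CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER)")

# Each row is one entity, distinguishable from the others by its key.
conn.execute("INSERT INTO Employees VALUES ('123-22-3666', 'Attishoo', 48)")

# A second entity with the same key value is rejected.
try:
    conn.execute("INSERT INTO Employees VALUES ('123-22-3666', 'Smiley', 22)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```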
Relationship: Association among two or more entities. E.g., Attishoo works in Pharmacy department. Relationships can have their own attributes.
Relationship Set: Collection of similar relationships. An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En.
[Figure: Employees and Departments (did, dname, budget) connected by relationship set Works_In (since)]
Same entity set can participate in different relationship sets, or in different roles in the same set.
[Figure: relationship set Reports_To on Employees, with roles supervisor and subordinate]
Key Constraints
An employee can work in many departments; a dept can have many employees. In contrast, each dept has at most one manager, according to the key constraint on Manages.
[Figure: ER diagram of Employees, Departments, Manages, and Works_In (since), alongside sketches of 1-to-1, 1-to-Many, and Many-to-Many relationship sets]
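The difference between Works_In and Manages shows up in the choice of key when each relationship set is written as a table. This is a minimal sqlite3 sketch; the column types and all rows are made up, and foreign keys to Employees and Departments are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Many-to-many: the key is the pair (ssn, did).
    CREATE TABLE Works_In (ssn TEXT, did INTEGER, since TEXT,
                           PRIMARY KEY (ssn, did));
    -- Key constraint: did alone is the key, so each dept appears at most once.
    CREATE TABLE Manages (ssn TEXT, did INTEGER PRIMARY KEY, since TEXT);
""")

# One employee in two departments, and two employees in one department.
conn.executemany("INSERT INTO Works_In VALUES (?, ?, ?)", [
    ("111", 51, "2001"), ("111", 56, "2003"), ("222", 51, "2002"),
])

conn.execute("INSERT INTO Manages VALUES ('111', 51, '2004')")
# A second manager for department 51 violates the key constraint.
try:
    conn.execute("INSERT INTO Manages VALUES ('222', 51, '2005')")
    second_manager_rejected = False
except sqlite3.IntegrityError:
    second_manager_rejected = True

print(second_manager_rejected)  # True
```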
Participation Constraints
Does every employee work in a department? If so, this is a participation constraint: the participation of Employees in Works_In is said to be total (vs. partial).
What if every department has an employee working in it? Total participation basically means at least one.
[Figure: ER diagram of Employees (ssn, name, lot) and Departments (did, dname, budget) with relationship sets Manages (since) and Works_In]
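Some participation constraints can be enforced directly in a table. As a hedged sketch, suppose every department must have a manager (total participation of Departments in Manages; this requirement is an assumption, not something the slide states). Folding Manages into the department table and marking the manager column NOT NULL captures it. The table name Dept_Mgr and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Dept_Mgr (
        did INTEGER PRIMARY KEY,  -- at most one row, hence one manager, per dept
        dname TEXT,
        budget REAL,
        ssn TEXT NOT NULL,        -- manager is required: total participation
        since TEXT
    )""")

conn.execute("INSERT INTO Dept_Mgr VALUES (51, 'Pharmacy', 500000, '111', '2004')")

# A department with no manager is rejected by the NOT NULL constraint.
try:
    conn.execute("INSERT INTO Dept_Mgr (did, dname, budget) VALUES (56, 'Toys', 100000)")
    unmanaged_rejected = False
except sqlite3.IntegrityError:
    unmanaged_rejected = True

print(unmanaged_rejected)  # True
```

Total participation of Employees in Works_In is harder: a plain many-to-many table cannot force every employee to appear in it, so that constraint would need extra checking beyond key and NOT NULL declarations.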
Weak Entities
A weak entity can be identified uniquely only by considering the primary key of another (owner) entity. Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities). Weak entity set must have total participation in this identifying relationship set.
[Figure: ER diagram of Employees (ssn, name, lot), relationship set Policy (cost), and weak entity set Dependents (pname, age)]
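A weak entity set and its identifying relationship set are commonly folded into one table whose key combines the weak entity's own partial key with the owner's key. The sqlite3 sketch below assumes pname is the partial key of Dependents and that deleting an owner should delete its dependents; the table name Dep_Policy and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Dep_Policy (
        pname TEXT,
        age INTEGER,
        cost REAL,
        ssn TEXT NOT NULL REFERENCES Employees(ssn) ON DELETE CASCADE,
        PRIMARY KEY (pname, ssn)  -- partial key plus owner's key
    );
""")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
    ("111", "Attishoo", 48), ("222", "Smiley", 22),
])
# Two dependents named Joe are fine because they have different owners.
conn.executemany("INSERT INTO Dep_Policy VALUES (?, ?, ?, ?)", [
    ("Joe", 5, 100.0, "111"), ("Joe", 7, 120.0, "222"),
])

# Removing the owner removes the weak entities that depend on it.
conn.execute("DELETE FROM Employees WHERE ssn = '111'")
remaining = conn.execute("SELECT pname, ssn FROM Dep_Policy").fetchall()
print(remaining)  # [('Joe', '222')]
```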
If each policy is owned by just 1 employee, a key constraint on Policies would mean a policy can cover only 1 dependent!
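Bad design
[Figure: relationship set Covers relating Employees, Policies, and Dependents]
Better design
[Figure: Employees (ssn, name, lot) linked to Policies (policyid, cost) by Purchaser; Policies linked to Dependents by Beneficiary]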
Opposite example: a ternary relation Contracts relates entity sets Parts, Departments and Suppliers, and has descriptive attribute qty. No combination of binary relationships is an adequate substitute.
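[Figure: the ternary relationship set on Parts, Departments, and Suppliers vs. three binary relationship sets: needs (Departments, Parts), can-supply (Suppliers, Parts), deals-with (Departments, Suppliers)]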
S can-supply P, D needs P, and D deals-with S does not imply that D has agreed to buy P from S. How do we record qty?
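A small sketch with made-up data shows the gap. All supplier, part, and department names and the quantity are hypothetical.

```python
# Three binary relationship sets (made-up data).
can_supply = {("S1", "bolt"), ("S2", "bolt")}  # (supplier, part)
needs = {("D1", "bolt")}                       # (department, part)
deals_with = {("D1", "S1"), ("D1", "S2")}      # (department, supplier)

# The ternary relationship set, with its descriptive attribute qty.
contracts = {("D1", "S1", "bolt"): 500}        # (dept, supplier, part) -> qty

# Every (dept, supplier, part) combination the binary sets are consistent with.
implied = {
    (d, s, p)
    for (s, p) in can_supply
    for (d, p2) in needs
    if p == p2 and (d, s) in deals_with
}

# Combinations the binary sets allow but no contract actually exists for.
spurious = implied - set(contracts)
print(spurious)  # {('D1', 'S2', 'bolt')}
```

D1 deals with S2, S2 can supply bolts, and D1 needs bolts, yet there is no contract among the three. The binary sets also have no place to store qty, since it describes the three-way combination.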
Summary so far
Entities and Entity Sets (boxes)
Relationships and Relationship Sets (diamonds): binary, n-ary
Key constraints (1-1, 1-M, M-M; arrows on the 1 side)
Participation constraints (bold for total)
Weak entities: require strong entity for key
Next, a couple more advanced concepts