Management System
Unit 2. Query Processing and Optimization
Dhanashree Huddedar
Index
Overview
Selection operation
Sorting
Join Operation
Other Operations
Evaluation of Expressions
Materialized Views
Overview
Query processing: the set of activities performed
to obtain the required tuples that satisfy a
given query.
Query optimization: the process of choosing a
suitable execution strategy for processing a
query.
Two internal representations of a query:
Query Tree
Query Graph
Syntax
Schema element
Optimizer
Creates a query evaluation plan, which specifies the relational algebra (R.A.)
operations to perform and the algorithm used for each.
Query block:
The basic unit that can be translated into the algebraic operators and
optimized.
Selection operation
Retrieve every record in the file, and test whether its attribute values satisfy
the selection condition.
Since the records are grouped into disk blocks, each disk block is read into a
main memory buffer, and then a search through the records within the disk
block is conducted in main memory.
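The block-by-block scan described above can be sketched in Python. This is only an illustration, assuming a file is modeled as a list of blocks and each block as a list of record dicts; `linear_select`, `employee_file`, and the field values are hypothetical names and data, not part of the slides.

```python
from typing import Callable, Iterator


def linear_select(blocks: list, predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Scan every block and yield the records that satisfy the predicate."""
    for block in blocks:            # one simulated disk block read per iteration
        buffer = list(block)        # copy the block into a main-memory buffer
        for record in buffer:       # search the records inside the buffer
            if predicate(record):
                yield record


# Hypothetical EMPLOYEE file stored in two blocks.
employee_file = [
    [{"Ssn": "111", "Dno": 5}, {"Ssn": "222", "Dno": 4}],
    [{"Ssn": "333", "Dno": 5}],
]

matches = list(linear_select(employee_file, lambda r: r["Dno"] == 5))
```

Every block is read regardless of how many records match, which is why this method is the fallback when no index or ordering can be used.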
S2 Binary search.
An example is OP1 if Ssn is the ordering attribute for the EMPLOYEE file.
Binary search is rarely used in database systems because ordered files are
not used unless they also have a corresponding primary index.
If the comparison condition is >, >=, <, or <= on a key field with a primary
index (for example, Dnumber > 5 in OP2), use the index to find the record
satisfying the corresponding equality condition (Dnumber = 5),
then retrieve all subsequent records in the (ordered) file. For the condition
Dnumber < 5, retrieve all the preceding records.
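A small Python sketch of this range selection follows. It assumes the ordered file is a list sorted on Dnumber and uses a sorted key list with the standard `bisect` module as a stand-in for the primary index; `department_file`, `select_greater`, and `select_less` are hypothetical names and the rows are made-up sample data.

```python
from bisect import bisect_left, bisect_right

# Hypothetical DEPARTMENT file, physically ordered on the key Dnumber.
department_file = [
    {"Dnumber": 1, "Dname": "Headquarters"},
    {"Dnumber": 4, "Dname": "Administration"},
    {"Dnumber": 5, "Dname": "Research"},
    {"Dnumber": 7, "Dname": "Sales"},
    {"Dnumber": 9, "Dname": "Support"},
]
# The sorted key list stands in for the primary index (an assumption).
index_keys = [rec["Dnumber"] for rec in department_file]


def select_greater(value):
    """Dnumber > value: locate the equality position, return what follows."""
    pos = bisect_right(index_keys, value)
    return department_file[pos:]


def select_less(value):
    """Dnumber < value: locate the equality position, return what precedes."""
    pos = bisect_left(index_keys, value)
    return department_file[:pos]
```

Here `select_greater(5)` returns the records for departments 7 and 9, and `select_less(5)` returns those for departments 1 and 4; the binary search inside `bisect` plays the role of the index lookup.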
Sorting
We may build an index on the relation, and then use the index to read
the relation in sorted order. May lead to one disk block access for each
tuple.
For relations that fit in memory, techniques like quicksort can be used.
For relations that don't fit in memory, external
sort-merge is a good choice.
External Sort-Merge
Let M denote memory size (in pages).
1. Create sorted runs. Let i be 0 initially.
Repeatedly do the following till the end of the relation:
(a) Read M blocks of relation into memory
(b) Sort the in-memory blocks
(c) Write sorted data to run Ri; increment i.
Let the final value of i be N
2. Merge the runs.
[Figure: example of external sort-merge, showing the initial relation, the runs produced by "create runs", the runs after merge pass 1, and the sorted output after merge pass 2.]
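The two phases of external sort-merge can be sketched in Python as follows. This is a simplified in-memory model: `m` counts records rather than pages, lists stand in for runs on disk, and merging up to `m - 1` runs per pass (one buffer reserved for output) is an assumption, since the merge step is not detailed here. The function names and the sample tuples are hypothetical.

```python
import heapq


def create_runs(relation, m):
    """Step 1: read m records at a time, sort them, and emit each as a run."""
    runs = []
    for start in range(0, len(relation), m):
        runs.append(sorted(relation[start:start + m]))
    return runs


def merge_runs(runs, m):
    """Step 2: merge up to m - 1 runs per pass until a single run remains."""
    fan_in = max(2, m - 1)  # assumption: one buffer is kept for output
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)
        ]
    return runs[0] if runs else []


def external_sort(relation, m):
    return merge_runs(create_runs(relation, m), m)


# Hypothetical sample tuples (sort key first).
relation = [("g", 24), ("a", 19), ("d", 31), ("c", 33), ("b", 14),
            ("e", 16), ("r", 16), ("d", 21), ("m", 3), ("a", 14)]
runs = create_runs(relation, 3)
result = external_sort(relation, 3)
```

With `m = 3` the ten tuples produce four initial runs, which are merged two at a time over two passes into one sorted run.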
JOIN Operation
The JOIN operation is one of the most time-consuming operations
in query processing.
Implementing the JOIN Operation:
Join (EQUIJOIN, NATURAL JOIN)
two-way join: a join on two files
e.g. R ⋈(A=B) S
multiway join: a join involving more than two files
e.g. R ⋈(A=B) S ⋈(C=D) T
J3 Sort-merge join:
If the records of R and S are physically sorted (ordered) by
value of the join attributes A and B, respectively, we can
implement the join in the most efficient way possible.
Both files are scanned in order of the join attributes,
matching the records that have the same values for A and
B.
In this method, the records of each file are scanned only
once each for matching with the other file, unless both A
and B are non-key attributes, in which case the method
needs to be modified slightly.
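A minimal Python sketch of sort-merge join follows, assuming both inputs are lists of dicts already sorted on their join attributes. It includes one possible form of the "slight modification" for non-key attributes: when several records on each side share a join value, the two groups are paired with each other. The name `sort_merge_join` and the sample records are hypothetical.

```python
def sort_merge_join(r, s, a, b):
    """Join sorted lists r and s on r[a] == s[b], scanning each list once."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            key = r[i][a]
            # Find the end of the group of equal join values in each input.
            i_end = i
            while i_end < len(r) and r[i_end][a] == key:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][b] == key:
                j_end += 1
            # Pair every record of one group with every record of the other.
            for rec_r in r[i:i_end]:
                for rec_s in s[j:j_end]:
                    result.append({**rec_r, **rec_s})
            i, j = i_end, j_end
    return result


R = [{"A": 1, "x": "r1"}, {"A": 2, "x": "r2"},
     {"A": 2, "x": "r3"}, {"A": 4, "x": "r4"}]
S = [{"B": 2, "y": "s1"}, {"B": 2, "y": "s2"},
     {"B": 3, "y": "s3"}, {"B": 4, "y": "s4"}]
joined = sort_merge_join(R, S, "A", "B")
pairs = [(t["x"], t["y"]) for t in joined]
```

If A and B are both keys, each group holds a single record and the inner loops reduce to emitting one pair per match.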
Other Operations
Other relational operations and extended relational operations include
duplicate elimination, projection, set operations, outer join, and
aggregation.
Duplicate elimination can be implemented via hashing or sorting.
On sorting, duplicates will come adjacent to each other, and all
but one copy in each set of duplicates can be deleted.
Optimization: duplicates can be deleted during run generation
as well as at intermediate merge steps in external sort-merge.
Hashing is similar: duplicates will come into the same
bucket.
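Both approaches to duplicate elimination can be sketched in a few lines of Python. The functions `distinct_by_sorting` and `distinct_by_hashing`, the bucket count, and the sample tuples are hypothetical; the sketch works entirely in memory and ignores the run-generation optimization.

```python
def distinct_by_sorting(tuples):
    """Sort, then keep a tuple only if it differs from the previous one kept."""
    result = []
    for t in sorted(tuples):
        if not result or result[-1] != t:
            result.append(t)
    return result


def distinct_by_hashing(tuples, n_buckets=4):
    """Hash each tuple to a bucket; duplicates always meet in the same bucket."""
    buckets = [[] for _ in range(n_buckets)]
    for t in tuples:
        bucket = buckets[hash(t) % n_buckets]
        if t not in bucket:          # only this bucket needs to be checked
            bucket.append(t)
    return [t for bucket in buckets for t in bucket]


data = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]
by_sort = distinct_by_sorting(data)
by_hash = distinct_by_hashing(data)
```

The sorting version returns tuples in sorted order, while the hashing version returns them in bucket order, which can vary between runs.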
Projection: perform projection on each tuple, followed by duplicate elimination.
Other Operations (Cont.)
Aggregation can be implemented in a manner similar to duplicate
elimination.
Sorting or hashing can be used to bring tuples in the same group
together, and then the aggregate functions can be applied on each
group.
Optimization: combine tuples in the same group during run generation
and intermediate merges, by computing partial aggregate values
For count, min, max, sum: keep the aggregate values computed on the tuples
found so far in the group.
When combining partial aggregates for count, add up the
partial counts
For avg, keep sum and count, and divide sum by count at the end
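The partial-aggregate idea can be illustrated with a short Python sketch. It assumes each run is a list of (group, value) pairs and that partial results are kept in dicts; the function names and sample data are hypothetical. Counts and sums are added when partials are combined, min and max are combined with min and max, and avg is derived only at the end.

```python
def partial_aggregates(tuples):
    """Collapse one run of (group, value) pairs into per-group partials."""
    partials = {}
    for group, value in tuples:
        if group not in partials:
            partials[group] = {"count": 1, "sum": value,
                               "min": value, "max": value}
        else:
            p = partials[group]
            p["count"] += 1
            p["sum"] += value
            p["min"] = min(p["min"], value)
            p["max"] = max(p["max"], value)
    return partials


def combine(p, q):
    """Merge two sets of partials, as an intermediate merge step would."""
    merged = {g: dict(v) for g, v in p.items()}
    for group, v in q.items():
        if group not in merged:
            merged[group] = dict(v)
        else:
            m = merged[group]
            m["count"] += v["count"]   # partial counts are added, not recounted
            m["sum"] += v["sum"]
            m["min"] = min(m["min"], v["min"])
            m["max"] = max(m["max"], v["max"])
    return merged


def finalize(partials):
    """Compute avg from the kept sum and count."""
    return {g: {**v, "avg": v["sum"] / v["count"]} for g, v in partials.items()}


run1 = [("cs", 10), ("ee", 20), ("cs", 30)]
run2 = [("cs", 20), ("ee", 40)]
totals = finalize(combine(partial_aggregates(run1), partial_aggregates(run2)))
```

Averaging the per-run averages would give the wrong answer when runs hold different numbers of tuples per group, which is why sum and count are carried separately.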
Other Operations (Cont.)
Set operations (∪, ∩, and −): can use either a variant of merge-join
after sorting, or a variant of hash-join.
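A merge-based variant for the three set operations might look like the following Python sketch. It assumes both inputs are already sorted and duplicate-free; `merge_set_op` and its `op` strings are hypothetical names chosen for illustration.

```python
def merge_set_op(r, s, op):
    """Apply 'union', 'intersect', or 'difference' (r - s) in one merge scan."""
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                       # value only in r
            if op in ("union", "difference"):
                out.append(r[i])
            i += 1
        elif r[i] > s[j]:                     # value only in s
            if op == "union":
                out.append(s[j])
            j += 1
        else:                                 # value in both inputs
            if op in ("union", "intersect"):
                out.append(r[i])
            i += 1
            j += 1
    # Whatever is left over belongs to only one of the inputs.
    if op in ("union", "difference"):
        out.extend(r[i:])
    if op == "union":
        out.extend(s[j:])
    return out


r = [1, 3, 5, 7]
s = [3, 4, 7, 9]
```

All three operations share the same scan and differ only in which of the three cases (only in r, only in s, in both) contribute to the output.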
Evaluation of Expressions
So far: we have seen algorithms for individual operations
Alternatives for evaluating an entire expression tree
Materialization: generate results of an expression whose
inputs are relations or are already computed, materialize
(store) it on disk. Repeat.
Pipelining: pass on tuples to parent operations even as an
operation is being executed
We study the above alternatives in more detail
Materialization
Materialized evaluation: evaluate one operation at a time,
starting at the lowest-level. Use intermediate results
materialized into temporary relations to evaluate next-level
operations.
E.g., compute and store the result of the lowest-level operation,
then compute and store its join with instructor, and finally
compute the projection on name.
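Materialized evaluation can be sketched in Python by having each operator return a complete list that serves as the temporary relation for the next level. The `department` relation, the selection on `building`, and all sample rows below are assumptions made for illustration; only the join with instructor and the projection on name come from the example above.

```python
def select(relation, predicate):
    """Evaluate a selection fully and return the stored result."""
    return [t for t in relation if predicate(t)]


def join(left, right, attr):
    """Evaluate an equi-join on a shared attribute with a simple nested loop."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def project(relation, attrs):
    """Keep only the listed attributes of each tuple."""
    return [{a: t[a] for a in attrs} for t in relation]


# Hypothetical sample relations.
department = [
    {"dept_name": "Physics", "building": "Watson"},
    {"dept_name": "Comp. Sci.", "building": "Taylor"},
]
instructor = [
    {"name": "Einstein", "dept_name": "Physics"},
    {"name": "Katz", "dept_name": "Comp. Sci."},
    {"name": "Gold", "dept_name": "Physics"},
]

# Each temp list plays the role of a temporary relation written to disk.
temp1 = select(department, lambda t: t["building"] == "Watson")
temp2 = join(temp1, instructor, "dept_name")
result = project(temp2, ["name"])
```

Each step finishes completely before the next begins, so `temp1` and `temp2` exist in full even though only `result` is wanted.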
Materialization (Cont.)
Materialized evaluation is always applicable
Cost of writing results to disk and reading them back can be quite high
Our cost formulas for operations ignore cost of writing results to
disk, so
Overall cost = Sum of costs of individual operations +
cost of writing intermediate results to disk
Double buffering: use two output buffers for each operation; when
one is full, write it to disk while the other is being filled
Allows overlap of disk writes with computation and reduces
execution time
Pipelining
Pipelining (Cont.)
The system schedules operations that have space in their output buffer and
can process more input tuples
Pipelining (Cont.)
Implementation of demand-driven pipelining
Each operation is implemented as an iterator implementing the
following operations
open()
E.g. file scan: initialize file scan
state: pointer to beginning of file
E.g. merge join: sort relations;
state: pointers to beginning of sorted relations
next()
E.g. for file scan: output next tuple, and advance and store
file pointer
E.g. for merge join: continue with merge from earlier state
till next output tuple is found. Save pointers as iterator state.
close()
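The open/next/close interface above can be sketched in Python with two small iterator classes. The class names `FileScan` and `Select`, the helper `run`, the use of `None` to signal end of input, and the sample rows are all assumptions for illustration; a real system would read blocks from disk rather than a list.

```python
class FileScan:
    """Leaf iterator: its state is a pointer into the file."""

    def __init__(self, records):
        self.records = records
        self.pos = None

    def open(self):
        self.pos = 0                      # pointer to beginning of file

    def next(self):
        if self.pos >= len(self.records):
            return None                   # no more tuples
        record = self.records[self.pos]
        self.pos += 1                     # advance and store the file pointer
        return record

    def close(self):
        self.pos = None


class Select:
    """Pulls tuples from its child on demand and passes on those that match."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            record = self.child.next()
            if record is None or self.predicate(record):
                return record

    def close(self):
        self.child.close()


def run(plan):
    """Drive the plan from the top: each next() call demands one tuple."""
    plan.open()
    out = []
    record = plan.next()
    while record is not None:
        out.append(record)
        record = plan.next()
    plan.close()
    return out


instructors = [
    {"name": "Einstein", "dept_name": "Physics"},
    {"name": "Katz", "dept_name": "Comp. Sci."},
    {"name": "Gold", "dept_name": "Physics"},
]
plan = Select(FileScan(instructors), lambda r: r["dept_name"] == "Physics")
names = [r["name"] for r in run(plan)]
```

No intermediate result is stored: each call to `next()` on the selection pulls just enough tuples from the scan to produce one output tuple.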
Important Question
List and explain the steps followed to process a high-level query.