67
Introduction to SQL
- Declarative
  - Say what you want without specifying how to do it
  - One of the main reasons for the commercial success of DBMSs
- Many standards and implementations
  - ANSI SQL
  - SQL-92/SQL-2 (null operations, outer joins, etc.)
  - SQL3 (recursion, triggers, objects)
  - Vendor-specific implementations
- Bag semantics instead of set semantics
  - Used in commercial RDBMSs
68
Example:
CREATE TABLE Students (sid CHAR(9), name VARCHAR(20), login CHAR(8), age INTEGER, gpa REAL);
- CHAR(n)
- VARCHAR(n)
- BIT(n)
- BIT VARYING(n)
- INT/INTEGER
- FLOAT
- REAL, DOUBLE PRECISION
- DECIMAL(p,d)
- DATE, TIME etc.
69
More Examples
- Why?
CREATE TABLE Students (sid CHAR(9), .... age INTEGER DEFAULT 21, gpa REAL);
70
Examples Contd.
- DATE and TIME
  - Implementations vary widely
  - Typically treated as strings of a special form
  - Allows comparisons of an ordinal nature (<, > etc.)
- DATE Example
  - 1999-03-03 (No Y2K problems)
- TIME Examples
  - 15:30:29
  - 15:30:29.3875
- Deleting a Relation/Table in SQL
71
- What happens to the new entry for the old records?
  - Default is NULL, or say
ALTER TABLE Students ADD phone CHAR(7) DEFAULT 'unknown';
- Always begin with ALTER TABLE <TABLE_Name>
- Can use DEFAULT even with a regular definition (as in Slide 69)
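A minimal sketch of how a DEFAULT fills the new column for rows that already exist, run through Python's built-in sqlite3 module. The sample row ('53688', 'Smith') is made up for illustration, and SQLite is only an assumed stand-in for whatever DBMS the course uses.

```python
import sqlite3


def add_phone_column():
    """Insert a row, then add a column with a default, and read the row back."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Students (sid CHAR(9), name VARCHAR(20), "
        "login CHAR(8), age INTEGER DEFAULT 21, gpa REAL)"
    )
    # Hypothetical sample row; age is omitted so its DEFAULT applies.
    conn.execute(
        "INSERT INTO Students (sid, name, login, gpa) "
        "VALUES ('53688', 'Smith', 'smith', 3.2)"
    )
    # The row above predates the phone column.
    conn.execute(
        "ALTER TABLE Students ADD COLUMN phone CHAR(7) DEFAULT 'unknown'"
    )
    row = conn.execute("SELECT age, phone FROM Students").fetchone()
    conn.close()
    return row
```

Calling add_phone_column() returns the age and phone of the old record, so both defaults can be inspected directly.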
72
UPDATE command
UPDATE Students S SET S.age=S.age+1, S.gpa=S.gpa-1 WHERE S.sid = '53688';
73
Domains
CREATE DOMAIN Email AS CHAR(8) DEFAULT 'unknown';
.... login Email -- instead of login CHAR(8) DEFAULT 'unknown'
74
Keys
To Specify Keys
- Use PRIMARY KEY or UNIQUE
- Declare alongside the attribute
- For multi-attribute keys, declare as a separate line
CREATE TABLE takes ( sid CHAR(9), courseid CHAR(6), PRIMARY KEY (sid,courseid) );
- Typically only one PRIMARY KEY but any number of UNIQUE keys
- Implementor allowed to attach special significance
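A small sketch of what a multi-attribute key enforces, using Python's built-in sqlite3 module. The sid and course values are invented for illustration; the point is only that the pair (sid, courseid) must be unique, not each attribute on its own.

```python
import sqlite3


def try_duplicate_enrollment():
    """Return (was the duplicate rejected?, rows left in takes)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE takes (sid CHAR(9), courseid CHAR(6), "
        "PRIMARY KEY (sid, courseid))"
    )
    conn.execute("INSERT INTO takes VALUES ('53688', 'CS5614')")
    # Same student, different course: allowed, the pair is still unique.
    conn.execute("INSERT INTO takes VALUES ('53688', 'CS4604')")
    try:
        # Same (sid, courseid) pair again: violates the key.
        conn.execute("INSERT INTO takes VALUES ('53688', 'CS5614')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    count = conn.execute("SELECT COUNT(*) FROM takes").fetchone()[0]
    conn.close()
    return rejected, count
```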
75
Creating Indices/Indexes
- Why?
- How to decide attributes to place indices on?
  - One is (typically) created by default on PRIMARY KEY
  - Creation of indices on UNIQUE attributes is implementation-dependent
  - In general, physical database design/tuning is very difficult!
  - Use tools: Microsoft SQL Server has an index selection wizard
- Why not place indices on all attributes?
  - Too cumbersome for insertions/deletions/updates
- Like all things in computer science, there is a tradeoff! :-)
76
Other Properties
- NOT NULL instead of DEFAULT
CREATE TABLE Students (sid CHAR(9), name VARCHAR(20), login CHAR(8), age INTEGER, gpa REAL);
81
Declaring constraints
82
Triggers
83
Already Seen
CREATE TABLE Students (sid CHAR(9), name VARCHAR(20), login CHAR(8), age INTEGER, gpa REAL, CHECK (gpa >= 0.0) );
84
- Foreign Keys
  - An attribute a of R1 is a foreign key if it references the primary key (say b) of another relation R2
  - In addition, there is a referential integrity constraint from R1 to R2
- Example
  - login is a FOREIGN KEY for Students
CREATE TABLE Students (sid CHAR(9) PRIMARY KEY, name VARCHAR(20), login CHAR(8) REFERENCES Accounts(acct), age INTEGER, gpa REAL ); CREATE TABLE Accounts ( acct CHAR(8) PRIMARY KEY );
85
Alternatively
CREATE TABLE Students (sid CHAR(9) PRIMARY KEY, name VARCHAR(20), login CHAR(8), age INTEGER, gpa REAL, FOREIGN KEY (login) REFERENCES Accounts(acct) ); CREATE TABLE Accounts ( acct CHAR(8) PRIMARY KEY );
- in both cases
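A hedged sketch of referential integrity in action, using Python's built-in sqlite3 module. The FOREIGN KEY clause is written with the column name in parentheses, which SQLite requires, and the PRAGMA line is SQLite-specific (it does not enforce foreign keys unless asked). The accounts and students shown are invented sample data.

```python
import sqlite3


def insert_with_unknown_login():
    """Return True if a student with a non-existent login was accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.execute("CREATE TABLE Accounts (acct CHAR(8) PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Students (sid CHAR(9) PRIMARY KEY, name VARCHAR(20), "
        "login CHAR(8), age INTEGER, gpa REAL, "
        "FOREIGN KEY (login) REFERENCES Accounts(acct))"
    )
    conn.execute("INSERT INTO Accounts VALUES ('mark')")
    # This login exists in Accounts, so the insert succeeds.
    conn.execute("INSERT INTO Students VALUES ('1', 'Mark', 'mark', 20, 4.0)")
    try:
        # 'ghost' is not an account, so the reference is dangling.
        conn.execute(
            "INSERT INTO Students VALUES ('2', 'Ann', 'ghost', 21, 3.5)"
        )
        accepted = True
    except sqlite3.IntegrityError:
        accepted = False
    conn.close()
    return accepted
```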
86
SQL Subqueries
Given
Students(sid,name,login,age,gpa) HasCar(sid,carname)
Find
The Subway
SELECT carname FROM HasCar WHERE sid = (SELECT sid FROM Students WHERE login='mark');
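A minimal runnable sketch of the subquery, using Python's built-in sqlite3 module. The students and car names are made-up sample data, and the login is passed as a parameter rather than written into the SQL text.

```python
import sqlite3


def cars_of(login):
    """Return the car names owned by the student with the given login."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Students (sid CHAR(9), name VARCHAR(20), login CHAR(8),
                               age INTEGER, gpa REAL);
        CREATE TABLE HasCar (sid CHAR(9), carname VARCHAR(20));
        INSERT INTO Students VALUES ('1', 'Mark', 'mark', 20, 4.0);
        INSERT INTO Students VALUES ('2', 'Ann',  'ann',  21, 3.5);
        INSERT INTO HasCar VALUES ('1', 'Civic');
        INSERT INTO HasCar VALUES ('2', 'Beetle');
    """)
    # The inner query yields one sid; the outer query looks up that sid's car.
    rows = conn.execute(
        "SELECT carname FROM HasCar "
        "WHERE sid = (SELECT sid FROM Students WHERE login = ?)",
        (login,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

If no student has the login, the inner query produces no value and the comparison matches nothing, so the result is an empty list.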
87
Aggregation
Given
Students(sid,name,login,age,gpa)
Find
Other Operations
- SUM (summation of all the values in a column)
- MIN (least value)
- MAX (highest value)
- COUNT (the number of values), e.g.
SELECT COUNT(*) FROM Students;
88
Ordering
Given
Students(sid,name,login,age,gpa)
List
Default is ASC
89
Grouping
Given
Students(sid,name,login,age,gpa)
Find
- the names of students with gpa=4.0 and
- group people with like ages together
Solution
90
More on Grouping
Given
Students(sid,name,login,age,gpa)
Find
- the names of students with gpa=4.0 and
- group people with like ages together and
- show only those groups that have more than 2 students in them
Solution
SELECT age, COUNT(*) FROM Students WHERE gpa=4.0 GROUP BY age HAVING COUNT(*) > 2;
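A minimal sketch of WHERE, GROUP BY and HAVING working together, run through Python's built-in sqlite3 module. The rows are made up for illustration. The sketch groups by age, as the problem statement asks, and selects only the grouping column plus an aggregate, since standard SQL does not allow a non-grouped column such as name in the SELECT list.

```python
import sqlite3


def crowded_age_groups():
    """Return (age, count) for ages shared by more than 2 students with gpa 4.0."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Students (sid CHAR(9), name VARCHAR(20), "
        "login CHAR(8), age INTEGER, gpa REAL)"
    )
    # Hypothetical sample data: three 4.0 students aged 20, two aged 21.
    rows = [
        ("1", "Ann", "ann", 20, 4.0),
        ("2", "Bob", "bob", 20, 4.0),
        ("3", "Cy", "cy", 20, 4.0),
        ("4", "Di", "di", 21, 4.0),
        ("5", "Ed", "ed", 21, 4.0),
        ("6", "Flo", "flo", 20, 3.0),  # removed by WHERE before grouping
    ]
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT age, COUNT(*) FROM Students "
        "WHERE gpa = 4.0 "          # filters individual rows
        "GROUP BY age "             # forms one group per age
        "HAVING COUNT(*) > 2 "      # filters whole groups
        "ORDER BY age"
    ).fetchall()
    conn.close()
    return result
```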
91
General Form
SELECT <attribute(s)> FROM <relation(s)> WHERE <condition(s)> GROUP BY <attribute(s)> HAVING <grouping condition(s)>
Order of Execution
92
Views
- Can be viewed as temporary relations
  - do not exist physically BUT
  - can be queried and modified (sometimes) just like normal relations
- Example:
CREATE VIEW GoodStudents(id,name) AS SELECT sid,name FROM Students WHERE gpa=4.0; SELECT * FROM GoodStudents WHERE name='Mark';
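A small sketch of defining and querying the view with Python's built-in sqlite3 module. The student rows are invented. As an assumption for portability, the view's column names are given with AS aliases instead of a column list after the view name; the effect is the same.

```python
import sqlite3


def good_students_named(name):
    """Query the GoodStudents view for a given name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Students (sid CHAR(9), name VARCHAR(20), login CHAR(8),
                               age INTEGER, gpa REAL);
        INSERT INTO Students VALUES ('1', 'Mark', 'mark',  20, 4.0);
        INSERT INTO Students VALUES ('2', 'Ann',  'ann',   21, 4.0);
        INSERT INTO Students VALUES ('3', 'Mark', 'mark2', 22, 3.0);
        -- The view stores only its definition, not a copy of the rows.
        CREATE VIEW GoodStudents AS
            SELECT sid AS id, name FROM Students WHERE gpa = 4.0;
    """)
    rows = conn.execute(
        "SELECT * FROM GoodStudents WHERE name = ?", (name,)
    ).fetchall()
    conn.close()
    return rows
```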
93
- SQL uses Bag Semantics
  - meaning: does not normally eliminate duplicates
  - e.g. the SELECT clause
- BUT (a big BUT) this doesn't apply to
  - UNION, INTERSECT and DIFFERENCE
- Either way, it provides facilities to do whatever we want
- If you want duplicates eliminated in the SELECT clause
  - use SELECT DISTINCT ...
- If you want duplicates retained in a UNION
  - use (SELECT ...) UNION ALL (SELECT ...)
  - Likewise for INTERSECT and DIFFERENCE
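A minimal sketch contrasting bag and set behavior, using Python's built-in sqlite3 module. The relations R(A) and S(A) and their contents are made up for illustration; R deliberately holds the value 3 twice.

```python
import sqlite3


def bag_vs_set():
    """Run the same data through SELECT, SELECT DISTINCT, UNION and UNION ALL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R (A INTEGER)")
    conn.execute("CREATE TABLE S (A INTEGER)")
    conn.executemany("INSERT INTO R VALUES (?)", [(2,), (3,), (3,)])
    conn.executemany("INSERT INTO S VALUES (?)", [(3,), (4,)])

    def column(sql):
        return [row[0] for row in conn.execute(sql)]

    result = {
        # Plain SELECT keeps duplicates (bag semantics).
        "select": column("SELECT A FROM R ORDER BY A"),
        # DISTINCT removes them.
        "distinct": column("SELECT DISTINCT A FROM R ORDER BY A"),
        # UNION removes duplicates by default (set semantics).
        "union": column("SELECT A FROM R UNION SELECT A FROM S ORDER BY A"),
        # UNION ALL keeps every copy.
        "union_all": column(
            "SELECT A FROM R UNION ALL SELECT A FROM S ORDER BY A"
        ),
    }
    conn.close()
    return result
```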
94
- when R(A) has {2,3}, S(A) has {3,4} and T(A) is {}
- Confusion Reigns!
95
Safety in Queries
Students(id)
- Easy to distinguish unsafe queries via common sense
  - Final result is not closed
  - Is there an automatic way to determine safety?
96
Answer: Yes!
Golden Rule
- Any variable that appears anywhere must also appear in a non-negated body part
- In this case, id causes the query to be unsafe
Example of a Safe Query
- This produces all those people who are NOT students
- safe because the People relation provides a reference point
- id which appears in a negated body part also appears non-negated
97
More Dangers
Students(id,age)
- Find all those numbers that are greater than the age of some student
Answer(x) <- Students(id,age), x>age.
98
- Given
  - a relation Composite(x)
  - which lists all the composite numbers
- Write a query to find
  - the prime numbers
- Wrong Way
  - Prime(x) <- NOT Composite(x).
- Right Way
  - Prime(x) <- Number(x), NOT Composite(x).
- Safety in Other Notations
  - Relational Algebra: via the subtraction operator
  - SQL: via the EXCEPT construct
- Notice how SQL and Relational Algebra do not allow unsafe queries
  - because there is no way to write such queries with the given constructs
  - how clever, eh? :-)
- It is always amazing how languages force you to think in a certain manner
  - a problem long studied by philosophers
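A hedged sketch of the safe formulation in SQL, using EXCEPT through Python's built-in sqlite3 module. The Number relation here is an assumption: it is filled with the integers 2..n so that negation has a finite reference point, and Composite is computed in Python only to have some data to subtract.

```python
import sqlite3


def primes_up_to(n):
    """Prime(x) <- Number(x), NOT Composite(x), written with EXCEPT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Number (x INTEGER)")
    conn.execute("CREATE TABLE Composite (x INTEGER)")
    # Number provides the finite domain; it starts at 2 because 1 is
    # neither prime nor composite.
    conn.executemany(
        "INSERT INTO Number VALUES (?)", [(i,) for i in range(2, n + 1)]
    )
    # Every product i*j <= n with i, j >= 2 is composite.
    composites = {
        i * j for i in range(2, n + 1) for j in range(2, n // i + 1)
    }
    conn.executemany(
        "INSERT INTO Composite VALUES (?)", [(c,) for c in sorted(composites)]
    )
    rows = conn.execute(
        "SELECT x FROM Number EXCEPT SELECT x FROM Composite ORDER BY x"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

EXCEPT always needs a left-hand relation to subtract from, which is why the unsafe form cannot even be written.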
99
Recursion in Queries
Example
- Easy to find an ancestor at a predefined level
  - parent: Use Person
  - grandparent: Join Person with Person
  - great-grandparent: Join Person with Person with Person
  - and so on.
- To find an ancestor at no predefined level
  - Need to join Person with Person an indefinite number of times
- SQL3 provides support for recursive definitions
100
Solution in Datalog
- why?
101
Recursion in SQL3
WITH RECURSIVE Ancestor(name,ans) AS
  ( (SELECT * FROM Person)
    UNION
    (SELECT Person.name, Ancestor.ans
     FROM Person, Ancestor
     WHERE Person.parent = Ancestor.name) )
SELECT * FROM Ancestor;
- Use with caution: Some kinds of recursive queries will not be allowed!
  - because the rule involves 2 applications of the recursively defined predicate
  - Linear recursion allows only one (as in the SQL code above)
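A runnable sketch of the recursive ancestor query, using Python's built-in sqlite3 module (recursive WITH needs a reasonably recent SQLite). The schema Person(name, parent) and the four people are assumptions made for illustration; the recursion is linear, with Ancestor appearing once in the recursive branch.

```python
import sqlite3


def ancestors_of(person):
    """Return all ancestors of a person, at any depth, in sorted order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (name TEXT, parent TEXT)")
    # Hypothetical family chain: amy -> bob -> carl -> dana.
    conn.executemany(
        "INSERT INTO Person VALUES (?, ?)",
        [("bob", "amy"), ("carl", "bob"), ("dana", "carl")],
    )
    rows = conn.execute(
        """
        WITH RECURSIVE Ancestor(name, ans) AS (
            -- base case: every parent is an ancestor
            SELECT name, parent FROM Person
            UNION
            -- recursive case: an ancestor of my parent is my ancestor
            SELECT Person.name, Ancestor.ans
            FROM Person, Ancestor
            WHERE Person.parent = Ancestor.name
        )
        SELECT ans FROM Ancestor WHERE name = ? ORDER BY ans
        """,
        (person,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]
```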
102
Final Example
- Be careful when combining negation, aggregation and recursion
  - perfect recipe for disaster!
- Mutual Recursion
  - Odd(x) <- Number(x), NOT Even(x).
  - Even(x) <- Number(x), NOT Odd(x).
- What are the problems?
  - Notice that the query appears safe (per Slide 96)
  - cycles indefinitely!; no proper base cases
- Illegal in SQL3
  - not because of mutual recursion
  - but due to the fact that there is no unique interpretation to the query
  - Eg: 6 could be either in Odd or in Even; both are acceptable!
103
- Can be viewed as
  - extending PROLOG-type systems with secondary storage
  - extending RDBMSs with deductive functionality
- Mappings: Commonalities between PROLOG and DBMSs
  - Predicate: Relation
  - Argument: Attribute
  - Ground Fact: Tuple
  - Extensional Definition: Table (defined by data)
  - Intensional Definition: Table (defined by a view)
- Example Systems
  - CORAL (Univ. Wisc.)
  - LDL++ (MCC)
  - XSB Systems (SUNY, Stony Brook)
104
Characteristics of PROLOG
- ancestor(X,X). parent(amy,bob).
- ancestor(X,Y) <- parent(X,Z), ancestor(Z,Y).
Query
- Find the ancestors of bob: ancestor(X,bob)?
105
PROLOG Pitfalls
- Previous Example
  - ancestor(X,Y) <- parent(X,Z), ancestor(Z,Y).
- What if we make it
  - ancestor(X,Y) <- ancestor(X,Z), ancestor(Z,Y).
  - PROLOG goes into an infinite loop (why?)
  - Not Linear Recursion
106
Magic: A Rewriting Technique
- Rewrite query such that advantages of bottom-up evaluation and goal-oriented behavior are combined
- Example: For the query
  - sg(X,Y) <- flat(X,Y).
  - sg(X,Y) <- up(X,U),sg(U,V),down(V,Y).
  - sg(john,Z)?
- Magic produces
  - sg(X,Y) <- magic_sg(X),flat(X,Y).
  - sg(X,Y) <- magic_sg(X),up(X,U),sg(U,V),down(V,Y).
  - magic_sg(john).
- How do you know when to stop?
  - Iterative Fixpoint Evaluation (when the answer stops changing)
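A rough sketch of iterative fixpoint evaluation over the magic-rewritten rules, in plain Python with sets of tuples standing in for relations. Two things are assumptions, not taken from the slide: the facts in UP, FLAT and DOWN are invented, and the sketch adds the rule magic_sg(U) <- magic_sg(X), up(X,U), which standard magic-sets rewriting uses to pass the goal upward (without it, sg(U,V) could never be derived for U other than john).

```python
# Hypothetical facts: john's parent is pat, pat and sam are in the same
# generation, and mary is sam's child. The zed/yan/xia/wes facts are
# unrelated to john and should never be explored.
UP = {("john", "pat"), ("zed", "yan")}
FLAT = {("pat", "sam"), ("yan", "xia")}
DOWN = {("sam", "mary"), ("xia", "wes")}


def magic_same_generation(seed, up=UP, flat=FLAT, down=DOWN):
    """Answer sg(seed, Z)? bottom-up, restricted by the magic set."""
    magic = {seed}   # magic_sg(john).
    sg = set()
    while True:
        # Assumed extra rule: magic_sg(U) <- magic_sg(X), up(X,U).
        new_magic = {u for (x, u) in up if x in magic}
        # sg(X,Y) <- magic_sg(X), flat(X,Y).
        new_sg = {(x, y) for (x, y) in flat if x in magic}
        # sg(X,Y) <- magic_sg(X), up(X,U), sg(U,V), down(V,Y).
        new_sg |= {
            (x, y)
            for (x, u) in up if x in magic
            for (u2, v) in sg if u2 == u
            for (v2, y) in down if v2 == v
        }
        # Fixpoint: stop when an iteration derives nothing new.
        if new_magic <= magic and new_sg <= sg:
            break
        magic |= new_magic
        sg |= new_sg
    return sorted(y for (x, y) in sg if x == seed)
```

The loop ends exactly when a full pass adds no new tuples, which is the "answer stops changing" condition; facts about zed are never touched because zed never enters the magic set.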
107
- Why?
  - There are some things we cannot do with SQL alone
  - e.g. preserving complex states, looping, branching etc.
  - Typically embed SQL in a host-language interface
- Problems: Impedance Mismatch
  - SQL operates on sets of tuples
  - Languages such as C, C++ operate on an individual basis
- Solution
  - easy when SELECT returns only one row
- When more than one row is returned
  - design an iterator to run over the results
  - called a cursor
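A minimal sketch of the cursor idea, with Python standing in as the host language and its built-in sqlite3 module as the interface. The table contents and the 3.5 cutoff are invented; the point is that the query returns a set of rows while the host code consumes them one at a time.

```python
import sqlite3


def honor_roll():
    """Iterate over a multi-row result with a cursor, one row at a time."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (name VARCHAR(20), gpa REAL)")
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?)",
        [("Ann", 3.9), ("Bob", 2.8), ("Cy", 3.6)],
    )
    # execute() returns a cursor positioned before the first row.
    cursor = conn.execute("SELECT name, gpa FROM Students ORDER BY name")
    names = []
    for name, gpa in cursor:      # each step fetches the next row
        if gpa >= 3.5:            # branching done in the host language
            names.append(name.upper())
    conn.close()
    return names
```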
108
- Vendor-Specific Implementations
  - ORACLE: PL/SQL (procedural extensions to SQL)
- Open Database Connectivity Standard
  - Provides a standard API for transparent database access
  - used when database independence is important
  - used when required to connect to diverse data sources
109
Tradeoffs
- ODBC
  - originated by Microsoft in 1991
  - adds one more abstraction layer
  - not as fast as a native API (does not exploit special features)
  - least-common-denominator approach
  - constantly evolving
- PL/SQL etc.
  - tailored to the details of the underlying DBMS
  - might not extend to heterogeneous domains
  - modeled after a specific programming language (e.g. Ada for PL/SQL)
110
- Used for developing tightly-coupled applications
  - push computations selectively into the database system
  - avoid performance degradation
  - work in database address space instead of application address space
- Advantages
  - No sending SQL statements to and fro
  - eliminate pre-processing
  - speedup by an order of magnitude
- Example Applications
  - Database Administration
  - Integrity Maintenance and Checks
  - Database Mining
- Disadvantages
  - Non-standard implementation
  - Difficult to enforce transactional synchronization
  - Without traditional SQL optimization, can lead to performance degradation
111
SELECT name
FROM Students, Classroll
WHERE Students.name = Classroll.studentname
  AND Students.gpa = 4.0
  AND Classroll.coursename = 'CS5614';
- Two Strategies
  - Do join and then filter out the ones with gpa <> 4.0 and course <> CS5614
  - Filter first the ones with gpa <> 4.0 and course <> CS5614 and then Join
- Which is Better?
  - Always good to push selections as far down into the query parse tree as possible
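A toy sketch of the two strategies in plain Python, with lists of dicts standing in for relations. The sample rows are invented, and "cost" is measured only as the number of tuples the join produces, which is a simplifying assumption. Both strategies return the same answer; they differ in how large the intermediate result is.

```python
# Hypothetical sample relations.
STUDENTS = [
    {"name": "Ann", "gpa": 4.0},
    {"name": "Bob", "gpa": 3.0},
    {"name": "Cy", "gpa": 4.0},
]
CLASSROLL = [
    {"studentname": "Ann", "coursename": "CS5614"},
    {"studentname": "Ann", "coursename": "CS4604"},
    {"studentname": "Bob", "coursename": "CS5614"},
    {"studentname": "Cy", "coursename": "CS4604"},
]


def join_then_filter(students, classroll):
    """Strategy 1: join everything, then apply both selections."""
    joined = [
        (s, c) for s in students for c in classroll
        if s["name"] == c["studentname"]
    ]
    names = [
        s["name"] for s, c in joined
        if s["gpa"] == 4.0 and c["coursename"] == "CS5614"
    ]
    return names, len(joined)


def filter_then_join(students, classroll):
    """Strategy 2: push the selections below the join."""
    good = [s for s in students if s["gpa"] == 4.0]
    enrolled = [c for c in classroll if c["coursename"] == "CS5614"]
    joined = [
        (s, c) for s in good for c in enrolled
        if s["name"] == c["studentname"]
    ]
    return [s["name"] for s, c in joined], len(joined)
```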
112
- Three Requirements
  - A Search Space of Plans
  - A Cost Model (for Plan evaluation)
  - An Enumeration Algorithm
- Ideally
  - Search Space: contains both good and efficient plans
  - Cost Models: cheap to compute and accurate
  - Enumeration Algorithm: efficient (not a monkey-typewriter algorithm)
- Example of a Search Space
  - See Previous Slide
- Examples of Cost Models
  - #(tuples) evaluation
  - #(main memory locations) etc.
- Example of an enumeration algorithm
  - Sequential enumeration of a lattice of plans
  - Dynamic Programming vs. Greedy Approaches
113
114
- Minimum: 0
- In-between: #(R) (if Y is a foreign key for R and a key for S)
- Maximum: #(R)#(S) (if Ys in R and S are all the same)
- Assumptions for Join Size Estimation
  - Containment of Value Sets
  - Preservation of Value Sets
- Containment of Value Sets
  - If V(R,Y) <= V(S,Y) then the Ys in R are a subset of the Ys in S
  - Satisfied when Y is a foreign key in R and a key in S
- Preservation of Value Sets
  - V(R Join S,X) = V(R,X)
  - V(R Join S,Z) = V(S,Z)
  - why is this reasonable?
115
- Every tuple in R has a chance of 1/V(S,Y) of joining with a tuple of S
- Every tuple in R has a chance of #(S)/V(S,Y) of joining with S
- All tuples in R have a chance of #(R)#(S)/V(S,Y) of joining with S
- What if V(S,Y) <= V(R,Y)
  - Answer: #(R)#(S)/V(R,Y)
  - In general: #(R)#(S)/(max (V(S,Y),V(R,Y)))
- What if there are multiple join attributes
  - Have a max factor in the denominator for each such attribute!
- How to Estimate #(R Join S Join T)?
  - Does it matter which we do first?
- Surprise!
  - Estimation formula preserves associativity of Joins!
  - In other words, it takes care of itself!
- Thus, for a Join attribute appearing > 2 times
  - 3 times: Use two highest values
  - 4 times: Use three highest values etc.
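A short sketch of the estimation formula #(R)#(S)/max(V(R,Y),V(S,Y)), with one max factor per shared attribute. The function name and the dict-based representation of V values are assumptions made for illustration; the numbers in the usage note are likewise just sample inputs.

```python
def estimate_join_size(size_r, size_s, distinct_r, distinct_s):
    """Estimate #(R Join S).

    size_r, size_s: tuple counts #(R) and #(S).
    distinct_r, distinct_s: dicts mapping attribute name to V(R, attr)
    and V(S, attr). Attributes present in both dicts are join attributes.
    """
    estimate = size_r * size_s
    for attr in distinct_r.keys() & distinct_s.keys():
        # One max factor in the denominator per join attribute.
        estimate /= max(distinct_r[attr], distinct_s[attr])
    return estimate
```

For instance, with #(R) = #(S) = 1000, V(R,b) = 200 and V(S,b) = 100, the call estimate_join_size(1000, 1000, {"a": 100, "b": 200}, {"b": 100, "c": 500}) divides 1,000,000 by max(200, 100). With no shared attribute the loop does nothing and the estimate is the full cross product.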
116
- Which is better: (R Join S) or (S Join R)
  - Good to put the smaller relation on the left
  - Why? Most Join algorithms are asymmetric
- Example:
  - Construct a good query tree for the following
- Arises from the shape of the trees: T(n)
- Arises from permuting the leaves: n!
- Total choices: n! T(n)
117
What is T(n)?
- A formula
  - T(1) = 1 (Basis)
  - T(n) = T(1)T(n-1) + T(2)T(n-2) + ..... + T(n-1)T(1)
- Sample Values
  - 1: 1
  - 2: 1
  - 3: 2
  - 4: 5
  - 5: 14
- Classifications
  - Left-Deep Trees: All right children are leaves
  - Right-Deep Trees: All left children are leaves
  - Bushy Trees: Neither Left-Deep nor Right-Deep
- Choosing a Join Order: Restricted to Left-Deep Trees
  - By Dynamic Programming: O(n!)
  - Greedy Approach: Make local selections
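The recurrence for T(n) can be checked with a few lines of Python. This is only a sketch; the function names are made up, and join_orders simply multiplies T(n) by n! as in the total-choices count from the earlier slide.

```python
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def tree_shapes(n):
    """T(n): number of binary tree shapes with n leaves."""
    if n == 1:
        return 1  # basis: a single leaf
    # Split the n leaves into k on the left and n-k on the right.
    return sum(tree_shapes(k) * tree_shapes(n - k) for k in range(1, n))


def join_orders(n):
    """Total join trees for n relations: n! leaf orders times T(n) shapes."""
    return factorial(n) * tree_shapes(n)
```

Evaluating tree_shapes for n = 1..5 reproduces the sample values listed above.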
118
Example
- Consider
  - R(a,b): #(R) = 1000, V(R,a) = 100, V(R,b) = 200
  - S(b,c): #(S) = 1000, V(S,b) = 100, V(S,c) = 500
  - T(c,d): #(T) = 1000, V(T,c) = 20, V(T,d) = 50
- Possible Join Orders
  - (R Join S) Join T
  - (S Join R) Join T (same as above; why?)
  - (R Join T) Join S
  - (T Join R) Join S (same as above)
  - (S Join T) Join R
  - (T Join S) Join R (same as above)
- Cost Estimation = Sizes of Intermediate Relations
  - (R Join S) Join T: 5000
  - (R Join T) Join S: 1000000
  - (S Join T) Join R: 2000
- Best Plan = (S Join T) Join R
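The intermediate sizes in this example can be recomputed with a short Python sketch. The dictionary layout and function names are assumptions for illustration; the statistics are the ones given for R, S and T, and the cost of a plan is taken to be the estimated size of its first join.

```python
# Statistics from the example: tuple counts and V(relation, attribute).
RELATIONS = {
    "R": {"size": 1000, "V": {"a": 100, "b": 200}},
    "S": {"size": 1000, "V": {"b": 100, "c": 500}},
    "T": {"size": 1000, "V": {"c": 20, "d": 50}},
}


def intermediate_size(first, second):
    """Estimated size of (first Join second)."""
    left, right = RELATIONS[first], RELATIONS[second]
    size = left["size"] * right["size"]
    # Divide by the larger V for each shared attribute; with no shared
    # attribute the join degenerates to a cross product.
    for attr in left["V"].keys() & right["V"].keys():
        size //= max(left["V"][attr], right["V"][attr])
    return size


def best_first_join():
    """Pick the pair whose join yields the smallest intermediate relation."""
    pairs = [("R", "S"), ("R", "T"), ("S", "T")]
    costs = {pair: intermediate_size(*pair) for pair in pairs}
    return min(costs, key=costs.get), costs
```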
119
- OLTP (Access small number of records)
- OLAP (Summarize from a large number of records)
- Sources of Poor Performance
  - Imprecise Data Searches
  - Random vs. Sequential Disk Accesses
  - Short Bursts of Database Interaction
  - Delays due to Multiple Transactions
- What can be done?
  - Tune Hardware Architecture
  - Tune OS
  - Tune Data Structures and Indices
120
- To normalize or not to
  - Sacrificing Redundancy Elimination
  - Sacrificing Dependency Elimination
- Several Choices of Normalized Schemas
  - Vertical Partitioning Applications
- Recomputing Indices
  - Histograms etc. might be outdated
- Restricting Uses of Subqueries
  - Unnesting query blocks by Joins
- Declining the Use of Indices
  - Table Scans for Small Tables
  - Rule-based optimization: Rewrite A=6 as A+0=6
- Provide Redundant Tables
  - Decision-Support/Data Mining Queries