
Chapter III Object-Oriented Database Systems

Nguyen Kim Anh Dept. of Information Systems, SoICT, HUT

Outline
Object-Oriented Data Model
Object-Oriented Database System (OODBS)
Object-Oriented Data Definition Language
Object-Oriented Query Language
Index organizations for OODBS
Query Optimization in OODBS

Object-Oriented Data Model


Definition of an object
Object Identity
Object Structure
Object-Oriented Concepts
Graphical representation of a complex object
Comparisons of the states of two objects for equality
Class Schema

Definition of an object
Objects: user-defined complex data types
An object has structure or state (variables) and methods (behavior/operations)

An object is described by four characteristics


Identifier: a system-wide unique id for an object
Name: an object may also have a unique name in the DB (optional)
Lifetime: determines whether the object is persistent or transient
Structure: construction of objects using type constructors

Object Identity
Each independent object stored in the database has a unique identity, implemented as a unique, system-generated object identifier (OID).

Object Identity
properties of OID
immutable: the OID value of a particular object should not change
The OID should not depend on the physical address of the object or on any of its attribute values.

Each OID is used only once.


Most OO database systems allow for the representation of both objects and values (having no OIDs)

Object Structure
The state (current value) of a complex object may be constructed from other objects (or other values) by using certain type constructors.
An object can be represented by a triple (i, c, v):
i is a unique id
c is a type constructor
v is the object state


Type constructors

Basic types: atom (integer, real, string, Boolean, ...)
Structured type: tuple
Collection types: array vs. list (ordered), set vs. bag (unordered)

Object Structure
Object states
c = atom ::: an atomic value from the domain
c = set ::: a set of object identifiers {i1, i2, ..., in}
c = tuple ::: a tuple <a1:i1, a2:i2, ..., an:in>
c = list ::: an ordered list [i1, i2, ..., in]
c = array ::: a single-dimensional array of object identifiers
c = bag ::: an unordered collection of object identifiers that may contain duplicates
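
The (i, c, v) representation can be sketched in Python as follows. This is only an illustrative model, not how any particular OODBS stores objects: the names DBObject, store, and dereference are hypothetical, and the atomic values placed in the store are sample data.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DBObject:
    oid: str          # i: system-generated object identifier
    constructor: str  # c: atom, tuple, set, list, array, or bag
    state: Any        # v: an atomic value or a structure of OIDs


# Hypothetical in-memory object store keyed by OID (sample data only).
store = {
    "i4": DBObject("i4", "atom", 5),
    "i5": DBObject("i5", "atom", "Research"),
    "i8": DBObject("i8", "tuple", {"NAME": "i5", "NUMBER": "i4"}),
}


def dereference(oid, attribute):
    """Follow one attribute of a tuple object and return the referenced object."""
    obj = store[oid]
    if obj.constructor != "tuple":
        raise TypeError(f"{oid} is not a tuple object")
    return store[obj.state[attribute]]
```

The tuple object holds only OIDs, so the atomic values are reached by dereferencing, which mirrors the way complex objects are built from other objects.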

Object-Oriented Concepts

Abstract Data Types: class definition provides extension to complex attribute types
Encapsulation: implementation of operations and object structure hidden
Inheritance: sharing of data within hierarchy scope, supports code reusability
Polymorphism: operator overloading

Example 1: Complex Object


o8 = (i8, tuple, <NAME:i5, NUMBER:i4, MANAGER:i9, LOCATIONS:i7, MEMBERS:i10, CONTROL:i11>) ::: department 5
o9 = (i9, tuple, <MANAGER:i12, MANAGER_START_DATE:i6>)
o10 = (i10, set, {i12, i13, i14})
o11 = (i11, set, {i15, i16, i17})

Graphical representation of a complex object
[Figure: object graph of o8 (i8, tuple), the object instance of department type. Its attributes NAME, NUMBER, MANAGER, LOCATIONS, MEMBERS, and CONTROL lead to o5 (atom), o4 (atom), o9 (tuple), o7 (set), o10 (set), and o11 (set); o9 has the attributes MANAGER and MANAGER_START_DATE. Atomic values shown include Research, Houston, Bellaire, Sugarland, and 1988-05-22. Legend: object, tuple, set.]

Comparing the states (current values) of two objects for equality


identical states (deep equality)
the graphs representing their states are identical in every respect, including the OIDs at every level

equal states (shallow equality)


the graph structures must be the same
all the corresponding atomic values in the graphs should be the same
some corresponding internal nodes in the two graphs may have objects with different OIDs

Example 2: Identical vs. Equal Object States


o1 = (i1, tuple, <a1:i4, a2:i6>)
o2 = (i2, tuple, <a1:i5, a2:i6>)
o3 = (i3, tuple, <a1:i4, a2:i6>)
o4 = (i4, atom, 10)
o5 = (i5, atom, 10)
o6 = (i6, atom, 20)
o1 and o2 have equal states
o1 and o3 have identical states
o4 and o5 have identical states, but as objects o4 and o5 are equal, not identical, because their OIDs differ
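
The two comparisons can be checked mechanically on the objects of Example 2. The sketch below is a minimal illustration that handles only atom and tuple constructors and assumes an acyclic object graph; the function names identical_states and equal_states are hypothetical.

```python
# OID -> (type constructor, state), taken from Example 2.
objects = {
    "i1": ("tuple", {"a1": "i4", "a2": "i6"}),
    "i2": ("tuple", {"a1": "i5", "a2": "i6"}),
    "i3": ("tuple", {"a1": "i4", "a2": "i6"}),
    "i4": ("atom", 10),
    "i5": ("atom", 10),
    "i6": ("atom", 20),
}


def identical_states(a, b):
    """Identical states: same constructor and same state, OIDs included.

    If the OIDs referenced at the first level match, every deeper level
    matches too, because the same objects are referenced.
    """
    return objects[a] == objects[b]


def equal_states(a, b):
    """Equal states: same graph structure and atomic values; internal OIDs may differ."""
    ca, va = objects[a]
    cb, vb = objects[b]
    if ca != cb:
        return False
    if ca == "atom":
        return va == vb
    if ca == "tuple":
        return va.keys() == vb.keys() and all(
            equal_states(va[k], vb[k]) for k in va
        )
    raise NotImplementedError(f"constructor {ca!r} is not covered by this sketch")
```

Here identical_states("i1", "i2") is false because a1 refers to i4 in one object and to i5 in the other, while equal_states("i1", "i2") is true because both references lead to the atomic value 10.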

Class Schema


Object-Oriented Database System (OODBS)


A database system that incorporates all the important object-oriented concepts
Some additional features:
Unique object identifiers
Persistent object handling

Advantages of OODBS
The designer can specify the structure of objects and their behavior (methods)
Better interaction with object-oriented languages such as Java and C++
Definition of complex and user-defined types
Encapsulation of operations and user-defined methods


Object-Oriented Data Definition Language(OODDL)


Using OODDL to define Employee, Date, and Department types
define type Employee: tuple (
    name: string;
    birthday: date;
    address: string;
    sex: string;
    salary: int;
    workfor: Department;
    supervisor: Employee;
    supervisee: set(Employee);
    manage: Department;
    workon: set(Project); );
Attributes that refer to Employee, Department, and Project objects represent relationships among objects.

Using OODDL to define Employee, Date, and Department types (Cont.)


Inverse references: the department of an employee <-> the employees of a department

define type Date: tuple (
    year: integer;
    month: integer;
    day: integer; );

define type Department: tuple (
    name: string;
    number: integer;
    manager: tuple ( manager: Employee;
                     startdate: Date; );
    locations: set (string);
    members: set (Employee);
    control: set (Project); );
members and control: sets of references

Specifying Object Behavior via Class Operations


In the relational model, selecting, inserting, deleting, and modifying tuples are generic operations.

Define the behavior of a type of object based on the operations that can be externally applied to objects of that type:
create (insert) or destroy (delete) objects
update the object state
retrieve parts of the object state
apply some calculations
a combination of retrieval, calculation, and update

Specifying Object Behavior via Class Operations (Continued)


Interface (signature): defines the name and arguments (parameters) of each operation; included in the class definition
Implementation (method): defined using a programming language; invoked by sending a message to the object to execute the corresponding method
Operations:
1. object constructors
2. object destructor
3. object modifier
4. retrieval

Using OODDL to define Employee and Department classes


define class Employee:
    type tuple ( name: string;                  (* type definition *)
                 birthday: date;
                 address: string;
                 sex: string;
                 salary: int;
                 workfor: Department;
                 supervisor: Employee;
                 supervisee: set(Employee);
                 manage: Department;
                 workon: set(Project); );
    operations age: integer;                    (* definition of operations *)
               create_emp: Employee;
               destroy_emp: boolean;
end Employee;

Using OODDL to define Employee and Department classes (Continued)


define class Department:
    type tuple ( name: string;                  (* type definition *)
                 number: integer;
                 manager: tuple ( manager: Employee;
                                  startdate: Date; );
                 locations: set (string);
                 members: set (Employee);
                 control: set (Project); );
    operations number_of_emps: integer;         (* definition of operations *)
               create_dept: Department;
               destroy_dept: boolean;
               assign_emp (e: Employee): boolean;
               (* adds a new employee to the department *)
               remove_emp (e: Employee): boolean;
               (* removes an employee from the department *)
end Department;
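
For readers more familiar with a programming-language view, the Department class above might look roughly like the following Python sketch. It is an analogy only: the attribute subset, the use of a list for the members set, and the choice to return False when an operation has no effect are assumptions, not part of the OODDL definition.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: objects compare by identity, like OIDs
class Employee:
    name: str


@dataclass(eq=False)
class Department:
    name: str
    number: int
    members: list = field(default_factory=list)

    def number_of_emps(self) -> int:
        """Retrieval operation: count the members of the department."""
        return len(self.members)

    def assign_emp(self, e: Employee) -> bool:
        """Modifier: add an employee; report False if already a member."""
        if e in self.members:
            return False
        self.members.append(e)
        return True

    def remove_emp(self, e: Employee) -> bool:
        """Modifier: remove an employee; report False if not a member."""
        if e not in self.members:
            return False
        self.members.remove(e)
        return True
```

Callers use only the operations (the interface); the list that stores the members stays an implementation detail, which is the encapsulation the class definition is meant to provide.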

Class Operations
object constructor: creates a new object
destructor: destroys an object
object modifier: modifies various attributes of an object
dot notation: d.number_of_emps, where d is a reference to a department object and number_of_emps is an operation
dot notation also refers to attributes of an object: d.number, d.manager.startdate

Specifying Object Persistence via Naming and Reachability


transient object
exist in the executing program and disappear once the program terminates

persistent object
stored in the database and persist after program termination

naming mechanism
gives an object a unique persistent name through which it can be retrieved by this and other programs

Reachability
reachability mechanism
make the object reachable from some persistent object
an object B is said to be reachable from an object A if a sequence of references in the object graph leads from object A to object B
e.g., if o8 is persistent, then all objects reachable from o8 also become persistent (see the complex-object graph that follows)
to define a persistent collection N of objects of class C: create a named persistent object N, whose state is a set or list of objects of class C, then add objects of C to the set or list to make them reachable from N

Graphical representation of a complex object
[Figure: the object graph of department object o8 from Example 1, repeated to illustrate reachability: o4, o5, o7, o9, o10, o11 and the objects they reference are all reachable from o8.]

Creating persistent objects by naming and reachability


define class DepartmentSet:
    type set (Department);
    operations add_dept (d: Department): boolean;
               remove_dept (d: Department): boolean;
               create_dept_set: DepartmentSet;
               destroy_dept_set: boolean;
end DepartmentSet;
.....
persistent name AllDepartments: DepartmentSet;
(* AllDepartments is a persistent named object of type DepartmentSet *)
.....
d := create_dept;
(* creates a new department object in the variable d *)
.....
b := AllDepartments.add_dept (d);
(* makes d persistent by adding it to the persistent named object AllDepartments *)

The AllDepartments object is the extent of the class Department.
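
Persistence by reachability amounts to a graph traversal from the named persistent objects. The sketch below assumes the object graph is available as a dictionary from an OID (or persistent name) to the OIDs it references; the function name reachable_from and the transient object i99 are hypothetical.

```python
def reachable_from(root, references):
    """Return every object reachable from root by following references."""
    seen = set()
    stack = [root]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(references.get(oid, ()))
    return seen


# References of the department example; AllDepartments is the named root.
references = {
    "AllDepartments": ["i8"],
    "i8": ["i5", "i4", "i9", "i7", "i10", "i11"],
    "i9": ["i12", "i6"],
    "i10": ["i12", "i13", "i14"],
    "i11": ["i15", "i16", "i17"],
    "i99": ["i4"],  # hypothetical object that nothing persistent refers to
}

persistent = reachable_from("AllDepartments", references)
```

Everything in persistent would survive program termination under this rule, while i99 stays transient because no chain of references leads to it from a persistent object.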

Differences between traditional databases and OO databases


traditional database models
when an entity type or class is defined in EER, it represents both type declaration and persistent set

OO approaches
a class declaration specifies only the type and operations for a class of objects
the user must separately define a persistent object whose value is the collection of references to all persistent objects of that class


Type Hierarchies and Inheritance


type (or class) hierarchy
define new types based on other predefined types (or classes)
a type is defined by a type name and a number of functions:
attributes (instance variables): functions with zero arguments
operations (methods)

TYPE_NAME: function, function, ..., function
PERSON: Name, Address, Birthdate, Age, SSN
EMPLOYEE subtype-of PERSON: Salary, HireDate, Seniority
STUDENT subtype-of PERSON: Major, GPA

Inheritance
multiple inheritance
when T is a subtype of two (or more) types, T inherits the functions (attributes and methods) of both supertypes
this gives a type lattice instead of a type hierarchy
if a function is inherited from some common supertype, it is inherited only once
ambiguity resolution: alert the user, apply a system default, or disallow multiple inheritance

Inheritance (Continued)
Selective Inheritance
a subtype inherits only some of the functions of a supertype
an EXCEPT clause may be used to list the functions in a supertype that are not to be inherited by the subtype
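
The inheritance rules above can be illustrated by computing the full function list of a type from a small type lattice. This is a sketch under stated assumptions: the STUDENT_EMPLOYEE type is hypothetical and added only to show multiple inheritance, and the excluded parameter stands in for an EXCEPT clause.

```python
# Type lattice from the PERSON example, plus one hypothetical type.
TYPES = {
    "PERSON": {"supertypes": [],
               "functions": ["Name", "Address", "Birthdate", "Age", "SSN"]},
    "EMPLOYEE": {"supertypes": ["PERSON"],
                 "functions": ["Salary", "HireDate", "Seniority"]},
    "STUDENT": {"supertypes": ["PERSON"],
                "functions": ["Major", "GPA"]},
    # Hypothetical subtype of both EMPLOYEE and STUDENT.
    "STUDENT_EMPLOYEE": {"supertypes": ["EMPLOYEE", "STUDENT"],
                         "functions": []},
}


def all_functions(type_name, excluded=()):
    """Own plus inherited functions; a shared function is inherited only once.

    excluded lists inherited functions to drop (selective inheritance).
    """
    result = []
    for supertype in TYPES[type_name]["supertypes"]:
        for f in all_functions(supertype):
            if f not in result and f not in excluded:
                result.append(f)
    for f in TYPES[type_name]["functions"]:
        if f not in result:
            result.append(f)
    return result
```

For STUDENT_EMPLOYEE the functions of PERSON arrive through both supertypes but appear once in the result, which is the "inherited only once" rule for a common supertype.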


Object-Oriented Query Language


Declarative query language
Not computationally complete

Syntax based on SQL (select, from, where)
Additional flexibility (queries with user-defined operators and types)

SQL3 Object-oriented SQL


Foundation for several OO database management systems (ORACLE8, DB2, etc.)
New features: relational and object-oriented
Relational features: new data types, new predicates, enhanced semantics, additional security, and an active database
Object-oriented features: support for functions and procedures
Set-oriented query language


Object Query Language (OQL)


Syntax based on SQL (select, from, where) :
select <structured query result>
from <class [class variable]> [, <path> ...]
where <path expressions>

Path-oriented query language


Path: C1.A1.A2. ... .An-1.An, where each attribute Ai of class Ci (1 <= i < n) refers to class Ci+1 (A1 -> C2, A2 -> C3, ..., An-1 -> Cn)
Path expression: C1.A1.A2. ... .An-1.An = v
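
Evaluating a path expression means following the attributes one by one from an object of class C1 and comparing what is reached with v. The sketch below assumes objects are plain Python objects with attributes; follow_path and satisfies are hypothetical names, and the employee and department values are sample data.

```python
from types import SimpleNamespace


def follow_path(obj, attributes):
    """Yield every value reached from obj along the attribute path."""
    if not attributes:
        yield obj
        return
    value = getattr(obj, attributes[0])
    # A collection-valued attribute fans out into several branches.
    if isinstance(value, (set, frozenset, list, tuple)):
        branches = value
    else:
        branches = [value]
    for nxt in branches:
        yield from follow_path(nxt, attributes[1:])


def satisfies(obj, attributes, v):
    """True if the path expression obj.A1.A2...An = v holds for some branch."""
    return any(x == v for x in follow_path(obj, attributes))


# Sample data: one employee working for a department named "CS".
dept = SimpleNamespace(name="CS", locations=["Houston", "Bellaire"])
emp = SimpleNamespace(name="e1", workfor=dept)
```

With this data, satisfies(emp, ["workfor", "name"], "CS") corresponds to the path expression employee.workfor.name = "CS".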

Example of OQL query


The following is a sample query: "What are the names of the black products?"
select distinct p.name
from products p
where p.color = "black"
Valid in both SQL and OQL, but the results are different.

Result of the query (SQL)


Original table
Product no    Name             Color
P1            Ford Mustang     Black
P2            Toyota Celica    Green
P3            Mercedes SLK     Black

Result
Name
Ford Mustang
Mercedes SLK

The statement queries a relational database. => Returns a table with rows.


Result of the query (OQL)


Original table
Product no    Name             Color
P1            Ford Mustang     Black
P2            Toyota Celica    Green
P3            Mercedes SLK     Black

Result
String: Ford Mustang
String: Mercedes SLK

The statement queries an object-oriented database. => Returns a collection of objects.

Comparison
Queries look very similar in SQL and OQL; sometimes they are identical. The results they return, however, are very different.
Query returns:
OQL: an object or a collection of objects
SQL: a tuple or a table
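
The object-oriented reading of the sample query can be mimicked in Python, where the result is a collection of string objects rather than a one-column table. This is an analogy, not OQL itself; the Product class is a hypothetical stand-in for the products extent, and the color comparison is made case-insensitive because the sample data stores "Black".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    product_no: str
    name: str
    color: str


products = [
    Product("P1", "Ford Mustang", "Black"),
    Product("P2", "Toyota Celica", "Green"),
    Product("P3", "Mercedes SLK", "Black"),
]

# select distinct p.name from products p where p.color = "black"
# A set comprehension plays the role of "select distinct".
result = {p.name for p in products if p.color.lower() == "black"}
```

The value of result is a set of two strings, which matches the "collection of objects" described for OQL.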


Index organizations for OODBS


Path index (PX):
a path $P = C_1.A_1.A_2.\cdots.A_{n-1}.A_n$
a path index (PX) on $P$ with $C_i$, $1 \le i \le n$:
$\{(v, S) \mid v \in DOM(A_n) \text{ and } S = \{O_i.O_{i+1}.\cdots.O_n \mid O_1.O_2.\cdots.O_n.v \text{ is an instantiation of } P\}\}$

Index organizations for OODBS


Nested index (NX):
a path $P = C_1.A_1.A_2.\cdots.A_{n-1}.A_n$
a nested index (NX) on $P$:
$\{(v, S) \mid v \in DOM(A_n) \text{ and } S = \{O \mid O_1.O_2.\cdots.O_n.v \text{ is an instantiation of } P,\ O_i = O,\ 1 \le i \le n\}\}$
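
The difference between the path index and the nested index is easiest to see by building both from the same path instantiations. The sketch below assumes each instantiation is given as a tuple (O1, ..., On, v) and reads both definitions as being built for one chosen position i; the function names and the sample OIDs are hypothetical.

```python
from collections import defaultdict


def path_index(instantiations, i=1):
    """PX: map each value v to the set of sub-instantiations (O_i, ..., O_n)."""
    index = defaultdict(set)
    for *oids, v in instantiations:
        index[v].add(tuple(oids[i - 1:]))
    return dict(index)


def nested_index(instantiations, i=1):
    """NX: map each value v to the set of OIDs found at position i."""
    index = defaultdict(set)
    for *oids, v in instantiations:
        index[v].add(oids[i - 1])
    return dict(index)


# Sample instantiations of the path Employee.workfor.name (n = 2).
instantiations = [
    ("e1", "d1", "CS"),
    ("e2", "d1", "CS"),
    ("e3", "d2", "Math"),
]

px = path_index(instantiations)
nx = nested_index(instantiations)
```

The path index keeps the whole chain of OIDs for each value, so predicates on intermediate classes can be answered from it, while the nested index keeps only the OIDs of one class and is therefore smaller.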

Index organizations for OODBS


Multi-index (MX):
a path $P = C_1.A_1.A_2.\cdots.A_{n-1}.A_n$
a multi-index (MX) on $P$: $\bigcup_{1 \le i \le n} \{I_{i,1}, I_{i,2}, \ldots, I_{i,n_i}\}$, where $I_{i,j}$, $1 \le i \le n$, $1 \le j \le n_i$, is a single index on path $C_{i,j}.A_i$ and $n_i$ is the number of subclasses in the hierarchy rooted at $C_i$
a single index for $C_{i,j}.A_i$ is $\{(O, S) \mid O \in DOM(A_i) \text{ and } S = \{O' \mid O'.A_i = O\}\}$
Indexes $I_{i,j}$, $1 \le i < n$, have OIDs as key values and are called identity indexes
Indexes $I_{n,j}$ are called equality indexes
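
A lookup through a multi-index proceeds backwards along the path: the equality index on An turns the value into OIDs, and each identity index then turns those OIDs into the OIDs of the preceding class. The sketch below simplifies the definition by keeping one index per path position (no subclasses); lookup_multi_index and the sample indexes are hypothetical.

```python
def lookup_multi_index(indexes, v):
    """Resolve a value v to the OIDs of C1 by chaining single indexes.

    indexes[k] is the single index on attribute A_(k+1): it maps a key
    (a value or an OID) to the set of OIDs whose attribute equals that key.
    """
    keys = {v}
    for index in reversed(indexes):
        keys = set().union(*(index.get(k, set()) for k in keys))
    return keys


# Path Employee.workfor.name: an identity index, then an equality index.
indexes = [
    {"d1": {"e1", "e2"}, "d2": {"e3"}},  # on Employee.workfor (OID keys)
    {"CS": {"d1"}, "Math": {"d2"}},      # on Department.name (value keys)
]
```

Each step is one ordinary index lookup, so n lookups are needed for a path of length n, compared with the single lookup of a path or nested index.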


Index organizations for OODBS


Inherited multi-index (IMX):
a path $P = C_1.A_1.A_2.\cdots.A_{n-1}.A_n$
an inherited multi-index (IMX) on $P$: $\bigcup_{1 \le i \le n} \{I_i\}$, where $I_i$ is a class-hierarchy index on path $C_i.A_i$.
a class-hierarchy index associates with each value of an attribute $A_i$ the OIDs of instances of a class $C_i$ and of all its subclasses.
an inherited multi-index differs from the multi-index in that it maintains a single index for all classes belonging to the same inheritance hierarchy.
this technique always requires a number of indexes equal to the path length.


Query Optimization in OODBS


Algebraic transformation-based query optimization
Graph-based query optimization (using path indexes)
Method materialisation


Algebraic Transformation-based query optimization


The object algebra is a many-sorted algebra.
Algebraic operators are defined for the various kinds of value sets.
Operators can be classified as constructors, projection operators (which access components of a complex value), selection, and iteration.

Object algebra

Algebraic optimization rules


Algebraic optimization rules:
are valid for the defined operators and represent semantically equivalent query transformations.
allow algebraic expressions to be transformed into semantically equivalent, but more efficiently executable ones.


Example of algebraic query


The following is a sample OQL query: "What are the names of the employees who work for the CS department?"
select distinct p.name
from employee p
where p.workfor.name = "CS"
Algebraic query:
iS[S.name.v(s)](P[Pname(V(D(workfor(V(p)))))=CS](employee))

Graph-based query optimization using path indexes


Access Path Selection
Generalized Index Intersection
Query Graph Reductions
Generation of Least-Cost Evaluation Plan

Access Path Selection


Eligible indexes for Q, denoted by EI(Q), are the indexes that are useful in query processing.
The eligible indexes for a condition that compares pathi with a value are the indexes constructed on any subpath of pathi.
Predicates that can be processed by indexes are called index-processible predicates (IP); predicates that cannot are called residual predicates (RP).


Access Path Selection


Query Graph
ai/j: the link (i.e., the attribute) that connects the classes Ci and Cj
----: the path index constructed on the corresponding path expression

Access Path Selection


The problem of determining eligible indexes during query optimization has exponential time complexity.
Use a simple index selection heuristic:
select all eligible indexes and pointers
take full advantage of the path indexes; their benefit is not compromised by the proposed index selection heuristic.

Generalized Index Intersection (for simple indexes)


Generalized Index Intersection (for path indexes)

Query Graph Reductions


Objective of Reductions:
determine the classes that are replaced by the index scans and remove them from the query graph.
use a Higraph to model the process of query graph reduction.
A Higraph has one extra element, called a supernode, that contains one or more subnodes (classes).

Query Graph Reductions


The query graph reduction algorithm consists of the following three steps:
1. For query graph QG, determine the set of eligible indexes EI(QG).
2. For each IDX(pathi) in EI(QG):
1) remove all primitive classes and edges in pathi.
2) create a new supernode that contains all user-defined classes in pathi; the supernode denotes OID tuples of its subnodes that satisfy the predicates matched with IDX(pathi). (Note: the user-defined classes on pathi are not removed, since residual predicates may exist for them.)
3. If two supernodes (relations) T1 and T2 have a common subnode, perform a natural join on them. The join result is denoted by another supernode T12, and the nodes T1 and T2 are removed. Repeat this step until no two supernodes in the query graph share a subnode.
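
Step 3 of the algorithm can be sketched as repeated natural joins over supernodes. The representation is an assumption made for illustration: each supernode is a dictionary with a set of subnode class names and a list of OID tuples stored as dictionaries, and all names and sample rows are hypothetical.

```python
def natural_join(t1, t2):
    """Join two supernodes on the subnodes (classes) they have in common."""
    common = t1["classes"] & t2["classes"]
    rows = [
        {**r1, **r2}
        for r1 in t1["rows"]
        for r2 in t2["rows"]
        if all(r1[c] == r2[c] for c in common)
    ]
    return {"classes": t1["classes"] | t2["classes"], "rows": rows}


def merge_supernodes(supernodes):
    """Join supernodes pairwise until no two of them share a subnode."""
    nodes = list(supernodes)
    merged = True
    while merged:
        merged = False
        for a in range(len(nodes)):
            for b in range(a + 1, len(nodes)):
                if nodes[a]["classes"] & nodes[b]["classes"]:
                    joined = natural_join(nodes[a], nodes[b])
                    nodes = [n for k, n in enumerate(nodes) if k not in (a, b)]
                    nodes.append(joined)
                    merged = True
                    break
            if merged:
                break
    return nodes


# Sample supernodes: T1 and T2 share the subnode Department, T3 is isolated.
T1 = {"classes": {"Employee", "Department"},
      "rows": [{"Employee": "e1", "Department": "d1"},
               {"Employee": "e2", "Department": "d2"}]}
T2 = {"classes": {"Department", "Project"},
      "rows": [{"Department": "d1", "Project": "p1"}]}
T3 = {"classes": {"Vehicle"}, "rows": [{"Vehicle": "v1"}]}

reduced = merge_supernodes([T1, T2, T3])
```

After the merge, T1 and T2 are replaced by one supernode T12 over Employee, Department, and Project, and T3 is left untouched because it shares no subnode with the others.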


Generation of Least-Cost Evaluation Plan


The search algorithm generates all possible join orders (or alternative plans) from the reduced query graph (RQG), estimates the evaluation cost of each join order, and finally chooses the least-cost join order based on the cost model.
(1) Generation of Search Tree
(2) Cost Estimation and Least-Cost Evaluation Plan Generation

Generation of Least-Cost Evaluation Plan


Generation of Search Tree


Generation of Least-Cost Evaluation Plan


The joins of the branch $\langle C_1, C_2, \ldots, C_n \rangle$ can be processed by a sequence of binary joins.
The cost formula for the binary join of $C_i$ and $C_{i+1}$ (using the pointer-based sort-merge join algorithm):
$cost(C_i \; JN_{a_i} \; C_{i+1}) = cost(C_i) + sort(C_i, a_i) + cost(C_{i+1})$
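
The cost formula and the choice of the least-cost join order can be put together in a short sketch. Two things here are assumptions rather than part of the cost model above: the cost of a branch is taken to be the plain sum of its binary join costs, and all cost figures, as well as the inverse attributes controlled_by and members used by the second plan, are invented sample values.

```python
def binary_join_cost(ci, ai, cj, cost, sort):
    """cost(Ci JN_ai Cj) = cost(Ci) + sort(Ci, ai) + cost(Cj)."""
    return cost[ci] + sort[(ci, ai)] + cost[cj]


def branch_cost(classes, links, cost, sort):
    """Assumed total for a branch: the sum of its binary join costs."""
    return sum(
        binary_join_cost(classes[k], links[k], classes[k + 1], cost, sort)
        for k in range(len(links))
    )


# Sample access costs per class and sort costs per (class, attribute).
cost = {"Employee": 100, "Department": 10, "Project": 40}
sort = {
    ("Employee", "workfor"): 50,
    ("Department", "control"): 5,
    ("Project", "controlled_by"): 20,
    ("Department", "members"): 5,
}

# Two alternative join orders over the same classes.
plans = [
    (["Employee", "Department", "Project"], ["workfor", "control"]),
    (["Project", "Department", "Employee"], ["controlled_by", "members"]),
]

best = min(plans, key=lambda p: branch_cost(p[0], p[1], cost, sort))
```

With these sample figures the first order costs 215 and the second 185, so the search would keep the second join order.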

Cost Estimation

Method Materialisation
Method materialisation consists of the following:
compute the result of a method once,
store the method's result persistently in the database,
use the persistent result value when the method is invoked,
maintain the materialised results: update the values of materialised methods when the objects used for computing them (the base objects) change.

Method Materialisation
reduces an application's response time for accessing a method's result, especially when the method takes a long time to execute.
adds the cost of maintaining the materialised methods.
to improve a system's performance, only the right set of methods should be materialised.
method materialisation (precomputation, caching) was proposed in the context of indexing techniques and query optimisation.


Method Materialisation
Two important issues arise for method materialisation :
(1) what technique to use for method materialisation, and
(2) which methods to materialise?

use the dynamic hierarchical method materialisation technique:


if the method mi is materialised, then the other methods called by mi are materialised as well.
the system decides whether or not to materialise a given method based on the gathered statistics (method reads and updates of base objects).

Method Materialisation
Storage Structures
Materialised Methods Dictionary (MMD) contains information about all methods:
a method name and class,
the array of input arguments,
a method return type,
a method implementation, and
a flag indicating whether the method was materialised.

Method Materialisation
Storage Structures
Materialised Method Results Structure (MMRS) stores the following information about every materialised method:
(1) the identifier of the method,
(2) the identifier of the object the method was invoked for,
(3) the array of input argument values the method was invoked with,
(4) the value returned by the method when executed for the given object and the given array of input argument values.

When a materialised method mi is invoked, MMRS is searched for the result of mi. If the result is not found, the value of mi is computed and stored in MMRS. When an object used to compute the materialised value of mi is updated or deleted, the materialised value becomes invalid and is removed from MMRS.
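
The read, compute-on-miss, and invalidate behaviour of MMRS resembles a keyed cache. The sketch below is an in-memory illustration only: the class layout, the method names get and invalidate, and the sample age method are hypothetical, and a real MMRS would be stored persistently.

```python
class MMRS:
    """Materialised Method Results Structure, sketched as a dictionary."""

    def __init__(self):
        # (method id, object id, input argument values) -> returned value
        self._results = {}

    def get(self, method_id, oid, args, compute):
        """Return the materialised value, computing and storing it on a miss."""
        key = (method_id, oid, tuple(args))
        if key not in self._results:
            self._results[key] = compute(oid, *args)
        return self._results[key]

    def invalidate(self, method_id, oid):
        """Remove all values of a method for an updated or deleted base object."""
        stale = [k for k in self._results if k[0] == method_id and k[1] == oid]
        for key in stale:
            del self._results[key]
        return len(stale)


# Sample method: the computation is recorded so repeated calls can be counted.
calls = []


def age(oid, year):
    calls.append(oid)
    return year - 1980  # stand-in for an expensive computation


mmrs = MMRS()
first = mmrs.get("age", "i12", [2024], age)
second = mmrs.get("age", "i12", [2024], age)
```

The second call is answered from the structure without running age again; after mmrs.invalidate("age", "i12") the next call recomputes the value.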


Method Materialisation
Storage Structures
The Graph of Method Calls (GMC) represents dependencies between methods, where one method calls another. GMC is used by the procedure that maintains the materialised results of methods.
GMC stores pairs of values:
the identifier of a calling method and the identifier of the method being called.
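
One way the maintenance procedure might use the GMC is to find every method whose materialised value depends on a changed method, by walking the call pairs from callee to caller. This is a sketch of that idea under the assumption that a method depends on all methods it calls directly or indirectly; the function name and the sample pairs are hypothetical.

```python
from collections import defaultdict


def methods_to_invalidate(gmc, changed):
    """Return changed plus every method that calls it directly or indirectly.

    gmc is an iterable of (calling method id, called method id) pairs.
    """
    callers = defaultdict(set)
    for caller, callee in gmc:
        callers[callee].add(caller)
    result = set()
    stack = [changed]
    while stack:
        method = stack.pop()
        if method in result:
            continue
        result.add(method)
        stack.extend(callers[method])
    return result


# Sample call pairs: m1 calls m2, m2 and m4 call m3, m5 calls m6.
gmc = [("m1", "m2"), ("m2", "m3"), ("m4", "m3"), ("m5", "m6")]
```

If the value of m3 becomes invalid, the values of m2, m4, and (through m2) m1 are affected as well, while m5 and m6 are not.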

Method Materialisation
Storage Structures
In order to invalidate dependent methods, the system must also be able to find inverse references in the object composition hierarchy. These references are maintained in a data structure called the Inverse References Index (IRI).

Method Materialisation
Storage Structures
Method Value Index (MVI) is an index defined on results of methods. Every method of a class has its own MVI. The index stores the following:
(1) the value of a method input argument,
(2) a method result, and
(3) the identifier of the object the method was invoked for.

By using this index, the system is able to quickly find answers to queries that use methods. The content of MVI is filled in with data when methods are materialised.


Dynamic Method Materialisation


The dynamic method materialisation technique consists of:
(1) gathering method usage statistics, and
(2) based on these statistics, finding the methods whose materialisation improves the system's performance and the methods whose materialisation degrades it.
A software module called the method analyser and optimiser monitors method access patterns, gathers execution statistics, and makes the final selection of methods for materialisation.

Dynamic Method Materialisation


Tuning of a system is performed in the following two steps.
Step 1:
select the set SM of methods for materialisation.
materialise the results of these methods on their first calls.
monitor the usage of the methods and gather execution statistics for the set of transactions using mi and its materialised values, called the batch transaction set.
The size of the batch transaction set is a parameter set by a system administrator.

Step 2:
identify the methods whose materialisation increases the system's performance
automatically dematerialise the methods whose materialisation degrades the system's performance

Dynamic Method Materialisation


Gathering method usage statistics
For a given method mi the execution statistics include:
method execution times and the number of disk accesses for every object and every set of input argument values,
the number of base object updates,
the number of reads of mi materialised values,
method invalidation times and the number of disk accesses for every object and every set of input argument values,
method recomputation times and the number of disk accesses for every object and every set of input argument values,
the time and the number of disk accesses required for finding an already materialised value.


Dynamic Method Materialisation


Selecting methods for materialisation
Cost Model
r: number of transactions reading the materialised value v of method mi.
u: number of transactions updating a base object of mi.
r + u: number of transactions in the batch transaction set.
tRMAT: time of reading a materialised value of mi using MMRS.
tEXEC: execution time of the non-materialised method mi.
tREMAT: time of rematerialising value v of mi after its base object was updated.

All the discussed times include I/O as well as CPU times.

Dynamic Method Materialisation


Selecting methods for materialisation
The materialisation of method mi will reduce query response time if the following holds:

The coefficient in this condition represents the factor by which the overall system response time is to be reduced. It takes its value from the range (0, 1) and is a tuning parameter set by an administrator.

Dynamic Method Materialisation


Selecting methods for materialisation
In the worst case, i.e. when all branches in the GMC have to be invalidated, the rematerialisation time (tREMAT) includes:
tINV: invalidation time of a materialised result
tEXEC: time of computing a method result
tWMAT: time of writing the materialised result to disk

Thus tREMAT can be expressed as follows:
tREMAT = tINV + tEXEC + tWMAT


Dynamic Method Materialisation


Selecting methods for materialisation
Formula 1 and Formula 2 together yield Formula 3, which relates the number of updates to the number of reads.

For a given method mi and a given batch transaction set, if the inequality in Formula 3 holds, then materialising mi increases the system's performance. Otherwise, mi has to be dematerialised.

Object Oriented Databases


Advantages
Good integration with Java, C++, etc.
Can store complex information
Fast to recover whole objects
Has the advantages of the (familiar) object paradigm

Disadvantages
There is no underlying theory to match the relational model
Can be more complex and less efficient
OODB queries tend to be procedural, unlike SQL

Object Relational Databases


Extend a RDBMS with object concepts
Data values can be objects of arbitrary complexity
These objects have inheritance, etc.
You can query the objects as well as the tables

An object relational database


Retains most of the structure of the relational model
Needs extensions to query languages (SQL or relational algebra)
