
Chapter 1 Database Systems



Discussion Focus
How often have your students heard that you have only one chance to make a good first impression? That's why it's so important to sell the importance of databases and the desirability of good database design during the first class session. Start by showing your students that they interact with databases on a daily basis. For example, how many of them have bought anything using a credit card during the past day, week, month, or year? None of those transactions would be possible without a database. How many have shipped a document or a package via an overnight service or via certified or registered mail? How many have checked course catalogs and class schedules online? And surely all of your students registered for your class? Did anybody use a web search engine to look for (and find) information about almost anything? This point is easy to make: Databases are important because we depend on their existence to perform countless transactions and to provide information. If you are teaching in a classroom equipped with computers, give some live performances. For example, you can use the web to look up a few insurance quotes or compare car prices and models. Incidentally, this is a good place to make the very important distinction between data and information. In short, spend some time discussing the points made in Section 1.1, "Why Databases?" and Section 1.2, "Data vs. Information."

After demonstrating that modern daily life is almost inconceivable without the ever-present databases, discuss how important it is that the (database) transactions are made successfully, accurately, and quickly. That part of the discussion points to the importance of database design, which is at the heart of this book. If you want to have the keys to the information kingdom, you'll want to know about database design and implementation. And, of course, databases don't manage themselves, and that point leads to the importance of the database administration (DBA) function. There is a world of exciting database employment opportunities out there. After discussing why databases, database design, and database administration are important, you can move through the remainder of the chapter to develop the necessary vocabulary and concepts. The review questions help you do that, and the problems provide the chance to test the newfound knowledge.


Answers to Review Questions


1. Discuss each of the following terms:
a. data
   Raw facts from which the required information is derived. Data have little meaning unless they are grouped in a logical manner.
b. field
   A character or a group of characters (numeric or alphanumeric) that describes a specific characteristic. A field may define a telephone number, a date, or other specific characteristics that the end user wants to keep track of.
c. record
   A logically connected set of one or more fields that describes a person, place, event, or thing. For example, a CUSTOMER record may be composed of the fields CUS_NUMBER, CUS_LNAME, CUS_FNAME, CUS_INITIAL, CUS_ADDRESS, CUS_CITY, CUS_STATE, CUS_ZIPCODE, CUS_AREACODE, and CUS_PHONE.
d. file
   Historically, a collection of file folders, properly tagged and kept in a filing cabinet. Although such manual files still exist, we more commonly think of a (computer) file as a collection of related records that contain information of interest to the end user. For example, a sales organization is likely to keep a file containing customer data. Keep in mind that the phrase related records reflects a relationship based on function. For example, customer data are kept in a file named CUSTOMER. The records in this customer file are related by the fact that they all pertain to customers. Similarly, a file named PRODUCT would contain records that describe products; the records in this file are all related by the fact that they all pertain to products. You would not expect to find customer data in a product file, or vice versa.

NOTE
Field, record, and file are computer terms, created to help describe how data are stored in secondary memory. Emphasize that computer file data storage does not match the human perception of such data storage.
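
If students need something concrete, the field/record/file hierarchy in Question 1 can be mirrored in a few lines of Python. This is a minimal illustration only: the class name CustomerRecord, the list name customer_file, and every sample value are hypothetical; only the ten CUSTOMER field names come from part (c).

```python
from typing import NamedTuple


# Hypothetical record type: each attribute is one FIELD of a CUSTOMER record.
class CustomerRecord(NamedTuple):
    cus_number: int
    cus_lname: str
    cus_fname: str
    cus_initial: str
    cus_address: str
    cus_city: str
    cus_state: str
    cus_zipcode: str   # kept as text: a zip code is not used in arithmetic
    cus_areacode: str
    cus_phone: str


# A FILE is a collection of related RECORDS. All records below describe
# customers, which is the functional relationship that groups them.
# (Sample values are invented for illustration.)
customer_file: list[CustomerRecord] = [
    CustomerRecord(10010, "Ramas", "Alfred", "A", "12 Elm St", "Nashville",
                   "TN", "37203", "615", "844-2573"),
    CustomerRecord(10011, "Dunne", "Leona", "K", "4 Oak Ave", "Memphis",
                   "TN", "38103", "713", "894-1238"),
]

print(len(CustomerRecord._fields), "fields per record")
print(len(customer_file), "records in the file")
for record in customer_file:
    print(record.cus_lname, record.cus_phone)  # access a field by its name
```

The point to draw out is that the record layout is defined once and every record in the file shares it; the NOTE above still applies, since this is a storage-oriented view, not how a person would naturally think of "a customer."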

2. What is data redundancy, and which characteristics of the file system can lead to it?
Data redundancy exists when unnecessarily duplicated data are found in the database. For example, a customer's telephone number may be found in the customer file, in the sales agent file, and in the invoice file. Data redundancy is symptomatic of a (computer) file system, given its inability to represent and manage data relationships. Data redundancy may also be the result of poorly designed databases that allow the same data to be kept in different locations. (Here's another opportunity to emphasize the need for good database design!)

3. What is data independence, and why is it lacking in file systems?
File systems exhibit data dependence because file access is dependent on a file's data characteristics. Therefore, any time the file data characteristics are changed, the programs that access the data within those files must be modified. Data independence exists when changes in the data characteristics don't require changes in the programs that access those data. File systems lack data independence because all data access programs are subject to change when any of the file system's data storage characteristics, such as a data type, change.

4. What is a DBMS, and what are its functions?
A DBMS is best described as a collection of programs that manage the database structure and that control shared access to the data in the database. Current DBMSes also store the relationships between the database components; they also take care of defining the required access paths to those components. The functions of a current-generation DBMS may be summarized as follows:
   1. The DBMS stores the definitions of data and their relationships (metadata) in a data dictionary; any changes made are automatically recorded in the data dictionary.
   2. The DBMS creates the complex structures required for data storage.
   3. The DBMS transforms entered data to conform to the data structures in item 2.
   4. The DBMS creates a security system and enforces security within that system.
   5. The DBMS creates complex structures that allow multiple-user access to the data.
   6. The DBMS performs backup and data recovery procedures to ensure data safety.
   7. The DBMS promotes and enforces integrity rules to eliminate data integrity problems.
   8. The DBMS provides access to the data via utility programs and from programming language interfaces.
   9. The DBMS provides end-user access to data within a computer network environment.

5. What is structural independence, and why is it important?
Structural independence exists when data access programs are not subject to change when the file's structural characteristics, such as the number or order of the columns in a table, change. Structural independence is important because it substantially decreases programming effort and program maintenance costs.
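
Questions 3 through 5 lend themselves to a live demonstration with Python's built-in sqlite3 module. The sketch below is one possible classroom demo, not the textbook's own example: the table layout, the sample row, the added CUS_EMAIL column, and both helper functions are hypothetical, and PRAGMA table_info is specific to SQLite. The first helper asks for its columns by name, so it survives a structural change; the second unpacks rows by position, the way a file-system program depends on a fixed record layout, and it breaks as soon as a column is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The DBMS stores this definition (metadata) in its own catalog.
conn.execute(
    "CREATE TABLE CUSTOMER ("
    " CUS_NUMBER INTEGER PRIMARY KEY,"
    " CUS_LNAME TEXT NOT NULL,"
    " CUS_PHONE TEXT)"
)
conn.execute("INSERT INTO CUSTOMER VALUES (10010, 'Ramas', '844-2573')")


def phone_list(connection):
    """Structurally independent: asks for the columns it needs by name."""
    return connection.execute(
        "SELECT CUS_LNAME, CUS_PHONE FROM CUSTOMER ORDER BY CUS_LNAME"
    ).fetchall()


def phone_list_positional(connection):
    """File-system style: assumes exactly three fields in a fixed order."""
    return [(lname, phone)
            for (_number, lname, phone) in connection.execute("SELECT * FROM CUSTOMER")]


before = phone_list(conn)
positional_before = phone_list_positional(conn)

# Structural change: add a column. The DBMS records it in its catalog.
conn.execute("ALTER TABLE CUSTOMER ADD COLUMN CUS_EMAIL TEXT")

after = phone_list(conn)  # same program, same result

try:
    phone_list_positional(conn)
    positional_broke = False
except ValueError:  # too many values to unpack: the record layout changed
    positional_broke = True

# Read the metadata back: (column name, declared type) for each column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(CUSTOMER)")]

print(before, after, positional_broke)
print(columns)
```

Running it shows the named-column query returning the same result before and after the ALTER TABLE, while the positional version fails; the PRAGMA output shows that the DBMS, not the program, keeps track of the new column.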

6. Explain the difference between data and information.
Data are raw facts. Information is the result of processing data to reveal the meaning behind the facts. Let's summarize some key points:
- Data constitute the building blocks of information.
- Information is produced by processing data.
- Information is used to reveal the meaning of data.
- Good, relevant, and timely information is the key to good decision making.
- Good decision making is the key to organizational survival in a global environment.

7. What is the role of a DBMS, and what are its advantages? What are its disadvantages?
A database management system (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database. Figure 1.2 (shown in the text) illustrates that the DBMS serves as the intermediary between the user and the database. The DBMS receives all application requests and translates them into the complex operations required to fulfill those requests. The DBMS hides much of the database's internal complexity from the application programs and users. The application program might be written by a programmer using a programming language such as COBOL, Visual Basic, or C++, or it might be created through a DBMS utility program.

Having a DBMS between the end user's applications and the database offers some important advantages. First, the DBMS enables the data in the database to be shared among multiple applications or users. Second, the DBMS integrates the many different users' views of the data into a single all-encompassing data repository. Because data are the crucial raw material from which information is derived, you must have a good way of managing such data. As you will discover in this book, the DBMS helps make data management more efficient and effective. In particular, a DBMS provides advantages such as:
- Improved data sharing. The DBMS helps create an environment in which end users have better access to more and better-managed data. Such access makes it possible for end users to respond quickly to changes in their environment.
- Better data integration. Wider access to well-managed data promotes an integrated view of the organization's operations and a clearer view of the big picture. It becomes much easier to see how actions in one segment of the company affect other segments.
- Minimized data inconsistency. Data inconsistency exists when different versions of the same data appear in different places. For example, data inconsistency exists when a company's sales department stores a sales representative's name as Bill Brown and the company's personnel department stores that same person's name as William G. Brown, or when the company's regional sales office shows the price of product X as $45.95 and its national sales office shows the same product's price as $43.95. The probability of data inconsistency is greatly reduced in a properly designed database.
- Improved data access. The DBMS makes it possible to produce quick answers to ad hoc queries. From a database perspective, a query is a specific request for data manipulation (for example, to read or update the data) issued to the DBMS. Simply put, a query is a question, and an ad hoc query is a spur-of-the-moment question. The DBMS sends back an answer (called the query result set) to the application. For example, end users, when dealing with large amounts of sales data, might want quick answers to questions (ad hoc queries) such as:
   o What was the dollar volume of sales by product during the past six months?
   o What is the sales bonus figure for each of our salespeople during the past three months?
   o How many of our customers have credit balances of $3,000 or more?
- Improved decision making. Better-managed data and improved data access make it possible to generate better-quality information, on which better decisions are based.
- Increased end-user productivity. The availability of data, combined with the tools that transform data into usable information, empowers end users to make quick, informed decisions that can make the difference between success and failure in the global economy.
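
The ad hoc queries listed under Improved data access translate directly into SQL. Below is a minimal sketch using Python's built-in sqlite3 module; the CUSTOMER and SALE tables, their columns, the sample rows, and the fixed cutoff date are all hypothetical (a real "past six months" query would compute the cutoff from the current date). Each execute call hands one spur-of-the-moment question to the DBMS, which returns the query result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CUS_NUMBER INTEGER PRIMARY KEY,
                           CUS_LNAME TEXT,
                           CUS_BALANCE REAL);
    INSERT INTO CUSTOMER VALUES (10010, 'Ramas', 0.00),
                                (10011, 'Dunne', 3450.75),
                                (10012, 'Smith', 3000.00),
                                (10013, 'Olowski', 536.75);
    CREATE TABLE SALE (PRODUCT TEXT, SALE_DATE TEXT, AMOUNT REAL);
    INSERT INTO SALE VALUES ('Hammer', '2023-11-02', 24.95),
                            ('Hammer', '2024-01-15', 19.95),
                            ('Hammer', '2024-02-03', 39.90),
                            ('Saw',    '2024-02-20', 45.95);
""")

# "How many of our customers have credit balances of $3,000 or more?"
(big_balances,) = conn.execute(
    "SELECT COUNT(*) FROM CUSTOMER WHERE CUS_BALANCE >= 3000"
).fetchone()

# "What was the dollar volume of sales by product since the cutoff date?"
CUTOFF = "2024-01-01"  # hypothetical; ISO dates compare correctly as text
volume_by_product = [
    (product, round(total, 2))
    for product, total in conn.execute(
        "SELECT PRODUCT, SUM(AMOUNT) FROM SALE "
        "WHERE SALE_DATE >= ? GROUP BY PRODUCT ORDER BY PRODUCT",
        (CUTOFF,),
    )
]

print(big_balances)       # 2
print(volume_by_product)  # [('Hammer', 59.85), ('Saw', 45.95)]
```

Neither question required a new program to be written in advance; that is the practical meaning of "quick answers to ad hoc queries."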

The advantages of using a DBMS are not limited to the few just listed. In fact, you will discover many more advantages as you learn more about the technical details of databases and their proper design. Although the database system yields considerable advantages over previous data management approaches, database systems do carry significant disadvantages. For example:
- Increased costs. Database systems require sophisticated hardware and software and highly skilled personnel. The cost of maintaining the hardware, software, and personnel required to operate and manage a database system can be substantial. Training, licensing, and regulation compliance costs are often overlooked when database systems are implemented.
- Management complexity. Database systems interface with many different technologies and have a significant impact on a company's resources and culture. The changes introduced by the adoption of a database system must be properly managed to ensure that they help advance the company's objectives. Given the fact that database systems hold crucial company data that are accessed from multiple sources, security issues must be assessed constantly.
- Maintaining currency. To maximize the efficiency of the database system, you must keep your system current. Therefore, you must perform frequent updates and apply the latest patches and security measures to all components. Because database technology advances rapidly, personnel training costs tend to be significant.
- Vendor dependence. Given the heavy investment in technology and personnel training, companies might be reluctant to change database vendors. As a consequence, vendors are less likely to offer pricing point advantages to existing customers, and those customers might be limited in their choice of database system components.
- Frequent upgrade/replacement cycles. DBMS vendors frequently upgrade their products by adding new functionality. Such new features often come bundled in new upgrade versions of the software. Some of these versions require hardware upgrades. Not only do the upgrades themselves cost money, but it also costs money to train database users and administrators to properly use and manage the new features.

8. List and describe the different types of databases.

The focus is on Section 1.3.2, TYPES OF DATABASES. Organize the discussion around the number of users, database site location, type of data, database use, and degree of data structure:
- Number of users
   o Single-user
   o Multiuser
   o Workgroup
   o Enterprise
- Database site location
   o Centralized
   o Distributed
- Type of data
   o General-purpose
   o Discipline-specific
- Database use
   o Transactional (production) database (OLTP)
   o Data warehouse database (OLAP)
- Degree of data structure
   o Unstructured data
   o Structured data

9. What are the main components of a database system?
The basis of this discussion is Section 1.7.1, THE DATABASE SYSTEM ENVIRONMENT. Figure 1.9 provides a good bird's-eye view of the components. Note that the system's components are hardware, software, people, procedures, and data.

10. What are metadata?
Metadata are data about data. That is, metadata define data characteristics, such as the data type (character or numeric, for example), and the relationships that link the data. Relationships are an important component of database design. What makes relationships especially interesting is that they are often defined by their environment. For instance, the relationship between EMPLOYEE and JOB is likely to depend on the organization's definition of the work environment. For example, in some organizations an employee can have multiple job assignments, while in other organizations (or even in other divisions within the same organization) an employee can have only one job assignment. The details of relationship types and the roles played by those relationships in data models are defined and described in Chapter 2, Data Models. Relationships will play a key role in subsequent chapters. You cannot effectively deal with database design issues unless you address relationships.

11. Explain why database design is important.

The focus is on Section 1.4, WHY DATABASE DESIGN IS IMPORTANT. Explain that modern database and applications development software is so easy to use that many people can quickly learn to implement a simple database and develop simple applications within a week or so, without giving design much thought. As data and reporting requirements become more complex, those same people will simply (and quickly!) produce the required add-ons. That's how data redundancies and all their attendant anomalies develop, thus reducing the "database" and its applications to a status worse than useless. Stress these points:
- Good applications can't overcome bad database designs.
- The existence of a DBMS does not guarantee good data management, nor does it ensure that the database will be able to generate correct and timely information.
- Ultimately, the end user and the designer decide what data will be stored in the database.
- A database created without the benefit of a detailed blueprint is unlikely to be satisfactory.
Pose this question: Would you think it smart to build a house without the benefit of a blueprint? So why would you want to create a database without a blueprint? (Perhaps it would be OK to build a chicken coop without a blueprint, but would you want your house to be built the same way?)

12. What are the potential costs of implementing a database system?
Although the database system yields considerable advantages over previous data management approaches, database systems do impose significant costs. For example:
- Increased acquisition and operating costs. Database systems require sophisticated hardware and software and highly skilled personnel. The cost of maintaining the hardware, software, and personnel required to operate and manage a database system can be substantial.
- Management complexity. Database systems interface with many different technologies and have a significant impact on a company's resources and culture. The changes introduced by the adoption of a database system must be properly managed to ensure that they help advance the company's objectives. Given the fact that database systems hold crucial company data that are accessed from multiple sources, security issues must be assessed constantly.
- Maintaining currency. To maximize the efficiency of the database system, you must keep your system current. Therefore, you must perform frequent updates and apply the latest patches and security measures to all components. Because database technology advances rapidly, personnel training costs tend to be significant.
- Vendor dependence. Given the heavy investment in technology and personnel training, companies may be reluctant to change database vendors. As a consequence, vendors are less likely to offer pricing point advantages to existing customers, and those customers may be limited in their choice of database system components.

13. Use examples to compare and contrast unstructured and structured data. Which type is more prevalent in a typical business environment?

Unstructured data are data that exist in their original (raw) state, that is, in the format in which they were collected. Therefore, unstructured data exist in a format that does not lend itself to the processing that yields information. Structured data are the result of taking unstructured data and formatting (structuring) such data to facilitate storage, use, and the generation of information. You apply structure (format) based on the type of processing that you intend to perform on the data. Some data might not be ready (unstructured) for some types of processing, but they might be ready (structured) for other types of processing. For example, the data value 37890 might refer to a zip code, a sales value, or a product code. If this value represents a zip code or a product code and is stored as text, you cannot perform mathematical computations with it. On the other hand, if this value represents a sales transaction, it is necessary to format it as numeric.

Structured data are more prevalent than unstructured data in a business environment. For example, if invoices are stored as images for future retrieval and display, you can scan them and save them in a graphic format. On the other hand, if you want to derive information such as monthly totals and average sales, such graphic storage would not be useful. Instead, you could store the invoice data in a (structured) spreadsheet format so that you can perform the requisite computations.

14. What are some basic database functions that a spreadsheet cannot perform?
Spreadsheets do not support self-documentation through metadata, enforcement of data types or domains to ensure consistency of data within a column, defined relationships among tables, or constraints to ensure consistency of data across related tables.

15. What common problems does a collection of spreadsheets created by end users share with the typical file system?
A collection of spreadsheets shares several problems with the typical file system. The first problem is that end users create their own private copies of the data, which creates issues of data ownership. This situation also creates islands of information where changes to one set of data are not reflected in all of the copies of the data. This leads to the second problem: lack of data consistency. Because the data in various spreadsheets may be intended to represent a view of the business environment, a lack of consistency in the data may lead to faulty decision making based on inaccurate data.

16. Explain the significance of the loss of direct, hands-on access to business data that users experienced with the advent of computerized data repositories.
Users lost direct, hands-on access to the business data when computerized data repositories were developed because the IT skills necessary to directly access and manipulate the data were beyond the average user's abilities, and because security precautions restricted access to the shared data. This was significant because it removed users from the direct manipulation of data and introduced significant time delays for data access. When users need answers to business questions from the data, necessity often does not give them the luxury of time to wait days, weeks, or even months for the required reports. The desire to return hands-on access to the data to the users, among other drivers, helped to propel the development of database systems. While database systems have greatly improved the ability of users to directly access data, the need to quickly manipulate data for themselves has led to the problem of spreadsheets being used when databases are needed.
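
To contrast Questions 13 and 14 with a spreadsheet, the following minimal sketch uses Python's built-in sqlite3 module to declare the kinds of rules a spreadsheet cannot enforce on its own. The INVOICE table, its columns, and all values are hypothetical. Two SQLite-specific assumptions matter: foreign-key enforcement must be switched on per connection with PRAGMA foreign_keys, and ordinary SQLite columns use type affinity rather than strict typing, so the domain rules are written as CHECK constraints. The zip code 37890 is stored as text, while the invoice total is numeric so that it can be used in computations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.executescript("""
    CREATE TABLE CUSTOMER (
        CUS_NUMBER  INTEGER PRIMARY KEY,
        CUS_ZIPCODE TEXT NOT NULL CHECK (length(CUS_ZIPCODE) = 5)
    );
    CREATE TABLE INVOICE (
        INV_NUMBER INTEGER PRIMARY KEY,
        CUS_NUMBER INTEGER NOT NULL REFERENCES CUSTOMER (CUS_NUMBER),
        INV_TOTAL  REAL NOT NULL CHECK (INV_TOTAL >= 0)
    );
    INSERT INTO CUSTOMER VALUES (10010, '37890');  -- zip code kept as text
    INSERT INTO INVOICE  VALUES (1001, 10010, 378.90);
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


negative_total = rejected("INSERT INTO INVOICE VALUES (1002, 10010, -5)")  # domain rule
orphan_invoice = rejected("INSERT INTO INVOICE VALUES (1003, 99999, 10)")  # no such customer
bad_zip = rejected("INSERT INTO CUSTOMER VALUES (10011, '3789')")          # wrong length

# The relationship itself is stored as metadata that can be queried.
relationships = [(row[2], row[3], row[4])
                 for row in conn.execute("PRAGMA foreign_key_list(INVOICE)")]

print(negative_total, orphan_invoice, bad_zip)  # True True True
print(relationships)  # [('CUSTOMER', 'CUS_NUMBER', 'CUS_NUMBER')]
```

In a spreadsheet, each of the three rejected rows could simply be typed in; here the DBMS refuses them, and the relationship between the two tables is recorded as metadata rather than living only in the user's head.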

Problem Solutions

ONLINE CONTENT
The file structures you see in this problem set are simulated in a Microsoft Access database named Ch01_Problems, available in the Premium Website for this book. The Premium Website also includes SQL script files (Oracle and SQL Server) for all of the data sets used throughout the book.

Given the file structure shown in Figure P1.1, answer Problems 1-4.

FIGURE P1.1 The File Structure for Problems 1-4

1. How many records does the file contain? How many fields are there per record?
The file contains seven records (21-5Z through 31-7P), and each of the records is composed of five fields (PROJECT_CODE through PROJECT_BID_PRICE).

2. What problem would you encounter if you wanted to produce a listing by city? How would you solve this problem by altering the file structure?
The city names are contained within the MANAGER_ADDRESS attribute, and decomposing this character (string) field at the application level is cumbersome at best. (Queries become much more difficult to write and take longer to execute when internal string searches must be conducted.) If the ability to produce city listings is important, it is best to store the city name as a separate attribute.

3. If you wanted to produce a listing of the file contents by last name, area code, city, state, or zip code, how would you alter the file structure?
The more we divide the address into its component parts, the greater its information capabilities. For example, by dividing MANAGER_ADDRESS into its component parts (MGR_STREET, MGR_CITY, MGR_STATE, and MGR_ZIP), we gain the ability to easily select records on the basis of zip codes, city names, and states. Similarly, by subdividing the manager name into its components MGR_LASTNAME, MGR_FIRSTNAME, and MGR_INITIAL, we gain the ability to produce more efficient searches and listings. For example, creating a phone directory is easy when you can sort by last name, first name, and initial. Finally, separating the area code and the phone number will yield the ability to efficiently group data by area codes. Thus MGR_PHONE might be decomposed into MGR_AREA_CODE and MGR_PHONE. The more you decompose the data into their component parts, the greater the search flexibility. Data that are decomposed into their most basic components are said to be atomic.

4. What data redundancies do you detect? How could those redundancies lead to anomalies?
Note that the manager named Holly B. Parker occurs three times, indicating that she manages three projects coded 21-5Z, 25-9T, and 29-2D, respectively. (The occurrences indicate that there is a 1:M relationship between MANAGER and PROJECT: each project is managed by only one manager but, apparently, a manager may manage more than one project.) Ms. Parker's phone number and address also occur three times. If Ms. Parker moves and/or changes her phone number, these changes must be made more than once, and they must all be made correctly, without missing a single occurrence. If any occurrence is missed during the change, the data are "different" for the same person. After some time, it may become difficult to determine what the correct data are. In addition, multiple occurrences invite misspellings and digit transpositions, thus producing the same anomalies. The same problems exist for the multiple occurrences of George F. Dorts.

5. Identify and discuss the serious data redundancy problems exhibited by the file structure shown in Figure P1.5.

FIGURE P1.5 The File Structure for Problems 5-8


NOTE
It is not too early to begin discussing proper structure. For example, you may focus student attention on the fact that, ideally, each row should represent a single entity. Therefore, each row's fields should define the characteristics of one entity, rather than include characteristics of several entities. The file structure shown here includes characteristics of multiple entities. For example, the JOB_CODE is likely to be a characteristic of a JOB entity. PROJ_NUM and PROJ_NAME are clearly characteristics of a PROJECT entity. Also, since (apparently) each project has more than one employee assigned to it, the file structure shown here shows multiple occurrences for each of the projects. (Hurricane occurs three times, Coast occurs twice, and Satellite occurs four times.)

Given the file's poor structure, the stage is set for multiple anomalies. For example, if the charge for JOB_CODE = EE changes from $85.00 to $90.00, that change must be made twice. Also, if employee June H. Sattlemeier is deleted from the file, you also lose information about the existence of her JOB_CODE = EE, its hourly charge of $85.00, and the PROJ_HOURS = 17.5. The loss of the PROJ_HOURS value will ultimately mean that the Coast project costs are not being charged properly, thus causing a loss of PROJ_HOURS × JOB_CHG_HOUR = 17.5 × $85.00 = $1,487.50 to the company.

Incidentally, note that the file contains different JOB_CHG_HOUR values for the same CT job code, thus illustrating the effect of changes in the hourly charge rate over time. The file structure appears to represent transactions that charge project hours to each project. However, the structure of this file makes it difficult to avoid update anomalies, and it is not possible to determine whether a charge change is accurately reflected in each record. Ideally, a change in the hourly charge rate would be made in only one place, and this change would then be passed on to the transaction based on the hourly charge. Such a structural change would ensure the historical accuracy of the transactions. You might want to emphasize that the recommended changes require a lot of work in a file system.

6. Looking at the EMP_NAME and EMP_PHONE contents in Figure P1.5, what change(s) would you recommend?

A good recommendation would be to make the data more atomic. That is, break up the data components whenever possible. For example, separate the EMP_NAME into its components EMP_FNAME, EMP_INITIAL, and EMP_LNAME. This change will make it much easier to organize employee data through the employee name components. Similarly, the EMP_PHONE data should be decomposed into EMP_AREACODE and EMP_PHONE. For example, breaking up the phone number 653-234-3245 into the area code 653 and the phone number 234-3245 will make it much easier to organize the phone numbers by area code. (If you want to print an employee phone directory, the more atomic employee name data will make the job much easier.)
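
The one-time conversion recommended in Problem 6 (and, for manager data, in Problems 2 and 3) can be sketched in Python. This is a minimal illustration, not a general name parser: it assumes, hypothetically, that EMP_NAME values look like "June H. Sattlemeier" (first name, optional initial, last name) and that EMP_PHONE values look like 653-234-3245. The pairing of names with phone numbers and the second employee are invented for illustration. Values outside those formats are rejected rather than guessed at.

```python
from typing import NamedTuple


class AtomicEmployee(NamedTuple):
    emp_fname: str
    emp_initial: str
    emp_lname: str
    emp_areacode: str
    emp_phone: str


def decompose(emp_name: str, emp_phone: str) -> AtomicEmployee:
    """Split composite EMP_NAME / EMP_PHONE values into atomic fields.

    Assumes 'First I. Last' or 'First Last' names and 'AAA-PPP-NNNN' phones.
    """
    parts = emp_name.split()
    if len(parts) == 3:
        fname, initial, lname = parts[0], parts[1].rstrip("."), parts[2]
    elif len(parts) == 2:
        fname, initial, lname = parts[0], "", parts[1]
    else:
        raise ValueError(f"unrecognized name format: {emp_name!r}")

    pieces = emp_phone.split("-")
    if len(pieces) != 3:
        raise ValueError(f"unrecognized phone format: {emp_phone!r}")
    area, exchange, line = pieces
    return AtomicEmployee(fname, initial, lname, area, f"{exchange}-{line}")


# Hypothetical rows (names paired with invented phone numbers).
rows = [
    ("June H. Sattlemeier", "653-234-3245"),
    ("Robert Dunne", "713-555-0142"),
]
employees = [decompose(name, phone) for name, phone in rows]

# With atomic fields, a phone directory is a simple sort ...
directory = sorted(employees, key=lambda e: (e.emp_lname, e.emp_fname, e.emp_initial))
# ... and grouping by area code needs no string parsing.
area_codes = sorted({e.emp_areacode for e in employees})

for e in directory:
    print(f"{e.emp_lname}, {e.emp_fname} {e.emp_initial}  ({e.emp_areacode}) {e.emp_phone}")
```

The parser needs format assumptions (a two-word last name would already break it), which is itself a good argument for storing the atomic fields from the start instead of recovering them later.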


7. Identify the various data sources in the file you examined in Problem 5.
Given their answers to Problem 5 and some additional scrutiny of Figure P1.5, your students should be able to identify these data sources:
- Employee data such as names and phone numbers.
- Project data such as project names. If you start with an EMPLOYEE file, the project names clearly do not belong in that file. (Project names are clearly not employee characteristics.)
- Job data such as the job charge per hour. If you start with an EMPLOYEE file, the job charge per hour clearly does not belong in that file. (Hourly charges are clearly not employee characteristics.)
- The project hours, which are most likely the hours worked by the employee for that project. (Such hours are associated with a work product, not the employee per se.)

8. Given your answer to Problem 7, what new files should you create to help eliminate the data redundancies found in the file shown in Figure P1.5?
The new files would probably be PROJECT, EMPLOYEE, JOB, and CHARGE. The PROJECT file should contain project characteristics such as the project name, the project manager/coordinator, the project budget, and so on. The EMPLOYEE file might contain the employee names, phone number, address, and so on. The JOB file would contain the billing charge per hour for each of the job types: a database designer, an applications developer, and an accountant would generate different billing charges per hour. The CHARGE file would be used to keep track of the number of hours by job type that will be billed for each employee who worked on the project.

9. Identify and discuss the serious data redundancy problems exhibited by the file structure shown in Figure P1.9. (The file is meant to be used as a teacher class assignment schedule. One of the many problems with data redundancy is the likely occurrence of data inconsistencies: two different initials have been entered for the teacher named Maria Cordoza.)

FIGURE P1.9 The File Structure for Problems 9-10

Note that the teacher characteristics occur multiple times in this file. For example, the teacher named Maria Cordoza's first name, last name, and initial occur three times. If changes must be made for any given teacher, those changes must be made multiple times. All it takes is one incorrect entry or one forgotten change to create data inconsistencies. Redundant data are not a luxury you can afford in a data environment.

10. Given the file structure shown in Figure P1.9, what problem(s) might you encounter if building KOM were deleted?
You would lose all the time assignment data about teachers Williston, Cordoza, and Hawkins, as well as the KOM rooms 204E, 123, and 34. Here is yet another good reason for keeping data about specific entities in their own tables! This kind of anomaly is known as a deletion anomaly.
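
The file decomposition proposed in Problem 8 can be tried out with Python's built-in sqlite3 module. The sketch below is one possible design under stated assumptions, not the book's reference solution: EMP_NUM, the sample keys, the CHG_HOUR column in CHARGE, and the record_charge helper are hypothetical, while the table names and the JOB_CODE, JOB_CHG_HOUR, PROJ_NUM, PROJ_NAME, and PROJ_HOURS attributes come from Problems 5 through 8. It shows a rate change made once, in JOB; each CHARGE row keeping the rate in effect when it was recorded (the historical accuracy discussed in Problem 5); the DBMS refusing to delete an employee whose hours are still being charged, so the 17.5 × $85.00 = $1,487.50 charge cannot silently disappear; and the JOB data surviving the employee's eventual removal, which is the same separation that prevents the Problem 10 deletion anomaly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
    CREATE TABLE JOB (
        JOB_CODE     TEXT PRIMARY KEY,
        JOB_CHG_HOUR REAL NOT NULL
    );
    CREATE TABLE EMPLOYEE (
        EMP_NUM     INTEGER PRIMARY KEY,
        EMP_FNAME   TEXT,
        EMP_INITIAL TEXT,
        EMP_LNAME   TEXT,
        JOB_CODE    TEXT NOT NULL REFERENCES JOB (JOB_CODE)
    );
    CREATE TABLE PROJECT (
        PROJ_NUM  INTEGER PRIMARY KEY,
        PROJ_NAME TEXT NOT NULL
    );
    CREATE TABLE CHARGE (
        PROJ_NUM   INTEGER NOT NULL REFERENCES PROJECT (PROJ_NUM),
        EMP_NUM    INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
        PROJ_HOURS REAL NOT NULL,
        CHG_HOUR   REAL NOT NULL,  -- rate in effect when the charge was recorded
        PRIMARY KEY (PROJ_NUM, EMP_NUM)
    );
    INSERT INTO JOB      VALUES ('EE', 85.00);
    INSERT INTO EMPLOYEE VALUES (101, 'June', 'H', 'Sattlemeier', 'EE');
    INSERT INTO PROJECT  VALUES (1, 'Coast'), (2, 'Satellite');
""")


def record_charge(proj_num, emp_num, hours):
    """Hypothetical helper: copy the employee's current job rate into CHARGE."""
    conn.execute(
        "INSERT INTO CHARGE (PROJ_NUM, EMP_NUM, PROJ_HOURS, CHG_HOUR) "
        "SELECT ?, E.EMP_NUM, ?, J.JOB_CHG_HOUR "
        "FROM EMPLOYEE AS E JOIN JOB AS J ON J.JOB_CODE = E.JOB_CODE "
        "WHERE E.EMP_NUM = ?",
        (proj_num, hours, emp_num),
    )


record_charge(1, 101, 17.5)  # Coast, billed at $85.00
conn.execute("UPDATE JOB SET JOB_CHG_HOUR = 90.00 WHERE JOB_CODE = 'EE'")  # one place
record_charge(2, 101, 10.0)  # Satellite, billed at $90.00

charges = conn.execute(
    "SELECT P.PROJ_NAME, C.PROJ_HOURS * C.CHG_HOUR "
    "FROM CHARGE AS C JOIN PROJECT AS P ON P.PROJ_NUM = C.PROJ_NUM "
    "ORDER BY C.PROJ_NUM"
).fetchall()

try:
    conn.execute("DELETE FROM EMPLOYEE WHERE EMP_NUM = 101")
    delete_blocked = False
except sqlite3.IntegrityError:  # CHARGE rows still reference this employee
    delete_blocked = True

# Once her charges are settled and removed, deleting the employee no longer
# destroys anything about job EE: that fact lives in its own table.
conn.execute("DELETE FROM CHARGE WHERE EMP_NUM = 101")
conn.execute("DELETE FROM EMPLOYEE WHERE EMP_NUM = 101")
(job_rate,) = conn.execute(
    "SELECT JOB_CHG_HOUR FROM JOB WHERE JOB_CODE = 'EE'"
).fetchone()

print(charges)         # [('Coast', 1487.5), ('Satellite', 900.0)]
print(delete_blocked)  # True
print(job_rate)        # 90.0
```

Copying the rate into each CHARGE row is a deliberate design choice here: it keeps old charges at the rate that applied when the hours were billed, while JOB remains the single place where the current rate is changed.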
