
Course ID: 50

Course Title: Oracle XML DB w/ Real-World Examples


Section: 4 Parsing XML Documents
Lecture: 10 Parsing XML Real-World Example Part 1
Resource Materials

Hello folks, and welcome back to this, our first of two lectures, wherein we will examine our real-world
import XML example.

Now, this example is based on the assumption that we have an external application creating an XML file
based on stale or only partially connected information. Hey, that is really the whole reason for using
XML, right? If your external application is continuously connected, then why are you using XML? Just do
everything directly with the connected database.

But Rick, what are you talking about when you say partially connected or stale data? How about a
warehousing application where the hand-held device is updated once each night and then runs fully
disconnected all day? How about a simple online catalog? The catalog is populated along with
associated quantities on hand and pricing at the time of page generation. An hour later the user finally
completes their checkout, and both quantities and pricing have changed.

Net: if you are going to parse XML into a relational database enforcing standard referential integrity
then you need to be constantly checking for possible XML data errors before performing database
updates and/or inserts.
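
To make the stale-data problem concrete, here is a minimal sketch of checking an XML purchase order against the current state of the database before loading it. The course code is Oracle PL/SQL and is not reproduced in these notes, so this sketch uses Python with an in-memory SQLite database as a stand-in. The XML layout and the `customers` and `products` tables are assumptions made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical purchase order, generated earlier by a disconnected application.
PO_XML = """
<purchase_order id="1001">
  <customer_id>7</customer_id>
  <line_item product_id="55" quantity="3"/>
  <line_item product_id="99" quantity="1"/>
</purchase_order>
"""


def find_stale_references(conn, po_xml):
    """Return a list of problems found by checking the XML against current data."""
    problems = []
    po = ET.fromstring(po_xml)

    # The customer named in the XML must still exist in the database.
    customer_id = int(po.findtext("customer_id"))
    row = conn.execute("SELECT 1 FROM customers WHERE id = ?", (customer_id,)).fetchone()
    if row is None:
        problems.append(f"unknown customer {customer_id}")

    # Every product on every line item must still exist as well.
    for item in po.findall("line_item"):
        product_id = int(item.get("product_id"))
        row = conn.execute("SELECT 1 FROM products WHERE id = ?", (product_id,)).fetchone()
        if row is None:
            problems.append(f"unknown product {product_id}")
    return problems


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO customers VALUES (7)")
conn.execute("INSERT INTO products VALUES (55)")  # product 99 no longer exists

problems = find_stale_references(conn, PO_XML)
print(problems)
```

The point of the sketch is only the order of operations: check every reference first, collect the problems, and decide afterwards whether the document is safe to load.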

Remember your target will be moving! Let’s take a look at our import_XML package remembering that
we will be focusing on data anomalies as well as our XML parsing strategies.

OK we start here by checking out our package definition. Wow, doesn’t get much simpler than this.
One procedure aptly named import_POs. No parameters, no fuss, no hassles.

Go out there to a well-defined external directory and grab all the new XML files matching a pre-defined
naming convention.

Initially attempt to insert each of those XML documents into a corresponding XMLType column and then
parse that XMLType data into the corresponding relational data for subsequent processing and
reporting.

Should be no problem, right? Well, maybe a little problematic or we wouldn’t be looking at this example,
right?

OK, here is the code associated with our only exposed procedure – import_POs. Really pretty
straightforward stuff here.

First we go ahead and clear out our temp_xml_files table. If you will recall, this guy holds the names of
all files found in our external directory as read by our friend, the sys.get_file_names routine. Remember, we
created that guy earlier and it utilizes that very top secret routine hidden within the backup and restore
package. Net: you give it a physical address and it reads the names of all the files found there into a
user-specified local table. In our case that table is temp_xml_files, so we begin here by clearing that guy out.

We next open an error file to track any data problems encountered throughout our loading and parsing
process. Pretty easy stuff and we’ll use a strategy similar to what we did earlier when creating XML
export file names – prefix + date + version.

With our read_external_po_files routine we will next insert those external XML files into a
corresponding XMLType column within our xml_documents table. Remember those external files must
be well-formed and there are also those silly literal requirements around our directory_object – single
quotes and all CAPS.

In the next paragraph we basically bail if any of our external files were not well-formed. Hey, if you can’t
get past the well-formed requirement then the parsing is almost certainly going to go in the tank
because something weird is going on here.

Finally with our extract_xml_to_relational routine we are going to parse the XMLType into the
corresponding relational tables. Not surprisingly, this routine is where we are really going to work our
magic with the extract function.

Close out our error file and our import_POs routine is complete. Very cool.
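
The control flow just described can be sketched as follows. This is a Python stand-in, not the PL/SQL package from the lecture. The three callables passed in are hypothetical placeholders for the real subroutines, and a plain list plays the role of the utl_file error file.

```python
def import_pos(clear_file_list, load_files, extract_to_relational):
    """Run the import steps in order and return the lines written to the error log."""
    error_log = []                        # stands in for the utl_file error file
    clear_file_list()                     # empty the table of external file names
    error_count = load_files(error_log)   # load each file, logging any failures
    if error_count == 0:
        extract_to_relational(error_log)  # parse only if every load succeeded
    else:
        error_log.append(f"{error_count} file(s) not well-formed; parsing skipped")
    return error_log                      # "closing" the log means handing it back


# Hypothetical stubs that record which steps actually ran.
calls = []


def fake_clear():
    calls.append("clear")


def fake_load(error_log):
    calls.append("load")
    error_log.append("po_003.xml: not well-formed")
    return 1


def fake_extract(error_log):
    calls.append("extract")


log = import_pos(fake_clear, fake_load, fake_extract)
print(calls)
print(log)
```

With one bad file reported by the load step, the extract step never runs, which is the bail-out behavior described above.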

Let’s start breaking these subroutines down so we can better understand some of the subtleties.

OK, before we look at the open_error_file routine let’s take a quick look at our package globals here at
the top of the package body.

OK, easy enough. Looks like we have a my_error_filename variable to store the unique name of this
error file. A utl_file file handle for that error file aptly named my_error_file. And finally an integer
variable named error_count to store the total number of parsing and/or loading errors encountered.

Finally we also have two globals to store customer and line_item information passed between routines.
I know, globals aren’t supposed to be cool. Whatever, I have been using them for 25+ years despite
academic displeasure. Hey I actually did this for a living versus those same types who disrespect globals.

Alright I’ll get off my soapbox and move onto the open_error_file routine.

Looks like this guy starts off by jumping into the get_next_error_log routine. OK, we’ll follow along and
see what is going on there.

All right, looks like this guy takes in a base filename and appends today’s date and a trailing underscore.
We then call our old friend sys.get_new_files to populate our temp_xml_files table with all the files from
within some external directory.

In the next paragraph we trim off the leading directory path from all those filenames as that will mess
up our inserting them into a local XMLType.

Add a version number to our new filename and the .log extension and we are finished.
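
The naming steps above (base name, today’s date, underscore, version, .log) can be sketched like this. It is a Python illustration rather than the course’s PL/SQL. The date format, the rule of taking the highest existing version plus one, and the example file names are all assumptions.

```python
import os
from datetime import date


def next_error_log_name(base, existing_paths, today):
    """Build base + date + '_' + next version + '.log' from the files already present."""
    prefix = f"{base}{today:%Y%m%d}_"

    # Strip the leading directory path so only bare file names are compared.
    names = [os.path.basename(path) for path in existing_paths]

    # Collect the version numbers already used for today's prefix.
    versions = []
    for name in names:
        stem, ext = os.path.splitext(name)
        if ext == ".log" and stem.startswith(prefix):
            suffix = stem[len(prefix):]
            if suffix.isdigit():
                versions.append(int(suffix))

    next_version = max(versions, default=0) + 1
    return f"{prefix}{next_version}.log"


# Hypothetical directory listing, full paths included.
existing = [
    "/data/xml/import_errors_20240305_1.log",
    "/data/xml/import_errors_20240305_2.log",
    "/data/xml/import_errors_20240304_7.log",
    "/data/xml/po_1.xml",
]
name = next_error_log_name("import_errors_", existing, date(2024, 3, 5))
print(name)
```

Only logs carrying today’s prefix count toward the version, so yesterday’s log and the unrelated XML file are ignored.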

Would we generally move this and our parallel routine from within the export_xml routines into a
common package containing general utilities? Of course, but we don’t have such a package here in this
world, so that is left for your implementation.

OK, now back to our open_error_file routine wherein we finish things up by opening our new filename
with the utl_file.fopen function into the my_error_file global file handle. Please remember to put that
directory object in single quotes and all CAPS. Yeah, I know, it feels like a crazy bug, but the name is
passed as a case-sensitive string and Oracle stores unquoted directory object names in upper case.

If you will recall, our import_POs routine next called the read_external_PO_files to move all the new
external PO XML files into our corresponding xml_documents table and its XMLType column. Let’s
check out the code.

Not much new here. Set up an implicit loop stepping through each filename in the temp_xml_files table
matching our template. You of course will have a more elegant xml file naming convention but this
example contains a bunch of bad data we want to examine.

The insert statement may be interesting to some of you less experienced with SQL. We are using our
regular xmltype and bfile constructors to attempt to insert each external xml file into our
xml_documents table. What is cool? Sure we do that only when the filename is not already present
within that table thus the use of an existence subquery.
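
Here is a minimal sketch of that insert-only-if-absent pattern. It runs against SQLite from Python rather than Oracle, and the `filename` and `document` column names are assumptions. In Oracle the outer SELECT would typically read FROM dual and the document value would come from the xmltype and bfile constructors instead of a plain string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xml_documents (filename TEXT, document TEXT)")

# The row is inserted only when the existence subquery returns nothing.
INSERT_IF_NEW = """
    INSERT INTO xml_documents (filename, document)
    SELECT :filename, :document
    WHERE NOT EXISTS (
        SELECT 1 FROM xml_documents WHERE filename = :filename
    )
"""


def load_document(filename, document):
    """Attempt the insert; a file name that is already present is silently skipped."""
    conn.execute(INSERT_IF_NEW, {"filename": filename, "document": document})


load_document("po_001.xml", "<po/>")
load_document("po_001.xml", "<po/>")  # same name again: subquery finds a row
load_document("po_002.xml", "<po/>")

row_total = conn.execute("SELECT COUNT(*) FROM xml_documents").fetchone()[0]
print(row_total)
```

Loading the same file name twice leaves a single row for it, so rerunning the import over an unchanged directory adds nothing.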

Why select 1, Rick? Because you are doing a simple existence test. You can return whatever you want
and if the set is not empty then the insert will not occur. Let me take that out of programmer double
negative for those of you who prefer positive logic. The insert will occur only when the subquery return
set is empty; i.e. the current filename does not already exist in the table.

Finally, check out the use of the embedded begin-end syntax. We do that to set up a distinct code block
and of course its own, distinct error handler. Unlike our regular situation where we would simply throw
a sqlcode and sqlerrm error we are now actually going to trap the error, write the error to our error log
and continue to process the next filename. Yes, that is correct and you heard me correctly. Note, I am
not raising an error here but instead simply writing the error to our log file. This way we are able to
attempt to insert all of the appropriate files to our xml_documents table before returning to the
import_POs routine.

Also please note that if other, unanticipated errors occur they do raise an error and we report sqlerrm
and sqlcode as we normally would. Yes, thank you that is elegant code.
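
The trap-log-continue idea can be sketched as below. This is Python rather than PL/SQL: the inner try block plays the role of the embedded begin-end block, and `ET.ParseError` stands in for Oracle’s not-well-formed error. The file names and contents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def load_all(files, error_log):
    """Try to load every file; log anticipated failures and keep going."""
    loaded = {}
    error_count = 0
    for filename, text in files.items():
        try:                                  # inner block with its own handler
            loaded[filename] = ET.fromstring(text)
        except ET.ParseError as exc:          # anticipated: document not well-formed
            error_count += 1
            error_log.append(f"{filename}: {exc}")
        # Any other exception is not caught here and propagates to the caller.
    return loaded, error_count


files = {
    "po_001.xml": "<po><id>1</id></po>",
    "po_002.xml": "<po><id>2</po>",           # mismatched closing tag
    "po_003.xml": "<po><id>3</id></po>",
}
error_log = []
loaded, error_count = load_all(files, error_log)
print(sorted(loaded), error_count)
```

The bad file in the middle is logged and counted, and the file after it still loads. The caller can then look at the error count and decide whether to carry on.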

All right, if we jump back to our import_POs routine now, we see that I chose to exit our routine and
simply report insert errors if we found an ill-formed XML document.

But why Rick? Because this type of error should simply not occur once both this and your external
applications are stable. Once that external application is able to create its first well-formed XML
document it should be able to continue to do so going forward until something significantly changes.
This error trap is simply sounding the alarm that something significant has changed and your routine has
not been properly updated to handle that change. “Sound the alarms, something is broken.”

OK, that is a pretty good place to break for this lecture and in the next lecture we will look at the actual
extract components of our import_POs package.

So what did we learn in this lecture?

1) When parsing external XML documents into their associated relational tables the challenge
comes from stale data. Invariably external XML documents were chosen in the first place
because the external application did NOT have continuous real-time access to the database.
Under these conditions data can become stale and the possibility for referential integrity failures
exists. You must trap those errors and report them to the external users for manual correction.
They don’t happen often but that doesn’t mean anything for your code. You must anticipate
and properly react to these errors.
2) Errors while inserting XMLTypes are expected during testing but not once you are in production.
If insert errors occur something has changed – sound the alarm and stop batch processing.
3) Next, every production environment needs a package containing generalized utility procedures.
For simplicity we don’t have one here but your environment should.
4) Finally, global variables are not the end of the world. Academics hate them but they have an
infinite amount of time and virtually no budget constraints. I have used them for 25 years and
you can too without a feeling of guilt.

OK, that is a pretty good place to pause our discussion of the import_POs package. In our next lecture
we will examine the extract components of the package. See you there.

In the interim, “you go out there and have some fun today, but remember to learn something new – it’s
what’s important.”
