<myRoot>
<myElement>
<aNode>
<aData>123</aData>
<bData><![CDATA[some lengthy text with special <characters>in</it>]]></bData>
</aNode>
<otherNode>
<otherData>31/12/1900</otherData>
.
.
.
<otherData>01/01/2000</otherData>
</otherNode>
<lastNode>
<simpleTag>
<simpleValue1>some text</simpleValue1>
<simpleValue2>other text</simpleValue2>
<simpleFilter>0</simpleFilter>
</simpleTag>
<simpleTag>
<simpleValue1>blah blah</simpleValue1>
<simpleValue2>other blah</simpleValue2>
<simpleFilter>1</simpleFilter>
</simpleTag>
<simpleTag>
<simpleValue1>foo</simpleValue1>
<simpleValue2>bar</simpleValue2>
<simpleFilter>1</simpleFilter>
</simpleTag>
.
.
.
</lastNode>
</myElement>
.
.
.
</myRoot>
Also, we won't have a proper schema file but we know that it'll have one or more myElement nodes, for each of which it'll have:
- a single aNode node with an integer aData element and a CDATA bData text element
- a single otherNode node with multiple otherData elements representing dates
- a single lastNode node with multiple simpleTag child nodes, each composed of three elements: a string in both simpleValue1 and simpleValue2 and an integer in simpleFilter. Additionally, we also know that for every lastNode node, just one and only one simpleTag node will have its simpleFilter element set to 0.
Now you want to store it in a table, then read it, parse its contents and save them in another table.
The table where you'll save it is defined as:
CREATE TABLE myTable(
myId INTEGER;
myXML CLOB;
filename VARCHAR2(200);
ts_ins DATE;--insert timestamp
CONSTRAINT myTable_pk PRIMARY KEY (myId)
);
And of course, you'll have a SEQUENCE for its primary key:
CREATE OR REPLACE SEQUENCE myTable_seq
MINVALUE 1
MAXVALUE 999999--it can go much higher
START WITH 1
INCREMENT BY 1
CACHE 10;--unless you expect multiple concurrent accesses, it's pretty useless
The XML file will be read from a DIRECTORY:
CREATE OR REPLACE DIRECTORY WORK_DIR AS 'C:\temp';--or whatever
Note that the directory MUST be on the same machine where the Oracle instance is running, to access remote directories, you must first map it as a network directory on the machine running the instance and then treat it as a local folder.
We'll also have a simple log table:
CREATE TABLE myLog(
errmsg VARCHAR2(2000);
ts DATE;
);
Into which we'll write using a procedure. To ensure that our log procedure only commits its data, we'll define it as PRAGMA AUTONOMOUS TRANSACTION, which encloses said procedure in a separate transaction so that it cannot commit on the main transaction and is not rolled back from it either.
Finally, the parsed data will be stored in yet another table. Furthermore, since the XML node otherNode has one or more children, we'll store it as a NESTED TABLE; to do so, we'll have to define a new datatype:
CREATE OR REPLACE TYPE t_myDates AS TABLE OF DATE;--of course we could use more complex data structures here
then the table:
CREATE TABLE myData(
myId INTEGER;
aData INTEGER;
bData VARCHAR2(200);
otherData t_myDates;
simpleValue VARCHAR2(200);
CONSTRAINT myData_pk PRIMARY KEY (myId)
)
NESTED TABLE otherData STORE AS myData_otherData_NT;
and its sequence:
CREATE OR REPLACE SEQUENCE myData_seq
MINVALUE 1
MAXVALUE 999999
START WITH 1
INCREMENT BY 1
CACHE 10;
Now we'll define all our functions/procedures in a PACKAGE myPackage with the following package specification:
CREATE OR REPLACE PACKAGE myPackage
IS
FUNCTION readXML (FileNameIn IN VARCHAR2)
RETURN PLS_INTEGER;
PROCEDURE writeLog(MessageIN IN varchar2);
FUNCTION parseData (IdIn IN INTEGER)
RETURN PLS_INTEGER;
END;
All methods defined inside the package specification are publicly accessible from outside the package, moreover, by defining the function parseData inside the package specification we are able to call it from within the body of readXML without having to resort to the forward declaration or defining the parseData method BEFORE readXML. In any other case, the compiler would complain that it does not know of a "parseData" object.
and package body:
CREATE OR REPLACE PACKAGE BODY myPackage
IS
STD_DEF_DATE varchar2(10):='dd/mm/yyyy';
FUNCTION readXML (FileNameIn IN VARCHAR2)
RETURN PLS_INTEGER
IS
dest_clob CLOB;
src_clob BFILE := BFILENAME('WORK_DIR', FileNameIn); --from the LOCAL (Oracle instance) filesystem
dst_offset number := 1 ;
src_offset number := 1 ;
lang_ctx number := DBMS_LOB.DEFAULT_LANG_CTX;
warning number;
l_id myTable.myId%type;
l_parseDataRes PLS_INTEGER;
BEGIN
SAVEPOINT svp_ini;
--get a new id from our sequence
SELECT myTable_seq.NEXTVAL INTO l_id FROM DUAL;
--put some data in the table along an empty CLOB which we'll fill later
INSERT INTO myTable(myId,
myXML,
filename,
ts_ins)
VALUES(l_id,
empty_clob(),
FileNameIn,
SYSDATE)
RETURNING myXML INTO dest_clob;
--now open our file in read mode
DBMS_LOB.OPEN(src_clob, DBMS_LOB.LOB_READONLY);
--and copy its contents in our CLOB
DBMS_LOB.LOADCLOBFROMFILE(dest_lob => dest_clob,
src_bfile => src_clob,
amount => DBMS_LOB.GETLENGTH(src_clob),
DEST_OFFSET => dst_offset ,
SRC_OFFSET => src_offset,
BFILE_CSID => DBMS_LOB.DEFAULT_CSID,
LANG_CONTEXT => lang_ctx,
WARNING => warning);
--remember to close the file!
DBMS_LOB.CLOSE(src_clob);
COMMIT;
l_parseDataRes:=parseData(l_id);
RETURN 1;
EXCEPTION WHEN OTHERS THEN
IF DBMS_LOB.ISOPEN(src_clob) = 1 THEN--if the file is still open close it!
DBMS_LOB.CLOSE(src_clob);
END IF;
--log the event
writelog('Error: '||SQLERRM||' while reading '||FileNameIn);
ROLLBACK TO svp_ini;
RETURN 0;
END readXML;
PROCEDURE writeLog(MessageIN IN varchar2) IS PRAGMA AUTONOMOUS_TRANSACTION; --to prevent unwanted commits!!!
BEGIN
INSERT INTO myLog VALUES(MessageIn, SYSDATE);
COMMIT;
END writeLog;
FUNCTION parseData (IdIn IN INTEGER)
RETURN PLS_INTEGER
IS
l_XML XMLTYPE;
l_XML_content CLOB;
l_aData myData.aData%type;
l_bData myData.bData%type;
l_otherData myData.otherData%type;
l_simpleValue myData.simpleValue%type;
l_myDataId PLS_INTEGER;
l_countSimpleFilter PLS_INTEGER;
BEGIN
--get the XML
SELECT myXML
INTO l_XML_content
FROM myTable
WHERE myId = IdIn;
l_XML := XMLType.CreateXML (l_XML_content);
--loop on our XML
FOR myCur IN (
SELECT myElement."aData", myElement."bData", myElement."otherNode", myElement."lastNode" --note the "" around the column names!
FROM XMLTABLE('//myElement' --creates a temporary table from a given XML with the structure specified
PASSING l_XML
COLUMNS "aData" INTEGER PATH '//aNode/aData', /*path defines the full (starting with a single /) or relative
(starting with a double //) path where the data is located inside the XML. It would be equivalent to '/myRoot/myElement/aNode/aData' */
"bData" VARCHAR2(200) PATH '//aNode/bData/text()', --we use text() since it's a CDATA value
"otherNode" XMLTYPE PATH '//otherNode', --we'll cycle on that later
"lastNode" XMLTYPE PATH '//lastNode' --we'll cycle on that later
) myElement
)LOOP --we are now looping on each myElement element inside our XML
BEGIN
SAVEPOINT svp_parsedata;
l_aData:=myCur."aData";
l_bData:=myCur."bData";
--we now want to read and concatenate the two simpleValue elements from the simpleTag node where simpleFilter is set to 0
--suppose you don't trust that there's just one simpleFilter with value 0, you can easily check it
SELECT COUNT(*)
INTO l_countSimpleFilter
FROM TABLE (XMLSEQUENCE(EXTRACT(l_XML, '//lastNode/simpleTag'))) t /*using XMLSEQUENCE we extract one by one the simpleTag
values and create a collection from them, then we put them in a temporary table with alias t and work on it*/
WHERE EXTRACTVALUE(VALUE(t),'//simpleTag/simpleFilter')=0;
IF l_countSimpleFilter > 1 THEN --if there are more, raise a user defined error to stop parsing the current myElement node
RAISE_APPLICATION_ERROR(-20000, 'Found more than one simpleTag with simpleFilter = 0');
END IF;
--now we're 100% sure there's only one
SELECT EXTRACTVALUE(XMLQUERY( /*remember that we know there will be ONLY ONE
node with element simpleFilter=0, that's why we can use EXTRACTVALUE*/
'for $i in $myXML/lastNode/simpleTag
where $i/simpleFilter = 0
return <simpleValue>{$i/simpleValue1/text()} {$i/simpleValue2/text()}</simpleValue>'
PASSING myCur."lastNode" AS "myXML" RETURNING CONTENT), '//simpleValue/text()', null) /*the use of   is mandatory
if you want a space between the two values*/
INTO l_simpleValue /*using XMLQUERY we've queried lastNode for the values we were interested into,
knowing that only a single child node would have matched our criteria. Then we had it return a single XML node
tagged simpleValue from which we extracted its contents with EXTRACTVALUE*/
FROM DUAL;
--now we're going to loop on the otherNode node to get all its otherData elements
/*if you're using a more complex nested table object, don't forget to initialize it with the proper constructor!
l_otherData:=myComplexObject(); */
FOR myData IN (
SELECT otherNode."otherData"
FROM XMLTABLE('//otherNode'
PASSING myCur."otherNode"
COLUMNS "otherData" VARCHAR2(10) PATH '//otherData'
) otherNode
)LOOP
l_otherData.EXTEND;
l_otherData(l_otherData.LAST):=TO_DATE(myData."otherData", STD_DEF_DATE); /*convert to date from varchar using the date format specified
at the beginning of our package*/
/*if your myComplexObject is TABLE OF mySimpleObject, then you would
l_otherData(l_otherData.LAST):=mySimpleObject(value1,...,valueN)*/
END LOOP;
--finally store out data inside myData table
SELECT myData_seq.NEXTVAL INTO l_myDataId FROM DUAL;
INSERT INTO myData(myId,
aData,
bData,
otherData,
simpleValue
)
VALUES(
l_aData,
l_bData,
l_otherData,
l_simpleValue
);
COMMIT;
EXCEPTION WHEN OTHERS THEN
writeLog('Error: '||SQLERRM||' while parsing data');
ROLLBACK TO svp_parsedata;
END;
END LOOP;
RETURN 1;
EXCEPTION WHEN OTHERS THEN
writeLog('Error: '||SQLERRM||' while reading from XML');
RETURN 0;
END parseData;
END myPackage;
We used many complex objects or functions to create our package, here's what to note:
- the source XML is an external binary file, a BFILE
- there's a SYS package that allows us to easily work on LOB objects, DBMS_LOB
- when manipulating the XML inside our script, we treat it as an XMLTYPE which can be queried with the EXTRACT and EXTRACTVALUE functions. If the XML file has a complex structure, we'll need to use the XMLTABLE and XMLSEQUENCE functions to manipulate it better. Last but not least, don't forget that we can always resort to the XMLQUERY function to query XML data; the downside is that the syntax is slightly different and more complicated than standard SQL
- should we need to define new exceptions, we can use the RAISE_APPLICATION_ERROR function
nice piece of information, I had come to know about your internet site from my friend vinay, delhi,i have read atleast 12 posts of yours by now, and let me tell you, your website gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new post, once again hats off to you! Thanx a ton once again, Regards, sql and plsql difference
ReplyDelete