A Labeling DOM-Based Tree Walking Algorithm for Mapping XML Documents into Relational Databases

El.Bashair, Seif El.Duola Fath EL Rhman El Haj
University of Khartoum
XML has emerged as the standard for representing and exchanging data on the World Wide Web. For practical purposes, efficient mechanisms to store and query XML data are critical to exploiting the full power of this technology. Several researchers have proposed using relational databases to store and query XML data. The main challenge of this approach is resolving the conflict between the hierarchical nature of the XML data model and the flat nature of the relational data model. Given the limitations of current approaches, this thesis aims to provide a general method for extracting from XML data a relational model that accurately reproduces the data in a relational database in a meaningful way. The thesis also details a method for extracting a corresponding XML Schema from a relational model. The two methods attempt to automate the forward and reverse conversions and can be used independently, which allows for a much more flexible and general approach to transformation. Note that the techniques described in this thesis do not reproduce an exact form of either model. Their purpose is to provide an automatic system that is general enough to process most XML documents and to produce a lossless and semantically meaningful mapping between the two models. The thesis proposes and develops an efficient mapping algorithm, called XMR, for storing XML documents in relational databases. XMR shreds the XML data and composes it into relational tuples. The reconstruction algorithm, RRX, rebuilds the XML subtree rooted at a given node from the relational database. These algorithms address the problem of XML document type, since they work with all types of XML documents (documents with a Document Type Definition (DTD), schema-based data, and schema-less data) without the need to reformat them. The performance of the algorithm is linear with respect to the document size, which is an important factor in processing time.
The algorithm runs with complexity O(11n+2), which is linear in n and is acceptable considering that it handles all types of XML documents.
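As a rough illustration of the shredding and reconstruction idea described in the abstract, the following is a minimal sketch and not the thesis's XMR or RRX algorithms. It assumes a single hypothetical `node` table, a simple pre-order numbering as the node label, and the function names `shred` and `rebuild`, none of which come from the thesis. It walks a DOM tree once, stores one tuple per element, attribute, and text node, and then rebuilds the subtree rooted at a given element.

```python
import sqlite3
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


def shred(xml_text, conn):
    """Walk the DOM in pre-order and store one row per node.

    The table layout and the pre-order label are illustrative assumptions.
    Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent INTEGER, kind TEXT, name TEXT, value TEXT)"
    )
    doc = minidom.parseString(xml_text)
    counter = 0

    def walk(node, parent_id):
        nonlocal counter
        if node.nodeType == node.ELEMENT_NODE:
            counter += 1
            my_id = counter
            conn.execute(
                "INSERT INTO node VALUES (?, ?, 'elem', ?, NULL)",
                (my_id, parent_id, node.tagName),
            )
            # Attributes are stored as children of their element.
            for i in range(node.attributes.length):
                attr = node.attributes.item(i)
                counter += 1
                conn.execute(
                    "INSERT INTO node VALUES (?, ?, 'attr', ?, ?)",
                    (counter, my_id, attr.name, attr.value),
                )
            for child in node.childNodes:
                walk(child, my_id)
        elif node.nodeType == node.TEXT_NODE and node.data.strip():
            # Whitespace-only text nodes are skipped in this sketch.
            counter += 1
            conn.execute(
                "INSERT INTO node VALUES (?, ?, 'text', NULL, ?)",
                (counter, parent_id, node.data),
            )

    walk(doc.documentElement, None)
    conn.commit()
    return counter


def rebuild(conn, node_id):
    """Rebuild the XML subtree rooted at the element with label node_id."""
    (name,) = conn.execute(
        "SELECT name FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    attrs, body = [], []
    rows = conn.execute(
        "SELECT id, kind, name, value FROM node WHERE parent = ? ORDER BY id",
        (node_id,),
    ).fetchall()
    for child_id, kind, child_name, value in rows:
        if kind == "attr":
            attrs.append(f" {child_name}={quoteattr(value)}")
        elif kind == "text":
            body.append(escape(value))
        else:
            body.append(rebuild(conn, child_id))
    return f"<{name}{''.join(attrs)}>{''.join(body)}</{name}>"
```

Because the walk visits each node exactly once and does a constant amount of work per node, the number of inserts in this sketch grows linearly with the number of nodes. It needs no DTD or schema, since it reads only the structure of the document itself.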
Databases, XML Documents, Algorithm