Thursday, October 18, 2007

When XML Schema Imports Become XQueries

For the last five years I have been using an XML Schema design process that I learned from the people at the Georgia Tech Research Institute (GTRI) as part of my work with the GJXDM and the NIEM. This work has been strongly influenced by Jim Hendler, who co-wrote the Semantic Web article with Tim Berners-Lee. Although GTRI wanted to use RDF (graphs) to represent the exchange data, they had to settle on XML and XML Schemas, since those technologies were already widely used by most criminal justice agencies. They compromised by still using a striped model within the XML Schema-driven structures.

Creating an exchange document starts with building a metadata registry of all the data elements used in a family of XML Schemas. A shopping-cart-like tool is used to select a subset of the data elements; the selected elements form a "wantlist". The checkout process creates a zip file that, when uncompressed, yields a set of XML Schema files that you import into your constraint schema.
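For illustration, a wantlist is really just a small XML document naming the elements you selected. Here is a hypothetical sketch; the element names and structure below are made up, and the actual format produced by the GTRI tools may differ:

    <wantlist>
      <dataElement name="PersonName"/>
      <dataElement name="PersonBirthDate"/>
      <dataElement name="ArrestCharge"/>
    </wantlist>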

This process is an order of magnitude easier than it was before GTRI created the subschema generation tools. The subschema generated by the checkout process includes not only the XML Schema type definitions you selected but also all the data types they depend on. The metadata registry maintains an internal dependency list, so if you needed just ten data elements but they depended on ten other data types, you got all 20. You didn't have to figure out manually which additional data types to include.
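The dependency walk itself is easy to express in XQuery. Below is a minimal sketch, assuming a registry document in which each dataElement entry lists the names it depends on in dependsOn children; the registry layout and all names are hypothetical:

    declare function local:closure($registry as element(registry),
                                   $names as xs:string*) as xs:string*
    {
      (: collect the direct dependencies of everything named so far :)
      let $deps := $registry/dataElement[@name = $names]/dependsOn/string()
      (: keep only the names we have not seen yet :)
      let $new  := distinct-values($deps)[not(. = $names)]
      return
        if (empty($new))
        then $names                                   (: closure reached :)
        else local:closure($registry, ($names, $new)) (: recurse :)
    };

Called with ten requested names whose types pull in ten more, local:closure(doc("registry.xml")/registry, $wanted) returns all twenty.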

I adapted the GTRI shopping-cart process by adding a "subscribers list" to each of the data elements used in our local metadata registries and local XML Schema creation. I used a large, complex, and brittle Ant script to update all the imported files for each XML Schema we were developing. But this could not easily be done during a requirements-gathering meeting.

In the past we have viewed subschema generation as a process of interacting with a web application that creates a set of files. One of our first insights is that this process is in fact just a transformation of the metadata registry, driven by the wantlist and the dependency graph. It does not have to be a manual "batch" process.
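Seen that way, "checkout" is just a query over the registry. Here is a sketch of the transformation, reusing the hypothetical registry layout above and taking the already-closed set of names as input; the @type attribute and definition child are also made-up registry fields:

    (: turn the selected registry entries into a subschema document :)
    declare function local:subschema($registry as element(registry),
                                     $names as xs:string*) as element(xs:schema)
    {
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">{
        for $e in $registry/dataElement[@name = $names]
        return
          <xs:element name="{$e/@name}" type="{$e/@type}">
            <xs:annotation>
              <xs:documentation>{$e/definition/text()}</xs:documentation>
            </xs:annotation>
          </xs:element>
      }</xs:schema>
    };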

In a recent blog post (Metadata Web Services) I pointed out that tools like eXist and XForms allow us to store metadata in XML, let BAs and SMEs update it easily, and query it with languages like XQuery. One of the great realizations I had on my last project was that XQuery has a simple construct that lets you "chain" web services: the argument to the doc() function is just a URL. If that URL points to a web service that returns XML, you can easily build composite web services. Web services that are transformations can be chained together, which lets you start using enterprise integration patterns to build reusable services at a very fine grain.
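A minimal sketch of that chaining, with hypothetical URLs; the only point is that doc() will happily dereference a service endpoint that returns XML:

    (: the URI below is itself a web service returning XML, so this
       query consumes one service and could be exposed as another :)
    let $uri := concat("http://registry.example.org/subschema?wantlist=",
                       encode-for-uri("/db/wantlists/arrest.xml"))
    return doc($uri)//xs:element/@name/string()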

Putting these facts together, it has occurred to me that the ultimate goal of the metadata registry process is to allow new data element definitions and enumerated values to be changed during an XML Schema design session. We should be able to change a definition, hit "save" in the XForms application that edits data elements, and then just refresh the generated subschema. The refresh process calls an XQuery, which calls web services that analyze the dependency graph and update the data elements imported into the XML Schema.
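With eXist, that refresh can be a single hit on a stored query through its REST interface. A sketch, assuming the subschema query above has been saved into the database; the paths and parameter name are hypothetical:

    (: re-run the stored subschema query against the live registry; a
       definition saved a moment ago in the XForms editor shows up in
       the regenerated schema :)
    doc(concat("http://localhost:8080/exist/rest/db/queries/subschema.xq",
               "?wantlist=/db/registry/wantlists/arrest.xml"))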

This helps us reach the goal of updating the model quickly and generating new artifacts directly from it, which is the heart of the overall model-driven development process. The artifacts we create include XML Schemas, instance documents generated from those schemas, and XForms applications to view and edit the documents. The faster turnaround lets our users quickly see whether their definitions and enumerated values are being precisely captured and used to build actual systems.
