Monday, December 17, 2007

And the Winner is eXist-db.org: My Award for the Most Innovative Product of 2007

As the year comes to a close, we often take some time to look back at the innovations we have seen that have changed our worldview. I can think of many things that have had a large impact on me: the FireFox XForms extension, discovering the beauty of a well-designed REST interface, the Mule enterprise service bus, Yahoo Pipes, and microformats all come to mind.

But after careful consideration, my award for the most innovative application of the year goes to the eXist database/web server. This application constantly amazes me with what I can do with it. If you have a Java runtime (JVM) on your desktop it only takes a few minutes to get it running. Hit the startup script and you have a web server running on localhost:8080. Open a WebDAV browser, drag-and-drop your files and folders, and away you go. With eXist I can easily set up a full enterprise metadata registry on a laptop in under five minutes.

The power of eXist comes from its integrated XQuery engine. Any well-formed XML file added to the system is immediately indexed and immediately searchable. That means that 10ms after you hit "Save" in your XForms application you can see the data appear in XQuery reports.
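As a minimal sketch of such a report (the /db/forms collection and the last-modified element are my own assumptions for illustration), an XQuery over just-saved form data can be as small as:

   xquery version "1.0";
   (: List the titles of the ten most recently saved submissions.
      The collection path and element names are assumptions. :)
   let $ordered :=
       for $s in collection('/db/forms')/submission
       order by xs:dateTime($s/last-modified) descending
       return $s
   return $ordered[position() le 10]/title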

But the "Secret Sauce" of eXist rests in it's use of a lightweight Jetty web server with a remarkably integrated REST and WebDAV interface. This means that every XML file…is a static web service. This means that every XQuery that returns XML…is a dynamic web service that you can parameterize. This means that you can be writing your first web services in fifteen minutes that can be grabbing data from a dozen different XML files and quickly serializing the results out the wire. Enterprise mash-ups at your fingertips. Very little programming is done other then selecting data. Just like it should be.

eXist technologies are going to start to have a big impact once XForms/REST and XQuery (XRX) web development matures. I predict this is going to happen in the next three years. And although IBM, Microsoft, Oracle and many others are supporting XQuery in a big way, they don't yet have really smooth integration with next-generation XForms-driven clients. You still need teams of AJAX, JavaScript, Java, .Net and SQL programmers to build simple web services for rich client interfaces.

I must admit it took me a little while to really understand how the simplicity of the interface rocked my world. Just swap out the word "rest" in the URL and replace it with the word "webdav" and you go from a data browser to a file system that works with every copy, cut and paste operation of the file system.
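For example (the collection path here is hypothetical), the same document answers on both interfaces:

   http://localhost:8080/exist/rest/db/registry/elements.xml     (REST: query and browse)
   http://localhost:8080/exist/webdav/db/registry/elements.xml   (WebDAV: drag and drop)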

It is also interesting to note that this product did not come out of a Bay Area startup; it didn't come out of Google Labs; it didn't come from Microsoft or IBM. From what I can tell the initial version was pretty much written by a single guy in Germany, Wolfgang Meier. I think this shows that there is still room for true innovation by a single individual in this world. Although eXist now appears to have a great team of people behind it, it shows that a single person with a clear architectural vision who puts the right pieces together really can make a difference in the world. Our teams could not have been as productive with XForms if we didn't have a clean and elegant tool like eXist behind our forms. I know that XRX innovators will now be able to stand on Wolfgang's shoulders and build incredible applications.

Thanks Wolfgang! Your labor of love is growing up!

Friday, December 14, 2007

Introducing the XRX Architecture: XForms/REST/XQuery

At the XML 2007 conference, many people seemed to have independently discovered that if you combine XForms, REST and XQuery you can create a software development environment that circumvents the need for middle-tier objects and for conversion to and from relational databases.

The conference also had a great deal of discussion about how XForms is a great architecture, but as Elliotte Rusty Harold pointed out, something needs to happen for XForms to replace things like AJAX. Although Rusty did propose a few interesting ideas (like getting XForms built into FireFox), I feel what we really need is a contagious meme. And I think it goes beyond XForms. What we need is an easy-to-remember and easy-to-communicate name for a collection of great ideas. For example, there are few books with the word XMLHttpRequest in the title, but there are hundreds of books with AJAX in the title. Once the word AJAX was coined the meme started to spread. The world was waiting for the "sticky" label behind the concepts.

What XForms needs is a label for the collection of ideas that makes XForms give you an order-of-magnitude improvement over other web development architectures. I would like to suggest that we get behind one "label" that brings many of us under one big tent. I would like to suggest we use "XRX".

XRX to me would stand for the ideas behind XForms/REST/XQuery web applications. But if people are using XHTML as a container for XForms you could read it as XHTML/REST/XQuery. Just spare us the "J", which is associated with a hard-to-use language for über geeks.

You could also use XRX to describe the format of the data on the client and on the server. In this case it is XML on the client (in the XForms Model) and XML in a native XML database that supports a REST interface.
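Here is a minimal sketch of that pattern in XForms 1.0 markup (the data URL is hypothetical): the form loads its instance straight from the database's REST interface and PUTs the edited XML back to the same URL:

   <xf:model xmlns:xf="http://www.w3.org/2002/xforms">
      <!-- load the XML instance directly from a REST URL -->
      <xf:instance src="http://localhost:8080/exist/rest/db/data/item-1.xml"/>
      <!-- save it back with an HTTP PUT: no middle tier, no shredding -->
      <xf:submission id="save" method="put" replace="none"
         action="http://localhost:8080/exist/rest/db/data/item-1.xml"/>
   </xf:model>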

We can still build awesome tools and frameworks that make the XRX architecture really fly. The IBM Workplace Forms development tools, the Orbeon Presentation Server, Kurt Cagle's new frameworks that use the eXist native XML database, and the hundreds of other XForms development tools all fit into the XRX strategy.

But if we make XRX into a "big tent" strategy we can also include XQuery on the server as part of the architecture. This would include people like Jason Hunter at MarkLogic. Jason is an experienced Java developer who has started to move away from Java on the server and toward XQuery because of its richness. And I could not agree more.

I have been reading the book The Tipping Point: How Little Things Can Make a Big Difference by Malcolm Gladwell and trying to apply its ideas to the adoption of XForms and XQuery. The book has many examples of how careful packaging of the ideas behind a contagious meme can make those ideas spread like wildfire.

So when we hear the term "XRX" we want to have the following concepts come to mind:

  1. A web development architecture with a 10x productivity improvement over traditional JavaScript/OO/RDBMS methods
  2. A development architecture based on international standards that is designed to minimize the probability of vendor lock-in
  3. An architecture that gives a rich user experience without creating mountains of spaghetti procedural code on either the client or the server
  4. A system that leverages the REST architecture to take advantage of high-performance and simple interfaces using web standards
  5. Portability on both the client and the server using a variety of forms players and XQuery databases
  6. The option of avoiding costly shredding (and reconstitution) of complex XML documents into RDBMS tables
  7. A community of standards/tools and a "complete solution" ecosystem that can give you a proven ROI on your IT investment

I don't expect everyone to buy into the XRX meme. Everyone has their own favorite tools, standards and acronyms. But everyone who was at the XML 2007 conference could feel the excitement in the room at the XML presentations. Now we need to give that excitement a name and see if it sticks. Hopefully in a few years we will be able to walk into a bookstore and see a bunch of books with "XRX" on the cover.

What I now need is for people to start testing the XRX meme and see if it sticks. Let me know what works and what doesn't. Send me your 30-second elevator pitch. How big should the idea tent be? What do we have to do to get more people into our tent? What evidence (case studies), example programs and resources do we need to convince the world that we have seen the light… and that it is so elegant and beautiful it will keep us from ever going back to the stone age of web development?

Wednesday, December 12, 2007

Dynamic Graphs for K-12 Educators

At the recent XML 2007 conference in Boston several of us met to discuss what we could do to promote FireFox and some of the important standards built into FireFox (or almost built into FireFox). Specifically several of us were using XForms and SVG and thought that the combination offered some great opportunities.

It was pointed out that the combination of the XForms range control (a slider control) and SVG allowed developers to create dynamic graphs for teaching educational concepts. There is an example here:

http://en.wikibooks.org/wiki/XForms/Pie_Chart

Note there is a link to the actual program here, which you will need the FireFox XForms extension to view:

http://xforms-examples.googlecode.com/svn/trunk/10-full-examples/86-piechart-range-controls/pie-chart.xhtml
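The heart of such an example is tiny. Here is a sketch of the slider side in XForms 1.0 markup (the instance path is my own placeholder):

   <xf:range ref="/chart/slice-one" start="0" end="100" step="1"
      xmlns:xf="http://www.w3.org/2002/xforms">
      <xf:label>Size of slice one (percent)</xf:label>
   </xf:range>

Each time the slider moves, the bound instance value changes and the SVG arc that reads that value is redrawn. That loop is the whole "dynamic graph" idea.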

So several of us are interested in putting together a grant proposal to develop SmartBoard-enabled educational content for the K-12 market using a combination of FireFox, XForms and SVG. The goal would be to create a "Killer App" for FireFox in the K-12 space. Ideally we will build 100 demos of some of the most useful dynamic graphs and then build a framework for teachers to extend these applications.

I also have a few Supply and Demand examples that run under IE but require the SVG add-on, using just plain JavaScript for the range controls: http://www.danmccreary.com/svg/

You will note that visualizing the maximum profit as the area of the square under the triangle is much easier when you add the motion.

It was also pointed out that the Mozilla Foundation has the funds to support this type of development and that they might fund a grant to build the top 100 useful dynamic graphs. We could then start building a toolkit to extend these dynamic graphs using some of the advanced drag-and-drop connection-based programming parts of SVG and FireFox. The kit should shield the average middle school physics teacher from having to write and debug JavaScript.

Let me know what you think - Dan

Saturday, December 08, 2007

Impressions of XML 2007 in Boston

I returned from the XML 2007 conference in Boston with my head spinning about all the new developments with XForms and XQuery. The conference was three days long and had between three and five tracks going concurrently. My impression was that there were about 400 people there.

Here are some of the highlights for me.

XForms appears to be getting lots of traction. The XForms sessions were all packed, with standing room only in most of them. The presenters were passionate that the combination of XForms, REST and XQuery has had a huge impact on their projects and may soon impact the rest of the development world.

Microformats are definitely starting to make the huge strides where Metcalfe's Law will start to kick in. The existence of 450 million tags and five FireFox microformat extensions indicates that within a year they will just be part of web publishing best practices. Even Microsoft is said to be supporting microformats in IE 8, just a few years from now.

Taylor Cowan (who works for Sabre) gave an excellent presentation on how microformats can be extended into the travel space with an Atom-based format for publishing trip ideas. His use of Atom and semantic web technologies was right in line with my vision of how personalized agents are going to leverage microformats.

John Boyer's demo of IBM's Workplace Manager showed that you can now purchase top-shelf XForms GUI design tools that do not lock you into a proprietary vendor solution. I hope the people from Microsoft, Adobe, FairIsaac, Altova, Google GWT and other vendors take notice. Locking your users into a Java or JavaScript client is no longer an acceptable strategic option for developing web applications. Wake up everyone! Are you listening? We have a standard that works, so let's now start building more great tools that use it. XForms is the only system that is creating, as John puts it, "Order of Magnitude" increases in developer productivity.

Mark Birbeck's Sidewinder demo showed that XForms is breaking out of the browser and onto the desktop. Showing web and iPhone web pages directly in desktop widgets was a great demonstration of the power of XForms.

Kurt Cagle's presentation about the problems of programming against the fragile DOM model in JavaScript was interesting for its format: the presentation was actually an XForms application running on eXist. Kurt "eats his own dog food" on the client and the server! I am humbled by the man.

Jason Hunter from MarkLogic (an XML database vendor) gave a nice demo of the MarkMail.org application and showed that, yes, XQuery databases really do scale to the terabyte range. He has about 4 million e-mail messages indexed, and you can do combinations of database and text searches in a few seconds. In other words, a demonstration of how real-world analytics comes to XQuery. I also found out that MarkLogic is free if your database is under 100MB. Perfect for 10MB of metadata in a metadata registry.

The people from IBM's research labs appear to be taking the XForms MVC architecture and really running with it. We saw some very interesting presentations on how future standards for data-driven XML Application Components (XACs) and State Chart XML (SCXML) might be used to make browser-based mash-ups easier. Componentization, composition, customization and reuse are all part of this architecture, and it would not be possible without the XForms MVC architecture. Hopefully Charles Wiecha from IBM Research will post his slides so you can see more.

Norm Walsh also talked about how the new XML Pipeline standard (XProc) is starting to take shape. Even though the standard is still a working draft, there are already at least five implementations. This single standard could have more impact on the shape of server-side processing than any other standard coming out of the w3c. Integrating it into XQuery-based application servers seems to be the next step.

The eXist database also seemed to clean up in all the open-source native XML database talks. eXist was mentioned in over a half-dozen presentations, and no other native XML database was even mentioned in the presentations I attended. It is not surprising that eXist is now the top hit on Google when you search for "XML database".

One of the most interesting presentations that I did not get a chance to see (though I did talk to the presenter afterwards) was Thomas White's presentation on using XSLT in the browser to get a 10x speedup in form rendering. It turns out that implementations of XForms players like FormFaces use heavyweight JavaScript objects in the browser. Thomas has found that this donkey cart can be turned into a sports car by implementing an in-browser XSLT-centric event scheduler. If his architecture is implemented we could see vastly better XForms players in the future. Let's all hope that Thomas's architecture comes to the XForms world sooner rather than later.

I did not attend any of the XML publishing tracks. But from the titles it appears that the Darwin Information Typing Architecture (DITA) standard is a hot topic. Presentations on using XSL-FO and CSS for book and other print publishing were also on the agenda.

I also did not attend too many of the XML in the Enterprise tracks. There were good presentations on XML accelerators, security and mining XML Schemas.

I am disappointed that the conference organizers did not require people to post their presentations on the web site before their talks. There were many very interesting presentations that could have had a larger impact on the world, and the sloppiness of people not publishing their slides seems a little unprofessional. Some people just brought their digital cameras and took pictures of each slide. Smart.

In summary I think that the conference had two types of people:

First there were people who had not done extensive work with XForms. This first group I call the "evolutionary proceduralists". These were people trying to beat a dead horse by patching the old-style procedural web application architecture with JavaScript, JSON and AJAX hacks. They seemed to spend a lot of time arguing about the merits of JSON vs. XML as a data transport syntax.

The other type were "the enlightened innovators": people who have seen the real-world benefits of dumping object middle-tier stacks and relational databases and going with a pure declarative approach to solving business problems based around XForms, REST and XQuery. These are the people who are going to lead innovations in application development.

I was also impressed by how large a percentage of the XML innovations came from outside the US. I attribute the fact that some of the best presentations came from the UK to the "Michael Kay" effect.

Beauty is in the Eye of the Beholder

At one of the XForms sessions a young Ruby advocate stated that he thought his Ruby code was "beautiful" but that he did not think XForms code was "beautiful". Most of the people in the audience agreed that XForms is beautiful but, like any new language, it takes getting used to. Beauty is the process of your visual cortex recognizing familiar patterns and signaling your pleasure centers. If you have never read Japanese, then the most profound haiku written in the most beautiful script will have no beauty for you.

XForms/REST/XQuery is, to me, indeed beautiful. Its beauty comes from its ability to quickly map real-world requirements into working systems with high fidelity. No other system in the world approaches its elegant architecture. And after XML 2007 I realized I was not the only one who had independently reached this conclusion.

Thursday, November 15, 2007

Full text Search Standards for XQuery

I have been using the eXist (http://www.exist-db.org) native XML database for almost a year now and I am constantly impressed by the power and depth of the eXist community. XQuery is a great way to search XML documents, and most of my XSLTs that manage metadata are now being migrated to XQuery. Although there are still some places where XSLT is superior, XQuery is much easier to use when I am building RESTful web services for metadata. My XForms are frequently generated directly from XQuery programs.

It is interesting to note that eXist is now dominating all the Google search results for "XML database". Leading the pack and pulling away!

From my days doing IT strategy I recall a great deal of analysis done by Adele Goldberg and Kenneth S. Rubin and documented in their excellent book Succeeding With Objects: Decision Frameworks for Project Management. The book is based on over 50 case studies on reuse and includes top-notch analysis of the role of trust in any high-reuse culture.

One of the key points in the Goldberg/Rubin analysis was the set of factors that allow people to reuse assets. In their case the assets were programming objects. In my case I am concerned with the reuse of data objects: from simple data types that are part of the XML Schema standards to registered data elements in a metadata registry. Registered data elements have an approval process and have semantics that span a group, project, program or enterprise.

But the pattern that recurs is that people need to be able to quickly find assets before they can reuse them. One of the main points in my work is that you cannot reuse what you cannot find. The corollary is that the longer it takes to find an asset, the higher the temptation to recreate it. That is where the search function of a metadata registry comes in. It must be fast and accurate.

The quality of a metadata registry also depends on strategies for informing users when two data elements have the same semantics and are potentially duplicates. To find these we need semantic-nearness searches over data element definitions.

Today I am providing simple substring searches. But I know that text mining can provide semantic-nearness searches. I was glad to see that the w3c is working on full-text search standards for XQuery. The XQuery Full Text specification (http://www.w3.org/TR/xquery-full-text/) has now reached the working-draft last-call stage and I hope we soon start to see implementations that take advantage of these features. I know that Google's support of this project for eXist (see http://exist.sourceforge.net/soc-ft.html) is a very good sign.
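For a flavor of the draft syntax (the registry markup below is my own assumption), a duplicate-screening query over definitions might look like:

   xquery version "1.0";
   (: W3C XQuery Full Text draft syntax: find data elements whose
      definition mentions both words. Registry markup is assumed. :)
   for $e in collection('/db/registry')//dataElement
   where $e/definition contains text "person" ftand "identifier"
   return $e/name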

I was also fortunate to work with Gary Berosik on the Advocate Agents project. Gary works for Thomson/West Publishing, which has a large group of people doing text mining. Gary introduced me to the book Text Mining Application Programming by Manu Konchady. This book has excellent sections on the decomposition of free-form text into parts of speech for information extraction, clustering, categorization and search.

I hope that eventually the w3c will approve the full-text extensions to XQuery and that they will make it into the eXist system. Based on this link (http://exist.sourceforge.net/soc-ft.html) I am hoping that will happen in the next few years.

Thursday, October 18, 2007

When XML Schema Imports Become XQueries

For the last five years I have been using an XML Schema design process that I learned from the people at the Georgia Tech Research Institute (GTRI) as part of my work with the GJXDM and the NIEM. This work has been strongly influenced by Jim Hendler, who co-wrote the Semantic Web article with Tim Berners-Lee. Although GTRI wanted to use RDF (graphs) to represent the exchange data, they had to settle on XML and XML Schemas, since those technologies were already widely used by most criminal justice agencies. They compromised by still using a striped model within the XML Schema-driven structures.

Creating an exchange document starts with creating a metadata registry of all the data elements used in a family of XML Schemas. A shopping-cart-like tool is used to select a subset of the data elements. These data elements form a "wantlist". The checkout process creates a zip file that, when uncompressed, yields a set of XML Schema files that you import into your constraint schema.

This process is an order of magnitude easier to use than before GTRI created the subschema generation tools. The subschema generated by the checkout process includes not only the XML Schema type definitions but also all the data types they depend on. The metadata registry maintains an internal dependency list, so if you need just ten data elements but they depend on ten other data types, you get all 20. You don't have to manually figure out what additional data types to include.
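That dependency walk is a small recursive computation. Here is a sketch in XQuery of computing the closure of a wantlist, assuming (purely for illustration) a registry where each element carries a name and zero or more depends-on children:

   xquery version "1.0";
   (: Keep adding each element's depends-on names until
      nothing new appears. Registry markup is assumed. :)
   declare function local:closure($names as xs:string*,
                                  $registry as element()) as xs:string* {
       let $deps := distinct-values($registry/element[name = $names]/depends-on)
       let $new  := $deps[not(. = $names)]
       return
           if (empty($new)) then $names
           else local:closure(($names, $new), $registry)
   };

   local:closure(('PersonGivenName', 'PersonSurName'),
                 doc('/db/registry/elements.xml')/registry)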

I adapted the GTRI shopping-cart process, adding a "subscribers list" to each of the data elements used in local metadata registries and local XML Schema creation. I used a large, complex and brittle Ant script to update all the imported files for each XML Schema we were developing. But this could not easily be done during a requirements-gathering meeting.

In the past we viewed this subschema generation as a process of interacting with a web application that creates a set of files. One of our first insights was that this process is in fact just a transformation of the metadata registry based on the rules created by a wantlist and the dependency graph. It does not have to be a manual "batch" process.

In a recent blog post (Metadata Web Services) I pointed out that tools like eXist and XForms allow us to store metadata in XML format, let it be easily updated by BAs and SMEs, and query it using languages like XQuery. One of the great realizations I had on my last project was that XQuery has a simple construct that allows you to "chain" web services. The argument to the doc() function is just a URL. If that URL points to a web service that returns XML, you can easily build composite web services. Web services that are transformations can be chained together. This allows you to start using enterprise integration patterns to build reusable services at a very fine grain.
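A minimal sketch of that chaining (both service URLs are hypothetical): one XQuery consumes the XML output of another service simply by calling doc() on its URL:

   xquery version "1.0";
   (: Compose two hypothetical services: one returns a wantlist,
      the other looks up one data element per call. :)
   let $base := 'http://localhost:8080/exist/rest/db/services/'
   let $wantlist := doc(concat($base, 'wantlist.xq?project=demo'))
   for $name in $wantlist//element-name
   return doc(concat($base, 'lookup.xq?name=', encode-for-uri($name)))/*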

Putting these facts together, it has occurred to me that the ultimate goal of the metadata registry process is to allow new data element definitions and enumerated values to be changed during an XML Schema design process. We should be able to change a definition, hit "save" in the XForms application that edits data elements, and then just refresh the generated subschema. The refresh process calls an XQuery that calls web services that analyze the dependency graph and update the data elements imported into the XML Schema.

What this does is help us reach the goal of updating a model quickly and generating new artifacts directly from the model. This is part of the overall model-driven development process. The artifacts we create include XML Schemas, instance documents generated from the XML Schemas, and XForms applications to view and edit the documents. This faster turn-around time allows our users to quickly see if their definitions and enumerated values are being precisely captured and used to create actual systems.

Wednesday, October 10, 2007

XForms Tutorial and Cookbook Being Translated to French and Used in China

I got nice notes from people in France and China telling me that our XForms Tutorial and Cookbook is now being used in both countries.

The French version is being hosted on a version of the MediaWiki server that also has a new macro for code syntax highlighting.

The URL for that version is here: http://xforms.free-web-hosting.biz/mediawiki-1.9.3/index.php/Wikilivre

I have not figured out if the syntax-highlighting macro will work on Wikibooks yet, but I have my doubts.

Hints of the Chinese version were posted by zhouliyi from ZheJiang, China, in the Cookbook Guest Registry: http://en.wikibooks.org/wiki/XForms/Guest_Registry

Please let me know if anyone else is using XForms so we can maintain a list of users.

Metadata Web Services

I have been using the eXist native XML database/web server for about nine months now and it is starting to change the way I think about metadata management.

My latest project for a financial institution requires us to quickly build XForms to manage various metadata as well as data. What I am finding is that my old method of storing metadata in XML files on a file system and then transforming it with Apache Ant was a complex process. My new approach is to store the XML directly in eXist. This used to seem a little hard, since I thought you had to use the eXist web interface to upload each XML file and Ant scripts to back up your eXist database.

This all changed when I was shown how to use the Microsoft Windows WebDAV tool. Copying files to eXist and backing up the entire data store is just a drag and drop using Windows.

Now an entire new set of metadata web services is becoming much easier to build. Take the simple task of building a pick list of enumerated values for a form. XForms allows you to use the select1 control and specify an itemset using an XPath expression. I can now just load the data elements into an instance and grab the values and labels directly from the enumerations in the metadata registry files.
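A sketch of such a pick list in XForms 1.0 markup (the instance names and layout are my own placeholders):

   <xf:select1 ref="accountType" xmlns:xf="http://www.w3.org/2002/xforms">
      <xf:label>Account Type</xf:label>
      <!-- one item per node in the loaded code-table instance -->
      <xf:itemset nodeset="instance('code-table')/item">
         <xf:label ref="label"/>
         <xf:value ref="value"/>
      </xf:itemset>
   </xf:select1>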

The only drawback is that you load more metadata (like the full definitions) than you need to build the form. But once again eXist comes to the rescue. It takes just a few lines of code to create a little web service (using XQuery) that you pass a code table name to and that returns just the label/value pairs. Using this method the selection lists are always up to date and don't require any "batch" updates.
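Those few lines might look something like this (the collection path, markup and parameter name are all assumptions, and the request function name can vary between eXist versions):

   xquery version "1.0";
   declare namespace request = "http://exist-db.org/xquery/request";
   (: Return only the label/value pairs for one named code table. :)
   let $table := request:get-parameter('table', '')
   return
       <code-table name="{$table}">
       {
           for $e in collection('/db/registry')//code-table[@name = $table]/enumeration
           return <item><label>{string($e/label)}</label>
                        <value>{string($e/value)}</value></item>
       }
       </code-table>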

What I am learning from this is that in the past, metadata management was usually an afterthought: something the coding-standards people used to enforce database column naming conventions. But with metadata stored in eXist, metadata becomes part of application services. Building apps is just assembling forms that pull metadata from the registry in real time.

If you are concerned that the metadata registry server will be overloaded with requests for information each time a form loads, remember that these services are RESTful. The results can be cached so they don't have to be regenerated. I still have more to learn about how to make these services fast, but since metadata is small it can usually live entirely in RAM, and disk I/O is very limited.

All of these developments are just small pieces of the puzzle of putting well-managed metadata at the core of your enterprise development methodologies. Well-managed metadata is really the heart of the model-driven enterprise.

Let me know if you are creating metadata web services. I would like to know what things you feel are useful to your users.

- Dan

Tuesday, May 29, 2007

Impressions of Sem-Tech '07


I just returned from the 2007 Semantic Technology Conference in San Jose California. It was a great conference and opened my mind to several new ideas. Well worth the time!

The conference was held over four days and had around 125 presentations, including tutorials and research projects. There were almost 800 attendees. This is the third semantic technology conference that I have attended and the second at which I presented a paper.

Here are some high-level observations and some patterns that I detected.

The Semantic Web gets the “Web 3.0” Label

Most people at the conference seemed to have embraced the idea that the semantic web will adopt the popular-culture label "Web 3.0". The final straw was the November 2006 NYT article by John Markoff, which set the blogosphere abuzz. Web 3.0, in this usage, is a web that includes technologies to enable intelligent reuse of data. Wikipedia, after a long and active discussion about whether "Web 3.0" deserved an entry, finally undeleted the page and let it stand. See http://en.wikipedia.org/wiki/Web_3.0 and check out the discussion page on the article for more.

From If to When to How

Eric Miller (now at Zepheira) calls some of the new technologies "recombinant data". Eric spent about five years with the w3c. He is perhaps the most well-connected person in the world with accurate knowledge of who is using semantic web technologies to solve real business problems today. His observation was that three years ago, at the first semantic technology conference, we were wondering if the semantic web would take off. Last year the speculation was about when semantic web technologies would become commonplace. This year the focus was debates on how the semantic web should be implemented.

Venture Capitalists are Becoming Educated on the Semantic Web

One example of this is the fact that last year, most semantic-web startup companies had to carefully explain what the semantic web was to venture capital companies when trying to get their initial rounds of funding. This year, many venture capital companies not only had some level of understanding of the semantic web but were asking each of their potential startups how their technologies fit into it. Other companies that did not have a semantic-web focus were now coming to the conference to get educated.

Some of the companies that got VC funding last year have already been purchased and absorbed by larger firms. They were replaced by new venture-funded companies.

Consensus on the RDF/SPARQL Foundation

One of the first things that struck me was the consistent use of RDF and triple stores to solve many hard problems. The use of RDF and SPARQL seemed to be the primary factor in whether people thought you were really using semantic web technologies or not. If you were not using RDF, you were not really in the club, just an outsider looking in.

OWL Fragmentation

Another thing that surprised me was the discord about the use of OWL and its various sub-functions. Central to this was the rise of use cases: simple examples of things that OWL could not do. Much of this centered around DLP (Description Logic Programs).

A good example of a non-OWL solution was the use of SKOS to store the things typically stored in a metadata registry. SKOS is a great example of a simple standard, built on top of RDF, that attempts to solve common problems without getting overly complex. See SKOS in Wikipedia.

We are also starting to see that rules must be exchanged between systems in semantically precise ways; hence the need for a Rule Interchange Format (RIF). It was nice to see vendors like FairIsaac supporting complex business rules running INSIDE the web browser using XForms. They rock!

REST is Deep

Perhaps my favorite presentation was given by David Wood and Brian Sletten from Zepheira. In this presentation, David and Brian gave a demo of the NetKernel system. They demonstrated how NetKernel embraces REST at a much deeper level than I had previously anticipated. They were not yet generating XForms from an XML Schema, but it was a great example of convergent evolution between my ideas and theirs.

Case Studies

This year we also started to see examples of new startup companies actually using semantic web technologies to differentiate themselves in the marketplace. But because they are all trying to differentiate themselves, much of the actual technology they were using was not disclosed.

RDF Taggers, Harvesters, Linkers and Analyzers

The conference seemed to have three sets of problems that everyone agreed on. First was how you harvest RDF from a web page or any other resource. Most of these presentations related to getting RDF out of unstructured and structured data. There was lots of discussion of microformats (pros and cons).

iReader Rocks

The coolest demo I saw was the iReader demo from http://www.syntactica.com. This is an awesome FireFox (and IE) extension that does concept mapping from unstructured text. The people behind it have been doing research on linguistics for about 40 years and only recently got a round of venture capital to start publicizing this tool. But they are not yet converting to RDF for storage in other systems.

URL Design Patterns

One common theme that came up was the need for best practices for good URL design. Everyone said a few things: the design of URLs is very important; most people screw it up the first time and then have to redo the designs; and there are not a lot of good documents out there, while the ones that are available are at least five years old. The people from the w3c did take notes on this.

Reception of My Paper

My presentation was titled "The Semantics of Declarative Systems". This talk covered how, by using a set of small languages with precise semantics, you can build entire applications that allow non-programmers to draw pictures of their business requirements and generate working apps. The purpose of this talk was really to test the metaphors by which I could explain these concepts to non-computer scientists. I got some positive feedback and was happy to see convergent evolution of our designs with those of other organizations.

The URL to the paper is here:

http://www.danmccreary.com/presentations/sem-web-07

The Semantics of Declarative Systems

I have put the presentation that I gave at the 2007 Semantic Technology Conference on my web site. The link is here:

http://www.danmccreary.com/presentations/sem-web-07

Please give me feedback on both the PowerPoint presentation (note the builds and the notes pages) and the paper.

I am specifically interested in whether the metaphors worked for a non-technical audience.

- Dan

Saturday, April 28, 2007

Declarative Systems and the Cambrian Explosion

I have been reading Carl Zimmer's book Evolution: The Triumph of an Idea and I have been startled by the similarities between the Cambrian Explosion and the growth of declarative languages.

Just to recap (if you have not read the Wikipedia article yet), here is a quick summary. For the first 3.5 billion years, life on earth was mostly single-celled organisms. Then, about 550 million years ago, the number of new multicellular complex-body-plan animals exploded. There are two salient reasons. First, life needed oxygen to breathe. Before there was oxygen in the atmosphere, respiration was not easy. Oxygen was a foundation for complex body plans on land. Second, life needed a genetic "tool kit" to build complex animal shapes: the HOX genes for building complex body plans. After that happened, complex life forms did explode.

I am seeing the same thing in declarative languages. Declarative languages need foundations too. We needed ways for people to quickly create new languages (e.g. GUI XML Schema editors like XMLSpy) and ways to build consensus around the semantics of these new languages (wikis, social networks, folksonomies and voting).

And next we need the declarative-systems tool kits. Right now we have started to develop tools that allow XML Schemas to be quickly transformed into GUI XForms editors, and we have XML databases that eliminate the need for complex middle tiers. We can put many of these tools in the hands of non-programmers and allow them to create domain-specific vocabularies and rule-based systems with GUI editors.

And when those tools mature we will see an explosion of declarative languages similar to the explosion of life forms 550 million years ago.

So what is your prediction? 1 year, 3 years, 5 years or 10 years from now? I think a small group of people could build this toolkit without millions of dollars of venture capital. There may even be open-source tools that do this in a few years. But I think the key is to combine the best of metadata registries, XML Schemas, transforms, XForms, XQuery and XML databases into a holistic declarative system.

And from here the many NEW ideas will evolve.

Saturday, April 14, 2007

XForms and the Chasm: What XForms Needs to Become a Mainstream Technology

Geoff Moore wrote a very insightful book on technology adoption called "Crossing the Chasm". If you study the psychology of technology adoption, you find that people and organizations fall along a bell curve: about 2% are innovators, 14% are early adopters, 33% early majority, 33% late majority and 15% are laggards. This is all standard stuff out of basic marketing-psychology textbooks. Each group has different reasons for using (or not using) a specific product.

Moore's insight was that many new products never move from the early adopters to the early majority. He called this region "The Chasm". I have found that the same rules apply to new standards and new languages.

Moore did a lot of analysis on why some new technologies don't make it across the chasm despite their merits, and why some standards are adopted over others. His biggest finding was that they were just too hard for non-techies to use. They required too much work to set up, install or learn. Ease of use was a principal factor in many technologies not being quickly adopted by organizations that did not want to use technology to differentiate themselves. Early-majority customers also tend to purchase new technology based not on how it works but on who is using it in their industry. They purchase on references.

New technologies usually need a focused approach in a specific industry to be adopted by an early majority. If an organization can target a specific vertical industry and provide a more complete solution for that industry, then once a few reference accounts are set up the rest of the industry can see there are proven case studies.

Another factor is the early majority's need for complete solutions. Innovators and early adopters are willing to glue new technologies into their existing fabric of solutions. They dedicate time, staff and money to do this. If their XForms send XML data over REST interfaces and their databases use JDBC, they hire programmers to write middle-tier applications.

But early-majority buyers can't justify that additional cost. They just want a solution that works out of the box. They are OK paying for a week of training for their BAs, but they don't want to write and maintain complex applications.

Another thing I have been searching for is metaphors that are understandable by non-programmers. Since the high-level decision makers are usually non-technical, their ability to make a decision is usually based on their trust of the developers or on their understanding of metaphors.

The Language Translation Metaphor

I find that if you can send your data from an XForms application directly to a native XML database in XML format, there is no need for middle-tier objects or shredding into relational databases. No translation is involved. One way to demonstrate that machine-based language translation is not always semantically precise is to try an automatic translation program. Convert a statement from your native language into a foreign language and then translate it back. For example, try this using Google Language Tools at http://www.google.com/language_tools

Original Phrase in English: The quick brown fox jumped over the lazy dog.

Google Translation to Spanish: El zorro marrón rápido saltó sobre el perro perezoso.

Spanish Translation Back to English: The fast brown fox jumped on the sluggish dog.

Although Google Translate does a pretty good job, "quick" was changed to "fast", "over" was changed to "on", and "lazy" was changed to "sluggish". If you study semantics, you can find lots of problems trying to communicate with close approximations. Note that the original sentence's implicit business rule (it uses every letter of the alphabet) was also lost. We have many implicit business rules in complex systems, and translating them to other formats is always problematic for software developers.

Saving XML through an object middle tier is one translation. Saving objects to a relational database is a second translation. Fetching data from a database using SQL is a third translation. Converting tabular data into objects is a fourth translation, and converting objects back to XML is a fifth translation. There are five translations, not just two.

Not that XForms solutions today are entirely without translation. In our process we plan to convert XML Schemas into XForms as well as initial instance data. We also need to import leaf-level definitions from a metadata registry. But the nice thing about this process is that we can validate the submitted data directly against the original XML Schema that was used to generate the XForms. An elegant self-checking pattern.

All of these transforms are possible. But all have to be maintained each time you change your XML Schema. Right now we need great tools that translate XML Schemas directly into XForms. And these need to retain the fidelity of the original intent of the subject matter experts.

So perhaps deploying applications with native XML databases is a critical part of the XForms value proposition.

Moore also noted that both early- and late-majority buyers would be more willing to use a technology if it was buried deep inside a complete solution. That way the IT managers don't have to sell the technology to management, just the solution. Similarly, once third-party developers start building accounting packages or other complete solutions for vertical industries using XForms, the buyer will not need to know or care how the solution is being delivered.

Friday, April 06, 2007

Metaphors for Declarative Systems

I am working on my presentation for the 2007 semantic technology conference. My topic is the semantics of declarative languages.

I am attempting to present to business strategists who may not appreciate how powerful the new XForms and XQuery standards are, and how entire rich-user-experience Web 2.0 applications can be built using these tools as long as the tools have RESTful interfaces.

My first consideration is to find the right metaphors. I am working with the evolution metaphor and the puzzle metaphor right now and it seems to be working well.

I have also done some research on Domain Specific Languages, and I see that they have also used the evolution metaphor. But I find that they are focused on building small languages specific to a group or project, and they don't really look at the external semantics of the problem.

Please write me if you are also trying to explain the business benefits of declarative languages to a non-technical audience and if you have found metaphors that get the key points across.


Thanks! - Dan

Monday, February 26, 2007

Doing Zero Fill with XSLT 2.0

I was scratching my head, trying to find a simple way to do zero-fill padding with XSLT, and I could not find it in my Michael Kay XPath 2.0 book. It turns out he put it in the XSLT 2.0 book. I am not sure how the index missed it. Anyway, searching for "Zero Fill XPath" turned up nothing on Google. Here is how it works. Just use the following:

format-number(number(.), '00000')

Here is an example of the input and output:
Input   Output
1       00001
12      00012
123     00123
1234    01234
12345   12345
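For context, here is a minimal sketch of the call inside a stylesheet (the element names are just placeholders):

   <xsl:template match="id" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- pad the numeric value to five digits, e.g. 42 becomes 00042 -->
      <padded-id>
         <xsl:value-of select="format-number(number(.), '00000')"/>
      </padded-id>
   </xsl:template>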
Hope this helps. - Dan

Thursday, February 22, 2007

The Penguin, The Fox, The Weasel and the Elephant

Once upon a time there were four animals that lived in a magical “Land of Bew”: The Penguin, the Fox, the Weasel and the Elephant.

Each animal had their own way of looking at the world. Penguin and Fox loved to share their ideas.

Weasel also had lots of interesting ideas but he was afraid that if he shared too many ideas his owners would be worried.

Elephant always seemed to be around everywhere but didn’t pick up new ideas very fast.

In the Land of Bew most of the animals got along pretty well with each other. Fox played at Penguin's house and Fox played at Weasel's house.

Unfortunately the Elephant was not allowed to play at the Penguin's house. It was not that Elephant was mean; it's just that Elephant's owners were somewhat controlling.

The Land of Bew was blessed with special magic books. These books allowed the reader to press on special words and pictures, and suddenly new pages would magically appear. The children of Bew loved the books and always wanted more.

But some of the books were very hard to make. Especially if you wanted nice moving and dancing pictures that children could change.

To make dancing pictures you had to use a special type of black magic that sometimes got the animals in trouble. Goblins would sometimes use the black magic and cause havoc in the Land of Bew. (I would tell you more about the horrible things Goblins did with the black magic, but this IS a children's story.)

One day Penguin and Fox found a new way of making dancing books that animals could use in magical ways without using the dangerous black magic. Fox tried it out and it worked very well.

But when they suggested their ideas to Elephant, Elephant said no. Elephant was worried that even kids would be able to make the dancing books. And she worried that all her friends would stop playing with Weasel and spend all their time over at Penguin's house.

And although the story is not over…this is where our story ends. The ending is being written by you.

Want to try to make some dancing books?

Try XForms today and see if you can make your web applications dance without using any nasty JavaScript! And try to keep those Goblins out. OK?

Thursday, February 15, 2007

The eXist OpenSource Native XML Database

After a suggestion from Kurt Cagle, we have been using the eXist OpenSource XML Database for a week now and I must say that it has exceeded our expectations.
It was a snap to download and set up. It was easy to get all the XQuery demos running, and it was almost trivial to administer the little but annoying things like adding collections and users.
Unfortunately I did have to actually read the documentation to figure out how to get our XForms and XMLSpy to work with it. But once you know the secret, it is easy.
The big secret is that by just adding the word "rest" or "webdav" before the "db" in the URL you can do HTTP GETs and HTTP PUTs directly to your data.
You can also edit the data directly in the eXist database from within XMLSpy by just using the "Open URL" option of the XMLSpy editor and slipping the "webdav" string into your URL path. Hopefully XMLSpy will allow you to use the XQuery editor with the database in future versions.
eXist has given our entire group a large boost of confidence. We can now build a Ruby-on-Rails-like environment just by drawing XML Schemas, generating XForms from the XML Schemas and storing the form data directly in eXist. A full application development stack without a single line of procedural Java or JavaScript!!!
I posted a short article on how to get XForms working with eXist in the XForms WikiBook.
I have a FireFox T-shirt on order for Kurt and I am also buying beers at the first XForms conference...which I may have to organize.
Thanks Kurt! Your advice was very sound.

Computers Don't Create Models (Today) – People Do

I had several good comments on my posting about the increasing role of semantics in the "Era of Tera". I did not mention some of the key concepts that the Intel paper discussed and the impact these concepts will have on IT strategy. Here is an excerpt:

The problem is ordinary computers don't model things. Aside from supercomputers, today's computers aren't capable of developing mathematical models of complex objects, systems or processes. Nor are they powerful or fast enough to perform such tasks at speeds people demand. We can't plug in a statistical model for a rare malignant tumor or the behavioral pattern of a shoplifting employee and search for similar instances of the model in a data set. To benefit from the wealth of data building up in the world, we need to be able to communicate with computers in more abstract terms (high-level concepts or semantics). We need to speak in terms of models.

I believe that the core problem is that there will be a HUGE demand for highly skilled data modelers/ontologists in five years. But there will not be a large supply, since this skill is not something that can be learned in a single college class. Rather, it is a tacit skill that is not easily codifiable.

That being said, there are many best practices that ARE codifiable. You can already purchase books on OWL/RDF and metadata registries, although most of them are mired in relational database models or written by academics with little real-world experience. You can read the Wikipedia articles on "Data Stewardship". What is still needed is a single set of best practices and tools for creating families of machine-readable data models that can be used as a basis for creating exchange models. And we need to do this without having to learn how to represent all 12 types of UML diagrams in XMI and transform them. The NIEM subset generator is a good example of a solid XML-Schema-driven front end for this process of selecting elements from a metadata registry and putting them in your shopping cart.

The bottom line is that this is really about empowerment. Unless organizations introduce semantic web technologies at a grass-roots level and support them at the CEO level, many organizations will be left behind in the Era of the Tera.

Tuesday, February 13, 2007

Era of the Tera

I read in the New York Times yesterday that Intel has produced a chip with 80 cores, giving it a total computing power of 1.3 teraflops. So the natural question is: what could you use 80 CPUs for? The article referred to a 2005 Intel paper titled Recognition, Mining and Synthesis Moves Computers to the Era of Tera by Pradeep Dubey. The paper opens with the following quote: "The great strength of computers is that they can reliably manipulate vast amounts of data very quickly. Their great weakness is that they don't have a clue as to what any of that data actually means." (Stephen Cass, "A Fountain of Knowledge," IEEE Spectrum, January 2004). When I saw this quote, I realized that even the hardware engineers agree that semantics is now the critical factor limiting our ability to effectively use computers, and that we must continue our mission to promote semantic mapping concepts. So let me summarize my feelings after reading this article:

If you want to be able to leverage the power coming in the Era of the Tera, you must start with semantics: recording the meaning of your data.

And if you don’t yet have a metadata registry…get one!

Right next to the CEO, CIO, and CFO in the board room should be your CDO (Chief Data Officer).