Friday, July 02, 2010

Impressions of SemTech 2010

Two weeks ago I attended my fifth Semantic Technology conference in San Francisco. It was a great conference, and I plan to keep attending in future years. My biggest dilemma was which session to attend: at times there were as many as eight concurrent sessions going on.

There were a few big trends that I spotted.

The use of RDFa to annotate web pages with semantically precise elements was clearly a big trend. Jay Myers of Best Buy described how sales went up 30% when Best Buy added RDFa tags to its product pages. Although many search engines are not transparent about their use of RDFa tags in page rankings, Jay's results should make it clear that this strategy works.

Jay told a great story about how hard it was to find a fridge that met his specific criteria (black, a particular size, and so on) using a keyword-based search engine. The semantic web will change all of this!

The big factor here is Martin Hepp's GoodRelations ontology for products and services. This could really bring the Semantic Web to the masses: finally, a way to encode your store's hours of operation so that you can ask Siri, "What sushi restaurants are open at 10pm near here?"
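To make the GoodRelations idea concrete, here is a minimal sketch in Python with rdflib. The restaurant URI and the data are invented for illustration, and the gr: class and property names reflect my reading of the GoodRelations vocabulary; treat them as an approximation, not code from Jay's or Martin's talks.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

GR = Namespace("http://purl.org/goodrelations/v1#")
EX = Namespace("http://example.org/stores/")  # hypothetical store URIs

g = Graph()
g.bind("gr", GR)

# Describe the opening hours of an invented sushi restaurant
store, hours = EX["sushi-palace"], EX["sushi-palace-hours"]
g.add((store, RDF.type, GR.LocationOfSalesOrServiceProvisioning))
g.add((store, GR.hasOpeningHoursSpecification, hours))
g.add((hours, RDF.type, GR.OpeningHoursSpecification))
g.add((hours, GR.opens, Literal("17:00:00", datatype=XSD.time)))
g.add((hours, GR.closes, Literal("23:00:00", datatype=XSD.time)))

# "What is open at 10pm?" becomes a simple SPARQL filter.
# Zero-padded HH:MM:SS strings compare correctly as plain text.
q = """
PREFIX gr: <http://purl.org/goodrelations/v1#>
SELECT ?store WHERE {
  ?store gr:hasOpeningHoursSpecification ?h .
  ?h gr:opens ?opens ; gr:closes ?closes .
  FILTER (str(?opens) <= "22:00:00" && str(?closes) >= "22:00:00")
}
"""
for row in g.query(q):
    print(row.store)
```

Once the hours are triples rather than prose on a web page, the "open at 10pm" question is just a filter.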

There were also a lot of sessions on Linked Data, and specifically on Linked Data in government. Jim Hendler and his students at RPI are scraping the data sets on data.gov, converting them all to RDF, and storing them in a huge triple store.
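The conversion itself is conceptually simple: one resource per row, one triple per column. Here is a hedged sketch in Python with rdflib; the CSV content, column names, and URI scheme are invented for illustration, not taken from the RPI project.

```python
import csv
import io
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# A stand-in for one tiny data.gov-style CSV file (invented data)
csv_text = """agency,year,budget
Department of Examples,2009,1200000
Bureau of Samples,2009,340000
"""

DATA = Namespace("http://example.org/datagov/")  # hypothetical URI scheme
g = Graph()

for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
    # One resource per row; one triple per column
    subject = URIRef(DATA[f"dataset-42/row-{i}"])
    g.add((subject, RDF.type, DATA.Row))
    for column, value in row.items():
        g.add((subject, DATA[column], Literal(value)))

print(g.serialize(format="turtle"))
```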

Jim also provided one of the best sound-bites from the conference:

“Get AWAY from the Table”

He was referring to the fact that many organizations over-use the relational model and need to understand that there are many alternatives, especially when doing mashups of data sets from many sources. This goes far beyond the NoSQL movement that we are seeing; it gets to the core of the problems with innovation in many organizations.

Jim is a consultant to the US data.gov transparency process and a key advocate of open linked-data standards. It is interesting to see a friendly rivalry between Jim and Tim Berners-Lee, who is a consultant to the UK data-transparency movement. Both are using RDF to convert government data, but each is taking a slightly different strategic approach.

There were also many presentations on natural language processing and finding the true “meaning” of words within unstructured and semi-structured data sets.

I always attend Brian Sletten's session on REST. Many people do not understand the relationship between REST, URIs, and the semantic web stack. Joe Wicentowski's work at the US Department of State with eXist's URL-rewriting framework has really reinforced how this can be done easily in a single XQuery module.

I presented a 3.5-hour tutorial session on entity extraction. Since it was scheduled for the last session of the last day of the conference, I thought the attendance would be very low. But the room was packed and most of the people stayed until the very end.

It was wonderful to finally meet Marie Wallace and DJ McCloskey from the LanguageWare team at IBM. Their support of the Apache UIMA standards will be a great step toward the creation of interoperable language-analysis pipelines. Marie and DJ introduced me to the Millennium restaurant just a few blocks from the hotel. My wife Ann and I went there over the weekend, and we now own one of their cookbooks.

By the way, Google now also supports recipes in the rich snippets in its search results, so you will soon be able to find "only recipes that take under 30 minutes" in a Google search.

There were also many sessions on the need for good taxonomies and ontologies in the enterprise. The use of SKOS for controlled vocabularies was a very hot topic, and there were dozens of presentations on how OWL is being used to capture and exchange business rules.

It was also really great to finally meet Jeni Tennison after reading her books and following her tweets for a long time. She is deeply involved in the government data-transparency projects in the UK. Her use of RDF is setting new best practices for the entire community.

It was also great to catch up with Mark Birbeck, one of the chief architects of the Ubiquity XForms libraries. I am looking forward to trying out some of the new tools based on his backplane JavaScript libraries.

The people from Facebook gave a presentation on how adding a few lines of markup, using a variation of RDF, lets a web site offer a "like" button to its users. Many people were a little disappointed to hear that Facebook will not be using namespaces in this interface; Facebook felt that most HTML coders could only handle a single namespace. Many people agreed with their findings and thought that, until 90% of HTML coders know what namespaces are, organizations like Facebook are stuck just adding new data to HTML meta elements.
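Because that markup lives in plain HTML meta elements, harvesting it does not even require an RDFa parser. Here is a minimal sketch in Python, using only the standard library; the page and its properties are an invented example of the style of markup Facebook described.

```python
from html.parser import HTMLParser

# Invented example of the kind of meta-element markup Facebook described
page = """
<html><head>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
  <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
</head><body>...</body></html>
"""

class MetaPropertyParser(HTMLParser):
    """Collect (property, content) pairs from <meta> elements."""
    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "property" in attrs and "content" in attrs:
                self.properties[attrs["property"]] = attrs["content"]

parser = MetaPropertyParser()
parser.feed(page)
print(parser.properties)  # {'og:title': 'The Rock', ...}
```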

One of the most interesting discussions I had was with the people from Cray. Apparently a large unnamed government agency has given Cray a very large contract to build customized chips to do graph analysis. The ThreadStorm processors in their XMT architecture have 128 register sets, which allows the CPUs to process continuous feeds of graph queries without waiting on memory. Their claim is that the federal agency is seeing a 100X improvement in complex graph queries. The challenge is that the API is currently a low-level C interface and they have not yet put a SPARQL compiler in front of it, so no good SPARQL benchmarks are yet available to compare this hardware with a typical triple store. And even if it is fast, the price of an XMT system would be in the six digits, so only very large organizations and government agencies are likely to use one unless it is provided as a service.

One of the other things I found was how many people are using OWL and OWL reasoners like Pellet as replacements for traditional rules engines from companies like Fair Isaac. There are two reasons for this. First, many rules-engine companies talk only about their engine performance, not about the need for precise semantics in the data elements. With OWL and the methods behind the semantic web stack, we attempt to put semantics higher in the requirements of a system and use pre-built, pre-tested ontologies.

The second big reason that people use OWL is that the rules are created using open standards, which means you are not locked into any one vendor's rule format. The new W3C standard for rule interchange (RIF) will also start to break down many of the barriers to the creation and exchange of large industry-vertical rule sets, so that each organization does not have to start its rule base from scratch. This will be big for industries like insurance, where claims-processing ontologies are just being created. Thanks to Kendall Clark for good information on this topic.
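As a small illustration of OWL-as-rules (my own sketch, assuming the rdflib and owlrl Python packages; the "gold customer" rule is invented, and the reasoner here implements the OWL RL profile rather than Pellet's full description-logic reasoning): an OWL hasValue restriction can play the role of a classification rule, and the reasoner infers the class membership.

```python
from rdflib import BNode, Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF
from owlrl import DeductiveClosure, OWLRL_Semantics

EX = Namespace("http://example.org/claims#")  # hypothetical ontology
g = Graph()

# The rule, stated in OWL: anything with status "gold" is a GoldCustomer
restriction = BNode()
g.add((restriction, RDF.type, OWL.Restriction))
g.add((restriction, OWL.onProperty, EX.status))
g.add((restriction, OWL.hasValue, Literal("gold")))
g.add((EX.GoldCustomer, OWL.equivalentClass, restriction))

# The data: one customer with status "gold"
g.add((EX.alice, EX.status, Literal("gold")))

# Expand the graph with everything the OWL RL rules can infer
DeductiveClosure(OWLRL_Semantics).expand(g)

print((EX.alice, RDF.type, EX.GoldCustomer) in g)  # True
```

Because the rule is just triples, it can be published, versioned, and exchanged like any other RDF data.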

I also had a great time talking to many people in the publishing area. Seth Maislin gave a great presentation on taxonomies. Seth and Marlene Rockmore joined me for lunch at a nearby Indian restaurant. Marlene is an expert on taxonomy development. She was working on a book about taxonomies with O'Reilly that is on hold for now, but we hope to see one from her in the future.

It was also nice to see how many federal agencies are starting to use SPARQL in the intelligence community. I saw a good presentation by Dennis Wisnosky of the US DoD about how they plan to use NIEM and semantic technologies to cut their integration costs. Yeah, NIEM! Most people don't understand that NIEM is based on the RDF model; it just uses XML syntax to store the relationships. See my webcast on the SemanticUniverse.com web site for more details. Hopefully his slides will be available in the future. Many people were taking photos of the slides, since the DOD is not great at getting its slides out. Wonder why?

I also had lunch with several people from the intelligence community, and we had great discussions about the pros and cons of pulling assertions out of NLP-annotated XML documents and the challenges relating to this. Keeping the links back to the original XML documents is critical to verifying the results of an assertion in context. Traceability, lineage, provenance, and time-domain representations in RDF that don't cause a 10X growth in the number of triples are a difficult problem.

Relating an RDF assertion back to a source-document XML node ID (the "fourth column" question) is a very difficult problem, and one that needs close research across native XML and RDF systems. One of the things I learned about the eXist–Lucene integration done by Wolfgang Meier is that keeping the node ID as the document ID in Lucene allows non-programmers to configure customized search-ranking rules based on the context of the keyword in a document. This is where highly customizable structured search rocks. I think this innovation needs to be added to RDF triple-store projects: keeping the XML node ID of an assertion as the context of an RDF triple could be a great way to prevent triple bloat.
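One way to experiment with this today is to use the fourth column directly: store each extracted assertion as a quad whose named-graph context encodes the source document's node ID, so provenance costs one URI per source node instead of a reification-style explosion of extra triples. A hedged sketch with rdflib (the document URI and node-ID scheme are invented):

```python
from rdflib import ConjunctiveGraph, Namespace, URIRef

EX = Namespace("http://example.org/extracted#")
DOC = Namespace("http://example.org/docs/report-17.xml#")  # hypothetical doc

store = ConjunctiveGraph()

# Each extracted assertion lives in a named graph whose URI encodes
# the node ID of the XML element it came from.
context = store.get_context(URIRef(DOC["node-1.2.3"]))
context.add((EX.acme, EX.acquired, EX.initech))

# Later: verify an assertion by walking back to its source node
for subj, pred, obj, ctx in store.quads((EX.acme, None, None)):
    print(subj, pred, obj, "from", ctx.identifier)
```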

Finally, I felt that David Wood and James Leigh's presentation on the Callimachus project had the most potential to have a "big impact" on the adoption of Semantic Web tools. This new open-source project allows users to specify a simple HTML template, with special XML tags, that brings triples directly into a web application. This has the potential to let far more non-SPARQL programmers integrate RDF data directly into a web page, much as XQuery does today. With some work it might be possible to auto-generate the XForms bind elements for form rules. This would be a huge win for the integration of rules into web forms without needing to write JavaScript.

You can also see my tweets on the conference here: http://twitter.com/dmccreary

I hope to see more of you in San Francisco next year. Let me know if you are interested in co-presenting any papers!

Tuesday, May 29, 2007

Impressions of Sem-Tech '07

I just returned from the 2007 Semantic Technology Conference in San Jose, California. It was a great conference and it opened my mind to several new ideas. Well worth the time!

The conference was held over four days and had around 125 presentations, including tutorials and research projects. There were almost 800 attendees. This is the third Semantic Technology conference that I have attended and the second at which I presented a paper.

Here are some high-level observations and some patterns that I detected.

The Semantic Web gets the “Web 3.0” Label

Most people at the conference seemed to have embraced the idea that the semantic web will adopt the popular-culture label "Web 3.0". The final straw was the November 2006 New York Times article by John Markoff, which set the blogosphere abuzz describing a web that includes technologies to enable intelligent reuse of data. Wikipedia, after a long discussion about whether "Web 3.0" deserved an entry, finally undeleted the page and let it stand. See http://en.wikipedia.org/wiki/Web_3.0 and check out the article's discussion page for more.

From If to When to How

Eric Miller (now at Zepheira) calls some of the new technologies "recombinant data." Eric spent about five years with the W3C, and he is perhaps the most well-connected person in the world with accurate knowledge of who is using semantic web technologies to solve real business problems today. His observation was that three years ago, at the first Semantic Technology conference, we were wondering if the semantic web would take off. Last year the speculation was about when semantic web technologies would become commonplace. This year the focus was debates on how the semantic web should be implemented.

Venture Capitalists are Becoming Educated on the Semantic Web

One example of this: last year, most semantic-web startups had to carefully explain what the semantic web was to venture capital firms when trying to get their initial rounds of funding. This year, many venture capital firms not only had some level of understanding of the semantic web, but were asking each of their potential startups how their technologies fit into it. Other companies that did not have a semantic-web focus were now coming to the conference to get educated.

Some of the companies that got venture capital last year have already been purchased and absorbed by larger firms. They were replaced by new venture-funded companies.

Consensus on the RDF/SPARQL Foundation

One of the first things that struck me was the consistent use of RDF and triple stores to solve many hard problems. The use of RDF and SPARQL seemed to be the primary factor in whether people thought you were really using semantic web technologies or not. If you were not using RDF, you were not really in the club, just an outsider looking in.

OWL Fragmentation

Another thing that surprised me was the discord about the use of OWL and its sublanguages. Central to this was the rise of use cases, simple examples of things that OWL could not do. Much of this centered around DLP (Description Logic Programs).

A good example of a non-OWL solution was the use of SKOS to store things typically stored in a metadata registry. SKOS is a great example of a simple standard, built on top of RDF that attempts to solve common problems without getting overly complex. See SKOS in Wikipedia.
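To give a flavor of that simplicity, here is a tiny controlled vocabulary in Python with rdflib; the concepts are invented, but the skos: terms themselves are standard.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/vocab#")  # hypothetical vocabulary
g = Graph()
g.bind("skos", SKOS)

# A tiny controlled vocabulary: one scheme, two related concepts
g.add((EX.scheme, RDF.type, SKOS.ConceptScheme))
g.add((EX.fridge, RDF.type, SKOS.Concept))
g.add((EX.fridge, SKOS.inScheme, EX.scheme))
g.add((EX.fridge, SKOS.prefLabel, Literal("refrigerator", lang="en")))
g.add((EX.fridge, SKOS.altLabel, Literal("fridge", lang="en")))
g.add((EX.appliance, RDF.type, SKOS.Concept))
g.add((EX.fridge, SKOS.broader, EX.appliance))

print(g.serialize(format="turtle"))
```

Preferred labels, alternate labels, and broader/narrower links cover most of what a metadata registry needs, with no OWL machinery at all.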

We are also starting to see that rules must be exchanged between systems in semantically precise ways, hence the need for a Rule Interchange Format (RIF). It was nice to see vendors like Fair Isaac supporting complex business rules running INSIDE the web browser using XForms. They rock!

REST is Deep

Perhaps my favorite presentation was given by David Wood and Brian Sletten from Zepheira. In this presentation, David and Brian gave a demo of the NetKernel system and demonstrated how NetKernel embraces REST at a much deeper level than I had previously anticipated. They were not yet generating XForms from an XML Schema, but it was a great example of convergent evolution between my ideas and theirs.

Case Studies

This year also started to show examples of new startup companies actually using semantic web technologies to differentiate themselves in the marketplace. But because they are all trying to differentiate themselves, much of the actual technology they are using was not disclosed.

RDF Taggers, Harvesters, Linkers and Analyzers

The conference seemed to have three sets of problems that everyone agreed on. The first was how to harvest RDF from a web page or any other resource. Most of these presentations related to getting RDF out of unstructured and structured data, with lots of discussion of the pros and cons of microformats.

iReader Rocks

The coolest demo I saw was iReader from http://www.syntactica.com. This is an awesome Firefox (and IE) extension that does concept mapping from unstructured text. The people behind it have been doing research on linguistics for about 40 years and have only recently gotten a round of venture capital to start publicizing the tool. But they are not yet converting their output to RDF for storage in other systems.

URL Design Patterns

One common theme that came up was the need for best practices around good URL design. Everyone said a few things: the design of URLs is very important; most people screw it up the first time and then have to redo the designs; and there are not a lot of good documents out there, while the ones that are available are at least five years old. The people from the W3C did take notes on this.

Reception of My Paper

My presentation was titled "The Semantics of Declarative Systems". This talk covered how, by using a set of small languages with precise semantics, you can build entire applications that allow non-programmers to draw pictures of their business requirements and generate working apps. The purpose of this talk was really to test the metaphors I could use to explain these concepts to non-computer-scientists. I got some positive feedback and was happy to see convergent evolution between our designs and those of other organizations.

The URL to the paper is here:

http://www.danmccreary.com/presentations/sem-web-07