Friday, March 21, 2008

Metadata Repositories vs. Metadata Registries

For several years people have been using the terms metadata Registry and Repository inconsistently, imprecisely and almost interchangeably. I would like to weigh in on how these terms could be used more precisely so that organizations can manage their metadata processes effectively.

First, let's take the definition of a Repository. Webster defines a repository as "a place, room, or container where something is deposited or stored." Note that there is nothing in this definition about the quality of the things being stored or about any process to check whether new incoming items are duplicates of things already in the repository. If I have 100 users, they could each define "Customer" as they see fit and put their own definition into the metadata repository. No problems.

On the other hand, let's take the word Registry. A Registry has the connotation of more than just a shared dumping ground. Registries have the additional capability to create workflow processes to check that new metadata is not a duplicate (for a given namespace). One of the definitions from Webster is "an official record book." Note the word "official."
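To make the contrast concrete, here is a minimal Python sketch. The class names, the `deposit` and `register` methods, and the rule that two terms are duplicates when they match case-insensitively within a namespace are all assumptions made for illustration; a real registry would use a much richer duplicate check and a human review step.

```python
class DuplicateEntryError(Exception):
    """Raised when a term is already registered in the given namespace."""


class Repository:
    """A shared store with no checks: anyone may deposit any definition."""

    def __init__(self):
        self.entries = []  # list of (user, term, definition) tuples

    def deposit(self, user, term, definition):
        # Nothing is validated; duplicates and conflicts are accepted.
        self.entries.append((user, term, definition))

    def definitions_of(self, term):
        return [d for (_, t, d) in self.entries if t.lower() == term.lower()]


class Registry:
    """Holds at most one official definition per (namespace, term)."""

    def __init__(self):
        self.entries = {}  # (namespace, term) -> definition

    @staticmethod
    def _key(namespace, term):
        # Assumed duplicate rule: case-insensitive match, ignoring outer spaces.
        return (namespace.strip().lower(), term.strip().lower())

    def register(self, namespace, term, definition):
        key = self._key(namespace, term)
        if key in self.entries:
            raise DuplicateEntryError(
                f"{term!r} is already registered in namespace {namespace!r}"
            )
        self.entries[key] = definition

    def lookup(self, namespace, term):
        return self.entries[self._key(namespace, term)]
```

With these classes, two users can each deposit their own "Customer" definition into the `Repository`, while a second attempt to register "Customer" in the same `Registry` namespace raises `DuplicateEntryError`. Registering the same term in a different namespace is still allowed.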

A Repository is similar to the front porch of a house. No locks prevent new things from landing there. But a Registry is a protected back room where human-centric workflow processes are used to ensure that metadata items are non-duplicates, precise, consistent, concise, distinct, approved and unencumbered with business rules that prevent reuse across an enterprise. These registries have become the central foundation on which agility can be baked into many enterprise processes. The latest edition of Kimball's Data Warehouse Lifecycle Toolkit (which is actually a very good read) even goes as far as to call its process "metadata-driven". That is not so different from the model-driven development world.

Registries have the implicit connotation of trust behind them. They now serve as a central process for the creation of shared meaning across the enterprise. Definitions in a registry have been vetted by an enterprise-level organization that has the responsibility of enterprise data stewardship. They have a high probability of being consistent with industry best practices and vertical industry standards. Registries are the go-to source for creating canonical XML schemas, enterprise ontologies or conformed dimensions in an OLAP cube. Repositories hold personal or small departmental definitions that reflect an isolated view of the world.

None of these ideas are really new. They are at the core of the ISO/IEC 11179 metadata registry standard. Note that they don't call it a repository standard! People are just now starting to understand how important Registries are in most enterprise-wide systems. The growth of Business Intelligence, Enterprise Data Warehouse and Service Oriented Architecture terminology is a good place to see the rise of repositories and registries. We now see service registries, portlet registries, model registries... the list goes on and on.

Much of the background on the differences between the use of repositories and registries can be traced way back to the early days of object-oriented systems, to the 1995 book Succeeding with Objects by Adele Goldberg and Kenneth Rubin. This was one of the first books on enterprise reuse strategies. The authors defined the concept of enterprise asset reuse and the need for a trust-driven repository as a basis for reusing assets. They identified a multi-step process for reviewing new submissions to determine whether a submission duplicated existing assets. They showed how critical it is to classify items in a registry and to search an existing registry for duplicates before new items are added. If you can get a copy of the book, I would suggest you read the section on "Set Up a Process for Maintaining Reusable Assets" on page 245.
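The general shape of such a review (classify, search for duplicates, then approve) can be sketched in a few lines of Python. This is not the process described in the book; the `Submission` fields, the `Status` values, and the `steward_approves` callback are hypothetical names chosen for illustration, and the duplicate search is a simple name match within a classification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    REJECTED_DUPLICATE = "rejected as duplicate"
    APPROVED = "approved"


@dataclass
class Submission:
    name: str
    definition: str
    classification: str  # hypothetical category, e.g. "party" or "product"
    status: Status = Status.SUBMITTED


@dataclass
class AssetRegistry:
    approved: list = field(default_factory=list)

    def find_duplicates(self, submission):
        """Search approved items with the same classification and name."""
        return [
            item
            for item in self.approved
            if item.classification == submission.classification
            and item.name.lower() == submission.name.lower()
        ]

    def review(self, submission, steward_approves):
        """Reject duplicates first, then ask a steward to approve.

        `steward_approves` stands in for the human review step: any
        callable that takes a Submission and returns True or False.
        A submission that is declined simply stays in SUBMITTED status.
        """
        if self.find_duplicates(submission):
            submission.status = Status.REJECTED_DUPLICATE
        elif steward_approves(submission):
            submission.status = Status.APPROVED
            self.approved.append(submission)
        return submission.status
```

The point of the sketch is the ordering: the duplicate search runs against already-approved items before any human time is spent, and only approved items ever become visible to later searches.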

The book then goes on to show how organizations can and should be structured to reuse these assets, and gives the pros and cons of the different organizational structures and their impact on reuse. This is the basis for the data governance and data stewardship movement in many organizations today.

So the next time someone uses the word registry or repository in a conversation, ask them whether they are using a definition that is consistent with the corporate business term registry, or their own private definition from their own repository of imprecisely used buzzwords.

4 comments:

Anonymous said...

Hello Dr. Data Dictionary -
My name is Manish and I am the moderator of a metadata website. With your permission - I would like to have a link to this excellent article on my website which is also all about metadata. It will be a link with summary or if you allow a complete republish of your article with all your credentials intact. Please let me know if that would be ok. Looking forward to your response.

The website is metadataforums.com


Regards
Manish

Dan McCreary said...

Manish,

Feel free to repost the article on your metadataforums.com web site. I tried to use the site but the comment section is not working.

Unknown said...

Hello Dr. Data Dictionary,

I am looking for technical documents to design Metadata Registry based on ISO 11179.

Any help will be appreciated in this context.

Regards,
Pankaj Kulshrestha

Dan McCreary said...

Hello Pankaj,

I am glad you are interested in this important topic.

Here is a list of the key case studies I use to teach this topic:

http://www.danmccreary.com/case-studies/index.htm

The actual implementation may depend on your industry. My current focus is healthcare metadata. There are many changes from the standard NIEM models we need.

- Dan