Dan McCreary's Blog: October 2013

Over the past four years we have seen the NoSQL movement grow from a small "Meetup" in the Bay Area to a technology that is touching all corners of the database world. Each year NoSQL software becomes more capable, lower cost and easier to use. Document stores, in particular, make the process of doing object-relational mapping unnecessary allowing anyone with god metadata (JSON and XML) to simply drag-and-drop their files into a centralized corporate data store. There are still a few challenges left. For example using statistical analysis of inserts and record counts to optimized indexes and putting good query languages (like JSONiq) on top of these data stores. But in large, these features are just polish on systems that are optimized to scale and be highly-available. The hard work seems like it has been done and we are now in the stage of refinement, not revolution.

We documented the emergence of the NoSQL database patterns in our book, Making Sense of NoSQL, which now available through Manning Publications. If you read this book you know that NoSQL systems have a diverse set of architectural patterns and different patterns apply to different problems. Selecting the right database architecture is a complex process of carefully understanding the subtleties of requirements and weighing the alternatives. Yet they do work well and once they are setup and configured they make data persistence a straightforward process.

So whats next?

Now that saving data has shifted from a project in its own to a smaller part of the application developer project we see the skills needed to build applications starting to shift. The need to model your data with ER modeling tools is getting less. The need to write complex joins with SQL is no longer needed. The next major skill set we would like to address is the the movement to agile transformation. How do you get you data out of your database and how do you transform it into the many formats that your application needs?

We think that the answer to this question is clear. Organizations need to be better at transforming the data in their database to other forms. This is the shift in skill sets from persistence centric to transformation centric. And it is not just the software developers that need to be able to transform data. Everyone on your team including non-programmers can play a role. They all need skills to quickly transform data from one form to another.

We call this shift the movement to toward Agile Transformation. We hope to document how organizations are waking up to this new movement and understand the tools and processes they are adopting to empower everyone on their team to quickly transform data.

This process of data extraction and transformation used to be a two-step process. SQL developers might create a series of tabular reports. These reports were then converted into the medium needed, HTML, XML, JSON, or even CSV files and other structures needed by other tools. Now the extract and transformation process can be done with a single step. Query results are no longer restricted to tabular formats.

Strategies for Agile Transformation

Over the next few months (or perhaps years), we hope to document many of the ways that organizations are attacking the agility challenges. Here are just a few strategies to get us started.

Single Source Canonical Data Models

If you are in the content management business you know that the concept of single-source publishing is central to your productivity. Using a single format to store content gets around the many-to-many transform problems that can drag down a teams productivity. We see the same principals also applying to web applications in general. Getting many data sources into a single format and then transforming this single format into many forms is the key to organizational productivity. We call these models "Canonical" since they are the standards that organization can build publish/subscribe web services around.

Flexible Query Languages

If you have every worked with tools like XQuery and JSONiq you know that they are the most flexible query languages around. These languages have benefited from years of work combining the best features of SQL, XSLT, XPath and dozens of other advanced query languages into a grammar that is designed to transform a variety of use cases.

Reusable Transformation Libraries

One of the first strategies that companies find is that many transformations are similar and can benefit from reusable code. Languages like SQL do offer a wide variety of non-portable stored procedures. Yet most of these languages limit your ability to build reusable transformation functions and modules. Modern languages need to be close enough to your data to understand how queries use indexes but abstract enough to be reused in new applications.

Using Great Tools: IDEs and Report Writers

SQL GUI Report Writers were one of the first tools that tool the complexity out of transforming tabular data. And we need more tools like these to make NoSQL reporting accessible to non-programmers. Some NoSQL products like HBASE already have SQL-like query tools. From our other blog posts you may know that we are big fans of the the oXygen IDE for managing JSON and XML data. oXygen makes the process of learning how to write XPath expressions easy for even the non-programmer. Even if their data is complex. These tools are complex in themselves and require hands-on training if non-programmers are going to get the most out of them.

Simplicity for Non-Programmers

One of the core foundations of agility is not have all your transformation be controlled by a small group of overworked developers. We learned that simple tools like GUI-based report writers or simple XPath templates can empower a non-programmer, with a bit of training, to build and maintain their own data transformations. Not needing to understand database joins is a big step in empowerment. Getting a good foundation library is another great step. Setting up small but easy to use templates is another good strategy. Building a search system to find the right library and templates also helps empower new staff and lowers the training burden on existing staff. In general, we feel that if a user "knows their data" that they should be given the tools to transform their data.

So what is the best practices to build an organization that has agile transformation competency? We would love to know your ideas. Please send us email or tweet us at @dmccreary on Twitter.

Dan McCreary's Blog

Wednesday, October 02, 2013

Agile Transformation in the Post NoSQL Era