Polyglot Persistence

Posted on October 15, 2008 by Scott Leberknight

In late 2006 Neal Ford wrote about Polyglot Programming and predicted the wave of language choice we are now seeing in the industry. Instead of assuming a "default" language like Java or C# and then warring over the many available frameworks, polyglot programming is all about using the right language for the job rather than just the right framework(s). For a while now I've thought that, paralleling Neal's description of polyglot programming, a relational database seems to be the accepted, default choice for persistence. Sometimes this is because an organization has standardized on an RDBMS and there simply isn't any other choice. Other times it is just what we're used to doing, and we may not even consider alternatives. But with things like Amazon SimpleDB, Google Bigtable, Microsoft SQL Server Data Services (SSDS), CouchDB, and lots more, it seems we're now seeing the beginning of Polyglot Persistence in addition to polyglot programming.

Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand. For example, some co-workers of mine on one project are effectively using Lucene as their primary datastore, since the application they've built mainly performs complex full-text searches, very fast, against huge datasets. Most people probably don't think of Lucene as a data store and just consider it their full-text search engine. But for this particular application, which aggregates multiple disparate datasets, glues them together, and performs full-text search against the consolidated view of the data, it makes a good deal of sense. It also helped that in a bake-off against a very popular traditional RDBMS's full-text add-on product, the Lucene search solution blew the doors off the RDBMS in terms of performance, and that was even after a team of consultants from the vendor came in and tried to optimize the search performance. So, in this case a non-relational data store made more sense for the problem context, which was data aggregation and fast full-text search.

Within the past few years we've started to see and hear about how companies like Amazon and Google are using non-traditional data stores such as SimpleDB and Bigtable for their own applications. Google App Engine in fact provides access to Bigtable, described as a "sparse, distributed multi-dimensional sorted map," as the sole persistent store for Google App Engine applications. Other organizations like the Apache Software Foundation have gotten into the non-relational data store market as well with things like CouchDB, which is described as "a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API." One of the common threads among all these non-relational stores is that they are distributed, designed for fault tolerance, and embrace asynchronicity. They favor BASE (Basically Available, Soft state, Eventually consistent) semantics and accept the trade-offs described by the CAP theorem (Consistency, Availability, Partition tolerance), as opposed to the ACID (Atomicity, Consistency, Isolation, Durability) properties of a traditional RDBMS. In addition, they are almost all either "schemaless" or provide a flexible architecture that promotes ease of schema changes over time, again as opposed to the rigid and inflexible schemas of traditional relational databases.
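To make the "sparse, multi-dimensional sorted map" description a little more concrete, here is a minimal single-process sketch in Java. It assumes the three dimensions are row key, column key, and timestamp, and it uses plain strings for values. The class and method names (SparseSortedMapSketch, put, getLatest, rowKeys) are made up for illustration; this is not the Bigtable or App Engine API, and it ignores the "distributed" part entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy illustration of a sparse, multi-dimensional sorted map (hypothetical, not a real API). */
public class SparseSortedMapSketch {

    // row key -> column key -> timestamp -> value; every level is kept sorted by its key.
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows =
            new TreeMap<String, TreeMap<String, TreeMap<Long, String>>>();

    /** Stores one version of a cell. Cells that are never written occupy no space (sparse). */
    public void put(String row, String column, long timestamp, String value) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            columns = new TreeMap<String, TreeMap<Long, String>>();
            rows.put(row, columns);
        }
        TreeMap<Long, String> versions = columns.get(column);
        if (versions == null) {
            versions = new TreeMap<Long, String>();
            columns.put(column, versions);
        }
        versions.put(timestamp, value);
    }

    /** Returns the value with the highest timestamp, or null if the cell was never written. */
    public String getLatest(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        TreeMap<Long, String> versions = columns.get(column);
        if (versions == null) {
            return null;
        }
        return versions.lastEntry().getValue();
    }

    /** Returns the row keys in the range [fromRow, toRow), in sorted order. */
    public List<String> rowKeys(String fromRow, String toRow) {
        return new ArrayList<String>(rows.subMap(fromRow, toRow).keySet());
    }
}
```

Because the rows are sorted, a range scan such as rowKeys("com", "d") is just a view over a contiguous slice of the map, and there is no schema to alter when a row gains a column no other row has.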

I don't think it's a coincidence that the companies creating and now offering these alternative data stores - free, commercial, or hybrid models like Google App Engine which is free up to a certain point - are all giants in distributed computing and deal with data on a massive scale. My guess is that perhaps they initially deployed some things on a traditional RDBMS and outgrew it, or maybe they simply thought they could do it better for their own specific problems. But as a result, I think that over time organizations are going to start thinking more and more about the type of persistence they need for different problems, and that ultimately the RDBMS will be but one of the available persistence choices.

Apache Commons Collections For Dealing With Collections In Java

Posted on October 03, 2008 by Scott Leberknight

If you are (stuck) in Javaland, which for my main project I currently am, and you'd like a little of the closure-like goodness you get from, well, lots of other languages like Ruby, Groovy, C#, Scala, etc. then you can get a tad bit closer by using the Apache Commons Collections library. Ok, scratch that. You aren't going to get much closer, but for some problems the extensive set of utilities available can make your life at least a little easier when dealing with collections, in that you don't need to code the same stuff over and over again or create your own library of collection-related utilities for many common tasks. Note also that I am not intending to start any kind of religious war here about Java vs. Java.next, which is how Stu aptly refers to languages like Groovy, JRuby, Scala, and Clojure.

As a really quick and simple example, say you have a collection of Foo objects and that you need to extract the value of the bar property of every one of those objects, and you want all the values in a new collection that you can use for whatever you need to. In that case you can use the collect method of the CollectionUtils class to do this pretty easily.

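List<Foo> foos = getListOfFoosSomehow();
// invokerTransformer calls getBar() reflectively on each element. With the non-generic
// Commons Collections 3.x API, the assignment below compiles with an unchecked warning.
Collection<String> bars = CollectionUtils.collect(foos, TransformerUtils.invokerTransformer("getBar"));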

This simple code is equivalent to the following:

List<Foo> foos = getListOfFoosSomehow();
Collection<String> bars = new ArrayList<String>();
for (Foo foo : foos) {
    bars.add(foo.getBar());
}
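If the method-name string bothers you, the same pattern also works with a hand-written transformer in an anonymous inner class, at the cost of more ceremony. The sketch below is self-contained so it can run without the library: the Transformer interface, the collect method, and the Foo class are hypothetical stand-ins written for this example, not the Commons Collections classes themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class CollectSketch {

    /** Hypothetical stand-in for a transformer: maps one object to another. */
    interface Transformer<I, O> {
        O transform(I input);
    }

    /** Hypothetical example bean with a "bar" property. */
    static class Foo {
        private final String bar;

        Foo(String bar) {
            this.bar = bar;
        }

        String getBar() {
            return bar;
        }
    }

    /** Applies the transformer to each element and gathers the results in a new list. */
    static <I, O> List<O> collect(Collection<I> inputs, Transformer<I, O> transformer) {
        List<O> results = new ArrayList<O>(inputs.size());
        for (I input : inputs) {
            results.add(transformer.transform(input));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Foo> foos = Arrays.asList(new Foo("a"), new Foo("b"));

        // The anonymous inner class is checked by the compiler, unlike a method-name string.
        List<String> bars = collect(foos, new Transformer<Foo, String>() {
            public String transform(Foo foo) {
                return foo.getBar();
            }
        });

        System.out.println(bars); // prints [a, b]
    }
}
```

A typo in getBar here is a compile error rather than a runtime failure, which is the main trade-off against the one-line reflective version.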

Depending on your viewpoint and how willing you are to ignore the ugliness of passing a method name into a method as in the first example, you can write less code for common scenarios such as this using the Commons Collections utilities. If Java gets method handles in Java 7, the first example could possibly be more elegantly rewritten like this:

List<Foo> foos = getListOfFoosSomehow();
// Making a HUGE assumption here about how method handles could possibly work...
Collection<String> bars = CollectionUtils.collect(foos, TransformerUtils.invokerTransformer(Foo.getBar));

Of course, if Java 7 also gets closures then everything I just wrote is moot and irrelevant (which it might be anyway even as I write this). Regardless, with the current state of Java (no closures and no method handles) the Commons Collections library just might have some things to make your life a bit easier when dealing with collections using good old pure Java code.