Date Handling in Java

Posted on February 16, 2005 by Scott Leberknight

There has been a thread going around an internal forum at work the past few days on calculating the difference between two dates in weeks, and how to implement that using Java. Of course, since working with dates in Java is not at all fun, the code was very verbose: lots of conversions from Strings to Dates to longs in order to subtract the times in milliseconds, followed by a bunch of basic math to convert milliseconds to weeks. Ugh. Since I've been dabbling a little in Python lately (that is, going through the tutorial very slowly whenever I get a few minutes here and there) I decided to see how this would be accomplished in Python. Here it is:

from datetime import date

date_from = date(2004, 2, 17)
date_to = date(2005, 2, 17)
diff = date_to - date_from
diff_in_weeks = diff.days / 7
print diff_in_weeks

And of course the answer it prints is 52. So that was six lines of code, including the import and the print statement.

The equivalent code in Java might look something like the following:

import java.util.GregorianCalendar;

GregorianCalendar fromCalendar = new GregorianCalendar(2004, 1, 17); // month is zero-based
long fromDate = fromCalendar.getTimeInMillis();

GregorianCalendar toCalendar = new GregorianCalendar(2005, 1, 17); // month is zero-based
long toDate = toCalendar.getTimeInMillis();

long diffInMillis = toDate - fromDate;
long diffInWeeks = diffInMillis / (1000 * 60 * 60 * 24 * 7); // lots of yuckiness here to convert millis to weeks
System.out.println(diffInWeeks);

So that's eight lines of code including the import and the print statement. That's not much of a difference in line count, but the difference in readability is large. In the Python version, we create two dates and subtract them to get a timedelta object, which "represents a duration, the difference between two dates or times" according to the Python docs. We are working with an object that inherently represents a difference between times, and thus the code is very clean. In the Java code, on the other hand, we have to convert from a Calendar to a long, compute the difference in a primitive type that does not inherently represent a difference between times, and do some yucky but still relatively simple math to convert from milliseconds to weeks.

And though I could have used the java.util.Date constructor that takes a year, month, and day, I decided against it because that constructor and similar ones are deprecated. So I used a GregorianCalendar to obtain the date and then converted it to a completely unnatural representation of a date - a Java primitive long value.

The point is that dealing with dates, times, calendars, etc. in Java is not a very pleasing experience most of the time, and it is beyond me why the Java language does not directly support the notion of ranges, whether numeric, date, or time. At the least, the Date or perhaps the Calendar class could have included a diff() method to compute the difference between two dates. Isn't this a pretty common thing to do in many applications? So of course everyone has to write their own version in some DateUtils class, just like they have to create StringUtils classes to deal with omissions in String, such as checking whether a string is empty, splitting and joining (prior to JDK 1.4), capitalizing, etc.
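For example, the utility method everyone ends up writing might look something like this (just a sketch; the class and method names are mine, not from any standard library):

import java.util.Calendar;

public final class DateUtils {

    private DateUtils() {}

    // Returns the whole number of weeks between two dates.
    public static long diffInWeeks(Calendar from, Calendar to) {
        long diffInMillis = to.getTimeInMillis() - from.getTimeInMillis();
        return diffInMillis / (1000L * 60 * 60 * 24 * 7);
    }
}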

And to top it all off, the date and number formatting classes DateFormat and NumberFormat, along with MessageFormat, are not thread safe. That to me was a very poor design decision, since now everyone needs to create their own wrappers or utility classes to use a shared date formatter, perhaps using a ThreadLocal to ensure thread safety. Until JDK 1.4, Sun didn't even bother to inform people of the potential multithreading issues in the JavaDocs.
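A typical wrapper looks something like the following (a minimal sketch in pre-generics style; the date pattern is just an example):

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

public final class ThreadSafeDateFormatter {

    // Each thread gets its own SimpleDateFormat, since instances
    // cannot safely be shared across threads.
    private static final ThreadLocal formatHolder = new ThreadLocal() {
        protected Object initialValue() {
            return new SimpleDateFormat("yyyy-MM-dd");
        }
    };

    public static String format(Date date) {
        return ((DateFormat) formatHolder.get()).format(date);
    }
}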

This is sort of a silly rant about relatively low-level details, but I think it points out that making your code read along the lines of your domain improves readability and maintainability, since the code reflects the language of the problem domain and not the programming language. In this case the Python example more cleanly mirrors how you typically think about dates than the Java example does, which makes for cleaner, more maintainable code.

One Schema To Rule Them All

Posted on January 26, 2005 by Scott Leberknight

On my current project at work, we started out with our own separate database. About a month into development, we were directed to re-host our database inside a centrally managed corporate database. The application itself is relatively small with a small number of tables - around 20 in total. Prior to our move into the corporate database, we had our own schema in an Oracle9 database. This worked out well since our tables could be named exactly as we wanted and we didn't need to worry about naming conflicts.

When we began the migration into the new Oracle9 database, however, we found out that we would not have our own schema. Instead, all our tables would be created in one schema, as there is only one schema for the entire database. As a result, all our tables had to be renamed in accordance with a set of naming conventions, and also to ensure the table names fully described the domain and intent of each table. In other words, since all tables are in one schema, the table name is the fully qualified unique name. At first glance this "One Schema To Rule Them All" approach did not seem to be the best idea.

For one thing, tables in different Oracle schemas can certainly see each other so long as the appropriate permissions are granted, and they can reference each other via normal foreign key constraints. In addition, the fully qualified name of a table is schema_name.table_name, so you could have two tables with the same name in different schemas without a naming conflict. Requiring all tables to reside in one schema, by contrast, means you cannot have two tables with the same name, and you sometimes end up creating contrived names to avoid collisions.

But I think the real reason why this approach struck me as not such a good choice was related to the way in which you package classes in object-oriented languages such as Java and C#. In Java you create packages and you place classes into those packages according to some logical breakdown. For example, in a web application you might have high-level packages representing the view, the model, and the controller. Or you might create packages according to use cases. And more than likely different applications will reside in different packages, with perhaps some common packages in a "common" or "utils" package that can be shared between applications. So I thought: Wouldn't it be better to create a "core" schema in which tables common to all applications, e.g. Employee, Department, etc. could reside and then have separate schemas for each application?

So I asked several of the database designers we are working with about this. The answer was that several years ago they actually started down the multiple-schema path, but quickly found that establishing the permissions between all the schemas, coupled with some additional security restrictions they have in place, was a real maintenance nightmare. I suppose that makes sense, since in order to have access to the proper tables each database account needs to be granted permissions on all the schemas and the tables within them. The additional security features they require complicate this further. Thus they changed to the one-schema approach, with strict naming guidelines and conventions to handle potential naming collisions.

I can understand this argument, but I wonder if there still isn't an easier way to deal with this issue in large corporate databases shared by multiple applications across multiple business units. Until then, we have "One schema to rule them all, One schema to find them. One schema to bring them all and in the darkness bind them."

Hibernate3

Posted on January 24, 2005 by Scott Leberknight

A week or two ago I went to the Novajug Enterprise Group meeting to hear about Hibernate3. Steve Ebersole from JBoss gave the presentation and did a good job explaining the future direction of Hibernate. From what he said, it appears the Hibernate crew have done a significant amount of work, including internal refactorings in addition to a bunch of new features I definitely want to get my hands on.

Steve called Hibernate3 an "evolutionary new version" of Hibernate which includes improvement of the existing code base and new features. One interesting thing is that you can actually run Hibernate3 in the same application as Hibernate2. Apparently this is possible because they changed the package root from net.sf.hibernate to org.hibernate. Of course that means to change from Hibernate2 to 3 you will need to do some search and replace in your own code. One thing that will be really cool if it happens is an improved Hibernate GUI console application that will allow you to write and test HQL queries as well as work with actual objects to see how Hibernate will behave in a real application. Another nice feature is the support for JDK 1.5 annotations, which might in some cases be a better alternative than writing .hbm.xml mapping files.
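To illustrate the package change, the upgrade means search-and-replace work on imports along these lines:

// Hibernate2
import net.sf.hibernate.Session;
import net.sf.hibernate.SessionFactory;

// Hibernate3 - same classes, new package root
import org.hibernate.Session;
import org.hibernate.SessionFactory;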

Several of the more notable internal refactorings include a new event-based architecture, as opposed to the current Session-based architecture, and a new ANTLR-based HQL parser, which should provide much more granular and useful error messages for HQL syntax problems. There are also multi-valued cascade settings, e.g. cascade="save,update", instead of the various values being rammed together into a dash-delimited string like cascade="save-update". One other notable change is that fetching will now default to lazy rather than eager.

Now for the new features. Hibernate3 adds Filters, or "parametrized application-level views". For example, you could implement a security filter that would only return rows in a table for which a user has the proper access level. This is pretty much like adding additional restrictions to the WHERE clause, e.g. WHERE :userLevel >= access_level, but in a dynamic and non-intrusive manner.
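Based on Steve's description, using a filter might look roughly like this (the filter name, parameter, and column are invented for illustration):

<!-- In the mapping file: declare the filter and attach it to a class -->
<filter-def name="accessLevelFilter">
    <filter-param name="userLevel" type="integer"/>
</filter-def>

<class name="Document" table="DOCUMENT">
    ...
    <filter name="accessLevelFilter" condition=":userLevel >= access_level"/>
</class>

// In code: enable the filter on the Session for the current user
session.enableFilter("accessLevelFilter").setParameter("userLevel", new Integer(4));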

"Houston, we have multi-table mappings!" According to Steve, multi-table mappings are not really a good thing, as he believes you should not make your object model less granular than your database tables, e.g. by mapping two tables into one domain object. I agree most of the time. However, many of us live in a world where DBAs sometimes make the database extremely normalized to the point where one logical entity is scattered across two or more tables. I don't necessarily like it, but it's reality and I am glad Hibernate now has the ability to do multi-table mappings. Oh, did I mention you use the <join /> element to accomplish this feat? There are some more advanced things you can do like mixing <join /> and inheritance.

Another interesting, though perhaps dangerous in the hands of the wrong developer, feature is called "representation independence". Steve's example was persisting "maps of maps" or "lists of maps", etc. In other words, you can persist a dynamic model without even needing a class. I can see scenarios where this would come in handy, but would certainly be wary of using it unless there is no other good alternative.
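If I understood Steve's example correctly, persisting a dynamic model might look something like this rough sketch (the entity name and properties are invented, and this assumes a mapping declared with an entity-name rather than a class):

// No Customer class exists anywhere - the "entity" is just a Map
Map customer = new HashMap();
customer.put("name", "Acme Corp");
customer.put("region", "East");
session.save("Customer", customer);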

Several other new features include JMX integration; support for handwritten SQL and stored procedure integration into normal POJO domain objects; and EJB3-style persistence operations. Though, as long as Hibernate is around I don't think I'll be knocking down EJB3's door anytime soon.

Writing Software Is Like Writing In General

Posted on December 30, 2004 by Scott Leberknight

I came across this article on IT World in one of the emails I get from them. It reminded me of something I think about a lot, which is that writing software is not much different than writing in general - that is, writing books, articles, presentations, etc. When you write a book, an article, a presentation, a blog entry, or compose even a short email, you are constantly rewording, reworking, restructuring your words...in software parlance you are refactoring your prose. I am doing it right now as I write this entry.

Why then, do so many people in the software industry assume that software is exempt from this reworking and restructuring, and that by "proper design" you can somehow avoid rework and restructuring? This problem is mostly found in managers and other like-minded people, who want to make sure you "get it right" the first time so there is no going back to any code you've already written. They want to check off the task in their project management tool that says your domain objects are done, and then make sure you don't touch them ever again because to do so would throw off the schedule and their ability to manage. Interestingly, most of these people would never release project documentation without it going through multiple reviews and revisions. Sadly, they fail to see how this is the same as reviewing and revising code as you write it and add more and more functionality throughout the development of a system.

How Much Do You Rely On Your IDE?

Posted on December 21, 2004 by Scott Leberknight

I love IDEs. They make developing software much more enjoyable and productive. But how much work do you let your IDE do? At what point are you fighting the IDE instead of being productive? Every IDE and tool you'll ever use will have some things you wish it would do or do differently. But there are certain things that I am not willing to let my IDE do for me. I think probably the most important thing is builds. Certainly the de facto build tool for all sorts of Java applications is Apache Ant, but every single IDE out there has its own internal build tool and process. Most times you end up clicking through more than a few screens to configure it just right, and then find out it is really difficult or impossible to share all those settings or to easily view what the build process and settings actually are! Even worse, once you've got your IDE's build process mastered, you find you are locked into that IDE. So there is no way to share your build process with other team members who are not using the same IDE as you are.

Many organizations "solve" this problem by mandating the same IDE for all developers. I won't get into that argument, but I'll just say that I think there are positives and negatives to that approach. So, back to the build problem. Assuming all developers have the same IDE, is it a good idea to use their build configuration? I don't think so because the IDE build process is typically neither flexible nor transparent. Once you get beyond simple applications and into the arena where your build must be able to target multiple environments (e.g. development, test, production, etc.), IDE build processes quickly break down.

A simple but very relevant example is Ant's <filterset>, which I use all the time to replace tokens for different build environments among other things. To illustrate, most teams provide local instances of a database for each developer, one or more test databases where integration testing and user testing occurs, and a production database. Each of these databases is probably going to have a different URL, username, password, etc. Using the Ant <filterset> you can very easily replace tokens during the build with specific values for the target environment. Ant makes this very, very easy. You could simply place the varying properties in different properties files, e.g. dev.properties, test.properties, and prod.properties, and then use a -D property to determine which environment to build, e.g. ant -Dbuild.env=test would build the application for the test environment. Whereas this is very simple in Ant, IDEs I've used don't provide this type of customization. Why not? It seems very narrow for an IDE to assume there is only one target deployment environment.
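A stripped-down version of this setup might look like the following sketch (property and token names are just examples; the config files would contain tokens like @DB_URL@):

<!-- build.xml: defaults to dev unless -Dbuild.env=... overrides it -->
<property name="build.env" value="dev"/>
<property file="${build.env}.properties"/>

<target name="copy-config">
    <copy todir="${build.dir}/config">
        <fileset dir="config"/>
        <filterset>
            <filter token="DB_URL" value="${db.url}"/>
            <filter token="DB_USERNAME" value="${db.username}"/>
        </filterset>
    </copy>
</target>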

There are other things I don't want my IDE doing for me as well, but there are of course a ton of things I really do want my IDE to do for me, like generating getters and setters, refactoring, and running unit tests within the IDE while I'm developing, to get a really fast edit, build, test cycle going. Mainly those are things that are similar among IDEs but might have a different menu command or keystroke. IntelliJ, Eclipse, and JBuilder can all generate my getters and setters just fine, so what is the differentiator between IDEs? I think for me it is to what extent the IDE helps me versus hinders me, and how well it automates the things I do all the time. I can be productive in any of those three IDEs, but I prefer IntelliJ because it seems like more of an extension of me than a tool I am using. It just feels "smooth". But still, I won't let even IntelliJ do my builds for me!

There are some other relatively minor things I might avoid relying on an IDE for, but overall builds seem to be the most important aspect of development to keep out of your IDE's reach, regardless of whether you are developing in Java or something else.

Manual Testing

Posted on December 20, 2004 by Scott Leberknight

Does your organization have a test department that still uses a gigantic Word document or Excel spreadsheet to document all paths through a system, and makes its human testers manually click through everything on a web site or rich client application? Several people I work with mentioned that on a recent project they used a test department that demanded a complete script of what buttons to click, what data to enter, in what order, and what results to expect. They then set the testers loose. Apparently all these testers did was follow the "happy trail", and magically the application passed with flying colors! Of course the reality was much uglier, since they didn't really exercise the application in terms of executing all potential paths through the user interface, entering erroneous data, or load testing the application.

I submit that having humans manually click through a scripted session and manually type in the results is absurd and a waste of valuable time and money; it ultimately adds zero value to the software development effort, and probably adds negative value.

There are tools out there to automate many of these repetitive tasks. The only one I ever worked with was Rational's Robot product, and that was back in 2001. It allowed you to record the interactions with an application and perform assertions on the results of those interactions. That allows you to automate regression tests in the same way developers can re-run all their JUnit tests at the touch of a button. From a brief visit to the Robot web site it appears to be able to do a lot more now, including web applications. I also found this article about using the Rational Robot automation framework support (RRAFS) and another link off that page to the Software Automation Framework Support project on Sourceforge.

In any case, please do your company a favor and let someone know there are lots of tools to automate many aspects of functional testing!

Five Sessions in Five Minutes (NFJS)

Posted on December 20, 2004 by Scott Leberknight

Ok, since I've been slacking so much since the November 5-7 No Fluff Just Stuff conference, I am going to write about the last five sessions I attended in this single entry. Each session gets one (hopefully short) paragraph.

Howard Lewis Ship, creator of Tapestry and HiveMind, gave an introductory session on HiveMind. This is another Dependency Injection/Inversion of Control container similar to Spring. It looks pretty interesting, especially the ability to configure separate modules and give them versions. It also contains the capability to define configuration points for plugging in your own extensions. However, with this capability also seems to come some pretty hefty complexity, judging from the example Howard showed during the session. Like Tapestry, HiveMind has line-precise error reporting, which is always nice. But by far the coolest thing is HiveDoc, which thoroughly documents the HiveMind configuration in a JavaDoc-like web page. Someone mentioned that Spring was going to introduce a similar feature, but I haven't seen or heard anything about it yet. Overall, HiveMind looks pretty cool, but for now I'm staying with Spring!

The first session on Sunday morning was "Hard-core Multi-threading in Java", given by Neal Ford. Overall this was a good session, with lots of live examples showing the thread debuggers in JBuilder and OptimizeIt, which are both pretty cool. He also showed using JDB to debug at a very low level. I suppose sometimes writing web apps is nice since you don't normally need to worry about threading - well, you actually do, since servlets are by nature multithreaded, but you get to deal with threading at a much more basic level, without worrying about deadlocks, lock starvation, etc.

The next session was "Ant Hacks" by Erik Hatcher. Erik did his usual bang-up job and showed some really cool new things in the latest version of Ant. One cool feature is the <image> task, which can do things like write the version number onto the splash screen during a build, or create image thumbnails. Next was the <import> task, which provides the ability to import another Ant script, mix in the imported build files, override targets in an OO-like fashion, and define abstract targets which must be overridden. Another really cool new task is <subant>, which recurses a directory tree and can operate in one of two modes. The first mode executes the same build file against each directory, which would be really useful if you have subprojects within a large project that all follow the same directory structure. The second mode is to use <subant> to execute a collection of build files, e.g. run all build.xml files in the directory tree. The <presetdef> and <macrodef> tasks look really useful for eliminating duplication in build files; see the sketch below. And the <scriptdef> task could be really useful at times, allowing you to write script in your builds using one of several languages, such as JavaScript, Python, BeanShell, and Groovy. All new stuff for Ant, and all useful.
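As an example of eliminating duplication, a <macrodef> might look like this (the module layout and names are invented for illustration):

<macrodef name="compile-module">
    <attribute name="module"/>
    <sequential>
        <mkdir dir="build/@{module}/classes"/>
        <javac srcdir="src/@{module}" destdir="build/@{module}/classes"/>
    </sequential>
</macrodef>

<!-- Each module now compiles with one line instead of a copied target -->
<compile-module module="core"/>
<compile-module module="web"/>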

After the Ant session, I went to another session by Erik on Subversion, a potential CVS-killer. Actually, after hearing this talk I believe it is a CVS-killer, and I plan to start using it soon. Some of the cooler features are atomic commits, true version history across copy and rename operations, versioned metadata, directory versioning, and offline operations like status, diff, add, and remove! Go see for yourself. Oh, and apparently all the Apache projects are migrating to Subversion...that ought to say something.

Ah, finally. The last session: "Top 10 Security Vulnerabilities Developing Web Applications" by Neal Ford. In a nutshell they are: unvalidated input; broken access control; broken authentication and session management; cross-site scripting flaws; buffer overflows (though not in Java); injection flaws (e.g. SQL injection); improper error handling; insecure storage; denial of service; and insecure configuration management. One interesting thing Neal talked about was Stinger, an open-source tool that validates HTTP requests against an XML rule set. Another cool toy he mentioned is WebScarab by the Open Web Application Security Project (OWASP). This tool allows you to "record the conversations (requests and responses) that it observes, and allows the operator to review them in various ways", like trying out illegal values and seeing how your application behaves. This session could have been improved a lot if Neal had used these and other tools to demonstrate the security vulnerabilities he talked about, but overall it was informative and interesting.
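To illustrate the injection flaws item, the classic Java example is building SQL by string concatenation versus using a PreparedStatement (a minimal sketch; the table and column names are made up):

// Vulnerable: user input is spliced directly into the SQL
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(
    "SELECT * FROM users WHERE username = '" + username + "'");

// Safer: the driver treats the parameter strictly as data
PreparedStatement ps = connection.prepareStatement(
    "SELECT * FROM users WHERE username = ?");
ps.setString(1, username);
ResultSet rs2 = ps.executeQuery();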

Whew! Done, and it only took me another month and a half after the conference. :-(

Programming with Ruby (NFJS)

Posted on December 05, 2004 by Scott Leberknight

It's been a month since I went to the Reston No Fluff Just Stuff and I still haven't written up all the sessions. Maybe next time I'll blog the sessions while I'm there. Anyway, I went to a good session given by Dave Thomas on the Ruby programming language. It seems there has been a lot of discussion lately by Stu Halloway, Dave Thomas, and others about metaprogramming, and how certain languages don't need the notion of patterns simply because the language supports the "patterns" directly.

Some of the cooler things Ruby supports include closures, blocks, iterators, mixins, and the ability to add and remove methods dynamically, even on core classes like String. In Ruby, everything is an object, unlike Java and C# which have both primitive and reference types. And Ruby does away with a special new keyword; instead, new is a method that creates the object and invokes your initialize method:

myBook = Book.new("Programming Ruby")

Another cool thing is that Ruby provides native support for defining attributes and specifying whether they are readable and/or writeable using a shortcut notation:

attr_reader :title
attr_accessor :artist

The above defines a read-only attribute named title and a read/write attribute named artist. Even better, you can directly access these attributes using dot notation, e.g. myBook.title and myBook.artist = "Dave Thomas", without needing to declare mundane Java-style getters and setters. The above are shorthand for def blocks, the formal notation for defining attributes. For example, the attr_accessor is equivalent to the following:

def artist
  @artist
end

def artist=(val)
  @artist = val
end

In Ruby, instance variables are always private and are prefixed with an @ symbol, e.g. @artist. That's kinda cool. No more project code convention arguments about whether instance variables should be prefixed with an underscore, or always prefixed by the this keyword, or any other contrived mechanism for unambiguously identifying instance variables in code.

Ruby's support for blocks and iterators is really cool. For example, since everything is an object, you can write code like the following:

3.times { puts "Hello, Ruby" }

myHash.each { |key, value|
  puts "#{key} -> #{value}"
}

IO.foreach("/my/file") do |line|
  process(line)
end

The first line in the examples above prints "Hello, Ruby" three times, as you would expect. Pretty nice. The next example shows iterating over a hash and printing the keys and values. The last example shows iterating over the lines in a file and calling some method process to handle each line. In Ruby a block is simply a "chunk" of code attached to a method. This has the nice side effect that you can pass around blocks of code to be executed by predefined methods. So you could define a method called do_twice like this:

def do_twice
  yield
  yield
end

Then if you called the do_twice method using the line do_twice { puts "Ruby is cool!" }, the output would simply be "Ruby is cool!" printed twice.

There was a lot more to Dave's presentation, including examples of scraping a web page, connecting to a database, and information on Ruby web and persistence frameworks. All in all, Ruby seems like a pretty cool language, and I'd like to try it out on a few projects. Oh wait, I just remembered that requires having some free time...

Check out Dave's Programming Ruby book.

Unit Testing is Part of a Developer's Job

Posted on November 24, 2004 by Scott Leberknight

Today I was discussing software development with several colleagues. We started talking about unit testing and one developer mentioned he simply didn't have time to write unit tests and needed to get the code written and shipped. I won't bother trying to argue whether unit testing saves time and money and produces higher-quality software in the long run or not (of course I think it does but that's not the point here). My point is simply this: I think unit testing is a part of a developer's job in the same way that using version control is part of a developer's job. Most projects nowadays would never consider not using version control. Using version control is simply an accepted part of software development that adds significant value during software construction. I think unit testing should have the same status, such that projects simply consider unit testing part of every developer's normal job. There should not be separate schedule items for unit testing like you so often see on project teams. When I write code now I literally do not feel confident about it if I don't have corresponding test cases.
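To be concrete, the kind of test I mean is often no bigger than this JUnit sketch (DateUtils.diffInWeeks is a hypothetical utility method; the point is just how little code a test takes):

import java.util.GregorianCalendar;
import junit.framework.TestCase;

public class DateUtilsTest extends TestCase {

    // A test like this takes a couple of minutes to write and can be
    // re-run at the touch of a button from then on.
    public void testDiffInWeeks() {
        GregorianCalendar from = new GregorianCalendar(2004, 1, 17);
        GregorianCalendar to = new GregorianCalendar(2005, 1, 17);
        assertEquals(52, DateUtils.diffInWeeks(from, to));
    }
}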

Herding Racehorses, Racing Sheep (NFJS)

Posted on November 13, 2004 by Scott Leberknight

Probably the best session at No Fluff was Dave Thomas' "Herding Racehorses, Racing Sheep" talk. It isn't really a technical session and could be applied not just to software development but to many other industries. The basic premise is simply that the software industry does not have enough people who are at the Competent level or above on the Dreyfus Model of Skills Acquisition, and that this has had a negative effect on the industry in terms of the quality of the products produced. The Dreyfus Model basically states that there are fundamental differences in how people at different levels perceive the world, how they solve problems, how they create mental models, how they acquire new skills, and what affects their performance. The Dreyfus Model has five levels: Novice, Advanced Beginner, Competent, Proficient, and Expert. A person can be at different levels for different skills. For example, Dave described how he began learning to fly as a novice, the differences in the way you think and perceive at each stage, and how you progress to higher levels.

The title "Herding Racehorses, Racing Sheep" derives from the way the software industry treats individual software programmers, which is to say many naive companies think all developers are interchangeable and should be treated the same. This is extremely common in the industry, and the companies that do best understand this sentiment of uniformity and interchangeability is simply wrong and is actually counterproductive. Many studies have shown that programmer productivity differs by orders of magnitude between beginners and experts.

What can be done? Dave suggests we must "encourage competence" through better methods of training, keeping experts in development jobs rather than "promoting" them to management, and making payscales match the actual skills and, more importantly, the productivity of programmers. Trying to change the mentality of companies will be a difficult challenge at best, and an ongoing one. But one sure thing individual developers can do is, as Dave puts it, "invest in their own Knowledge Portfolio" and continue to learn new things throughout their careers. Yes, many low-level development jobs are moving offshore and will continue to do so, but developers who continually maintain and enhance their skills and knowledge portfolio will thrive.

My own philosophy has been to always be learning, always upgrading my skills and evaluating myself against others. This is the same mentality as in the medical, law, and accounting industries, where professionals are constantly learning, going to conferences to gain new knowledge, etc. For some reason, though, a large percentage of software programmers do not do this. They feel no need or desire to learn anything new, and so they stagnate. I have worked with many developers who simply do not care about improving their skills or learning anything new. In the past these people had no problem retaining a job, because programmers were in such demand. The big difference now is that the demand is still there, but their jobs are going overseas. After all, why pay someone here in the United States three or four times as much as someone halfway across the world who has the same skills and productivity but costs much less? In fact, Dave mentioned at one point that he thought about 30% of developers should be fired, to weed out those who haven't learned anything new since they graduated from college. My only question is whether the percentage shouldn't actually be more like 50%.