Wackiness With String, GString, and Equality in Groovy

Posted on March 20, 2008 by Scott Leberknight

We ran into this one today while writing a Groovy unit test. Check out the following GroovyShell session:

groovy:000> test = []
===> []
groovy:000> (1..10).each { test << "Item $it" }
===> 1..10
groovy:000> test                               
===> [Item 1, Item 2, Item 3, Item 4, Item 5, Item 6, Item 7, Item 8, Item 9, Item 10]
groovy:000> test.contains("Item 1")
===> false

Um...what? We started by adding items to a list, e.g. "Item 1", "Item 2", and so on. Then we simply ask whether the list contains an item we know (or think) is there. But no, it prints false! How can test possibly not contain "Item 1?" First, what's going on here? Look at this second example:

groovy:000> test = []
===> []
groovy:000> (1..10).each { test << "Item $it" }
===> 1..10
groovy:000> test
===> [Item 1, Item 2, Item 3, Item 4, Item 5, Item 6, Item 7, Item 8, Item 9, Item 10]
groovy:000> println test[0].class 
class org.codehaus.groovy.runtime.GStringImpl
===> null

Ah ha! The items in the list aren't strings. So what? Look at this code:

groovy:000> a = "Item 2"
===> Item 2
groovy:000> a.class
===> class java.lang.String
groovy:000> n = 2
===> 2
groovy:000> b = "Item $n"
===> Item 2
groovy:000> b.class
===> class org.codehaus.groovy.runtime.GStringImpl
groovy:000> a == b
===> true
groovy:000> b == a
===> true
groovy:000> a.equals(b)
===> false
groovy:000> b.equals(a)
===> false

a is a normal Java String, whereas b is a Groovy GString. Even more interesting (or perhaps confusing), a == comparison yields true, but a comparison using equals() is false! So back in the original example where we checked if the 'test' list contained 'Item 1', the contains() method (which comes from Java not Groovy) uses equals() to compare the values and responds that in fact, no, the list does not contain the String "Item 1." Without getting too much more detailed, how to fix this? Look at this example:

groovy:000> test = []
===> []
groovy:000> (1..10).each { test << "Item $it".toString() }
===> 1..10
groovy:000> test
===> [Item 1, Item 2, Item 3, Item 4, Item 5, Item 6, Item 7, Item 8, Item 9, Item 10]
groovy:000> test.contains('Item 1')
===> true

Now it returns true, since in the above example we explicitly forced evaluation of the GStrings in the list by calling toString(). The Groovy web site explains about GString lazy evaluation stating that "GString uses lazy evaluation so its not until the toString() method is invoked that the GString is evaluated."

Ok, so we now understand why the observed behavior happens. The second question is whether or not this is something likely to confuse the You-Know-What out of developers and whether it should behave differently. To me, this could easily lead to some subtle and difficult to find bugs and I think it should work like a developer expects without requiring an understanding of whether GStrings are lazily-evaluated or not. Lazy evaluation always seems to come at a price, for example in ORM frameworks it can easily result in the "n + 1 selects" problem.

For comparison, let's take a look at similar Ruby example in an irb session:

>> test = []
=> []
>> (1..10).each { |n| test << "Item #{n}" }
=> 1..10
>> test
=> ["Item 1", "Item 2", "Item 3", "Item 4", "Item 5", "Item 6", "Item 7", "Item 8", "Item 9", "Item 10"]
>> test.include? "Item 1"
=> true

Ruby "does the right thing" from just looking at the code. Ruby 1, Groovy 0 on this one. I don't actually know whether Ruby strings evaluate the expressions immediately or not, and should not need to care. Code that looks simple should not require understanding the implementation strategy of a particular feature. Otherwise it is broken in my opinion.