Effective Java #9
The (not so) curious case of hashCode() and equals()

  Jul 25, 2016 -   read
  java, hashmap, hashcode, equals, object

As Java (or Android) developers, we use Object all the time. Java is an object-oriented programming language in the first place. We probably read/hear (more than once1 2 3) that equals() and hashCode() should only be overriden together, as they affect hashtable operations. Let’s do an interactive experiment to see what happens when one doesn’t follow advice.

First let’s grab some official definitions. The Java language specification (Java SE 8) is quite brief on this:

The method equals defines a notion of object equality, which is based on value, not reference, comparison.

The method hashCode is very useful, together with the method equals, in hashtables such as java.util.HashMap.

Effective Java item #9 has a more lengthy explanation of this contract, summing up that:

Always override hashCode() when you override equals()

The contract basically says that if 2 objects are equals(), they should have the same hashCode() value, but 2 objects having the same hashCode() value are not necessarily equals().

The experiment

Consider a common use case where we have a collection of items, each has an ID. An item is uniquely identified by its ID: if 2 items have the same ID, they are considered ‘logically equivalent’, and should be treated as being the same when we program using them. Let’s represent such item using Item class, and use its instances as keys for map operations as follows:

class Item {
  private final int id;

  Item(int id) {
    this.id = id;
  }
}

// for some funny reason, we create 3 instances
// that represent the same thing
Item item1 = new Item(1),
  item2 = new Item(1),
  item3 = new Item(1);
HashMap<Item, Integer> map = new HashMap<>();
map.put(item1, 1); // set value to 1
map.put(item2, 2); // override value to 2
map.put(new Item(100), 100); // some random entry
System.out.println(map.get(item1));
System.out.println(map.get(item2));
System.out.println(map.get(item3));

As our keys are supposed to represent the same item logically, the expected output to the 3 printouts are:

2, 2, 2

(Trivial) test runs

TL;DR

  • Test 1: Don’t override hashCode() and equals()
  • Test 2: Override hashCode() only
  • Test 3: Override equals() only
  • Test 4: Override both hashCode() and equals()
  • Test 5: What if hashCode() is random?
  • Test 6: What if hashCode() is always the same?
  • Test 7: And what if equals() is always false?
  • Test 8: Okay what if equals() is always true now?

You can skip to explanation, or try to work out explanation for each test output as you read on.

Test 1: Don’t override hashCode() and equals()

class Item {
  private final int id;

  Item(int id) {
    this.id = id;
  }
}
1, 2, null [FAILED]

Test 2: Override hashCode() only

class Item {
    ...
    @Override
    public int hashCode() {
      return id;
    }
}
1, 2, null [FAILED]

Test 3: Override equals() only

class Item {
  ...
  @Override
  public boolean equals(Object obj) {
    return obj instanceof Item && id == ((Item) obj).id;
  }
}
1, 2, null [FAILED]

Test 4: Override both hashCode() and equals()

class Item {
  ...
  @Override
  public int hashCode() {
    return id;
  }
  
  @Override
  public boolean equals(Object obj) {
    return obj instanceof Item && id == ((Item) obj).id;
  }
}
2, 2, 2 [PASSED]

Test 5: What if hashCode() is random?

class Item {
  private static int staticInt;
  ...
  @Override
  public int hashCode() {
    return ++staticInt; // make sure each call gets a different value
  }
  
  @Override
  public boolean equals(Object obj) {
    return obj instanceof Item && id == ((Item) obj).id;
  }
}
null, null, null [FAILED]

Test 6: What if hashCode() is always the same?

class Item {
  ...
  @Override
  public int hashCode() {
    return 0;
  }
  
  @Override
  public boolean equals(Object obj) {
    return obj instanceof Item && id == ((Item) obj).id;
  }
}
2, 2, 2 [PASSED]

Test 7: And what if equals() is always false?

class Item {
  ...
  @Override
  public int hashCode() {
    return 0;
  }
  
  @Override
  public boolean equals(Object obj) {
    return false;
  }
}
1, 2, null [FAILED]

Test 8: Okay what if equals() is always true now?

class Item {
  ...
  @Override
  public int hashCode() {
    return 0;
  }
  
  @Override
  public boolean equals(Object obj) {
    return true;
  }
}
100, 100, 100 [FAILED]

Explanation

Well, if you need no more explanation for above results then congratulations, you have mastered hashCode(), equals() and how HashMap works. But in case you get confused at some point, then below is the explanation.

A little dig into internal implementation reveals that HashMap uses hashCode(), == and equals() for entry lookup. HashMap stores entries in buckets. Ideally each bucket should store 1 entry, and by looking up a bucket in O(1) time, one can get corresponding entry in O(1) time. In practice, many entries may end up in the same bucket.

The lookup sequence for a given key k is simplified as follows:

  • Use k.hashCode() to determine which bucket the entry is stored, if any
  • If found, for each entry’s key k1 in that bucket, if k == k1 || k.equals(k1), then return k1’s entry
  • Any other outcomes, no corresponding entry

Similar sequence applies for when an entry is put into HashMap.

Now, hashCode() is a native implementation, but its Javadoc says:

/**
 * ...
 * As much as is reasonably practical, the hashCode method defined by
 * class Object does return distinct integers for distinct objects.
 * ...
 */
 public native int hashCode();

equals() implementation is much more straightforward:

public boolean equals(Object obj) {
  return (this == obj);
}

So without overriding hashCode(), its output will be different for each Item object. Without overriding equals(), 2 Item objects that are logically equivalent would only be possible if they are represented by the same instance.

With that in mind, let’s see how each test failed/passed:

  • [FAILED] Test 1: Don’t override hashCode() and equals()
    • As each key has a different hashCode() value, they end up in different buckets, and represent different entries, with different values.
  • [FAILED] Test 2: Override hashCode() only
    • Our keys are in the same bucket now, but they end up representing different entries, due to failed equals() checks.
  • [FAILED] Test 3: Override equals() only
    • Without overriding hashCode(), HashMap never gets to the point where it needs to do equals() checks. We get the same results as Test 1.
  • [PASSED] Test 4: Override both hashCode() and equals()
    • Our keys are in the same bucket, and they represent the same entry, as either == or equals() checks passed.
  • [FAILED] Test 5: What if hashCode() is random?
    • It’s worse than not overriding hashCode(), as the same key object would end up in a different bucket every time we use it.
  • [PASSED] Test 6: What if hashCode() is always the same?
    • Our code is functionally correct, but the use of HashMap is now redundant, as all entries will end up in the same bucket, regardless of their keys, which takes O(N) or O(logN)4 to iterate.
  • [FAILED] Test 7: And what if equals() is always false?
    • It would be the same as not overriding equals(). We get the same results as Test 2.
  • [FAILED] Test 8: Okay what if equals() is always true now?
    • We would be treating each key as logically equivalent to any other key, and thus any map operation will refer to the same entry.

I hope this trivial article now convinces you to follow Java language specification advice when overriding hashCode() and/or equals(). Effective Java2 item #9 provides much more informative insights, and you should absolutely grab the book and check it out as well.


Ha Duy Trung
Currently cooking, brewing and building Android stuffs.