Replacing a HashSet with a BitSet

kinow @ Oct 20, 2012 19:51:39 ()

I always read the messages in the Apache dev mailing lists, including Apache Commons dev mailing list. And you should too. There are always interesting discussions. Sometimes you participate, other times you only watch what’s happening, but in the end you always learn something new.

A few days ago, I found an issue where it was being proposed to replace an unnecessary HashSet in ArrayUtils#removeElements() by a BitSet. Here’s how the code looked like:

HashSet<Integer> toRemove = new HashSet<Integer>();
for (Map.Entry<Character, MutableInt> e : occurrences.entrySet()) {
    Character v = e.getKey();
    int found = 0;
    for (int i = 0, ct = e.getValue().intValue(); i < ct; i++) {
        found = indexOf(array, v.charValue(), found);
        if (found < 0) {
            break;
        }
        toRemove.add(found++);
    }
}
return (char[]) removeAll((Object)array, extractIndices(toRemove));

The HashSet created at line 1, in the code above, was used to store the array index of the elements that should be removed. And at line 13 there is a call to removeAll method, passing the indexes to be removed. And here’s how the new code looks like:

BitSet toRemove = new BitSet();
for (Map.Entry<Character, MutableInt> e : occurrences.entrySet()) {
    Character v = e.getKey();
    int found = 0;
    for (int i = 0, ct = e.getValue().intValue(); i < ct; i++) {
        found = indexOf(array, v.charValue(), found);
        if (found < 0) {
            break;
        }
        toRemove.set(found++);
    }
}
return (char[]) removeAll(array, toRemove);

The first difference is at line 1. Instead of a HashSet, it is now using a BitSet. And at line 10, instead of adding a new element to the HashSet, now it “sets” a bit in the set (the bit at the specified position is now true). But there are important changes at line 13. The method removeAll was changed, and now the array doesn’t require a cast anymore. And the it is not necessary to cast the elements from HashSet anymore, as now the bit in the index position of the set is set to true. So the extractIndices method could be removed.

The code got simpler. But that’s not all. At Apache Software Foundation you can find a lot of talented developers - that’s why I got so excited after joining them. Besides simplifying the code, the developer responsible for these changes (sebb) also pointed out that the new code consumes less memory and is faster. Ah! And he also wrote unit tests