Question

I am currently implementing a Hashed Array-Mapped Table (HAMT) in Java and I've run into the following issue. When inserting new key-value pairs, there will obviously be collisions. In the original paper, the author suggests:

The existing key is then inserted in the new sub-hash table and the new key added. Each time 5 more bits of the hash are used the probability of a collision reduces by a factor of 1/32. Occasionally an entire 32 bit hash may be consumed and a new one must be computed to diﬀerentiate the two keys.

... and also:

The hash function was tailored to give a 32 bit hash. The algorithm requires that the hash can be extended to an arbitrary number of bits. This was accomplished by rehashing the key combined with an integer representing the trie level, zero being the root. Hence if two keys do give the same initial hash then the rehash has a probability of 1 in 2^32 of a further collision.

So i tried this in Java, with strings. It is known that:

"Ea".hashCode() == "FB".hashCode(); // true

... because of the String#hashCode() algorithm. Following the suggestion in the paper and extending the string with the tree depth to yield another, non-colliding hash code sadly doesn't work:

"Ea1".hashCode() == "FB1".hashCode();  // :(  still the same!!

The above holds true for any integer you might concatenate the strings with, their hash codes will always collide.

My question is: how do you solve this situation? There has been this answer to a very similar question, but there has not been a real solution in the discussion. So how do we do this...?

Answer 1

You have to implements equals() method to compare if the values are equals.

Hashcode is just to sort data into data collections and it is usefull to binarySearch works. But hashcode() is nothing without equals().

Java: How to perform hash collision mitigation during insertion in Hashed Array Mapped Tree (HAMT)?

1 个答案: