Translate

Search This Blog

Wednesday 14 September 2011

How does HashMap work in Java

We all know and must have used HashMap, its a Map interface implementation. It stores keys and its corresponding value. Keys cannot contain duplicates and can contain at the most one null key (HashTable does not allow null as key, FYI). It has non synchronized methods unlike HashTable. It does not guarantee the order of retrieval (it changes every time the HashMap is modified) etc etc. Enough of this now..


So how does HashMap work?
Basically when this question is asked it generally means how is the object stored and retrieved from the     HashMap. But of course with the get and put methods in the Map API. Easy isn't it, but that is not what
meets the eye. We need to understand what goes inside to understand the answer to the title of this topic.

So what happens when you put a key/value pair in the HashMap, Firstly, hashCode of the key is retrieved and supplied to hash function to defend against poorly constructed hashCode implementation; to get a new hash value. This value is then used to figure out which bucket the entry (Entry is an static inner class within the HashMap structure, which stores the key, its associated value and the reference to the next entry, it is created whenever you try to put an new mapping in the HashMap; Buckets are nothing but an array of Entry objects) should belong to, using another method named indexFor which calculates the index for the bucket. (Initially, the number of buckets within the HashMap is equal to in the initial capacity of the Map,FYI).
Once the index is found, if there is an entry at that location in the bucket; then that entry's hash value and key is checked for equality with the new key if they are same the old value is replaced, else the old entry is marked as successor to the new entry forming a singly linked list. If there is no entry at the calculated index location; the new value is stored at that location.

What happens when you retrieve the key from the Map using get? If you understand how the above put logic works, retrieval logic is easy. We first, get the hash value by applying the hash function to the hashCode of the key. This value is then used to calculate the index value of the bucket. Using this index we get the entry, if the hash value  and the key of the retrieved entry and key passed is the same then the value is returned, else we traverse the linked list till we get the value. If not we return null.

This above understanding helps us to understand various questions
  • What is HashMap? Why do we use it?
  • How does HashMap work? How does the get method work?
  • What will happen when two different objects has same hashCode, (during both insertion and retrieval)
Another interesting thing to note is what happens when the capacity of the HashMap is reached.  As said in the java docs, when the number of entries exceeds the product of the load factor and the current capacity, the capacity is roughly doubled by calling the rehash method. Note, when rehashing happens, issues like race conditions can occur when the HashMap is used concurrently.
Also, the iterator returned by some of the methods in the class are fail-fast; if the Map is structurally modified after the creation of the iterator it will throw ConcurrentModificationException. This fact is recorded via the modCount (modification count) whenever an mapping is added or removed.

Hope this helps you understand why a correct implementation of hashCode and equals methods are so important to a class when its objects are used in a collections (based on principle of hashing)  and explains the strange behavior of these collections sometimes.

References

No comments:

Post a Comment

Please use proper blogging etiquette while posting comments.

Note: only a member of this blog may post a comment.