Wednesday, 14 September 2011

How does HashMap work in Java

We all know and must have used HashMap; it is an implementation of the Map interface. It stores keys and their corresponding values. Keys cannot be duplicated, and at most one null key is allowed (Hashtable does not allow null as a key, FYI). Unlike Hashtable, its methods are not synchronized. It does not guarantee the order of iteration, and that order may change when the HashMap is modified. Enough of this now.


So how does HashMap work?
Basically, when this question is asked, it generally means: how is an object stored in and retrieved from the HashMap? Through the get and put methods of the Map API, of course. Easy, isn't it? But there is more to it than meets the eye. We need to understand what goes on inside to answer the question in the title of this post.

So what happens when you put a key/value pair into the HashMap? First, the hashCode of the key is retrieved and passed to a supplemental hash function, which defends against poorly constructed hashCode implementations, to get a new hash value. This value is then used to figure out which bucket the entry belongs to, using another method named indexFor, which calculates the index of the bucket. (Entry is a static nested class within HashMap that stores the key, its associated value, the hash and a reference to the next entry; one is created whenever you put a new mapping into the HashMap. The buckets are simply an array of Entry objects. Initially, the number of buckets is equal to the initial capacity of the map, FYI.)
Once the index is found, if there are entries in that bucket, the hash value and key of each entry in the bucket's chain are compared with the new key. If a match is found, its old value is replaced. Otherwise a new entry is created and the existing first entry becomes its successor, forming a singly linked list. If there is no entry at the calculated index, the new entry is simply stored at that location.

What happens when you retrieve a value from the map using get? If you understand how the put logic above works, retrieval is easy. We first get the hash value by applying the hash function to the hashCode of the key. This value is then used to calculate the index of the bucket. Using this index we get the first entry in the bucket. If its hash value and key match the key passed in, its value is returned; otherwise we traverse the linked list until we find a matching entry. If none matches, we return null.
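The put and get logic described above can be sketched as a tiny chained hash table. This is a simplified illustration, not the actual java.util.HashMap source: the class name SimpleChainedMap, the fixed capacity of 16 and the bit-mixing step in hash() are assumptions made for the example, and there is no resizing.

```java
// Simplified sketch of a chained hash table; NOT the real java.util.HashMap source.
class SimpleChainedMap<K, V> {
    // One node in a bucket's singly linked list.
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed power-of-two capacity; this sketch never resizes.
    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    // Supplemental hash: mixes high bits into low bits (illustrative mixing step).
    private static int hash(Object key) {
        if (key == null) {
            return 0; // the single null key lands in bucket 0
        }
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    // Maps a hash value to a bucket index; works because length is a power of two.
    private static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    private static boolean sameKey(Object a, Object b) {
        return a == b || (a != null && a.equals(b));
    }

    V put(K key, V value) {
        int h = hash(key);
        int i = indexFor(h, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && sameKey(key, e.key)) {
                V old = e.value; // key already present: replace the value
                e.value = value;
                return old;
            }
        }
        // No match: the new entry becomes the head, the old head its successor.
        table[i] = new Entry<>(h, key, value, table[i]);
        return null;
    }

    V get(Object key) {
        int h = hash(key);
        for (Entry<K, V> e = table[indexFor(h, table.length)]; e != null; e = e.next) {
            if (e.hash == h && sameKey(key, e.key)) {
                return e.value;
            }
        }
        return null; // walked the whole chain without a match
    }
}
```

The strings "Aa" and "BB" have the same hashCode, so putting both exercises the collision path: they share a bucket and are told apart by equals.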

This understanding helps us answer several common questions:
  • What is HashMap? Why do we use it?
  • How does HashMap work? How does the get method work?
  • What happens when two different objects have the same hashCode (during both insertion and retrieval)?

Another interesting thing to note is what happens when the capacity of the HashMap is reached. As the Javadoc says, when the number of entries exceeds the product of the load factor and the current capacity, the map is rehashed: its internal table is rebuilt with roughly twice the number of buckets. Note that HashMap is not synchronized, so using it from several threads without external synchronization can cause race conditions, and a resize is a particularly risky moment.
Also, the iterators returned by the collection views of this class (keySet, values, entrySet) are fail-fast: if the map is structurally modified after the iterator is created, other than through the iterator's own remove method, the iterator throws ConcurrentModificationException. Structural modifications are tracked via modCount (the modification count), which is incremented whenever a mapping is added or removed.
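A minimal sketch of the fail-fast behavior, run in a single thread. The class and method names (FailFastDemo and its two methods) are made up for the example; the first method modifies the map directly while iterating, the second removes through the iterator.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    private static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Returns true if adding a mapping during iteration trips the fail-fast check.
    static boolean directModificationFails() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put(key + "x", 0); // structural change behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Removing through the iterator itself is allowed; returns the remaining size.
    static int removeViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() == 2) {
                it.remove();
            }
        }
        return map.size();
    }
}
```

The Javadoc describes fail-fast behavior as best effort, so it should be used to detect bugs rather than relied upon for program correctness.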

Hope this helps you understand why correct implementations of the hashCode and equals methods are so important for a class whose objects are used in hash-based collections, and explains the sometimes strange behavior of these collections.


Tuesday, 13 September 2011

Immutable classes

An object is said to be mutable when its state can be changed, e.g. through setter methods (of course there are other ways to change the state, which we will see as we step through the guidelines for creating an immutable class). Immutable objects are objects whose state cannot be changed after they are created. Once created, their state remains the same for their entire lifetime.


Why would one create immutable objects?


  • Immutable objects are useful in concurrent/multi-threaded applications. Such an object is always thread safe, which means threads won't see it in an inconsistent state, so you don't have to synchronize access to it across threads.
  • They are also good candidates for elements of hash-based collections like HashSet and for keys of a HashMap (they need to override the equals and hashCode methods).
  • You can freely share and cache references to immutable objects without having to copy or clone them; you can cache their fields or the results of their methods without worrying about the values becoming stale or inconsistent with the rest of the object's state.
  • Wrapper classes like Integer and Short, as well as String, are immutable.


    So how would you create an immutable class? What are the steps?
    1. Don't provide "setter" methods — methods that modify fields or objects referred to by fields.
    2. Make all fields final and private.
    3. Don't allow subclasses to override methods. The simplest way to do this is to declare the class as final. A more sophisticated approach is to make the constructor private and construct instances in factory methods.
    4. If the instance fields include references to mutable objects, don't allow those objects to be changed:
      • Don't provide methods that modify the mutable objects.
      • Don't share references to the mutable objects. Never store references to external, mutable objects passed to the constructor; if necessary, create copies, and store references to the copies. Similarly, create copies of your internal mutable objects when necessary to avoid returning the originals in your methods.
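The guidelines above could look like this in practice. Appointment is a hypothetical class invented for the illustration; it holds one immutable field (a String) and one mutable field (a java.util.Date) that is copied defensively on the way in and on the way out.

```java
import java.util.Date;

// final: the class cannot be subclassed, so no method can be overridden.
final class Appointment {
    private final String title; // String is itself immutable
    private final Date start;   // Date is mutable, so it is never shared

    Appointment(String title, Date start) {
        this.title = title;
        // Copy the caller's object instead of storing the reference.
        this.start = new Date(start.getTime());
    }

    String getTitle() {
        return title;
    }

    Date getStart() {
        // Return a copy so callers cannot modify the internal Date.
        return new Date(start.getTime());
    }
    // No setters: nothing changes the state after construction.
}
```

Changing the Date that was passed to the constructor, or the Date returned by getStart(), leaves the Appointment untouched.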

    Monday, 29 August 2011

    The equals and hashCode methods

    These are two of the most important non-final methods in the Object class. If not coded properly, they can lead to problems in collections that are difficult to detect and debug. Hence it is important to understand the contract of the two methods and to code to it.
    The equals method, as defined in the Object class, checks whether two references are the same, i.e. whether they point to the same object. You can override this method to check whether two objects are meaningfully equivalent (even though they are distinct objects in memory). The following contract needs to be observed when coding the equals method:

    • Reflexive - o1.equals(o1) must be true: an object should be equal to itself.
    • Symmetric - o1.equals(o2) if and only if o2.equals(o1).
    • Transitive - o1.equals(o2) and o2.equals(o3) implies o1.equals(o3).
    • Consistent - o1.equals(o2) returns the same result as long as the objects are not modified.
    • Null comparison - o1.equals(null) must return false: no object is equal to null.
    • Equals and hashCode - If two objects are equal then their hashCodes must be equal as well; however, the reverse is not necessarily true. (You should always override hashCode when you override equals.)
    The hashCode method returns an integer and is supported for the benefit of hash-based collections like HashMap, Hashtable etc. The contract for this method is:
    • Consistent - Whenever the method is invoked on the same object more than once during an execution of an application, it must return the same result, provided no information used in equals comparisons is modified.
    • If two objects are equal as per the equals method, then calling the hashCode method on each of the two objects must return the same integer result. It follows that a field that is not used in the equals method should not be used in the hashCode method either.
    • If two objects are unequal as per the equals method, then calling the hashCode method on each of them can return either the same or different integer results.
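A minimal sketch of a class that follows both contracts. EmployeeId and its two fields are hypothetical; the point is that equals and hashCode are built from exactly the same fields.

```java
import java.util.Objects;

final class EmployeeId {
    private final String department;
    private final int number;

    EmployeeId(String department, int number) {
        this.department = department;
        this.number = number;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true; // reflexive
        }
        if (!(o instanceof EmployeeId)) {
            return false; // also covers o == null
        }
        EmployeeId other = (EmployeeId) o;
        return number == other.number && Objects.equals(department, other.department);
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals, so equal objects get equal hash codes.
        return Objects.hash(department, number);
    }
}
```

With both methods in place, two separately created but equal EmployeeId objects behave as one key in a HashSet or HashMap.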

    How to make classes thread safe ?

    What is Thread safety?
    Thread safety is not about making a class that implements Runnable or extends Thread safe. Thread safety means that the fields of an object or class always maintain a valid state, as observed by other objects or classes, when used concurrently by multiple threads.


    What are the problems if you don't make a class thread safe?
    Consider a scenario where two threads are working on the same instance of a class. The object may be undergoing some modification in one thread while, at the same time, another thread tries to view its state. The object could be in some intermediate state in thread 1 (when that thread is pre-empted) at the moment thread 2 accesses it. In this situation, thread 2 sees a dirty state of the object. An object is termed thread safe when every thread always sees a valid state of it.

    When should you think about thread safety?
    Firstly, you should think of thread safety whenever you are working in a multi-threaded application. Java allows multiple threads to be created, so it is worth keeping in mind whenever you code in Java. When you have an instance of a class that can potentially be accessed by multiple threads while its state is changing, you should think about thread safety.
    Secondly, as we know, all threads share the same heap and method area, which makes it necessary to think about thread safety.

    How can you achieve thread safety?
    Remember: you need not worry about thread safety if your object is immutable or is only ever read, because the state of the object cannot be changed, or is not changed, in these situations.
    So we know for sure that making a class immutable (how to do that is discussed in a separate post), or using it as read-only, gives us thread safety.
    But when you do have to modify the state of an object, you need to think about thread safety; synchronizing that piece of code can help.
    It may also happen that the code provided cannot be modified (it could be a third-party library, or your boss does not allow you to play with tried and tested code). You can still achieve thread safety by extending the class, or by writing a thread-safe wrapper around it, for the clients of your code to use. (Example: most collection classes are not thread safe, but you can wrap them in a thread-safe wrapper.)
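As a sketch of the extend-and-wrap idea: LegacyCounter below stands in for a class you are not allowed to modify, and SynchronizedCounter is a thread-safe subclass of it. Both names, and the hammer helper used to exercise them, are invented for the example.

```java
// Stand-in for a class you cannot change (hypothetical).
class LegacyCounter {
    private int count;

    public void increment() {
        count++; // read-modify-write: not atomic
    }

    public int get() {
        return count;
    }
}

// Thread-safe wrapper built by subclassing: every access takes the object's lock.
class SynchronizedCounter extends LegacyCounter {
    @Override
    public synchronized void increment() {
        super.increment();
    }

    @Override
    public synchronized int get() {
        return super.get();
    }
}

class CounterDemo {
    // Increments the counter from several threads and returns the final value.
    static int hammer(LegacyCounter counter, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }
}
```

With a SynchronizedCounter, hammer(counter, 4, 10000) always ends at 40000; with a plain LegacyCounter, updates may be lost. For collections, the JDK already ships wrappers of this kind, for example Collections.synchronizedList.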


    Points to remember when making the class thread safe
    • Synchronize only the critical code.
    • Instance and static variables are not inherently thread safe; they live on the heap and in the method area, which multiple threads can access.
    • Local variables and method parameters are thread safe, since each thread has its own stack. (The objects they refer to may still be shared.)
    • Synchronization comes with a performance cost, so synchronize only what is critical and not necessarily the entire method.
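A small sketch of "synchronize only the critical code". PageHits is a hypothetical class: the string clean-up works on local variables and runs outside the lock, and only the update of the shared map is synchronized.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class PageHits {
    // Shared state: every access is guarded by the lock on "this".
    private final Map<String, Integer> counts = new HashMap<>();

    void record(String rawPage) {
        // Local variable on the calling thread's stack: no lock needed.
        String page = rawPage.trim().toLowerCase(Locale.ROOT);
        synchronized (this) {
            // Critical section: only the shared-state update is locked.
            counts.merge(page, 1, Integer::sum);
        }
    }

    synchronized int hitsFor(String page) {
        return counts.getOrDefault(page, 0);
    }
}
```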



    All about Exceptions

    What are exceptions? Well, exceptions are exceptional: they are not part of the normal flow and should be handled in order for you to proceed; some are so severe that they can cripple your application.

    What are the different types of exceptions?
    Exceptions - are the not-so-serious types of problem in your code. Some can be anticipated, and the compiler wants you to handle them and take corrective action; these are called checked exceptions. Others are not checked at compile time and are hence called unchecked exceptions. An exception like ArrayIndexOutOfBoundsException is unchecked; unchecked exceptions are subclasses of RuntimeException.
    Errors - are more fatal; they can bring down your application, and although they can technically be caught, an application should generally not try to. OutOfMemoryError is one such example; it means that the JVM cannot allocate an object because it is out of memory.

    Some points to remember about exceptions

    • All exceptions are derived from java.lang.Exception or one of its subclasses (Exception and Error both derive from java.lang.Throwable).
    • You can create your own exception by extending Exception or one of its subclasses (for checked exceptions) or RuntimeException (for unchecked exceptions).
    • The compiler forces checked exceptions to be dealt with: you either handle them in a try-catch block or declare them in the method's throws clause, stating that the method won't handle them.
    • Runtime exceptions may or may not be handled.
    • Catch blocks must be ordered from the more specific exception to the more general, else the compiler will complain. The reason: a catch block for a broader exception class also handles its subclasses, so a later catch block for a more specific exception would be unreachable. For example, a FileNotFoundException cannot be caught in a catch block that comes after one for IOException, as the former is a subclass of IOException.
    • Uncaught exceptions propagate back through the call stack, from the place they were thrown to the first method that has a matching try-catch block. If an exception is not handled anywhere, not even in the main method, the thread terminates; if that was the main thread and no other non-daemon threads are running, the JVM shuts down.
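The points above can be illustrated with a short sketch. The exception classes and the ExceptionDemo helper are hypothetical names: one custom checked exception, one custom unchecked exception, and a try-catch that orders its catch blocks from specific to general.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Checked: callers must catch it or declare it in a throws clause.
class InsufficientFundsException extends Exception {
    InsufficientFundsException(String message) {
        super(message);
    }
}

// Unchecked: extends RuntimeException, so handling it is optional.
class InvalidAmountException extends RuntimeException {
    InvalidAmountException(String message) {
        super(message);
    }
}

class ExceptionDemo {
    static int withdraw(int balance, int amount) throws InsufficientFundsException {
        if (amount <= 0) {
            throw new InvalidAmountException("amount must be positive");
        }
        if (amount > balance) {
            throw new InsufficientFundsException("balance too low");
        }
        return balance - amount;
    }

    static String classify(boolean missing) {
        try {
            if (missing) {
                throw new FileNotFoundException("no such file");
            }
            throw new IOException("read failed");
        } catch (FileNotFoundException e) { // specific subclass first
            return "missing";
        } catch (IOException e) {           // general superclass later
            return "io";
        }
    }
}
```

Swapping the two catch blocks in classify would make the compiler reject the code, because the FileNotFoundException block could never be reached.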
    Some best practices with exceptions
    • Do not use exceptions for flow control; exceptions are not alternate routes of flow but unexpected scenarios that should be handled.
    • An exception should be thrown early so that it is accurate and specific. The stack trace generated at that point helps you get to the root cause, as it contains all the method calls that led to the exception.
    • Handle exceptions at the appropriate layer. Let an exception pass through if it makes no sense to handle it at a particular layer; don't swallow it with an empty catch block or a System.out just because the compiler is complaining at that point. Instead, handle it (try-catch) at the layer where you can meaningfully recover and continue or, if you cannot, log it using a logging framework so that the root cause can be identified and fixed.
    • Whether to create a checked or an unchecked exception is a much debated topic. However, if the caller can recover meaningfully from an exception, then throw a checked exception.
    • Do not create or throw exceptions unnecessarily, since creating a stack trace is a costly process.


    Saturday, 13 August 2011

    Serialization in Java

    In my previous post about cloning there was a mention of serialization. So,

    What is serialization?
    It is the process of writing an object to a stream. The object's state is written out as a sequence of bytes; this transformation is called serialization. The object state can later be read back from the stream of bytes into a live object for use in the program. This reverse process is called de-serialization.

    What can or cannot be serialized?
    Variables marked static or transient are not serialized; all others can be. Static variables are not serialized as they don't belong to any individual object of the class, whereas the transient keyword lets the programmer control which instance variables should not be serialized. So what happens to these variables when the state is de-serialized? Transient variables get their default values (null, 0 or false), while static variables simply keep whatever value the class currently holds in the JVM.
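A minimal sketch of how transient and static fields behave across a round trip through a byte array. LoginSession and TransientDemo are hypothetical names chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class LoginSession implements Serializable {
    private static final long serialVersionUID = 1L;

    static int created = 0;    // static: belongs to the class, never written to the stream
    final String user;         // ordinary instance field: serialized
    transient String password; // transient: skipped, null after de-serialization

    LoginSession(String user, String password) {
        this.user = user;
        this.password = password;
        created++;
    }
}

class TransientDemo {
    // Serializes the session to a byte array and reads it back as a new object.
    static LoginSession roundTrip(LoginSession original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (LoginSession) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The copy returned by roundTrip has the same user but a null password. The constructor is not run during de-serialization, so the static counter does not change either.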

    When and why would one need serialization?
    Serialization has found its uses in many places:
    1. As mentioned in the post 'Deep cloning and Shallow cloning', we can achieve deep cloning via serialization.
    2. Whenever an object needs to be saved for future use, its state can be serialized to a file, a database or memory and later de-serialized (e.g. objects stored in an HTTP session should be serializable to support in-memory replication for scalability). NOTE: Though not mandatory, by convention a file holding an object's serialized state has the extension .ser.
    3. To send an object over the network, i.e. from one JVM to another (e.g. objects passed in RMI need to be serializable to support marshaling and un-marshaling).

    There are three primary reasons why objects are not serializable by default and must implement the Serializable interface to access Java's serialization mechanism.
    1. Not all objects capture useful semantics in a serialized state. For example, a thread object is tied to the state of the current JVM. There is no context in which a de-serialized Thread object would maintain useful semantics.
    2. The serialized state of an object forms part of its class's compatibility contract. Maintaining compatibility between versions of serializable classes requires additional effort and consideration. Therefore, making a class serializable needs to be a deliberate design decision and not a default condition.
    3. Serialization allows access to non-transient private members of a class that are not otherwise accessible. Classes containing sensitive information (for example, a password) should not be serializable nor externalizable. (If there is a need for more control over the process of reading /writing the object to a stream then the class can implement Externalizable)
    Some of the classes in Java which are Serializable are the wrapper classes, String, Date, File etc.

    How can we achieve serialization, and how does the JVM help?
    An object can be made serializable by having its class implement the Serializable interface. It is just a marker interface and has no methods for the class to implement. (Markers tell the JVM that the object is of a specific type and needs to be treated accordingly, i.e. they identify the semantics of being serializable.) All subtypes of a serializable class are themselves serializable.

    To allow subtypes of non-serializable classes to be serialized, the subtype may assume responsibility for saving and restoring the state of the supertype's public, protected, and (if accessible) package fields, provided the class it extends has an accessible no-arg constructor to initialize the class's state. It is an error to declare a class Serializable if this is not the case; the error will be detected at runtime.

    During deserialization, the fields of non-serializable classes will be initialized using the public or protected no-arg constructor of the class. A no-arg constructor must be accessible to the subclass that is serializable. The fields of serializable subclasses will be restored from the stream.

    When traversing a graph, an object may be encountered that does not support the Serializable interface. In this case the NotSerializableException will be thrown and will identify the class of the non-serializable object.
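The last three paragraphs can be tried out with a small sketch. All class names here are hypothetical: Car is serializable but its superclass Vehicle is not, and Truck is serializable but holds a non-serializable Engine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Vehicle {                  // NOT Serializable
    String plate = "UNSET";

    protected Vehicle() { }      // accessible no-arg constructor, re-run on de-serialization
}

class Car extends Vehicle implements Serializable {
    private static final long serialVersionUID = 1L;
    int doors;                   // restored from the stream
}

class Engine { }                 // NOT Serializable

class Truck implements Serializable {
    private static final long serialVersionUID = 1L;
    Engine engine = new Engine(); // non-transient field of a non-serializable type
}

class GraphDemo {
    static Object copy(Serializable obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();
        }
    }

    // Copies a Car and reports its superclass field and its own field.
    static String plateAndDoorsAfterCopy(String plate, int doors) {
        Car car = new Car();
        car.plate = plate;
        car.doors = doors;
        try {
            Car restored = (Car) copy(car);
            return restored.plate + ":" + restored.doors;
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Returns "ok" or the simple name of the exception that serialization raised.
    static String trySerialize(Serializable obj) {
        try {
            copy(obj);
            return "ok";
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

After the copy, the Car's own field is restored from the stream, while the field inherited from Vehicle is back at the value set by Vehicle's no-arg constructor. Serializing a Truck fails with NotSerializableException because of its Engine field.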

    What do you understand by serialVersionUID?
    The serialVersionUID is a version identifier for a Serializable class. Deserialization uses this number to ensure that a loaded class is compatible with a serialized object.

    If a serializable class does not explicitly declare a serialVersionUID, the serialization runtime computes a default one for it from details of the class. [You can declare one yourself in the class as a static final long field, or generate a value with the JDK's serialver tool.]
    If the identifier of the loaded class and that of the flattened object are not the same, the de-serialization process throws an InvalidClassException. With a computed default this can happen if a new attribute is added to the class or the class no longer extends the same hierarchy, that is, when the structure of the class is modified. The exception can also be thrown because the default serialVersionUID is sensitive to class details that may vary between compiler implementations.
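A short sketch of declaring the identifier explicitly and reading it back. Invoice, Receipt and SerialVersionDemo are hypothetical names, and the value 42L is arbitrary; java.io.ObjectStreamClass is the JDK class that exposes the identifier of a serializable class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class Invoice implements Serializable {
    // Explicit identifier: stays the same even if fields are added later.
    private static final long serialVersionUID = 42L;
    String number;
}

class Receipt implements Serializable {
    // No explicit identifier: a default is computed from the class details.
    String number;
}

class SerialVersionDemo {
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }
}
```

uidOf(Invoice.class) returns the declared 42, whereas uidOf(Receipt.class) returns a computed value that can change when the class is edited.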


    Type casting - Implicit and explicit

    The basics
    Whenever you create a class in Java, you create a new data type. You can create instances of this class, which are referred to as objects. To work with an object you need a variable through which you refer to it; this is called the reference variable, and the object that the variable refers to is called the referred object.

    Creating a reference variable with the Java statement
             Employee employee;  (in general, ReferenceType reference;)
    will not create an Employee object; it just declares a reference variable that can refer to an object of type Employee.
    This reference can hold either null, an object of ReferenceType, or any object whose type is a subclass of ReferenceType. The reference type can be an interface, an abstract class or a class. The type of the reference determines how the referenced object (i.e. the object that is the value of the reference) can be used: you can only call the members declared by the reference type or its supertypes. However, the actual type of the referred object determines which behavior runs at runtime (FYI, polymorphism).


    So what is casting?
    Type casting happens when one data type is converted to another, or, put simply, when a variable of one type is treated as though it is of another type. Casting can happen in 2 ways:
    1. Upcasting - also called a widening conversion. It's easy to convert a subclass to its superclass, because a subclass object is also a superclass object. E.g. assigning a SavingAccount object to an Account variable is automatic.
    2. Downcasting - also called a narrowing conversion. This requires an explicit cast, and the object being cast must be a legitimate instance of the class you are casting to. When you code an explicit cast you tell the compiler that you take responsibility that the reference variable will hold an object of the target type or one of its subclasses. The JVM then checks this at runtime.
    You can also cast an object to an interface, provided that the object's class or one of its superclasses implements the interface.
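A brief sketch of the three reference casts just described. Account and SavingAccount follow the example in the text; the Auditable interface, the method bodies and CastingDemo are assumptions made for the illustration.

```java
interface Auditable {
    String audit();
}

class Account {
    String describe() {
        return "account";
    }
}

class SavingAccount extends Account implements Auditable {
    @Override
    String describe() {
        return "saving account";
    }

    double interestRate() {
        return 0.04;
    }

    @Override
    public String audit() {
        return "audited";
    }
}

class CastingDemo {
    static String run() {
        SavingAccount saving = new SavingAccount();
        Account account = saving;                     // upcast: implicit
        SavingAccount back = (SavingAccount) account; // downcast: explicit, checked at runtime
        Auditable auditable = (Auditable) account;    // cast to an interface the object's class implements
        // describe() is resolved against the actual object, not the reference type.
        return account.describe() + "/" + back.interestRate() + "/" + auditable.audit();
    }
}
```

Through the Account reference only describe() is visible, but the SavingAccount version runs; interestRate() becomes callable again only after the downcast.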

    Casting can be applied to primitives as well. The numeric primitives are placed in the following order:

                 byte --- short --- int --- long --- float --- double
    Widening happens automatically when you go from left to right, but going from right to left requires an explicit cast. E.g. casting a short to a byte is possible; however, if the value of the short exceeds the range that a byte can hold, the conversion silently overflows at runtime and yields a different value. Remember that Java does not raise exceptions for such overflow or underflow conditions, so take care to understand the overflow and underflow behavior of these numeric conversions and computations.
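A minimal sketch of widening, narrowing and silent overflow with primitives; PrimitiveCastDemo and its method names are made up for the example.

```java
class PrimitiveCastDemo {
    // Widening (int to long) needs no cast.
    static long widen(int value) {
        return value;
    }

    // Narrowing needs an explicit cast and keeps only the low 8 bits.
    static byte narrow(short value) {
        return (byte) value;
    }

    // Integer arithmetic wraps around on overflow without any exception.
    static int addWithWraparound(int a, int b) {
        return a + b;
    }
}
```

narrow((short) 100) is still 100, but narrow((short) 130) is -126 because 130 does not fit into a byte. Likewise, addWithWraparound(Integer.MAX_VALUE, 1) yields Integer.MIN_VALUE.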

    When is ClassCastException thrown?
    ClassCastException is thrown when you attempt to cast an object to a class of which it is not an instance.
    E.g. you cannot assign an Account object to a SavingAccount variable. Although you can get past the compile-time type mismatch error with an explicit cast, the cast fails at runtime with a ClassCastException. A type mismatch error also occurs at compile time when you try to refer to an object from a different hierarchy (e.g. trying to assign an Employee object to an Account variable).

    You can deal with incorrect casting in 2 ways:
    1. Catch the ClassCastException with a try-catch block.
    2. Before casting, use the instanceof operator to check whether the object being cast is a legitimate instance of the target type; if it is not, deal with that case in the else block.
    NOTE:
    You could also get a ClassCastException when two different class loaders load the same class, because the two are treated as different classes.
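Both ways of dealing with a bad cast, sketched with a hypothetical Shape hierarchy (Shape, Circle, Square and SafeCastDemo are names invented for the example).

```java
class Shape { }

class Circle extends Shape {
    double radius = 1.0;
}

class Square extends Shape { }

class SafeCastDemo {
    // Option 2 from the list: check with instanceof before casting.
    static double radiusOf(Shape shape) {
        if (shape instanceof Circle) {
            return ((Circle) shape).radius;
        } else {
            return -1.0; // not a Circle: handled without any exception
        }
    }

    // Option 1 from the list: cast blindly and catch the exception.
    static boolean blindCastFails(Shape shape) {
        try {
            Circle circle = (Circle) shape; // compiles, but is checked at runtime
            return circle == null;          // cast succeeded (null also casts fine)
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

The instanceof check is usually the cleaner choice, since it avoids using an exception for an expected case.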

    What happens when you cast a reference whose value is null to a specific type? Say, Account acc = (Account) acc1; where acc1 is null. The cast succeeds and acc is simply null: casting has nothing to do with the value of the reference, it just lets the compiler know that you know what you are doing (refer to Downcasting).

    However, one thing to note: when you have overloaded methods, for example myMethod(String s) and myMethod(Integer i), and you want to pass null as the argument, you need an explicit cast so that the compiler knows which method to call; otherwise it complains that the invocation is ambiguous.
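The overload case can be sketched as follows. myMethod comes from the text; the return values and the OverloadDemo class are assumptions added so the chosen overload is visible.

```java
class OverloadDemo {
    static String myMethod(String s) {
        return "String";
    }

    static String myMethod(Integer i) {
        return "Integer";
    }

    static String run() {
        // myMethod(null); // does not compile: the call is ambiguous
        return myMethod((String) null) + "," + myMethod((Integer) null);
    }

    // Casting a null reference never throws; the result is just null.
    static boolean nullCastsCleanly() {
        Object nothing = null;
        String text = (String) nothing;
        return text == null;
    }
}
```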

    Thursday, 11 August 2011

    What do you mean by deep cloning and shallow cloning

    Java supports two different types of cloning
    1. Shallow cloning - Java's Object.clone() gives a shallow copy of the object being cloned. The object is copied without its contained objects: shallow cloning is a field-by-field copy. A new object is created with the same field values as the original; for contained objects, just the references are copied, so the original and the clone share them.
    2. Deep cloning - as the Object.clone() method yields a shallow copy, classes need to be adjusted to achieve a deep copy (check the note below). Here the original object is copied along with its contained objects (note: the entire graph of objects is traversed and copied). Each object in the graph is responsible for cloning itself through its clone method. So in deep cloning, a complete duplicate of the original is created.
    NOTE
    • If an object wants to be able to clone itself, its class first needs to implement the Cloneable interface; otherwise Object.clone() throws CloneNotSupportedException when it is called.
    • Second, the class needs to override the clone() method and make it public.
    • Next, in the clone method, call super.clone().
    • You need not do anything special for primitive fields, or for fields referring to wrapper objects and other immutable objects like String: super.clone() copies primitive values, and immutable objects can safely be shared between the original and the clone.
    • When you perform a deep copy involving collections, the objects in the collection also need to be cloned, so their classes need to implement Cloneable as well.
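A minimal sketch contrasting a shallow and a deep copy, following the steps in the note. Person and Address are hypothetical classes; shallowCopy is an extra helper added only to show what super.clone() alone produces.

```java
class Address implements Cloneable {
    String city;

    Address(String city) {
        this.city = city;
    }

    @Override
    public Address clone() {
        try {
            return (Address) super.clone(); // String field: sharing is safe
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);    // cannot happen: the class is Cloneable
        }
    }
}

class Person implements Cloneable {
    String name;
    Address address;

    Person(String name, Address address) {
        this.name = name;
        this.address = address;
    }

    // Shallow copy: the Address reference is copied, not the Address itself.
    Person shallowCopy() {
        try {
            return (Person) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }

    // Deep copy: start from the shallow copy, then clone the contained object.
    @Override
    public Person clone() {
        Person copy = shallowCopy();
        copy.address = address.clone();
        return copy;
    }
}
```

Changing the city through a shallow copy also changes it for the original, because both share one Address; the deep copy has its own.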
    Problems with cloning
    • The entire object graph needs to be cloned, which is tiresome, error prone and difficult to maintain. Care should be taken whenever the classes are modified.
    • Care should also be taken with circular references between objects; each referenced object should be copied only once. E.g. say Department contains Employee objects and Employee in turn refers to its Department: the referenced department must be created only once.
    • If the class is not available for modification (third-party classes) and there is a need for cloning, you need to create a new class by sub-classing and then override the clone method.
    • Also, it's problematic to clone through a polymorphic variable, as the type of object actually being cloned is only known at runtime.
    Alternatives to cloning
    Java serialization offers an alternative to cloning (the key here is that the object and all the objects it references need to be serializable). The entire object graph is traversed and serialized, and de-serializing it later yields a distinct copy with exactly the same state as the original. However, the cost of serialization is performance, as it involves writing to and reading from a stream.
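A sketch of deep copying through serialization, using a circular Department/Employee style graph like the one mentioned above. Dept, Staff and SerialCloner are hypothetical names chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class Dept implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final List<Staff> staff = new ArrayList<>();

    Dept(String name) {
        this.name = name;
    }
}

class Staff implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final Dept dept; // points back to the department: a cycle

    Staff(String name, Dept dept) {
        this.name = name;
        this.dept = dept;
        dept.staff.add(this);
    }
}

class SerialCloner {
    // Writes the whole object graph to memory and reads it back as a copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Serialization writes each object in the graph only once per stream, so the cycle is preserved: in the copy, the staff member's dept is the copied department, not a second one.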



    Saturday, 6 August 2011

    Garbage collection in Java

    What is garbage collection and who handles it?
    Garbage collection is the process of collecting unused/unreachable objects (i.e. objects that have lived their life in the lifetime of a program and whose absence does not affect the continuity of the program). Java handles memory de-allocation through garbage collection. Typically the JVM runs the garbage collector on low-priority daemon threads in the background, collecting unreachable objects, and it runs in low-memory situations to reclaim unused memory. Garbage collection cannot be forced: you can request the JVM to perform GC through System.gc(), but there is no guarantee that it will happen immediately, or at all.


    Why does GC happen?
    Whenever the JVM runs low on memory, or at certain intervals, the garbage collector wakes up and tries to reclaim unused memory by de-allocating the memory occupied by unreachable objects. Most programmers have experienced sudden unresponsiveness during the lifetime of a program; these are often the times when the garbage collector is doing its job of reclaiming memory. If the JVM still cannot create an object on the heap, it fails with OutOfMemoryError.


    When does an object become eligible for GC?
    We know that in Java every object that gets created lives on the heap, whether it is referred to by a local variable or an instance variable, whereas class (static) variables live in the method area (FYI).
    When an object becomes unreachable it becomes eligible for GC (i.e. it can no longer be reached from any live thread or static reference).
    So when does an object become unreachable?

    1. When you explicitly set all references to the object to null, stating that the object has served its purpose and is no longer needed.
    2. When the only reference to the object goes out of scope, i.e. once the block of code in which it was declared has finished executing.
    3. Objects that refer to each other in a cycle are detected by garbage collection algorithms and hence automatically become eligible for garbage collection, as long as no other live object references them.
    4. When the reference to a container object is set to null, the contained objects become eligible for GC as well, provided nothing else references them.
    5. An object that is reachable only through weak references becomes eligible for GC in the next GC cycle.
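Reachability can be observed, to a limited extent, with java.lang.ref.WeakReference. The sketch below uses a hypothetical class WeakReachabilityDemo; because System.gc() is only a request, the second method makes no promise about when the object is actually collected.

```java
import java.lang.ref.WeakReference;

class WeakReachabilityDemo {
    // While a strong reference exists, the object cannot be collected.
    static boolean heldWhileStronglyReachable() {
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        System.gc();                  // a request only; the JVM may ignore it
        return weak.get() == payload; // payload is still strongly reachable here
    }

    // After the last strong reference is dropped, the object is eligible for GC.
    static WeakReference<Object> dropStrongReference() {
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        payload = null;               // now only weakly reachable
        System.gc();
        return weak;                  // weak.get() may already be null, but that is not guaranteed
    }
}
```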


    How does JVM handle synchronization?

    Introduction
    We have seen in my previous blog how the JVM organizes a program into runtime data areas. We know that each thread has its own stack and that all the threads in the program share the heap. The heap contains all the objects created within the program, including Thread objects. The method area contains all the class (static) variables of the classes used by the program. These variables are available to all the threads in the program.

    Now we know that we have two data areas which contain data shared by all threads:
    1. the heap and 2. the method area

    Monitors
    So if two threads try to access the objects or class variables in these areas concurrently, the data needs to be properly managed, else we will end up with inconsistent data. This situation can be handled through synchronization, and Java manages this through the use of a 'monitor'. A monitor acts as a guardian over a piece of code, so that no two threads execute that code simultaneously.
    Java's monitor supports two kinds of synchronization: mutual exclusion and cooperation.

    • Mutual exclusion is achieved in the JVM through object and class locks; they enable multiple threads to work independently on shared data without interfering with each other.
    • Cooperation is achieved in the JVM through the wait, notify and notifyAll methods of Object.

    Each monitor is associated with an object reference. Whenever a thread reaches code which is synchronized, it must obtain the lock on the referenced object; failing that, it has to wait (it blocks) until the lock becomes available. Once the thread obtains the lock, the JVM increments a count of the number of times the object has been locked. The same thread can lock the same object multiple times. Whenever the thread releases the lock, the count is decremented. When the count returns to zero, i.e. there are no more locks on the object, any other thread can obtain the lock on the object if needed.
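That the same thread can lock the same object more than once can be seen in a tiny sketch. ReentrantMonitorDemo is a made-up class; Thread.holdsLock is the JDK method that reports whether the current thread owns an object's monitor.

```java
class ReentrantMonitorDemo {
    // Entering takes the lock on "this": count goes from 0 to 1.
    synchronized int outer() {
        // Calling another synchronized method on the same object does not block.
        return inner() + 1;
    }

    // Same thread, same object: count goes from 1 to 2, then back to 1 on return.
    synchronized int inner() {
        return Thread.holdsLock(this) ? 1 : 0;
    }

    // Not synchronized: reports whether the calling thread holds the lock right now.
    boolean lockHeldByCaller() {
        return Thread.holdsLock(this);
    }
}
```

Calling outer() returns 2 instead of deadlocking, and once it has returned the calling thread no longer holds the lock.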

    Synchronization is supported by the Java language in two ways

    1. synchronized method: Whenever the JVM resolves a symbolic reference to the method and finds that it is a synchronized method, the invoking thread tries to obtain the lock from the monitor. Once the lock is obtained, the synchronized method is executed, and when all the statements have been processed or the code throws an exception, the lock is released. For an instance method, the lock is obtained on the object on which the synchronized method is invoked. For a static synchronized method, the lock is obtained on the Class object. The JVM does not use special opcodes to invoke or return from a synchronized method.
    2. synchronized statement (block of code): The JVM uses two special opcodes, monitorenter and monitorexit, when a thread enters or exits a synchronized block of code. When the JVM encounters monitorenter, it tries to obtain the lock on the object referred to by objectref on the stack. If the same thread already holds the lock, the count is incremented by one, and whenever monitorexit is encountered the count is decremented by one. When the count reaches zero, the lock is released.
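The two forms side by side, as a sketch. SyncForms and its members are hypothetical; the reflective helper only shows that the synchronized keyword on a method is recorded as a method flag, whereas a synchronized block is not.

```java
import java.lang.reflect.Modifier;

class SyncForms {
    private int instanceCount;
    private static int classCount;

    // Synchronized method: locks "this"; recorded as a flag on the method.
    synchronized void viaMethod() {
        instanceCount++;
    }

    // Synchronized statement: compiled to monitorenter/monitorexit on "this".
    void viaBlock() {
        synchronized (this) {
            instanceCount++;
        }
    }

    // Static synchronized method: locks SyncForms.class.
    static synchronized void viaStaticMethod() {
        classCount++;
    }

    synchronized int instanceCount() {
        return instanceCount;
    }

    static synchronized int classCount() {
        return classCount;
    }

    // True if the named no-arg method carries the synchronized modifier.
    static boolean isFlaggedSynchronized(String methodName) {
        try {
            return Modifier.isSynchronized(
                SyncForms.class.getDeclaredMethod(methodName).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Disassembling the compiled class with javap -c should show the monitorenter and monitorexit instructions inside viaBlock only.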
