Monday, 29 August 2011

Equals and hashcode method

The equals and hashCode methods are two of the most important non-final methods in the Object class. If they are not coded properly they can cause problems in collections that are difficult to detect and debug, so it is important to understand the contract of both methods before overriding them.
The equals method, as defined in the Object class, checks whether two references are the same, i.e. whether they point to the same object in memory. You can override it to check whether two objects are meaningfully equivalent (even though they reside in different memory locations). The following contract needs to be observed while coding the equals method:

  • Reflexive - o1.equals(o1) is true; an object must be equal to itself.
  • Symmetric - o1.equals(o2) is true if and only if o2.equals(o1) is true.
  • Transitive - if o1.equals(o2) and o2.equals(o3), then o1.equals(o3).
  • Consistent - o1.equals(o2) returns the same result as long as the objects are not modified.
  • Null comparison - o1.equals(null) must return false; no object is equal to null.
  • Equals and hashCode - if two objects are equal then their hash codes must be equal as well; the reverse is not required, since unequal objects may share a hash code. (Always override hashCode when you override equals.)
The hashCode method returns an int and exists for the benefit of hash-based collections such as HashMap and Hashtable. The contract for this method is:
  • Consistent - whenever the method is invoked on the same object more than once during an execution of an application, it must return the same result, provided no information used in equals comparisons is modified.
  • If two objects are equal as per the equals method, then calling hashCode on each of them must return the same integer. A field that is not used in equals must therefore not be used in hashCode either.
  • If two objects are unequal as per the equals method, then calling hashCode on each of them may return either the same or different integers.
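
As a minimal sketch of the two contracts above, here is a small value class that overrides both methods using the same fields. The Point class and its fields are hypothetical names chosen only for illustration.

```java
// Hypothetical value class used only to illustrate the contracts.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;                 // reflexive
        }
        if (!(other instanceof Point)) {
            return false;                // also covers other == null
        }
        Point that = (Point) other;
        return x == that.x && y == that.y;
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields that equals compares.
        return 31 * x + y;
    }
}
```

Because equal points produce equal hash codes, a HashSet that contains new Point(1, 2) also reports that it contains a second, separately created new Point(1, 2).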

How to make classes thread safe ?

What is Thread safety?
Thread safety is not about making a class that implements Runnable or extends Thread safe. Thread safety means that the fields of an object or class always maintain a valid state, as observed by other objects or classes, when used concurrently by multiple threads.

What are the problems if you don't make a class thread safe?
Consider a scenario where two threads are working on the same instance of a class. One thread may be modifying the object while, at the same time, another thread tries to read its state. If thread 1 is pre-empted while the object is in some intermediate state, thread 2 sees a dirty (inconsistent) state of the object. An object is termed thread safe when every thread always sees a valid state of it.

When should you think about thread safety?
Firstly, you should think of thread safety whenever you are working in a multi-threaded application. Java allows multiple threads to be created, so it is a concern whenever you are coding in Java. Whenever an instance of a class can potentially be accessed by multiple threads while its state is changing, you should think about thread safety.
Secondly, all threads share the same heap and method area, which makes it necessary to think about thread safety.

How can you achieve thread safety?
You do not need to think about thread safety if your object is immutable or is used in read-only mode, because the state of the object cannot be changed or is not changed in these situations.
So we know for sure that making a class immutable (we will discuss how to make a class immutable in another article) or using it as read-only gives us thread safety.
However, when you do have to modify the state of shared objects you need to think about thread safety, and synchronizing the relevant piece of code can help.
It may also happen that the code provided cannot be modified (it could be a third party library, or your boss does not allow you to play with tried and tested code). You can still achieve thread safety by extending the class, or by writing a thread safe wrapper around it, which the clients of your code then use. (Example: most collection classes, such as ArrayList and HashMap, are not thread safe, and you can write a thread safe wrapper for them.)

Points to remember when making the class thread safe
  • Synchronize only the critical code.
  • Instance and static variables are not inherently thread safe; they live on the heap and in the method area, which multiple threads can access.
  • Local variables and method parameters are thread safe, as each thread has its own copy on its own stack (the objects they refer to may still be shared).
  • Synchronization comes with a performance cost, so synchronize only what is critical and not the entire method.
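
The points above can be sketched roughly as follows. HitCounter, its lock field and the countFrom helper are hypothetical names used for illustration; the idea is that work on parameters and locals stays outside the lock, and only the update of the shared field is synchronized.

```java
// Hypothetical class: only the update of shared state is synchronized.
final class HitCounter {
    private final Object lock = new Object();
    private long hits; // shared mutable state, guarded by lock

    void record(String page) {
        // Parameters and locals live on the calling thread's stack: no lock needed.
        boolean valid = page != null && !page.trim().isEmpty();
        if (!valid) {
            return;
        }
        synchronized (lock) { // the critical section is kept as small as possible
            hits++;
        }
    }

    long hits() {
        synchronized (lock) {
            return hits;
        }
    }

    // Demo helper: several threads record hits on one shared counter.
    static long countFrom(int threadCount, final int callsPerThread) {
        final HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int n = 0; n < callsPerThread; n++) {
                        counter.record(" /home ");
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.hits();
    }
}
```

Without the synchronized block, concurrent hits++ calls could interleave and some increments could be lost.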

All about Exceptions

What are exceptions? Well, exceptions are exceptional: they signal conditions that are not normal and that should be handled for the program to proceed. Some can be severe enough to cripple your application.

What are the different types of exceptions?
Exceptions - are the not so serious types of problem in your code. Some are anticipated, and the compiler wants you to handle them and take corrective action; these are called checked exceptions. Others are not checked at compile time and are hence called unchecked exceptions. Exceptions like ArrayIndexOutOfBoundsException are unchecked; they are subclasses of RuntimeException.
Errors - are more fatal and can bring down your application. Although they can technically be caught, an application normally should not try to catch them, since it usually cannot recover. OutOfMemoryError is one such example; it means that the JVM cannot allocate an object because it is out of memory.

Some points to remember about exceptions

  • All exceptions are derived from java.lang.Exception or one of its subclasses (Exception and Error both extend java.lang.Throwable).
  • You can create your own exception by extending Exception or one of its subclasses (for checked exceptions) or RuntimeException (for unchecked exceptions).
  • The compiler forces checked exceptions to be handled: you either handle them in a try-catch block or declare them in the throws clause of the method to state that the method will not handle them.
  • Runtime exceptions may or may not be handled.
  • Catch blocks must be ordered from the more specific exception to the more general, else the compiler will complain. The reason is that a catch block for a broader exception class also handles its subclasses, so a more specific exception placed after it could never be reached. Ex: FileNotFoundException cannot be caught in a catch block that comes after one for IOException, as the former is a subclass of IOException.
  • An uncaught exception propagates back through the call stack, from the place where it was generated to the first method that has a matching try-catch block. If it is not handled anywhere, not even in the main method, the thread terminates; in a single-threaded program this ends the application.
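
A short sketch of several of these points: a custom checked exception, a throws clause, and catch blocks ordered from specific to general. ConfigException and ConfigLoader are hypothetical names used only for illustration.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical checked exception: extends Exception, so callers must handle or declare it.
class ConfigException extends Exception {
    ConfigException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class ConfigLoader {
    // Reads the first character of a file and translates low-level failures
    // into an exception that is meaningful to the caller.
    static int firstChar(String path) throws ConfigException {
        try (FileReader reader = new FileReader(path)) {
            return reader.read();
        } catch (FileNotFoundException e) {
            // More specific type first; swapping the two catch blocks would not compile.
            throw new ConfigException("Missing config file: " + path, e);
        } catch (IOException e) {
            // Broader type afterwards; the original exception is kept as the cause.
            throw new ConfigException("Cannot read config file: " + path, e);
        }
    }
}
```

Passing the original exception as the cause keeps the full stack trace available to whichever layer eventually logs or handles the failure.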
Some best practices with exceptions
  • Do not use exceptions for flow control; exceptions are not alternate routes of flow but unexpected scenarios which should be handled.
  • Throw an exception early, as close as possible to where the problem is detected, so that it is accurate and specific; the stack trace generated at that point contains all the method calls that led to the exception and helps you get to its root cause.
  • Handle exceptions at the appropriate layer. Allow an exception to pass through if it makes no sense to handle it at a particular layer; don't swallow it with an empty catch block or a System.out.println just because the compiler is complaining at that point. Instead handle (try-catch) it at the layer where you can meaningfully recover from it and continue or, if you cannot, log it using a logging framework so that the root cause can be identified and fixed.
  • Whether to create a checked or an unchecked exception is a very debatable topic. However, if the caller can recover meaningfully from an exception, then throw a checked exception.
  • Do not create or throw exceptions unnecessarily, because creating a stack trace is a costly process.

Saturday, 13 August 2011

Serialization in Java

In my previous post about cloning there was a mention of serialization. So,

What is serialization?
It is the process of writing an object to a stream and reading it back. The object's state is written out as a sequence of bytes; this transformation is called serialization. The object state can later be restored from the stream of bytes into a live object for use in the program; this process is called de-serialization.

What can or cannot be serialized?
Variables which are marked as static or transient are not serialized; all others can be, provided their own types are serializable. Static variables are not serialized as they don't belong to any individual object of the class, whereas the transient keyword lets the programmer control which variables need not be serialized. So what happens to these variables when the state is de-serialized? Transient variables get their default values (null, 0 or false), while static variables simply keep whatever value the class currently holds in the JVM.
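
To illustrate, here is a minimal round trip through an in-memory stream. The Session class, its fields and the roundTrip helper are hypothetical names; the point is that the transient field comes back with its default value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class: 'password' is excluded from the serialized form.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;

    String user;
    transient String password;

    Session(String user, String password) {
        this.user = user;
        this.password = password;
    }

    // Serializes the session to a byte array and reads it back as a new object.
    static Session roundTrip(Session original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }
}
```

After roundTrip(new Session("alice", "secret")), the copy still has user "alice" but its password field is null.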

When and why would one need serialization?
Serialization has found its uses in many places
  1. As mentioned in the post for 'Deep cloning and Shallow cloning' we can achieve deep cloning via serialization.
  2. Whenever an object needs to be saved for future use its state can be serialized to a file, a database or memory and then de-serialized later. (Ex: objects stored in an HTTP session should be serializable to support in-memory replication for scalability.) NOTE: Though not mandatory, by convention a file holding an object's serialized state has the extension .ser
  3. To send an object over the network, i.e. from one JVM to another. (Ex: objects passed in RMI need to be serializable to support marshaling and un-marshaling.)

There are three primary reasons why objects are not serializable by default and must implement the Serializable interface to access Java's serialization mechanism.
  1. Not all objects capture useful semantics in a serialized state. For example, a thread object is tied to the state of the current JVM. There is no context in which a de-serialized Thread object would maintain useful semantics.
  2. The serialized state of an object forms part of its class's compatibility contract. Maintaining compatibility between versions of serializable classes requires additional effort and consideration. Therefore, making a class serializable needs to be a deliberate design decision and not a default condition.
  3. Serialization allows access to non-transient private members of a class that are not otherwise accessible. Classes containing sensitive information (for example, a password) should not be serializable nor externalizable. (If there is a need for more control over the process of reading /writing the object to a stream then the class can implement Externalizable)
Some of the classes which are Serializable in Java are the wrapper classes, String, Date and File.

How does the JVM help us achieve serialization?
A class can be marked as serializable by making it implement the Serializable interface. It is just a marker interface and has no methods which the class needs to implement. (Markers help the JVM understand that the object is of a specific type and needs to be treated accordingly, i.e. they identify the semantics of being serializable.) All subtypes of a serializable class are themselves serializable.

To allow subtypes of non-serializable classes to be serialized, the subtype may assume responsibility for saving and restoring the state of the supertype's public, protected, and (if accessible) package fields provided  the class it extends has an accessible no-arg constructor to initialize the class's state. It is an error to declare a class Serializable if this is not the case and will be detected at runtime.

During deserialization, the fields of non-serializable classes will be initialized using the public or protected no-arg constructor of the class. A no-arg constructor must be accessible to the subclass that is serializable. The fields of serializable subclasses will be restored from the stream.

When traversing a graph, an object may be encountered that does not support the Serializable interface. In this case the NotSerializableException will be thrown and will identify the class of the non-serializable object.

What do you understand by serialVersionUID?
The serialVersionUID is a version identifier for a Serializable class. Deserialization uses this number to ensure that a loaded class is compatible with the serialized object.

If a serializable class does not explicitly declare a serialVersionUID, the serialization runtime calculates a default one from the structure of the class. [You can explicitly declare one yourself in the class, for example with a value generated by the serialver tool.]
If the identifier of the loaded class and that of the flattened object are not the same, the de-serialization process throws an InvalidClassException. With a computed default this can happen if a new attribute is added to the modified class or the class no longer extends the same hierarchy tree, that is, when the structure of the class undergoes modification. The exception could also be thrown when the default serialVersionUID calculated by different compilers or JVMs varies, which is why declaring it explicitly is recommended.
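
As a small sketch, this is how an explicit identifier is declared and how the value the serialization runtime will use can be inspected. Invoice, its field and SerialVersionDemo are hypothetical names, and 42L is an arbitrary illustrative value.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical serializable class with an explicitly declared version identifier.
class Invoice implements Serializable {
    private static final long serialVersionUID = 42L;

    int amount;
}

final class SerialVersionDemo {
    // Returns the identifier that the serialization runtime associates with Invoice.
    static long declaredUid() {
        return ObjectStreamClass.lookup(Invoice.class).getSerialVersionUID();
    }
}
```

Keeping the declared value unchanged while making compatible changes to the class allows previously serialized objects to be read back.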


Type casting - Implicit and explicit

The basics
Whenever you create a class in Java, you create a new data type. You can create instances of these classes, which are referred to as objects. To work with an object you need a variable through which you refer to it; this is called a reference variable, and the object that the variable refers to is called the referred object.

Creating a reference variable with the java statement
         Employee employee;  (In general ReferenceType reference)
will not create an Employee object; it just means that we have created a reference variable for an object of type Employee.
This reference can hold null, an object of ReferenceType, or any object whose type is a subclass of ReferenceType. The reference type can be an interface, an abstract class or a class. The type of the reference determines how the referenced object, i.e. the object that is the value of the reference, can be used (you can use only the behavior declared by the reference type or its superclasses, if any). However, the actual type of the referred object determines the behavior at runtime (FYI, polymorphism).

So what is casting?
Type casting happens when one data type is converted to another data type, or put simply, when a variable of one type is treated as though it were another type. There are 2 ways a cast can happen
  1. Upcasting - also called a widening conversion. It is easy to convert a subclass to its superclass, because a subclass object is also a superclass object. Ex: assigning a SavingAccount object to an Account variable is automatic.
  2. Downcasting - also called a narrowing conversion. This requires an explicit cast, and the object being cast must be a legitimate instance of the class you are casting to. When you code an explicit cast you tell the compiler that you take responsibility that the reference variable will hold an object of the target type or its subclass. The JVM will then check the correctness of this at runtime.
You can also cast an object to an interface, provided that the object's class or any one of its superclasses implements the interface.

Casting can be applied to primitives as well. The primitives are placed in the following hierarchy/order,

             byte --- short --- int --- long --- float --- double
Upcasting happens automatically when you go from left to right, but if you go from right to left an explicit cast is required. Ex: casting a short to a byte is possible; however, if the value of the short exceeds the range that a byte can hold (-128 to 127), the conversion silently discards the high-order bits and yields a different value. Remember that Java does not inform us about overflow or underflow conditions with exceptions, so care should be taken to understand the overflow and underflow behavior of such numerical conversions and computations.
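
A minimal sketch of both directions with primitives; PrimitiveCasts and its method names are hypothetical.

```java
// Hypothetical helper class illustrating widening and narrowing of primitives.
final class PrimitiveCasts {
    // Left to right in the hierarchy: the conversion is implicit.
    static long widen(int value) {
        return value;
    }

    // Right to left: an explicit cast is required, and only the low 8 bits are kept.
    static byte narrow(short value) {
        return (byte) value;
    }
}
```

For example, narrow((short) 100) returns 100, but narrow((short) 200) returns -56, because 200 does not fit in a byte and no exception is raised.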

When is ClassCastException thrown?
ClassCastException is thrown when you attempt to cast an object to a class of which it is not an instance.
Ex: you cannot assign an Account object to a SavingAccount variable. You can get past the compile time type mismatch error with an explicit cast, but the code then fails at runtime with ClassCastException. A type mismatch error also occurs at compile time when you try to reference an object of a different hierarchy (Ex: trying to assign an Employee object to an Account variable); for unrelated classes even an explicit cast is rejected by the compiler.

You can deal with incorrect casting in 2 ways
  1. Catch the ClassCastException with a try-catch block.
  2. Before casting, use the instanceof operator to check whether the object being cast is a legitimate instance of the target type, and deal with the other case in the else block.
You could also get a ClassCastException when two different class loaders load the same class, because the two loaded copies are treated as two different classes.
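
Both approaches can be sketched as follows. The Account and SavingAccount classes follow the example names used in the text, but their bodies, the interestRate method and the CastingDemo helper are assumptions made only for illustration.

```java
// Hypothetical class hierarchy for the casting examples.
class Account { }

class SavingAccount extends Account {
    double interestRate() {
        return 0.04;
    }
}

final class CastingDemo {
    // Way 2: check with instanceof before the downcast.
    static double rateOf(Account account) {
        if (account instanceof SavingAccount) {
            SavingAccount savings = (SavingAccount) account; // safe downcast
            return savings.interestRate();
        } else {
            return 0.0; // not a SavingAccount: handled without an exception
        }
    }

    // Way 1: attempt the cast and catch the ClassCastException.
    static boolean downcastFails(Account account) {
        try {
            SavingAccount savings = (SavingAccount) account;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

The instanceof check is usually preferable, since it avoids using an exception for an expected situation.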

What happens when you cast a reference variable whose value is null to a specific type? Say, Account acc = (Account) acc1; where acc1 is null. The cast succeeds and acc is simply null: null can be cast to any reference type, so no ClassCastException is thrown. The cast only tells the compiler which type to assume (refer, Downcasting).

However, a thing to note is when you have overloaded methods, for example myMethod(String s) and myMethod(Integer i), and you want to pass null as a parameter. You then need an explicit cast so that the compiler knows which method to call, else it complains that the invocation is ambiguous.
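
A small sketch of the overload case. The myMethod signatures follow the text; the OverloadDemo class, the return values and the callWithNull helper are assumptions added for illustration.

```java
// Hypothetical class with the two overloads described in the text.
final class OverloadDemo {
    static String myMethod(String s) {
        return "String";
    }

    static String myMethod(Integer i) {
        return "Integer";
    }

    static String callWithNull() {
        // myMethod(null);  // does not compile: the call is ambiguous
        return myMethod((String) null) + "/" + myMethod((Integer) null);
    }
}
```

Casting null never fails at runtime; the cast is used here purely to select the overload at compile time.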

Thursday, 11 August 2011

What do you mean by deep cloning and shallow cloning

Java supports two different types of cloning
  1. Shallow cloning - Java's Object.clone() gives a shallow copy of the object being cloned. The object is copied without its contained objects: a new object is created whose fields are copied one by one from the original, so for contained objects just the references are copied.
  2. Deep cloning - as Object.clone() yields a shallow copy, classes need to be adjusted to achieve a deep copy (check the notes below). Here the original object is copied along with its contained objects (note: the entire graph of objects is traversed and copied). Each object in the graph is responsible for cloning itself through its clone method. So in deep cloning a complete duplicate of the original is created.
  • If an object wants to be able to clone itself, its class first needs to implement the Cloneable interface, else Object.clone() is going to throw CloneNotSupportedException when the clone method is called.
  • Second, the class needs to make the clone() method public.
  • Next, in the clone method, call super.clone().
  • You need not do anything special for primitive data types, their wrapper objects and immutable objects like String: super.clone() copies primitive values, and sharing a reference to an immutable object is safe.
  • When you perform a deep copy involving collections, the objects in the collection also need to implement Cloneable.
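
The steps above can be sketched as follows. Person and Address are hypothetical classes; Person.clone() starts from the shallow copy produced by super.clone() and then clones the contained Address to make the copy deep.

```java
// Hypothetical contained class: a shallow copy is enough, it holds only a String.
class Address implements Cloneable {
    String city;

    Address(String city) {
        this.city = city;
    }

    @Override
    public Address clone() {
        try {
            return (Address) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }
}

// Hypothetical container class: clones its contained object as well.
class Person implements Cloneable {
    String name;
    Address address;

    Person(String name, Address address) {
        this.name = name;
        this.address = address;
    }

    @Override
    public Person clone() {
        try {
            Person copy = (Person) super.clone(); // shallow: shares 'address'
            copy.address = address.clone();       // deep: give the copy its own Address
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}
```

Changing the city on the clone's address leaves the original person's address untouched, while the immutable name String can safely be shared.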
Problems with cloning
  • The entire object graph needs to be cloned, which is tiresome, error prone and difficult to maintain. Care should be taken whenever the classes are modified.
  • Care should also be taken with circular references between objects; each referenced object should be copied only once. Ex: say Department contains Employee objects and Employee in turn refers to Department; care should be taken that the referenced Department is created only once.
  • If the class is not available for modification (third party classes) and there is a need for cloning, you need to create a new class by sub-classing and then override the clone method.
  • It is also problematic to clone a polymorphic variable, as the decision about which concrete type to copy has to wait until runtime.
Alternatives to cloning
Java serialization offers an alternative to cloning (the key here is that the object and all the referenced objects need to be serializable). The entire object tree is traversed and serialized, and de-serializing it later yields a different copy with exactly the same state as the original object. However, the cost of this approach is performance, as it involves writing to and reading from a stream.
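
A minimal sketch of this alternative, using in-memory streams. SerialCloner and deepCopy are hypothetical names, and the helper assumes that every object reachable from the argument is serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical utility: deep copy by serializing and de-serializing in memory.
final class SerialCloner {
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original); // walks and writes the whole object graph
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject(); // rebuilds an independent copy of the graph
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Deep copy failed", e);
        }
    }
}
```

For example, copying an ArrayList of ArrayLists this way gives a copy whose inner lists can be modified without affecting the original.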


Saturday, 6 August 2011

Garbage collection in Java

What is garbage collection and who handles it?
Garbage collection is the process of collecting unused/unreachable objects (i.e. those objects which have lived their life in the lifetime of a program and whose absence does not affect the continuity of the program). Java handles memory de-allocation through garbage collection. There is a low priority daemon thread called the garbage collector thread which runs in the background performing this task of collecting unreachable objects. It typically runs in low memory situations to reclaim unused memory. Garbage collection cannot be forced; you can request the JVM to perform GC through System.gc(), but there is no guarantee that the collection will happen immediately, or at all, as a result of the call.

Why does GC happen?
Whenever the JVM runs low on memory, or at certain time intervals, the garbage collector thread wakes up and tries to reclaim unused memory by de-allocating the memory occupied by unreachable objects. Most programmers will have experienced sudden unresponsiveness during the lifetime of a program; these can be times when the garbage collector is doing its job of reclaiming memory. If the JVM still cannot create an object on the heap it fails with OutOfMemoryError.

When does an object become eligible for GC?
We know that in Java every object that gets created lives on the heap, whether it is referenced from a local variable or an instance variable, whereas class (static) variables live in the method area (fyi).
When an object becomes unreachable it becomes eligible for GC (i.e. no live thread or static reference can reach the object).
So when does an object become unreachable?

  1. When you explicitly set all references to the object to null, stating that the object has served its purpose and is no longer needed.
  2. When the only references to the object are local variables that go out of scope once the block of code in which it was created has finished executing.
  3. Cyclic references are detected by garbage collection algorithms, so objects that refer only to each other automatically become eligible for garbage collection as long as no other live object references them.
  4. When the reference to a container object is set to null, the contained objects become eligible for GC too, provided nothing else refers to them.
  5. An object that is reachable only through weak references becomes eligible for GC in the next GC cycle.
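
Reachability can be observed, within limits, through java.lang.ref.WeakReference. The sketch below uses hypothetical class and method names; since System.gc() is only a request, it makes no claim about when a weakly reachable object is actually collected.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo of strong versus weak reachability.
final class ReachabilityDemo {
    // While a strong reference exists, the object is not eligible for GC.
    static boolean strongReferenceKeepsObjectAlive() {
        StringBuilder report = new StringBuilder("quarterly report");
        WeakReference<StringBuilder> weak = new WeakReference<StringBuilder>(report);
        System.gc(); // only a request; the JVM may ignore it
        return weak.get() == report; // 'report' is still strongly reachable here
    }

    // After the last strong reference is dropped, only the weak reference remains.
    static WeakReference<StringBuilder> dropStrongReference() {
        StringBuilder draft = new StringBuilder("draft");
        WeakReference<StringBuilder> weak = new WeakReference<StringBuilder>(draft);
        draft = null; // the object is now eligible for GC
        return weak;  // weak.get() returns null once the collector has cleared it
    }
}
```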


How does JVM handle synchronization?

We have seen how the JVM organizes the program into runtime data areas in my previous blog. We know that each thread has its own stack and that all the threads in the program share the heap. The heap contains all the objects created within the program, including the Thread objects. The method area contains all the class (static) variables of the classes used by the program. These variables are available to all the threads in the program.

Now we know that we have two data areas which contains data shared by all threads.
1. the heap and 2. the method area

So if two threads try to access the objects or class variables in these areas concurrently, the data needs to be properly managed or else we will end up with inconsistent data. This situation can be handled through synchronization, which Java manages through the use of a 'monitor'. A monitor acts as a guardian over a piece of code, so that no two threads execute that code simultaneously.
Java's monitor supports two kinds of synchronization: mutual exclusion and cooperation

  • Mutual exclusion is achieved in the JVM through the use of object or class locks; they enable multiple threads to work independently on shared data without interfering with each other.
  • Cooperation is achieved in the JVM through the use of the wait, notify and notifyAll methods of Object.

Each monitor is associated with an object reference. Whenever a thread reaches the code which is synchronized, it must obtain a lock on the referenced object failing which it will have to wait (block on synchronization state). Once the thread obtains the lock the JVM increases the count of the number of times an object has been locked. The same thread can lock the same object multiple times. Whenever the thread releases/relinquishes the lock the count is decremented. When there are no more locks on the object i.e count returns to zero, then any other thread can obtain the lock on the object if needed.

Synchronization is supported by the java language  in two ways

  1. synchronized method: Whenever the JVM resolves a symbolic reference to the method and finds that it is a synchronized method, the invoking thread tries to obtain the lock from the monitor. Once the lock is obtained the synchronized method is executed, and the lock is released when all the statements have been processed or the code throws an exception. For an instance method, the lock is obtained on the object on which the synchronized method is invoked. In the case of a static synchronized method the lock is obtained on the Class object. The JVM does not use special opcodes to invoke or return from method level synchronization.
  2. synchronized statement (block of code): The JVM uses two special opcodes, monitorenter and monitorexit, whenever a thread enters or exits a synchronized block of code. When the JVM encounters monitorenter it tries to obtain a lock on the object referred to by objectref on the stack. If the thread already holds the lock, the count is incremented by one, and whenever monitorexit is encountered the count is decremented by one. When the count reaches zero the lock is released.
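
The two forms, and the fact that the same thread can lock the same object more than once, can be sketched as follows. Ledger and its members are hypothetical names; Thread.holdsLock is used only to make the lock ownership visible.

```java
// Hypothetical class showing a synchronized method, a synchronized block and re-entry.
final class Ledger {
    private int balance;
    private int audits;

    // synchronized method: the lock is taken on 'this' for the whole method.
    synchronized void deposit(int amount) {
        balance += amount;
        audit(); // the same thread locks the same object again; the count goes up
    }

    synchronized void audit() {
        audits++;
    }

    // synchronized statement: compiled to monitorenter / monitorexit around the block.
    int balance() {
        synchronized (this) {
            return balance;
        }
    }

    int audits() {
        synchronized (this) {
            return audits;
        }
    }

    boolean holdsLockInsideBlock() {
        synchronized (this) {
            return Thread.holdsLock(this);
        }
    }
}
```

Because the lock is re-entrant, deposit can call audit without blocking itself; the lock is released only when the outermost synchronized region exits.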


Friday, 5 August 2011

Synchronization in Java

What is synchronization?
Threads communicate with each other by sharing access to fields and to the objects that reference fields refer to (concurrent access to shared data). Though efficient, this can lead to thread interference and memory consistency errors. The tool to avoid these kinds of errors in Java is synchronization.

Why do we need to synchronize?
As said earlier, to avoid
  1. Thread interference: happens when two methods/operations running in two different threads, but acting on the same data, interleave (i.e. their sequences of steps interleave).
  2. Memory consistency errors: occur when different threads have inconsistent views of what should be the same data.
Through the use of synchronization we can avoid the problem of dirty data caused by multiple threads acting on the same data (this happens when the shared object is mutable). To avoid this, Java uses monitors and the 'synchronized' keyword to control access to a mutable shared object. Synchronization is not needed if you are using an immutable shared object or the shared object is used in read only mode.

What are the different ways/levels at which synchronization can be applied in Java?
The synchronized keyword can be applied at the method level (coarse grained) or to a block of code (fine grained).
  • If the keyword is applied to a static method then the lock is obtained on the Class object (ex Account.class).
  • If the keyword is applied to a non static method, then the lock is obtained on the instance on which the synchronized method is invoked (ex obj1).
Any code written in a synchronized block holds a mutually exclusive lock on the object, which means that no other thread can enter a critical section guarded by the same lock except the thread which is holding it. All other threads will have to wait till the lock is released.
It is usually good practice to synchronize only the section of code which is critical rather than the entire method, because synchronization comes with a performance cost.
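
Which object gets locked in each case can be checked with Thread.holdsLock. In this small sketch, Registry and its members are hypothetical names.

```java
// Hypothetical class contrasting the class-level lock and the instance-level lock.
final class Registry {
    private static int created;
    private int updates;

    // static synchronized: the lock is the Class object, Registry.class.
    static synchronized boolean staticMethodLocksClass() {
        created++;
        return Thread.holdsLock(Registry.class);
    }

    // instance synchronized: the lock is the instance itself, not the Class object.
    synchronized boolean instanceMethodLocksThis() {
        updates++;
        return Thread.holdsLock(this) && !Thread.holdsLock(Registry.class);
    }
}
```

Since the two locks are different objects, a static synchronized method and an instance synchronized method of the same class do not exclude each other.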

Disadvantages of synchronization 

  1. There is a possibility of deadlocks if it is not coded properly.
  2. Another problem is performance; there is an overhead in obtaining locks.

Thursday, 4 August 2011

Threads in Java

What is a thread?
A thread is a lightweight process, i.e. a single sequential flow of control within the JVM process (when you run the java command at the prompt it creates the JVM to run the application, the name of which you supply as a parameter to the java command. Ex. java com.test.MainApplication). This process can have multiple threads created within it.
As discussed in my previous blog article, a thread shares the heap (the place where the objects created within the application reside) belonging to the process and has its own stack space. Multiple threads share the heap, and hence care should be taken that the objects accessed are thread safe.

Ways to create thread?
There are two ways to create a thread
  1. Extend the Thread class: and override the run method of the Thread class.
  2. Implement the Runnable interface: it has one method, public void run(), which the implementing class should define.
One should generally implement the Runnable interface to get threading behavior instead of extending Thread. One should extend Thread only if the intention is to extend the behavior of the existing Thread class.
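
Both ways can be sketched side by side. ThreadCreation, Worker and the shared counter are hypothetical names used only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of the two ways to create a thread.
final class ThreadCreation {
    // Way 1: extend Thread and override run().
    static class Worker extends Thread {
        private final AtomicInteger counter;

        Worker(AtomicInteger counter) {
            this.counter = counter;
        }

        @Override
        public void run() {
            counter.incrementAndGet();
        }
    }

    // Starts one thread of each kind and returns how many tasks ran.
    static int runBoth() {
        final AtomicInteger counter = new AtomicInteger();
        Thread extended = new Worker(counter);
        // Way 2: implement Runnable and hand it to a plain Thread.
        Thread wrapped = new Thread(new Runnable() {
            @Override
            public void run() {
                counter.incrementAndGet();
            }
        });
        extended.start();
        wrapped.start();
        try {
            extended.join();
            wrapped.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

With the Runnable form the task stays independent of the Thread class, so the task class remains free to extend something else.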

Why do we need to create threads? If you want parallel processing of tasks and these tasks are independent of each other, then you can achieve this using threads. If there is critical data that is used by multiple threads, and changing it in one thread can cause data inconsistency in another thread, then one should synchronize the critical section of code. More on synchronization in my next post.

Different states of a thread?
The life cycle of a thread begins when the client code calls the start() method on the Thread. Creating a thread using the new operator will not start the thread. A thread can be in one of the following states once it has started,
  1. Runnable: When you start the thread using t.start() the thread joins the pool of runnable threads, following which the thread scheduler can pick this ready to run thread for execution. A thread can enter this state in other ways too. Please refer to the diagram for the same.
  2. Running: The currently executing thread is said to be in this state. It will be in this state till it is swapped out by the scheduler, is blocked for I/O, has to wait for a lock on entering synchronized code, relinquishes the CPU with the static Thread.yield method or goes to sleep with the Thread.sleep method.
  3. Waiting: Execution of the object.wait() method causes the current thread to go into the wait state. The current thread must hold the object's monitor to invoke the wait method, and it releases that monitor while waiting. It will remain in this state till some other thread notifies it by calling the notify() or notifyAll() method on the same object.
  4. Sleeping: A call to the Thread.sleep(long milliseconds) method causes the current thread to sleep, or suspend its operation, for the time specified in the sleep method call. After the time elapses it moves to the Runnable state and begins execution when the thread scheduler picks it.
  5. Blocked: A thread can go into this state while performing an I/O operation, or while waiting for the lock when entering a synchronized method or block of code. Once the I/O operation completes, or the lock is obtained, the thread moves into the Runnable state.
  6. Dead: Once the thread finishes execution, or there is an uncaught error/exception in the run method, the thread enters the dead state. Please note: a dead thread cannot be revived by a call to the start() method; calling start() again throws IllegalThreadStateException.
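
The start and end of the life cycle can be observed with Thread.getState(). Note that the Thread.State enum uses its own names (for example NEW before start() and TERMINATED for the dead state) and does not map one-to-one onto the conceptual states listed above. ThreadStates and its methods are hypothetical names.

```java
// Hypothetical demo observing the first and last state of a thread.
final class ThreadStates {
    private static Thread idleThread() {
        return new Thread(new Runnable() {
            @Override
            public void run() {
                // nothing to do: the thread finishes immediately
            }
        });
    }

    private static void runToCompletion(Thread thread) {
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns the state before start() and the state after the thread has finished.
    static Thread.State[] observe() {
        Thread thread = idleThread();
        Thread.State before = thread.getState(); // created with new, not yet started
        runToCompletion(thread);
        Thread.State after = thread.getState();  // run() has returned
        return new Thread.State[] { before, after };
    }

    // A finished thread cannot be started again.
    static boolean restartFails() {
        Thread thread = idleThread();
        runToCompletion(thread);
        try {
            thread.start();
            return false;
        } catch (IllegalThreadStateException e) {
            return true;
        }
    }
}
```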
Thread and deadlock
In case of multiple threads, there are chances that the following may happen

  • Deadlock occurs when two or more threads are each waiting for a lock that another of them holds, so that none of them can proceed. For example, when thread A is waiting for the lock on Object P while holding the lock on Object Q and, at the same time, thread B is holding the lock on Object P and waiting for the lock on Object Q, deadlock occurs.
  • If a thread is holding a lock and goes into the sleeping state, it does not lose the lock. A thread that is blocked on I/O or on acquiring another lock also keeps the locks it already holds. Only a call to wait() releases the monitor of the object it is called on, which is why holding locks while sleeping or blocking increases the potential for deadlock.
  • Java does not prevent or resolve deadlocks automatically, so we as programmers are responsible for avoiding them. (For diagnosis, a thread dump or ThreadMXBean.findDeadlockedThreads() can report threads that are deadlocked.)
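
One common way to avoid the P/Q situation described above is to make every thread acquire the locks in the same order. The sketch below is one possible approach, not the only one; Bank, Account, the id field and the helper method are hypothetical names, and it assumes every account has a distinct id.

```java
// Hypothetical demo: a fixed lock order prevents the circular wait.
final class Bank {
    static final class Account {
        final int id; // assumed unique; used only to order lock acquisition
        int balance;

        Account(int id, int balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    static void transfer(Account from, Account to, int amount) {
        // Always lock the account with the smaller id first, whatever the direction.
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Two threads transfer in opposite directions; returns the final balances.
    static int[] runOpposingTransfers(final int rounds) {
        final Account p = new Account(1, 1000);
        final Account q = new Account(2, 1000);
        Thread a = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < rounds; i++) {
                    transfer(p, q, 1);
                }
            }
        });
        Thread b = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < rounds; i++) {
                    transfer(q, p, 1);
                }
            }
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { p.balance, q.balance };
    }
}
```

If transfer instead locked from first and to second, the two threads above could each hold one lock and wait forever for the other.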