Monday, February 19, 2007

Tri-color Marking Garbage Collector

This is one type of garbage collector. This collector working is very simple.
All the references were separated into 3 groups. White, Gray and Black.

Objects that are directly reachable are in White group. (Global , static variables etc). And all the remaining references are put in white. Mostly black list starts empty.

This algorithm works much like a graph traversal.
1. Start with a node in White list.
2. Find all the references that are reachable from this one.
3. Gray those references. If the reference that are reachable from current node, is already in Gray or Black, then move current node to black list. Proceed this until gray node becomes empty.
4. Now, the reference that are still in White list are not reachable, so they all are eligible for garbage collection.

For more information about this see:
http://www.memorymanagement.org/glossary/t.html#tri-color.marking

Thursday, February 15, 2007

Garbage Collectors !

Garbage collection (also known as GC) is a form of automatic memory management. The garbage collector or collector attempts to reclaim garbage, or memory used by objects that will never again be accessed or mutated by the application. Garbage collection was invented by John McCarthy around 1959 to solve the problems of manual memory management in his Lisp programming language.

Garbage collection is often portrayed as the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system. However, many systems use a combination of the two approaches, and there are other techniques being studied (such as region inference) to solve the same fundamental problem. Note that there is an ambiguity of terms, as theory often uses the terms manual garbage-collection and automatic garbage-collection rather than manual memory management and garbage-collection, and does not restrict garbage-collection to memory management, rather considering that any logical or physical resource may be garbage-collected.

One very good thing about Java is that it relieves programmers from thinking about Memory Management. It takes care of Memory reclaiming automatically. But this feature doesn't come to us free of cost. It costs and the consequences of this slow execution (not much). In olden days of GC, GC runs for upto 25 - 50% of the time. But GC in Java have come a long way.

In this blog, we will discuss about various types of garbage collectors. How do they work and when to use what type of GC.

First we will look what a ideal GC should be...

1. Zero Pause time. That is the time consumed by the GC should be Zero
2. Should be able to take care of memory island.
3. Algorithm should be incremental. That is the GC can stop in the middle and continue later on without any problem.
4. Virtual memory interaction - That is GC should not cause Page fault. This happens when we try to reach the memory area, that is not used by the code for a long time and may be it's eligible for garbage collection. But if GC collector tries to do something(mark & sweep algorithm may have to touch the memory for GC) with that memory location, then page fault occurs, which causes the memory block to be refetched from Virtual memory.
and many others ...
  • Pause time. Does the collector stop the world to perform collection? For how long? Can pauses be bounded in time?

  • Pause predictability. Can garbage collection pauses be scheduled at times that are convenient for the user program, rather than for the garbage collector?

  • CPU usage. What percentage of the total available CPU time is spent in garbage collection?

  • Memory footprint. Many garbage collection algorithms require dividing the heap into separate memory spaces, some of which may be inaccessible to the user program at certain times. This means that the actual size of the heap may be several times bigger than the maximum heap residency of the user program.

  • Virtual memory interaction. On systems with limited physical memory, a full garbage collection may fault nonresident pages into memory to examine them during the collection process. Because the cost of a page fault is high, it is desirable that a garbage collector properly manage locality of reference.

  • Cache interaction. Even on systems where the entire heap can fit into main memory, which is true of virtually all Java applications, garbage collection will often have the effect of flushing data used by the user program out of the cache, imposing a performance cost on the user program.

  • Effects on program locality. While some believe that the job of the garbage collector is simply to reclaim unreachable memory, others believe that the garbage collector should also attempt to improve the reference locality of the user program. Compacting and copying collectors relocate objects during collection, which has the potential to improve locality.

  • Compiler and runtime impact. Some garbage collection algorithms require significant cooperation from the compiler or runtime environment, such as updating reference counts whenever a pointer assignment is performed. This creates both work for the compiler, which must generate these bookkeeping instructions, and overhead for the runtime environment, which must execute these additional instructions. What is the performance impact of these requirements? Does it interfere with compile-time optimizations?
A brief overview of each of the algorithm is given the below location:
http://www-128.ibm.com/developerworks/java/library/j-jtp10283/

In the next post we will take a look at a particular Garbage collector in more details, before that we need to learn a few more terms, which is used that particular algorithm.

root

In tracing garbage collection, a root holds a reference or set of references to objects that are a priori reachable. The root set is used as the starting point in determining all reachable data.

Roots basically comprise the references in the state of the mutator. Typical roots are global variables, other static data, and the control stack.

See also: weak root; strong root; ambiguous root; exact root.

root set

The root set is the collection of roots that the mutator declares to the collector(2).

For more understanding of such terms visit the following link:
http://www.memorymanagement.org/glossary/r.html
One algorithm we are going to look at it Tricolor Marking Garbage collector.

Tuesday, February 13, 2007

When JIT happens ?

Since Java works much like a interpreted language, it is argued that Java program will run a lot slower then their native language counter parts. To avoid this Sun came up with Just in Time compiler, which will change the Java byte code to native machine dependent assembly. This will make Java run as fast as an Application created using 'C'.

This JIT will not happen always. There is criteria at which this will happen. The following things discusses about when JIT will happen and how it can be configured during JVM invocation.

The following link provides much more details about all the command line options available.
http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp

Compiler Option: -XX:CompileThreshold=10000
Purpose : Number of method invocations/branches before compiling [-client: 1,500]

This options specifies that when a particular method is invoked 10000 times, then compile this method to native code.

INorder to know when the native compilation has happened we need to use the following JVM command line options.
java -XX:+PrintCompilation

The following link provides much more interesting test results. I enjoyed reading it.
http://www.javaworld.com/javaworld/javaqa/2003-04/01-qa-0411-hotspot.html

So If we could run Java Application at native code speed then why not just do it always. As specified in the above location, the problem could be.....

As an experiment, you might consider shortening the warm-up period by manipulating the -XX:CompileThreshold JVM option, although you will soon discover that making this threshold too small will delay your application's startup, as HotSpot begins compiling just about every method it discovers into native code, including methods in the core Java libraries.

So, it is advised to set the CompileThreshold option to 10000 for Server applications and 1500 for Client application.

Sunday, February 11, 2007

Reading simple XML File using DOM Parser

In this example, we will write Java code to read the contents of a simplest XML file and print it.

package com;

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;


public class XMLLesson {

/*public String getNodeTextValue(NodeList nl)
{
Node node = null;
for(int i = 0 ; i< node =" nl.item(i);" t =" (Text)">

public void parseXML(String fname)
{

DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
DocumentBuilder bld = null;
Document doc = null;
try {
bld = fact.newDocumentBuilder();
doc = bld.parse(fname);

} catch(Exception e)
{
e.printStackTrace() ;
}

Element parentElement = doc.getDocumentElement();
NodeList nl = parentElement.getElementsByTagName("child1");
int i;
System.out.println("Parent Element : " + parentElement.getTagName() );
for(i = 0;i<nl.getLength();++i) {
System.out.println("i = " + i + " " + nl.item(i).getFirstChild().getNodeValue() );
}


//System.out.println(getNodeTextValue(n));

}

public static void main(String[] args) {
XMLLesson l = new XMLLesson();
l.parseXML("c:\\temp.xml");
}

}

Contents of xml file:

child1data
child1data
child1data


Output is :
Parent Element : parent
i = 0 child1data
i = 1 child1data
i = 2 child1data

Explanation:
This code is written for Java 1.5. This use XML Parser provided by Sun. There are many XML
parsers available, including xerces and xml parser api from IBM.

All the xml file will have a document element. All the tags will be inside that element
only. So first we get the document element. Then get the list of child nodes associate
with it.

In order to get the text content of the tag,
data
"node.getFirstChild().getNodeValue()"
node.getFirstChild() return the value node, and calling getNodeValue returns the value.

you could also use node.getTextContent(), to get the text content of the node.




XML Parsers

XML Parsing in Java:

XML parser is a software module to read documents and a means to provide access to their content. XML parser generates a structured tree to return the results to the browser. An XML parser is similar to a processor that determines the structure and properties of the data. An XML parser can read a XML document to create an output to generate a display form.

Types of XML Parser:

1. SAX Parser
2. DOM Parser

SAX Parser:
Simple API for XML (SAX) was developed by the members of a public mailing list (XML-DEV).It gives an event based approach to XML parsing. It means that instead of going from node to node, it goes from event to event. SAX is an event driven interface. Events include XML tag, detecting errors etc,

DOM Parser:
DOM Parser stands for Document Object Model. This read the entire XML Files and puts in the memory. This makes traversal easy and efficient. This should only be used for small XML Files. This operates faster, but takes up more memory.

In the next post, we will see an example of DOM Parser in Java.