Thursday, February 15, 2007

Garbage Collectors !

Garbage collection (also known as GC) is a form of automatic memory management. The garbage collector or collector attempts to reclaim garbage, or memory used by objects that will never again be accessed or mutated by the application. Garbage collection was invented by John McCarthy around 1959 to solve the problems of manual memory management in his Lisp programming language.

Garbage collection is often portrayed as the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system. However, many systems use a combination of the two approaches, and there are other techniques being studied (such as region inference) to solve the same fundamental problem. Note that there is an ambiguity of terms, as theory often uses the terms manual garbage-collection and automatic garbage-collection rather than manual memory management and garbage-collection, and does not restrict garbage-collection to memory management, rather considering that any logical or physical resource may be garbage-collected.

One very good thing about Java is that it relieves programmers from thinking about Memory Management. It takes care of Memory reclaiming automatically. But this feature doesn't come to us free of cost. It costs and the consequences of this slow execution (not much). In olden days of GC, GC runs for upto 25 - 50% of the time. But GC in Java have come a long way.

In this blog, we will discuss about various types of garbage collectors. How do they work and when to use what type of GC.

First we will look what a ideal GC should be...

1. Zero Pause time. That is the time consumed by the GC should be Zero
2. Should be able to take care of memory island.
3. Algorithm should be incremental. That is the GC can stop in the middle and continue later on without any problem.
4. Virtual memory interaction - That is GC should not cause Page fault. This happens when we try to reach the memory area, that is not used by the code for a long time and may be it's eligible for garbage collection. But if GC collector tries to do something(mark & sweep algorithm may have to touch the memory for GC) with that memory location, then page fault occurs, which causes the memory block to be refetched from Virtual memory.
and many others ...
  • Pause time. Does the collector stop the world to perform collection? For how long? Can pauses be bounded in time?

  • Pause predictability. Can garbage collection pauses be scheduled at times that are convenient for the user program, rather than for the garbage collector?

  • CPU usage. What percentage of the total available CPU time is spent in garbage collection?

  • Memory footprint. Many garbage collection algorithms require dividing the heap into separate memory spaces, some of which may be inaccessible to the user program at certain times. This means that the actual size of the heap may be several times bigger than the maximum heap residency of the user program.

  • Virtual memory interaction. On systems with limited physical memory, a full garbage collection may fault nonresident pages into memory to examine them during the collection process. Because the cost of a page fault is high, it is desirable that a garbage collector properly manage locality of reference.

  • Cache interaction. Even on systems where the entire heap can fit into main memory, which is true of virtually all Java applications, garbage collection will often have the effect of flushing data used by the user program out of the cache, imposing a performance cost on the user program.

  • Effects on program locality. While some believe that the job of the garbage collector is simply to reclaim unreachable memory, others believe that the garbage collector should also attempt to improve the reference locality of the user program. Compacting and copying collectors relocate objects during collection, which has the potential to improve locality.

  • Compiler and runtime impact. Some garbage collection algorithms require significant cooperation from the compiler or runtime environment, such as updating reference counts whenever a pointer assignment is performed. This creates both work for the compiler, which must generate these bookkeeping instructions, and overhead for the runtime environment, which must execute these additional instructions. What is the performance impact of these requirements? Does it interfere with compile-time optimizations?
A brief overview of each of the algorithm is given the below location:
http://www-128.ibm.com/developerworks/java/library/j-jtp10283/

In the next post we will take a look at a particular Garbage collector in more details, before that we need to learn a few more terms, which is used that particular algorithm.

root

In tracing garbage collection, a root holds a reference or set of references to objects that are a priori reachable. The root set is used as the starting point in determining all reachable data.

Roots basically comprise the references in the state of the mutator. Typical roots are global variables, other static data, and the control stack.

See also: weak root; strong root; ambiguous root; exact root.

root set

The root set is the collection of roots that the mutator declares to the collector(2).

For more understanding of such terms visit the following link:
http://www.memorymanagement.org/glossary/r.html
One algorithm we are going to look at it Tricolor Marking Garbage collector.

No comments: