More for C++

Garbage Collection

License

Download

Documentation

What's next?

Credits

Feedback

The "More for C++" Garbage Collector

In computer science, garbage collection has been used for a long time. The standard book about this subject was written by Richard Jones. In order to explore the theory of garbage collection, Jones describes the most common approaches and algorithms that have been used.

When it comes to C++, most implementations of automatic memory management use Reference Counting. But there are some remarkable exceptions, i.e. the Boehm-Demers-Weiser conservative garbage collector, which uses a Mark and Sweep algorithm, and the Bartlett Mostly-Copying Collector.

The garbage collector used by "More for C++" has been implemented with both, a combination of reference counting and the mark-sweep approach.

Reference Counting

Without a garbage collector, every instance that uses a dynamically allocated object will have the same problem when it comes to deallocation: it must know, if it is the last instance that is using this object. Therefor, it has to deal with every other instance in the system, that uses (or may have used) that object too. Only when this instance is really the last one, it has to delete the object.

A simple solution seems to be an ordinary counter for the number of instances using an object. All objects will be equipped with such a counter, and we are done. But even when every instance in the system is using the reference counters accurately, there is still one pitfall: when two objects are using each other, their counters will never be decreased to zero. Therefor, objects with cyclic references will never be deallocated!

Smart Pointers

With C++, a very popular approach for reference counting is the use of Smart Pointers. Beneath the built-in pointer- and reference-types, they provide another possibility for accessing objects that have been created dynamically on the heap. Smart pointers are user-defined types which virtually describe objects as well. But in practice, they can be used like the C++ built-in pointers.

Every object that will be referenced by a smart pointer has its own reference counter. Whenever a new smart pointer to an object has been created, the object's reference counter will be increased by one. When the smart pointer has been deleted, or the object should not be referenced any more, the counter is decreased by one. After the counter has been decreased to null, the object will be destroyed.

Smart pointers have been widely established, even with their known pitfall of cyclic references. For example the OMG is using them for their CORBA C++ Language Mapping.

The Mark and Sweep algorithm

Garbage collectors with the mark-sweep approach use a completely different method of retrieving the objects, that are not in use any more.

A mark-sweep collector must have access to all objects on the heap as well as to every single reference that has been used for theses objects. References, that are not a member of an object will be classified into the so called Root Set. A member of the root set may be a reference on the stack or a global variable.

The process of marking and sweeping is quite simple. First, the collector is following the graph of referenced objects, beginning with the objects referenced by the root set. Every object reached by this graph will be marked as in use. After this, all the objects that have not been marked are considered as garbage and will therefor be sweeped from the heap.

The "More for C++" approach

The "More for C++" library defines its own new-operator, so managing the objects on the heap is not too hard. And while every object has to be referenced exclusively by smart pointers, the library's garbage collector could manage the set of references easily as well.

The real challenge for a C++ garbage collector is retrieving the root set of references. Conservative collectors consider every byte pattern a reference, only when the pattern may be a pointer to an object. Therefor they must have access not only to the address-room of their process, but to every processor register that may store a pointer as well. For the "More for C++" library such an assumption becomes obsolete, again by the use of smart pointers.

Every creation of a "More for C++" smart pointer will be notified to the library's heap manager. If the heap manager determines, that the pointer lives inside an object, the garbage collector marks the position of that pointer. Otherwise, the pointer must belong to the root set. Thus the collector is able to build the graph of references between the objects.

Whenever a smart pointer has been identified as a member of the root set, a "root counter" for the referenced object will be increased by one. With this modification of reference counting, the collector can determine, wether an object is referenced by the root set or not. Thus the "More for C++" garbarge collector is able to mark and sweep the objects as simply as described above.

The "More for C++" Garbage Collector may be used in a multi-threaded environment. It is able to work concurrently with other threads that may create new objects virtually at the same time. Thus, the GC will not block the whole process while marking and sweeping objects on the heap. Naturally, it can be used in a single-threaded process as well.

Drawbacks of Garbage Collection and C++

Many C++ programmers like "their" language because of criterias like "performance", "efficiency" and "optimization". Some of them may consider the use of a Garbage Collector and Smart Pointers as an inconsistency to the goals Bjarne Stroustrup had in mind when he designed C++. The possible loss of efficiency in terms of memory usage and runtime behavior should be discussed.

But the possible loss of efficiency may not be the only disadvantage of automatic memory management in C++. Uncountable lines of code were created regardless of an interaction with a Garbage Collector. For example, it is a common practice that a class-destructor does not only deallocate memory, but also make some "clean-up", like closing a file or releasing a GUI resource. With a GC, destructors will never be called! For this reason, such a clean-up should be implemented in a method named "dispose" or "destroy", which has to be called instead of deallocating an object by hand.

When a Garbage Collector shall be used with existing C++ code, a refactoring of such a "legacy software" could be very complex, because the control flow of such a code may become very different. Since there will be only a few projects in C++ with everything being programmed from scratch, this point needs to attract some attention.

Conclusion

Two of the most annoying bugs in C++ code are dangling pointers and memory leaks; and even experienced developers have to master another design problem: shared objects could never be used like local data. There must be some kind of global book-keeping to determine, if an object is not being used any more.

All these problems will be handled by a Garbage Collector like the one in "More for C++". So, primarily in large applications, garbage collection would be more efficient than managing the heap manually.

Reconsidering the advantages of automatic memory management, not only for new projects the use of garbage collection in C++ seems to be very suggestive.

© 1999 - 2003  Thorsten Görtz