When I try to remember the Data Structures course I did some years ago, I recall vaguely that:
- Linked lists should take less memory then array-based lists
- Hash tables should take more memory then tree-based maps
I actually put these believes to the test today. I profiled a small Java program that added 100,000 elements to these collections. The results were surprising:
- Unless you delete a lot of elements, linked lists take twice more memory then array lists
- If you have a good hash functions, hash tables consume only a small amount of memory more then tree maps
The explanation, in both cases is as follows: Array lists and hash tables are supposed to consume more memory then their array-based counterparts because the array-based implementations must keep an array that is larger then the exact capacity, while the size of node-based collections is concise. However, since the size of a reference is the same as of an array cell, the math is different: Array-based collections keep an array that is 1.5 larger then needed, so their size is 1.5N. Each node in a linked list or a tree map must keep a couple of references at least, so their size is 2N or 3N.
This is true in the case of reference-based languages, such as Java et al. The exception is for value-based languages, such as C++, for collections of concrete elements instead of pointers. In this case, the size of an empty array cell is the same as of a stored element, so the array wastes more memory.
Conclusions:
- In Java, if you have a collection and you do not add and remove elements frequently, use an array-based implementation rather then a node-based one
- Validate your assumptions! As the great sensei Steven Segal once said, “Assumption is the mother of all fuck-ups”
