Hash Array Mapped TreeHash Array Mapped Trie, or HAMT for short, is an interesting combination of a hash table and a trie. The definitive reference for HAMT is a publication of Phil Bagwell, Ideal hash trees. Search the web for this title if the link won't work anymore. I suggest you read it to find out how HAMT works, and give only a brief outline here.
HAMT is a tree of arrays (tries), each element of a trie contains a single key-value pair. Each trie,
except a root trie, is limited to The elements of each trie are ordered based on the value of 5-bit hash pieces, where each piece was used as the index into the trie at the corresponding level (index may not be bigger than 31). Since the size of the root trie may differ from 32 (it's still limited to the powers of 2), insertion at the first tree level may require a part of hash value, different from 5 bits. If a trie element at the obtained index is empty, insertion is completed. Otherwise, the next 5 bits of the hash value are taken, a new trie is added as a child of the current one if not created yet, and insertion continues. If the hash value is exhausted (after 6 or less levels of tree walking), a new one must be obtained. The tries are stored in compressed form, so that empty elements do not consume memory. For compression, a bitmap value is stored in each element to which a sub-trie is linked, each bit of this value indicates whether the sub-trie element at the corresponding index is taken or not. For example, the bitmap ...10100indicates that there are non-empty elements at positions 2 and 4 in the subtrie of this node. This permits to store elements contiguously, and use a bit-counting function to obtain index into this contiguous array. HAMT requirementsHAMT meets most of the map requirements, as defined by the C++ standard, and its public
interface matches closely the one of
HAMT iterators
HAMT implements bidirectional iterators. HAMT iterator only exists inside the [begin(), end()] range. That is,
an attempt to do Hash functionHAMT comes with three different hash functions, or hashers. The first one is a variant of a widely used, very fast hasher, which gives reasonable results for the number of keys below 1000. size_t hash_fast(_Char key) { size_t h = 0; for ( ; *key; ++key) h = 5 * h + *key; return h; }
The second hasher, used by default, is a high-quality rotating
hash function, described by Robert J. Jenkins Jr. After some experiments with the other hashers, I found that
this one yields very good hash distribution being only marginally slower than Finally, the third hasher is an implementation of Pearson lookup algorithm with a randomly chosen lookup table. |