What is the main implementation idea behind the sparse hash table?

Time: 2011-03-13 12:08:58

Tags: data-structures hash hashtable sparsehash

Why does the Google sparsehash open-source library have two implementations: a dense hashtable and a sparse one?

2 answers:

Answer 0 (score: 18)

The dense hashtable is an ordinary textbook hashtable implementation.

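The sparse hashtable stores only the elements that have actually been set, divided over a number of arrays. To quote from the comments in the implementation of sparse tables: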

// The idea is that a table with (logically) t buckets is divided
// into t/M *groups* of M buckets each.  (M is a constant set in
// GROUP_SIZE for efficiency.)  Each group is stored sparsely.
// Thus, inserting into the table causes some array to grow, which is
// slow but still constant time.  Lookup involves doing a
// logical-position-to-sparse-position lookup, which is also slow but
// constant time.  The larger M is, the slower these operations are
// but the less overhead (slightly).
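
As a rough illustration of the grouping described in that comment, the sketch below splits a logical bucket index into a group number and an offset within the group. The names `kGroupSize`, `BucketLocation`, `locate`, and `group_count` are hypothetical and not taken from the library, and the value 48 for M is only an assumed example.

```cpp
#include <cstddef>

// M from the quoted comment. The value 48 is an assumption for illustration.
constexpr std::size_t kGroupSize = 48;

// Hypothetical helper type: where a logical bucket lives.
struct BucketLocation {
    std::size_t group;   // which group holds the bucket
    std::size_t offset;  // position of the bucket inside that group
};

// Map a logical bucket index to (group, offset within group).
constexpr BucketLocation locate(std::size_t bucket) {
    return BucketLocation{bucket / kGroupSize, bucket % kGroupSize};
}

// Number of groups needed for a table with `buckets` logical buckets,
// rounding up so a partially filled last group is still counted.
constexpr std::size_t group_count(std::size_t buckets) {
    return (buckets + kGroupSize - 1) / kGroupSize;
}
```

With these assumptions, logical bucket 100 falls in group 2 at offset 4, and a table of 97 logical buckets needs 3 groups.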

To know which elements of the arrays are set, a sparse table includes a bitmap:

// To store the sparse array, we store a bitmap B, where B[i] = 1 iff
// bucket i is non-empty.  Then to look up bucket i we really look up
// array[# of 1s before i in B].  This is constant time for fixed M.

So each element incurs an overhead of only 1 bit (in the limit).
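
The bitmap lookup described in the quoted comment can be sketched as a single group. This is a minimal sketch under stated assumptions, not the library's code: `SparseGroup` and its members are hypothetical names, values are plain `int`, and M is assumed to be 48 so the bitmap fits in one 64-bit word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of one sparse group; not sparsehash's actual class.
class SparseGroup {
public:
    static constexpr std::size_t kGroupSize = 48;  // M (assumed; must be <= 64 here)

    // B[i] == 1 iff logical bucket i is non-empty.
    bool test(std::size_t i) const {
        assert(i < kGroupSize);
        return ((bitmap_ >> i) & 1u) != 0;
    }

    // "# of 1s before i in B": index of bucket i in the packed array.
    std::size_t rank(std::size_t i) const {
        assert(i < kGroupSize);
        std::uint64_t below = bitmap_ & ((std::uint64_t{1} << i) - 1);
        std::size_t count = 0;
        while (below != 0) {
            below &= below - 1;  // clear the lowest set bit
            ++count;
        }
        return count;
    }

    // Insert or overwrite. Inserting shifts later elements, so the packed
    // array grows by exactly one slot per newly occupied bucket.
    void set(std::size_t i, int value) {
        const std::size_t pos = rank(i);
        if (test(i)) {
            values_[pos] = value;
        } else {
            values_.insert(values_.begin() + pos, value);
            bitmap_ |= std::uint64_t{1} << i;
        }
    }

    // Returns nullptr for an empty bucket.
    const int* get(std::size_t i) const {
        return test(i) ? &values_[rank(i)] : nullptr;
    }

    std::size_t stored() const { return values_.size(); }

private:
    std::uint64_t bitmap_ = 0;  // one bit per logical bucket
    std::vector<int> values_;   // only the occupied buckets, in bucket order
};
```

For example, after `set(5, 50)`, `set(2, 20)`, and `set(40, 400)` the packed array holds three values, `get(5)` finds its value at packed index `rank(5) == 1`, and `get(3)` returns `nullptr`. Both the bit count and the shifting insert are bounded by M, which is why the quoted comments call these operations slow but constant time.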

Answer 1 (score: 3)

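sparsehash is a memory-efficient way of mapping keys to values (about 1-2 bits of overhead per key). Bloom filters can give you even fewer bits per key, but they do not attach values to keys: all they report is "definitely not in the set" or "probably in the set", which is slightly less than one bit of information.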