Question

I want to implement a performance-optimized variant of unordered_map that runs in several phases:

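Initialization: insert about 100 elements into a std::map
Preparation: do some magic to convert the std::map into a std::unordered_map
Work: perform a large (unbounded) number of lookups; insertions/deletions are forbidden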
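For reference, the plain conversion in the preparation phase, with no collision tuning yet, can be done with the standard range insert. A minimal sketch, assuming int keys and std::string values; the function name build_lookup is hypothetical:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical helper: copies the staging std::map into a std::unordered_map.
// reserve() sizes the bucket array up front so the inserts do not trigger rehashes.
std::unordered_map<int, std::string> build_lookup(const std::map<int, std::string>& staging) {
    std::unordered_map<int, std::string> lookup;
    lookup.reserve(staging.size());
    lookup.insert(staging.begin(), staging.end());
    return lookup;
}
```

During the work phase, holding the result as const (or through a const reference) turns any accidental insert, erase, or operator[] call into a compile error.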

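To make the "work" phase as fast as possible, I'd like to choose a hash function that has no collisions for the given set of keys (collected during the initialization phase).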

I want to measure how much performance I can gain from this trick. So this will be an experiment, and it may end up in production code.

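Does the standard library have facilities for this (e.g., finding out how many collisions a given unordered_map has, or changing the hash function)? Or should I implement it myself?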

Answer 1

Here is the "collision management" API:

size_type bucket_count() const;
size_type max_bucket_count() const;

size_type bucket_size(size_type n) const;
size_type bucket(const key_type& k) const;

local_iterator       begin(size_type n);
local_iterator       end(size_type n);
const_local_iterator begin(size_type n) const;
const_local_iterator end(size_type n) const;
const_local_iterator cbegin(size_type n) const;
const_local_iterator cend(size_type n) const;

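In short, bucket_size(n) tells you how many elements are in the nth bucket, so any bucket holding more than one element has collisions. You can find a key's bucket with bucket(k), and you can iterate over a bucket with local_iterator.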
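A minimal sketch of using that API to measure collisions. The helper names count_collisions and keys_in_same_bucket are hypothetical, and "collisions" here means elements beyond the first in each bucket; AlwaysZeroHash is a deliberately bad hasher included only to show the worst case:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Sum over all buckets of (bucket_size(n) - 1) for non-empty buckets.
// A result of 0 means every element sits alone in its bucket.
template <class Map>
std::size_t count_collisions(const Map& m) {
    std::size_t collisions = 0;
    for (std::size_t n = 0; n < m.bucket_count(); ++n) {
        const std::size_t sz = m.bucket_size(n);
        if (sz > 1) collisions += sz - 1;
    }
    return collisions;
}

// Uses bucket(k) plus the local_iterator overloads begin(n)/end(n)
// to list every key stored in the same bucket as k (including k itself if present).
template <class Map>
std::vector<typename Map::key_type> keys_in_same_bucket(const Map& m,
                                                        const typename Map::key_type& k) {
    std::vector<typename Map::key_type> keys;
    const std::size_t b = m.bucket(k);
    for (auto it = m.begin(b); it != m.end(b); ++it) keys.push_back(it->first);
    return keys;
}

// Deliberately terrible hasher: every key lands in the same bucket.
struct AlwaysZeroHash {
    std::size_t operator()(int) const { return 0; }
};
```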

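To change the hash function, I would allocate/construct a new container with the new hash function and copy the elements over from the old one.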
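Building on that idea, one way to get a collision-free table for a fixed key set is to try a family of hash functions and keep the first container in which no bucket holds more than one element. The sketch below assumes 64-bit integer keys and int values and uses a seeded mixer built from the splitmix64 finalizer constants; SeededHash, FrozenMap, is_collision_free, and build_collision_free are hypothetical names, and this is a simple random search rather than a true (minimal) perfect hashing scheme. Because bucket_count() is implementation-defined, the search result is specific to the standard library it runs on:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <unordered_map>

// Seeded hasher: mixes the key with the seed using the splitmix64 finalizer constants.
struct SeededHash {
    std::uint64_t seed = 0;
    std::size_t operator()(std::uint64_t key) const {
        std::uint64_t x = key + 0x9e3779b97f4a7c15ULL * (seed + 1);
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return static_cast<std::size_t>(x ^ (x >> 31));
    }
};

using FrozenMap = std::unordered_map<std::uint64_t, int, SeededHash>;

// True if no bucket holds more than one element.
bool is_collision_free(const FrozenMap& m) {
    for (std::size_t n = 0; n < m.bucket_count(); ++n)
        if (m.bucket_size(n) > 1) return false;
    return true;
}

// Tries seeds 0, 1, 2, ...: constructs a fresh container with the new hasher,
// copies the elements over, and returns the first one without collisions.
std::optional<FrozenMap> build_collision_free(const std::map<std::uint64_t, int>& src,
                                              std::size_t min_buckets,
                                              std::uint64_t max_tries) {
    for (std::uint64_t seed = 0; seed < max_tries; ++seed) {
        FrozenMap m(min_buckets, SeededHash{seed});
        m.insert(src.begin(), src.end());
        if (is_collision_free(m)) return m;
    }
    return std::nullopt;  // no seed worked; the caller can retry with more buckets
}
```

The design trades memory for fewer retries: for k keys in b buckets, a well-mixed hash is collision-free with probability of roughly exp(-k²/(2b)), so about 100 keys in 10000 buckets succeeds on most seeds. Whether zero collisions actually speeds up lookups is what the experiment has to measure, since common std::unordered_map implementations are node-based and still follow a pointer per lookup.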

Answer 2

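If you have many reads and few writes, you can use a sorted vector as a map. This is a common technique, because std::lower_bound over a sorted vector is typically faster than a std::map lookup, and the vector uses less memory: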

bool your_less_function( const your_type &a, const your_type &b )
{
  // based on keys
  return ( a < b );
}
...
std::vector<your_type> ordered_vector;

When adding values:

...
// First 100 values
ordered_vector.push_back( value );
...
// Finally: the vector must be sorted before any lookup.
std::sort( ordered_vector.begin(), ordered_vector.end(), your_less_function );

When querying data:

std::vector<your_type>::iterator iter = std::lower_bound( ordered_vector.begin(), ordered_vector.end(), value, your_less_function );
// lower_bound returns the first element not less than value,
// so the value is absent if we hit the end or the element is greater than value.
if ( ( iter == ordered_vector.end() ) || your_less_function( value, *iter ) )
{
  // you did not find the value
}
else
{
  // iter points to the value
}

Unfortunately it is ordered rather than hashed (lookups are O(log n)), but it is very fast.
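A self-contained variant of the same idea, storing key/value pairs and comparing by key only. Entry, KeyLess, freeze, and find_value are hypothetical names; KeyLess has a second overload so std::lower_bound can search with a bare key instead of a dummy element:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;

// Orders entries by key only; the (Entry, int) overload lets lower_bound take a bare key.
struct KeyLess {
    bool operator()(const Entry& a, const Entry& b) const { return a.first < b.first; }
    bool operator()(const Entry& a, int key) const { return a.first < key; }
};

// Sorts once, after the initialization phase and before any lookup.
void freeze(std::vector<Entry>& v) { std::sort(v.begin(), v.end(), KeyLess{}); }

// Binary search; returns a pointer to the value for key, or nullptr if absent.
const std::string* find_value(const std::vector<Entry>& v, int key) {
    auto it = std::lower_bound(v.begin(), v.end(), key, KeyLess{});
    if (it == v.end() || it->first != key) return nullptr;
    return &it->second;
}
```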

Answer 3

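The number of collisions depends on the number of buckets. According to {{3}}, would it help to use the rehash function to set the number of buckets to 100?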
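One way to check that empirically: rehash(n) only guarantees bucket_count() >= n (and large enough for max_load_factor()), and the actual count is implementation-defined, so 100 buckets for 100 keys does not by itself rule out collisions. With a well-mixed hash and as many buckets as keys, roughly a third of the keys would be expected to share a bucket. A minimal sketch that rehashes and then measures; BucketStats and rehash_and_measure are hypothetical names, and the result depends on the platform's std::hash:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

struct BucketStats {
    std::size_t bucket_count;  // buckets actually allocated after rehash
    std::size_t shared;        // elements beyond the first in each bucket
};

template <class Map>
BucketStats rehash_and_measure(Map& m, std::size_t n) {
    // rehash(n) guarantees bucket_count() >= n, but the exact count is
    // implementation-defined (for example a prime or a power of two).
    m.rehash(n);
    BucketStats s{m.bucket_count(), 0};
    for (std::size_t b = 0; b < s.bucket_count; ++b) {
        if (m.bucket_size(b) > 1) s.shared += m.bucket_size(b) - 1;
    }
    return s;
}
```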

unordered_map with collisions forbidden
