I'm confused about the hash tables in unordered containers... or at least in sets.
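So, I thought it would work like this:
I hash my object. I take the hash modulo the vector length (hash % vectorlength) and use that as the index into the hash table, which holds pointers to my objects and is, as far as I know, a vector...
So for a simple hash function that just returns the int member of an int wrapper, it would look like this: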
hash table:
vector index: [0, 1, 2, 3, 4]
| | | | |
object with value...: 0 1 2 3 4
I wrote a program to test this:
#include <iostream>
#include <unordered_set>
struct Obj
{
public:
Obj(int i)
{
mem = i;
}
friend bool operator==(const Obj& o1, const Obj& o2)
{
return (o1.mem == o2.mem);
}
friend std::ostream& operator<<(std::ostream& o, const Obj& obj)
{
o << obj.mem;
return o;
}
int mem;
};
namespace std
{
template<>
struct hash<Obj>
{
size_t operator()(const Obj& r) const
{
size_t hash = r.mem;
return hash;
}
};
}
int main()
{
std::unordered_set<Obj> map;
for (int i = 0; i < 5; ++i)
{
map.insert(Obj(i));
}
for(auto it = map.begin(); it != map.end(); ++it)
{
std::cout << (*it) << std::endl;
}
}
I expected the output
0
1
2
3
4
but I got:
4
3
2
1
0
Why?
Answer 0 (score: 1)
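You expect an unordered container to have an ordering. It has no specified or guaranteed ordering. As you have discovered, your implementation took advantage of that freedom and implemented something other than the naive hash table design you described. Another implementation might do something else again. You simply cannot rely on it.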
Answer 1 (score: 1)
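While Standard Library implementations can do anything they like, it's also interesting to see where your assumptions, and those expressed in several comments, match the actual implementation.
I can reproduce your non-"0 1 2 3 4" result with GCC, but only by adding map.reserve(6) or more (oddly, 5 produced "4 0 1 2 3").
The following details the behaviour of the GCC version I looked at...
Digging for an explanation, I first checked whether the logical buckets hold what the hash function implies: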
for (size_t i = 0; i < map.bucket_count(); ++i)
{
std::cout << '[' << i << ']';
for (auto it = map.begin(i); it != map.end(i); ++it)
std::cout << ' ' << *it;
std::cout << '\n';
}
And, indeed, they did:
[0] 0
[1] 1
[2] 2
[3] 3
[4] 4
[5]
[6]
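So the comment suggesting that "the Standard Library is free to apply any invertible function on top of your hash function, and no guarantee is made about ordering", while true, isn't what's happening here.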
Digging into the Standard Library headers, I found the reason in the documentation in bits/hashtable.h:
* Here's _Hashtable data structure, each _Hashtable has:
* - _Bucket[] _M_buckets
* - _Hash_node_base _M_before_begin
* - size_type _M_bucket_count
* - size_type _M_element_count
*
* with _Bucket being _Hash_node* and _Hash_node constaining:
* - _Hash_node* _M_next
* - Tp _M_value
* - size_t _M_code if cache_hash_code is true
*
* In terms of Standard containers the hastable is like the aggregation of:
* - std::forward_list<_Node> containing the elements
* - std::vector<std::forward_list<_Node>::iterator> representing the buckets
*
* The non-empty buckets contain the node before the first bucket node. This
* design allow to implement something like a std::forward_list::insert_after
* on container insertion and std::forward_list::erase_after on container
* erase calls. _M_before_begin is equivalent to
* std::foward_list::before_begin. Empty buckets are containing nullptr.
* Note that one of the non-empty bucket contains &_M_before_begin which is
* not a derefenrenceable node so the node pointers in buckets shall never be
* derefenrenced, only its next node can be.
*
* Walk through a bucket nodes require a check on the hash code to see if the
* node is still in the bucket. Such a design impose a quite efficient hash
* functor and is one of the reasons it is highly advise to set
* __cache_hash_code to true.
*
* The container iterators are simply built from nodes. This way incrementing
* the iterator is perfectly efficient independent of how many empty buckets
* there are in the container.
*
* On insert we compute element hash code and thanks to it find the bucket
* index. If the element must be inserted on an empty bucket we add it at the
* beginning of the singly linked list and make the bucket point to
* _M_before_begin. The bucket that used to point to _M_before_begin, if any,
* is updated to point to its new before begin node.
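So the hash table backing unordered_set keeps its values in a singly linked list and stores a vector of iterators into that list, rather than the commonly envisaged vector<forward_list<>>.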
When you insert elements, they go into the forward list at the front, and it's that list you iterate over when going from begin() to end(), without any involvement of the vector of iterators whose ordering corresponds to the hash values.
The code here illustrates how iteration returns values in reverse insertion order, regardless of hashes/collisions, as long as enough space has been reserve()d up front to avoid rehashing.