Question

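This may be a silly question, but here goes:

I am hashing a dictionary of words into an unordered_set-based hash table. My hash function is deliberately "bad", in that all strings containing the same set of letters hash to the same value. I originally tried to override the normal hash function behavior and use a "frequency histogram" of the letters in each word as the hash value (which I learned is impossible :)), but one of the threads suggested using a 26-bit bitmask to achieve the same effect. The hash function has worked fine and dandy so far.

For example, in my scheme, CITIED and CITED hash to the same value, 1049144. My idea is that, given a set of letters, I want to find all the words that contain exactly that set of letters.

I guess I have not fully understood the concept of hashing (or my code is plain wrong), because I cannot explain the behavior I am seeing:
I decided to look for all the words made up of the letters in the string "LIVEN". My output (with the hash key next to each word) was as follows: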

VENVILLE,4215328  
LEVIN,4215328  
ENLIVEN,4215328  
CURTSEYED,37486648

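How on earth did CURTSEYED land in there? As you can see, it has a different hash value from the other three words. Where is the flaw in my understanding or implementation of the hash table?

The code that produced the above output: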


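    typedef std::unordered_set<std::string, my_string_hash_function, my_string_equality> DictHash;
    DictHash dict;
    DictHash::const_local_iterator c_l_itr;

    // Walk the bucket that "LIVEN" maps to.
    DictHash::size_type bs = dict.bucket(std::string("LIVEN"));
    for (c_l_itr = dict.cbegin(bs); c_l_itr != dict.cend(bs); ++c_l_itr)
        // The posted statement is cut off after std::cout; printing "word,hash"
        // is inferred from the output shown above.
        std::cout << *c_l_itr << "," << my_string_hash_function()(*c_l_itr) << std::endl;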



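My hash function:

    struct my_string_hash_function
    {
        std::size_t operator()(const std::string& s) const
        {
            // Build a bitmask with one bit per distinct letter:
            // 'A' sets bit 1 (2 << 0), 'Z' sets bit 26 (2 << 25).
            // Only meaningful for strings of uppercase 'A'..'Z'.
            unsigned long hash = 0;
            std::string::const_iterator itr;

            for (itr = s.begin(); itr != s.end(); ++itr)
                hash |= 2UL << (*itr - int('A'));

            return hash;
        }
    };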
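
As a quick sanity check of the numbers quoted above, the following sketch recomputes the letter mask for the words mentioned in the question. The names `letter_mask` and `check_quoted_values` are made up for this illustration, and the sketch assumes uppercase A to Z input only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same scheme as my_string_hash_function: one bit per distinct letter,
// with 'A' mapped to bit 1 (2 << 0) and 'Z' to bit 26 (2 << 25).
// Hypothetical helper name; assumes s holds only 'A'..'Z'.
std::size_t letter_mask(const std::string& s)
{
    std::size_t mask = 0;
    for (char c : s)
        mask |= std::size_t(2) << (c - 'A');
    return mask;
}

// Recomputes the hash values quoted in the question.
void check_quoted_values()
{
    assert(letter_mask("CITED") == 1049144);
    assert(letter_mask("CITIED") == 1049144);    // repeated letters add nothing
    assert(letter_mask("LIVEN") == 4215328);
    assert(letter_mask("VENVILLE") == 4215328);  // same letter set as LIVEN
    assert(letter_mask("CURTSEYED") == 37486648);
}
```

Calling `check_quoted_values()` from `main` passes silently, which confirms that the hash values themselves are as expected and that CURTSEYED really does hash differently from LIVEN.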


Comparison function: [my_string_equality definition not shown]

Answer 1

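Different hash values do not necessarily end up in different buckets. Typically, a hash table picks a bucket based on hash_value % number_of_buckets, so hash values that are equal modulo the number of buckets end up in the same bucket.

Basically, you have no guarantee about which hash values show up in which bucket.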
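
One way to act on this, sketched below, is to keep walking the bucket but skip every entry whose hash differs from the hash of the search key. `LetterMaskHash` mirrors the hash function from the question; `Dict` and `same_letter_words` are names invented for this example. The sketch also assumes that the implementation chooses the bucket from the hash value alone (as common standard library implementations do), so that all words with the same mask share a bucket.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Mirrors the question's hash: one bit per distinct uppercase letter.
struct LetterMaskHash
{
    std::size_t operator()(const std::string& s) const
    {
        std::size_t mask = 0;
        for (char c : s)
            mask |= std::size_t(2) << (c - 'A');
        return mask;
    }
};

// Default equality (std::equal_to<std::string>) is used here.
using Dict = std::unordered_set<std::string, LetterMaskHash>;

// Returns, in sorted order, the stored words whose letter mask equals
// that of `key`. Entries that merely share the bucket are filtered out.
std::vector<std::string> same_letter_words(const Dict& dict, const std::string& key)
{
    std::vector<std::string> out;
    if (dict.empty())
        return out;

    const LetterMaskHash hasher;
    const std::size_t wanted = hasher(key);
    const std::size_t b = dict.bucket(key);  // index in [0, bucket_count())

    for (auto it = dict.cbegin(b); it != dict.cend(b); ++it)
        if (hasher(*it) == wanted)           // same bucket does not imply same hash
            out.push_back(*it);

    std::sort(out.begin(), out.end());
    return out;
}
```

With a dictionary holding VENVILLE, LEVIN, ENLIVEN and CURTSEYED, `same_letter_words(dict, "LIVEN")` returns only the first three, whether or not CURTSEYED happens to share their bucket.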

Answer 2

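I think you also have a potential bug in my_string_equality... don't you just want to use the regular std::string::operator==()? AFAIK you should compare the actual object values rather than their hash values (the container already knows the hash value; it can simply call my_string_hash_function and compare the results if that is what it needs to do).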
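
To illustrate why the choice of equality predicate matters, here is a small sketch comparing two sets that share the same hash function. `SameMaskEqual` is a hypothetical predicate that compares hash values; it is not the asker's actual my_string_equality, which is not shown. `LetterMaskHash`, `ByValue`, `ByMask` and `stored_count` are likewise names made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Mirrors the question's hash: one bit per distinct uppercase letter.
struct LetterMaskHash
{
    std::size_t operator()(const std::string& s) const
    {
        std::size_t mask = 0;
        for (char c : s)
            mask |= std::size_t(2) << (c - 'A');
        return mask;
    }
};

// Hypothetical predicate that treats strings with equal hashes as equal.
struct SameMaskEqual
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return LetterMaskHash()(a) == LetterMaskHash()(b);
    }
};

// Default predicate: std::equal_to<std::string>, i.e. operator==.
using ByValue = std::unordered_set<std::string, LetterMaskHash>;
using ByMask  = std::unordered_set<std::string, LetterMaskHash, SameMaskEqual>;

// Inserts all words into a set of the given type and reports how many survive.
template <class Set>
std::size_t stored_count(const std::vector<std::string>& words)
{
    Set s(words.begin(), words.end());
    return s.size();
}
```

For the words CITED and CITIED, `stored_count<ByValue>` yields 2, while `stored_count<ByMask>` yields 1, because the second word is rejected as a duplicate of the first. A set meant to hold every dictionary word therefore needs value equality.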

Using unordered_set to prevent keys with different hash values from landing in the same bucket
