Question

#include <iostream>
#include <iomanip>
#include <string>
#include <vector>

using namespace std;

class Item {
public:
    Item(const string & v): value(v), next(0) { }
    string value;
    Item * next;
};

int hash_function(const string & s)
{
    unsigned int hashval = 0;
    int i = s.length();
    while (i > 0)
    {
        hashval += s[--i];
    }
    return hashval % 101;
}

int main()
{
    string name;
    int index;
    Item * p;

    vector<Item *> bucket(101);

    for (index = 0; index < 101; index++)
        bucket[index] = 0;

    while (cin >> name) {
        p = new Item(name);
        index = hash_function(name);

        // push front
        if (bucket[index] != 0)
            p->next = bucket[index];
        bucket[index] = p;
    }

    for (index = 0; index < 101; index++)
        if (bucket[index] != 0) {
            cout << setw(3) << index << ": ";
            p = bucket[index];
            while (p != 0) {
                cout << p->value << " ";
                p = p->next;
            }
            cout << endl;
        }

    Item * temp;
    for (index = 0; index < 101; index++) {
        p = bucket[index];
        while (p != 0) {
            temp = p;
            p = p->next;
            delete temp;
        }
    }
}

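My code has two very simple hash functions, and I'm working on the one that isn't commented out, because it seemed to do better in testing. I want a set of input names to be spread evenly, each into its own bucket. So far that seems to work, except for names that start with the same letter: for example, Amy and Alice end up in the same bucket, and so on.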

Here is a sample run.

Input:
Alice
Amy  
Barry
Carrie
David
Garret 
Edward
Henry
Ingrid
Fred

Output:
 65: Amy Alice
 66: Barry 
 67: Carrie 
 68: David 
 69: Edward 
 70: Fred 
 71: Garret 
 72: Henry 
 73: Ingrid

What can I add to the algorithm so that Amy and Alice each get their own bucket?

Answer 1

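Your function hash_function doesn't actually return a value. You should pay more attention to your compiler's warnings!

Apparently it happens to have the effect of returning the first character of the string. That is pure chance: falling off the end of a non-void function is undefined behavior. On another platform it might always return zero, or make your computer explode. (Probably not the latter.)

As for making a better hash function: once you fix this bug, you will no longer see hash values that depend only on the first character. However, you will find that, for example, "Brian" and "Brain" hash to the same value. That's the next thing you should think about.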
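To see the anagram problem concretely, here is a minimal sketch of the additive hash (the question's loop, reduced modulo 101). Because addition ignores order, any two strings made of the same letters, such as "Brian" and "Brain", produce the same sum and therefore land in the same bucket. The name additive_hash is hypothetical, used only for this illustration.

```cpp
#include <string>

// Additive hash as in the question: sum the character codes, then
// reduce modulo the table size (101 buckets).
// additive_hash is a hypothetical name for this illustration.
unsigned int additive_hash(const std::string & s)
{
    unsigned int hashval = 0;
    for (char c : s)
        hashval += static_cast<unsigned char>(c);
    return hashval % 101;
}

// The order of the characters never changes a sum, so anagrams collide:
// additive_hash("Brian") == additive_hash("Brain")
```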

Answer 2

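Instead of blindly adding up the letters, give each letter a weight, so that cpp, pcp and ppc all produce different hash values.

Here is a slightly improved version: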

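#include <cmath>   // for std::fmod

int hash_function(const string & s)
{
    double hashval = 0;
    int i = s.length();
    double weight = 1.0;
    while (i > 0)
    {
        // Characters nearer the start of the string get larger weights,
        // so "cpp", "pcp" and "ppc" produce different sums
        hashval += weight * static_cast<unsigned char>(s[--i]);
        weight *= 1.5;
    }
    // Reduce to the table size so the result is a valid index into bucket
    return static_cast<int>(std::fmod(hashval, 101.0));
}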

This assumes the string s is not too long; otherwise the computation overflows!
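One way to sidestep the overflow, offered here as an alternative technique rather than as this answer's code, is to keep the idea of position-dependent weights but use integer arithmetic and reduce modulo the table size after every step (Horner's method for a polynomial hash). The running value then never exceeds the table size, so the string length no longer matters. The multiplier 31 and the name weighted_hash are assumptions chosen for illustration.

```cpp
#include <string>

// Polynomial (Horner-style) string hash.
// The multiplier 31 and table size 101 are illustrative choices;
// weighted_hash is a hypothetical name.
int weighted_hash(const std::string & s)
{
    const unsigned int table_size = 101;
    const unsigned int multiplier = 31;
    unsigned int hashval = 0;
    for (char c : s) {
        // Reducing at every step keeps hashval < table_size,
        // so hashval * multiplier + c can never overflow.
        hashval = (hashval * multiplier + static_cast<unsigned char>(c)) % table_size;
    }
    return static_cast<int>(hashval);
}
```

To try it, call weighted_hash(name) instead of hash_function(name) in the question's main(); the result is always a valid index into the 101-element bucket vector, and "Amy" and "Alice" no longer share a bucket.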

Answer 3

Take a look at these (recommended by Google sparsehash): Bob Jenkins, http://burtleburtle.net/bob/hash/ , or Paul Hsieh, http://www.azillionmonkeys.com/qed/hash.html
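For a taste of what those pages cover, here is a sketch of Bob Jenkins's "one-at-a-time" hash, one of the simpler functions in his hash articles, adapted to std::string. The function names one_at_a_time and bucket_index are my own (hypothetical); consult the linked pages for the authoritative versions and for stronger functions.

```cpp
#include <cstdint>
#include <string>

// Bob Jenkins's "one-at-a-time" hash over the bytes of s.
// Each byte is mixed in with shifts and XORs, so the result depends
// on both the characters and their order.
std::uint32_t one_at_a_time(const std::string & s)
{
    std::uint32_t hash = 0;
    for (char c : s) {
        hash += static_cast<unsigned char>(c);
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    // Final mixing step
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}

// Hypothetical adapter for the question's 101-bucket table.
int bucket_index(const std::string & s)
{
    return static_cast<int>(one_at_a_time(s) % 101);
}
```

Computing the full 32-bit value first and reducing modulo 101 only at the end keeps the mixing independent of the table size, so the same function works if you later change the number of buckets.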

Answer 4

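Try weighting the letters differently. In your current implementation (assuming it worked, as discussed above), the names ab and ba would hash to the same value. Something like:

unsigned int hash = 0;
for (size_t i = 0; i < str.length(); ++i)
    hash = hash + hash + str[i];   // i.e. hash = 2 * hash + str[i]

would typically return different values for two strings made of the same letters in a different order, but is still very simple.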
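Written out as a drop-in replacement for the question's hash_function, that suggestion might look like the sketch below. The modulo 101 reduction is added here as an assumption so the result is a valid bucket index. Doubling the running value before adding each character makes earlier characters count more than later ones, so "ab" and "ba" no longer collide. Unsigned overflow simply wraps around (which is well defined), but after about 32 more characters the contribution of the first one has been shifted out entirely, a limitation of this very simple scheme.

```cpp
#include <string>

// hash = hash + hash + str[i], i.e. hash = 2 * hash + str[i],
// reduced modulo 101 so it can index the question's bucket vector.
int hash_function(const std::string & s)
{
    unsigned int hashval = 0;
    for (char c : s)
        hashval = hashval + hashval + static_cast<unsigned char>(c);
    return static_cast<int>(hashval % 101);
}
```

With this version, "Amy" hashes to 94 and "Alice" to 98, so they end up in different buckets.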

Creating a better hash function
