Hash function results are slightly off

Time: 2014-12-04 00:04:54

Tags: c hash

I am working on hashing in C.

int main(int argc, char *argv[]) {
    FILE *nums;
    FILE *out;
    nums = fopen("data.txt", "r");
    out = fopen("results.txt", "w");

    int i, j, used = 0, unused = 0, collision = 0, longChain = 0, chain = 0,
        bucketCount, numKeys, temp = 0;
    float avgChain = 0;

    fscanf(nums, "%d", &bucketCount);
    fscanf(nums, "%d", &numKeys);

    int buckets[bucketCount];

    for(i = 0; i <= bucketCount; i++)
        buckets[i] = 0;
    for(i = 0; i <= numKeys; i++) {
        fscanf(nums, "%d", &temp);
        j = hash(temp, numKeys);
        buckets[j]++;   
    }

    for(i = 0; i <= bucketCount; i++) {
        if(buckets[i] != 0) {
            used++;
            collision = collision + (buckets[i] - 1);
        }
        if(longChain < buckets[i])
            longChain = buckets[i];
        chain = chain + buckets[i];
    }
    avgChain = (double)chain / (double)used;

    fprintf(out, "----------------------------------\n");
    fprintf(out, "H A S H  S T A T I S T I C S\n");
    fprintf(out, "----------------------------------\n");
    fprintf(out, "Bucket Count: %d\n", bucketCount);
    fprintf(out, "Key Count: %d\n", numKeys);
    fprintf(out, "Used Bucket Count: %d\n", used);
    fprintf(out, "Unused Bucket Count: %d\n", bucketCount - used);
    fprintf(out, "Collision Count: %d\n", collision);
    fprintf(out, "Longest Chain Length: %d\n", longChain);
    fprintf(out, "Average Chain Length: %0.3f\n", avgChain);

    fclose(nums);
    fclose(out);

    return 0;
}

Here is the hash function itself:

int hash(int i, int j) {
    int temp;
    temp = (i % j);
    return temp;
}

We were given test data (I can post it if needed, but it is fairly large) along with the expected results for that data. The expected results are:

--------------------------------------
  H A S H I N G  S T A T I S T I C S 
--------------------------------------
Bucket count                      9997
Key count                        10000
Used bucket count                 6334
Unused (empty) bucket count       3663
Collision count                   3666
Longest chain length                 7
Average chain length             1.579
--------------------------------------

These are the results I actually get when I run my program:

--------------------------------------
  H A S H I N G  S T A T I S T I C S 
--------------------------------------
Bucket count                      9997
Key count                        10000
Used bucket count                 6336
Unused (empty) bucket count       3661
Collision count                   3662
Longest chain length                 7
Average chain length             1.578
--------------------------------------

I am very frustrated, because I cannot seem to get the correct results unless I hard-code the numbers into the program, which is not allowed.

What am I doing wrong?

1 answer:

Answer 0 (score: 0)

According to your program's output:

Bucket count                      9997
Key count                        10000

so the array is

int buckets[9997];

However, hash() is called with numKeys (10000) as the modulus, so it returns

data % 10000;

So if it is called as

j = hash(9999, 10000);

the array will be indexed as

buckets[9999]++;

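which is out of bounds. Passing bucketCount instead of numKeys as the second argument keeps every index within the array.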
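To make the fix concrete, here is a minimal sketch of the counting logic with the modulus taken over the bucket count. The names `hash_key`, `collect_stats`, and `struct hash_stats` are hypothetical and not part of the original program, and the sketch has not been run against the assignment's data file. It also uses `<` rather than `<=` as the loop bound, since the `<=` loops in the question touch `buckets[bucketCount]` and read one key more than the file declares.

```c
#include <assert.h>
#include <stdlib.h>

/* Summary of how the keys are spread over the buckets.
   (Hypothetical type, introduced only for this sketch.) */
struct hash_stats {
    int used;        /* buckets holding at least one key */
    int collisions;  /* keys that landed in an already occupied bucket */
    int longest;     /* largest number of keys in a single bucket */
    double average;  /* keys per used bucket */
};

/* Map a key to an index in [0, bucket_count). */
static int hash_key(int key, int bucket_count)
{
    int r = key % bucket_count;
    return r < 0 ? r + bucket_count : r;  /* keep negative keys in range */
}

/* Hash num_keys keys into bucket_count buckets and fill *out.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
static int collect_stats(const int *keys, int num_keys, int bucket_count,
                         struct hash_stats *out)
{
    assert(keys != NULL && out != NULL);
    if (bucket_count <= 0 || num_keys < 0)
        return -1;

    /* calloc zero-initializes, so no separate clearing loop is needed. */
    int *buckets = calloc((size_t)bucket_count, sizeof *buckets);
    if (buckets == NULL)
        return -1;

    /* Strictly less than num_keys: exactly num_keys keys are counted. */
    for (int i = 0; i < num_keys; i++)
        buckets[hash_key(keys[i], bucket_count)]++;

    int used = 0, collisions = 0, longest = 0, total = 0;
    /* Strictly less than bucket_count: valid indices end at bucket_count - 1. */
    for (int i = 0; i < bucket_count; i++) {
        if (buckets[i] != 0) {
            used++;
            collisions += buckets[i] - 1;
        }
        if (buckets[i] > longest)
            longest = buckets[i];
        total += buckets[i];
    }
    free(buckets);

    out->used = used;
    out->collisions = collisions;
    out->longest = longest;
    out->average = used > 0 ? (double)total / used : 0.0;
    return 0;
}
```

In the original main(), the equivalent change would be calling hash(temp, bucketCount) and bounding the loops with i < bucketCount and i < numKeys. Checking the return value of fscanf before using temp would also guard against a short or malformed data file.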