I am working on hashing in C.
int main(int argc, char *argv[]) {
    FILE *nums;
    FILE *out;
    nums = fopen("data.txt", "r");
    out = fopen("results.txt", "w");
    int i, j, used = 0, unused = 0, collision = 0, longChain = 0, chain = 0,
        bucketCount, numKeys, temp = 0;
    float avgChain = 0;
    fscanf(nums, "%d", &bucketCount);
    fscanf(nums, "%d", &numKeys);
    int buckets[bucketCount];
    for(i = 0; i <= bucketCount; i++)
        buckets[i] = 0;
    for(i = 0; i <= numKeys; i++) {
        fscanf(nums, "%d", &temp);
        j = hash(temp, numKeys);
        buckets[j]++;
    }
    for(i = 0; i <= bucketCount; i++) {
        if(buckets[i] != 0) {
            used++;
            collision = collision + (buckets[i] - 1);
        }
        if(longChain < buckets[i])
            longChain = buckets[i];
        chain = chain + buckets[i];
    }
    avgChain = (double)chain / (double)used;
    fprintf(out, "----------------------------------\n");
    fprintf(out, "H A S H S T A T I S T I C S\n");
    fprintf(out, "----------------------------------\n");
    fprintf(out, "Bucket Count: %d\n", bucketCount);
    fprintf(out, "Key Count: %d\n", numKeys);
    fprintf(out, "Used Bucket Count: %d\n", used);
    fprintf(out, "Unused Bucket Count: %d\n", bucketCount - used);
    fprintf(out, "Collision Count: %d\n", collision);
    fprintf(out, "Longest Chain Length: %d\n", longChain);
    fprintf(out, "Average Chain Length: %0.3f\n", avgChain);
    fclose(nums);
    fclose(out);
    return 0;
}
Here is the hash function itself:
int hash(int i, int j) {
    int temp;
    temp = (i % j);
    return temp;
}
We were given test data (I can post it if needed, but it is quite large) along with the expected results for that data. The expected results are:
--------------------------------------
H A S H I N G S T A T I S T I C S
--------------------------------------
Bucket count 9997
Key count 10000
Used bucket count 6334
Unused (empty) bucket count 3663
Collision count 3666
Longest chain length 7
Average chain length 1.579
--------------------------------------
These are the results I actually get when I run my program:
--------------------------------------
H A S H I N G S T A T I S T I C S
--------------------------------------
Bucket count 9997
Key count 10000
Used bucket count 6336
Unused (empty) bucket count 3661
Collision count 3662
Longest chain length 7
Average chain length 1.578
--------------------------------------
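I am very frustrated, because I can't seem to get the correct results unless I hard-code the numbers into the project, which is not allowed.
What am I doing wrong?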
Answer 0 (score: 0)
According to your program's output:
Bucket count 9997
Key count 10000
so the array is
int buckets[9997];
However, hash() is called with numKeys rather than bucketCount as its second argument, so it returns
data % 10000;
so if it is called as
j = hash(9999, 10000);
the array will be indexed as
buckets[9999]++;
which is out of bounds.
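To make the fix concrete, here is a minimal sketch of the counting logic with the modulus changed to the bucket count. It also uses `<` instead of `<=` in the loops, because the question's loops touch buckets[bucketCount] (one past the end of the array) and read numKeys + 1 keys. The names hash_stats and compute_stats are hypothetical helpers introduced only for this illustration, and the sketch assumes the keys are non-negative and already loaded into an array. It has not been run against the original data file, so treat it as a starting point rather than a verified solution.

```c
#include <stdlib.h>

/* Hypothetical container for the statistics the program reports. */
struct hash_stats {
    int used;           /* buckets holding at least one key */
    int collisions;     /* keys beyond the first in each used bucket */
    int longest_chain;  /* largest number of keys in a single bucket */
    double avg_chain;   /* total keys / used buckets */
};

/* Same hash as in the question. The second argument must be the bucket
   count so the result is always a valid index (assumes key >= 0). */
static int hash(int key, int bucket_count) {
    return key % bucket_count;
}

/* Fills *out and returns 0 on success, or -1 on bad arguments or
   allocation failure. */
static int compute_stats(const int *keys, int num_keys, int bucket_count,
                         struct hash_stats *out) {
    if (keys == NULL || out == NULL || bucket_count <= 0 || num_keys < 0)
        return -1;

    /* calloc zero-initialises, so no separate clearing loop is needed. */
    int *buckets = calloc((size_t)bucket_count, sizeof *buckets);
    if (buckets == NULL)
        return -1;

    /* Exactly num_keys keys: i < num_keys, not i <= num_keys. */
    for (int i = 0; i < num_keys; i++)
        buckets[hash(keys[i], bucket_count)]++;

    int used = 0, collisions = 0, longest = 0, total = 0;
    /* Valid indices are 0 .. bucket_count - 1. */
    for (int i = 0; i < bucket_count; i++) {
        if (buckets[i] != 0) {
            used++;
            collisions += buckets[i] - 1;
        }
        if (buckets[i] > longest)
            longest = buckets[i];
        total += buckets[i];
    }
    free(buckets);

    out->used = used;
    out->collisions = collisions;
    out->longest_chain = longest;
    out->avg_chain = used ? (double)total / used : 0.0;
    return 0;
}
```

In the original program, the equivalent change would be to call hash(temp, bucketCount) and to replace each `<=` loop condition with `<`. The unused bucket count is then bucket_count minus the used count, as in the question's output code.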