C哈希表大小调整问题

时间:2017-11-30 17:25:08

标签: c hash

首先我要说的是我非常基本的C语言。我使用tutorialspoint网站上的代码实现了哈希表。

我的程序正在从文本文件中读取正整数,将文件中出现的次数和次数分别作为键和值。它实际上很好用,但我遇到了一个大问题:我需要事先知道哈希表的大小。我注意到,当我跳过一个负数时,它奇怪地弄乱了桌子的结构。

这是我的代码(如果使用20的大小):

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>

#define SIZE 20

struct DataItem {
  int key;
  int data;
};

struct DataItem* hashArray[SIZE];
struct DataItem* item;

int hashCode(int key) {
  return key % SIZE;
}

struct DataItem *search(int key) {
  int hashIndex = hashCode(key);

  while(hashArray[hashIndex] != NULL) {

    if(hashArray[hashIndex]->key == key)
    return hashArray[hashIndex];

    ++hashIndex;
    hashIndex %= SIZE;
  }

  return NULL;
}

void insert(int key,int data) {
  struct DataItem *item = (struct DataItem*) malloc(sizeof(struct DataItem));
  item->data = data;
  item->key = key;
  int hashIndex = hashCode(key);

  while(hashArray[hashIndex] != NULL && hashArray[hashIndex]->key != -1) {
    ++hashIndex;
    hashIndex %= SIZE;
  }

  hashArray[hashIndex] = item;
}

void display() {
  int key_index;
  for(key_index = 0; key_index<SIZE; key_index++) {
    if(hashArray[key_index] != NULL) {
      printf("%d has occurred %d time(s)\n", hashArray[key_index]->key, hashArray[key_index]->data);
    }
  }
  printf("\n");
}

void sort(size_of_struct) {
  int i, tempkey, tempdata;

  if(size_of_struct == 1) {
    return;
  }

  for(i=0; i<(size_of_struct-1); i++) {
    if(hashArray[i] != NULL) {
      if(hashArray[i]->key > hashArray[i+1]->key) {
        tempkey = hashArray[i]->key;
        hashArray[i]->key = hashArray[i+1]->key;
        hashArray[i+1]->key = tempkey;

        tempdata = hashArray[i]->data;
        hashArray[i]->data = hashArray[i+1]->data;
        hashArray[i+1]->data = tempdata;
      }
    }
  }
  sort(size_of_struct-1);
}

int main() {
  FILE *myFile;
  myFile = fopen("test.txt", "r");

  int *numberArray = NULL;
  int i, count, number, current_num, temp;
  count = number = temp = 0;

  numberArray = malloc(sizeof(int)*500);

  while(fscanf(myFile, "%d", &number) > 0) {
    if(number >= 0 && number <= 100) {
        numberArray[count] = number;
        count++;
    }
  }

  for(i=0; i < count; i++)
  {
    item = search(numberArray[i]);
    if(item == NULL) {
      insert(numberArray[i], 1);
    } else {
      item = search(numberArray[i]);
      item->data += 1;
    }
    free(numberArray);
    numberArray = malloc(sizeof(int)*500);
  }

  display();
  sort(SIZE);
}

如果我运行此代码,我会得到以下结果:

80 has occurred 1 time(s)
100 has occurred 1 time(s)
2 has occurred 1 time(s)
3 has occurred 2 time(s)
84 has occurred 1 time(s)
12 has occurred 1 time(s)
33 has occurred 1 time(s)
99 has occurred 3 time(s)

Segmentation fault: 11

如果我使用SIZE为8(这是我在文件中使用的数字总数),那么我得到以下结果:

80 has occurred 1 time(s)
33 has occurred 1 time(s)
2 has occurred 1 time(s)
99 has occurred 3 time(s)
3 has occurred 2 time(s)
12 has occurred 1 time(s)
100 has occurred 1 time(s)
84 has occurred 1 time(s)

2 has occurred 1 time(s)
3 has occurred 2 time(s)
12 has occurred 1 time(s)
33 has occurred 1 time(s)
80 has occurred 1 time(s)
84 has occurred 1 time(s)
99 has occurred 3 time(s)
100 has occurred 1 time(s)

经过进一步调查,我发现在我的显示功能中发生了这种情况:

key_index: 0
80 has occurred 1 time(s)
key_index: 1
100 has occurred 1 time(s)
key_index: 2
2 has occurred 1 time(s)
key_index: 3
3 has occurred 2 time(s)
key_index: 4
84 has occurred 1 time(s)
key_index: 5
key_index: 6
key_index: 7
key_index: 8
key_index: 9
key_index: 10
key_index: 11
key_index: 12
12 has occurred 1 time(s)
key_index: 13
33 has occurred 1 time(s)
key_index: 14
key_index: 15
key_index: 16
key_index: 17
key_index: 18
key_index: 19
99 has occurred 3 time(s)

Segmentation fault: 11

我不完全确定发生了什么。我知道为什么发生分段错误:它试图引用该索引处的null值。我不知道的是为什么(显示函数内部)跳过5-11和14-18的key_index值;这是我真正关心的问题。我假设因为我没有使用文件中的所有数字,结构正试图填充它,以便总有20个空格。我错了吗?有没有办法解决这个问题?

我的test.txt文件:

99 2 99
3
80 12 -12 33
3 99 100 1234 84

1 个答案:

答案 0 :(得分:1)

嘿,看起来你在那里遇到了分段错误。 Darn,那些可能非常讨厌追踪。正如@underscore_d提到的那样,您可以通过调试器运行程序,以检查代码是否在飞行中。

但是,幸运的是,我们提供了一个(大多数)Minimal, Complete and Verifiable示例供我们查看。我提供了自己的test.txt文件,但能够重现崩溃。现在,这可能不是您所看到的崩溃,因为我没有test.txt,但希望以下分析可以帮助您。

由于您的示例很简短,您还可以考虑通过一些分析工具运行代码。我定期通过AddressSanitizerValgrind运行我的所有代码,以避免引入这类错误。在您的情况下,AddressSanitizer精确定位错误的位置。

$ gcc -fsanitize=address -g test.c -o test
$ ./test
80 has occurred 1 time(s)

ASAN:DEADLYSIGNAL
=================================================================
==18300==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x561f6050b39f bp 0x7ffe4d6866b0 sp 0x7ffe4d686690 T0)
    #0 0x561f6050b39e in sort /home/v1bri/test.c:67
    #1 0x561f6050bae2 in main /home/v1bri/test.c:111
    #2 0x7f60938ee3f0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x203f0)
    #3 0x561f6050acb9 in _start (/home/v1bri/test+0xcb9)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/v1bri/test.c:67 in sort
==18300==ABORTING

错误发生在sort()函数的第67行。果然......

void sort(size_of_struct) {
// ...

  for(i=0; i<(size_of_struct-1); i++) {
    if(hashArray[i] != NULL) {
      if(hashArray[i]->key > hashArray[i+1]->key) {

您在NULL处取消引用指针时忘记检查hashArray[i+1]。 Valgrind会给你带来更多错误,但也会找到罪魁祸首。

$ gcc -g test.c -o test
$ valgrind --tool=memcheck ./test
==18764== Memcheck, a memory error detector
==18764== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==18764== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info

<snip>

==18764== Invalid read of size 4
==18764==    at 0x108B0D: sort (test.c:67)
==18764==    by 0x108DA7: main (test.c:111)
==18764==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==18764==
==18764==
==18764== Process terminating with default action of signal 11 (SIGSEGV)
==18764==  Access not within mapped region at address 0x0
==18764==    at 0x108B0D: sort (test.c:67)
==18764==    by 0x108DA7: main (test.c:111)

我知道你提到你知道&#34;非常基本的,基本的C&#34;但希望你能看到使用现代调试工具测试程序错误是多么容易。学习工具对于新的程序员来说可能是令人生畏的,但这实际上是学习的最佳时间,因为它可以为你节省很多时间。