SuperFastHash returns different hashes for equal strings, but only when computed by different function calls

Time: 2015-07-13 11:02:48

Tags: c hashtable hash-function

So, here is my SFH function:

/*  
 * Hash function (found at: 'http://www.azillionmonkeys.com/qed/hash.html')  
 */ 
int32_t SuperFastHash(const char * data, int len)  {
    uint32_t hash = len, tmp;
    int rem;

    if (len <= 0 || data == NULL) return 0;

    rem = len & 3;
    len >>= 2;

    /* Main loop */
    for (;len > 0; len--) {
        hash  += get16bits (data);
        tmp    = (get16bits (data+2) << 11) ^ hash;
        hash   = (hash << 16) ^ tmp;
        data  += 2*sizeof (uint16_t);
        hash  += hash >> 11;
    }

    /* Handle end cases */
    switch (rem) {
        case 3: hash += get16bits (data);
                hash ^= hash << 16;
                hash ^= ((signed char)data[sizeof (uint16_t)]) << 18;
                hash += hash >> 11;
                break;
        case 2: hash += get16bits (data);
                hash ^= hash << 11;
                hash += hash >> 17;
                break;
        case 1: hash += (signed char)*data;
                hash ^= hash << 10;
                hash += hash >> 1;
    }

    /* Force "avalanching" of final 127 bits */
    hash ^= hash << 3;
    hash += hash >> 5;
    hash ^= hash << 4;
    hash += hash >> 17;
    hash ^= hash << 25;
    hash += hash >> 6;

    // Limits hashes to be within the hash table    
    return hash % HT_LENGTH; 
}

It seems to work fine (I haven't changed anything except the last line).

Here is my function that loads the dictionary into the hash table. The hash table itself also seems to work.

bool load(const char* dictionary)
{
    // declares file pointer
    FILE* dictptr = fopen(dictionary, "r");

    // declare temp index
    uint32_t index = 0;

    // read words, one by one
    while(true)
    {

        // malloc node
        node* new_node = malloc(node_size);

        // insert word into node, if fscanf couldn't scan word; we're done
        if (fscanf(dictptr, "%s", new_node->word) != 1)
        {
            return true;
        }

        // hash word - HASH FUNCTION CALL -
        index = SuperFastHash(&new_node->word[0], sizeof(new_node->word));

        // check if head node has been assigned with value
        if (!strcmp(hashtable[index].word,""))
        {
            // declare hashtable[index] to new_node
            hashtable[index] = *new_node;

            //increment size
            hashtablesize++;
        }

        else
        {
            // if node is initialized, insert after head 
            new_node->next = hashtable[index].next;
            hashtable[index].next = new_node;

            //increment size
            hashtablesize++;
        }
    } 
}

Finally, my check function, which looks a word up in the hash table.

bool check(const char* keyword)
{

    // gets index from SFH
    uint32_t index = SuperFastHash(keyword, sizeof(keyword));

    // declares head pointer to the pointer of the index'd element of hashtable
    node* head = &hashtable[index];

    // if word of head is equal to keyword, return true 
    // else continue down chain till head is null or key is found
    while (head != NULL)
    {
        if (!strcmp(head->word, keyword))
        {
            return true;
        }
        head = head->next;
    }
    return false;
}

Note: everything works when I use a different hash function, so I suspect the problem lies in the len argument or in the SFH function itself.

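I've checked with lldb: the index returned for, say, "cat" in check does not equal the index at which "cat" resides in the hash table, that is, the index returned by the function call in load.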

1 answer:

Answer 0 (score: 1)

A few things...

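  1. As the commenters mentioned, sizeof() does not give you the length of the string. In load it yields the size of the word buffer (assuming word is a char array), and in check it yields the size of a pointer, so the two calls hash a different number of bytes for the same word. Use strlen() instead. For example, change

    index = SuperFastHash(&new_node->word[0], sizeof(new_node->word));

    to

    index = SuperFastHash(&new_node->word[0], strlen(new_node->word));

    and make the same change to the call in check.
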
  2. You never call fclose() after reading the dictionary file. If fopen() succeeds, you should call fclose().

  3. The following code looks a bit suspicious:

    // check if head node has been assigned with value
    if (!strcmp(hashtable[index].word,""))
    {
        // declare hashtable[index] to new_node
        hashtable[index] = *new_node;
    
        //increment size
        hashtablesize++;
    }
    
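     If the hash table is fully initialized at the start, do you need to increment hashtablesize in this branch? If it is not fully initialized, calling strcmp() on an entry that has not been initialized yet could be a problem. You don't show the declaration or initialization code, so it is not 100% clear whether this is actually an issue, but it is worth double-checking.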
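
To make point 1 concrete, here is a minimal sketch of why the two calls disagree. The names demo_node, WORD_MAX, toy_hash, len_in_load, len_in_check and hash_word are hypothetical, the buffer size of 46 is an assumption (the question does not show the node declaration), and toy_hash is FNV-1a standing in for SuperFastHash. Any hash that reads exactly len bytes behaves the same way.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORD_MAX 45 /* assumed capacity; the question does not show it */

/* Hypothetical node layout, assuming word is a fixed-size char array. */
typedef struct demo_node {
    char word[WORD_MAX + 1];
    struct demo_node *next;
} demo_node;

/* Stand-in for SuperFastHash (FNV-1a): it reads exactly len bytes. */
static uint32_t toy_hash(const char *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)data[i];
        h *= 16777619u;
    }
    return h;
}

/* Length as load() computes it: the full array size, whatever the word is.
 * Bytes after the terminator are uninitialized when the node comes from malloc. */
static size_t len_in_load(const demo_node *n)
{
    return sizeof(n->word);
}

/* Length as check() computes it: the size of a pointer, not of the string. */
static size_t len_in_check(const char *keyword)
{
    return sizeof(keyword);
}

/* Suggested fix: every caller derives the length from the string itself. */
static uint32_t hash_word(const char *word)
{
    return toy_hash(word, strlen(word));
}
```

With this layout, load would hash 46 bytes and check would hash as many bytes as a pointer occupies (8 on a typical 64-bit platform), regardless of the word. Routing both through something like hash_word means "cat" hashes over the same 3 bytes in both places.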
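
Points 2 and 3 could be addressed along the following lines. This is only a sketch under stated assumptions: demo_table, demo_count, demo_index, demo_load and demo_check are hypothetical names, TABLE_LEN stands in for HT_LENGTH, and the table is assumed to be a static array of nodes (which C zero-initializes, so every head starts with an empty word and a NULL next pointer). The hash is again FNV-1a rather than SuperFastHash.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORD_MAX 45    /* assumed capacity; must match the %45s below */
#define TABLE_LEN 1024 /* assumed stand-in for HT_LENGTH */

typedef struct demo_node {
    char word[WORD_MAX + 1];
    struct demo_node *next;
} demo_node;

/* Static storage is zero-initialized: word is "" and next is NULL everywhere. */
static demo_node demo_table[TABLE_LEN];
static unsigned demo_count;

/* Bucket index computed from strlen(), so load and check always agree. */
static unsigned demo_index(const char *word)
{
    unsigned h = 2166136261u;
    for (size_t i = 0, n = strlen(word); i < n; i++) {
        h ^= (unsigned char)word[i];
        h *= 16777619u;
    }
    return h % TABLE_LEN;
}

static bool demo_load(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return false;

    char buf[WORD_MAX + 1];
    while (fscanf(fp, "%45s", buf) == 1) {
        demo_node *head = &demo_table[demo_index(buf)];
        if (head->word[0] == '\0') {
            /* Empty head slot: store the word in place, no allocation. */
            strcpy(head->word, buf);
        } else {
            /* Occupied: allocate a node and chain it right after the head. */
            demo_node *n = malloc(sizeof *n);
            if (n == NULL) {
                fclose(fp);
                return false;
            }
            strcpy(n->word, buf);
            n->next = head->next;
            head->next = n;
        }
        demo_count++;
    }

    fclose(fp); /* close the file on every path where fopen succeeded */
    return true;
}

static bool demo_check(const char *keyword)
{
    for (const demo_node *n = &demo_table[demo_index(keyword)]; n != NULL; n = n->next)
        if (strcmp(n->word, keyword) == 0)
            return true;
    return false;
}
```

Reading into a local buffer first means a node is only allocated when a chain entry is actually needed, which also avoids leaking the node that the original loop allocates just before fscanf fails at end of file.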