Question

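I have implemented binary search, linear search, and a hash table to compare their time complexities. The problem is that when I time lookups of prime numbers, my hash table is much slower than binary search. Here is my code: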

#include <algorithm>  // std::find
#include "HashTable.h"

// Make the hash table 20 times the number of prime numbers
HashTable::HashTable(std::vector<int> primes)
{
    int tablesize = primes.size() * 20;
    table = new std::list<int>[tablesize];
    size = tablesize;
    for (auto &prime : primes)
        this->insert(prime);
}

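// Hash function: maps key to a bucket index in [0, size)
int HashTable::hash(int key)
{
    int index = key % size;
    // % yields a negative result for negative keys; shift it back into range
    return index < 0 ? index + size : index;
}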

// Finds element
int HashTable::find(int key)
{
    // Get index from hash
    int index = hash(key);

    // Find element
    std::list<int>::iterator foundelement = std::find(table[index].begin(), table[index].end(), key);


    // If element has been found return index
    // If not, return -1
    if (foundelement != table[index].end())
        return index;
    else
        return -1;
}



// Adds element to hashtable
void HashTable::insert(int element)
{
    // Get index from hash and insert the element
    int index = hash(element);
    table[index].push_back(element);
}

HashTable.h

#ifndef HASHTABLE_H
#define HASHTABLE_H

#include <list>
#include <iostream>
#include <vector>

class HashTable 
{
private:
    // Array of buckets: each bucket is a list holding every element whose hash collides there
    std::list<int>* table;

    // Size of hashtable
    int size;

    // Hash function that returns the array location for the given key
    int hash(int key);

public:

    HashTable(int tablesize);
    HashTable(std::vector<int> primes);

    // Adds element to hashtable
    void insert(int element);

    // Deletes an element by key 
    void remove(int key);

    // Returns the bucket index holding key, or -1 if key is not in the table
    int find(int key);

    // Displays the hashtable
    void printTable();

    // Display histogram to illustrate elements distribution
    void printHistogram();

    // Returns the number of lists in hash table
    int getSize();

    // Returns the total number of elements in hash table
    int getNumberOfItems();

    // De-allocates all memory used for the Hash Table.
    ~HashTable();
};

#endif

I've tried oversizing the table to eliminate collisions, but I didn't notice any difference.

This is the result

Answer 1

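A few things about your hash table implementation are suboptimal:

- primes.size() * 20 is excessive: you'll get more cache misses than necessary. Try a range of multipliers between 1 and 2 to find the sweet spot.
- primes.size() * 20 is always even, and every prime you hash with key % size (other than 2) is odd, so it can only land in an odd-numbered bucket. You therefore never use half the buckets, which wastes space and hurts cache performance.
- You handle collisions with linked lists. That means you always follow at least one pointer away from the table's contiguous memory, which is slow, and you jump around in memory again for every node in a collision list. Storing colliding values in a std::vector<int> limits the jumping to one memory area outside the hash table; alternatively, you can use closed hashing / open addressing with displacement lists to find the element in a nearby hash table bucket. My benchmarks have found that to be an order of magnitude faster for similar int values.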
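A minimal sketch of the open-addressing idea from the last point. It uses plain linear probing, a simpler scheme than the displacement lists mentioned above, and sizes the table to the next prime at or above roughly twice the key count (odd, so odd keys can reach every bucket). It assumes non-negative keys (-1 marks an empty slot) and no resizing after construction; `ProbingTable` and `kEmptySlot` are hypothetical names, not part of the question's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sentinel: assumes keys are never negative.
constexpr int kEmptySlot = -1;

// Hypothetical open-addressing table. All keys live in one contiguous
// std::vector, so a lookup reads one slot and usually a few neighbours,
// with no pointer chasing.
class ProbingTable {
public:
    // Capacity is the smallest prime >= 2 * keys.size() + 1, so at least
    // half the slots stay empty and every probe sequence terminates.
    explicit ProbingTable(const std::vector<int>& keys)
        : slots_(nextPrime(keys.size() * 2 + 1), kEmptySlot) {
        for (int k : keys) insert(k);
    }

    // Linear probing: walk forward until the key or an empty slot is found.
    // No resizing, so callers must keep the table from filling up.
    void insert(int key) {
        std::size_t i = indexFor(key);
        while (slots_[i] != kEmptySlot && slots_[i] != key)
            i = (i + 1) % slots_.size();
        slots_[i] = key;
    }

    bool contains(int key) const {
        std::size_t i = indexFor(key);
        while (slots_[i] != kEmptySlot) {
            if (slots_[i] == key) return true;
            i = (i + 1) % slots_.size();
        }
        return false;  // reached an empty slot: key is absent
    }

    std::size_t capacity() const { return slots_.size(); }

private:
    std::vector<int> slots_;

    std::size_t indexFor(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }

    // Smallest prime >= n; trial division is fine for table-sized n.
    static std::size_t nextPrime(std::size_t n) {
        if (n <= 2) return 2;
        if (n % 2 == 0) ++n;
        for (;; n += 2) {
            bool prime = true;
            for (std::size_t d = 3; d * d <= n; d += 2)
                if (n % d == 0) { prime = false; break; }
            if (prime) return n;
        }
    }
};
```

Because every probe stays inside one std::vector<int>, a miss usually costs one or two adjacent reads instead of a pointer hop per list node. Whether that beats binary search on your data is something to benchmark rather than assume.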

Answer 2

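If your data is completely random, it can be hard to find a good constant for the modulo operation. If your data follows some pattern, you may want to run through a range of candidate constants to see which one performs best on your data.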

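In this post, I showed how to set up such a large-scale test. In the end, my hash table averaged 1.5 comparisons per lookup, with a worst case of 14. The table contained 16000 entries, roughly 2^14.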
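A sketch of the candidate-constant sweep described above. It assumes the question's scheme (separate chaining, `key % m`, new keys appended to the end of their bucket) and counts key comparisons for successful lookups, in the spirit of the average/worst comparison counts quoted above. `ChainStats`, `measure`, and `bestModulus` are hypothetical names; this is not the code from the linked post.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Comparison counts for successful lookups in a chained table using key % m.
struct ChainStats {
    double average;           // mean comparisons over all stored keys
    std::size_t worst;        // comparisons needed for the deepest key
    std::size_t usedBuckets;  // buckets holding at least one key
};

// Hypothetical helper: simulate inserting non-negative keys into m chained
// buckets (m >= 1) and report how many comparisons find() would need.
ChainStats measure(const std::vector<int>& keys, std::size_t m) {
    std::vector<std::size_t> chainLength(m, 0);
    std::size_t total = 0, worst = 0;
    for (int k : keys) {
        std::size_t b = static_cast<std::size_t>(k) % m;
        ++chainLength[b];        // key is appended at the end of its chain...
        total += chainLength[b]; // ...so finding it costs that many comparisons
        worst = std::max(worst, chainLength[b]);
    }
    std::size_t used = static_cast<std::size_t>(
        std::count_if(chainLength.begin(), chainLength.end(),
                      [](std::size_t len) { return len > 0; }));
    double avg = keys.empty() ? 0.0 : static_cast<double>(total) / keys.size();
    return {avg, worst, used};
}

// Try every modulus in [lo, hi] (lo >= 1) and return the one with the lowest
// average comparison count; ties keep the smaller modulus.
std::size_t bestModulus(const std::vector<int>& keys, std::size_t lo, std::size_t hi) {
    std::size_t best = lo;
    double bestAvg = measure(keys, lo).average;
    for (std::size_t m = lo + 1; m <= hi; ++m) {
        double avg = measure(keys, m).average;
        if (avg < bestAvg) { bestAvg = avg; best = m; }
    }
    return best;
}
```

Sweeping an even m also makes Answer 1's parity effect visible: for a list of primes, usedBuckets can never exceed m / 2 + 1, because every prime except 2 lands in an odd bucket.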

Answer 3

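It's all about complexity: binary search is O(log n), while your find scans a bucket linearly, so it is O(n) in the worst case, when you have many collisions.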
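Written out, the bounds behind this comparison look as follows, where n is the number of stored keys, m the number of buckets, and L the length of the bucket a lookup hashes to. These are standard textbook bounds given as a sketch; the expected-length term assumes keys spread uniformly over the buckets.

```latex
C_{\text{binary}}(n) \le \lfloor \log_2 n \rfloor + 1,
\qquad
C_{\text{chain}} \le L \le n,
\qquad
\mathbb{E}[L] = \alpha = \frac{n}{m}
```

For example, with n = 16000 keys binary search needs at most \lfloor \log_2 16000 \rfloor + 1 = 13 + 1 = 14 comparisons, while a chain scan only approaches n comparisons when most keys pile into the same bucket.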

My hash table is slower than binary search
