Java implementation - how to improve the speed of a hash table

Time: 2014-09-26 04:30:11

Tags: java performance key hashtable

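I am implementing a hash table (as required by an assignment). It works fine on small inputs, but unfortunately it is too slow on large inputs. I tried BufferedInputStream, but it made no difference. Basically, I implemented it following the logic below. Any ideas on how I can improve the speed? Is there a specific function causing the poor performance? Or do we perhaps need to close the Scanner?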

 int[] table = new int[30000]; // create an array as the table
 Scanner scan = new Scanner(System.in); // use a Scanner to read the input file
 while (scan.hasNextLine()) {
       // read one line at a time, and parse a sequence of ints into an array list called keys
       // function used here: String.split(" ")
 }
 hashFunction {
       // apply middle-squaring to each element of the array list keys
       // Math.pow() and % 30000, which is the table size, to generate the hash value
       // assign table[hashValue] = value
 }

1 answer:

Answer 0 (score: 3)

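First of all, you should find out which part of the program is slow. Optimizing everything is a dumb idea, and optimizing the parts that are already fast is even worse.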
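A rough way to see where the time goes is to time each phase separately. The sketch below is only an illustration: `PhaseTimer` and `timeNanos` are hypothetical names, and the loop that fills `keys` merely stands in for your input parsing. A single run like this is crude (JIT warm-up skews it), so treat the numbers as a hint and reach for a profiler if they are close.

```java
// Hypothetical helper, not part of the question's code.
final class PhaseTimer {
    /** Runs the task once and returns the elapsed wall-clock time in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] keys = new int[1_000_000];
        // Phase 1: stand-in for reading and parsing the input (assumption).
        long fill = timeNanos(() -> {
            for (int i = 0; i < keys.length; i++) {
                keys[i] = i * 31;
            }
        });
        // Phase 2: hashing every key into the table.
        int[] table = new int[30011];
        long insert = timeNanos(() -> {
            for (int key : keys) {
                table[(key & Integer.MAX_VALUE) % table.length] = key;
            }
        });
        System.out.println("filling keys: " + fill / 1_000_000 + " ms");
        System.out.println("inserting:    " + insert / 1_000_000 + " ms");
    }
}
```

If the reading phase dominates, the hash function is not your problem, and vice versa.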

  

"Math.pow() and % 30000, which is the table size"

This is very wrong.

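  • Never use floating-point arithmetic for things like hashing. It is slow and the results are badly distributed.
  • Never use a table size that is neither a power of two nor a prime.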
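To make both points concrete, here is a minimal sketch of middle-squaring done with integer arithmetic only. The class name `MiddleSquare` and the choice of which bits count as "the middle" (a shift by 16, which assumes the keys use most of their 32 bits) are my assumptions, not something taken from the question. The `main` method also shows why `Math.pow` is a poor fit: a `double` keeps only 53 significant bits, so large squares are silently rounded.

```java
// Hypothetical sketch: middle-square hashing without floating point.
final class MiddleSquare {
    static final int TABLE_SIZE = 30011; // a prime instead of 30000

    /** Squares the key exactly in a long and uses the middle 32 bits of the 64-bit result. */
    static int index(int key) {
        long square = (long) key * key;        // exact and never negative
        int middle = (int) (square >>> 16);    // bits 16..47 of the square
        return (middle & Integer.MAX_VALUE) % TABLE_SIZE;
    }

    public static void main(String[] args) {
        int key = Integer.MAX_VALUE;
        long exact = (long) key * key;
        long viaPow = (long) Math.pow(key, 2); // rounded, because a double has a 53-bit mantissa
        System.out.println("exact square:   " + exact);
        System.out.println("Math.pow value: " + viaPow);
        System.out.println("index:          " + index(key));
    }
}
```

Note that with this shift, keys smaller than 256 all land in slot 0, so adjust the shift to the size of your keys.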

You haven't told us what you are hashing and why... so let's assume you need to map pairs of two ints into the table.

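class IntPair {
    private final int x;
    private final int y;

    IntPair(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        // The multiplier must be odd for good results.
        // Its exact value doesn't matter much, but it mustn't equal your table size; ideally, the two should be co-prime.
        return 54321 * x + y;
    }

    @Override
    public boolean equals(Object obj) {
        // Two pairs are equal when both components match.
        if (!(obj instanceof IntPair)) {
            return false;
        }
        IntPair other = (IntPair) obj;
        return x == other.x && y == other.y;
    }
}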

//// Prime table size. The division is slow, but it works slightly better than power of two.

int[] table = new int[30011]; // this is a prime

int hashCodeToIndex(int hashCode) {
    int nonNegative = hashCode & Integer.MAX_VALUE;
    return nonNegative % table.length;
}
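The masking step above matters because a hash code can be negative, and in Java the `%` operator keeps the sign of its left operand. The following sketch (the class name `NonNegativeIndex` is made up for illustration) shows that `Math.abs` does not rescue you for `Integer.MIN_VALUE`, while the mask and `Math.floorMod` both do.

```java
// Hypothetical demo of why the hash code is masked before taking the remainder.
final class NonNegativeIndex {
    static final int TABLE_SIZE = 30011;

    /** Clears the sign bit, then reduces to a valid array index. */
    static int index(int hashCode) {
        return (hashCode & Integer.MAX_VALUE) % TABLE_SIZE;
    }

    public static void main(String[] args) {
        int h = Integer.MIN_VALUE;
        System.out.println(h % TABLE_SIZE);               // negative: unusable as an index
        System.out.println(Math.abs(h) % TABLE_SIZE);     // still negative: Math.abs(MIN_VALUE) == MIN_VALUE
        System.out.println(index(h));                     // 0
        System.out.println(Math.floorMod(h, TABLE_SIZE)); // non-negative alternative (Java 8+)
    }
}
```

`Math.floorMod` gives a different but equally valid slot than the mask; pick one and use it consistently.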

//// Power of two table size. No division, faster.

int[] table2 = new int[1<<15]; // this is 2**15, i.e., 32768

int smear(int hashCode) {
    // doing nothing may be good enough, if the hashCode is well distributed
    // otherwise, see e.g., https://github.com/google/guava/blob/c234ed7f015dc90d0380558e663f57c5c445a288/guava/src/com/google/common/collect/Hashing.java#L46
    return hashCode;
}

int hashCodeToIndex(int hashCode) {
    // the "&" cleans all unwanted bits
    return smear(hashCode) & (table2.length - 1);
}
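When doing nothing in `smear` is not good enough, a cheap option is to XOR the high half of the hash code into the low half, which is the same idea OpenJDK's `java.util.HashMap` uses for its own spreading step. The sketch below is an assumption about what you might plug in, not the Guava code linked above; `SmearDemo` and `distinctSlots` are hypothetical names. It feeds in 100 hash codes that differ only in their high bits and counts how many different slots they occupy.

```java
// Hypothetical demo: effect of a simple smear on a power-of-two table.
final class SmearDemo {
    static final int TABLE_SIZE = 1 << 15;

    /** Folds the upper 16 bits into the lower 16 bits. */
    static int smear(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    static int index(int hashCode) {
        return smear(hashCode) & (TABLE_SIZE - 1);
    }

    /** Counts the slots hit by hash codes 0 << 16, 1 << 16, ..., 99 << 16. */
    static int distinctSlots(boolean useSmear) {
        boolean[] used = new boolean[TABLE_SIZE];
        int distinct = 0;
        for (int k = 0; k < 100; k++) {
            int hashCode = k << 16;
            int slot = useSmear ? index(hashCode) : hashCode & (TABLE_SIZE - 1);
            if (!used[slot]) {
                used[slot] = true;
                distinct++;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        System.out.println("without smear: " + distinctSlots(false)); // 1
        System.out.println("with smear:    " + distinctSlots(true));  // 100
    }
}
```

Without the smear, the mask throws away every bit in which these keys differ, so they all collide in slot 0.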

// an alternative, explanation upon request
// (32 - 15 = 17, so the unsigned shift keeps the top 15 bits, i.e., a value in 0..32767)
int hashCodeToIndex2(int hashCode) {
    return smear(hashCode) >>> 17;
}
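The answer leaves the shift variant unexplained, so take the following as one plausible reading rather than the author's own reasoning. Using the top bits only pays off when `smear` pushes information upward, and multiplication by an odd constant does exactly that: the high bits of a product depend on all the lower bits of the input. In this sketch, `HighBitsIndex`, `TABLE_BITS`, and the multiplier `0x9E3779B9` (an odd constant close to 2^32 divided by the golden ratio) are my own assumed choices.

```java
// Hypothetical sketch: multiplicative smear combined with taking the top bits.
final class HighBitsIndex {
    static final int TABLE_BITS = 15;
    static final int TABLE_SIZE = 1 << TABLE_BITS;
    static final int MULTIPLIER = 0x9E3779B9; // odd; assumed choice, any well-mixed odd constant works

    /** Multiplication overflows on purpose; the useful mixing ends up in the high bits. */
    static int smear(int hashCode) {
        return hashCode * MULTIPLIER;
    }

    /** Keeps the top TABLE_BITS bits, giving a value in 0..TABLE_SIZE-1. */
    static int index(int hashCode) {
        return smear(hashCode) >>> (32 - TABLE_BITS);
    }

    public static void main(String[] args) {
        for (int hashCode = 0; hashCode < 5; hashCode++) {
            System.out.println(hashCode + " -> " + index(hashCode));
        }
        // With an identity smear, every small hash code would map to slot 0:
        System.out.println("identity smear, hashCode 4: " + (4 >>> 17));
    }
}
```

With the do-nothing `smear` shown earlier, this variant would send every hash code below 2^17 to slot 0, so it should only be paired with a smear that mixes into the high bits.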