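I'm sure this is a simple problem, but I can't see the obvious solution... If I have a hash table with m bins and insert n < m keys into it, what is the probability that no bin receives more than k keys? I'm trying to figure out how much rehashing I would need if I fill a table to load n / m and then rehash until no bin holds more than k colliding keys (obviously k > n / m).
Answer 0 (score: 1)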
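Assuming the hash values are uniformly distributed, this is the same as throwing balls into bins, which is studied in "Balls into Bins - A Simple and Tight Analysis" by M. Raab and A. Steger.
This is somewhat related to schemes that use more than one hash function, but here you use only one hash function.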
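For reference, the exact probability can also be written down directly. The following is a sketch under the same uniformity assumption, using the standard counting argument for n distinguishable balls and m bins; it is not taken from the paper. Here $[x^n]$ denotes "coefficient of $x^n$".

```latex
% Probability that no bin receives more than k of the n keys
P(\text{max load} \le k)
  = \frac{n!}{m^{n}} \, [x^{n}] \left( \sum_{j=0}^{k} \frac{x^{j}}{j!} \right)^{m}

% Special case k = 1 (no collisions at all, the birthday problem)
P(\text{max load} \le 1)
  = \prod_{i=0}^{n-1} \left( 1 - \frac{i}{m} \right)
  = \frac{m!}{(m-n)! \, m^{n}}

% Worked example: n = 10, m = 30, k = 1
\prod_{i=0}^{9} \left( 1 - \frac{i}{30} \right) \approx 0.1846
\quad\Longrightarrow\quad
P(\text{rehash}) \approx 0.8154
```

The k = 1 example agrees with the simulated value for 10 balls and 30 bins listed in the output further down.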
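Since this is stackoverflow.com, here is a simulation program that you can use to verify the formula. According to it, the probability also depends on the absolute number of balls and bins, not only on the average number of balls per bin.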
import java.util.Arrays;
import java.util.Random;

// The class name is arbitrary; it is only needed to make the program compile.
public class RehashSimulation {

    public static void main(String... args) {
        for (int k = 1; k < 4; k++) {
            test(10, 30, k);
            test(100, 300, k);
        }
    }

    // Estimates how often at least one bin ends up with more than k balls.
    public static void test(int ballCount, int binCount, int k) {
        int rehashCount = 0;
        Random r = new Random(1);
        int testCount = 100000000 / ballCount;
        for (int test = 0; test < testCount; test++) {
            long[] balls = new long[ballCount];
            int[] bins = new int[binCount];
            for (int i = 0; i < ballCount; i++) {
                balls[i] = r.nextLong();
            }
            // it's very unlikely to get duplicates, but test
            Arrays.sort(balls);
            for (int i = 1; i < ballCount; i++) {
                if (balls[i - 1] == balls[i]) {
                    throw new AssertionError();
                }
            }
            int universalHashId = 0;
            boolean rehashNeeded = false;
            for (int i = 0; i < ballCount; i++) {
                long x = balls[i];
                // might as well use x directly
                int y = supplementalHashWeyl(x, universalHashId);
                int binId = reduce(y, binCount);
                if (++bins[binId] > k) {
                    rehashNeeded = true;
                    break;
                }
            }
            if (rehashNeeded) {
                rehashCount++;
            }
        }
        System.out.println("balls: " + ballCount + " bins: " + binCount +
                " k: " + k + " rehash probability: " + (double) rehashCount / testCount);
    }

    // Maps a 32-bit hash to the range [0, n) without a modulo operation.
    public static int reduce(int hash, int n) {
        // http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
        return (int) (((hash & 0xffffffffL) * n) >>> 32);
    }

    // Mixes the key with an index, so a different index gives a different hash function.
    public static int supplementalHashWeyl(long hash, long index) {
        long x = hash + (index * 0xbf58476d1ce4e5b9L);
        x = (x ^ (x >>> 32)) * 0xbf58476d1ce4e5b9L;
        x = ((x >>> 32) ^ x);
        return (int) x;
    }
}
Output:
balls: 10 bins: 30 k: 1 rehash probability: 0.8153816
balls: 100 bins: 300 k: 1 rehash probability: 1.0
balls: 10 bins: 30 k: 2 rehash probability: 0.1098305
balls: 100 bins: 300 k: 2 rehash probability: 0.777381
balls: 10 bins: 30 k: 3 rehash probability: 0.0066018
balls: 100 bins: 300 k: 3 rehash probability: 0.107309
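The simulated values above can be cross-checked without random sampling. The following is a minimal sketch, assuming the keys land in bins uniformly and independently; the class and method names (`ExactRehashProbability`, `rehashProbability`) are made up for illustration. It processes the bins one at a time: with b bins still open, the number of remaining balls that land in the current bin is binomially distributed with success probability 1/b.

```java
public class ExactRehashProbability {

    // Probability that some bin receives more than k of n balls thrown
    // uniformly and independently into m bins (hypothetical helper).
    public static double rehashProbability(int n, int m, int k) {
        // dp[r] = probability that r balls are still unplaced and no bin
        // processed so far holds more than k balls
        double[] dp = new double[n + 1];
        dp[n] = 1.0;
        for (int binsLeft = m; binsLeft > 1; binsLeft--) {
            // chance that a remaining ball lands in the current bin
            double p = 1.0 / binsLeft;
            double[] next = new double[n + 1];
            for (int r = 0; r <= n; r++) {
                if (dp[r] == 0.0) {
                    continue;
                }
                // binomial(r, p) probability that the current bin gets i balls
                double pmf = Math.pow(1.0 - p, r);
                for (int i = 0; i <= Math.min(k, r); i++) {
                    next[r - i] += dp[r] * pmf;
                    pmf *= (double) (r - i) / (i + 1) * p / (1.0 - p);
                }
            }
            dp = next;
        }
        // the last bin takes every remaining ball, so at most k may remain
        double ok = 0.0;
        for (int r = 0; r <= Math.min(k, n); r++) {
            ok += dp[r];
        }
        return 1.0 - ok;
    }

    public static void main(String[] args) {
        int[][] cases = {{10, 30}, {100, 300}};
        for (int k = 1; k < 4; k++) {
            for (int[] c : cases) {
                System.out.println("balls: " + c[0] + " bins: " + c[1] + " k: " + k
                        + " rehash probability: " + rehashProbability(c[0], c[1], k));
            }
        }
    }
}
```

The running time is O(m * n * k), so the cases above finish instantly. For very large n the starting term `Math.pow(1.0 - p, r)` may underflow to zero, in which case a log-space variant would be needed.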