Question

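A few weeks ago I was asked the following question in an interview, and we couldn't figure it out.

"Given a stream of numbers that is too large to fit in memory, return a number from the stream at random. The only methods you can use on the stream are boolean hasNext() and int getNext(). The range of the numbers is of type long."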

Answer 1

Given the constraints of the problem, it is reasonable to assume that the algorithm looks something like this:

// Returns a uniformly distributed number between 0 and 1
double rand();

// Some unknown function that we are looking for
double f(long);

long choose() {
    long cur = 0;   // Number that we've chosen so far
    long pos = 0;   // Just tracks the position where we picked
                    // cur, not actually used
    long count = 0; // Number of the current iteration
    while (hasNext()) {
        int next = getNext();
        if (rand() < f(count)) {
            cur = next;
            pos = count;
        }
        count++;
    }
    return cur;
}

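So the question becomes: how do we implement f to get a uniform distribution? Clearly, after count iterations have been executed, the variable pos should be uniformly distributed between 0 and count - 1. That is,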

prob(pos, count) = (pos <= count - 1) ? 1 / count : 0

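If hasNext() returns false on the next iteration, this invariant guarantees that cur has the desired distribution. If, on the other hand, hasNext() returns true and our invariant holds, we will pick the next element with probability f(count). Therefore, if we want the invariant to still hold at the end of that iteration, we should use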

double f(long count) {
    return 1 / (double) (count + 1);
}
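Putting choose() and this f together gives the technique commonly known as reservoir sampling with a reservoir of size one. The following is only a minimal sketch under stated assumptions: NumberStream, ArrayStream, StreamSampler, histogram and ReservoirDemo are hypothetical names introduced for illustration, and the array-backed stream merely stands in for the real stream from the question.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in for the stream interface described in the question.
interface NumberStream {
    boolean hasNext();
    int getNext();
}

// Illustrative array-backed stream, used only to exercise the sampler.
class ArrayStream implements NumberStream {
    private final int[] data;
    private int index = 0;

    ArrayStream(int... data) { this.data = data; }

    public boolean hasNext() { return index < data.length; }
    public int getNext() { return data[index++]; }
}

class StreamSampler {
    // Replaces the current choice with the element at 0-based position
    // `count` with probability 1 / (count + 1), i.e. f(count).
    // Assumption: the stream is non-empty; an empty stream yields 0.
    static int choose(NumberStream stream, Random rng) {
        int cur = 0;
        long count = 0;
        while (stream.hasNext()) {
            int next = stream.getNext();
            // nextDouble() lies in [0, 1), so the first element is always taken.
            if (rng.nextDouble() < 1.0 / (count + 1)) {
                cur = next;
            }
            count++;
        }
        return cur;
    }

    // Runs choose() `trials` times over the stream 0, 1, ..., n - 1 and
    // counts how often each value is returned.
    static int[] histogram(int n, int trials, Random rng) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        int[] counts = new int[n];
        for (int t = 0; t < trials; t++) {
            counts[choose(new ArrayStream(values), rng)]++;
        }
        return counts;
    }
}

class ReservoirDemo {
    public static void main(String[] args) {
        // Each of the 4 counts should come out roughly equal to trials / 4.
        int[] counts = StreamSampler.histogram(4, 100_000, new Random(42));
        System.out.println(Arrays.toString(counts));
    }
}
```

The sampler keeps only cur and count, so memory use stays constant regardless of how long the stream is, and the stream is read exactly once.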

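We then need to make sure that this choice of f rescales the probabilities of all the elements that came before in the right way. But this is clear, because by our invariant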

prob(pos, count+1) = (pos == count) ? f(count) :
                                      (1 - f(count)) * prob(pos, count)
                   = (pos == count) ? f(count) :
                     (pos <  count) ? (1 - f(count)) * 1 / count : 0
                   = (pos == count) ? f(count) :
                     (pos <  count) ? count / (count + 1) / count : 0
                   = (pos == count) ? f(count) :
                     (pos <  count) ? f(count) : 0
                   = (pos <= count + 1 - 1) ? f(count) : 0
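The same result can be cross-checked without induction by following a single element through the whole stream. This is only a sketch: the symbols n (the total number of elements in the stream) and i (the 0-based position of an element) are assumptions introduced here and do not appear in the answer above. Element i is the one returned exactly when it is picked at iteration i and never replaced afterwards, so

```latex
\Pr[\text{element } i \text{ is returned}]
  = f(i) \prod_{k=i+1}^{n-1} \bigl(1 - f(k)\bigr)
  = \frac{1}{i+1} \prod_{k=i+1}^{n-1} \frac{k}{k+1}
  = \frac{1}{i+1} \cdot \frac{i+1}{n}
  = \frac{1}{n}
```

The product telescopes, so every position ends up with probability 1/n, which agrees with the last line of the derivation when count + 1 = n.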

How do I choose an element uniformly at random from a stream of unknown size?
