Question

问题是，给定一个包含40亿个唯一整数的输入文件，提供一个算法来生成一个未包含在文件中的整数，假设只有10 MB的内存。

搜索下面的一些解决方案和发布的代码，其中之一是将整数存储到位向量块中（每个块表示40亿范围内的特定整数范围，块中的每个位表示整数），以及使用每个块的另一个计数器，计算每个块中的整数数。因此，如果整数的数量小于整数的块容量，则扫描块的位向量以找到缺少整数的块。

我对这个解决方案的疑问是，为什么＆＃34;我们选择的中间距离越近，在任何给定时间内使用的内存越少＆＃34; ，这里有更多的背景，

第一遍中的数组可以容纳10兆字节，或大约2 ^ 23字节的内存。由于数组中的每个元素都是int，而int是4个字节，因此我们可以容纳一个最多约2 ^ 21个元素的数组。因此，我们可以推断出以下内容：

因此，我们可以得出如下结论： 2 ^ 10＆LT; rangeSize＆lt; 2 ^ 26，这些条件为我们提供了大量的摆动空间，＆＃34;但是我们选择的中间距离越近，在任何给定时间内使用的内存就越少。

public class QuestionB {
    public static int bitsize = 1048576; // 2^20 bits (2^17 bytes)
    public static int blockNum = 4096; // 2^12
    public static byte[] bitfield = new byte[bitsize/8];
    public static int[] blocks = new int[blockNum];

    public static void findOpenNumber() throws FileNotFoundException {
        int starting = -1;
        Scanner in = new Scanner (new FileReader ("Chapter 10/Question10_3/input_file_q10_3.txt"));
        while (in.hasNextInt()) {
            int n = in.nextInt();
            blocks[n / (bitfield.length * 8)]++;
        }

        for (int i = 0; i < blocks.length; i++) {
            if (blocks[i] < bitfield.length * 8){
                /* if value < 2^20, then at least 1 number is missing in
                 * that section. */
                starting = i * bitfield.length * 8;
                break;
            }
        }

        in = new Scanner(new FileReader("Chapter 10/Question10_3/input_file_q10_3.txt"));
        while (in.hasNextInt()) {
            int n = in.nextInt();
            /* If the number is inside the block that’s missing 
             * numbers, we record it */
            if (n >= starting && n < starting + bitfield.length * 8) {
                bitfield [(n-starting) / 8] |= 1 << ((n - starting) % 8);
            }
        }

        for (int i = 0 ; i < bitfield.length; i++) {
            for (int j = 0; j < 8; j++) {
                /* Retrieves the individual bits of each byte. When 0 bit 
                 * is found, finds the corresponding value. */
                if ((bitfield[i] & (1 << j)) == 0) {
                    System.out.println(i * 8 + j + starting);
                    return;
                }
            }
        }       
    }

    public static void main(String[] args) throws FileNotFoundException {
        findOpenNumber();
    }

}

Answer 1

如果形成尺寸为2 ^ 32 / M的M个块，则所需的总存储量为M + 2 ^ 27 / M个字（32位）。当M =√2^ 27时，该函数达到最小值，该值在1和2 ^ 27个块之间。最小值为2 ^ 14.5个字，大约92 KB。

这在bilogarithmic情节中非常清楚。

Answer 2

我喜欢这个问题。我会额外考虑，但我认为如果磁盘空间和时间不是问题，你可以将数字分成100k块，并在每个文件中对它们进行排序。任何没有100k条目的块都会有间隙。它根本不优雅，但它会让球滚动。

使用有限的内存找到丢失的号码

2 个答案: