External sort of random numbers

Time: 2017-09-05 15:12:23

Tags: c++ algorithm sorting random binaryfiles

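I need to write a program that generates N random numbers and writes them to a binary file in descending order. It has to be done without any sorting algorithm that works in main memory. This is what I have so far: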

#include <iostream>
#include <fstream> 
#include <ctime>
#include <cstdlib>

using namespace std;
int main () {
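  srand(time(0));
  rand();  // discard the first value after seeding
  int N;
  do{
    cout << "Unesite N: ";  // "Enter N: "
    cin >> N;
    } while(N<=0);

  // raw native-endian ints, in generation (unsorted) order
  ofstream br("broj.dat", ios::binary | ios::trunc);

  for(int i = 0; i<N; i++){
    int a = rand();
    br.write((char *)&a, sizeof(a));
  }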
  br.close();

  return 0;
}

So I have generated the random numbers and written them to a binary file, but I don't know how to sort it.
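Whichever approach is used, the result can be checked without loading the whole file into memory. Here is a minimal sketch. It assumes the file holds native-endian `int`s, as written by the code above. The function name `isSortedDescending` is made up for illustration.

```cpp
#include <fstream>
#include <string>

// Returns true if the binary file holds ints in non-increasing order.
// Reads one int at a time, so memory use does not depend on the file size.
// A trailing partial int (fewer than sizeof(int) bytes) is ignored.
bool isSortedDescending(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;  // missing or unreadable file
    int prev = 0, cur = 0;
    if (!in.read(reinterpret_cast<char*>(&prev), sizeof prev)) return true;  // empty file
    while (in.read(reinterpret_cast<char*>(&cur), sizeof cur)) {
        if (cur > prev) return false;  // an increase breaks descending order
        prev = cur;
    }
    return true;
}
```

For example, `isSortedDescending("broj.dat")` will almost always be false for the unsorted output of the question's program.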

3 answers:

Answer 0 (score: 4)

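You can generate the numbers already in sorted order, in linear time, so no sorting step is needed at all. The paper describing how to do this is "Generating Sorted Lists of Random Numbers" by Bentley & Saxe: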

https://pdfs.semanticscholar.org/2dbc/4e3f10b88832fcd5fb88d34b8fb0b0102000.pdf
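The recurrence used in the Java class rests on a short derivation. The notation ($X_j$, $U_i$, $M_i$, $k$) is my own, not the paper's. The largest of $k$ independent uniform values has CDF $x^k$, so by inverse-transform sampling it can be drawn directly. Once the largest value $M$ is fixed, the remaining $k-1$ values are uniform on $[0, M)$, and the same step repeats with a smaller $k$:

```latex
P\bigl(\max(X_1,\dots,X_k) \le x\bigr) = x^k \quad (0 \le x \le 1)
\;\Longrightarrow\; \max(X_1,\dots,X_k) \overset{d}{=} U^{1/k}, \qquad U \sim \mathcal{U}(0,1)

M_0 = 1, \qquad M_i = M_{i-1}\, U_i^{1/(N-i+1)}, \qquad i = 1,\dots,N
```

Each $M_i$ costs $O(1)$, so all $N$ values come out in descending order in $O(N)$ time. `Math.pow(Math.E, Math.log(u) / k)` equals $u^{1/k}$.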

import java.util.Random;

/**
 * Generates a list of random doubles in descending order (from 1 toward 0),
 * given the size of the list being requested.
 *
 * This is an implementation of an algorithm developed by Bentley and Saxe and
 * published in ACM Transactions on Mathematical Software (vol. 6, issue 3, 1980)
 * as "Generating Sorted Lists of Random Numbers".
 */
public class SortedRandomDoubleGenerator {
    private final Random rng = new Random();  // source of uniform doubles
    private long         valsFound;
    private double       curMax;
    private final long   numVals;

    /**
     * Instantiate a generator of sorted random doubles.
     *
     * @param numVals the size of the list of sorted random doubles to be
     *        generated
     */
    public SortedRandomDoubleGenerator(long numVals) {
        curMax = 1.0;
        valsFound = 0;
        this.numVals = numVals;
    }

    /**
     * @return the next random number, in descending order; call this at most
     *         numVals times.
     */
    public double getNext() {
        // The largest of k = numVals - valsFound uniform values below curMax
        // is curMax * U^(1/k); 1.0 - nextDouble() keeps U in (0, 1].
        double u = 1.0 - rng.nextDouble();
        curMax = curMax * Math.pow(u, 1.0 / (numVals - valsFound));
        valsFound++;
        return curMax;
    }
}
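Since the question is in C++, here is a rough C++ port of the same recurrence. It writes the numbers straight to the binary file, so there is nothing left to sort.

This is a sketch under a few assumptions. It produces doubles in [0, 1] rather than the `rand()` ints from the question. Scaling each value by a constant and truncating to an integer keeps the order non-increasing. `writeSortedRandom` and the `seed` parameter are my own names, not from the answer or the paper.

```cpp
#include <cmath>
#include <cstddef>
#include <fstream>
#include <random>

// Bentley-Saxe style generator: each next() returns the next of numVals
// uniform random doubles, largest first, in O(1) time and memory.
class SortedRandomDoubleGenerator {
public:
    explicit SortedRandomDoubleGenerator(std::size_t numVals,
                                         unsigned seed = std::random_device{}())
        : numVals_(numVals), rng_(seed) {}

    // Must be called at most numVals times (the exponent divides by zero after that).
    double next() {
        // 1 - [0, 1) gives U in (0, 1], so pow never sees 0 in the normal case.
        double u = 1.0 - dist_(rng_);
        // The max of k = numVals_ - found_ uniforms below curMax_ is curMax_ * U^(1/k).
        curMax_ *= std::pow(u, 1.0 / static_cast<double>(numVals_ - found_));
        ++found_;
        return curMax_;
    }

private:
    std::size_t numVals_;
    std::size_t found_ = 0;
    double curMax_ = 1.0;
    std::mt19937_64 rng_;
    std::uniform_real_distribution<double> dist_{0.0, 1.0};
};

// Writes n doubles to a binary file, already in descending order.
void writeSortedRandom(const char* path, std::size_t n) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    SortedRandomDoubleGenerator gen(n);
    for (std::size_t i = 0; i < n; ++i) {
        double x = gen.next();
        out.write(reinterpret_cast<const char*>(&x), sizeof x);
    }
}
```

For example, `writeSortedRandom("broj.dat", N)` would replace the question's loop. Only the generator state is kept in memory, however large N is.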

Answer 1 (score: 0)

Here is pseudocode for how I would do it:

for i in 1..N:
    write rand() to a new file
    push (new file, size=1) onto the file stack
    while len(file stack) >= 2 and the top two files have the same size:
        pop the top two and merge them (keeping descending order)
        push (merged file, size=sum of both sizes) onto the file stack

while len(file stack) >= 2:
    pop the top two and merge them
    push (merged file, size=sum of both sizes) onto the file stack

The top of the file stack (now its only entry) is your new sorted file.
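The pseudocode above can be sketched in C++ as follows. It assumes the input file holds native-endian `int`s, as produced by the question's program. `Run`, `mergeRuns`, `externalSortDescending` and the `.runN` temporary-file naming are hypothetical names chosen for this sketch. Only file names and sizes are kept in memory (O(log N) of them). Each merge streams one int at a time from each of its two inputs.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// One sorted run on disk: the file holding it and how many ints it contains.
struct Run {
    std::string path;
    std::size_t size;
};

// Streams two descending runs into `out`, holding one int from each in memory.
static void mergeRuns(const Run& a, const Run& b, const std::string& out) {
    std::ifstream fa(a.path, std::ios::binary), fb(b.path, std::ios::binary);
    std::ofstream fo(out, std::ios::binary | std::ios::trunc);
    int x = 0, y = 0;
    bool hasX = static_cast<bool>(fa.read(reinterpret_cast<char*>(&x), sizeof x));
    bool hasY = static_cast<bool>(fb.read(reinterpret_cast<char*>(&y), sizeof y));
    while (hasX || hasY) {
        // Take the larger head so the output stays in descending order.
        if (hasX && (!hasY || x >= y)) {
            fo.write(reinterpret_cast<const char*>(&x), sizeof x);
            hasX = static_cast<bool>(fa.read(reinterpret_cast<char*>(&x), sizeof x));
        } else {
            fo.write(reinterpret_cast<const char*>(&y), sizeof y);
            hasY = static_cast<bool>(fb.read(reinterpret_cast<char*>(&y), sizeof y));
        }
    }
}

// Sorts the ints in inPath into outPath (which must differ from inPath), descending.
void externalSortDescending(const std::string& inPath, const std::string& outPath) {
    std::vector<Run> stack;  // the "file stack": names and sizes only
    std::size_t nextId = 0;
    auto newName = [&] { return outPath + ".run" + std::to_string(nextId++); };
    auto mergeTopTwo = [&] {
        Run b = stack.back(); stack.pop_back();
        Run a = stack.back(); stack.pop_back();
        Run merged{newName(), a.size + b.size};
        mergeRuns(a, b, merged.path);  // input streams are closed when this returns
        std::remove(a.path.c_str());
        std::remove(b.path.c_str());
        stack.push_back(merged);
    };

    std::ifstream in(inPath, std::ios::binary);
    int v = 0;
    while (in.read(reinterpret_cast<char*>(&v), sizeof v)) {
        Run r{newName(), 1};
        std::ofstream(r.path, std::ios::binary | std::ios::trunc)
            .write(reinterpret_cast<const char*>(&v), sizeof v);
        stack.push_back(r);
        // Like incrementing a binary counter: merge equal-sized neighbours.
        while (stack.size() >= 2 &&
               stack[stack.size() - 1].size == stack[stack.size() - 2].size)
            mergeTopTwo();
    }
    in.close();
    while (stack.size() >= 2) mergeTopTwo();

    std::remove(outPath.c_str());  // std::rename may fail if the target exists
    if (stack.empty()) {
        std::ofstream empty(outPath, std::ios::binary | std::ios::trunc);  // empty input
    } else {
        std::rename(stack.back().path.c_str(), outPath.c_str());
    }
}
```

For example, call `externalSortDescending("broj.dat", "broj_sorted.dat")`. Like the pseudocode, this starts from one-element runs, so it creates about 2N temporary files. Each value is rewritten about log2(N) times: simple, but slow for large N.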

Answer 2 (score: 0)

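The standard library has a merge sort (std::stable_sort), but it needs random-access iterators. If you can use mmap (or your platform's equivalent), a pointer into the mapped file gives you random-access iterators. (Yes, I know COUNT should be read from the command line.)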

#include <algorithm>
#include <cstdio>
#include <functional>
#include <random>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

const size_t COUNT = 4096 * 4096;

int main()
{
    // create file (using mmap for simplicity)
    int fd = open("out.dat", O_RDWR | O_TRUNC | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd < 0) {
        std::perror("open failed");
        return 1;
    }
    if (ftruncate(fd, COUNT * sizeof(unsigned)) != 0) {
        std::perror("ftruncate failed");
        close(fd);
        return 1;
    }
    void* mm = mmap(nullptr, COUNT * sizeof(unsigned), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mm == MAP_FAILED) {
        std::perror("mmap failed");
        close(fd);
        return 1;
    }
    close(fd);  // the mapping stays valid after the descriptor is closed

    // populate file
    unsigned* begin = static_cast<unsigned*>(mm);
    std::default_random_engine rng((std::random_device())());
    std::generate_n(begin, COUNT, rng);
    msync(mm, COUNT * sizeof(unsigned), MS_SYNC);
    std::puts("file written");

    // sort file in descending order, as the question asks
    // (std::stable_sort may allocate a temporary buffer of up to COUNT elements)
    std::stable_sort(begin, begin + COUNT, std::greater<unsigned>());
    msync(mm, COUNT * sizeof(unsigned), MS_SYNC);
    std::puts("file sorted");

    if (std::is_sorted(begin, begin + COUNT, std::greater<unsigned>())) {
        std::puts("it's properly sorted");
    }

    // close file
    munmap(mm, COUNT * sizeof(unsigned));
    return 0;
}

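The msync calls are not actually needed, because changes to a MAP_SHARED mapping are written back to the file anyway. I was honestly surprised that this performs reasonably well.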