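I have implemented an API that must sort numbers externally (using hard drive space), given an input file in big-endian byte format. I am limited to 100 MB of RAM (the VM is started with the `-Xmx100m` flag). However, sorting one billion bytes' worth of integers (1,000,000,000 bytes) with my program takes about 30 minutes, which is far too long. I am not sure why my program is so inefficient.

Right now, I split the input file into several temporary files, sort each one with quicksort, and then merge them using a min-heap (https://www.geeksforgeeks.org/external-sorting/). However, as I said above, this is too slow in practice.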
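
For context, the splitting phase described above can be sketched roughly as follows. This is a minimal illustration, not the code that produced the timings further down: the class name `RunGenerator`, the method `writeSortedRuns`, and the `tempDir` and `chunkInts` parameters are hypothetical, and `Arrays.sort` (a dual-pivot quicksort for `int[]`) stands in for whatever quicksort the real code uses. Run files are named `0`, `1`, `2`, ... to match what the merge code opens with `new FileInputStream(i + "")`.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a big-endian int file into sorted runs.
class RunGenerator {
    /** Writes sorted runs of at most chunkInts values each; returns the run files in order. */
    static List<Path> writeSortedRuns(Path input, Path tempDir, int chunkInts) throws IOException {
        List<Path> runs = new ArrayList<>();
        int[] chunk = new int[chunkInts];
        // DataInputStream.readInt() decodes big-endian, matching the input format.
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(input), 1 << 16))) {
            boolean eof = false;
            while (!eof) {
                int n = 0;
                try {
                    while (n < chunkInts) {
                        chunk[n] = in.readInt();
                        n++;
                    }
                } catch (EOFException e) {
                    eof = true; // input exhausted; still flush what was read so far
                }
                if (n == 0) {
                    break;
                }
                Arrays.sort(chunk, 0, n);
                Path run = tempDir.resolve(Integer.toString(runs.size()));
                try (DataOutputStream out = new DataOutputStream(
                        new BufferedOutputStream(Files.newOutputStream(run), 1 << 16))) {
                    for (int i = 0; i < n; i++) {
                        out.writeInt(chunk[i]);
                    }
                }
                runs.add(run);
            }
        }
        return runs;
    }
}
```

`chunkInts` would have to be chosen so that the `int[]` (4 bytes per element) fits comfortably inside the 100 MB heap. The merge code as currently written follows.
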
```java
    // Merge phase (the enclosing method's signature is not shown in this post).
    MinHeapNode[] arr = new MinHeapNode[numTempFiles];
    FileInputStream[] read = new FileInputStream[numTempFiles];

    // Open one input stream per sorted temp file (the files are named "0", "1", "2", ...).
    for (int i = 0; i < read.length; i++)
        try {
            read[i] = new FileInputStream(i + "");
        } catch (Exception e) {
            e.printStackTrace();
        }

    // Seed the heap with the first int of every temp file.
    for (int i = 0; i < this.numTempFiles; i++) {
        arr[i] = new MinHeapNode();
        arr[i].i = i;
        try {
            byte[] bytes = new byte[4];
            read[i].read(bytes);
            arr[i].element = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt();
        } catch (NumberFormatException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    MinHeap heap = new MinHeap(arr, numTempFiles);
    int count = 0;
    FileOutputStream fos = null;
    try {
        fos = new FileOutputStream(outputfile);
    } catch (FileNotFoundException e2) {
        e2.printStackTrace();
    }

    // Repeatedly write the smallest element, then refill from the file it came from.
    while (count != numTempFiles) {
        MinHeapNode root = heap.getMin();
        byte[] bytes = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(root.element).array();
        try {
            fos.write(bytes);
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        try {
            byte[] bytearr = new byte[4];
            read[root.i].read(bytearr);
            root.element = ByteBuffer.wrap(bytearr).order(ByteOrder.BIG_ENDIAN).getInt();
        } catch (Exception e) {
            // Any exception here is treated as the end of that temp file.
            root.element = Integer.MAX_VALUE;
            count++;
        }
        heap.replaceMin(root);
    }

    // Close all streams.
    for (int i = 0; i < numTempFiles; i++)
        try { read[i].close(); }
        catch (Exception e) { e.printStackTrace(); }
    try {
        fos.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
```
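
One variation that may be worth comparing against the code above: every `read(...)` and `write(...)` call on a raw `FileInputStream` or `FileOutputStream` goes to the operating system, here for only 4 bytes at a time. The sketch below performs the same k-way merge but wraps each stream in a `BufferedInputStream` or `BufferedOutputStream` and uses `DataInputStream.readInt()` and `DataOutputStream.writeInt()`, which are big-endian. It is only a sketch under assumptions: the names `BufferedMerge`, `merge`, and `Head` and the 64 KiB buffer size are hypothetical, `java.util.PriorityQueue` replaces the custom `MinHeap`, exhausted runs are removed from the heap instead of being marked with `Integer.MAX_VALUE`, and it has not been timed on the data set in this post.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical buffered variant of the merge phase.
class BufferedMerge {
    /** One heap entry per run: its current smallest value and the stream it is read from. */
    private static final class Head {
        int value;
        final DataInputStream in;
        Head(int value, DataInputStream in) { this.value = value; this.in = in; }
    }

    /** Merges sorted big-endian int runs into output; returns the number of ints written. */
    static long merge(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.value, b.value));
        long written = 0;
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(output), 1 << 16))) {
            // Seed the heap with the first value of every non-empty run.
            for (Path run : runs) {
                DataInputStream in = new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(run), 1 << 16));
                try {
                    heap.add(new Head(in.readInt(), in));
                } catch (EOFException e) {
                    in.close(); // empty run
                }
            }
            // Emit the smallest head, then refill from the run it came from.
            while (!heap.isEmpty()) {
                Head head = heap.poll();
                out.writeInt(head.value);
                written++;
                try {
                    head.value = head.in.readInt();
                    heap.add(head);
                } catch (EOFException e) {
                    head.in.close(); // run exhausted; it simply leaves the heap
                }
            }
        } finally {
            // Only non-empty if an error interrupted the merge.
            for (Head h : heap) {
                h.in.close();
            }
        }
        return written;
    }
}
```

Unlike `FileInputStream.read(byte[])`, which returns -1 at end of file, `readInt()` signals the end of a run with an `EOFException`, so the loop ends once every run has been drained.
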
Here are some runtimes for different file sizes:

- 2 seconds and 231 milliseconds for 1,000,000 bytes
- 20 seconds and 330 milliseconds for 10,000,000 bytes
- 4 minutes, 36 seconds, and 940 milliseconds for 100,000,000 bytes
- 35 minutes, 55 seconds, and 642 milliseconds for 1,000,000,000 bytes
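
To put these timings in perspective, they can be converted to throughput. The small helper below (the class name `Throughput` and method `bytesPerSecond` are hypothetical, and KB means 1,000 bytes) works out to roughly 448, 492, 361, and 464 KB/s for the four sizes. The rate stays in the same range at every size, so the running time grows about linearly with the input.

```java
// Hypothetical helper for turning the measured runtimes into throughput figures.
class Throughput {
    /** Bytes processed per second, given a size in bytes and an elapsed time in milliseconds. */
    static double bytesPerSecond(long bytes, long millis) {
        return bytes / (millis / 1000.0);
    }

    public static void main(String[] args) {
        long[] sizes  = {1_000_000L, 10_000_000L, 100_000_000L, 1_000_000_000L};
        // 2.231 s, 20.330 s, 4 min 36.940 s, 35 min 55.642 s, all in milliseconds
        long[] millis = {2_231L, 20_330L, 276_940L, 2_155_642L};
        for (int i = 0; i < sizes.length; i++) {
            double kbPerSecond = bytesPerSecond(sizes[i], millis[i]) / 1000.0;
            System.out.printf("%d bytes: %.0f KB/s%n", sizes[i], kbPerSecond);
        }
    }
}
```
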