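I've implemented an external mergesort to sort a file of Java int primitives, but it is very slow (fortunately it does at least work). Very little happens in the sort method: it just recursively calls merge, doubling blockSize on each call and swapping the input and output files each time. Where could I be losing so much time?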
//Merge stage of external mergesort
//Read from input file, already sorted into blocks of size blockSize
//Write to output file, sorted into blocks of 2*blockSize
public static void merge(String inputFile, String outputFile, long blockSize)
        throws IOException
{
    //readers for block1/2
    FileInputStream fis1 = new FileInputStream(inputFile);
    DataInputStream dis1 = new DataInputStream(fis1);
    FileInputStream fis2 = new FileInputStream(inputFile);
    DataInputStream dis2 = new DataInputStream(fis2);
    //writer to output file
    FileOutputStream fos = new FileOutputStream(outputFile);
    DataOutputStream dos = new DataOutputStream(fos);
    // merging 2 sub lists
    // go along pairs of blocks in inputFile
    // continue until end of input
    //initialise block2 at right position
    dis2.skipBytes((int) blockSize);
    //while we haven't reached the end of the file
    while (dis1.available() > 0)
    {
        // if block1 is last block, copy block1 to output
        if (dis2.available() <= 0)
        {
            while (dis1.available() > 0)
                dos.writeInt(dis1.readInt());
            break;
        }
        // if block1 not last block, merge block1 and block2
        else
        {
            long block1Pos = 0;
            long block2Pos = 0;
            boolean block1Over = false;
            boolean block2Over = false;
            //data read from each block
            int e1 = dis1.readInt();
            int e2 = dis2.readInt();
            //keep going until fully examined both blocks
            while (!block1Over | !block2Over)
            {
                //copy from block 1 if:
                // block1 hasnt been fully examined AND
                // block1 element less than block2s OR block2 has been fully examined
                while ( !block1Over & ((e1 <= e2) | block2Over) )
                {
                    dos.writeInt(e1); block1Pos += 4;
                    if (block1Pos < blockSize & dis1.available() > 0)
                        e1 = dis1.readInt();
                    else
                        block1Over = true;
                }
                //same for block2
                while ( !block2Over & ((e2 < e1) | block1Over) )
                {
                    dos.writeInt(e2); block2Pos += 4;
                    if (block2Pos < blockSize & dis2.available() > 0)
                        e2 = dis2.readInt();
                    else
                        block2Over = true;
                }
            }
        }
        // skip to next blocks
        dis1.skipBytes((int) blockSize);
        dis2.skipBytes((int) blockSize);
    }
    dis1.close();
    dis2.close();
    dos.close();
    fos.close();
}
Answer 0 (score: 0)
No buffering. Wrap every file stream in a BufferedInputStream or BufferedOutputStream.
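Misuse of available(). It is not a valid test for end of stream, and every call to it costs an extra system call. Just read until the stream signals the true end of stream; DataInputStream.readInt() does so by throwing EOFException.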
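The two points above can be sketched together as follows. This is only an illustration, assuming the file holds raw ints as written by DataOutputStream; the class name BufferedIntCopy, the helper names openIn and openOut, and the 64 KiB buffer size are made up for the example and are not part of the question's code. It copies a file of ints through buffered streams and stops on EOFException instead of polling available().

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class BufferedIntCopy {
    // Hypothetical buffer size; tune it for your disk and heap.
    private static final int BUFFER_BYTES = 1 << 16;

    // Opens a file for reading ints through an in-memory buffer.
    public static DataInputStream openIn(String path) throws IOException {
        return new DataInputStream(
                new BufferedInputStream(new FileInputStream(path), BUFFER_BYTES));
    }

    // Opens a file for writing ints through an in-memory buffer.
    public static DataOutputStream openOut(String path) throws IOException {
        return new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path), BUFFER_BYTES));
    }

    // Copies every complete int from inputFile to outputFile and returns the count.
    // End of input is detected by EOFException, never by available().
    public static long copyInts(String inputFile, String outputFile) throws IOException {
        long count = 0;
        try (DataInputStream in = openIn(inputFile);
             DataOutputStream out = openOut(outputFile)) {
            while (true) {
                int value;
                try {
                    value = in.readInt();
                } catch (EOFException endOfStream) {
                    break; // fewer than four bytes were left: input is exhausted
                }
                out.writeInt(value);
                count++;
            }
        } // try-with-resources flushes and closes both streams
        return count;
    }
}
```

In the merge method the same pattern would apply to each of the two readers: attempt readInt(), and treat EOFException as "this block, and the file, is finished". Closing the outermost stream also closes the wrapped ones, so a separate close on the FileOutputStream is not needed.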
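Non-optimal initial distribution. The fact that you take a single block size suggests you aren't using replacement selection to produce the initial runs, so your initial runs are probably only about half as long as they could be. Longer initial runs mean fewer runs, and the number of merge passes grows with the logarithm of the run count, so this directly affects how many passes over the data you need.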
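For readers unfamiliar with replacement selection, here is a minimal in-memory sketch of the idea. It assumes the input is an int array and the runs are returned as lists, where a real external sort would read from and write to files; the class name ReplacementSelection, the method makeRuns, and the memorySlots parameter are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class ReplacementSelection {
    // Splits input into sorted runs using a heap of at most memorySlots entries.
    // Each heap entry is {run number, value}; entries are ordered by run, then value.
    public static List<List<Integer>> makeRuns(int[] input, int memorySlots) {
        if (memorySlots <= 0) {
            throw new IllegalArgumentException("memorySlots must be positive");
        }
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) ->
                a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]));

        // Fill the available memory with the first values, all belonging to run 0.
        int next = 0;
        while (next < input.length && heap.size() < memorySlots) {
            heap.add(new int[] {0, input[next++]});
        }

        List<List<Integer>> runs = new ArrayList<>();
        int currentRun = -1;
        while (!heap.isEmpty()) {
            int[] smallest = heap.poll();
            if (smallest[0] != currentRun) {
                // Every entry left in the heap belongs to the next run: start it.
                currentRun = smallest[0];
                runs.add(new ArrayList<>());
            }
            runs.get(runs.size() - 1).add(smallest[1]);

            if (next < input.length) {
                int incoming = input[next++];
                // A value not smaller than the one just written can still extend
                // the current run; a smaller one must wait for the next run.
                int run = incoming >= smallest[1] ? currentRun : currentRun + 1;
                heap.add(new int[] {run, incoming});
            }
        }
        return runs;
    }
}
```

With three memory slots, the input 5, 1, 9, 2, 8, 3, 7, 4, 6 comes out as two runs, [1, 2, 5, 8, 9] and [3, 4, 6, 7], whereas sorting fixed blocks of three values would produce three runs.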
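Unbalanced merge. You need to add dummy runs at the start of the merge phase so that your last merge is N-way rather than, in the worst case, two-way. This can save an extra pass over almost the entire data. It also means you need to know the number of initial runs before you start merging.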
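The answer does not say how many dummy runs to add. One common way to work it out, shown here as a sketch under the assumption that every merge step combines up to fanIn runs into one, is to pad the run count until (runs - 1) is a multiple of (fanIn - 1). The class and method names are hypothetical.

```java
public class DummyRuns {
    // Returns how many empty (dummy) runs to add so that repeatedly merging
    // fanIn runs into one ends with a full fanIn-way merge.
    // Each merge removes (fanIn - 1) runs, so (runs - 1) must divide evenly by it.
    public static int dummyRunsNeeded(int initialRuns, int fanIn) {
        if (initialRuns < 1 || fanIn < 2) {
            throw new IllegalArgumentException("need initialRuns >= 1 and fanIn >= 2");
        }
        int step = fanIn - 1;
        int remainder = (initialRuns - 1) % step;
        return remainder == 0 ? 0 : step - remainder;
    }
}
```

For example, 11 initial runs with a 4-way merge need 2 dummy runs: the run count then goes 13, 10, 7, 4, 1, and the final merge is 4-way. Without padding it goes 11, 8, 5, 2, 1, and the last step is a two-way merge over the whole data set. A two-way merge like the one in the question never needs dummy runs under this scheme, so the point only matters once the merge fan-in is raised above two.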