Which part of my external sorting program is inefficient / the bottleneck?

Time: 2019-06-13 18:16:16

Tags: java sorting bigdata external-sorting

I implemented an API that has to sort numbers externally (using hard drive space), given an input file of integers in big-endian byte format. I am limited to 100MB of RAM (the VM is run with the -Xmx100m flag). However, sorting a billion bytes' worth of integers (1,000,000,000 bytes) with my program takes about 30 minutes, which is far too long. I'm not sure why my program is so inefficient.

Right now, I split the input file into several temporary files, sort each of them with quicksort, and then merge them using a min-heap (https://www.geeksforgeeks.org/external-sorting/). However, as I said, this is far too slow in practice.
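For context, my split-and-sort phase looks roughly like the sketch below (simplified: the 10,000,000-int chunk size, the splitAndSort name, and the use of Arrays.sort in place of my own quicksort are just for illustration; the temp files are written in the same big-endian format and named 0, 1, 2, ... exactly as the merge code expects):

```
  // Simplified sketch of the split phase (a method in the same class as the
  // merge code below). Needs java.io.* and java.util.Arrays.
  private void splitAndSort(String inputFile) throws IOException {
      final int CHUNK_INTS = 10_000_000;               // ~40 MB of ints per chunk
      long intsLeft = new File(inputFile).length() / 4;
      int[] buffer = new int[CHUNK_INTS];
      int fileIndex = 0;
      try (DataInputStream in = new DataInputStream(
              new BufferedInputStream(new FileInputStream(inputFile)))) {
          while (intsLeft > 0) {
              int count = (int) Math.min(CHUNK_INTS, intsLeft);
              for (int j = 0; j < count; j++) {
                  buffer[j] = in.readInt();            // DataInputStream reads big-endian
              }
              Arrays.sort(buffer, 0, count);           // in-memory sort of this chunk
              try (DataOutputStream out = new DataOutputStream(
                      new BufferedOutputStream(new FileOutputStream(fileIndex + "")))) {
                  for (int j = 0; j < count; j++) {
                      out.writeInt(buffer[j]);         // big-endian, same format as the input
                  }
              }
              fileIndex++;
              intsLeft -= count;
          }
      }
      this.numTempFiles = fileIndex;                   // consumed by the merge phase
  }
```

Here is my merge code: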

```
      MinHeapNode[] arr = new MinHeapNode[numTempFiles];
      FileInputStream[] read = new FileInputStream[numTempFiles];
      for(int i = 0; i < read.length; i++)
          try {
              read[i] = new FileInputStream(i + "");
          } catch (Exception e) {
              e.printStackTrace();
          }
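      // Build one heap node per temp file, seeded with that file's first big-endian int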
      for(int i = 0; i < this.numTempFiles; i++) {
          arr[i] = new MinHeapNode();
          arr[i].i = i;
          try {
              byte[] bytes = new byte[4];
              read[i].read(bytes);
              arr[i].element = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt();
          } catch (NumberFormatException e) {
              // TODO Auto-generated catch block
              e.printStackTrace();
          } catch (IOException e) {
              // TODO Auto-generated catch block
              e.printStackTrace();
          }

      }

      MinHeap heap = new MinHeap(arr, numTempFiles);
      int count = 0;

      FileOutputStream fos = null;
      try {
          fos = new FileOutputStream(outputfile);
      } catch (FileNotFoundException e2) {
          // TODO Auto-generated catch block
          e2.printStackTrace();
      }

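      // k-way merge: write the current minimum to the output, then refill that heap node
      // with the next int from its temp file; an exhausted file yields Integer.MAX_VALUE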
      while(count != numTempFiles) {
          MinHeapNode root = heap.getMin();

          byte[] bytes = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(root.element).array();
          try {
              fos.write(bytes);
          } catch (IOException e1) {
              e1.printStackTrace();
          }

          try {
              byte[] bytearr = new byte[4];
              read[root.i].read(bytearr);
              root.element = ByteBuffer.wrap(bytearr).order(ByteOrder.BIG_ENDIAN).getInt();
          } catch(Exception e) {
              root.element = Integer.MAX_VALUE;
              count++;
          }
          heap.replaceMin(root);
      }
      for(int i = 0; i < numTempFiles; i++)
          try{ read[i].close(); }
          catch(Exception e) { e.printStackTrace(); }

      try {
          fos.close();
      } catch (IOException e) {
          e.printStackTrace();
      }
  }
```

Here are some runtimes for various file sizes:
2 seconds and 231 milliseconds for 1,000,000 bytes
20 seconds and 330 milliseconds for 10,000,000 bytes
4 minutes, 36 seconds, 940 milliseconds for 100,000,000 bytes
35 minutes, 55 seconds, 642 milliseconds for 1,000,000,000 bytes

0 Answers:

There are no answers yet.