Writing files takes a very long time

Asked: 2017-07-21 21:05:31

Tags: scala jena filewriter

I am writing out three lists of tripleInt objects, each with approximately 277,270 entries. My tripleInt class is as follows:

class tripleInt  (var sub:Int, var pre:Int, var obj:Int)

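I build each list from an RDF file using Apache Jena: I convert the RDF elements to IDs and store those IDs in the different lists. Once I have the lists, I write the files with the following code: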

class Indexes (val listSPO:List[tripleInt], val listPSO:List[tripleInt], val listOSP:List[tripleInt] ){
  val sl = listSPO.sortBy(l => (l.sub, l.pre))
  val pl = listPSO.sortBy(l => (l.sub, l.pre))
  //val ol = listOSP.sortBy(l => (l.sub, l.pre))

  var y1:Int=0
  var y2:Int=0
  var y3:Int=0

  val fstream:FileWriter = new FileWriter("patSPO.dat")
  var out:BufferedWriter = new BufferedWriter(fstream)
  //val fstream:FileOutputStream = new FileOutputStream("patSPO.dat")
  //var out:ObjectOutputStream = new ObjectOutputStream(fstream)
  //out.writeObject(listSPO)
  val fstream2:FileWriter = new FileWriter("patPSO.dat")
  var out2:BufferedWriter = new BufferedWriter(fstream2)
  /*val fstream3:FileOutputStream = new FileOutputStream("patOSP.dat")
  var out3:BufferedOutputStream = new BufferedOutputStream(fstream3)*/

  for ( a <- 0 to sl.size-1){
    y1 = sl(a).sub
    y2 = sl(a).pre
    y3 = sl(a).obj
    out.write((y1.toString+","+y2.toString+","+y3.toString+"\n"))
  }
  for ( a <- 0 to pl.size-1){
    y1 = pl(a).sub
    y2 = pl(a).pre
    y3 = pl(a).obj
    out2.write((y1.toString+","+y2.toString+","+y3.toString+"\n"))
  }
  out.close()
  out2.close()
}

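This process takes approximately 30 minutes. My computer has 16 GB of RAM and a Core i7, so I don't understand why it takes so long. Is there a way to improve the performance?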

Thanks

1 Answer:

Answer 0: (score: 1)

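Yes, you need to choose your data structures wisely. List is designed for sequential access (Seq), not random access (IndexedSeq). What you are doing is O(n^2): each indexed lookup such as sl(a) walks the List from its head, and your lists are large. The following should be much faster (O(n), and hopefully easier to read):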

import java.io.{BufferedWriter, FileWriter}

class Indexes (val listSPO: List[tripleInt], val listPSO: List[tripleInt], val listOSP: List[tripleInt] ){
  val sl = listSPO.sortBy(l => (l.sub, l.pre))
  val pl = listPSO.sortBy(l => (l.sub, l.pre))

  // Iterate over the list directly: a single pass, with no indexed lookups.
  val out: BufferedWriter = new BufferedWriter(new FileWriter("patSPO.dat"))
  try {
    for (s <- sl) {
      out.write(s"${s.sub},${s.pre},${s.obj}\n")
    }
  } finally {
    // Close the writer even if a write fails.
    out.close()
  }

  val out2: BufferedWriter = new BufferedWriter(new FileWriter("patPSO.dat"))
  try {
    for (p <- pl) {
      out2.write(s"${p.sub},${p.pre},${p.obj}\n")
    }
  } finally {
    out2.close()
  }
}

(Using an IndexedSeq / Vector as the input would also work fine, but there may be constraints that make List the preferred choice in your case.)
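
To illustrate both options, here is a minimal self-contained sketch. It is not the asker's code: the TripleWriter object and its writeTriples and writeIndexed methods are hypothetical names introduced only for this example. The first method iterates directly over any collection; the second keeps an index-based loop but first copies the List into a Vector, where lookup by index is effectively constant time.

```scala
import java.io.{BufferedWriter, FileWriter}

// Same shape as the tripleInt class from the question.
class tripleInt(var sub: Int, var pre: Int, var obj: Int)

// Hypothetical helper object, introduced only for this sketch.
object TripleWriter {

  // Writes one "sub,pre,obj" line per triple by iterating directly,
  // so a List is traversed exactly once.
  def writeTriples(path: String, triples: Iterable[tripleInt]): Unit = {
    val out = new BufferedWriter(new FileWriter(path))
    try {
      for (t <- triples) out.write(s"${t.sub},${t.pre},${t.obj}\n")
    } finally {
      out.close() // runs even if a write throws
    }
  }

  // Keeps an index-based loop, but over a Vector copy of the List,
  // so each v(i) lookup does not walk the list from its head.
  def writeIndexed(path: String, triples: List[tripleInt]): Unit = {
    val v = triples.toVector
    val out = new BufferedWriter(new FileWriter(path))
    try {
      for (i <- v.indices) {
        val t = v(i)
        out.write(s"${t.sub},${t.pre},${t.obj}\n")
      }
    } finally {
      out.close()
    }
  }
}
```

With something like this, the body of Indexes could reduce to calls such as TripleWriter.writeTriples("patSPO.dat", sl). Direct iteration is usually the simpler choice; the Vector variant only makes sense if the rest of the code really needs access by index.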