I am writing out three lists of TripleInts, each with approximately 277,270 rows. My TripleInts class is as follows:
class tripleInt (var sub:Int, var pre:Int, var obj:Int)
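I build each list from an RDF file using Apache Jena: I convert the RDF elements to IDs and store those IDs in the different lists. Once I have the lists, I write the files with the following code: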
class Indexes (val listSPO:List[tripleInt], val listPSO:List[tripleInt], val listOSP:List[tripleInt] ){
val sl = listSPO.sortBy(l => (l.sub, l.pre))
val pl = listPSO.sortBy(l => (l.sub, l.pre))
//val ol = listOSP.sortBy(l => (l.sub, l.pre))
var y1:Int=0
var y2:Int=0
var y3:Int=0
val fstream:FileWriter = new FileWriter("patSPO.dat")
var out:BufferedWriter = new BufferedWriter(fstream)
//val fstream:FileOutputStream = new FileOutputStream("patSPO.dat")
//var out:ObjectOutputStream = new ObjectOutputStream(fstream)
//out.writeObject(listSPO)
val fstream2:FileWriter = new FileWriter("patPSO.dat")
var out2:BufferedWriter = new BufferedWriter(fstream2)
/*val fstream3:FileOutputStream = new FileOutputStream("patOSP.dat")
var out3:BufferedOutputStream = new BufferedOutputStream(fstream3)*/
for ( a <- 0 to sl.size-1){
y1 = sl(a).sub
y2 = sl(a).pre
y3 = sl(a).obj
out.write((y1.toString+","+y2.toString+","+y3.toString+"\n"))
}
for ( a <- 0 to pl.size-1){
y1 = pl(a).sub
y2 = pl(a).pre
y3 = pl(a).obj
out2.write((y1.toString+","+y2.toString+","+y3.toString+"\n"))
}
out.close()
out2.close()
}
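This process takes approximately 30 minutes. My machine has 16 GB of RAM and a Core i7, so I don't understand why it takes so long. Is there a way to improve the performance?
Thanks.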
Answer 0 (score: 1)
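Yes, you need to choose your data structure wisely. List is meant for sequential access (Seq), not random access (IndexedSeq). What you are doing is O(n^2), because every sl(a) lookup on a large List has to walk the list from its head to position a. The following should be much faster (O(n), and hopefully easier to read):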
import java.io.{BufferedWriter, FileWriter}

class Indexes (val listSPO: List[tripleInt], val listPSO: List[tripleInt], val listOSP: List[tripleInt] ){
val sl = listSPO.sortBy(l => (l.sub, l.pre))
val pl = listPSO.sortBy(l => (l.sub, l.pre))
var y1:Int=0
var y2:Int=0
var y3:Int=0
val fstream:FileWriter = new FileWriter("patSPO.dat")
val out:BufferedWriter = new BufferedWriter(fstream)
for (s <- sl){
y1 = s.sub
y2 = s.pre
y3 = s.obj
out.write(s"$y1,$y2,$y3\n")
}
// TODO close in finally
out.close()
val fstream2:FileWriter = new FileWriter("patPSO.dat")
val out2:BufferedWriter = new BufferedWriter(fstream2)
for ( p <- pl){
y1 = p.sub
y2 = p.pre
y3 = p.obj
out2.write(s"$y1,$y2,$y3\n")
}
// TODO close in finally
out2.close()
}
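The TODO comments above ask for the writers to be closed in a finally block. Here is a minimal sketch of one way to do that. It assumes a hypothetical helper named writeTriples and a simplified case class Triple standing in for tripleInt; neither appears in the original code.

```scala
import java.io.{BufferedWriter, FileWriter}

// Hypothetical stand-in for tripleInt, used only in this sketch.
final case class Triple(sub: Int, pre: Int, obj: Int)

object TripleWriter {
  // Hypothetical helper: writes one "sub,pre,obj" line per triple.
  // The for loop traverses the List once from head to tail, so it is O(n).
  def writeTriples(path: String, triples: List[Triple]): Unit = {
    val out = new BufferedWriter(new FileWriter(path))
    try {
      for (t <- triples) {
        out.write(s"${t.sub},${t.pre},${t.obj}\n")
      }
    } finally {
      out.close() // runs even if a write throws
    }
  }
}
```

With the original class, the parameter type would be List[tripleInt] instead, and the two loops would collapse into two calls, one for "patSPO.dat" and one for "patPSO.dat".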
(Using IndexedSeq/Vector as the input would work just as well, but there may be constraints that make List the preferred choice in your case.)
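If positional access really is needed, one option is to convert the List to a Vector once, up front. The sketch below only illustrates that idea; sumByIndex is a hypothetical name and plain Int elements are used to keep it short.

```scala
object IndexedAccess {
  // Hypothetical example: sums the elements by position.
  // xs.toVector is a single O(n) pass; after that each v(i) is effectively
  // constant time, whereas xs(i) on a List walks i nodes from the head.
  def sumByIndex(xs: List[Int]): Long = {
    val v = xs.toVector
    var total = 0L
    var i = 0
    while (i < v.length) {
      total += v(i)
      i += 1
    }
    total
  }
}
```

The same index loop written directly against the List would do quadratic work overall, which is the behaviour described in the question.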