Split a list into several files in Scala

Time: 2017-07-05 15:32:01

Tags: scala file

I have the following list:

10,44,22
10,47,12
15,38,3
15,41,30
16,44,15
16,47,18
22,38,21
22,41,42
34,44,40
34,47,36
40,38,39
40,41,42
45,38,27
45,41,30
46,44,45
46,47,48

I then write it to a file with the following code:

  val fstream:FileWriter = new FileWriter("patSPO.csv")
  var out:BufferedWriter = new BufferedWriter(fstream)

  val sl = listSPO.sortBy(l => (l.sub, l.pre))
  for ( a <- 0 to listSPO.size-1){
    out.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
  }
  out.close()

However, I want to split the content across n files, so I tried the following for 4 files:

  val fstream:FileWriter = new FileWriter("patSPO.csv")
  val fstream1:FileWriter = new FileWriter("patSPO1.csv")
  val fstream2:FileWriter = new FileWriter("patSPO2.csv")
  val fstream3:FileWriter = new FileWriter("patSPO3.csv")
  val fstream4:FileWriter = new FileWriter("patSPO4.csv")
  var out:BufferedWriter = new BufferedWriter(fstream)
  var out1:BufferedWriter = new BufferedWriter(fstream1)
  var out2:BufferedWriter = new BufferedWriter(fstream2)
  var out3:BufferedWriter = new BufferedWriter(fstream3)
  var out4:BufferedWriter = new BufferedWriter(fstream4)
  val b :Int = listSPO.size/4
  val sl = listSPO.sortBy(l => (l.sub, l.pre))
  for ( a <- 0 to listSPO.size-1){
    out.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
  }
  for ( a <- 0 to b-1){
    out1.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
  }
  for ( a <- b to (b*2)-1){
    out2.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
  }
  for ( a <- b*2 to (b*3)-1){
    out3.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
  }
  for ( a <- b*3 to (b*4)-1){
    out4.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
  }

  out.close()
  out1.close()
  out2.close()
  out3.close()
  out4.close() 

My question is: is there a generic way to do this for any number of output files, for example 32, without writing out, for and fstream 32 times?

2 Answers:

Answer 0: (score: 0)

Assuming the objects in the list have this type (otherwise, replace SPO with your own type in the code below):

case class SPO(sub: Int, pre: Int, obj: Int) 

This should do it:

import java.io.{BufferedWriter, FileWriter}

val sl = listSPO.sortBy(l => (l.sub, l.pre))

val files = 5 // whatever number you want

// helper function to write a single list - similar to your implementation but shorter
def writeListToFile(name: String, list: List[SPO]): Unit = {
  val writer = new BufferedWriter(new FileWriter(name))
  try list.foreach(spo => writer.write(s"${spo.sub},${spo.pre},${spo.obj}\n"))
  finally writer.close() // close the file even if a write fails
}

// grouped takes a chunk size, not a chunk count: round up so that at most
// `files` sublists are produced, and never pass 0 (grouped rejects it)
val chunkSize = math.max(1, (sl.size + files - 1) / files)

sl.grouped(chunkSize)       // split into sublists
  .zipWithIndex             // add index of sublists for file name
  .foreach {
    case (sublist, 0) => writeListToFile("patSPO.csv", sublist) // if indeed you want the first file name NOT to include the index
    case (sublist, index) => writeListToFile(s"patSPO$index.csv", sublist)
  }
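
Because grouped takes a chunk size rather than a chunk count, the number of files it produces can differ from the requested number when the list size is not an exact multiple of it. If exactly n files are required, one option is to compute the chunk sizes up front. The sketch below is only an illustration: splitInto is a hypothetical helper, not part of the answer above or of the standard library.

```scala
// Hypothetical helper: split `xs` into exactly `n` chunks whose sizes
// differ by at most one element.
def splitInto[A](xs: List[A], n: Int): List[List[A]] = {
  require(n > 0, "n must be positive")
  val base  = xs.size / n // minimum chunk size
  val extra = xs.size % n // the first `extra` chunks get one more element
  val sizes = List.tabulate(n)(i => if (i < extra) base + 1 else base)
  // Walk the list once, peeling off one chunk per precomputed size
  sizes.foldLeft((xs, List.empty[List[A]])) { case ((rest, acc), size) =>
    val (chunk, tail) = rest.splitAt(size)
    (tail, chunk :: acc)
  }._2.reverse
}

// 16 elements into 5 chunks: one chunk of 4 followed by four chunks of 3
val chunkSizes = splitInto((1 to 16).toList, 5).map(_.size)
```

The resulting sublists can be passed to zipWithIndex and a file-writing helper in the same way as the output of grouped. When n is larger than the list size, the trailing chunks are empty, so the corresponding files would be created empty.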

Answer 1: (score: 0)

First, let's write a few utilities to get rid of the ugly boilerplate:

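import java.io.{BufferedWriter, FileWriter}

// Lazily opens `number` writers named prefix + index + ext;
// the first file name carries no index.
def open(
  number: Int = 1,
  prefix: String,
  ext: String
): Iterator[BufferedWriter] = (0 to number - 1)
  .iterator
  .map { i =>
    "%s%s%s".format(prefix, if (i == 0) "" else i.toString, ext)
  }.map(new FileWriter(_))
   .map(new BufferedWriter(_))


// WhateverTheType stands for your element type (it needs sub, pre and obj fields)
def toString(l: WhateverTheType) = Seq(
  l.sub,
  l.pre,
  l.obj
).mkString(",") + "\n"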

Now the implementation:

def writeToFiles(
  listSPO: List[WhateverTheType],
  numFiles: Int = 1,
  prefix: String = "patSPO",
  ext: String = ".csv"
): Unit = listSPO
  // ceiling division, so that at most numFiles groups are produced
  .grouped(math.max(1, (listSPO.size + numFiles - 1) / numFiles))
  .zip(open(numFiles, prefix, ext))
  .foreach { case (input, file) =>
    try {
      input.foreach(l => file.write(toString(l)))
    } finally {
      file.close()
    }
  }
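
On Scala 2.13 or later (newer than these answers), the try/finally around each writer can be expressed with scala.util.Using. The following is a minimal sketch under that assumption. Row and writeChunks are hypothetical names introduced for illustration, and the file naming follows the question (no index on the first file).

```scala
import java.io.{BufferedWriter, FileWriter}
import java.nio.file.{Files, Path}
import scala.util.Using

// Hypothetical row type mirroring the sub/pre/obj fields used in the question
final case class Row(sub: Int, pre: Int, obj: Int)

// Writes each chunk to <dir>/<prefix><index><ext> and returns the paths written.
// The first file name carries no index.
def writeChunks(
  chunks: List[List[Row]],
  dir: Path,
  prefix: String = "patSPO",
  ext: String = ".csv"
): List[Path] =
  chunks.zipWithIndex.map { case (chunk, i) =>
    val suffix = if (i == 0) "" else i.toString
    val path = dir.resolve(prefix + suffix + ext)
    // Using.resource closes the writer even if a write throws
    Using.resource(new BufferedWriter(new FileWriter(path.toFile))) { w =>
      chunk.foreach(r => w.write(s"${r.sub},${r.pre},${r.obj}\n"))
    }
    path
  }

// Usage sketch: two chunks written into a fresh temporary directory
val dir = Files.createTempDirectory("spo")
val written = writeChunks(
  List(List(Row(10, 44, 22)), List(Row(15, 38, 3), Row(15, 41, 30))),
  dir
)
```

Unlike the iterator-based version, this variant opens one file per chunk passed in, so the number of files always equals the number of chunks.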