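Hi, I have 200 files that I want to process in parallel (reading and writing) in Scala.
I have tried a couple of approaches.
First, I used the `par` method to process the files in parallel.
Here is my code: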
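    import java.io.{File, FileOutputStream, PrintWriter}
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val path = new Path("[file directory path]") // directory containing the 200 text files
    val fileSystem = FileSystem.get(new Configuration())
    val status = fileSystem.listStatus(path)
    status.par.foreach { x =>
      val stream = fileSystem.open(x.getPath) // open each text file
      def readLines = Stream.cons(stream.readLine, Stream.continually(stream.readLine))
      readLines.takeWhile(_ != null).foreach { line =>
        // I want to write to several files here using PrintWriter.
        // outputName is not defined in this snippet.
        val f = new File(outputName)
        if (f.exists() && !f.isDirectory()) {
          // the file already exists: reopen it in append mode for this one line
          val out = new PrintWriter(new FileOutputStream(f, true))
          out.append(line + "\n")
          out.close()
        } else {
          // the file does not exist yet: create it and write the first line
          val out = new PrintWriter(outputName)
          out.println(line)
          out.close()
        }
      }
    }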
When I run this code, I get a "Too many open files" error.
So I raised the `ulimit -n` value from 1024 to 65536, but that did not help.
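For reference, the snippet above reopens the output file for every single line and never closes `stream`. One way to keep the number of open handles small is to open the reader and the writer once per input file and close both in `finally` blocks. The following is only a minimal sketch under stated assumptions: it reads local files through `java.nio.file` instead of the Hadoop `FileSystem` used above, and `CopyLinesSketch` and `appendLines` are hypothetical names that do not appear in my code.

```scala
import java.io.{BufferedReader, BufferedWriter}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}

object CopyLinesSketch {
  // Hypothetical helper: appends every line of `input` to `output`.
  // Each file is opened exactly once and closed even if an exception is thrown.
  // Returns the number of lines written.
  def appendLines(input: Path, output: Path): Int = {
    val reader: BufferedReader =
      Files.newBufferedReader(input, StandardCharsets.UTF_8)
    try {
      // CREATE + APPEND covers both branches of the exists/else check above.
      val writer: BufferedWriter = Files.newBufferedWriter(
        output, StandardCharsets.UTF_8,
        StandardOpenOption.CREATE, StandardOpenOption.APPEND)
      try {
        var count = 0
        var line = reader.readLine()
        while (line != null) {
          writer.write(line)
          writer.newLine()
          count += 1
          line = reader.readLine()
        }
        count
      } finally writer.close()
    } finally reader.close()
  }
}
```

If several parallel tasks append to the same output file, the writes would still need to be coordinated; this sketch does not address that.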
Second, I tried using `Thread`:
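    // Needs the Hadoop Configuration/FileSystem/Path imports plus
    // java.io.{File, FileOutputStream, PrintWriter}.
    val path = new Path("[file directory path]") // directory containing the 200 text files
    val fileSystem = FileSystem.get(new Configuration())
    val status = fileSystem.listStatus(path)
    status.foreach { x =>
      val thread = new Thread {
        override def run(): Unit = {
          val stream = fileSystem.open(x.getPath) // open each text file
          def readLines = Stream.cons(stream.readLine, Stream.continually(stream.readLine))
          readLines.takeWhile(_ != null).foreach { line =>
            // I want to write to several files here using PrintWriter.
            // outputName is not defined in this snippet.
            val f = new File(outputName)
            if (f.exists() && !f.isDirectory()) {
              // the file already exists: reopen it in append mode for this one line
              val out = new PrintWriter(new FileOutputStream(f, true))
              out.append(line + "\n")
              out.close()
            } else {
              // the file does not exist yet: create it and write the first line
              val out = new PrintWriter(outputName)
              out.println(line)
              out.close()
            }
          }
        }
      }
      // start one thread per file, pausing 50 ms between starts
      thread.start()
      Thread.sleep(50)
    }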
This also gave me an error: "$$anonfun$main$2$$anonfun$apply$mcVI$sp$3$$anonfun$apply$mcV$sp$2$$anon$1$$anonfun$run$2$$anonfun$apply$4.apply".
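As a point of comparison with starting one raw `Thread` per file, the work could also be submitted to a fixed-size thread pool, so that only a bounded number of files is open at any moment. This is a hedged sketch, not a confirmed fix for the error above: it assumes local files read through `java.nio.file` rather than the Hadoop `FileSystem`, and `BoundedPoolSketch`, `countLines`, `processAll`, and the `parallelism` parameter are hypothetical names introduced only for illustration. The per-file task here merely counts lines as a stand-in for the real read/write logic.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object BoundedPoolSketch {
  // Hypothetical per-file task: counts the lines of one file and
  // always closes the reader, even if reading fails.
  def countLines(file: Path): Int = {
    val reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)
    try Iterator.continually(reader.readLine()).takeWhile(_ != null).size
    finally reader.close()
  }

  // Runs countLines on every file using at most `parallelism` threads
  // and returns the results in the same order as `files`.
  def processAll(files: Seq[Path], parallelism: Int): Seq[Int] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val futures = files.map(f => Future(countLines(f)))
      // Block until every task has finished; a failed task rethrows here.
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally pool.shutdown()
  }
}
```

With 200 input files, a call such as `BoundedPoolSketch.processAll(files, 8)` would keep at most eight readers open at once; the value 8 is an arbitrary example.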
What is wrong with my code?
Honestly, I don't know what else to do. I have tried everything I know.
Any ideas?