How do I process files in parallel in Scala?

Asked: 2017-01-03 15:15:21

Tags: scala file parallel-processing

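Hi, I have 200 files and I want to process them in parallel (reading and writing) in Scala.

I have tried a couple of approaches.

First, I used the par method to process the files in parallel.

Here is my code.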

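import java.io.{File, FileOutputStream, PrintWriter}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// (the declarations of `out` and `outputName` are not shown in the question)
val path = new Path("[file directory path]")    // This directory contains 200 text files.
val fileSystem = FileSystem.get(new Configuration())
val status = fileSystem.listStatus(path)
status.par.foreach { x =>
  val stream = fileSystem.open(x.getPath) // Open each text file.
  def readLines = Stream.cons(stream.readLine, Stream.continually(stream.readLine))
  readLines.takeWhile(_ != null).foreach { line =>

    // Here I want to write to several files using PrintWriter.

    val f = new File(outputName)
    if (f.exists() && !f.isDirectory()) {
      // The output file already exists: append the line.
      out = new PrintWriter(new FileOutputStream(new File(outputName), true))
      out.append(line + "\n")
      out.close
    } else {
      // The output file does not exist yet: create it.
      out = new PrintWriter(outputName)
      out.println(line)
      out.close
    }
  }
}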

When I run this code, I get a "Too many open files" error.

So I raised the limit with ulimit -n from 1024 to 65536, but that did not help.
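
Two things stand out in the code above: the stream returned by fileSystem.open is never closed, and a new PrintWriter is opened and closed for every single line. Whether that alone explains the error is not certain, but opening each handle once per input file and closing it in a finally block keeps the number of open descriptors bounded. The following is only a minimal sketch under assumptions: it uses local files through java.io instead of the Hadoop FileSystem API, and the names CopyLines and copyLines are hypothetical, not taken from the question.

```scala
import java.io.{BufferedReader, BufferedWriter, File, FileReader, FileWriter}

object CopyLines {
  // Hypothetical helper (assumption, not from the question):
  // reads `in` line by line and appends every line to `out`.
  // Each handle is opened exactly once and always closed in `finally`.
  def copyLines(in: File, out: File): Int = {
    val reader = new BufferedReader(new FileReader(in))
    try {
      // Second constructor argument `true` means append mode;
      // the file is created if it does not exist yet.
      val writer = new BufferedWriter(new FileWriter(out, true))
      try {
        var count = 0
        var line = reader.readLine()
        while (line != null) {
          writer.write(line)
          writer.newLine()
          count += 1
          line = reader.readLine()
        }
        count // number of lines copied
      } finally writer.close()
    } finally reader.close()
  }
}
```

Append mode makes the exists-or-not branch unnecessary. If several workers append to the same output file, their writes would still need to be coordinated (for example, one output file per input file); the question does not show how outputName is chosen, so that part is left open here.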

Second, I tried using Thread.

val path = new Path("[file directory path]")    // This directory contains 200 text files.
val fileSystem = FileSystem.get(new Configuration())
val status = fileSystem.listStatus(path)
status.foreach { x =>
  // One thread per input file.
  val thread = new Thread {
    override def run {
      val stream = fileSystem.open(x.getPath) // Open each text file.
      def readLines = Stream.cons(stream.readLine, Stream.continually(stream.readLine))
      readLines.takeWhile(_ != null).foreach { line =>

        // Here I want to write to several files using PrintWriter.

        val f = new File(outputName)
        if (f.exists() && !f.isDirectory()) {
          out = new PrintWriter(new FileOutputStream(new File(outputName), true))
          out.append(line + "\n")
          out.close
        } else {
          out = new PrintWriter(outputName)
          out.println(line)
          out.close
        }
      }
    }
  }
  thread.start
  Thread.sleep(50)
}

This also gave me an error: "$$anonfun$main$2$$anonfun$apply$mcVI$sp$3$$anonfun$apply$mcV$sp$2$$anon$1$$anonfun$run$2$$anonfun$apply$4.apply"
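
The quoted text looks like a single stack frame (the generated name of a nested closure) rather than an exception type or message, so the underlying failure cannot be identified from it. Independently of that, starting one raw thread per file gives no upper bound on concurrency and no place to collect failures. One possible alternative is a fixed-size thread pool driven through Futures. This is a sketch under assumptions: BoundedParallel, process, and runAll are hypothetical names, the per-file work is a stand-in that only returns the file size, and local files are used instead of the Hadoop FileSystem API.

```scala
import java.io.File
import java.nio.file.Files
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedParallel {
  // Hypothetical per-file task (assumption): returns the file size in bytes.
  // The real read-and-write logic would go here.
  def process(f: File): Long = Files.readAllBytes(f.toPath).length.toLong

  // Runs `process` on every file, with at most `parallelism` tasks at a time.
  def runAll(files: Seq[File], parallelism: Int): Seq[Long] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // One Future per file; results come back in the order of `files`.
      val all = Future.sequence(files.map(f => Future(process(f))))
      // If any task failed, Await.result rethrows its exception here,
      // which surfaces the actual exception type and message.
      Await.result(all, 10.minutes)
    } finally pool.shutdown()
  }
}
```

With a pool of, say, 4 or 8 threads, no more than that many input files are being processed at any moment, regardless of how many files the directory holds. The 10-minute timeout is an arbitrary illustrative value.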

What is wrong with my code?

Honestly, I do not know what else to do. I have tried everything I know.

Any ideas?

0 answers:

No answers yet.