An easier way to read lines from an HDFS file

Date: 2018-07-12 00:34:36

Tags: scala hadoop

I am reading lines from a single HDFS file with the following code, borrowing the using method described here:

Given import org.apache.hadoop.fs._, the code below relies on fs: FileSystem and path: Path. These functions are actually methods wrapped in a class (the class that defines the FileSystem fs).
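
For context, here is a minimal sketch of such an enclosing class (the class name, the Configuration setup, and the commons-io import are my assumptions; the post only states that fs is defined by the class):

  import org.apache.commons.io.IOUtils            // used by readWholeFile below
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, FSDataInputStream, Path}
  import scala.io.Source
  import scala.language.reflectiveCalls           // silences the structural-type warning in using

  // Hypothetical wrapper class; the original post only says the methods
  // below live in a class that defines the FileSystem fs.
  class HdfsLineReader(conf: Configuration = new Configuration()) {
    private val fs: FileSystem = FileSystem.get(conf)

    // using, readFileByLine, and readWholeFile from below go here
  }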

  /**
    * Loan pattern: run an action against a close-able resource, then
    * close the resource even if the action throws.
    *
    * @param param  a close-able resource
    * @param action data manipulation to run against the resource
    * @return action's return value is bubbled up
    */
  private def using[A <: { def close(): Unit }, B](param: A)(action: A => B): B =
    try {
      action(param)
    } finally {
      param.close()
    }
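
As an aside, from Scala 2.13 onward the standard library ships the same loan pattern, so the hand-rolled helper can be swapped out. A minimal sketch, assuming Scala 2.13 (scala.util.Using.resource closes the resource in a finally block, much like using above; it works here because FSDataInputStream and Source are both close-able):

  import scala.util.Using

  // Standard-library equivalent of readFileByLine below (Scala 2.13+).
  def readFileByLineStd(path: Path): Array[String] =
    Using.resource(fs.open(path)) { in =>
      Using.resource(Source.fromInputStream(in)) { src =>
        src.getLines().toArray
      }
    }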

  /**
    * Open - read - close the file while returning its lines.
    *
    * @param path where the file is stored
    * @return array of lines in the file
    */
  def readFileByLine(path: Path): Array[String] =
    using(fs.open(path)) { fileInputStream =>
      using(Source.fromInputStream(fileInputStream)) { bufferedSource =>
        (for (line <- bufferedSource.getLines()) yield line).toArray
      }
    }

  /**
    * Open - read - close the file while returning its lines.
    *
    * @param path where the file is stored
    * @return array of lines in the file
    */
  def readWholeFile(path: Path): Array[String] =
    using(fs.open(path)) { inputStream =>
      IOUtils.toString(inputStream, "UTF-8").split("\n")
    }
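
For reference, a hypothetical call site (the path is illustrative only):

  // Hypothetical usage: both calls should yield the same lines for small files.
  val examplePath = new Path("/tmp/example.txt") // illustrative path
  readFileByLine(examplePath).foreach(println)
  readWholeFile(examplePath).foreach(println)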

These two methods seem to take two different routes to the same goal: read the lines of an HDFS file and return them as an array of strings. (Strictly speaking they can differ: getLines() also handles \r\n line endings, while split("\n") leaves the \r in place.)

Given that the files are small, which of these read methods would be considered standard Scala? What are the trade-offs between the two approaches?

Added:

I think all of this boils down to the following methods:

  def readWholeFile(fs: FileSystem, path: Path): Array[String] = {
    var inputStream: FSDataInputStream = null
    try {
      inputStream = fs.open(path)
      IOUtils.toString(inputStream, "UTF-8").split("\n")
    } finally {
      // guard: if fs.open threw, inputStream is still null
      if (inputStream != null) inputStream.close()
    }
  }


  def readFileByLine(fs: FileSystem, path: Path): Array[String] = {
    var fileInputStream: FSDataInputStream = null
    var bufferedSource: scala.io.BufferedSource = null
    try {
      fileInputStream = fs.open(path)
      bufferedSource = Source.fromInputStream(fileInputStream)
      (for (line <- bufferedSource.getLines()) yield line).toArray
    } finally {
      // guard against NPEs when an earlier statement threw before assignment
      if (bufferedSource != null) bufferedSource.close()
      if (fileInputStream != null) fileInputStream.close()
    }
  }
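
Under the same Scala 2.13 assumption as above, scala.util.Using.Manager removes the var/null/finally bookkeeping entirely, closing both resources in reverse order of acquisition. A sketch, not from the original post:

  import scala.util.Using

  // Flattened variant: Using.Manager tracks both resources and closes
  // them (source first, then stream) even if the body throws.
  def readFileByLineManaged(fs: FileSystem, path: Path): Array[String] =
    Using.Manager { use =>
      val in  = use(fs.open(path))
      val src = use(Source.fromInputStream(in))
      src.getLines().toArray
    }.get // rethrow any failure, matching the original methods' behavior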

Perhaps these versions are easier to read, and leaner in terms of stack usage, compile time, and running time...

0 Answers:

No answers yet.