How to read and process a file chunk by chunk, at each step of the process, with Play Iteratees

Date: 2016-09-26 16:27:56

Tags: scala playframework playframework-2.0 iterate

I am using Play Framework Iteratees to read a file. I would like to process this file chunk by chunk, at every step.

I composed the following steps:

  • groupByLines: Enumeratee[Array[Byte], List[String]]
  • turnIntoLines: Enumeratee[List[String], List[Line]] (I defined case class Line(number: Int, value: String))
  • parseChunk: Iteratee[List[Line], Try[List[T]]] (e.g. CSV parsing)

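To define groupByLines, I need Iteratee.fold so that I can concatenate the last line of the previous chunk with the first line of the current chunk.

The problem is that this produces a single chunk containing every line of the file.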
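The carry-over itself does not require accumulating the whole file: only the unfinished last line has to survive from one chunk to the next. Below is a minimal plain-Scala sketch of that idea (standard library only, no Play types; `ChunkLines`, `step` and `linesPerChunk` are hypothetical names), in which every input chunk yields its own list of complete lines and an explicit flush at the end replaces the `isLastChunk` size check.

```scala
object ChunkLines {
  /** One step of the split: prepend the partial line carried over from the
    * previous chunk, then return the complete lines plus the new partial line. */
  def step(carry: String, chunk: String): (List[String], String) = {
    // limit -1 keeps a trailing empty piece, so "a\n" yields (List("a"), "")
    val parts = (carry + chunk).split("\n", -1).toList
    (parts.init, parts.last)
  }

  /** Lazily turns text chunks into lists of complete lines: one list per chunk
    * that completes at least one line, plus a final list for an unterminated
    * last line. */
  def linesPerChunk(chunks: Iterator[String]): Iterator[List[String]] = {
    var carry = "" // partial line left over from the previous chunk
    val batches = chunks.map { chunk =>
      val (lines, rest) = step(carry, chunk)
      carry = rest
      lines
    }
    // `++` takes its argument by name, so `carry` is read only after every chunk is consumed
    batches.filter(_.nonEmpty) ++ Iterator.single(carry).filter(_.nonEmpty).map(List(_))
  }
}
```

The state is a single `String`, so memory stays bounded by one chunk plus one line, whatever the file size.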

But I want to process the file chunk by chunk: groupByLines should produce chunks of, say, 200 lines each.
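To get fixed-size chunks such as 200 lines regardless of how the bytes happened to be cut, the stream of lines can be re-cut after splitting. This sketch uses `Iterator.grouped` from the Scala standard library as a stand-in for an Iteratee stage; `LineBatches.rebatch` is a hypothetical helper, not a Play API.

```scala
object LineBatches {
  /** Re-cuts a stream of line lists of arbitrary sizes into lists of exactly
    * `size` lines (only the last one may be shorter). Lazy: lines are pulled
    * from `batches` only as each output list is built. */
  def rebatch(batches: Iterator[List[String]], size: Int): Iterator[List[String]] =
    batches
      .flatMap(_.iterator) // flatten into a stream of single lines
      .grouped(size)       // regroup into fixed-size windows
      .map(_.toList)
}
```

Calling `rebatch(lines, 200)` gives the 200-line chunks described above while holding roughly one output chunk in memory at a time.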

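The same problem occurs with turnIntoLines. I also use fold to build the Line values, because I need the accumulator provided by fold to zip each line number with its content.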
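For numbering, the accumulator only needs to hold the next line number, not the lines already produced. A minimal sketch with `Iterator.scanLeft` (standard library; `NumberLines` is a hypothetical name, `Line` is the case class from the question):

```scala
final case class Line(number: Int, value: String)

object NumberLines {
  /** Numbers lines list by list. Only the running offset is carried from one
    * list to the next, so earlier lists do not have to be kept. */
  def number(batches: Iterator[List[String]], first: Int = 0): Iterator[List[Line]] =
    batches
      .scanLeft((first, List.empty[Line])) { case ((offset, _), batch) =>
        val numbered = batch.zipWithIndex.map { case (s, i) => Line(offset + i, s) }
        (offset + batch.length, numbered)
      }
      .drop(1) // scanLeft also emits the seed, which holds no lines
      .map(_._2)
}
```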

I am a beginner with Play Iteratees.

Here is my code:

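// Assumes a value `file: java.io.File` is in scope.
import scala.concurrent.ExecutionContext.Implicits.global // Play's Enumerator/Enumeratee/Iteratee operations take an implicit ExecutionContext
import play.api.libs.iteratee._

case class Line(number: Int, value: String)

val chunkSize = 1024 * 8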

val enumerator: Enumerator[Array[Byte]] = Enumerator.fromFile(file, chunkSize)

def isLastChunk(chunk: Array[Byte]): Boolean = {
  chunk.length < chunkSize
}

val groupByLines: Enumeratee[Array[Byte], List[String]] = Enumeratee.grouped {
  println("groupByLines")
  // fold is only Done at EOF, so grouped receives a single group holding every line of the file
  Iteratee.fold[Array[Byte], (String, List[String])]("", List.empty) {
    case ((accLast, accLines), chunk) =>
      println("groupByLines chunk size " + chunk.length)
      new String(chunk)
        .trim
        .split("\n")
        .toList match {
        case lines @ (h :: tail) =>
          val lineBetween2Chunks: String = accLast + h

          val goodLines =
            isLastChunk(chunk) match {
              case true  => lineBetween2Chunks :: tail
              case false => (lineBetween2Chunks :: tail).init
            }

          (lines.last, accLines ++ goodLines)
        case Nil => ("", accLines)
      }
  }.map(_._2)
}


val turnIntoLines: Enumeratee[List[String], List[Line]] = Enumeratee.grouped {
  println("turnIntoLines")
  Iteratee.fold[List[String], (Int, List[Line])](0, List.empty) {
    case ((index, accLines), chunk) =>
      println("turnIntoLines chunk size " + chunk.length)
      val lines =
        ((Stream from index) zip chunk).map {
          case (lineNumber, content) => Line(lineNumber, content)
        }.toList
      (index + chunk.length, accLines ++ lines) // append so the lines stay in file order
  }.map(_._2)
}

1 answer:

Answer 0 (score: 0)

The question here is how to process a file line by line with Play Iteratees.

First, to read a file as UTF-8, I used:

import java.io.File
import java.nio.charset.Charset
import scala.concurrent.ExecutionContext
import play.api.libs.iteratee._

object EnumeratorAdditionalOperators {
  implicit def enumeratorAdditionalOperators(e: Enumerator.type): EnumeratorAdditionalOperators = new EnumeratorAdditionalOperators(e)
}

class EnumeratorAdditionalOperators(e: Enumerator.type) {
  // Reads the file in byte chunks and decodes each chunk as UTF-8
  def fromUTF8File(file: File, chunkSize: Int = 1024 * 8)(implicit ec: ExecutionContext): Enumerator[String] =
    e.fromFile(file, chunkSize)
      .map(bytes => new String(bytes, Charset.forName("UTF-8")))
}
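One caveat with `fromUTF8File`: decoding each byte chunk with `new String(bytes, ...)` on its own can cut a multi-byte UTF-8 character at a chunk boundary and turn it into replacement characters (U+FFFD). For comparison, here is a plain-JVM sketch (a hypothetical `Utf8Chunks` helper, not part of Play) that lets `InputStreamReader` hold back an incomplete byte sequence until the next read.

```scala
import java.io.{ByteArrayInputStream, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

object Utf8Chunks {
  /** Reads `in` as UTF-8 text in chunks of at most `chunkSize` chars.
    * The reader keeps incomplete byte sequences between reads, so the bytes of
    * one character are never decoded separately. The caller closes `in`. */
  def chunks(in: InputStream, chunkSize: Int = 8 * 1024): Iterator[String] = {
    val reader = new InputStreamReader(in, StandardCharsets.UTF_8)
    val buf = new Array[Char](chunkSize)
    Iterator
      .continually(reader.read(buf))   // number of chars read, or -1 at end of stream
      .takeWhile(_ != -1)
      .map(n => new String(buf, 0, n)) // copy out before the buffer is reused
  }
}
```

With two-byte chunks, `"héllo"` decoded chunk by chunk through `new String` loses the `é`, while the reader-based version reproduces the text exactly.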

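Then, to split the input chunks into lines (cutting at '\n'):

import scala.concurrent.ExecutionContext
import play.api.libs.iteratee._

object EnumerateeAdditionalOperators {
  implicit def enumerateeAdditionalOperators(e: Enumeratee.type): EnumerateeAdditionalOperators = new EnumerateeAdditionalOperators(e)
}

class EnumerateeAdditionalOperators(e: Enumeratee.type) {

  // Each group reads characters up to the next '\n'; Iteratee.consume() concatenates them into one String
  def splitToLines(implicit ec: ExecutionContext): Enumeratee[String, String] = e.grouped(
    Traversable.splitOnceAt[String, Char](_ != '\n') &>>
      Iteratee.consume()
  )

}

Third, to add line numbers, I used a piece of code from https://github.com/michaelahlers/michaelahlers-playful/blob/master/src/main/scala/ahlers/michael/playful/iteratee/EnumerateeFactoryOps.scala: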

import scala.concurrent.ExecutionContext
import play.api.libs.iteratee._

class EnumerateeAdditionalOperators(e: Enumeratee.type) {

  /**
   * As a complement to [[play.api.libs.iteratee.Enumeratee.heading]] and [[play.api.libs.iteratee.Enumeratee.trailing]], allows for inclusion of arbitrary elements between those from the producer.
   */
  def joining[E](separators: Enumerator[E])(implicit ec: ExecutionContext): Enumeratee[E, E] =
    zipWithIndex[E] compose Enumeratee.mapInputFlatten[(E, Int)] {
      case Input.Empty => Enumerator.enumInput[E](Input.Empty)
      case Input.El((element, index)) if 0 < index => separators andThen Enumerator(element)
      case Input.El((element, _)) => Enumerator(element)
      case Input.EOF => Enumerator.enumInput[E](Input.EOF)
    }

  /**
   * Zips elements with an index of the given [[scala.math.Numeric]] type, stepped by the given function.
   *
   * (Special thanks to [[https://github.com/eecolor EECOLOR]] for inspiring this factory with his answer to [[https://stackoverflow.com/a/27589990/700420 a question about enumeratees on Stack Overflow]].)
   */
  def zipWithIndex[E, I](first: I, step: I => I)(implicit ev: Numeric[I]): Enumeratee[E, (E, I)] =
    e.scanLeft[E](null.asInstanceOf[E] -> ev.minus(first, step(ev.zero))) {
      case ((_, index), value) => value -> step(index)
    }

  /**
   * Zips elements with an incrementing index of the given [[scala.math.Numeric]] type, adding one each time.
   */
  def zipWithIndex[E, I](first: I)(implicit ev: Numeric[I]): Enumeratee[E, (E, I)] =
    zipWithIndex(first, ev.plus(_, ev.one))

  /**
   * Zips elements with an incrementing index by the same contract [[scala.collection.GenIterableLike#zipWithIndex zipWithIndex]].
   */
  def zipWithIndex[E]: Enumeratee[E, (E, Int)] =
    zipWithIndex(0)

  // ...
}

Note that I defined implicit conversions to "add" methods to Enumerator and Enumeratee. For example, this trick makes it possible to write: Enumerator.fromUTF8File(file)

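Putting it all together, the file is read with Enumerator.fromUTF8File(file), split into lines with Enumeratee.splitToLines, and numbered with Enumeratee.zipWithIndex.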
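If non-blocking streaming is not a requirement, the same pipeline (decode as UTF-8, split into lines, number them, process N lines at a time) can be sketched with the standard library alone. This is an alternative under that assumption, not the Iteratee-based code of the answer; `FileInBatches` and `mapBatches` are hypothetical names, and `process` stands in for a step like the question's `parseChunk`.

```scala
import java.io.File
import scala.io.Source

final case class Line(number: Int, value: String)

object FileInBatches {
  /** Reads `file` as UTF-8, numbers its lines from 0 and applies `process`
    * to consecutive batches of at most `batchSize` lines. `getLines()` already
    * handles chunk boundaries and decoding, so no manual carry-over is needed. */
  def mapBatches[R](file: File, batchSize: Int)(process: List[Line] => R): List[R] = {
    val source = Source.fromFile(file, "UTF-8")
    try {
      source
        .getLines()
        .zipWithIndex
        .map { case (value, number) => Line(number, value) }
        .grouped(batchSize)
        .map(batch => process(batch.toList))
        .toList // force every batch before the file is closed
    } finally source.close()
  }
}
```

Only one batch of lines is in memory at a time; the per-batch results of `process` are collected in the returned list.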

The new code is simpler and clearer than the code given in the question.