How do I flatten a nested for comprehension that uses I/O?

Time: 2011-09-07 13:22:31

Tags: scala for-loop scala-collections

I am having trouble flattening a nested for generator into a single for generator.

I created MapSerializer to save and load maps.

Listing of MapSerializer.scala:

import java.io.{ObjectInputStream, ObjectOutputStream}

object MapSerializer {
  def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] =
    (for (_ <- 1 to in.readInt()) yield {
      val key = in.readUTF()
      for (_ <- 1 to in.readInt()) yield {
        val value = in.readInt()
        (key, value)
      }
    }).flatten.groupBy(_._1).mapValues(_.map(_._2))

  def saveMap(out: ObjectOutputStream, map: Map[String, Seq[Int]]): Unit = {
    out.writeInt(map.size)
    for ((key, values) <- map) {
      out.writeUTF(key)
      out.writeInt(values.size)
      values.foreach(out.writeInt(_))
    }
  }
}

Changing loadMap so that key is assigned inside the generator causes it to fail:

def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] =
  (for (_ <- 1 to in.readInt();
        key = in.readUTF()) yield {
    for (_ <- 1 to in.readInt()) yield {
      val value = in.readInt()
      (key, value)
    }
  }).flatten.groupBy(_._1).mapValues(_.map(_._2))

Here is the stack trace I get:

java.io.UTFDataFormatException
    at java.io.ObjectInputStream$BlockDataInputStream.readWholeUTFSpan(ObjectInputStream.java)
    at java.io.ObjectInputStream$BlockDataInputStream.readOpUTFSpan(ObjectInputStream.java)
    at java.io.ObjectInputStream$BlockDataInputStream.readWholeUTFSpan(ObjectInputStream.java)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2819)
    at java.io.ObjectInputStream.readUTF(ObjectInputStream.java:1050)
    at MapSerializer$$anonfun$loadMap$1.apply(MapSerializer.scala:8)
    at MapSerializer$$anonfun$loadMap$1.apply(MapSerializer.scala:7)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
    at scala.collection.immutable.Range.foreach(Range.scala:76)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
    at scala.collection.immutable.Range.map(Range.scala:43)
    at MapSerializer$.loadMap(MapSerializer.scala:7)

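I would like to compress the loading code into a single for comprehension, but the error I get suggests that it is executing in a different order, or repeating steps I do not expect to be repeated.

Why does moving the assignment of key into the generator cause it to fail?

Can I flatten this into a single generator? If so, what would that generator be?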

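2 Answers:

Answer 0: (score: 6)

Thanks for providing self-contained, compiling code in the question. I don't think you want to flatten the loops, because the structure is not flat: once you flatten, you need groupBy to restore the structure. Also, if the map contained an element such as "zero" -> Seq(), it would be lost. Using this simple map avoids the groupBy and preserves elements mapped to an empty sequence: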

def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] = {
  val size = in.readInt
  (1 to size).map{ _ =>
    val key = in.readUTF
    val nval = in.readInt
    key -> (1 to nval).map(_ => in.readInt)
  }(collection.breakOut)
}

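I use breakOut to produce the right type; otherwise I think the compiler would complain about a mismatch between the generic Map and the immutable Map. You can also use Map() ++ (...).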
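For readers on Scala 2.13 or later, where collection.breakOut no longer exists, the following is a minimal sketch of the same loader using .toMap, next to a flatMap/groupBy loader for comparison. The names RoundTripDemo, loadMapGrouped and roundTrip are hypothetical and introduced only for illustration; the wire format is assumed to be the one written by saveMap in the question.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object RoundTripDemo {
  // Same wire format as saveMap in the question: entry count, then key, value count, values.
  def saveMap(out: ObjectOutputStream, map: Map[String, Seq[Int]]): Unit = {
    out.writeInt(map.size)
    for ((key, values) <- map) {
      out.writeUTF(key)
      out.writeInt(values.size)
      values.foreach(v => out.writeInt(v))
    }
  }

  // The answer's loader, with .toMap standing in for collection.breakOut (an assumption
  // made so the sketch also compiles on Scala 2.13+).
  def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] = {
    val size = in.readInt()
    (1 to size).map { _ =>
      val key = in.readUTF()
      val nval = in.readInt()
      key -> (1 to nval).map(_ => in.readInt())
    }.toMap
  }

  // Flatten to (key, value) pairs and regroup, written with explicit flatMap/map.
  // A key with zero values yields no pairs, so it cannot reappear after groupBy.
  def loadMapGrouped(in: ObjectInputStream): Map[String, IndexedSeq[Int]] = {
    val size = in.readInt()
    val pairs = (1 to size).flatMap { _ =>
      val key = in.readUTF()
      val nval = in.readInt()
      (1 to nval).map(_ => key -> in.readInt())
    }
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  }

  // Hypothetical helper: serialize to an in-memory buffer, then read back with `load`.
  def roundTrip(
      map: Map[String, Seq[Int]],
      load: ObjectInputStream => Map[String, IndexedSeq[Int]]
  ): Map[String, IndexedSeq[Int]] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    saveMap(out, map)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try load(in) finally in.close()
  }
}
```

Under these assumptions, round-tripping Map("one" -> Seq(1, 2), "zero" -> Seq()) through loadMap keeps the "zero" key, while loadMapGrouped drops it, because a key with no values contributes no pairs to group.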

Note: I arrived at this solution after getting confused by your for loop and starting to rewrite it with flatMap and map:

val tuples = (1 to size).flatMap{ _ =>
  val key = in.readUTF
  println("key " + key)
  val nval = in.readInt
  (1 to nval).map(_ => key -> in.readInt)
}

I think something happens in the for loop when some generators are not used. I thought this would be equivalent to:

val tuples = for {
  _ <- 1 to size
  key = in.readUTF
  nval = in.readInt
  _ <- 1 to nval
  value = in.readInt
} yield { key -> value }

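But it is not, so I feel I am missing something in the translation.

Edit: I figured out what is wrong with the single for loop. Short version: the translation defined for for loops causes all the key = in.readUTF statements to be called consecutively, before the inner loop executes. To work around this, use view and force: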

val tuples = (for {
  _ <- (1 to size).view
  key = in.readUTF
  nval = in.readInt
  _ <- 1 to nval
  value = in.readInt
} yield { key -> value }).force

The problem can be shown more clearly with this code:

val iter = Iterator.from(1)
val tuple = for {
  _ <- 1 to 3
  outer = iter.next
  _ <- 1 to 3
  inner = iter.next
} yield (outer, inner)

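This returns Vector((1,4), (1,5), (1,6), (2,7), (2,8), (2,9), (3,10), (3,11), (3,12)), which shows that all the outer values are evaluated before any inner value. That is because it is translated more or less into something like: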

for {
  (i, outer) <- for (i <- (1 to 3)) yield (i, iter.next)
  _ <- 1 to 3
  inner = iter.next
} yield (outer, inner)

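This computes all the outer iter.next calls first. Going back to the original use case, all the in.readUTF calls are made consecutively, before any of the in.readInt calls.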
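The ordering can also be checked by writing the map and flatMap calls out by hand, which removes any dependence on how a given compiler version desugars the for comprehension. This is a sketch of the idea, not the compiler's exact output; the names DesugarDemo, strict and viaView are hypothetical, and .toVector is used here in place of force.

```scala
object DesugarDemo {
  // Strict Range: the first map runs to completion (three iter.next() calls)
  // before flatMap starts the inner loops.
  def strict(): Vector[(Int, Int)] = {
    val iter = Iterator.from(1)
    (1 to 3)
      .map { i => (i, iter.next()) }
      .flatMap { case (_, outer) => (1 to 3).map(_ => (outer, iter.next())) }
      .toVector
  }

  // View: the first map is deferred, so each outer iter.next() is only called
  // when flatMap asks for that element, right before its inner loop.
  def viaView(): Vector[(Int, Int)] = {
    val iter = Iterator.from(1)
    (1 to 3).view
      .map { i => (i, iter.next()) }
      .flatMap { case (_, outer) => (1 to 3).map(_ => (outer, iter.next())) }
      .toVector
  }
}
```

With the strict Range, the outer values are 1, 2, 3 and the inner values start at 4, matching the result above. With the view, outer and inner reads interleave, which is the order the stream-based loader needs.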

Answer 1: (score: 1)

Here is the compressed version of @huynhjl's answer that I ended up deploying:

def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] =
  ((1 to in.readInt()) map { _ =>
    in.readUTF() -> ((1 to in.readInt()) map { _ => in.readInt() })
  })(collection.breakOut)

The advantage of this version is that it has no direct assignments.