Question

This is a function that finds the third largest number in a collection of integers. I call it like this:

val lineStream = Source.fromFile("10m.txt").getLines.toIterable
val intStream = lineStream map { s => Integer.parseInt(s) }
thirdLargest(intStream)

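The file 10m.txt contains 10 million lines, each with a random integer. The thirdLargest function below should not keep any of the integers after it has tested them, yet it causes the JVM to run out of memory (after about 90 seconds in my case).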

def thirdLargest(numbers: Iterable[Int]): Option[Int] = {
    def top3of4(top3: List[Int], fourth: Int) = top3 match {
        case List(a, b, c) =>
            if (fourth > c) List(b, c, fourth)
            else if (fourth > b) List(b, fourth, c)
            else if (fourth > a) List(fourth, b, c)
            else top3
    }

    @tailrec
    def find(top3: List[Int], rest: Iterable[Int]): Int = (top3, rest) match {
        case (List(a, b, c), Nil) => a
        case (top3, d #:: rest) => find(top3of4(top3, d), rest)
    }

    numbers match {
        case a #:: b #:: c #:: rest => Some(find(List[Int](a, b, c).sorted, rest))
        case _ => None
    }
}

Answer 1

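The OOM error has nothing to do with how you read the file. That part is perfectly fine; using Source.getLines is even the recommended approach here. The problem lies elsewhere.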

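Many people are confused by the nature of Scala's Stream. It is in fact not something you want to use just to iterate over things. It is lazy, but it does not discard previous results: they are memoized so that they do not have to be recomputed the next time they are used (which never happens in your case, but that is where your memory goes). See also this answer.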
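
To make the memoization concrete, here is a small sketch. The object name StreamMemoDemo and the evaluation counter are made up for illustration, and it assumes a Scala version where Stream is still available (on 2.13+ it is deprecated in favour of LazyList, which memoizes in the same way). It walks the same Stream twice and counts how often the mapping function runs.

```scala
object StreamMemoDemo {
  // Returns how many times the mapping function has run
  // after one traversal and after a second traversal of the same Stream.
  def evaluationCounts(): (Int, Int) = {
    var evaluated = 0
    // Holding `numbers` in a val keeps the head of the Stream reachable.
    val numbers: Stream[Int] = Stream.from(1).map { n => evaluated += 1; n }

    numbers.take(1000).foreach(_ => ()) // forces the first 1000 cells
    val afterFirst = evaluated

    numbers.take(1000).foreach(_ => ()) // reuses the memoized cells
    val afterSecond = evaluated

    (afterFirst, afterSecond)
  }

  def main(args: Array[String]): Unit = {
    val (first, second) = evaluationCounts()
    println(s"evaluations after first pass: $first, after second pass: $second")
  }
}
```

The second pass does not run the mapping function again, which means every cell forced so far is still in memory. As long as something references the head of the Stream, none of those cells can be garbage collected, and with 10 million elements that adds up.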

Consider using foldLeft. Here is a working (but intentionally simplified) example for illustration purposes:

val lines = Source.fromFile("10m.txt").getLines()

print(lines.map(_.toInt).foldLeft(-1 :: -1 :: -1 :: Nil) { (best3, next) =>
  (next :: best3).sorted.reverse.take(3)
})
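
If you want to keep the Option[Int] result of the original function and avoid the -1 placeholders, something along these lines might work. It is only a sketch: the object name ThirdLargest and the choice to take an Iterator[Int] are my assumptions, not part of the question's code. The fold starts from an empty list and never holds more than four values at once.

```scala
object ThirdLargest {
  // Folds over the iterator, keeping at most the three largest values seen
  // so far (sorted in descending order), so memory use stays constant.
  def thirdLargest(numbers: Iterator[Int]): Option[Int] = {
    val top3 = numbers.foldLeft(List.empty[Int]) { (best, next) =>
      (next :: best).sorted(Ordering[Int].reverse).take(3)
    }
    // With fewer than three inputs there is no third largest value.
    if (top3.size == 3) Some(top3.last) else None
  }
}
```

You would call it as ThirdLargest.thirdLargest(Source.fromFile("10m.txt").getLines().map(_.toInt)). Because an Iterator is consumed as it goes and nothing holds on to earlier elements, already-tested lines can be garbage collected. Like the code in the question, this counts duplicate values separately.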

Why does this function run out of memory?
