Question

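Scala streams are lazy: they compute their values on demand and memoize them. That becomes a problem when the stream being processed is very large (possibly infinite) and does not fit in memory.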
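
To make the memoization concrete, here is a minimal sketch. The names `MemoDemo`, `evaluations`, `build` and `run` are made up for illustration and do not appear anywhere else in this question. It counts how often the mapping function runs when the same stream is traversed twice.

```scala
object MemoDemo {
  // Hypothetical counter: incremented each time an element is actually computed.
  var evaluations = 0

  // A lazily mapped stream over 1, 2, 3, ...
  // (Stream is deprecated since Scala 2.13 in favor of LazyList, but still available.)
  def build(): Stream[Int] =
    Stream.from(1).map { n => evaluations += 1; n * 2 }

  // Returns the first three elements plus the counter value
  // after the first and after the second traversal.
  def run(): (List[Int], Int, Int) = {
    evaluations = 0
    val s = build()
    val first = s.take(3).toList
    val afterFirst = evaluations
    s.take(3).toList               // second traversal reuses the memoized cells
    val afterSecond = evaluations
    (first, afterFirst, afterSecond)
  }
}
```

The counter should not grow during the second traversal, because the cells reachable from `s` have already been computed and stored. The same storage is what fills up memory when the head of a very long stream stays reachable.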

What I would like is a way to forward part of a stream to another function without holding on to a reference to it, for example:

def fun(stream: Stream[Int]) = {
  val x = doSomethingWithPrefix(stream.take(10).toList)
  val y = doSomethingWithRestOfStream(stream.drop(10))
  computeResult(x,y)
}

But this can run out of memory:

scala> def ones = Stream.continually(1)
ones: scala.collection.immutable.Stream[Int]

scala> def f1(stream: Stream[Int]) = {
     |   stream.take(10).toList
     |   stream.drop(10).length
     | }
f1: (stream: Stream[Int])Int

scala> println(f1(ones.take(100000000)))
java.lang.OutOfMemoryError: GC overhead limit exceeded
// ... stacktrace ...

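The solution suggested in many places (e.g. here, page 3) is to use Scala's call-by-name parameters, which wrap the argument in a parameterless function that is evaluated to obtain the actual value. That does not help here either, because the function then gets evaluated twice: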

scala> def f2(stream: => Stream[Int]) = {
     |   stream.take(10).toList
     |   stream.drop(10).length
     | }
f2: (stream: => Stream[Int])Int

scala> def makeOnes = {
     |   println("ha")
     |   ones.take(100000000)
     | }
makeOnes: scala.collection.immutable.Stream[Int]

scala> println(f2(makeOnes))
ha
ha
99999990
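
The double evaluation follows from how by-name parameters behave: every mention of the parameter in the body re-runs the argument expression, much like calling an explicit `() => Stream[Int]` thunk. Below is a small compilable sketch of that equivalence. `ByNameDemo`, `calls` and `f2Thunk` are illustrative names, and the stream is shortened to 1000 elements so the example runs quickly.

```scala
object ByNameDemo {
  // Hypothetical counter standing in for the println("ha") above.
  var calls = 0

  def makeOnes: Stream[Int] = {
    calls += 1
    Stream.continually(1).take(1000)
  }

  // Same shape as f2: `stream` is mentioned twice, so the argument runs twice.
  def f2(stream: => Stream[Int]): Int = {
    stream.take(10).toList
    stream.drop(10).length
  }

  // The same thing with the thunk written out explicitly.
  def f2Thunk(stream: () => Stream[Int]): Int = {
    stream().take(10).toList
    stream().drop(10).length
  }
}
```

Calling `ByNameDemo.f2(ByNameDemo.makeOnes)` increments `calls` twice, and so does `ByNameDemo.f2Thunk(() => ByNameDemo.makeOnes)`. Nothing is retained between the two mentions, but the stream has to be rebuilt from scratch each time.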

The only workaround I have right now is to manually inline the doSomethingWithRestOfStream function, for example:

scala> def f3(stream: => Stream[Int]) = {
     |   var str: Stream[Int] = stream
     |   str.take(10).toList
     |   str = str.drop(10)
     |   var len: Int = 0
     |   while (!str.isEmpty) {
     |     str = str.tail
     |     len = len+1
     |   }
     |   len
     | }
f3: (stream: => Stream[Int])Int

scala> println(f3(makeOnes))
ha
99999990
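
One way to avoid rewriting that loop by hand each time might be to pass the per-element step into the loop instead of passing the stream out to another function. This is only a sketch of the same workaround in reusable form: `RestFold` and `foldAfterPrefix` are hypothetical names, and I am assuming it retains no more than f3 does, since it uses the same reassigned local `var`.

```scala
object RestFold {
  // Hypothetical helper: evaluates the by-name stream once, hands the first
  // n elements to `onPrefix`, then folds over the remainder in a while loop.
  def foldAfterPrefix[A, P, B](stream: => Stream[A], n: Int)(
      onPrefix: List[A] => P)(zero: B)(step: (B, A) => B): (P, B) = {
    var str: Stream[A] = stream          // the only evaluation of the argument
    val prefixResult = onPrefix(str.take(n).toList)
    str = str.drop(n)                    // rebind, as in f3
    var acc = zero
    while (!str.isEmpty) {
      acc = step(acc, str.head)
      str = str.tail                     // move on without keeping earlier cells in `str`
    }
    (prefixResult, acc)
  }
}
```

For instance, `RestFold.foldAfterPrefix(makeOnes, 10)((p: List[Int]) => p.size)(0)((n, _) => n + 1)` would compute the same length as f3 alongside the prefix size. The drawback is that the code consuming the rest of the stream has to be expressed as a fold step rather than as a function taking a `Stream`.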

Is there a better solution?

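Also, if Scala had rebindable function parameters (it does not), this workaround could be simplified by taking a plain Stream[Int] instead of => Stream[Int] and using stream wherever str appears.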

Scala Streams: how to avoid memory leaks when forwarding through functions?
