Question

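Is it possible to apply functional programming to Scala streams so that the stream is processed sequentially, while the already-processed part of the stream can be garbage collected?

For example, I define a Stream that contains the numbers from start to end: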

def fromToStream(start: Int, end: Int) : Stream[Int] = {
  if (end < start) Stream.empty
  else start #:: fromToStream(start+1, end)
}

If I sum up the values in a functional style:

println(fromToStream(1,10000000).reduceLeft(_+_))

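I get an OutOfMemoryError, perhaps because the stack frame of the call to reduceLeft holds a reference to the head of the stream. But if I do this in an iterative style, it works: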

var sum = 0
for (i <- fromToStream(1,10000000)) {
  sum += i
}

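Is there a way to do this in a functional style without running out of memory?

Update: this was a bug in Scala, which is fixed now. So this question is more or less out of date.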

Answer 1

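When I started learning about Stream, I thought it was cool. Then I realized that Iterator is what I want to use nearly all the time.

In case you do need a Stream but want reduceLeft to work: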

fromToStream(1,10000000).toIterator.reduceLeft(_ + _)

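If you try the line above, it will garbage collect just fine. I have found that using Stream is tricky, because it is easy to hold on to the head without realizing it. Sometimes the standard library holds on to it for you in very subtle ways.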
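If a Stream is not needed at all, the same sum can be computed from an Iterator directly. The following is a minimal sketch, not part of the original answer: `IteratorSum` and `sumRange` are hypothetical names, and the accumulator is a `Long` on the assumption that the exact sum is wanted, since the sum of 1 to 10000000 does not fit in an `Int`.

```scala
object IteratorSum {
  // Hypothetical helper: sums the integers start..end (inclusive) with an Iterator,
  // so no Stream cells are created and nothing can hold on to a head.
  // Assumes end < Int.MaxValue, because Iterator.range takes an exclusive upper bound.
  def sumRange(start: Int, end: Int): Long =
    Iterator.range(start, end + 1).foldLeft(0L)(_ + _)
}
```

For instance, `IteratorSum.sumRange(1, 10000000)` traverses the range once and keeps only the running total in memory.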

Answer 2

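Yes, you can. The trick is to use tail-recursive methods, so that the local stack frame contains the only reference to the Stream instance. Since the method is tail-recursive, the local reference to the previous Stream head is dropped once the method recursively calls itself, which enables the GC to collect the start of the Stream as you go.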

Welcome to Scala version 2.9.0.r23459-b20101108091606 (Java HotSpot(TM) Server VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import collection.immutable.Stream
import collection.immutable.Stream

scala> import annotation.tailrec
import annotation.tailrec

scala> @tailrec def last(s: Stream[Int]): Int = if (s.tail.isEmpty) s.head else last(s.tail)
last: (s: scala.collection.immutable.Stream[Int])Int

scala> last(Stream.range(0, 100000000))                                                                             
res2: Int = 99999999

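Also, you must ensure that the thing you pass to the method last above has only one reference on the stack. If you store a Stream in a local variable or value, it will not be garbage collected when you call the last method, since the method's argument is then not the only reference to the Stream. The code below runs out of memory.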

scala> val s = Stream.range(0, 100000000)                                                                           
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)                                                            

scala> last(s)                                                                                                      
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space                                              
        at sun.net.www.ParseUtil.encodePath(ParseUtil.java:84)                                                      
        at sun.misc.URLClassPath$JarLoader.checkResource(URLClassPath.java:674)                                     
        at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:759)                                       
        at sun.misc.URLClassPath.getResource(URLClassPath.java:169)                                                 
        at java.net.URLClassLoader$1.run(URLClassLoader.java:194)                                                   
        at java.security.AccessController.doPrivileged(Native Method)                                               
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)                                               
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)                                                    
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)                                            
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)                                                    
        at scala.tools.nsc.Interpreter$Request$$anonfun$onErr$1$1.apply(Interpreter.scala:978)                      
        at scala.tools.nsc.Interpreter$Request$$anonfun$onErr$1$1.apply(Interpreter.scala:976)                      
        at scala.util.control.Exception$Catch.apply(Exception.scala:80)
        at scala.tools.nsc.Interpreter$Request.loadAndRun(Interpreter.scala:984)                                    
        at scala.tools.nsc.Interpreter.loadAndRunReq$1(Interpreter.scala:579)                                       
        at scala.tools.nsc.Interpreter.interpret(Interpreter.scala:599)                                             
        at scala.tools.nsc.Interpreter.interpret(Interpreter.scala:576)
        at scala.tools.nsc.InterpreterLoop.reallyInterpret$1(InterpreterLoop.scala:472)                             
        at scala.tools.nsc.InterpreterLoop.interpretStartingWith(InterpreterLoop.scala:515)                         
        at scala.tools.nsc.InterpreterLoop.command(InterpreterLoop.scala:362)
        at scala.tools.nsc.InterpreterLoop.processLine$1(InterpreterLoop.scala:243)
        at scala.tools.nsc.InterpreterLoop.repl(InterpreterLoop.scala:249)
        at scala.tools.nsc.InterpreterLoop.main(InterpreterLoop.scala:559)
        at scala.tools.nsc.MainGenericRunner$.process(MainGenericRunner.scala:75)
        at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:31)
        at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

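To summarize:

1. Use tail-recursive methods
2. Annotate them as tail-recursive
3. When you call them, ensure that their argument is the only reference to the Stream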
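The three rules above can be sketched in compiled code as follows. This is an illustration under assumptions rather than a guarantee: `StreamLength`, `count` and `countUpTo` are hypothetical names, and whether the processed cells are actually reclaimed depends on the Scala version and the JVM. Note also that `Stream` is deprecated in favor of `LazyList` since Scala 2.13.

```scala
import scala.annotation.tailrec

object StreamLength {
  // Rules 1 and 2: a tail-recursive method, verified by the compiler through @tailrec.
  // After each step, the parameter `s` points at the remaining tail only.
  @tailrec
  def count(s: Stream[Int], acc: Long = 0L): Long =
    if (s.isEmpty) acc else count(s.tail, acc + 1)

  // Rule 3: pass the stream expression directly instead of storing it in a val first,
  // so that the parameter of `count` is the only reference to the head.
  def countUpTo(n: Int): Long = count(Stream.range(0, n))
}
```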

EDIT:

Note that this also works and does not result in an out-of-memory error:

scala> def s = Stream.range(0, 100000000)                                                   
s: scala.collection.immutable.Stream[Int]

scala> last(s)                                                                              
res1: Int = 99999999
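The difference between the `def` and the `val` version is that a `def` re-evaluates its body on every use, so nothing outside the call keeps the head alive. A small sketch that makes this visible through reference equality (`DefVsVal` and its members are hypothetical names used only for illustration):

```scala
object DefVsVal {
  // `def` builds a fresh Stream every time it is referenced.
  def fresh: Stream[Int] = Stream.range(0, 10)

  // `val` is evaluated once and keeps a reference to the head
  // for as long as the val itself is reachable.
  val kept: Stream[Int] = Stream.range(0, 10)

  // Two uses of `fresh` yield two distinct instances.
  def freshIsNewEachTime: Boolean = !(fresh eq fresh)

  // Two uses of `kept` yield the same instance.
  def keptIsSameInstance: Boolean = kept eq kept
}
```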

EDIT2：

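~~In the case of reduceLeft that you require, you would have to define a helper method with an accumulator argument for the result.~~

For reduceLeft you need an accumulator parameter, which you can set to a certain value using default arguments. A simplified example: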

scala> @tailrec def rcl(s: Stream[Int], acc: Int = 0): Int = if (s.isEmpty) acc else rcl(s.tail, acc + s.head)
rcl: (s: scala.collection.immutable.Stream[Int],acc: Int)Int

scala> rcl(Stream.range(0, 10000000))
res6: Int = -2014260032
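The negative result is an artifact of `Int` overflow: the exact sum of 0 to 9999999 is 49999995000000, which wraps around to -2014260032 in 32-bit arithmetic. A minimal sketch of the same method with a `Long` accumulator, assuming the exact sum is what is wanted (`StreamSum`, `sumInt` and `sumLong` are hypothetical names):

```scala
import scala.annotation.tailrec

object StreamSum {
  // Same shape as rcl above: Int accumulator, wraps around silently on overflow.
  @tailrec
  def sumInt(s: Stream[Int], acc: Int = 0): Int =
    if (s.isEmpty) acc else sumInt(s.tail, acc + s.head)

  // Long accumulator: each Int element is widened before it is added.
  @tailrec
  def sumLong(s: Stream[Int], acc: Long = 0L): Long =
    if (s.isEmpty) acc else sumLong(s.tail, acc + s.head)
}
```

With this, `StreamSum.sumLong(Stream.range(0, 100000))` gives the exact total, while `sumInt` on the same input already overflows.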

Answer 3

You might want to have a look at Scalaz's ephemeral streams.

Answer 4

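As it turns out, this is a bug in the current implementation of reduceLeft. The problem is that reduceLeft calls foldLeft, so the stack frame of reduceLeft holds a reference to the head of the stream for the entire duration of the call. foldLeft uses tail recursion to avoid this problem. Compare: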

(1 to 10000000).toStream.foldLeft(0)(_+_)
(1 to 10000000).toStream.reduceLeft(_+_)

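These are semantically equivalent. In Scala version 2.8.0, the call to foldLeft works, but the call to reduceLeft throws an OutOfMemoryError. If reduceLeft did its own work instead of delegating, this problem would not occur.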
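The semantic equivalence of the two calls can be checked on an input small enough that memory is not a concern. A minimal sketch (`FoldVsReduce`, `viaFold` and `viaReduce` are hypothetical names); it only compares results and says nothing about memory behavior in any particular Scala version:

```scala
object FoldVsReduce {
  // Sum of 1..n through foldLeft with an explicit zero.
  def viaFold(n: Int): Int = (1 to n).toStream.foldLeft(0)(_ + _)

  // Sum of 1..n through reduceLeft; requires n >= 1, since reduceLeft fails on an empty stream.
  def viaReduce(n: Int): Int = (1 to n).toStream.reduceLeft(_ + _)
}
```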

Functional processing of Scala streams without OutOfMemory errors
