Question

For Project Euler problem 14, I wrote this solution:

import scala.collection.immutable.LongMap

object LongestCollatzSequencePerso {
  def nextElem(n: Long): Long = n match {
    case x if n % 2 == 0 => n / 2
    case _ => 3 * n + 1
  }

  def funcVal(acc: LongMap[Long], n: Long): LongMap[Long] = {
    if (acc.contains(n)) {
      return acc
    }
    else {
      val nNext = nextElem(n)
      val size = funcVal(acc, nNext)(nNext) + 1      
      return acc + (n -> size)
    }
  }

  def main = {
    val max = 1000000L
    val allVal = (1L to max).foldLeft(LongMap(1L -> 1L))(funcVal)
    println(allVal.filter(_._1 < max).maxBy(_._2)._1)
  }
}

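I use an immutable LongMap to cache every result computed so far, so that the recursion can stop as soon as it reaches a value that is already known. My code is so slow that I never get a result.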

This code, taken from the Internet, does not cache anything:

object LongestCollatzSequenceWeb {  
  def from(n: Long, c: Int = 0): Int = if (n == 1) c + 1 else
    from(if (n % 2 == 0) n / 2 else 3 * n + 1, c + 1)

  val r = (1 until 1000000).view
                           .map(n => (n, from(n)))
                           .reduceLeft((a, b) => if (a._2 > b._2) a else b)
                           ._1

  def main = println(r)
}

Yet it runs fast enough to produce the correct answer in a short time.

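Why is my cached version so slow? I know that caching has its own overhead, but I would still expect to get a result in a reasonable time. Do you see a way to improve performance while keeping everything immutable?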

I also wrote this tail-recursive version (as suggested in an answer), but it is slow too:

import scala.annotation.tailrec
import scala.collection.immutable.LongMap

object LongestCollatzSequenceTailRec {
  def nextElem(n: Long): Long = n match {
    case x if n % 2 == 0 => n / 2
    case _ => 3 * n + 1
  }

  @tailrec
  def funcVal(acc: (List[Long], LongMap[Long]), n: Long): (List[Long], LongMap[Long]) = {
    val (previous, dic) = acc
    if (dic.contains(n)) {
      val disN = dic(n)
      val dis = disN + 1 to disN + previous.length
      return (Nil, dic ++ previous.zip(dis))
    }
    else {   
      return funcVal((n :: previous, dic), nextElem(n))
    }
  }

  def main = {
    val max = 1000000L
    val allVal = (1L to max).foldLeft((List[Long](), LongMap(1L -> 1L)))(funcVal)
    println(allVal._2.filter(_._1 < max).maxBy(_._2)._1)
  }
}

Answer 1

Iteration #1: initial implementation

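It turns out that the version you posted is not tail-recursive. Just add the @tailrec annotation to the funcVal method and you will see that it does not compile, because the recursive call is not in tail position.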

In contrast, the from method in LongestCollatzSequenceWeb is tail-recursive (also checked by adding @tailrec).

So we are comparing apples with oranges, that is, the performance of a recursive method with that of an iterative one :)
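A minimal sketch of that check, assuming the same logic as the web version's from method; the wrapper object name TailRecCheck is made up for illustration. The annotation makes the compiler reject the method unless the recursive call is in tail position, so if this compiles the call is turned into a loop.

```scala
import scala.annotation.tailrec

// TailRecCheck is a hypothetical name; the body mirrors `from` in LongestCollatzSequenceWeb.
object TailRecCheck {
  // Compiles only because the recursive call is the last thing the method does.
  @tailrec
  def from(n: Long, c: Int = 0): Int =
    if (n == 1) c + 1
    else from(if (n % 2 == 0) n / 2 else 3 * n + 1, c + 1)

  // By contrast, a body shaped like `1 + from(...)` would be rejected under @tailrec,
  // because the addition still has to run after the recursive call returns.
}
```

For example, TailRecCheck.from(13) counts the ten terms 13, 40, 20, 10, 5, 16, 8, 4, 2, 1.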

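Iteration #2: after the @tailrec modification. It should be clear that you allocate a lot of memory. Let's demonstrate this by simply logging the time spent in garbage collection: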

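import java.lang.management.ManagementFactory
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.JavaConverters._ // on Scala 2.13+, scala.jdk.CollectionConverters._ is preferred

// Every 5 seconds, print the total time (in ms) the JVM has spent in garbage collection so far
val scheduler = Executors.newScheduledThreadPool(1)
scheduler.scheduleAtFixedRate(new Runnable {
  override def run(): Unit = {
    val totalTime = ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
    println("Spent time for GC: " + totalTime)
  }
}, 0, 5, TimeUnit.SECONDS)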

Let's run your code with val max = 1000000L. We won't wait forever for it to finish, but we will see the following:

Spent time for GC: 0
Spent time for GC: 47
Spent time for GC: 67
Spent time for GC: 107
Spent time for GC: 157
Spent time for GC: 201
......................
Spent time for GC: 940
Spent time for GC: 988
Spent time for GC: 1034
......................

Very soon we have wasted more than 1 second on GC! It also shows that garbage collection happens frequently.

In contrast, let's try the 'code from the Internet' with a limit of 100000000 (100 times the original):

Spent time for GC: 55
Spent time for GC: 58
Spent time for GC: 60
Spent time for GC: 64

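As you can see, we spend much less time on garbage collection (because we allocate less memory), and it grows more slowly: +2 to +4 ms every 5 seconds, compared with +40 to +70 ms in the previous example.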
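To compare two pieces of code without a background scheduler, the same MXBean counters can be read before and after a block. This is only a sketch: GcProbe, gcMillis and withGcTime are hypothetical names, and the numbers depend on the JVM, the heap size and the collector in use.

```scala
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._ // on Scala 2.13+, scala.jdk.CollectionConverters._ is preferred

// GcProbe, gcMillis and withGcTime are illustrative names, not part of any library.
object GcProbe {
  // Total GC time in ms reported so far by all collectors.
  // getCollectionTime returns -1 when a collector does not support it, so those are skipped.
  def gcMillis(): Long =
    ManagementFactory.getGarbageCollectorMXBeans.asScala
      .map(_.getCollectionTime)
      .filter(_ >= 0)
      .sum

  // Runs `body` and returns its result together with the GC time accumulated while it ran.
  def withGcTime[A](body: => A): (A, Long) = {
    val before = gcMillis()
    val result = body
    (result, gcMillis() - before)
  }
}
```

Usage would look like val (result, gcMs) = GcProbe.withGcTime { /* code under test */ }, run once for each version being compared.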

I hope this helps to pinpoint the flaws in your current solution.

Answer 2

Using an immutable structure in this kind of algorithm makes no sense. Just replace

import scala.collection.immutable.LongMap

with

import scala.collection.mutable.LongMap

and change

return (Nil, dic ++ previous.zip(dis))

to

return (Nil, dic ++= previous.zip(dis))

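and you will see a huge difference. The dict ends up with about 2M entries, so you need to reallocate it about 2 million times (a little less, since you cache previous, but still more than enough). It's not worth it.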
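Put together, the two changes might look like the sketch below. It reuses the names from the tail-recursive version in the question; the object name LongestCollatzSequenceMutable and the longestStart helper are assumptions added here so the snippet stands on its own.

```scala
import scala.annotation.tailrec
import scala.collection.mutable.LongMap

// Hypothetical object name; funcVal follows the question's tail-recursive version.
object LongestCollatzSequenceMutable {
  def nextElem(n: Long): Long = if (n % 2 == 0) n / 2 else 3 * n + 1

  // Walks the chain from n until it hits a cached value, then records the
  // chain length of every number visited on the way, updating dic in place.
  @tailrec
  def funcVal(acc: (List[Long], LongMap[Long]), n: Long): (List[Long], LongMap[Long]) = {
    val (previous, dic) = acc
    if (dic.contains(n)) {
      val disN = dic(n)
      // previous is in reverse visiting order, so its head is one step away from n
      val dis = disN + 1 to disN + previous.length
      (Nil, dic ++= previous.zip(dis)) // ++= mutates dic instead of building a new map
    } else {
      funcVal((n :: previous, dic), nextElem(n))
    }
  }

  // Hypothetical helper: the starting number below max with the longest chain.
  def longestStart(max: Long): Long = {
    val (_, dic) = (1L to max).foldLeft((List.empty[Long], LongMap(1L -> 1L)))(funcVal)
    dic.filter(_._1 < max).maxBy(_._2)._1
  }
}
```

Calling LongestCollatzSequenceMutable.longestStart(1000000L) corresponds to the main method in the question. The fold still threads the map through as an accumulator, but every step now refers to the same mutable instance.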
