Performance problem when extracting a subset from an immutable Map

Date: 2016-03-02 16:45:00

Tags: performance scala

For my own needs I wrote a function that extracts a subset from an existing Map[String, List[Double]]. Here is the code:

object Myobj {
  val globalPointsByChannel = 5120
  val numberOfWindow = 125
  val numOfPoints2Remove = 41
  val recordFrequency = 1024
  val windowDuration = 3000
  val localWindowSize = 3072

  def subwindow
    (
      start : Long,
      m : Map[String, List[Double]],
      nxtWindow : Map[String, List[Double]],
      left : Int,
      right : Int
    ) : List[String] = {
    for {
      windowIndex <- (0 until numberOfWindow)
    } yield {
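      // in microseconds; .toDouble avoids integer division (41 / 1024 == 0), 1e6 converts seconds to microseconds
      val startime = (start * 1000) + math.round(windowIndex * (numOfPoints2Remove.toDouble / recordFrequency) * 1e6)
      // in microseconds, assuming windowDuration is in milliseconds (1e3 converts ms to microseconds)
      val endtime = startime + (windowDuration * 1e3).toLong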
      val data : Map[String, List[Double]] = {
        for ((key, values) <- m) yield {
          // the number of points to remove at the left side from the window
          val from = windowIndex * numOfPoints2Remove
          // the number of points to remove at the right side from the window
          val until = localWindowSize + from
          val t0 = System.currentTimeMillis
          val channelData : List[Double] = values.drop(from).take(globalPointsByChannel - until - 1)
          val t1 = System.currentTimeMillis
          channelData match {
            case cd if cd.size >= localWindowSize =>
              (key -> channelData)
            case _ =>
              val numOfMissingPts = localWindowSize - channelData.size
              val missingPoints  = nxtWindow(key) take numOfMissingPts
              val completePoints : List[Double] = List(channelData,missingPoints).flatten
              (key -> completePoints)
          }
        }
      }
      val strData = data.values map (values => "[" + (values mkString(",")) + "]") mkString(",")
      strData 
    }
  }.toList
}

I think two parts of my code account for most of the time. The first part:

val channelData : List[Double] = values.drop(from).take(globalPointsByChannel - until - 1)

I do this for every value in the Map, for each window.
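
One reason this line can be expensive: on an immutable List, drop(from) has to walk from cells before it reaches the start of the window, and take then copies the cells it keeps, and this is repeated for every channel in every window. A minimal sketch of one possible alternative, assuming each channel can be converted to an indexed sequence once before the window loop, is shown here. The names SliceSketch, indexed and window are hypothetical and do not appear in the original code.

```scala
object SliceSketch {
  // Hypothetical helper: convert every channel to a Vector once,
  // before iterating over the windows.
  def indexed(m: Map[String, List[Double]]): Map[String, Vector[Double]] =
    m.map { case (key, values) => key -> values.toVector }

  // Hypothetical helper: same result as values.drop(from).take(count),
  // but without walking a linked list up to position `from`.
  def window(values: Vector[Double], from: Int, count: Int): Vector[Double] =
    values.slice(from, from + count)
}
```

With this shape the conversion cost is paid once per channel instead of once per channel per window. Whether it actually helps should be checked by measuring on the real data.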

The second part:

val numOfMissingPts = localWindowSize - channelData.size
val missingPoints  = nxtWindow(key) take numOfMissingPts
val completePoints : List[Double] = List(channelData,missingPoints).flatten
(key -> completePoints)

This case is matched less often than the first one, but it still takes time.
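
In this branch, channelData.size traverses the whole List (once in the guard and once more when computing the number of missing points), and List(channelData, missingPoints).flatten builds an intermediate list of lists. A minimal sketch of an equivalent formulation follows, assuming the intent is "keep the data if it is long enough, otherwise top it up from the next window". The names PadSketch and complete are hypothetical.

```scala
object PadSketch {
  // Hypothetical helper: return `channelData` unchanged if it already holds
  // at least `size` points, otherwise append points from the start of `next`.
  def complete(channelData: List[Double], next: List[Double], size: Int): List[Double] =
    // lengthCompare stops as soon as the answer is known instead of
    // counting every element the way `size` does.
    if (channelData.lengthCompare(size) >= 0) channelData
    else {
      val missing = size - channelData.length
      // `++` concatenates directly, without the intermediate List of Lists.
      channelData ++ next.take(missing)
    }
}
```

In the original code this would correspond to something like complete(channelData, nxtWindow(key), localWindowSize).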

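For example, when I run this code 10 times, it takes about 9,000 ms on average. Could I get some suggestions on how to change these two pieces of code to make them faster?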

1 Answer:

Answer 0: (score: 1)

Depending on your hardware, parallel collections might be useful to you: http://docs.scala-lang.org/overviews/parallel-collections/overview.html

The nice thing is that you can add it to your code very easily. Just call the .par method on the Map.

Map(1 -> "a", 2 -> "b").par
res0: scala.collection.parallel.immutable.ParMap[Int,String] = ParMap(1 -> a, 2 -> b)
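
Note that from Scala 2.13 on, .par lives in the separate scala-parallel-collections module and needs import scala.collection.parallel.CollectionConverters._ to be available. Where adding that dependency is not an option, similar per-channel parallelism can be sketched with standard-library Futures. This is a swapped-in technique, not what the answer above proposes, and the names ParallelSketch and mapValuesInParallel are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSketch {
  // Hypothetical helper: apply `f` to every channel concurrently
  // and rebuild the Map from the results.
  def mapValuesInParallel(
      m: Map[String, List[Double]]
  )(f: List[Double] => List[Double]): Map[String, List[Double]] = {
    // One Future per channel; the global execution context schedules them.
    val jobs = m.toList.map { case (key, values) => Future(key -> f(values)) }
    // Block until all channels are done (the 1 minute timeout is an arbitrary choice).
    Await.result(Future.sequence(jobs), 1.minute).toMap
  }
}
```

Parallelism only pays off when the work per channel is large enough to outweigh the scheduling overhead, so it is worth measuring both variants.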