Does Scala's groupBy preserve insertion order?

Asked: 2012-03-07 01:13:57

Tags: scala collections map hashmap

The groupBy method on Lists, Maps, etc. produces a Map keyed by the result of the supplied function.

Is there a way to use groupBy to produce a Map that preserves insertion order (for example a LinkedHashMap)?

I'm currently using a for loop to insert the entries manually, but I'm wondering whether one of the predefined collection functions could do this for me; roughly what I'm doing now is sketched below (the example list and the value / 2 grouping function are just for illustration).
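import scala.collection.mutable

val xs = List(4, 0, 5, 1, 2, 6, 3)
val grouped = mutable.LinkedHashMap.empty[Int, List[Int]]
for (x <- xs) {
  val key = x / 2                                   // grouping function
  grouped(key) = grouped.getOrElse(key, Nil) :+ x   // keys stay in first-insertion order
}
// grouped: LinkedHashMap(2 -> List(4, 5), 0 -> List(0, 1), 1 -> List(2, 3), 3 -> List(6))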

Thanks in advance.

4 answers:

Answer 0 (score: 21)

groupBy, as defined on TraversableLike, produces an immutable.Map, so you cannot make that method return anything else.

The order of the elements within each entry is already preserved, but not the order of the keys. The keys are the results of the supplied function, so they have no inherent order.

If you want to order the keys by the first occurrence of each key, here is a sketch of how to do it. Suppose we want to group integers by value / 2:

import scala.collection.mutable.LinkedHashMap

val m = List(4, 0, 5, 1, 2, 6, 3).zipWithIndex groupBy (_._1 / 2)   // pair each value with its index, then group
val lhm = LinkedHashMap(m.toSeq sortBy (_._2.head._2): _*)          // sort the groups by the index of their first element
lhm mapValues (_ map (_._1))                                        // drop the indices again
// Map(2 -> List(4, 5), 0 -> List(0, 1), 1 -> List(2, 3), 3 -> List(6))
// Note the order of the keys is the same as the order of first occurrence in the original list

Answer 1 (score: 19)

The following will give you a groupByOrdered method that behaves the way you want.

import collection.mutable.{LinkedHashMap, LinkedHashSet, Map => MutableMap}

object GroupByOrderedImplicit {
  implicit class GroupByOrderedImplicitImpl[A](val t: Traversable[A]) extends AnyVal {
    def groupByOrdered[K](f: A => K): MutableMap[K, LinkedHashSet[A]] = {
      // LinkedHashMap keeps keys in first-insertion order;
      // withDefault supplies an empty LinkedHashSet for keys we haven't seen yet
      val map = LinkedHashMap[K, LinkedHashSet[A]]().withDefault(_ => LinkedHashSet[A]())
      for (i <- t) {
        val key = f(i)
        map(key) = map(key) + i   // LinkedHashSet keeps the elements of each group in insertion order
      }
      map
    }
  }
}

When I use the following code:

import GroupByOrderedImplicit._
0.to(100).groupByOrdered(_ % 10).foreach(println)

I get the following output:

(0,Set(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100))
(1,Set(1, 11, 21, 31, 41, 51, 61, 71, 81, 91))
(2,Set(2, 12, 22, 32, 42, 52, 62, 72, 82, 92))
(3,Set(3, 13, 23, 33, 43, 53, 63, 73, 83, 93))
(4,Set(4, 14, 24, 34, 44, 54, 64, 74, 84, 94))
(5,Set(5, 15, 25, 35, 45, 55, 65, 75, 85, 95))
(6,Set(6, 16, 26, 36, 46, 56, 66, 76, 86, 96))
(7,Set(7, 17, 27, 37, 47, 57, 67, 77, 87, 97))
(8,Set(8, 18, 28, 38, 48, 58, 68, 78, 88, 98))
(9,Set(9, 19, 29, 39, 49, 59, 69, 79, 89, 99))

Answer 2 (score: 5)

Here is one without a Map at all:

import scala.annotation.tailrec

def orderedGroupBy[T, P](seq: Traversable[T])(f: T => P): Seq[(P, Traversable[T])] = {
  @tailrec
  def accumulator(seq: Traversable[T], f: T => P, res: List[(P, Traversable[T])]): Seq[(P, Traversable[T])] = seq.headOption match {
    case None => res.reverse
    case Some(h) => {
      val key = f(h)
      val subseq = seq.takeWhile(f(_) == key)   // take the run of consecutive elements that share this key
      accumulator(seq.drop(subseq.size), f, (key -> subseq) :: res)
    }
  }
  accumulator(seq, f, Nil)
}

It may be useful if you only need to traverse the result in order (no random access) and you want to avoid the overhead of creating and using a Map object. Note: I have not compared its performance against the other options; it may actually be worse.

Edit: to be clear, this assumes your input is already sorted by the grouping key. My use case was SELECT ... ORDER BY.
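
A quick usage sketch under that assumption (the input data here is made up for illustration):

val rows = List("a1", "a2", "b1", "b2", "b3", "c1")   // already ordered by the grouping key
orderedGroupBy(rows)(_.head)
// List((a,List(a1, a2)), (b,List(b1, b2, b3)), (c,List(c1)))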

Answer 3 (score: -1)

This yields better results on ScalaMeter, even though the solution is very similar to the standard Scala groupBy:

    ::Benchmark Range.GroupBy::
    cores: 8
    hostname: xxxxx-MacBook-Pro.local
    name: Java HotSpot(TM) 64-Bit Server VM
    osArch: x86_64
    osName: Mac OS X
    vendor: Oracle Corporation
    version: 25.131-b11
    Parameters(size -> 300000): 6.500884
    Parameters(size -> 600000): 13.019679
    Parameters(size -> 900000): 22.756615
    Parameters(size -> 1200000): 25.481007
    Parameters(size -> 1500000): 33.129888

compared to the zipWithIndex approach, which yields:

    ::Benchmark Range.GroupBy::
    cores: 8
    hostname: xxxxx-MacBook-Pro.local
    name: Java HotSpot(TM) 64-Bit Server VM
    osArch: x86_64
    osName: Mac OS X
    vendor: Oracle Corporation
    version: 25.131-b11
    Parameters(size -> 300000): 9.57414
    Parameters(size -> 600000): 18.569085
    Parameters(size -> 900000): 28.233822
    Parameters(size -> 1200000): 36.975254
    Parameters(size -> 1500000): 47.447057

The implementation:

import scala.collection.{immutable, mutable}
import scala.collection.mutable.ArrayBuffer

implicit class GroupBy[A](val t: TraversableOnce[A]) {
  def sortedGroupBy[K](f: A => K)(implicit ordering: Ordering[K]): immutable.SortedMap[K, ArrayBuffer[A]] = {
    val m = mutable.SortedMap.empty[K, ArrayBuffer[A]]
    for (elem <- t) {
      val key = f(elem)
      val bldr = m.getOrElseUpdate(key, mutable.ArrayBuffer[A]())
      bldr += elem
    }
    val b = immutable.SortedMap.newBuilder[K, ArrayBuffer[A]]
    for ((k, v) <- m) {
      b += ((k, v.result))
    }
    b.result
  }
}

Example benchmark setup: val sizes = Gen.range("size")(300000, 1500000, 300000) and groupByOrdered(_ % 10).
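
For example, assuming the implicit class above is in scope (this call is a sketch, not part of the original benchmark), the keys come back in key order rather than first-occurrence order:

val xs = List(4, 0, 5, 1, 2, 6, 3)
xs.sortedGroupBy(_ / 2)
// SortedMap(0 -> ArrayBuffer(0, 1), 1 -> ArrayBuffer(2, 3), 2 -> ArrayBuffer(4, 5), 3 -> ArrayBuffer(6))
// Keys are in key order (0, 1, 2, 3), not first-occurrence order (2, 0, 1, 3)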