groupBy like Python's itertools.groupby

Asked: 2014-07-01 14:20:34

Tags: scala scala-collections

In Python, I can use itertools.groupby to group consecutive elements that have the same key:
>>> items = [(1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4)]
>>> import itertools
>>> list(key for key,it in itertools.groupby(items, lambda tup: tup[0]))
[1, 2, 3, 1]

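Scala has groupBy too, but it produces a different result: a map from each key to all the values in the collection that have that key (not the consecutive runs that share a key):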

scala> val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
items: List[(Int, Int)] = List((1,2), (1,5), (1,3), (2,9), (3,7), (1,5), (1,4))

scala> items.groupBy {case (key, value) => key}
res0: scala.collection.immutable.Map[Int,List[(Int, Int)]] = Map(2 -> List((2,9)), 1 -> List((1,2), (1,5), (1,3), (1,5), (1,4)), 3 -> List((3,7)))

What is the most eloquent way to achieve the same thing as Python's itertools.groupby?

5 answers:

Answer 0: (score: 2)

If you just want to throw out sequential duplicates, you can do something like this:

def unchain[A](items: Seq[A]) = if (items.isEmpty) items else {
  items.head +: (items zip items.drop(1)).collect{ case (l,r) if r != l => r }
}

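That is, just compare the list with a version of itself shifted by one position, and keep only the items that differ from their predecessor. If you want to customize what counts as the same (for example, comparing by key only), you can easily add a same: (A, A) => Boolean parameter to the method and use !same(l,r) in place of r != l.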
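As a minimal sketch of that customization, the comparison can be passed in as a function. The name unchainBy and the curried same parameter are illustrative assumptions, not part of the answer above:

```scala
// Hypothetical variant of unchain: `same` decides whether two neighbours
// belong to the same run; only the first element of each run is kept.
def unchainBy[A](items: Seq[A])(same: (A, A) => Boolean): Seq[A] =
  if (items.isEmpty) items
  else items.head +: (items zip items.drop(1)).collect {
    case (l, r) if !same(l, r) => r
  }

val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))

// Compare by key only, then project the keys.
val keys = unchainBy(items)(_._1 == _._1).map(_._1)
// keys == List(1, 2, 3, 1)
```

Putting same in its own parameter list lets the compiler infer A from items before it type-checks the comparison function.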

If you want to keep the duplicates, you can use Scala's groupBy to get a very compact (but inefficient) solution:

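def groupSequential[A](items: Seq[A])(same: (A, A) => Boolean): Seq[Seq[A]] = {
  // Label each element with its run index; the index advances whenever neighbours differ.
  val ns = (items zip items.drop(1)).
    scanLeft(0){ (n,cc) => if (same(cc._1, cc._2)) n else n+1 }
  // Group by run index, restore run order, then strip the indices.
  (ns zip items).groupBy(_._1).toSeq.sortBy(_._1).map(_._2.map(_._2))
}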

Answer 1: (score: 2)

Use List.span, like this:

def keyMultiSpan(l: List[(Int,Int)]): List[List[(Int,Int)]] = l match {

  case Nil => List()
  case h :: t =>
    val ms = l.span(_._1 == h._1)
    ms._1 :: keyMultiSpan(ms._2)
}

Hence, given

val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))

we get

keyMultiSpan(items).map { _.head._1 }
res: List(1, 2, 3, 1)

Update

A more readable syntax, as suggested by @Paul, an implicit class for possibly neater usage, and type parameterization for generality:

implicit class RichSpan[A,B](val l: List[(A,B)]) extends AnyVal {

  def keyMultiSpan(): List[List[(A,B)]] = l match {

      case Nil => List()
      case h :: t =>
        val (f, r) = l.span(_._1 == h._1)
        f :: r.keyMultiSpan()
  }
}

So use it as follows:

items.keyMultiSpan.map { _.head._1 }
res: List(1, 2, 3, 1)
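
Both versions above recurse once per run, so a list with a very large number of runs could in principle exhaust the stack. One possible tail-recursive sketch follows; it also takes an arbitrary key function and returns (key, group) pairs, which is closer to what itertools.groupby yields. The names groupRuns, key and loop are assumptions made for illustration:

```scala
import scala.annotation.tailrec

// Hypothetical helper: split `xs` into runs of consecutive elements with
// equal keys, pairing each run with its key.
def groupRuns[A, K](xs: List[A])(key: A => K): List[(K, List[A])] = {
  @tailrec
  def loop(rest: List[A], acc: List[(K, List[A])]): List[(K, List[A])] =
    rest match {
      case Nil => acc.reverse
      case h :: t =>
        val k = key(h)
        // span peels off the longest prefix of the tail whose key equals k
        val (run, others) = t.span(key(_) == k)
        loop(others, (k, h :: run) :: acc)
    }
  loop(xs, Nil)
}

val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
val runs = groupRuns(items)(_._1)
// runs.map(_._1) == List(1, 2, 3, 1)
```

The head is always placed in its own run before span is applied to the tail, so each step consumes at least one element.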

Answer 2: (score: 1)

Try:

val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
val res = compress(items.map(_._1))

/** Eliminate consecutive duplicates of list elements **/
def compress[T](l : List[T]) : List[T] = l match {
  case head :: next :: tail if (head == next) => compress(next :: tail)
  case head :: tail => head :: compress(tail)
  case Nil => List()
}

/** Tail recursive version (named differently so both versions can coexist) **/
def compressTailRec[T](input: List[T]): List[T] = {
  // `last` is None until the first element has been seen
  @annotation.tailrec
  def comp(remaining: List[T], l: List[T], last: Option[T]): List[T] = {
    remaining match {
      case Nil => l
      case head :: tail if last.contains(head) => comp(tail, l, last)
      case head :: tail => comp(tail, head :: l, Some(head))
    }
  }
  comp(input, Nil, None).reverse
}

where compress is the solution to one of the 99 Problems in Scala.

Answer 3: (score: 1)

Here is a concise but inefficient solution:

def pythonGroupBy[T, U](items: Seq[T])(f: T => U): List[List[T]] = {
  items.foldLeft(List[List[T]]()) {
    case (Nil, x) => List(List(x))
    case (g :: gs, x) if f(g.head) == f(x) => (x :: g) :: gs
    case (gs, x) => List(x) :: gs
  }.map(_.reverse).reverse
}

Here is a better one that calls f only once per element:

def pythonGroupBy2[T, U](items: Seq[T])(f: T => U): List[List[T]] = {
  if (items.isEmpty)
    List() // no elements means no groups, consistent with pythonGroupBy
  else {
    val state = (List(List(items.head)), f(items.head))
    items.tail.foldLeft(state) { (state, x) =>
      val groupByX = f(x)
      state match {
        case (g :: gs, groupBy) if groupBy == groupByX => ((x :: g) :: gs, groupBy)
        case (gs, _) => (List(x) :: gs, groupByX)
      }
    }._1.map(_.reverse).reverse
  }
}

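Both solutions fold over items, building up a list of groups as they go. pythonGroupBy2 also keeps track of the f value of the current group. At the end, we have to reverse each group and the list of groups to get the correct order.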
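
To make the accumulation concrete, here is a small trace that swaps foldLeft for scanLeft so that every intermediate state is visible. It groups bare keys instead of calling f, and the names keys, steps and groups are illustrative only:

```scala
val keys = List(1, 1, 2, 1)

// scanLeft returns every intermediate accumulator, not just the last one.
val steps = keys.scanLeft(List.empty[List[Int]]) {
  case (g :: gs, x) if g.head == x => (x :: g) :: gs // extend the current group
  case (gs, x)                     => List(x) :: gs  // start a new group
}
// steps.last == List(List(1), List(2), List(1, 1))   (newest group first)

// Reversing each group and the outer list restores the input order.
val groups = steps.last.map(_.reverse).reverse
// groups == List(List(1, 1), List(2), List(1))
```

Prepending to the head of a List is constant time, which is why the groups are built newest-first and reversed once at the end.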

Answer 4: (score: 1)

Hmm, I couldn't find anything out of the box, but this will do it:

def groupz[T](list: List[T]): List[T] = {
  list match {
    case Nil => Nil
    case x :: Nil => List(x)
    case x :: xs if (x == xs.head) => groupz(xs)
    case x :: xs => x :: groupz(xs)
  }
}

// now let's add this functionality to the List class
implicit def addPythonicGroupToList[T](list: List[T]) = new { def pythonGroup = groupz(list) }

Now you can do this:

val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
items.map(_._1).pythonGroup
res1: List[Int] = List(1, 2, 3, 1)