How to merge adjacent entries of a sorted `Stream` or `List`

Posted: 2016-04-05 13:13:49

Tags: scala functional-programming

Given a stream that is

  • big (> 1,000,000 entries, so don't expect it to fit in memory)
  • sorted (w.r.t. the first value of the tuple)

such as
val ss = List( (1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0")).toStream
// just for demo
val xs = List( (1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0"))

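I want to join adjacent entries with the same key, so that the output of the transformation becomes

List( (1, "2.5 5.0"), (2, "3.0 4.0 6.0"), (3, "1.0") )

The second tuple values should be combined by some monoid operation (here: string concatenation).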
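Since the input is sorted by key, a group is finished as soon as the next key differs, so only the current group ever needs to be held. A minimal sketch of that idea, assuming the data can be consumed as a plain `Iterator` (the name `mergeAdjacent` and its `combine` parameter are illustrative, not from the question); `combine` plays the role of the monoid's append:

```scala
// Hypothetical helper: merges runs of adjacent entries that share a key.
// Assumes the input is sorted (or at least grouped) by key; only the
// current run is held in memory, the rest is pulled on demand.
def mergeAdjacent[K, V](it: Iterator[(K, V)])(combine: (V, V) => V): Iterator[(K, V)] = {
  val buf = it.buffered // lets us peek at the next key without consuming it
  new Iterator[(K, V)] {
    def hasNext: Boolean = buf.hasNext
    def next(): (K, V) = {
      val (k, first) = buf.next()
      var acc = first
      // keep consuming while the next entry has the same key
      while (buf.hasNext && buf.head._1 == k) acc = combine(acc, buf.next()._2)
      (k, acc)
    }
  }
}
```

For the demo list, `mergeAdjacent(xs.iterator)(_ + " " + _).toList` yields `List((1,2.5 5.0), (2,3.0 4.0 6.0), (3,1.0))`. An `Iterator` is used here rather than a `Stream` because a `Stream` memoizes every element it has evaluated for as long as something holds a reference to its head.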

Ideas / attempts

groupBy

groupBy does not seem to be a viable option, because it collects all entries into an in-memory Map.
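For comparison, the `groupBy` route could look like the sketch below (the name `viaGroupBy` is illustrative). It gives the right result for the small demo list, but `groupBy` has to traverse the entire input and keep every entry in the resulting `Map` before it returns, which is exactly what the memory constraint rules out:

```scala
// In-memory baseline: fine for small inputs only, because groupBy must
// consume the whole collection and hold every entry in a Map first.
val xs = List((1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0"))

val viaGroupBy: List[(Int, String)] =
  xs.groupBy(_._1)   // Map[Int, List[(Int, String)]], values keep their input order
    .toList
    .sortBy(_._1)    // Map iteration order is not guaranteed, so restore key order
    .map { case (k, kvs) => (k, kvs.map(_._2).mkString(" ")) }
```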

scanLeft

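case class Joiner(x: Int, y: String) // small wrapper for (key, merged value)

val ss: Stream[(Int, String)] = List( (1, "2.5"), (1, "5.0"), (2, "3.0")).toStream

val transformed = ss.scanLeft(Joiner(0, "a"))( (j, t) => {
  j.x match {
    case t._1 => j.copy(y = j.y + " " + t._2) // same key: append the value
    case _ => Joiner(t._1, t._2)                // new key: start a new Joiner
  }
})
println(transformed.toList)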

This ends up as

List(Joiner(0,a), Joiner(1,2.5), Joiner(1,2.5 5.0), Joiner(2,3.0))

(please ignore the Joiner wrapper)

but I see no way to get rid of the "incomplete" entries.
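One way to drop the incomplete entries is to keep a `Joiner` only if the next `Joiner` has a different key, or if it is the last one. A minimal sketch, where `completeOnly` is a hypothetical helper and `Joiner` is assumed to be a simple case class; it also assumes no real key equals the seed key `0`, since otherwise the seed's `"a"` would be merged into the first group:

```scala
final case class Joiner(x: Int, y: String)

// Hypothetical helper: keeps only the "complete" Joiner of each run, i.e. the
// one whose successor has a different key, plus the very last one.
def completeOnly(scanned: Stream[Joiner]): Stream[Joiner] = {
  val real = scanned.drop(1) // drop the Joiner(0, "a") seed produced by scanLeft
  // pair every entry with its successor; None marks the end of the stream
  val successors: Stream[Option[Joiner]] = real.drop(1).map(Option(_)) :+ None
  real.zip(successors).collect {
    case (j, None)                  => j // last entry: its run is complete
    case (j, Some(n)) if n.x != j.x => j // key changes next: run is complete
  }
}
```

Applied to the `transformed` stream from the question, this keeps `Joiner(1,2.5 5.0)` and `Joiner(2,3.0)`.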

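2 answers:

Answer 0 (score: 1)

It is hard to flag the final element of a group, but flagging the initial element (where the key switches) with true is easy, right? Then you can collect exactly those entries that are immediately followed by an initial entry. Maybe something like this: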

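   val flagged = ss.scanLeft((0, "", true)) {
     // same key (or the very first element): extend the running value, flag false
     case ((a, str, _), (b, c)) if str.isEmpty || a == b =>
       (b, if (str.isEmpty) c else str + " " + c, false)
     // key switched: start a new group and flag it true
     case (_, (b, c)) => (b, c, true)
   }

   (flagged.tail :+ ((0, "", true)))   // drop the seed, append a dummy "initial" entry
     .sliding(2)
     .collect { case Seq((k, v, _), (_, _, true)) => (k, v) }

(Note the `:+` part: it appends a "dummy" entry to the end of the stream, so that the last "real" element is also followed by a `true` entry and is not filtered out.)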
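The same flag-and-slide idea can be written without the empty-string sentinel or the dummy key `0` by wrapping the running state in an `Option`. A sketch under that assumption (`mergeBySliding` and `combine` are illustrative names, not part of the answer):

```scala
// Hypothetical generic version of the flag-and-slide approach.
def mergeBySliding[K, V](s: Stream[(K, V)])(combine: (V, V) => V): Stream[(K, V)] = {
  // state: Some((key, accumulated value, startsNewGroup)); None only for the seed
  val flagged: Stream[Option[(K, V, Boolean)]] =
    s.scanLeft(Option.empty[(K, V, Boolean)]) {
      case (Some((k, acc, _)), (k2, v)) if k == k2 => Some((k2, combine(acc, v), false))
      case (_, (k2, v))                            => Some((k2, v, true))
    }
  // drop the seed; a trailing None marks the end so the last group is emitted too
  (flagged.drop(1) :+ None)
    .sliding(2)
    // emit an entry when the next one starts a new group or is the end marker
    .collect { case Seq(Some((k, v, _)), next) if next.forall(_._3) => (k, v) }
    .toStream
}
```

For an empty input only the end marker remains, so no window matches and the result is empty.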

Answer 1 (score: 1)

This seems to work:

import scala.annotation.tailrec

def makeEm(s: Stream[(Int, String)]): Stream[(Int, String)] = {

  import Stream._

  // curr = (key of the group being built, its values in reverse order)
  // acc  = the merged groups emitted so far
  @tailrec
  def z(source: Stream[(Int, String)], curr: (Int, List[String]), acc: Stream[(Int, String)]): Stream[(Int, String)] = source match {
    case Empty =>
      acc
    // last element: close the current group
    case x #:: Empty =>
      acc :+ (x._1 -> (x._2 :: curr._2).reverse.mkString(" "))
    // the key changes after x: close x's group and start a new one at y
    case x #:: y #:: etc if x._1 != y._1 =>
      val c = x._1 -> (x._2 :: curr._2).reverse.mkString(" ")
      z(y #:: etc, (y._1, List[String]()), acc :+ c)
    // same key as the next element: keep accumulating
    case x #:: etc =>
      z(etc, (x._1, x._2 :: curr._2), acc)
  }

  z(s, (0, List()), Stream())
}

Tests:

val ss = List( (1, "2.5"), (1, "5.0"), (2, "3.0"), (2, "4.0"), (2, "6.0"), (3, "1.0")).toStream
makeEm(ss).toList.mkString(",")

val s = List().toStream
makeEm(s).toList.mkString(",")

val ss2 = List( (1, "2.5"), (1, "5.0")).toStream
makeEm(ss2).toList.mkString(",")

val s3 = List((1, "2.5"),(2, "4.0"),(3, "1.0")).toStream
makeEm(s3).toList.mkString(",")

Output

ss: scala.collection.immutable.Stream[(Int, String)] = Stream((1,2.5), ?)
res0: String = (1,2.5 5.0),(2,3.0 4.0 6.0),(3,1.0)

s: scala.collection.immutable.Stream[Nothing] = Stream()
res1: String = 

ss2: scala.collection.immutable.Stream[(Int, String)] = Stream((1,2.5), ?)
res2: String = (1,2.5 5.0)

s3: scala.collection.immutable.Stream[(Int, String)] = Stream((1,2.5), ?)
res3: String = (1,2.5),(2,4.0),(3,1.0)
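Note that `makeEm` runs its `@tailrec` loop over the entire input before returning and keeps every merged group in `acc`, so it does not honor the "doesn't fit in memory" requirement. A lazier variant of the same recursion is sketched below (`mergeLazily` is a hypothetical name). Because the right-hand side of `#::` is evaluated only on demand, each run is merged only when the consumer asks for the next result. It still relies on `Stream`, so any code that keeps a reference to the head of the input will keep the evaluated prefix memoized.

```scala
// Hypothetical lazy variant: each run of equal keys is merged only when the
// consumer asks for the next element of the result.
def mergeLazily[K, V](s: Stream[(K, V)])(combine: (V, V) => V): Stream[(K, V)] =
  if (s.isEmpty) Stream.empty
  else {
    val (k, first) = s.head
    val run  = s.tail.takeWhile(_._1 == k) // the rest of the current run
    val rest = s.tail.dropWhile(_._1 == k) // everything after the run
    // the recursive call on the right of #:: is deferred until it is needed
    (k, run.foldLeft(first)((acc, kv) => combine(acc, kv._2))) #:: mergeLazily(rest)(combine)
  }
```

Because of that laziness it also works on an infinite, key-sorted stream as long as only a finite prefix of the result is consumed.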