How do I use SubFlow to group the items of a sorted stream?

Time: 2017-08-01 11:06:38

Tags: akka akka-stream

Could you explain how to use the new groupBy in akka-streams? The documentation seems quite useless. groupBy used to return (T, Source), but it no longer does. Here is my example (I mimicked one from the docs):

Source(List(
  1 -> "1a", 1 -> "1b", 1 -> "1c",
  2 -> "2a", 2 -> "2b",
  3 -> "3a", 3 -> "3b", 3 -> "3c",
  4 -> "4a",
  5 -> "5a", 5 -> "5b", 5 -> "5c",
  6 -> "6a", 6 -> "6b",
  7 -> "7a",
  8 -> "8a", 8 -> "8b",
  9 -> "9a", 9 -> "9b"
))
  .groupBy(3, _._1)
  .map { case (aid, raw) => aid -> List(raw) }
  .reduce[(Int, List[String])] {
    case (l: (Int, List[String]), r: (Int, List[String])) =>
      (l._1, l._2 ::: r._2)
  }
  .mergeSubstreams
  .runForeach { case (aid: Int, items: List[String]) =>
    println(s"$aid - ${items.length}")
  }

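This simply hangs. Perhaps it hangs because the number of substreams is lower than the number of unique keys. But what should I do if I have an infinite stream? I'd like to group until the key changes.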

In my real stream, the data is always sorted by the value I'm grouping by. Perhaps I don't need groupBy at all?
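For concreteness, the "group until the key changes" behaviour can be sketched on a plain in-memory list, outside Akka. This is only an illustration of the intended semantics, not a streaming solution; `GroupRuns` and `groupRuns` are hypothetical names introduced here.

```scala
object GroupRuns {
  // Groups consecutive elements that share a key. A key that reappears
  // later starts a new group, so the input is assumed to be sorted by key.
  def groupRuns[K, V](items: List[(K, V)]): List[(K, List[V])] =
    items
      .foldLeft(List.empty[(K, List[V])]) {
        // same key as the most recent group: prepend to that group
        case ((k, vs) :: done, (key, v)) if k == key => (k, v :: vs) :: done
        // first element or a new key: open a new group
        case (done, (key, v)) => (key, List(v)) :: done
      }
      .reverse                                  // groups were built newest-first
      .map { case (k, vs) => k -> vs.reverse }  // so were the values inside each group
}
```

Because it only compares each element with the previous one, the same idea carries over to an unbounded stream: a group can be emitted as soon as the next key differs.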

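4 answers:

Answer 0: (score: 4)

You could also achieve this with statefulMapConcat, which will be a bit cheaper because it does not do any sub-materializations (but you have to live with the shame of using vars):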

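source.statefulMapConcat { () =>
  // state is created once per materialization:
  // the key of the current run and its values, newest first
  var prevKey: Option[Int] = None
  var acc: List[String] = Nil

  { case (newKey, str) =>
    prevKey match {
      // first element, or same key as before: keep accumulating, emit nothing
      case Some(`newKey`) | None =>
        prevKey = Some(newKey)
        acc = str :: acc
        Nil
      // key changed: emit the finished group and start a new one
      case Some(oldKey) =>
        val accForOldKey = acc.reverse
        prevKey = Some(newKey)
        acc = str :: Nil
        (oldKey -> accForOldKey) :: Nil
    }
  }
  // note: a group is only emitted when a later key arrives,
  // so the last group is still in acc when the stream completes
}.runForeach(println)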
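One caveat with the snippet above: a group is emitted only when the next key arrives, so the final group is still buffered when the stream completes. A possible workaround, sketched here under the assumption of an Akka 2.5-style setup with ActorMaterializer, is to wrap elements in Option and append a single None as an end marker that flushes the buffer. The `FlushLastGroup` object and its `group` method are hypothetical names for this illustration.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object FlushLastGroup {
  // Runs the grouping over a finite list and returns the emitted groups.
  def group(items: List[(Int, String)]): Seq[(Int, List[String])] = {
    implicit val system: ActorSystem = ActorSystem("flush-last-group")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val done = Source(items)
        .map(Option(_))
        // end marker: a single None after the last real element
        .concat(Source.single(Option.empty[(Int, String)]))
        .statefulMapConcat[(Int, List[String])] { () =>
          var prevKey: Option[Int] = None
          var acc: List[String] = Nil

          {
            // first element or same key: keep accumulating
            case Some((key, str)) if prevKey.forall(_ == key) =>
              prevKey = Some(key)
              acc = str :: acc
              Nil
            // key changed: emit the finished group, start a new one
            case Some((key, str)) =>
              val finished = prevKey.map(_ -> acc.reverse).toList
              prevKey = Some(key)
              acc = str :: Nil
              finished
            // end marker: flush whatever is still buffered
            case None =>
              prevKey.map(_ -> acc.reverse).toList
          }
        }
        .runWith(Sink.seq)
      Await.result(done, 5.seconds)
    } finally system.terminate()
  }
}
```

The end marker only helps for streams that actually complete; on an infinite stream the current group is, by definition, never finished.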

Answer 1: (score: 3)

A year later, Akka Stream Contrib has an AccumulateWhileUnchanged class that does this:

libraryDependencies += "com.typesafe.akka" %% "akka-stream-contrib" % "0.9"

and:

import akka.stream.contrib.AccumulateWhileUnchanged
source.via(new AccumulateWhileUnchanged(_._1))

Answer 2: (score: 1)

If your stream data is always sorted, you can take advantage of that for grouping:

val source = Source(List(
  1 -> "1a", 1 -> "1b", 1 -> "1c",
  2 -> "2a", 2 -> "2b",
  3 -> "3a", 3 -> "3b", 3 -> "3c",
  4 -> "4a",
  5 -> "5a", 5 -> "5b", 5 -> "5c",
  6 -> "6a", 6 -> "6b",
  7 -> "7a",
  8 -> "8a", 8 -> "8b",
  9 -> "9a", 9 -> "9b",
))

source
  // group elements by pairs
  // the last one will be not a pair, but a single element
  .sliding(2,1)
  // when both keys in a pair are different, we split the group into a subflow
  .splitAfter(pair => (pair.headOption, pair.lastOption) match {
    case (Some((key1, _)), Some((key2, _))) => key1 != key2
    // defensive default so the match is exhaustive
    case _ => false
  })
  // then we cut only the first element of the pair 
  // to reconstruct the original stream, but grouped by sorted key
  .mapConcat(_.headOption.toList)
  // then we fold the substream into a single element
  .fold(0 -> List.empty[String]) {
    case ((_, values), (key, value)) => key -> (value +: values)
  }
  // merge it back and dump the results
  .mergeSubstreams
  .runWith(Sink.foreach(println))

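The result is one (key, values) element per run of equal keys and, unlike with groupBy, you are not limited by the number of distinct keys.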

Answer 3: (score: 1)

I ended up implementing a custom stage:

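import akka.stream.{Attributes, FlowShape, Inlet, Outlet}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
import scala.collection.mutable.ListBuffer

class GroupAfterKeyChangeStage[K, T](keyForItem: T ⇒ K, maxBufferSize: Int) extends GraphStage[FlowShape[T, List[T]]] {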

  private val in = Inlet[T]("GroupAfterKeyChangeStage.in")
  private val out = Outlet[List[T]]("GroupAfterKeyChangeStage.out")

  override val shape: FlowShape[T, List[T]] =
    FlowShape(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) with InHandler with OutHandler {

    private val buffer = new ListBuffer[T]
    private var currentKey: Option[K] = None

    // InHandler
    override def onPush(): Unit = {
      val nextItem = grab(in)
      val nextItemKey = keyForItem(nextItem)

      if (currentKey.forall(_ == nextItemKey)) {
        if (currentKey.isEmpty)
          currentKey = Some(nextItemKey)

        if (buffer.size == maxBufferSize)
          failStage(new RuntimeException(s"Maximum buffer size is exceeded on key $nextItemKey"))
        else {
          buffer += nextItem
          pull(in)
        }
      } else {
        val result = buffer.result()
        buffer.clear()
        buffer += nextItem
        currentKey = Some(nextItemKey)
        push(out, result)
      }
    }

    // OutHandler
    override def onPull(): Unit = {
      if (isClosed(in))
        failStage(new RuntimeException("Upstream finished but there was a truncated final frame in the buffer"))
      else
        pull(in)
    }

    // InHandler
    override def onUpstreamFinish(): Unit = {
      // flush the last group, if any, before completing
      val result = buffer.result()
      if (result.nonEmpty)
        emit(out, result)
      completeStage()
    }

    override def postStop(): Unit = {
      buffer.clear()
    }

    setHandlers(in, out, this)
  }
}

If you don't want to copy-paste it, I have added it to a helper library that I maintain. To use it, add

Resolver.bintrayRepo("cppexpert", "maven")

to your resolvers, and add the following to your dependencies

"com.walkmind" %% "scala-tricks" % "2.15"

It is implemented as a flow in com.walkmind.akkastream.FlowExt

def groupSortedByKey[K, T](keyForItem: T ⇒ K, maxBufferSize: Int): Flow[T, List[T], NotUsed]

Applied to my example:

source
  .via(FlowExt.groupSortedByKey(_._1, 128))