Question

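I am trying to implement batch processing. My algorithm:

1) First, I request items from the db, starting with skip = 0. If there are no items, processing stops completely.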

  case class Item(i: Int)

  def getItems(skip: Int): Future[Seq[Item]] = {
    Future((skip until (skip + (if (skip < 756) 100 else 0))).map(Item))
  }

2) Then a heavy job is run for each item (parallelism = 4)

  def heavyJob(item: Item): Future[String] = Future {
    Thread.sleep(1000)
    item.i.toString + " done"
  }

3) After all items have been processed, go back to step 1 with skip += 100

What I have tried:

val dbSource: Source[List[Item], _] = Source.fromFuture(getItems(0).map(_.toList))

val flattened: Source[Item, _] = dbSource.mapConcat(identity)

val procced: Source[String, _] = flattened.mapAsync(4)(item => heavyJob(item))

procced.runWith(Sink.onComplete(t => println("Complete: " + t.isSuccess)))

But I don't know how to implement the pagination.

Answer 1

The skip increment can be handled with an Iterator as the underlying source of values:

val skipIncrement = 100

val skipIterator : () => Iterator[Int] = 
  () => Iterator from (0, skipIncrement)
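As a quick check of what this produces, here is a minimal sketch using only the standard library. The object name `SkipOffsetsDemo` is hypothetical. It also shows why the value is a function returning an Iterator rather than an Iterator itself: `Source.fromIterator` takes such a function, so each call can start again from 0.

```scala
object SkipOffsetsDemo {
  val skipIncrement = 100

  // Same definition as above, written with explicit method-call syntax.
  val skipIterator: () => Iterator[Int] =
    () => Iterator.from(0, skipIncrement)

  def main(args: Array[String]): Unit = {
    // The first few skip values that would be passed to getItems.
    println(skipIterator().take(4).toList) // List(0, 100, 200, 300)
    // A second call yields a fresh iterator that starts at 0 again.
    println(skipIterator().next())         // 0
  }
}
```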

This Iterator can then be used to drive an Akka Source, which fetches the items and keeps processing until a query returns an empty Seq:

val databaseStillHasValues : Seq[Item] => Boolean = 
  (dbValues) => !dbValues.isEmpty

val itemSource : Source[Item, _] = 
  Source.fromIterator(skipIterator)
        .mapAsync(1)(getItems)
        .takeWhile(databaseStillHasValues)
        .mapConcat(_.toList) // toList: on Scala 2.12, Seq is not an immutable collection
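The stop condition can be checked without Akka. The sketch below mirrors the stream's shape (fetch a page, stop on the first empty page, flatten) with plain iterators. `getItemsSync` is a hypothetical synchronous stand-in for the question's `getItems` stub, and `PagingStopDemo` is likewise an invented name.

```scala
object PagingStopDemo {
  final case class Item(i: Int)

  // Hypothetical synchronous version of the question's stub:
  // a page of 100 items while skip < 756, otherwise an empty page.
  def getItemsSync(skip: Int): Seq[Item] =
    (skip until (skip + (if (skip < 756) 100 else 0))).map(Item(_))

  // Offsets whose page is non-empty; the first empty page ends the run.
  val nonEmptyOffsets: List[Int] =
    Iterator.from(0, 100).takeWhile(skip => getItemsSync(skip).nonEmpty).toList

  // Same pipeline shape as the stream: page, stop on empty, flatten.
  val items: List[Item] =
    Iterator.from(0, 100)
      .map(getItemsSync)
      .takeWhile(_.nonEmpty)
      .flatMap(_.iterator)
      .toList

  def main(args: Array[String]): Unit = {
    println(nonEmptyOffsets.last) // 700
    println(items.size)           // 800
  }
}
```

With this stub the last non-empty page is the one at skip = 700, so 800 items flow downstream even though the threshold in the stub is 756.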

heavyJob can be used within a Flow:

val heavyParallelism = 4

val heavyFlow : Flow[Item, String, _] = 
  Flow[Item].mapAsync(heavyParallelism)(heavyJob)

Finally, the Source and Flow can be attached to a Sink:

val printSink = Sink.foreach[String](t => println(s"Complete: $t"))

itemSource.via(heavyFlow)
          .runWith(printSink)
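For reference, the pieces can be assembled into one program roughly as follows. This is a sketch under assumptions, not a definitive implementation: it assumes Akka 2.6 or later (where an implicit ActorSystem is enough to materialize a stream), shortens the sleep to 10 ms so the run finishes quickly, and collects results with `Sink.seq` instead of printing each one. The names `PagedBatch`, `run`, `pageSize` and `jobMillis` are hypothetical.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object PagedBatch {
  final case class Item(i: Int)

  def run(pageSize: Int = 100, jobMillis: Long = 10): Seq[String] = {
    // Assumption: Akka 2.6+, which derives the materializer from the system.
    implicit val system: ActorSystem = ActorSystem("paged-batch")
    import system.dispatcher

    // Stub from the question: non-empty pages while skip < 756.
    def getItems(skip: Int): Future[Seq[Item]] =
      Future((skip until (skip + (if (skip < 756) pageSize else 0))).map(Item(_)))

    // Stand-in for the heavy job; the sleep blocks a dispatcher thread,
    // which is acceptable only in a small demo like this one.
    def heavyJob(item: Item): Future[String] = Future {
      Thread.sleep(jobMillis)
      s"${item.i} done"
    }

    val done: Future[Seq[String]] =
      Source.fromIterator(() => Iterator.from(0, pageSize))
        .mapAsync(1)(getItems)   // one page request at a time
        .takeWhile(_.nonEmpty)   // the first empty page completes the stream
        .mapConcat(_.toList)     // flatten pages into single items
        .mapAsync(4)(heavyJob)   // up to 4 jobs in flight, order preserved
        .runWith(Sink.seq[String])

    try Await.result(done, 1.minute)
    finally system.terminate()
  }

  def main(args: Array[String]): Unit = {
    val results = run()
    println(s"Processed ${results.size} items")
  }
}
```

Blocking with `Await` keeps the sketch short; in an application, the returned Future would normally be composed further and the ActorSystem shared rather than created per run.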
