我尝试实现批处理。我的算法:
1)首先,我需要从db开始的请求项,skip = 0
。如果没有项目,则完全停止处理。
case class Item(i: Int)
def getItems(skip: Int): Future[Seq[Item]] = {
Future((skip until (skip + (if (skip < 756) 100 else 0))).map(Item))
}
2)然后对每一项进行繁重的工作(parallelism = 4
)
def heavyJob(item: Item): Future[String] = Future {
Thread.sleep(1000)
item.i.toString + " done"
}
3)处理完所有项目后,使用skip += 100
进入1步
我要做什么:
val dbSource: Source[List[Item], _] = Source.fromFuture(getItems(0).map(_.toList))
val flattened: Source[Item, _] = dbSource.mapConcat(identity)
val procced: Source[String, _] = flattened.mapAsync(4)(item => heavyJob(item))
procced.runWith(Sink.onComplete(t => println("Complete: " + t.isSuccess)))
但是我不知道如何实现分页
答案 0 :(得分:0)
可以将skip
作为值的基础来处理Iterator
增量:
val skipIncrement = 100
val skipIterator : () => Iterator[Int] =
() => Iterator from (0, skipIncrement)
然后可以使用此Iterator来驱动akka Source
,该akka获取项目并继续处理,直到查询返回空Seq
:
val databaseStillHasValues : Seq[Item] => Boolean =
(dbValues) => !dbValues.isEmpty
val itemSource : Source[Item, _] =
Source.fromIterator(skipIterator)
.mapAsync(1)(getItems)
.takeWhile(databaseStillHasValues)
.mapConcat(identity)
heavyJob
可以在Flow
中使用:
val heavyParallelism = 4
val heavyFlow : Flow[Item, String, _] =
Flow[Item].mapAsync(heavyParallelism)(heavyJob)
最后,源和流可以附加到Sink
:
val printSink = Sink[String].foreach(t => println(s"Complete: ${t.isSuccess}"))
itemSource.via(heavyFlow)
.runWith(printSink)