Eliminating the intermediate collection when building a Source in response to data

Time: 2017-11-24 18:40:02

Tags: scala akka-stream reactive-streams

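I have a Flow (createDataPointFlow) that is built with mapAsync. Inside it, the data points for each job are collected (via Sink.seq), whereas I would rather stream them straight through without collecting them first.

However, it is not obvious to me how to do this without collecting the items. It seems I need some mechanism for publishing my items directly to the output side of the flow I am creating, but I am new to this and do not know how to do it without an explicit actor, which I would like to avoid.

How can I achieve this without collecting first? Keep in mind that what I want is full streaming, without the explicit buffering that Sink.seq(...) performs.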

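import akka.NotUsed
import akka.stream.{ClosedShape, Inlet, Outlet, UniformFanOutShape}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Keep, RunnableGraph, Sink, Source}
import org.reactivestreams.Publisher

object MyProcess {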

  def createDataSource(job:Job, dao:DataService):Source[JobDataPoint,NotUsed] = {
    // Imagine the below call is equivalent to streaming a parameterized query using Slick
    val publisher: Publisher[JobDataPoint] = dao.streamData(Criteria(job.name, job.data))
    // Convert to a Source
    val src: Source[JobDataPoint, NotUsed] = Source.fromPublisher(publisher)
    src
  }

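  // Problem spot: each job's inner stream is run to completion and buffered by Sink.seq
  // before any data point is emitted downstream. run() also needs an implicit Materializer in scope.
  def createDataPointFlow(dao:DataService, parallelism:Int=1): Flow[Job,JobDataPoint, NotUsed] =
    Flow[Job].mapAsync(parallelism)(job =>
      createDataSource(job,dao).toMat(Sink.seq)(Keep.right).run()
    ).mapConcat(identity)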

  def apply(src:Source[Job,NotUsed], dao:DataService,parallelism:Int=5) = RunnableGraph.fromGraph(GraphDSL.create(){ implicit builder =>
    import GraphDSL.Implicits._

    //Source
    val jobs:Outlet[Job] = builder.add(src).out
    //val bcastJobsSrc: Source[Job, NotUsed] = src.toMat(BroadcastHub.sink(256))(Keep.right).run()
    //val bcastOutlet:Outlet[Job] = builder.add(bcastJobsSrc).out

    //Flows
    val bcastJobs:UniformFanOutShape[Job,Job] = builder.add(Broadcast[Job](4))
    val rptMaker = builder.add(MyProcessors.flow(dao,parallelism))
    val dpFlow = createDataPointFlow(dao,parallelism)

    //Sinks
    val jobPrinter:Inlet[Job] = builder.add(Sink.foreach[Job](job=>println(s"[MyGraph] Received job: ${job.name} => $job"))).in
    val jobList:Inlet[Job] = builder.add(Sink.fold(List.empty[Job])((list,job:Job)=>job::list)).in
    val reporter: Inlet[ReportTable] = builder.add(Sink.foreach[ReportTable](r=>println(s"[Report]: $r"))).in

    val dpSink: Inlet[JobDataPoint] = builder.add(Sink.foreach[JobDataPoint](dp=>println(s"[DataPoint]: $dp"))).in

    jobs ~> bcastJobs

    bcastJobs ~> jobPrinter
    bcastJobs ~> jobList
    bcastJobs ~> rptMaker ~> reporter
    bcastJobs ~> dpFlow ~> dpSink
    ClosedShape
  })
}

1 Answer:

Answer 0 (score: 0)

So, after re-reading the documentation on the various stages, I found that what I needed was flatMapConcat:

def createDataPointFlow(dao:DataService, parallelism:Int=1): Flow[Job,JobDataPoint, NotUsed] =
    Flow[Job].flatMapConcat(createDataSource(_,dao))
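
Note that the flatMapConcat version no longer uses the parallelism parameter: it consumes one job's source at a time. If the order of data points across jobs does not matter, flatMapMerge might be a way to keep some concurrency. The following is only a minimal sketch under stated assumptions: Job, JobDataPoint and dataSource are hypothetical stand-ins for the types and the createDataSource method in the question, and it assumes Akka 2.6 or later, where an implicit ActorSystem is enough to materialize a stream.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

object FlatMapSketch {
  // Hypothetical stand-ins for the question's Job and JobDataPoint types.
  final case class Job(name: String, size: Int)
  final case class JobDataPoint(job: String, index: Int)

  // Stand-in for createDataSource; the real one would wrap the DAO's Publisher.
  def dataSource(job: Job): Source[JobDataPoint, NotUsed] =
    Source(1 to job.size).map(i => JobDataPoint(job.name, i))

  // Sub-sources are consumed one after another, so output follows job order.
  val concatFlow: Flow[Job, JobDataPoint, NotUsed] =
    Flow[Job].flatMapConcat(job => dataSource(job))

  // Up to `breadth` sub-sources are consumed concurrently;
  // data points from different jobs may interleave.
  def mergeFlow(breadth: Int): Flow[Job, JobDataPoint, NotUsed] =
    Flow[Job].flatMapMerge(breadth, job => dataSource(job))

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("sketch")
    val jobs = List(Job("a", 2), Job("b", 3))

    // Sink.seq is used here only to inspect the demo output.
    val concatenated =
      Await.result(Source(jobs).via(concatFlow).runWith(Sink.seq), 5.seconds)
    val merged =
      Await.result(Source(jobs).via(mergeFlow(2)).runWith(Sink.seq), 5.seconds)

    println(concatenated)
    println(merged.size)
    system.terminate()
  }
}
```

In both variants the data points flow downstream as the inner sources emit them, with backpressure, rather than being gathered into a Seq per job first.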