Stream of items, each producing n {http requests} > merge {responses} > wrapInHeaderAndFooter {data} > http request

Date: 2019-05-09 14:10:07

Tags: akka-stream akka-http

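I'm trying to solve a classic ETL problem using streaming. I have a batch of segments; each segment carries information about the records associated with it, such as the record count and the URL to retrieve them from, which is used to issue the HTTP requests that collect the data. I need to pull the records from a source that pages them 100 records at a time, merge each segment's pages of records, and wrap the result in an XML header and footer. Each segment's XML payload is then sent to the target.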
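The code below calls `calculateTotalPages(segment.instanceCount)` without showing it. A minimal sketch, assuming 0-based page indices and the page size of 100 mentioned above, is a ceiling division. `Paging`, `PageSize` and `pageRange` are hypothetical names introduced only for illustration.

```scala
object Paging {
  // Page size taken from the question (100 records per page).
  val PageSize: Int = 100

  // Ceiling division: number of pages needed to cover `instanceCount` records.
  def calculateTotalPages(instanceCount: Int, pageSize: Int = PageSize): Int = {
    require(instanceCount >= 0, "instanceCount must be non-negative")
    (instanceCount + pageSize - 1) / pageSize
  }

  // Hypothetical helper: page i (0-based) covers records
  // [i * pageSize, min((i + 1) * pageSize, instanceCount)).
  def pageRange(i: Int, instanceCount: Int, pageSize: Int = PageSize): Range =
    (i * pageSize) until math.min((i + 1) * pageSize, instanceCount)
}
```

For example, a segment with 250 records needs 3 pages, and the last page holds records 200 to 249.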


                 {http}
                 page 1
                 /      \
       seg 1 >  page 2  -> merge -> wrapHeaderAndFooter -> http target 
          /      \      /
         /       page n
        / 
       /
batch -  seg 2                    "                     -> http target
       \ seg n                    "                     -> http target

val loadSegment: Flow[Segment, Response, NotUsed] =
  Flow[Segment].mapAsync(parallelism = 5) { segment =>
    // Page payloads for this segment, fetched from the paged source
    val pages: Source[ByteString, NotUsed] = pagedPayload(segment).map(page => page.payload)
    // Source concatenation prepends the XML root start tag and appends the end tag
    val wrappedInXML: Source[ByteString, NotUsed] = xmlRootStartTag ++ pages ++ xmlRootEndTag
    // Stream the wrapped payload (not the bare pages) as a chunked entity
    val httpEntity: HttpEntity = HttpEntity(MediaTypes.`application/octet-stream`, wrappedInXML)
    // Must return a Future[Response] for mapAsync
    invokeTargetLoad(httpEntity, request, segment)
  }
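To check what the `++` concatenation in `loadSegment` produces, here is a small self-contained sketch. It assumes Akka 2.6 (where an implicit `Materializer` is derived from the `ActorSystem`), a hypothetical `<records>` root element, and stubbed page payloads in place of `pagedPayload`.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.Source
import akka.util.ByteString
import scala.concurrent.Await
import scala.concurrent.duration._

object WrapDemo {
  implicit val system: ActorSystem = ActorSystem("wrap-demo")

  // Assumed root element; the question does not show xmlRootStartTag/xmlRootEndTag.
  val xmlRootStartTag: Source[ByteString, NotUsed] = Source.single(ByteString("<records>"))
  val xmlRootEndTag: Source[ByteString, NotUsed] = Source.single(ByteString("</records>"))

  // Stand-in for pagedPayload(segment).map(_.payload): two pages of XML fragments.
  val pages: Source[ByteString, NotUsed] =
    Source(List(ByteString("<r id=\"1\"/>"), ByteString("<r id=\"2\"/>")))

  // `++` is concat: header first, then every page, then the footer once `pages` completes.
  val wrapped: Source[ByteString, NotUsed] = xmlRootStartTag ++ pages ++ xmlRootEndTag

  // Collect the whole body into one string, only for inspection in this demo.
  val body: String =
    Await.result(wrapped.runFold(ByteString.empty)(_ ++ _), 5.seconds).utf8String

  system.terminate()
}
```

Because `++` only pulls from the next source after the previous one has completed, the footer is always emitted after the last page, and the header and footer never interleave with page data.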
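def pagedPayload(segment: Segment): Source[Payload, NotUsed] = {
  val totalPages: Int = calculateTotalPages(segment.instanceCount)
  // mapAsyncUnordered emits pages as they complete; use mapAsync if page order matters
  Source(0 until totalPages).mapAsyncUnordered(parallelism = 5) { i =>
    // .map needs an implicit ExecutionContext; a Failure makes .get throw, failing the Future
    sendPayloadRequest(request, segment, i).mapTo[Try[Payload]].map(_.get)
  }
}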
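`mapAsyncUnordered` emits pages in completion order, so the merged XML may list records in a different order from the source. If order matters, `mapAsync` keeps upstream order while still running up to `parallelism` requests concurrently. The sketch below compares the two. `fetchPage` is a hypothetical stand-in for `sendPayloadRequest` that finishes later pages first, and `Future.fromTry` is shown as an alternative to `.map(_.get)`. Assumes Akka 2.6.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.pattern.after
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.{Success, Try}

object PageOrderDemo {
  implicit val system: ActorSystem = ActorSystem("page-order-demo")
  import system.dispatcher

  // Hypothetical stand-in for sendPayloadRequest: page 0 is slowest, page 2 fastest.
  def fetchPage(i: Int): Future[Try[String]] =
    after((50 * (3 - i)).millis, system.scheduler)(
      Future.successful(Success(s"page-$i"): Try[String]))

  def pages(unordered: Boolean): Source[String, NotUsed] = {
    // Future.fromTry turns a Failure into a failed Future, which fails the stream.
    val fetch = (i: Int) => fetchPage(i).flatMap(t => Future.fromTry(t))
    val indices = Source(0 until 3)
    if (unordered) indices.mapAsyncUnordered(parallelism = 3)(fetch) // completion order
    else indices.mapAsync(parallelism = 3)(fetch)                     // upstream order
  }

  val orderedPages: Seq[String] =
    Await.result(pages(unordered = false).runWith(Sink.seq), 5.seconds)
  val unorderedPages: Seq[String] =
    Await.result(pages(unordered = true).runWith(Sink.seq), 5.seconds)

  system.terminate()
}
```

With these delays `unorderedPages` will typically come back reversed, while `orderedPages` is always `page-0, page-1, page-2`; both variants issue the three requests concurrently.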

val batch: Batch = someBatch
// andThen needs an implicit ExecutionContext in scope, e.g. import system.dispatcher
Source(batch.segments)
  .via(loadSegment)
  .runWith(Sink.ignore)
  .andThen {
    case Success(_)     => log("success")
    case Failure(error) => report(error)
  }
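With `Sink.ignore`, the materialized `Future[Done]` completes with the first failure from any segment, because under the default supervision strategy `mapAsync` stops the stream when one of its `Future`s fails. So a single failing segment ends the whole batch and lands in the `Failure` branch. A minimal sketch with a hypothetical `loadSegment` over integer segment IDs, assuming Akka 2.6:

```scala
import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.Try

object BatchRunDemo {
  implicit val system: ActorSystem = ActorSystem("batch-run-demo")

  // Hypothetical segment loader: segment 2 fails to show how the error surfaces.
  val loadSegment: Flow[Int, String, NotUsed] =
    Flow[Int].mapAsync(parallelism = 2) { seg =>
      if (seg == 2) Future.failed(new RuntimeException(s"segment $seg failed"))
      else Future.successful(s"segment $seg loaded")
    }

  // Sink.ignore materializes a Future[Done] that fails with the stream's failure.
  val done: Future[Done] = Source(List(1, 2, 3)).via(loadSegment).runWith(Sink.ignore)
  val outcome: Try[Done] = Await.ready(done, 5.seconds).value.get

  system.terminate()
}
```

If the remaining segments should still be loaded, one option is to `recover` each segment's `Future` into a result value inside `mapAsync` and report failures per segment instead of once per batch.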

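Is there a better way to do this? I'm using HttpEntity.Chunked encoding to stream the pages. Sometimes the first request to the source takes longer because of warm-up, and the target cuts off the stream because no data has arrived. Is there a way to delay the actual connection to the target until we have the first page from the stream?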
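One way to delay the call to the target until data exists is `prefixAndTail(1)`: it materializes the page source, waits for the first element (or for completion), and hands back that element together with a `Source` of the remaining ones. The entity is then built from the first page re-attached in front of the tail. A sketch assuming Akka 2.6, with `initialDelay` standing in for the source's warm-up time; `withFirstPage` is a hypothetical helper name.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object FirstPageDemo {
  implicit val system: ActorSystem = ActorSystem("first-page-demo")
  import system.dispatcher

  // Stand-in for a slow source: the first page arrives only after a warm-up delay.
  val pages: Source[ByteString, NotUsed] =
    Source(List("<r id=\"1\"/>", "<r id=\"2\"/>").map(ByteString(_)))
      .initialDelay(200.millis)

  // Completes only once the first page (or stream completion) is available,
  // then re-attaches that page in front of the remaining pages.
  def withFirstPage(src: Source[ByteString, NotUsed]): Future[Source[ByteString, NotUsed]] =
    src.prefixAndTail(1).runWith(Sink.head).map {
      case (head, tail) => Source(head.toList) ++ tail
    }

  val body: Future[String] = withFirstPage(pages).flatMap { fullSource =>
    // Here the caller would build HttpEntity(contentType, fullSource) and call the
    // target; this demo only collects the bytes.
    fullSource.runFold(ByteString.empty)(_ ++ _).map(_.utf8String)
  }
  val result: String = Await.result(body, 5.seconds)

  system.terminate()
}
```

The tail `Source` must be materialized (in the real pipeline, by the target request consuming the entity) within the materializer's substream subscription timeout, `akka.stream.materializer.subscription-timeout.timeout` (5 seconds by default), otherwise the substream times out. If `head` is empty the segment produced no data, and the target call can be skipped entirely.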

I'd prefer to do something like the following. If it's possible, how would the methods wrapXMLHeader and toHttpEntity be implemented?

val splitPages: Flow[BuildSequenceSegment, Seq[PageRequest], NotUsed] = ???
val requestPayload: Flow[Seq[PageRequest], Seq[PageResponse], NotUsed] = ???
val wrapXMLHeader: Flow[Seq[PageResponse], Seq[PageResponse], NotUsed] = ???
val toHttpEntity: Flow[Seq[PageResponse], HttpEntity.Chunked, NotUsed] = ???
val invokeTargetLoad: Flow[HttpEntity.Chunked, RestResponse, NotUsed] = ???

Source(batch.segments)
  .via(splitPages)
  .via(requestPayload)
  .via(wrapXMLHeader)
  .via(toHttpEntity)
  .via(invokeTargetLoad)
  .runWith(Sink.ignore)
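A minimal sketch of `wrapXMLHeader` and `toHttpEntity` with the signatures from the question. `PageResponse` is assumed here to wrap a `ByteString` payload, and the `<records>` root element is a placeholder; assumes Akka 2.6 and Akka HTTP 10.x. `HttpEntity.Chunked.fromData` turns a `Source[ByteString, _]` into a chunked entity, one chunk per page.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.http.scaladsl.model.{ContentTypes, HttpEntity}
import akka.stream.scaladsl.{Flow, Sink, Source}
import akka.util.ByteString
import scala.concurrent.Await
import scala.concurrent.duration._

object PipelineStages {
  // Hypothetical model: the question names PageResponse but not its fields.
  final case class PageResponse(payload: ByteString)

  // Placeholder root element; replace with the real XML header and footer.
  val header: PageResponse = PageResponse(ByteString("<records>"))
  val footer: PageResponse = PageResponse(ByteString("</records>"))

  // Prepend the header and append the footer to each segment's pages.
  val wrapXMLHeader: Flow[Seq[PageResponse], Seq[PageResponse], NotUsed] =
    Flow[Seq[PageResponse]].map(pages => (header +: pages) :+ footer)

  // Turn one segment's pages into a chunked entity, one chunk per page.
  val toHttpEntity: Flow[Seq[PageResponse], HttpEntity.Chunked, NotUsed] =
    Flow[Seq[PageResponse]].map { pages =>
      HttpEntity.Chunked.fromData(
        ContentTypes.`application/octet-stream`,
        Source(pages.map(_.payload).toList))
    }
}

object PipelineDemo {
  import PipelineStages._
  implicit val system: ActorSystem = ActorSystem("pipeline-demo")

  val segmentPages: Seq[PageResponse] =
    Seq(PageResponse(ByteString("<r id=\"1\"/>")), PageResponse(ByteString("<r id=\"2\"/>")))

  val entity: HttpEntity.Chunked = Await.result(
    Source.single(segmentPages).via(wrapXMLHeader).via(toHttpEntity).runWith(Sink.head),
    5.seconds)

  // Read the entity back to inspect what the target would receive.
  val body: String = Await.result(
    entity.dataBytes.runFold(ByteString.empty)(_ ++ _), 5.seconds).utf8String

  system.terminate()
}
```

Because the element type is `Seq[PageResponse]`, every page of a segment is held in memory before the entity is created. To keep streaming page by page, the element could instead be a `Source[PageResponse, NotUsed]` per segment, with `wrapXMLHeader` mapping `pages => Source.single(header) ++ pages ++ Source.single(footer)`.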

0 answers