Question

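I get a comma-separated list of IDs from a web service and then make a new call for each ID in that list. My problem is that the ID list is about 10,000 entries long, and each call returns a medium-sized XML document.
The web service endpoint, or possibly Play Framework itself, does not cope well when I request all 10,000 asynchronously at the same time: I only get about 500 correct responses.

Some pseudocode to show the intent: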

for {
  respA <- WS.url(url1).get
  id <- respA.body.split(",")
  respB <- WS.url(url2 + id).get
} yield ...

How can I limit the number of concurrent requests to something more manageable?

Answer 1

Here is a sample application that groups 10,000 requests (made through Play's WS library) into batches of 1,000, all in an async and non-blocking way:

package controllers

import play.api.libs.concurrent.Promise
import scala.concurrent.duration._
import play.api.libs.ws.WS
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import play.api.mvc.{Action, Controller}
import play.api.libs.ws.Response
import play.api.Logger

object Application extends Controller {

  var numRequests = 0

  def index = Action {
    Async {
      val batches: Iterator[Seq[WS.WSRequestHolder]] = requests.grouped(1000)

      val allBatchesFutureResponses = batches.foldLeft(Future.successful(Seq.empty[Response])) { (allFutureResponses, batch) =>
        allFutureResponses.flatMap { responses =>
          val batchFutures = Future.sequence(batch.map(_.get))
          batchFutures.map { batchResponses =>
            responses ++ batchResponses
          }
        }
      }

      allBatchesFutureResponses.map { responses =>
        Logger.info(responses.size.toString)
        Ok
      }
    }
  }

  def requests = (1 to 10000).map { i =>
    WS.url("http://localhost:9000/pause")
  }

  def pause = Action {
    Async {
      Logger.info(numRequests.toString)
      numRequests = numRequests + 1
      Promise.timeout(Ok, 1 seconds)
    }
  }

}

Answer 2

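You need to do some throttling.

Akka

How about using some Akka actors to make the requests? Take a look at these approaches to throttling with Akka:

Create as many child actors as the number of concurrent requests you want. Each child actor sends a response to the parent actor when its HTTP request Future completes. Whenever a child actor replies, send it the next request.
Use Akka's TimerBasedThrottler to feed messages to a child actor that sends the HTTP requests: http://doc.akka.io/docs/akka/2.1.2/contrib/throttle.html
https://stackoverflow.com/a/9615080/936869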
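
The first approach above (a fixed number of workers, each asking for the next request when it finishes) can be illustrated without Akka. The following is only a rough sketch using plain Futures from the standard library rather than actors; the names WorkPulling, fetchAll and fetch are hypothetical, and fetch stands in for something like WS.url(url2 + id).get.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{ExecutionContext, Future}

object WorkPulling {
  // Runs `fetch` for every id with at most `parallelism` calls in flight.
  // `fetch` is a hypothetical stand-in for the real HTTP call.
  def fetchAll[A](ids: Seq[String], parallelism: Int)(fetch: String => Future[A])
                 (implicit ec: ExecutionContext): Future[Seq[A]] = {
    // Shared work queue; each id keeps its index so order can be restored.
    val queue = new ConcurrentLinkedQueue[(String, Int)]()
    ids.zipWithIndex.foreach(pair => queue.add(pair))

    // One "worker": take the next id, fetch it, and only then take another.
    // A failed fetch fails the overall result; recover inside `fetch` to avoid that.
    def worker(acc: List[(Int, A)]): Future[List[(Int, A)]] =
      Option(queue.poll()) match {
        case None            => Future.successful(acc)
        case Some((id, idx)) => fetch(id).flatMap(a => worker((idx, a) :: acc))
      }

    // Start `parallelism` workers and merge their results in input order.
    Future.sequence(List.fill(parallelism)(worker(Nil)))
      .map(_.flatten.sortBy(_._1).map(_._2))
  }
}
```

Unlike fixed batches, a slow response here only holds up one worker, so the other workers keep pulling IDs from the queue.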

Just with Futures

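If you only want to use Futures and no Akka actors, you can use a combination of flatMap (to chain HTTP requests so that they happen one after another) and Future.sequence (to run a group of requests in parallel) to get the level of parallelism you want.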
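
As a minimal sketch of that combination, assuming the HTTP call can be modelled as a function returning a Future: the helper below runs one batch at a time with flatMap and runs the calls inside a batch concurrently with Future.sequence. The names Batching, inBatches and call are hypothetical and not part of Play or the standard library.

```scala
import scala.concurrent.{ExecutionContext, Future}

object Batching {
  // Applies `call` to every item, with at most `batchSize` calls running at once.
  // `call` is a hypothetical stand-in for something like WS.url(url2 + id).get.
  def inBatches[A, B](items: Seq[A], batchSize: Int)(call: A => Future[B])
                     (implicit ec: ExecutionContext): Future[Seq[B]] =
    items.grouped(batchSize).foldLeft(Future.successful(Vector.empty[B])) { (done, batch) =>
      // flatMap: the next batch is started only after `done` has completed.
      done.flatMap { results =>
        // Future.sequence: all calls within one batch run concurrently.
        Future.sequence(batch.map(call)).map(results ++ _)
      }
    }
}
```

A call such as Batching.inBatches(ids, 100)(id => WS.url(url2 + id).get) would then keep at most 100 requests in flight. The trade-off is that each batch waits for its slowest response before the next batch starts.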

Answer 3

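You could consider batching the calls to the second web service and only moving on to the next batch once the previous batch has completed. That approach might look something like this: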

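import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
import play.api.libs.ws.{Response, WS}

val fut = for {
  // split the comma-separated body into IDs, then group them into batches
  ids <- WS.url(url1).get.map(res => res.body.split(",").grouped(batchSize).toList)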
  responses <- processBatches(ids)
} yield responses

fut onComplete{
  case Success(responses) => //handle responses
  case Failure(ex) => //handle fail
}

def processBatches(batches:List[Array[String]]) = {
  val prom = Promise[List[Response]]()
  var trys = List[List[Response]]()

  def doProcessBatch(remainingBatches:List[Array[String]]) {      
    val batch = remainingBatches.head
    val futs = batch.map(id => WS.url(url2 + id).get).toList
    Future.sequence(futs) onComplete{ tr =>
      val list = tr.getOrElse(List()) //add better error handling here
      trys = list :: trys
      if (remainingBatches.size > 1) doProcessBatch(remainingBatches.tail)
      else prom.success(trys.flatten)
    }      
  }
  doProcessBatch(batches)
  prom.future
}

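The idea is to hit the first service to get the list of IDs and then split it into batches of a size you choose. The batches are then processed one at a time: batchSize concurrent calls are sent to the second web service, and all of them must complete before the next batch starts. When everything is done, you have a Future of List[Response] representing all the calls made to the second service. This is not production-ready code, because it needs better failure handling (in the failure case I simply return an empty list). You would probably also want to chain a call to recover after the get on this line: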

val futs = batch.map(id => WS.url(url2 + id).get).toList

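to prevent one failure in a batch from causing you to lose the rest of that batch's results, but I'll leave that to you. I just wanted to show a high-level concept for batching the calls to the second service so that you don't flood it with requests.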
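
One possible way to do that, shown as a sketch only: turn each call's outcome into a Try before sequencing, so a single failed request no longer fails the whole batch. The names SafeBatch, attempt, runBatch and fetch are hypothetical; fetch stands in for WS.url(url2 + id).get.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success, Try}

object SafeBatch {
  // Moves a failure into the value, so the returned Future itself always succeeds.
  def attempt[A](f: Future[A])(implicit ec: ExecutionContext): Future[Try[A]] =
    f.map[Try[A]](a => Success(a)).recover { case ex => Failure(ex) }

  // Runs one batch and pairs every id with either its response or its error.
  // `fetch` is a hypothetical stand-in for the real HTTP call.
  def runBatch[A](ids: Seq[String])(fetch: String => Future[A])
                 (implicit ec: ExecutionContext): Future[Seq[(String, Try[A])]] =
    Future.sequence(ids.map(id => attempt(fetch(id)).map(result => id -> result)))
}
```

The caller can then collect the Success values and decide separately whether to retry or log the IDs that came back as Failure.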

Answer 4

Use a thread pool. The following URL describes the whole mechanism: http://msdn.microsoft.com/en-us/library/ms973903.aspx
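
On the JVM, the thread pool idea could be sketched roughly as follows, assuming the calls are blocking (which differs from the non-blocking style of the other answers): a fixed-size pool caps how many calls run at once, and the remaining calls wait in the pool's queue. The names PooledCalls, fetchAll and blockingFetch are hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object PooledCalls {
  // Runs `blockingFetch` for every id on a pool of `poolSize` threads, so at
  // most `poolSize` calls execute at the same time.
  // `blockingFetch` is a hypothetical synchronous HTTP call.
  def fetchAll[A](ids: Seq[String], poolSize: Int)(blockingFetch: String => A): Future[Seq[A]] = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

    // Every call is queued immediately, but only `poolSize` run concurrently.
    val all = Future.sequence(ids.map(id => Future(blockingFetch(id))))

    // Release the pool's threads once every call has finished.
    all.onComplete(_ => pool.shutdown())
    all
  }
}
```

This ties up one thread per in-flight request, so it fits best when the concurrency limit is small.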

Limit concurrent Web Service requests (or some batching approach)
