Question

I'm new to Groovy and a bit lost on how to batch requests, so that they can be submitted to the server in batches instead of individually, as I currently do:

class Handler {
    private String jobId
    // [...]
    void submit() {
        // [...]
        // client is a single instance of Client used by all Handlers
        jobId = client.add(args)
    }
}

class Client {
    //...
    String add(String args) {
        def response = postJson(args)
        return parseIdFromJson(response)
    }
}

Right now there are a number of calls to Client.add(), each of which POSTs to a REST API and returns the parsed result.

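The problem I have is that the add() method may be called thousands of times in quick succession. It would be much more efficient to collect all the args passed to add(), wait until the add() calls stop coming in for a while, and then POST the whole batch to the REST API once, sending all the args in one go.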

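Is this possible? As long as the batching happens, add() could return a fake id immediately, and the client could later look up the mapping between the fake ids and the ids from the REST API (the REST API returns ids in the same order as the args sent to it).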
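The fake-id idea from the question can be sketched directly. This is a minimal illustration, not code from the thread: the class `PlaceholderClient` and its methods `flush()`, `resolve()` and `postBatch()` are hypothetical names, and `postBatch()` is a stand-in for the real REST call. It assumes, as stated above, that the API returns ids in the same order as the args it was sent.

```groovy
// Hypothetical sketch: add() hands out a placeholder ("fake") id at once,
// flush() sends everything in one POST and records fake id -> real id.
// Assumes the REST API returns real ids in the same order as the args.
class PlaceholderClient {
  private long nextFakeId = 0
  private final List<Map> pending = []              // [fakeId:, arg:] in call order
  private final Map<String, String> realIds = [:]   // fake id -> real id

  synchronized String add(String arg) {
    String fakeId = "fake-${nextFakeId++}"
    pending << [fakeId: fakeId, arg: arg]
    fakeId
  }

  // Sends all pending args in a single request and records the id mapping.
  synchronized void flush() {
    if (pending.isEmpty()) return
    List<String> ids = postBatch(pending*.arg)
    pending.eachWithIndex { entry, i -> realIds[entry.fakeId] = ids[i] }
    pending.clear()
  }

  // Returns the real id for a fake id, or null if its batch has not been sent yet.
  synchronized String resolve(String fakeId) { realIds[fakeId] }

  // Hypothetical stand-in for the real batch POST; replace with the actual call.
  protected List<String> postBatch(List<String> allArgs) {
    allArgs.collect { "id-for-$it".toString() }
  }
}

def client = new PlaceholderClient()
def fakeIds = (1..5).collect { client.add("job$it") }
client.flush()
println fakeIds.collect { client.resolve(it) }
```

When to call flush() (on a timer, after a lull, or when the batch gets large) is the part the answer below addresses.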

Answer 1

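As mentioned in the comments, this might be a good case for gpars, which excels in these kinds of scenarios.

This is really less about Groovy and more about asynchronous programming in Java and on the JVM in general.

If you want to stick with the Java concurrency idioms, I threw together a code snippet you could use as a potential starting point. It has not been tested and edge cases have not been considered. I wrote it for fun, and since this is asynchronous programming and I haven't spent the appropriate time thinking it through, I suspect there are holes in it big enough to drive a tank through.

With that said, here is some code that makes an attempt at batching the requests: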

import java.util.concurrent.* 
import java.util.concurrent.locks.* 

// test code 
def client = new Client()

client.start()
def futureResponses = []
1000.times { 
  futureResponses << client.add(it as String)
}
client.stop()

futureResponses.each { futureResponse ->
  // resolve future...will wait if the batch has not completed yet
  def response = futureResponse.get()
  println "received response with index ${response.responseIndex}"
}
// end of test code 

class FutureResponse extends CompletableFuture<String> {
  String args
}

class Client {
  int minMillisLullToSubmitBatch = 100
  int maxBatchSizeBeforeSubmit   = 100
  int millisBetweenChecks        = 10
  long lastAddTime               = Long.MAX_VALUE

  def batch = []
  def lock = new ReentrantLock()
  // volatile so that the monitor thread sees the update made by stop()
  volatile boolean running = true

  def start() {
    running = true
    Thread.start { 
      while (running) {
        checkForSubmission()
        sleep millisBetweenChecks
      }
    }
  }

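  def stop() {
    running = false
    // flush whatever is left regardless of lull or batch size,
    // otherwise the futures in the last batch might never complete
    withLock {
      if (!batch.isEmpty()) {
        submitBatch()
      }
    }
  }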

  def withLock(Closure c) {
    // acquire the lock before the try so that unlock() only runs
    // if lock() actually succeeded
    lock.lock()
    try {
      c.call()
    } finally {
      lock.unlock()
    }
  }

  FutureResponse add(String args) {
    def future = new FutureResponse(args: args)

    withLock { 
      batch << future
      lastAddTime = System.currentTimeMillis()
    }

    future
  }

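  def checkForSubmission() {
    withLock {
      // skip empty batches, otherwise an idle client would post
      // an empty batch every millisBetweenChecks
      if (!batch.isEmpty() &&
          (System.currentTimeMillis() - lastAddTime > minMillisLullToSubmitBatch ||
           batch.size() >= maxBatchSizeBeforeSubmit)) {
        submitBatch()
      }
    }
  }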

  def submitBatch() {
    // here you would need to put the combined args into a format
    // suitable for the endpoint you are calling. In this
    // example we are just creating a list containing the args
    def combinedArgs = batch.collect { it.args }

    // further there needs to be a way to map one specific set of
    // args in the combined args to a specific response. If the
    // endpoint responds in the same order as the args we submitted,
    // then that can be used; otherwise something else, like
    // an id in the response, would need to be figured out. Here
    // we just assume responses are returned in the order args were submitted
    try {
      def combinedResponses = postJson(combinedArgs)
      combinedResponses.eachWithIndex { response, index ->
        // here the FutureResponse gets a value, which can be retrieved with
        // futureResponse.get()
        batch[index].complete(response)
      }
    } catch (Exception e) {
      // if the post fails, fail every future in the batch so that
      // callers blocked in get() are not left waiting forever
      batch.each { it.completeExceptionally(e) }
    }

    // clear the batch. Note that postJson runs while the lock is held,
    // so threads calling add() block for the duration of the request
    batch = []
  }

  // bogus method to fake post
  def postJson(combinedArgs) {
    println "posting json with batch size: ${combinedArgs.size()}"
    combinedArgs.collect { [responseIndex: it] }
  }
}

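Some notes:

- The code needs to react when no add calls have happened for a while. That requires a separate monitoring thread, which is what the start and stop methods manage.
- With an endless sequence of add calls and no pauses, we could run out of resources. The code therefore has a maximum batch size at which it submits the batch even if there is no lull in the add calls.
- The code uses a lock to make sure (or try to; as mentioned above, I have not considered all potential issues) that we stay thread safe during batch submission etc.
- Assuming the general idea here is sound, what remains is implementing the logic in submitBatch, where the main problem is mapping specific args to specific responses.
- CompletableFuture is a Java 8 class. This can be solved with other constructs in earlier versions, but I happened to be on Java 8.
- I wrote this code more or less without executing or testing it, and I'm sure there are some mistakes in it.
- As you can see from the printout below, the "maxBatchSizeBeforeSubmit" setting is more a recommendation than an actual maximum. Since the monitoring thread sleeps for a while and then wakes up to check how we are doing, the threads calling the add method may have accumulated any number of requests in the batch by then. All we guarantee is that every millisBetweenChecks we wake up and check how we are doing, and if the criteria for submitting a batch have been met, the batch is submitted.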
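If a hard upper bound on the batch size is needed, one option is to flush from add() itself once the batch is full, and to leave only the "lull" check to a background timer. The sketch below does that, and also swaps the hand-rolled monitor thread for a JDK `ScheduledExecutorService`. It is a variation on the answer's idea, not the answer's code: `CappedBatchClient`, `flushIfIdle()` and `postedSizes` are hypothetical names, and `postJson()` is the same kind of fake post, echoing each arg back as `[responseIndex: arg]`.

```groovy
import java.util.concurrent.*

// Sketch (hypothetical names): the batch size cap is enforced in add(),
// and a ScheduledExecutorService replaces the hand-rolled monitor thread.
class CappedBatchClient {
  int maxBatchSize = 100
  long lullMillis = 100
  final List<Integer> postedSizes = Collections.synchronizedList([])

  private List<CompletableFuture<Object>> pendingFutures = []
  private List<String> pendingArgs = []
  private long lastAddTime = 0
  private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor()

  void start() {
    // check for a lull every 10 ms on the timer thread
    timer.scheduleWithFixedDelay({ flushIfIdle() } as Runnable, 10, 10, TimeUnit.MILLISECONDS)
  }

  synchronized CompletableFuture<Object> add(String arg) {
    def future = new CompletableFuture<Object>()
    pendingFutures << future
    pendingArgs << arg
    lastAddTime = System.currentTimeMillis()
    // hard cap: the calling thread submits the batch as soon as it is full
    if (pendingArgs.size() >= maxBatchSize) flush()
    future
  }

  synchronized void flushIfIdle() {
    if (!pendingArgs.isEmpty() && System.currentTimeMillis() - lastAddTime > lullMillis) flush()
  }

  synchronized void stop() {
    timer.shutdown()
    if (!pendingArgs.isEmpty()) flush()   // submit whatever is left
  }

  // only called from synchronized methods, so the monitor is already held
  private void flush() {
    def batchFutures = pendingFutures
    def batchArgs = pendingArgs
    pendingFutures = []
    pendingArgs = []
    try {
      def responses = postJson(batchArgs)
      responses.eachWithIndex { r, i -> batchFutures[i].complete(r) }
    } catch (Exception e) {
      batchFutures.each { it.completeExceptionally(e) }
    }
  }

  // fake post: records the batch size and echoes each arg back
  private List postJson(List<String> combinedArgs) {
    postedSizes << combinedArgs.size()
    combinedArgs.collect { [responseIndex: it] }
  }
}

def client = new CappedBatchClient()
client.start()
def futures = (0..<1000).collect { client.add(it as String) }
client.stop()
println client.postedSizes
```

The trade-off is that whichever thread fills the batch pays for the POST, since flush() runs inside add() while the monitor is held.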

If you are not familiar with Java futures and locks, I recommend reading up on them.
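The two building blocks the answer relies on can be seen in isolation in this small, self-contained sketch: a `CompletableFuture` whose `get()` blocks until another thread calls `complete()`, and a `ReentrantLock` used with the lock-then-try/finally idiom. The 50 ms sleep only simulates a slow batch POST.

```groovy
import java.util.concurrent.CompletableFuture
import java.util.concurrent.locks.ReentrantLock

def future = new CompletableFuture<String>()
def lock = new ReentrantLock()
def events = []

def worker = Thread.start {
  sleep 50                       // simulate the batch POST taking a while
  lock.lock()                    // acquire before try, release in finally
  try {
    events << 'worker completed the future'
  } finally {
    lock.unlock()
  }
  future.complete('response')    // wakes up any thread blocked in get()
}

def result = future.get()        // blocks until the worker calls complete()
worker.join()
println result
```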

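If you save the above code in a Groovy script code.groovy and run it (the batch sizes depend on thread timing and will vary between runs):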

~> groovy code.groovy
posting json with batch size: 153
posting json with batch size: 234
posting json with batch size: 243
posting json with batch size: 370
received response with index 0
received response with index 1
received response with index 2
...
received response with index 998
received response with index 999

~>

It should work and print out the "responses" received from our fake json submissions.
