Question

I want to process 20,000 PDFs asynchronously with Google OCR, but I could not find any documentation covering this. I have already tried the client.asyncBatchAnnotateFilesAsync function:

List<AsyncAnnotateFileRequest> requests = new ArrayList<>();
for (MultipartFile file : files) {
    GcsSource gcsSource = GcsSource.newBuilder().setUri(gcsSourcePath + file.getOriginalFilename()).build();
    InputConfig inputConfig = InputConfig.newBuilder().setMimeType("application/pdf").setGcsSource(gcsSource)
            .build();
    GcsDestination gcsDestination = GcsDestination.newBuilder()
            .setUri(gcsDestinationPath + file.getOriginalFilename()).build();
    OutputConfig outputConfig = OutputConfig.newBuilder().setBatchSize(2).setGcsDestination(gcsDestination)
            .build();
    AsyncAnnotateFileRequest request = AsyncAnnotateFileRequest.newBuilder().addFeatures(feature)
            .setInputConfig(inputConfig).setOutputConfig(outputConfig).build();
    requests.add(request);

}
AsyncBatchAnnotateFilesRequest request = AsyncBatchAnnotateFilesRequest.newBuilder().addAllRequests(requests)
        .build();
AsyncBatchAnnotateFilesResponse response = client.asyncBatchAnnotateFilesAsync(request).get();
System.out.println("Waiting for the operation to finish.");

But all I get is this error message:

io.grpc.StatusRuntimeException: INVALID_ARGUMENT: At this time, only single requests are supported for asynchronous processing.

If Google does not support batching, why do they provide asyncBatchAnnotateFilesAsync? Am I perhaps using an old version? Does asyncBatchAnnotateFilesAsync work in some other beta version?

Answer 1

The Vision service does not support multiple requests in a single call.

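This can be confusing: according to the RPC API documentation, you can indeed supply multiple requests in one service call (one file per request), but according to this issue tracker, there is a known limitation in the Vision service, which currently accepts only one request at a time.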

Answer 2

Since each request is limited to one file, can you simply send 20,000 requests? They are asynchronous, so sending them should be fast.
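
A rough sketch of that idea, using only the Java standard library: one asynchronous request is submitted per file, with a cap on how many are in flight at once. The names OneRequestPerFile, submitAll, submitOne and maxInFlight are made up for illustration, and the cap is my own assumption (the thread does not say what limits apply). In real code, submitOne would build an AsyncBatchAnnotateFilesRequest containing exactly one AsyncAnnotateFileRequest and call client.asyncBatchAnnotateFilesAsync; adapting the future it returns to a CompletableFuture is left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

public class OneRequestPerFile {

    /**
     * Submits one asynchronous request per file name, keeping at most
     * maxInFlight requests outstanding, and returns the results in input order.
     * submitOne is a hypothetical stand-in for the single-request Vision call.
     */
    static <R> List<R> submitAll(List<String> fileNames,
                                 Function<String, CompletableFuture<R>> submitOne,
                                 int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<R>> pending = new ArrayList<>();

        for (String name : fileNames) {
            // Block until one of the outstanding requests has finished.
            permits.acquireUninterruptibly();
            CompletableFuture<R> future;
            try {
                future = submitOne.apply(name);
            } catch (RuntimeException e) {
                permits.release(); // nothing was submitted, so give the permit back
                throw e;
            }
            // Free the permit when the request completes, whether it succeeded or failed.
            future.whenComplete((result, error) -> permits.release());
            pending.add(future);
        }

        // Wait for everything; join() throws CompletionException if a request failed.
        List<R> results = new ArrayList<>();
        for (CompletableFuture<R> future : pending) {
            results.add(future.join());
        }
        return results;
    }

    public static void main(String[] args) {
        // Fake submitter so the sketch runs without any cloud dependency.
        Function<String, CompletableFuture<String>> fakeSubmit =
                name -> CompletableFuture.supplyAsync(() -> "done: " + name);

        List<String> files = List.of("a.pdf", "b.pdf", "c.pdf");
        System.out.println(submitAll(files, fakeSubmit, 2));
    }
}
```

Results come back in the same order as the input list because the futures are joined in submission order, even if the underlying requests finish out of order.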

How do I batch-process files with PDF OCR?
