Question

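I have a Spring Batch job to which I need to pass a list of IDs as input, and I want the IDs in that list to be handed to a step that can process all of them in parallel. What I have done so far is run multiple job instances in a ThreadPoolExecutor, so the job is executed x times. This means a single query is issued for every job. We are talking about more than 50 million records. Each record is a time-series entry: consumption at a specific date. I need to aggregate the data by month per ID and BatchId and send that information to a broker.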

Reader -> reads the time series from the database by ID and timestamp.
Processor -> PassThroughItemProcessor
Writer -> sends to AMQP (aggregates the list of items)

Can you suggest any best practices for this?

Following the suggestion, this is what my partitioner looks like:

@Override
public Map<String, ExecutionContext> partition(int gridSize) {
    log.debug("START: Partition");

    Map<String, ExecutionContext> partitionMap = new HashMap<>();
    final AtomicInteger counter = new AtomicInteger(0);
    final AtomicInteger partitionerCounter = new AtomicInteger(0);
    Page<Integer> result = null;
    do {
        result = repository.findDistinctByBatchId(LocalDateTime.parse(batchId, AipForecastService.DEFAULT_DATE_TIME_FORMATTER), Optional.ofNullable(result)
                .map(Page::nextPageable)
                .orElse(PageRequest.of(0, 100000)));
        result
                .stream()
                .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / 100))
                .values()
                .forEach(listOfInstallation -> {
                    ExecutionContext context = new ExecutionContext();
                    context.put("listOfInstallation", listOfInstallation);
                    partitionMap.put("partition" + partitionerCounter.incrementAndGet(), context);
                    log.debug("Adding to the partition map {}, listOfInstallation {}", partitionerCounter.get(), listOfInstallation);
                });
    } while (result.hasNext());

    log.debug("END: Created Partitions for installation job of size:{}", partitionMap.size());
    return partitionMap;
}

Answer 1

I need to pass a list of IDs as input, and I want the IDs in that list to be handed to a step that can process all of them in parallel

You can partition that list and use a partitioned step to process the partitions in parallel.

Can you suggest any best practices for this?

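If you go the partitioned step route (which sounds like a good fit for your use case), I recommend not creating one partition per ID, unless you have a reasonable number of IDs. Instead, you can, for example, create one partition per range of IDs and have each worker step run the read/process/write logic you described. The worker steps can then run in parallel.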
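To make the "one partition per range of IDs" idea concrete, here is a minimal sketch in plain Java. It is not taken from your code: the class name `IdRangePartitioner`, the `IdRange` holder, and the `partition0`, `partition1`, ... keys are all hypothetical, and it assumes the distinct IDs are already loaded and sorted in ascending order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits a sorted list of IDs into at most gridSize
// contiguous ranges, one range per partition.
public class IdRangePartitioner {

    /** Inclusive ID range assigned to one worker (illustrative type). */
    public static final class IdRange {
        public final int minId;
        public final int maxId;

        public IdRange(int minId, int maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    public static Map<String, IdRange> partition(List<Integer> sortedIds, int gridSize) {
        if (gridSize <= 0) {
            throw new IllegalArgumentException("gridSize must be positive");
        }
        Map<String, IdRange> partitions = new LinkedHashMap<>();
        int total = sortedIds.size();
        if (total == 0) {
            return partitions;
        }
        // Ceiling division, so that no more than gridSize ranges are created.
        int chunkSize = (total + gridSize - 1) / gridSize;
        int index = 0;
        for (int start = 0; start < total; start += chunkSize) {
            int end = Math.min(start + chunkSize, total) - 1;
            partitions.put("partition" + index++,
                    new IdRange(sortedIds.get(start), sortedIds.get(end)));
        }
        return partitions;
    }
}
```

In a real `Partitioner`, each range would go into its own `ExecutionContext` (for example with `putInt("minId", ...)` and `putInt("maxId", ...)`), and the reader of the worker step would query only the IDs between those two bounds. This keeps the number of partitions tied to `gridSize` rather than to the number of IDs.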

Hope this helps.
