Question

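I have a Spring Batch job with partitioning. The gridSize is 10, so it spawns 10 threads. All beans are default singletons. The TaskExecutor has a max pool size of 15 and a core pool size of 10.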

@Bean
@StepScope
public RepositoryItemReader<CustomObject> reader(
        @Value("#{stepExecutionContext['from']}") Integer from,
        @Value("#{stepExecutionContext['to']}") Integer to,
        @Value("#{stepExecutionContext['partitionId']}") Integer partitionId) {
    LOG.info("Partition ID: {} will process rows from: {} to: {}", partitionId, from, to);
    // The output here is correct, e.g. 1 to 10, 10 to 20 (from inclusive, to exclusive)
    RepositoryItemReader<CustomObject> reader = new RepositoryItemReader<>();
    reader.setRepository(objectRepo);
    reader.setMethodName("findByProcessedFromAndTo");
    // from and to are passed in here as query arguments to do the partitioning
    // sorts, pageSize and arguments omitted
    reader.setSaveState(false);
    return reader;
}

This is the reader. It returns 4 rows from the DB: CustomObject 1 to 4.

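@Component
public class Processor implements ItemProcessor<CustomObject, Object> {
    @Override
    public Object process(CustomObject customObject) {
        logger.info("Processing Object Id: {}", customObject.getId());
        // logic
        return customObject; // placeholder; the real logic is omitted
    }
}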

@Bean
Step processStep(){
    //chunk 1
    //item reader
    //item processor
    //item writer
    //build
}

Step partitionStep {
    //partition with gridSize 10,
    //processStep
    //taskExecutor
}

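public Map<String, ExecutionContext> partition(int gridSize) {
    Map<String, ExecutionContext> map = new HashMap<>();
    int start = 1;
    int range = totalCount / gridSize + 1;
    for (int i = 0; i < gridSize; i++) {
        ExecutionContext context = new ExecutionContext();
        context.put("from", start);
        context.put("to", start + range); // exclusive upper bound
        context.put("partitionId", i);
        map.put(PARTITION_KEY + i, context); // one distinct key per partition
        start += range;
    }
    return map;
}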
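
A partitioner like this only works if every partition is stored under its own map key and the ranges are contiguous and non-overlapping. The sketch below checks that arithmetic in plain Java, without Spring Batch. The class name PartitionRanges and the "partition" + i key format are hypothetical, and the sizing rule is simply the one from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch; PartitionRanges and its key format are hypothetical names.
final class PartitionRanges {

    // Splits rows starting at 1 into gridSize half-open ranges [from, to).
    static Map<String, int[]> ranges(int totalCount, int gridSize) {
        Map<String, int[]> result = new LinkedHashMap<>();
        int range = totalCount / gridSize + 1; // same sizing rule as in the question
        int start = 1;
        for (int i = 0; i < gridSize; i++) {
            // A distinct key per partition; reusing one key would leave a single entry.
            result.put("partition" + i, new int[] {start, start + range});
            start += range;
        }
        return result;
    }

    public static void main(String[] args) {
        ranges(35, 10).forEach((key, r) ->
                System.out.println(key + ": from=" + r[0] + " to=" + r[1]));
    }
}
```

If the same key were used on every iteration, the map would end up holding only the last context, so it is worth logging the size of the returned map when debugging.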

Example query:

select * from Table where rownum >=:from and rownum < :to;

The setup is very simple: just a batch job with partitioning and a gridSize of 10.

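When I run it, the item reader gets the 4 correct records. However, when the reader passes the data to the item processor, I get logs like the following (this is an illustrative mock-up).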

Thread 2 processing Object Id: 11 //row 10 to 20
Thread 1 processing Object Id: 1 //row  1 to 10
Thread 4 processing Object Id: 31 //row  30 to 40
Thread 6 processing Object Id: 51 //row  50 to 60

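Since I implemented partitioning and also partition in the query, every thread should now process only its own partition and never a duplicate record. But the same problem still occurs: some threads process duplicate records.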

Thread 9 processing Object ID:2
Thread 3 processing Object ID:4
and so on

After the whole job finishes, there are still unprocessed records in the DB.

Am I missing something? Any help would be appreciated.

Answer 1

"The expected behavior is that the ItemReader should pass these 4 records to just 4 threads"

What you are expecting here is a multi-threaded step, not a partitioned step.
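
To illustrate the multi-threaded model, here is a simplified plain-Java simulation, not Spring Batch code. One shared reader is drained by several threads, so each item is handed to exactly one of them. The class name SharedReaderDemo is hypothetical, and the synchronized block stands in for whatever thread-safety the real reader would need.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Plain-Java simulation; SharedReaderDemo is a hypothetical name, not a Spring Batch class.
final class SharedReaderDemo {

    // Returns how many times each item id was processed.
    static Map<Integer, Integer> run(List<Integer> ids, int threads) {
        Iterator<Integer> source = ids.iterator();
        Map<Integer, Integer> processed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                while (true) {
                    Integer id;
                    synchronized (source) { // one shared reader, guarded so reads do not race
                        id = source.hasNext() ? source.next() : null;
                    }
                    if (id == null) {
                        return; // reader exhausted
                    }
                    processed.merge(id, 1, Integer::sum); // stands in for the processor
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3, 4), 10));
    }
}
```

With 4 items and 10 threads, at most 4 threads ever get work, which matches the behavior the question describes as expected.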

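For a partitioned step, setting the grid size alone is not enough. You need to define the partitions up front with a Partitioner and set it on the master step, so that each worker step processes one partition. If you do not specify a partitioner, Spring Batch does not know how to partition the items, and your reader is used for all (undefined) partitions, so the whole data set is read multiple times.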
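
The difference can be sketched with another plain-Java simulation (again not Spring Batch code; PartitionedWorkersDemo is a hypothetical name). Every worker has its own reader over the same table. When each worker is restricted to its own half-open range, every row is processed once. When every worker is given the full range, every row is processed once per worker.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Plain-Java simulation; PartitionedWorkersDemo is a hypothetical name.
final class PartitionedWorkersDemo {

    // One worker per range; a worker processes only ids with from <= id < to.
    static Map<Integer, Integer> run(List<Integer> table, List<int[]> workerRanges) {
        Map<Integer, Integer> processed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workerRanges.size());
        for (int[] range : workerRanges) {
            pool.execute(() -> {
                for (Integer id : table) { // each worker iterates with its own reader
                    if (id >= range[0] && id < range[1]) {
                        processed.merge(id, 1, Integer::sum);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Integer> table = List.of(1, 2, 3, 4);
        int[] everything = {Integer.MIN_VALUE, Integer.MAX_VALUE};
        System.out.println("partitioned:   "
                + run(table, List.of(new int[] {1, 3}, new int[] {3, 5})));
        System.out.println("unpartitioned: "
                + run(table, List.of(everything, everything)));
    }
}
```

The second call models workers whose readers ignore the partition bounds, which is one way duplicate processing of the kind described in the question can arise.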

Spring Batch ItemReader passes the same object twice to ItemProcessor
