Question

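I have a job that runs at a specific time every day. It has two steps, and each step makes a REST call in its reader() and processor() parts. The resources are account numbers stored in a MySQL DB. The Spring Batch job runs fine and we get the expected output, but it only runs on a single thread. I tried to parallelize it, went through the documentation and a few examples, and after a while settled on this particular example. Here is my job configuration code in Java.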

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setMaxPoolSize(60);
    taskExecutor.afterPropertiesSet();
    return taskExecutor;
}

@Bean
public Job processJob(BatchListener listener) {
    return jobBuilderFactory.get("Job").incrementer(new RunIdIncrementer()).listener(listener)
            .flow(processStep1()).on("*").to(processStep2()).end().build();
}
@Bean
public Step processStep1() {
    return (Step) stepBuilderFactory.get("Step1")
            .<response, response>chunk(3).
            reader(getItemReader()).
            processor(getItemProcess()).
            writer(getItemWriter()).
            taskExecutor(taskExecutor()).
            throttleLimit(2).
            build();

}


@Bean 
public Step processStep2() {

    SimpleStepBuilder<AccountResponse,batch_details> process = stepBuilderFactory.get("processStep2")
            .<AccountResponse,batch_details>chunk(5).reader(getBatchReader()).processor(getBatchProcessor());
    return process.writer(getBatchWriter()).build();

}

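Even with the task executor configured, this configuration runs on only one thread. Can someone help me see what I am doing wrong or what I am missing, so that it runs on different threads? I want to parallelize step1 and step2; data concurrency is not an issue. If I can parallelize step 1, I will replicate the same approach for step 2. Thanks.

Sample output: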

Thread # 37 is doing this task
Hibernate: Select * from batch_details where status != 'complete' and     session_id = '' and status != 'in_solve' ORDER BY RAND() LIMIT 3
Hibernate: update batch_details set status = 'in_cs' where account_id= ?
Hibernate: update batch_details set session_completion_time=?, session_id=?,     status=? where account_id=?
accountnumber1

Thread # 37 is doing this task
Hibernate: Select * from batch_details where status != 'complete' and session_id = '' and status != 'in_solve' ORDER BY RAND() LIMIT 3
Hibernate: update batch_details set status = 'in_cs' where account_id= ?
accountnumber2

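Another question: if I change the chunk size, the reader is invoked chunk-size times, but still on the same thread. I cannot understand the significance of this at this stage. If you could also explain why this happens, I would greatly appreciate it.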
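On the chunk-size observation: in a chunk-oriented step, the reader is called once per item until the chunk size is reached, then each item is processed, and finally the writer receives the whole chunk in a single call. The same thread handles one complete chunk, so seeing the reader repeat chunk-size times on one thread is expected. The following is only a plain-Java sketch of that loop, not Spring Batch's actual implementation; `ChunkLoopSketch` and `runChunks` are hypothetical names, and upper-casing stands in for the real processor.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of the read / process / write cycle of a chunk-oriented step.
class ChunkLoopSketch {

    // Returns the list of chunks that the "writer" would receive, one entry per write call.
    static List<List<String>> runChunks(List<String> source, int chunkSize) {
        Iterator<String> reader = source.iterator();
        List<List<String>> writtenChunks = new ArrayList<>();

        while (reader.hasNext()) {
            // Read phase: one read per item, repeated until the chunk is full or input ends.
            List<String> chunk = new ArrayList<>();
            while (chunk.size() < chunkSize && reader.hasNext()) {
                chunk.add(reader.next());
            }

            // Process phase: each item of the chunk is transformed individually.
            List<String> processed = new ArrayList<>();
            for (String item : chunk) {
                processed.add(item.toUpperCase());
            }

            // Write phase: the whole chunk is handed over in a single call.
            writtenChunks.add(processed);
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        // With a chunk size of 3, five items yield one chunk of three and one of two.
        System.out.println(runChunks(List.of("a", "b", "c", "d", "e"), 3));
    }
}
```

Changing the chunk size therefore changes how many reads happen before each write, not how many threads take part.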

Answer 1

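Your job does not contain a parallel flow. At the moment it simply executes step 1 and step 2 sequentially, with step 2 starting only after step 1 has completed.

In this question, Hansjoerg Wingeier provides a nice way to execute steps in parallel using a set of helper methods: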

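import java.util.List;
import java.util.stream.Collectors;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

// helper method to create a split flow out of a List of steps
private static Flow createParallelFlow(List<Step> steps) {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    // Concurrency limit: -1 means no limit at all, 1 means a single thread.
    // steps.size() allows one thread per step (2 threads in this case).
    taskExecutor.setConcurrencyLimit(steps.size());

    // each step has to be wrapped in its own flow before it can be split
    List<Flow> flows = steps.stream()
            .map(step -> new FlowBuilder<Flow>("flow_" + step.getName())
                    .start(step)
                    .build())
            .collect(Collectors.toList());

    // the split runs all wrapped flows concurrently on the task executor
    return new FlowBuilder<SimpleFlow>("parallelStepsFlow").split(taskExecutor)
            .add(flows.toArray(new Flow[flows.size()]))
            .build();
}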

Your job would then look like this:

@Bean
public Job myJob() {

    // collect the step beans that should run side by side
    List<Step> steps = new ArrayList<>();
    steps.add(processStep1());
    steps.add(processStep2());

    return jobBuilderFactory.get("yourJobName")
            .start(createParallelFlow(steps))
            .end()
            .build();
}
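Separately, a likely reason why Step1 itself stays on one thread despite its task executor: `ThreadPoolTaskExecutor` defaults to a core pool size of 1 and an unbounded queue, and the underlying `java.util.concurrent.ThreadPoolExecutor` only creates threads beyond the core size once the queue is full. Setting only `setMaxPoolSize(60)` therefore never gets past one worker thread. The sketch below illustrates this with the plain JDK executor as a stand-in for the Spring wrapper; `PoolSizingDemo` and `distinctWorkerThreads` are hypothetical names used only for this illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: counts how many distinct worker threads a pool really uses.
class PoolSizingDemo {

    static int distinctWorkerThreads(int corePoolSize, int maxPoolSize, int taskCount) {
        // Unbounded queue, mirroring the default queue capacity of ThreadPoolTaskExecutor.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        Set<String> threadNames = ConcurrentHashMap.newKeySet();

        // Each task records the name of the thread it ran on.
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> threadNames.add(Thread.currentThread().getName()));
        }

        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pool", e);
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        // core = 1, max = 60: extra tasks are queued, so only one thread is ever created.
        System.out.println(distinctWorkerThreads(1, 60, 20)); // prints 1
        // core = 4, max = 4: the first four submissions each start a new worker thread.
        System.out.println(distinctWorkerThreads(4, 4, 20));  // prints 4
    }
}
```

If this applies to your setup, calling `taskExecutor.setCorePoolSize(...)` on the `ThreadPoolTaskExecutor` bean with the number of threads you actually want should let the multi-threaded step use more than one thread. Note that `throttleLimit(2)` on the step additionally caps how many chunks are processed concurrently.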
