I have a job that runs at a specific time every day. It has two steps, and each step makes a REST call in its reader() and processor(). The source data is account numbers stored in a MySQL DB. The Spring Batch job runs fine and we get the expected output, but only on a single thread. I tried to parallelize it by going through the documentation and some examples, and after a while settled on this particular example. Here is my job configuration code in Java:
@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setMaxPoolSize(60);
    taskExecutor.afterPropertiesSet();
    return taskExecutor;
}
@Bean
public Job processJob(BatchListener listener) {
    return jobBuilderFactory.get("Job")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .flow(processStep1()).on("*").to(processStep2())
            .end().build();
}
@Bean
public Step processStep1() {
    return stepBuilderFactory.get("Step1")
            .<response, response>chunk(3)
            .reader(getItemReader())
            .processor(getItemProcess())
            .writer(getItemWriter())
            .taskExecutor(taskExecutor())
            .throttleLimit(2)
            .build();
}
@Bean
public Step processStep2() {
    SimpleStepBuilder<AccountResponse, batch_details> process = stepBuilderFactory.get("processStep2")
            .<AccountResponse, batch_details>chunk(5)
            .reader(getBatchReader())
            .processor(getBatchProcessor());
    return process.writer(getBatchWriter()).build();
}
Even with the task executor configured, this runs on only one thread. Can someone help me see what I've got wrong or what is missing, so that it runs on multiple threads? I want to parallelize step 1 and step 2; data concurrency is not a problem. Once I can parallelize step 1, I can replicate that for step 2. Thanks.
Sample output:
Thread # 37 is doing this task
Hibernate: Select * from batch_details where status != 'complete' and session_id = '' and status != 'in_solve' ORDER BY RAND() LIMIT 3
Hibernate: update batch_details set status = 'in_cs' where account_id= ?
Hibernate: update batch_details set session_completion_time=?, session_id=?, status=? where account_id=?
accountnumber1
Thread # 37 is doing this task
Hibernate: Select * from batch_details where status != 'complete' and session_id = '' and status != 'in_solve' ORDER BY RAND() LIMIT 3
Hibernate: update batch_details set status = 'in_cs' where account_id= ?
accountnumber2
Another issue is that if I change the chunk size, the reader runs once per chunk, but always on the same thread. I can't make sense of this at this point, so if you could also explain why this happens, I'd really appreciate it.
Answer 0 (score: 1)
Your job does not contain any parallel flow. As written, it simply executes step 1 and then step 2, one after the other.
In this question, Hansjoerg Wingeier showed a nice way to execute steps in parallel using a set of helper methods:
// helper method to create a split flow out of a List of steps
private static Flow createParallelFlow(List<Step> steps) {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    // -1 means no concurrency limit at all; steps.size() allows one thread
    // per step (2 in this case); 1 would mean just one thread.
    taskExecutor.setConcurrencyLimit(steps.size());

    // convert each step into its own flow
    List<Flow> flows = steps.stream()
            .map(step -> new FlowBuilder<Flow>("flow_" + step.getName())
                    .start(step)
                    .build())
            .collect(Collectors.toList());

    return new FlowBuilder<SimpleFlow>("parallelStepsFlow")
            .split(taskExecutor)
            .add(flows.toArray(new Flow[flows.size()]))
            .build();
}
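For intuition, the split above essentially fans each step out onto its own thread and waits for all of them. A plain-Java sketch of that idea (illustrative names, not the actual Spring Batch internals):

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelFlowSketch {

    // Submits every step at once and blocks until all finish, which is
    // roughly what split(taskExecutor).add(flows...) arranges for the flows.
    static Set<String> runSteps(List<Runnable> steps) {
        ExecutorService pool = Executors.newFixedThreadPool(steps.size());
        Set<String> threadsUsed = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(steps.size());
        for (Runnable step : steps) {
            pool.execute(() -> {
                threadsUsed.add(Thread.currentThread().getName());
                step.run();
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return threadsUsed;
    }

    public static void main(String[] args) {
        // Two "steps" that each take a moment, like processStep1/processStep2.
        Runnable step1 = () -> sleep(100);
        Runnable step2 = () -> sleep(100);
        Set<String> threads = runSteps(List.of(step1, step2));
        System.out.println("steps ran on " + threads.size() + " threads");
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
```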
Your job would then look like this:
@Bean
public Job myJob() {
    List<Step> steps = new ArrayList<>();
    steps.add(processStep1);
    steps.add(processStep2);
    return jobBuilderFactory.get("yourJobName")
            .start(createParallelFlow(steps))
            .end()
            .build();
}
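On the second question (the reader firing once per chunk): a chunk-oriented step reads chunkSize items per chunk, and the taskExecutor/throttleLimit only control how many of those chunks are processed concurrently, not how the reader splits the data. A plain-Java sketch of that scheduling model (illustrative names and a toy sum as the "processing", not the Spring Batch API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkDemo {

    // Splits items into reader-style chunks: one "read" per chunkSize items.
    static <T> List<List<T>> toChunks(List<T> items, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            chunks.add(items.subList(i, Math.min(i + chunkSize, items.size())));
        }
        return chunks;
    }

    // Processes each chunk as one unit of work on a shared pool; the pool
    // size plays the role of throttleLimit (max chunks in flight at once).
    static int sumInParallel(List<Integer> items, int chunkSize, int throttleLimit) {
        ExecutorService pool = Executors.newFixedThreadPool(throttleLimit);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<Integer> chunk : toChunks(items, chunkSize)) {
                futures.add(pool.submit(
                        () -> chunk.stream().mapToInt(Integer::intValue).sum()));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> accounts = List.of(1, 2, 3, 4, 5, 6, 7);
        // chunk size 3 -> three "reads": [1,2,3], [4,5,6], [7]
        System.out.println(sumInParallel(accounts, 3, 2)); // prints 28
    }
}
```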