I have a Spring Batch application as shown below (table names and queries have been edited to generic names).
When I execute it, it reads only 7,500 events, i.e. three times the chunk size, and fails to read the remaining records from the Oracle database. My table contains 50 million records, which need to be copied to another NoSQL database.
@EnableBatchProcessing
@SpringBootApplication
@EnableAutoConfiguration
public class MultiThreadPagingApp extends DefaultBatchConfigurer {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    public DataSource dataSource;

    @Bean
    public DataSource dataSource() {
        final DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("oracle.jdbc.OracleDriver");
        dataSource.setUrl("jdbc:oracle:thin:@***********");
        dataSource.setUsername("user");
        dataSource.setPassword("password");
        return dataSource;
    }

    @Override
    public void setDataSource(DataSource dataSource) {}

    @Bean
    @StepScope
    ItemReader<UserModel> dbReader() throws Exception {
        JdbcPagingItemReader<UserModel> reader = new JdbcPagingItemReader<UserModel>();
        final SqlPagingQueryProviderFactoryBean sqlPagingQueryProviderFactoryBean = new SqlPagingQueryProviderFactoryBean();
        sqlPagingQueryProviderFactoryBean.setDataSource(dataSource);
        sqlPagingQueryProviderFactoryBean.setSelectClause("select * ");
        sqlPagingQueryProviderFactoryBean.setFromClause("from user");
        sqlPagingQueryProviderFactoryBean.setWhereClause("where id>0");
        sqlPagingQueryProviderFactoryBean.setSortKey("name");
        reader.setQueryProvider(sqlPagingQueryProviderFactoryBean.getObject());
        reader.setDataSource(dataSource);
        reader.setPageSize(2500);
        reader.setRowMapper(new BeanPropertyRowMapper<>(UserModel.class));
        reader.afterPropertiesSet();
        reader.setSaveState(true);
        System.out.println("Reading users anonymized in chunks of " + 2500);
        return reader;
    }

    @Bean
    public Dbwriter writer() {
        return new Dbwriter(); // Dbwriter is defined in another class
    }

    @Bean
    public Step step1() throws Exception {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setCorePoolSize(4);
        taskExecutor.setMaxPoolSize(10);
        taskExecutor.afterPropertiesSet();

        return this.stepBuilderFactory.get("step1")
                .<UserModel, UserModel>chunk(2500)
                .reader(dbReader())
                .writer(writer())
                .taskExecutor(taskExecutor)
                .build();
    }

    @Bean
    public Job multithreadedJob() throws Exception {
        return this.jobBuilderFactory.get("multithreadedJob")
                .start(step1())
                .build();
    }

    @Bean
    public PlatformTransactionManager getTransactionManager() {
        return new ResourcelessTransactionManager();
    }

    @Bean
    public JobRepository getJobRepo() throws Exception {
        return new MapJobRepositoryFactoryBean(getTransactionManager()).getObject();
    }

    public static void main(String[] args) {
        SpringApplication.run(MultiThreadPagingApp.class, args);
    }
}
Could you help me read all the records efficiently using Spring Batch, or suggest any other approach to handle this? I tried the approach described here: http://techdive.in/java/jdbc-handling-huge-resultset — as a single-threaded application it takes 120 minutes to read and save all the records. Since Spring Batch is well suited to this task, I believe it should be possible to do this much faster.
Answer 0 (score: 0)
You are setting the saveState flag to true on the JdbcPagingItemReader (by the way, it should be set before calling afterPropertiesSet), and you are using this reader in a multi-threaded step. However, it is documented that this flag should be set to false in a multi-threaded context.
Multi-threading with a database reader is usually not the best option; I would recommend using partitioning in your case.
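The core of a partitioned setup is splitting the key space into independent ranges, one per worker step. That logic can be sketched in plain Java (the class name RangeSplitter and the example figures are illustrative, not from the question):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the range-splitting logic a Spring Batch Partitioner would use:
// divide the key space [min, max] into gridSize contiguous, inclusive ranges.
public class RangeSplitter {

    // Each element is a {lowInclusive, highInclusive} pair.
    public static List<long[]> split(long min, long max, int gridSize) {
        long targetSize = (max - min) / gridSize + 1;
        List<long[]> ranges = new ArrayList<>();
        for (long start = min; start <= max; start += targetSize) {
            ranges.add(new long[] { start, Math.min(start + targetSize - 1, max) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // e.g. ids 1..50_000_000 split across 10 partitions
        for (long[] r : split(1, 50_000_000, 10)) {
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

In Spring Batch this logic would live in an implementation of the `Partitioner` interface, putting each range's bounds into an `ExecutionContext` (e.g. as `minValue`/`maxValue`). A manager step built with `stepBuilderFactory.get("managerStep").partitioner("workerStep", partitioner).step(workerStep).gridSize(...).taskExecutor(...)` then runs one worker per range, and each step-scoped reader binds its bounds via `@Value("#{stepExecutionContext['minValue']}")` into its where clause, so every thread pages over a disjoint slice of the table.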