Spring Batch JdbcPagingItemReader does not read all records

Asked: 2019-01-04 19:50:35

Tags: spring spring-boot spring-integration spring-batch

I have a Spring Batch application as shown below (table names and queries have been replaced with generic names).

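When I run the program, it reads 7,500 records, which is three times the chunk size, and then fails to read the remaining records from the Oracle database. The table contains 50 million records, which are to be copied into another NoSQL database.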

@EnableBatchProcessing
@SpringBootApplication
@EnableAutoConfiguration
public class MultiThreadPagingApp extends DefaultBatchConfigurer{

@Autowired
private JobBuilderFactory jobBuilderFactory;

@Autowired
private StepBuilderFactory stepBuilderFactory;

@Autowired
public DataSource dataSource;

@Bean
public DataSource dataSource() {
    final DriverManagerDataSource dataSource = new DriverManagerDataSource();
    dataSource.setDriverClassName("oracle.jdbc.OracleDriver");
    dataSource.setUrl("jdbc:oracle:thin:@***********");
    dataSource.setUsername("user");
    dataSource.setPassword("password");

    return dataSource;
}


@Override
public void setDataSource(DataSource dataSource) {}

@Bean
@StepScope
ItemReader<UserModel> dbReader() throws Exception {

    JdbcPagingItemReader<UserModel> reader = new JdbcPagingItemReader<UserModel>();
    final SqlPagingQueryProviderFactoryBean sqlPagingQueryProviderFactoryBean = new SqlPagingQueryProviderFactoryBean();        
    sqlPagingQueryProviderFactoryBean.setDataSource(dataSource);
    sqlPagingQueryProviderFactoryBean.setSelectClause("select * ");
    sqlPagingQueryProviderFactoryBean.setFromClause("from user");
    sqlPagingQueryProviderFactoryBean.setWhereClause("where id>0");
    sqlPagingQueryProviderFactoryBean.setSortKey("name");
    reader.setQueryProvider(sqlPagingQueryProviderFactoryBean.getObject());
    reader.setDataSource(dataSource);
    reader.setPageSize(2500);       
    reader.setRowMapper(new BeanPropertyRowMapper<>(UserModel.class));
    reader.afterPropertiesSet();
    reader.setSaveState(true);
    System.out.println("Reading users anonymized in chunks of {}"+ 2500);
    return reader;
}


@Bean
public Dbwriter writer() {
    return new Dbwriter(); // I had another class for this
}   

@Bean
public Step step1() throws Exception {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4);
    taskExecutor.setMaxPoolSize(10);
    taskExecutor.afterPropertiesSet();

    return this.stepBuilderFactory.get("step1")
            .<UserModel, UserModel>chunk(2500)
            .reader(dbReader())
            .writer(writer())
            .taskExecutor(taskExecutor)
            .build();
}


@Bean
public Job multithreadedJob() throws Exception {
    return this.jobBuilderFactory.get("multithreadedJob")
            .start(step1())
            .build();
} 


@Bean
public PlatformTransactionManager getTransactionManager() {
    return new ResourcelessTransactionManager();
}

@Bean
public JobRepository getJobRepo() throws Exception {
    return new MapJobRepositoryFactoryBean(getTransactionManager()).getObject();
}

public static void main(String[] args) {
    SpringApplication.run(MultiThreadPagingApp.class, args);
}

}

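Could you help me read all the records efficiently with Spring Batch, or suggest any other approach to this problem? I tried one approach described here: http://techdive.in/java/jdbc-handling-huge-resultset. With a single-threaded application it takes 120 minutes to read and save all the records. Since Spring Batch is well suited to this kind of work, I assume the job can be done faster.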

1 Answer:

Answer 0: (score: 0)

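You are setting the saveState flag to true on the JdbcPagingItemReader (by the way, it should be set before calling afterPropertiesSet) and using this reader in a multi-threaded step. However, it is documented that this flag should be set to false in a multi-threaded context.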
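
As a rough illustration of that change, the reader bean from the question could be rearranged as below. This is only a sketch: it reuses the question's `UserModel` class, query clauses and page size unchanged, while the `ReaderConfig` class name and the injected `dataSource` parameter are assumptions made for the example. The only intended differences are that `setSaveState(false)` is used and that it comes before `afterPropertiesSet()`.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

// Hypothetical configuration class; UserModel is the asker's own model class.
@Configuration
public class ReaderConfig {

    @Bean
    @StepScope
    public JdbcPagingItemReader<UserModel> dbReader(DataSource dataSource) throws Exception {
        // Query clauses are copied from the question as-is.
        SqlPagingQueryProviderFactoryBean queryProvider = new SqlPagingQueryProviderFactoryBean();
        queryProvider.setDataSource(dataSource);
        queryProvider.setSelectClause("select * ");
        queryProvider.setFromClause("from user");
        queryProvider.setWhereClause("where id>0");
        queryProvider.setSortKey("name");

        JdbcPagingItemReader<UserModel> reader = new JdbcPagingItemReader<>();
        reader.setQueryProvider(queryProvider.getObject());
        reader.setDataSource(dataSource);
        reader.setPageSize(2500);
        reader.setRowMapper(new BeanPropertyRowMapper<>(UserModel.class));

        // Multi-threaded step: do not save reader state in the execution context.
        reader.setSaveState(false);

        // All properties, including saveState, are set before this call.
        reader.afterPropertiesSet();
        return reader;
    }
}
```

With saveState disabled the step cannot resume from the last read position after a failure, so a restart would begin reading from the start again.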

Using multiple threads with a database reader is usually not the best option; in your case I would recommend partitioning instead.
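
To give an idea of what partitioning means here, the plain-Java sketch below splits an id range into one contiguous sub-range per partition. It is not Spring Batch code: the `IdRangePartitions` class, the `Range` type and the `split` method are hypothetical names, and it assumes the table has a numeric `id` column whose minimum and maximum are known. In a partitioned step, ranges like these would typically be placed into one execution context per partition by a `Partitioner` implementation, and each worker's reader would restrict its where clause to its own range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes the id range each partition would read.
public class IdRangePartitions {

    /** Inclusive id range handled by one partition. */
    public static final class Range {
        public final long min;
        public final long max;

        public Range(long min, long max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return "[" + min + ", " + max + "]";
        }
    }

    /** Splits [minId, maxId] into at most gridSize contiguous, non-overlapping ranges. */
    public static List<Range> split(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("gridSize must be positive and maxId >= minId");
        }
        long total = maxId - minId + 1;
        // Ceiling division so that the ranges cover every id.
        long size = (total + gridSize - 1) / gridSize;

        List<Range> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += size) {
            long end = Math.min(start + size - 1, maxId);
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: ids 1..50,000,000 divided among 4 partitions.
        for (Range range : split(1L, 50_000_000L, 4)) {
            System.out.println(range);
        }
    }
}
```

Each partition then works with its own reader instance on its own slice of the table, so no reader state is shared between threads. If the ids are sparse or unevenly distributed, the partitions will hold unequal numbers of rows, so the split may need adjusting for real data.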