Spring Batch JdbcPagingItemReader does not partition records evenly across threads

Asked: 2016-02-21 10:34:49

Tags: spring spring-batch partitioning

This is my first question. I am working with Spring Batch and using step partitioning to process 70K records. For testing I ran it against 1021 records and found that the partitions are not distributed evenly across threads. I am using a JdbcPagingItemReader with 5 threads. The distribution should be:

Thread 1 - 205

Thread 2 - 205

Thread 3 - 205

Thread 4 - 205

Thread 5 - 201

But unfortunately that is not what happens; instead I get the following distribution of records across the threads:

Thread 1 - 100

Thread 2 - 111

Thread 3 - 100

Thread 4 - 205

Thread 5 - 200

That adds up to only 716 records, so 305 records are skipped during partitioning. I really have no idea what is going on. Could you look at the configuration below and let me know what I am missing? Thanks in advance for your help.

<import resource="../config/batch-context.xml" />
<import resource="../config/database.xml" />

<job id="partitionJob"  xmlns="http://www.springframework.org/schema/batch">

    <step id="masterStep" parent="abstractPartitionerStagedStep">

        <partition step="slave" partitioner="rangePartitioner">
            <handler grid-size="5" task-executor="taskExecutor"/>
        </partition>

    </step>

</job>
<bean id="abstractPartitionerStagedStep" abstract="true">
    <property name="listeners">
        <list>
            <ref bean="updatelistener" />
        </list>
    </property>

</bean>
<bean id="updatelistener" 
      class="com.test.springbatch.model.UpdateFileCopyStatus" >
</bean>
<!-- Jobs to run -->
<step id="slave" xmlns="http://www.springframework.org/schema/batch">
    <tasklet>
        <chunk reader="pagingItemReader" writer="flatFileItemWriter"
            processor="itemProcessor" commit-interval="1" retry-limit="0" skip-limit="100">
        <skippable-exception-classes>
            <include class="java.lang.Exception"/>
        </skippable-exception-classes>
        </chunk>    
    </tasklet>
</step>

<bean id="rangePartitioner" class="com.test.springbatch.partition.RangePartitioner"> 
    <property name="dataSource" ref="dataSource" />
</bean>

<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="5"/>
    <property name="maxPoolSize" value="5"/>
    <property name="queueCapacity" value="100" />
    <property name="allowCoreThreadTimeOut" value="true"/>
    <property name="keepAliveSeconds" value="60" />
</bean>

<bean id="itemProcessor" class="com.test.springbatch.processor.CaseProcessor" scope="step">
    <property name="threadName" value="#{stepExecutionContext[name]}" />
</bean>

<bean id="pagingItemReader"
    class="org.springframework.batch.item.database.JdbcPagingItemReader"
    scope="step">
    <property name="dataSource" ref="dataSource" />
    <property name="queryProvider">
        <bean
            class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
            <property name="dataSource" ref="dataSource" />
            <property name="selectClause" value="SELECT *" />
            <property name="fromClause" value="FROM ( SELECT CASE_NUM ,CASE_STTS_CD, UPDT_TS,SBMT_OFC_CD,
                        SBMT_OFC_NUM,DSTR_CHNL_CD,APRV_OFC_CD,APRV_OFC_NUM,SBMT_TYP_CD, ROW_NUMBER() 
                        OVER(ORDER BY CASE_NUM) AS rownumber FROM TSMCASE WHERE PROC_IND ='N' ) AS data" />
            <property name="whereClause" value="WHERE rownumber BETWEEN :fromRow AND :toRow " />
            <property name="sortKey" value="CASE_NUM" />
        </bean>
    </property>
    <!--  Inject via the ExecutionContext in rangePartitioner  -->
    <property name="parameterValues">
        <map>
            <entry key="fromRow" value="#{stepExecutionContext[fromRow]}" />
            <entry key="toRow" value="#{stepExecutionContext[toRow]}" />
        </map>
    </property>
    <property name="pageSize" value="100" /> 
    <property name="rowMapper">
        <bean class="com.test.springbatch.model.CaseRowMapper" />
    </property>
</bean>

<bean id="flatFileItemWriter" class="com.test.springbatch.writer.FNWriter" scope="step" >
</bean>

Here is the partitioner code:

public class OffRangePartitioner implements Partitioner {

private String officeLst;
private double splitvalue;
private DataSource dataSource;
private static Logger LOGGER = Log4JFactory.getLogger(OffRangePartitioner.class);
private static final int INDENT_LEVEL = 6;

public String getOfficeLst() {
    return officeLst;
}

public void setOfficeLst(final String officeLst) {
    this.officeLst = officeLst;
}

public void setDataSource(DataSource dataSource) {
    this.dataSource = dataSource;
}

public OffRangePartitioner() {
    super();
    final GlobalProperties globalProperties = GlobalProperties.getInstance();
    splitvalue = Double.parseDouble(globalProperties.getProperty("springbatch.part.splitvalue"));
}

@Override
public Map<String, ExecutionContext> partition(int threadSize) {
    FormattedTraceHelper.formattedTrace(LOGGER,"Partition method in OffRangePartitioner class Start",INDENT_LEVEL, Level.INFO_INT);
    final Session currentSession = HibernateUtil.getSessionFactory(HibernateConstants.DB2_DATABASE_NAME).getCurrentSession();

    Query queryObj;
    double count = 0.0;

    final Transaction transaction = currentSession.beginTransaction();
    queryObj = currentSession.createQuery(BatchConstants.PARTITION_CNT_QRY);

    // Hold on to one iterator; calling iterate() twice runs the count query twice.
    final Iterator<?> countIterator = queryObj.iterate();
    if (countIterator.hasNext()) {
        count = Double.parseDouble(countIterator.next().toString());
    }

    int fromRow = 0;
    int toRow = 0;
    ExecutionContext context;

    FormattedTraceHelper.formattedTrace(LOGGER,"Count of total records submitted for processing >> " + count, INDENT_LEVEL, Level.DEBUG_INT);
    int gridSize = (int) Math.ceil(count / splitvalue);
    FormattedTraceHelper.formattedTrace(LOGGER,"Total Grid size based on the count >> " + gridSize, INDENT_LEVEL, Level.DEBUG_INT);
    Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();

    for (int threadCount = 1; threadCount <= gridSize; threadCount++) {
        fromRow = toRow + 1;
        if (threadCount == gridSize || gridSize == 1) {
            toRow = (int) count;
        } else {
            toRow += splitvalue;
        }
        context = new ExecutionContext();
        context.putInt("fromRow", fromRow);
        context.putInt("toRow", toRow);
        context.putString("name", "Processing Thread" + threadCount);
        result.put("partition" + threadCount, context);
        FormattedTraceHelper.formattedTrace(LOGGER, "Partition number >> "
                + threadCount + " from Row#: " + fromRow + " to Row#: "
                + toRow, INDENT_LEVEL, Level.DEBUG_INT);

    }
    if (transaction != null) {
        transaction.commit();
    }
    FormattedTraceHelper.formattedTrace(LOGGER,
            "Partition method in OffRangePartitioner class End",
            INDENT_LEVEL, Level.INFO_INT);
    return result;
}

}
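The range math in the loop above does split the rows evenly, which is why the partitioner itself is not the culprit. A minimal standalone sketch of that loop (assuming count = 1021 and splitvalue = 205, so gridSize = ceil(1021 / 205) = 5, matching the expected 205/205/205/205/201 split):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RangeMathSketch {

    // Returns partition name -> {fromRow, toRow}, mirroring the loop in
    // OffRangePartitioner.partition(): fixed-size slices, with the last
    // partition absorbing the remainder.
    static Map<String, int[]> ranges(int count, int splitValue) {
        int gridSize = (int) Math.ceil((double) count / splitValue);
        Map<String, int[]> result = new LinkedHashMap<>();
        int toRow = 0;
        for (int threadCount = 1; threadCount <= gridSize; threadCount++) {
            int fromRow = toRow + 1;
            if (threadCount == gridSize || gridSize == 1) {
                toRow = count;          // last partition takes the remainder
            } else {
                toRow += splitValue;
            }
            result.put("partition" + threadCount, new int[] { fromRow, toRow });
        }
        return result;
    }

    public static void main(String[] args) {
        ranges(1021, 205).forEach((name, r) ->
                System.out.println(name + " -> rows " + r[0] + ".." + r[1]));
        // partition1 -> rows 1..205
        // partition2 -> rows 206..410
        // partition3 -> rows 411..615
        // partition4 -> rows 616..820
        // partition5 -> rows 821..1021
    }
}
```

So each partition receives a correct, non-overlapping BETWEEN window; the uneven counts must come from what the windows point at, not from how they are cut.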

Today I tested the same batch against 1056 records with Spring Framework debug logging enabled.

PAGE SIZE 100

SELECT * FROM (
        SELECT CASE_NUM, CASE_STTS_CD, UPDT_TS,SBMT_OFC_CD, SBMT_OFC_NUM, DSTR_CHNL_CD, 
            APRV_OFC_CD, APRV_OFC_NUM,SBMT_TYP_CD, ROW_NUMBER() OVER(ORDER BY CASE_NUM) AS rownumber 
        FROM TCASE 
        WHERE SECARCH_PROC_IND = 'P'
    ) AS data 
WHERE 
    rownumber BETWEEN :fromRow AND :toRow 
ORDER BY 
    rownumber ASC 
FETCH FIRST 100 ROWS ONLY

We update the flag SECARCH_PROC_IND = 'P' to 'C' as soon as each record is processed. The main query uses ROW_NUMBER to partition the records that still have SECARCH_PROC_IND = 'P', yet any thread flips that flag to 'C' the moment it finishes a record.

It looks like this is the problem.

1 Answer:

Answer 0 (score: 0)

Spring Batch fires the query below to fetch data from the database:

SELECT * FROM (
        SELECT CASE_NUM, CASE_STTS_CD, UPDT_TS, SBMT_OFC_CD, SBMT_OFC_NUM, DSTR_CHNL_CD,
            APRV_OFC_CD, APRV_OFC_NUM, SBMT_TYP_CD, ROW_NUMBER() OVER(ORDER BY CASE_NUM) AS rownumber
        FROM TCASE
        WHERE SECARCH_PROC_IND = 'P'
    ) AS data
WHERE
    rownumber BETWEEN :fromRow AND :toRow
ORDER BY
    rownumber ASC
FETCH FIRST 100 ROWS ONLY

After each row is processed, the flag SECARCH_PROC_IND = 'P' is updated to SECARCH_PROC_IND = 'C'. Because SECARCH_PROC_IND is used in the WHERE clause, every such update shrinks the result set, which shifts the ROW_NUMBER values that Spring Batch's subsequent page queries see. That is the root cause of the problem.
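The effect can be reproduced without a database. The sketch below (a hypothetical in-memory simulation, not Spring Batch code) models the table as a list of unprocessed rows: ROW_NUMBER is recomputed over the remaining rows before every page fetch, and fetched rows are removed, just as flipping the flag to 'C' removes them from the WHERE clause. A high-window partition then finds nothing once other threads have drained rows ahead of it:

```java
import java.util.ArrayList;
import java.util.List;

public class ShiftingRowNumberDemo {

    // Simulates one partition repeatedly fetching pages of `pageSize` rows
    // within [fromRow, toRow], where ROW_NUMBER (1-based list position) is
    // recomputed over the still-unprocessed rows before each fetch.
    static int fetchAll(List<Integer> pending, int fromRow, int toRow, int pageSize) {
        int processed = 0;
        while (true) {
            // Evaluate the BETWEEN window against the *current* pending list,
            // exactly as ROW_NUMBER() OVER (...) does on each page query.
            List<Integer> page = new ArrayList<>();
            for (int rn = fromRow; rn <= toRow && rn <= pending.size()
                    && page.size() < pageSize; rn++) {
                page.add(pending.get(rn - 1)); // rn is the 1-based ROW_NUMBER
            }
            if (page.isEmpty()) return processed;
            pending.removeAll(page); // flag flips P -> C: rows leave the view
            processed += page.size();
        }
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int id = 1; id <= 1021; id++) table.add(id);
        // Suppose partitions 1-4 have already processed 300 rows by the time
        // partition 5 (window 821..1021) runs its first page query:
        table.subList(0, 300).clear();
        System.out.println(fetchAll(table, 821, 1021, 100)); // prints 0
    }
}
```

Partition 5 processes nothing even though 721 unprocessed rows remain: the rows it was assigned have slid below row number 821, so its BETWEEN window matches an empty range and those records are skipped.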

We introduced another column, SECARCH_PROC_TMP_IND, in the table. In the beforeJob() method, before the batch run, we set it to 'P' for the records to be processed, and we use that column in the WHERE clause of the query instead of the SECARCH_PROC_IND column.

After the batch run, in afterJob(), we reset SECARCH_PROC_TMP_IND back to NULL.
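The two hooks can live in a single job listener. A minimal sketch, assuming a plain JdbcTemplate is acceptable; the listener class name and the exact UPDATE statements are assumptions, only the table and column names come from the post:

```java
import javax.sql.DataSource;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.jdbc.core.JdbcTemplate;

// Hypothetical listener sketching the fix: snapshot the 'P' flag into
// SECARCH_PROC_TMP_IND before the job, clear the snapshot afterwards.
public class ProcIndSnapshotListener extends JobExecutionListenerSupport {

    private final JdbcTemplate jdbcTemplate;

    public ProcIndSnapshotListener(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // Freeze the work set: ROW_NUMBER is computed over this column,
        // and no step updates it, so row numbers stay stable for the run.
        jdbcTemplate.update(
            "UPDATE TCASE SET SECARCH_PROC_TMP_IND = 'P' WHERE SECARCH_PROC_IND = 'P'");
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // Reset the snapshot column once the run is over.
        jdbcTemplate.update(
            "UPDATE TCASE SET SECARCH_PROC_TMP_IND = NULL WHERE SECARCH_PROC_TMP_IND IS NOT NULL");
    }
}
```

Because the steps keep writing to SECARCH_PROC_IND while the paging queries filter on the frozen SECARCH_PROC_TMP_IND, each partition's BETWEEN window keeps pointing at the same rows for the whole run.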

This solved the partitioning problem.