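This is my first question. I'm working with Spring Batch and using step partitioning to process 70K records. For testing I used 1021 records and found that the partitions are not equal across threads. I'm using JdbcPagingItemReader with 5 threads. The distribution should be:
Thread 1 - 205
Thread 2 - 205
Thread 3 - 205
Thread 4 - 205
Thread 5 - 201
Unfortunately that is not what happens. I get the following distribution of records across the threads:
Thread 1 - 100
Thread 2 - 111
Thread 3 - 100
Thread 4 - 205
Thread 5 - 200
So only 716 records in total are processed, and 305 records are skipped during partitioning. I really don't know what is going on. Could you look at the configuration below and let me know what I'm missing? Thanks in advance for your help.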
<import resource="../config/batch-context.xml" />
<import resource="../config/database.xml" />
<job id="partitionJob" xmlns="http://www.springframework.org/schema/batch">
<step id="masterStep" parent="abstractPartitionerStagedStep">
<partition step="slave" partitioner="rangePartitioner">
<handler grid-size="5" task-executor="taskExecutor"/>
</partition>
</step>
</job>
<bean id="abstractPartitionerStagedStep" abstract="true">
<property name="listeners">
<list>
<ref bean="updatelistener" />
</list>
</property>
</bean>
<bean id="updatelistener"
class="com.test.springbatch.model.UpdateFileCopyStatus" >
</bean>
<!-- Jobs to run -->
<step id="slave" xmlns="http://www.springframework.org/schema/batch">
<tasklet>
<chunk reader="pagingItemReader" writer="flatFileItemWriter"
processor="itemProcessor" commit-interval="1" retry-limit="0" skip-limit="100">
<skippable-exception-classes>
<include class="java.lang.Exception"/>
</skippable-exception-classes>
</chunk>
</tasklet>
</step>
<bean id="rangePartitioner" class="com.test.springbatch.partition.RangePartitioner">
<property name="dataSource" ref="dataSource" />
</bean>
<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor" >
<property name="corePoolSize" value="5"/>
<property name="maxPoolSize" value="5"/>
<property name="queueCapacity" value="100" />
<property name="allowCoreThreadTimeOut" value="true"/>
<property name="keepAliveSeconds" value="60" />
</bean>
<bean id="itemProcessor" class="com.test.springbatch.processor.CaseProcessor" scope="step">
<property name="threadName" value="#{stepExecutionContext[name]}" />
</bean>
<bean id="pagingItemReader"
class="org.springframework.batch.item.database.JdbcPagingItemReader"
scope="step">
<property name="dataSource" ref="dataSource" />
<property name="queryProvider">
<bean
class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="selectClause" value="SELECT *" />
<property name="fromClause" value="FROM ( SELECT CASE_NUM ,CASE_STTS_CD, UPDT_TS,SBMT_OFC_CD,
SBMT_OFC_NUM,DSTR_CHNL_CD,APRV_OFC_CD,APRV_OFC_NUM,SBMT_TYP_CD, ROW_NUMBER()
OVER(ORDER BY CASE_NUM) AS rownumber FROM TSMCASE WHERE PROC_IND ='N' ) AS data" />
<property name="whereClause" value="WHERE rownumber BETWEEN :fromRow AND :toRow " />
<property name="sortKey" value="CASE_NUM" />
</bean>
</property>
<!-- Inject via the ExecutionContext in rangePartitioner -->
<property name="parameterValues">
<map>
<entry key="fromRow" value="#{stepExecutionContext[fromRow]}" />
<entry key="toRow" value="#{stepExecutionContext[toRow]}" />
</map>
</property>
<property name="pageSize" value="100" />
<property name="rowMapper">
<bean class="com.test.springbatch.model.CaseRowMapper" />
</property>
</bean>
<bean id="flatFileItemWriter" class="com.test.springbatch.writer.FNWriter" scope="step" >
</bean>
Here is the partitioner code:
public class OffRangePartitioner implements Partitioner {
private String officeLst;
private double splitvalue;
private DataSource dataSource;
private static Logger LOGGER = Log4JFactory.getLogger(OffRangePartitioner.class);
private static final int INDENT_LEVEL = 6;
public String getOfficeLst() {
return officeLst;
}
public void setOfficeLst(final String officeLst) {
this.officeLst = officeLst;
}
public void setDataSource(DataSource dataSource) {
this.dataSource = dataSource;
}
public OffRangePartitioner() {
super();
final GlobalProperties globalProperties = GlobalProperties.getInstance();
splitvalue = Double.parseDouble(globalProperties.getProperty("springbatch.part.splitvalue"));
}
@Override
public Map<String, ExecutionContext> partition(int threadSize) {
FormattedTraceHelper.formattedTrace(LOGGER,"Partition method in OffRangePartitioner class Start",INDENT_LEVEL, Level.INFO_INT);
final Session currentSession = HibernateUtil.getSessionFactory(HibernateConstants.DB2_DATABASE_NAME).getCurrentSession();
Query queryObj;
double count = 0.0;
final Transaction transaction = currentSession.beginTransaction();
queryObj = currentSession.createQuery(BatchConstants.PARTITION_CNT_QRY);
// Run the count query once instead of calling iterate() twice
final Object countResult = queryObj.uniqueResult();
if (countResult != null) {
count = Double.parseDouble(countResult.toString());
}
int fromRow = 0;
int toRow = 0;
ExecutionContext context;
FormattedTraceHelper.formattedTrace(LOGGER,"Count of total records submitted for processing >> " + count, INDENT_LEVEL, Level.DEBUG_INT);
int gridSize = (int) Math.ceil(count / splitvalue);
FormattedTraceHelper.formattedTrace(LOGGER,"Total Grid size based on the count >> " + gridSize, INDENT_LEVEL, Level.DEBUG_INT);
Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();
for (int threadCount = 1; threadCount <= gridSize; threadCount++) {
fromRow = toRow + 1;
if (threadCount == gridSize || gridSize == 1) {
toRow = (int) count;
} else {
toRow += splitvalue;
}
context = new ExecutionContext();
context.putInt("fromRow", fromRow);
context.putInt("toRow", toRow);
context.putString("name", "Processing Thread" + threadCount);
result.put("partition" + threadCount, context);
FormattedTraceHelper.formattedTrace(LOGGER, "Partition number >> "
+ threadCount + " from Row#: " + fromRow + " to Row#: "
+ toRow, INDENT_LEVEL, Level.DEBUG_INT);
}
if (transaction != null) {
transaction.commit();
}
FormattedTraceHelper.formattedTrace(LOGGER,
"Partition method in OffRangePartitioner class End",
INDENT_LEVEL, Level.INFO_INT);
return result;
}
}
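The fromRow/toRow arithmetic in partition() can be checked without a database. The following is a minimal, dependency-free sketch: the class RangeSplitter is hypothetical, and plain int[] pairs stand in for Spring's ExecutionContext. Unlike the code above, it derives the rows per partition from the grid size instead of the springbatch.part.splitvalue property, rounding up so that the last partition takes the remainder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, dependency-free version of the fromRow/toRow arithmetic. */
class RangeSplitter {

    /** Splits rows 1..count into at most gridSize contiguous, inclusive ranges. */
    static Map<String, int[]> split(int count, int gridSize) {
        Map<String, int[]> result = new LinkedHashMap<>();
        if (count <= 0 || gridSize <= 0) {
            return result;
        }
        // Rows per partition, rounded up so that gridSize ranges cover every row
        int chunk = (count + gridSize - 1) / gridSize;
        int toRow = 0;
        for (int p = 1; p <= gridSize && toRow < count; p++) {
            int fromRow = toRow + 1;
            // The last range is clipped to count, so it absorbs whatever is left
            toRow = Math.min(toRow + chunk, count);
            result.put("partition" + p, new int[] {fromRow, toRow});
        }
        return result;
    }

    public static void main(String[] args) {
        split(1021, 5).forEach((name, r) ->
                System.out.println(name + ": rows " + r[0] + "-" + r[1]
                        + " (" + (r[1] - r[0] + 1) + " records)"));
    }
}
```

Running main prints the ranges 1-205, 206-410, 411-615, 616-820 and 821-1021, that is 205/205/205/205/201 records, which is the distribution the question expects. If the partitioner's debug log shows ranges like these, the rows are being lost while reading, not while partitioning.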
Today I tested the same batch with 1056 records, with Spring Framework debug logging enabled:
PAGE SIZE 100
SELECT * FROM (
SELECT CASE_NUM, CASE_STTS_CD, UPDT_TS,SBMT_OFC_CD, SBMT_OFC_NUM, DSTR_CHNL_CD,
APRV_OFC_CD, APRV_OFC_NUM,SBMT_TYP_CD, ROW_NUMBER() OVER(ORDER BY CASE_NUM) AS rownumber
FROM TCASE
WHERE **SECARCH_PROC_IND ='P'**
) AS data
WHERE
rownumber BETWEEN :fromRow AND :toRow
ORDER BY
rownumber ASC
FETCH FIRST 100 ROWS ONLY
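We update the SECARCH_PROC_IND flag from 'P' to 'C' as soon as each record is processed. The main query uses ROW_NUMBER() to partition the records where SECARCH_PROC_IND = 'P', so once any thread updates a record's flag to 'C', that record drops out of the set the row numbers are computed over.
It looks like this is the problem.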
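This can be reproduced without a database. The minimal sketch below (the class RowNumberDrift and its sample data are hypothetical) mimics ROW_NUMBER() OVER (ORDER BY CASE_NUM) evaluated only over the rows whose flag is still 'P'. After three cases are flipped to 'C', the same row numbers refer to different cases, so a partition whose range was fixed up front (for example rows 4 to 6) no longer selects the cases it was meant to.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory model of ROW_NUMBER() OVER (ORDER BY CASE_NUM) ... WHERE flag = 'P'. */
class RowNumberDrift {

    /** Returns rowNumber -> caseNum for the records whose flag is 'P', numbered in caseNum order. */
    static Map<Integer, Integer> numberPending(int[] caseNums, char[] flags) {
        Map<Integer, Integer> numbered = new LinkedHashMap<>();
        int rowNumber = 0;
        for (int i = 0; i < caseNums.length; i++) { // caseNums is assumed to be sorted already
            if (flags[i] == 'P') {
                numbered.put(++rowNumber, caseNums[i]);
            }
        }
        return numbered;
    }

    public static void main(String[] args) {
        int[] caseNums = {101, 102, 103, 104, 105, 106};
        char[] flags = {'P', 'P', 'P', 'P', 'P', 'P'};
        System.out.println("before: " + numberPending(caseNums, flags));
        // Another thread finishes the first three cases and flips them to 'C'
        flags[0] = flags[1] = flags[2] = 'C';
        System.out.println("after:  " + numberPending(caseNums, flags));
    }
}
```

It prints `before: {1=101, 2=102, 3=103, 4=104, 5=105, 6=106}` and then `after:  {1=104, 2=105, 3=106}`: case 104 moves from row 4 to row 1, and rows 4 to 6 are now empty.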
Answer:
Spring Batch fires the following query to fetch the data from the database:
SELECT * FROM ( SELECT CASE_NUM, CASE_STTS_CD, UPDT_TS,SBMT_OFC_CD, SBMT_OFC_NUM, DSTR_CHNL_CD, APRV_OFC_CD, APRV_OFC_NUM,SBMT_TYP_CD, **ROW_NUMBER()** OVER(ORDER BY CASE_NUM) AS rownumber FROM TCASE WHERE **SECARCH_PROC_IND ='P'** ) AS data WHERE rownumber BETWEEN :fromRow AND :toRow ORDER BY rownumber ASC FETCH FIRST 100 ROWS ONLY
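After each row is processed, the flag SECARCH_PROC_IND = 'P' is updated to SECARCH_PROC_IND = 'C'. Because SECARCH_PROC_IND is used in the WHERE clause, this shifts the ROW_NUMBER values of the remaining rows downward in the next query Spring Batch fires. That is the root cause of the problem.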
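The effect on a single partition can be modelled in memory. The sketch below is a hypothetical, single-threaded model (class DriftingPartitionReader; a char[] stands in for SECARCH_PROC_IND, and case numbers are simply 1..n). It assumes the reader pages by keyset on rownumber, as the ORDER BY rownumber in the logged query suggests: the first page takes the first pageSize rows in the partition's range, each later page additionally requires rownumber to be greater than the last value read, and a short page ends the read. It also assumes every record in a page has been flagged 'C' before the next page is queried.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical single-threaded model: one partition read in keyset pages, where
 * ROW_NUMBER() is recomputed on every page over the rows still flagged 'P'.
 */
class DriftingPartitionReader {

    static List<Integer> readPartition(char[] flags, int fromRow, int toRow, int pageSize) {
        List<Integer> processed = new ArrayList<>();
        int lastRowNumber = 0; // no lower bound for the first page
        while (true) {
            int startAfter = lastRowNumber;
            List<Integer> page = new ArrayList<>();
            int rowNumber = 0;
            // Evaluate the paging query against the table as it looks right now
            for (int i = 0; i < flags.length && page.size() < pageSize; i++) {
                if (flags[i] != 'P') {
                    continue; // WHERE SECARCH_PROC_IND = 'P'
                }
                rowNumber++; // ROW_NUMBER() OVER (ORDER BY CASE_NUM); case number = i + 1
                if (rowNumber >= fromRow && rowNumber <= toRow && rowNumber > startAfter) {
                    page.add(i);
                    lastRowNumber = rowNumber;
                }
            }
            // Processing flips each flag to 'C', which renumbers every 'P' row after it
            for (int i : page) {
                processed.add(i + 1);
                flags[i] = 'C';
            }
            if (page.size() < pageSize) {
                return processed; // a short (or empty) page ends the read
            }
        }
    }

    public static void main(String[] args) {
        char[] flags = new char[1021];
        Arrays.fill(flags, 'P');
        List<Integer> read = readPartition(flags, 1, 205, 100);
        System.out.println(read.size() + " cases read");
        System.out.println("1st: " + read.get(0) + ", 101st: " + read.get(100)
                + ", last: " + read.get(read.size() - 1));
        System.out.println("case 150 still flagged: " + flags[149]);
    }
}
```

For the partition covering rows 1-205 with a page size of 100, the model reads 205 cases, but they are cases 1-100, 201-300 and 401-405: cases 101-200 are never read by this partition, and 401-405 belong to another partition's range. With five threads flipping flags concurrently the shift also depends on timing, which fits the uneven counts in the question; the model does not try to reproduce those exact numbers.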
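We introduced another column in the table, SECARCH_PROC_TMP_IND, which we set to 'P' before batch processing, in the beforeJob() method. The query's WHERE clause now uses this column instead of SECARCH_PROC_IND.
After batch processing, in afterJob(), we reset SECARCH_PROC_TMP_IND to NULL.
This solved the partitioning problem.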
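Why the fix works can be checked with an in-memory model as well. In this hypothetical sketch (class StableFlagPartitions; a String[] stands in for SECARCH_PROC_TMP_IND with null meaning NULL, and a char[] for SECARCH_PROC_IND), row numbers are computed only over the temporary column, which is set before the job and not touched while records are processed. Pages are assumed to be read by keyset on rownumber, and the partitions run one after another for simplicity; because the numbering no longer depends on the flag being updated, running them concurrently would not change which rows each partition selects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the fix: row numbers come from a column that processing never updates. */
class StableFlagPartitions {

    /** Reads one partition in keyset pages; rows are numbered over tmpInd = "P" only. */
    static List<Integer> readPartition(String[] tmpInd, char[] procInd,
                                       int fromRow, int toRow, int pageSize) {
        List<Integer> processed = new ArrayList<>();
        int lastRowNumber = 0; // no lower bound for the first page
        while (true) {
            int startAfter = lastRowNumber;
            List<Integer> page = new ArrayList<>();
            int rowNumber = 0;
            for (int i = 0; i < tmpInd.length && page.size() < pageSize; i++) {
                if (!"P".equals(tmpInd[i])) {
                    continue; // WHERE SECARCH_PROC_TMP_IND = 'P'
                }
                rowNumber++; // ROW_NUMBER() OVER (ORDER BY CASE_NUM); case number = i + 1
                if (rowNumber >= fromRow && rowNumber <= toRow && rowNumber > startAfter) {
                    page.add(i);
                    lastRowNumber = rowNumber;
                }
            }
            for (int i : page) {
                processed.add(i + 1);
                procInd[i] = 'C'; // the real flag is still updated per record
            }
            if (page.size() < pageSize) {
                return processed; // a short (or empty) page ends the read
            }
        }
    }

    public static void main(String[] args) {
        int total = 1021;
        char[] procInd = new char[total];
        Arrays.fill(procInd, 'P');
        String[] tmpInd = new String[total];
        Arrays.fill(tmpInd, "P"); // beforeJob(): mark the rows this run will process

        int[][] ranges = {{1, 205}, {206, 410}, {411, 615}, {616, 820}, {821, 1021}};
        int sum = 0;
        for (int[] r : ranges) {
            int read = readPartition(tmpInd, procInd, r[0], r[1], 100).size();
            System.out.println("rows " + r[0] + "-" + r[1] + ": " + read + " read");
            sum += read;
        }
        Arrays.fill(tmpInd, null); // afterJob(): reset the temporary column to NULL
        System.out.println("total read: " + sum);
    }
}
```

With 1021 cases and the ranges 1-205 through 821-1021, the partitions read 205, 205, 205, 205 and 201 cases, 1021 in total, and every case ends up flagged 'C'.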