在Spring Batch中我试图读取一个CSV文件,并希望将每一行分配给一个单独的线程并进行处理。我试图通过使用TaskExecutor来实现它,但是所有线程正在发生的事情是一次选择同一行。我也尝试使用Partioner实现这个概念,也有同样的事情发生。请参阅下面的我的配置Xml。
步骤说明
<step id="Step2">
<tasklet task-executor="taskExecutor">
<chunk reader="reader" processor="processor" writer="writer" commit-interval="1" skip-limit="1">
</chunk>
</tasklet>
</step>
<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="resource" value="file:cvs/user.csv" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<!-- split it -->
<property name="lineTokenizer">
<bean
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names" value="userid,customerId,ssoId,flag1,flag2" />
</bean>
</property>
<property name="fieldSetMapper">
<!-- map to an object -->
<bean
class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="prototypeBeanName" value="user" />
</bean>
</property>
</bean>
</property>
</bean>
<bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor">
<property name="concurrencyLimit" value="4"/>
我尝试过不同类型的任务执行器,但所有这些都以相同的方式运行。如何将每一行分配给一个单独的线程?
答案 0 :(得分:6)
FlatFileItemReader不是线程安全的。在您的示例中,您可以尝试将CSV文件拆分为较小的CSV文件,然后使用MultiResourcePartitioner处理其中的每个文件。这可以分两步完成,一个用于拆分原始文件(如10个较小的文件),另一个用于处理拆分文件。这样您就不会有任何问题,因为每个文件将由一个线程处理。
示例:
<batch:job id="csvsplitandprocess">
<batch:step id="step1" next="step2master">
<batch:tasklet>
<batch:chunk reader="largecsvreader" writer="csvwriter" commit-interval="500">
</batch:chunk>
</batch:tasklet>
</batch:step>
<batch:step id="step2master">
<partition step="step2" partitioner="partitioner">
<handler grid-size="10" task-executor="taskExecutor"/>
</partition>
</batch:step>
</batch:job>
<batch:step id="step2">
<batch:tasklet>
<batch:chunk reader="smallcsvreader" writer="writer" commit-interval="100">
</batch:chunk>
</batch:tasklet>
</batch:step>
<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="10" />
<property name="maxPoolSize" value="10" />
</bean>
<bean id="partitioner"
class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
<property name="resources" value="file:cvs/extracted/*.csv" />
</bean>
替代分区的替代方案可能是自定义线程安全的读者,他将为每一行创建一个线程,但可能分区是您的最佳选择
答案 1 :(得分:1)
你的问题是你的读者不在范围内。
这意味着:所有线程共享相同的输入流(资源文件)。
要为每个线程处理一行,您需要:
我写了一些代码,这是输出:
com.test.partitioner.RangePartitioner
类代码:
public Map<String, ExecutionContext> partition() {
Map < String, ExecutionContext > result = new HashMap < String, ExecutionContext >();
int range = 1;
int fromId = 1;
int toId = range;
for (int i = 1; i <= gridSize; i++) {
ExecutionContext value = new ExecutionContext();
log.debug("\nStarting : Thread" + i);
log.debug("fromId : " + fromId);
log.debug("toId : " + toId);
value.putInt("fromId", fromId);
value.putInt("toId", toId);
// give each thread a name, thread 1,2,3
value.putString("name", "Thread" + i);
result.put("partition" + i, value);
fromId = toId + 1;
toId += range;
}
return result;
}
- &GT;查看outPut控制台
开始:Thread1 fromId:1 toId:1
开始:Thread2 fromId:2 toId:2
开始:Thread3 fromId:3 toId:3
开始:Thread4 fromId:4 toId:4
开始:Thread5 fromId:5 toId:5
开始:Thread6 fromId:6 toId:6
开始:Thread7 fromId:7 toId:7
开始:Thread8 fromId:8 toId:8
开始:Thread9 fromId:9 toId:9
开始:Thread10 fromId:10 toId:10
请看下面的配置:
http://www.springframework.org/schema/batch/spring-batch-2.2.xsd http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.2.xsd“&GT;
<import resource="../config/context.xml" />
<import resource="../config/database.xml" />
<bean id="mouvement" class="com.test.model.Mouvement" scope="prototype" />
<bean id="itemProcessor" class="com.test.processor.CustomItemProcessor" scope="step">
<property name="threadName" value="#{stepExecutionContext[name]}" />
</bean>
<bean id="xmlItemWriter" class="com.test.writer.ItemWriter" />
<batch:job id="mouvementImport" xmlns:batch="http://www.springframework.org/schema/batch">
<batch:listeners>
<batch:listener ref="myAppJobExecutionListener" />
</batch:listeners>
<batch:step id="masterStep">
<batch:partition step="slave" partitioner="rangePartitioner">
<batch:handler grid-size="10" task-executor="taskExecutor" />
</batch:partition>
</batch:step>
</batch:job>
<bean id="rangePartitioner" class="com.test.partitioner.RangePartitioner" />
<bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor" />
<batch:step id="slave">
<batch:tasklet>
<batch:listeners>
<batch:listener ref="stepExecutionListener" />
</batch:listeners>
<batch:chunk reader="mouvementReader" writer="xmlItemWriter" processor="itemProcessor" commit-interval="1">
</batch:chunk>
</batch:tasklet>
</batch:step>
<bean id="stepExecutionListener" class="com.test.listener.step.StepExecutionListenerCtxInjecter" scope="step" />
<bean id="myAppJobExecutionListener" class="com.test.listener.job.MyAppJobExecutionListener" />
<bean id="mouvementReaderParent" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
<property name="resource" value="classpath:XXXXX/XXXXXXXX.csv" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="delimiter" value="|" />
<property name="names"
value="id,numen,prenom,grade,anneeScolaire,academieOrigin,academieArrivee,codeUsi,specialiteEmploiType,natureSupport,dateEffet,modaliteAffectation" />
</bean>
</property>
<property name="fieldSetMapper">
<bean class="com.test.mapper.MouvementFieldSetMapper" />
</property>
</bean>
</property>
</bean>
<!-- <bean id="itemReader" scope="step" autowire-candidate="false" parent="mouvementReaderParent">-->
<!-- <property name="resource" value="#{stepExecutionContext[fileName]}" />-->
<!-- </bean>-->
<bean id="mouvementReader" class="com.test.reader.MouvementItemReader" scope="step">
<property name="delegate" ref="mouvementReaderParent" />
<property name="parameterValues">
<map>
<entry key="fromId" value="#{stepExecutionContext[fromId]}" />
<entry key="toId" value="#{stepExecutionContext[toId]}" />
</map>
</property>
</bean>
<!-- <bean id="xmlItemWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter">-->
<!-- <property name="resource" value="file:xml/outputs/Mouvements.xml" />-->
<!-- <property name="marshaller" ref="reportMarshaller" />-->
<!-- <property name="rootTagName" value="Mouvement" />-->
<!-- </bean>-->
<bean id="reportMarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
<property name="classesToBeBound">
<list>
<value>com.test.model.Mouvement</value>
</list>
</property>
</bean>
TODO:将我的读者更改为其他与位置(开始和结束位置)一样的读者,就像在java中使用Scanner Class一样。
希望得到这个帮助。
答案 2 :(得分:0)
您可以将输入文件拆分为多个文件,使用Partitionner并使用线程加载小文件,但出错时,必须在清除DB后重新启动所有作业。
<batch:job id="transformJob">
<batch:step id="deleteDir" next="cleanDB">
<batch:tasklet ref="fileDeletingTasklet" />
</batch:step>
<batch:step id="cleanDB" next="split">
<batch:tasklet ref="countThreadTasklet" />
</batch:step>
<batch:step id="split" next="partitionerMasterImporter">
<batch:tasklet>
<batch:chunk reader="largeCSVReader" writer="smallCSVWriter" commit-interval="#{jobExecutionContext['chunk.count']}" />
</batch:tasklet>
</batch:step>
<batch:step id="partitionerMasterImporter" next="partitionerMasterExporter">
<partition step="importChunked" partitioner="filePartitioner">
<handler grid-size="10" task-executor="taskExecutor" />
</partition>
</batch:step>
完整example code working (on Github)
希望得到这个帮助。