Question

I have a CSV file (or a file delimited with '|') that contains 2 million records.

This is the job I want:

Job
Step 1: Read records from line number 'r1' to 'r2' from the file (of 2 million records) and transform them into a data transfer object (I want this DTO to be available in subsequent steps; please suggest a method for this too.)
Step 2: Perform some calculations on it.
Step 3: Perform some transformations on it (or rather create another DTO and use it in the subsequent steps)
Step 4: After the processing is done, I would like to commit the DTO or a set of DTOs using transaction-level DB persistence.
Step 5 (RepeatStep): Repeat this job for some other 'r3', 'r4' (record line numbers) and stop this job when there is no other r3, r4 present.

Now I want this job to be executed in multiple threads:

Job
--> Thread1 - Step1>Step2>Step3>Step4>....>StepN>RepeatStep
--> Thread2 - Step1>Step2>Step3>Step4>....>StepN>RepeatStep
--> Thread3 - Step1>Step2>Step3>Step4>....>StepN>RepeatStep
--> ....
--> ThreadM - Step1>Step2>Step3>Step4>....>StepN>RepeatStep

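I want to keep '(r2 - r1) < 1000' and 'no. of threads, M < 5', which is why the last step is a repeat: I want to process all 2 million records.

I also want these threads to keep repeating until all records in the file have been processed.

Should there be a separate class that calculates the values of r1, r2, r3, r4, ... and supplies them to this job, or does Batch do that itself?

I know this is vague, but I would appreciate some pointers or sample code to study. I have bits and pieces of code, but I am unable to merge them.

Can someone please help?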

Answer 1

What you want is a chunk-based reader-processor-writer combination. You can use an executor to introduce concurrency.
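To make the idea concrete, here is a minimal sketch of chunked read-process-write on a fixed thread pool. It uses only the Java standard library, not any batch framework's API, and every name in it (ChunkedPipelineSketch, RecordDto, readChunk, process, run) is made up for illustration. The "write" stage is only hinted at in a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Plain-Java illustration only; none of these names come from a batch framework.
class ChunkedPipelineSketch {

    // Hypothetical DTO produced by the "read" stage.
    static final class RecordDto {
        final int lineNumber;
        final String[] fields;

        RecordDto(int lineNumber, String[] fields) {
            this.lineNumber = lineNumber;
            this.fields = fields;
        }
    }

    // Reader stage: turns lines [from, to) into DTOs (line numbers are 1-based).
    static List<RecordDto> readChunk(List<String> lines, int from, int to) {
        List<RecordDto> chunk = new ArrayList<>();
        for (int i = from; i < to; i++) {
            chunk.add(new RecordDto(i + 1, lines.get(i).split("\\|")));
        }
        return chunk;
    }

    // Processor stage: stand-in for the calculation and transformation steps.
    static String process(RecordDto dto) {
        return dto.lineNumber + ":" + String.join(",", dto.fields).toUpperCase();
    }

    // Splits the input into chunks and runs read + process for each chunk on the pool.
    static List<String> run(List<String> lines, int chunkSize, int threads) {
        if (chunkSize < 1 || threads < 1) {
            throw new IllegalArgumentException("chunkSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<List<String>>> pending = new ArrayList<>();
            for (int from = 0; from < lines.size(); from += chunkSize) {
                final int start = from;
                final int end = Math.min(from + chunkSize, lines.size());
                pending.add(CompletableFuture.supplyAsync(() -> {
                    List<String> out = new ArrayList<>();
                    for (RecordDto dto : readChunk(lines, start, end)) {
                        out.add(process(dto));
                    }
                    // Writer stage would go here: persist "out" in one transaction per chunk.
                    return out;
                }, pool));
            }
            // Collect results in submission order so the output order is deterministic.
            List<String> written = new ArrayList<>();
            for (CompletableFuture<List<String>> f : pending) {
                written.addAll(f.join());
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a|b", "c|d", "e|f", "g|h", "i|j");
        System.out.println(run(lines, 2, 4));
    }
}
```

With your numbers you would call something like run(lines, 1000, 4). The pool size caps the number of threads, and each chunk is an independent unit of work, so there is no need for an explicit repeat step.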

For custom reading requirements, such as reading r1 or r3 but skipping r2 or r4, you would need to implement your own reader.
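As a rough idea of what such a reader could look like, here is a plain-Java sketch that returns only the lines between r1 and r2 (1-based, inclusive), which is how I read the requirement in the question. LineRangeReader and readRange are hypothetical names, not classes from any framework. Returning null once the range is exhausted is the usual convention for item-style readers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration; LineRangeReader is a hypothetical name.
class LineRangeReader implements AutoCloseable {
    private final BufferedReader in;
    private final int lastLine;   // r2, inclusive, 1-based
    private int consumed = 0;     // number of lines consumed from the source so far

    LineRangeReader(Reader source, int firstLine, int lastLine) {
        if (firstLine < 1 || lastLine < firstLine) {
            throw new IllegalArgumentException("need 1 <= firstLine <= lastLine");
        }
        this.in = new BufferedReader(source);
        this.lastLine = lastLine;
        try {
            // Skip everything before r1; stops early if the source is shorter.
            while (consumed < firstLine - 1 && in.readLine() != null) {
                consumed++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the next line in the range, or null when the range or the source is exhausted.
    String read() {
        if (consumed >= lastLine) {
            return null;
        }
        try {
            String line = in.readLine();
            if (line != null) {
                consumed++;
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience helper for demos: reads lines r1..r2 of an in-memory text.
    static List<String> readRange(String text, int firstLine, int lastLine) {
        List<String> out = new ArrayList<>();
        try (LineRangeReader reader = new LineRangeReader(new StringReader(text), firstLine, lastLine)) {
            for (String line = reader.read(); line != null; line = reader.read()) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readRange("a|1\nb|2\nc|3\nd|4\ne|5", 2, 4));
    }
}
```

Note that this sketch rescans the file from the top for every range, which gets expensive on a 2-million-line file. If that matters, you could have one reader hand out consecutive chunks instead, or record byte offsets for each range up front.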

Best approach for parallel threaded steps and a subsequent multi-step job, with repetition, in Spring Batch
