Question

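I have a large number of reports loaded in a chunk partition step. Each report is processed further to generate an individual report. However, if I load 50k reports in the partition step, it overloads the server and everything slows down. Instead, I would prefer the partition step to load a list of 3k reports, process it, then load another 3k reports, and keep doing that until all 50k reports have been processed.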

    <step id="genReport" next="fileTransfer">
        <chunk item-count="1000">
            <reader ref="Reader">
            </reader>
            <writer ref="Writer">
            </writer>
        </chunk>
        <partition>
            <mapper ref="Mapper">
                <properties>
                    <property name="threadCount" value="#{jobProperties['threadCount']}"/>
                    <property name="threadNumber" value="#{partitionPlan['threadNumber']}"/>
                </properties>
            </mapper>
        </partition>
    </step>

    public PartitionPlan mapPartitions() {
        PartitionPlanImpl partitionPlan = new PartitionPlanImpl();
        int numberOfPartitions = // DAO call to load the report count
        partitionPlan.setThreads(getThreadCount());
        // numberOfPartitions comes from the database and is large, around 20k to 40k
        partitionPlan.setPartitions(numberOfPartitions);
        Properties[] props = new Properties[numberOfPartitions];

        // One Properties object per partition, each carrying a single SQL ID
        for (int idx = 0; idx < numberOfPartitions; idx++) {
            Properties threadProperties = new Properties();
            threadProperties.setProperty("threadNumber", String.valueOf(idx));
            // Data pulled from a PriorityBlockingQueue
            GAHReportListData gahRptListData = gahReportListManager.getPageToProcess();
            String dynSqlId = gahRptListData.getDynSqlId();

            threadProperties.setProperty("sqlId", dynSqlId);
            threadProperties.setProperty("outFile", fileName);

            props[idx] = threadProperties;
        }
        partitionPlan.setPartitionProperties(props);
        return partitionPlan;
    }

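Once the 3k reports handed out by the partition mapper have been processed, it should check for the next available list. If one is available, the partitions should be reset with the next set of 3k reports.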

Answer 1

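There is no way to reset the partitions. Once all the partitions defined by the partitionMapper have completed, the step ends. I suppose you could add a second partitioned step just like the first one (and a third, and a fourth) until you have worked through all the reports, but that is messy. You also cannot loop back in the JSL and re-execute the same step.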

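You could use a split/flow to run several steps concurrently, but you cannot set the number of flows dynamically, because it is fixed in the JSL. You would also end up with more concurrency that way than your environment can probably handle.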

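I am assuming your chunk reader/processor/writer currently iterate through the results of the one SQL ID assigned to the partition. To give a partition a list of SQL IDs instead, you would need a way to tell when one ends and the next one starts within the same chunk loop. The reader could probably manage the list and know when the transition takes place. You would probably need to signal the writer that an end-of-chunk is also the end of one report and that it should move on to the next one. You would probably want a custom checkpoint algorithm for that, so you can be sure a checkpoint is taken at the end of a report, rather than hoping the item-count checkpoint lands exactly where each SQL ID runs out of records to process.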
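To make that idea a bit more concrete, here is a minimal sketch in plain Java, not a tested JSR-352 implementation. Everything in it is an assumption for illustration: the class names `SqlIdPartitioner` and `MultiReportCursor`, the `sqlIds` property name, and the `loader` function that stands in for whatever query your reader runs for one SQL ID. The mapper side spreads all SQL IDs over a small, fixed number of partitions, and the reader side walks its list and flags the last row of each report.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;
import java.util.function.Function;

/** Hypothetical helper for the mapper: spreads SQL IDs over a fixed number of partitions. */
final class SqlIdPartitioner {
    static Properties[] plan(List<String> sqlIds, int partitions) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            buckets.add(new ArrayList<>());
        }
        // Round-robin so every partition gets roughly the same number of reports.
        for (int i = 0; i < sqlIds.size(); i++) {
            buckets.get(i % partitions).add(sqlIds.get(i));
        }
        Properties[] props = new Properties[partitions];
        for (int i = 0; i < partitions; i++) {
            Properties p = new Properties();
            p.setProperty("threadNumber", String.valueOf(i));
            // "sqlIds" is an assumed property name holding a comma-separated list.
            p.setProperty("sqlIds", String.join(",", buckets.get(i)));
            props[i] = p;
        }
        return props;
    }
}

/** Hypothetical cursor a reader could delegate to; rows are Strings only for brevity. */
final class MultiReportCursor {
    private final Iterator<String> sqlIds;
    private final Function<String, Iterator<String>> loader; // stand-in for the real query
    private Iterator<String> rows;
    private String currentSqlId;
    private boolean endOfReport;

    MultiReportCursor(List<String> sqlIds, Function<String, Iterator<String>> loader) {
        this.sqlIds = sqlIds.iterator();
        this.loader = loader;
    }

    /** Returns the next row, or null once every report is exhausted. */
    String next() {
        endOfReport = false;
        // Advance to the next SQL ID that still has rows; empty reports are skipped silently.
        while (rows == null || !rows.hasNext()) {
            if (!sqlIds.hasNext()) {
                return null;
            }
            currentSqlId = sqlIds.next();
            rows = loader.apply(currentSqlId);
        }
        String row = rows.next();
        endOfReport = !rows.hasNext(); // true when this row was the last one of its report
        return row;
    }

    /** True if the row most recently returned by next() closed its report. */
    boolean isEndOfReport() {
        return endOfReport;
    }

    String currentSqlId() {
        return currentSqlId;
    }
}
```

In a real job, `readItem()` would delegate to `next()` (returning null is what ends the partition), and a custom `CheckpointAlgorithm` could return `isEndOfReport()` from `isReadyToCheckpoint()`, provided the reader and the algorithm share the cursor state somehow, for example through the step context. With that in place, `setPartitions` takes a small number such as the thread count rather than 20k to 40k.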

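I am posting this as an answer rather than another comment because it seems the answer to the question as asked is "no". The rest is just discussion of possible alternatives.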
