Question

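I am trying to develop a batch process using Spring Batch + Spring Boot (Java config), but I have run into a problem. I have a piece of software that has a database and a Java API, and I read records from there. The batch process should retrieve all documents whose expiry date is earlier than a certain date, update the date, and save them back to the same database.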

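My first approach was to read the records 100 at a time: the ItemReader retrieves 100 records, they are processed one by one, and finally they are written back. In the reader I put this code: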

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemReader;

public class DocumentItemReader implements ItemReader<Document> {

    // Buffer of documents that were fetched but not yet handed to the step.
    public List<Document> documents = new ArrayList<>();

    @Override
    public Document read() throws Exception {

        if (documents.isEmpty()) {
            getDocuments(); // Retrieves up to 100 documents and stores them in the "documents" list (method not shown).
            if (documents.isEmpty()) {
                return null; // Nothing left to read: this ends the step.
            }
        }

        // Hand out the buffered documents one at a time.
        return documents.remove(0);
    }
}

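With this code, the reader keeps reading from the database until no more records are found. When the getDocuments() method retrieves no documents, the list is empty and the reader returns null (so the Job finishes). Everything works fine here.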

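However, the problem appears when I want to use several threads. In that case I started using the Partitioner approach instead of a multi-threaded step. The reason is that I read from the same database, so if I repeat the whole step with several threads, all of them will find the same records, and I cannot use pagination (see below).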

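Another problem is that the database records are updated dynamically, so I cannot use pagination. For example, suppose I have 200 records and all of them are about to expire, so the process is going to retrieve them. Now imagine that I retrieve 10 with one thread and, before anything else happens, that thread processes one of them and updates it in the same database. The next thread cannot simply retrieve records 11 to 20, because the first record no longer appears in the search: it has been processed, its date has been updated, and so it no longer matches the query.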
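
The pagination problem described above can be reproduced without a database. The following is a minimal, self-contained sketch in plain Java (no Spring Batch). The `Doc` class, the `query` method and the in-memory list are illustrative stand-ins and not part of the original project. It simulates a query that matches only unprocessed documents, then compares advancing an offset with always re-reading the first page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class PagingDriftDemo {

    /** Illustrative stand-in for a document row; "expired" means it still matches the query. */
    static final class Doc {
        final int id;
        boolean expired = true;
        Doc(int id) { this.id = id; }
    }

    /** Builds an in-memory "table" in which every document is expired. */
    static List<Doc> newTable(int size) {
        List<Doc> table = new ArrayList<>();
        for (int i = 1; i <= size; i++) {
            table.add(new Doc(i));
        }
        return table;
    }

    /** Simulates: SELECT ... WHERE expired ORDER BY id LIMIT limit OFFSET offset. */
    static List<Doc> query(List<Doc> table, int offset, int limit) {
        return table.stream()
                .filter(d -> d.expired)
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    /** Classic pagination: the offset grows after every page. Returns how many documents were processed. */
    static int processWithOffset(List<Doc> table, int pageSize) {
        int processed = 0;
        int offset = 0;
        List<Doc> page;
        while (!(page = query(table, offset, pageSize)).isEmpty()) {
            for (Doc d : page) {
                d.expired = false; // "update the date": the row no longer matches the query
                processed++;
            }
            offset += pageSize;
        }
        return processed;
    }

    /** Always asks for the first page again until the query returns nothing. */
    static int processFirstPageOnly(List<Doc> table, int pageSize) {
        int processed = 0;
        List<Doc> page;
        while (!(page = query(table, 0, pageSize)).isEmpty()) {
            for (Doc d : page) {
                d.expired = false;
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println("offset paging:   " + processWithOffset(newTable(200), 10));
        System.out.println("first page only: " + processFirstPageOnly(newTable(200), 10));
    }
}
```

With 200 matching documents and pages of 10, the offset-based loop processes only 100 of them: every processed page drops out of the result set, so each new offset skips 10 rows that were never read. Re-reading the first page processes all 200, which is what the single-threaded reader above effectively does.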

It is a bit hard to explain and some of it may sound odd, but in my project:

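I am forced to use the same database for reading and writing.
I can have millions of documents, so I cannot read all the records at once. I need to read them 100 at a time or 500 at a time.
I need to use several threads.
I cannot use pagination, because the query against the database retrieves different documents each time it is executed.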

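So, after hours of thinking, I believe the only possible solution is to repeat the job until the query retrieves no documents. Is this possible? I want something like what the step does (do something until null is returned), but at the job level: repeat the job until the query returns zero records.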
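
The "repeat until the query returns nothing" idea can be sketched outside Spring Batch as a plain loop around a thread pool. This is only an illustration under assumptions: `RepeatUntilEmpty`, `fetchBatch` and `process` are hypothetical names, and the sketch does not show how to express the same loop with Spring Batch's own job or flow configuration. Each round fetches one batch with the unchanged query, processes its items in parallel, and waits for all of them before querying again, so that no item is fetched twice while it is still in flight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Supplier;

class RepeatUntilEmpty {

    /**
     * Repeats "fetch a batch, process it in parallel" until fetchBatch returns an empty list.
     *
     * @param fetchBatch hypothetical query returning the next batch of still-matching items
     * @param process    hypothetical per-item work; it must make the item stop matching the query
     * @param threads    size of the worker pool
     * @return total number of items processed
     */
    static <T> int run(Supplier<List<T>> fetchBatch, Consumer<T> process, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int total = 0;
        try {
            while (true) {
                List<T> batch = fetchBatch.get();
                if (batch.isEmpty()) {
                    break; // the query matches nothing any more: we are done
                }
                List<Future<?>> futures = new ArrayList<>();
                for (T item : batch) {
                    futures.add(pool.submit(() -> process.accept(item)));
                }
                // Wait for the whole batch before querying again, so the next
                // fetch cannot return items that are still being processed.
                for (Future<?> future : futures) {
                    future.get();
                }
                total += batch.size();
            }
        } finally {
            pool.shutdown();
        }
        return total;
    }
}
```

The loop only terminates if processing really removes an item from the query's result set. An item whose update fails would be fetched again on every round, so a real implementation would need some way to skip or flag such items.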

If this is not a good approach, I would appreciate other possible solutions.

Thanks.

Answer 1

Maybe you can add a partitioner to your step that will:

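Select the ids of all the data that needs to be updated (and other columns if needed).
Split them into x partitions (x = the gridSize parameter) and write them to temporary files (one per partition).
Register the file name to read in the executionContext.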

Then your reader no longer reads from the database but from the partition file.

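It looks complicated, but it is not that bad. Here is an example that handles millions of records using a JDBC query; it can easily be adapted to your use case: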

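import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.hibernate.ScrollableResults;
import org.hibernate.Session;
import org.springframework.batch.core.UnexpectedJobExecutionException;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class JdbcToFilePartitioner implements Partitioner {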

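    /** Number of records per database fetch. */
    private int fetchSize = 100;

    /** Working directory for the partition files. */
    private File tmpDir;

    /** Limit on the number of items to select (null or 0 means no limit). */
    private Long nbItemMax;

    /** Extra values copied into every partition context. Used below but not declared in the original snippet; assumed to be injected. */
    private Map<String, Object> contextParameters;

    /** Hibernate session behind runQuery(). Used below but not declared in the original snippet; assumed to be opened there. */
    private Session session;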

    @Override
    public Map<String, ExecutionContext> partition(final int gridSize) {

        // Create one context per partition
        Map<String, ExecutionContext> executionsContexte = createExecutionsContext(gridSize);

        // Fill partition with ids to handle
        getIdsAndFillPartitionFiles(executionsContexte);

        return executionsContexte;
    }

    /**
     * @param gridSize number of partitions
     * @return map of execution context, one for each partition
     */
    private Map<String, ExecutionContext> createExecutionsContext(final int gridSize) {

        final Map<String, ExecutionContext> map = new HashMap<>();

        for (int partitionId = 0; partitionId < gridSize; partitionId++) {
            map.put(String.valueOf(partitionId), createContext(partitionId));
        }

        return map;
    }

    /**
     * @param partitionId id of the partition to create context
     * @return created executionContext
     */
    private ExecutionContext createContext(final int partitionId) {

        final ExecutionContext context = new ExecutionContext();

        String fileName = tmpDir + File.separator + "partition_" + partitionId + ".txt";

        context.put(PartitionerConstantes.ID_GRID.getCode(), partitionId);
        context.put(PartitionerConstantes.FILE_NAME.getCode(), fileName);

        if (contextParameters != null) {
            for (Entry<String, Object> entry : contextParameters.entrySet()) {
                context.put(entry.getKey(), entry.getValue());
            }
        }

        return context;
    }

    private void getIdsAndFillPartitionFiles(final Map<String, ExecutionContext> executionsContexte) {

        List<BufferedWriter> fileWriters = new ArrayList<>();
        try {

            // BufferedWriter for each partition
            for (int i = 0; i < executionsContexte.size(); i++) {
                BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(executionsContexte.get(String.valueOf(i)).getString(
                        PartitionerConstantes.FILE_NAME.getCode())));
                fileWriters.add(bufferedWriter);
            }

            // Fetch the data
            ScrollableResults results = runQuery();

            // Get the result and fill the files
            int currentPartition = 0;
            int nbWriting = 0;
            while (results.next()) {
                fileWriters.get(currentPartition).write(results.get(0).toString());
                fileWriters.get(currentPartition).newLine();
                currentPartition++;
                nbWriting++;

                // Once every partition has received an id, wrap around to the first one
                if (currentPartition >= executionsContexte.size()) {
                    currentPartition = 0;
                }

                // If we reach the max item to read we stop
                if (nbItemMax != null && nbItemMax != 0 && nbWriting >= nbItemMax) {
                    break;
                }
            }

            // closing
            results.close();
            session.close();
            for (BufferedWriter bufferedWriter : fileWriters) {
                bufferedWriter.close();
            }
        } catch (IOException e) { // nothing in the try block declares SQLException, so catching it would not compile
            throw new UnexpectedJobExecutionException("Error writing partition file", e);
        }
    }

    private ScrollableResults runQuery() {
        ...
    }
}
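
The answer does not show the reader side. As a rough idea of what "reading from the partition file" could look like, here is a minimal sketch in plain Java. `PartitionFileIdReader` is a hypothetical name; it deliberately does not implement Spring Batch's ItemReader so that it stays self-contained. In a real step the file name would presumably come from the step's execution context, under the key the partitioner registered, and each returned id would then be used to load the document to update.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PartitionFileIdReader implements Closeable {

    private final BufferedReader reader;

    /** Opens the partition file, which is expected to contain one id per line. */
    PartitionFileIdReader(Path partitionFile) throws IOException {
        this.reader = Files.newBufferedReader(partitionFile);
    }

    /**
     * Returns the next id from the file, or null when the file is exhausted.
     * Returning null mirrors the "end of input" convention used by the reader in the question.
     */
    synchronized String read() throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            String id = line.trim();
            if (!id.isEmpty()) {
                return id; // skip blank lines, return the first real id
            }
        }
        return null;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Because the ids are fixed when the files are written, each partition works on a stable list even though the underlying rows change while the job runs. Documents that start matching the query after the partitioner has run are not picked up until the next execution.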

Spring Batch: how to repeat a Job with a Partitioner when the data is dynamic?
