Question

I'm trying to improve the way I manage Futures.

Suppose we have this typical processing scenario: I run a query to fetch some records from a database:

SELECT * FROM mytable WHERE mycondition;

This query returns many rows that I need to process, like this:

while (recordset have more results) {
    MyRow row = recordset.getNextRow(); // Get the next row
    processRow(row);                    // Process the row
}

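Now assume that all rows are independent of one another, and that the function processRow is slow because it does some heavy processing and runs queries against a C* (Cassandra) cluster: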

void processRow(MyRow row) {
    // Fetch some useful data from the DB
    int metadataid = row.getMetadataID();
    Metadata metadata = getMetadataFromCassandra(metadataid);

    // .... perform more processing on the row .....

    // Store the processing result in the DB
    ProcessingResult result = ....;
    insertProcessingResultIntoCassandra(result);
}

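A serial approach like this is expected to perform poorly, so parallel execution is the obvious thing to consider.

Starting from this basic processing structure, I applied a few transformations to the algorithm to get major speedups.

Step 1: Parallelize the processing

This is quite simple. I create an Executor that runs the jobs in parallel, then I wait for all jobs to complete. The code looks like this: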

ThreadPoolExecutor executor = (ThreadPoolExecutor)Executors.newCachedThreadPool();
int failedJobs = 0;
ArrayList<Future<Boolean>> futures = new ArrayList<>();
while (recordset have more results) {
    final MyRow row = recordset.getNextRow(); // Get the next row

    // Create the async job and send it to the executor
    Callable<Boolean> c = new Callable<Boolean>() {
            @Override
            public Boolean call() {
                try {
                    processRow(row);
                } catch (Exception e) {
                    return false; // Job failed
                }
                return true; // Job is OK
            }
    };
    futures.add(executor.submit(c));
}

// All jobs submitted. Wait for the completion.
while (futures.size() > 0) {
    Future<Boolean> future = futures.remove(0);
    Boolean result = false;
    try {
        result = future.get();
    } catch (Exception e) {
        e.printStackTrace();
    }
    failedJobs += (result ? 0 : 1);
}

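Step 2: Limit the number of concurrent rows

So far so good. However, unless the number of jobs is small, this fails with an out-of-memory error, because the executor is backed by an unbounded queue and the main loop keeps submitting jobs all the way through. I can solve this by capping the maximum number of jobs submitted concurrently: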

final int MAX_JOBS = 1000;
while (recordset have more results) {
    ....
    futures.add(executor.submit(c));
    while (futures.size() >= MAX_JOBS) {
        Future<Boolean> future = futures.remove(0);
        Boolean result = false;
        try {
            result = future.get();
        } catch (Exception e) {
            e.printStackTrace();
        }
        failedJobs += (result ? 0 : 1);
    }
}

Simply put, once we reach a certain threshold (1000 in this case), I wait for the first job in the list to finish. This works, and it gives a good speedup.
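For illustration, the same cap can also be expressed with a Semaphore, so that a slot is freed whenever any job finishes rather than only when the oldest one does. This is a minimal sketch, not the code from the question: the class name ThrottledSubmit, the integer rows, the fixed pool of 4 threads, and the stand-in processRow that fails on every 7th row are all assumptions made to keep the example self-contained.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledSubmit {
    // Hypothetical stand-in for processRow(row): fails on every 7th row.
    static void processRow(int row) {
        if (row % 7 == 0) {
            throw new IllegalStateException("row " + row + " failed");
        }
    }

    // Processes rows 1..rowCount with at most maxInFlight jobs submitted but
    // unfinished at any time; returns the number of failed jobs.
    static int run(int rowCount, int maxInFlight) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger failedJobs = new AtomicInteger();
        try {
            for (int i = 1; i <= rowCount; i++) {
                final int row = i;
                // Blocks the submitting loop while maxInFlight jobs are pending.
                permits.acquireUninterruptibly();
                executor.execute(() -> {
                    try {
                        processRow(row);
                    } catch (Exception e) {
                        failedJobs.incrementAndGet();
                    } finally {
                        permits.release(); // free the slot on success or failure
                    }
                });
            }
            // Taking back every permit waits for all outstanding jobs.
            permits.acquireUninterruptibly(maxInFlight);
        } finally {
            executor.shutdown();
        }
        return failedJobs.get();
    }

    public static void main(String[] args) {
        System.out.println("failed jobs: " + run(100, 10));
    }
}
```

The trade-off is that the list of Futures disappears, so failures have to be counted inside the job itself, as the AtomicInteger does here.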

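Step 3: Parallelize the processing of a single row

This is the step where I'd like some help. Because the IO is slow, I expect 1000 jobs to pile up in the queue quickly. That is, I expect the JVM to spin up 1000 threads to accommodate all the jobs. Now, 1000 threads usually slow everything down when you only have an 8-core machine, and I think this number could be lowered with better-tuned parallelism.

Currently, the getMetadataFromCassandra function is a wrapper around session.executeAsync that also manages retries: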

public static ResultSet getMetadataFromCassandra(...) {
    int retries = 0;

    // Loop here
    while (retries < MAX_RETRIES) {
        // Execute the query
        ResultSetFuture future = session.executeAsync(statement);
        try {
            // Try to get the result
            return future.get(1000 * (int)Math.pow(2, retries), TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            // Ooops. An error occurred. Cancel the future and schedule it again
            future.cancel(true);
            if (retries == MAX_RETRIES - 1) { // last attempt failed: log it (== MAX_RETRIES is never true inside this loop)
                e.printStackTrace();

                String stackTrace = Throwables.getStackTraceAsString(e);
                logToFile("Failed to execute query. Stack trace: " + stackTrace);
            }

            retries++;
        }
    }

    return null;
}

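As you can see, this is a blocking function because of the .get() on the ResultSetFuture. That is, this call blocks every thread while it waits for IO. So I am using the async method, but I feel I'm wasting a lot of hardware resources.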
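To make the contrast concrete, here is a hedged sketch of the same retry loop written so that no thread waits in .get(). It uses the JDK's CompletableFuture rather than the driver's ResultSetFuture, so it shows the shape only. The names AsyncRetry, withRetry and demo are hypothetical, the query is a stand-in that fails twice before succeeding, and orTimeout requires Java 9 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class AsyncRetry {
    // Runs query.get() up to maxRetries times without blocking a thread.
    // Attempt n is given 1000 * 2^n ms, mirroring the blocking version.
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> query,
                                              int attempt, int maxRetries) {
        long timeoutMs = 1000L << attempt;
        return query.get()
                .orTimeout(timeoutMs, TimeUnit.MILLISECONDS) // Java 9+
                .<CompletableFuture<T>>handle((value, error) -> {
                    if (error == null) {
                        return CompletableFuture.completedFuture(value);
                    }
                    if (attempt + 1 >= maxRetries) {
                        // Out of retries: propagate the last error.
                        CompletableFuture<T> failed = new CompletableFuture<>();
                        failed.completeExceptionally(error);
                        return failed;
                    }
                    return withRetry(query, attempt + 1, maxRetries);
                })
                .thenCompose(f -> f); // flatten the nested future
    }

    // Hypothetical flaky query: fails twice, then yields a value.
    static String demo() {
        AtomicInteger calls = new AtomicInteger();
        Supplier<CompletableFuture<String>> flaky = () ->
                CompletableFuture.supplyAsync(() -> {
                    if (calls.incrementAndGet() < 3) {
                        throw new IllegalStateException("transient failure");
                    }
                    return "metadata";
                });
        // join() is used only to finish the demo; a caller would chain further.
        String value = withRetry(flaky, 0, 5).join();
        return value + " after " + calls.get() + " calls";
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Unlike the blocking version, this sketch does not cancel the timed-out query. orTimeout only completes the future exceptionally, so any driver-side cancellation would have to be added separately.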

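The question

It seems to me that I should be notified when the .executeAsync result is available (or when a timeout occurs), "releasing" the thread and allowing that same thread to do other work.

Simply put, it seems I need to transform the sequential structure of processRow into a pipeline: the query is executed asynchronously and, when the result is available, the rest of the processing runs. Of course, I want the main loop to wait for the whole pipeline to finish, not just the first part.

In other words, the main loop submits a job (let's call it jobJob), and I get back a Future (let's call it jobFuture) that I can .get() to wait for its completion. However, jobJob triggers a "query" sub-job (let's call it queryJob); queryJob is submitted asynchronously, so I get another Future (let's call it queryFuture) that should be used to fire the "process" sub-job (let's call it processJob). At this point, I'm just nesting Futures and blocking deep in the chain before the Future that represents the overall job completes, which means I'm back to square one!

Before I go the hard route and implement this kind of pipeline as a finite state machine, I had a look at: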
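The chain described above can be sketched with composition instead of nesting, so that the future handed to the main loop completes only when the last stage does. This is a minimal illustration under stated assumptions, not a Cassandra integration: it uses the JDK's CompletableFuture, and queryMetadataAsync, insertResultAsync, the string "metadata", and the rule that the insert fails for row 3 are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class RowPipeline {
    // Hypothetical async stand-ins for the two Cassandra calls.
    static CompletableFuture<String> queryMetadataAsync(int metadataId, ExecutorService io) {
        return CompletableFuture.supplyAsync(() -> "meta-" + metadataId, io);
    }

    static CompletableFuture<Void> insertResultAsync(String result, ExecutorService io) {
        return CompletableFuture.runAsync(() -> {
            if (result.endsWith("-3")) {
                throw new IllegalStateException("insert failed");
            }
        }, io);
    }

    // jobFuture for one row: query -> process -> insert. It completes only when
    // the last stage does, yielding true on success and false on any failure.
    static CompletableFuture<Boolean> processRowAsync(int row, ExecutorService io) {
        return queryMetadataAsync(row, io)
                .thenApply(metadata -> "processed-" + metadata)       // processing stage
                .thenCompose(result -> insertResultAsync(result, io)) // second async call
                .handle((ignored, error) -> error == null);
    }

    static int countFailures(int rowCount) {
        ExecutorService io = Executors.newFixedThreadPool(2);
        try {
            List<CompletableFuture<Boolean>> jobs = new ArrayList<>();
            for (int row = 1; row <= rowCount; row++) {
                jobs.add(processRowAsync(row, io));
            }
            // The main loop waits once, for whole pipelines, not first stages.
            CompletableFuture.allOf(jobs.toArray(new CompletableFuture[0])).join();
            int failed = 0;
            for (CompletableFuture<Boolean> job : jobs) {
                if (!job.join()) {
                    failed++;
                }
            }
            return failed;
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("failed rows: " + countFailures(10));
    }
}
```

Here the stand-ins borrow a small thread pool to simulate IO. With a driver that completes its futures from its own IO threads, the two-thread pool would not be needed for the waiting itself, and only the processing stage would consume CPU threads.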

The Executor classes

ForkJoinPool

Guava's ListenableFuture class

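None of them seems to meet my requirements for flow management, or perhaps I just haven't found a clear explanation of how to perform such an apparently simple task. Can anyone shed some light on this topic?

Any help is greatly appreciated.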

Java Futures Pipelining

0 answers: