Question

I have code similar to the following:

List<String> ids = expensiveMethod();
List<String> filteredIds = cheapFilterMethod(ids);

if (!filteredIds.isEmpty()) {
    List<SomeEntity> fullEntities = expensiveDatabaseCall(filteredIds);
    List<SomeEntity> filteredFullEntities = anotherCheapFilterFunction(fullEntities);
    if (!filteredFullEntities.isEmpty()) {
        List<AnotherEntity> finalResults = stupidlyExpensiveDatabaseCall(filteredFullEntities);
        relativelyCheapMethod(finalResults);
    }
}

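Basically, it is a waterfall of several expensive methods, each of which either fetches something from a database or filters previous database results. The exhaustive filtering is there because stupidlyExpensiveDatabaseCall needs as few leftover entities as possible.

My problem is that the other functions are not exactly cheap either, so they block for a few seconds while stupidlyExpensiveDatabaseCall sits waiting and does nothing until it is handed the whole batch at once.

I would like to process the results of each method as they come in. I know I could write a thread for each individual method and put concurrent queues between them, but that is a lot of boilerplate I would like to avoid. Is there a more elegant solution?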

Answer 1

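There is a post about different ways of parallelizing: not only the parallelStream() way, but also running consecutive steps in parallel as you describe, linked by queues. RxJava may suit your needs here. It is a more complete variant of the rather fragmentary reactive streams API in Java 9. However, I think you only really get there if you use a reactive DB API along with it.

This is the RxJava way: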

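import io.reactivex.Flowable;
import io.reactivex.schedulers.Schedulers;
import org.junit.Test; // assumes JUnit 4; with JUnit 5 use org.junit.jupiter.api.Test

public class FlowStream {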

@Test
public void flowStream() {
    int items = 10;

    print("\nflow");
    Flowable.range(0, items)
            .map(this::expensiveCall)
            .map(this::expensiveCall)
            .forEach(i -> print("flowed %d", i));

    print("\nparallel flow");
    Flowable.range(0, items)
            .flatMap(v ->
                    Flowable.just(v)
                            .subscribeOn(Schedulers.computation())
                            .map(this::expensiveCall)
            )
            .flatMap(v ->
                    Flowable.just(v)
                            .subscribeOn(Schedulers.computation())
                            .map(this::expensiveCall)
            ).forEach(i -> print("flowed parallel %d", i));

    await(5000);

}

private Integer expensiveCall(Integer i) {
    print("making %d more expensive", i);
    await(Math.round(10f / (Math.abs(i) + 1)) * 50);
    return i;
}

private void await(int i) {
    try {
        Thread.sleep(i);
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
}

private void print(String pattern, Object... values) {
    System.out.println(String.format(pattern, values));
}

}

Maven dependency:

    <!-- https://mvnrepository.com/artifact/io.reactivex.rxjava2/rxjava -->
    <dependency>
        <groupId>io.reactivex.rxjava2</groupId>
        <artifactId>rxjava</artifactId>
        <version>2.2.13</version>
    </dependency>

Answer 2

You can use CompletableFuture to split off each step that is not CPU-bound. The usage is similar to the JavaScript promise API.

public void loadEntities() {
    CompletableFuture.supplyAsync(this::expensiveMethod, Executors.newCachedThreadPool())
            .thenApply(this::cheapFilterMethod)
            .thenApplyAsync(this::expensiveDatabaseCall)
            .thenApply(this::anotherCheapFilterFunction)
            .thenApplyAsync(this::stupidlyExpensiveDatabaseCall)
            .thenAccept(this::relativelyCheapMethod);
}

private List<String> expensiveMethod() { ... }
private List<String> cheapFilterMethod(List<String> ids) { ... }
private List<SomeEntity> expensiveDatabaseCall(List<String> ids) { ... }
private List<SomeEntity> anotherCheapFilterFunction(List<SomeEntity> entities) { ... }
private List<AnotherEntity> stupidlyExpensiveDatabaseCall(List<SomeEntity> entities) { ... }
private void relativelyCheapMethod(List<AnotherEntity> entities) { ... }

If you want more control over execution, you can also pass your own thread pool at each step.
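As a rough sketch of passing your own pool, the chain below hands one explicit executor to both asynchronous stages. The class name, the pool size and the method bodies are hypothetical stand-ins (entities are plain strings here), not the asker's real code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class PipelineWithExecutors {

    // Hypothetical stand-in: pretends to load ids from a slow source.
    static List<String> expensiveMethod() {
        return Arrays.asList("a1", "b2", "a3");
    }

    // Hypothetical stand-in: keeps only ids that start with "a".
    static List<String> cheapFilterMethod(List<String> ids) {
        return ids.stream().filter(id -> id.startsWith("a")).collect(Collectors.toList());
    }

    // Hypothetical stand-in: pretends to load one entity per id.
    static List<String> expensiveDatabaseCall(List<String> ids) {
        return ids.stream().map(id -> "entity-" + id).collect(Collectors.toList());
    }

    static List<String> run() {
        // Assumed pool size; tune it to what the database can handle.
        ExecutorService ioPool = Executors.newFixedThreadPool(4);
        try {
            return CompletableFuture
                    .supplyAsync(PipelineWithExecutors::expensiveMethod, ioPool)
                    .thenApply(PipelineWithExecutors::cheapFilterMethod)
                    // The second argument selects the pool for this stage.
                    .thenApplyAsync(PipelineWithExecutors::expensiveDatabaseCall, ioPool)
                    .join(); // block only here, to return the result
        } finally {
            ioPool.shutdown(); // let the pool threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Using separate pools for the different database calls would work the same way: create one executor per stage and pass it as the second argument of the corresponding `...Async` call.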

Answer 3

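You can use the Java 8 Stream API. You cannot process a DB query "as it comes in", because the result set arrives all at once. You would have to change your methods to handle single entities.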

expensiveMethod().parallelStream()
  .filter(this::cheapFilterMethod)                // Returns Boolean
  .map(this::expensiveDatabaseCallSingle)         // Returns SomeEntity
  .filter(this::anotherCheapFilterFunction)       // Returns boolean for filtered entities
  .map(this::stupidlyExpensiveDatabaseCallSingle) // Returns AnotherEntity
  .forEach(this::relativelyCheapMethod);          // void method
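To make the required change concrete, here is a minimal self-contained sketch of what the per-item variants could look like. All method bodies are hypothetical placeholders that use strings instead of SomeEntity and AnotherEntity, and the result is collected into a list rather than passed to relativelyCheapMethod so that it can be inspected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PerItemStreamPipeline {

    // Hypothetical predicate: keep only ids that start with "a".
    static boolean cheapFilterMethod(String id) {
        return id.startsWith("a");
    }

    // Hypothetical single-entity lookup (stands in for one DB query per id).
    static String expensiveDatabaseCallSingle(String id) {
        return "entity-" + id;
    }

    // Hypothetical predicate on the loaded entity.
    static boolean anotherCheapFilterFunction(String entity) {
        return !entity.endsWith("3");
    }

    // Hypothetical single-entity variant of the most expensive call.
    static String stupidlyExpensiveDatabaseCallSingle(String entity) {
        return entity + "-details";
    }

    static List<String> run(List<String> ids) {
        return ids.parallelStream()
                .filter(PerItemStreamPipeline::cheapFilterMethod)
                .map(PerItemStreamPipeline::expensiveDatabaseCallSingle)
                .filter(PerItemStreamPipeline::anotherCheapFilterFunction)
                .map(PerItemStreamPipeline::stupidlyExpensiveDatabaseCallSingle)
                // toList keeps the encounter order of the source list.
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a1", "b2", "a3", "a4")));
    }
}
```

Each element moves through all stages on its own, so a slow lookup for one id does not hold back the others. The trade-off is one database round trip per entity instead of one batched query.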

I would also suggest using an ExecutorService to manage your threads, so that you do not exhaust your resources just by creating a bunch of threads:

ExecutorService threadPool = Executors.newFixedThreadPool(8);
threadPool.submit(this::methodForParallelStream);
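A slightly fuller sketch of the same idea, including waiting for the result and shutting the pool down, might look like this. The body of methodForParallelStream is a hypothetical placeholder that just sums some numbers.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

public class BoundedPoolExample {

    // Hypothetical stand-in for the method that runs the stream pipeline.
    static int methodForParallelStream() {
        return IntStream.rangeClosed(1, 10).parallel().map(i -> i * 2).sum();
    }

    static int runOnPool() {
        ExecutorService threadPool = Executors.newFixedThreadPool(8);
        try {
            // Typing the task as Callable makes the submit overload explicit.
            Callable<Integer> task = BoundedPoolExample::methodForParallelStream;
            Future<Integer> result = threadPool.submit(task);
            return result.get(); // wait for the pipeline to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            threadPool.shutdown(); // without this the pool threads keep the JVM alive
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnPool()); // prints 110
    }
}
```

One caveat: a parallel stream started from a task on an ordinary ExecutorService still does its work on the common fork-join pool, so the fixed pool here mainly bounds how many such tasks run at once, not the parallelism inside each stream.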

Passing on results of expensive methods from multiple layers