How do I stop a parallel stream once findAny() has found a match?

Date: 2018-09-19 18:18:26

Tags: java parallel-processing java-stream completable-future

I am trying to find the first (any) member of a list that matches a given predicate, like this:

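Item item = items.parallelStream()
    .map(i -> i.doSomethingExpensive())
    .filter(predicate)
    .findAny()
    .orElse(null);

I would expect findAny() to return as soon as it has a match, but that does not seem to be the case. Instead, it appears to wait for the map method to finish on most of the elements before returning. How can I return the first result immediately and cancel the rest of the parallel work? Is there a better way to do this than streams, for example CompletableFuture?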

Here is a simple example that shows the behavior:

private static void log(String msg) {
    SimpleDateFormat sdf = new SimpleDateFormat("HH:mm:ss.SSS");
    System.out.println(sdf.format(new Date()) + " " + msg);
}

Random random = new Random();
List<Integer> nums = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14);
Optional<Integer> num = nums.parallelStream()
  .map(n -> {
    long delay = Math.abs(random.nextLong()) % 10000;
    log("Waiting on " + n + " for " + delay + " ms");
    try { Thread.sleep(delay); }
    catch (InterruptedException e) { System.err.println("Interruption error"); }
    return n * n;
  })
  .filter(n -> n < 30)
  .peek(n -> log("Found match: " + n))
  .findAny();

log("First match: " + num);

Log output:

14:52:27.061 Waiting on 9 for 2271 ms
14:52:27.061 Waiting on 2 for 1124 ms
14:52:27.061 Waiting on 13 for 547 ms
14:52:27.061 Waiting on 4 for 517 ms
14:52:27.061 Waiting on 1 for 1210 ms
14:52:27.061 Waiting on 6 for 2646 ms
14:52:27.061 Waiting on 0 for 4393 ms
14:52:27.061 Waiting on 12 for 5520 ms
14:52:27.581 Found match: 16
14:52:27.582 Waiting on 3 for 5365 ms
14:52:28.188 Found match: 4
14:52:28.275 Found match: 1
14:52:31.457 Found match: 0
14:52:32.950 Found match: 9
14:52:32.951 First match: Optional[0]

Once a match is found (16 in this case), findAny() does not return immediately but blocks until the remaining threads have finished. Here the caller waits about 5 more seconds after a match was already found before it gets a result.

3 answers:

Answer 0 (score: 1)

You can use the following code to see how parallelStream works:

final List<String> list = Arrays.asList("first", "second", "third", "4th", "5th", "7th", "8th", "9th", "10th", "11th", "12th", "13th");

String result = list.parallelStream()
        .map(s -> {
            System.out.println("map: " + s);
            return s;
        })
        .filter(s -> {
            System.out.println("filter: " + s);
            return s.equals("8th");
        })
        .findFirst()
        .orElse(null);

System.out.println("result=" + result);

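There are two ways to get what you want, i.e. to stop the expensive operation by means of the filter:

  1. Don't use streams at all; use a simple or enhanced for loop.
  2. Filter first, then map with the expensive operation.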
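A minimal sketch of both options. The class and method names are hypothetical, and squaring a number stands in for doSomethingExpensive(). Option 2 assumes the condition can be checked on the input element; in the question the predicate is applied to the result of the expensive call, so this option only helps when the condition can be rewritten that way.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;

public class FirstMatchSequential {

    // Option 1: a plain enhanced for loop. It returns as soon as one mapped value
    // satisfies the predicate, so the expensive function runs no more often than needed.
    static <T, R> R firstMatchLoop(List<T> items, Function<? super T, ? extends R> expensive,
                                   Predicate<? super R> predicate) {
        for (T item : items) {
            R r = expensive.apply(item);
            if (predicate.test(r)) {
                return r;
            }
        }
        return null; // no match
    }

    // Option 2: filter on the input first (assumed cheap), then map only what passes.
    // A sequential stream is lazy, so findFirst() maps just the first surviving element.
    static <T, R> R filterThenMap(List<T> items, Predicate<? super T> cheapInputCheck,
                                  Function<? super T, ? extends R> expensive) {
        return items.stream()
                .filter(cheapInputCheck)
                .<R>map(expensive)
                .findFirst()
                .orElse(null);
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14);
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical stand-in for doSomethingExpensive(): squares and counts invocations.
        Function<Integer, Integer> square = n -> {
            calls.incrementAndGet();
            return n * n;
        };

        Integer a = firstMatchLoop(nums, square, n -> n >= 16);
        System.out.println("loop: " + a + " after " + calls.get() + " expensive calls");

        calls.set(0);
        Integer b = filterThenMap(nums, n -> n >= 4, square);
        System.out.println("filter first: " + b + " after " + calls.get() + " expensive calls");
    }
}
```

The counter makes the difference visible: for this input the loop stops after the fifth call, and the filter-first variant calls the expensive function only once.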

Answer 1 (score: 1)

  

"Instead, it seems to wait for the map method to finish on most of the elements before returning."

That is not correct.

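As for the elements that are already being processed, it will wait for all of them to complete, because the Stream API allows concurrent processing of data structures that are not intrinsically thread-safe. It must ensure that all potential concurrent access has finished before returning from the terminal operation.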

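As for the stream as a whole, it is simply not fair to test a stream of only 14 elements on an 8-core machine. Of course at least 8 concurrent operations will be started; that is what parallel processing is all about. You are adding fuel to the fire by using findFirst() instead of findAny(): findFirst() does not return the first element found in processing order but the first element in encounter order, which is exactly zero in your example. Threads processing chunks other than the first one therefore cannot assume that their result is the correct answer, and are even more willing to help process other candidates than they would be with findAny().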

When using

List<Integer> nums = IntStream.range(0, 200).boxed().collect(Collectors.toList());
Optional<Integer> num = nums.parallelStream()
        .map(n -> {
            long delay = ThreadLocalRandom.current().nextInt(10_000);
            log("Waiting on " + n + " for " + delay + " ms");
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(delay));
            return n * n;
        })
        .filter(n -> n < 40_000)
        .peek(n -> log("Found match: " + n))
        .findAny();

log("First match: " + num);

you will get a similar number of tasks running to completion, despite the far larger number of stream elements.
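To compare how many elements are actually processed without reading timestamps, one could count the calls to map(). A rough sketch under assumed settings (delays of at most 100 ms instead of 10 s, hypothetical class and method names); the counts depend on the number of cores and on scheduling, so no expected figures are given here.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CountMappedElements {

    // Runs the parallel findAny() pipeline over `size` elements and returns how many
    // elements entered map() before the terminal operation returned.
    static int mappedBeforeReturn(int size) {
        List<Integer> nums = IntStream.range(0, size).boxed().collect(Collectors.toList());
        AtomicInteger mapped = new AtomicInteger();
        Optional<Integer> num = nums.parallelStream()
                .map(n -> {
                    mapped.incrementAndGet();
                    // at most 100 ms instead of 10 s, so the experiment finishes quickly
                    long delay = ThreadLocalRandom.current().nextInt(100);
                    LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(delay));
                    return n * n;
                })
                .filter(n -> n < size * size) // every element matches, as in the code above
                .findAny();
        if (!num.isPresent()) {
            throw new IllegalStateException("expected a match");
        }
        return mapped.get();
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
        for (int size : new int[] {15, 200, 2000}) {
            System.out.println(size + " elements -> " + mappedBeforeReturn(size) + " mapped");
        }
    }
}
```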

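Note that CompletableFuture does not support interruption either, so the only built-in facility I can think of for returning any result and cancelling the other jobs is the old ExecutorService.invokeAny.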

To build the mapping and filtering on top of it, we can use the following helper function:

static <T,R> Callable<R> mapAndfilter(T t, Function<T,R> f, Predicate<? super R> p) {
    return () -> {
        R r = f.apply(t);
        if(!p.test(r)) throw new NoSuchElementException();
        return r;
    };
}

Unfortunately, a task can only complete either with a value or exceptionally, so we have to use an exception for non-matching elements.
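The exception trick relies on the documented contract of invokeAny: it returns the result of a task that completed successfully, and throws an ExecutionException only if no task did. A small sketch illustrating that contract; the class, the helper names and the pool size are illustrative choices, not part of the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InvokeAnySemantics {

    // A task that signals "no match" by throwing, as mapAndfilter() does.
    static Callable<Integer> noMatch() {
        return () -> {
            throw new NoSuchElementException();
        };
    }

    // Returns the result of a task that completed normally, or null if every task threw.
    // `tasks` must not be empty (both newFixedThreadPool and invokeAny reject that).
    static Integer anySuccessful(List<Callable<Integer>> tasks) {
        ExecutorService es = Executors.newFixedThreadPool(tasks.size());
        try {
            return es.invokeAny(tasks);
        } catch (ExecutionException e) {
            // thrown only when no task completed normally; the cause is a task's exception
            System.out.println("no task succeeded, cause: " + e.getCause());
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            es.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> mixed = Arrays.asList(noMatch(), () -> 42, noMatch());
        System.out.println("mixed: " + anySuccessful(mixed));

        List<Callable<Integer>> allFail = Arrays.asList(noMatch(), noMatch());
        System.out.println("all fail: " + anySuccessful(allFail));
    }
}
```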

Then we can use the helper like this:

ExecutorService es = ForkJoinPool.commonPool();
Integer result = es.invokeAny(IntStream.range(0, 100)
    .mapToObj(i -> mapAndfilter(i,
        n -> {
            long delay = ThreadLocalRandom.current().nextInt(10_000);
            log("Waiting on " + n + " for " + delay + " ms");
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(delay));
            return n * n;
        },
        n -> n < 10_000))
    .collect(Collectors.toList()));

log("result: "+result);

This not only cancels the pending tasks, it also returns without waiting for them to complete.

Of course, this implies that the source data and the jobs operating on it must be either immutable or thread-safe.
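For reference, a self-contained sketch that wraps the approach in a method returning Optional, including the case where nothing matches. Unlike the code above it uses a dedicated fixed thread pool (an assumption of this sketch) so that the pool can be shut down afterwards; with a ThreadPoolExecutor the cancelled tasks should be interrupted if they are still running. slowSquare() and its fixed delays are hypothetical and only make the outcome predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class InvokeAnyFirstMatch {

    // Helper from the answer: map, then signal "no match" by throwing.
    static <T, R> Callable<R> mapAndfilter(T t, Function<T, R> f, Predicate<? super R> p) {
        return () -> {
            R r = f.apply(t);
            if (!p.test(r)) throw new NoSuchElementException();
            return r;
        };
    }

    // Returns some mapped element that satisfies p, or Optional.empty() if none does.
    static <T, R> Optional<R> findAnyMatch(List<T> items, Function<T, R> f,
                                           Predicate<? super R> p, int threads) {
        if (items.isEmpty()) {
            return Optional.empty(); // invokeAny rejects an empty task list
        }
        List<Callable<R>> tasks = items.stream()
                .map(t -> mapAndfilter(t, f, p))
                .collect(Collectors.toList());
        ExecutorService es = Executors.newFixedThreadPool(threads);
        try {
            return Optional.of(es.invokeAny(tasks)); // unfinished tasks are cancelled on return
        } catch (ExecutionException e) {
            return Optional.empty();                 // every task threw: nothing matched
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a match", e);
        } finally {
            es.shutdownNow();
        }
    }

    // Hypothetical expensive function with fixed delays: only element 3 is fast.
    static Integer slowSquare(Integer n) {
        try {
            Thread.sleep(n == 3 ? 50 : 2_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // cancelled after another task won
        }
        return n * n;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Optional<Integer> r = findAnyMatch(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7),
                InvokeAnyFirstMatch::slowSquare, n -> n < 30, 8);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("First match: " + r + " after " + ms + " ms");
    }
}
```

With eight threads all tasks start at once, so the call should return after roughly the 50 ms of the fast task rather than the 2 s of the slow ones.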

Answer 2 (score: 0)

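There are several things at play here. The first is that parallelStream() uses the common ForkJoinPool by default, and the calling thread takes part in the work as well. This means that if a slow task is currently running on the calling thread, it has to finish before the caller gets control back.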

You can see this by slightly modifying the code to log the thread names, and to log when each wait has finished:

private static void log(String msg) {
    SimpleDateFormat sdf = new SimpleDateFormat("HH:mm:ss.SSS");
    System.out.println(sdf.format(new Date()) + " [" + Thread.currentThread().getName() + "] " + " " + msg);
}

public static void main(String[] args) {
    Random random = new Random();
    List<Integer> nums = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14);
    Optional<Integer> num = nums.parallelStream()
            .map(n -> {
                long delay = Math.abs(random.nextLong()) % 10000;
                log("Waiting on " + n + " for " + delay + " ms");
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    System.err.println("Interruption error");
                }
                log("finished waiting");
                return n * n;
            })
            .filter(n -> n < 30)
            .peek(n -> log("Found match: " + n))
            .findAny();

    log("First match: " + num);
}

Sample output:

13:56:52.954 [main]  Waiting on 9 for 9936 ms
13:56:52.956 [ForkJoinPool.commonPool-worker-1]  Waiting on 4 for 7436 ms
13:56:52.970 [ForkJoinPool.commonPool-worker-2]  Waiting on 1 for 6523 ms
13:56:52.983 [ForkJoinPool.commonPool-worker-3]  Waiting on 6 for 7488 ms
13:56:59.494 [ForkJoinPool.commonPool-worker-2]  finished waiting
13:56:59.496 [ForkJoinPool.commonPool-worker-2]  Found match: 1
13:57:00.392 [ForkJoinPool.commonPool-worker-1]  finished waiting
13:57:00.392 [ForkJoinPool.commonPool-worker-1]  Found match: 16
13:57:00.471 [ForkJoinPool.commonPool-worker-3]  finished waiting
13:57:02.892 [main]  finished waiting
13:57:02.894 [main]  First match: Optional[1]

As you can see, 2 matches are found here, but the main thread is still busy, so it cannot return a match yet.

However, this does not explain every case:

13:58:52.116 [main]  Waiting on 9 for 5256 ms
13:58:52.143 [ForkJoinPool.commonPool-worker-1]  Waiting on 4 for 4220 ms
13:58:52.148 [ForkJoinPool.commonPool-worker-2]  Waiting on 1 for 2136 ms
13:58:52.158 [ForkJoinPool.commonPool-worker-3]  Waiting on 6 for 7262 ms
13:58:54.294 [ForkJoinPool.commonPool-worker-2]  finished waiting
13:58:54.295 [ForkJoinPool.commonPool-worker-2]  Found match: 1
13:58:56.364 [ForkJoinPool.commonPool-worker-1]  finished waiting
13:58:56.364 [ForkJoinPool.commonPool-worker-1]  Found match: 16
13:58:57.399 [main]  finished waiting
13:58:59.422 [ForkJoinPool.commonPool-worker-3]  finished waiting
13:58:59.424 [main]  First match: Optional[1]

This might be explained by the way the fork-join pool merges the results. It seems that some improvement would be possible here.

As an alternative, you could indeed use CompletableFuture:

// you should probably also pass your own executor to supplyAsync()
List<CompletableFuture<Integer>> futures = nums.stream().map(n -> CompletableFuture.supplyAsync(() -> {
    long delay = Math.abs(random.nextLong()) % 10000;
    log("Waiting on " + n + " for " + delay + " ms");
    try {
        Thread.sleep(delay);
    } catch (InterruptedException e) {
        System.err.println("Interruption error");
    }
    log("finished waiting");
    return n * n;
})).collect(Collectors.toList());
CompletableFuture<Integer> result = CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
        .thenApply(unused -> futures.stream().map(CompletableFuture::join).filter(n -> n < 30).findAny().orElse(null));
// shortcircuiting
futures.forEach(f -> f.thenAccept(r -> {
    if (r < 30) {
        log("Found match: " + r);
        result.complete(r);
    }
}));
// cancelling remaining tasks
result.whenComplete((r, t) -> futures.forEach(f -> f.cancel(true)));

log("First match: " + result.join());

Output:

14:57:39.815 [ForkJoinPool.commonPool-worker-1]  Waiting on 0 for 7964 ms
14:57:39.815 [ForkJoinPool.commonPool-worker-3]  Waiting on 2 for 5743 ms
14:57:39.817 [ForkJoinPool.commonPool-worker-2]  Waiting on 1 for 9179 ms
14:57:45.562 [ForkJoinPool.commonPool-worker-3]  finished waiting
14:57:45.563 [ForkJoinPool.commonPool-worker-3]  Found match: 4
14:57:45.564 [ForkJoinPool.commonPool-worker-3]  Waiting on 3 for 7320 ms
14:57:45.566 [main]  First match: 4

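Note that cancel(true) does not actually cancel the tasks already in progress (no interruption happens, for example), but it does prevent further tasks from running. You can even see that this may not take effect immediately, since worker-3 still started executing the next task.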
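The first point can be checked directly: the CompletableFuture documentation states that the mayInterruptIfRunning argument has no effect there because interrupts are not used to control processing. A minimal sketch (class and method names are hypothetical) that cancels a running task and records whether its thread saw an interrupt:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelDoesNotInterrupt {

    // Starts a sleeping task, calls cancel(true) on its CompletableFuture while it runs,
    // and reports whether the task thread was interrupted.
    static boolean runningTaskWasInterrupted() {
        ExecutorService es = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        try {
            CompletableFuture<Void> cf = CompletableFuture.runAsync(() -> {
                started.countDown();
                try {
                    Thread.sleep(300);
                } catch (InterruptedException e) {
                    interrupted.set(true);
                } finally {
                    finished.countDown();
                }
            }, es);
            started.await();
            boolean cancelled = cf.cancel(true);   // the future itself is marked cancelled...
            System.out.println("cancel(true) returned " + cancelled + ", isCancelled=" + cf.isCancelled());
            finished.await(5, TimeUnit.SECONDS);   // ...but the task body keeps running
            return interrupted.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            es.shutdown(); // plain shutdown: does not interrupt anything either
        }
    }

    public static void main(String[] args) {
        System.out.println("task interrupted: " + runningTaskWasInterrupted());
    }
}
```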

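You should also use your own executor, sized according to whether the work is more CPU-bound or I/O-bound. As you can see, the default uses the common pool, so it does not use all cores.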

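The allOf() is mainly needed for the case where no match is found. If you can guarantee that there is at least one match, you could simply use a new CompletableFuture() instead.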

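Finally, to keep things simple I repeated the filter check, but it is easy to move that logic into the main task, return null or a marker value, and then test for that in both places.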
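A sketch of that refactoring, which also follows the suggestions about a custom executor and new CompletableFuture(): the filter runs inside each task and returns null as the "no match" marker, the result is a plain new CompletableFuture, and allOf() over the callback stages only serves as the fallback when nothing matches. Assumptions of this sketch: the expensive function never returns null, exceptions it throws count as non-matches, and a dedicated pool sized by the caller is shut down with shutdownNow() so that still-running tasks are interrupted. slowSquare() is a hypothetical stand-in with fixed delays.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class CompletableFirstMatch {

    // Returns some mapped value that satisfies p, or null if none does.
    // Assumption: f never returns null, because null is the "no match" marker.
    // Exceptions thrown by f are treated like non-matches (a simplification).
    static <T, R> R firstMatch(List<T> items, Function<? super T, ? extends R> f,
                               Predicate<? super R> p, int threads) {
        ExecutorService es = Executors.newFixedThreadPool(threads);
        CompletableFuture<R> result = new CompletableFuture<>();
        try {
            List<CompletableFuture<Void>> checks = items.stream()
                    .map(t -> CompletableFuture.supplyAsync(() -> {
                                R r = f.apply(t);
                                return p.test(r) ? r : null;       // filter inside the task
                            }, es)
                            .thenAccept(r -> {
                                if (r != null) result.complete(r);  // first match wins
                            }))
                    .collect(Collectors.toList());
            // Fallback for the no-match case. Each check completes only after its callback
            // has run, so any match has already completed `result` and this is then a no-op.
            CompletableFuture.allOf(checks.toArray(new CompletableFuture<?>[0]))
                    .whenComplete((ignored, error) -> result.complete(null));
            return result.join();
        } finally {
            es.shutdownNow(); // interrupts tasks still running and drops queued ones
        }
    }

    // Hypothetical expensive function with fixed delays: only element 2 is fast.
    static Integer slowSquare(Integer n) {
        try {
            Thread.sleep(n == 2 ? 50 : 2_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // interrupted by shutdownNow()
        }
        return n * n;
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7);
        long start = System.nanoTime();
        Integer match = firstMatch(nums, CompletableFirstMatch::slowSquare, n -> n < 30, nums.size());
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("First match: " + match + " after " + ms + " ms");
    }
}
```

Building allOf() on the thenAccept stages rather than on the supplyAsync futures is deliberate: a stage created by thenAccept completes only after its callback has run, so the null fallback cannot overtake a real match.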

See also: How to make a future that gets completed when any of the given CompletableFutures is completed with a result that matches a certain predicate?