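I'm using parallelStream to run some file uploads in parallel; some files are large and some are small. I've noticed that not all of the worker threads are being used.
At first everything runs fine and all threads are in use (I set the parallelism option to 16). Then at some point (once it reaches the larger files), it uses only one thread.
Simplified code: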
files.parallelStream().forEach((file) -> {
    try (FileInputStream fileInputStream = new FileInputStream(file)) {
        IDocumentStorageAdaptor uploader = null;
        try {
            logger.debug("Adaptors before taking: " + uploaderPool.size());
            uploader = uploaderPool.take();
            logger.debug("Took an adaptor!");
            logger.debug("Adaptors after taking: " + uploaderPool.size());
            uploader.addNewFile(file);
        } finally {
            if (uploader != null) {
                logger.debug("Adding one back!");
                uploaderPool.put(uploader);
                logger.debug("Adaptors after putting: " + uploaderPool.size());
            }
        }
    } catch (InterruptedException | IOException e) {
        throw new UploadException(e);
    }
});
uploaderPool is an ArrayBlockingQueue. Logs:
[ForkJoinPool.commonPool-worker-8] - Adaptors before taking: 0
[ForkJoinPool.commonPool-worker-15] - Adding one back!
[ForkJoinPool.commonPool-worker-8] - Took an adaptor!
[ForkJoinPool.commonPool-worker-15] - Adaptors after putting: 0
...
...
...
[ForkJoinPool.commonPool-worker-10] - Adding one back!
[ForkJoinPool.commonPool-worker-10] - Adaptors after putting: 16
[ForkJoinPool.commonPool-worker-10] - Adaptors before taking: 16
[ForkJoinPool.commonPool-worker-10] - Took an adaptor!
[ForkJoinPool.commonPool-worker-10] - Adaptors after taking: 15
[ForkJoinPool.commonPool-worker-10] - Adding one back!
[ForkJoinPool.commonPool-worker-10] - Adaptors after putting: 16
[ForkJoinPool.commonPool-worker-10] - Adaptors before taking: 16
[ForkJoinPool.commonPool-worker-10] - Took an adaptor!
[ForkJoinPool.commonPool-worker-10] - Adaptors after taking: 15
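It looks as though all the work (the items in the list) is split across the 16 threads up front, and the items assigned to one thread simply wait until that thread is free instead of being picked up by the threads that are idle. Is there a way to change how parallelStream queues its work? I read the ForkJoinPool documentation, which mentions work stealing, but only for spawned subtasks.
My other plan is to randomize the order of the list I'm calling parallelStream on, which might keep things balanced.
Thanks!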
Answer 0 (score: 4)
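The split-vs-compute heuristics for parallel streams are tuned for data-parallel operations, not IO-parallel operations. (In other words, they are tuned to keep the CPUs busy without generating many more tasks than you have CPUs.) As a result, they are biased towards computing rather than forking. There are currently no options for overriding these choices.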
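Since those heuristics can't be overridden, one possible workaround (not something this answer prescribes, just a common alternative for IO-bound work) is to skip parallelStream and submit each item as its own task to a fixed-size ExecutorService. A minimal sketch follows; the class name PerItemExecutor, the method runAll, and the file names in main are hypothetical, and the Consumer stands in for the real upload call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of the original question or answer.
public class PerItemExecutor {

    /**
     * Runs action once per item on a fixed pool of `threads` workers.
     * Every item is queued as a separate task, so a worker that finishes
     * early takes the next queued item instead of idling.
     * Returns the number of tasks that completed without throwing.
     */
    public static <T> int runAll(List<T> items, int threads, Consumer<T> action)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (T item : items) {
                futures.add(pool.submit(() -> action.accept(item)));
            }
            int completed = 0;
            for (Future<?> future : futures) {
                future.get(); // a failed task surfaces here as ExecutionException
                completed++;
            }
            return completed;
        } finally {
            pool.shutdown(); // stop accepting new tasks; already-queued ones still run
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder names standing in for real files.
        List<String> files = Arrays.asList("small-1", "large-1", "small-2", "small-3");
        int done = runAll(files, 2, name ->
                System.out.println(Thread.currentThread().getName() + " uploading " + name));
        System.out.println("completed: " + done);
    }
}
```

With this shape, a blocking uploaderPool.take() inside the action still limits how many uploads run at once, while the pool's shared queue keeps one slow large file from holding up the items that would otherwise sit behind it in the same chunk.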