//Unit of logic I want to make it to run in parallel
public PagesDTO convertOCRStreamToDTO(String pageId, Integer pageSequence) throws Exception {
LOG.info("Get OCR begin for pageId [{}] thread name {}",pageId, Thread.currentThread().getName());
OcrContent ocrContent = getOcrContent(pageId);
OcrDTO ocrData = populateOCRData(ocrContent.getInputStream());
PagesDTO pageDTO = new PagesDTO(pageId, pageSequence.toString(), ocrData);
return pageDTO;
}
并行执行convertOCRStreamToDTO(..)的逻辑然后在个人线程执行完成时收集其结果
List<PagesDTO> pageDTOList = new ArrayList<>();
//javadoc: Creates a work-stealing thread pool using all available processors as its target parallelism level.
ExecutorService newWorkStealingPool = Executors.newWorkStealingPool();
Instant start = Instant.now();
List<CompletableFuture<PagesDTO>> pendingTasks = new ArrayList<>();
List<CompletableFuture<PagesDTO>> completedTasks = new ArrayList<>();
CompletableFuture<<PagesDTO>> task = null;
for (InputPageDTO dcInputPageDTO : dcReqDTO.getPages()) {
String pageId = dcInputPageDTO.getPageId();
task = CompletableFuture
.supplyAsync(() -> {
try {
return convertOCRStreamToDTO(pageId, pageSequence.getAndIncrement());
} catch (HttpHostConnectException | ConnectTimeoutException e) {
LOG.error("Error connecting to Redis for pageId [{}]", pageId, e);
CaptureException e1 = new CaptureException(Error.getErrorCodes().get(ErrorCodeConstants.REDIS_CONNECTION_FAILURE),
" Connecting to the Redis failed while getting OCR for pageId ["+pageId +"] " + e.getMessage(), CaptureErrorComponent.REDIS_CACHE, e);
exceptionMap.put(pageId,e1);
} catch (CaptureException e) {
LOG.error("Error in Document Classification Engine Service while getting OCR for pageId [{}]",pageId,e);
exceptionMap.put(pageId,e);
} catch (Exception e) {
LOG.error("Error getting OCR content for the pageId [{}]", pageId,e);
CaptureException e1 = new CaptureException(Error.getErrorCodes().get(ErrorCodeConstants.TECHNICAL_FAILURE),
"Error while getting ocr content for pageId : ["+pageId +"] " + e.getMessage(), CaptureErrorComponent.REDIS_CACHE, e);
exceptionMap.put(pageId,e1);
}
return null;
}, newWorkStealingPool);
//collect all async tasks
pendingTasks.add(task);
}
//TODO: How to avoid unnecessary loops which is happening here just for the sake of waiting for the future tasks to complete???
//TODO: Looking for the best solutions
while(pendingTasks.size() > 0) {
for(CompletableFuture<PagesDTO> futureTask: pendingTasks) {
if(futureTask != null && futureTask.isDone()){
completedTasks.add(futureTask);
pageDTOList.add(futureTask.get());
}
}
pendingTasks.removeAll(completedTasks);
}
//Throw the exception cought while getting converting OCR stream to DTO - for any of the pageId
for(InputPageDTO dcInputPageDTO : dcReqDTO.getPages()) {
if(exceptionMap.containsKey(dcInputPageDTO.getPageId())) {
CaptureException e = exceptionMap.get(dcInputPageDTO.getPageId());
throw e;
}
}
LOG.info("Parallel processing time taken for {} pages = {}", dcReqDTO.getPages().size(),
org.springframework.util.StringUtils.deleteAny(Duration.between(Instant.now(), start).toString().toLowerCase(), "pt-"));
请查看我上面的代码库todo项目,我有以下两个问题,我正在寻找有关stackoverflow的建议:
1)我想避免不必要的循环(在上面的while循环中发生),最好的方法是乐观地等待所有线程完成其异步执行然后从中收集我的结果???请有人有意见吗?
2)ExecutorService实例是在我的服务bean类级别创建的,认为它将被重用于每个请求,而不是在方法的本地创建,最后关闭。我在这做吗?或者我思考过程中的任何纠正?
答案 0 :(得分:1)
只需删除while
和if
,您就可以了:
for(CompletableFuture<PagesDTO> futureTask: pendingTasks) {
completedTasks.add(futureTask);
pageDTOList.add(futureTask.get());
}
get()
(以及join()
)将在返回值之前等待将来完成。此外,无需测试null
,因为您的列表永远不会包含任何内容。
但是,您可能会改变处理异常的方式。 CompletableFuture
有一个特定的机制来处理它们,并在调用get()
/ join()
时重新抛出它们。您可能只想将已检查的例外包装在CompletionException
。