I have a piece of code that looks like this:
public List<Restaurant> getAllRestaurants() {
    List<Restaurant> restaurants = getRestaurants().subList(0, 7); // This takes 234 ms to execute on average.
    // There are 7 items in the restaurants list
    for (Restaurant restaurant : restaurants) {
        PlacesAPIResponse response = callGooglePlacesAPI(restaurant); // A call to the Google API should take 520ms for a given restaurant
        restaurant.setRating(response.getRating());
    }
    return restaurants;
}
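If I run the statements above in the for-each loop as shown, they execute sequentially, so I expect the method to take 234 ms + (7 * 520) ms = 3874 ms in total. That is too slow, so I want to parallelize the statements in the for-each loop so that the Google Places API is called for every restaurant in the list at the same time. In theory, because the calls to the Google API run in parallel, the response time should be 234 ms + max(API call for Restaurant 1, ..., API call for Restaurant 7) = 234 ms + 520 ms = 754 ms.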
According to this link (Java 8: Parallel FOR loop), I should be able to use parallelStream() to run the following statements concurrently:
long startTime = System.currentTimeMillis();
restaurants.parallelStream().forEach(restaurant -> {
    PlacesAPIResponse response = callGooglePlacesAPI(restaurant);
    restaurant.setRating(response.getRating());
});
long endTime = System.currentTimeMillis();
System.out.println("Calling Google Places API took " + (endTime - startTime) + " milliseconds");
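This does seem to call the Google Places API for each restaurant in parallel, but now each successive call to the Google Places API appears to take longer and longer. Here is the output of my timestamps: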
getRestaurants() took 234 milliseconds
Took 335 milliseconds to call Google Places API for Restaurant 1
Took 337 milliseconds to call Google Places API for Restaurant 2
Took 671 milliseconds to call Google Places API for Restaurant 3
Took 742 milliseconds to call Google Places API for Restaurant 4
Took 1086 milliseconds to call Google Places API for Restaurant 5
Took 1116 milliseconds to call Google Places API for Restaurant 6
Took 1470 milliseconds to call Google Places API for Restaurant 7
Calling Google Places API took 1473 milliseconds
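The 1473 ms spent on the API calls (1707 ms including getRestaurants()) is much larger than the 754 ms I expected. I have tried both parallel streams and an ExecutorService to call the Google Places API concurrently, but I cannot seem to get the response time I want. Can anyone point me in the right direction? Thanks.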
Edit: Following this post (Is there a easy way to parallelize a foreach loop in java?), here is my attempt with an ExecutorService:
startTime = System.currentTimeMillis();
ExecutorService exe = Executors.newFixedThreadPool(2); // 2 can be changed of course
for (Restaurant restaurant : restaurants) {
    exe.submit(() -> {
        PlacesAPIResponse response = callGooglePlacesAPI(restaurant); // A call to the Google API should take 520ms for a given restaurant
        restaurant.setRating(response.getRating());
    });
}
exe.shutdown();
try {
    exe.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
    e.printStackTrace();
}
endTime = System.currentTimeMillis();
System.out.println("Calling Google Places API took " + (endTime - startTime) + " milliseconds");
return restaurants;
Here is the output of my timestamps:
getRestaurants() took 234 milliseconds
Took 464 milliseconds to call Google Places API for Restaurant 1
Took 575 milliseconds to call Google Places API for Restaurant 2
Took 452 milliseconds to call Google Places API for Restaurant 3
Took 420 milliseconds to call Google Places API for Restaurant 4
Took 414 milliseconds to call Google Places API for Restaurant 5
Took 444 milliseconds to call Google Places API for Restaurant 6
Took 422 milliseconds to call Google Places API for Restaurant 7
Calling Google Places API took 1757 milliseconds
The response time of this method is still 234 ms + 1757 ms rather than 234 ms + 575 ms, and I don't understand why.
Answer 0 (score: 1)
The best approach here is to use an ExecutorService and give it each task as a separate Runnable.
Alternatively, you can use a Future here.
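A minimal sketch of the Future-based variant, under stated assumptions: fakeApiCall is a hypothetical stand-in for callGooglePlacesAPI that only sleeps, the ratings are made up, and the class and method names are illustrative. It times the same seven tasks with a pool of 2 threads and with a pool sized to the number of tasks, so the effect of the pool size can be observed directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRatings {

    // Hypothetical stand-in for callGooglePlacesAPI: sleeps to mimic a blocking network call.
    static double fakeApiCall(int restaurantId, long latencyMillis) throws InterruptedException {
        Thread.sleep(latencyMillis);
        return 4.0 + restaurantId * 0.1; // made-up rating, for illustration only
    }

    // Submits one task per restaurant, waits for all of them, and returns the elapsed wall-clock time in ms.
    static long fetchAll(int restaurantCount, int poolSize, long latencyMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (int i = 0; i < restaurantCount; i++) {
                final int id = i;
                tasks.add(() -> fakeApiCall(id, latencyMillis));
            }
            long start = System.nanoTime();
            List<Future<Double>> futures = pool.invokeAll(tasks); // blocks until every task has completed
            for (Future<Double> future : futures) {
                future.get(); // rethrows any exception thrown inside the task
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pool of 2: " + fetchAll(7, 2, 100) + " ms");
        System.out.println("pool of 7: " + fetchAll(7, 7, 100) + " ms");
    }
}
```

With simulated latency the pool of 7 should finish in roughly one call's duration, while the pool of 2 needs about four. Whether the real Google Places API behaves the same way depends on factors this sketch does not model, such as network conditions and server-side limits.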
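Answer 1 (score: 1)
This was asked a long time ago, but I suspect the cause is the thread pool size you chose. A pool size of 2 means only two jobs can run in parallel; the remaining jobs wait in the queue until a thread is free. So the time spent calling the Google Places API works out to roughly max(464+452+414+422, 575+420+444) = max(1752, 1439) = 1752 ms, which is close to the measured 1757 ms. This is explained well here.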
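The arithmetic above can be checked with a small simulation. This is only a sketch: it assumes tasks are picked up in submission order by whichever worker frees up first and ignores scheduling overhead; the class and method names are illustrative, and the durations are the ones logged in the question.

```java
import java.util.PriorityQueue;

public class PoolMakespan {

    // Returns the time at which the last task finishes when tasks are taken,
    // in submission order, by whichever of `threads` workers becomes free first.
    static long makespan(long[] durationsMillis, int threads) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>(); // time at which each worker is next idle
        for (int i = 0; i < threads; i++) {
            freeAt.add(0L);
        }
        long finish = 0;
        for (long duration : durationsMillis) {
            long start = freeAt.poll(); // earliest available worker
            long end = start + duration;
            freeAt.add(end);
            finish = Math.max(finish, end);
        }
        return finish;
    }

    public static void main(String[] args) {
        long[] measured = {464, 575, 452, 420, 414, 444, 422}; // per-call times from the question
        System.out.println(makespan(measured, 2)); // prints 1752
        System.out.println(makespan(measured, 7)); // prints 575
    }
}
```

With 7 workers every call starts immediately, so the total drops to the slowest single call, 575 ms, which is the figure the question was hoping for.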
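Answer 2 (score: 0)
I suspect your bottleneck is the connection to the internet or to the Google Places servers, not the loop. The server recognizes the same IP address and therefore queues your requests to protect itself against denial-of-service attacks. That means your loop does run in parallel, but the requests pile up on the server, which is why each one takes longer and longer to be answered and returned. To avoid this you would need something like a botnet (sending each query from a different machine), or Google Places may sell you a special connection for parallel requests.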