How do I execute the statements in a for-each loop in parallel in Java?

Date: 2019-04-09 16:28:38

Tags: java multithreading concurrency parallel-processing

I have a piece of code that looks like this:

public List<Restaurant> getAllRestaurants() {
    List<Restaurant> restaurants = getRestaurants().subList(0, 7); // This takes 234 ms to execute on average.    

    // There are 7 items in the restaurants list
    for (Restaurant restaurant : restaurants) {
        PlacesAPIResponse response = callGooglePlacesAPI(restaurant); // A call to the Google API should take 520ms for a given restaurant
        restaurant.setRating(response.getRating());
    }
    return restaurants;
}

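If I run the statements inside the for-each loop as shown, they execute sequentially, so I expect the method to take 234 ms + (7 × 520) ms = 3874 ms in total. That is too slow, so I want to parallelize the statements in the for-each loop so that the Google Places API is called for every restaurant in the list at the same time. In theory, because the calls to the Google API are made in parallel, the response time should be 234 ms + max(API call for Restaurant 1, ..., API call for Restaurant 7) = 234 ms + 520 ms = 754 ms.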

According to this link (Java 8: Parallel FOR loop), I should be able to use parallelStream() to execute the statements concurrently:

long startTime = System.currentTimeMillis();
restaurants.parallelStream().forEach(restaurant -> {
    PlacesAPIResponse response = callGooglePlacesAPI(restaurant);
    restaurant.setRating(response.getRating());
});
long endTime = System.currentTimeMillis();
System.out.println("Calling Google Places API took " + (endTime - startTime) + " milliseconds");

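This does seem to call the Google Places API for each restaurant in parallel, but now each call to the Google Places API appears to take longer and longer. Here is the output of my timestamps: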

getRestaurants() took 234 milliseconds
Took 335 milliseconds to call Google Places API for Restaurant 1
Took 337 milliseconds to call Google Places API for Restaurant 2
Took 671 milliseconds to call Google Places API for Restaurant 3
Took 742 milliseconds to call Google Places API for Restaurant 4
Took 1086 milliseconds to call Google Places API for Restaurant 5
Took 1116 milliseconds to call Google Places API for Restaurant 6
Took 1470 milliseconds to call Google Places API for Restaurant 7
Calling Google Places API took 1473 milliseconds

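The total of 234 ms + 1473 ms = 1707 ms is much larger than the 754 ms I expected. I have tried both parallel streams and an ExecutorService to call the Google Places API concurrently, but I cannot seem to get the response time I want. Can anyone point me in the right direction? Thanks.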
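One possible reading of the log above: the timings arrive in pairs roughly 335 ms apart, which would be consistent with only about two calls being in flight at a time. Parallel streams run on the common ForkJoinPool, whose default parallelism is derived from the number of available processors, and the thread that starts the stream also takes part in the work. The sketch below only prints those values so the hypothesis can be checked on a given machine; the class name and the `estimatedConcurrency` helper are illustrative, and the estimate is rough rather than a guarantee.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolCheck {
    // Rough estimate of how many forEach bodies a parallel stream can run at once:
    // the common pool's worker count plus the thread that started the stream.
    static int estimatedConcurrency() {
        return ForkJoinPool.getCommonPoolParallelism() + 1;
    }

    public static void main(String[] args) {
        System.out.println("Available processors: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("Estimated concurrent tasks: " + estimatedConcurrency());
    }
}
```

If the estimate comes out at 2, seven blocking calls of a few hundred milliseconds each would be expected to finish in about four waves, which is in the range of the 1473 ms observed.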

Edit: Based on this post (Is there a easy way to parallelize a foreach loop in java?), here is my attempt with ExecutorService:

startTime = System.currentTimeMillis();
ExecutorService exe = Executors.newFixedThreadPool(2);   // 2 can be changed of course
for (Restaurant restaurant : restaurants) {
    exe.submit(() -> {
        PlacesAPIResponse response = callGooglePlacesAPI(restaurant); // A call to the Google API should take 520ms for a given restaurant
        restaurant.setRating(response.getRating());
    });
}    

exe.shutdown();
try {
    exe.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
    e.printStackTrace();
}    

endTime = System.currentTimeMillis();
System.out.println("Calling Google Places API took " + (endTime - startTime) + " milliseconds");
return restaurants; 

Here is the output of my timestamps:

getRestaurants() took 234 milliseconds
Took 464 milliseconds to call Google Places API for Restaurant 1
Took 575 milliseconds to call Google Places API for Restaurant 2
Took 452 milliseconds to call Google Places API for Restaurant 3
Took 420 milliseconds to call Google Places API for Restaurant 4
Took 414 milliseconds to call Google Places API for Restaurant 5
Took 444 milliseconds to call Google Places API for Restaurant 6
Took 422 milliseconds to call Google Places API for Restaurant 7
Calling Google Places API took 1757 milliseconds

The response time of this method is still 234 ms + 1757 ms rather than 234 ms + 575 ms, and I don't understand why.

3 answers:

Answer 0: (score: 1)

The best approach here is to use an ExecutorService and hand it each task as a separate Runnable.

Alternatively, you can use a Future here.
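A minimal sketch of the Future-based variant, under stated assumptions: `Restaurant` and `fetchRating` below are hypothetical stand-ins for the question's `Restaurant` class and `callGooglePlacesAPI` (the stand-in just sleeps to mimic network latency), and the pool is sized to the number of restaurants so that no task waits in the queue. Whether one thread per request is acceptable depends on the list size and on any rate limits of the remote service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRatings {
    // Hypothetical stand-in for the question's Restaurant class.
    static class Restaurant {
        final String name;
        double rating;
        Restaurant(String name) { this.name = name; }
    }

    // Hypothetical stand-in for callGooglePlacesAPI: sleeps to mimic a remote call.
    static double fetchRating(Restaurant r) throws InterruptedException {
        Thread.sleep(200);
        return 4.0;
    }

    static void rateAll(List<Restaurant> restaurants) {
        if (restaurants.isEmpty()) return; // newFixedThreadPool rejects a size of 0
        // One thread per restaurant, so every call can start immediately.
        ExecutorService pool = Executors.newFixedThreadPool(restaurants.size());
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (Restaurant r : restaurants) {
                tasks.add(() -> fetchRating(r));
            }
            // invokeAll blocks until every task has completed and returns the
            // futures in the same order as the submitted tasks.
            List<Future<Double>> futures = pool.invokeAll(tasks);
            for (int i = 0; i < futures.size(); i++) {
                restaurants.get(i).rating = futures.get(i).get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted while fetching ratings", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A rating lookup failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Restaurant> restaurants = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            restaurants.add(new Restaurant("Restaurant " + i));
        }
        long start = System.currentTimeMillis();
        rateAll(restaurants);
        System.out.println("Took " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Collecting the results from the futures on the calling thread, rather than mutating each restaurant inside the task, keeps all writes to the list's elements on one thread.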

Answer 1: (score: 1)

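This was asked a long time ago, but I think the cause is the thread pool size you chose. A pool size of 2 means that only two jobs can run in parallel; the remaining jobs are queued until a thread is freed. So the time for your Google Places API calls works out to roughly max(464+452+414+422, 575+420+444) = max(1752, 1439) = 1752 ms, which is close to the actual value. This is explained well here.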
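The arithmetic above can be reproduced with a small simulation. This is only a sketch of how a fixed pool hands queued tasks to whichever thread frees up first; it ignores scheduling overhead, and the `PoolMakespan` class and `makespan` method are illustrative names. The durations are the per-call timings from the question's ExecutorService log.

```java
import java.util.PriorityQueue;

public class PoolMakespan {
    // Simulates a fixed-size pool: tasks are taken in submission order and each one
    // starts on the worker that becomes free earliest. Returns the finish time of
    // the last task, in the same unit as the durations.
    static long makespan(long[] durationsMs, int poolSize) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < poolSize; i++) {
            freeAt.add(0L); // every worker is idle at time 0
        }
        long finish = 0;
        for (long duration : durationsMs) {
            long end = freeAt.poll() + duration; // earliest-free worker runs the task
            freeAt.add(end);
            finish = Math.max(finish, end);
        }
        return finish;
    }

    public static void main(String[] args) {
        long[] measured = {464, 575, 452, 420, 414, 444, 422};
        System.out.println(makespan(measured, 2)); // prints 1752
        System.out.println(makespan(measured, 7)); // prints 575
    }
}
```

With two threads the simulated total matches the 1752 ms computed above; with one thread per task it drops to the longest single call, 575 ms, which is the figure the question expected.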

Answer 2: (score: 0)

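I suspect your bottleneck is the connection to the internet or to the Google Places servers, not the loop. The server recognizes the same IP address and therefore queues your requests to protect itself against denial-of-service attacks. That means your loop does run in parallel, but the requests pile up on the server, which is why each request takes longer and longer to be answered and returned. To avoid this, you would need something like a botnet (sending each query from a different computer), or Google Places may sell you a special connection for parallel requests.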