Why do task execution times vary so much within the same task set?

Asked: 2015-07-04 09:58:24

Tags: apache-spark

I am doing some string processing on Spark. My code snippet:

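// `finder` is defined elsewhere (not shown in the question)
val rdd = sc.objectFile[(String, String)]("some hdfs url", 1)
rdd.cache().count() // run an action so the RDD is actually cached

// Merges two partial results by feeding every entry back into `finder`
val combOp = (f: List[String], g: List[String]) => {
  for (x <- f) {
    finder.processEntry(x)
  }
  for (x <- g) {
    finder.processEntry(x)
  }
  finder.result
}

// Processes each partition with `finder`, then reduces the per-partition results
val res = rdd.mapPartitions(x => {
  for (e <- x) {
    finder.processEntry(e)
  }
  Iterator(finder.result)
}, preservesPartitioning = true).reduce(combOp)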

My dataset is about 10 GB. I am running Spark on a 24-core machine with 48 GB of memory. Configuration file:

spark.driver.memory 1g
spark.executor.memory 30g
spark.executor.extraJavaOptions -Xloggc:/var/log/gcmemory.log -XX:+PrintGCDetails
spark.executor.cores 4

Excerpt from the execution log:

INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, 10.60.1.143, ANY, 1642 bytes)
INFO BlockManagerMasterEndpoint: Registering block manager 10.60.1.143:42850 with 15.5 GB RAM, BlockManagerId(0, 10.60.1.143, 42850)
INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.60.1.143:42850 (size: 1766.0 B, free: 15.5 GB)
INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.60.1.143:42850 (size: 16.8 KB, free: 15.5 GB)
INFO BlockManagerInfo: Added rdd_1_3 in memory on 10.60.1.143:42850 (size: 219.7 MB, free: 15.3 GB)
INFO BlockManagerInfo: Added rdd_1_1 in memory on 10.60.1.143:42850 (size: 229.7 MB, free: 15.1 GB)
INFO BlockManagerInfo: Added rdd_1_2 in memory on 10.60.1.143:42850 (size: 221.5 MB, free: 14.9 GB)
INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 6345 ms on 10.60.1.143 (1/34)
INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 6351 ms on 10.60.1.143 (2/34)
INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 6354 ms on 10.60.1.143 (3/34)
INFO BlockManagerInfo: Added rdd_1_0 in memory on 10.60.1.143:42850 (size: 220.6 MB, free: 14.7 GB)
INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 6454 ms on 10.60.1.143 (4/34)
INFO BlockManagerInfo: Added rdd_1_5 in memory on 10.60.1.143:42850 (size: 219.9 MB, free: 14.4 GB)
INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 2287 ms on 10.60.1.143 (5/34)
INFO BlockManagerInfo: Added rdd_1_4 in memory on 10.60.1.143:42850 (size: 222.7 MB, free: 14.2 GB)
INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, 10.60.1.143, ANY, 1642 bytes)
INFO BlockManagerInfo: Added rdd_1_6 in memory on 10.60.1.143:42850 (size: 210.7 MB, free: 14.0 GB)
INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 2350 ms on 10.60.1.143 (6/34)
INFO TaskSetManager: Starting task 10.0 in stage 0.0 (TID 10, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 2356 ms on 10.60.1.143 (7/34)
INFO BlockManagerInfo: Added rdd_1_7 in memory on 10.60.1.143:42850 (size: 214.6 MB, free: 13.8 GB)
INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 2289 ms on 10.60.1.143 (8/34)
INFO BlockManagerInfo: Added rdd_1_8 in memory on 10.60.1.143:42850 (size: 216.3 MB, free: 13.6 GB)
INFO TaskSetManager: Starting task 12.0 in stage 0.0 (TID 12, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 2430 ms on 10.60.1.143 (9/34)
INFO BlockManagerInfo: Added rdd_1_11 in memory on 10.60.1.143:42850 (size: 216.5 MB, free: 13.4 GB)
INFO BlockManagerInfo: Added rdd_1_10 in memory on 10.60.1.143:42850 (size: 216.5 MB, free: 13.2 GB)
INFO TaskSetManager: Starting task 13.0 in stage 0.0 (TID 13, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 11.0 in stage 0.0 (TID 11) in 2416 ms on 10.60.1.143 (10/34)
INFO TaskSetManager: Starting task 14.0 in stage 0.0 (TID 14, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 10.0 in stage 0.0 (TID 10) in 2445 ms on 10.60.1.143 (11/34)
INFO BlockManagerInfo: Added rdd_1_9 in memory on 10.60.1.143:42850 (size: 231.4 MB, free: 12.9 GB)
INFO TaskSetManager: Starting task 15.0 in stage 0.0 (TID 15, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 2528 ms on 10.60.1.143 (12/34)
INFO BlockManagerInfo: Added rdd_1_12 in memory on 10.60.1.143:42850 (size: 217.3 MB, free: 12.7 GB)
INFO TaskSetManager: Starting task 16.0 in stage 0.0 (TID 16, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 12.0 in stage 0.0 (TID 12) in 1797 ms on 10.60.1.143 (13/34)
INFO BlockManagerInfo: Added rdd_1_14 in memory on 10.60.1.143:42850 (size: 215.8 MB, free: 12.5 GB)
INFO TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 14.0 in stage 0.0 (TID 14) in 1748 ms on 10.60.1.143 (14/34)
INFO BlockManagerInfo: Added rdd_1_13 in memory on 10.60.1.143:42850 (size: 220.9 MB, free: 12.3 GB)
INFO TaskSetManager: Starting task 18.0 in stage 0.0 (TID 18, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 1812 ms on 10.60.1.143 (15/34)
INFO BlockManagerInfo: Added rdd_1_15 in memory on 10.60.1.143:42850 (size: 233.8 MB, free: 12.1 GB)
INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 15.0 in stage 0.0 (TID 15) in 1756 ms on 10.60.1.143 (16/34)
INFO BlockManagerInfo: Added rdd_1_16 in memory on 10.60.1.143:42850 (size: 221.6 MB, free: 11.9 GB)
INFO TaskSetManager: Starting task 20.0 in stage 0.0 (TID 20, 10.60.1.143, ANY, 1642 bytes)
INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 2600 ms on 10.60.1.143 (17/34)

Why do the first few tasks in the same task set take so much longer to execute than the later ones? Any help is much appreciated.

1 Answer:

Answer 0: (score: 0)

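A common cause of stragglers (executors taking longer on some partitions than on others) is unevenly partitioned data. I suggest you try repartitioning your data. The Spark UI may also have useful information (you can look at the input size of each task, for example). Sometimes certain machines are simply slow for random reasons (especially in virtualized environments, where some machines can have noisy neighbors). In that case you can try enabling speculative execution (see https://spark.apache.org/docs/latest/configuration.html) by setting the corresponding flag, so that Spark can try to work around the problem on another executor if a task happens to run slowly on one machine.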
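
The suggestions above could be tried roughly as in the following sketch. It is an illustration under assumptions, not the asker's actual job: the object name `StragglerCheck`, the helper `partitionSizes`, the toy input data, and the `local[4]` master are all hypothetical. The Spark calls it relies on (`mapPartitionsWithIndex`, `repartition`, and the `spark.speculation` property) are part of the standard RDD API and configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object StragglerCheck {

  // Counts the records in each partition. Very different counts
  // suggest that the data is unevenly partitioned.
  def partitionSizes[T](rdd: RDD[T]): Array[(Int, Long)] =
    rdd.mapPartitionsWithIndex { (idx, it) =>
      Iterator((idx, it.size.toLong))
    }.collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("straggler-check")
      // Assumption: local mode, only so the sketch runs on one machine.
      .setMaster("local[4]")
      // Speculative execution re-launches slow tasks elsewhere. It is
      // mainly useful on a multi-node cluster, not in local mode.
      .set("spark.speculation", "true")
    val sc = new SparkContext(conf)
    try {
      // Toy stand-in for the real input: the filter leaves the later
      // partitions with far fewer records than the earlier ones.
      val skewed = sc.parallelize(1 to 100000, 8)
        .filter(n => n < 50000 || n % 10 == 0)

      println("Before repartition:")
      partitionSizes(skewed).foreach { case (i, n) =>
        println(s"  partition $i: $n records")
      }

      // repartition shuffles the data to spread it more evenly.
      val balanced = skewed.repartition(8)

      println("After repartition:")
      partitionSizes(balanced).foreach { case (i, n) =>
        println(s"  partition $i: $n records")
      }
    } finally {
      sc.stop()
    }
  }
}
```

If the per-partition counts are already similar before repartitioning, skew is probably not the cause, and the per-task metrics in the Spark UI are the next place to look.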