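I am running Spark in standalone mode on a single machine. I have an RDD named productUserVectors that looks like this:
[("11342",Map(..)),("21435",Map(..)),...]
normalisedVectors contains 8164 rows. I would like to get all possible pair combinations between the rows of this RDD and compute a score from the Map in each row. I use cartesian to get all possible pairs and filter them as shown below: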
scala> val normalisedVectors = productUserVectors.map(line=>utilInst.normaliseVector(line)).sortBy(_._1.toInt)
scala> val combinedRDD = normalisedVectors.cartesian(normalisedVectors).filter(line=>line._1._1.toInt > line._2._1.toInt && utilInst.filterStyleAtp(line._1._1,line._2._1))
scala> val scoresRDD = combinedRDD.map(line=>utilInst.getScore(line)).filter(line=>line._3 > 0)
scala> val finalRDD = scoresRDD.map(line=> (line._1,List((line._2,line._3)))).reduceByKey(_ ++ _)
scala> finalRDD.saveAsTextFile(outputPath)
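To make the pair-selection step concrete, here is a minimal sketch on plain Scala collections rather than Spark RDDs. The sample rows and the object name PairSketch are hypothetical, and the utilInst.filterStyleAtp check is left out because its logic is not shown in the question. It only illustrates that a cartesian product of n rows yields n * n ordered pairs and that the id comparison keeps n * (n - 1) / 2 of them.

```scala
object PairSketch {
  // Hypothetical sample rows shaped like the RDD elements: (id, Map of features).
  val rows: Seq[(String, Map[String, Double])] = Seq(
    ("11342", Map("a" -> 0.5)),
    ("21435", Map("a" -> 0.2)),
    ("30001", Map("b" -> 0.9))
  )

  // Local equivalent of rdd.cartesian(rdd): every ordered pair of rows.
  val allPairs: Seq[((String, Map[String, Double]), (String, Map[String, Double]))] =
    for (a <- rows; b <- rows) yield (a, b)

  // Same id comparison as in the question; keeps each unordered pair once.
  val kept = allPairs.filter { case (a, b) => a._1.toInt > b._1.toInt }

  // Pair counts for n rows.
  def orderedPairs(n: Long): Long = n * n
  def unorderedPairs(n: Long): Long = n * (n - 1) / 2

  def main(args: Array[String]): Unit = {
    println(s"sample: ${allPairs.size} ordered pairs, ${kept.size} kept")
    println(s"n = 8164: ${orderedPairs(8164L)} ordered pairs, " +
      s"${unorderedPairs(8164L)} kept by the id comparison")
  }
}
```

For the 8164 rows in normalisedVectors this works out to 66,650,896 ordered pairs, of which 33,321,366 pass the id comparison before filterStyleAtp is applied.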
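I have set the driver memory to 8GB and the executor memory to 2GB. Here, utilInst and its functions are used to filter pairs from the result of the cartesian of the original RDD. However, the output suggests that the job goes into an infinite loop, as shown in the log below: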
16/11/17 18:50:14 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/11/17 18:50:14 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/11/17 18:50:14 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/11/17 18:50:14 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/11/17 18:50:14 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/11/17 18:50:31 INFO executor.Executor: Finished task 3.0 in stage 0.0 (TID 3). 1491 bytes result sent to driver
16/11/17 18:50:31 INFO executor.Executor: Finished task 5.0 in stage 0.0 (TID 5). 1491 bytes result sent to driver
16/11/17 18:50:31 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 17339 ms on localhost (1/6)
16/11/17 18:50:31 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 17346 ms on localhost (2/6)
16/11/17 18:50:31 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 1491 bytes result sent to driver
16/11/17 18:50:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 17423 ms on localhost (3/6)
16/11/17 18:50:32 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1491 bytes result sent to driver
16/11/17 18:50:32 INFO executor.Executor: Finished task 2.0 in stage 0.0 (TID 2). 1491 bytes result sent to driver
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 18092 ms on localhost (4/6)
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 18063 ms on localhost (5/6)
16/11/17 18:50:32 INFO executor.Executor: Finished task 4.0 in stage 0.0 (TID 4). 1491 bytes result sent to driver
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 18073 ms on localhost (6/6)
16/11/17 18:50:32 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/11/17 18:50:32 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (union at iterateUsers.scala:84) finished in 18.125 s
16/11/17 18:50:32 INFO scheduler.DAGScheduler: looking for newly runnable stages
16/11/17 18:50:32 INFO scheduler.DAGScheduler: running: Set()
16/11/17 18:50:32 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
16/11/17 18:50:32 INFO scheduler.DAGScheduler: failed: Set()
16/11/17 18:50:32 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[11] at reduceByKey at iterateUsers.scala:87), which has no missing parents
16/11/17 18:50:32 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.9 KB, free 4.1 GB)
16/11/17 18:50:32 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1819.0 B, free 4.1 GB)
16/11/17 18:50:32 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 127.0.0.1:60497 (size: 1819.0 B, free: 4.1 GB)
16/11/17 18:50:32 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1012
16/11/17 18:50:32 INFO scheduler.DAGScheduler: Submitting 6 missing tasks from ResultStage 1 (ShuffledRDD[11] at reduceByKey at iterateUsers.scala:87)
16/11/17 18:50:32 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 6 tasks
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 6, localhost, partition 0, ANY, 5126 bytes)
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 7, localhost, partition 1, ANY, 5126 bytes)
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 1.0 (TID 8, localhost, partition 2, ANY, 5126 bytes)
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 1.0 (TID 9, localhost, partition 3, ANY, 5126 bytes)
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 1.0 (TID 10, localhost, partition 4, ANY, 5126 bytes)
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 1.0 (TID 11, localhost, partition 5, ANY, 5126 bytes)
16/11/17 18:50:32 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 6)
16/11/17 18:50:32 INFO executor.Executor: Running task 5.0 in stage 1.0 (TID 11)
16/11/17 18:50:32 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 7)
16/11/17 18:50:32 INFO executor.Executor: Running task 3.0 in stage 1.0 (TID 9)
16/11/17 18:50:32 INFO executor.Executor: Running task 2.0 in stage 1.0 (TID 8)
16/11/17 18:50:32 INFO executor.Executor: Running task 4.0 in stage 1.0 (TID 10)
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 6 ms
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 6 ms
16/11/17 18:50:32 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
16/11/17 18:50:32 INFO executor.Executor: Finished task 3.0 in stage 1.0 (TID 9). 1512 bytes result sent to driver
16/11/17 18:50:32 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 7). 1512 bytes result sent to driver
16/11/17 18:50:32 INFO executor.Executor: Finished task 4.0 in stage 1.0 (TID 10). 1512 bytes result sent to driver
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 1.0 (TID 9) in 277 ms on localhost (1/6)
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 7) in 283 ms on localhost (2/6)
16/11/17 18:50:32 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 1.0 (TID 10) in 279 ms on localhost (3/6)
16/11/17 18:50:37 INFO executor.Executor: Finished task 2.0 in stage 1.0 (TID 8). 1512 bytes result sent to driver
16/11/17 18:50:37 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 6). 1512 bytes result sent to driver
16/11/17 18:50:37 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 6) in 5120 ms on localhost (4/6)
16/11/17 18:50:37 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 1.0 (TID 8) in 5114 ms on localhost (5/6)
16/11/17 18:50:37 INFO executor.Executor: Finished task 5.0 in stage 1.0 (TID 11). 1512 bytes result sent to driver
16/11/17 18:50:37 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 1.0 (TID 11) in 5241 ms on localhost (6/6)
16/11/17 18:50:37 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/11/17 18:50:37 INFO scheduler.DAGScheduler: ResultStage 1 (count at iterateUsers.scala:88) finished in 5.254 s
16/11/17 18:50:37 INFO scheduler.DAGScheduler: Job 0 finished: count at iterateUsers.scala:88, took 23.534860 s
8164
16/11/17 18:50:37 INFO rdd.UnionRDD: Removing RDD 10 from persistence list
16/11/17 18:50:37 INFO storage.BlockManager: Removing RDD 10
16/11/17 18:50:37 INFO spark.SparkContext: Starting job: sortBy at iterateUsers.scala:91
16/11/17 18:50:37 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 191 bytes
16/11/17 18:50:37 INFO scheduler.DAGScheduler: Got job 1 (sortBy at iterateUsers.scala:91) with 6 output partitions
16/11/17 18:50:37 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (sortBy at iterateUsers.scala:91)
16/11/17 18:50:37 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 2)
16/11/17 18:50:37 INFO scheduler.DAGScheduler: Missing parents: List()
16/11/17 18:50:37 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[15] at sortBy at iterateUsers.scala:91), which has no missing parents
16/11/17 18:50:37 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 4.4 KB, free 4.1 GB)
16/11/17 18:50:37 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.5 KB, free 4.1 GB)
16/11/17 18:50:37 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 127.0.0.1:60497 (size: 2.5 KB, free: 4.1 GB)
16/11/17 18:50:37 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1012
16/11/17 18:50:37 INFO scheduler.DAGScheduler: Submitting 6 missing tasks from ResultStage 3 (MapPartitionsRDD[15] at sortBy at iterateUsers.scala:91)
16/11/17 18:50:37 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 6 tasks
16/11/17 18:50:37 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 12, localhost, partition 0, ANY, 5210 bytes)
16/11/17 18:50:37 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 13, localhost, partition 1, ANY, 5210 bytes)
16/11/17 18:50:37 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 3.0 (TID 14, localhost, partition 2, ANY, 5210 bytes)
16/11/17 18:50:37 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 3.0 (TID 15, localhost, partition 3, ANY, 5210 bytes)
16/11/17 18:50:37 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 3.0 (TID 16, localhost, partition 4, ANY, 5210 bytes)
16/11/17 18:50:37 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 3.0 (TID 17, localhost, partition 5, ANY, 5210 bytes)
16/11/17 18:50:37 INFO executor.Executor: Running task 0.0 in stage 3.0 (TID 12)
16/11/17 18:50:37 INFO executor.Executor: Running task 4.0 in stage 3.0 (TID 16)
16/11/17 18:50:37 INFO executor.Executor: Running task 3.0 in stage 3.0 (TID 15)
16/11/17 18:50:37 INFO executor.Executor: Running task 1.0 in stage 3.0 (TID 13)
16/11/17 18:50:37 INFO executor.Executor: Running task 2.0 in stage 3.0 (TID 14)
16/11/17 18:50:37 INFO executor.Executor: Running task 5.0 in stage 3.0 (TID 17)
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:50:37 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/17 18:50:38 INFO executor.Executor: Finished task 5.0 in stage 3.0 (TID 17). 1818 bytes result sent to driver
16/11/17 18:50:38 INFO executor.Executor: Finished task 4.0 in stage 3.0 (TID 16). 1818 bytes result sent to driver
16/11/17 18:50:38 INFO executor.Executor: Finished task 3.0 in stage 3.0 (TID 15). 1728 bytes result sent to driver
16/11/17 18:50:38 INFO executor.Executor: Finished task 0.0 in stage 3.0 (TID 12). 1724 bytes result sent to driver
16/11/17 18:50:38 INFO executor.Executor: Finished task 2.0 in stage 3.0 (TID 14). 1727 bytes result sent to driver
16/11/17 18:50:38 INFO executor.Executor: Finished task 1.0 in stage 3.0 (TID 13). 1734 bytes result sent to driver
16/11/17 18:50:38 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 3.0 (TID 17) in 117 ms on localhost (1/6)
16/11/17 18:50:38 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 3.0 (TID 16) in 120 ms on localhost (2/6)
16/11/17 18:50:38 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 3.0 (TID 15) in 123 ms on localhost (3/6)
16/11/17 18:50:38 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 12) in 130 ms on localhost (4/6)
16/11/17 18:50:38 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 3.0 (TID 14) in 128 ms on localhost (5/6)
16/11/17 18:50:38 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 13) in 130 ms on localhost (6/6)
16/11/17 18:50:38 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
16/11/17 18:50:38 INFO scheduler.DAGScheduler: ResultStage 3 (sortBy at iterateUsers.scala:91) finished in 0.133 s
16/11/17 18:50:38 INFO scheduler.DAGScheduler: Job 1 finished: sortBy at iterateUsers.scala:91, took 0.154474 s
16/11/17 18:50:38 INFO rdd.ShuffledRDD: Removing RDD 11 from persistence list
16/11/17 18:50:38 INFO storage.BlockManager: Removing RDD 11
16/11/17 18:50:44 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on 127.0.0.1:60497 in memory (size: 2.5 KB, free: 4.1 GB)
16/11/17 18:50:44 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on 127.0.0.1:60497 in memory (size: 1819.0 B, free: 4.1 GB)
16/11/17 18:51:37 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 127.0.0.1:60497 in memory (size: 3.1 KB, free: 4.1 GB)
16/11/17 18:52:48 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/17 18:52:48 INFO spark.SparkContext: Starting job: saveAsTextFile at iterateUsers.scala:99
16/11/17 18:52:48 INFO scheduler.DAGScheduler: Registering RDD 13 (sortBy at iterateUsers.scala:91)
16/11/17 18:52:48 INFO scheduler.DAGScheduler: Registering RDD 22 (map at iterateUsers.scala:98)
16/11/17 18:52:48 INFO scheduler.DAGScheduler: Got job 2 (saveAsTextFile at iterateUsers.scala:99) with 36 output partitions
16/11/17 18:52:48 INFO scheduler.DAGScheduler: Final stage: ResultStage 7 (saveAsTextFile at iterateUsers.scala:99)
16/11/17 18:52:48 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 6)
16/11/17 18:52:48 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 6)
16/11/17 18:52:48 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 5 (MapPartitionsRDD[13] at sortBy at iterateUsers.scala:91), which has no missing parents
16/11/17 18:52:50 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 33.5 MB, free 4.1 GB)
16/11/17 18:52:50 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 4.0 MB, free 4.1 GB)
16/11/17 18:52:50 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 127.0.0.1:60497 (size: 4.0 MB, free: 4.1 GB)
16/11/17 18:52:50 INFO memory.MemoryStore: Block broadcast_4_piece1 stored as bytes in memory (estimated size 4.0 MB, free 4.1 GB)
16/11/17 18:52:50 INFO storage.BlockManagerInfo: Added broadcast_4_piece1 in memory on 127.0.0.1:60497 (size: 4.0 MB, free: 4.1 GB)
16/11/17 18:52:50 INFO memory.MemoryStore: Block broadcast_4_piece2 stored as bytes in memory (estimated size 4.0 MB, free 4.0 GB)
16/11/17 18:52:50 INFO storage.BlockManagerInfo: Added broadcast_4_piece2 in memory on 127.0.0.1:60497 (size: 4.0 MB, free: 4.1 GB)
16/11/17 18:52:50 INFO memory.MemoryStore: Block broadcast_4_piece3 stored as bytes in memory (estimated size 2.9 MB, free 4.0 GB)
16/11/17 18:52:50 INFO storage.BlockManagerInfo: Added broadcast_4_piece3 in memory on 127.0.0.1:60497 (size: 2.9 MB, free: 4.1 GB)
16/11/17 18:52:50 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1012
16/11/17 18:52:50 INFO scheduler.DAGScheduler: Submitting 6 missing tasks from ShuffleMapStage 5 (MapPartitionsRDD[13] at sortBy at iterateUsers.scala:91)
16/11/17 18:52:50 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 6 tasks
16/11/17 18:52:50 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 5.0 (TID 18, localhost, partition 0, ANY, 5207 bytes)
16/11/17 18:52:50 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 5.0 (TID 19, localhost, partition 1, ANY, 5207 bytes)
16/11/17 18:52:50 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 5.0 (TID 20, localhost, partition 2, ANY, 5207 bytes)
16/11/17 18:52:50 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 5.0 (TID 21, localhost, partition 3, ANY, 5207 bytes)
16/11/17 18:52:50 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 5.0 (TID 22, localhost, partition 4, ANY, 5207 bytes)
16/11/17 18:52:50 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 5.0 (TID 23, localhost, partition 5, ANY, 5207 bytes)
16/11/17 18:52:50 INFO executor.Executor: Running task 0.0 in stage 5.0 (TID 18)
16/11/17 18:52:50 INFO executor.Executor: Running task 1.0 in stage 5.0 (TID 19)
16/11/17 18:52:50 INFO executor.Executor: Running task 2.0 in stage 5.0 (TID 20)
16/11/17 18:52:50 INFO executor.Executor: Running task 3.0 in stage 5.0 (TID 21)
16/11/17 18:52:50 INFO executor.Executor: Running task 4.0 in stage 5.0 (TID 22)
16/11/17 18:52:50 INFO executor.Executor: Running task 5.0 in stage 5.0 (TID 23)
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 2 ms
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:02 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/11/17 18:53:02 INFO executor.Executor: Finished task 2.0 in stage 5.0 (TID 20). 1883 bytes result sent to driver
16/11/17 18:53:02 INFO executor.Executor: Finished task 0.0 in stage 5.0 (TID 18). 1883 bytes result sent to driver
16/11/17 18:53:02 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 5.0 (TID 20) in 12006 ms on localhost (1/6)
16/11/17 18:53:02 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 5.0 (TID 18) in 12011 ms on localhost (2/6)
16/11/17 18:53:02 INFO executor.Executor: Finished task 5.0 in stage 5.0 (TID 23). 1883 bytes result sent to driver
16/11/17 18:53:02 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 5.0 (TID 23) in 12019 ms on localhost (3/6)
16/11/17 18:53:02 INFO executor.Executor: Finished task 4.0 in stage 5.0 (TID 22). 1883 bytes result sent to driver
16/11/17 18:53:02 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 5.0 (TID 22) in 12027 ms on localhost (4/6)
16/11/17 18:53:02 INFO executor.Executor: Finished task 3.0 in stage 5.0 (TID 21). 1883 bytes result sent to driver
16/11/17 18:53:02 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 5.0 (TID 21) in 12044 ms on localhost (5/6)
16/11/17 18:53:02 INFO executor.Executor: Finished task 1.0 in stage 5.0 (TID 19). 1883 bytes result sent to driver
16/11/17 18:53:02 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 5.0 (TID 19) in 12059 ms on localhost (6/6)
16/11/17 18:53:02 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
16/11/17 18:53:02 INFO scheduler.DAGScheduler: ShuffleMapStage 5 (sortBy at iterateUsers.scala:91) finished in 12.061 s
16/11/17 18:53:02 INFO scheduler.DAGScheduler: looking for newly runnable stages
16/11/17 18:53:02 INFO scheduler.DAGScheduler: running: Set()
16/11/17 18:53:02 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 6, ResultStage 7)
16/11/17 18:53:02 INFO scheduler.DAGScheduler: failed: Set()
16/11/17 18:53:02 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[22] at map at iterateUsers.scala:98), which has no missing parents
16/11/17 18:53:05 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 33.5 MB, free 4.0 GB)
16/11/17 18:53:05 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 4.0 MB, free 4.0 GB)
16/11/17 18:53:05 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on 127.0.0.1:60497 (size: 4.0 MB, free: 4.1 GB)
16/11/17 18:53:05 INFO memory.MemoryStore: Block broadcast_5_piece1 stored as bytes in memory (estimated size 4.0 MB, free 4.0 GB)
16/11/17 18:53:05 INFO storage.BlockManagerInfo: Added broadcast_5_piece1 in memory on 127.0.0.1:60497 (size: 4.0 MB, free: 4.1 GB)
16/11/17 18:53:05 INFO memory.MemoryStore: Block broadcast_5_piece2 stored as bytes in memory (estimated size 4.0 MB, free 4.0 GB)
16/11/17 18:53:05 INFO storage.BlockManagerInfo: Added broadcast_5_piece2 in memory on 127.0.0.1:60497 (size: 4.0 MB, free: 4.1 GB)
16/11/17 18:53:05 INFO memory.MemoryStore: Block broadcast_5_piece3 stored as bytes in memory (estimated size 2.9 MB, free 4.0 GB)
16/11/17 18:53:05 INFO storage.BlockManagerInfo: Added broadcast_5_piece3 in memory on 127.0.0.1:60497 (size: 2.9 MB, free: 4.1 GB)
16/11/17 18:53:05 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1012
16/11/17 18:53:05 INFO scheduler.DAGScheduler: Submitting 36 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[22] at map at iterateUsers.scala:98)
16/11/17 18:53:05 INFO scheduler.TaskSchedulerImpl: Adding task set 6.0 with 36 tasks
16/11/17 18:53:05 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 24, localhost, partition 0, ANY, 5411 bytes)
16/11/17 18:53:05 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 6.0 (TID 25, localhost, partition 1, ANY, 5420 bytes)
16/11/17 18:53:05 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 6.0 (TID 26, localhost, partition 2, ANY, 5420 bytes)
16/11/17 18:53:05 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 6.0 (TID 27, localhost, partition 3, ANY, 5420 bytes)
16/11/17 18:53:05 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 6.0 (TID 28, localhost, partition 4, ANY, 5420 bytes)
16/11/17 18:53:05 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 6.0 (TID 29, localhost, partition 5, ANY, 5420 bytes)
16/11/17 18:53:05 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 6.0 (TID 30, localhost, partition 6, ANY, 5420 bytes)
16/11/17 18:53:05 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 6.0 (TID 31, localhost, partition 7, ANY, 5411 bytes)
16/11/17 18:53:05 INFO executor.Executor: Running task 1.0 in stage 6.0 (TID 25)
16/11/17 18:53:05 INFO executor.Executor: Running task 0.0 in stage 6.0 (TID 24)
16/11/17 18:53:05 INFO executor.Executor: Running task 4.0 in stage 6.0 (TID 28)
16/11/17 18:53:05 INFO executor.Executor: Running task 2.0 in stage 6.0 (TID 26)
16/11/17 18:53:05 INFO executor.Executor: Running task 3.0 in stage 6.0 (TID 27)
16/11/17 18:53:05 INFO executor.Executor: Running task 5.0 in stage 6.0 (TID 29)
16/11/17 18:53:05 INFO executor.Executor: Running task 6.0 in stage 6.0 (TID 30)
16/11/17 18:53:05 INFO executor.Executor: Running task 7.0 in stage 6.0 (TID 31)
16/11/17 18:53:13 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on 127.0.0.1:60497 in memory (size: 4.0 MB, free: 4.1 GB)
16/11/17 18:53:13 INFO storage.BlockManagerInfo: Removed broadcast_4_piece3 on 127.0.0.1:60497 in memory (size: 2.9 MB, free: 4.1 GB)
16/11/17 18:53:13 INFO storage.BlockManagerInfo: Removed broadcast_4_piece2 on 127.0.0.1:60497 in memory (size: 4.0 MB, free: 4.1 GB)
16/11/17 18:53:13 INFO storage.BlockManagerInfo: Removed broadcast_4_piece1 on 127.0.0.1:60497 in memory (size: 4.0 MB, free: 4.1 GB)
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/11/17 18:53:30 INFO storage.ShuffleBlockFetcherIterator: Getting 6 non-empty blocks out of 6 blocks
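It gets stuck endlessly at storage.ShuffleBlockFetcherIterator in the last stage, while saving finalRDD to a text file. I do not know why this happens. Any help in resolving this issue would be highly appreciated.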
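As a rough cross-check of the log above, the following sketch relates the 36 tasks of stage 6 to the input partitioning. It assumes normalisedVectors has 6 partitions, as the 6-task stages in the log suggest, and relies on cartesian producing one output partition per pair of input partitions. The object and method names are hypothetical.

```scala
object CartesianTaskSketch {
  // cartesian produces one partition per pair of input partitions.
  def cartesianPartitions(left: Int, right: Int): Int = left * right

  // Average number of candidate pairs each task has to evaluate
  // (integer division, so the result is rounded down).
  def avgPairsPerTask(rows: Long, tasks: Int): Long = rows * rows / tasks

  def main(args: Array[String]): Unit = {
    val tasks = cartesianPartitions(6, 6)
    println(s"tasks in the cartesian stage: $tasks")
    println(s"average pairs per task: ${avgPairsPerTask(8164L, tasks)}")
  }
}
```

With 6 input partitions on each side this gives 36 tasks, matching "Submitting 36 missing tasks from ShuffleMapStage 6", and roughly 1.85 million candidate pairs per task on average.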