这是我的MR工作的控制台上的输出。工作顺利完成。但我有两个问题。
1)我指定了-D mapred.reduce.slowstart.completed.maps=0.75
。然而,当地图完成75%时,减速器没有启动,如下所示
2)我指定了-D mapred.reduce.tasks=2
。但启动的减速器任务数量为3(如下所示)。
为什么这两个参数还没有被照顾?
hadoop jar hadoop-examples-1.2.1.jar wordcount -D mapred.reduce.slowstart.completed.maps=0.75 -D mapred.reduce.tasks=2 /data /output/result1
2014-07-06 22:25:11.733 java[3236:1903] Unable to load realm info from SCDynamicStore
14/07/06 22:25:13 INFO input.FileInputFormat: Total input paths to process : 4
14/07/06 22:25:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/06 22:25:13 WARN snappy.LoadSnappy: Snappy native library not loaded
14/07/06 22:25:13 INFO mapred.JobClient: Running job: job_201407061919_0015
14/07/06 22:25:14 INFO mapred.JobClient: map 0% reduce 0%
14/07/06 22:25:49 INFO mapred.JobClient: map 25% reduce 0%
14/07/06 22:25:50 INFO mapred.JobClient: map 50% reduce 0%
14/07/06 22:26:08 INFO mapred.JobClient: map 75% reduce 0%
14/07/06 22:26:14 INFO mapred.JobClient: map 100% reduce 0%
14/07/06 22:26:23 INFO mapred.JobClient: map 100% reduce 8%
14/07/06 22:26:26 INFO mapred.JobClient: map 100% reduce 33%
14/07/06 22:26:29 INFO mapred.JobClient: map 100% reduce 37%
14/07/06 22:26:30 INFO mapred.JobClient: map 100% reduce 54%
14/07/06 22:26:33 INFO mapred.JobClient: map 100% reduce 66%
14/07/06 22:26:37 INFO mapred.JobClient: map 100% reduce 86%
14/07/06 22:26:39 INFO mapred.JobClient: map 100% reduce 100%
14/07/06 22:26:50 INFO mapred.JobClient: Job complete: job_201407061919_0015
14/07/06 22:26:50 INFO mapred.JobClient: Counters: 29
14/07/06 22:26:50 INFO mapred.JobClient: Job Counters
14/07/06 22:26:50 INFO mapred.JobClient: Launched reduce tasks=3
14/07/06 22:26:50 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=107522
14/07/06 22:26:50 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/07/06 22:26:50 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/07/06 22:26:50 INFO mapred.JobClient: Launched map tasks=4
14/07/06 22:26:50 INFO mapred.JobClient: Data-local map tasks=4
14/07/06 22:26:50 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=51153
14/07/06 22:26:50 INFO mapred.JobClient: File Output Format Counters
14/07/06 22:26:50 INFO mapred.JobClient: Bytes Written=880862
14/07/06 22:26:50 INFO mapred.JobClient: FileSystemCounters
14/07/06 22:26:50 INFO mapred.JobClient: FILE_BYTES_READ=2217446
14/07/06 22:26:50 INFO mapred.JobClient: HDFS_BYTES_READ=3672001
14/07/06 22:26:50 INFO mapred.JobClient: FILE_BYTES_WRITTEN=4030974
14/07/06 22:26:50 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=880862
14/07/06 22:26:50 INFO mapred.JobClient: File Input Format Counters
14/07/06 22:26:50 INFO mapred.JobClient: Bytes Read=3671571
14/07/06 22:26:50 INFO mapred.JobClient: Map-Reduce Framework
14/07/06 22:26:50 INFO mapred.JobClient: Map output materialized bytes=1474437
14/07/06 22:26:50 INFO mapred.JobClient: Map input records=77934
14/07/06 22:26:50 INFO mapred.JobClient: Reduce shuffle bytes=1474437
14/07/06 22:26:50 INFO mapred.JobClient: Spilled Records=255974
14/07/06 22:26:50 INFO mapred.JobClient: Map output bytes=6076197
14/07/06 22:26:50 INFO mapred.JobClient: Total committed heap usage (bytes)=589447168
14/07/06 22:26:50 INFO mapred.JobClient: CPU time spent (ms)=19030
14/07/06 22:26:50 INFO mapred.JobClient: Combine input records=629184
14/07/06 22:26:50 INFO mapred.JobClient: SPLIT_RAW_BYTES=430
14/07/06 22:26:50 INFO mapred.JobClient: Reduce input records=102328
14/07/06 22:26:50 INFO mapred.JobClient: Reduce input groups=82339
14/07/06 22:26:50 INFO mapred.JobClient: Combine output records=102328
14/07/06 22:26:50 INFO mapred.JobClient: Physical memory (bytes) snapshot=888221696
14/07/06 22:26:50 INFO mapred.JobClient: Reduce output records=82339
14/07/06 22:26:50 INFO mapred.JobClient: Virtual memory (bytes) snapshot=6509461504
14/07/06 22:26:50 INFO mapred.JobClient: Map output records=629184
编辑:命令行上没有任何参数:
hadoop jar hadoop-examples-1.2.1.jar wordcount /data/ /output/results2
2014-07-06 20:05:29.428 java[2869:1903] Unable to load realm info from SCDynamicStore
14/07/06 20:05:29 INFO input.FileInputFormat: Total input paths to process : 4
14/07/06 20:05:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/06 20:05:29 WARN snappy.LoadSnappy: Snappy native library not loaded
14/07/06 20:05:30 INFO mapred.JobClient: Running job: job_201407061919_0009
14/07/06 20:05:31 INFO mapred.JobClient: map 0% reduce 0%
14/07/06 20:05:43 INFO mapred.JobClient: map 25% reduce 0%
14/07/06 20:05:47 INFO mapred.JobClient: map 50% reduce 0%
14/07/06 20:05:48 INFO mapred.JobClient: map 100% reduce 0%
14/07/06 20:05:55 INFO mapred.JobClient: map 100% reduce 33%
14/07/06 20:05:57 INFO mapred.JobClient: map 100% reduce 100%
14/07/06 20:06:00 INFO mapred.JobClient: Job complete: job_201407061919_0009
14/07/06 20:06:00 INFO mapred.JobClient: Counters: 29
14/07/06 20:06:00 INFO mapred.JobClient: Job Counters
14/07/06 20:06:00 INFO mapred.JobClient: Launched reduce tasks=1
14/07/06 20:06:00 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=53468
14/07/06 20:06:00 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/07/06 20:06:00 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/07/06 20:06:00 INFO mapred.JobClient: Launched map tasks=4
14/07/06 20:06:00 INFO mapred.JobClient: Data-local map tasks=4
14/07/06 20:06:00 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14440
14/07/06 20:06:00 INFO mapred.JobClient: File Output Format Counters
14/07/06 20:06:00 INFO mapred.JobClient: Bytes Written=880862
14/07/06 20:06:00 INFO mapred.JobClient: FileSystemCounters
14/07/06 20:06:00 INFO mapred.JobClient: FILE_BYTES_READ=2214915
14/07/06 20:06:00 INFO mapred.JobClient: HDFS_BYTES_READ=3672001
14/07/06 20:06:00 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3974001
14/07/06 20:06:00 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=880862
14/07/06 20:06:00 INFO mapred.JobClient: File Input Format Counters
14/07/06 20:06:00 INFO mapred.JobClient: Bytes Read=3671571
14/07/06 20:06:00 INFO mapred.JobClient: Map-Reduce Framework
14/07/06 20:06:00 INFO mapred.JobClient: Map output materialized bytes=1474413
14/07/06 20:06:00 INFO mapred.JobClient: Map input records=77934
14/07/06 20:06:00 INFO mapred.JobClient: Reduce shuffle bytes=1474413
14/07/06 20:06:00 INFO mapred.JobClient: Spilled Records=255974
14/07/06 20:06:00 INFO mapred.JobClient: Map output bytes=6076197
14/07/06 20:06:00 INFO mapred.JobClient: Total committed heap usage (bytes)=557662208
14/07/06 20:06:00 INFO mapred.JobClient: CPU time spent (ms)=10370
14/07/06 20:06:00 INFO mapred.JobClient: Combine input records=629184
14/07/06 20:06:00 INFO mapred.JobClient: SPLIT_RAW_BYTES=430
14/07/06 20:06:00 INFO mapred.JobClient: Reduce input records=102328
14/07/06 20:06:00 INFO mapred.JobClient: Reduce input groups=82339
14/07/06 20:06:00 INFO mapred.JobClient: Combine output records=102328
14/07/06 20:06:00 INFO mapred.JobClient: Physical memory (bytes) snapshot=802287616
14/07/06 20:06:00 INFO mapred.JobClient: Reduce output records=82339
14/07/06 20:06:00 INFO mapred.JobClient: Virtual memory (bytes) snapshot=5418221568
14/07/06 20:06:00 INFO mapred.JobClient: Map output records=629184
答案 0 :(得分:2)
1)我指定了-D mapred.reduce.slowstart.completed.maps = 0.75。然而,当地图完成75%时,减速器没有启动,如下所示
75%的地图进度并不一定意味着75%的地图任务已完成。此命令的含义是,当地图任务的75%(在您的情况下为4个中的3个地图任务)完成时,减速器将启动混洗阶段。有关如何定义进度的详细信息,请参阅this post。
2)我指定了-D mapred.reduce.tasks = 2。但启动的减速器任务数量为3(如下所示)。
可以在多个节点(称为“speculative execution”)上启动相同的reduce任务。当其中一个节点首先完成时,另一个节点会获得此任务的“终止”信号。
另一种可能性是,其中一个reduce任务在一个节点中失败,并在另一个节点中成功执行。