Map and reduce task counts in the log files do not match

Time: 2014-02-20 12:32:15

Tags: hadoop mapreduce

I am running a MapReduce job that completes successfully, but I am confused by the log files it generates.

Command used to run the MapReduce job:
hadoop jar mapred-0.0.1-SNAPSHOT.jar tcs.hadoop.org.mapreduce.MaxTemperatureDriver /priya/sample.txt /output

14/02/20 17:35:10 INFO input.FileInputFormat: Total input paths to process : 1
14/02/20 17:35:10 WARN snappy.LoadSnappy: Snappy native library is available
14/02/20 17:35:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/02/20 17:35:10 INFO snappy.LoadSnappy: Snappy native library loaded
14/02/20 17:35:10 INFO mapred.JobClient: Running job: job_201402111203_0034
14/02/20 17:35:11 INFO mapred.JobClient:  map 0% reduce 0%
14/02/20 17:35:22 INFO mapred.JobClient:  map 100% reduce 0%
14/02/20 17:35:36 INFO mapred.JobClient:  map 100% reduce 100%
14/02/20 17:35:39 INFO mapred.JobClient: Job complete: job_201402111203_0034
14/02/20 17:35:40 INFO mapred.JobClient: Counters: 26
14/02/20 17:35:40 INFO mapred.JobClient:   Job Counters 
14/02/20 17:35:40 INFO mapred.JobClient:     Launched reduce tasks=2
14/02/20 17:35:40 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=11900
14/02/20 17:35:40 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/20 17:35:40 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/02/20 17:35:40 INFO mapred.JobClient:     Launched map tasks=1
14/02/20 17:35:40 INFO mapred.JobClient:     Data-local map tasks=1
14/02/20 17:35:40 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=23142
14/02/20 17:35:40 INFO mapred.JobClient:   FileSystemCounters
14/02/20 17:35:40 INFO mapred.JobClient:     FILE_BYTES_READ=34
14/02/20 17:35:40 INFO mapred.JobClient:     HDFS_BYTES_READ=633
14/02/20 17:35:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=154973
14/02/20 17:35:40 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=17
14/02/20 17:35:40 INFO mapred.JobClient:   Map-Reduce Framework
14/02/20 17:35:40 INFO mapred.JobClient:     Map input records=5
14/02/20 17:35:40 INFO mapred.JobClient:     Reduce shuffle bytes=34
14/02/20 17:35:40 INFO mapred.JobClient:     Spilled Records=4
14/02/20 17:35:40 INFO mapred.JobClient:     Map output bytes=45
14/02/20 17:35:40 INFO mapred.JobClient:     CPU time spent (ms)=4420
14/02/20 17:35:40 INFO mapred.JobClient:     Total committed heap usage (bytes)=172822528
14/02/20 17:35:40 INFO mapred.JobClient:     Combine input records=5
14/02/20 17:35:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=103
14/02/20 17:35:40 INFO mapred.JobClient:     Reduce input records=2
14/02/20 17:35:40 INFO mapred.JobClient:     Reduce input groups=2
14/02/20 17:35:40 INFO mapred.JobClient:     Combine output records=2
14/02/20 17:35:40 INFO mapred.JobClient:     Physical memory (bytes) snapshot=300945408
14/02/20 17:35:40 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=7375564800
14/02/20 17:35:40 INFO mapred.JobClient:     Map output records=5
14/02/20 17:35:40 INFO mapred.JobClient:     Reduce output records=2

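So I can see that one map task and two reduce tasks were launched.

But when I look at the job history log in the $HADOOP_HOME/logs/history directory, I find that the JobTracker triggered 5 tasks, as shown below (only the relevant log lines are included). I cannot understand why there are 5 tasks instead of 3.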

MapAttempt TASK_TYPE="SETUP" TASKID="task_201402111203_0034_m_000002" TASK_ATTEMPT_ID="attempt_201402111203_0034_m_000002_0" START_TIME="1392897911096" TRACKER_NAME="tracker_IMBDBOX1:IMBDBOX1/157.227.44.207:40925" HTTP_PORT="50060" .

MapAttempt TASK_TYPE="MAP" TASKID="task_201402111203_0034_m_000000" TASK_ATTEMPT_ID="attempt_201402111203_0034_m_000000_0" TASK_STATUS="SUCCESS" FINISH_TIME="1392897989806" HOSTNAME="/defaul

ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201402111203_0034_r_000001" TASK_ATTEMPT_ID="attempt_201402111203_0034_r_000001_0" START_TIME="1392897947754" TRACKER_NAME="tracker_IMBDBOX3:localhost/127.0.0.1:34625" HTTP_PORT="50060" 

ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201402111203_0034_r_000000" TASK_ATTEMPT_ID="attempt_201402111203_0034_r_000000_0" START_TIME="1392897992388" TRACKER_NAME="tracker_IMBDBOX4:localhost/127.0.0.1:59439" HTTP_PORT="50060" .

MapAttempt TASK_TYPE="CLEANUP" TASKID="task_201402111203_0034_m_000001" TASK_ATTEMPT_ID="attempt_201402111203_0034_m_000001_0" START_TIME="1392898004324" TRACKER_NAME="tracker_IMBDBOX4:localhost/127.0.0.1:59439" HTTP_PORT="50060" 

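Also, when I look in the userlogs directory at $HADOOP_HOME/logs/userlogs, I can see that only one map task has produced logs. Why were no logs generated for the other map and reduce tasks?

Please help. Thanks!

User log directory: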

total 8
-rw-r----- 1 hadoop hdusers 497 2014-02-20 17:35 job-acls.xml
lrwxrwxrwx 1 hadoop hdusers  96 2014-02-20 17:35 attempt_201402111203_0034_m_000002_0 -> /app/hadoop/tmp/mapred/local/userlogs/job_201402111203_0034/attempt_201402111203_0034_m_000002_0

1 Answer:

Answer 0 (score: 0)

Note these strings:

TASK_TYPE="SETUP"
TASK_TYPE="CLEANUP"

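Besides the map and reduce tasks themselves, Hadoop runs auxiliary tasks during the lifecycle of a MapReduce job: a job setup task and a job cleanup task. Both are recorded in the history log as MapAttempt entries, which is why you see 5 entries (1 setup + 1 map + 2 reduce + 1 cleanup) rather than 3.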
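
As a rough check of this explanation, the Python sketch below tallies history entries by their TASK_TYPE field. The sample lines are abbreviated from the entries quoted in the question (only TASK_TYPE and TASKID are kept), and `count_task_types` is a hypothetical helper written for this illustration, not part of Hadoop.

```python
import re
from collections import Counter

# Matches KEY="value" pairs as they appear in the history lines quoted above.
PAIR = re.compile(r'(\w+)="([^"]*)"')


def count_task_types(lines):
    """Count distinct task IDs per TASK_TYPE across job history lines."""
    task_type_by_id = {}
    for line in lines:
        fields = dict(PAIR.findall(line))
        # Skip lines that do not describe a task attempt.
        if "TASKID" in fields and "TASK_TYPE" in fields:
            task_type_by_id[fields["TASKID"]] = fields["TASK_TYPE"]
    return Counter(task_type_by_id.values())


# Abbreviated copies of the five history entries from the question.
sample = [
    'MapAttempt TASK_TYPE="SETUP" TASKID="task_201402111203_0034_m_000002"',
    'MapAttempt TASK_TYPE="MAP" TASKID="task_201402111203_0034_m_000000"',
    'ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201402111203_0034_r_000001"',
    'ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201402111203_0034_r_000000"',
    'MapAttempt TASK_TYPE="CLEANUP" TASKID="task_201402111203_0034_m_000001"',
]

counts = count_task_types(sample)
for task_type, n in sorted(counts.items()):
    print(task_type, n)
print("total tasks:", sum(counts.values()))
```

Keying on TASKID rather than counting lines means that several history lines for the same task (for example a start record and a finish record) are counted once.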