Hadoop map task initialization takes too long

Time: 2012-04-11 09:28:25

Tags: hadoop

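OK, so lately, in every Hadoop job I run, I get a delay of 3 minutes 10 seconds on one particular map node (the master, working as a slave). After that initialization delay, the task goes back to normal and executes instantly.

For example, when running the QuasiMonteCarlo example: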

Task Id                                 Start Time      Finish Time
attempt_201204101957_0006_m_000003_0    10/04 20:14:54  10/04 20:18:05 (3mins, 10sec)   /default-rack/master

2012-04-10 20:18:04,470 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2012-04-10 20:18:04,646 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2012-04-10 20:18:04,647 WARN org.apache.hadoop.conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
2012-04-10 20:18:04,751 INFO org.apache.hadoop.mapreduce.util.ProcessTree: setsid exited with exit code 0
2012-04-10 20:18:04,754 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.mapreduce.util.LinuxResourceCalculatorPlugin@79ee2c2c
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 100
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: soft limit at 83886080
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 104857600
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: kvstart = 26214396; length = 6553600
2012-04-10 20:18:04,939 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2012-04-10 20:18:04,940 INFO org.apache.hadoop.mapred.MapTask: Spilling map output
2012-04-10 20:18:04,940 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 18; bufvoid = 104857600
2012-04-10 20:18:04,940 INFO org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214392(104857568); length = 5/6553600
2012-04-10 20:18:04,972 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2012-04-10 20:18:04,975 INFO org.apache.hadoop.mapred.Task: Task:attempt_201204101957_0006_m_000003_0 is done. And is in the process of commiting
2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201204101957_0006_m_000003_0' done.

The TaskTracker log is more telling:

2012-04-10 **20:14:54,615** INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201204101957_0006_m_000003_0 which needs 1 slots
2012-04-10 20:14:54,685 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201204101957_0006_m_377512887 spawned.
2012-04-10 20:16:34,041 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 1
2012-04-10 **20:18:04,433** INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201204101957_0006_m_377512887 given task: attempt_201204101957_0006_m_000003_0
2012-04-10 20:18:04,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204101957_0006_m_000003_0 0.0%
2012-04-10 20:18:05,056 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204101957_0006_m_000003_0 0.667% Generated 1000 samples.
  

2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201204101957_0006_m_000003_0 is done.
2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201204101957_0006_m_000003_0 was 28
2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2012-04-10 20:18:05,213 INFO org.apache.hadoop.mapreduce.util.ProcessTree: Sending signal to all members of process group -23030: SIGTERM. Exit code 1
2012-04-10 20:18:08,478 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 28 bytes to reduce 0 from map: attempt_201204101957_0006_m_000003_0 given 28/24
2012-04-10 20:18:08,478 INFO org.apache.hadoop.mapred.TaskTracker: Shuffled 1maps (mapIds=attempt_201204101957_0006_m_000003_0) to reduce 0 in 29s
2012-04-10 20:18:08,478 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 147.102.7.173:50060, dest: 147.102.7.175:579289, maps: 1, op: MAPRED_SHUFFLE, reduceID: 0, duration: 29
2012-04-10 20:18:10,217 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201204101957_0006_m_377512887 exited with exit code 0. Number of tasks it ran: 1
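
To put a number on the gap in the TaskTracker log, here is a minimal sketch that subtracts the timestamp of the "JVM Runner ... spawned" line from that of the "given task" line. The two log lines are copied from above (bold markers removed); the helper names `parse_timestamp` and `gap_seconds` are hypothetical and only for illustration, and the sketch assumes every line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp.

```python
from datetime import datetime

# Two lines copied from the TaskTracker log above (bold markers removed).
LOG_LINES = [
    "2012-04-10 20:14:54,685 INFO org.apache.hadoop.mapred.JvmManager: "
    "JVM Runner jvm_201204101957_0006_m_377512887 spawned.",
    "2012-04-10 20:18:04,433 INFO org.apache.hadoop.mapred.TaskTracker: "
    "JVM with ID: jvm_201204101957_0006_m_377512887 given task: "
    "attempt_201204101957_0006_m_000003_0",
]


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    # The timestamp occupies the first 23 characters of each line.
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def gap_seconds(earlier_line, later_line):
    """Return the time between two log lines, in seconds."""
    delta = parse_timestamp(later_line) - parse_timestamp(earlier_line)
    return delta.total_seconds()


if __name__ == "__main__":
    gap = gap_seconds(LOG_LINES[0], LOG_LINES[1])
    print(f"JVM spawned -> task assigned: {gap:.3f} s")
```

For these two lines the gap comes out at about 189.7 seconds, so nearly all of the 3 minutes 10 seconds passes between the child JVM being spawned and the task being handed to it, while the map work itself takes well under a second.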

I suspected a network problem here, but I can ping and ssh without any trouble.

0 answers:

No answers