In Hadoop, the following metrics are reported after a job runs:

I could not find exact definitions of these times, because none of the sources I found explain precisely how they are computed. This is my understanding:

I am not sure about the parts in bold. Is my analysis correct?
Answer (score: 4)
I decided to investigate the Hadoop code to get more insight. The diagram below explains my findings.

I found that:

The following code snippets support these findings:
In the Shuffle class used by ReduceTask, we see that the "copy" phase is followed by the "sort" phase:
copyPhase.complete(); // copy is already complete
taskStatus.setPhase(TaskStatus.Phase.SORT);
reduceTask.statusUpdate(umbilical);

// Finish the on-going merges...
RawKeyValueIterator kvIter = null;
try {
  kvIter = merger.close();
} catch (Throwable e) {
  throw new ShuffleError("Error while doing final merge " , e);
}
In the TaskStatus class, we see that the shuffle time is the time up to the start of the sort phase, and the sort time is the time between the shuffle and reduce phases:
public void setPhase(Phase phase){
  TaskStatus.Phase oldPhase = getPhase();
  if (oldPhase != phase){
    // sort phase started
    if (phase == TaskStatus.Phase.SORT){
      if (oldPhase == TaskStatus.Phase.MAP) {
        setMapFinishTime(System.currentTimeMillis());
      } else {
        setShuffleFinishTime(System.currentTimeMillis());
      }
    } else if (phase == TaskStatus.Phase.REDUCE){
      setSortFinishTime(System.currentTimeMillis());
    }
    this.phase = phase;
  }
  ...
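The bookkeeping in setPhase above can be reduced to a small sketch (PhaseClock is a hypothetical class of mine, not part of Hadoop; timestamps are passed in explicitly instead of using System.currentTimeMillis()). It shows why the shuffle finish time is captured exactly when the SORT phase begins, and the sort finish time exactly when the REDUCE phase begins:

```java
// Minimal sketch of the phase-transition bookkeeping (hypothetical class).
public class PhaseClock {
    enum Phase { MAP, SHUFFLE, SORT, REDUCE }

    Phase phase = Phase.SHUFFLE;
    long shuffleFinish = -1;
    long sortFinish = -1;

    // Timestamps are recorded only on transitions *into* SORT and REDUCE.
    void setPhase(Phase next, long now) {
        if (phase != next) {
            if (next == Phase.SORT) {
                shuffleFinish = now;   // shuffle ends when sort starts
            } else if (next == Phase.REDUCE) {
                sortFinish = now;      // sort ends when reduce starts
            }
            phase = next;
        }
    }

    public static void main(String[] args) {
        PhaseClock c = new PhaseClock();
        c.setPhase(Phase.SORT, 40);    // copy/shuffle done at t=40
        c.setPhase(Phase.REDUCE, 50);  // sort done at t=50
        System.out.println(c.shuffleFinish + " " + c.sortFinish); // prints "40 50"
    }
}
```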
In the JobInfo class, we see that the shuffle time corresponds to the copy phase, and the merge time is the "sort" time mentioned above:
switch (task.getType()) {
  case MAP:
    successfulMapAttempts += successful;
    failedMapAttempts += failed;
    killedMapAttempts += killed;
    if (attempt.getState() == TaskAttemptState.SUCCEEDED) {
      numMaps++;
      avgMapTime += (attempt.getFinishTime() - attempt.getLaunchTime());
    }
    break;
  case REDUCE:
    successfulReduceAttempts += successful;
    failedReduceAttempts += failed;
    killedReduceAttempts += killed;
    if (attempt.getState() == TaskAttemptState.SUCCEEDED) {
      numReduces++;
      avgShuffleTime += (attempt.getShuffleFinishTime() - attempt
          .getLaunchTime());
      avgMergeTime += attempt.getSortFinishTime()
          - attempt.getShuffleFinishTime();
      avgReduceTime += (attempt.getFinishTime() - attempt
          .getSortFinishTime());
    }
}
For more information on how map and reduce tasks work, see the classes MapTask and ReduceTask, respectively.
Finally, I would like to point out that the source code I referenced in the links above mostly corresponds to Hadoop 2.7.1.