I'm just starting to learn Hadoop, and I don't understand how a datanode becomes a reducer node.
How does the jobtracker decide which node becomes a reducer node? I'm reading Hadoop: The Definitive Guide, but the book doesn't cover this step.
Thanks, Bruckwald
Answer 0 (score: 6)
Pretty much first-come, first-served. Tasks are assigned via heartbeats: when a TaskTracker reports in to the JobTracker, the response it gets back may contain new tasks to run:
List<Task> tasks = getSetupAndCleanupTasks(taskTrackerStatus);
if (tasks == null) {
  tasks = taskScheduler.assignTasks(taskTrackerStatus);
}
if (tasks != null) {
  for (Task task : tasks) {
    expireLaunchingTasks.addNewTask(task.getTaskID());
    LOG.debug(trackerName + " -> LaunchTask: " + task.getTaskID());
    actions.add(new LaunchTaskAction(task));
  }
}
Here's the relevant source code of the JobTracker. So besides which TaskTracker comes first, the task scheduler also checks resource conditions (e.g., whether there are free slots, and whether an individual node is not overloaded).
The relevant code can be found here (it isn't particularly exciting):
//
// Same thing, but for reduce tasks
// However we _never_ assign more than 1 reduce task per heartbeat
//
final int trackerCurrentReduceCapacity =
    Math.min((int) Math.ceil(reduceLoadFactor * trackerReduceCapacity),
             trackerReduceCapacity);
final int availableReduceSlots =
    Math.min((trackerCurrentReduceCapacity - trackerRunningReduces), 1);
boolean exceededReducePadding = false;
if (availableReduceSlots > 0) {
  exceededReducePadding = exceededPadding(false, clusterStatus,
                                          trackerReduceCapacity);
  synchronized (jobQueue) {
    for (JobInProgress job : jobQueue) {
      if (job.getStatus().getRunState() != JobStatus.RUNNING ||
          job.numReduceTasks == 0) {
        continue;
      }
      Task t = job.obtainNewReduceTask(taskTracker, numTaskTrackers,
          taskTrackerManager.getNumberOfUniqueHosts());
      if (t != null) {
        assignedTasks.add(t);
        break;
      }
      // Don't assign reduce tasks to the hilt!
      // Leave some free slots in the cluster for future task-failures,
      // speculative tasks etc. beyond the highest priority job
      if (exceededReducePadding) {
        break;
      }
    }
  }
}
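To see what that slot arithmetic does in isolation, here is a small standalone sketch (my own illustration, not Hadoop code; the method name `availableReduceSlots` and the sample values are made up) that reproduces the two `Math.min` expressions above:

```java
// Toy illustration of the reduce-slot math from the snippet above.
// trackerCurrentReduceCapacity = min(ceil(loadFactor * capacity), capacity)
// availableReduceSlots         = min(capacity - running, 1)
public class ReduceSlotMath {
    static int availableReduceSlots(double reduceLoadFactor,
                                    int trackerReduceCapacity,
                                    int trackerRunningReduces) {
        int current = Math.min(
            (int) Math.ceil(reduceLoadFactor * trackerReduceCapacity),
            trackerReduceCapacity);
        // Capped at 1: never more than one reduce task per heartbeat.
        return Math.min(current - trackerRunningReduces, 1);
    }

    public static void main(String[] args) {
        // capacity 4, load factor 0.5 -> effective capacity 2; 2 already
        // running -> no slot this heartbeat
        System.out.println(availableReduceSlots(0.5, 4, 2)); // 0
        // capacity 4, load factor 1.0 -> effective capacity 4; 1 running
        // -> 3 free, but capped at 1
        System.out.println(availableReduceSlots(1.0, 4, 1)); // 1
    }
}
```

Note how the load factor throttles a tracker below its configured capacity when the cluster as a whole is lightly loaded.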
Basically, the first TaskTracker that heartbeats to the JobTracker and has enough free slots gets the reduce task.
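That first-come, first-served behavior can be sketched as follows (a toy model, not Hadoop code; the class `FcfsReduceAssignment` and the task/tracker names are invented for illustration):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model: pending reduce tasks are handed out in the order TaskTrackers
// heartbeat in, and only to a tracker that reports a free reduce slot.
public class FcfsReduceAssignment {
    private final Queue<String> pendingReduceTasks = new ArrayDeque<>();

    public FcfsReduceAssignment(String... taskIds) {
        for (String id : taskIds) {
            pendingReduceTasks.add(id);
        }
    }

    // Called once per heartbeat; assigns at most one reduce task,
    // mirroring the "never more than 1 reduce task per heartbeat" rule.
    public String heartbeat(String trackerName, int freeReduceSlots) {
        if (freeReduceSlots <= 0 || pendingReduceTasks.isEmpty()) {
            return null; // nothing to assign to this tracker
        }
        return pendingReduceTasks.poll();
    }

    public static void main(String[] args) {
        FcfsReduceAssignment jt =
            new FcfsReduceAssignment("r_000000", "r_000001");
        // tracker1 heartbeats first with a free slot -> gets the first task
        System.out.println(jt.heartbeat("tracker1", 1)); // r_000000
        // tracker2 reports no free slots -> gets nothing
        System.out.println(jt.heartbeat("tracker2", 0)); // null
        // tracker3 heartbeats next with a free slot -> gets the second task
        System.out.println(jt.heartbeat("tracker3", 1)); // r_000001
    }
}
```

The real scheduler adds the load-factor and padding checks shown above on top of this ordering, but the pull-based, heartbeat-driven assignment is the core idea.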