Progress of a MapReduce job

Time: 2015-02-10 10:53:00

Tags: java hadoop mapreduce

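I am developing a MapReduce application and want to know the progress of the job I am running. I am already familiar with the job.mapProgress() and job.reduceProgress() methods. The problem is that these methods only seem to work once the job has finished.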

Is there any way to get the job's progress in real time while it is running, not only after it has completed?

2 answers:

Answer 0 (score: 1)

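In the new Hadoop API (org.apache.hadoop.mapreduce), you can read the progress value from the Context object inside your mapper or reducer class. Note that this is the progress of the current task attempt, not of the whole job: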

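import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Writable, Writable, Writable, Writable> {
    @Override
    public void map(Writable key, Writable value, Context context) throws IOException, InterruptedException {
        // Fraction (0.0 to 1.0) of this task attempt completed so far
        float progress = context.getProgress();
    }
}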
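The value returned by getProgress() is a float between 0.0 and 1.0. For file-based input, it roughly reflects how much of the task's input split has been read so far. The standalone model below is a simplified sketch of that idea and runs without Hadoop. The class SplitProgress and its methods are hypothetical illustrations, not Hadoop API.

```java
/**
 * Simplified model of split-based task progress (hypothetical class, not part of Hadoop).
 * A split covers the byte range [start, end); progress is the fraction already consumed.
 */
public class SplitProgress {
    private final long start;
    private final long end;
    private long pos;

    public SplitProgress(long start, long end) {
        this.start = start;
        this.end = end;
        this.pos = start;
    }

    /** Records that the reader has advanced to byte offset newPos. */
    public void advanceTo(long newPos) {
        this.pos = newPos;
    }

    /** Returns a value in [0.0, 1.0], in the same range as Context.getProgress(). */
    public float getProgress() {
        if (start == end) {
            return 0.0f; // empty split: nothing meaningful to report
        }
        // Clamp at 1.0 because a reader may read slightly past the split end to finish its last record
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        SplitProgress p = new SplitProgress(1000, 2000);
        p.advanceTo(1250);
        System.out.println(p.getProgress()); // prints 0.25
        p.advanceTo(2100);
        System.out.println(p.getProgress()); // prints 1.0
    }
}
```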

Answer 1 (score: 0)

If you mean programmatic access, you need to use the JobClient API:

https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/JobClient.html

https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/RunningJob.html

You can submit the job through JobClient:

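import org.apache.hadoop.mapred.*;

// jobConf is an already configured JobConf
JobClient jobClient = new JobClient(jobConf);
RunningJob job = jobClient.submitJob(jobConf); // returns without waiting for the job to finish
float mapProgress = job.mapProgress();    // 0.0 to 1.0
float redProgress = job.reduceProgress(); // 0.0 to 1.0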
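submitJob() returns as soon as the job has been submitted, so the two values above are only a snapshot. To follow the job while it runs, you read them repeatedly until the job is complete. The following is a minimal polling sketch. It assumes a hypothetical ProgressSource interface whose method names mirror RunningJob's mapProgress(), reduceProgress() and isComplete(), so that the loop can be exercised without a cluster.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ProgressPoller {
    /** Hypothetical abstraction; RunningJob has methods with these names that also throw IOException. */
    public interface ProgressSource {
        float mapProgress() throws IOException;
        float reduceProgress() throws IOException;
        boolean isComplete() throws IOException;
    }

    /**
     * Polls until isComplete() returns true, collecting one "map X% reduce Y%" line per poll,
     * plus one final line taken after completion.
     */
    public static List<String> poll(ProgressSource job, long intervalMillis)
            throws IOException, InterruptedException {
        List<String> lines = new ArrayList<>();
        while (!job.isComplete()) {
            lines.add(format(job));
            Thread.sleep(intervalMillis); // wait before asking again
        }
        lines.add(format(job)); // final snapshot once the job is done
        return lines;
    }

    /** Renders both progress values as whole percentages. */
    static String format(ProgressSource job) throws IOException {
        return String.format("map %d%% reduce %d%%",
                Math.round(job.mapProgress() * 100), Math.round(job.reduceProgress() * 100));
    }
}
```

With a real RunningJob, an anonymous ProgressSource that forwards each method to job.mapProgress(), job.reduceProgress() and job.isComplete() should be enough. If you only need to block until the end, RunningJob also has waitForCompletion().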

Or you can look up an existing job:

JobClient jobClient = new JobClient(jobConf);
RunningJob job = jobClient.getJob("your_job_id");
...
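getJob() expects the job's ID as a string. Hadoop job IDs have the form job_<identifier>_<sequence>, where the sequence number is zero-padded to at least four digits (for example job_1423536780000_0001). In real code, Hadoop's own JobID.forName() does the parsing. The standalone sketch below only illustrates the format, for checking an ID string before passing it to getJob(). The class JobIdFormat and its methods are hypothetical helpers, not Hadoop API.

```java
/** Hypothetical helper illustrating the job ID format; prefer Hadoop's JobID.forName() in real code. */
public final class JobIdFormat {
    private JobIdFormat() {}

    /** Returns the sequence number of an ID like "job_1423536780000_0001", or throws if malformed. */
    public static int parseSequence(String jobId) {
        String[] parts = jobId.split("_");
        if (parts.length != 3 || !parts[0].equals("job") || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Not a job ID: " + jobId);
        }
        // Integer.parseInt throws NumberFormatException (a subclass of IllegalArgumentException) if not numeric
        return Integer.parseInt(parts[2]);
    }

    /** Builds an ID string from an identifier and sequence number, zero-padding the number to 4 digits. */
    public static String format(String identifier, int sequence) {
        return String.format("job_%s_%04d", identifier, sequence);
    }
}
```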