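I'm developing a MapReduce application and want to know the progress of the job I'm running. I'm already familiar with the job.mapProgress() and job.reduceProgress() methods. The problem is that these methods only seem to work once the job has finished.
Is there any way to get the job's progress in real time while it is running, rather than only after it completes?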
Answer 0 (score: 1)
In the new Hadoop API, you can read the current task's progress value from the Context object inside your mapper or reducer class:
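import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Writable, Writable, Writable, Writable> {
    @Override
    public void map(Writable key, Writable value, Context context) throws IOException, InterruptedException {
        float progress = context.getProgress(); // progress of this task attempt, as a fraction between 0 and 1
    }
}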
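Because map() runs once per record, acting on the progress value on every call would be noisy. Below is a minimal sketch of one way to throttle that, assuming you only want to react each time progress advances by a fixed step. The class name ProgressThrottle and its method shouldReport are hypothetical helpers, not part of Hadoop; the value passed in is assumed to come from context.getProgress().

```java
public class ProgressThrottle {
    private final float step;      // minimum progress increase between reports
    private float nextThreshold;   // next progress value that triggers a report

    public ProgressThrottle(float step) {
        if (step <= 0f) {
            throw new IllegalArgumentException("step must be positive");
        }
        this.step = step;
        this.nextThreshold = step;
    }

    /** Returns true when progress has reached the next threshold since the last report. */
    public boolean shouldReport(float progress) {
        if (progress < nextThreshold) {
            return false;
        }
        // Skip past every threshold the value has already crossed.
        while (nextThreshold <= progress) {
            nextThreshold += step;
        }
        return true;
    }
}
```

Inside map(), this could be used roughly as: if (throttle.shouldReport(context.getProgress())) { context.setStatus("progress: " + context.getProgress()); }, with throttle held as a field of the mapper.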
Answer 1 (score: 0)
If you mean programmatic access, you need to use the JobClient API:
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/JobClient.html
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/RunningJob.html
You can submit the job through JobClient and query the returned RunningJob handle:
JobClient jobClient = new JobClient(jobConf);
RunningJob job = jobClient.submitJob(jobConf);
float mapProgress = job.mapProgress();
float redProgress = job.reduceProgress();
Or you can look up an existing job:
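JobClient jobClient = new JobClient(jobConf);
// getJob(String) is deprecated; the JobID overload is preferred (org.apache.hadoop.mapred.JobID)
RunningJob job = jobClient.getJob(JobID.forName("your_job_id"));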
...
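With either handle, progress can be read while the job is still running by polling it in a loop until it reports completion. The following is a minimal sketch of such a loop, written against a hypothetical ProgressSource interface so that it runs without a cluster. Its three methods mirror RunningJob.mapProgress(), RunningJob.reduceProgress() and RunningJob.isComplete(); ProgressPoller, ProgressSource and FakeJob are illustrative names, not Hadoop classes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ProgressPoller {

    /** Hypothetical stand-in for the RunningJob calls used below. */
    public interface ProgressSource {
        float mapProgress() throws IOException;
        float reduceProgress() throws IOException;
        boolean isComplete() throws IOException;
    }

    /** Formats one progress sample as percentages. */
    public static String format(float map, float reduce) {
        return String.format(Locale.ROOT, "map %.0f%% reduce %.0f%%", map * 100, reduce * 100);
    }

    /** Samples progress until the job reports completion, then takes one final reading. */
    public static List<String> poll(ProgressSource job, long intervalMillis)
            throws IOException, InterruptedException {
        List<String> samples = new ArrayList<>();
        while (!job.isComplete()) {
            samples.add(format(job.mapProgress(), job.reduceProgress()));
            Thread.sleep(intervalMillis);
        }
        samples.add(format(job.mapProgress(), job.reduceProgress()));
        return samples;
    }

    /** Fake job for demonstration: advances one tick per completion check. */
    static final class FakeJob implements ProgressSource {
        private int tick = 0;

        @Override
        public boolean isComplete() {
            return tick++ >= 2;
        }

        @Override
        public float mapProgress() {
            return Math.min(1f, tick * 0.5f);
        }

        @Override
        public float reduceProgress() {
            return Math.min(1f, Math.max(0, tick - 1) * 0.5f);
        }
    }

    public static void main(String[] args) throws Exception {
        for (String line : poll(new FakeJob(), 10L)) {
            System.out.println(line);
        }
    }
}
```

In a real application the interface would be replaced by the RunningJob returned from submitJob or getJob, and the polling interval would more plausibly be a few seconds than a few milliseconds.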