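I am trying to collect some information from the JobTracker. For starters, I want to get information about the running jobs, such as the job ID or the job name. But I am already stuck. This is what I have so far (it prints the job ID of every currently running job):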
public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "zk1.myhost,zk2.myhost,zk3.myhost");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    InetSocketAddress jobtracker = new InetSocketAddress("jobtracker.mapredhost.myhost", 8021);
    JobClient jobClient = new JobClient(jobtracker, conf);
    JobStatus[] jobs = jobClient.jobsToComplete();
    for (int i = 0; i < jobs.length; i++) {
        JobStatus js = jobs[i];
        if (js.getRunState() == JobStatus.RUNNING) {
            JobID jobId = js.getJobID();
            System.out.println(jobId);
        }
    }
}
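The code above works like a charm for displaying the job ID, but now I also want to display the job name. So I added this line after printing the job ID: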
System.out.println(jobClient.getJob(jobId).getJobName());
And I get this exception:
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:226)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1080)
at org.apache.test.JobTracker.main(JobTracker.java:28)
jobClient is not null. I know this because I checked it with an if statement, but jobClient.getJob(jobId) is null. What am I doing wrong here?
According to the API I should be fine: first get the RunningJob from jobClient, then get the job name from that RunningJob: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapred/RunningJob.html#getJobName()
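Has anyone done something like this before? I could fetch this information with a GET request using jsoup, but I think going through the API is the better way to get it.
Question update: here are my hadoop / hbase dependencies: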
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>0.23.1-mr1-cdh4.0.0b2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.23.1-mr1-cdh4.0.0b2</version>
    <exclusions>
        <exclusion>
            <groupId>org.mortbay.jetty</groupId>
            <artifactId>jetty</artifactId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.92.1-cdh4b2-SNAPSHOT</version>
</dependency>
Bounty update:
Here are my imports:
import java.io.IOException;
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.JobStatus;
Here is the output of System.out.println(jobId):
job_201207031810_1603
There is currently only one job running.
Answer 0 (score: 17)
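Take a look at NetworkedJob, the inner class of JobClient.
(Source: /home/user/hadoop/src/mapred/org/apache/hadoop/mapred/JobClient.java)
At line 225, its constructor tries to get the Configuration object from the enclosing JobClient, but because new JobClient(InetSocketAddress jobTrackAddr, Configuration conf) never sets it, the Configuration is null: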
// Set the completion poll interval from the configuration.
// Default is 5 seconds.
Configuration conf = JobClient.this.getConf();
this.completionPollIntervalMillis = conf.getInt(COMPLETION_POLL_INTERVAL_KEY,
        DEFAULT_COMPLETION_POLL_INTERVAL); // NPE occurs here!
As a workaround, set the configuration manually after creating the JobClient object. This will fix your problem:
..
JobClient jobClient = new JobClient(jobtracker, conf);
jobClient.setConf(conf);
....
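The failure mode and the workaround can be reproduced without a cluster. The following is only a minimal sketch of the pattern, not Hadoop's actual code: MiniClient, MiniJob, ConfDemo and the "poll.interval.ms" key are hypothetical names, and java.util.Properties stands in for Configuration. It assumes, as described above, that the two-argument constructor does not store the configuration.

```java
import java.util.Properties;

// Hypothetical stand-in for JobClient: the two-argument constructor
// does not store the configuration it is given.
class MiniClient {
    private Properties conf; // stays null until setConf is called

    MiniClient(String trackerAddress, Properties conf) {
        // Only the address would be used here; conf is deliberately NOT stored.
    }

    void setConf(Properties conf) { this.conf = conf; }

    Properties getConf() { return conf; }

    MiniJob getJob(String jobId) { return new MiniJob(jobId); }

    // Hypothetical stand-in for NetworkedJob: an inner class whose
    // constructor dereferences the enclosing object's configuration.
    class MiniJob {
        final String jobId;
        final int pollIntervalMillis;

        MiniJob(String jobId) {
            this.jobId = jobId;
            Properties c = MiniClient.this.getConf();
            // Throws NullPointerException when c is null.
            this.pollIntervalMillis =
                Integer.parseInt(c.getProperty("poll.interval.ms", "5000"));
        }
    }
}

class ConfDemo {
    // Returns true if getJob throws an NPE when setConf was never called.
    static boolean failsWithoutSetConf() {
        MiniClient client = new MiniClient("tracker:8021", new Properties());
        try {
            client.getJob("job_1");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Applies the workaround: call setConf right after construction.
    static int pollIntervalAfterSetConf() {
        Properties conf = new Properties();
        MiniClient client = new MiniClient("tracker:8021", conf);
        client.setConf(conf);
        return client.getJob("job_1").pollIntervalMillis;
    }

    public static void main(String[] args) {
        System.out.println("NPE without setConf: " + failsWithoutSetConf());
        System.out.println("Poll interval after setConf: " + pollIntervalAfterSetConf());
    }
}
```

In the sketch the exception is thrown inside the inner class constructor, so getJob never returns at all. That matches the stack trace in the question, where the top frame is the NetworkedJob constructor rather than the calling code.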
Side note:
I instantiate the Configuration object as follows:
Configuration conf = new Configuration();
conf.addResource(new Path("/path_to/core-site.xml"));
conf.addResource(new Path("/path_to/hdfs-site.xml"));