I am using CDH 5.3 and trying to write a MapReduce program that scans an HBase table and does some processing. I created a mapper that extends TableMapper; my code is below.
public class TestTableMapper extends TableMapper&lt;Text, Text&gt; {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
    }

    // TableMapper feeds each row as an ImmutableBytesWritable row key and a
    // Result; the Text/Text type parameters match the output key/value
    // classes set in the driver
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // do something with each row's Result here
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        super.cleanup(context);
    }
}
public class Driver extends Configured implements Tool {

    public static void main(String[] args) {
        try {
            // exit with the job's status code whether it succeeded or failed
            System.exit(ToolRunner.run(new Driver(), args));
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), args[0]);
        job.setJarByClass(Driver.class);
        // initTableMapperJob configures TableInputFormat and the scan itself,
        // so the job needs no TextInputFormat or FileInputFormat input path;
        // the fifth argument is the mapper's output value class
        TableMapReduceUtil.initTableMapperJob("TestTable", new Scan(),
                TestTableMapper.class, Text.class, Text.class, job);
        // adds the HBase and related jars found on the local classpath
        // to the job's distributed cache
        TableMapReduceUtil.addDependencyJars(job);
        job.setNumReduceTasks(0);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
When I run the packaged jar, I get the following exception:
15/12/16 18:15:37 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/data/hadoop/mapred/staging/hduser1308282625/.staging/job_local1308282625_0001
15/12/16 18:15:37 WARN security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://hadoopHost:9000/home/hduser/applciations/hadoop-2.5.0-cdh5.3.0/share/hadoop/common/lib/netty-3.6.2.Final.jar
java.io.FileNotFoundException: File does not exist: hdfs://hadoopHost:9000/home/hduser/applciations/hadoop-2.5.0-cdh5.3.0/share/hadoop/common/lib/netty-3.6.2.Final.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:267)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:388)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:481)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
at com.ugam.test.Driver.run(Driver.java:47)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.ugam.test.Driver.main(Driver.java:28)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
The netty jar does exist under the Hadoop shared lib directory on the local filesystem, but I am not sure why the job submitter is searching for it on an HDFS path.
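My current reading (an assumption on my part, not something I have verified in the HBase source) is that addDependencyJars records the jars by their local classpath locations, and a path carrying no scheme is then qualified against fs.defaultFS at submit time, which would send the lookup to HDFS. A minimal sketch of that qualification step, using the hdfs://hadoopHost:9000 default FS from my config and a made-up local jar path:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathQualificationSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoopHost:9000"); // same default FS as my cluster

        // a jar recorded by its local location, with no file:// scheme
        // (hypothetical path, for illustration only)
        Path jar = new Path("/opt/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar");

        URI defaultUri = FileSystem.getDefaultUri(conf);
        // a scheme-less path is qualified against the default filesystem,
        // so the submitter ends up probing HDFS for a local file
        System.out.println(jar.makeQualified(defaultUri, new Path("/")));
        // -> hdfs://hadoopHost:9000/opt/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar
    }
}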
If I use a plain Mapper instead of a TableMapper, everything runs fine, even when I use netty-related code inside the mapper.
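For contrast, this is roughly the shape of the plain-Mapper job that submits cleanly for me (class and job names here are placeholders, not my real code); it never goes through TableMapReduceUtil, so no local jar paths are pushed into the distributed cache, which I suspect is the difference:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PlainMapperDriver {

    // map-only pass-through mapper; the netty-using code would go here
    public static class PassThroughMapper
            extends Mapper&lt;LongWritable, Text, Text, NullWritable&gt; {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "plain-mapper-test");
        job.setJarByClass(PlainMapperDriver.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.setNumReduceTasks(0); // map-only, like the table job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}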