I get the following error when executing my JAR file on HDFS:
#hadoop jar WordCountNew.jar WordCountNew /MRInput57/Input-Big.txt /MROutput57
15/11/06 19:46:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/11/06 19:46:32 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:8020/var/lib/hadoop-0.20/cache/mapred/mapred/staging/root/.staging/job_201511061734_0003
15/11/06 19:46:32 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /MRInput57/Input-Big.txt already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /MRInput57/Input-Big.txt already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:921)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:526)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:556)
at MapReduce.WordCountNew.main(WordCountNew.java:114)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
My driver class is as below:
public static void main(String[] args) throws Exception {
    // Configuration details w.r.t. the job and the JAR file
    Configuration conf = new Configuration();
    Job job = new Job(conf, "WORDCOUNTJOB");
    // Setting the driver class
    job.setJarByClass(MapReduceWordCount.class);
    // Setting the Mapper class
    job.setMapperClass(TokenizerMapper.class);
    // Setting the Combiner class
    job.setCombinerClass(IntSumReducer.class);
    // Setting the Reducer class
    job.setReducerClass(IntSumReducer.class);
    // Setting the output key class
    job.setOutputKeyClass(Text.class);
    // Setting the output value class
    job.setOutputValueClass(IntWritable.class);
    // Adding the input path
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // Setting the output path
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Exit with the job's completion status
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Can someone correct this issue in my code?
Regards, Pranav
Answer 0 (score: 2)
You need to check whether the output directory already exists and delete it if it does. MapReduce cannot (or will not) write files into a directory that already exists; it insists on creating the output directory itself so it can be sure the output is clean.
Add:
Path outPath = new Path(args[1]);
FileSystem dfs = FileSystem.get(outPath.toUri(), conf);
if (dfs.exists(outPath)) {
    dfs.delete(outPath, true);
}
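The snippet above targets HDFS and needs a running cluster. If you want to try the same exists-then-recursive-delete guard locally without Hadoop, the java.nio.file API offers an equivalent; this is a minimal sketch (class and method names are illustrative, not part of the Hadoop API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

public class DeleteIfExists {
    // Local-filesystem analogue of the HDFS exists/delete(recursive) guard:
    // remove the output directory and its contents if a previous run left it behind.
    static void deleteRecursivelyIfExists(Path outPath) throws IOException {
        if (Files.exists(outPath)) {
            // Walk deepest-first so files are deleted before their parent directories.
            try (var paths = Files.walk(outPath)) {
                paths.sorted(Comparator.reverseOrder())
                     .forEach(p -> p.toFile().delete());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("mroutput-demo");
        Files.writeString(out.resolve("part-00000"), "word 1\n");
        deleteRecursivelyIfExists(out);
        System.out.println(Files.exists(out)); // prints false: the directory is gone
    }
}
```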
Answer 1 (score: 0)
The output directory must not exist before you execute the program. Either delete the existing directory, supply a new one, or delete the output directory from within the program.
I prefer deleting the output directory from the command prompt before running the program.
From the command prompt:
hdfs dfs -rm -r <your_output_directory_HDFS_URL>
From Java:
Chris Gerken's code is good enough.
Answer 2 (score: 0)
The output directory where you are trying to store the job's output already exists. Either delete the previous directory of the same name or change the name of the output directory.
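If you prefer not to delete old results, a common alternative is to generate a fresh output path on every run, for example by appending a timestamp to a base name. A hedged sketch (the class and helper names are hypothetical, not part of Hadoop):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class OutputPathNamer {
    // Builds a unique output path by appending a timestamp to a base name,
    // so each job run writes to a directory that did not exist before.
    static String uniqueOutputPath(String base) {
        String stamp = LocalDateTime.now()
                .format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss"));
        return base + "-" + stamp;
    }

    public static void main(String[] args) {
        // Pass the result to FileOutputFormat.setOutputPath(job, new Path(...)) in the driver.
        System.out.println(uniqueOutputPath("/MROutput57"));
    }
}
```

Second-level timestamps are enough to separate manual runs; for rapid scheduled runs you would want a finer-grained or job-ID-based suffix instead.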
Answer 3 (score: 0)
As others have noted, you are getting the error because the output directory already exists, most likely because you have attempted to run this job before.
You can delete the existing output directory right before running the job, i.e.:
#hadoop fs -rm -r /MROutput57 && \
hadoop jar WordCountNew.jar WordCountNew /MRInput57/Input-Big.txt /MROutput57