Question

The main problem is that as soon as the program starts, it throws: Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/davide/wordcount/input already exists

The command I run to launch the job is:

hadoop jar wordcount.jar org.wordcount.WordCount /user/davide/wordcount/input /user/davide/wordcount/output

This looks correct to me: the output directory does not exist beforehand, as Hadoop requires.

In the Java file, the paths appear to be set correctly:

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

I have tried several solutions but cannot figure out what the problem is.

Thanks.

Answer 1

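The problem is the argument numbering: args[0] is actually org.wordcount.WordCount, so you need to use args[1] for the input and args[2] for the output. Notice that the error message says Output directory hdfs://quickstart.cloudera:8020/user/davide/wordcount/input already exists: the job is trying to use the input folder as its output.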
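To confirm which index holds which path, a quick check is to print every argument that main receives before the job is configured. This offset usually appears when the jar's manifest already names a Main-Class: hadoop jar then runs that class and forwards every remaining token, including org.wordcount.WordCount, to main. The class PrintArgs and its describe helper below are hypothetical names, a minimal sketch rather than part of the original code; in practice the loop would simply go at the top of WordCount.main.

```java
// Hypothetical diagnostic: list each argument main() receives together with its index,
// so you can see whether the class name occupies args[0].
class PrintArgs {
    // Builds one "args[i] = value" line per argument.
    static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            sb.append("args[").append(i).append("] = ").append(args[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(args));
    }
}
```

If the class name is being forwarded, the first line printed would be args[0] = org.wordcount.WordCount, with the input and output paths at indices 1 and 2.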

To fix this:

FileInputFormat.addInputPath(job, new Path(args[1]));
FileOutputFormat.setOutputPath(job, new Path(args[2]));
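Hard-coding args[1] and args[2] ties the driver to one way of launching it: if the class name is later left off the command line, the indices shift back and the same bug reappears in reverse. A slightly more tolerant variant, sketched below under the assumption that the only extra token is the driver's own class name, strips that token when present and then reads the two paths. ArgNormalizer and dropClassName are hypothetical names; in a real driver you would likely pass WordCount.class.getName() as the class name and hand the resulting strings to FileInputFormat.addInputPath and FileOutputFormat.setOutputPath as above.

```java
import java.util.Arrays;

// Sketch (hypothetical helper): accept arguments with or without a leading class name.
class ArgNormalizer {
    // Returns args without its first element if that element equals className.
    static String[] dropClassName(String[] args, String className) {
        if (args.length > 0 && args[0].equals(className)) {
            return Arrays.copyOfRange(args, 1, args.length);
        }
        return args;
    }

    public static void main(String[] args) {
        String[] rest = dropClassName(args, "org.wordcount.WordCount");
        if (rest.length != 2) {
            System.err.println("Usage: <input path> <output path>");
            System.exit(2);
        }
        String input = rest[0];   // would go to FileInputFormat.addInputPath
        String output = rest[1];  // would go to FileOutputFormat.setOutputPath
        System.out.println("input=" + input + " output=" + output);
    }
}
```

With this approach, both "... org.wordcount.WordCount /in /out" and "... /in /out" would resolve to input /in and output /out.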

Hadoop - input directory problem
