There is a problem with my Hadoop program. I am trying to read a file into the mapper, but I keep getting an error telling me that the file does not exist.
Here is the code:
Configuration conf = new Configuration();
//String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
conf.set("mapreduce.job.queuename", "alpha");
conf.setLong("mapreduce.task.timeout", 1000 * 60 * 60);
conf.setDouble("mapreduce.job.reduce.slowstart.completedmaps", 0.75);
conf.set("mapred.textoutputformat.separator", "\t");

// (not shown in the original paste) the Job is created from the same Configuration
Job job = Job.getInstance(conf);
job.setMapperClass(MapperCollector.class);
// job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(MetaDataReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);

// The input path has no scheme, so it resolves against whatever fs.defaultFS points to
FileInputFormat.addInputPath(job, new Path("/user/myuser/theData.csv"));

// Reuse the job's Configuration so the output path resolves against the same filesystem
FileSystem hdfs = FileSystem.get(conf);
Path outFolder = new Path("/user/myuser/outFolder/");
if (hdfs.exists(outFolder)) {
    hdfs.delete(outFolder, true); // delete the existing output directory
}
FileOutputFormat.setOutputPath(job, outFolder);

System.exit(job.waitForCompletion(true) ? 0 : 1);
It fails with this error:
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/myuser/theData.csv
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at myuser.mypackage.GenerateTrainingData.main(GenerateTrainingData.java:82)
The code was working before, but it stopped working after the cluster was restarted. Also, I can run "hadoop fs -cat /user/myuser/theData.csv" and it works fine.
It seems that Hadoop is now looking at the local disk, even though the file is in HDFS. I don't understand why this is happening.
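A quick way to confirm this is to print which filesystem an unqualified path resolves to. The sketch below is only a diagnostic (the class name is illustrative, not from the original code) and uses standard Hadoop client APIs; if it prints file:/// for fs.defaultFS, the JVM is not picking up the cluster's core-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal diagnostic sketch: show where an unqualified path ends up.
public class WhereIsMyFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS  = " + conf.get("fs.defaultFS"));

        Path input = new Path("/user/myuser/theData.csv");
        FileSystem fs = input.getFileSystem(conf);
        System.out.println("resolved path = " + fs.makeQualified(input));
        System.out.println("exists        = " + fs.exists(input));
    }
}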
Answer 0 (score: 0)
In case anyone else is as foolish as I was: I was running

java -jar mycode.jar

instead of

hadoop jar mycode.jar

Once launched the right way, everything worked perfectly.
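For context: java -jar starts a plain JVM without the Hadoop configuration directory on the classpath, so new Configuration() never sees the cluster's core-site.xml and fs.defaultFS falls back to file:///, which is exactly the file: scheme in the stack trace. hadoop jar puts that configuration (and the Hadoop client jars) on the classpath before launching the class. If a job ever has to run from a plain JVM, one possible workaround is to point the Configuration at the cluster explicitly; the address and file location below are placeholders, not values from the original post:

// Sketch of a fallback for running outside "hadoop jar".
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://namenode-host:8020");           // placeholder NameNode URI
// or load the real cluster config file directly (location varies by install):
// conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));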