I am new to Hadoop. I am trying to read an existing file on HDFS using the code below. The configuration seems fine, and the file path is correct as well.
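import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the job's driver class.
public static class Map extends Mapper<LongWritable, Text, Text, Text> {
    private static Text f1, f2, hdfsfilepath;
    private static HashMap<String, ArrayList<String>> friendsData = new HashMap<>();

    @Override
    public void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        Path path = new Path("hdfs://cshadoop1" + conf.get("hdfsfilepath"));
        FileSystem fs = FileSystem.get(path.toUri(), conf);
        if (fs.exists(path)) {
            // try-with-resources closes the stream once the file has been read
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(fs.open(path)))) {
                String line;
                // readLine() has to be called on every pass so the loop terminates
                while ((line = br.readLine()) != null) {
                    StringTokenizer str = new StringTokenizer(line, ",");
                    // first token is the friend, the remaining tokens are the details
                    String friend = str.nextToken();
                    ArrayList<String> friendDetails = new ArrayList<>();
                    while (str.hasMoreTokens()) {
                        friendDetails.add(str.nextToken());
                    }
                    friendsData.put(friend, friendDetails);
                }
            }
        }
    }

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // emit every entry loaded in setup() for each input record
        for (String k : friendsData.keySet()) {
            context.write(new Text(k), new Text(friendsData.get(k).toString()));
        }
    }
}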
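The file-loading loop in setup() can be tried in isolation, without a cluster. The following is a minimal plain-Java sketch, assuming the same comma-separated layout (the first token is the key); the class name FriendParser and the sample data are hypothetical and only stand in for the HDFS file. Note that the reader has to be advanced inside the loop, otherwise the loop never terminates.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the original job.
final class FriendParser {
    /** Reads "key,detail1,detail2,..." lines into a map from key to details. */
    static Map<String, List<String>> parse(BufferedReader reader) throws IOException {
        Map<String, List<String>> friends = new HashMap<>();
        String line;
        // Reading inside the loop condition advances the reader on every pass.
        while ((line = reader.readLine()) != null) {
            StringTokenizer tokens = new StringTokenizer(line, ",");
            if (!tokens.hasMoreTokens()) {
                continue; // skip blank lines instead of failing in nextToken()
            }
            String friend = tokens.nextToken();
            List<String> details = new ArrayList<>();
            while (tokens.hasMoreTokens()) {
                details.add(tokens.nextToken());
            }
            friends.put(friend, details);
        }
        return friends;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data standing in for the file on HDFS.
        String sample = "1,Alice,Smith\n2,Bob,Jones\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(parse(reader));
        }
    }
}
```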
When I run the code, I get the following exception:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://cshadoop1/socNetData/userdata/userdata.txt already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
I am only trying to read an existing file. Any idea what I am missing here? Any help is appreciated.
Answer 0 (score: 2)
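The exception tells you that the output directory already exists, and it must not exist when the job is submitted. Delete it or choose a different name.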
Also, the name of the output directory, 'userdata.txt', looks like a file name. So check whether the input and output directories have been mixed up.
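One way to catch such a mix-up before the job is submitted is to sanity-check the path arguments in the driver. The following is a minimal plain-Java sketch. The question does not show the driver, so the class JobPathCheck, its method, and the sample input path are hypothetical, and the file-name test is only a heuristic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the real driver is not shown in the question.
final class JobPathCheck {
    /** Collects readable problems with the job's path arguments; an empty list means none were found. */
    static List<String> problems(String inputPath, String outputDir, String sideFile) {
        List<String> found = new ArrayList<>();
        if (outputDir.equals(inputPath)) {
            found.add("output directory is the same as the input path");
        }
        if (outputDir.equals(sideFile)) {
            found.add("output directory is the same as the file read in setup()");
        }
        // Heuristic only: a last path segment containing a dot usually names a file.
        String name = outputDir.substring(outputDir.lastIndexOf('/') + 1);
        if (name.contains(".")) {
            found.add("output directory name '" + name + "' looks like a file name");
        }
        return found;
    }

    public static void main(String[] args) {
        // The output path is taken from the exception message. The input path is
        // made up, and the side file is assumed to be the same userdata.txt.
        String input = "hdfs://cshadoop1/some/input/dir";
        String output = "hdfs://cshadoop1/socNetData/userdata/userdata.txt";
        String sideFile = "hdfs://cshadoop1/socNetData/userdata/userdata.txt";
        for (String problem : problems(input, output, sideFile)) {
            System.out.println("WARNING: " + problem);
        }
    }
}
```

In a driver, the checked value would then be passed to FileOutputFormat.setOutputPath(job, new Path(outputDir)). A directory left over from an earlier run can be removed with hdfs dfs -rm -r <dir> before resubmitting.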