Question

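I am new to Hadoop. I am trying to read an existing file on HDFS using the code below. The configuration seems to be fine, and the file path is correct as well: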

public static class Map extends Mapper<LongWritable, Text, Text, Text> {

    private static Text f1, f2, hdfsfilepath;
    private static HashMap<String, ArrayList<String>> friendsData = new HashMap<>();

    public void setup(Context context) throws IOException {
      Configuration conf = context.getConfiguration();
      Path path = new Path("hdfs://cshadoop1" + conf.get("hdfsfilepath"));
      FileSystem fs = FileSystem.get(path.toUri(), conf);
      if (fs.exists(path)) {
        // try-with-resources closes the reader once the file has been read
        try (BufferedReader br = new BufferedReader(
            new InputStreamReader(fs.open(path)))) {
          String line;
          // Read one line per iteration until the end of the file
          while ((line = br.readLine()) != null) {
            StringTokenizer str = new StringTokenizer(line, ",");
            String friend = str.nextToken();
            ArrayList<String> friendDetails = new ArrayList<>();
            while (str.hasMoreTokens()) {
              friendDetails.add(str.nextToken());
            }
            friendsData.put(friend, friendDetails);
          }
        }
      }
    }

    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String k : friendsData.keySet()) {
        context.write(new Text(k), new Text(friendsData.get(k).toString()));
      }
    }
  }

When I run the code, I get the following exception:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://cshadoop1/socNetData/userdata/userdata.txt already exists
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)

I am only trying to read an existing file. Any idea what I am missing here? Any help is appreciated.

Answer 1

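The exception tells you that the output directory already exists, and it must not. Delete it or change its name.

Also, the name of the output directory, 'userdata.txt', looks like a file name. So check whether you have mixed up the input and output directories.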
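
Note that the stack trace is thrown in "main" from JobSubmitter.checkSpecs, that is, at job submission and before any mapper runs, so the path to look at is the one the driver passes to FileOutputFormat. The question does not show the driver, so the following is only a minimal sketch of how the three paths might be kept apart. The class name FriendsDriver, the configure helper and the argument order are assumptions for illustration; only the "hdfsfilepath" key comes from the question.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: the class name and the argument order are assumptions.
public class FriendsDriver {

  // Wires the three paths into the job so the role of each one is explicit.
  static void configure(Job job, String inputDir, String outputDir, String sideFile)
      throws IOException {
    // Side file that the mapper's setup() opens via conf.get("hdfsfilepath"),
    // e.g. /socNetData/userdata/userdata.txt
    job.getConfiguration().set("hdfsfilepath", sideFile);
    // Regular job input
    FileInputFormat.addInputPath(job, new Path(inputDir));
    // Output directory: must not exist yet and must not be the side file
    FileOutputFormat.setOutputPath(job, new Path(outputDir));
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 3) {
      System.err.println("Usage: FriendsDriver <inputDir> <outputDir> <sideFile>");
      System.exit(2);
    }
    Job job = Job.getInstance(new Configuration(), "friends data");
    job.setJarByClass(FriendsDriver.class);
    // Set the mapper and the output key/value classes for your job here,
    // for example the Mapper from the question with Text/Text output.
    configure(job, args[0], args[1], args[2]);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With this layout the data file is never handed to setOutputPath, so the output check only fails if the chosen output directory is left over from an earlier run. In that case remove it first, for example with hdfs dfs -rm -r on that directory, after making sure it is not the data file itself.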
