Question

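My job has two input files located in two different directories. As described in "Hadoop job taking input files from multiple directories", a job can read files from multiple directories. The files have the same name but sit in differently named folders:

C1/part-0000
C2/part-0000

Is it possible to detect, in the map phase, which of the two files a record comes from?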
Something like:

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    if (first file) {
        ...
        context.write(outputKey, outputValue);
    } else { // second file
        ...
        context.write(outputKey, outputValue);
    }
}

Answer 1

Check it once, in setup():

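@Override
protected void setup(Context context) throws IOException, InterruptedException {
    // setup() runs once per mapper, i.e. once per input split
    FileSplit split = (FileSplit) context.getInputSplit(); // org.apache.hadoop.mapreduce.lib.input.FileSplit
    Path path = split.getPath();
    String name = path.getName(); // last path component only, e.g. "part-0000"
    ...
}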

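Don't repeat the check in map() for every record: each mapper is created for exactly one input split, so the file (and its directory) is the same for every record that mapper processes.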
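One detail matters for the question's layout: path.getName() returns only the last path component, which is "part-0000" for both files, so it cannot tell them apart on its own; the parent directory name (C1 or C2) can. Below is a minimal, Hadoop-free sketch of that decision, assuming the directory names C1 and C2 from the question. The enum InputSource and its methods are hypothetical names chosen for illustration, and the code uses only the JDK so it can be tried without a cluster.

```java
/** Hypothetical helper: which of the two input directories a split came from. */
enum InputSource {
    FIRST, SECOND;

    /**
     * Classifies a split path such as "hdfs://nn:8020/data/C1/part-0000"
     * by its parent directory, because the file names themselves are identical.
     * Assumes the directory names "C1" and "C2" used in the question.
     */
    static InputSource fromPath(String path) {
        String dir = parentDirName(path);
        switch (dir) {
            case "C1":
                return FIRST;
            case "C2":
                return SECOND;
            default:
                throw new IllegalArgumentException("Unexpected input directory: " + dir);
        }
    }

    /** Returns the directory component just before the file name, or "" if there is none. */
    static String parentDirName(String path) {
        int fileSep = path.lastIndexOf('/');
        if (fileSep <= 0) {
            return ""; // no directory part, e.g. "part-0000"
        }
        String dir = path.substring(0, fileSep);   // drop "/part-0000"
        return dir.substring(dir.lastIndexOf('/') + 1); // keep the last directory segment
    }
}
```

Inside the mapper, the idea would be to compute this once in setup() from split.getPath().toString() (or, staying with Hadoop's own Path API, from split.getPath().getParent().getName()), store the result in a field, and branch on that field in map() instead of re-inspecting the path for every record.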
