Question

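I need to read 2 different input files and write 2 output files. The first file is the main input file and the second is a dictionary. My job should process both files in both the mapper and the reducer. As far as I know, I can't use MultipleInputs. I tried using BufferedReader and BufferedWriter, but then I ended up creating another job inside the mapper and yet another one inside the reducer. How can I solve this?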

Answer 1

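You can use multiple input files; see http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/lib/MultipleInputs.html (with the newer API, which takes a Job as in the code below, the class is org.apache.hadoop.mapreduce.lib.input.MultipleInputs).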

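// Each path can get its own InputFormat and Mapper class (e.g. one for the main input, one for the dictionary)
MultipleInputs.addInputPath(job, inputPath1, TextInputFormat.class, MyMapper.class);
MultipleInputs.addInputPath(job, inputPath2, TextInputFormat.class, MyMapper.class);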

You can add as many input files as you need this way: inputPath1, inputPath2, and so on.
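To illustrate what the mappers and the reducer would do with the main input and the dictionary, here is a plain-Java simulation of a reduce-side join. It has no Hadoop dependencies so it runs standalone; the TreeMap only stands in for Hadoop's shuffle. It assumes both files hold tab-separated "key<TAB>value" lines, and the class and method names (ReduceSideJoinSketch, mapMain, mapDict) are hypothetical, not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side join (hypothetical names).
// Assumption: every line of both inputs is "key<TAB>value".
class ReduceSideJoinSketch {

    // Shared emit logic: tag each value with its source so the reducer can tell them apart.
    private static void emit(String tag, String line, Map<String, List<String>> shuffle) {
        String[] kv = line.split("\t", 2);
        if (kv.length < 2) {
            return; // skip malformed lines
        }
        shuffle.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(tag + "\t" + kv[1]);
    }

    // "Mapper" registered for the main input path.
    static void mapMain(String line, Map<String, List<String>> shuffle) {
        emit("M", line, shuffle);
    }

    // "Mapper" registered for the dictionary path.
    static void mapDict(String line, Map<String, List<String>> shuffle) {
        emit("D", line, shuffle);
    }

    // "Reducer": values for one key arrive in no guaranteed order, so main-input
    // values are buffered until the dictionary value for the key has been seen.
    static List<String> reduce(String key, List<String> tagged) {
        String dictValue = null;
        List<String> mainValues = new ArrayList<>();
        for (String t : tagged) {
            String[] parts = t.split("\t", 2);
            if (parts[0].equals("D")) {
                dictValue = parts[1];
            } else {
                mainValues.add(parts[1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (String v : mainValues) {
            out.add(key + "\t" + v + "\t" + (dictValue == null ? "UNKNOWN" : dictValue));
        }
        return out;
    }

    // Runs both "mappers", groups by key (sorted, like the shuffle), then reduces each group.
    static List<String> join(List<String> mainLines, List<String> dictLines) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        for (String l : mainLines) {
            mapMain(l, shuffle);
        }
        for (String l : dictLines) {
            mapDict(l, shuffle);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle.entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

Dictionary keys with no matching main-input record produce no output, and main-input keys missing from the dictionary are marked UNKNOWN. A secondary sort on the tag would let a real reducer avoid buffering the main-input values.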

Answer 2

If your second file is small, you can use the Distributed Cache and read the file in the mappers for processing. Refer to http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html
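A minimal sketch of the map-side join this enables, again as plain Java so it runs without Hadoop. The dictionary is loaded into a HashMap once (what the mapper's setup() would do after reading the locally cached file), and every main-input record is then looked up in memory, so no shuffle is needed for the join. The "key<TAB>value" line format and all names here are assumptions; in the Hadoop 1.x API the driver would register the dictionary with DistributedCache.addCacheFile(uri, conf).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of a map-side join with a small dictionary (hypothetical names).
// Assumption: lines are "key<TAB>value".
class MapSideJoinSketch {
    private final Map<String, String> dictionary = new HashMap<>();

    // Plays the role of Mapper.setup(): load the whole dictionary into memory once.
    void setup(List<String> dictLines) {
        for (String line : dictLines) {
            String[] kv = line.split("\t", 2);
            if (kv.length == 2) {
                dictionary.put(kv[0], kv[1]);
            }
        }
    }

    // Plays the role of Mapper.map(): one in-memory lookup per main-input record.
    String map(String line) {
        String[] kv = line.split("\t", 2);
        String value = kv.length == 2 ? kv[1] : "";
        return kv[0] + "\t" + value + "\t" + dictionary.getOrDefault(kv[0], "UNKNOWN");
    }

    // Simulates one map task over the main input.
    static List<String> run(List<String> mainLines, List<String> dictLines) {
        MapSideJoinSketch mapper = new MapSideJoinSketch();
        mapper.setup(dictLines);
        List<String> out = new ArrayList<>();
        for (String line : mainLines) {
            out.add(mapper.map(line));
        }
        return out;
    }
}
```

This only works while the dictionary fits in each mapper's heap, which is why the approach requires the second file to be small.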

Reading 2 input files in Hadoop MapReduce
