Question

I want to use multiple outputs in a Hadoop job on Elastic MapReduce. To do that, I set up MultipleOutputs in my main() method as follows:

MultipleOutputs.addNamedOutput(hadoopJob, "One",
    TextOutputFormat.class, NullWritable.class, Text.class);

MultipleOutputs.addNamedOutput(hadoopJob, "Two",
    TextOutputFormat.class, NullWritable.class, Text.class);

I want "One" to contain the output from the Mapper and "Two" to contain the output from the Reducer.

In the setup method of both the mapper and the reducer, I call:

outputWriters = new MultipleOutputs<NullWritable, Text>(context);

In the mapper, I call:

outputWriters.write("One", nothing, sampleOutput, "One");

In the reducer, I call:

outputWriters.write("Two", nothing, new Text(thing.getStuff()), "Two");

Finally, in the cleanup method of both the mapper and the reducer, I call:

outputWriters.close();

When I do this, I get a "File already exists" exception in the Reducer: it tries to re-create the output file that the mapper already created.
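To make the collision easier to reason about, here is a minimal plain-Java sketch of how I understand the part-file names to be built. It is not Hadoop code and does not touch S3. The class OutputNameSketch and the helpers uniqueFile and collides are hypothetical names made up for illustration, and the "<base>-<m|r>-<5-digit partition>" pattern is an assumption based on how FileOutputFormat appears to name files on my local runs.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Plain-Java model of part-file naming. This is NOT Hadoop code.
class OutputNameSketch {

    // Hypothetical helper: builds "<base>-<m|r>-<5-digit partition>",
    // which is the naming pattern assumed in this sketch.
    static String uniqueFile(String basePath, boolean isMapTask, int partition) {
        return String.format("%s-%s-%05d", basePath, isMapTask ? "m" : "r", partition);
    }

    // Returns true if the name was already recorded, i.e. a second
    // create of that file would hit "File already exists".
    static boolean collides(Set<String> created, String name) {
        return !created.add(name);
    }

    public static void main(String[] args) {
        Set<String> created = new LinkedHashSet<>();
        // Map task 0 writes to base path "One"; reduce task 0 writes to "Two".
        boolean mapClash = collides(created, uniqueFile("One", true, 0));
        boolean reduceClash = collides(created, uniqueFile("Two", false, 0));
        System.out.println("files: " + created);
        System.out.println("map clash: " + mapClash + ", reduce clash: " + reduceClash);
    }
}
```

Under these assumptions the mapper and reducer names never coincide, which matches what I see locally. So if the naming assumption holds, the naming scheme alone would not explain the exception.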

I can work around this by removing the outputWriters.close() call from the mapper's cleanup method, but that introduces another problem: I no longer get any mapper output.

What is the correct way to use one MultipleOutputs in the mapper and another in the reducer? The JavaDocs don't mention this case, and I couldn't find anything useful on StackOverflow.

Update: This seems to work fine locally. However, when I run it on Elastic MapReduce with S3 output, I get the "File already exists" error. Any ideas for a workaround?

Hadoop MapReduce MultipleOutputs - one in the Mapper, one in the Reducer

0 Answers: