Multiple output formats in a single MapReduce job

Time: 2016-03-04 16:06:35

Tags: hadoop mapreduce parquet

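I have a MapReduce job that reads text files and creates Parquet files from them, while also writing plain text files as output. I have used multiple output formats, but it seems a multiple-output-format object can only be initialized to write either Parquet files or text files at one time. I need to accommodate both in a single mapper. Any help is much appreciated.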

1 answer:

Answer 0: (score: 0)

Not sure it's the best way, but you can initialize a StringBuilder in your mapper's setup() method, append all text values to it in the map() method, and then write it to disk in the cleanup() method. Whether this works depends on the size of your text output: everything a single mapper buffers has to fit in its memory. That way the text file doesn't need to be a mapper output at all, and your mapper output can be the Parquet data only.
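Here is a rough sketch of that lifecycle, written as plain Java with no Hadoop dependency so it runs on its own. The class BufferingMapperSketch and its methods are hypothetical stand-ins for a real Mapper's setup(), map(), and cleanup(); the upper-casing is only a placeholder for whatever conversion produces your Parquet record, and a real job would write the side file through the Hadoop FileSystem API rather than java.nio.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Plain-Java stand-in for a Hadoop Mapper. The three methods mirror
// setup(), map(), and cleanup(); all names here are hypothetical.
class BufferingMapperSketch {
    private final Path sideFile;      // where the text output is written
    private StringBuilder textBuffer; // holds this task's text output in memory

    BufferingMapperSketch(Path sideFile) {
        this.sideFile = sideFile;
    }

    // Called once before any record: allocate the buffer.
    void setup() {
        textBuffer = new StringBuilder();
    }

    // Called once per record: buffer the text line and return the value
    // that would be emitted as the mapper's (Parquet) output.
    String map(String line) {
        textBuffer.append(line).append('\n');
        return line.toUpperCase(); // placeholder for the real record conversion
    }

    // Called once after the last record: flush the buffer to disk.
    void cleanup() throws IOException {
        Files.write(sideFile, textBuffer.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("side-output", ".txt");
        BufferingMapperSketch mapper = new BufferingMapperSketch(out);
        mapper.setup();
        for (String line : new String[] {"a,1", "b,2"}) {
            mapper.map(line);
        }
        mapper.cleanup();
        System.out.print(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

The point of the design is that only one thing goes through the job's output format; the text is a side effect written once per task, at the cost of holding it all in memory until cleanup.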

You could use context.getInputSplit() or something similar to build the text output file names, so that each mapper writes to a unique file and you know which output corresponds to which input.
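One possible naming scheme, sketched as plain Java: combine the input file's name with the split's start offset. The helper SideFileNames.forSplit is hypothetical. In a real mapper the two arguments would probably come from casting context.getInputSplit() to FileSplit and calling getPath() and getStart(), which assumes a file-based input format.

```java
// Hypothetical helper: derives a per-split name for the text side file.
class SideFileNames {
    // Builds "<input file name>-<split start offset>.txt".
    static String forSplit(String inputPath, long splitStart) {
        int slash = inputPath.lastIndexOf('/');
        String fileName = inputPath.substring(slash + 1); // whole string if no '/'
        return fileName + "-" + splitStart + ".txt";
    }

    public static void main(String[] args) {
        // Two splits of the same input file get distinct names.
        System.out.println(forSplit("/data/in/events.txt", 0L));
        System.out.println(forSplit("/data/in/events.txt", 134217728L));
    }
}
```

Including the offset matters when one large input file is divided into several splits; the file name alone would make those mappers overwrite each other's output.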