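I am studying Hadoop, and I would like each call to the map function to operate on multiple lines. I found the property mapreduce.input.lineinputformat.linespermap, but if I understand it correctly, it sets the number of lines handed to a single mapper, not the number of lines passed to each map function call. How can I achieve this? Thanks in advance.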
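Answer 0: (score: 0)
1) You have to write a custom text input format.
2) You have to create your own custom record reader for it, in which you implement the logic.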
You will extend the TextInputFormat class to create your own NLinesInputFormat.
You will also create your own RecordReader class called NLinesRecordReader, where you will implement the logic of feeding 3 lines/records at a time to each map call.
You will then change your driver program to use the new NLinesInputFormat class.
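The core of such a record reader is the grouping step: keep reading lines until N have been collected, then hand them to the map call as one value. Below is a minimal, Hadoop-free sketch of that logic only. The class NLineGrouper and its groupAll helper are hypothetical names made up for illustration, and joining the lines with '\n' is an assumption. A real NLinesRecordReader would presumably wrap Hadoop's LineRecordReader and perform the same loop inside nextKeyValue().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): groups consecutive lines into one record.
class NLineGrouper {
    private final BufferedReader reader;
    private final int linesPerRecord;
    private String currentValue;

    NLineGrouper(BufferedReader reader, int linesPerRecord) {
        if (linesPerRecord < 1) {
            throw new IllegalArgumentException("linesPerRecord must be >= 1");
        }
        this.reader = reader;
        this.linesPerRecord = linesPerRecord;
    }

    // Named after RecordReader.nextKeyValue(): returns false once the input is exhausted.
    boolean nextKeyValue() throws IOException {
        StringBuilder sb = new StringBuilder();
        int count = 0;
        String line;
        // Collect up to linesPerRecord lines; the last record may hold fewer.
        while (count < linesPerRecord && (line = reader.readLine()) != null) {
            if (count > 0) {
                sb.append('\n'); // assumption: lines are joined with a newline
            }
            sb.append(line);
            count++;
        }
        if (count == 0) {
            currentValue = null;
            return false;
        }
        currentValue = sb.toString();
        return true;
    }

    // The value that would be passed to a single map call.
    String getCurrentValue() {
        return currentValue;
    }

    // Convenience for experimenting: returns every grouped record of a text.
    static List<String> groupAll(String text, int linesPerRecord) {
        NLineGrouper grouper =
                new NLineGrouper(new BufferedReader(new StringReader(text)), linesPerRecord);
        List<String> records = new ArrayList<>();
        try {
            while (grouper.nextKeyValue()) {
                records.add(grouper.getCurrentValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

For example, NLineGrouper.groupAll("a\nb\nc\nd", 3) yields two records: the first holds lines a, b and c, the second holds the leftover line d. In the mapper, the value can then be split on '\n' to recover the individual lines.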
Please follow this link for complete details: http://bigdatacircus.com/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/