MapReduce slow merge: how can I speed it up?

Time: 2018-05-20 04:50:26

Tags: hadoop mapreduce

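I submitted a job with 9000+ mappers and 4096 reducers. Each mapper's input is a single HDFS file of about 10 GB (50 million lines).
Normally, a mapper runs for about 25 minutes.
However, there are always about 1 to 10 mappers that take 1 hour. In these mappers, cleanup() is called 20 minutes after setup(), and the merge that follows takes 40 minutes. How can I speed up the merge? Some logs: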

2018-05-19 12:03:14,791 INFO [main] org.apache.hadoop.mapred.Merger: Merging 8 sorted segments
2018-05-19 12:03:14,792 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 8 segments left of total size: 1282276 bytes
2018-05-19 12:03:14,805 INFO [main] org.apache.hadoop.mapred.Merger: Merging 8 sorted segments
2018-05-19 12:03:14,810 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 8 segments left of total size: 1181792 bytes
2018-05-19 12:03:14,845 INFO [main] org.apache.hadoop.mapred.Merger: Merging 8 sorted segments
2018-05-19 12:03:14,904 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 8 segments left of total size: 1028459 bytes
2018-05-19 12:03:14,912 INFO [main] org.apache.hadoop.mapred.Merger: Merging 8 sorted segments
2018-05-19 12:03:14,913 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 8 segments left of total size: 988898 bytes
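For context on the log lines above: each "Merging 8 sorted segments" / "Down to the last merge-pass" pair presumably reports one map-side merge, in which the 8 sorted spill segments belonging to a single reduce partition are combined into one sorted run. With 4096 reducers, a mapper would repeat this once per partition. The Java sketch below shows the general k-way merge technique using a priority queue. It is a simplified illustration, not Hadoop's actual Merger code; the class name KWayMerge and its Cursor type are hypothetical, and records are plain in-memory strings rather than serialized key/value data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    // Hypothetical cursor type: which segment we are reading and the index of its next record.
    static final class Cursor {
        final int segment;
        final int pos;

        Cursor(int segment, int pos) {
            this.segment = segment;
            this.pos = pos;
        }
    }

    // Merges already-sorted segments into one sorted list in a single pass,
    // always emitting the smallest head record among all segments.
    public static List<String> merge(List<List<String>> segments) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            (a, b) -> segments.get(a.segment).get(a.pos)
                    .compareTo(segments.get(b.segment).get(b.pos)));
        for (int s = 0; s < segments.size(); s++) {
            if (!segments.get(s).isEmpty()) {
                heap.add(new Cursor(s, 0)); // seed the heap with each segment's first record
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            List<String> seg = segments.get(c.segment);
            out.add(seg.get(c.pos));
            if (c.pos + 1 < seg.size()) {
                heap.add(new Cursor(c.segment, c.pos + 1)); // advance within that segment
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three toy "spill segments" for a single partition, each already sorted.
        List<List<String>> spills = List.of(
            List.of("apple", "kiwi"),
            List.of("banana", "plum"),
            List.of("cherry"));
        System.out.println(merge(spills)); // prints [apple, banana, cherry, kiwi, plum]
    }
}
```

With k segments, each emitted record costs O(log k) heap work, so the comparison work for 8 segments is modest; a real Hadoop merge also reads and writes serialized spill data on local disk, which this sketch does not model.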
