How many key-value pairs does a word count program emit with m mappers and r reducers, given k unique words?

Date: 2015-10-09 01:28:20

Tags: hadoop mapreduce

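This is a question I found on the internet. By my reasoning the answer should be exactly k/r rather than approximately k/r. What do you think? I do know that the output will be r files.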

If you run the word count MapReduce program with m mappers and r reducers, how many output files will you get at the end of the job? And how many key-value pairs will there be in each file? Assume k is the number of unique words in the input files.
A. There will be r files, each with exactly k/r key-value pairs.
B. There will be r files, each with approximately k/m key-value pairs.
C. There will be r files, each with approximately k/r key-value pairs.
D. There will be m files, each with exactly k/m key-value pairs.
E. There will be m files, each with approximately k/m key-value pairs.

1 answer:

Answer 0 (score: 2)

Option C is correct.

The number of output files a MapReduce job produces equals the number of reducers it runs, so there will be r files created.

By default, the MapReduce framework uses HashPartitioner to partition the keys.

Partition = (Hash value of the key) % (Number of reducers) 

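So every key whose hash value leaves the same remainder modulo r goes to the same partition, and keys with identical hash values always end up together. Nothing forces those remainders to be spread evenly, so we cannot expect exactly k/r key-value pairs in each file.

We get exactly k/r key-value pairs per file only if the hash values of the k keys happen to divide evenly among the r partitions, which also requires k to be divisible by r.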
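The partitioning rule above can be sketched in a few lines of Python to see why the split is only approximate. This is a minimal simulation, not Hadoop code: it assumes keys hash like Java's `String.hashCode()` (Hadoop's `Text` keys use a different byte-based hash), and the names `java_string_hash`, `partition`, and `pairs_per_reducer` are made up for illustration.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() with 32-bit signed overflow.

    Assumption: used here only as a stand-in for the key's hash function.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int, as Java would.
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key: str, num_reducers: int) -> int:
    """Hash-partitioning rule: clear the sign bit, then take the remainder."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def pairs_per_reducer(unique_words, num_reducers):
    """Count how many unique keys land in each reducer's output file."""
    counts = Counter(partition(w, num_reducers) for w in unique_words)
    return [counts.get(i, 0) for i in range(num_reducers)]


if __name__ == "__main__":
    k, r = 1000, 4
    words = [f"word{i}" for i in range(k)]  # hypothetical vocabulary
    sizes = pairs_per_reducer(words, r)
    print("pairs per reducer:", sizes, "| ideal k/r =", k / r)
```

Each unique word lands in exactly one partition, so the per-reducer counts always sum to k, but how close each one is to k/r depends entirely on how the hash values fall.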

So the final answer is: r files, each with approximately k/r key-value pairs.