Question

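This is a question I found on the internet. By my reasoning the answer should be exactly k/r rather than approximately k/r. What do you think? I do know that the output will be r files.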

If you run the word count MapReduce program with m mappers and r reducers, how many output files will you get at the end of the job? And how many key-value pairs will there be in each file? Assume k is the number of unique words in the input files.
A. There will be r files, each with exactly k/r key-value pairs.
B. There will be r files, each with approximately k/m key-value pairs.
C. There will be r files, each with approximately k/r key-value pairs.
D. There will be m files, each with exactly k/m key-value pairs.
E. There will be m files, each with approximately k/m key-value pairs.

Answer 1

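Option C is correct.

The number of output files a MapReduce job produces equals the number of reducers that run, so there will be r files.

By default, the MapReduce framework uses HashPartitioner to assign keys to partitions.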

Partition = (Hash value of the key) % (Number of reducers)

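So if two or more keys have the same hash value (or, more generally, the same hash value modulo r), they go to the same partition. In that case we cannot expect exactly k/r key-value pairs per file.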

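We get exactly k/r key-value pairs per file only if the hash values, taken modulo r, happen to spread the k keys evenly across the reducers, which also requires r to divide k. Distinct hash values alone are not enough, because different hash values can still share the same remainder.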
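To make the "approximately" concrete, here is a minimal sketch in Python that simulates the partitioning step. It is an illustration under assumptions, not Hadoop code: java_string_hash imitates Java's String.hashCode() as a stand-in for the key's hash (Hadoop's Text keys hash their bytes differently), and partition and keys_per_reducer are hypothetical helper names. The sign-bit mask mirrors what Hadoop's HashPartitioner does before taking the modulus.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Imitate Java's String.hashCode() as a signed 32-bit integer.

    Assumption: characters are in the Basic Multilingual Plane, so
    ord(ch) matches the UTF-16 code unit Java would use.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    # Reinterpret the unsigned 32-bit value as a signed int.
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index: clear the sign bit, then take the modulus."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def keys_per_reducer(words, num_reducers: int):
    """Count how many unique words land on each reducer (one output file each)."""
    counts = Counter(partition(w, num_reducers) for w in set(words))
    return [counts.get(i, 0) for i in range(num_reducers)]


if __name__ == "__main__":
    # k = 3 unique words, r = 3 reducers, so k/r = 1,
    # yet the keys do not spread evenly.
    print(keys_per_reducer(["a", "d", "a", "b"], 3))  # [0, 2, 1]
```

Here "a" and "d" have different hash values (97 and 100) but the same remainder modulo 3, so they share a reducer and one output file ends up empty. The files still hold k keys in total, which is why the per-file count is only approximately k/r.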

So the final answer is: r files, each with approximately k/r key-value pairs.
