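This is a question I found on the internet. By my reasoning, each file should contain exactly k/r pairs rather than approximately k/r. What do you think? I do know that the output will be r files.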
If you run the word count MapReduce program with m mappers and r reducers, how many output files will you get at the end of the job? And how many key-value pairs will there be in each file? Assume k is the number of unique words in the input files.
A. There will be r files, each with exactly k/r key-value pairs.
B. There will be r files, each with approximately k/m key-value pairs.
C. There will be r files, each with approximately k/r key-value pairs.
D. There will be m files, each with exactly k/m key-value pairs.
E. There will be m files, each with approximately k/m key-value pairs.
Answer 0 (score: 2)
Option C is correct.
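The number of output files a MapReduce job produces equals the number of reducers it runs, so there will be r files. In word count, each unique word ends up as one key-value pair in the output, so the k pairs are spread across those r files.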
By default, the MapReduce framework uses HashPartitioner to assign keys to partitions:
Partition = (Hash value of the key) % (Number of reducers)
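So the partition a key lands in depends only on its hash value modulo the number of reducers. Keys with the same hash value, or with different hash values that leave the same remainder, go to the same partition, and nothing forces those remainders to spread evenly. In that situation we cannot expect exactly k/r key-value pairs per file.
We would get exactly k/r key-value pairs in every file only if the hash values happened to distribute the keys perfectly evenly across the r partitions, which also requires k to be divisible by r.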
So the final answer is: r files, each with approximately k/r key-value pairs.
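To make the "approximately" concrete, here is a minimal Python sketch of the partitioning rule above. It is a simulation under stated assumptions, not Hadoop code: `java_string_hash`, `partition`, and `pairs_per_reducer` are hypothetical helper names, the word list is made up, and the hash is a stand-in that mimics Java's `String.hashCode()` (Hadoop's `Text` keys compute their hash differently, so real partition sizes would differ).

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Stand-in hash: mimics Java's String.hashCode() for plain ASCII text."""
    h = 0
    for ch in s:
        # Keep the running value within 32 bits, as Java int arithmetic would.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the 32-bit value as a signed integer.
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key: str, num_reducers: int) -> int:
    """Partition = (hash of key, made non-negative) % (number of reducers)."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def pairs_per_reducer(unique_words, num_reducers):
    """Count how many unique keys (output pairs) each reducer would receive."""
    counts = Counter(partition(w, num_reducers) for w in unique_words)
    return [counts.get(i, 0) for i in range(num_reducers)]


# Hypothetical input: k = 9 unique words, r = 3 reducers.
words = ["apple", "banana", "cherry", "date", "elderberry",
         "fig", "grape", "honeydew", "kiwi"]
sizes = pairs_per_reducer(words, 3)
print(sizes)  # one count per output file; the counts always sum to k
```

The list always has r entries that sum to k, but the individual counts only equal k/r when the hash values happen to spread the keys evenly, which is why option A's "exactly" is too strong.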