Recreating Python dictionary results in MapReduce?

Asked: 2017-12-03 21:57:14

Tags: python hadoop mapreduce mrjob

I'm converting straightforward Python code to MapReduce with mrjob, and I can't understand why the converted job produces unexpected results.

Sample data from a .txt file:

1  12
1  14
1  15
1  16
1  18
1  12
2  11
2  11
2  13
3  12
3  15
3  11
3  10

This code builds a dictionary and performs a simple division:

dic = {}

with open('numbers.txt', 'r') as fi:
    for line in fi:
        parts = line.split()
        # collect every second-column value under its first-column key
        dic.setdefault(parts[0], []).append(int(parts[1]))

print(dic)

# for each key, print 1 divided by the number of values collected
for k, v in dic.items():
    print(k, 1/len(v), v)

Result:

{'1': [12, 14, 15, 16, 18, 12], '2': [11, 11, 13], '3': [12, 15, 11, 10]}

1 0.16666666666666666 [12, 14, 15, 16, 18, 12]
2 0.3333333333333333 [11, 11, 13]
3 0.25 [12, 15, 11, 10]

But when I convert this to MapReduce with mrjob:

from mrjob.job import MRJob
from mrjob.step import MRStep

class test(MRJob):

    def steps(self):
        return [MRStep(mapper=self.divided_vals)]

    def divided_vals(self, _, line):
        # build a dict for this line and append its value
        dic = {}
        parts = line.split()
        dic.setdefault(parts[0], []).append(int(parts[1]))

        for k, v in dic.items():
            yield (k, 1/len(v)), v

if __name__ == '__main__':
    test.run()

Result:

["2", 1.0]  [11]
["2", 1.0]  [13]
["3", 1.0]  [12]
["3", 1.0]  [15]
["3", 1.0]  [11]
["3", 1.0]  [10]
["1", 1.0]  [12]
["1", 1.0]  [14]
["1", 1.0]  [15]
["1", 1.0]  [16]
["1", 1.0]  [18]
["1", 1.0]  [12]
["2", 1.0]  [11]

Why doesn't the MapReduce version group the values by key and compute the division the same way? How can I recreate the standard Python result in MapReduce?
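
I suspect the fix is to have the mapper emit the raw (key, value) pairs and do the division in a reducer, which would see all the values for a key at once. A rough, untested sketch of what I have in mind (the class name GroupedTest is my own):

from mrjob.job import MRJob

class GroupedTest(MRJob):

    # mapper: emit the first column as the key, the second as the value
    def mapper(self, _, line):
        parts = line.split()
        yield parts[0], int(parts[1])

    # reducer: receives every value emitted for a key, so len() now
    # counts the whole group, like the plain-Python dictionary did
    def reducer(self, key, values):
        vals = list(values)
        yield (key, 1 / len(vals)), vals

if __name__ == '__main__':
    GroupedTest.run()

Is this the right direction, or is there a standard way to rebuild the whole dictionary in mrjob?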

0 Answers:

No answers yet