Object not recognized when executing a mapreduce job

Time: 2016-06-16 14:41:14

Tags: python mapreduce mrjob

I am trying to run a simple map reduce job and have the following datasets:

bike.txt

1   Bike 1
2   Bike 2
3   Bike 4
4   Bike 4
5   Bike 4

bikenames.txt

1,Aap
2,Noot
3,Greet
4,Mies
5,Gazelle

My goal is to write a mapreduce job that outputs the most frequently occurring bike name. So I wrote the following:

from mrjob.job import MRJob
from mrjob.step import MRStep

class MostPopularBike(MRJob):
    def configure_options(self):
        super(MostPopularBike, self).configure_options()
        self.add_file_option('--items', help='Path to u.item')

    def steps(self):
        return [
            MRStep(mapper=self.mapper_get_ratings,
                   reducer=self.reducer_count_ratings),
            MRStep(mapper = self.mapper_passthrough,
                   reducer = self.reducer_find_max)
        ]  

    def mapper_get_ratings(self, _, line):
        (bikeID, name) = line.split('\t')
        yield bikeID, 1

    def reducer_init(self):
        self.bikeNames = {}

        with open("bikenames.txt") as f:
            for line in f:
                fields = line.split(',')
                self.bikeNames[fields[0]] = fields[1]

    def reducer_count_ratings(self, key, values):
        yield None, (sum(values), self.bikeNames[key])

    def mapper_passthrough(self, key, value):
        yield key, value

    def reducer_find_max(self, key, values):
        yield max(values)

if __name__ == '__main__':
    MostPopularBike.run()

I try to run it using the following:

!python MostPopularBike.py --items=bikenames.txt bike.txt

But it produces the following error:

AttributeError: 'MostPopularBike' object has no attribute 'bikeNames'

Any ideas on what is going wrong here?

1 Answer:

Answer 0: (score: 1)

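bikeNames is only defined in reducer_init(), so that function must not be getting called. In any case, it is not really a per-step initialization function; it looks more like initialization for the whole job.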
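The same failure can be reproduced without mrjob. The following is a minimal sketch, assuming a hypothetical class `Job` (not part of mrjob) whose lookup table is created only inside a hook method that nothing calls:

```python
class Job:
    """Hypothetical stand-in for the job class; not part of mrjob."""

    def reducer_init(self):
        # The attribute only comes into existence when this method runs.
        self.bikeNames = {"1": "Aap", "2": "Noot"}

    def reducer(self, key, count):
        # Reads the attribute that reducer_init() is expected to create.
        return count, self.bikeNames[key]


job = Job()
try:
    job.reducer("1", 1)        # reducer_init() has not been called yet
    error_message = None
except AttributeError as exc:
    error_message = str(exc)   # mentions the missing 'bikeNames' attribute

job.reducer_init()             # run the hook explicitly
result = job.reducer("1", 1)   # the lookup now succeeds
```

Defining a method named reducer_init is not enough on its own; something has to invoke it before the reducer runs.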

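To perform the initialization when the MostPopularBike instance is created, change the function name from reducer_init to __init__ (an overriding __init__ also has to accept the constructor arguments and call the parent class's __init__). Or, if you really do want the initialization performed in each step, update steps to: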

def steps(self):
    return [
        MRStep(reducer_init=self.reducer_init,
               mapper=self.mapper_get_ratings,
               reducer=self.reducer_count_ratings),
        MRStep(reducer_init=self.reducer_init,
               mapper = self.mapper_passthrough,
               reducer = self.reducer_find_max)
    ]
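To see why registering the hook on the step matters, here is a toy model of the reduce phase in plain Python. It is only a sketch: `ToyJob` and `run_reduce` are hypothetical names, the mapped pairs are made-up mapper output, and none of this is how mrjob is actually implemented.

```python
class ToyJob:
    """Hypothetical stand-in for the job; only mimics the reducer side."""

    def reducer_init(self):
        # Lookup table with a few entries taken from bikenames.txt.
        self.bikeNames = {"1": "Aap", "2": "Noot", "4": "Mies"}

    def reducer_count_ratings(self, key, values):
        yield None, (sum(values), self.bikeNames[key])

    def reducer_find_max(self, key, values):
        yield max(values)


def run_reduce(pairs, reducer, reducer_init=None):
    """Toy reduce phase: run the init hook if one was registered,
    group values by key, then call the reducer once per key."""
    if reducer_init is not None:
        reducer_init()
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    output = []
    for key, values in grouped.items():
        output.extend(reducer(key, values))
    return output


job = ToyJob()
# Hypothetical mapper output: one (bikeID, 1) pair per record.
mapped = [("1", 1), ("2", 1), ("4", 1), ("4", 1), ("4", 1)]

counted = run_reduce(mapped, job.reducer_count_ratings,
                     reducer_init=job.reducer_init)
winner = run_reduce(counted, job.reducer_find_max)
```

With these made-up pairs, `winner` ends up as `[(3, "Mies")]`. In this toy only the first step reads the lookup table, so only that step is given the init hook.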