Question

I am writing a simple MapReduce program that counts the number of tweets sent on each day of a sporting event.

def mapper(self,_,line):
    #Extracting the fields of csv line
    fields = line.split(";")
    #To choose the actual tweet we extract field[4]
    for field[4] in fields:
        time_epoch = int(fields[0])/1000
        #Extract date tweet was sent
        day = time.strftime("%d",time.gmtime(time_epoch))
        #For each date, count num of tweets sent
        #Since calculating the number of tweets sent each day
        #Shouldn't day be the key, and intermediate value be 1
        yield(day, 1)

The reducer then takes the intermediate key/value pairs and performs the aggregation:

def reducer(self, day, counts):
    #For each day during the sporting event, calculate the total tweets sent
    yield(day, sum(counts))

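I am struggling to decide whether the key passed to the reducer should be the actual tweet or the date the tweet was sent. However, I concluded that since I want to compute a total per day, the specific day should be the key.

I am getting an error, though, and I wonder whether I am missing something obvious. Many thanks!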

Answer 1

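You didn't say what your error is, but for field[4] in fields: is not correct. field is never defined in the code shown, so that loop would raise a NameError.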

Perhaps you meant if len(fields) >= 4:

"However, I concluded that since I want to compute a total per day, the specific day should be the key."

That sounds right to me.
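
For illustration, the suggestion above could be sketched roughly as follows. This is a minimal sketch, not the asker's final code: it assumes fields[0] is an epoch timestamp in milliseconds and fields[4] is the tweet text (as the comments in the question imply), it checks for at least five fields because index 4 must exist, and it uses plain functions plus a hypothetical run_local helper so it can be tried without a MapReduce framework. In the original job, mapper and reducer would remain methods taking self.

```python
import time
from collections import defaultdict


def mapper(_, line):
    """Emit (day, 1) for every well-formed CSV line."""
    fields = line.split(";")
    # Assumption: fields[0] is the timestamp in milliseconds and
    # fields[4] is the tweet text, so at least five fields are required.
    if len(fields) < 5:
        return
    try:
        time_epoch = int(fields[0]) / 1000
    except ValueError:
        # Header row or malformed timestamp: skip the line.
        return
    # Day of the month (UTC) on which the tweet was sent.
    day = time.strftime("%d", time.gmtime(time_epoch))
    yield day, 1


def reducer(day, counts):
    """Sum the 1s emitted for a given day."""
    yield day, sum(counts)


def run_local(lines):
    """Hypothetical helper: emulate the shuffle step in memory."""
    grouped = defaultdict(list)
    for line in lines:
        for day, one in mapper(None, line):
            grouped[day].append(one)
    totals = {}
    for day in sorted(grouped):
        for key, total in reducer(day, grouped[day]):
            totals[key] = total
    return totals
```

Note that keying on "%d" alone merges the same day number from different months; if the event spans a month boundary, a format such as "%Y-%m-%d" would keep the days apart.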
