I am trying to convert this script so that I can run it as an mrjob MapReduce job:
import csv

cities = []
with open('data.csv', newline='') as csvfile:
    file = csv.reader(csvfile, delimiter=',')
    for row in file:
        if row[0] not in cities:
            cities.append(row[0])
print(len(cities))
This is what I have so far:
from mrjob.job import MRJob
from mrjob.step import MRStep


class citiesCount(MRJob):
    # each input lines consists of city, productcities, price, and paymentMode
    def mapper(self, _, line):
        # create a key-value pair with key: productcities and value: 1
        line_cols = line.split(',')
        yield line_cols[0], 0

    def reducer(self, cities, counts):
        # final consolidation of key-value pairs at reducer nodes
        yield None, (cities, counts)

    def reducer_count(self, cities):
        yield (len(cities))

    def steps(self):
        return [
            MRStep(mapper=self.mapper,
                   reducer=self.reducer),
            MRStep(reducer=self.reducer_count)
        ]


if __name__ == '__main__':
    citiesCount.run()
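I know I'm close, but I can't figure out what I'm missing. I can't post the dataset of 500,000 items here, but my professor has made it available for download at this link: https://users.cs.fiu.edu/~prabakar/database/4722sp19/abarr054-3495916/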
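
For what it's worth, here is a dependency-free sketch of how the two steps could pass data to each other. It is not mrjob itself: `run_step` and `count_unique_cities` are hypothetical helpers that only imitate the shuffle between steps, and the sample rows in the usage note are made up. The three generator functions are written with the same (key, values) shape that mrjob reducers receive, minus `self`. The main points are that every reducer takes a key and an iterator of values, that each yield is a key-value pair, and that an iterator has no `len()`.

```python
from itertools import groupby
from operator import itemgetter


def mapper(_, line):
    # Emit the city (first CSV column) as the key; the value is unused.
    yield line.split(',')[0], None


def reducer(city, _values):
    # Called once per distinct city. Re-key under None so that every
    # city reaches the same reducer call in the next step.
    yield None, city


def reducer_count(_, cities):
    # `cities` is an iterator, not a list, so count by iterating.
    yield 'unique cities', sum(1 for _ in cities)


def run_step(pairs, reduce_fn):
    # Hypothetical helper (not part of mrjob): imitates shuffle/sort by
    # grouping values by key, then calls reduce_fn once per key.
    ordered = sorted(pairs, key=lambda kv: repr(kv[0]))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield from reduce_fn(key, (value for _, value in group))


def count_unique_cities(lines):
    # Step 1: map each line, then reduce once per city.
    mapped = (pair for line in lines for pair in mapper(None, line))
    step_one = run_step(mapped, reducer)
    # Step 2: a single reducer call sees all cities and counts them.
    return list(run_step(step_one, reducer_count))
```

For example, `count_unique_cities(["Miami,Books,10.5,Visa", "Boston,Toys,3.0,Cash", "Miami,Music,7.25,Visa"])` groups the two Miami rows together in the first step and counts two cities in the second. In the actual job, the same bodies would presumably go into the `MRJob` methods with `self` added as the first parameter, and the existing `steps()` could stay as it is.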