I am trying to run a MapReduce job in Python. My CSV file looks like this:
trip_id taxi_id pickup_time dropoff_time ... total
0 20117 2455.0 2013-05-05 09:45:00 50.44
1 44691 1779.0 2013-06-24 11:30:00 66.78
My code is:
import pandas as pd
import numpy as np
from mrjob.job import MRJob

class MRCount(MRJob):
    def mapper(self, _, line):
        datarow = line.replace(' ','').replace('N/A','').split(',')
        trip_id = datarow[0]
        total = datarow[14]
        total = np.float(total)
        yield ((trip_id), (total))
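Because my code passes every line to the mapper, the first thing it receives is the header row of column names (strings). I want to work with the total column, so when I run the file it fails with: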
TypeError: float() argument must be a string or a number, not 'generator'
How can I skip the first line of the CSV file when processing it in the mapper function?
Answer 0: (score: 2)
I'm not sure exactly what 'line' contains. A simple way to solve the problem is to wrap the float conversion in try/except.
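def mapper(self, _, line):
    datarow = line.replace(' ','').replace('N/A','').split(',')
    trip_id = datarow[0]
    total = datarow[14]
    try:
        # Built-in float() replaces the np.float alias, which NumPy 1.24 removed.
        total = float(total)
    except (TypeError, ValueError):
        # A header cell such as 'total' raises ValueError, so catch that as well.
        print("skipping line with value", datarow[14])
    else:
        yield ((trip_id), (total))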
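If you would rather skip the header explicitly than rely on the conversion failing, something like the following might work. It is only a sketch: `parse_row` is a hypothetical helper (not part of mrjob), and it assumes the header's first field is literally `trip_id` and that `total` sits at index 14, as in the question's code. It has no mrjob dependency, so it can be tried on its own.

```python
def parse_row(line, total_index=14, header_key="trip_id"):
    """Return (trip_id, total) for a data row, or None for rows to skip.

    header_key and total_index are assumptions taken from the question's
    code; adjust them to match the real file.
    """
    fields = line.replace(" ", "").replace("N/A", "").split(",")
    # Skip the header row: its first field is a column name, not an id.
    if fields[0] == header_key:
        return None
    try:
        return fields[0], float(fields[total_index])
    except (IndexError, ValueError):
        # Too few columns, or a total that is empty or not numeric.
        return None
```

Inside the mapper you would then call `result = parse_row(line)` and yield `result` only when it is not `None`. Even if the header check does not match your file, the `except` branch still drops the header row, because a column name such as `total` cannot be converted to a float.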