Here is part of the dataset:
18,8,307,130,3504,12,70,1,chevrolet
15,8,350,165,3693,11.5,70,1,buick
18,8,318,150,3436,11,70,1,plymouth
16,8,304,150,3433,12,70,1,amc
17,8,302,140,3449,10.5,70,1,ford
15,8,429,198,4341,10,70,1,ford
14,8,454,220,4354,9,70,1,chevrolet
14,8,440,215,4312,8.5,70,1,plymouth
Here is the code:
data = sc.textFile("hw6/auto_mpg_original.csv")
records = data.map(lambda x: x.split(","))
hp = float(records.map(lambda x: x[3]))
disp = np.array(float(records.map(lambda x: x[2])))
final_data_1 = LabeledPoint(hp, disp)
This is the error:
Traceback (most recent call last):
  File "/home/cloudera/Desktop/hw6.py", line 41, in <module>
    hp = float(records.map(lambda x: x[3]))
TypeError: float() argument must be a string or a number
This looks basic, but I really cannot find a solution.
Answer 0 (score: 1)
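Check the type of records.map(): it is an RDD, not a string or a number, so float() cannot convert it. You can apply float() inside map() instead, for example: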
hp = records.map(lambda x: float(x[3]))
But you need to .collect() the result before using it as a local value, for example:
hp = records.map(lambda x: float(x[3])).collect()
disp = np.array(records.map(lambda x: float(x[2])).collect())
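To see what the per-element conversion does without a Spark cluster, here is a minimal plain-Python sketch. The helper name parse_hp_disp is hypothetical and not from the question, and the list comprehension is only a stand-in for records.map(...).collect(); the sample lines are copied from the dataset excerpt above.

```python
# Sample lines copied from the dataset excerpt in the question.
lines = [
    "18,8,307,130,3504,12,70,1,chevrolet",
    "15,8,350,165,3693,11.5,70,1,buick",
]


def parse_hp_disp(line):
    """Return (horsepower, displacement) as floats for one CSV line.

    Hypothetical helper: column 3 is horsepower and column 2 is
    displacement, matching the indices used in the question.
    """
    fields = line.split(",")
    return float(fields[3]), float(fields[2])


# Plain-Python stand-in for data.map(parse_hp_disp).collect():
# float() is applied to each field, never to the collection itself.
pairs = [parse_hp_disp(line) for line in lines]
print(pairs)  # [(130.0, 307.0), (165.0, 350.0)]
```

In Spark, the same function could probably be passed as data.map(parse_hp_disp). If the goal is an RDD of LabeledPoint objects, one option is to build LabeledPoint(hp, [disp]) for each record inside the map() call, which avoids collecting everything to the driver first.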
Answer 1 (score: 0)
There is a problem with the CSV input: the column is empty or contains a non-numeric value.
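If some rows do have an empty or non-numeric value in that column, one way to guard the conversion is sketched below. The helper names to_float_or_none and has_numeric_hp are hypothetical, and the second and third sample rows are made up for illustration; they are not taken from the question's file.

```python
def to_float_or_none(text):
    """Convert text to float, or return None if it is empty or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def has_numeric_hp(line):
    """True if column 3 (horsepower) of the CSV line parses as a number."""
    return to_float_or_none(line.split(",")[3]) is not None


lines = [
    "18,8,307,130,3504,12,70,1,chevrolet",  # row from the question
    "25,4,98,NA,2046,19,71,1,ford",         # made-up row: non-numeric value
    "21,6,200,,2875,17,74,1,ford",          # made-up row: empty value
]

# Keep only the rows whose horsepower field can be converted.
clean = [line for line in lines if has_numeric_hp(line)]
print(len(clean))  # 1
```

With an RDD, the same predicate could be applied as data.filter(has_numeric_hp) before the map() step, so that float() never sees a bad field.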