I'm trying to create fare bands (1/2/3) with this loop, but it doesn't seem to work:
traindf['FareBand'] = np.nan
for index, row in traindf.iterrows():
    if row['Fare'] <= 13.675550:
        row['FareBand'] = 1
    elif row['Fare'] <= 20.662183 and row['Fare'] > 13.675550:
        row['FareBand'] = 2
    else:
        row['FareBand'] = 3
Running .head() shows NaN for every row of the FareBand column:
traindf['FareBand'].head(20)
Output:
0 NaN
1 NaN
2 NaN
3 NaN
...
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
18 NaN
19 NaN
Name: FareBand, dtype: float64
What could be the reason?
Answer 0 (score: 4)
I suggest using numpy.select:
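import numpy as np
import pandas as pd

traindf = pd.DataFrame({'Fare': [10, 15, 3, 30]})
# Boolean masks for the first two bands
m1 = traindf['Fare'] <= 13.675550
m2 = (traindf['Fare'] <= 20.662183) & (traindf['Fare'] > 13.675550)
# 1 where m1 holds, 2 where m2 holds, otherwise the default 3
traindf['FareBand'] = np.select([m1, m2], [1, 2], default=3)
print(traindf)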
   Fare  FareBand
0    10         1
1    15         2
2     3         1
3    30         3
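As a side note, numpy.select uses the first condition that matches when several are true, so the lower bound in m2 is not strictly needed. A minimal sketch of that simplification, reusing the same sample frame (the name conditions is only illustrative):

```python
import numpy as np
import pandas as pd

traindf = pd.DataFrame({'Fare': [10, 15, 3, 30]})

# Conditions are checked in order; a fare <= 13.675550 matches the first
# condition and never reaches the second one.
conditions = [
    traindf['Fare'] <= 13.675550,
    traindf['Fare'] <= 20.662183,
]
traindf['FareBand'] = np.select(conditions, [1, 2], default=3)
print(traindf)
```

Keeping the explicit lower bound, as in the code above, is equally valid and arguably easier to read.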
Your solution can be made to work by setting the values by index label, but please don't use it, because it is slow:
for index, row in traindf.iterrows():
    # Read and write through traindf.loc so the change lands in the DataFrame
    if traindf.loc[index, 'Fare'] <= 13.675550:
        traindf.loc[index, 'FareBand'] = 1
    elif traindf.loc[index, 'Fare'] <= 20.662183 and traindf.loc[index, 'Fare'] > 13.675550:
        traindf.loc[index, 'FareBand'] = 2
    else:
        traindf.loc[index, 'FareBand'] = 3
print(traindf)
   Fare  FareBand
0    10       1.0
1    15       2.0
2     3       1.0
3    30       3.0
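As for why the original loop leaves every value as NaN: the pandas documentation notes that, depending on the dtypes, iterrows returns a copy of each row rather than a view, so assigning to row has no effect on the DataFrame. A small sketch of that behaviour, assuming a mixed-dtype frame (the Name column and its values are made up here purely to stand in for the other columns of the real data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'Name' only makes the frame mixed-dtype
traindf = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    'Fare': [10.0, 15.0, 3.0, 30.0],
})
traindf['FareBand'] = np.nan

for index, row in traindf.iterrows():
    # row is a temporary Series built from the row's values,
    # so this assignment never reaches traindf
    row['FareBand'] = 1

unchanged = bool(traindf['FareBand'].isna().all())
print(unchanged)
```

Writing through traindf.loc[index, 'FareBand'], as in the loop above, targets the DataFrame directly and so avoids the problem.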
Answer 1 (score: 1)
You can do this without a loop, in three steps:
traindf['FareBand'] = 3
traindf.loc[traindf['Fare'].between(13.675550, 20.662183), 'FareBand'] = 2
traindf.loc[traindf['Fare'].le(13.675550), 'FareBand'] = 1
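The order of these steps matters, because between is inclusive at both ends by default: a fare exactly equal to 13.675550 is first labelled 2 and only then corrected to 1 by the last line. A runnable sketch on made-up sample fares (the values are illustrative, not taken from the real data):

```python
import pandas as pd

# Illustrative fares, including one exactly on the lower boundary
traindf = pd.DataFrame({'Fare': [10, 13.675550, 15, 30]})

traindf['FareBand'] = 3                                   # default band
traindf.loc[traindf['Fare'].between(13.675550, 20.662183), 'FareBand'] = 2
traindf.loc[traindf['Fare'].le(13.675550), 'FareBand'] = 1  # overrides the boundary row
print(traindf)
```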
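Answer 2 (score: 1)
If you want to keep the approach you described and apply the changes inside a loop, all you need to do is write the modified row back into the DataFrame at its index label: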
for index, row in traindf.iterrows():
    if row['Fare'] <= 13.675550:
        row['FareBand'] = 1
    elif row['Fare'] <= 20.662183 and row['Fare'] > 13.675550:
        row['FareBand'] = 2
    else:
        row['FareBand'] = 3
    # row is a copy, so store it back in the DataFrame
    traindf.loc[index] = row