I don't understand why a column in my dataset is NaN

Time: 2018-12-06 11:41:46

Tags: python python-3.x pandas dataframe

I am trying to create fare bands (1/2/3) with the loop below, but it doesn't seem to work:

traindf['FareBand'] = np.nan

for index, row in traindf.iterrows():
    if row['Fare'] <= 13.675550:
        row['FareBand'] = 1
    elif row['Fare'] <= 20.662183 and row['Fare'] > 13.675550:
        row['FareBand'] = 2
    else:
        row['FareBand'] = 3

Running .head() shows that every row in the FareBand column is NaN:

traindf['FareBand'].head(20)

Output:
       0    NaN
       1    NaN
       2    NaN
       3    NaN
       ...
       12   NaN
       13   NaN
       14   NaN
       15   NaN
       16   NaN
       17   NaN
       18   NaN
       19   NaN
       Name: FareBand, dtype: float64

What could be causing this?

3 Answers:

Answer 0: (score: 4)

I suggest using numpy.select:

import numpy as np
import pandas as pd

traindf = pd.DataFrame({'Fare':[10,15,3,30]})

m1 = traindf['Fare'] <= 13.675550
m2 = (traindf['Fare'] <= 20.662183) & (traindf['Fare'] > 13.675550)

traindf['FareBand'] = np.select([m1, m2], [1,2], 3)
print (traindf)
   Fare  FareBand
0    10         1
1    15         2
2     3         1
3    30         3
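
A side note on the masks above: numpy.select uses the first condition that matches for each element, so the lower bound in m2 is arguably redundant. Here is a minimal sketch under that assumption, with the same thresholds; the names `conditions` and `bands` are only illustrative and do not come from the question:

```python
import numpy as np
import pandas as pd

traindf = pd.DataFrame({'Fare': [10, 15, 3, 30]})

# Conditions are checked in order; the first True one wins for each row.
conditions = [
    traindf['Fare'] <= 13.675550,
    traindf['Fare'] <= 20.662183,  # only decides rows where the first mask is False
]
bands = [1, 2]

# default=3 covers every fare above the second threshold
traindf['FareBand'] = np.select(conditions, bands, default=3)
print(traindf)
```

This should give the same bands as the two explicit masks, as long as the conditions stay ordered from the lowest threshold to the highest.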

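Your loop can be made to work by assigning the values by index with loc. The original version only modifies the row object that iterrows() yields, which is generally a copy, so nothing reaches the DataFrame. Please don't use this approach, though, because it is slow: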

for index, row in traindf.iterrows():
    if traindf.loc[index, 'Fare'] <= 13.675550:
        traindf.loc[index, 'FareBand'] = 1
    elif traindf.loc[index, 'Fare'] <= 20.662183 and traindf.loc[index, 'Fare'] > 13.675550:
        traindf.loc[index, 'FareBand'] = 2
    else:
        traindf.loc[index, 'FareBand'] = 3

print (traindf)
   Fare  FareBand
0    10       1.0
1    15       2.0
2     3       1.0
3    30       3.0
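
To see why the loop in the question leaves the column as NaN, here is a small sketch. It assumes a frame with mixed column types (as in a typical passenger dataset), in which case iterrows() hands back a copy of each row. The frame `demo` and its `Name` column are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame with mixed dtypes (strings and floats)
demo = pd.DataFrame({'Name': ['a', 'b', 'c'], 'Fare': [10.0, 15.0, 30.0]})
demo['FareBand'] = np.nan

for index, row in demo.iterrows():
    row['FareBand'] = 1  # changes only the temporary Series, not demo

# The column in the DataFrame is still entirely NaN
unchanged = demo['FareBand'].isna().all()
print(unchanged)

for index, row in demo.iterrows():
    demo.loc[index, 'FareBand'] = 1  # writes into the DataFrame itself

print(demo['FareBand'].tolist())
```

The first loop has no visible effect on `demo`, while the second one fills the column, which matches the behaviour described in the question.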

Answer 1: (score: 1)

You can do it in three steps without a loop:

traindf['FareBand'] = 3
traindf.loc[traindf['Fare'].between(13.675550, 20.662183), 'FareBand'] = 2
traindf.loc[traindf['Fare'].le(13.675550), 'FareBand'] = 1
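
As a possible alternative to the three assignments (this is a different technique, not the one used above), pandas.cut can bin the fares in one call. A rough sketch, assuming the same two thresholds and right-closed intervals; the name `edges` is only illustrative:

```python
import numpy as np
import pandas as pd

traindf = pd.DataFrame({'Fare': [10, 15, 3, 30]})

# Right-closed bins: (-inf, 13.675550], (13.675550, 20.662183], (20.662183, inf)
edges = [-np.inf, 13.675550, 20.662183, np.inf]

# labels=[1, 2, 3] names the three bins; astype(int) turns the categorical into plain integers
traindf['FareBand'] = pd.cut(traindf['Fare'], bins=edges, labels=[1, 2, 3]).astype(int)
print(traindf)
```

Note that the astype(int) step assumes Fare has no missing values; with NaN fares it is probably safer to leave the result as a categorical column.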

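Answer 2: (score: 1)

If you want to keep the approach you described and apply the changes inside the loop, all you need to do is write the modified row back to the DataFrame at its index: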

for index, row in traindf.iterrows():
    if row['Fare'] <= 13.675550:
        row['FareBand'] = 1
    elif row['Fare'] <= 20.662183 and row['Fare'] > 13.675550:
        row['FareBand'] = 2
    else:
        row['FareBand'] = 3
    traindf.loc[index] = row