Question

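I have a pandas DataFrame df with shape (5568, 108). The column of interest is df.Age, which contains some NaN values (303). I want to keep the NaN rows but drop some outliers, roughly df.drop(df[df.Age < 18]) and df.drop(df[df.Age > 90]).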

I tried

for index, rows in df.iterrows():
    if (df.loc[index, 'Age'] > 0.0 & df.loc[index, 'Age'] < 18.0):
        df.drop(df.iloc[index])
    elif (df.loc[index, 'Age'] > 0.0 & df.loc[index, 'Age'] > 90.0):
        df.drop(df.iloc[index])
    else:
        continue

But this results in

TypeError: unsupported operand type(s) for &: 'float' and 'numpy.float64'

Any ideas on how to achieve this?

Answer 1

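This is an operator precedence problem. Wrap each comparison in parentheses: (df.loc[index, 'Age'] > 0.0) & ... and so on. & binds more tightly than >, so it is evaluated first, which produces the subexpression 0.0 & df.loc[index, 'Age'].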
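The precedence issue can be reproduced on a single value. This is only a minimal sketch: `age` is a hypothetical stand-in for `df.loc[index, 'Age']`, and the exact wording of the TypeError may differ between NumPy versions.

```python
import numpy as np

# Hypothetical single value standing in for df.loc[index, 'Age'].
age = np.float64(25.0)

# Without parentheses Python parses this as: age > (0.0 & age) < 18.0,
# so the bitwise AND on two floats runs first and raises TypeError.
try:
    age > 0.0 & age < 18.0
    raised = False
except TypeError:
    raised = True

# With parentheses each comparison is evaluated first, then the results are ANDed.
is_minor = (age > 0.0) & (age < 18.0)

print(raised, is_minor)
```

For plain scalars like this, `and` would also work; `&` with parentheses is the form that carries over to whole Series.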

Answer 2

I think you need to filter with boolean indexing, combining between and isnull, rather than calling drop with a condition:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age':[10,20,90,88,np.nan], 'a': [10,20,40,50,90]})
print (df)
    Age   a
0  10.0  10
1  20.0  20
2  90.0  40
3  88.0  50
4   NaN  90

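# inclusive='neither' excludes both bounds (pandas 1.3+; older versions spelled this inclusive=False)
print ((df['Age'].between(18, 90, inclusive='neither')) | (df['Age'].isnull()))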
0    False
1     True
2    False
3     True
4     True
Name: Age, dtype: bool

df = df[(df['Age'].between(18, 90, inclusive='neither')) | (df['Age'].isnull())]
print (df)
    Age   a
1  20.0  20
3  88.0  50
4   NaN  90
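An alternative that stays closer to the conditions in the question (Age < 18 and Age > 90) is to build a mask of the outliers and negate it. This is a sketch, not part of the original answer; the names `outliers`, `kept`, and `dropped` are made up for illustration. It relies on the fact that comparisons against NaN evaluate to False, so NaN rows are never marked as outliers. Unlike the exclusive between call, this version keeps ages of exactly 18 and 90.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the answer.
df = pd.DataFrame({'Age': [10, 20, 90, 88, np.nan], 'a': [10, 20, 40, 50, 90]})

# NaN < 18 and NaN > 90 are both False, so the NaN row is not flagged.
outliers = (df['Age'] < 18) | (df['Age'] > 90)

# Option 1: boolean indexing with the negated mask.
kept = df[~outliers]

# Option 2: drop expects index labels, so pass the index of the outlier rows.
dropped = df.drop(df[outliers].index)

print(kept)
```

Both options return the same rows; note that drop takes index labels, not the filtered rows themselves, which is why the calls in the question would not work as written.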

Drop rows based on column values while keeping NaN
