I have a DataFrame named xxx. One of its columns is Final, and xxx looks like this:
   FpPropeTypCode  DTE_DATE_DEATH             Area         Final
0              FP             NaN  Ame_MidEast_Lnd           NaN
1              FP             NaN  Southern_Europe  W.E.M. Lines
2              FP             NaN              NaN           NaN
3              ZP             NaN  Ame_MidEast_Lnd           NaN
4              YY             NaN  Ame_MidEast_Lnd           NaN
I want to drop every row that has NaN in Final, so this is what I did:
xxx= xxx.drop(pd.isnull(data_file_fp4['Final']))
Unfortunately, what I get is:
   FpPropeTypCode  DTE_DATE_DEATH             Area                         Final
2              FP             NaN              NaN                           NaN
3              ZP             NaN  Ame_MidEast_Lnd                           NaN
4              YY             NaN  Ame_MidEast_Lnd                           NaN
5              NN             NaN  Ame_MidEast_Lnd  NORTH ARM TRANSPORTATION LTD
6              CP             NaN  Northern_Europe                     MPC Group
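which is clearly wrong...
What I actually need is to drop rows based on two conditions: Final is NaN and Area is Ame_MidEast_Lnd. So I can't really use dropna.
What is wrong with my current code, even for just the first condition? Thanks in advance.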
Answer 0 (score: 4)
The specific command you are looking for is probably something like:
xxx = xxx.dropna(axis=0, subset=['Final'])
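axis=0 specifies that rows, not columns, are dropped. subset specifies that a row is dropped when its 'Final' value is NaN.
Edit: the asker can't use dropna because their filtering logic is more complex.
If you want more complex logic, you are probably better off using bracket (boolean indexing) logic. I'll try to verify this later, but can you try something like this: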
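As for why the original attempt misbehaves: drop expects index labels (or column labels), not a boolean mask. In the output shown in the question it is rows 0 and 1 that are missing, which suggests the False/True values were matched against the labels 0 and 1. That is an interpretation rather than something verified here, and newer pandas versions may raise an error instead. A minimal sketch, reusing the question's sample data, of passing real labels to drop and comparing the result with dropna (the names labels_to_drop, via_drop and via_dropna are only for illustration):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
xxx = pd.DataFrame(
    [
        ['FP', np.nan, 'Ame_MidEast_Lnd', np.nan],
        ['FP', np.nan, 'Southern_Europe', 'W.E.M. Lines'],
        ['FP', np.nan, np.nan, np.nan],
        ['ZP', np.nan, 'Ame_MidEast_Lnd', np.nan],
        ['YY', np.nan, 'Ame_MidEast_Lnd', np.nan],
    ],
    columns=['FpPropeTypCode', 'DTE_DATE_DEATH', 'Area', 'Final'],
)

# drop() wants the index labels of the rows to remove,
# so turn the boolean mask into labels first.
labels_to_drop = xxx.index[xxx['Final'].isnull()]
via_drop = xxx.drop(labels_to_drop)

# dropna() with subset expresses the same thing in one step.
via_dropna = xxx.dropna(axis=0, subset=['Final'])

print(via_drop.equals(via_dropna))
```

Only row 1 has a non-null Final in this sample, so both results contain just that row.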
xxx = xxx[~xxx['Final'].isnull()]
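The expression inside the brackets is a boolean Series with one entry per row, and only the rows where it is True are kept. A small sketch, using just the Final column from the question's sample (the names mask, kept and same are only for illustration), which also shows notna() as an equivalent spelling of ~isnull():

```python
import numpy as np
import pandas as pd

# Only the Final column from the question's sample data.
xxx = pd.DataFrame(
    {'Final': [np.nan, 'W.E.M. Lines', np.nan, np.nan, np.nan]}
)

# True where Final is present, False where it is NaN.
mask = ~xxx['Final'].isnull()
print(mask.tolist())

# Indexing with the mask keeps only the True rows.
kept = xxx[mask]

# notna() builds the same mask without the explicit negation.
same = xxx[xxx['Final'].notna()]
print(kept.equals(same))
```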
If you want the second part of the logic, with both the NaN filter and the column filter, you would do this:
xxx = xxx[~(xxx['Final'].isnull() & xxx['Area'].str.contains("Ame_MidEast_Lnd"))]
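Two details may be worth checking against your real data. str.contains does substring matching, and with the default object dtype it yields a missing value where Area itself is NaN. Passing na=False makes those rows count as non-matches explicitly. If Area must equal the value exactly, as the question words it, a plain == comparison is an alternative. A sketch on the question's sample data (the intermediate names are only for illustration):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
xxx = pd.DataFrame(
    [
        ['FP', np.nan, 'Ame_MidEast_Lnd', np.nan],
        ['FP', np.nan, 'Southern_Europe', 'W.E.M. Lines'],
        ['FP', np.nan, np.nan, np.nan],
        ['ZP', np.nan, 'Ame_MidEast_Lnd', np.nan],
        ['YY', np.nan, 'Ame_MidEast_Lnd', np.nan],
    ],
    columns=['FpPropeTypCode', 'DTE_DATE_DEATH', 'Area', 'Final'],
)

final_missing = xxx['Final'].isnull()

# Substring match; na=False treats a NaN Area as "does not match".
in_area = xxx['Area'].str.contains('Ame_MidEast_Lnd', na=False)
by_contains = xxx[~(final_missing & in_area)]

# Exact match; comparing NaN with a string gives False.
is_area = xxx['Area'] == 'Ame_MidEast_Lnd'
by_equality = xxx[~(final_missing & is_area)]

print(by_contains.index.tolist())
print(by_equality.index.tolist())
```

On this sample both variants keep rows 1 and 2: row 2 has a NaN Final but its Area is also NaN, so it does not meet the second condition.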
I have verified this by running the following Python file:
import pandas as pd
import numpy as np
xxx = pd.DataFrame([
['FP', np.nan, 'Ame_MidEast_Lnd', np.nan],
['FP', np.nan, 'Southern_Europe', 'W.E.M. Lines'],
['FP', np.nan, np.nan, np.nan],
['ZP', np.nan, 'Ame_MidEast_Lnd', np.nan],
['YY', np.nan, 'Ame_MidEast_Lnd', np.nan]],
columns=['FpPropeTypCode','DTE_DATE_DEATH','Area', 'Final']
)
# before
print(xxx)
# whatever rows have both 'Final' as NaN and 'Area' containing Ame_MidEast_Lnd, we do NOT want those rows
xxx = xxx[~(xxx['Final'].isnull() & xxx['Area'].str.contains("Ame_MidEast_Lnd"))]
# after
print(xxx)
You will see that the solution behaves the way you want.