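I have a DataFrame with a lot of columns, and I want to drop the rows where certain columns have null (empty) values. I know how to do it for a single column: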
df = df[df['Column'] != '']
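I want to do the same for a list of columns, something like this:
df = df['' not in [df['Column1'], df['Column2'], df['Column3']]]
However, this raises an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I do this?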
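Why the attempt fails: Python's "in" operator compares '' with each list element using ==, and comparing a Series with '' returns a boolean Series rather than a single True/False, so Python cannot decide whether the element "matched". A minimal reproduction, assuming a small hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical frame used only to reproduce the error
df = pd.DataFrame({'Column1': ['a', ''], 'Column2': ['', 'b']})

error_message = None
try:
    # `in` evaluates '' == df['Column1'], which yields a boolean Series,
    # and then asks for that Series' truth value, which is ambiguous
    result = '' not in [df['Column1'], df['Column2']]
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The fix is to stay vectorized: compare the whole column subset at once and reduce each row with all or any, as in the answer.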
Answer 0 (score: 3)
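If the missing values are empty strings, select the subset of columns, compare it with '', and collapse the result to a single True/False per row with .all(axis=1) or .any(axis=1):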
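# keep rows where none of the listed columns is an empty string
df = df[(df[['Column1', 'Column2', 'Column3']] != '').all(axis=1)]
# equivalent: drop rows where any of the listed columns is an empty string
df = df[~(df[['Column1', 'Column2', 'Column3']] == '').any(axis=1)]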
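The two one-liners are equivalent by De Morgan's law ("all values non-empty" is the negation of "any value empty"). A small self-contained check, using a hypothetical frame where the unchecked column Other is deliberately full of '' to show that only the listed columns matter:

```python
import pandas as pd

# Hypothetical frame; '' marks a missing value
df = pd.DataFrame({
    'Column1': ['a', '', 'c'],
    'Column2': ['x', 'y', ''],
    'Column3': ['p', 'q', 'r'],
    'Other':   ['', '', ''],   # not checked, so its '' values are ignored
})
cols = ['Column1', 'Column2', 'Column3']

# Keep a row only if every checked column is non-empty
mask_all = (df[cols] != '').all(axis=1)
# Same mask via De Morgan: drop a row if any checked column is empty
mask_any = ~(df[cols] == '').any(axis=1)

kept = df[mask_all]
print(kept)
```

Only row 0 survives, because rows 1 and 2 each have an empty string in one of the checked columns.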
If the missing values are NaNs or Nones, use DataFrame.dropna with the subset parameter:
df = df.dropna(subset=['Column1', 'Column2', 'Column3'])
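By default dropna uses how='any', so a row is dropped as soon as one of the subset columns is missing. If you only want to drop rows where all of the listed columns are missing, pass how='all'. A sketch with a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 has one gap, row 2 has only gaps
df = pd.DataFrame({
    'Column1': ['a', np.nan, np.nan],
    'Column2': ['x', 'y', None],
    'Column3': ['p', 'q', np.nan],
})
cols = ['Column1', 'Column2', 'Column3']

# how='any' (default): drop a row if ANY listed column is NaN/None
any_missing = df.dropna(subset=cols)
# how='all': drop a row only if ALL listed columns are NaN/None
all_missing = df.dropna(subset=cols, how='all')

print(any_missing)
print(all_missing)
```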
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[np.nan,'','p','hh','f'],
'B':['',np.nan,'','','o'],
'C':['a','s','d','f','g'],
'D':['f','g','h','j','k'],
'E':['l','i',np.nan,'u','o'],
'F':['','','o','i',np.nan]})
print (df)
     A    B  C  D    E    F
0  NaN       a  f    l
1       NaN  s  g    i
2    p       d  h  NaN    o
3   hh       f  j    u    i
4    f    o  g  k    o  NaN
df1 = df.dropna(subset=['A', 'B', 'F'])
print (df1)
    A  B  C  D    E  F
2   p     d  h  NaN  o
3  hh     f  j    u  i
df2 = df[(df[['A', 'B', 'F']] != '').all(axis=1)]
print (df2)
   A  B  C  D  E    F
4  f  o  g  k  o  NaN
df2 = df[~(df[['A', 'B', 'F']] == '').any(axis=1)]
print (df2)
   A  B  C  D  E    F
4  f  o  g  k  o  NaN
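If a column can contain both empty strings and NaN/None, the two checks can be combined into one mask: a value counts as present only if it is not missing and not ''. A minimal sketch using the sample data above; the subset ['A', 'F'] is a hypothetical choice so that some rows survive (with ['A', 'B', 'F'] every row has at least one '' or NaN, so the result would be empty):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, '', 'p', 'hh', 'f'],
                   'B': ['', np.nan, '', '', 'o'],
                   'C': ['a', 's', 'd', 'f', 'g'],
                   'D': ['f', 'g', 'h', 'j', 'k'],
                   'E': ['l', 'i', np.nan, 'u', 'o'],
                   'F': ['', '', 'o', 'i', np.nan]})
cols = ['A', 'F']  # hypothetical subset for this illustration

# Present = not NaN/None AND not an empty string, required in every listed column
present = df[cols].notna() & (df[cols] != '')
result = df[present.all(axis=1)]
print(result)
```

Rows 2 and 3 are kept; row 0 (NaN in A), row 1 ('' in A) and row 4 (NaN in F) are dropped.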
Edit:
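Depending on the pandas version, comparing with a string when some of the columns are numeric raises:
TypeError: Could not compare [''] with block values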
There are two solutions: compare the NumPy array created by .values, or convert the values to strings with astype(str):
df = pd.DataFrame({'A':[np.nan,7,8,8,8],
'B':['',np.nan,'','','o'],
'C':['a','s','d','f','g'],
'D':['f','g','h','j','k'],
'E':['l','i',np.nan,'u','o'],
'F':['','','o','i',np.nan]})
print (df)
     A    B  C  D    E    F
0  NaN       a  f    l
1  7.0  NaN  s  g    i
2  8.0       d  h  NaN    o
3  8.0       f  j    u    i
4  8.0    o  g  k    o  NaN
df2 = df[(df[['A', 'B', 'F']].values != '').all(axis=1)]
print (df2)
     A  B  C  D  E    F
4  8.0  o  g  k  o  NaN
df2 = df[(df[['A', 'B', 'F']].astype(str) != '').all(axis=1)]
print (df2)
     A  B  C  D  E    F
4  8.0  o  g  k  o  NaN
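A third option for mixed dtypes, offered as a sketch rather than as part of the original answer: DataFrame.isin tests membership element by element, so a numeric value is simply reported as "not in ['']" instead of being compared as a string. As far as I know this avoids the TypeError without converting anything:

```python
import numpy as np
import pandas as pd

# Same mixed-dtype sample: column A is numeric (float64)
df = pd.DataFrame({'A': [np.nan, 7, 8, 8, 8],
                   'B': ['', np.nan, '', '', 'o'],
                   'C': ['a', 's', 'd', 'f', 'g'],
                   'D': ['f', 'g', 'h', 'j', 'k'],
                   'E': ['l', 'i', np.nan, 'u', 'o'],
                   'F': ['', '', 'o', 'i', np.nan]})
cols = ['A', 'B', 'F']

# True where a checked cell is exactly ''; numeric cells are never in ['']
has_empty = df[cols].isin(['']).any(axis=1)
df2 = df[~has_empty]
print(df2)
```

Like the .values and astype(str) versions, this keeps only row 4; NaN is not treated as empty here.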
Answer 1 (score: 2)