Drop rows if any of a set of values is null

Time: 2017-06-29 08:35:45

Tags: python pandas

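I have a DataFrame with a large number of columns, and I want to drop the rows where certain columns are null. I know how to do this for a single column: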

df = df[df['Column'] != '']

I would like to do the same for a set of columns, like this:

df = df['' not in [df['Column1'], df['Column2'], df['Column3']]]

However, this raises an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
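The error comes from Python's in operator: it compares '' with each Series in the list using ==, which yields a boolean Series, and then has to turn that Series into a single True/False, which pandas refuses to do. A minimal reproduction, using a small hypothetical frame:

```python
import pandas as pd

# Hypothetical two-row frame, only used to reproduce the error
df = pd.DataFrame({'Column1': ['x', ''], 'Column2': ['y', 'z']})

message = None
try:
    # `in` evaluates the Series == '' comparison, which returns a boolean
    # Series, then calls bool() on it, which raises ValueError
    '' in [df['Column1'], df['Column2']]
except ValueError as exc:
    message = str(exc)

print(message)
```

The fix is therefore to build the comparison column-wise and reduce it per row with any or all, instead of relying on Python's in.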

How can I do this?

2 answers:

Answer 0 (score: 3)

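If the values are empty strings, select the subset of columns, compare it with '', and reduce each row to a single True/False with all or any: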

df = df[(df[['Column1', 'Column2', 'Column3']] != '').all(axis=1)]

df = df[~(df[['Column1', 'Column2', 'Column3']] == '').any(axis=1)]

If the values are NaNs or Nones, use dropna with the subset parameter:

df = df.dropna(subset=['Column1', 'Column2', 'Column3'])
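If the columns can contain both empty strings and NaN/None, one option (not part of the original answer) is to treat '' as missing inside the subset with replace, then keep only rows where every subset value is present. A sketch with hypothetical data; the column named Other is made up to show that columns outside the subset are ignored and left unchanged:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column1': ['a', '', 'c', 'd'],
                   'Column2': ['e', 'f', None, 'h'],
                   'Column3': ['i', 'j', 'k', np.nan],
                   'Other':   ['', '', '', '']})

cols = ['Column1', 'Column2', 'Column3']
# Replace '' with NaN only in the subset, so notna() catches '', None and NaN
mask = df[cols].replace('', np.nan).notna().all(axis=1)
# Index the original frame, so 'Other' keeps its empty strings
out = df[mask]
print(out)
```

Only the first row survives: rows 1, 2 and 3 have '', None and NaN respectively in one of the subset columns.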

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[np.nan,'','p','hh','f'],
                   'B':['',np.nan,'','','o'],
                   'C':['a','s','d','f','g'],
                   'D':['f','g','h','j','k'],
                   'E':['l','i',np.nan,'u','o'],
                   'F':['','','o','i',np.nan]})

print (df)
     A    B  C  D    E    F
0  NaN       a  f    l     
1       NaN  s  g    i     
2    p       d  h  NaN    o
3   hh       f  j    u    i
4    f    o  g  k    o  NaN

df1 = df.dropna(subset=['A', 'B', 'F'])
print (df1)
   A B  C  D    E  F
2   p    d  h  NaN  o
3  hh    f  j    u  i

df2 = df[(df[['A', 'B', 'F']] != '').all(axis=1)]
print (df2)
   A  B  C  D  E    F
4  f  o  g  k  o  NaN

df2 = df[~(df[['A', 'B', 'F']] == '').any(axis=1)]
print (df2)
   A  B  C  D  E    F
4  f  o  g  k  o  NaN
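The two empty-string filters are equivalent by De Morgan's law: "every value differs from ''" is the negation of "some value equals ''". A quick check on the sample frame, keeping only the columns A, B and F:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, '', 'p', 'hh', 'f'],
                   'B': ['', np.nan, '', '', 'o'],
                   'F': ['', '', 'o', 'i', np.nan]})
sub = df[['A', 'B', 'F']]

mask_all = (sub != '').all(axis=1)   # True where no value in the row is ''
mask_any = ~(sub == '').any(axis=1)  # True where it is not the case that some value is ''

assert mask_all.equals(mask_any)
print(df[mask_all].index.tolist())
```

NaN is not equal to '', so neither mask removes row 4; use dropna (or combine both checks) if NaN should also count as missing.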

Edit:

If some of the columns are numeric, comparing them with a string raises:

TypeError: Could not compare [''] with block values

There are two solutions: compare the NumPy array created by values, or convert the values to strings with astype(str):

df = pd.DataFrame({'A':[np.nan,7,8,8,8],
                   'B':['',np.nan,'','','o'],
                   'C':['a','s','d','f','g'],
                   'D':['f','g','h','j','k'],
                   'E':['l','i',np.nan,'u','o'],
                   'F':['','','o','i',np.nan]})

print (df)
     A    B  C  D    E    F
0  NaN       a  f    l     
1  7.0  NaN  s  g    i     
2  8.0       d  h  NaN    o
3  8.0       f  j    u    i
4  8.0    o  g  k    o  NaN

df2 = df[(df[['A', 'B', 'F']].values != '').all(axis=1)]
print (df2)
     A  B  C  D  E    F
4  8.0  o  g  k  o  NaN

df2 = df[(df[['A', 'B', 'F']].astype(str) != '').all(axis=1)]
print (df2)
     A  B  C  D  E    F
4  8.0  o  g  k  o  NaN
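In pandas 0.24 and later, DataFrame.to_numpy() is the documented way to get the underlying NumPy array, so the first solution can also be written with it. A sketch on the same sample data (columns A, B and F only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 7, 8, 8, 8],
                   'B': ['', np.nan, '', '', 'o'],
                   'F': ['', '', 'o', 'i', np.nan]})

# Mixing a float column with object columns gives an object ndarray,
# so != '' is evaluated element by element without a TypeError
arr = df[['A', 'B', 'F']].to_numpy()
df2 = df[(arr != '').all(axis=1)]
print(df2)
```

As with astype(str), NaN is not treated as empty here, so row 4 is kept.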

Answer 1 (score: 2)

You are looking for

df.dropna()
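Note that dropna() without arguments checks every column and only treats NaN/None as missing, so rows containing empty strings are kept. A small check on hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['x', '', 'z'],
                   'B': ['u', 'v', np.nan],
                   'C': [1, 2, 3]})

# Row 2 has NaN in B and is dropped; row 1 survives despite the '' in A
result = df.dropna()
print(result)
```

To restrict the check to particular columns, pass subset=[...] as in the first answer.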